# Hive Demo Assignment: subnets

### Step 0. Prepare script to run commands only when local

In this step we create a shell script called `run_if_local.sh` that executes its arguments as a command only when the environment variable `LOCAL_MODE` is set to `true`.
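
The script body is not shown here, so the following is only a minimal sketch of what such a wrapper might look like. It assumes that `LOCAL_MODE` is compared against the literal string `true` and that the script silently does nothing otherwise; both details are assumptions, not the assignment's reference implementation.

```shell
# Hypothetical sketch: write a possible run_if_local.sh and make it executable.
cat > run_if_local.sh <<'EOF'
#!/usr/bin/env bash
# Run the passed arguments as a command only when LOCAL_MODE=true.
# With any other value (or when the variable is unset) the script does nothing.
if [ "${LOCAL_MODE:-}" = "true" ]; then
    "$@"
fi
EOF
chmod +x run_if_local.sh
```

With a wrapper like this, a call such as `./run_if_local.sh hive -f create_db.hql` would run the Hive command only in a session where `LOCAL_MODE=true` has been exported.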

### Step 1. Create database

In [None]:
%%writefile create_db.hql
DROP DATABASE IF EXISTS demodb CASCADE;
CREATE DATABASE demodb LOCATION '/user/jovyan/demodb';

In [None]:
# ! hive -f create_db.hql

### Step 2. Create tables

Suppose our source dataset has 2 columns:
* an IP address,
* its subnet mask.

For example:
```
148.45.113.216	255.255.255.248
203.98.141.0	255.255.255.240
183.168.36.0	255.255.255.128
111.157.172.232	255.255.255.248
80.46.87.0	255.255.255.0
247.248.233.0	255.255.255.128
```
Now we'll create an external table with 2 fields: `ip` and `mask`.

In [None]:
%%writefile create_table.hql
ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar;

USE demodb;
DROP TABLE IF EXISTS Subnets;

CREATE EXTERNAL TABLE Subnets (
    ip STRING,
    mask STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/subnets/ips';

In [None]:
! hive -f create_table.hql

### Step 3. Demo query on created table

Let's write a simple query:
 > Compute the average number of IP addresses per subnet mask.
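
To make the intent of the query concrete, here is a small sketch that reproduces the same two-stage computation on the six sample rows from Step 2, using `awk` instead of Hive. The file name `sample_ips.tsv` is a hypothetical local file introduced only for this illustration; it is not part of the assignment data.

```shell
# Sample rows copied from the example in Step 2 (tab-separated ip and mask).
printf '%s\t%s\n' \
    148.45.113.216 255.255.255.248 \
    203.98.141.0 255.255.255.240 \
    183.168.36.0 255.255.255.128 \
    111.157.172.232 255.255.255.248 \
    80.46.87.0 255.255.255.0 \
    247.248.233.0 255.255.255.128 > sample_ips.tsv

# Count rows per mask (the inner GROUP BY), then average those counts (the outer AVG).
avg=$(awk -F'\t' '{ cnt[$2]++ } END { for (m in cnt) { total += cnt[m]; n++ } print total / n }' sample_ips.tsv)
echo "$avg"   # prints 1.5
```

The sample contains four distinct masks with counts 2, 1, 2 and 1, so the average is 6 / 4 = 1.5.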

In [None]:
%%writefile query.hql

ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar;
USE demodb;

In [None]:
%%writefile -a query.hql

SELECT AVG(counts.cnt)
FROM (
    SELECT mask, count(ip) as cnt
    FROM Subnets
    GROUP BY mask
) counts;

In [None]:
! hive -f query.hql