# Exercise

We will be working with the same dataset and questions we dealt with in the previous lab, so that you have some reference to work on (and check if your results match).

We will use the [Adult UCI dataset](https://archive.ics.uci.edu/ml/datasets/adult) with a few modifications; download the following files: [description](./files/adults/adults.names), [data - part1](./files/adults/adults1.csv) and [data - part2](./files/adults/adults2.csv).

Follow the instructions below and answer the questions. 

1. Create the two tables in the DB with the right data types. To use enumerated types in a table definition you first need to create the type. For example:

```
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE person (
    name text,
    current_mood mood
);
INSERT INTO person VALUES ('Moe', 'happy');
SELECT * FROM person WHERE current_mood = 'happy';
 name | current_mood 
------+--------------
 Moe  | happy
```

Notice that each row/sample does not have an id. Instead of using an INTEGER data type for the id, we recommend you look into SERIAL.

2. Load the datasets. Use the form of the copy_from command shown below, which specifies the column names and the value that stands for NULL in the source file. Notice that we do not specify the id column, since it will be autogenerated by the DB. Caveat: once you have loaded the data, double check that the id/SERIAL column starts at 1.
```
cursor.copy_from(f, 'adults1', columns=('age', 'workclass', ...), sep=',', null='?')
```
3. How many people under 18 years old have never worked? Among the people who have never worked (all ages), is there any race bias (how many per race)? Is there any sex bias? - Note: no need to elaborate, just present the data in tables.
4. Look at the hours per week of people with a paying job, by sex. Look at how many of them have an income above and below 50k. Compare and analyse. - Note: no need to elaborate, produce a single table that shows the data for both sex and salary.
5. How many people with college education do manual labour?
6. What is the minimum, mean and maximum capital gain and capital loss for every marital status?

## 1. Create the two tables in the DB with the right data types.

First we connect to our database. Thanks to lab 1 we already know what the two datasets contain and which type each column should have, so we only have to create the enumerated types and then the two tables that use them.

In [4]:
import psycopg2


try:
    conn = psycopg2.connect("dbname='postgres' user='postgres' host='localhost' password='pass1234'")
    print('Success connecting to the DB')
except psycopg2.OperationalError as e:
    # report why the connection failed instead of hiding the error
    print('I am unable to connect to the database:', e)


Success connecting to the DB


In [5]:
cursor = conn.cursor()
try:
    cursor.execute("""  CREATE TYPE workclass AS ENUM ('Private', 'Self-emp-not-inc', 'Self-emp-inc', 'Federal-gov', 'Local-gov', 'State-gov', 'Without-pay', 'Never-worked');
                        CREATE TYPE education as ENUM('Bachelors', 'Some-college', '11th', 'HS-grad', 'Prof-school', 'Assoc-acdm', 'Assoc-voc', '9th', '7th-8th', '12th', 'Masters', '1st-4th', '10th', 'Doctorate', '5th-6th', 'Preschool');
                        CREATE TYPE "marital-status" as ENUM('Married-civ-spouse', 'Divorced', 'Never-married', 'Separated', 'Widowed', 'Married-spouse-absent', 'Married-AF-spouse');
                        CREATE TYPE occupation as ENUM('Tech-support', 'Craft-repair', 'Other-service', 'Sales', 'Exec-managerial', 'Prof-specialty', 'Handlers-cleaners', 'Machine-op-inspct', 'Adm-clerical', 'Farming-fishing', 'Transport-moving', 'Priv-house-serv', 'Protective-serv', 'Armed-Forces');
                        CREATE TABLE adults1 (
                            id SERIAL PRIMARY KEY, -- autogenerated row id, as the instructions recommend
                            age bigint,
                            workclass workclass,
                            fnlwgt bigint,
                            education education,
                            "education-num" bigint,
                            "marital-status" "marital-status",
                            occupation occupation
                        );
                        """)
    
    cursor.execute("""  CREATE TYPE relationship AS ENUM ('Wife', 'Own-child', 'Husband', 'Not-in-family', 'Other-relative', 'Unmarried');
                        CREATE TYPE race as ENUM('White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black');
                        CREATE TYPE sex as ENUM('Female', 'Male');
                        CREATE TYPE "native-country" as ENUM('United-States', 'Cambodia', 'England', 'Puerto-Rico', 'Canada', 'Germany', 'Outlying-US(Guam-USVI-etc)', 'India', 'Japan', 'Greece', 'South', 'China', 'Cuba', 'Iran', 'Honduras', 'Philippines', 'Italy', 'Poland', 'Jamaica', 'Vietnam', 'Mexico', 'Portugal', 'Ireland', 'France', 'Dominican-Republic', 'Laos', 'Ecuador', 'Taiwan', 'Haiti', 'Columbia', 'Hungary', 'Guatemala', 'Nicaragua', 'Scotland', 'Thailand', 'Yugoslavia', 'El-Salvador', 'Trinadad&Tobago', 'Peru', 'Hong', 'Holand-Netherlands');
                        CREATE TYPE income as ENUM('>50K', '<=50K');
                        CREATE TABLE adults2 (
                            id SERIAL PRIMARY KEY, -- autogenerated row id, as the instructions recommend
                            relationship relationship,
                            race race,
                            sex sex,
                            "capital-gain" bigint,
                            "capital-loss" bigint,
                            "hours-per-week" bigint,
                            "native-country" "native-country",
                            income income
                        );
                        """)
    
    conn.commit()
except Exception as e:
    # if the transaction aborts we need to roll back before reusing the connection
    conn.rollback()
    print(e)
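
The enum labels above were typed by hand from what we learned in lab 1. As a cross-check, they could also be collected from the CSV itself. The following is only a minimal sketch: the helper names `distinct_labels` and `create_enum_sql` are hypothetical, the sample rows are illustrative rather than read from the real file, and it assumes the file is comma-separated with `?` marking missing values.

```python
import csv
import io


def distinct_labels(lines, column, null="?"):
    """Return the distinct non-null values of one 0-based column, in first-seen order."""
    seen = {}
    for row in csv.reader(lines):
        if column >= len(row):
            continue  # skip blank or short lines
        value = row[column].strip()
        if value and value != null:
            seen.setdefault(value, None)
    return list(seen)


def create_enum_sql(type_name, labels):
    """Build a CREATE TYPE ... AS ENUM statement, doubling quotes in the name and labels."""
    quoted_name = '"' + type_name.replace('"', '""') + '"'
    quoted_labels = ", ".join("'" + label.replace("'", "''") + "'" for label in labels)
    return f"CREATE TYPE {quoted_name} AS ENUM ({quoted_labels});"


# Illustrative rows in the assumed adults1.csv column order (not the real file).
sample = io.StringIO(
    "39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical\n"
    "50,?,83311,Bachelors,13,Married-civ-spouse,Exec-managerial\n"
    "38,Private,215646,HS-grad,9,Divorced,?\n"
)
workclass_labels = distinct_labels(sample, column=1)
print(create_enum_sql("workclass", workclass_labels))
```

With the real data, `sample` would be replaced by the open file object, and the generated statement can be compared against the hand-written one before running it.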

## 2. Load the datasets.

We simply need to open each CSV file and copy its rows into the matching table: `adults1.csv` into `adults1` and `adults2.csv` into `adults2`.

In [12]:
try:
    with open('./files/adults/adults1.csv', 'r') as f:
        cursor.copy_from(f, 'adults1', columns=('age','workclass','fnlwgt','education','education-num','marital-status','occupation'), sep=',', null='?')
        
    with open('./files/adults/adults2.csv', 'r') as g:
        cursor.copy_from(g, 'adults2', columns=('relationship','race','sex','capital-gain','capital-loss','hours-per-week','native-country','income'), sep=',', null='?')
        
    conn.commit()
except Exception as e:
    # if the transaction aborts we need to roll back before reusing the connection
    conn.rollback()
    print(e)
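
The instructions ask us to double check that the id/SERIAL column starts at 1. This matters because a PostgreSQL sequence is not rewound when a transaction rolls back, so a failed and retried load can leave the ids starting higher. The sketch below is one possible check, assuming both tables were created with an `id SERIAL` column as the instructions recommend; the function name `check_serial_ids` is hypothetical, and it only relies on the generic DB-API `execute`/`fetchone` calls of the cursor.

```python
def check_serial_ids(cursor, table):
    """Report row count and whether ids start at 1 and have no gaps."""
    # Only the two known table names are accepted, because the name is
    # interpolated into the SQL text rather than passed as a parameter.
    if table not in {"adults1", "adults2"}:
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f'SELECT min(id), max(id), count(*) FROM "{table}"')
    lowest, highest, rows = cursor.fetchone()
    if rows == 0:
        # min/max are NULL on an empty table, so nothing can be checked
        return {"rows": 0, "starts_at_one": False, "contiguous": False}
    return {
        "rows": rows,
        "starts_at_one": lowest == 1,
        "contiguous": highest - lowest + 1 == rows,
    }


# Possible usage with the cursor opened earlier:
#   report1 = check_serial_ids(cursor, "adults1")
#   report2 = check_serial_ids(cursor, "adults2")
#   print(report1, report2, report1["rows"] == report2["rows"])
```

If both reports show ids starting at 1 without gaps and the same number of rows, the two tables can be matched row by row on `id` in the later questions. Otherwise, dropping and recreating the tables before reloading is one way to reset the sequences.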