ENEM Study - How is education performing in Brazil?
===========================================

Overview
-------------
This report compares ENEM (Exame Nacional do Ensino Médio, the National High School Exam) results with other metrics in order to analyse:

* statistical accuracy
* improvement over the years
* comparison among different regions of the country

The project involves several data manipulation steps, which makes it a good data science exercise.

Data Source
-----------------

We gather our data from http://inep.gov.br/microdados , where it is available for download as zipped CSV files. Each file is around 3 GB, so we use a database to run our queries and extract the data of interest. To keep things simple, we use PostgreSQL.
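
Because a file of that size should not be loaded into memory at once, one option is to stream it and hand rows to the database in fixed-size batches. The sketch below is only an illustration under that assumption: `iter_batches` is a hypothetical helper that is not part of this project, and the in-memory sample stands in for the real file.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable of rows."""
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Tiny in-memory stand-in for the real ';'-delimited file (values are made up).
sample = io.StringIO("NU_INSCRICAO;NU_ANO\n1;2017\n2;2017\n3;2017\n")
reader = csv.DictReader(sample, delimiter=';')

# Only one batch is held in memory at a time when iterating lazily;
# list() is used here just to inspect the result.
batches = list(iter_batches(reader, 2))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)
```

With a real file, each batch could be added to the session and committed once, instead of committing row by row.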

In [2]:
import db
import sqlalchemy

session = db.get_db_session(create_schema=True)

In [3]:
import os
import os.path
import csv

csv_path = 'Data/microdados_enem2017/DADOS/MICRODADOS_ENEM_2017.csv'
with open(csv_path,'r') as f:
    # The file is ';'-delimited, so let csv.reader split the fields itself
    reader = csv.reader(f, delimiter=';')
    header = next(reader)
    for i in range(0,100):    # reads only the first 100 data rows
        values = next(reader)
        items = dict(zip(header,values))
        
        for key in items.keys():
            if items[key]=='':
                items[key]=None
        
        exame = db.Exame()
        exame.candidato_id = int(items['NU_INSCRICAO']) 
        exame.ano = int(items['NU_ANO']) if items['NU_ANO'] else None
        exame.idade = int(items['NU_IDADE']) if items['NU_IDADE'] else None
        exame.racial_id = int(items['TP_COR_RACA']) if items['TP_COR_RACA'] else None
        exame.sexo = items['TP_SEXO']
        
        
        if items['CO_MUNICIPIO_PROVA']:
            if not session.query(db.Local).filter_by(id=int(items['CO_MUNICIPIO_PROVA'])).all():
                local = db.Local()
                local.id = int(items['CO_MUNICIPIO_PROVA'])
                local.municipio = items['NO_MUNICIPIO_PROVA']
                local.estado = items['SG_UF_PROVA']
                session.add(local)
                session.commit()
            exame.exame_local_id = int(items['CO_MUNICIPIO_PROVA']) 
            
            
        if items['CO_MUNICIPIO_NASCIMENTO']:
            if not session.query(db.Local).filter_by(id=int(items['CO_MUNICIPIO_NASCIMENTO'])).all():
                local = db.Local()
                local.id = int(items['CO_MUNICIPIO_NASCIMENTO'])
                local.municipio = items['NO_MUNICIPIO_NASCIMENTO']
                local.estado = items['SG_UF_NASCIMENTO']
                session.add(local)
                session.commit()
            exame.local_nasc_id = int(items['CO_MUNICIPIO_NASCIMENTO'])
        
        if items['CO_MUNICIPIO_RESIDENCIA']:
            if not session.query(db.Local).filter_by(id=int(items['CO_MUNICIPIO_RESIDENCIA'])).all():
                local = db.Local()
                local.id = int(items['CO_MUNICIPIO_RESIDENCIA'])
                local.municipio = items['NO_MUNICIPIO_RESIDENCIA']
                local.estado = items['SG_UF_RESIDENCIA']
                session.add(local)
                session.commit()
            exame.residencia_id = int(items['CO_MUNICIPIO_RESIDENCIA'])
        
        session.add(exame)
        
    session.commit()    # persist the Exame rows added in the loop
    session.close()     # release the database connection
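
The loop above repeats the same steps for the `PROVA`, `NASCIMENTO` and `RESIDENCIA` municipality columns. As a minimal sketch of how that repetition could be factored out, the helpers below collect the three columns for a given suffix. `to_int` and `local_fields` are hypothetical names introduced here for illustration, and the row is a made-up example rather than real ENEM data.

```python
def to_int(value):
    """Convert a CSV field to int, treating '' and None as missing."""
    return int(value) if value not in (None, '') else None


def local_fields(items, suffix):
    """Collect the municipality columns for one suffix, e.g. 'PROVA'.

    Returns None when the municipality code is missing.
    """
    local_id = to_int(items.get('CO_MUNICIPIO_' + suffix))
    if local_id is None:
        return None
    return {
        'id': local_id,
        'municipio': items.get('NO_MUNICIPIO_' + suffix),
        'estado': items.get('SG_UF_' + suffix),
    }


# Illustrative row only; it has no birthplace code.
row = {
    'CO_MUNICIPIO_PROVA': '3550308',
    'NO_MUNICIPIO_PROVA': 'São Paulo',
    'SG_UF_PROVA': 'SP',
    'CO_MUNICIPIO_NASCIMENTO': '',
}
prova = local_fields(row, 'PROVA')
nascimento = local_fields(row, 'NASCIMENTO')
print(prova)
print(nascimento)
```

The returned dictionary could then be used to fill a `db.Local` object once per suffix, with the `None` case skipping both the lookup and the foreign key assignment.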
