# ENEM 2018
This project uses the dataset from my [Capstone project](https://github.com/davidsondefaria/Capstone) for the Udacity Data Engineering course, which combines Brazilian demographic and educational data. In that course, I applied what I had learned to carry out the ETL process for the dataset.

Now, as I start the Data Science course, I will analyze this dataset. I intend to examine the relationship between the grades obtained in ENEM, the *Exame Nacional do Ensino Médio* (in English, 'National High School Exam', a university admission test), and the educational HDI of Brazilian cities.

## Imports

In [2]:
import re
import os
import pandas as pd
from treatData import treatCities, treatEnem

## 1. Business Understanding
In this project, I am interested in how location can influence a student who is applying to university. These are the questions I want to answer:
1. Are the grades obtained in ENEM proportional to the educational HDI of the student's city?
2. How does the type of school influence the grade?
3. Do the other HDI components (GNI per capita or life expectancy) influence the grade?
4. Among students who live in the same city, do grades differ between those who have a disability and those who do not?
5. Do the color/race, gender, or age of students influence their results, or does the influence come from the place and situation in which they live?

## 2. Data Understanding

Two datasets from different sources are used:

- [Brazil Cities](https://www.kaggle.com/crisparada/brazilian-cities): this dataset is a compilation of data on Brazilian cities taken from different websites. Although it has many interesting fields, the focus of this project is to prepare a base for analyzing the HDI data.
- [Enem2018](http://portal.inep.gov.br/web/guest/microdados): this dataset contains all the non-sensitive data on the students who took ENEM 2018. With it, we can analyze each student's grades by city, age, financial conditions, disability status, and other attributes.

### Source Files
The files can be found on [Google Drive](https://drive.google.com/drive/folders/1BoA9AlCZWviPwzGHz71rrIgKCUXGHr2q).

- `brazil_cities.csv`: Original dataset about cities.
- `brazil_cities_dictionary.csv`: File with descriptions of the dataset columns.
- `enem/enem_2018.csv`: Dataset about ENEM 2018. Because of the file's size, the number of columns was reduced beforehand, but the original Portuguese column names were kept.
- `enem/enem_2018_dictionary.csv` and `enem/enem_2018_dictionary.xlsx`: Both files contain descriptions of the ENEM columns, in Portuguese.

In [None]:
# Build the data paths relative to the current working directory
enem_2018_path = os.path.join(os.getcwd(), 'data', 'enem2018', 'enem_2018.csv')
brazil_cities_path = os.path.join(os.getcwd(), 'data', 'brazil_cities.csv')
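
Since the ENEM file is large, it may help to read only the needed columns, a chunk at a time. The following is a minimal sketch of that idea, not the loading code used by `treatEnem`: the helper name `load_columns`, the `;` separator, and the Portuguese column names in the sample are assumptions made for illustration, and the sample rows are made up.

```python
import io

import pandas as pd


def load_columns(source, columns, sep=";", chunksize=100_000):
    """Read only `columns` from a delimited file, one chunk at a time.

    Hypothetical helper: the separator and chunk size are assumptions,
    not values taken from the project.
    """
    chunks = pd.read_csv(source, sep=sep, usecols=columns, chunksize=chunksize)
    # Concatenate the chunks into a single DataFrame with a fresh index
    return pd.concat(chunks, ignore_index=True)


# Tiny made-up sample standing in for the real CSV file
sample = io.StringIO(
    "NU_INSCRICAO;NO_MUNICIPIO_RESIDENCIA;NU_NOTA_MT;UNUSED\n"
    "1;Cidade A;512.3;x\n"
    "2;Cidade B;640.0;y\n"
    "3;Cidade A;;z\n"
)

subset = load_columns(sample, ["NU_INSCRICAO", "NU_NOTA_MT"], chunksize=2)
print(subset)
```

In practice, `source` would be a path such as `enem_2018_path`, and the separator and encoding would need to be checked against the actual file.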

## 3. Prepare Data

### Treating Data
After processing, the data will have the following columns:

##### Brazil Cities Column Descriptions

|   Columns   |               Legend                  |
|-------------|---------------------------------------|
|city         |Name of Cities                         |
|state        |State of Cities                        |
|capital      |Is State Capital?                      |
|hdi_ranking  |Human Development Index Ranking        |
|hdi          |Human Development Index                |
|hdi_gni      |Human Development Index GNI per Capita |
|hdi_life     |Human Development Index Life Expectancy|
|hdi_education|Human Development Index Educational    |
|longitude    |Longitude                              |
|latitude     |Latitude                               |
|altitude     |Altitude                               |

##### Enem 2018 Column Descriptions

|         Columns           | Legend                       |         Columns        | Legend                         |
|---------------------------|------------------------------|------------------------|--------------------------------|
|registration               |Number of Registration        |def_dyslexia            |Is dyslexic?                    |
|city_residence_code        |Code of Residence City        |def_dyscalculia         |Has dyscalculia?                |
|city_residence             |Residence City                |def_autism              |Is autistic?                    |
|state_residence_code       |Code of Residence State       |def_monocular_vision    |Has Monocular Vision?           |
|state_residence            |Residence State               |def_other               |Has any Other Disability?       |
|age                        |Age                           |social_name             |Social Name                     |
|gender                     |Gender                        |city_test_code          |Code of Application City        |
|matiral_status             |Marital Status                |city_test               |Application City                |
|color_race                 |Color or Race                 |state_test_code         |Code of Application State       |
|nationality                |Nationality                   |state_test              |Application State               |
|high_school_status         |Has finished High School?     |presence_natural_science|Presence in Natural Science Test|
|high_school_year_conclusion|Year of High School Conclusion|presence_human_science  |Presence in Human Science Test  |
|school_type                |Type of High School           |presence_languages      |Presence in Languages Test      |
|def_low_vision             |Has Low Vision?               |presence_math           |Presence in Math Test           |
|def_blind                  |Is Blind?                     |grade_natural_science   |Grade in Natural Science Test   |
|def_deaf                   |Is Deaf?                      |grade_human_science     |Grade in Human Science Test     |
|def_low_hearing            |Has Hearing Impairment?       |grade_languages         |Grade in Languages Test         |
|def_blind_deaf             |Is Blind and Deaf?            |grade_math              |Grade in Math Test              |
|def_physical               |Has Physical Disability?      |essay_status            |Essay Status                    |
|def_mental                 |Has Mental Disability?        |grade_essay             |Grade in Essay                  |
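
The `def_*` flags listed above could support question 4, which compares students with and without a disability inside the same city. The sketch below is only one possible approach under stated assumptions: it assumes the flags are stored as 0/1, uses a subset of the flags for brevity, and the helper name `disability_gap_by_city` and the toy rows are hypothetical.

```python
import pandas as pd

# Subset of the disability flags, assumed to be stored as 0/1
DISABILITY_COLUMNS = ["def_low_vision", "def_blind", "def_deaf"]


def disability_gap_by_city(enem, grade_column="grade_math"):
    """Mean grade per city for students with and without a disability."""
    enem = enem.copy()
    # A student counts as having a disability if any flag equals 1
    enem["has_disability"] = enem[DISABILITY_COLUMNS].eq(1).any(axis=1)
    means = enem.pivot_table(
        index="city_residence",
        columns="has_disability",
        values=grade_column,
        aggfunc="mean",
    )
    # Guarantee both groups exist as columns, even if one is absent
    means = means.reindex(columns=[False, True])
    means = means.rename(columns={False: "without", True: "with"})
    means["gap"] = means["with"] - means["without"]
    # Keep only cities where both groups are present
    return means.dropna(subset=["gap"])


# Made-up rows for illustration only
toy_enem = pd.DataFrame(
    {
        "city_residence": ["X", "X", "X", "Y"],
        "def_low_vision": [0, 0, 0, 0],
        "def_blind": [0, 0, 1, 0],
        "def_deaf": [0, 0, 0, 0],
        "grade_math": [500.0, 600.0, 450.0, 700.0],
    }
)

gaps = disability_gap_by_city(toy_enem)
print(gaps)
```

Cities that have students in only one of the two groups are dropped, because no within-city comparison is possible for them.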

In [3]:


# treatEnem(enem_2018_path)
# treatCities(brazil_cities_path)
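
Once both datasets are treated, question 1 could be explored by joining them on city and state and comparing mean grades with `hdi_education`. This is a rough sketch rather than the project's actual model: it assumes the treated data are available as DataFrames with the column names from the tables above, that `state_residence` and `state` use the same state codes, and all values in the toy frames are made up.

```python
import pandas as pd

GRADE_COLUMNS = [
    "grade_natural_science",
    "grade_human_science",
    "grade_languages",
    "grade_math",
    "grade_essay",
]


def mean_grade_by_city(enem):
    """Average each student's grades, then average the students per city."""
    enem = enem.copy()
    # mean(axis=1) skips missing grades by default
    enem["grade_mean"] = enem[GRADE_COLUMNS].mean(axis=1)
    keys = ["city_residence", "state_residence"]
    return enem.groupby(keys, as_index=False)["grade_mean"].mean()


def grades_vs_hdi(enem, cities):
    """Attach the city HDI columns to the per-city mean grade."""
    return mean_grade_by_city(enem).merge(
        cities,
        left_on=["city_residence", "state_residence"],
        right_on=["city", "state"],
        how="inner",
    )


# Made-up toy data for illustration only
toy_enem = pd.DataFrame(
    {
        "city_residence": ["A", "A", "B", "C"],
        "state_residence": ["S1", "S1", "S1", "S2"],
    }
)
for column in GRADE_COLUMNS:
    toy_enem[column] = [400.0, 420.0, 500.0, 600.0]

toy_cities = pd.DataFrame(
    {
        "city": ["A", "B", "C"],
        "state": ["S1", "S1", "S2"],
        "hdi_education": [0.5, 0.6, 0.7],
    }
)

merged = grades_vs_hdi(toy_enem, toy_cities)
# Pearson correlation between mean grade and educational HDI
corr = merged["grade_mean"].corr(merged["hdi_education"])
print(merged[["city", "grade_mean", "hdi_education"]])
```

A correlation coefficient only describes a linear association between cities; it does not show that HDI causes higher grades.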

## 4. Data Model

## 5. Evaluate the Results

## 6. Implementation