## INEGI Jalisco Census Data

Goal:
- Derive insights on education and economics for each municipality in the Mexican state of Jalisco, for a subsequent geographic analysis.

Approach:
1. First, look at the pueblos strictly by population and split them into quartiles based on total population.
2. Second, look at specific municipalities and group the pueblos by municipality.
3. Finally, look at specific pueblos and what the census data says about each one (this step will take the longest).

In [2]:
import pandas as pd
import numpy as np 

In [3]:
#load in the data
df = pd.read_csv('RESAGEBURB2020 - 14 Jalisco (1).csv')

- PEA: Population aged 12 and over that is economically active (employed or looking for work)
- PEA_F: Economically active female population aged 12 and over
- PEA_M: Economically active male population aged 12 and over
- PE_INAC: Population aged 12 and over that is NOT economically active
- PE_INAC_F: Female population aged 12 and over that is NOT economically active
- PE_INAC_M: Male population aged 12 and over that is NOT economically active
- GRAPROES: Average years of schooling (grado promedio de escolaridad)
- GRAPROES_F: Average years of schooling for females
- GRAPROES_M: Average years of schooling for males
- P15YM_SE: Population aged 15 and over with no schooling
- PROM_HNV: Average number of live-born children per woman aged 12 and over
- POBTOT: Total population
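
The definitions above suggest one simple derived indicator: the share of the 12-and-over population that is economically active. The sketch below is only an illustration on a toy frame. The column name `participation_rate` is hypothetical, and it assumes that `PEA + PE_INAC` is a reasonable approximation of the 12-and-over population (people with unspecified activity status would be left out).

```python
import pandas as pd

# Toy rows shaped like the census columns; values are illustrative only.
sample = pd.DataFrame({
    "PEA": [33, 32, 24],
    "PE_INAC": [14, 16, 21],
})

# Share of the 12+ population (active + inactive) that is economically active.
# Assumption: PEA + PE_INAC approximates the 12+ population.
sample["participation_rate"] = sample["PEA"] / (sample["PEA"] + sample["PE_INAC"])
print(sample)
```

The same expression can be applied to the `_F` and `_M` columns to compare participation by sex, once those columns have been converted to numeric types.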

In [4]:
df.head(5)

Unnamed: 0,ENTIDAD,NOM_ENT,MUN,NOM_MUN,LOC,NOM_LOC,AGEB,MZA,POCUPADA,PEA,...,POCUPADA_M,PDESOCUP,PDESOCUP_F,PDESOCUP_M,GRAPROES,GRAPROES_F,GRAPROES_M,P15YM_SE,PROM_HNV,POBTOT
0,14,Jalisco,1,Acatic,1,Acatic,194,18,33,33,...,21,0,0,0,7.76,8.38,7.21,*,1.82,66
1,14,Jalisco,1,Acatic,1,Acatic,194,19,32,32,...,17,0,0,0,8.33,8.48,8.14,8,1.96,61
2,14,Jalisco,1,Acatic,1,Acatic,194,2,23,24,...,18,*,0,*,6.51,5.91,7.14,6,3.26,55
3,14,Jalisco,1,Acatic,1,Acatic,194,20,31,31,...,24,0,0,0,6.9,6.04,7.59,5,3.36,72
4,14,Jalisco,1,Acatic,1,Acatic,194,21,32,33,...,19,*,0,*,6.89,6.47,7.38,11,3.03,61


In [53]:
# inspect the dtypes: most numeric columns were read in as object (strings) and still need converting
print(df.dtypes)

NOM_ENT       object
MUN            int64
NOM_MUN       object
AGEB          object
PEA           object
PEA_F         object
PEA_M         object
PE_INAC       object
PE_INAC_F     object
PE_INAC_M     object
POCUPADA_F    object
POCUPADA_M    object
PDESOCUP      object
PDESOCUP_F    object
PDESOCUP_M    object
GRAPROES      object
GRAPROES_F    object
GRAPROES_M    object
P15YM_SE      object
PROM_HNV      object
POBTOT         int64
dtype: object


In [56]:
columns_to_convert = ['PEA', 'PEA_F', 'PEA_M', 'PE_INAC', 'PE_INAC_F',
       'PE_INAC_M', 'POCUPADA_F', 'POCUPADA_M', 'PDESOCUP', 'PDESOCUP_F',
       'PDESOCUP_M', 'GRAPROES', 'GRAPROES_F', 'GRAPROES_M', 'P15YM_SE',
       'PROM_HNV', 'POBTOT']

In [60]:
# replace the '*' and '?' placeholders with NaN so the object columns can later be cast to float
df.replace({'*': np.nan, '?': np.nan}, inplace=True)

In [62]:
df.head()

Unnamed: 0,NOM_ENT,MUN,NOM_MUN,AGEB,PEA,PEA_F,PEA_M,PE_INAC,PE_INAC_F,PE_INAC_M,...,POCUPADA_M,PDESOCUP,PDESOCUP_F,PDESOCUP_M,GRAPROES,GRAPROES_F,GRAPROES_M,P15YM_SE,PROM_HNV,POBTOT
0,Jalisco,1,Acatic,194,33,12,21,14,10,4,...,21,0.0,0,0.0,7.76,8.38,7.21,,1.82,66
1,Jalisco,1,Acatic,194,32,15,17,16,13,3,...,17,0.0,0,0.0,8.33,8.48,8.14,8.0,1.96,61
2,Jalisco,1,Acatic,194,24,5,19,21,18,3,...,18,,0,,6.51,5.91,7.14,6.0,3.26,55
3,Jalisco,1,Acatic,194,31,7,24,31,21,10,...,24,0.0,0,0.0,6.9,6.04,7.59,5.0,3.36,72
4,Jalisco,1,Acatic,194,33,13,20,23,17,6,...,19,,0,,6.89,6.47,7.38,11.0,3.03,61


In [61]:
df[columns_to_convert] = df[columns_to_convert].astype('float')

ValueError: could not convert string to float: 'N/D'
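
The error above shows that the file contains at least one more placeholder (`'N/D'`) besides `'*'`. Rather than listing every placeholder by hand, one option is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into `NaN`. This is a minimal sketch on a toy frame, not the notebook's actual fix. The names `raw`, `clean` and `cols` are made up for the example.

```python
import pandas as pd

# Toy frame mimicking the placeholders seen in the raw file ('*' and 'N/D').
raw = pd.DataFrame({
    "PEA": ["33", "*", "N/D"],
    "GRAPROES": ["7.76", "8.33", "*"],
})
cols = ["PEA", "GRAPROES"]

# errors="coerce" converts any value that cannot be parsed as a number to NaN,
# so each placeholder does not have to be listed explicitly.
clean = raw.copy()
clean[cols] = clean[cols].apply(pd.to_numeric, errors="coerce")
print(clean.dtypes)
```

An alternative is to declare the placeholders when loading, for example `pd.read_csv(path, na_values=['*', 'N/D'])`. That only works if the full set of placeholders is known in advance.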

In [5]:
df = df[['NOM_ENT', 'MUN', 'NOM_MUN', 'AGEB','PEA', 'PEA_F', 'PEA_M', 'PE_INAC', 'PE_INAC_F',
       'PE_INAC_M', 'POCUPADA_F', 'POCUPADA_M', 'PDESOCUP', 'PDESOCUP_F',
       'PDESOCUP_M', 'GRAPROES', 'GRAPROES_F', 'GRAPROES_M', 'P15YM_SE',
       'PROM_HNV', 'POBTOT']]

In [6]:
df = df.dropna()

In [7]:
df['NOM_MUN'].unique()

array(['Acatic', 'Atemajac de Brizuela', 'Tomatlán', 'Tonalá',
       'Zacoalco de Torres', 'Atenguillo', 'Zapopan', 'Zapotlanejo',
       'San Ignacio Cerro Gordo', 'Atotonilco el Alto', 'Atoyac',
       'Autlán de Navarro', 'Ayotlán', 'Ayutla', 'La Barca', 'Bolaños',
       'Acatlán de Juárez', 'Cabo Corrientes', 'Casimiro Castillo',
       'Cihuatlán', 'Encarnación de Díaz', 'Etzatlán', 'El Grullo',
       'Guachinango', 'Guadalajara', 'Ameca', 'Mexticacán', 'Mezquitic',
       'Mixtlán', 'Ocotlán', 'Ojuelos de Jalisco', 'Pihuamo', 'Poncitlán',
       'El Salto', 'San Cristóbal de la Barranca',
       'San Diego de Alejandría', 'San Juan de los Lagos', 'San Julián',
       'San Marcos', 'San Martín de Bolaños', 'San Martín Hidalgo',
       'Tepatitlán de Morelos', 'Tequila', 'Teuchitlán',
       'Tizapán el Alto', 'Tlajomulco de Zúñiga', 'San Pedro Tlaquepaque',
       'Tolimán'], dtype=object)

In [8]:
# change display options to suppress scientific notation
pd.options.display.float_format = '{:.2f}'.format

In [9]:
print(df['POBTOT'].describe())

count     30000.00
mean        449.84
std       17073.64
min           0.00
25%          22.00
50%          59.00
75%         114.00
max     1476491.00
Name: POBTOT, dtype: float64
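
In the summary above, the maximum (1,476,491) is several orders of magnitude larger than the 75th percentile (114), and the standard deviation is far larger than the mean. One possible explanation, which is an assumption here and not something the notebook confirms, is that a few rows are aggregate totals rather than individual blocks. A quick way to check is to compare the mean with the median and look at the largest rows. The sketch uses a toy series.

```python
import pandas as pd

# Toy population column: several small rows plus one very large value.
pop = pd.Series([0, 15, 22, 59, 61, 114, 170, 1476491], name="POBTOT")

# The mean is pulled up by the extreme value; the median is not.
print("mean:", pop.mean(), "median:", pop.median())

# Largest rows, to check by eye whether they look like aggregate totals.
print(pop.nlargest(3))
```

On the real frame, `df.nlargest(5, 'POBTOT')` would show the full rows, including `NOM_MUN`, which helps judge whether they should be excluded before computing quartiles.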


In [10]:
df['POBTOT'].loc[29996]

15

In [11]:
# POBTOT quartile bins as implemented by the masks: 0-21, 22-59, 60-114, 115-1476491
Q1_boolean_search = (df['POBTOT'] < 22)
Q1_Pueblos = df[Q1_boolean_search]
Q1_Pueblos = Q1_Pueblos.dropna()

In [12]:
Q2_boolean = (df['POBTOT'] >= 22) & (df['POBTOT'] <= 59)
Q2_Pueblos = df[Q2_boolean]
Q2_Pueblos.head()

Unnamed: 0,NOM_ENT,MUN,NOM_MUN,AGEB,PEA,PEA_F,PEA_M,PE_INAC,PE_INAC_F,PE_INAC_M,...,POCUPADA_M,PDESOCUP,PDESOCUP_F,PDESOCUP_M,GRAPROES,GRAPROES_F,GRAPROES_M,P15YM_SE,PROM_HNV,POBTOT
2,Jalisco,1,Acatic,194,24,5,19,21,18,3,...,18,*,0,*,6.51,5.91,7.14,6,3.26,55
5,Jalisco,1,Acatic,194,14,5,9,14,11,3,...,9,0,0,0,7.25,7.94,6.33,6,1.44,34
7,Jalisco,1,Acatic,194,16,5,11,15,13,*,...,11,0,0,0,7.1,7.0,7.21,4,3.5,39
10,Jalisco,1,Acatic,194,23,9,14,23,17,6,...,14,0,0,0,9.88,10.25,9.42,3,2.0,57
11,Jalisco,1,Acatic,194,21,10,11,9,7,*,...,11,0,0,0,10.37,10.06,10.77,3,2.53,38


In [13]:
Q3_boolean = (df['POBTOT']>=60) & (df['POBTOT']<=114)
Q3_boolean

0         True
1         True
2        False
3         True
4         True
         ...  
29995     True
29996    False
29997     True
29998    False
29999     True
Name: POBTOT, Length: 30000, dtype: bool

In [19]:
Q3_Pueblo = df[Q3_boolean]

#Q3_Pueblo['NOM_MUN'].unique()

In [23]:
Q4_boolean = (df['POBTOT']>=115) & (df['POBTOT']<=1476491)
Q4_Pueblo = df[Q4_boolean]
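
The four boolean masks above can also be expressed as a single binning step with `pd.cut`. The sketch below is an alternative formulation, not the notebook's method. It assumes `POBTOT` holds whole numbers, and the names `edges`, `labels` and `quartile` are invented for the example.

```python
import pandas as pd

# Toy values sitting on both sides of every cut point.
pop = pd.Series([0, 21, 22, 59, 60, 114, 115, 1476491], name="POBTOT")

# pd.cut uses right-closed bins (a, b] by default, so these edges give
# 0-21, 22-59, 60-114 and 115+ for whole-number populations.
edges = [-1, 21, 59, 114, float("inf")]
labels = ["Q1", "Q2", "Q3", "Q4"]
quartile = pd.cut(pop, bins=edges, labels=labels)

print(quartile.value_counts().sort_index())
```

Keeping the label as one column (for example `df['quartile'] = ...`) avoids maintaining four separate frames and makes later `groupby` calls simpler.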

In [48]:
Q1_Pueblos.describe()

Unnamed: 0,MUN,POBTOT
count,7305.0,7305.0
mean,64.91,6.85
std,37.61,7.12
min,1.0,0.0
25%,22.0,0.0
50%,70.0,5.0
75%,98.0,13.0
max,125.0,21.0


In [47]:
Q2_Pueblos.describe()

Unnamed: 0,MUN,POBTOT
count,7751.0,7751.0
mean,71.11,40.18
std,37.5,10.87
min,1.0,22.0
25%,39.0,31.0
50%,74.0,40.0
75%,101.0,50.0
max,125.0,59.0


In [46]:
Q3_Pueblo.describe()

Unnamed: 0,MUN,POBTOT
count,7505.0,7505.0
mean,75.51,84.61
std,36.98,15.69
min,1.0,60.0
25%,39.0,71.0
50%,97.0,84.0
75%,101.0,98.0
max,125.0,114.0


In [50]:
Q4_Pueblo.describe()

Unnamed: 0,MUN,POBTOT
count,7439.0,7439.0
mean,78.61,1680.15
std,35.97,34259.31
min,1.0,115.0
25%,39.0,137.0
50%,97.0,170.0
75%,101.0,258.0
max,125.0,1476491.0


## Observations 
- Q1_Pueblos has 7305 records
- Q2_Pueblos has 7751 records
- Q3_Pueblo has 7505 records
- Q4_Pueblo has 7439 records
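
The groups are not exactly equal in size because many rows share the population values at the cut points. As a possible lead-in to the second step of the approach (grouping by municipality), the sketch below counts how many rows of each municipality fall in each quartile. It runs on a toy frame with made-up populations, and the `quartile` column is a hypothetical helper that the notebook does not define.

```python
import pandas as pd

# Toy frame: municipality names are real, populations are made up.
toy = pd.DataFrame({
    "NOM_MUN": ["Acatic", "Acatic", "Zapopan", "Zapopan", "Zapopan"],
    "POBTOT": [15, 66, 40, 130, 300],
})

# Label each row with its population quartile (same cut points as the notebook).
toy["quartile"] = pd.cut(
    toy["POBTOT"],
    bins=[-1, 21, 59, 114, float("inf")],
    labels=["Q1", "Q2", "Q3", "Q4"],
)

# Rows = municipalities, columns = quartiles, cells = number of records.
counts = pd.crosstab(toy["NOM_MUN"], toy["quartile"])
print(counts)
```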