# #100Viz 03: Reckoning Roots

Source: INEGI, Encuesta Intercensal 2015. <br>
Shapefile: Empirical Studies of Conflict, Princeton. <br>
Notes: TBD.

***
**Set up**

In [None]:
import pandas as pd
import geopandas as gdp

***
**We will load both datasets:**
1. `base_map` for the shapefiles using `geopandas`
2. INEGI data into `df` using `pandas`

In [None]:
base_map = gdp.read_file("../data/raw/GIS Mexican Municipalities/GIS Mexican Municipalities/Mexican Municipalities.shp")

df = pd.read_stata("../data/raw/mxmun.dta", convert_categoricals=False)

***
**Clean up**

INEGI's data consists of *individual*-level records, so we need to aggregate them back to the *municipalidad* level by summing the `perwt` weights.

In [None]:
dff = df.groupby(["geo2_mx2015", "mx2015a_afrdes"])["perwt"].sum().to_frame().reset_index()

dff.loc[dff["mx2015a_afrdes"] <= 2, "black"] = "Yes"
dff.loc[dff["mx2015a_afrdes"] > 2, "black"] = "No/Unknown"

# dff.groupby(['geo2_mx2015', 'black'])['perwt'].sum().to_frame().reset_index()
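To make the aggregation step concrete, here is a minimal sketch on made-up records. The column names mirror the ones used above, but the values are invented, and reading codes 1 and 2 as affirmative answers is an assumption for illustration; the survey codebook is the place to confirm the actual coding.

```python
import pandas as pd

# Toy individual-level records (made-up values, not INEGI data).
toy = pd.DataFrame({
    "geo2_mx2015": [1, 1, 1, 2, 2],
    "mx2015a_afrdes": [1, 3, 3, 2, 9],
    "perwt": [10.0, 20.0, 5.0, 4.0, 6.0],
})

# Summing the weights estimates how many people each
# (municipality, response) combination represents.
toy_agg = (
    toy.groupby(["geo2_mx2015", "mx2015a_afrdes"])["perwt"]
    .sum()
    .reset_index()
)

# Same thresholding as in the cell above: codes <= 2 are flagged "Yes"
# (assumed meaning), everything else "No/Unknown".
toy_agg.loc[toy_agg["mx2015a_afrdes"] <= 2, "black"] = "Yes"
toy_agg.loc[toy_agg["mx2015a_afrdes"] > 2, "black"] = "No/Unknown"
```

Rows whose response code is missing match neither condition and keep a missing `black` value, so it may be worth checking `toy_agg["black"].isna().sum()` on real data.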

In [None]:
per_cnt = dff.groupby(['geo2_mx2015', 'black'])['perwt'].sum()
# Group by municipality (index level 0) and divide each weighted count by the municipality total
cnt_pctgs = 100 * per_cnt / per_cnt.groupby(level=0).transform("sum")
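As an illustration of the percentage step, the sketch below computes within-municipality shares on made-up counts. It uses `groupby(level=0).transform("sum")`, which is one way to divide each count by its group total while keeping the original index; the numbers are invented for the example.

```python
import pandas as pd

# Toy weighted counts indexed by (municipality, group); values are made up.
counts = pd.Series(
    [10.0, 30.0, 5.0, 15.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "No/Unknown"), (1, "Yes"), (2, "No/Unknown"), (2, "Yes")],
        names=["geo2_mx2015", "black"],
    ),
    name="perwt",
)

# transform("sum") broadcasts each municipality's total back to its rows,
# so the division keeps the (municipality, group) index unchanged.
totals = counts.groupby(level=0).transform("sum")
shares = 100 * counts / totals
```

A quick sanity check is that the shares within each municipality add up to 100.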

In [None]:
df3 = cnt_pctgs.to_frame().reset_index()
df3 = df3[df3["black"] == "Yes"].copy()

In [None]:
df3.columns = ["IDUNICO", "black", "rate"]

In [None]:
df3['rate'].describe()

***
**We can now merge both datasets and save the result as a new GeoJSON file, which we will use with `altair`.**

In [None]:
gdf = base_map.merge(df3[["IDUNICO", "rate"]], how="left", on="IDUNICO")
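Because this is a left join, any municipality in the map without a matching `IDUNICO` in the rates table ends up with a missing `rate`. A minimal sketch of how that could be checked, using plain `pandas` frames with hypothetical IDs and names as stand-ins for the real data:

```python
import pandas as pd

# Hypothetical stand-ins for the map attributes and the computed rates.
left = pd.DataFrame({"IDUNICO": [1001, 1002, 1003], "NOM_MUN": ["A", "B", "C"]})
right = pd.DataFrame({"IDUNICO": [1001, 1003], "rate": [2.5, 0.4]})

# indicator=True adds a "_merge" column recording where each row matched.
checked = left.merge(right, how="left", on="IDUNICO", indicator=True)

# Rows marked "left_only" have no rate and would show up as gaps on the map.
unmatched = checked.loc[checked["_merge"] == "left_only", "IDUNICO"].tolist()
```

If `unmatched` turns out to be long on the real data, a mismatch in key type or format between the two `IDUNICO` columns is a likely cause worth ruling out.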

In [None]:
gdf[["NOM_MUN","IDUNICO","rate","geometry"]].head()

In [None]:
gdf[["NOM_MUN", "IDUNICO", "rate", "geometry"]].to_file("../data/processed/data2.geojson", driver="GeoJSON")