# Table of Contents
* [Introduction: How Regional Administration Works in Myanmar](#Introduction:-How-Regional-Administration-Works-in-Myanmar)
* [Load Data](#Load-Data)
* [Master Merge](#Master-Merge)


# Introduction: How Regional Administration Works in Myanmar



![Imgur](http://i.imgur.com/XpYZYAb.png)

From the highest-level to the most granular administrative divisions of the country:

- **Administrative Divisions**: 7 states and 7 regions (formerly called divisions), along with 7 other territories: 1 union territory, 5 self-administered zones, and 1 self-administered division.
- **District**: States, Regions, and Union Territories have districts. There are 67 districts in Myanmar.
- **Township**: Districts and self-administered zones have townships. There are around 330 townships in Myanmar.
- **Towns**: Not mentioned on Wikipedia, but townships include towns, which are composed of wards.
- **Ward and Village Tracts**: Subdivisions of Townships. Wards are pieces of urban areas (basically neighborhoods, I imagine); Village Tracts are their rural counterparts. There are about 3,183 wards and 13,602 Village Tracts in Myanmar.
- **Village**: A subdivision of Village Tracts. There are 70,838 villages. Wards do not have Villages.

![Imgur](http://i.imgur.com/uTCJbJY.png)


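This notebook assembles location records for 13974 Village Tracts (13119 of them with non-null coordinates) and 3170 Wards. Of those, population data is matched for 13064 Village Tracts and 2998 Wards (the population sheet itself lists 3011 distinct Ward Pcodes, but not all of them appear in the geo data).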


# Load Data

In [1]:
!ls -t

Village Tract and Ward Geo and Population Data .ipynb
Village_Ward_Geo.ipynb
Myanmar PCodes Release-VIII_Aug2015 (Villages).xlsx
~$Myanmar PCodes Release-VIII_Aug2015 (Villages).xlsx
~$Myanmar PCodes Release-VIII_Aug2015 (StRgn_Dist_Tsp_Town_Ward_VT).xlsx
districts.txt
VT_Ward_geo.csv
geo_df.p
VT_geo.csv
wards_geo.csv
BaselineData_Census_VTWard_with_Pcode_MIMU_09Sep2015_MMR.xlsx
Myanmar PCodes Release-VIII_Aug2015 (StRgn_Dist_Tsp_Town_Ward_VT).xlsx


**Data Descriptions:**
Together, these datasets give the geolocations and populations of Wards and Village Tracts.

- `Myanmar PCodes Release-VIII_Aug2015 (StRgn_Dist_Tsp_Town_Ward_VT).xlsx`: Excel sheet with the lat/long of *towns*, which are used here as a stand-in for the lat/long of the wards inside each town.
- `Myanmar PCodes Release-VIII_Aug2015 (Villages).xlsx`: Excel sheet with the lat/long of individual villages, which are averaged further down to give one location per Village Tract.
- `BaselineData_Census_VTWard_with_Pcode_MIMU_09Sep2015_MMR.xlsx`: Excel sheet with Ward and Village Tract populations (found here: https://data.humdata.org/dataset/myanmar-population-census-2014).




In [3]:
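import pandas as pd

# Note: older pandas releases spelled this keyword `sheetname`; current releases use `sheet_name`.
# Town and ward sheets come from the same P-code workbook.
towns_df = pd.read_excel('Myanmar PCodes Release-VIII_Aug2015 (StRgn_Dist_Tsp_Town_Ward_VT).xlsx', sheet_name='Towns')
wards_df = pd.read_excel('Myanmar PCodes Release-VIII_Aug2015 (StRgn_Dist_Tsp_Town_Ward_VT).xlsx', sheet_name='Wards')
# One row per village, with its Village Tract P-code and coordinates.
VT_df = pd.read_excel('Myanmar PCodes Release-VIII_Aug2015 (Villages).xlsx', sheet_name=0)
# Census workbook: first sheet is read as Village Tract populations, second as Ward populations.
VT_pop_df = pd.read_excel('BaselineData_Census_VTWard_with_Pcode_MIMU_09Sep2015_MMR.xlsx', sheet_name=0)
ward_pop_df = pd.read_excel('BaselineData_Census_VTWard_with_Pcode_MIMU_09Sep2015_MMR.xlsx', sheet_name=1)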

# Get Ward Locations

Join `towns_df` and `wards_df` on `Town_Pcode`, so that each ward inherits the longitude and latitude of its town.

In [4]:
wards_geo_df= pd.merge(towns_df[['Township', 'District','Town_Pcode','Longitude','Latitude']], wards_df[['Town_Pcode', 'Ward_Pcode', 'Ward']], on = 'Town_Pcode')
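
To make the effect of this join concrete, here is a minimal sketch on made-up data. The P-codes and coordinates below are hypothetical, not taken from the MIMU workbook; the point is only that every ward in a town ends up with that town's coordinates.

```python
import pandas as pd

# Toy stand-ins for the 'Towns' and 'Wards' sheets (all values are made up).
towns = pd.DataFrame({
    "Town_Pcode": ["T001", "T002"],
    "Longitude": [96.10, 97.50],
    "Latitude": [16.80, 21.90],
})
wards = pd.DataFrame({
    "Town_Pcode": ["T001", "T001", "T002"],
    "Ward_Pcode": ["W001", "W002", "W003"],
})

# Inner join on Town_Pcode: each ward row picks up its town's coordinates.
wards_geo = pd.merge(towns, wards, on="Town_Pcode")

# Wards that share a town share identical coordinates.
same_town = wards_geo[wards_geo["Town_Pcode"] == "T001"]
print(same_town[["Ward_Pcode", "Longitude", "Latitude"]])
```

Because the coordinates are town-level, wards in the same town are indistinguishable on a map; that is worth keeping in mind for any distance-based analysis.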

# Get Village Tract Lat/Long

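The Villages sheet has one row per village, so group by Village Tract and take the mean longitude and latitude of its villages as the tract's location.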

In [6]:
VT_df.columns

Index(['SR_Pcode', 'State_Region', 'D_Pcode', 'District', 'TS_Pcode',
       'Township', 'VT_Pcode', 'Village_Tract', 'Village_Pcode', 'Village',
       'Village_Mya_MMR3', 'Alternate_Vlg_name_Eng', 'Alternate_Vll_Name_Mya',
       'Longitude', 'Latitude', 'Source', 'Remark', 'Remark_2'],
      dtype='object')

In [11]:
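# Select the two coordinate columns with a list (double brackets); the bare tuple form is not accepted by current pandas.
VT_geo_df = VT_df.groupby(['VT_Pcode', 'Village_Tract', 'District', 'Township'])[['Longitude', 'Latitude']].mean().reset_index()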
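
As a small illustration of the averaging step, the sketch below uses hypothetical village rows. It also shows one behaviour that may explain why some tracts end up without coordinates later on: `mean()` skips missing values, but a tract whose villages all lack coordinates still gets `NaN`.

```python
import numpy as np
import pandas as pd

# Toy village rows (made-up values); VT2's only village has no coordinates.
villages = pd.DataFrame({
    "VT_Pcode": ["VT1", "VT1", "VT2", "VT3"],
    "Longitude": [96.0, 96.2, np.nan, 97.0],
    "Latitude": [16.0, 16.4, np.nan, 20.0],
})

# Mean village position per tract; NaN values are ignored within a group.
centroids = (
    villages.groupby("VT_Pcode")[["Longitude", "Latitude"]]
    .mean()
    .reset_index()
)
print(centroids)

# Tracts whose villages all lack coordinates remain NaN after averaging.
missing = centroids[centroids["Longitude"].isna()]
print(missing["VT_Pcode"].tolist())
```

Note that a plain mean of latitude and longitude is only a rough centroid; it is a reasonable approximation for small areas like village tracts, but it is not weighted by village size.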

# Get Village Tract and Ward Population


Oddly enough, `ward_pop_df` has duplicate Ward entries: rows that share a `Ward_Pcode` but differ in `Ward Name_MMR` (the Myanmar-language name) and also report different population counts. I'll group by `Ward_Pcode` (together with `Ward`) and take the average total population.

In [22]:
print(wards_geo_df.shape)
print(ward_pop_df.shape) #OK, there are some missing wards for population data 
print(wards_geo_df.Ward_Pcode.nunique())
print(ward_pop_df.Ward_Pcode.nunique())

(3170, 7)
(3058, 12)
3170
3011


In [24]:
ward_pop_df = ward_pop_df.groupby(['Ward_Pcode', 'Ward']).agg({'Total':'mean'}).reset_index()
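
Before collapsing duplicates it can help to look at them. The sketch below, on hypothetical rows, lists every row whose P-code occurs more than once and then applies the same group-and-average step used above.

```python
import pandas as pd

# Toy population rows (made-up values); W001 appears twice with different totals.
ward_pop = pd.DataFrame({
    "Ward_Pcode": ["W001", "W001", "W002"],
    "Ward": ["No (1) Ward", "No (1) Ward", "No (2) Ward"],
    "Total": [1200, 1000, 800],
})

# keep=False flags every row of a duplicated P-code, not just the repeats.
dupes = ward_pop[ward_pop.duplicated("Ward_Pcode", keep=False)]
print(dupes)

# Collapse to one row per ward by averaging the reported totals.
collapsed = (
    ward_pop.groupby(["Ward_Pcode", "Ward"])
    .agg({"Total": "mean"})
    .reset_index()
)
print(collapsed)
```

Averaging is a judgment call. If the duplicate rows turned out to be separate parts of the same ward rather than conflicting reports, summing would be the better choice; the data alone does not settle this.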

In [27]:
print(wards_geo_df.shape)
print(ward_pop_df.shape) #OK, there are some missing wards for population data 
print(wards_geo_df.Ward_Pcode.nunique())
print(ward_pop_df.Ward_Pcode.nunique())

(3170, 7)
(3011, 3)
3170
3011


In [28]:
wards_geo_pop_df = pd.merge(wards_geo_df, ward_pop_df, on = 'Ward_Pcode', how = 'left')

In [30]:
wards_geo_pop_df.info() #OK, so there are Pcodes that are exclusive to each dataset

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3170 entries, 0 to 3169
Data columns (total 9 columns):
Township      3170 non-null object
District      3170 non-null object
Town_Pcode    3170 non-null object
Longitude     3170 non-null float64
Latitude      3170 non-null float64
Ward_Pcode    3170 non-null object
Ward_x        3170 non-null object
Ward_y        2998 non-null object
Total         2998 non-null float64
dtypes: float64(3), object(6)
memory usage: 247.7+ KB
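
One way to count the P-codes that are exclusive to each dataset is an outer merge with `indicator=True`, which adds a `_merge` column saying where each row came from. This is a sketch on hypothetical P-codes, not the notebook's actual data.

```python
import pandas as pd

# Toy frames (made-up values): W001 has no population row, W004 has no geo row.
geo = pd.DataFrame({"Ward_Pcode": ["W001", "W002", "W003"]})
pop = pd.DataFrame({
    "Ward_Pcode": ["W002", "W003", "W004"],
    "Total": [800.0, 650.0, 90.0],
})

# Outer merge keeps every P-code; _merge is 'both', 'left_only', or 'right_only'.
both = pd.merge(geo, pop, on="Ward_Pcode", how="outer", indicator=True)
counts = both["_merge"].value_counts()
print(counts)

# P-codes present in the population sheet but absent from the geo data.
pop_only = both.loc[both["_merge"] == "right_only", "Ward_Pcode"].tolist()
print(pop_only)
```

Applied to `wards_geo_df` and `ward_pop_df`, the `left_only` and `right_only` counts would give the number of wards missing population data and the number of population rows without a location, respectively.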


Now, let's join the Village Tract data together in the same way.

In [33]:
print(VT_geo_df.shape)
print(VT_pop_df.shape) 
print(VT_geo_df.VT_Pcode.nunique())
print(VT_pop_df.VT_Pcode.nunique()) #OK, there are some missing Village Tracts for population data 

(13974, 6)
(13240, 11)
13970
13071
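
The geo frame has slightly more rows than distinct `VT_Pcode` values, which suggests that a few P-codes appear under more than one name, district, or township combination. That is an inference from the counts, not something checked in the notebook. A minimal sketch for finding such P-codes, on hypothetical rows:

```python
import pandas as pd

# Toy grouped output (made-up values); VT1 appears under two spellings.
vt_geo = pd.DataFrame({
    "VT_Pcode": ["VT1", "VT1", "VT2"],
    "Village_Tract": ["Kan Gyi", "Kan Gyi (East)", "Thit Taw"],
})

# Count distinct names per P-code and keep those with more than one.
per_code = vt_geo.groupby("VT_Pcode")["Village_Tract"].nunique()
ambiguous = per_code[per_code > 1]
print(ambiguous)

# The surplus of rows over distinct P-codes matches the extra spellings.
surplus = len(vt_geo) - vt_geo["VT_Pcode"].nunique()
print(surplus)
```

The same check could be repeated with `District` or `Township` in place of `Village_Tract` to see which grouping column is responsible.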


In [34]:
VT_pop_df.columns

Index(['State/Region Code', 'State/Region', 'Tsp_code', 'Township',
       'Township MMR', 'VT_Pcode', 'Village Tract', 'Village_Tract_MMR',
       'Total', 'Male', 'Female'],
      dtype='object')

In [36]:
VT_pop_df = VT_pop_df.groupby(['VT_Pcode', 'Village Tract']).agg({'Total':'mean'}).reset_index()

In [37]:
VT_geo_pop_df = pd.merge(VT_geo_df, VT_pop_df, on = 'VT_Pcode', how = 'left')

In [38]:
VT_geo_pop_df.info() #nice 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13974 entries, 0 to 13973
Data columns (total 8 columns):
VT_Pcode         13974 non-null object
Village_Tract    13974 non-null object
District         13974 non-null object
Township         13974 non-null object
Longitude        13119 non-null float64
Latitude         13119 non-null float64
Village Tract    13064 non-null object
Total            13064 non-null float64
dtypes: float64(3), object(5)
memory usage: 982.5+ KB
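
A left merge keeps every geo row, but it can silently add rows if the population frame has repeated keys. The sketch below, on hypothetical data, uses the `validate` argument of `pd.merge` to guard against that and then reports how many tracts received a population value.

```python
import pandas as pd

# Toy frames (made-up values); VT3 has no population row.
geo = pd.DataFrame({
    "VT_Pcode": ["VT1", "VT2", "VT3"],
    "Longitude": [96.0, 96.5, 97.0],
})
pop = pd.DataFrame({
    "VT_Pcode": ["VT1", "VT2"],
    "Total": [500.0, 320.0],
})

# validate="many_to_one" raises MergeError if pop has duplicate VT_Pcode keys.
merged = pd.merge(geo, pop, on="VT_Pcode", how="left", validate="many_to_one")

# With unique right-hand keys, a left merge preserves the left row count.
assert len(merged) == len(geo)

# Share of tracts that received a population value.
coverage = merged["Total"].notna().mean()
print(f"{coverage:.1%} of tracts have a population value")
```

In the notebook's output above, the row count stays at 13974 after the merge, which is consistent with no row duplication having occurred.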
