<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Setup

* Ensure that Python is already installed on your machine.
* Then install the libraries this project needs by uncommenting and running the cell below.

In [1]:
# !pip install pandas
# !pip install geopandas
# !pip install numpy
# !pip install folium
# !pip install shapely
# !pip install gdown

In [1]:
# run this cell to import libraries
import pandas as pd
import geopandas as gpd
import numpy as np

# functions to process PeopleGroups datasets
from features import *

### Links
* GitHub: https://github.com/andrewjoc/ihs
* Google Drive: https://drive.google.com/drive/folders/1_4R9ut87eemnxWH53VN8QCSyRugit27s?usp=sharing

### Notes:
* It is important to run the cells of this Jupyter notebook **sequentially**.
* If you are just getting started with this notebook, run each cell **individually**, because running all cells at once will take a very long time.
* Are things not working as expected, or are you seeing errors?
    * Try restarting your kernel and clearing all outputs.
    * Run the cells sequentially (from top to bottom).
    * Is the error message explicit about what went wrong? A search on Google or Stack Overflow will probably turn up an answer.
    * Email Andrew or David.
* The current implementation only validates populations at the ADM1 level.

<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Validation Code 

* If a country you'd like to validate currently has no ADM1 data (according to the output of the function below), feel free to open the `global_adm1_populations.xlsx` spreadsheet in the Google Drive folder and enter the ADM1 data manually. Be sure to include a source and the year of the population estimate.
* Run the cell below to view a list of countries ready to be validated.

### Countries that can be Validated (ADM1)

In [2]:
countries_with_data()

A: "Afghanistan"
B: "Bangladesh", "Brunei", "Burkina Faso", "Burundi"
C: "Cambodia", "Chad", "Comoros", "Congo, Republic of", "Cuba"
I: "India", "Indonesia"
J: "Jamaica"
K: "Kazakhstan", "Kyrgyzstan"
L: "Laos"
M: "Malaysia", "Myanmar"
N: "Nauru"
P: "Papua New Guinea", "Philippines"
S: "Singapore", "Suriname"
T: "Thailand", "Timor-Leste", "Tonga", "Turkmenistan"
V: "Vietnam"


### Validation Function

* Only change the `country` variable.
* If you get an error, either the country name is spelled incorrectly or differently from the list above, or no ADM1 population data is available for that country yet (work in progress).
* Small countries take 1-4 minutes (examples: Brunei, Vietnam, Philippines).
* Medium to large countries (in terms of number of ADM1s) take roughly 8-12 minutes (examples: Indonesia, India).
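
Because a misspelled name is one of the two error causes listed above, it can help to check the spelling before starting a run that takes several minutes. The sketch below is only an illustration: `SUPPORTED` is a hand-copied subset of the names printed by `countries_with_data()`, and `suggest_country` is a hypothetical helper, not part of `features`.

```python
import difflib

# Hand-copied subset of the names printed by countries_with_data() above
# (an assumption for illustration; the real list lives in the project data).
SUPPORTED = ["Afghanistan", "Bangladesh", "Brunei", "Burkina Faso", "Burundi",
             "Cambodia", "India", "Indonesia", "Timor-Leste", "Vietnam"]

def suggest_country(name, supported=SUPPORTED):
    """Return the name itself if supported, else the closest match, else None."""
    if name in supported:
        return name
    matches = difflib.get_close_matches(name, supported, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(suggest_country("Brunai"))  # Brunei
```

If the helper returns `None`, the country is probably not in the list yet rather than misspelled.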

In [3]:
country = "Brunei"

# no need to edit anything below this line
results = validate_country(country, verbose=True)
results

Processing input data
Finding overlapping polygons
All people groups in Brunei are valid.


| Unnamed: 0 | People Group | Alpha-3 Code | People Group Population | Country | geometry | ADM1 Boundaries Present | Total Boundary Population | Valid People Group | Percent Boundary Population |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3039 | Tutung | BRN | 17500 | Brunei | MULTIPOLYGON (((114.76083 4.88447, 114.76488 4... | [Belait, Brunei-Muara, Tutong] | 432131.0 | True | 4.049698 |
| 4461 | Bisayan Tutong | BRN | 29000 | Brunei | MULTIPOLYGON (((114.80305 4.65803, 114.80081 4... | [Belait, Brunei-Muara, Tutong] | 432131.0 | True | 6.710928 |
| 4942 | Brunei Malay | BRN | 328000 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Belait, Brunei-Muara, Temburong, Tutong] | 441631.0 | True | 74.270149 |
| 5876 | Gurkha | BRN | 1700 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Brunei-Muara] | 319300.0 | True | 0.532415 |
| 5892 | Punan | BRN | 80 | Brunei | MULTIPOLYGON (((114.45625 4.38532, 114.47203 4... | [Belait] | 65531.0 | True | 0.12208 |
| 7409 | British | BRN | 6600 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Brunei-Muara] | 319300.0 | True | 2.067022 |
| 7820 | Indo-Pakistani | BRN | 11500 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Brunei-Muara] | 319300.0 | True | 3.601629 |
| 7905 | Dusun | BRN | 31500 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Brunei-Muara] | 319300.0 | True | 9.86533 |
| 8335 | Han Chinese | BRN | 7900 | Brunei | MULTIPOLYGON (((115.06276 5.03770, 115.07354 5... | [Brunei-Muara] | 319300.0 | True | 2.474162 |
| 9988 | Iban | BRN | 20000 | Brunei | MULTIPOLYGON (((114.81925 4.50731, 114.81856 4... | [Belait, Temburong, Tutong] | 122331.0 | True | 16.349086 |
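
The last column of this output can be reproduced from two of the others: `Percent Boundary Population` equals `People Group Population` divided by `Total Boundary Population`, times 100 (for Tutung, 17500 / 432131 is about 4.05%). The sketch below recomputes it for three rows copied from the output. The validity rule it applies (a group is valid when its population does not exceed the population of the ADM1 regions it overlaps) is an assumption that happens to be consistent with this output, not a description of how `validate_country` is implemented.

```python
import pandas as pd

# Three rows copied from the Brunei output above.
rows = pd.DataFrame({
    "People Group": ["Tutung", "Punan", "Iban"],
    "People Group Population": [17500, 80, 20000],
    "Total Boundary Population": [432131.0, 65531.0, 122331.0],
})

# Share of the overlapping ADM1 population that the group accounts for.
rows["Percent Boundary Population"] = (
    rows["People Group Population"] / rows["Total Boundary Population"] * 100
)

# Assumed rule: a group cannot be larger than the regions it lives in.
rows["Valid People Group"] = (
    rows["People Group Population"] <= rows["Total Boundary Population"]
)

print(rows.round(6))
```

A percentage above 100 would mean the people group estimate exceeds the total population of the regions it overlaps, which is the kind of row worth checking by hand.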


<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Map

* If the validation step above reported that all people groups in the country are valid, you can skip this section.
* Otherwise, running the cell below displays a map of the people groups that did not intersect with an ADM1 boundary.

In [16]:
map_results(results)

This country has no people groups that did not intersect with ADM1 boundaries.
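
To make "did not intersect with an ADM1 boundary" concrete, here is a toy sketch using `shapely`, one of the libraries listed in Setup. The rectangles and names are invented for illustration, and the loop is an assumption about what such a check amounts to, not the code behind `map_results`.

```python
from shapely.geometry import box

# Invented toy geometries: two ADM1 regions and two people-group areas.
adm1 = {"Region 1": box(0, 0, 2, 2), "Region 2": box(2, 0, 4, 2)}
groups = {"Group A": box(1, 1, 3, 3), "Group B": box(10, 10, 11, 11)}

# For each group, list the ADM1 regions whose polygons it intersects.
overlaps = {
    group: [name for name, region in adm1.items() if poly.intersects(region)]
    for group, poly in groups.items()
}

# Groups with an empty list touch no ADM1 region at all.
unmatched = [group for group, names in overlaps.items() if not names]

print(overlaps)
print(unmatched)
```

Here `Group A` overlaps both regions, while `Group B` lies far outside them and is the only entry in `unmatched`.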


<hr style="border: 5px solid #003262;" />
<hr style="border: 1px solid #fdb515;" />

# Saving Results

* This optional section saves your work from above.

### CSV file

In [15]:
# run this cell to save the results from above as a csv file
import os

# create the output folder if it does not already exist
os.makedirs('output', exist_ok=True)
    
results.to_csv(f'./output/{country}_validated.csv')

### Excel

In [14]:
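# run this cell to save the results from above as an xlsx file
# (writing .xlsx requires an Excel engine such as openpyxl to be installed)
import os  # repeated so this cell also works on its own

# create the output folder if it does not already exist
os.makedirs('output', exist_ok=True)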

results.to_excel(f'./output/{country}_validated.xlsx')