<table align="left" style="width:100%">
  <tr>
      <td style="width:15%"></td>
      <td style="width:85%"></td>
  </tr>
  <tr>
    <td style="width:10%">
        <img style="float" src="./data/images/ihs-logo.png" alt="IHS Logo" style="width: 10px;"/>
      </td>
    <td align ="left" style="width:20%">
        <div>
            <h2 style="margin-top: 0;">People Groups Validation Notebook</h2>
            <h5 style="margin-top: 5px;">Author: Andrew O'Connor</h5>
            <h6 style="margin-top: 5px;">Contributors: David Getzen</h6>
            </div>
      </td>
  </tr>
</table>

# Introduction
* This notebook validates the `Population` attribute of each People Group in the People Group Areas (.geojson) dataset
    * Main question: Is the population of each People Group less than or equal to the combined official population of all the administrative divisions it intersects at the chosen administrative level?
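
The check behind the main question can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the code in `functions`: the function name `check_people_group`, the division IDs, and the population figures are all hypothetical. It also treats a group that intersects no boundaries as invalid, which matches how this notebook describes invalid groups.

```python
def check_people_group(group_population, boundary_ids, boundary_populations):
    """Return (total_boundary_population, valid) for one people group.

    boundary_ids: IDs of the administrative divisions the group intersects.
    boundary_populations: dict mapping division ID -> official population.
    """
    # A group that intersects no boundaries cannot pass the test.
    if not boundary_ids:
        return 0, False
    # Combine the official populations of every intersected division.
    total = sum(boundary_populations[b] for b in boundary_ids)
    # Valid when the group's population does not exceed that total.
    return total, group_population <= total


# Hypothetical populations for two divisions (illustrative values only).
populations = {"A-01": 1_000_000, "A-02": 250_000}

print(check_people_group(900_000, ["A-01"], populations))
print(check_people_group(1_300_000, ["A-01", "A-02"], populations))
print(check_people_group(5_000, [], populations))
```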

### Using this Notebook
* Run the cells of this Jupyter notebook **sequentially**
* If you are just getting started with this notebook, run each cell **individually**
* If something isn't working as expected or you see errors:
    * Restart your kernel and clear all outputs
    * Run cells sequentially (from top to bottom)
    * Search for your error on Google/Stack Overflow
    * Email Andrew or David

### Attributions/Links
* GitHub: https://github.com/andrewjoc/ihs
* Google Drive: https://drive.google.com/drive/folders/1_4R9ut87eemnxWH53VN8QCSyRugit27s?usp=sharing
* People Group Areas: https://go-imb.opendata.arcgis.com/datasets/imb::apg-people-group-areas/about
    * This information was provided by *The International Mission Board - Global Research*, March 2022, www.peoplegroups.org.
* Subnational population & Boundaries: https://fieldmaps.io/ (Max Malynowsky)
    * Subnational Population: Common Operational Datasets
    * Subnational Boundaries: Humanitarian
    * Attribution: FieldMaps, geoBoundaries, U.S. Department of State, U.S. Geological Survey
    * License: Creative Commons Attribution 4.0 International (CC BY 4.0)

<hr style="border: 5px solid #005555;" />
<hr style="border: 1px solid #mkl234;" />

# Validation Code

In [1]:
# important! - run this cell 
from functions import *

### Instructions
* Search for an area to validate by looking at the `country_inputs.txt` file
    * Ensure that you spell the country/territory correctly when you assign it to the `country` variable 
* If you see the message "IOStream.flush timed out", ignore it
* Setting `adm_level` to 3 or 4 may cause memory issues
    * Consider starting with `adm_level = 1` and working up to `adm_level = 4`
* To see the directory structure of the project, uncomment the cell below by deleting only the `#` symbol, then run it

In [2]:
# view_project_structure()
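
The advice to start at `adm_level = 1` and work upward could be automated along these lines. This is a sketch, not part of `functions`: `run_levels` and `fake_validate` are hypothetical names, and it assumes that memory pressure surfaces as a Python `MemoryError`. That is not guaranteed, since the kernel may simply be killed instead.

```python
def run_levels(validate, country, levels=(1, 2, 3, 4)):
    """Run validate(country, level) for each level, stopping on MemoryError.

    Returns a dict mapping each completed level to its result.
    """
    completed = {}
    for level in levels:
        try:
            completed[level] = validate(country, level)
        except MemoryError:
            # Keep the results from the lower levels and stop here.
            print(f"adm_level={level} ran out of memory; stopping.")
            break
    return completed


# Stand-in for the real validation function, used only to show the flow.
def fake_validate(country, level):
    if level >= 3:
        raise MemoryError
    return f"{country} level {level}"


demo = run_levels(fake_validate, "Malaysia")
print(demo)
```

In the notebook, the same helper could be called as `run_levels(validate_country, 'Malaysia')` once `functions` has been imported.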

---

### 1. Check which tests are available for the country

In [3]:
available_tests_widget();

interactive(children=(Dropdown(description='country', options=('Afghanistan', 'Albania', 'Algeria', 'American …

--- 

### 2. Run the Validation Code

* Copy the name of the country/territory exactly as it appears in the dropdown above
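
Because a misspelled name is an easy mistake, a small pre-check like the one below could catch it before the validation starts. It is a sketch only: `check_country_name` is a hypothetical helper, and the `known` list is a hand-typed sample of names rather than the contents of `country_inputs.txt`, whose format is not assumed here.

```python
import difflib


def check_country_name(name, known_names):
    """Return name if it is known, else raise with close-match suggestions."""
    if name in known_names:
        return name
    # Suggest up to three names that look similar to the input.
    suggestions = difflib.get_close_matches(name, known_names, n=3)
    raise ValueError(
        f"Unknown country/territory {name!r}. Did you mean: {suggestions}?"
    )


# Small sample of names for illustration only.
known = ["Afghanistan", "Albania", "Algeria", "Malaysia"]

print(check_country_name("Malaysia", known))
```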

In [4]:
# choose a level to run the validation code on
country = 'Malaysia'
adm_level = 1

# run this cell to validate, no need to change anything below this line
results = validate_country(country, adm_level)
results.head(10)

started initial loading of subnational data
loading people areas data
merged subnational data
first spatial join complete
cleaned spatial join result


|  | People Group | Alpha-3 Code | Country | People Group Population | geometry | boundaries_present | total_boundary_population | valid | percent_total_boundary | test_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 81 | Malay | MYS | Malaysia | 13670000 | MULTIPOLYGON (((109.85735 1.66335, 109.86960 1... | [MYS-20210215-01, MYS-20210215-06, MYS-2021021... | 33837047.0 | True | 40.399506 | 1 |
| 107 | Sea Dayak | MYS | Malaysia | 657000 | MULTIPOLYGON (((113.60283 3.32615, 113.63974 3... | [MYS-20210215-13] | 2954635.0 | True | 22.236249 | 1 |
| 56 | Javanese | MYS | Malaysia | 702000 | MULTIPOLYGON (((117.83575 4.30286, 117.83202 4... | [MYS-20210215-12] | 4114803.0 | True | 17.060355 | 1 |
| 42 | Han Chinese, Min Nan | MYS | Malaysia | 2245000 | MULTIPOLYGON (((103.92707 1.69347, 103.92775 1... | [MYS-20210215-01, MYS-20210215-06, MYS-2021021... | 17986904.0 | True | 12.481303 | 1 |
| 137 | Tausug | MYS | Malaysia | 500000 | MULTIPOLYGON (((118.38138 5.71978, 118.54256 5... | [MYS-20210215-12] | 4114803.0 | True | 12.15125 | 1 |
| 126 | Southern Sama | MYS | Malaysia | 500000 | MULTIPOLYGON (((115.60046 5.63103, 115.60827 5... | [MYS-20210215-12] | 4114803.0 | True | 12.15125 | 1 |
| 39 | Han Chinese, English | MYS | Malaysia | 2260000 | MULTIPOLYGON (((101.36023 2.91641, 101.36001 2... | [MYS-20210215-01, MYS-20210215-06, MYS-2021021... | 23273380.0 | True | 9.710665 | 1 |
| 130 | Tagalog | MYS | Malaysia | 844000 | MULTIPOLYGON (((101.69035 3.27474, 101.68964 3... | [MYS-20210215-15, MYS-20210215-04, MYS-2021021... | 8780705.0 | True | 9.611984 | 1 |
| 38 | Han Chinese, Cantonese | MYS | Malaysia | 1795000 | MULTIPOLYGON (((101.00243 4.66351, 101.00973 4... | [MYS-20210215-06, MYS-20210215-07, MYS-2021021... | 20569963.0 | True | 8.726316 | 1 |
| 133 | Tamil | MYS | Malaysia | 2475000 | MULTIPOLYGON (((101.29028 3.77242, 101.29273 3... | [MYS-20210215-01, MYS-20210215-06, MYS-2021021... | 33580983.0 | True | 7.370243 | 1 |
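
The `percent_total_boundary` values above appear to be the group's population as a percentage of `total_boundary_population`. That reading is inferred from the numbers in the output, not from the source of `validate_country`, so treat the sketch below as a way to sanity-check a row by hand. `percent_of_boundary` is a hypothetical name, and the two rows are copied from the output above.

```python
def percent_of_boundary(group_population, total_boundary_population):
    """Group population as a percentage of the total boundary population."""
    return group_population / total_boundary_population * 100


# (People Group, People Group Population, total_boundary_population)
rows = [
    ("Malay", 13670000, 33837047.0),
    ("Sea Dayak", 657000, 2954635.0),
]

for name, population, total in rows:
    percent = percent_of_boundary(population, total)
    # A row passes when the group population does not exceed the total.
    print(name, round(percent, 4), population <= total)
```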


<hr style="border: 5px solid #005555;" />
<hr style="border: 1px solid #mkl234;" />

# Maps

* This section lets you view or save a map of the `results` dataframe from the section above
    * **IMPORTANT**: `save_map` saves the results of the validation test above in a new folder called `output`. You will likely use this function the most.
        * Note: A map is saved only if at least one people group *fails* the test. If every people group in a country is valid for a specific test (e.g. if adm_level = 1), there is no map to save. A people group is invalid if its population is greater than the total boundary population or if it does not intersect any boundaries.
    * `view_map` shows a map of the `results` dataframe from the section above. You can enter a specific people group or keep `'all'` to view all people groups in the country.

* Uncomment the cell that is most relevant to you by deleting only the `#` in front of it, then run the cell.

In [11]:
# view_map(results, color='blue', people_group='all')

In [12]:
# save_map(results)
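
To tell in advance whether `save_map` will have anything to save, it can help to count the failing people groups first. The sketch below uses plain dictionaries so it runs on its own. `failing_groups` is a hypothetical helper, and the rows are a trimmed stand-in for `results`. On the real dataframe, and assuming the `valid` column is boolean, the equivalent filter would be `results[~results["valid"]]`.

```python
def failing_groups(rows):
    """Return the rows whose 'valid' flag is False."""
    return [row for row in rows if not row["valid"]]


# Trimmed stand-in for the results dataframe (two rows, two columns).
rows = [
    {"People Group": "Malay", "valid": True},
    {"People Group": "Sea Dayak", "valid": True},
]

failed = failing_groups(rows)
if failed:
    print(f"{len(failed)} people group(s) failed; a map would be saved.")
else:
    print("All people groups passed; there is no map to save.")
```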

<hr style="border: 5px solid #005555;" />
<hr style="border: 1px solid #mkl234;" />