## We Need to Talk + MIT Code for Good '22
This notebook reads from the [data spreadsheet](https://docs.google.com/spreadsheets/d/1_OsK5jXUoQP0JRrfKKwzCxPS936-Qp3fE6RgNR_a82I/edit#gid=2114958450) that our CFG team has gathered and uses a simple model to calculate period poverty scores for the 81 provinces of Turkey. As more data becomes available, the model and spreadsheet can be modified to accommodate it.

The following links are helpful for getting started with the Google Sheets API:
- https://developers.google.com/sheets/api/quickstart/python
- https://blog.coupler.io/python-to-google-sheets/

This notebook requires:
- pandas
- google-auth 2.3.3
- google-api-python-client 2.35.0
- google-api-core 2.4.0
- google-auth-oauthlib 0.4.6

In [9]:
from googleapiclient.discovery import build
from google.oauth2 import service_account
from googleapiclient.errors import HttpError
import pandas as pd
import json
from collections import defaultdict

In [36]:
# Scope to allow read/write to the service account's files
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
SERVICE_ACCOUNT_FILE = "we-need-to-talk-338617-1bfc415b1e1b.json"

CREDENTIALS = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES)
SPREADSHEET_ID = "1_OsK5jXUoQP0JRrfKKwzCxPS936-Qp3fE6RgNR_a82I"

In [24]:
# Try an example first

SHEET_RANGE = "Data!A1:M82"
try:
    # Use the credentials and spreadsheet ID defined in the setup cell above
    service = build('sheets', 'v4', credentials=CREDENTIALS)

    # Call the Sheets API
    sheet = service.spreadsheets()
    result = sheet.values().get(spreadsheetId=SPREADSHEET_ID, range=SHEET_RANGE).execute()
    values = result.get('values', [])

    if not values:
        raise Exception('No data found.')

except HttpError as err:
    print(err)

In [25]:
df = pd.DataFrame(values[1:], columns=values[0])
df

Unnamed: 0,Province Name,Region,Phone Prefix,Population (2019-2020 Estimate),Number of Menstruators (Estimate),Period Poverty Score
0,Adana,Mediterranean,322,2237940,680581,0
1,Adıyaman,Southeastern Anatolia,416,626465,190515,0
2,Afyonkarahisar,Aegean,272,729483,221843,0
3,Ağrı,Eastern Anatolia,472,536199,163064,0
4,Aksaray,Central Anatolia,382,416567,126682,0
...,...,...,...,...,...,...
76,Uşak,Aegean,276,370509,112676,0
77,Van,Eastern Anatolia,432,1136757,345700,0
78,Yalova,Marmara,226,270976,82407,0
79,Yozgat,Central Anatolia,354,421200,128091,0
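
The model code later in this notebook wraps each cell in `float()`, which suggests the values read from the sheet arrive as strings. As a minimal sketch of an alternative, the numeric columns could be converted once when the DataFrame is built. The `values` list below is a hypothetical stand-in for the API response (only the two population figures are taken from the table above), and `numeric_cols` is an illustrative name, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for result.get('values', []): the first row is the
# header and every cell is a string.
values = [
    ["Province Name", "Region", "Population (2019-2020 Estimate)"],
    ["Adana", "Mediterranean", "2237940"],
    ["Van", "Eastern Anatolia", "1136757"],
]

df = pd.DataFrame(values[1:], columns=values[0])

# Convert the columns expected to hold numbers. errors="coerce" turns any
# cell that cannot be parsed into NaN instead of raising an error.
numeric_cols = ["Population (2019-2020 Estimate)"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

total_population = int(df["Population (2019-2020 Estimate)"].sum())
print(df.dtypes)
print(total_population)
```

Converting up front keeps the type handling in one place, at the cost of having to list the numeric columns explicitly.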


In [65]:
MONTHLY_MENSTRUAL_COSTS = 200  # Assuming the purchase of pads, units in Turkish Liras (TRY)
AVG_NUM_FEMALES_PER_HOUSEHOLD = 2  # TODO: Remove this + alter model code below if using per person vs per household

def setup_sheets_api_client(creds):
    """
    Returns the sheets api client. 
    The client object can be called as follows to read from a spreadsheet:
    
    client.values().get(spreadsheetId=xyzid, range=xyzrange).execute()
    
    Raises HttpError if the connection fails.
    """
    service = build('sheets', 'v4', credentials=creds)
    return service.spreadsheets()

def add_sheet_data_to_dict(sheets_api_client, sheet_range, data_dict):
    """
    Uses sheets api client to read a specified range from the data spreadsheet
    (Id can be found in the URL: https://docs.google.com/spreadsheets/d/<ID HERE>/edit#gid=blah).
    Adds the data spanning the range to data_dict in the format: 
    {
        Adana: {
            "Region": "Mediterranean", "Population": 200, ...
        },
        Istanbul: {
            "Region": "Marmara", ...
        }, ...
    }
    
    Raises Exception if no data is found, or HttpError if the connection fails.
    """
    result = sheets_api_client.values().get(spreadsheetId=SPREADSHEET_ID, range=sheet_range).execute()
    values = result.get('values', [])

    if not values:
        raise Exception('No data found.')

    df = pd.DataFrame(values[1:], columns=values[0])
    
    # Copy every column of each row into the nested dict, keyed by province
    for _, row in df.iterrows():
        province = row["Province Name"]
        for col in df.columns:
            data_dict[province][col] = row[col]
            
def calculate_period_poverty_per_province(data_dict, c1=1, c2=100, c3=1):
    """
    Calculates and stores period poverty score by province in data_dict.
    Tentative formula:
    
    c1 * wb + c2 * mmp/mi + c3 * hr * hc
    where (all values are calculated estimates or data taken from reputable organizations):
        - c1, c2, and c3 are tunable coefficients
        - wb  := well being index (HDI) for females
        - mmp := monthly menstrual product costs per person
        - mi  := monthly income per person
        - hr  := hospitalization rate per person due to vaginitis or related infection/disease
        - hc  := avg hospitalization costs due to vaginitis or related infection/disease
        
    Default values for c1, c2, c3 chosen arbitrarily. These can and should be adjusted.
    """
    for province in data_dict:
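        try:
            wb = float(data_dict[province]["Well-Being Index (HDI) Females"])
        except KeyError:
            # Report the province that lacks a well-being value and stop scoring
            print(f"Missing 'Well-Being Index (HDI) Females' for province: {province}")
            break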
        
        # TODO: figure out whether we want mmp and mi to be per household or per person
        mmp = MONTHLY_MENSTRUAL_COSTS
        mi = float(data_dict[province]["Mean Household Income"]) / AVG_NUM_FEMALES_PER_HOUSEHOLD
        
        # TODO: verify this rate and rename the column so it mentions 'hospital' visits
        hr = float(data_dict[province]['Number of Visits per Person']) * 1
        hc = 1  # TODO: Placeholder, make this number correct
        data_dict[province]["Period Poverty Score"] = c1 * wb + c2 * mmp/mi + c3 * hr * hc
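
The nested layout described in the `add_sheet_data_to_dict` docstring can be sketched without the Sheets API. The helper name `add_rows_to_dict`, the two small tabs, and the income figures below are hypothetical and exist only for illustration. The sketch shows that rows from different tabs end up merged under the same province key, and that a column appearing in several tabs is overwritten by the tab read last.

```python
from collections import defaultdict


def add_rows_to_dict(values, data_dict, key_col="Province Name"):
    """Merge one tab's rows (header row first) into data_dict, keyed by province."""
    header, rows = values[0], values[1:]
    key_idx = header.index(key_col)
    for row in rows:
        province = row[key_idx]
        # zip pairs each header with the cell in the same position
        for col, cell in zip(header, row):
            data_dict[province][col] = cell


# Hypothetical tab contents; the income values are made up.
data_tab = [
    ["Province Name", "Region"],
    ["Adana", "Mediterranean"],
    ["Van", "Eastern Anatolia"],
]
income_tab = [
    ["Province Name", "Mean Household Income"],
    ["Adana", "50000"],
    ["Van", "40000"],
]

data_dict = defaultdict(dict)
for tab in (data_tab, income_tab):
    add_rows_to_dict(tab, data_dict)

print(dict(data_dict))
```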

#### Notes:
- The Well-Being Index comes from the [Human Development Indices project](https://globaldatalab.org/shdi/2019/gender-development/TUR/?levels=1%2B4&interpolation=1&extrapolation=0&nearest_real=0) with its technical details explained [here](http://hdr.undp.org/sites/default/files/hdr2020_technical_notes.pdf).
- Mean income data is taken from [here](https://data.tuik.gov.tr/Bulten/Index?p=Income-and-Living-Conditions-Survey-Regional-Results-2020-37405).
- See the Sources tab in our master spreadsheet for more details on how we obtained the data from international organizations' online public data.
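
To make the tentative formula `c1 * wb + c2 * mmp/mi + c3 * hr * hc` concrete, here is a small worked sketch. Every input value is hypothetical and chosen only to make the arithmetic easy to follow. The function name `period_poverty_score` is illustrative, and the defaults mirror the arbitrary coefficients used in this notebook.

```python
def period_poverty_score(wb, mmp, mi, hr, hc, c1=1, c2=100, c3=1):
    """Tentative score: c1 * wb + c2 * mmp / mi + c3 * hr * hc."""
    return c1 * wb + c2 * mmp / mi + c3 * hr * hc


# Hypothetical inputs for a single province
wb = 0.9                    # well-being index (HDI) for females
mmp = 200                   # monthly menstrual product costs, in TRY
household_income = 8000     # made-up household income, in TRY
mi = household_income / 2   # divided by avg females per household, as in the model
hr = 0.5                    # made-up visits per person
hc = 1                      # placeholder cost, as in the model

# Terms: 1 * 0.9 = 0.9, 100 * 200 / 4000 = 5.0, 1 * 0.5 * 1 = 0.5
score = period_poverty_score(wb, mmp, mi, hr, hc)
print(round(score, 2))
```

With these defaults the cost-to-income term dominates the total, which is one reason the coefficients should be tuned rather than taken as given.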

In [74]:
if __name__ == "__main__":
    client = setup_sheets_api_client(CREDENTIALS)
    data_dict = defaultdict(dict)
    
    # Add more tabs and columns to this list to change the information processed by the model
    relevant_sheet_tab_ranges = [
        "Data!A1:F82", "Distress Index!A:G", "Income/Expenditure!A:G", "Hospitalization Rate/Cost!A:E"
    ]
    
    # Read in all relevant data into data_dict
    for sheet_range in relevant_sheet_tab_ranges:
        add_sheet_data_to_dict(client, sheet_range, data_dict)

    # Use our model to calculate the period poverty score per province
    calculate_period_poverty_per_province(data_dict)
    
    # Save results into a json file
    with open("provinces_data.json", "w") as f:
        json.dump(data_dict, f)
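
Once `provinces_data.json` has been written, the results can be loaded back and ranked. The sketch below writes a small hypothetical sample to a temporary file so that it runs on its own. The scores are made up, and sorting in descending order assumes that a higher score indicates greater period poverty, which this notebook does not state.

```python
import json
import os
import tempfile

# Hypothetical records standing in for the contents of provinces_data.json
sample = {
    "Adana": {"Region": "Mediterranean", "Period Poverty Score": 6.4},
    "Van": {"Region": "Eastern Anatolia", "Period Poverty Score": 9.1},
    "Yalova": {"Region": "Marmara", "Period Poverty Score": 4.2},
}

path = os.path.join(tempfile.mkdtemp(), "provinces_data.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Load the file and sort provinces from highest to lowest score
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

ranked = sorted(
    loaded.items(),
    key=lambda item: item[1]["Period Poverty Score"],
    reverse=True,
)
for province, record in ranked:
    print(f"{province}: {record['Period Poverty Score']}")
```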