### Converting Google Earth Engine Assets to ~50 CSV Files

* Average embeddings per county for every state, 2017 to 2024.
* Each asset covers one state, identified by its state FIPS code.
* State FIPS codes are listed here: https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt

#### Saving as CSV to `.\notebooks\national_embeddings\all_embeddings_csvs`

Using the `convert_to_df()` function from `utils.py`: 

In [16]:
from pathlib import Path
import sys

# Add the project root (wnv_embeddings) to sys.path so that `utils` can be imported
PROJECT_ROOT = Path.cwd().parents[1]  # <-- wnv_embeddings
sys.path.insert(0, str(PROJECT_ROOT))

from utils.utils import convert_to_df
import pandas as pd
import ee
import requests

In [3]:
# will prompt you to authorize access to GEE
# this is needed to obtain assets from the cloud saved under your account
ee.Authenticate()

# enter your own registered project name here
ee.Initialize(project="wnv-embeddings")

In [4]:
# State FIPS codes for the 50 states plus DC ("11"): 51 codes in total
state_fips_codes = [
    "01", "02", "04", "05", "06", "08", "09", "10", "11", "12",
    "13", "15", "16", "17", "18", "19", "20", "21", "22", "23",
    "24", "25", "26", "27", "28", "29", "30", "31", "32", "33",
    "34", "35", "36", "37", "38", "39", "40", "41", "42", "44",
    "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", "56"
]

In [None]:
# =============CONVERT GEE ASSETS TO CSVS============= #
# ONLY RUN ONCE: CONVERTS ALL 51 ASSETS (50 STATES + DC) TO CSV #

# now obtaining the csvs
# csv_destination = Path("all_embeddings_csvs")
# csv_destination.mkdir(parents=True, exist_ok=True)

# for fips in state_fips_codes:
#     gee_path = f"users/angel314/{fips}_2017_2024_embeddings"
#
#     save_to = csv_destination / f"{fips}-avg-embeddings-2017-2024.csv"
#
#     convert_to_df(gee_path, True, save_to)
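
The body of `convert_to_df()` is not shown in this notebook, so the following is only a rough sketch of the kind of flattening such a helper might perform. It assumes the asset is a feature collection and that `ee.FeatureCollection(gee_path).getInfo()` has already returned its GeoJSON-style dictionary. The function name `features_to_rows` and the property names (`GEOID`, `year`, `A00`) are hypothetical, and the values are made up for illustration.

```python
def features_to_rows(fc_info):
    """Flatten a GeoJSON-style FeatureCollection dict into a list of row dicts."""
    rows = []
    for feature in fc_info.get("features", []):
        # Each feature keeps its attributes under "properties"; geometry is dropped.
        rows.append(dict(feature.get("properties", {})))
    return rows


# Hand-written stand-in for the dictionary returned by getInfo() (hypothetical values).
fake_info = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "01001", "year": 2017, "A00": 0.12}},
        {"type": "Feature", "geometry": None,
         "properties": {"GEOID": "01003", "year": 2017, "A00": -0.05}},
    ],
}

rows = features_to_rows(fake_info)
print(len(rows), rows[0]["GEOID"])
```

A list of row dictionaries like this can be passed straight to `pd.DataFrame(rows)` and then written out with `to_csv()`.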

#### Appending Yearly WNV Case Data + County Population Data

##### Getting WNV Case Data:
* Source: https://www.cdc.gov/west-nile-virus/data-maps/historic-data.html  
* Section: "Explore county level data for 1999-2024" - "Yearly data"
	* Returns: one CSV with case data at a county level for 1999-2024

##### Getting Population Data:
* County populations for a given year are requested from this endpoint (`*` is a wildcard for all counties):
`https://api.census.gov/data/{current_year}/pep/population?get=GEONAME,POP&for=county:*`

	* Examples here: https://api.census.gov/data/2015/pep/population/examples.html 

	* Returns: county name, state name, population, state FIPS, county FIPS. The state and county FIPS codes are concatenated into the full five-digit county FIPS code, which is the key for merging with the `all_embeddings_csvs` data.

* An API key can be requested here if needed (for 500+ API calls): https://api.census.gov/data/key_signup.html
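
One detail worth checking before merging is that every source uses the same key format. The Census API returns the FIPS parts as zero-padded strings, while the `Location` column of the WNV case file is read as a number (for example `1001` for Autauga, AL), which drops the leading zero. The sketch below shows one possible way to bring both to a five-digit string. The helper names `county_geoid` and `normalize_location` are hypothetical.

```python
def county_geoid(state_fips, county_fips):
    """Join state (2-digit) and county (3-digit) FIPS parts into a 5-digit code."""
    return str(state_fips).zfill(2) + str(county_fips).zfill(3)


def normalize_location(location):
    """Left-pad a numeric county code such as 1001 to the 5-digit form '01001'."""
    return str(int(location)).zfill(5)


print(county_geoid("01", "001"))
print(normalize_location(1001))
print(normalize_location("6037"))
```

With pandas, the same normalization could be applied column-wise, for instance with `cases["Location"].map(normalize_location)`.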

County population is needed for each year so that case counts can be normalized with this formula:

$\textnormal{Cases per 100k} = \frac{\textnormal{Number of disease cases}}{\textnormal{County population}} \times 100{,}000$

Normalized cases (cases per 100k) are the target variable used to evaluate the machine learning models.
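
As a minimal sketch of the normalization formula, the function below computes cases per 100k for a single county and year. The function name `cases_per_100k` and the example population are hypothetical and not taken from the data.

```python
def cases_per_100k(case_count, population):
    """Return disease cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so the result follows cases / population * 100,000.
    return case_count * 100_000 / population


# Hypothetical county with 2 cases and 50,000 residents.
rate = cases_per_100k(2, 50_000)
print(rate)
```

Raising on a non-positive population keeps a missing or zero population from silently producing an infinite or undefined rate after the merge.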

This is a preview of WNV County Cases from 1999 to 2024.

In [14]:
cases = pd.read_csv("./national_wnv_case_data/wnv_county_cases_1999_2024.csv")
cases.head()

| Unnamed: 0 | FullGeoName | Year | Location | Activity | Total human disease cases | Neuroinvasive disease cases | \*\*Presumptive viremic blood donors | Notes |
|---|---|---|---|---|---|---|---|---|
| 0 | AL, Autauga | 2024 | 1001 | Human infections | 2.0 | 1.0 | 0.0 | |
| 1 | AL, Baldwin | 2024 | 1003 | Human infections and non-human activity | 2.0 | 1.0 | 0.0 | |
| 2 | AL, Chilton | 2024 | 1021 | Human infections | 1.0 | 1.0 | 0.0 | |
| 3 | AL, Cullman | 2024 | 1043 | Human infections | 2.0 | 1.0 | 0.0 | |
| 4 | AL, Dallas | 2024 | 1047 | Human infections | 1.0 | 0.0 | 0.0 | |


In [31]:
for year in range(2017, 2024 + 1):
    csv_save_path = f"./all_county_populations/{year}_national_populations.csv"

    # * is a wildcard for all counties
    url = f"https://api.census.gov/data/{year}/pep/population?get=GEONAME,POP&for=county:*"
    r = requests.get(url)
    data = r.json()

    # data[1:] = all rows except header, data[0] = header
    df = pd.DataFrame(data[1:], columns=data[0])
    df["year"] = year
    df["GEOID"] = df["state"] + df["county"]

    df.to_csv(csv_save_path, index=False)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [35]:
url = (
    "https://api.census.gov/data/2018/pep/population"
    "?get=GEONAME,POP&for=county:*&in=state:*"
)
r = requests.get(url)
print(r.status_code)


200
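
The `JSONDecodeError` raised by the population loop is what `r.json()` produces when a response body is not valid JSON, such as an empty body or an HTML error page. One defensive option, sketched below, is to check the status code and body before parsing, so that a single bad year is skipped rather than stopping the whole loop. The helper name `parse_population_payload` is hypothetical, the sample payload is hand-written with made-up values, and nothing here identifies which years or URL forms the API actually supports.

```python
import json


def parse_population_payload(status_code, body_text, year):
    """Turn a header-then-rows JSON body into row dicts, or None if unusable."""
    # A non-200 status or an empty body means there is nothing to parse.
    if status_code != 200 or not body_text.strip():
        return None
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        # The body was not JSON (for example an HTML error page).
        return None
    # Expect a list whose first element is the header row.
    if not isinstance(data, list) or len(data) < 2:
        return None
    header, rows = data[0], data[1:]
    records = []
    for row in rows:
        record = dict(zip(header, row))
        record["year"] = year
        # state (2 digits) + county (3 digits) = 5-digit county FIPS
        record["GEOID"] = record["state"] + record["county"]
        records.append(record)
    return records


# Hand-written payload in the same shape the loop expects (values are made up).
sample = json.dumps([
    ["GEONAME", "POP", "state", "county"],
    ["Example County, Example State", "12345", "01", "001"],
])
good = parse_population_payload(200, sample, 2018)
bad = parse_population_payload(200, "", 2018)
print(good[0]["GEOID"], bad)
```

Inside the loop, the helper would be called as `parse_population_payload(r.status_code, r.text, year)`. Years that return `None` can be logged and retried with a different URL, and the remaining records passed to `pd.DataFrame(records)`.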
