## <span style=color:blue>Fetching weather data from NASA POWER </span>

<span style=color:blue>Start by merging year_state_county_yield.csv and state_county_lon_lat.csv into a single df. This df serves below as the backbone for fetching and formatting the weather data.</span>

In [83]:
# This is useful if I want to give unique names to directories or files
import datetime
def curr_timestamp():
    current_datetime = datetime.datetime.now()
    formatted_datetime = current_datetime.strftime("%Y-%m-%d_%H-%M-%S")
    return formatted_datetime

In [91]:
import pandas as pd

archive_dir = '/Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/'
yscy_file = 'year_state_county_yield.csv'
scll_file = 'state_county_lon_lat.csv'

df_yscy = pd.read_csv(archive_dir + yscy_file)
df_scll = pd.read_csv(archive_dir + scll_file)

# Recall that when getting the lon-lat, I changed the name of "DU PAGE, ILLINOIS" to "DUPAGE, ILLINOIS"
# I will make the same name substitution in df_yscy
index_list = df_yscy.index[(df_yscy['county_name'] == 'DU PAGE') | (df_yscy['county_name'] == 'DUPAGE')].tolist()
print(index_list)
for i in index_list:
    df_yscy.at[i, 'county_name'] = 'DUPAGE'
    print(df_yscy.at[i, 'county_name'])

[279, 280, 281, 282, 283]
DUPAGE
DUPAGE
DUPAGE
DUPAGE
DUPAGE
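The loop above works, but the same substitution can be done in one vectorized step with `Series.replace`, which also scales if more county-name mismatches turn up later. A minimal sketch on a toy frame; `name_fixes` is a hypothetical name, and only the DU PAGE to DUPAGE mapping comes from this notebook:

```python
import pandas as pd

# Toy stand-in for df_yscy; the real frame is read from year_state_county_yield.csv
df = pd.DataFrame({
    'year': [2021, 2022, 2022],
    'state_name': ['ILLINOIS', 'ILLINOIS', 'ILLINOIS'],
    'county_name': ['DU PAGE', 'DUPAGE', 'LEE'],
})

# Map each spelling variant to the name used in state_county_lon_lat.csv.
# Exact-value replacement: names not in the dict are left untouched.
name_fixes = {'DU PAGE': 'DUPAGE'}
df['county_name'] = df['county_name'].replace(name_fixes)

print(df['county_name'].tolist())  # ['DUPAGE', 'DUPAGE', 'LEE']
```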


In [93]:
print(len(df_yscy), len(df_scll))

df_yscyll = pd.merge(df_yscy, df_scll, on=['state_name', 'county_name'], how='left')

print()
print(len(df_yscyll))
print()
# print(df_yscyll.head(30))

# sanity check: the lon/lat values in the new df should match those in state_county_lon_lat.csv
print(df_yscyll[df_yscyll['year'] == 2022].head(10))
print(df_scll.head(10))

# checking on the DU PAGE county entries
print()
print(df_yscyll.iloc[279:284].head())

yscyll_filename = 'year_state_county_yield_lon_lat.csv'
df_yscyll.to_csv(archive_dir + yscyll_filename, index=False)

print('wrote file: ', archive_dir + yscyll_filename)

9952 559

9952

     year state_name  county_name  yield        lon        lat
0    2022   ILLINOIS       BUREAU   67.5 -89.534118  41.401629
20   2022   ILLINOIS      CARROLL   68.9 -89.955679  42.064735
40   2022   ILLINOIS        HENRY   66.8 -90.117744  41.341855
60   2022   ILLINOIS   JO DAVIESS   62.6 -90.174374  42.350666
79   2022   ILLINOIS          LEE   66.8 -89.286030  41.747311
99   2022   ILLINOIS       MERCER   65.0 -90.739872  41.201973
118  2022   ILLINOIS         OGLE   67.6 -89.313860  42.039701
138  2022   ILLINOIS       PUTNAM   64.6 -89.267641  41.202591
156  2022   ILLINOIS  ROCK ISLAND   66.3 -90.576614  41.441179
175  2022   ILLINOIS   STEPHENSON   62.3 -89.673564  42.350347
  state_name  county_name        lon        lat
0   ILLINOIS       BUREAU -89.534118  41.401629
1   ILLINOIS      CARROLL -89.955679  42.064735
2   ILLINOIS        HENRY -90.117744  41.341855
3   ILLINOIS   JO DAVIESS -90.174374  42.350666
4   ILLINOIS          LEE -89.286030  41.747311
5  
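The matching row counts (9952 before and after) show that the left merge did not duplicate rows, but they do not show whether every county actually found a lon-lat. A sketch of a stricter check using pandas' `validate` and `indicator` options, on toy frames where 'DU PAGE' has deliberately been left unfixed:

```python
import pandas as pd

# Toy stand-ins: df_yscy has many rows per county, df_scll one row per county
df_yscy = pd.DataFrame({
    'year': [2021, 2022, 2022],
    'state_name': ['ILLINOIS'] * 3,
    'county_name': ['LEE', 'LEE', 'DU PAGE'],   # 'DU PAGE' not normalized on purpose
    'yield': [60.1, 66.8, 55.0],
})
df_scll = pd.DataFrame({
    'state_name': ['ILLINOIS'],
    'county_name': ['LEE'],
    'lon': [-89.286030],
    'lat': [41.747311],
})

# validate='many_to_one' raises a MergeError if a (state, county) key is duplicated
# in df_scll; indicator=True adds a '_merge' column saying where each row matched.
merged = pd.merge(df_yscy, df_scll, on=['state_name', 'county_name'],
                  how='left', validate='many_to_one', indicator=True)

# Rows that found no lon/lat in df_scll
unmatched = merged[merged['_merge'] == 'left_only']
print(len(merged), len(unmatched))
print(unmatched[['year', 'state_name', 'county_name']])
```

Rows flagged `left_only` would end up with NaN lon/lat and make the NASA POWER request fail later, so catching them here is cheaper.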

### <span style=color:blue>Building a function that, given a year and a lon-lat, pulls seven months' worth of weather data from NASA POWER for that lon-lat</span>

<span style=color:blue>We focus on 2003 to 2022, and we pull data only for the soybean growing season, which runs from April to October.</span>
    
<span style=color:blue>(I am inferring the growing season for my 7 states from https://crops.extension.iastate.edu/encyclopedia/soybean-planting-date-can-have-significant-impact-yield. That article is about Iowa, which sits roughly in the north-south middle of my region of interest.)</span>

In [7]:
# setting up a URL template for making requests to NASA POWER

# growing season from April to October

'''
import json

working_dir = '/Users/rick/AG-CODE--v03/NASA-POWER/OUTPUTS/'
county_file = 'county_lat_long.csv'

dfcty = pd.read_csv(working_dir + county_file)
# print(dfcty.head())
'''

# see https://gist.github.com/abelcallejo/d68e70f43ffa1c8c9f6b5e93010704b8
#   for available parameters
# I will focus on the following parameters
weather_params = ['T2M_MAX','T2M_MIN', 'PRECTOTCORR', 'GWETROOT', 'EVPTRNS', 'ALLSKY_SFC_PAR_TOT']
'''
   T2M_MAX: The maximum hourly air (dry bulb) temperature at 2 meters above the surface of the 
             earth in the period of interest.
   T2M_MIN: The minimum hourly air (dry bulb) temperature at 2 meters above the surface of the 
            earth in the period of interest.
   PRECTOTCORR: The bias corrected average of total precipitation at the surface of the earth 
                in water mass (includes water content in snow)
   GWETROOT: The root zone soil wetness, i.e., the fraction of soil moisture available to
             plant roots (0 = completely water-free soil, 1 = saturated soil)
   EVPTRNS: The evapotranspiration energy flux at the surface of the earth
   ALLSKY_SFC_PAR_TOT: The total Photosynthetically Active Radiation (PAR) incident 
         on a horizontal plane at the surface of the earth under all sky conditions
'''

# Now set up a parameterized URL that pulls weather data for the
# growing season (April to October), following
#     https://power.larc.nasa.gov/docs/tutorials/service-data-request/api/
# Building the parameter list from weather_params keeps the two in sync.
base_url = "https://power.larc.nasa.gov/api/temporal/daily/point?"
base_url += 'parameters=' + ','.join(weather_params) + '&'
base_url += 'community=RE&longitude={longitude}&latitude={latitude}&start={year}0401&end={year}1031&format=JSON'
# print(base_url)
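As an alternative to the `str.format` template, the same query can be assembled from a dict with the standard library's `urlencode`. This avoids hand-placed `&` separators and makes the season boundaries explicit arguments. A sketch under these assumptions: `build_power_url` is a hypothetical helper, and its query keys are exactly the ones used in `base_url` above.

```python
from urllib.parse import urlencode

POWER_DAILY_POINT = 'https://power.larc.nasa.gov/api/temporal/daily/point'
WEATHER_PARAMS = ['T2M_MAX', 'T2M_MIN', 'PRECTOTCORR', 'GWETROOT', 'EVPTRNS', 'ALLSKY_SFC_PAR_TOT']

def build_power_url(lon, lat, year, params=WEATHER_PARAMS,
                    start_mmdd='0401', end_mmdd='1031', community='RE'):
    """Hypothetical helper: NASA POWER daily-point URL for one season at one lon-lat."""
    query = {
        'parameters': ','.join(params),
        'community': community,
        'longitude': lon,
        'latitude': lat,
        'start': f'{year}{start_mmdd}',   # YYYYMMDD
        'end': f'{year}{end_mmdd}',
        'format': 'JSON',
    }
    # safe=',' keeps the comma-separated parameter list readable instead of %2C
    return POWER_DAILY_POINT + '?' + urlencode(query, safe=',')

url = build_power_url(-89.286030, 41.747311, 2022)
print(url)
```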

In [107]:
import json
import requests

def fetch_weather_county_year(year, state, county):
    """Fetch April to October daily weather from NASA POWER for one year-state-county."""
    # lon/lat depend only on (state, county); matching on year as well is harmless
    row = df_yscyll.loc[(df_yscyll['state_name'] == state) &
                        (df_yscyll['county_name'] == county) &
                        (df_yscyll['year'] == year)]
    if row.empty:
        raise ValueError(f'no row in df_yscyll for {year}, {state}, {county}')
    lon = row.iloc[0]['lon']
    lat = row.iloc[0]['lat']
    api_request_url = base_url.format(longitude=lon, latitude=lat, year=str(year))

    # this api request returns JSON; fail loudly on HTTP errors instead of
    # hitting a confusing KeyError further down
    response = requests.get(url=api_request_url, verify=True, timeout=30.0)
    response.raise_for_status()
    content = response.json()

    # useful for exploring the structure of the response:
    # print(json.dumps(content, indent=4))
    # print(content.keys(), content['properties'].keys())
    # print(content['properties']['parameter'].keys())

    # content['properties']['parameter'] maps each parameter name to {YYYYMMDD: value}
    weather = content['properties']['parameter']

    df = pd.DataFrame(weather)
    return df

# sanity check
df = fetch_weather_county_year(2022, 'ILLINOIS', 'LEE')

# examining the output
print(len(df))
print()
print(df.head())

214

          T2M_MAX  T2M_MIN  PRECTOTCORR  GWETROOT  EVPTRNS  ALLSKY_SFC_PAR_TOT
20220401     7.94    -5.22         0.05      0.74     0.06              116.01
20220402     7.16    -1.35         4.71      0.74     0.00               35.17
20220403    11.33    -3.40         2.21      0.74     0.10               97.62
20220404    11.17     3.71         2.17      0.74     0.09               46.62
20220405    12.72    -1.72         4.39      0.74     0.02               63.50
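The 214 rows match the length of the April to October window (30 + 31 + 30 + 31 + 31 + 30 + 31 = 214 days). The index, however, is still a `YYYYMMDD` string. A sketch of a cleanup step, assuming NASA POWER marks missing values with the fill value -999 (worth confirming against the `header` of the JSON response); `tidy_power_frame` is a hypothetical name:

```python
import numpy as np
import pandas as pd

def tidy_power_frame(df, year, fill_value=-999.0):
    """Hypothetical cleanup for one fetch_weather_county_year() result."""
    out = df.copy()
    # POWER keys each day as a 'YYYYMMDD' string; turn that into a DatetimeIndex
    out.index = pd.to_datetime(out.index, format='%Y%m%d')
    # Assumption: missing values arrive as the POWER fill value; mark them as NaN
    out = out.replace(fill_value, np.nan)
    # Any days of the April 1 to October 31 window that are absent from the frame
    expected = pd.date_range(f'{year}-04-01', f'{year}-10-31', freq='D')
    missing_days = expected.difference(out.index)
    return out, missing_days

# Toy two-day frame shaped like the output above, with one fill value planted in it
toy = pd.DataFrame({'T2M_MAX': [7.94, -999.0], 'T2M_MIN': [-5.22, -1.35]},
                   index=['20220401', '20220402'])
clean, missing = tidy_power_frame(toy, 2022)
print(clean)
print(len(missing))  # 212: the toy frame covers only 2 of the 214 season days
```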


<span style=color:blue>Create a dictionary keyed by the index of the year_state_county_yield_lon_lat df, where each entry is a df of weather info pulled from NASA POWER for the corresponding year and lon-lat</span>

In [108]:
# w_df will be a dictionary of df's, each holding weather info for
#    one year-state-county triple
# The dictionary keys will be the index values of df_yscyll that we
#    built above (also archived in year_state_county_yield_lon_lat.csv)
w_df = {}

# will archive each w_df[i] value into a csv as we go along, for 2 reasons:
#   - because this takes forever to run
#   - network connectivity or other errors in middle of run may abort the process
out_dir = archive_dir + 'WEATHER-DATA--v01/'
filename = r'weather-data-for-index__{index}.csv'

starttime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# for i in range(0,len(df_yscyll)):
# for i in range(278,280):    # had to fix issue of DU PAGE county...
# for i in range(280,len(df_yscyll)):
# for i in range(1779,1780):  # when running big loop it failed at 1779; but worked in this run; network issue?
# for i in range(1779, len(df_yscyll)): # blocked at 2534
# for i in range(2534,2535):    # when running big loop it failed at 2534; but worked in this run; network issue?
for i in range(2535, len(df_yscyll)):
    row = df_yscyll.iloc[i]
    # print(row)
    w_df[i] = fetch_weather_county_year(row['year'], row['state_name'], row['county_name'])
    outfilename = out_dir + filename.format(index=str(i).zfill(4))
    w_df[i].to_csv(outfilename)
    # adding this to get a feeling of forward progress
    if i % 10 == 0:
        print('\nFinished work on index: ', i, '     at time: ', datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        print('   This involved fetching weather data for the following row:')
        print(row['year'],row['state_name'],row['county_name'], row['lon'],row['lat'])
        print('Wrote file: ', outfilename)

'''
# sanity check
print()
index = 2
print(r'The contents of yscyll for index {index} is:'.format(index=index), '\n')
print(df_yscyll.iloc[index])
print()
print(r'The head of the weather data for index {index} is:'.format(index=index), '\n')
print(w_df[index].head(10))
'''

endtime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print('start and end times were: ', starttime, endtime)


Finished work on index:  2540      at time:  2023-05- 04:33:02
   This involved fetching weather data for the following row:
2006 INDIANA TIPPECANOE -86.8879689 40.3916066
Wrote file:  /Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/WEATHER-DATA--v01/weather-data-for-index__2540.csv

Finished work on index:  2550      at time:  2023-05- 04:33:30
   This involved fetching weather data for the following row:
2015 INDIANA VERMILLION -87.4635672 39.8763626
Wrote file:  /Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/WEATHER-DATA--v01/weather-data-for-index__2550.csv

Finished work on index:  2560      at time:  2023-05- 04:33:59
   This involved fetching weather data for the following row:
2005 INDIANA VERMILLION -87.4635672 39.8763626
Wrote file:  /Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/WEATHER-DATA--v01/weather-data-for-index__2560.csv

Finished work on index:  2570      at time:  2023-05- 04:34:29
   This involved fetching weather data for the following row:
2014 INDIANA VIGO -87.3974269 39.4174

KeyboardInterrupt: 
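This run was stopped by hand near index 2570. The commented-out `range(...)` lines also record manual restarts after failures at indices 1779 and 2534. A sketch of two hypothetical helpers that would make restarts automatic: `pending_indices` skips indices whose CSV is already archived, and `with_retries` re-calls a function with exponential backoff when it raises. The retry count and delays are arbitrary choices, not values tuned for NASA POWER.

```python
import os
import time

def pending_indices(n_rows, out_dir, template='weather-data-for-index__{index}.csv'):
    """Return the indices in range(n_rows) whose archive CSV is not on disk yet."""
    pending = []
    for i in range(n_rows):
        path = os.path.join(out_dir, template.format(index=str(i).zfill(4)))
        if not os.path.exists(path):
            pending.append(i)
    return pending

def with_retries(fn, *args, max_tries=4, base_delay=2.0, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff if it raises.

    Only Exception subclasses are caught, so KeyboardInterrupt still stops the run.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # e.g. timeouts or HTTP errors raised by requests
            if attempt == max_tries:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 2 s, 4 s, 8 s with the defaults
            print(f'attempt {attempt} failed ({exc!r}); retrying in {delay:.0f} s')
            time.sleep(delay)

# Sketch of the notebook loop using these helpers (names from the cells above):
# for i in pending_indices(len(df_yscyll), out_dir):
#     row = df_yscyll.iloc[i]
#     w_df[i] = with_retries(fetch_weather_county_year,
#                            row['year'], row['state_name'], row['county_name'])
#     w_df[i].to_csv(out_dir + filename.format(index=str(i).zfill(4)))
```

Because finished work is detected from the files on disk, rerunning the cell after an interruption picks up where it stopped. The hard-coded start index no longer needs to be edited by hand.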