## <span style=color:blue>This notebook creates a function that, given a county, will generate some number (e.g., 1000) of lon-lat pairs that are all within that county.   </span>

<span style=color:blue>First, a function that builds an approximate bounding box around a county. </span>

<span style=color:blue>This is a little sloppy: we build a 1 degree x 1 degree box centered on the central lon-lat of the county. Most of the counties in my 7-state soy region fit inside a box of that size. </span>

In [61]:
import json
import pandas as pd

# will fetch the lon-lats at the center of each county from the file state_county_lon_lat.csv

archive_dir = '/Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/'
scll = 'state_county_lon_lat.csv'

df_scll = pd.read_csv(archive_dir + scll)
print(df_scll.head())



# Build an approximate bounding box around a county from its center lon-lat
def approx_county_bbox(state, county):
    rows = df_scll.loc[(df_scll['state_name'] == state) & (df_scll['county_name'] == county)]
    # print(rows)

    # Unknown state/county pair: report it instead of raising an IndexError below
    if rows.empty:
        print('no lat-lon found for ', state, county)
        return {'error': 'no lat-lon found for ' + county + ', ' + state}

    lon = rows['lon'].values[0]
    lat = rows['lat'].values[0]
    # print(lon,lat)

    # 1 degree x 1 degree box centered on (lon, lat)
    return {'center_lon' : lon,
            'center_lat' : lat,
            'west_lon' : lon - 0.5,
            'east_lon': lon + 0.5,
            'north_lat': lat + 0.5,
            'south_lat': lat - 0.5
           }
    

# test for Jo Daviess County, IL
# center point lon for this county is: -90.1743742
# center point lat for this county is:  42.3506664

bbox = approx_county_bbox('ILLINOIS', 'JO DAVIESS')
# bbox = approx_county_bbox('ILLINOIS', 'FAKE NAME')

print(json.dumps(bbox, indent=4, sort_keys=True))

  state_name county_name        lon        lat
0   ILLINOIS      BUREAU -89.534118  41.401629
1   ILLINOIS     CARROLL -89.955679  42.064735
2   ILLINOIS       HENRY -90.117744  41.341855
3   ILLINOIS  JO DAVIESS -90.174374  42.350666
4   ILLINOIS         LEE -89.286030  41.747311
{
    "center_lat": 42.3506664,
    "center_lon": -90.1743742,
    "east_lon": -89.6743742,
    "north_lat": 42.8506664,
    "south_lat": 41.8506664,
    "west_lon": -90.6743742
}
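
<span style=color:blue>To get a feel for how big the 1 degree x 1 degree box is on the ground, here is a rough sketch. It assumes a spherical Earth with about 111.32 km per degree of latitude; the helper name box_size_km is mine and is not used elsewhere in the notebook. </span>

```python
import math

KM_PER_DEG_LAT = 111.32  # approximation; assumes a spherical Earth


def box_size_km(center_lat, width_deg=1.0, height_deg=1.0):
    """Return (east-west km, north-south km) for a lon-lat box centered at center_lat."""
    # A degree of longitude shrinks with the cosine of the latitude
    ew_km = width_deg * KM_PER_DEG_LAT * math.cos(math.radians(center_lat))
    ns_km = height_deg * KM_PER_DEG_LAT
    return ew_km, ns_km


# center latitude of Jo Daviess County, taken from the output above
ew_km, ns_km = box_size_km(42.3506664)
print(f'{ew_km:.1f} km east-west by {ns_km:.1f} km north-south')
```

<span style=color:blue>At these latitudes the box is noticeably narrower east-west than it is tall north-south, so it is not a square on the ground. </span>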


### <span style=color:blue>Now working towards a function that tests if lat-lon is in a county    </span>

<span style=color:blue>As a first step, I downloaded files from https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html that hold polygon specifications for all of the US counties. In particular, I fetched the 1:20,000,000 Counties file at https://www2.census.gov/geo/tiger/GENZ2022/shp/cb_2022_us_county_20m.zip. (This was the least precise option, and I don't see a need for more precision.) From inside the zip directory I retrieved, the ".dbf" file seemed most useful. </span>

In [51]:
import geopandas as gpd
from shapely.geometry import Point

# downloaded this from 
county_dir = '/Users/rick/AG-CODE--v03/COUNTY-BOUNDING-POLYGONS/'
county_file = 'cb_2022_us_county_20m.dbf'
county_path = county_dir + county_file

# Load county boundary data from Shapefile
counties = gpd.read_file(county_path)

# Print the first few rows (this also shows the column names)
print(counties.head())

  STATEFP COUNTYFP  COUNTYNS        AFFGEOID  GEOID     NAME        NAMELSAD  \
0      17      127  01784730  0500000US17127  17127   Massac   Massac County   
1      27      017  00659454  0500000US27017  27017  Carlton  Carlton County   
2      37      181  01008591  0500000US37181  37181    Vance    Vance County   
3      47      079  01639755  0500000US47079  47079    Henry    Henry County   
4      06      021  00277275  0500000US06021  06021    Glenn    Glenn County   

  STUSPS      STATE_NAME LSAD       ALAND    AWATER  \
0     IL        Illinois   06   614218330  12784614   
1     MN       Minnesota   06  2230473967  36173451   
2     NC  North Carolina   06   653701481  42190675   
3     TN       Tennessee   06  1455320362  81582236   
4     CA      California   06  3403160299  33693344   

                                            geometry  
0  POLYGON ((-88.92876 37.30285, -88.90507 37.335...  
1  POLYGON ((-93.06133 46.76655, -92.30168 46.764...  
2  POLYGON ((-78.49778 

<span style=color:blue>The state_name and county_name values from the USDA NASS yield data are all capitals, and need to be converted to the format above, which capitalizes only the first letter of each word. </span>

In [52]:
# test
print('NEW JERSEY'.title())
print('DU PAGE'.title())

New Jersey
Du Page
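
<span style=color:blue>One caveat: .title() will not match names that the Census file spells with internal capitals. As far as I know, the Census NAME for 'DU PAGE' is 'DuPage', but this should be checked against counties['NAME']. Below is a minimal sketch of a lookup with overrides; the NAME_OVERRIDES table and the helper census_county_name are hypothetical, and the table would need to be filled in by inspecting the data. </span>

```python
# Hypothetical override table: (NASS state, NASS county) -> Census NAME.
# The single entry is an assumption to verify against counties['NAME'].
NAME_OVERRIDES = {
    ('ILLINOIS', 'DU PAGE'): 'DuPage',
}


def census_county_name(state_name, county_name):
    """Map an all-caps NASS county name to the spelling used in the Census file."""
    # Fall back to plain title case when no override is listed
    return NAME_OVERRIDES.get((state_name, county_name), county_name.title())


print(census_county_name('ILLINOIS', 'DU PAGE'))
print(census_county_name('ILLINOIS', 'JO DAVIESS'))
```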


<span style=color:blue>Function to test whether a given lon-lat is in a state-county </span>

In [53]:
# Load county boundary data; this is a .dbf file

# downloaded this from 
county_dir = '/Users/rick/AG-CODE--v03/COUNTY-BOUNDING-POLYGONS/'
county_file = 'cb_2022_us_county_20m.dbf'
county_path = county_dir + county_file
counties = gpd.read_file(county_path)

def lon_lat_in_county(longitude, latitude, state_name, county_name):
    # Uses the `counties` GeoDataFrame loaded above, so the file is not re-read on every call

    # Find the specified county
    county = counties[(counties['NAME'] == county_name.title()) & (counties['STATE_NAME'] == state_name.title())]
    # print(county)

    if county.empty:
        print(f"County '{county_name}' not found in state '{state_name}'.")
        return False

    # Create shapely point from the provided latitude and longitude
    point = Point(longitude, latitude)

    # Check if the point is within the county polygon
    return point.within(county.geometry.values[0])

     

# test
state_name = 'ILLINOIS'
county_name = "JO DAVIESS"
lon_in = -90.174374
lat_in = 42.350666
lon_out = -95
lat_out = 35

print(lon_lat_in_county(lon_in, lat_in, state_name, county_name))
print(lon_lat_in_county(lon_out, lat_out, state_name, county_name))

True
False
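
<span style=color:blue>For intuition about what a point-in-polygon test does, here is a small ray-casting sketch in plain Python. This is a textbook technique and not necessarily how Shapely implements within(); it ignores holes and points exactly on the boundary. The function point_in_polygon and the rectangle toy_ring are made up for illustration. </span>

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test. ring is a list of (lon, lat) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the east of the point
            if lon < x_cross:
                inside = not inside
    return inside


# toy rectangle, NOT the real county boundary
toy_ring = [(-90.6, 42.2), (-89.9, 42.2), (-89.9, 42.5), (-90.6, 42.5)]

print(point_in_polygon(-90.174374, 42.350666, toy_ring))
print(point_in_polygon(-95, 35, toy_ring))
```

<span style=color:blue>An odd number of crossings means the point is inside, an even number means it is outside. </span>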


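<span style=color:blue>Function that generates some number of lon-lat pairs for a county. As written, it samples uniformly from the approximate bounding box and does not yet filter the pairs with lon_lat_in_county, so some pairs can fall outside the county. </span>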

In [62]:
# assumes state_name, county_name are all-caps, as in the USDA NASS yield data sets

import random

def gen_lon_lat_in_county(state_name, county_name, count):
    # NOTE: samples uniformly from the approximate bounding box only;
    # the points are not tested against the county polygon
    points = []
    bbox = approx_county_bbox(state_name, county_name)
    # print(json.dumps(bbox, indent=4, sort_keys=True))
    for _ in range(count):
        lon = round(random.uniform(bbox['west_lon'], bbox['east_lon']), 7)
        lat = round(random.uniform(bbox['south_lat'], bbox['north_lat']), 7)
        points.append([lon, lat])
    return points
    
# test (named `points` rather than `list` to avoid shadowing the built-in)
points = gen_lon_lat_in_county('ILLINOIS','JO DAVIESS',1000)
print(json.dumps(points[0:5], indent=4))
print()
print(json.dumps(points[995:1000], indent=4))

[
    [
        -90.0610479,
        42.4696693
    ],
    [
        -90.1680891,
        42.2966279
    ],
    [
        -89.7655824,
        42.4688609
    ],
    [
        -89.9835371,
        42.1096186
    ],
    [
        -90.0349361,
        42.2902176
    ]
]

[
    [
        -90.301416,
        42.0801605
    ],
    [
        -90.33245,
        42.0691869
    ],
    [
        -90.1663721,
        42.6104402
    ],
    [
        -90.4424567,
        42.2796999
    ],
    [
        -89.7167605,
        42.1619217
    ]
]
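
<span style=color:blue>To keep only pairs that are really inside the county, one option is rejection sampling: draw from the bounding box and discard anything that fails an inside test. The sketch below is self-contained, so it takes the test as a function argument; sample_points_in_region, toy_bbox and the max_tries guard are my own names, and the toy test (western half of the box) only stands in for a real polygon test. </span>

```python
import random


def sample_points_in_region(bbox, is_inside, count, rng=None, max_tries=1_000_000):
    """Draw lon-lat pairs from bbox, keeping only those where is_inside(lon, lat) is True."""
    rng = rng or random.Random()
    points = []
    tries = 0
    while len(points) < count:
        # Guard against a region the box barely overlaps (or not at all)
        if tries >= max_tries:
            raise RuntimeError(f'only {len(points)} of {count} points found after {tries} tries')
        tries += 1
        lon = round(rng.uniform(bbox['west_lon'], bbox['east_lon']), 7)
        lat = round(rng.uniform(bbox['south_lat'], bbox['north_lat']), 7)
        if is_inside(lon, lat):
            points.append([lon, lat])
    return points


# bounding box values copied from the Jo Daviess output earlier in the notebook
toy_bbox = {'west_lon': -90.6743742, 'east_lon': -89.6743742,
            'south_lat': 41.8506664, 'north_lat': 42.8506664}

# toy inside test: accept only the western half of the box
western_half = lambda lon, lat: lon < -90.1743742

sampled = sample_points_in_region(toy_bbox, western_half, 100, rng=random.Random(0))
print(len(sampled))
```

<span style=color:blue>In this notebook the inside test could be something like lambda lon, lat: lon_lat_in_county(lon, lat, state_name, county_name). Expect it to run much slower than plain bounding-box sampling, since every candidate needs a polygon test. </span>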


<span style=color:blue>Based on the file state_county_lon_lat.csv, build a dictionary with shape state / county / seq_of_lon_lat_in_county. This cell is just a warm-up that creates the empty per-state level. </span>

In [63]:


print(df_scll.state_name.unique())
# answer is: ['ILLINOIS' 'INDIANA' 'IOWA' 'MISSOURI' 'NEBRASKA' 'OHIO']

# oh - realizing now that somehow Minnesota got dropped from my set of states
# It happened in my notebook ML-for-soybeans-part-01--fetching-yield-data, where
# I misspelled it as MINNESTOTA.  Not fixing it for now...

dict = {}
for state in df_scll.state_name.unique():
    dict[state] = {}

print(json.dumps(dict, indent=4, sort_keys=True))


['ILLINOIS' 'INDIANA' 'IOWA' 'MISSOURI' 'NEBRASKA' 'OHIO']
{
    "ILLINOIS": {},
    "INDIANA": {},
    "IOWA": {},
    "MISSOURI": {},
    "NEBRASKA": {},
    "OHIO": {}
}


<span style=color:blue>Here is a function that walks through all the state-county pairs of df_scll, and for each one creates a sequence of 1000 lon-lats in that state-county, and puts that into dict.     </span>

In [66]:
import datetime


def create_lon_lat_seqs(count):
    # local name `seqs` avoids shadowing the built-in `dict` inside the function
    seqs = {}
    for state in df_scll.state_name.unique():
        seqs[state] = {}
    for i in range(len(df_scll)):
        row = df_scll.iloc[i]
        # print(row)
        state = row['state_name']
        county = row['county_name']
        seqs[state][county] = gen_lon_lat_in_county(state, county, count)
        if i % 50 == 0:
            print(f'Have completed generation of {i} sequences of lon-lats')
    return seqs
    
    
print(datetime.datetime.now())
dict = create_lon_lat_seqs(5000)
print(datetime.datetime.now())

# print(json.dumps(dict, indent=4, sort_keys=True))
    

2023-05-28 21:46:50.959669
Have completed generation of 0 sequences of lon-lats
Have completed generation of 50 sequences of lon-lats
Have completed generation of 100 sequences of lon-lats
Have completed generation of 150 sequences of lon-lats
Have completed generation of 200 sequences of lon-lats
Have completed generation of 250 sequences of lon-lats
Have completed generation of 300 sequences of lon-lats
Have completed generation of 350 sequences of lon-lats
Have completed generation of 400 sequences of lon-lats
Have completed generation of 450 sequences of lon-lats
Have completed generation of 500 sequences of lon-lats
Have completed generation of 550 sequences of lon-lats
2023-05-28 21:47:20.237959


<span style=color:blue>Save dict as json  </span>

In [67]:
archive_dir = '/Users/rick/AG-CODE--v03/ML-ARCHIVES--v01/'
out_file = 'state_county__seq_of_lon_lats.json'

with open(archive_dir + out_file, 'w') as fp:
    json.dump(dict, fp)
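
<span style=color:blue>A quick sketch of reading such a file back and walking the state / county / sequence shape. It writes to a temporary directory so it can run anywhere; the tiny sample dictionary reuses two pairs from the Jo Daviess output above and is not the real archive. </span>

```python
import json
import os
import tempfile

# tiny stand-in for the real dictionary: state -> county -> list of [lon, lat]
sample = {'ILLINOIS': {'JO DAVIESS': [[-90.0610479, 42.4696693],
                                      [-90.1680891, 42.2966279]]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'state_county__seq_of_lon_lats.json')
    with open(path, 'w') as fp:
        json.dump(sample, fp)
    with open(path) as fp:
        loaded = json.load(fp)

# each pair comes back as a two-element list: [lon, lat]
first_lon, first_lat = loaded['ILLINOIS']['JO DAVIESS'][0]
print(first_lon, first_lat)
```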