### eBird API - Bird Data in Illinois (test)

Bird groups are defined according to the shared document.

**Test Time frame:** December 01, 2025 - December 31, 2025

**Dabbling Ducks (testing with this group only):**

* Mallard
* American black duck
* Mallard/Black duck hybrid
* Gadwall
* American wigeon
* Northern pintail
* Northern shoveler
* Green-winged teal
* Blue-winged teal
* Wood duck

Refer to API docs:  
https://documenter.getpostman.com/view/664302/S1ENwy59#intro

In [30]:
import pandas as pd
import geopandas as gpd
import requests
from dotenv import load_dotenv
import os
from datetime import datetime, timedelta
import time
import json
from pathlib import Path
# for i/o-bound processes
from concurrent.futures import ThreadPoolExecutor, as_completed
import re

In [3]:
dabbling = ["Mallard", "American black duck",
            "Black duck hybrid", "Gadwall",
            "American wigeon", "Northern pintail",
            "Northern shoveler", "Green-winged teal",
            "Blue-winged teal", "Wood duck"]

midwestern_states = ["Illinois","Wisconsin","Minnesota",
                     "Iowa","Missouri","Tennessee",
                     "Kentucky","Indiana","Ohio",
                     "Michigan"]

Besides the dabbling ducks and Midwestern states, I also need the list of Illinois counties to request bird data for. The county FIPS codes come from the Census county shapefile.

2025 TIGER/Line® Shapefiles: Counties (and equivalent):  
https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2025&layergroup=Counties+%28and+equivalent%29


Illinois FIPS code: 17

##### Crowdsourced data eBird API:

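Flow:
* Request data from the historic observations endpoint, one day at a time.
* Use the endpoint's region code at the county level (`US-IL-{county FIPS}`), so each request returns one county's observations.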
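
As a rough sketch of the second step, the county-level request URL can be assembled from the state code, the county FIPS code, and the date. The helper name `historic_url` is hypothetical and the county code `031` is only an example value; the URL pattern is the one documented in the request function of this notebook.

```python
from datetime import date

BASE_URL = "https://api.ebird.org/v2/data/obs"

def historic_url(state: str, county_fips: str, day: date) -> str:
    """Build a historic-observations URL for one county and one day (hypothetical helper)."""
    region = f"{state}-{county_fips}"  # e.g. "US-IL" + "031" -> "US-IL-031"
    return f"{BASE_URL}/{region}/historic/{day.year}/{day.month:02}/{day.day:02}"

print(historic_url("US-IL", "031", date(2025, 12, 1)))
```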

In [4]:
load_dotenv()

True

In [5]:
def get_counties(state_fips: str) -> list:
	us_counties = gpd.read_file("./shapefiles/tl_2025_us_county.zip")
	spec_counties = us_counties[us_counties["STATEFP"]==state_fips]
	# I will iterate over this in the next section to get county specific data for the state.
	county_fips_codes = list(spec_counties["COUNTYFP"].unique())
	return county_fips_codes

In [19]:
def get_bird_data_date(state: str, county: str, date: datetime) -> list:
	'''
	Handles a single request (run once per task by ThreadPoolExecutor).
	Uses the historic observations endpoint to pull one day of data for one county.

	state:  eBird state code, e.g. "US-IL"
	county: three-digit county FIPS code as a string, e.g. "031"
	date:   datetime for the day to request

	Request format:
	https://api.ebird.org/v2/data/obs/{state}-{county}/historic/{y}/{m}/{d}

	Returns a list of observation dicts (empty if the request fails).
	'''

	api_key = os.getenv("EBIRD_API_KEY")
	headers = {'X-eBirdApiToken': api_key}

	# one cache file per county and day, so reruns skip the API call
	cache_file = Path("cache") / f"{state}-{county}_{date:%Y-%m-%d}.json"
	cache_file.parent.mkdir(exist_ok=True)
	if cache_file.exists():
		with open(cache_file, "r") as f:
			return json.load(f)

	url = f"https://api.ebird.org/v2/data/obs/{state}-{county}/historic/{date.year}/{date.month:02}/{date.day:02}"
	response = requests.get(url, headers=headers, timeout=30)

	if response.status_code == 200:
		day_data = response.json()
		with open(cache_file, "w") as f:
			json.dump(day_data, f)
		return day_data
	else:
		# failed requests are not cached, so a rerun will retry them
		print(f"Failed {state}-{county} {date.date()} ({response.status_code})")
		return []
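
The caching step in the function above can be looked at in isolation. The following is a minimal sketch of the same read-through pattern, assuming a stand-in `fake_fetch` in place of the API call and a temporary directory in place of `cache/`; `read_through_cache` is a hypothetical helper name and the returned record is made-up sample data.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

def read_through_cache(cache_file: Path, fetch: Callable[[], list]) -> list:
    """Return cached JSON if present; otherwise call fetch() and store its result."""
    if cache_file.exists():
        with open(cache_file, "r") as f:
            return json.load(f)
    result = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_file, "w") as f:
        json.dump(result, f)
    return result

calls = []

def fake_fetch() -> list:
    # stand-in for the API request; records how often it is called
    calls.append(1)
    return [{"comName": "Mallard", "howMany": 3}]  # made-up sample record

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "US-IL-031_2025-12-01.json"
    first = read_through_cache(cache_file, fake_fetch)   # miss: calls fake_fetch
    second = read_through_cache(cache_file, fake_fetch)  # hit: reads the file

print(first == second, len(calls))
```

The second call never reaches `fake_fetch`, which is the behavior that lets a rerun of the notebook skip days that were already downloaded.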

In [21]:
# now to test with illinois (fips 17)
county_fips_codes = get_counties("17")
start_date = datetime.strptime("12/01/25", "%m/%d/%y")
end_date = datetime.strptime("12/31/25", "%m/%d/%y")
state = "US-IL"

tasks = []

for county in county_fips_codes:
	# note: county is a string in the format "000" representing the county FIPS code
	for day in range((end_date - start_date).days + 1):
		date = start_date + timedelta(days=day)
		tasks.append((county, date))

data = []
with ThreadPoolExecutor(max_workers=4) as executor:
	futures = {
		executor.submit(get_bird_data_date, state, county, date): (county, date)
		for county, date in tasks
	}
	for fut in as_completed(futures):
		day_data = fut.result()
		data.extend(day_data)
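
One thing to keep in mind with this loop: `fut.result()` re-raises any exception from the worker (a dropped connection, for instance), which stops the collection partway. Below is a small sketch of the same fan-out with per-task error handling. It assumes a stand-in `fake_fetch` instead of the real request, and the county code `"999"` is made up to simulate a failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import date, timedelta
from itertools import product

def fake_fetch(county: str, day: date) -> list:
    """Stand-in for the real request; fails for one made-up county code."""
    if county == "999":
        raise RuntimeError("simulated request failure")
    return [{"county": county, "obsDt": day.isoformat()}]

counties = ["001", "003", "999"]       # example county codes; "999" is made up
start = date(2025, 12, 1)
days = [start + timedelta(days=i) for i in range(3)]
tasks = list(product(counties, days))  # every (county, day) pair

data, failed = [], []
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(fake_fetch, c, d): (c, d) for c, d in tasks}
    for fut in as_completed(futures):
        try:
            data.extend(fut.result())
        except Exception:
            failed.append(futures[fut])  # remember which task failed

print(len(data), len(failed))
```

Keeping the `(county, day)` pairs in `failed` makes it possible to retry only those tasks later instead of rerunning the whole month.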

In [22]:
test_run = pd.DataFrame(data)
test_run

| | speciesCode | comName | sciName | locId | locName | obsDt | howMany | lat | lng | obsValid | obsReviewed | locationPrivate | subId | exoticCategory |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | rebwoo | Red-bellied Woodpecker | Melanerpes carolinus | L1939440 | Brown Residence (Private Property) | 2025-12-01 15:50 | 1.0 | 42.194623 | -90.196417 | True | False | False | S286985601 | |
| 1 | dowwoo | Downy Woodpecker | Dryobates pubescens | L1939440 | Brown Residence (Private Property) | 2025-12-01 15:50 | 4.0 | 42.194623 | -90.196417 | True | False | False | S286985601 | |
| 2 | haiwoo | Hairy Woodpecker | Leuconotopicus villosus | L1939440 | Brown Residence (Private Property) | 2025-12-01 15:50 | 1.0 | 42.194623 | -90.196417 | True | False | False | S286985601 | |
| 3 | blujay | Blue Jay | Cyanocitta cristata | L1939440 | Brown Residence (Private Property) | 2025-12-01 15:50 | 5.0 | 42.194623 | -90.196417 | True | False | False | S286985601 | |
| 4 | bkcchi | Black-capped Chickadee | Poecile atricapillus | L1939440 | Brown Residence (Private Property) | 2025-12-01 15:50 | 6.0 | 42.194623 | -90.196417 | True | False | False | S286985601 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45441 | norcar | Northern Cardinal | Cardinalis cardinalis | L23050850 | Shabbona State Park | 2025-12-31 11:00 | 2.0 | 41.743751 | -88.860947 | True | False | True | S291477170 | |
| 45442 | moudov | Mourning Dove | Zenaida macroura | L4117597 | Home | 2025-12-31 07:51 | 6.0 | 41.905216 | -88.766747 | True | False | True | S291377690 | |
| 45443 | dowwoo | Downy Woodpecker | Dryobates pubescens | L4117597 | Home | 2025-12-31 07:51 | 1.0 | 41.905216 | -88.766747 | True | False | True | S291377690 | |
| 45444 | houspa | House Sparrow | Passer domesticus | L4117597 | Home | 2025-12-31 07:51 | 13.0 | 41.905216 | -88.766747 | True | False | True | S291377690 | N |


In [63]:
dabbling

['Mallard',
 'American black duck',
 'Black duck hybrid',
 'Gadwall',
 'American wigeon',
 'Northern pintail',
 'Northern shoveler',
 'Green-winged teal',
 'Blue-winged teal',
 'Wood duck']

In [64]:
birds = ["Blue-winged teal", "American black duck"]

# re.escape -> all bird names are treated as literal strings in the pattern
pattern = "|".join([re.escape(name) for name in birds])
print(pattern)

Blue\-winged\ teal|American\ black\ duck


In [65]:
pattern = "|".join([re.escape(name) for name in dabbling])
dabbling_data = test_run[test_run["comName"].str.contains(pattern, case=False)]
dabbling_data["comName"].unique()

array(['Mallard', 'Northern Shoveler', 'Gadwall', 'American Wigeon',
       'American Black Duck', 'Northern Pintail', 'Wood Duck',
       'Mallard x American Black Duck (hybrid)', 'Green-winged Teal',
       'Blue-winged Teal'], dtype=object)

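The filter keeps only observations whose common name contains one of the dabbling duck names (case-insensitive), which is how the data gets grouped by species. The hybrid is matched through "Mallard" and "American black duck"; the "Black duck hybrid" entry never matches by itself, because eBird writes the name as "Mallard x American Black Duck (hybrid)".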
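
Because the filter is a substring match, it can also pick up names that merely contain a listed name. The sketch below contrasts substring matching with exact, case-insensitive matching using only the standard library. The `observed` names are illustrative values chosen to show an over-match, not output from this notebook.

```python
import re

names = ["Mallard", "Wood duck"]
pattern = "|".join(re.escape(name) for name in names)

# illustrative common names; "Mallard (Domestic type)" is here only to show an over-match
observed = ["Mallard", "Mallard (Domestic type)", "Wood Duck", "Blue Jay"]

# substring match, as in the notebook's str.contains filter
substring_hits = [n for n in observed if re.search(pattern, n, flags=re.IGNORECASE)]

# exact match on the whole name, ignoring case
wanted = {name.casefold() for name in names}
exact_hits = [n for n in observed if n.casefold() in wanted]

print(substring_hits)
print(exact_hits)
```

Exact matching is stricter, so the hybrid would then need to be listed under its full eBird name to be kept.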

In [66]:
dabbling_data["comName"].value_counts()

comName
Mallard                                   723
Gadwall                                   294
American Black Duck                       224
Northern Shoveler                         159
American Wigeon                           114
Green-winged Teal                         104
Northern Pintail                           93
Wood Duck                                  55
Mallard x American Black Duck (hybrid)     12
Blue-winged Teal                            3
Name: count, dtype: int64

In [67]:
start_date = pd.to_datetime("2025-12-01")
# work on an explicit copy so adding columns does not trigger SettingWithCopyWarning
dabbling_data = dabbling_data.copy()
# format='mixed' infers the format per value, in case some obsDt entries lack a time
dabbling_data['obsDt'] = pd.to_datetime(dabbling_data['obsDt'], format='mixed')
# week 1 = Dec 1-7, week 2 = Dec 8-14, and so on
dabbling_data['week'] = ((dabbling_data['obsDt'] - start_date).dt.days // 7) + 1
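
The `week` column is a 1-based count of 7-day blocks starting at the first day of the test period, not an ISO calendar week. A minimal standard-library sketch of the same arithmetic, with a hypothetical helper name `week_of_period`:

```python
from datetime import date

def week_of_period(day: date, start: date) -> int:
    """1-based index of the 7-day block, counted from start, that contains day."""
    return (day - start).days // 7 + 1

start = date(2025, 12, 1)
for d in (date(2025, 12, 1), date(2025, 12, 7), date(2025, 12, 8), date(2025, 12, 31)):
    print(d, week_of_period(d, start))
```

With a December 1 start, week 5 covers only December 29 through 31, so its counts are not directly comparable to the four full weeks.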


In [68]:
dabbling_data.columns

Index(['speciesCode', 'comName', 'sciName', 'locId', 'locName', 'obsDt',
       'howMany', 'lat', 'lng', 'obsValid', 'obsReviewed', 'locationPrivate',
       'subId', 'exoticCategory', 'week'],
      dtype='object')

In [None]:
sorted_df = dabbling_data[['comName','obsDt','lat','lng','week']].sort_values(by=['week']).reset_index(drop=True)
sorted_df

| | comName | obsDt | lat | lng | week |
|---|---|---|---|---|---|
| 0 | Mallard | 2025-12-06 15:40:00 | 41.943043 | -87.930159 | 1 |
| 1 | American Black Duck | 2025-12-05 11:15:00 | 41.687772 | -87.981420 | 1 |
| 2 | Mallard | 2025-12-05 15:20:00 | 41.976634 | -88.001834 | 1 |
| 3 | Mallard | 2025-12-04 10:08:00 | 41.834188 | -88.175961 | 1 |
| 4 | Gadwall | 2025-12-04 10:08:00 | 41.834188 | -88.175961 | 1 |
| ... | ... | ... | ... | ... | ... |
| 1776 | Northern Shoveler | 2025-12-30 10:49:00 | 38.600220 | -89.844589 | 5 |
| 1777 | Mallard | 2025-12-30 10:49:00 | 38.600220 | -89.844589 | 5 |
| 1778 | Northern Pintail | 2025-12-30 10:49:00 | 38.600220 | -89.844589 | 5 |
| 1779 | Northern Shoveler | 2025-12-29 12:50:00 | 38.600220 | -89.844589 | 5 |


In [72]:
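# index=False keeps the row index out of the file, so reading it back adds no "Unnamed: 0" column
sorted_df.to_csv("illinois_test_run_dec2025.csv", index=False)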

##### eBird Status and Trends Data Products (weekly):