# HPAI Wild Birds CSV Comparison

This notebook compares the manually downloaded HPAI wild birds CSV from the USDA site with the copy fetched from the direct file URL, which was found using the browser's inspect-element tool:

**Direct URL:**  
[https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv](https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv)

---

### Objectives

- Determine whether the manually downloaded file and the automatically retrieved file are identical.
- Assess whether the automated version (fetched from the direct URL) is reliable enough for future data exploration, removing the need for manual downloads.

In [2]:
import pandas as pd
import requests
from io import StringIO


In [4]:
# Load CSV from the direct USDA link (automated download)
url = "https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv"

try:
    # Single request inside the try block so network errors are caught too
    download = requests.get(url, timeout=30)
    download.raise_for_status()  # raise on HTTP 4xx/5xx responses
    csv_online = pd.read_csv(StringIO(download.text))
    print("Successfully loaded csv from url.")
except Exception as e:
    print(f"Failed to load CSV from URL: {e}")
    csv_online = None

Successfully loaded csv from url.


In [27]:
# Load manually downloaded CSV
# Alternate file (commented out):
# "C:/Users/hxa6/OneDrive - CDC/Repos/hpai_wild_birds_data_analysis/usda_wild_bird_avian_influenza_detections.csv"

manual_csv_path = "C:/Users/hxa6/OneDrive - CDC/Repos/hpai_wild_birds_data_analysis/manual_usda_hpai_ detections_wild_birds.csv"
csv_manual = pd.read_csv(manual_csv_path)
print("Successfully loaded manually downloaded csv.")

Successfully loaded manually downloaded csv.
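A stricter check for the first objective is to compare the raw bytes of the two files before any parsing. The sketch below hashes both with SHA-256; the helper names `sha256_hex` and `file_matches_download` are hypothetical and not part of the notebook, and the approach assumes the download response is still available as `download`.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def file_matches_download(path, downloaded: bytes) -> bool:
    """True only if the file on disk is byte-for-byte equal to the downloaded content."""
    return sha256_hex(Path(path).read_bytes()) == sha256_hex(downloaded)


# Hypothetical usage with the notebook's variables:
# file_matches_download(manual_csv_path, download.content)
```

A byte-level mismatch does not by itself mean the data differ: line endings, encoding, or row order can change the bytes while the records stay the same, so the DataFrame comparisons are still needed.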


In [13]:
csv_manual.head()

Unnamed: 0,State,County,Collection Date,Date Detected,HPAI Strain,Bird Species,WOAH Classification,Sampling Method,Submitting Agency
0,West Virginia,Monongalia,2/18/2025,2/27/2025,EA H5N1,Canada goose,Wild bird,Morbidity/Mortality,NWDP
1,West Virginia,Monongalia,2/18/2025,2/27/2025,EA H5N1,Canada goose,Wild bird,Morbidity/Mortality,NWDP
2,Massachusetts,Worcester,2/18/2025,2/27/2025,EA H5,Mallard,Wild bird,Live bird,NWDP
3,Texas,Bexar,2/13/2025,2/27/2025,EA H5,Black vulture,Wild bird,Morbidity/Mortality,NWDP
4,Texas,Bexar,2/13/2025,2/27/2025,EA H5,Black vulture,Wild bird,Morbidity/Mortality,NWDP


In [28]:
# Compare manual and online CSVs
if csv_online is not None:
    print("Manual csv shape:", csv_manual.shape)
    print("Online csv shape:", csv_online.shape)

    if csv_manual.equals(csv_online):
        print("The csv files are IDENTICAL.")
    else:
        print("The csv files are DIFFERENT.")
        try:
            differences = csv_manual.compare(csv_online)
            display(differences.head())
        except Exception as e:
            print(f"Something went wrong while checking the differences: {e}")
else:
    print("Comparison skipped due to failed online csv load.")

Manual csv shape: (13225, 9)
Online csv shape: (13225, 9)
The csv files are DIFFERENT.


Unnamed: 0_level_0,State,State,County,County,Collection Date,Collection Date,Date Detected,Date Detected,HPAI Strain,HPAI Strain,Bird Species,Bird Species,WOAH Classification,WOAH Classification,Sampling Method,Sampling Method,Submitting Agency,Submitting Agency
Unnamed: 0_level_1,self,other,self,other,self,other,self,other,self,other,self,other,self,other,self,other,self,other
0,Iowa,South Carolina,Polk,Colleton,1/21/2025,12/30/2021,6/6/2025,1/13/2022,EA/AM H5N1,EA H5N1,Peregrine falcon,American wigeon,,,Morbidity/Mortality,Hunter harvest,IA DNR,NWDP
1,New York,South Carolina,Oswego,Colleton,1/18/2025,12/30/2021,6/6/2025,1/13/2022,EA/AM H5N1,EA H5N1,Canada goose,Blue-winged teal,,,Morbidity/Mortality,Hunter harvest,NY DEC,NWDP
2,New York,North Carolina,Onondaga,Hyde,1/18/2025,12/30/2021,6/6/2025,1/12/2022,EA/AM H5N1,EA H5N1,Mallard,Northern shoveler,,,Morbidity/Mortality,Hunter harvest,NY DEC,NWDP
3,New York,North Carolina,Seneca,Hyde,1/17/2025,1/8/2022,6/6/2025,1/20/2022,EA/AM H5N1,EA H5N1,Snowy owl,American wigeon,,,Morbidity/Mortality,Hunter harvest,Cornell University,NWDP
4,New York,North Carolina,Schenectady,Hyde,1/17/2025,1/8/2022,6/6/2025,1/20/2022,EA/AM H5N1,EA H5,Bald eagle,Gadwall,,,Morbidity/Mortality,Hunter harvest,Cornell University,NWDP


In [30]:
# Check if column names are the same
print("Are column names the same?", list(csv_manual.columns) == list(csv_online.columns))

# Check if index is the same
print("Are indexes the same?", csv_manual.index.equals(csv_online.index))

# Check for dtype mismatches
print("Column data types:")
print("Manual:\n", csv_manual.dtypes)
print("Online:\n", csv_online.dtypes)

# Optional: sort and reset index if needed
#csv_manual_sorted = csv_manual.sort_values(by=csv_manual.columns.tolist()).reset_index(drop=True)
#csv_online_sorted = csv_online.sort_values(by=csv_online.columns.tolist()).reset_index(drop=True)

#print("Now rechecking equality after sorting and resetting index...")
#print("Equal after sorting and reset?", csv_manual_sorted.equals(csv_online_sorted))


Are column names the same? True
Are indexes the same? True
Column data types:
Manual:
 State                  object
County                 object
Collection Date        object
Date Detected          object
HPAI Strain            object
Bird Species           object
WOAH Classification    object
Sampling Method        object
Submitting Agency      object
dtype: object
Online:
 State                  object
County                 object
Collection Date        object
Date Detected          object
HPAI Strain            object
Bird Species           object
WOAH Classification    object
Sampling Method        object
Submitting Agency      object
dtype: object
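Since the shapes, column names, indexes, and dtypes all match, one plausible explanation for the mismatch is that the two files hold the same records in a different order. The sketch below tests that idea by sorting both frames on every column before comparing; `same_rows_any_order` is a hypothetical helper name, and it wraps the same sort-and-reset idea as the commented-out lines in the cell above.

```python
import pandas as pd


def same_rows_any_order(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Compare two frames while ignoring row order (column order must match)."""
    if list(left.columns) != list(right.columns):
        return False
    cols = list(left.columns)
    # Sort on every column so identical records land at the same position
    left_sorted = left.sort_values(by=cols).reset_index(drop=True)
    right_sorted = right.sort_values(by=cols).reset_index(drop=True)
    # DataFrame.equals treats missing values in the same position as equal
    return left_sorted.equals(right_sorted)


# Hypothetical usage with the notebook's variables:
# same_rows_any_order(csv_manual, csv_online)
```

If this returns True, the files differ only in ordering; if it returns False, at least one record is present in one file and not the other.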


In [31]:
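# Check where any cell is different (compared position by position)
# Note: NaN != NaN evaluates to True, so a cell that is missing in both files is flagged as different
mask = (csv_manual != csv_online)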

# Get the rows with at least one difference
diff_rows = csv_manual[mask.any(axis=1)]

print(f"Number of rows with differences: {len(diff_rows)}")
display(diff_rows.head())


Number of rows with differences: 13223


Unnamed: 0,State,County,Collection Date,Date Detected,HPAI Strain,Bird Species,WOAH Classification,Sampling Method,Submitting Agency
0,Iowa,Polk,1/21/2025,6/6/2025,EA/AM H5N1,Peregrine falcon,Wild bird,Morbidity/Mortality,IA DNR
1,New York,Oswego,1/18/2025,6/6/2025,EA/AM H5N1,Canada goose,Wild bird,Morbidity/Mortality,NY DEC
2,New York,Onondaga,1/18/2025,6/6/2025,EA/AM H5N1,Mallard,Wild bird,Morbidity/Mortality,NY DEC
3,New York,Seneca,1/17/2025,6/6/2025,EA/AM H5N1,Snowy owl,Wild bird,Morbidity/Mortality,Cornell University
4,New York,Schenectady,1/17/2025,6/6/2025,EA/AM H5N1,Bald eagle,Wild bird,Morbidity/Mortality,Cornell University
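The elementwise `!=` comparison counts a cell as different when it is missing in both frames, which can inflate the number of differing rows. A minimal sketch of a mask that ignores cells missing on both sides follows; `cell_diff_mask` is a hypothetical name, and the function assumes both frames share the same index and columns, as the earlier checks indicate.

```python
import pandas as pd


def cell_diff_mask(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Boolean frame marking cells that differ, ignoring cells missing in both."""
    both_missing = left.isna() & right.isna()
    return left.ne(right) & ~both_missing


# Hypothetical usage with the notebook's variables:
# mask = cell_diff_mask(csv_manual, csv_online)
# diff_rows = csv_manual[mask.any(axis=1)]
```

This is still a positional comparison, so it reports many differences whenever the two files list the same records in a different order.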


In [32]:
# Print the first five rows of each file side by side for a visual check
for i in range(5):
    print("Manual row:", csv_manual.iloc[i].to_list())
    print("Online row:", csv_online.iloc[i].to_list())
    print("-" * 80)


Manual row: ['Iowa', 'Polk', '1/21/2025', '6/6/2025', 'EA/AM H5N1', 'Peregrine falcon', 'Wild bird', 'Morbidity/Mortality', 'IA DNR']
Online row: ['South Carolina', 'Colleton', '12/30/2021', '1/13/2022', 'EA H5N1', 'American wigeon', 'Wild bird', 'Hunter harvest', 'NWDP']
--------------------------------------------------------------------------------
Manual row: ['New York', 'Oswego', '1/18/2025', '6/6/2025', 'EA/AM H5N1', 'Canada goose', 'Wild bird', 'Morbidity/Mortality', 'NY DEC']
Online row: ['South Carolina', 'Colleton', '12/30/2021', '1/13/2022', 'EA H5N1', 'Blue-winged teal', 'Wild bird', 'Hunter harvest', 'NWDP']
--------------------------------------------------------------------------------
Manual row: ['New York', 'Onondaga', '1/18/2025', '6/6/2025', 'EA/AM H5N1', 'Mallard', 'Wild bird', 'Morbidity/Mortality', 'NY DEC']
Online row: ['North Carolina', 'Hyde', '12/30/2021', '1/12/2022', 'EA H5N1', 'Northern shoveler', 'Wild bird', 'Hunter harvest', 'NWDP']
-------------------

In [29]:
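# Show online cells that differ from the manual file at the same row and column.
# DataFrame.isin with a DataFrame argument matches on index and column labels,
# so this is a positional comparison, not a search for rows missing from the manual file.
new_rows = csv_online[~csv_online.isin(csv_manual)].dropna(how='all')
display(new_rows.head())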


Unnamed: 0,State,County,Collection Date,Date Detected,HPAI Strain,Bird Species,WOAH Classification,Sampling Method,Submitting Agency
0,South Carolina,Colleton,12/30/2021,1/13/2022,EA H5N1,American wigeon,,Hunter harvest,NWDP
1,South Carolina,Colleton,12/30/2021,1/13/2022,EA H5N1,Blue-winged teal,,Hunter harvest,NWDP
2,North Carolina,Hyde,12/30/2021,1/12/2022,EA H5N1,Northern shoveler,,Hunter harvest,NWDP
3,North Carolina,Hyde,1/8/2022,1/20/2022,EA H5N1,American wigeon,,Hunter harvest,NWDP
4,North Carolina,Hyde,1/8/2022,1/20/2022,EA H5,Gadwall,,Hunter harvest,NWDP
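To find records that really are present in only one file, regardless of position, an outer merge with `indicator=True` is one option. The sketch below is an assumption about how this could be done, not the notebook's method: `rows_only_in_one` and the `_occurrence` column are hypothetical names, and the occurrence counter is there because the data contains exact duplicate rows that a plain merge would multiply.

```python
import pandas as pd


def rows_only_in_one(left: pd.DataFrame, right: pd.DataFrame):
    """Return (rows only in left, rows only in right), respecting duplicate counts."""
    cols = list(left.columns)
    left_num = left.copy()
    right_num = right.copy()
    # Number repeated records 0, 1, 2, ... so the n-th copy pairs with the n-th copy
    left_num["_occurrence"] = left_num.groupby(cols, dropna=False).cumcount()
    right_num["_occurrence"] = right_num.groupby(cols, dropna=False).cumcount()
    merged = left_num.merge(
        right_num, how="outer", on=cols + ["_occurrence"], indicator=True
    )
    only_left = merged.loc[merged["_merge"] == "left_only", cols]
    only_right = merged.loc[merged["_merge"] == "right_only", cols]
    return only_left.reset_index(drop=True), only_right.reset_index(drop=True)


# Hypothetical usage with the notebook's variables:
# only_manual, only_online = rows_only_in_one(csv_manual, csv_online)
```

Two empty results would indicate that the files contain the same records and differ only in row order.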
