# HPAI Wild Birds CSV Comparison

This notebook compares the HPAI wild birds CSV downloaded manually from the USDA site with the version retrieved programmatically from the direct URL, which was found using the browser's inspect element tool:

**Direct URL:**  
[https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv](https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv)

---

### Objectives

- Determine whether the manually downloaded file and the automatically retrieved file are identical.
- Assess whether the automated version (using the direct URL) is reliable enough for future data exploration, removing the need for manual downloads.

In [2]:
import pandas as pd
import requests
from io import StringIO


In [4]:
# Load CSV from the direct USDA link (automated download)
url = "https://www.aphis.usda.gov/sites/default/files/hpai-wild-birds.csv"

try:
    # A timeout keeps the cell from hanging if the server does not respond
    download = requests.get(url, timeout=30)
    download.raise_for_status()  # raise on HTTP 4xx/5xx responses
    csv_online = pd.read_csv(StringIO(download.text))
    print("Successfully loaded csv from url.")
except Exception as e:
    print(f"Failed to load CSV from URL: {e}")
    csv_online = None

Successfully loaded csv from url.


In [12]:
# Load manually downloaded CSV 
manual_csv_path = "C:/Users/hxa6/OneDrive - CDC/Repos/hpai_wild_birds_data_analysis/usda_wild_bird_avian_influenza_detections.csv"
csv_manual = pd.read_csv(manual_csv_path)
print("Successfully loaded manually downloaded csv.")

Successfully loaded manually downloaded csv.


In [13]:
csv_manual.head()

Unnamed: 0,State,County,Collection Date,Date Detected,HPAI Strain,Bird Species,WOAH Classification,Sampling Method,Submitting Agency
0,West Virginia,Monongalia,2/18/2025,2/27/2025,EA H5N1,Canada goose,Wild bird,Morbidity/Mortality,NWDP
1,West Virginia,Monongalia,2/18/2025,2/27/2025,EA H5N1,Canada goose,Wild bird,Morbidity/Mortality,NWDP
2,Massachusetts,Worcester,2/18/2025,2/27/2025,EA H5,Mallard,Wild bird,Live bird,NWDP
3,Texas,Bexar,2/13/2025,2/27/2025,EA H5,Black vulture,Wild bird,Morbidity/Mortality,NWDP
4,Texas,Bexar,2/13/2025,2/27/2025,EA H5,Black vulture,Wild bird,Morbidity/Mortality,NWDP
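The date columns load as plain strings, so one way to see which period a file covers is to parse them and take the earliest and latest values. The following is a minimal sketch, assuming the `M/D/YYYY` format seen in the rows above holds throughout the file; `date_range` is a hypothetical helper, and `sample` only reuses a few of the rows shown above rather than the full dataset.

```python
import pandas as pd

# Sample rows copied from the head() output above (not the full dataset)
sample = pd.DataFrame({
    "State": ["West Virginia", "Massachusetts", "Texas"],
    "Collection Date": ["2/18/2025", "2/18/2025", "2/13/2025"],
    "Date Detected": ["2/27/2025", "2/27/2025", "2/27/2025"],
})

def date_range(df, column):
    """Parse an M/D/YYYY string column and return its (earliest, latest) dates.

    Hypothetical helper; values that do not match the assumed format become NaT
    and are ignored by min() and max().
    """
    parsed = pd.to_datetime(df[column], format="%m/%d/%Y", errors="coerce")
    return parsed.min(), parsed.max()

earliest, latest = date_range(sample, "Collection Date")
print(earliest.date(), latest.date())
```

Applying the same helper to both `csv_manual` and `csv_online` would show whether the two files cover the same date window.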


In [24]:
# Compare manual and online CSVs
if csv_online is not None:
    print("Manual csv shape:", csv_manual.shape)
    print("Online csv shape:", csv_online.shape)

    if csv_manual.equals(csv_online):
        print("The csv files are IDENTICAL.")
    else:
        print("The csv files are DIFFERENT.")
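        try:
            # DataFrame.compare requires identical index and column labels,
            # so it raises when the two frames differ in shape
            differences = csv_manual.compare(csv_online)
            display(differences.head())
        except Exception as e:
            print(f"Something went wrong while checking the differences: {e}")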
else:
    print("Comparison skipped due to failed online csv load.")

Manual csv shape: (12524, 9)
Online csv shape: (13225, 9)
The csv files are DIFFERENT.
Something went wrong while checking the differences: Can only compare identically-labeled (both index and columns) DataFrame objects
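`DataFrame.compare` only works on frames with identical index and column labels, which is why it fails here: the two files differ by 701 rows (13225 vs. 12524). One alternative is an outer merge with `indicator=True`, which labels each row as present in the left frame, the right frame, or both. The sketch below assumes both frames share the same column names; `row_differences` and the `_dup` column are hypothetical names, and the small `manual` and `online` frames are illustrative stand-ins for `csv_manual` and `csv_online`.

```python
import pandas as pd

# Illustrative stand-ins for csv_manual and csv_online
manual = pd.DataFrame({
    "State": ["Texas", "Texas", "Massachusetts"],
    "Bird Species": ["Black vulture", "Black vulture", "Mallard"],
})
online = pd.DataFrame({
    "State": ["Ohio", "Texas", "Texas", "Massachusetts"],
    "Bird Species": ["Bald eagle", "Black vulture", "Black vulture", "Mallard"],
})

def row_differences(left, right):
    """Return rows found in only one frame (hypothetical helper).

    Assumes both frames have the same column names.
    """
    cols = list(left.columns)
    # Number repeated rows so duplicate detections are matched one-to-one
    left = left.assign(_dup=left.groupby(cols, dropna=False).cumcount())
    right = right.assign(_dup=right.groupby(cols, dropna=False).cumcount())
    merged = left.merge(right, how="outer", on=cols + ["_dup"], indicator=True)
    # Keep only rows that are missing from one side
    return merged[merged["_merge"] != "both"].drop(columns="_dup")

diff = row_differences(manual, online)
print(diff["_merge"].value_counts())
```

The duplicate counter matters for this dataset because identical rows appear legitimately (for example, two Canada goose detections with the same county and dates), and a plain merge would pair every duplicate with every other duplicate. If the extra online rows all turn out to be `right_only`, that would suggest the manual file is simply an older snapshot.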
