<h1>Food Recall Trends and Impact (2012 - 2025)</h1>

<h5>Food recalls in the United States have had a significant impact on public health and have imposed financial burdens on businesses over the years. This project analyzes food recall trends from 2012 to 2025 to identify the most common causes and evaluate their effects on consumer health and business operations.</h5>

In [None]:
<h3>Data Cleaning</h3>
<h5>Separate distribution location by state and region.</h5>

In [42]:
!pip install ipython-sql
!pip install sqlalchemy
#install the us library (comment kept on its own line; an inline comment is passed to pip as a requirement)
!pip install us
%load_ext sql
db_path = "Food_Recall.db"
%sql sqlite:///{db_path}

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [43]:
import sqlite3
import pandas as pd
import re #import the regular expressions module (pattern matching on strings)
from collections import defaultdict
import us #import us library
df = pd.read_csv("Food_Recall.csv")
conn = sqlite3.connect("Food_Recall.db")
df.to_sql("Food_Recall", conn, if_exists="replace", index=False)

27526

In [44]:
%%sql
--check for any null value
SELECT "Distribution Pattern"
FROM Food_Recall
WHERE "Distribution Pattern" IS NULL;

 * sqlite:///Food_Recall.db
Done.


Distribution Pattern
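
The query above returns no rows, so no "Distribution Pattern" value is NULL. A NULL check does not catch empty or whitespace-only strings, though. The sketch below shows one way to count both cases. It runs against a small in-memory table with hypothetical rows rather than the real Food_Recall.db, so the count it produces says nothing about the actual data.

```python
import sqlite3

# Hypothetical in-memory stand-in for the Food_Recall table
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Food_Recall ("Distribution Pattern" TEXT)')
conn.executemany(
    "INSERT INTO Food_Recall VALUES (?)",
    [("NY, NJ",), ("   ",), (None,), ("",)],
)

# Count rows that are NULL, empty, or whitespace-only
query = """
SELECT COUNT(*)
FROM Food_Recall
WHERE "Distribution Pattern" IS NULL
   OR TRIM("Distribution Pattern") = '';
"""
missing = conn.execute(query).fetchone()[0]
print(missing)
```

Pointing the same query at the real database through `%%sql` would show whether any blank patterns slipped past the NULL check.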


In [53]:
#inspecting data using Python
df.head(5)

Unnamed: 0,FEI Number,Recalling Firm Name,Product Type,Product Classification,Status,Distribution Pattern,Recalling Firm City,Recalling Firm State,Recalling Firm Country,Center Classification Date,Reason for Recall,Product Description,Event ID,Event Classification,Product ID,Center,Recall Details
0,3004312676,HAR Maspeth Corp.,Food/Cosmetics,Class I,Ongoing,"NY, NJ, MA, CT, PA, MD, NC, IL, MI, GA",Maspeth,New York,United States,3/21/2025,Contains undeclared egg,Jinga Glass Noodles with vegetables 8 oz. and ...,96503,Class I,213039,CFSAN,https://www.accessdata.fda.gov/scripts/ires/?P...
1,2938745,Lyons Magnus LLC,Food/Cosmetics,Class I,Ongoing,Distribution centers located throughout the U....,Fresno,California,United States,3/20/2025,Potential contamination with Listeria monocyto...,"Ready Care Chocolate Shake (4 oz), UPC 1004579...",96376,Class I,212741,CFSAN,https://www.accessdata.fda.gov/scripts/ires/?P...
2,2938745,Lyons Magnus LLC,Food/Cosmetics,Class I,Ongoing,Distribution centers located throughout the U....,Fresno,California,United States,3/20/2025,Potential contamination with Listeria monocyto...,"Ready Care Strawberry Shake (4 oz), UPC 100457...",96376,Class I,212977,CFSAN,https://www.accessdata.fda.gov/scripts/ires/?P...
3,2938745,Lyons Magnus LLC,Food/Cosmetics,Class I,Ongoing,Distribution centers located throughout the U....,Fresno,California,United States,3/20/2025,Potential contamination with Listeria monocyto...,"Ready Care Vanilla Shake (4 oz), UPC 100457960...",96376,Class I,212978,CFSAN,https://www.accessdata.fda.gov/scripts/ires/?P...
4,2938745,Lyons Magnus LLC,Food/Cosmetics,Class I,Ongoing,Distribution centers located throughout the U....,Fresno,California,United States,3/20/2025,Potential contamination with Listeria monocyto...,Ready Care Chocolate Shake No Sugar Added (4 o...,96376,Class I,212982,CFSAN,https://www.accessdata.fda.gov/scripts/ires/?P...


In [46]:
#checking for missing values
df.isnull().sum()

FEI Number                    0
Recalling Firm Name           0
Product Type                  0
Product Classification        0
Status                        0
Distribution Pattern          0
Recalling Firm City           0
Recalling Firm State          0
Recalling Firm Country        0
Center Classification Date    0
Reason for Recall             0
Product Description           0
Event ID                      0
Event Classification          0
Product ID                    0
Center                        0
Recall Details                0
dtype: int64

In [47]:
#check for data types
df.dtypes

FEI Number                    object
Recalling Firm Name           object
Product Type                  object
Product Classification        object
Status                        object
Distribution Pattern          object
Recalling Firm City           object
Recalling Firm State          object
Recalling Firm Country        object
Center Classification Date    object
Reason for Recall             object
Product Description           object
Event ID                       int64
Event Classification          object
Product ID                     int64
Center                        object
Recall Details                object
dtype: object
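
"Center Classification Date" is stored as `object` (plain strings), which would get in the way of grouping recalls by year. A minimal sketch of one possible conversion follows, assuming the dates use the M/D/YYYY layout seen in `df.head()`. The helper name `parse_classification_dates` and the sample rows are hypothetical.

```python
import pandas as pd

def parse_classification_dates(frame, column="Center Classification Date"):
    """Return a copy of frame with the date column parsed to datetime64."""
    out = frame.copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising
    out[column] = pd.to_datetime(out[column], format="%m/%d/%Y", errors="coerce")
    return out

# Hypothetical sample; the last value is deliberately malformed
sample = pd.DataFrame(
    {"Center Classification Date": ["3/21/2025", "3/20/2025", "not a date"]}
)
parsed = parse_classification_dates(sample)
print(parsed.dtypes)
print(parsed["Center Classification Date"].isna().sum())
```

Counting the `NaT` values after conversion is a quick way to see how many rows did not match the assumed format.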

In [51]:
#checking each column for anomalies
#df["Product Classification"].value_counts()
df.nunique()

FEI Number                     4599
Recalling Firm Name            4914
Product Type                      1
Product Classification            3
Status                            3
Distribution Pattern           5777
Recalling Firm City            1969
Recalling Firm State             53
Recalling Firm Country           33
Center Classification Date     2654
Reason for Recall              7809
Product Description           27379
Event ID                       7230
Event Classification              3
Product ID                    27526
Center                            1
Recall Details                27526
dtype: int64
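
In the counts above, "Product Type" and "Center" each have a single unique value, so they carry no information for the analysis. As a rough sketch, such columns can be listed programmatically instead of read off by eye. The helper name `constant_columns` and the sample values are hypothetical.

```python
import pandas as pd

def constant_columns(frame):
    """Return the names of columns that hold at most one distinct value."""
    counts = frame.nunique()
    return counts[counts <= 1].index.tolist()

# Hypothetical sample; the second Status value is made up for illustration
sample = pd.DataFrame(
    {
        "Product Type": ["Food/Cosmetics"] * 3,
        "Center": ["CFSAN"] * 3,
        "Status": ["Ongoing", "Ongoing", "Some other status"],
    }
)
print(constant_columns(sample))
```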

In [54]:
df = df.drop(columns = ["Event ID", "Event Classification", "Center", "Recall Details"])

In [6]:
#Count each state at most once per row, even if both its abbreviation and full name appear
#Sum those per-row hits across all rows to get one count per state

#create dictionary mapping state abbreviations to full names
states = {state.abbr: state.name for state in us.states.STATES}

#store the number of rows that mention each state
state_counts = defaultdict(int)

#loop through every row, checking each state's abbreviation and full name
#note: IGNORECASE also applies to abbreviations, so words like "in" or "or" match IN and OR
for pattern in df['Distribution Pattern'].dropna():
    for abbr, full in states.items():
        if (re.search(r'\b' + abbr + r'\b', pattern, flags = re.IGNORECASE) or
            re.search(r'\b' + full + r'\b', pattern, flags = re.IGNORECASE)):
            state_counts[full] += 1

#print one "state: count" line per state, sorted alphabetically
for full, count in sorted(state_counts.items()):
    print(f"{full}: {count}")

Alabama: 3637
Alaska: 772
Arizona: 4548
Arkansas: 3178
California: 7455
Colorado: 4133
Connecticut: 4267
Delaware: 2259
Florida: 7010
Georgia: 5560
Hawaii: 1282
Idaho: 2079
Illinois: 7233
Indiana: 11629
Iowa: 3666
Kansas: 3181
Kentucky: 4511
Louisiana: 3458
Maine: 2558
Maryland: 4990
Massachusetts: 4735
Michigan: 6162
Minnesota: 4528
Mississippi: 2951
Missouri: 5120
Montana: 2070
Nebraska: 2626
Nevada: 3378
New Hampshire: 2783
New Jersey: 5874
New Mexico: 2106
New York: 8009
North Carolina: 6053
North Dakota: 1700
Ohio: 7476
Oklahoma: 3184
Oregon: 4862
Pennsylvania: 7569
Rhode Island: 1883
South Carolina: 4343
South Dakota: 1434
Tennessee: 4631
Texas: 6711
Utah: 2464
Vermont: 1138
Virginia: 6401
Washington: 4918
West Virginia: 1959
Wisconsin: 5570
Wyoming: 2083
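
These counts may be inflated for some states. Because the abbreviation search ignores case, ordinary words such as "in", "or", and "me" can match IN, OR, and ME, which would be consistent with Indiana ranking above California and New York. The full-name search also lets "West Virginia" count as "Virginia". The sketch below shows one possible refinement, not a drop-in replacement: abbreviations are matched case-sensitively, and a lookbehind skips names preceded by "West ". The small `states` dictionary and the sample patterns are hypothetical stand-ins for the full data.

```python
import re
from collections import defaultdict

# Small stand-in for the abbreviation-to-name dictionary built from the us library
states = {
    "IN": "Indiana",
    "OR": "Oregon",
    "VA": "Virginia",
    "WV": "West Virginia",
    "NY": "New York",
}

def count_states(patterns, states):
    """Count how many patterns mention each state, at most once per pattern."""
    counts = defaultdict(int)
    for text in patterns:
        for abbr, full in states.items():
            # Case-sensitive: lowercase "in" or "or" no longer match IN or OR
            abbr_hit = re.search(r"\b" + re.escape(abbr) + r"\b", text)
            # Case-insensitive full name, skipped when preceded by "West "
            name_re = r"(?<!West )\b" + re.escape(full) + r"\b"
            name_hit = re.search(name_re, text, flags=re.IGNORECASE)
            if abbr_hit or name_hit:
                counts[full] += 1
    return dict(counts)

# Hypothetical distribution patterns
patterns = [
    "Distributed in NY and NJ",
    "Sold in Oregon or online",
    "West Virginia only",
    "IN, VA",
]
result = count_states(patterns, states)
print(result)
```

Text written entirely in capitals (for example "DISTRIBUTED IN ...") would still trigger false matches under this approach, so the abbreviation rule remains a heuristic.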
