# EDA of EFAP Sites at the Program Level

In [1]:
import pandas as pd

In [2]:
efap = pd.read_csv('/Users/ayemaq/Desktop/CID_Food_Access/data/clean/efap_cleaned.csv')
mapping = pd.read_csv('/Users/ayemaq/Desktop/CID_Food_Access/data/clean/efap_nta_mapping.csv')
df = efap.merge(mapping, on="efap_id", how="left")

In [3]:
import sqlite3

# Open a connection to the project's SQLite database (pandas is already imported above)
conn = sqlite3.connect("/Users/ayemaq/Desktop/CID_Food_Access/data/food_access.db")

In [4]:
df.shape

(561, 10)

In [5]:
df['nta_id'].isna().sum()

8

#### Insight
8 out of 561 EFAP sites (1.4%) did not map to an NTA (Neighborhood Tabulation Area). Because this is a small proportion of total sites, we proceed by excluding these records from neighborhood-level aggregation.

In [6]:
df[df["nta_id"].isna()]

Unnamed: 0,efap_id,program_name,access_type,has_pantry_access,has_kitchen_access,weekday_available,weekend_available,nta_id,lat,lon
54,81727,AFRICAN SERVICES COMMITTEE,Pantry,1,0,1,0,,,
123,87357,ST. EDWARD FOOD PANTRY,Pantry,1,0,1,0,,,
440,85774,NEW BEGINNINGS FOOD PANTRY,Pantry,1,0,1,0,,,
468,80891,CORPUS CHRISTI FOOD PANTRY,Pantry,1,0,1,0,,,
474,85348,THE LEGACY CENTER COMMUNITY DEVELOPMENT CORP,Pantry,1,0,1,0,,,
517,85284,"VETS INC, HOLLIS GARDENS",Pantry,1,0,1,0,,,
537,83567,"THE HARDING FORD VISION, INC",Pantry,1,0,1,1,,,
550,85710,MUNA SOCIAL SERVICE SOUTH JAMAICA,Pantry,1,0,1,0,,,


In [7]:
df = df.dropna(subset=["nta_id"])
print(df["nta_id"].isna().sum())

0


8 EFAP sites (1.4% of total) lacked geospatial coordinates (latitude and longitude), and therefore could not be mapped to an NTA. Because neighborhood-level aggregation requires valid geographic identifiers, these records were excluded from subsequent analysis. This exclusion represents a minimal proportion of total sites and is unlikely to materially affect results.
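
The exclusion step described above can also be made explicit at merge time. The following is a minimal sketch on made-up toy frames (`efap_toy` and `mapping_toy` are hypothetical stand-ins, not the project files): `indicator=True` labels rows with no match in the mapping, and `validate="one_to_one"` raises an error if `efap_id` repeats on either side. Whether the real files satisfy the one-to-one assumption is not confirmed by this notebook.

```python
import pandas as pd

# Toy stand-ins for the EFAP and mapping tables (all values are made up).
efap_toy = pd.DataFrame({
    "efap_id": [1, 2, 3, 4],
    "program_name": ["A", "B", "C", "D"],
})
mapping_toy = pd.DataFrame({
    "efap_id": [1, 2, 3],
    "nta_id": ["MN0401", "MN0501", "MN0302"],
})

# validate="one_to_one" raises MergeError if efap_id is duplicated on either side;
# indicator=True adds a _merge column flagging rows without a mapping match.
merged = efap_toy.merge(
    mapping_toy, on="efap_id", how="left",
    validate="one_to_one", indicator=True,
)

unmapped = merged[merged["_merge"] == "left_only"]
share_unmapped = len(unmapped) / len(merged) * 100

# Keep only sites that received an NTA, then drop the helper column.
mapped = merged.dropna(subset=["nta_id"]).drop(columns="_merge")
print(f"{len(unmapped)} of {len(merged)} sites unmapped ({share_unmapped:.1f}%)")
```

Keeping `unmapped` as its own frame makes it easy to list the excluded programs by name before dropping them.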

In [8]:
# Note: a cell only echoes its last expression, so the nunique() result is not shown below
mapping["efap_id"].nunique()
mapping.shape

(553, 4)

# CONTEXT - Program-Level EFAP Structure and Access Model
The EFAP dataset represents the supply-side structure of New York City’s emergency food assistance system. At the program level, each site is categorized by its access type, primarily distinguishing between pantry access (take-home groceries) and kitchen access (prepared hot meals). Pantry programs provide groceries such as canned goods, rice, pasta, or produce, which households are expected to store and prepare themselves. Kitchen programs, in contrast, provide prepared meals on-site, typically consumed the same day. Some programs may offer both, but the distinction between pantry and kitchen access reflects fundamentally different models of food support.

If the EFAP system is predominantly pantry-based, this suggests that the city’s emergency food infrastructure is structured around assumptions of household stability and food preparation capacity. Pantry programs implicitly assume that recipients have access to refrigeration, food storage, cooking appliances, and predictable routines for meal preparation. In contrast, kitchen-based programs provide immediate, ready-to-eat meals and require fewer household-level resources, making them more accessible for individuals or families without consistent cooking facilities.

This distinction is especially important in the context of families living in shelters. Shelter conditions vary by facility type. Some family shelters, including commercial hotel placements, may offer limited or no in-room kitchen access, shared facilities with restricted hours, or minimal storage capacity. In such contexts, a pantry-dominant food assistance system may not align with the lived realities of shelter residents. While pantry programs increase food availability, they may not translate into functional access if families lack the means to safely store or prepare groceries. Kitchen-based programs may better support immediate food needs for families with constrained living arrangements.

Therefore, analyzing the distribution of pantry versus kitchen access within EFAP is not merely descriptive. It provides insight into the underlying design of the city’s food assistance system and allows us to evaluate whether the structure of supply aligns with the housing instability and facility constraints experienced by families in shelters. This structural lens strengthens our later neighborhood-level analysis by clarifying what “food access” functionally means before examining where programs are geographically located.

## Context - why I created specific binary indicators such as pantry access or kitchen access
- While access_type provides categorical information about the service model, separate binary indicators (has_pantry_access, has_kitchen_access) were engineered to allow clearer analytical and modeling flexibility. These flags preserve overlap for hybrid sites and enable direct measurement of pantry and kitchen capacity without requiring categorical encoding. This structure simplifies aggregation and supports later neighborhood-level modeling of supply composition.
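
The cleaning code that produced these flags is not shown in this notebook. As a rough sketch of one way they could be derived, assuming the flags follow directly from the words in the `access_type` label (an assumption, not the confirmed method):

```python
import pandas as pd

# Toy frame with the three access_type labels that appear in the data.
toy = pd.DataFrame({"access_type": ["Pantry", "Kitchen", "Pantry + Kitchen"]})

# Assumption: a site has pantry (kitchen) access when its label contains that word.
toy["has_pantry_access"] = toy["access_type"].str.contains("Pantry").astype(int)
toy["has_kitchen_access"] = toy["access_type"].str.contains("Kitchen").astype(int)

# Hybrid sites count toward both totals, which a single categorical column cannot express.
totals = toy[["has_pantry_access", "has_kitchen_access"]].sum()
print(totals)
```

Because the flags are 0/1 integers, `.sum()` gives site counts and `.mean()` gives shares without any further encoding.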

In [9]:
df.keys()

Index(['efap_id', 'program_name', 'access_type', 'has_pantry_access',
       'has_kitchen_access', 'weekday_available', 'weekend_available',
       'nta_id', 'lat', 'lon'],
      dtype='object')

In [21]:
# Check distributions with .value_counts() for counts, .value_counts(normalize=True)
# for proportions, and pd.crosstab to compare categorical variables

# check distribution of access_type
df["access_type"].value_counts()

access_type
Pantry              444
Kitchen              81
Pantry + Kitchen     28
Name: count, dtype: int64

In [18]:
df["access_type"].value_counts(normalize=True)

access_type
Pantry              0.802893
Kitchen             0.146474
Pantry + Kitchen    0.050633
Name: proportion, dtype: float64

In [20]:
print(df["has_pantry_access"].mean() * 100)
print(df["has_kitchen_access"].mean() * 100)


85.35262206148282
19.710669077757686


### Insight for distribution 
- The distribution of access types shows that approximately 80% of EFAP programs are pantry-only, 15% are kitchen-only, and 5% provide both pantry and kitchen access. When examining access flags more broadly, 85% of programs offer pantry access (including hybrid sites), while only about 20% offer kitchen access. This indicates that NYC’s EFAP system is heavily structured around a take-home grocery model rather than prepared meal distribution.

- Side note: The predominance of pantry-based EFAP programs suggests a supply model centered on take-home food preparation, which may not fully align with the constraints faced by families in shelter settings with limited cooking and storage access.
    - This raises the question of why the system prioritizes pantries. That is scope creep, though: our CRQ is not "Why is the system pantry-dominant?" but "Do high-priority neighborhoods have fewer food assistance options?"
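
The two sets of percentages in the first bullet (80/15/5 by category versus 85/20 by flag) are consistent once hybrid sites are counted on both sides. A quick arithmetic check using the counts from the `value_counts()` output above:

```python
# Site counts by access_type, copied from the value_counts() output above.
counts = {"Pantry": 444, "Kitchen": 81, "Pantry + Kitchen": 28}
total = sum(counts.values())

# Hybrid sites contribute to both the pantry and the kitchen flag totals.
pantry_any = counts["Pantry"] + counts["Pantry + Kitchen"]
kitchen_any = counts["Kitchen"] + counts["Pantry + Kitchen"]

pantry_share = pantry_any / total * 100
kitchen_share = kitchen_any / total * 100
print(f"pantry access: {pantry_share:.1f}%, kitchen access: {kitchen_share:.1f}%")
```

The two flag shares sum to more than 100% because the 28 hybrid sites appear in both.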

## Next: temporal accessibility (weekend vs. weekday)
- Which access type has the highest weekend availability rate?

In [None]:
# crosstab to compare access_type and weekend availability (row percentages)
pd.crosstab(
    df["access_type"],
    df["weekend_available"],
    normalize="index"
) * 100

access_type,weekend_available=0 (%),weekend_available=1 (%)
Kitchen,65.432099,34.567901
Pantry,66.891892,33.108108
Pantry + Kitchen,82.142857,17.857143


#### Key takeaway
- Among kitchen sites, 34.6% are open on weekends.
- Among pantry sites, 33.1% are open on weekends.
- Among hybrid sites, 17.9% are open on weekends.
    - Weekend availability appears structurally limited across all access types, and hybrid sites are the least likely to operate on weekends. The hybrid rate rests on only 28 sites, so it is more sensitive to small changes than the other two.
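
Since `weekend_available` is a 0/1 flag, the same rates can be read off a group mean, and carrying the group size alongside makes small groups visible. A minimal sketch on made-up toy data (the column names match the notebook; the values do not):

```python
import pandas as pd

# Toy data only; these are not the real EFAP rows.
toy = pd.DataFrame({
    "access_type": ["Pantry", "Pantry", "Pantry", "Kitchen", "Kitchen", "Pantry + Kitchen"],
    "weekend_available": [1, 0, 0, 1, 0, 0],
})

# The mean of a 0/1 flag is the share of sites with the flag set.
summary = toy.groupby("access_type")["weekend_available"].agg(
    n_sites="count", weekend_rate="mean"
)
summary["weekend_rate_pct"] = summary["weekend_rate"] * 100
print(summary)
```

Running the same aggregation on `df` should reproduce the `1` column of the crosstab above, with the number of sites per group next to each rate.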

In [28]:
pd.crosstab(
    df["access_type"],
    df['weekday_available'],
    normalize="index"
) * 100


access_type,weekday_available=0 (%),weekday_available=1 (%)
Kitchen,28.395062,71.604938
Pantry,25.0,75.0
Pantry + Kitchen,7.142857,92.857143


In [32]:
pd.crosstab(df["access_type"], df["weekday_available"])

access_type,weekday_available=0,weekday_available=1
Kitchen,23,58
Pantry,111,333
Pantry + Kitchen,2,26


In [33]:
pd.crosstab(df["access_type"], df["weekend_available"])

access_type,weekend_available=0,weekend_available=1
Kitchen,53,28
Pantry,297,147
Pantry + Kitchen,23,5


#### Key takeaway
- While pantry-based programs dominate NYC’s EFAP system, temporal accessibility shows a second structural pattern:
    - Most programs operate on weekdays (roughly 72% to 93% depending on access type), while weekend availability is limited across all access types (roughly 18% to 35%). Hybrid (Pantry + Kitchen) sites are the most likely to operate on weekdays (~93%) but the least likely to offer weekend services (~18%). This suggests that service hours are concentrated in the workweek, potentially limiting access for families whose schedules or shelter constraints make weekday attendance difficult.
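
The weekday and weekend crosstabs are read separately above, so they do not show how many sites are open on weekdays only, weekends only, or both. One possible follow-up, sketched here on made-up toy data with a hypothetical `schedule` column that does not exist in the notebook, combines the two flags into a single label:

```python
import pandas as pd

# Toy data only; the schedule label below is a hypothetical helper column.
toy = pd.DataFrame({
    "access_type": ["Pantry", "Pantry", "Kitchen", "Kitchen", "Pantry + Kitchen"],
    "weekday_available": [1, 1, 0, 1, 1],
    "weekend_available": [0, 1, 1, 0, 0],
})

# Map each (weekday, weekend) flag pair to a readable schedule label.
labels = {(1, 0): "weekday only", (0, 1): "weekend only", (1, 1): "both", (0, 0): "neither"}
toy["schedule"] = [
    labels[pair]
    for pair in zip(toy["weekday_available"].tolist(), toy["weekend_available"].tolist())
]

# Count sites per access type and schedule label.
schedule_counts = pd.crosstab(toy["access_type"], toy["schedule"])
print(schedule_counts)
```

Passing `normalize="index"` to `pd.crosstab` would turn these counts into row shares, in the same way as the cells above.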

In [38]:
df.head()

Unnamed: 0,efap_id,program_name,access_type,has_pantry_access,has_kitchen_access,weekday_available,weekend_available,nta_id,lat,lon
0,80604,HOLY APOSTLES SOUP KITCHEN,Kitchen,0,1,1,0,MN0401,40.749385,-73.999131
1,85547,HOLY APOSTLES SOUP KITCHEN PANTRY,Pantry,1,0,1,0,MN0401,40.749385,-73.999131
2,80757,ST. JOHN'S BREAD OF LIFE,Pantry,1,0,1,0,MN0501,40.74869,-73.992824
3,85701,ARTISTS ATHLETES ACTIVISTS INCORPORATED,Pantry,1,0,1,0,MN0302,40.718893,-73.979216
4,80546,DEWITT REFORMED CHURCH,Pantry,1,0,0,1,MN0302,40.717508,-73.979751


In [48]:
count_programs = df.groupby('program_name').size().sort_values(ascending=False)
count_programs

program_name
AAIDS CENTER OF QUEENS COUNTY                3
HOLY TABERNACLE CHURCH INC.                  2
MAKE THE ROAD NEW YORK                       2
ST. ANN'S CHURCH OF MORRISANIA               2
THE URBAN OUTREACH CENTER OF NYC             2
                                            ..
COMMUNITY HEALTH ACTION OF STATEN ISLAND     1
COMMUNITY CHURCH OF CHRIST FOOD PANTRY       1
COMMUNITY CARE FOOD PANTRY                   1
COMMUNITY ALLIANCE INITIATIVE                1
ZEINA LORRAINE INC                           1
Length: 506, dtype: int64

In [51]:
# distribution of sites per program name (how many names appear 1, 2, or 3 times)
count_programs.value_counts()

1    460
2     45
3      1
Name: count, dtype: int64

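#### Key takeaway
The EFAP supply system is highly decentralized. Approximately 91% of program names (460 of 506) appear at a single site, and only 9% appear at more than one. Multi-site names account for roughly 17% of total sites (93 of 553), indicating limited structural concentration within the system. One caveat: grouping by exact program_name treats near-identical names (for example, HOLY APOSTLES SOUP KITCHEN and HOLY APOSTLES SOUP KITCHEN PANTRY, which share coordinates) as separate operators, so these figures may slightly understate concentration.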
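
The percentages in this takeaway follow directly from the `value_counts()` output above. A short arithmetic check using those counts:

```python
# Number of program names by how many sites carry that name (from the output above).
sites_per_name = {1: 460, 2: 45, 3: 1}

n_names = sum(sites_per_name.values())
n_sites = sum(size * n for size, n in sites_per_name.items())

# Share of names that appear exactly once.
single_share = sites_per_name[1] / n_names * 100

# Share of all sites that belong to a name appearing more than once.
multi_site_rows = n_sites - sites_per_name[1]
multi_share_of_sites = multi_site_rows / n_sites * 100

print(f"{single_share:.1f}% of names are single-site")
print(f"{multi_share_of_sites:.1f}% of sites belong to multi-site names")
```

The site total recovered here matches the 553 rows left after dropping unmapped sites, which confirms the grouping did not lose any rows.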