# Brooklyn Census Data 2022

## Table of Contents

1. [**Dataset Creation**](#DCreation)

    1.1 [**Census API Call**](#DAPICall)

    1.2 [**Handling "Unknown" Data**](#UnknownD)

    1.3 [**Checking and Correcting Data Format**](#Dform)

## 1. Dataset Creation <a name="DCreation"></a>

### 1.1 Census API Call <a name="DAPICall"></a>

In [1]:
# Dependencies
import requests
import pandas as pd
import numpy as np  # not needed
from census import Census

In [2]:
# Import U.S. Census API Key
from config import api_key

# Create an instance of the Census library
c = Census(
    api_key,
    year=2022
)

References - Retrieve data from the U.S. Census using the Census library <a name="References"></a>

* See the following page for the 2022 ACS 5-year table and geography changes: <https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2022/5-year.html>

* See the following page for the full list of 2022 ACS 5-year variables and their labels: <https://api.census.gov/data/2022/acs/acs5/variables.html>

* See the following gist for additional notes on the data labels: <https://gist.github.com/afhaque/60558290d6efd892351c4b64e5c01e9b>

In [3]:
# Run Census search to retrieve data for all ZIP Code Tabulation Areas (2022 ACS 5-year);
# the Brooklyn ZIP codes are selected in a later cell
census_data = c.acs5.get(
    (
        "NAME",
        "B19013_001E",
        "B01003_001E",
        "B01002_001E",
        "B19301_001E",
        "B17001_002E",
        "B25064_001E",
        "B25089_001E"
    ),
    {'for': 'zip code tabulation area:*'}
)

# Convert to DataFrame
census_pd = pd.DataFrame(census_data)

# Column renaming
census_pd = census_pd.rename(
    columns = {
        "B01003_001E": "Population",
        "B01002_001E": "Median Age",
        "B19013_001E": "Household Income",
        "B19301_001E": "Per Capita Income",
        "B17001_002E": "Poverty Count",
        "B25064_001E": "Median Gross Rent",
        "NAME": "Name",
        "zip code tabulation area": "Zipcode"
    }
)

# Add a Poverty Rate column as a percentage (100 * Poverty Count / Population)
census_pd["Poverty Rate"] = 100 * census_pd["Poverty Count"].astype(int) / census_pd["Population"].astype(int)

# Configure the final DataFrame (columns not listed here, such as Name and B25089_001E, are dropped)
census_pd = census_pd[
    [
        "Zipcode",
        "Population",
        "Median Age",
        "Household Income",
        "Per Capita Income",
        "Median Gross Rent",
        "Poverty Count",
        "Poverty Rate"
    ]
]

# Display sample data
census_pd.head()

| | Zipcode | Population | Median Age | Household Income | Per Capita Income | Median Gross Rent | Poverty Count | Poverty Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 601 | 16834.0 | 44.0 | 17526.0 | 9012.0 | 401.0 | 10440.0 | 62.017346 |
| 1 | 602 | 37642.0 | 45.2 | 20260.0 | 11379.0 | 459.0 | 17768.0 | 47.202593 |
| 2 | 603 | 49075.0 | 45.0 | 17703.0 | 13010.0 | 448.0 | 23551.0 | 47.989812 |
| 3 | 606 | 5590.0 | 46.2 | 19603.0 | 9274.0 | 394.0 | 3021.0 | 54.042934 |
| 4 | 610 | 25542.0 | 44.4 | 22796.0 | 12726.0 | 524.0 | 11597.0 | 45.403649 |
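
The `astype(int)` conversion used for the Poverty Rate column raises an error if a value is missing, and a population of zero would produce an infinite rate. The following is a minimal sketch of a more defensive calculation, not the notebook's own method. The helper name `poverty_rate` is hypothetical, and only the first sample row comes from the output above; the other two rows are made up to exercise the edge cases.

```python
import pandas as pd


def poverty_rate(poverty_count: pd.Series, population: pd.Series) -> pd.Series:
    """Return the poverty rate in percent; NaN where an input is missing or population is not positive."""
    count = pd.to_numeric(poverty_count, errors="coerce")
    pop = pd.to_numeric(population, errors="coerce")
    pop = pop.where(pop > 0)  # zero, negative, or missing populations become NaN
    return 100 * count / pop


# Row 0 is taken from the sample output; rows 1 and 2 are hypothetical edge cases
sample = pd.DataFrame(
    {
        "Poverty Count": [10440.0, None, 5.0],
        "Population": [16834.0, 1200.0, 0.0],
    }
)
sample["Poverty Rate"] = poverty_rate(sample["Poverty Count"], sample["Population"])
print(sample)
```

With this approach, rows that cannot be computed stay in the DataFrame as NaN instead of stopping the cell.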


In [4]:
zipcodes = [
    # Brooklyn ZIP codes
    '11201', '11202', '11203', '11204', '11205', '11206', '11207', '11208',
    '11209', '11210', '11211', '11212', '11213', '11214', '11215', '11216', '11217', '11218',
    '11219', '11220', '11221', '11222', '11223', '11224', '11225', '11226', '11228', '11229',
    '11230', '11231', '11232', '11233', '11234', '11235', '11236', '11237', '11238', '11239',
    '11240', '11241', '11242', '11243', '11247', '11249', '11252', '11256',
    # ZIP codes outside Brooklyn
    '10013', '10007', '10069', '10282', '10453', '11355', '10457', '11368'
]

# Keep only the rows whose ZIP code is in the list
census_pd = census_pd.loc[census_pd['Zipcode'].isin(zipcodes)]

print(f"Number of rows in the DataFrame: {len(census_pd)}")

Number of rows in the DataFrame: 46
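
The ZIP code list has more entries than the 46 rows returned, presumably because not every ZIP code has a matching ZIP Code Tabulation Area in the ACS data. A small sketch for listing the requested codes that found no match follows. The helper `missing_zipcodes` and the sample values are hypothetical and stand in for `zipcodes` and `census_pd['Zipcode']`.

```python
import pandas as pd


def missing_zipcodes(requested, found):
    """Return the requested ZIP codes absent from `found`, sorted and without duplicates."""
    return sorted(set(requested) - set(found))


# Hypothetical stand-ins for the notebook's `zipcodes` list and filtered DataFrame
requested = ["11201", "11202", "11252", "11252"]
frame = pd.DataFrame({"Zipcode": ["11201", "11203"]})

missing = missing_zipcodes(requested, frame["Zipcode"])
print(missing)  # ['11202', '11252']
```

Using a set difference also ignores repeated entries in the requested list, so a duplicated code is reported only once.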


In [5]:
# Export the filtered census data to a CSV file
census_pd.to_csv("output_data/census.csv", encoding="utf-8", index=False)

In [6]:
# Read saved data
census_df = pd.read_csv("output_data/census.csv", encoding="utf-8")

# Display sample data
census_df.head()

| | Zipcode | Population | Median Age | Household Income | Per Capita Income | Median Gross Rent | Poverty Count | Poverty Rate |
|---|---|---|---|---|---|---|---|---|
| 0 | 10007 | 7506.0 | 34.7 | 250001.0 | 191709.0 | 3501.0 | 152.0 | 2.025047 |
| 1 | 10013 | 29453.0 | 40.2 | 150675.0 | 157378.0 | 2113.0 | 2762.0 | 9.377653 |
| 2 | 10069 | 6259.0 | 40.0 | 197680.0 | 148668.0 | 3286.0 | 1493.0 | 23.853651 |
| 3 | 10282 | 6450.0 | 38.7 | 250001.0 | 190521.0 | 3501.0 | 373.0 | 5.782946 |
| 4 | 10453 | 80385.0 | 33.8 | 34800.0 | 19016.0 | 1362.0 | 27796.0 | 34.578591 |
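
By default, `pd.read_csv` parses a column of digits as integers, so a ZIP code with leading zeros loses them on reload. The first sample output, which shows values such as 601, suggests this has already happened somewhere in the workflow. The sketch below shows one way to keep ZIP codes as text, using an in-memory CSV with illustrative values rather than the notebook's `output_data/census.csv`.

```python
import io

import pandas as pd

# Illustrative CSV content; the real file is written by the export cell
csv_text = "Zipcode,Population\n00601,16834.0\n11201,1000.0\n"

# Default parsing turns the Zipcode column into integers
default = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str preserves the leading zeros
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Zipcode": str})

print(default["Zipcode"].tolist())  # [601, 11201]
print(as_text["Zipcode"].tolist())  # ['00601', '11201']
```

Keeping the column as text also matches the string values used in the `zipcodes` list, so a later `isin` filter compares like with like.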
