# Final Project - Chicago Business Intelligence for Strategic Planning

In this project, you are tasked as a full-stack developer with building an application that data scientists and business analysts will use for exploratory data analysis and for creating business intelligence reports for the City of Chicago. These reports will inform strategic planning and industrial and neighborhood infrastructure investments. The City of Chicago publishes and updates its datasets on its data portal (https://data.cityofchicago.org/) in 16 categories. This project uses three of those categories for exploratory data analysis and business intelligence reporting: Transportation, Buildings, and Health & Human Services.

## Connect to Postgres

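Before running any cells, activate the virtual environment: navigate to the `MSDS-432` directory and run `.\venv\Scripts\activate` (Windows; on macOS or Linux the equivalent is `source venv/bin/activate`). Also make sure all dependencies from `requirements.txt` are installed in the virtual environment. The connection settings are read from environment variables, which `dotenv.load_dotenv()` loads from a `.env` file.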

In [1]:
import dotenv
import os
import pandas as pd
import psycopg2

In [2]:
# Load environment variables
dotenv.load_dotenv()

DB_NAME = os.getenv("POSTGRES_DB")
DB_USER = os.getenv("POSTGRES_USER")
DB_PASSWORD = os.getenv("POSTGRES_PASSWORD")
DB_HOST = os.getenv("POSTGRES_HOST")
DB_PORT = os.getenv("POSTGRES_PORT")

try:
    # Establish connection
    conn = psycopg2.connect(
        dbname=DB_NAME,
        user=DB_USER,
        password=DB_PASSWORD,
        host=DB_HOST,
        port=DB_PORT
    )
    print("Connected to PostgreSQL successfully!")

except psycopg2.Error as e:
    print("Error connecting to PostgreSQL:", e)
    # Re-raise so later cells do not fail with a NameError on `conn`
    raise


Connected to PostgreSQL successfully!
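
The report cells that follow each open a cursor, execute a query, fetch all rows, and close the cursor by hand. A small helper can wrap that pattern. The sketch below is one possible shape, not part of the original notebook: `fetch_rows` is a hypothetical name, and the demo runs against an in-memory SQLite database from the standard library so that it works without the Postgres instance. It relies only on DB-API 2.0 cursor features (`execute`, `description`, `fetchall`, `close`), which `psycopg2` cursors also provide.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connection, sql, params=None):
    """Run a query on a DB-API 2.0 connection and return (column_names, rows)."""
    # closing() guarantees cursor.close() runs even if the query raises.
    with closing(connection.cursor()) as cur:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # The first item of each description entry is the column name.
        column_names = [col[0] for col in cur.description]
        rows = cur.fetchall()
    return column_names, rows


# Demo data only: a stand-in table, not the project's Postgres schema.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE covid_cases (zip_code TEXT, cases_cumulative INTEGER)")
demo_conn.executemany(
    "INSERT INTO covid_cases VALUES (?, ?)",
    [("60625", 23299), ("60626", 14474)],
)

demo_columns, demo_rows = fetch_rows(
    demo_conn,
    "SELECT zip_code, cases_cumulative FROM covid_cases ORDER BY cases_cumulative DESC",
)
demo_conn.close()
print(demo_columns)
print(demo_rows)
```

With the notebook's own connection, the returned pair can be passed directly to pandas, for example `pd.DataFrame(rows, columns=column_names)`.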


## Requirement 1

The business intelligence reports are geared toward tracking and forecasting events that have a direct or indirect, positive or negative impact on businesses and neighborhoods in different zip codes within the city of Chicago. The reports will be used to send alerts to taxi drivers about the state of COVID-19 in each zip code, so that taxi drivers do not become super spreaders across zip codes and neighborhoods. This report uses the taxi trips and daily COVID-19 datasets for the city of Chicago.

In [None]:
r1_cur = conn.cursor()

r1_cur.execute("""
                SELECT 
                    c.week_start, 
                    c.week_end, 
                    c.zip_code, 
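                    -- cases_cumulative is already a running total, so take it as-is;
                    -- SUM would multiply it by the number of identical trip rows
                    MAX(c.cases_cumulative) AS total_cases,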
                    t.pickup_zipcode, 
                    t.dropoff_zipcode, 
                    t.trip_start_timestamp
                FROM 
                    public.covid_cases c
                JOIN 
                    taxi_trips t
                    ON t.trip_start_timestamp BETWEEN c.week_start AND c.week_end
                    AND (t.pickup_zipcode = c.zip_code OR t.dropoff_zipcode = c.zip_code)
                GROUP BY 
                    c.week_start, c.week_end, c.zip_code, t.pickup_zipcode, t.dropoff_zipcode, t.trip_start_timestamp
                ORDER BY 
                    total_cases DESC;
               """
)

covid_state = r1_cur.fetchall()

r1_cur.close()

covid_state_df = pd.DataFrame(covid_state, columns=["Week Start", "Week End", "Zip Code", "Total Cases", "Pickup Zipcode", "Dropoff Zipcode", "Trip Start"])

covid_state_df

| | Week Start | Week End | Zip Code | Total Cases | Pickup Zipcode | Dropoff Zipcode | Trip Start |
|---|---|---|---|---|---|---|---|
| 0 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | | 60625 | 2023-12-31T23:30:00Z |
| 1 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60638.0 | 60625 | 2023-12-31T23:00:00Z |
| 2 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60625.0 | 60646 | 2023-12-31T23:45:00Z |
| 3 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60610.0 | 60625 | 2023-12-31T23:15:00Z |
| 4 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | | 60625 | 2023-12-31T22:30:00Z |
| 5 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60640.0 | 60625 | 2023-12-31T23:30:00Z |
| 6 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60625.0 | 60660 | 2023-12-31T23:30:00Z |
| 7 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60625 | 23299.0 | 60626.0 | 60625 | 2023-12-31T23:45:00Z |
| 8 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60626 | 14474.0 | 60657.0 | 60626 | 2023-12-31T23:30:00Z |
| 9 | 2023-12-31T00:00:00Z | 2024-01-06T00:00:00Z | 60626 | 14474.0 | 60626.0 | 60610 | 2023-12-31T23:15:00Z |


The City of Chicago is also interested in forecasting COVID-19 alerts (Low, Medium, High) on a daily or weekly basis for the residents of the different neighborhoods, based on the counts of taxi trips and COVID-19 positive test cases.

In [None]:
#TODO
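
As a possible starting point for this cell, the sketch below pairs a naive forecast (the mean of the most recent weeks) with a rule that maps the forecast to Low, Medium, or High. It is not the project's method: the function names, the cutoffs, and the sample numbers are all assumptions chosen for illustration, and the inputs are assumed to be weekly new-case counts rather than the cumulative totals stored in `cases_cumulative`.

```python
from statistics import mean

LEVELS = ("Low", "Medium", "High")


def naive_forecast(weekly_values, window=3):
    """Forecast next week's value as the mean of the last `window` weeks."""
    recent = list(weekly_values)[-window:]
    if not recent:
        raise ValueError("need at least one week of history")
    return mean(recent)


def alert_level(cases, trips, case_cutoffs=(50, 200), trip_cutoff=1000):
    """Map forecast weekly cases and trips for one zip code to an alert level.

    Every cutoff here is a placeholder assumption, not a city-defined value.
    """
    medium_cut, high_cut = case_cutoffs
    if cases >= high_cut:
        step = 2
    elif cases >= medium_cut:
        step = 1
    else:
        step = 0
    # Heavy taxi traffic raises the alert by one step, capped at High.
    if trips >= trip_cutoff:
        step = min(step + 1, len(LEVELS) - 1)
    return LEVELS[step]


# Made-up history for one zip code: weekly new cases and weekly taxi trips.
weekly_cases = [30, 45, 60, 75]
weekly_trips = [800, 950, 1200, 1300]

forecast_cases = naive_forecast(weekly_cases)  # mean of 45, 60, 75
forecast_trips = naive_forecast(weekly_trips)  # mean of 950, 1200, 1300
print(alert_level(forecast_cases, forecast_trips))
```

A real forecast would likely replace `naive_forecast` with a proper time-series model; the classification step can stay the same once the cutoffs are agreed on.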

## Requirement 2

There are two major airports within the city of Chicago: O’Hare and Midway. The City of Chicago wants to track taxi trips from these airports to the different zip codes, together with the reported COVID-19 positive test cases in those zip codes, and to monitor taxi traffic from the airports to the different neighborhoods.

In [5]:
r2_cur = conn.cursor()

r2_cur.execute("""
                SELECT pickup_zipcode,
                    pickup_community_area,
                    dropoff_zipcode,
                    dropoff_community_area,
                    covid_cases_in_dropoff
                FROM
                    (SELECT pickup_zipcode,
                            pickup_community_area,
                            dropoff_zipcode,
                            dropoff_community_area
                    FROM taxi_trips
                    WHERE pickup_zipcode = '60666'
                        OR pickup_zipcode = '60638'
                        OR pickup_community_area = '76'
                        OR pickup_community_area = '56'
                        OR pickup_community_area = '64'
                    UNION SELECT pickup_zipcode,
                                pickup_community_area,
                                dropoff_zipcode,
                                dropoff_community_area
                    FROM transportation_trips
                    WHERE pickup_zipcode = '60666'
                        OR pickup_zipcode = '60638'
                        OR pickup_community_area = '76'
                        OR pickup_community_area = '56'
                        OR pickup_community_area = '64' ) AS trips
                JOIN
                    (SELECT zip_code,
                            MAX(cases_cumulative) AS covid_cases_in_dropoff
                    FROM covid_cases
                    GROUP BY zip_code) AS covid ON trips.dropoff_zipcode = covid.zip_code;
               """
)

trips_from_airport_covid = r2_cur.fetchall()

r2_cur.close()

trips_from_airport_covid_df = pd.DataFrame(trips_from_airport_covid, columns=["Pickup Zipcode", "Pickup Community", "Dropoff Zipcode", "Dropoff Community", "Covid Cases in Dropoff Area"])

trips_from_airport_covid_df

| | Pickup Zipcode | Pickup Community | Dropoff Zipcode | Dropoff Community | Covid Cases in Dropoff Area |
|---|---|---|---|---|---|
| 0 | | 76 | 60612 | 28 | 11270.0 |
| 1 | | 76 | 60622 | 24 | 15840.0 |
| 2 | 60638 | 56 | 60605 | 32 | 8042.0 |
| 3 | | 76 | 60618 | 5 | 26808.0 |
| 4 | 60638 | 56 | 60611 | 8 | 9921.0 |
| ... | ... | ... | ... | ... | ... |
| 61 | | 76 | 60641 | 16 | 23128.0 |
| 62 | 60638 | 56 | 60625 | 14 | 23630.0 |
| 63 | 60638 | 56 | 60610 | 8 | 12122.0 |
| 64 | 60638 | 56 | 60657 | 6 | 20214.0 |
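
Because the query combines the two trip tables with `UNION`, duplicate rows are removed, so the result lists distinct airport-to-destination routes rather than trip volume. To monitor traffic, the trips themselves need to be counted. The sketch below shows one way to do that in plain Python. The function name and the sample trips are made up for illustration; the airport zip codes and community areas are copied from the `WHERE` clauses in the query.

```python
from collections import Counter

# Airport identifiers copied from the WHERE clauses in the query.
AIRPORT_ZIPS = {"60666", "60638"}
AIRPORT_AREAS = {"76", "56", "64"}


def airport_trip_counts(trips):
    """Count trips that start at an airport, keyed by dropoff zip code."""
    counts = Counter()
    for trip in trips:
        from_airport = (
            trip["pickup_zipcode"] in AIRPORT_ZIPS
            or trip["pickup_community_area"] in AIRPORT_AREAS
        )
        # Skip trips with no recorded dropoff zip code.
        if from_airport and trip["dropoff_zipcode"]:
            counts[trip["dropoff_zipcode"]] += 1
    return counts


# Made-up trips; a missing pickup zip code is represented as None.
sample_trips = [
    {"pickup_zipcode": None, "pickup_community_area": "76", "dropoff_zipcode": "60612"},
    {"pickup_zipcode": None, "pickup_community_area": "76", "dropoff_zipcode": "60612"},
    {"pickup_zipcode": "60638", "pickup_community_area": "56", "dropoff_zipcode": "60605"},
    {"pickup_zipcode": "60614", "pickup_community_area": "7", "dropoff_zipcode": "60612"},
]

airport_counts = airport_trip_counts(sample_trips)
print(airport_counts.most_common())
```

The same counts could be produced in SQL with `UNION ALL` followed by `GROUP BY dropoff_zipcode`, which keeps the work in the database.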


## Requirement 3

The city of Chicago has created the COVID-19 Community Vulnerability Index (CCVI) to identify communities that have been disproportionately affected by COVID-19 and are vulnerable to barriers to COVID-19 vaccine uptake. The city of Chicago wants to track the number of taxi trips from and to the neighborhoods whose CCVI Category is HIGH.

In [None]:
#TODO
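
One way to approach this cell is to build the set of HIGH community areas first and then count every trip that starts or ends in one of them. The sketch below does this in plain Python. The function name, the CCVI mapping, and the trip pairs are illustrative assumptions, since the notebook does not yet load a CCVI table.

```python
from collections import Counter


def high_ccvi_trip_counts(trips, ccvi_by_area):
    """Count trips that start or end in a community area rated HIGH."""
    high_areas = {
        area for area, category in ccvi_by_area.items()
        if category.upper() == "HIGH"
    }
    counts = Counter()
    for pickup_area, dropoff_area in trips:
        # A trip that starts and ends in the same area is counted once.
        for area in {pickup_area, dropoff_area} & high_areas:
            counts[area] += 1
    return counts


# Made-up CCVI categories and (pickup, dropoff) community-area pairs.
ccvi_by_area = {"28": "HIGH", "24": "HIGH", "56": "MEDIUM", "76": "LOW"}
trips = [("76", "28"), ("28", "28"), ("56", "24"), ("76", "56"), ("24", "28")]

high_counts = high_ccvi_trip_counts(trips, ccvi_by_area)
print(high_counts.most_common())
```

Counting a same-area round trip once is a design choice; if trips from and trips to should be reported separately, two counters keyed on pickup and dropoff would be the simpler structure.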

## Requirement 4

For streetscaping investment and planning, the city of Chicago wants to forecast daily, weekly, and monthly traffic patterns for the different zip codes using the taxi trips data.

In [None]:
#TODO
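
A forecast needs trip counts per period before any model can be fitted. The sketch below buckets trip start timestamps by day, week, and month, then applies a moving average as a deliberately simple baseline forecast. The function names and sample timestamps are assumptions; the timestamp format matches the `Trip Start` values shown in the Requirement 1 output.

```python
from collections import Counter
from datetime import datetime

# Bucket labels per period. Weekly buckets use ISO weeks (Monday start),
# which is an assumption; the covid_cases weeks shown earlier start on Sunday.
PERIOD_KEYS = {
    "daily": lambda ts: ts.strftime("%Y-%m-%d"),
    "weekly": lambda ts: "{}-W{:02d}".format(*ts.isocalendar()[:2]),
    "monthly": lambda ts: ts.strftime("%Y-%m"),
}


def trip_counts(timestamps, period):
    """Count trips per daily, weekly, or monthly bucket."""
    key = PERIOD_KEYS[period]
    parsed = (datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps)
    return Counter(key(ts) for ts in parsed)


def moving_average_forecast(counts, window=2):
    """Forecast the next bucket as the mean of the last `window` buckets."""
    recent = counts[-window:]
    if not recent:
        raise ValueError("need at least one bucket of history")
    return sum(recent) / len(recent)


# Made-up trip start times for a single zip code.
timestamps = [
    "2023-12-31T22:30:00Z",
    "2023-12-31T23:30:00Z",
    "2024-01-01T08:00:00Z",
    "2024-01-02T09:15:00Z",
]

daily = trip_counts(timestamps, "daily")
weekly = trip_counts(timestamps, "weekly")
monthly = trip_counts(timestamps, "monthly")

# Order the buckets chronologically before forecasting.
daily_series = [daily[day] for day in sorted(daily)]
next_day = moving_average_forecast(daily_series)
print(dict(daily), dict(weekly), dict(monthly), next_day)
```

In the notebook, the same bucketing would be applied per zip code, and the moving average could later be swapped for a seasonal model once enough history is loaded.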

## Requirement 5

For industrial and neighborhood infrastructure investment, the city of Chicago wants to invest in the top 5 neighborhoods with the highest unemployment and poverty rates, and to waive the building permit fees in those neighborhoods to encourage businesses to develop and invest there. Both the building permits and unemployment datasets will be used in this report.

In [None]:
#TODO
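
The requirement does not say how to combine the two rates into a single ranking. The sketch below assumes the simplest option, ranking by the sum of the unemployment and poverty percentages, and then zeroes the fee for permits in the selected areas. All field names, function names, and figures are hypothetical and would need to be mapped to the actual dataset columns.

```python
def top_neighborhoods(stats, n=5):
    """Return the n community areas with the highest combined rate."""
    ranked = sorted(
        stats,
        key=lambda row: row["unemployment_pct"] + row["poverty_pct"],
        reverse=True,
    )
    return [row["community_area"] for row in ranked[:n]]


def waive_fees(permits, waived_areas):
    """Set fee_due to 0 for permits located in a waived community area."""
    waived = set(waived_areas)
    return [
        {**permit, "fee_due": 0 if permit["community_area"] in waived else permit["fee"]}
        for permit in permits
    ]


# Made-up rates per community area (percentages).
stats = [
    {"community_area": "A", "unemployment_pct": 20.0, "poverty_pct": 40.0},
    {"community_area": "B", "unemployment_pct": 10.0, "poverty_pct": 15.0},
    {"community_area": "C", "unemployment_pct": 25.0, "poverty_pct": 30.0},
    {"community_area": "D", "unemployment_pct": 5.0, "poverty_pct": 10.0},
    {"community_area": "E", "unemployment_pct": 18.0, "poverty_pct": 35.0},
    {"community_area": "F", "unemployment_pct": 12.0, "poverty_pct": 28.0},
    {"community_area": "G", "unemployment_pct": 22.0, "poverty_pct": 20.0},
]
# Made-up permits with their normal fee.
permits = [
    {"permit_id": 1, "community_area": "A", "fee": 500},
    {"permit_id": 2, "community_area": "B", "fee": 300},
]

top5 = top_neighborhoods(stats)
billed = waive_fees(permits, top5)
print(top5)
print(billed)
```

If the two rates should be weighted differently, only the `key` function in `top_neighborhoods` needs to change.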

## Requirement 6

According to a report published by Crain’s Chicago Business, the “little guys” (small businesses) have trouble competing with big players like Amazon and Walmart for warehouse space. To help small businesses, assume a new imaginary program named Illinois Small Business Emergency Loan Fund Delta has been piloted. It offers low-interest loans of up to $250,000 to applicants with a PERMIT_TYPE of PERMIT - NEW CONSTRUCTION in the zip code that has the lowest number of PERMIT - NEW CONSTRUCTION applications, where PER CAPITA INCOME is less than 30,000 for the planned construction site. Both the building permits and unemployment datasets will be used in this report.

In [None]:
#TODO
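
The eligibility rule has three parts: the permit type, the zip code with the fewest new-construction applications, and the income cap. The sketch below applies them in that order. It reflects one reading of the requirement: the lowest count is taken among zip codes with at least one new-construction application, ties all qualify, and each permit row is assumed to already carry the per capita income of its site. The field names, function name, and sample rows are hypothetical.

```python
from collections import Counter

NEW_CONSTRUCTION = "PERMIT - NEW CONSTRUCTION"


def eligible_permits(permits, income_limit=30000):
    """Return new-construction permits that qualify for the loan program."""
    new_builds = [p for p in permits if p["permit_type"] == NEW_CONSTRUCTION]
    if not new_builds:
        return []
    # Count new-construction applications per zip code.
    per_zip = Counter(p["zip_code"] for p in new_builds)
    fewest = min(per_zip.values())
    # Assumption: every zip code tied for the lowest count qualifies.
    target_zips = {z for z, count in per_zip.items() if count == fewest}
    return [
        p for p in new_builds
        if p["zip_code"] in target_zips and p["per_capita_income"] < income_limit
    ]


# Made-up permits joined with per capita income for the construction site.
permits = [
    {"permit_id": 1, "permit_type": NEW_CONSTRUCTION, "zip_code": "60612", "per_capita_income": 25000},
    {"permit_id": 2, "permit_type": NEW_CONSTRUCTION, "zip_code": "60612", "per_capita_income": 25000},
    {"permit_id": 3, "permit_type": NEW_CONSTRUCTION, "zip_code": "60622", "per_capita_income": 28000},
    {"permit_id": 4, "permit_type": "PERMIT - RENOVATION/ALTERATION", "zip_code": "60622", "per_capita_income": 28000},
    {"permit_id": 5, "permit_type": NEW_CONSTRUCTION, "zip_code": "60611", "per_capita_income": 90000},
]

eligible = eligible_permits(permits)
print([p["permit_id"] for p in eligible])
```

If the income cap should instead narrow the candidate zip codes before the lowest count is taken, the income filter would move above the `Counter` step.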

In [None]:
# Close the Postgres connection
conn.close()