## Project 1

# **Step 1**
In this project, we examine the *FDNY_Monthly_Response_Times* data disclosed by the Fire Department of NYC.

The dataset has 6552 rows and 5 columns, with one numeric column.

Detailed information about the dataset can be found here: 

https://data.cityofnewyork.us/Social-Services/FDNY-Monthly-Response-Times/j34j-vqvt/about_data




# **Step 2**

The numeric column we are going to use is "INCIDENT COUNT" (named `INCIDENTCOUNT` in the CSV file).

The other columns are "YEAR MONTH", "INCIDENT CLASSIFICATION", "INCIDENT BOROUGH", and "AVERAGE RESPONSE TIME"; their CSV headers likewise omit the spaces.

# **Step 3**

We created a *Project 1* notebook here.

# **Step 4 - Using Pandas**

**— PART 4.A: Read in the data**

In [35]:
import pandas as pd
project = pd.read_csv("FDNY_Monthly_Response_Times.csv")
project

Unnamed: 0,YEARMONTH,INCIDENTCLASSIFICATION,INCIDENTBOROUGH,INCIDENTCOUNT,AVERAGERESPONSETIME
0,2009/07,All Fire/Emergency Incidents,Citywide,40850,04:27
1,2009/07,All Fire/Emergency Incidents,Manhattan,10709,04:32
2,2009/07,All Fire/Emergency Incidents,Bronx,8137,04:37
3,2009/07,All Fire/Emergency Incidents,Staten Island,2205,04:45
4,2009/07,All Fire/Emergency Incidents,Brooklyn,11505,04:01
...,...,...,...,...,...
6547,FY 2021,Structural Fires,Brooklyn,7357,1899-12-31T03:51:00.000
6548,FY 2021,Structural Fires,Citywide,24359,1899-12-31T04:22:00.000
6549,FY 2021,Structural Fires,Manhattan,4798,1899-12-31T04:23:00.000
6550,FY 2021,Structural Fires,Queens,5227,1899-12-31T04:43:00.000


**— PART 4.B: Compute the Mean, the Median and the Mode**

We focus on the monthly "Incident Count" of "All Fire/Emergency Incidents" for "Manhattan" in 2019, so before calculating anything we filter the data down to those rows.

(*Citation note: I used AI tools to help correct my conversion function for "YEARMONTH" to "YYYY/MM".*)

In [65]:
# Convert "YEARMONTH" to YYYY/MM format and process only rows in YYYY/MM format
project['YEARMONTH_dt'] = pd.to_datetime(project['YEARMONTH'], errors='coerce', format='%Y/%m')

# Filter the data for 2019
data_2019 = project[(project['INCIDENTCLASSIFICATION'] == 'All Fire/Emergency Incidents') &
                    (project['INCIDENTBOROUGH'] == 'Manhattan') &
                    (project['YEARMONTH_dt'].dt.year == 2019)]

data_2019

Unnamed: 0,YEARMONTH,INCIDENTCLASSIFICATION,INCIDENTBOROUGH,INCIDENTCOUNT,AVERAGERESPONSETIME,YEARMONTH_dt
5127,2019/01,All Fire/Emergency Incidents,Manhattan,13605,1899-12-31T05:35:00.000,2019-01-01
5169,2019/02,All Fire/Emergency Incidents,Manhattan,12238,1899-12-31T05:39:00.000,2019-02-01
5211,2019/03,All Fire/Emergency Incidents,Manhattan,13115,1899-12-31T05:29:00.000,2019-03-01
5253,2019/04,All Fire/Emergency Incidents,Manhattan,12541,1899-12-31T05:31:00.000,2019-04-01
5295,2019/05,All Fire/Emergency Incidents,Manhattan,13997,1899-12-31T05:33:00.000,2019-05-01
5337,2019/06,All Fire/Emergency Incidents,Manhattan,14425,1899-12-31T05:36:00.000,2019-06-01
5379,2019/07,All Fire/Emergency Incidents,Manhattan,15782,1899-12-31T05:46:00.000,2019-07-01
5421,2019/08,All Fire/Emergency Incidents,Manhattan,14725,1899-12-31T05:30:00.000,2019-08-01
5463,2019/09,All Fire/Emergency Incidents,Manhattan,13964,1899-12-31T05:37:00.000,2019-09-01
5505,2019/10,All Fire/Emergency Incidents,Manhattan,13405,1899-12-31T05:33:00.000,2019-10-01


**a. The Mean** 

In [69]:
mean_monthly_incident_2019 = data_2019['INCIDENTCOUNT'].mean()
mean_monthly_incident_2019

13575.5

**b. The Median**

In [73]:
median_monthly_incident_2019 = data_2019['INCIDENTCOUNT'].median()
median_monthly_incident_2019

13505.0
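
With twelve monthly values there is no single middle value, so the median is the average of the two central values once the counts are sorted. A minimal sketch of that arithmetic follows, with the twelve 2019 Manhattan counts typed in by hand from the notebook outputs; the variable names are illustrative and not part of the notebook.

```python
# The twelve monthly counts for Manhattan in 2019, copied by hand
# from the notebook outputs (illustrative, not read from the CSV).
counts = [13605, 12238, 13115, 12541, 13997, 14425,
          15782, 14725, 13964, 13405, 12359, 12750]

ordered = sorted(counts)
n = len(ordered)

# Even number of values: average the two middle ones.
lower_middle = ordered[n // 2 - 1]
upper_middle = ordered[n // 2]
median_by_hand = (lower_middle + upper_middle) / 2

# The mean is the sum divided by the number of values.
mean_by_hand = sum(counts) / n

print(lower_middle, upper_middle, median_by_hand)  # 13405 13605 13505.0
print(mean_by_hand)  # 13575.5
```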

**c. The Mode**

In [76]:
mode_monthly_incident_2019 = data_2019['INCIDENTCOUNT'].mode()
mode_monthly_incident_2019

0     12238
1     12359
2     12541
3     12750
4     13115
5     13405
6     13605
7     13964
8     13997
9     14425
10    14725
11    15782
Name: INCIDENTCOUNT, dtype: int64

Because each monthly count in 2019 occurred exactly once, `mode()` returns all twelve counts (in ascending order). In other words, the data has no single mode.
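
One way to confirm that no value repeats is to count frequencies with `collections.Counter` from the standard library. This is only a sketch: the helper name `has_unique_mode` is hypothetical, and the counts are copied by hand from the output above.

```python
from collections import Counter

# Monthly counts copied by hand from the mode output above.
counts = [12238, 12359, 12541, 12750, 13115, 13405,
          13605, 13964, 13997, 14425, 14725, 15782]

def has_unique_mode(values):
    """Return True only if exactly one value has the highest frequency."""
    frequencies = Counter(values)
    top = max(frequencies.values())
    return sum(1 for f in frequencies.values() if f == top) == 1

print(max(Counter(counts).values()))  # 1: every count occurs once
print(has_unique_mode(counts))        # False
print(has_unique_mode([1, 2, 2, 3]))  # True
```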

# **Step 5**

Repeat the previous step using only the Python standard library (“the hard way” - not using pandas, a spreadsheet program, etc).


In [117]:
# Import modules and read in the data using Python standard library

import csv
from datetime import datetime
from statistics import mean, median, multimode

data = []

# newline='' is the setting the csv module documentation recommends for files passed to a reader
with open('FDNY_Monthly_Response_Times.csv', mode='r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Process only rows in YYYY/MM format (skips summary rows such as "FY 2021")
        if len(row['YEARMONTH']) == 7 and row['YEARMONTH'][4] == '/':
            year_month = datetime.strptime(row['YEARMONTH'], '%Y/%m')
            row['YEARMONTH_dt'] = year_month
            data.append(row)
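
The length-and-slash test above skips summary rows such as `FY 2021`. An alternative sketch, assuming only that such rows fail to parse as `%Y/%m`, lets `strptime` decide and catches the resulting `ValueError`. The helper name `parse_year_month` is hypothetical and not part of the notebook.

```python
from datetime import datetime

def parse_year_month(text):
    """Return a datetime for 'YYYY/MM' strings, or None for anything else."""
    try:
        return datetime.strptime(text, '%Y/%m')
    except ValueError:
        return None

print(parse_year_month('2019/07'))  # 2019-07-01 00:00:00
print(parse_year_month('FY 2021'))  # None
```

This variant is slightly more lenient than the explicit check, since `strptime` also accepts a single-digit month such as `2019/7`.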

In [119]:
# Filter the data for 2019
data_2019 = [
    int(row['INCIDENTCOUNT'])
    for row in data
    if row['INCIDENTCLASSIFICATION'] == 'All Fire/Emergency Incidents'
    and row['INCIDENTBOROUGH'] == 'Manhattan'
    and row['YEARMONTH_dt'].year == 2019
]

In [121]:
# Compute the mean
mean_monthly_incident = mean(data_2019)
mean_monthly_incident

13575.5

In [123]:
# Compute the median
median_monthly_incident = median(data_2019)
median_monthly_incident

13505.0

In [125]:
# Compute the mode
mode_monthly_incident = multimode(data_2019)
mode_monthly_incident

[13605,
 12238,
 13115,
 12541,
 13997,
 14425,
 15782,
 14725,
 13964,
 13405,
 12359,
 12750]

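The mean and median in Step 5 match those in Step 4. The mode outputs contain the same twelve values in a different order: pandas returns them sorted, while `multimode` returns them in the order they were first encountered.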
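
A quick order-independent comparison of the two mode outputs, with both lists copied by hand from the outputs above (the variable names are illustrative):

```python
# Values as printed by pandas in Step 4 (ascending order).
pandas_mode = [12238, 12359, 12541, 12750, 13115, 13405,
               13605, 13964, 13997, 14425, 14725, 15782]

# Values as printed by statistics.multimode in Step 5 (data order).
multimode_result = [13605, 12238, 13115, 12541, 13997, 14425,
                    15782, 14725, 13964, 13405, 12359, 12750]

print(pandas_mode == multimode_result)          # False: the order differs
print(sorted(multimode_result) == pandas_mode)  # True: the values agree
```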

# **Step 6 - Data Visualization**

We are creating a text-based sparkline for the "Incident Count" of "All Fire/Emergency Incidents" in "Manhattan" for each month in 2019.

We use only the Python standard library, and the bars are computed from the data rather than hard-coded.

In [137]:
# Find the max count so that the bars are scaled appropriately 
max_count = max(data_2019)
max_count

15782

In [139]:
# Display each month and its incident count as a text-based bar chart

for row in data:
    if (
        row['INCIDENTCLASSIFICATION'] == 'All Fire/Emergency Incidents'
        and row['INCIDENTBOROUGH'] == 'Manhattan'
        and row['YEARMONTH_dt'].year == 2019
    ):
        month = row['YEARMONTH_dt'].strftime('%Y-%m')
        count = int(row['INCIDENTCOUNT'])
        # Scale the bar to a width of up to 40 characters based on max_count
        bar = '|' + '🔥' * (count * 40 // max_count)
        print(f"{month}: {bar} ({count})")

2019-01: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (13605)
2019-02: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (12238)
2019-03: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (13115)
2019-04: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (12541)
2019-05: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (13997)
2019-06: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (14425)
2019-07: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (15782)
2019-08: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (14725)
2019-09: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (13964)
2019-10: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (13405)
2019-11: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (12359)
2019-12: |🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥 (12750)


We now have a simple text-based sparkline for "All Fire/Emergency Incidents" in Manhattan for each month in 2019. The number of 🔥 symbols in each bar is scaled relative to the maximum count of 15782, recorded in July 2019.
(*Citation note: I used AI tools to help fix my function for the scaling.*)
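
The scaling uses integer (floor) division, so the month with the maximum count gets exactly 40 symbols and every other month gets proportionally fewer, rounded down. A small sketch of the arithmetic, using a hypothetical helper `bar_length` and counts taken from the output above:

```python
def bar_length(count, max_count, width=40):
    """Number of symbols for a bar, rounded down to a whole number."""
    return count * width // max_count

max_count = 15782  # July 2019, the largest monthly count

print(bar_length(15782, max_count))  # 40
print(bar_length(13605, max_count))  # 34
print(bar_length(12238, max_count))  # 31
```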