# Analysis of the NYC 311 Dataset

The NYC 311 dataset contains records of non-emergency service requests made by residents of New York City. These complaints are submitted via phone, web, or mobile app and include issues like noise complaints, illegal parking, blocked driveways, and more.

Each record typically includes:
- The type of complaint
- Date and time the complaint was created and closed
- The location of the incident (ZIP code, street address)
- The agency responsible
- Status and resolution details

You will use pandas to complete the following activity.

### Step 1: Load and Inspect the NYC 311 Dataset

In this step, you'll load the NYC 311 complaint dataset from CSV and explore its structure.

- Name the DataFrame `df_311`.
- Check how many rows and columns it contains.
- View the first 5 rows to understand the kind of data you're working with.
- List all column names.
- **Pay special attention to the date columns** (e.g., `Created Date`, `Closed Date`, etc.) and observe any formatting inconsistencies.

**Question:**  
What kinds of issues do you notice in the formatting of the date columns?

In [61]:
# load NYC311data_cleaned.csv into a pandas dataframe as df_311
import pandas as pd
df_311 = pd.read_csv('NYC311data_cleaned.csv') 
# print the first 5 rows of df_311
df_311.head()


(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
0,31685705,10-05-15 22:45,10-06-15 2:09,Blocked Driveway,No Access,Street/Sidewalk,11373,43-31 ELBERTSON STREET,ELBERTSON STREET,LAMONT AVENUE,43 AVENUE,ADDRESS,Closed,10-06-15 6:45,The Police Department responded to the complai...,10-06-15 2:09,04 QUEENS
1,31426484,08/30/2015 09:04:52 PM,08/30/2015 11:53:46 PM,Noise - Street/Sidewalk,Loud Talking,Street/Sidewalk,10030,100 WEST 141 STREET,WEST 141 STREET,LENOX AVENUE,7 AVENUE,ADDRESS,Closed,08/31/2015 05:04:52 AM,The Police Department responded to the complai...,08/30/2015 11:53:46 PM,10 MANHATTAN
2,31473909,09-06-15 23:56,09-07-15 7:30,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,73 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,WEST 204 STREET,ADDRESS,Closed,09-07-15 7:56,The Police Department responded to the complai...,09-07-15 7:30,12 MANHATTAN
3,31530153,09/14/2015 11:05:21 AM,09/14/2015 12:52:54 PM,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11385,,,,,INTERSECTION,Closed,09/14/2015 07:05:21 PM,The Police Department responded and upon arriv...,09/14/2015 12:52:54 PM,05 QUEENS
4,31562497,09/18/2015 09:53:11 PM,09/19/2015 04:34:07 AM,Derelict Vehicle,With License Plate,Street/Sidewalk,11235,611 BANNER AVENUE,BANNER AVENUE,BRIGHTON 6 STREET,BRIGHTON 7 STREET,ADDRESS,Closed,09/19/2015 05:53:11 AM,The Police Department responded to the complai...,09/19/2015 04:34:07 AM,13 BROOKLYN


In [62]:
df_311.shape

(89396, 17)

In [63]:
df_311.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89396 entries, 0 to 89395
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Unique Key                      89396 non-null  int64 
 1   Created Date                    89396 non-null  object
 2   Closed Date                     89381 non-null  object
 3   Complaint Type                  89396 non-null  object
 4   Descriptor                      87544 non-null  object
 5   Location Type                   89352 non-null  object
 6   Incident Zip                    89396 non-null  int64 
 7   Incident Address                76111 non-null  object
 8   Street Name                     76111 non-null  object
 9   Cross Street 1                  75267 non-null  object
 10  Cross Street 2                  75260 non-null  object
 11  Address Type                    89334 non-null  object
 12  Status                          89396 non-null
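One way to answer the question above is to count how many raw strings follow each layout. The preview suggests at least two: `10-05-15 22:45` (dashes, two-digit year, 24-hour clock, no seconds) and `08/30/2015 09:04:52 PM` (slashes, four-digit year, seconds, 12-hour clock with AM/PM). The sketch below tags each string with a regular expression; the pattern names and regexes are illustrative assumptions based only on the five preview rows, so any other layout falls into `other`.

```python
import re

import pandas as pd

# Raw 'Created Date' strings copied from the first five rows shown above
raw = pd.Series([
    "10-05-15 22:45",
    "08/30/2015 09:04:52 PM",
    "09-06-15 23:56",
    "09/14/2015 11:05:21 AM",
    "09/18/2015 09:53:11 PM",
])

# Assumed regexes for the two layouts visible in the preview (illustrative only)
PATTERNS = {
    "MM-DD-YY HH:MM (24h)": r"^\d{2}-\d{2}-\d{2} \d{1,2}:\d{2}$",
    "MM/DD/YYYY hh:mm:ss AM/PM": r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M$",
}


def classify(value: str) -> str:
    """Return the name of the first pattern the string matches, else 'other'."""
    for name, pattern in PATTERNS.items():
        if re.match(pattern, value):
            return name
    return "other"


# Count how many strings fall into each layout
counts = raw.map(classify).value_counts()
print(counts)
```

On the real data this has to run before Step 2, while the column still holds strings, for example `df_311['Created Date'].astype(str).map(classify).value_counts()`; a sizeable `other` count would point to layouts that the preview does not show.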

### Step 2: Convert Date Columns to Datetime

In this step, you'll convert the following columns to proper datetime format:

- `Created Date`
- `Closed Date`
- `Due Date`
- `Resolution Action Updated Date`

Because the dataset contains **mixed date formats**, you should **not specify a single date format**: one format string would only match one of the layouts. Use `errors='coerce'` so that any value pandas cannot parse becomes `NaT` rather than raising an error.

You may see warnings during this step; **you may ignore them**. A warning may ask you to specify a format, but because these columns mix several date/time layouts, letting pandas infer the format of each value is usually the more robust choice here.
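How this behaves depends on the pandas version. In pandas 2.0 and later, when no `format` is given, `to_datetime` tries to infer a single format from the first non-missing value and apply it to every row; if that inference succeeds, rows written in another layout would silently become `NaT` under `errors='coerce'`. If inference fails, pandas warns and parses each value individually. Passing `format='mixed'` (pandas 2.0+) requests per-value parsing explicitly. A minimal sketch on the two layouts seen in Step 1, plus one made-up invalid string:

```python
import pandas as pd

# Two layouts from the raw 'Created Date' column, plus a made-up invalid value
raw = pd.Series(["10-05-15 22:45", "08/30/2015 09:04:52 PM", "not a date"])

# format="mixed" (pandas >= 2.0) parses every element on its own;
# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw, format="mixed", errors="coerce")
print(parsed)
```

In the notebook the same idea would be `pd.to_datetime(df_311[col], format='mixed', errors='coerce')`. Comparing `df_311[col].isna().sum()` before and after the conversion is a quick way to see how many values were coerced to `NaT`.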

In [64]:
# Your code here
# Convert each date column to datetime; values that cannot be parsed become NaT
date_cols = ['Created Date', 'Closed Date', 'Due Date', 'Resolution Action Updated Date']
for col in date_cols:
    df_311[col] = pd.to_datetime(df_311[col], errors='coerce')




### Step 3: Sorting Practice Questions to Explore `df_311`

Use the dataset `df_311` to practice various **sorting techniques**.

1. **Basic sorting (ascending)**:  
   - Sort the DataFrame by `Created Date` in ascending order and show the first 5 rows.

2. **Descending sort**:  
   - Sort by `Closed Date` in descending order and display the top 5 complaints with the latest closing times.

3. **Sorting by multiple columns**:  
   - Sort first by `Complaint Type` (A–Z), then by `Created Date` (newest first).

4. **Sorting with a custom function**:  
   - Sort complaints by the **length** of the `Descriptor` column, longest first.

5. **In-place sorting**:  
   - Sort the DataFrame by `Due Date` in-place, then show the last 5 rows.

In [65]:
# 1. Sort the DataFrame by Created Date in ascending order and show the first 5 rows.
df_311.sort_values(by='Created Date', ascending=True).head()

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
87211,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,ELLERY STREET,ADDRESS,Closed,2015-03-29 08:33:03,The Police Department responded to the complai...,2015-03-29 03:40:20,03 BROOKLYN
34299,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,MALCOLM X BOULEVARD,ADDRESS,Closed,2015-03-29 08:35:28,The Police Department responded to the complai...,2015-03-29 04:14:27,03 BROOKLYN
50645,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,WEST 14 STREET,ADDRESS,Closed,2015-03-29 08:37:15,The Police Department reviewed your complaint ...,2015-03-29 01:02:39,02 MANHATTAN
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN


In [66]:
# 2. Sort by Closed Date in descending order and display the top 5 complaints with the latest closing times.
df_311.sort_values(by='Closed Date', ascending=False).head()

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
6831,32308423,2015-12-31 23:31:40,2016-01-03 16:22:00,Blocked Driveway,No Access,Street/Sidewalk,10467,3025 WALLACE AVENUE,WALLACE AVENUE,ADEE AVENUE,BURKE AVENUE,ADDRESS,Closed,2016-01-01 07:31:00,The Police Department responded to the complai...,2016-01-03 16:22:00,12 BRONX
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX
59869,32310624,2015-12-31 18:22:08,2016-01-01 08:27:00,Blocked Driveway,Partial Access,Street/Sidewalk,11416,97-46 77 STREET,77 STREET,97 AVENUE,101 AVENUE,ADDRESS,Closed,2016-01-01 02:22:00,The Police Department responded and upon arriv...,2016-01-01 08:27:00,09 QUEENS
17264,32306268,2015-12-31 20:02:03,2016-01-01 07:47:00,Blocked Driveway,No Access,Street/Sidewalk,10461,1659 WILLIAMSBRIDGE ROAD,WILLIAMSBRIDGE ROAD,PIERCE AVENUE,VAN NEST AVENUE,ADDRESS,Closed,2016-01-01 04:02:00,The Police Department responded to the complai...,2016-01-01 07:47:00,11 BRONX
2304,32308708,2015-12-31 21:43:01,2016-01-01 07:44:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,11220,167 SENATOR STREET,SENATOR STREET,COLONIAL ROAD,RIDGE BOULEVARD,ADDRESS,Closed,2016-01-01 05:43:00,The Police Department responded and upon arriv...,2016-01-01 07:44:00,10 BROOKLYN


In [67]:
# 3. Sort first by Complaint Type (A–Z), then by Created Date (newest first).
df_311.sort_values(by=['Complaint Type', 'Created Date'], ascending=[True, False])

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
6937,32306470,2015-12-31 22:38:04,2015-12-31 23:04:14,Animal Abuse,Tortured,Residential Building/House,11427,89-27 218 STREET,218 STREET,89 AVENUE,90 AVENUE,ADDRESS,Closed,2016-01-01 06:38:00,The Police Department responded to the complai...,2015-12-31 23:04:14,13 QUEENS
4650,32309522,2015-12-31 21:22:54,2015-12-31 22:33:00,Animal Abuse,Neglected,Residential Building/House,11415,84-87 129 STREET,129 STREET,KEW GARDENS ROAD,METROPOLITAN AVENUE,ADDRESS,Closed,2016-01-01 05:22:00,The Police Department responded to the complai...,2015-12-31 22:33:00,09 QUEENS
66899,32310128,2015-12-31 16:59:34,2015-12-31 21:14:28,Animal Abuse,Neglected,Residential Building/House,11416,102-04 89 STREET,89 STREET,102 AVENUE,102 ROAD,ADDRESS,Closed,2016-01-01 00:59:00,The Police Department responded to the complai...,2015-12-31 21:14:28,09 QUEENS
36331,32305067,2015-12-31 14:08:45,2015-12-31 15:02:57,Animal Abuse,Neglected,Residential Building/House,10472,1327 STRATFORD AVENUE,STRATFORD AVENUE,EAST 172 STREET,EAST 174 STREET,ADDRESS,Closed,2015-12-31 22:08:45,The Police Department responded to the complai...,2015-12-31 15:02:57,09 BRONX
82916,32310105,2015-12-31 13:54:38,2015-12-31 14:38:22,Animal Abuse,Other (complaint details),Street/Sidewalk,10023,BROADWAY,BROADWAY,WEST 73 STREET,WEST 74 STREET,BLOCKFACE,Closed,2015-12-31 21:54:38,The Police Department responded to the complai...,2015-12-31 14:38:22,07 MANHATTAN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44782,30299840,2015-03-31 21:03:04,2015-03-31 22:08:22,Vending,In Prohibited Area,Street/Sidewalk,10029,,,,,INTERSECTION,Closed,2015-04-01 05:03:00,The Police Department responded to the complai...,2015-03-31 22:08:22,11 MANHATTAN
35800,30300768,2015-03-31 13:15:56,2015-03-31 15:16:44,Vending,Unlicensed,Street/Sidewalk,10460,OLD KINGSBRIDGE ROAD,OLD KINGSBRIDGE ROAD,GROTE STREET,SOUTHERN BOULEVARD,BLOCKFACE,Closed,2015-03-31 21:15:56,The Police Department responded and upon arriv...,2015-03-31 15:16:44,06 BRONX
33608,30305982,2015-03-31 08:57:59,2015-03-31 09:43:37,Vending,In Prohibited Area,Street/Sidewalk,10014,350 HUDSON STREET,HUDSON STREET,CHARLTON STREET,KING STREET,ADDRESS,Closed,2015-03-31 16:57:59,The Police Department responded to the complai...,2015-03-31 09:43:37,02 MANHATTAN
73049,30292464,2015-03-30 16:37:37,2015-03-31 01:18:43,Vending,In Prohibited Area,Street/Sidewalk,10011,394 6 AVENUE,6 AVENUE,WAVERLY PLACE,GREENWICH AVENUE,ADDRESS,Closed,2015-03-31 00:37:37,The Police Department responded to the complai...,2015-03-31 01:18:43,02 MANHATTAN


In [68]:
# 4. Sort complaints by the length of the Descriptor column, longest first (approach using a key function).
# sort_values passes the whole column (a Series) to `key` and sorts by the Series it returns.
# The key can be a lambda, convenient for one-liners like this, or a function defined with def,
# which suits longer or reusable logic. Missing descriptors have NaN length and are placed last.
df_311.sort_values(by='Descriptor', key=lambda x: x.str.len(), ascending=False)


(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board
12336,31851995,2015-10-23 02:07:16,2015-10-23 03:54:31,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11374,67-63 WOODHAVEN BOULEVARD,WOODHAVEN BOULEVARD,67 DRIVE,68 AVENUE,ADDRESS,Closed,2015-10-23 10:07:16,The Police Department responded to the complai...,2015-10-23 03:54:23,06 QUEENS
86521,30304791,2015-03-31 12:06:32,2015-03-31 13:02:05,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11209,86 STREET,86 STREET,3 AVENUE,4 AVENUE,BLOCKFACE,Closed,2015-03-31 20:06:32,The Police Department responded to the complai...,2015-03-31 13:02:05,10 BROOKLYN
9055,31502050,2015-09-10 14:20:00,2015-09-10 18:50:00,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11693,,,,,INTERSECTION,Closed,2015-09-10 22:20:00,The Police Department issued a summons in resp...,2015-09-10 18:50:00,14 QUEENS
61810,31323271,2015-08-16 10:18:43,2015-08-16 12:34:32,Illegal Parking,Double Parked Blocking Vehicle,Street/Sidewalk,10026,218 WEST 116 STREET,WEST 116 STREET,7 AVENUE,8 AVENUE,ADDRESS,Closed,2015-08-16 18:18:43,The Police Department responded to the complai...,2015-08-16 12:34:32,10 MANHATTAN
3461,30444489,2015-04-20 18:54:31,2015-04-20 20:59:30,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,10034,101 POST AVENUE,POST AVENUE,WEST 204 STREET,WEST 207 STREET,ADDRESS,Closed,2015-04-21 02:54:31,The Police Department responded to the complai...,2015-04-20 20:59:30,12 MANHATTAN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89186,31640323,2015-09-29 11:37:58,2015-09-29 15:57:04,Homeless Encampment,,Street/Sidewalk,10035,10 EAST 125 STREET,EAST 125 STREET,5 AVENUE,MADISON AVENUE,ADDRESS,Closed,2015-09-29 19:37:58,The Police Department responded and upon arriv...,2015-09-29 15:57:04,11 MANHATTAN
89229,30894981,2015-06-20 22:18:03,2015-06-20 23:29:41,Illegal Fireworks,,Street/Sidewalk,11435,89 AVENUE,89 AVENUE,148 STREET,150 STREET,BLOCKFACE,Closed,2015-06-21 06:18:03,The Police Department responded and upon arriv...,2015-06-20 23:29:42,12 QUEENS
89297,30536627,2015-05-03 15:20:00,2015-05-03 20:42:00,Homeless Encampment,,Street/Sidewalk,10003,404 LAFAYETTE STREET,LAFAYETTE STREET,EAST 4 STREET,ASTOR PLACE,ADDRESS,Closed,2015-05-03 23:20:00,The Police Department responded to the complai...,2015-05-03 20:42:00,02 MANHATTAN
89319,31348164,2015-08-19 21:39:33,2015-08-19 22:46:06,Homeless Encampment,,Street/Sidewalk,10027,,,,,INTERSECTION,Closed,2015-08-20 05:39:33,The Police Department responded to the complai...,2015-08-19 22:46:06,09 MANHATTAN
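A key function can be a `lambda` or a named function defined with `def`; both work the same way. As a sketch, `sort_values` (pandas 1.1+) hands the key the whole column as a Series and sorts by whatever Series it returns. The toy frame reuses descriptor strings from the output above, with `None` standing in for a missing value.

```python
import pandas as pd

# Toy stand-in for df_311: descriptor strings from the output above, None = missing
df = pd.DataFrame({"Descriptor": ["No Access", "Double Parked Blocking Traffic", None, "Loud Talking"]})


def descriptor_length(col: pd.Series) -> pd.Series:
    """Key function: receives the whole column and returns the value to sort by."""
    return col.str.len()  # missing descriptors become NaN


# Named function and lambda give the same ordering; NaN is placed last by default
by_def = df.sort_values("Descriptor", key=descriptor_length, ascending=False)
by_lambda = df.sort_values("Descriptor", key=lambda col: col.str.len(), ascending=False)
print(by_def)
```

The `def` version is easier to test and document on its own, while the `lambda` keeps a one-off key inline.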


In [69]:
# 4. Sort complaints by the length of the Descriptor column, longest first (approach using a helper column).
# Note: this adds a permanent 'Descriptor Length' column to df_311, which is why later outputs have 18 columns.
df_311['Descriptor Length'] = df_311['Descriptor'].str.len()
df_311.sort_values(by='Descriptor Length', ascending=False).head()

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,Descriptor Length
12336,31851995,2015-10-23 02:07:16,2015-10-23 03:54:31,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11374,67-63 WOODHAVEN BOULEVARD,WOODHAVEN BOULEVARD,67 DRIVE,68 AVENUE,ADDRESS,Closed,2015-10-23 10:07:16,The Police Department responded to the complai...,2015-10-23 03:54:23,06 QUEENS,30.0
86521,30304791,2015-03-31 12:06:32,2015-03-31 13:02:05,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11209,86 STREET,86 STREET,3 AVENUE,4 AVENUE,BLOCKFACE,Closed,2015-03-31 20:06:32,The Police Department responded to the complai...,2015-03-31 13:02:05,10 BROOKLYN,30.0
9055,31502050,2015-09-10 14:20:00,2015-09-10 18:50:00,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,11693,,,,,INTERSECTION,Closed,2015-09-10 22:20:00,The Police Department issued a summons in resp...,2015-09-10 18:50:00,14 QUEENS,30.0
61810,31323271,2015-08-16 10:18:43,2015-08-16 12:34:32,Illegal Parking,Double Parked Blocking Vehicle,Street/Sidewalk,10026,218 WEST 116 STREET,WEST 116 STREET,7 AVENUE,8 AVENUE,ADDRESS,Closed,2015-08-16 18:18:43,The Police Department responded to the complai...,2015-08-16 12:34:32,10 MANHATTAN,30.0
3461,30444489,2015-04-20 18:54:31,2015-04-20 20:59:30,Illegal Parking,Double Parked Blocking Traffic,Street/Sidewalk,10034,101 POST AVENUE,POST AVENUE,WEST 204 STREET,WEST 207 STREET,ADDRESS,Closed,2015-04-21 02:54:31,The Police Department responded to the complai...,2015-04-20 20:59:30,12 MANHATTAN,30.0


In [70]:
# 5. Sort the DataFrame by Due Date in-place, then show the last 5 rows.
df_311.sort_values(by='Due Date', ascending=True, inplace=True)
df_311.tail()

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,Descriptor Length
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX,9.0
77179,32305071,2015-12-31 23:52:58,2016-01-01 07:41:00,Blocked Driveway,No Access,Street/Sidewalk,11372,34-06 73 STREET,73 STREET,34 AVENUE,35 AVENUE,ADDRESS,Closed,2016-01-01 07:52:00,The Police Department responded and upon arriv...,2016-01-01 07:41:00,03 QUEENS,9.0
41975,32306559,2015-12-31 23:55:32,2016-01-01 01:53:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10032,524 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,ADDRESS,Closed,2016-01-01 07:55:00,The Police Department issued a summons in resp...,2016-01-01 01:53:00,12 MANHATTAN,15.0
80818,32306529,2015-12-31 23:56:58,2016-01-01 03:24:00,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11373,87-14 57 ROAD,57 ROAD,SEABURY STREET,HOFFMAN DRIVE,ADDRESS,Closed,2016-01-01 07:56:00,The Police Department responded and upon arriv...,2016-01-01 03:24:00,04 QUEENS,16.0
15623,32310363,2015-12-31 23:59:45,2016-01-01 00:55:00,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10034,71 VERMILYEA AVENUE,VERMILYEA AVENUE,ACADEMY STREET,WEST 204 STREET,ADDRESS,Closed,2016-01-01 07:59:00,The Police Department responded and upon arriv...,2016-01-01 00:55:00,12 MANHATTAN,16.0


### Step 4: Filtering Practice Questions to Explore `df_311`

Use the dataset `df_311` to answer the following questions by applying **filtering techniques** covered in this module.

1. **Boolean indexing**:  
   - Show all rows where the status is `"Closed"`.

2. **isin() for multiple values**:  
   - Show complaints where the complaint type is either `"Illegal Parking"`, `"Noise - Street/Sidewalk"`, or `"Blocked Driveway"`.

3. **Multiple conditions**:  
   - Show `"Blocked Driveway"` complaints where the incident ZIP is either `10007` or `10307`.

4. **String method – startswith()**:  
   - Find all complaints where the street name starts with the letter `'E'`.

5. **iloc[]**:  
   - Display the first 3 rows and the 2nd to 5th columns using integer-based indexing.

6. **loc[]**:  
   - Use label-based selection to show `Complaint Type`, `Created Date`, and `Status` for ZIP code `11373`.

7. **between()**:  
   - Filter complaints where the `Created Date` falls between **October 1, 2015** and **October 2, 2015**.


In [71]:
# 1. Show all rows where the status is "Closed".
df_311[df_311['Status'] == 'Closed']

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,Descriptor Length
87211,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,ELLERY STREET,ADDRESS,Closed,2015-03-29 08:33:03,The Police Department responded to the complai...,2015-03-29 03:40:20,03 BROOKLYN,16.0
34299,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,MALCOLM X BOULEVARD,ADDRESS,Closed,2015-03-29 08:35:28,The Police Department responded to the complai...,2015-03-29 04:14:27,03 BROOKLYN,16.0
50645,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,WEST 14 STREET,ADDRESS,Closed,2015-03-29 08:37:15,The Police Department reviewed your complaint ...,2015-03-29 01:02:39,02 MANHATTAN,16.0
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN,15.0
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN,15.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47242,32306260,2015-12-31 23:50:57,2016-01-01 10:58:00,Blocked Driveway,No Access,Street/Sidewalk,10453,1770 UNDERCLIFF AVENUE,UNDERCLIFF AVENUE,WEST 176 STREET,SEDGWICK AVENUE,ADDRESS,Closed,2016-01-01 07:50:00,The Police Department responded to the complai...,2016-01-01 10:58:00,05 BRONX,9.0
77179,32305071,2015-12-31 23:52:58,2016-01-01 07:41:00,Blocked Driveway,No Access,Street/Sidewalk,11372,34-06 73 STREET,73 STREET,34 AVENUE,35 AVENUE,ADDRESS,Closed,2016-01-01 07:52:00,The Police Department responded and upon arriv...,2016-01-01 07:41:00,03 QUEENS,9.0
41975,32306559,2015-12-31 23:55:32,2016-01-01 01:53:00,Illegal Parking,Blocked Hydrant,Street/Sidewalk,10032,524 WEST 169 STREET,WEST 169 STREET,AMSTERDAM AVENUE,AUDUBON AVENUE,ADDRESS,Closed,2016-01-01 07:55:00,The Police Department issued a summons in resp...,2016-01-01 01:53:00,12 MANHATTAN,15.0
80818,32306529,2015-12-31 23:56:58,2016-01-01 03:24:00,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11373,87-14 57 ROAD,57 ROAD,SEABURY STREET,HOFFMAN DRIVE,ADDRESS,Closed,2016-01-01 07:56:00,The Police Department responded and upon arriv...,2016-01-01 03:24:00,04 QUEENS,16.0


In [72]:
# show all unique values in the 'Status' column and their counts
df_311['Status'].value_counts()


Status
Closed      89362
Open           18
Assigned       15
Draft           1
Name: count, dtype: int64

In [73]:
# 2. Show complaints where the complaint type is either "Illegal Parking", "Noise - Street/Sidewalk", or "Blocked Driveway".
df_311[df_311['Complaint Type'].isin(['Illegal Parking', 'Noise - Street/Sidewalk', 'Blocked Driveway'])].shape

(59810, 18)

In [74]:
# 3. Show "Blocked Driveway" complaints where the incident ZIP is either 10007 or 10307.
df_311[(df_311['Complaint Type'] == 'Blocked Driveway') & (df_311['Incident Zip'].isin([10007, 10307]))].shape


(9, 18)

In [75]:
# 4. Find all complaints where the street name starts with the letter 'E'.
df_311[df_311['Street Name'].str.startswith('E', na=False)]

(index),Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Address Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,Descriptor Length
19065,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:43:16,The Police Department responded and upon arriv...,2015-03-29 04:25:50,08 MANHATTAN,15.0
116,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,YORK AVENUE,ADDRESS,Closed,2015-03-29 08:49:27,The Police Department responded and upon arriv...,2015-03-29 04:25:53,08 MANHATTAN,15.0
72193,30280817,2015-03-29 00:57:25,2015-03-29 02:25:31,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11203,196 EAST 51 STREET,EAST 51 STREET,WINTHROP STREET,CLARKSON AVENUE,ADDRESS,Closed,2015-03-29 08:57:25,The Police Department responded to the complai...,2015-03-29 02:25:31,17 BROOKLYN,16.0
8400,30281419,2015-03-29 01:12:31,2015-03-29 01:47:54,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10003,100 EAST 20 STREET,EAST 20 STREET,PARK AVENUE SOUTH,GRAMERCY PARK,ADDRESS,Closed,2015-03-29 09:12:31,Your request can not be processed at this time...,2015-03-29 01:47:54,06 MANHATTAN,16.0
83905,30280099,2015-03-29 01:33:12,2015-03-29 04:50:25,Noise - Vehicle,Car/Truck Horn,Street/Sidewalk,10029,169 EAST EAST 111 STREET,EAST 111 STREET,LEXINGTON AVENUE,3 AVENUE,ADDRESS,Closed,2015-03-29 09:33:12,The Police Department issued a summons in resp...,2015-03-29 04:50:25,11 MANHATTAN,14.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6135,32305164,2015-12-31 13:40:53,2015-12-31 19:37:52,Noise - Commercial,Loud Music/Party,Store/Commercial,10029,112 EAST 116 ST-MARIN BOULEVARD,EAST 116 ST-MARIN BOULEVARD,PARK AVENUE,LEXINGTON AVENUE,ADDRESS,Closed,2015-12-31 21:40:53,The Police Department responded to the complai...,2015-12-31 19:37:52,11 MANHATTAN,16.0
85438,32307008,2015-12-31 14:43:17,2015-12-31 15:40:34,Blocked Driveway,No Access,Street/Sidewalk,11213,1231 EASTERN PARKWAY,EASTERN PARKWAY,ROCHESTER AVENUE,BUFFALO AVENUE,ADDRESS,Closed,2015-12-31 22:43:17,The Police Department responded and upon arriv...,2015-12-31 15:40:34,08 BROOKLYN,9.0
34787,32306573,2015-12-31 17:08:07,2015-12-31 18:57:26,Illegal Parking,Blocked Sidewalk,Street/Sidewalk,11210,622 EAST 28 STREET,EAST 28 STREET,FARRAGUT ROAD,FLATBUSH AVENUE,ADDRESS,Closed,2016-01-01 01:08:00,The Police Department responded to the complai...,2015-12-31 18:57:26,14 BROOKLYN,16.0
83902,32308422,2015-12-31 21:12:43,2015-12-31 22:50:08,Blocked Driveway,No Access,Street/Sidewalk,11236,1319 EAST 85 STREET,EAST 85 STREET,AVENUE M,AVENUE N,ADDRESS,Closed,2016-01-01 05:12:00,The Police Department responded and upon arriv...,2015-12-31 22:50:08,18 BROOKLYN,9.0
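`Street Name` has missing values (76111 non-null out of 89396 in `info()`), which is why the cell passes `na=False`. Depending on the pandas version and the column's dtype, `str.startswith` may otherwise return `NaN` for missing entries, and a mask containing `NaN` cannot be used for boolean indexing. A small sketch with made-up values:

```python
import pandas as pd

# Made-up 'Street Name' values; None mimics rows with no street (e.g. INTERSECTION records)
streets = pd.Series(["EAST 86 STREET", None, "WEST 141 STREET", "EASTERN PARKWAY"])

# na=False maps missing values to False, so the result is a clean boolean mask
mask = streets.str.startswith("E", na=False)
print(streets[mask])
```

Note that a prefix test also matches names like `EASTERN PARKWAY`, which is consistent with the output above.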


In [76]:
# 5. Display the first 3 rows and the 2nd to 5th columns using integer-based indexing.
df_311.iloc[0:3, 1:5]


(index),Created Date,Closed Date,Complaint Type,Descriptor
87211,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party
34299,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party
50645,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party


In [77]:
# 6. Show Complaint Type, Created Date, and Status for ZIP code 11373 using label-based selection.
df_311.loc[df_311['Incident Zip'] == 11373, ['Complaint Type', 'Created Date', 'Status']]

(index),Complaint Type,Created Date,Status
50242,Noise - Vehicle,2015-03-29 16:31:02,Closed
71983,Illegal Parking,2015-03-31 07:15:04,Closed
26707,Blocked Driveway,2015-03-31 13:59:09,Closed
13233,Illegal Parking,2015-04-01 03:18:00,Closed
88572,Blocked Driveway,2015-04-02 17:02:00,Closed
...,...,...,...
5687,Illegal Parking,2015-12-31 17:23:20,Closed
12694,Blocked Driveway,2015-12-31 21:55:30,Closed
39687,Illegal Parking,2015-12-31 22:43:14,Closed
67979,Blocked Driveway,2015-12-31 22:52:24,Closed


In [78]:
# 7. Filter complaints where the Closed Date falls between October 1, 2015 and October 2, 2015.
df_311[(df_311['Closed Date'].between('2015-10-01', '2015-10-02'))].shape

(323, 18)
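Item 7 asks about `Created Date`, while the cell above filters on `Closed Date`. Also, `between()` includes both endpoints and `'2015-10-02'` means midnight, so those bounds cover October 1 plus only the first instant of October 2. A sketch of a window that covers both full days, using a half-open interval on hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps around the window (not taken from the dataset)
created = pd.Series(pd.to_datetime([
    "2015-09-30 23:59:59",
    "2015-10-01 00:00:00",
    "2015-10-02 18:30:00",
    "2015-10-03 00:00:00",
]))

# between() is inclusive on both ends; '2015-10-02' is midnight, so most of Oct 2 is missed
narrow = created.between("2015-10-01", "2015-10-02")

# Half-open window [Oct 1 00:00, Oct 3 00:00) covers all of Oct 1 and Oct 2
start, end = pd.Timestamp("2015-10-01"), pd.Timestamp("2015-10-03")
full_days = (created >= start) & (created < end)

print(narrow.tolist())     # [False, True, False, False]
print(full_days.tolist())  # [False, True, True, False]
```

On the notebook data the same pattern would be `df_311[(df_311['Created Date'] >= start) & (df_311['Created Date'] < end)]`; `between(start, end, inclusive='left')` (pandas 1.3+) expresses the same half-open window.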

### Step 5: Calculate Response Time in Hours

To analyze how long it takes to respond to a 311 complaint, we'll calculate the **response time** by subtracting the `Created Date` from the `Closed Date`.

- Subtracting two datetime columns gives a column of **Timedelta** values (dtype `timedelta64`).
- To convert each timedelta to a number of seconds, use `.dt.total_seconds()`.
- Finally, divide by 3600 to convert seconds to **hours**.

We will store the result, in hours, in a new column called `Response Time`.
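A worked check of these steps on the first row shown after sorting (index 87211): 00:33:03 to 03:40:20 is 3 h 7 min 17 s, or 11237 seconds, and 11237 / 3600 ≈ 3.121389 hours, matching the `Response Time` output below. The missing second `Closed Date` is hypothetical; it illustrates that an unclosed complaint yields `NaN`, which matters because `Closed Date` has missing values.

```python
import pandas as pd

# Created Date values from the first two sorted rows (index 87211 and 34299)
created = pd.Series(pd.to_datetime(["2015-03-29 00:33:03", "2015-03-29 00:35:28"]))
# First Closed Date is from row 87211; the second is set to missing for illustration
closed = pd.Series(pd.to_datetime(["2015-03-29 03:40:20", None]))

delta = closed - created                  # timedelta64 Series; missing Closed Date -> NaT
hours = delta.dt.total_seconds() / 3600   # seconds -> hours; NaT -> NaN
print(hours.round(6).tolist())            # [3.121389, nan]
```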

In [79]:
# Calculate response time by subtracting Created Date from Closed Date. It should be in hours.
df_311['Response Time'] = (df_311['Closed Date'] - df_311['Created Date']).dt.total_seconds() / 3600  # Convert to hours
# Show the first 5 rows of the DataFrame with the new 'Response Time' column.
df_311[['Created Date', 'Closed Date', 'Response Time']].head()

(index),Created Date,Closed Date,Response Time
87211,2015-03-29 00:33:03,2015-03-29 03:40:20,3.121389
34299,2015-03-29 00:35:28,2015-03-29 04:14:27,3.649722
50645,2015-03-29 00:37:15,2015-03-29 01:02:39,0.423333
19065,2015-03-29 00:43:16,2015-03-29 04:25:50,3.709444
116,2015-03-29 00:49:27,2015-03-29 04:25:53,3.607222


### Step 6: Analyze Response Times

1. **Filter long response times**  
   Identify complaints where the response time exceeded **24 hours**.

2. **Classify as 'Fast' or 'Slow'**  
   Based on `Response Time` (in hours), classify each complaint as:
   - `'Fast'` if response time is **6 hours or less**
   - `'Slow'` if it took **more than 6 hours**

This shows how quickly complaints were addressed.
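Both tasks reduce to one-liners. For the classification, `pd.cut` uses right-closed bins by default, so a bin edge at 0 excludes a response time of exactly 0, and a hard-coded upper edge excludes anything above it. The sketch below uses made-up response times chosen to sit on the boundaries:

```python
import pandas as pd

# Made-up response times in hours, chosen to hit the bin boundaries
rt = pd.Series([0.0, 3.12, 6.0, 6.5, 30.0])

# 1. Complaints that stayed open for more than 24 hours
long_waits = rt[rt > 24]

# 2. [0, 6] -> 'Fast', (6, inf) -> 'Slow'; include_lowest=True keeps exactly 0 hours
labels = pd.cut(rt, bins=[0, 6, float("inf")], labels=["Fast", "Slow"], include_lowest=True)

# With the default include_lowest=False the first bin is (0, 6], so 0.0 becomes NaN
default_labels = pd.cut(rt, bins=[0, 6, float("inf")], labels=["Fast", "Slow"])

print(long_waits.tolist())             # [30.0]
print(labels.tolist())                 # ['Fast', 'Fast', 'Fast', 'Slow', 'Slow']
print(default_labels.isna().tolist())  # [True, False, False, False, False]
```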

In [80]:
# max time to close a complaint
df_311['Response Time'].max()

223.37

In [81]:
# Filter complaints that took longer than 24 hours to close
df_311[df_311['Response Time'] > 24].shape

(1115, 19)

In [82]:
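# Classify responses with a 6-hour threshold: [0, 6] h -> 'Fast', more than 6 h -> 'Slow'.
# include_lowest=True keeps 0-hour responses and the open upper edge avoids hard-coding the max;
# missing (or negative) response times stay NaN.
df_311['Response Classification'] = pd.cut(df_311['Response Time'], bins=[0, 6, float('inf')],
                                           labels=['Fast', 'Slow'], include_lowest=True)
# Show 'Response Classification' and 'Response Time'
df_311[['Response Classification', 'Response Time']]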

(index),Response Classification,Response Time
87211,Fast,3.121389
34299,Fast,3.649722
50645,Fast,0.423333
19065,Fast,3.709444
116,Fast,3.607222
...,...,...
47242,Slow,11.117500
77179,Slow,7.800556
41975,Fast,1.957778
80818,Fast,3.450556


### Step 7: Average Response Time by Complaint Type

To understand which types of complaints take longer to resolve on average:

- Use `groupby()` on the `Complaint Type` column.
- Calculate the **mean** of `Response Time` (in hours) for each group.
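A possible extension, sketched on made-up data: reporting the number of complaints next to each mean makes it easier to spot averages that rest on only a handful of records. The `toy` frame is hypothetical.

```python
import pandas as pd

# Made-up complaints, not real 311 data
toy = pd.DataFrame({
    'Complaint Type': ['Graffiti', 'Graffiti', 'Noise - Park', 'Noise - Park', 'Noise - Park'],
    'Response Time':  [10.0, 6.0, 2.0, 3.0, 4.0],
})

# Mean and count per complaint type, slowest first
summary = (toy.groupby('Complaint Type')['Response Time']
              .agg(['mean', 'count'])
              .sort_values('mean', ascending=False))
print(summary)
```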

In [83]:
# Group complaints by Complaint Type and calculate the average response time for each type.
df_311.groupby('Complaint Type')['Response Time'].mean().sort_values(ascending=False)



Complaint Type
Graffiti                     7.888535
Derelict Vehicle             7.545651
Animal Abuse                 5.213205
Blocked Driveway             4.780392
Illegal Parking              4.451517
Homeless Encampment          4.392481
Vending                      3.941029
Bike/Roller/Skate Chronic    3.885261
Drinking                     3.843890
Noise - Vehicle              3.720984
Urinating in Public          3.631113
Noise - Street/Sidewalk      3.410908
Noise - House of Worship     3.374917
Noise - Park                 3.358954
Traffic                      3.335331
Panhandling                  3.318418
Illegal Fireworks            3.173807
Noise - Commercial           3.097158
Disorderly Youth             2.977696
Posting Advertisement        1.969223
Squeegee                     1.179167
Name: Response Time, dtype: float64

### Step 8: Load and Inspect ZIP Code Information

In this step, you'll load the ZIP code information dataset and inspect its structure. The file name is `zip_code_info.csv`.

- How many ZIP codes are listed?
- What columns are available?
- View the first few rows to understand the kind of information it provides (e.g., population, borough).
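"How many ZIP codes are listed?" has two answers when a ZIP can repeat: the number of rows and the number of distinct ZIPs. In the real file, `info()` reports 196 rows while `value_counts()` reports 191 distinct ZIPs. A sketch on a hypothetical stand-in table (all values made up) shows how to get both numbers and list the repeated rows:

```python
import pandas as pd

# Hypothetical stand-in for zip_code_info; ZIP 10463 is listed twice
zips = pd.DataFrame({
    'zip': [10001, 10463, 10463, 11368],
    'Borough': ['MANHATTAN', 'BRONX', 'BRONX', 'QUEENS'],
    'population': [29079.0, 70000.0, 70000.0, 107060.0],
})

n_rows = len(zips)                                # rows in the table
n_unique = zips['zip'].nunique()                  # distinct ZIP codes
dupes = zips[zips['zip'].duplicated(keep=False)]  # every row whose ZIP repeats

print(n_rows, n_unique)  # 4 3
print(dupes)
```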


In [84]:
# Load zip_code_info.csv into a pandas DataFrame
zip_code_info = pd.read_csv('zip_code_info.csv')

In [85]:
zip_code_info.describe()

Unnamed: 0,zip,population
count,196.0,196.0
mean,10807.494898,45629.239796
std,576.507482,27842.176745
min,10001.0,0.0
25%,10250.5,25796.5
50%,11105.5,42073.5
75%,11361.25,67503.75
max,11697.0,107060.0


In [86]:
zip_code_info.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 196 entries, 0 to 195
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   zip         196 non-null    int64  
 1   Borough     196 non-null    object 
 2   city        196 non-null    object 
 3   state_id    196 non-null    object 
 4   population  196 non-null    float64
dtypes: float64(1), int64(1), object(3)
memory usage: 7.8+ KB


In [87]:
zip_code_info['zip'].value_counts()  # Rows per ZIP code; a count of 2 means that ZIP is listed twice

zip
10463    2
11385    2
11208    2
11237    2
11414    2
        ..
10453    1
10454    1
10455    1
10456    1
11697    1
Name: count, Length: 191, dtype: int64

### Step 9: Find Complaints in the Most Populated ZIP Code

To understand how complaints are distributed in areas with dense populations:

1. Identify the ZIP code with the **highest population** from `zip_code_info`.
2. Filter the complaints DataFrame to include only those that occurred in that ZIP.
3. Get the total number of complaints in that zip code.
4. Display a few sample complaints from that ZIP.
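The four steps above can be sketched on hypothetical stand-in tables (`zips` and `complaints` are made up). `idxmax()` returns the row label of the largest population (the first one if there is a tie), so `.loc` can read that row's ZIP directly. This assumes `Incident Zip` and `zip` share a dtype; if one were stored as text, the comparison would match nothing.

```python
import pandas as pd

# Hypothetical stand-ins for zip_code_info and df_311
zips = pd.DataFrame({'zip': [10001, 11368, 11206],
                     'population': [29079.0, 107060.0, 89231.0]})
complaints = pd.DataFrame({
    'Unique Key': [1, 2, 3, 4],
    'Incident Zip': [11368, 10001, 11368, 11206],
    'Complaint Type': ['Blocked Driveway', 'Noise - Park', 'Illegal Parking', 'Graffiti'],
})

# 1. ZIP with the highest population
top_zip = zips.loc[zips['population'].idxmax(), 'zip']

# 2. Boolean mask keeps only complaints from that ZIP
in_top_zip = complaints[complaints['Incident Zip'] == top_zip]

# 3. Total complaints there, and 4. a few sample rows
print(top_zip, len(in_top_zip))  # 11368 2
print(in_top_zip.head())
```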

In [88]:
zip_code_info[zip_code_info['population']==zip_code_info['population'].max()]

Unnamed: 0,zip,Borough,city,state_id,population
153,11368,QUEENS,Corona,NY,107060.0


In [None]:
zip_max_pop = zip_code_info.loc[zip_code_info['population'].idxmax(), 'zip']  # ZIP code with the highest population

In [93]:
df_311[df_311['Incident Zip'] == zip_max_pop].shape[0]  # Number of complaints in the most populated ZIP code

1244

### Step 10: Merge Complaint Data with ZIP Code Demographics

We want to enrich the 311 dataset (`df_311`) with additional information from `zip_code_info`, such as population and borough name.

Here's what we do:

- Use `.merge()` to combine `df_311` and `zip_code_info`.
- Match `Incident Zip` from the 311 data with `zip` from the ZIP info.
- Use a **left join** to:
  - Keep all rows from `df_311` (even if some ZIPs don’t match).
  - Add population and borough data **only where a match exists**.

After merging:
- You'll see all columns from `zip_code_info`.
- The `zip` column (from `zip_code_info`) is redundant because it duplicates `Incident Zip`, so drop it (e.g., with `.drop(columns='zip')`).

In [94]:
df_merged = pd.merge(left=df_311, right=zip_code_info, left_on='Incident Zip', right_on='zip', how='left')  # Merge DataFrames on 'Incident Zip' and 'zip'  
df_merged.head()  # Show the first 5 rows of the merged DataFrame

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,...,Resolution Action Updated Date,Community Board,Descriptor Length,Response Time,Response Classification,zip,Borough,city,state_id,population
0,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,...,2015-03-29 03:40:20,03 BROOKLYN,16.0,3.121389,Fast,11206.0,BROOKLYN,Brooklyn,NY,89231.0
1,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,...,2015-03-29 04:14:27,03 BROOKLYN,16.0,3.649722,Fast,11233.0,BROOKLYN,Brooklyn,NY,82711.0
2,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,...,2015-03-29 01:02:39,02 MANHATTAN,16.0,0.423333,Fast,10014.0,MANHATTAN,New York,NY,29772.0
3,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,...,2015-03-29 04:25:50,08 MANHATTAN,15.0,3.709444,Fast,10028.0,MANHATTAN,New York,NY,45679.0
4,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,...,2015-03-29 04:25:53,08 MANHATTAN,15.0,3.607222,Fast,10028.0,MANHATTAN,New York,NY,45679.0
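Step 8 showed that some ZIPs (e.g., 10463 and 11385) appear twice in `zip_code_info`. In a left join, each complaint from such a ZIP matches two lookup rows and is therefore duplicated, which would inflate later complaint counts. A sketch on hypothetical stand-in tables shows the effect, one way to guard against it (`drop_duplicates` plus `merge(..., validate='many_to_one')`), and the drop of the redundant `zip` column. Keeping the first row per ZIP is an assumption; inspect the duplicates first in case they carry different boroughs or populations.

```python
import pandas as pd

# Hypothetical stand-ins; ZIP 11206 is listed twice in the lookup table
complaints = pd.DataFrame({'Unique Key': [1, 2, 3],
                           'Incident Zip': [11206, 11206, 10000]})
zips = pd.DataFrame({'zip': [11206, 11206, 10014],
                     'Borough': ['BROOKLYN', 'BROOKLYN', 'MANHATTAN'],
                     'population': [89231.0, 89231.0, 29772.0]})

# Naive left join: each complaint repeats once per matching lookup row
naive = complaints.merge(zips, left_on='Incident Zip', right_on='zip', how='left')
print(len(naive))  # 5 rows for 3 complaints

# One row per ZIP, and let pandas check the join really is many-to-one
zips_unique = zips.drop_duplicates(subset='zip')
merged = complaints.merge(zips_unique, left_on='Incident Zip', right_on='zip',
                          how='left', validate='many_to_one')

# 'zip' only repeats 'Incident Zip' (NaN where nothing matched), so drop it
merged = merged.drop(columns='zip')
print(len(merged), merged.columns.tolist())
```

If the lookup table still contains a repeated key, `validate='many_to_one'` raises a `MergeError` instead of silently duplicating rows.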


### Step 11: Map Borough Names to Abbreviations

To simplify analysis and plots, we'll map full borough names to shorter codes using a dictionary.

- Use the `.map()` function on the `Borough` column.
- Provide a dictionary where keys are full names and values are abbreviations.
- Store the result in a new column called `Borough Code`.

Example:
- `"MANHATTAN"` → `"MH"`
- `"BROOKLYN"` → `"BK"`
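A short sketch of `.map()` with a dictionary on hypothetical borough values. Any value that is not a key in the dictionary (a missing borough from an unmatched ZIP, or a different capitalization) comes back as NaN. Normalizing case with `.str.upper()` first is an optional precaution, not something the original data is known to need.

```python
import pandas as pd

borough_map = {'MANHATTAN': 'MH', 'BROOKLYN': 'BK', 'QUEENS': 'QN',
               'BRONX': 'BX', 'STATEN ISLAND': 'SI'}

# Hypothetical values, including a missing one and a mixed-case one
boroughs = pd.Series(['MANHATTAN', 'BROOKLYN', None, 'Queens'])

# Keys not found in the dict, and missing values, map to NaN
codes = boroughs.map(borough_map)
print(codes.tolist())  # ['MH', 'BK', nan, nan]

# Upper-casing first lets 'Queens' match the 'QUEENS' key
codes_clean = boroughs.str.upper().map(borough_map)
print(codes_clean.tolist())  # ['MH', 'BK', nan, 'QN']
```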

In [95]:
borough_map = {'MANHATTAN': 'MH', 'BROOKLYN': 'BK', 'QUEENS': 'QN', 'BRONX': 'BX', 'STATEN ISLAND': 'SI'}
# Your code here

In [97]:
df_merged['Borough Code'] = df_merged['Borough'].map(borough_map)  # Map borough names to codes
df_merged.head()  # Show the first 5 rows of the DataFrame with the new 'Borough Code' column

Unnamed: 0,Unique Key,Created Date,Closed Date,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,...,Community Board,Descriptor Length,Response Time,Response Classification,zip,Borough,city,state_id,population,Borough Code
0,30283424,2015-03-29 00:33:03,2015-03-29 03:40:20,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,11206,162 THROOP AVENUE,THROOP AVENUE,HOPKINS STREET,...,03 BROOKLYN,16.0,3.121389,Fast,11206.0,BROOKLYN,Brooklyn,NY,89231.0,BK
1,30283432,2015-03-29 00:35:28,2015-03-29 04:14:27,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11233,120 CHAUNCEY STREET,CHAUNCEY STREET,STUYVESANT AVENUE,...,03 BROOKLYN,16.0,3.649722,Fast,11233.0,BROOKLYN,Brooklyn,NY,82711.0,BK
2,30280732,2015-03-29 00:37:15,2015-03-29 01:02:39,Noise - Commercial,Loud Music/Party,Club/Bar/Restaurant,10014,22 9 AVENUE,9 AVENUE,WEST 13 STREET,...,02 MANHATTAN,16.0,0.423333,Fast,10014.0,MANHATTAN,New York,NY,29772.0,MH
3,30280506,2015-03-29 00:43:16,2015-03-29 04:25:50,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,415 EAST 86 STREET,EAST 86 STREET,1 AVENUE,...,08 MANHATTAN,15.0,3.709444,Fast,10028.0,MANHATTAN,New York,NY,45679.0,MH
4,30281090,2015-03-29 00:49:27,2015-03-29 04:25:53,Noise - Vehicle,Car/Truck Music,Street/Sidewalk,10028,420 EAST 86 STREET,EAST 86 STREET,1 AVENUE,...,08 MANHATTAN,15.0,3.607222,Fast,10028.0,MANHATTAN,New York,NY,45679.0,MH


### Step 12: Calculate Total Complaints and Complaints per 1,000 Residents

We want to understand how complaint volume compares across ZIP codes, adjusting for population size.

#### Goal:
- Count how many 311 complaints were made **per ZIP code**.
- Normalize by population to get **complaints per 1,000 residents**, calculated as total complaints divided by population, multiplied by 1,000.

#### How?
1. Use `.groupby()` to group data by ZIP.
2. Use `.agg()` to compute:
   - The total number of complaints using `count` on the `Unique Key`.
   - The population using the `first` non-null value from the `population` column.
   
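   You can pass this as a dictionary:
   ```python
   .agg({'column name 1': 'aggregation function 1', 'column name 2': 'aggregation function 2', ...})
   ```
   Alternatively, use named aggregation (`new_name=('column', 'function')`), as in the cell below, which also lets you choose the output column names.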


In [102]:
complaint_by_zip = df_merged.groupby('Incident Zip').agg(
    total_complaints=('Unique Key', 'count'),
    pop=('population', 'first')
).reset_index()
complaint_by_zip

Unnamed: 0,Incident Zip,total_complaints,pop
0,10000,24,
1,10001,380,29079.0
2,10002,857,75517.0
3,10003,811,53825.0
4,10004,128,3875.0
...,...,...,...
185,11691,218,68704.0
186,11692,75,23247.0
187,11693,113,13066.0
188,11694,218,21430.0
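A sketch of the full calculation on hypothetical rows. It reflects two things visible in the real data: ZIP 10000 has no population after the merge, and `zip_code_info.describe()` reports a minimum population of 0.0, which would produce `inf` when divided into. Treating a population of 0 as unknown (NaN) is one possible choice, labeled here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows: ZIP 10000 is unmatched (NaN), ZIP 99999 has population 0
merged = pd.DataFrame({
    'Unique Key':   [1, 2, 3, 4, 5],
    'Incident Zip': [10001, 10001, 10004, 10000, 99999],
    'population':   [29079.0, 29079.0, 3875.0, np.nan, 0.0],
})

# Named aggregation: output_column=(input_column, function)
by_zip = merged.groupby('Incident Zip').agg(
    total_complaints=('Unique Key', 'count'),
    pop=('population', 'first'),  # first non-null population in each group
).reset_index()

# complaints per 1,000 = total / population * 1,000; population 0 treated as unknown
by_zip['complaints_per_1000'] = (
    by_zip['total_complaints'] / by_zip['pop'].replace(0, np.nan) * 1000
)
print(by_zip)
```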


In [104]:
complaint_by_zip['complaints_per_1000'] = (complaint_by_zip['total_complaints'] / complaint_by_zip['pop']) * 1000
complaint_by_zip

Unnamed: 0,Incident Zip,total_complaints,pop,complaints_per_1000
0,10000,24,,
1,10001,380,29079.0,13.067850
2,10002,857,75517.0,11.348438
3,10003,811,53825.0,15.067348
4,10004,128,3875.0,33.032258
...,...,...,...,...
185,11691,218,68704.0,3.173032
186,11692,75,23247.0,3.226223
187,11693,113,13066.0,8.648400
188,11694,218,21430.0,10.172655


### Step 13: How to JOIN the DataFrames?

- `.join()` is mainly used to **combine two DataFrames by their index**.
- It works best when both DataFrames have a meaningful index (e.g., ZIP code).

For this question, you already have two DataFrames:
- `zip_code_info`: demographic data per ZIP.
- `complaint_by_zip`: total complaints per ZIP, created in Step 12.

**Index both of them by ZIP code**: set `Incident Zip` as the index of `complaint_by_zip` (call the result `zip_summary`) and `zip` as the index of `zip_code_info` (call the result `zip_info_indexed`).

**Task:**
1. Use `zip_summary.join()` with a **left join** to add demographic info only for ZIPs that appear in the complaint summary.
2. Use `.join()` again with **how='right'** to include **all ZIPs** from `zip_code_info`, even those with no complaints.
3. Compare the sizes of the two results.

In [None]:
# Your code here
# zip_summary = ?
# zip_info_indexed = ?
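One way to approach the join task, sketched on hypothetical toy tables (`toy_summary`, `toy_info_indexed`, and all values are made up, and named so they do not overwrite the notebook's variables). `.join()` aligns on index values, so the two indexes do not need the same name. As with `.merge()`, a ZIP repeated in the lookup index would repeat the matching complaint row.

```python
import pandas as pd

# Hypothetical stand-ins for complaint_by_zip and zip_code_info
toy_complaints = pd.DataFrame({'Incident Zip': [10001, 10004],
                               'total_complaints': [380, 128]})
toy_zips = pd.DataFrame({'zip': [10001, 10004, 10005],
                         'Borough': ['MANHATTAN', 'MANHATTAN', 'MANHATTAN'],
                         'population': [29079.0, 3875.0, 7000.0]})

# Index both tables by ZIP so .join() can align on the index
toy_summary = toy_complaints.set_index('Incident Zip')
toy_info_indexed = toy_zips.set_index('zip')

# Left join: only ZIPs that have complaints
left = toy_summary.join(toy_info_indexed, how='left')
# Right join: every ZIP in the lookup table, even with no complaints
right = toy_summary.join(toy_info_indexed, how='right')

print(left.shape, right.shape)  # (2, 3) (3, 3)
print(right)
```

ZIP 10005 appears only in the right join, with `total_complaints` set to NaN.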


### Step 14: Analyze Complaint Types Across Boroughs with a Pivot Table

To summarize and compare complaint volumes across boroughs and types, we use a **pivot table**. Pivot tables are used when you want to **summarize grouped data** in a 2D format for easy comparison.

#### What is a pivot table?
A pivot table reshapes data:
- **Rows (`index`)** represent categories you want to group by (e.g., `Borough Code`).
- **Columns (`columns`)** represent subcategories (e.g., `Complaint Type`).
- **Values (`values`)** are the numbers you want to compute (e.g., count of complaints).
- **aggfunc** defines what to compute: `count`, `sum`, `mean`, etc.

#### In this case:
- We group complaints by **borough code**.
- For each borough, we count the number of complaints of each **type**.
- We use `fill_value=0` to replace missing combinations with zero (no complaints of that type).
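A minimal sketch of the pivot described above, on made-up complaints (the `toy` frame is hypothetical). Counting `Unique Key` works because every complaint has one, so the count equals the number of complaints in each borough/type cell.

```python
import pandas as pd

# Made-up complaints, not real 311 data
toy = pd.DataFrame({
    'Unique Key':     [1, 2, 3, 4, 5],
    'Borough Code':   ['BK', 'BK', 'MH', 'MH', 'QN'],
    'Complaint Type': ['Noise - Park', 'Graffiti', 'Noise - Park', 'Noise - Park', 'Graffiti'],
})

pivot = pd.pivot_table(
    toy,
    index='Borough Code',      # one row per borough
    columns='Complaint Type',  # one column per complaint type
    values='Unique Key',       # column whose non-null entries are counted
    aggfunc='count',
    fill_value=0,              # borough/type pairs with no complaints become 0
)
print(pivot)
```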


### Step 15: Convert Pivot Table to Long Format with `melt()`

We previously created a **pivot table** to compare complaint counts by borough and type. Now, we'll use `melt()` to convert that wide-format table back into a **long-format** DataFrame.


#### What is `melt()`?

- `melt()` is used to **unpivot** or **flatten** a DataFrame.
- It turns columns into rows, which is useful when:
  - You want to plot or analyze categorical data more easily.
  - You need a **tidy format**: one row per observation.


#### In this case:
- The pivot table stores `Borough Code` in its index, so call `.reset_index()` first; `id_vars='Borough Code'` then keeps that column as-is.
- All other columns (complaint types) become values in a new column: `'Complaint Type'`.
- Their counts go into `'Complaint Count'`.

This is useful for **grouped bar plots**, heatmaps, or exporting clean data.
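A sketch of the reshape on a small wide table built to look like the pivot (the counts are made up). Because `Borough Code` sits in the pivot's index, `.reset_index()` turns it back into a column that `melt()` can keep via `id_vars`.

```python
import pandas as pd

# A small wide table shaped like the pivot (made-up counts)
pivot = pd.DataFrame(
    {'Graffiti': [1, 0, 1], 'Noise - Park': [1, 2, 0]},
    index=pd.Index(['BK', 'MH', 'QN'], name='Borough Code'),
)

# Move Borough Code from the index into a column, then unpivot
long_df = pivot.reset_index().melt(
    id_vars='Borough Code',        # kept as-is on every row
    var_name='Complaint Type',     # former column names go here
    value_name='Complaint Count',  # former cell values go here
)
print(long_df)
```

The wide table has 3 boroughs × 2 types, so the long table has 6 rows, one per borough/type pair.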

### Step 16: Save Complaint Summary and ZIP Info for Visualization

We've computed the **total complaints and complaints per 1,000 residents** by ZIP code (`complaint_by_zip`). Now we'll save this summary to a file for future use in visualizations (e.g., maps, bar charts).

Do these:
- Save `complaint_by_zip` to a **CSV file** for general use. Remember to deal with the index.
- Also use the given code to export both `complaint_by_zip` and the full `zip_code_info` DataFrame to an **Excel file** with two separate sheets.

**Note:** To use `pd.ExcelWriter` for saving `.xlsx` files, you need to have `openpyxl` library installed. If you get an error for that code cell, you can install it using the command:  
`pip install openpyxl`


In [None]:
# Save to CSV
# Your code here
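A sketch of the CSV round trip on a hypothetical stand-in for `complaint_by_zip` (values copied from the Step 12 output; the file name and temporary directory are assumptions). With a default 0..n-1 index, `index=False` avoids writing it out, which is what otherwise reappears as an `Unnamed: 0` column on reload. If the summary were indexed by ZIP instead, keeping the index (the default) would be the right choice.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-in for complaint_by_zip (values from the Step 12 output)
summary = pd.DataFrame({'Incident Zip': [10001, 10004],
                        'total_complaints': [380, 128],
                        'pop': [29079.0, 3875.0]})

# index=False leaves out the default RangeIndex, which carries no information
path = os.path.join(tempfile.mkdtemp(), 'complaints_by_zip.csv')
summary.to_csv(path, index=False)

# Reading it back gives the same columns, with no stray 'Unnamed: 0'
reloaded = pd.read_csv(path)
print(reloaded.columns.tolist())  # ['Incident Zip', 'total_complaints', 'pop']
```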

In [None]:
# Save both DataFrames to separate sheets in one Excel file
with pd.ExcelWriter("zip_complaints_summary.xlsx") as writer:
    complaint_by_zip.to_excel(writer, sheet_name="Complaint Summary", index=False)
    zip_code_info.to_excel(writer, sheet_name="ZIP Info", index=False)

# Bonus Section: Performance Comparison: Vectorized Operations vs. apply() vs. iterrows()

When working with pandas, **how** you write your operations can make a huge difference in performance — especially on large datasets.

In this example, we compare three common ways to assign a new column based on `Response Time` (in hours). We have already solved this problem, but let's revisit it:

**Question was**: Label each row as `'Fast'` if the response time is 6 hours or less, and `'Slow'` otherwise.

You can solve this in different ways:

#### 1. `np.where` – Vectorized
#### 2. `apply()` with a lambda
#### 3. `iterrows()` – Avoid if possible. This is like a for loop over the rows of the DataFrame

**`time.time()` is used to measure how long a block of code takes to run.**

`np.where` works like this, and you can assign its result to a column:
```python
np.where(df['col'] <= x, 'A', 'B')
```

`iterrows()` loops over a DataFrame row by row, returning each row as a (index, Series) pair.

Use only when absolutely necessary (e.g., complex logic that can't be vectorized) as this is very time-consuming.

```python
for index, row in df.iterrows():
    # Access values like a dictionary
    value = row['ColumnName']
    # Perform operations here
```


In [None]:
import time
import numpy as np

# 1. Vectorized with np.where
start = time.time()
df_311['Speed Category'] = np.where(df_311['Response Time'] <= 6, 'Fast', 'Slow')
print("np.where time:", round(time.time() - start, 4), "seconds")

# 2. apply() with lambda
start = time.time()
df_311['Speed Category'] = df_311['Response Time'].apply(lambda x: 'Fast' if x <= 6 else 'Slow')
print("apply() time:", round(time.time() - start, 4), "seconds")

# 3. iterrows()
start = time.time()
speed_labels = []
for _, row in df_311.iterrows():
    speed_labels.append('Fast' if row['Response Time'] <= 6 else 'Slow')
df_311['Speed Category'] = speed_labels
print("iterrows() time:", round(time.time() - start, 4), "seconds")
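A timing comparison is only meaningful if the methods produce the same labels. A sketch on hypothetical response times that checks this before timing; it swaps in `time.perf_counter()`, which the Python docs recommend for measuring short durations. Timings vary by machine, so no expected numbers are shown.

```python
import time
import numpy as np
import pandas as pd

# Hypothetical response times in hours, repeated so timings are measurable
hours = pd.Series([0.5, 6.0, 6.5, 30.0] * 1000)

start = time.perf_counter()
vec = np.where(hours <= 6, 'Fast', 'Slow')
t_vec = time.perf_counter() - start

start = time.perf_counter()
app = hours.apply(lambda x: 'Fast' if x <= 6 else 'Slow')
t_app = time.perf_counter() - start

# Both methods must agree before their speeds are worth comparing
assert vec.tolist() == app.tolist()
print(f"np.where: {t_vec:.5f} s, apply: {t_app:.5f} s")
```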

### A Synthetic DataFrame

To show the difference made by vectorized operations more clearly, we repeat the comparison on a large DataFrame with one million rows.

In [None]:
import pandas as pd
import numpy as np
import time

# Create a large DataFrame
df_test = pd.DataFrame({'value': np.arange(1000000)})

# 1. Vectorized
start = time.time()
df_test['squared_vec'] = df_test['value'] ** 2
print("Vectorized time:", round(time.time() - start, 4), "seconds")

# 2. apply() with lambda
start = time.time()
df_test['squared_apply'] = df_test['value'].apply(lambda x: x ** 2)
print("apply() time:", round(time.time() - start, 4), "seconds")

# 3. iterrows() (expect this loop to take far longer than the two methods above)
start = time.time()
squared = []
for _, row in df_test.iterrows():
    squared.append(row['value'] ** 2)
df_test['squared_iterrows'] = squared
print("iterrows() time:", round(time.time() - start, 4), "seconds")
