Using the same nyc-ticket-violation dataset, let’s assume that a ticket is dismissed unless the license plate, state, and street name are all present; the make of car is not required. Remove rows that are missing one or more of these. How many rows remain? Assuming $100/ticket, how much money would the city lose as a result of this?

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Loading only the four columns of interest from the NYC parking violations dataset

nyc_park_violation_data = pd.read_csv('nyc-parking-violations-2020.csv', 
                                      usecols=["Plate ID", "Registration State", "Street Name", "Vehicle Make"])

In [3]:
# Summarizing the loaded columns: row count, dtypes, and memory usage

nyc_park_violation_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12495734 entries, 0 to 12495733
Data columns (total 4 columns):
 #   Column              Dtype 
---  ------              ----- 
 0   Plate ID            object
 1   Registration State  object
 2   Vehicle Make        object
 3   Street Name         object
dtypes: object(4)
memory usage: 381.3+ MB
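
The column order in the `info()` output follows the file, not the order of the `usecols` list. A minimal sketch of that behavior, using a small made-up CSV held in memory (the sample rows and the extra `Summons Number` column are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real file
csv_text = """Summons Number,Plate ID,Registration State,Vehicle Make,Street Name
1,J58JKX,NJ,HONDA,43 ST
2,KRE6058,PA,ME/BE,UNION ST
3,FMY9090,NY,JEEP,GRAND ST
"""

# Same column list, in the same order, as in the cell above
wanted = ["Plate ID", "Registration State", "Street Name", "Vehicle Make"]
sample = pd.read_csv(io.StringIO(csv_text), usecols=wanted)

# usecols selects columns but does not reorder them
column_order = list(sample.columns)
print(column_order)
```

If a specific order matters, indexing the result with the list (`sample[wanted]`) reorders the columns explicitly.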


In [4]:
nyc_park_violation_data

          Plate ID  Registration State  Vehicle Make      Street Name
0           J58JKX                  NJ         HONDA            43 ST
1          KRE6058                  PA         ME/BE         UNION ST
2          444326R                  NJ         LEXUS  CLERMONT AVENUE
3          F728330                  OH         CHEVR     DIVISION AVE
4          FMY9090                  NY          JEEP         GRAND ST
...            ...                 ...           ...              ...
12495729   62161MM                  NY          FORD          3RD AVE
12495730   GYE7330                  NY         HONDA   PELHAM PARK DR
12495731   HNY4802                  NY          FORD        LYDIG AVE
12495732  T687081C                  NY         TOYOT      E 68 STREET


### QUESTION 1 - Removing missing data from dataframe

In [7]:
# Counting the missing values in each column

nyc_park_violation_data.isnull().sum()


Plate ID                202
Registration State        0
Vehicle Make          62420
Street Name            1417
dtype: int64
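
The per-column counts above cannot simply be added together to get the number of affected rows, because a row that is missing two fields is counted twice. A small sketch on made-up data (the toy values below are assumptions, not rows from the dataset) shows the difference between counting missing cells and counting rows with at least one missing value:

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 is missing two fields, row 2 is missing one
toy = pd.DataFrame({
    "Plate ID": ["A1", np.nan, "C3", "D4"],
    "Registration State": ["NY", "NJ", "NY", "PA"],
    "Vehicle Make": ["FORD", np.nan, np.nan, "JEEP"],
    "Street Name": ["1 ST", "2 ST", "3 ST", "4 ST"],
})

# Total number of missing cells across all columns
missing_cells = int(toy.isnull().sum().sum())

# Number of rows that have at least one missing cell
rows_affected = int(toy.isnull().any(axis=1).sum())

print(missing_cells, rows_affected)
```

The second figure is the one that matches how many rows `dropna()` removes.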

In [6]:
# Keeping only rows with no missing values (dropna() checks all four columns, including Vehicle Make)

rows_with_data = nyc_park_violation_data.dropna()

rows_with_data.isnull().sum()

Plate ID              0
Registration State    0
Vehicle Make          0
Street Name           0
dtype: int64
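
Calling `dropna()` with no arguments also discards rows whose only gap is `Vehicle Make`. If the make of car should not count against a ticket, one option is the `subset` parameter of `dropna`. A sketch on made-up data (the toy values are assumptions, and the resulting counts for the real dataset are not shown here):

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 lacks a plate, row 2 lacks only the make, row 3 lacks a street
toy = pd.DataFrame({
    "Plate ID": ["A1", np.nan, "C3", "D4"],
    "Registration State": ["NY", "NJ", "NY", "PA"],
    "Vehicle Make": ["FORD", "HONDA", np.nan, "JEEP"],
    "Street Name": ["1 ST", "2 ST", "3 ST", np.nan],
})

# Columns that must be present for a ticket to stand (make of car excluded)
required = ["Plate ID", "Registration State", "Street Name"]

kept_all_columns = toy.dropna()                   # any missing value drops the row
kept_required_only = toy.dropna(subset=required)  # a missing make is tolerated

print(len(kept_all_columns), len(kept_required_only))
```

With `subset`, the row that is missing only its make survives, so fewer rows are removed than with a plain `dropna()`.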

### QUESTION 2 - Rows left after removal of missing data

In [8]:
# Calculating number of rows left after cleaning data

rows_left = len(rows_with_data)
print(f"Number of rows left after cleaning data is {rows_left}")

Number of rows left after cleaning data is 12431949


### QUESTION 3 - Calculating money lost by the city

In [8]:
# calculating removed rows

removed_rows = nyc_park_violation_data.shape[0] - rows_with_data.shape[0]
print(f"Total number of rows removed is {removed_rows}")

Total number of rows removed is 63785


In [9]:
# calculating the amount lost by the city due to missing data

money_lost = removed_rows * 100

print(f"Money lost by the city due to missing data is ${money_lost:,.2f}")

Money lost by the city due to missing data is $6,378,500.00
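
The steps from row counts to lost revenue can be wrapped in one small function. This is only a sketch: `lost_revenue` and its parameters are hypothetical names, the toy data is made up, and the $100 fine comes from the problem statement.

```python
import numpy as np
import pandas as pd

def lost_revenue(df, required_columns=None, fine_per_ticket=100):
    """Return (rows_removed, money_lost) after dropping rows with missing values.

    required_columns=None checks every column, like a plain dropna().
    """
    kept = df.dropna(subset=required_columns)
    removed = len(df) - len(kept)
    return removed, removed * fine_per_ticket

# Toy frame: three of the four rows have a missing value somewhere
toy = pd.DataFrame({
    "Plate ID": ["A1", np.nan, "C3", "D4"],
    "Registration State": ["NY", "NJ", "NY", "PA"],
    "Vehicle Make": ["FORD", "HONDA", np.nan, "JEEP"],
    "Street Name": ["1 ST", "2 ST", "3 ST", np.nan],
})

removed_any, lost_any = lost_revenue(toy)
removed_req, lost_req = lost_revenue(
    toy, required_columns=["Plate ID", "Registration State", "Street Name"]
)
print(f"All columns: {removed_any} rows, ${lost_any:,.2f}")
print(f"Required columns only: {removed_req} rows, ${lost_req:,.2f}")
```

Passing a list of required columns makes the choice of which fields count explicit, so the same function covers both readings of the question.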
