# Exploring New York City's Parking Violations in 2022

# Introduction

New York City (NYC) is one of the most traffic-congested cities in the world, and parking violations by motorists are a persistent problem as a result. This project explores NYC parking violation data for the year 2022 to inform stakeholders about the realities of the problem and to provide insights that can help improve the situation.

# Skills Used
- SQL and Tableau

In this project, I use MySQL to query and analyze a large volume of parking violation data from NYC Open Data. I then use Tableau to visualize the insights obtained from the SQL queries.

# Objective
The main objective of the project is to explore NYC parking violation data for the year 2022 and answer the following questions:
- Who were the main contributors to parking violations in NYC, based on the state of vehicle registration?
- What was the distribution of parking violations across NYC? Which areas were the violation hotspots?
- How did parking violations vary during the year? Which months had the most violations?
- How did parking violations vary by time of day? At which time of day are you most likely to get ticketed?
- Which enforcement agency issued the most parking violations?
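
Each of these questions reduces to a grouped count over one column. The sketch below illustrates the idea in plain Python on three made-up records. The field names follow the table defined later in this project, the sample values (including the agency names) are purely illustrative, and `count_by` is a hypothetical helper; the actual analysis in this project is done in SQL.

```python
from collections import Counter
from datetime import datetime

# Made-up sample records for illustration only (not taken from the dataset).
sample = [
    {"State": "NY", "Issue_Date": "01/15/2022", "Issuing_Agency": "TRAFFIC"},
    {"State": "NJ", "Issue_Date": "01/20/2022", "Issuing_Agency": "TRAFFIC"},
    {"State": "NY", "Issue_Date": "03/02/2022", "Issuing_Agency": "POLICE DEPARTMENT"},
]


def count_by(records, key_func):
    """Count records per group and return (group, count) pairs, largest first."""
    return Counter(key_func(r) for r in records).most_common()


# Violations per registration state, per month, and per issuing agency.
by_state = count_by(sample, lambda r: r["State"])
by_month = count_by(
    sample, lambda r: datetime.strptime(r["Issue_Date"], "%m/%d/%Y").month
)
by_agency = count_by(sample, lambda r: r["Issuing_Agency"])

print(by_state)
print(by_month)
print(by_agency)
```

In SQL, the same pattern corresponds to a `GROUP BY` on the relevant column with `COUNT(*)`, ordered by the count in descending order.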




# Data
The data used in this project was obtained from [NYC OpenData](https://data.cityofnewyork.us/City-Government/Parking-Violations-Issued-Fiscal-Year-2023/869v-vr48/about_data) and filtered to the year 2022. The resulting dataset consists of 16,377,276 rows and  

The data dictionary is provided below:

In [None]:
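import os

import mysql.connector
import pandas as pd

# Establish connection
connection = mysql.connector.connect(
    host='localhost',            # or the IP address of your MySQL server
    user='root',                 # your MySQL username
    # Read the password from the environment instead of hard-coding it;
    # MYSQL_PASSWORD is an assumed variable name, not part of the original setup.
    password=os.environ['MYSQL_PASSWORD'],
    database='nyc_parking_vio',  # the name of the database you are using
    port=3306                    # the port number, default is 3306
)

# Example query
query = "SELECT * FROM parking_violations LIMIT 10;"

# Fetch data into a DataFrame and preview it
df = pd.read_sql(query, connection)
print(df.head())

# Release the connection once the data is loaded
connection.close()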


# Loading the Data
The dataset, in the form of a CSV file, was downloaded from NYC OpenData and then loaded into MySQL for analysis.
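
One possible way to load a CSV file into MySQL from Python is to read it in batches and send each batch through a DB-API cursor. The sketch below is only an illustration of that idea, not necessarily how the data was loaded in this project: `iter_batches` and `load_csv` are hypothetical helpers, and the two-column demo data is made up.

```python
import csv
import io


def iter_batches(csv_file, batch_size=1000):
    """Yield lists of row tuples read from an open CSV file, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    batch = []
    for row in reader:
        batch.append(tuple(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def load_csv(cursor, csv_file, insert_sql, batch_size=1000):
    """Send every CSV row through a DB-API cursor; return the number of rows sent."""
    total = 0
    for batch in iter_batches(csv_file, batch_size):
        cursor.executemany(insert_sql, batch)
        total += len(batch)
    return total


# Tiny in-memory stand-in for the downloaded CSV (values are made up).
demo_csv = io.StringIO("Plate,State\nABC1234,NY\nXYZ9876,NJ\nLMN4567,NY\n")
batches = list(iter_batches(demo_csv, batch_size=2))
print(batches)
```

With a `mysql.connector` connection, `load_csv` could be called with `connection.cursor()`, a file opened with `newline=""`, and an `INSERT ... VALUES (%s, %s, ...)` statement, followed by `connection.commit()`. For a file with millions of rows, MySQL's `LOAD DATA` statement is typically much faster than row-by-row inserts.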

# Create a database and data table

CREATE DATABASE nyc_parking_vio;
USE nyc_parking_vio;

I encountered challenges when creating the table. Although the data dictionary indicated that some of the columns were stored as numbers or other non-text types, I had to load some of the data initially as varchar(255) and then convert the relevant columns to their respective data types. A staging table (parking_violations_staging) was therefore used to load the raw data. The issue_date column was then converted to a date using STR_TO_DATE(issue_date, '%m/%d/%Y'). Finally, the final table (parking_violations) was created and the converted data was inserted into it. Missing values in all columns of the table were replaced with NULL.
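
To make the date conversion concrete, here is a minimal Python sketch of what the `'%m/%d/%Y'` format does to a staged text value. The format codes used here mean the same thing in Python's `strptime` as in the `STR_TO_DATE` call above. The function `str_to_date` is a hypothetical helper, and returning `None` for empty or unparseable input is a design choice of this sketch.

```python
from datetime import date, datetime


def str_to_date(text, fmt="%m/%d/%Y"):
    """Parse a month/day/year string into a date.

    Returns None for empty or unparseable input instead of raising.
    """
    if not text:
        return None
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


print(str_to_date("07/04/2022"))  # a well-formed staged value
print(str_to_date(""))            # an empty staged value
print(str_to_date("2022-07-04"))  # a value in a different format
```

Checking how many staged values fail to parse before inserting into the final table is a cheap way to catch rows whose dates use an unexpected format.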



# Create the database table

create table vio_22
(
Plate varchar(255), 
State varchar(255), 
License_Type varchar(255), 
Summons_Number varchar(255), 
Issue_Date varchar(255), 
Violation_Time varchar(255), 
Violation varchar(255), 
Judgment_Entry_Date varchar(255), 
Fine_Amount decimal(10,2), 
Penalty_Amount decimal(10,2), 
Interest_Amount decimal(10,2), 
Reduction_Amount decimal(10,2), 
Payment_Amount decimal(10,2), 
Amount_Due decimal(10,2), 
Precinct int, 
County varchar(255), 
Issuing_Agency varchar(255), 
Violation_Status varchar(255), 
Summons_Image varchar(255)
);


# Data Cleaning


## Missing Values

The missing values (empty strings) were set to NULL using the code below:

UPDATE vio_22 -- table name assumed from the CREATE TABLE statement above
SET
  Plate = NULLIF(Plate, ''),
  State = NULLIF(State, ''),
  License_Type = NULLIF(License_Type, ''),
  Summons_Number = NULLIF(Summons_Number, ''),
  Issue_Date = NULLIF(Issue_Date, ''),
  Violation_Time = NULLIF(Violation_Time, ''),
  Violation = NULLIF(Violation, ''),
  Judgment_Entry_Date = NULLIF(Judgment_Entry_Date, ''),
  Fine_Amount = NULLIF(Fine_Amount, ''),
  Penalty_Amount = NULLIF(Penalty_Amount, ''),
  Interest_Amount = NULLIF(Interest_Amount, ''),
  Reduction_Amount = NULLIF(Reduction_Amount, ''),
  Payment_Amount = NULLIF(Payment_Amount, ''),
  Amount_Due = NULLIF(Amount_Due, ''),
  Precinct = NULLIF(Precinct, ''),
  County = NULLIF(County, ''),
  Issuing_Agency = NULLIF(Issuing_Agency, ''),
  Violation_Status = NULLIF(Violation_Status, ''),
  Summons_Image = NULLIF(Summons_Image, '');
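
For readers less familiar with `NULLIF`, the following Python sketch mirrors its logic: a value is replaced by a null marker only when it equals the given sentinel, here the empty string. `nullif` and `blank_to_none` are hypothetical helpers and the sample row is made up.

```python
def nullif(value, sentinel=""):
    """Return None when value equals sentinel, otherwise return value unchanged."""
    return None if value == sentinel else value


def blank_to_none(row):
    """Apply nullif to every field of a row given as a dict."""
    return {column: nullif(value) for column, value in row.items()}


# Made-up staging row for illustration only.
staged = {"Plate": "ABC1234", "State": "", "County": "NY", "Summons_Image": ""}
cleaned = blank_to_none(staged)
print(cleaned)
```

When such a row is passed as query parameters through a DB-API driver such as `mysql.connector`, `None` values are stored as SQL `NULL`.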


In [None]:
## Convert date columns
The data in the 

In [None]:
# Note: Some of the code used in this project was written with the assistance of AI.