# Data cleaning in SQL

General notes:

- To run a single-line SQL statement, prefix it with `%sql`. For a multi-line statement, put `%%sql` on the first line of the cell and start the SQL on the next line.

## SQL environment setup

We'll be using [ipython-sql](https://github.com/catherinedevlin/ipython-sql) to work with SQL directly within a notebook.

In [1]:
%load_ext sql

In [3]:
%sql sqlite://


### Load data

We read the CSV into a [pandas](https://pandas.pydata.org/) DataFrame, then copy that DataFrame into an in-memory [SQLite](https://sqlite.org/index.html) database.

In [3]:
import pandas as pd

requests = pd.read_csv("311_jan_2022.csv", index_col="Unique Key")

In [4]:
%sql --persist requests

 * sqlite://


'Persisted requests'
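
For readers without ipython-sql at hand, the load step can be approximated with the standard library alone. The sketch below is not how `--persist` works internally; it is a simplified stand-in under stated assumptions. `load_csv_into_sqlite` is a hypothetical helper, the inline CSV holds three rows copied from this notebook's output and trimmed to four columns, and every column is stored as TEXT rather than with the types pandas would infer.

```python
import csv
import io
import sqlite3

# Three rows copied from this notebook's query output, trimmed to four columns.
SAMPLE_CSV = """Unique Key,Created Date,Agency,Incident Zip
52940375,01/01/2022 12:00:00 AM,DEP,10036.0
52934953,01/01/2022 12:00:10 AM,NYPD,11222.0
52933158,01/01/2022 12:00:57 AM,NYPD,11214.0
"""


def load_csv_into_sqlite(csv_text, table="requests"):
    """Hypothetical helper: copy CSV text into a new in-memory SQLite table.

    Assumption: all columns are declared TEXT; no type inference is attempted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    # Column names contain spaces, so each one is double-quoted.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # The remaining reader rows are inserted one by one.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn


conn = load_csv_into_sqlite(SAMPLE_CSV)
loaded = conn.execute(
    'SELECT "Unique Key", "Agency" FROM requests ORDER BY rowid'
).fetchall()
```

Because everything is stored as TEXT here, `loaded` contains string keys such as `"52940375"`. The pandas route used in this notebook yields typed columns instead.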

Ensure records were loaded:

In [7]:
%sql SELECT * FROM requests LIMIT 3;

 * sqlite://
Done.


Unique Key,Created Date,Closed Date,Agency,Agency Name,Complaint Type,Descriptor,Location Type,Incident Zip,Incident Address,Street Name,Cross Street 1,Cross Street 2,Intersection Street 1,Intersection Street 2,Address Type,City,Landmark,Facility Type,Status,Due Date,Resolution Description,Resolution Action Updated Date,Community Board,BBL,Borough,X Coordinate (State Plane),Y Coordinate (State Plane),Open Data Channel Type,Park Facility Name,Park Borough,Vehicle Type,Taxi Company Borough,Taxi Pick Up Location,Bridge Highway Name,Bridge Highway Direction,Road Ramp,Bridge Highway Segment,Latitude,Longitude,Location
52940375,01/01/2022 12:00:00 AM,01/03/2022 08:39:00 AM,DEP,Department of Environmental Protection,Air Quality,"Air: Odor/Fumes, Vehicle Idling (AD3)",,10036.0,640 8 AVENUE,8 AVENUE,W 41 ST,W 42 ST,,,ADDRESS,NEW YORK,,,Closed,,The Department of Environmental Protection determined that this complaint is a duplicate of a previously filed complaint. The original complaint is being addressed.,01/03/2022 08:39:00 AM,05 MANHATTAN,1010137501.0,MANHATTAN,986967.0,214950.0,ONLINE,Unspecified,MANHATTAN,,,,,,,,40.75666417742652,-73.99019293432467,"(40.75666417742652, -73.99019293432467)"
52934953,01/01/2022 12:00:10 AM,01/01/2022 01:00:11 AM,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,11222.0,126 DRIGGS AVENUE,DRIGGS AVENUE,NORTH HENRY STREET,RUSSELL STREET,NORTH HENRY STREET,RUSSELL STREET,ADDRESS,BROOKLYN,DRIGGS AVENUE,,Closed,,The Police Department responded to the complaint and with the information available observed no evidence of the violation at that time.,01/01/2022 01:00:15 AM,01 BROOKLYN,,BROOKLYN,999866.0,202742.0,MOBILE,Unspecified,BROOKLYN,,,,,,,,40.72314288436064,-73.94366208445774,"(40.72314288436064, -73.94366208445774)"
52933158,01/01/2022 12:00:57 AM,01/01/2022 12:58:22 AM,NYPD,New York City Police Department,Noise - Residential,Loud Talking,Residential Building/House,11214.0,45 BAY 38 STREET,BAY 38 STREET,86 STREET,BENSON AVENUE,86 STREET,BENSON AVENUE,ADDRESS,BROOKLYN,BAY 38 STREET,,Closed,,The Police Department responded to the complaint and with the information available observed no evidence of the violation at that time.,01/01/2022 12:58:29 AM,11 BROOKLYN,3068667501.0,BROOKLYN,987344.0,156952.0,MOBILE,Unspecified,BROOKLYN,,,,,,,,40.59747269272421,-73.98885877127528,"(40.59747269272421, -73.98885877127528)"
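
`LIMIT 3` shows that the table is readable, but not that every row arrived or that the keys are unique. A rough sketch of two further checks follows, using Python's built-in `sqlite3` module instead of the `%sql` magic. The helper names `count_rows` and `count_duplicate_keys` are hypothetical, and the table is rebuilt from the three rows shown above, trimmed to two columns.

```python
import sqlite3

# Rows copied from the output above, trimmed to two columns for brevity.
rows = [
    (52940375, "Air Quality"),
    (52934953, "Noise - Street/Sidewalk"),
    (52933158, "Noise - Residential"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE requests ("Unique Key" BIGINT, "Complaint Type" TEXT)')
conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)


def count_rows(connection, table):
    """Hypothetical helper: return the number of rows in `table`."""
    # Table names cannot be bound as parameters, so the name is quoted inline.
    return connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def count_duplicate_keys(connection):
    """Hypothetical helper: count "Unique Key" values that occur more than once."""
    query = """
        SELECT COUNT(*) FROM (
            SELECT "Unique Key" FROM requests
            GROUP BY "Unique Key"
            HAVING COUNT(*) > 1
        )
    """
    return connection.execute(query).fetchone()[0]


row_count = count_rows(conn, "requests")
duplicate_keys = count_duplicate_keys(conn)
```

In the notebook itself, the first check would be `%sql SELECT COUNT(*) FROM requests;`, with the result compared against the number of rows in the DataFrame.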


[Display the schema](https://sqlite.org/cli.html#querying_the_database_schema):

In [6]:
%sql SELECT sql FROM sqlite_schema ORDER BY tbl_name, type DESC, name;

 * sqlite://
Done.


sql
CREATE TABLE requests ("Unique Key" BIGINT, "Created Date" TEXT, "Closed Date" TEXT, "Agency" TEXT, "Agency Name" TEXT, "Complaint Type" TEXT, "Descriptor" TEXT, "Location Type" TEXT, "Incident Zip" FLOAT, "Incident Address" TEXT, "Street Name" TEXT, "Cross Street 1" TEXT, "Cross Street 2" TEXT, "Intersection Street 1" TEXT, "Intersection Street 2" TEXT, "Address Type" TEXT, "City" TEXT, "Landmark" TEXT, "Facility Type" TEXT, "Status" TEXT, "Due Date" TEXT, "Resolution Description" TEXT, "Resolution Action Updated Date" TEXT, "Community Board" TEXT, "BBL" FLOAT, "Borough" TEXT, "X Coordinate (State Plane)" FLOAT, "Y Coordinate (State Plane)" FLOAT, "Open Data Channel Type" TEXT, "Park Facility Name" TEXT, "Park Borough" TEXT, "Vehicle Type" TEXT, "Taxi Company Borough" TEXT, "Taxi Pick Up Location" TEXT, "Bridge Highway Name" TEXT, "Bridge Highway Direction" TEXT, "Road Ramp" TEXT, "Bridge Highway Segment" TEXT, "Latitude" FLOAT, "Longitude" FLOAT, "Location" TEXT)
CREATE INDEX "ix_requests_Unique Key" ON requests ("Unique Key")

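
The schema is a useful starting point for cleaning: the date columns are stored as TEXT and "Incident Zip" as FLOAT. One way to pull out such candidates programmatically is sketched below with Python's built-in `sqlite3` module and `PRAGMA table_info`. The table is a trimmed stand-in for the real one, and the rule "TEXT column with `Date` in its name" is an illustrative assumption, not a general test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A trimmed version of the CREATE TABLE statement from the schema output.
conn.execute(
    "CREATE TABLE requests ("
    '"Unique Key" BIGINT, "Created Date" TEXT, "Incident Zip" FLOAT, "Latitude" FLOAT)'
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
column_types = {
    name: declared_type
    for _cid, name, declared_type, *_rest in conn.execute("PRAGMA table_info(requests)")
}

# Assumption for illustration: a TEXT column with "Date" in its name
# probably holds dates that need parsing.
text_date_columns = [
    name
    for name, declared_type in column_types.items()
    if declared_type == "TEXT" and "Date" in name
]
```

Note that the `sqlite_schema` name used in the query above requires SQLite 3.33.0 or later; on older versions the same table is available as `sqlite_master`.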