# Accessing Chicago Data with SQLite and Python
The datasets used in this project are as follows:
- **ChicagoCensusData.csv**: Socioeconomic data for the years 2008-2012. The dataset and more information about it can be found [here](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2/about_data?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvo_campaign=000026UJ&cvosrc=email.Newsletter.M12345678&utm_content=000026UJ&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01&utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_term=10006555 "Census Data").
- **ChicagoPublicSchools.csv**: School performance data for 2011-2012, used to create school report cards. More information can be found on the [website](https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t/about_data?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvo_campaign=000026UJ&cvosrc=email.Newsletter.M12345678&utm_content=000026UJ&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01&utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_term=10006555 "Chicago Schools").
- **ChicagoCrimeData.csv**: Reported incidents of crime in Chicago, drawn from the city's "Crimes - 2001 to Present" dataset, which is publicly available [here](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvo_campaign=000026UJ&cvosrc=email.Newsletter.M12345678&utm_content=000026UJ&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01&utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_term=10006555).

To get started, first import the required packages.

In [16]:
import pandas as pd 
import prettytable 
import sqlite3 

prettytable.DEFAULT = 'DEFAULT'

## Store the Data in Database Tables
To do this, first establish a connection to a database. Then load each CSV file into a pandas DataFrame and write each DataFrame to its own table in the database.

In [2]:
# Create a db connection 
connection = sqlite3.connect('CHICAGO.db')
cursor = connection.cursor()

# Load the data sets to dataframes
census_data = pd.read_csv('file_csv/ChicagoCensusData.csv')
chicago_crime_data = pd.read_csv('file_csv/ChicagoCrimeData.csv')
school_data = pd.read_csv('file_csv/ChicagoPublicSchools.csv')

# Load the dataframes to the database
census_data.to_sql('CENSUS_DATA', connection, if_exists='replace', index=False)
chicago_crime_data.to_sql('CHICAGO_CRIME_DATA', connection, if_exists='replace', index=False)
school_data.to_sql('CHICAGO_PUBLIC_SCHOOLS', connection, if_exists='replace', index=False)

566
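
As a quick sanity check after loading, the tables and their row counts can be listed from SQLite's `sqlite_master` catalog. The sketch below is only an illustration: it uses an in-memory database and a tiny made-up DataFrame (`demo_df`, written to a hypothetical `DEMO_TABLE`) in place of the CSV files so that it runs on its own. The same catalog query should work against `CHICAGO.db`.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for CHICAGO.db; the rows are made up for illustration.
demo_connection = sqlite3.connect(":memory:")
demo_df = pd.DataFrame({"AREA": [1, 2, 3], "VALUE": [100, 200, 300]})
demo_df.to_sql("DEMO_TABLE", demo_connection, if_exists="replace", index=False)

# List the user tables recorded in the catalog.
table_names = [
    row[0]
    for row in demo_connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Count the rows in each table; names are quoted in case they contain spaces.
row_counts = {
    name: demo_connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    for name in table_names
}
print(row_counts)
```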

## Use SQL Magic to answer questions about the data
First, load the SQL magic extension and connect it to the database.

In [None]:
# Connect SQL Magic to the database
%load_ext sql 
%sql sqlite:///CHICAGO.db


The sql extension is already loaded. To reload it, use:
  %reload_ext sql
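
Outside a notebook, where the `%sql` magic is not available, the same kind of query can be run with `pandas.read_sql_query` on a plain `sqlite3` connection. This is a minimal sketch, assuming an in-memory database and a made-up two-row table that stands in for `CHICAGO_CRIME_DATA`; the case numbers are placeholders, not real records.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for CHICAGO.db with two placeholder rows.
magic_free_connection = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "CASE_NUMBER": ["X000001", "X000002"],
        "PRIMARY_TYPE": ["THEFT", "BATTERY"],
    }
).to_sql("CHICAGO_CRIME_DATA", magic_free_connection, index=False)

# read_sql_query returns the result set as a DataFrame.
total_crime = pd.read_sql_query(
    "SELECT COUNT(PRIMARY_TYPE) AS TOTAL_CRIME FROM CHICAGO_CRIME_DATA",
    magic_free_connection,
)
print(total_crime)
```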


Find the total number of crimes recorded.


In [33]:
%sql SELECT COUNT(PRIMARY_TYPE) AS TOTAL_CRIME FROM CHICAGO_CRIME_DATA;


 * sqlite:///CHICAGO.db
Done.


TOTAL_CRIME
533


List community area names and numbers with per capita income less than 11000.

In [23]:
%sql SELECT COMMUNITY_AREA_NAME, COMMUNITY_AREA_NUMBER, PER_CAPITA_INCOME FROM CENSUS_DATA WHERE PER_CAPITA_INCOME < 11000;

 * sqlite:///CHICAGO.db
Done.


COMMUNITY_AREA_NAME,COMMUNITY_AREA_NUMBER,PER_CAPITA_INCOME
West Garfield Park,26.0,10934
South Lawndale,30.0,10402
Fuller Park,37.0,10432
Riverdale,54.0,8201


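List all case numbers for crimes whose description mentions a child. (For the purposes of crime analysis, children and minors are treated as separate categories, so descriptions that contain only `MINOR` are not matched here.)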

In [56]:
%sql SELECT CASE_NUMBER FROM CHICAGO_CRIME_DATA WHERE DESCRIPTION LIKE '%CHILD%'

 * sqlite:///CHICAGO.db
Done.


CASE_NUMBER
HN567387
HR391350
HN144152
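
The `LIKE` filter above relies on two SQLite behaviours: `%` matches any run of characters, and `LIKE` is case-insensitive for ASCII letters by default. The following sketch illustrates both on a hypothetical `CRIMES` table with made-up rows, and passes the pattern as a bound parameter rather than embedding it in the SQL string.

```python
import sqlite3

like_connection = sqlite3.connect(":memory:")
like_connection.execute("CREATE TABLE CRIMES (CASE_NUMBER TEXT, DESCRIPTION TEXT)")
like_connection.executemany(
    "INSERT INTO CRIMES VALUES (?, ?)",
    [
        ("A1", "CHILD ABDUCTION/STRANGER"),
        ("A2", "endangering life/health child"),  # lower case on purpose
        ("A3", "SELL/GIVE/DEL LIQUOR TO MINOR"),
        ("A4", "SIMPLE"),
    ],
)

# '%CHILD%' matches CHILD anywhere in the description, in any letter case.
child_cases = [
    row[0]
    for row in like_connection.execute(
        "SELECT CASE_NUMBER FROM CRIMES WHERE DESCRIPTION LIKE ? ORDER BY CASE_NUMBER",
        ("%CHILD%",),
    )
]
print(child_cases)
```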


List all kidnapping crimes involving a child

In [32]:
%sql SELECT * FROM CHICAGO_CRIME_DATA WHERE PRIMARY_TYPE LIKE '%KIDNAP%' AND DESCRIPTION LIKE '%CHILD%'

 * sqlite:///CHICAGO.db
Done.


ID,CASE_NUMBER,DATE,BLOCK,IUCR,PRIMARY_TYPE,DESCRIPTION,LOCATION_DESCRIPTION,ARREST,DOMESTIC,BEAT,DISTRICT,WARD,COMMUNITY_AREA_NUMBER,FBICODE,X_COORDINATE,Y_COORDINATE,YEAR,LATITUDE,LONGITUDE,LOCATION
5276766,HN144152,2007-01-26,050XX W VAN BUREN ST,1792,KIDNAPPING,CHILD ABDUCTION/STRANGER,STREET,0,0,1533,15,29.0,25.0,20,1143050.0,1897546.0,2007,41.87490841,-87.75024931,"(41.874908413, -87.750249307)"


List the kinds of crimes that were recorded at schools.

In [57]:
%sql SELECT DISTINCT PRIMARY_TYPE, LOCATION_DESCRIPTION FROM CHICAGO_CRIME_DATA WHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%'

 * sqlite:///CHICAGO.db
Done.


PRIMARY_TYPE,LOCATION_DESCRIPTION
BATTERY,"SCHOOL, PUBLIC, GROUNDS"
BATTERY,"SCHOOL, PUBLIC, BUILDING"
CRIMINAL DAMAGE,"SCHOOL, PUBLIC, GROUNDS"
NARCOTICS,"SCHOOL, PUBLIC, GROUNDS"
NARCOTICS,"SCHOOL, PUBLIC, BUILDING"
ASSAULT,"SCHOOL, PUBLIC, GROUNDS"
CRIMINAL TRESPASS,"SCHOOL, PUBLIC, GROUNDS"
PUBLIC PEACE VIOLATION,"SCHOOL, PRIVATE, BUILDING"
PUBLIC PEACE VIOLATION,"SCHOOL, PUBLIC, BUILDING"


List the type of schools along with the average safety score for each type.

In [38]:
%sql SELECT "Elementary, Middle, or High School" AS SCHOOL_TYPE, AVG(SAFETY_SCORE) AS AVG_S_SCORE FROM CHICAGO_PUBLIC_SCHOOLS GROUP BY SCHOOL_TYPE

 * sqlite:///CHICAGO.db
Done.


SCHOOL_TYPE,AVG_S_SCORE
ES,49.52038369304557
HS,49.62352941176471
MS,48.0
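
Two details of the query above are worth illustrating: a column name containing spaces and commas has to be wrapped in double quotes, and `AVG` skips `NULL` values rather than counting them as zero. The sketch below shows both on a hypothetical `SCHOOLS` table with made-up scores; `ROUND` is added only to shorten the output.

```python
import sqlite3

school_connection = sqlite3.connect(":memory:")
school_connection.execute(
    'CREATE TABLE SCHOOLS ("Elementary, Middle, or High School" TEXT, SAFETY_SCORE REAL)'
)
school_connection.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?)",
    [("ES", 40.0), ("ES", 60.0), ("HS", 70.0), ("HS", None)],  # one missing score
)

# Group by the aliased column; the NULL score is ignored by AVG.
average_scores = school_connection.execute(
    """
    SELECT "Elementary, Middle, or High School" AS SCHOOL_TYPE,
           ROUND(AVG(SAFETY_SCORE), 2) AS AVG_S_SCORE
    FROM SCHOOLS
    GROUP BY SCHOOL_TYPE
    ORDER BY SCHOOL_TYPE
    """
).fetchall()
print(average_scores)
```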


List the 5 community areas with the highest percentage of households below the poverty line.

In [40]:
%sql SELECT COMMUNITY_AREA_NAME, PERCENT_HOUSEHOLDS_BELOW_POVERTY FROM CENSUS_DATA ORDER BY PERCENT_HOUSEHOLDS_BELOW_POVERTY DESC LIMIT 5

 * sqlite:///CHICAGO.db
Done.


COMMUNITY_AREA_NAME,PERCENT_HOUSEHOLDS_BELOW_POVERTY
Riverdale,56.5
Fuller Park,51.2
Englewood,46.6
North Lawndale,43.1
East Garfield Park,42.4


Which community area is the most crime prone? Display the community area number along with its crime count.

In [42]:
%sql SELECT COMMUNITY_AREA_NUMBER, COUNT(PRIMARY_TYPE) FROM CHICAGO_CRIME_DATA GROUP BY COMMUNITY_AREA_NUMBER ORDER BY COUNT(PRIMARY_TYPE) DESC LIMIT 1

 * sqlite:///CHICAGO.db
Done.


COMMUNITY_AREA_NUMBER,COUNT(PRIMARY_TYPE)
25.0,43
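
If only the community area number is wanted, the count can be dropped from the `SELECT` list and used only for ordering. Rows with a missing area number would otherwise form a group of their own, so the sketch below also filters them out. It assumes a hypothetical `CRIMES` table with made-up rows; whether the real crime data contains missing area numbers is not checked here.

```python
import sqlite3

area_connection = sqlite3.connect(":memory:")
area_connection.execute(
    "CREATE TABLE CRIMES (PRIMARY_TYPE TEXT, COMMUNITY_AREA_NUMBER REAL)"
)
area_connection.executemany(
    "INSERT INTO CRIMES VALUES (?, ?)",
    [
        ("THEFT", None), ("THEFT", None), ("THEFT", None),  # unknown area
        ("BATTERY", 25.0), ("THEFT", 25.0),
        ("ASSAULT", 23.0),
    ],
)

# Without the IS NOT NULL filter, the NULL group (3 rows) would rank first.
top_area_number = area_connection.execute(
    """
    SELECT COMMUNITY_AREA_NUMBER
    FROM CRIMES
    WHERE COMMUNITY_AREA_NUMBER IS NOT NULL
    GROUP BY COMMUNITY_AREA_NUMBER
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()[0]
print(top_area_number)
```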


Use a sub-query to find the name of the community area with the highest hardship index.

In [58]:
%sql SELECT COMMUNITY_AREA_NAME FROM CENSUS_DATA WHERE HARDSHIP_INDEX IN (SELECT MAX(HARDSHIP_INDEX) FROM CENSUS_DATA)

 * sqlite:///CHICAGO.db
Done.


COMMUNITY_AREA_NAME
Riverdale
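
One property of the sub-query approach is that it returns every area that shares the maximum value, whereas `ORDER BY ... LIMIT 1` would return only one of them. The sketch below illustrates this on a hypothetical copy of `CENSUS_DATA` with made-up area names and values, including a row with no hardship index, which `MAX` ignores.

```python
import sqlite3

hardship_connection = sqlite3.connect(":memory:")
hardship_connection.execute(
    "CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NAME TEXT, HARDSHIP_INDEX REAL)"
)
hardship_connection.executemany(
    "INSERT INTO CENSUS_DATA VALUES (?, ?)",
    [("Area A", 98.0), ("Area B", 98.0), ("Area C", 12.0), ("Area D", None)],
)

# The scalar sub-query yields one value; the outer query keeps all rows equal to it.
hardest_hit = [
    row[0]
    for row in hardship_connection.execute(
        """
        SELECT COMMUNITY_AREA_NAME
        FROM CENSUS_DATA
        WHERE HARDSHIP_INDEX = (SELECT MAX(HARDSHIP_INDEX) FROM CENSUS_DATA)
        ORDER BY COMMUNITY_AREA_NAME
        """
    )
]
print(hardest_hit)
```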


Determine the name of the community area with the most crimes. The query below joins the census and crime tables on the community area number rather than using a sub-query.

In [60]:
%%sql
SELECT CD.COMMUNITY_AREA_NAME
FROM CENSUS_DATA AS CD
INNER JOIN CHICAGO_CRIME_DATA AS CCD ON CCD.COMMUNITY_AREA_NUMBER = CD.COMMUNITY_AREA_NUMBER
GROUP BY CD.COMMUNITY_AREA_NAME
ORDER BY COUNT(CCD.PRIMARY_TYPE) DESC
LIMIT 1

 * sqlite:///CHICAGO.db
Done.


COMMUNITY_AREA_NAME
Austin
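
The same question can also be answered with a sub-query instead of a join: the inner query finds the community area number with the most crime rows, and the outer query looks up its name. This is a sketch on made-up data, reusing the table and column names from above with hypothetical area names ("Area A", "Area B").

```python
import sqlite3

subquery_connection = sqlite3.connect(":memory:")
subquery_connection.execute(
    "CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NUMBER REAL, COMMUNITY_AREA_NAME TEXT)"
)
subquery_connection.execute(
    "CREATE TABLE CHICAGO_CRIME_DATA (PRIMARY_TYPE TEXT, COMMUNITY_AREA_NUMBER REAL)"
)
subquery_connection.executemany(
    "INSERT INTO CENSUS_DATA VALUES (?, ?)", [(1.0, "Area A"), (2.0, "Area B")]
)
subquery_connection.executemany(
    "INSERT INTO CHICAGO_CRIME_DATA VALUES (?, ?)",
    [("THEFT", 2.0), ("BATTERY", 2.0), ("THEFT", 1.0)],
)

# Inner query: area number with the most crime rows (missing numbers excluded).
# Outer query: name of that area from the census table.
most_crime_area = subquery_connection.execute(
    """
    SELECT COMMUNITY_AREA_NAME
    FROM CENSUS_DATA
    WHERE COMMUNITY_AREA_NUMBER = (
        SELECT COMMUNITY_AREA_NUMBER
        FROM CHICAGO_CRIME_DATA
        WHERE COMMUNITY_AREA_NUMBER IS NOT NULL
        GROUP BY COMMUNITY_AREA_NUMBER
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )
    """
).fetchone()[0]
print(most_crime_area)
```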
