# Introduction

Using this Python notebook we will:

1.  Understand three Chicago datasets
2.  Load the three datasets into three tables in a SQLite database
3.  Execute SQL queries to answer assignment questions


## Understand the datasets

In this project we will use three datasets that are available on the City of Chicago's Data Portal:

### 1. Socioeconomic Indicators in Chicago

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

### 2. Chicago Public Schools

This dataset shows all school-level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the City of Chicago's Data Portal.

### 3. Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.



### Download the datasets

This project requires three tables, each populated with a subset of the corresponding full dataset.

In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the links below to download and save the datasets (.CSV files):

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01" target="_blank">Chicago Census Data</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01" target="_blank">Chicago Public Schools</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork20127838-2021-01-01" target="_blank">Chicago Crime Data</a>




### Store the datasets in database tables

To analyze the data using SQL, it first needs to be loaded into a SQLite database.
We will create the following three tables:

1.  **CENSUS_DATA**
2.  **CHICAGO_PUBLIC_SCHOOLS**
3.  **CHICAGO_CRIME_DATA**

Let us now load the ipython-sql extension and establish a connection with the database:

* First we will connect to the SQLite database **FinalDB**.

* Then we will read the CSV files into pandas DataFrames and write them to the SQLite tables listed above.
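
The CSV-to-table step can also be sketched without pandas. The block below is a minimal illustration that uses only the standard library; the table name `CENSUS_SAMPLE` and the two sample rows are hypothetical stand-ins for a downloaded file, not values from the real datasets.

```python
import csv
import io
import sqlite3

# Hypothetical two-row sample standing in for a downloaded CSV file.
sample_csv = io.StringIO(
    "COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,HARDSHIP_INDEX\n"
    "1,Area A,39\n"
    "2,Area B,46\n"
)

rows = list(csv.DictReader(sample_csv))
columns = list(rows[0].keys())

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create the table from the CSV header, then insert every row
# with a parameterized statement.
conn.execute(f"CREATE TABLE CENSUS_SAMPLE ({', '.join(columns)})")
placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO CENSUS_SAMPLE VALUES ({placeholders})",
    [tuple(row[c] for c in columns) for row in rows],
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM CENSUS_SAMPLE").fetchone()[0]
print(row_count)
```

The notebook itself uses `pandas.read_csv` and `DataFrame.to_sql`, which also infer column types; this sketch stores every value as text.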






In [2]:
%load_ext sql

In [3]:
import warnings

warnings.filterwarnings('ignore')

In [4]:
import sqlite3

import pandas as pd

In [5]:
!pip install -q pandas==1.1.5

In [6]:
connect_object = sqlite3.connect("FinalDB.db")

cursor_object = connect_object.cursor()

In [7]:
%sql sqlite:///FinalDB.db

'Connected: @FinalDB.db'

In [8]:
# Load the ChicagoCensusData into a pandas DataFrame
df_census = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv")

# Load the ChicagoCrimeData into pandas DataFrame
df_crime = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv")

# Load the ChicagoPublicSchools data into a pandas DataFrame
df_PS = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv")




In [9]:
# Write the three DataFrames above to the SQLite database as tables
# Write the Chicago census data to the database as the CENSUS_DATA table
df_census.to_sql("CENSUS_DATA", connect_object, if_exists = 'replace', index=False, method="multi")

In [10]:
# Write the Chicago crime data to the database as the CHICAGO_CRIME_DATA table
df_crime.to_sql("CHICAGO_CRIME_DATA", connect_object, if_exists = 'replace', index=False, method="multi")

In [11]:
df_PS.to_sql("CHICAGO_PUBLIC_SCHOOLS", connect_object, if_exists = 'replace', index=False, method="multi")

In [12]:
%sql SELECT name FROM sqlite_master WHERE type = 'table'

 * sqlite:///FinalDB.db
Done.


name
CENSUS_DATA
CHICAGO_CRIME_DATA
CHICAGO_PUBLIC_SCHOOLS


## Problems

Now write and execute SQL queries to solve the problems below.

### Problem 1

##### Find the total number of crimes recorded in the CHICAGO_CRIME_DATA table.


In [13]:
%sql SELECT COUNT(*) FROM Chicago_crime_data 

 * sqlite:///FinalDB.db
Done.


COUNT(*)
533
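
The same kind of count can be run through a plain `sqlite3` cursor instead of the `%sql` magic. This is a minimal sketch on a hypothetical three-row table named `CRIME_SAMPLE`; the case numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical sample table; the real table is CHICAGO_CRIME_DATA.
cursor.execute("CREATE TABLE CRIME_SAMPLE (CASE_NUMBER TEXT, PRIMARY_TYPE TEXT)")
cursor.executemany(
    "INSERT INTO CRIME_SAMPLE VALUES (?, ?)",
    [("X1", "THEFT"), ("X2", "BATTERY"), ("X3", "THEFT")],
)

# COUNT(*) returns a single row with a single column.
cursor.execute("SELECT COUNT(*) FROM CRIME_SAMPLE")
total = cursor.fetchone()[0]
print(total)
```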


### Problem 2

##### List community areas with per capita income less than 11000.


In [14]:
%sql SELECT Community_Area_Name  from Census_Data where Per_capita_income < 11000

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME
West Garfield Park
South Lawndale
Fuller Park
Riverdale


### Problem 3

##### List all case numbers for crimes involving minors (children are not considered minors for the purposes of crime analysis).


In [15]:
%sql SELECT Case_Number from Chicago_crime_data where Description Like '%MINOR%'

 * sqlite:///FinalDB.db
Done.


CASE_NUMBER
HL266884
HK238408


### Problem 4

##### List all kidnapping crimes involving a child.


In [16]:
%sql SELECT * FROM Chicago_crime_data WHERE Primary_type LIKE '%KIDNAP%' and description LIKE '%CHILD%'

 * sqlite:///FinalDB.db
Done.


ID,CASE_NUMBER,DATE,BLOCK,IUCR,PRIMARY_TYPE,DESCRIPTION,LOCATION_DESCRIPTION,ARREST,DOMESTIC,BEAT,DISTRICT,WARD,COMMUNITY_AREA_NUMBER,FBICODE,X_COORDINATE,Y_COORDINATE,YEAR,LATITUDE,LONGITUDE,LOCATION
5276766,HN144152,2007-01-26,050XX W VAN BUREN ST,1792,KIDNAPPING,CHILD ABDUCTION/STRANGER,STREET,0,0,1533,15,29.0,25.0,20,1143050.0,1897546.0,2007,41.87490841,-87.75024931,"(41.874908413, -87.750249307)"


### Problem 5

##### What kinds of crimes were recorded at schools?


In [17]:
%sql select distinct(primary_type) from chicago_crime_data where location_description like '%school%'

 * sqlite:///FinalDB.db
Done.


PRIMARY_TYPE
BATTERY
CRIMINAL DAMAGE
NARCOTICS
ASSAULT
CRIMINAL TRESPASS
PUBLIC PEACE VIOLATION


### Problem 6

##### List the average safety score for each type of school.


In [18]:
%sql select "Elementary, Middle, or High School",AVG(safety_score) from CHICAGO_PUBLIC_SCHOOLS group by "Elementary, Middle, or High School"

 * sqlite:///FinalDB.db
Done.


"Elementary, Middle, or High School",AVG(safety_score)
ES,49.52038369304557
HS,49.62352941176471
MS,48.0


### Problem 7

##### List the 5 community areas with the highest % of households below the poverty line.


In [19]:
%sql select Community_Area_Name, PERCENT_HOUSEHOLDS_BELOW_POVERTY from Census_data order by PERCENT_HOUSEHOLDS_BELOW_POVERTY desc limit 5

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME,PERCENT_HOUSEHOLDS_BELOW_POVERTY
Riverdale,56.5
Fuller Park,51.2
Englewood,46.6
North Lawndale,43.1
East Garfield Park,42.4


### Problem 8

##### Which community area is most crime prone?


In [20]:
%sql select community_area_number, count(*) as frequency from chicago_crime_data \
    group by community_area_number \
    order by frequency desc limit 1

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NUMBER,frequency
25.0,43
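
The area number prints as `25.0`, most likely because pandas stores an integer column as floating point when it contains missing values. Rows with a missing area number form their own `NULL` group, which could outrank every real area. A hedged sketch of a more defensive variant, using a hypothetical `CRIME_SAMPLE` table with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CRIME_SAMPLE (COMMUNITY_AREA_NUMBER REAL)")
# Hypothetical rows: the NULL group (3 rows) is larger than any real area.
conn.executemany(
    "INSERT INTO CRIME_SAMPLE VALUES (?)",
    [(25.0,), (25.0,), (23.0,), (None,), (None,), (None,)],
)

# Exclude NULLs before grouping and cast the float back to an integer.
top_area = conn.execute(
    "SELECT CAST(COMMUNITY_AREA_NUMBER AS INTEGER) AS area, "
    "       COUNT(*) AS frequency "
    "FROM CRIME_SAMPLE "
    "WHERE COMMUNITY_AREA_NUMBER IS NOT NULL "
    "GROUP BY COMMUNITY_AREA_NUMBER "
    "ORDER BY frequency DESC "
    "LIMIT 1"
).fetchone()
print(top_area)
```

In the notebook's data the top group is area 25, so the original query gives the same answer there; the filter only matters when the `NULL` group is the largest.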


### Problem 9

##### Use a sub-query to find the name of the community area with the highest hardship index.


In [21]:
%sql select community_area_name from census_data where Hardship_index = (select max(Hardship_index) from census_data)

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME
Riverdale


### Problem 10

##### Use a sub-query to determine the name of the community area with the most crimes.


In [27]:
%sql select community_area_name from census_data where community_area_number = (select community_area_number from chicago_crime_data group by community_area_number order by count(*) desc limit 1)

 * sqlite:///FinalDB.db
Done.


COMMUNITY_AREA_NAME
Austin
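
A sub-query of this kind derives the busiest area number from the crime table and passes it to the outer query, so no area number has to be typed in by hand. The sketch below shows the pattern on two hypothetical tables, `CENSUS_SAMPLE` and `CRIME_SAMPLE`, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CENSUS_SAMPLE "
    "(COMMUNITY_AREA_NUMBER INTEGER, COMMUNITY_AREA_NAME TEXT)"
)
conn.execute("CREATE TABLE CRIME_SAMPLE (COMMUNITY_AREA_NUMBER INTEGER)")

# Hypothetical rows: area 2 has two crimes, area 1 has one.
conn.executemany(
    "INSERT INTO CENSUS_SAMPLE VALUES (?, ?)", [(1, "Area A"), (2, "Area B")]
)
conn.executemany("INSERT INTO CRIME_SAMPLE VALUES (?)", [(1,), (2,), (2,)])

# Inner query: area number with the most crime rows.
# Outer query: look up that number's name in the census table.
busiest = conn.execute(
    "SELECT COMMUNITY_AREA_NAME FROM CENSUS_SAMPLE "
    "WHERE COMMUNITY_AREA_NUMBER = ("
    "    SELECT COMMUNITY_AREA_NUMBER FROM CRIME_SAMPLE "
    "    GROUP BY COMMUNITY_AREA_NUMBER "
    "    ORDER BY COUNT(*) DESC LIMIT 1"
    ")"
).fetchone()[0]
print(busiest)
```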
