# Introduction

I have been hired by an organization that strives to improve educational outcomes for children and young people in Chicago. My job is to analyze the census, crime, and school data for a given neighborhood or district. 

I will identify factors that affect the enrollment, safety, health, and environment ratings of schools.

## Understand the datasets

I will be using three datasets that are available on the city of Chicago's Data Portal:

### 1. Socioeconomic Indicators in Chicago

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)

### 2. Chicago Public Schools

This dataset shows all school-level performance data used to create CPS School Report Cards for the 2011-2012 school year. It is provided by the city of Chicago's Data Portal.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t](https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)

### 3. Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


### Download the datasets

I will use three tables, each populated with a subset of the corresponding full dataset.

The datasets are available as .CSV (comma-separated values) files. Use the links below to download and save them:

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Census Data</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Public Schools</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Crime Data</a>


# Method

First, I created the three tables in a PostgreSQL database.

Then I inserted the data to fill the tables.

The tables were then ready to be analyzed.
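
The notebook does not show how the tables were filled. One possible approach, sketched below, streams each CSV file through PostgreSQL's `COPY ... FROM STDIN` using psycopg2's `cursor.copy_expert`. The helper names `build_copy_sql` and `load_csv` are hypothetical, and the sketch assumes that each table already exists with its columns in the same order as the CSV header.

```python
def build_copy_sql(table_name: str) -> str:
    """Return a COPY statement that reads CSV data with a header row from STDIN."""
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers to avoid SQL injection.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    return f"COPY {table_name} FROM STDIN WITH (FORMAT csv, HEADER true)"


def load_csv(conn, table_name: str, csv_path: str) -> None:
    """Bulk-load one CSV file into an existing table (hypothetical helper).

    `conn` is assumed to be an open psycopg2 connection.
    """
    with open(csv_path, newline="", encoding="utf-8") as f, conn.cursor() as cur:
        cur.copy_expert(build_copy_sql(table_name), f)
    conn.commit()
```

Under these assumptions, a call such as `load_csv(conn, "chicago_census", "ChicagoCensusData.csv")` per file would fill the three tables.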

### Connect to the database


In [1]:
import psycopg2 as pg2

# Open a connection to the local PostgreSQL database that holds the three tables.
conn = pg2.connect(database='Assignment IBM', user='postgres', password='password')

## Analysis and Results



### Problem 1

##### Find the total number of crimes recorded in the CRIME table.


In [2]:
import pandas as pd

# Count every row in the crime table.
df = pd.read_sql("SELECT COUNT(*) FROM chicago_crime", conn)
df



|   | count |
|---|-------|
| 0 | 533 |


### Problem 2

##### List community areas with per capita income less than 11000.


In [21]:
# A triple-quoted string lets the query span two lines without a backslash.
df = pd.read_sql("""SELECT community_area_name, community_area_number, per_capita_income
                    FROM chicago_census WHERE per_capita_income < 11000""", conn)
df




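|   | community_area_name | community_area_number | per_capita_income |
|---|---------------------|-----------------------|-------------------|
| 0 | West Garfield Park | 26 | 10934 |
| 1 | South Lawndale | 30 | 10402 |
| 2 | Fuller Park | 37 | 10432 |
| 3 | Riverdale | 54 | 8201 |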


### Problem 3

##### List all case numbers for crimes involving minors.

In [23]:
df=pd.read_sql("SELECT case_number FROM chicago_crime2 WHERE primary_type ILIKE '%kid%'", conn)
df




|   | case_number |
|---|-------------|
| 0 | HN144152 |
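
The query above filters on `primary_type`, so it can only return kidnapping cases. If offenses involving minors are instead flagged in the `description` column (an assumption about the data, since the notebook does not inspect that column for this problem), a filter on `description` might look like the sketch below. To stay self-contained it runs against an in-memory SQLite table with hypothetical rows. The `LOWER(...) LIKE` form is used because it is valid in both SQLite and PostgreSQL.

```python
import sqlite3

# Portable case-insensitive match: lower-case the column, then use LIKE.
MINOR_CASES_SQL = """
    SELECT case_number
    FROM chicago_crime2
    WHERE LOWER(description) LIKE '%minor%'
    ORDER BY case_number
"""

# Stand-in database with hypothetical rows (not taken from the real dataset).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE chicago_crime2 (case_number TEXT, primary_type TEXT, description TEXT)"
)
demo.executemany(
    "INSERT INTO chicago_crime2 VALUES (?, ?, ?)",
    [
        ("DEMO001", "LIQUOR LAW VIOLATION", "SELL/GIVE/DEL LIQUOR TO MINOR"),
        ("DEMO002", "KIDNAPPING", "CHILD ABDUCTION/STRANGER"),
        ("DEMO003", "THEFT", "RETAIL THEFT"),
    ],
)
minor_cases = [row[0] for row in demo.execute(MINOR_CASES_SQL)]
print(minor_cases)
demo.close()
```

Against the real table, the same SQL string could be passed to `pd.read_sql` with the existing `conn`.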


### Problem 4

##### List all kidnapping crimes involving a child.


In [9]:
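# Keep only kidnapping records whose description mentions a child.
df = pd.read_sql("""SELECT * FROM chicago_crime2
                    WHERE primary_type ILIKE '%kidnap%' AND description ILIKE '%child%'""", conn)
df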



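|   | id_ | case_number | block | description | location_description | community_area_number | year_ | primary_type | nullo |
|---|-----|-------------|-------|-------------|----------------------|-----------------------|-------|--------------|-------|
| 0 | 5276766 | HN144152 | 050XX W VAN BUREN ST | CHILD ABDUCTION/STRANGER | STREET | 25 | 2007 | KIDNAPPING | ;;;;;;;;;;;; |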


### Problem 5

##### What kinds of crimes were recorded at schools?


In [17]:
# DISTINCT is a keyword, not a function, so no parentheses are needed.
df = pd.read_sql("SELECT DISTINCT primary_type FROM chicago_crime2 WHERE location_description ILIKE '%school%'", conn)
df





|   | primary_type |
|---|--------------|
| 0 | PUBLIC PEACE VIOLATION |
| 1 | ASSAULT |
| 2 | CRIMINAL DAMAGE |
| 3 | BATTERY |
| 4 | CRIMINAL TRESPASS |
| 5 | NARCOTICS |


### Problem 6

##### List the average safety score for each type of school.


In [16]:
# The school column holds the school type (ES, MS, or HS).
df = pd.read_sql("SELECT school, AVG(safety) FROM chicago_public_school2 GROUP BY school", conn)
df




|   | school | avg |
|---|--------|-----|
| 0 | HS | 49.623529 |
| 1 | MS | 48.0 |
| 2 | ES | 49.520384 |


### Problem 7

##### List the 5 community areas with the highest % of households below the poverty line.


In [11]:
df = pd.read_sql("""SELECT community_area_number, community_area_name, percent_households_below_poverty
                    FROM chicago_census
                    ORDER BY percent_households_below_poverty DESC LIMIT 5""", conn)
df





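|   | community_area_number | community_area_name | percent_households_below_poverty |
|---|-----------------------|---------------------|----------------------------------|
| 0 | 54 | Riverdale | 56.5 |
| 1 | 37 | Fuller Park | 51.2 |
| 2 | 68 | Englewood | 46.6 |
| 3 | 29 | North Lawndale | 43.1 |
| 4 | 27 | East Garfield Park | 42.4 |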


### Problem 8

##### Which community area is the most crime-prone?


In [12]:
# "count" is the default name PostgreSQL gives the COUNT(*) output column.
df = pd.read_sql("""SELECT COUNT(*), community_area_number FROM chicago_crime2
                    GROUP BY community_area_number ORDER BY count DESC""", conn)
df





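|   | count | community_area_number |
|---|-------|-----------------------|
| 0 | 41 | 25.0 |
| 1 | 41 |  |
| 2 | 22 | 23.0 |
| 3 | 21 | 68.0 |
| 4 | 17 | 8.0 |
| ... | ... | ... |
| 67 | 1 | 60.0 |
| 68 | 1 | 47.0 |
| 69 | 1 | 9.0 |
| 70 | 1 | 55.0 |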
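
The ranking above includes a group of 41 crimes with no community area recorded, tied with area 25. A variant that skips those rows and returns only the top area could look like the following sketch. It uses an in-memory SQLite table with hypothetical rows as a stand-in for `chicago_crime2`, and the SQL is written so that it should also run unchanged in PostgreSQL.

```python
import sqlite3

TOP_AREA_SQL = """
    SELECT community_area_number, COUNT(*) AS crime_count
    FROM chicago_crime2
    WHERE community_area_number IS NOT NULL
    GROUP BY community_area_number
    ORDER BY crime_count DESC
    LIMIT 1
"""

# Stand-in database with hypothetical rows (not taken from the real dataset).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE chicago_crime2 (case_number TEXT, community_area_number INTEGER)"
)
demo.executemany(
    "INSERT INTO chicago_crime2 VALUES (?, ?)",
    [
        ("DEMO001", 25), ("DEMO002", 25), ("DEMO003", 23),
        # Rows without a community area must not win the ranking.
        ("DEMO004", None), ("DEMO005", None), ("DEMO006", None),
    ],
)
top_area = demo.execute(TOP_AREA_SQL).fetchone()
print(top_area)
demo.close()
```

Note that `LIMIT 1` returns a single row even when several areas share the highest count, so ties would need an explicit tie-breaking rule.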


### Problem 9

##### Use a sub-query to find the name of the community area with the highest hardship index.


In [14]:
df = pd.read_sql("""SELECT community_area_name, hardship_index FROM chicago_census
                    WHERE hardship_index IS NOT NULL
                    ORDER BY hardship_index DESC LIMIT 1""", conn)
df






|   | community_area_name | hardship_index |
|---|---------------------|----------------|
| 0 | Riverdale | 98 |
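
The query above reaches the answer with `ORDER BY ... LIMIT 1` rather than the sub-query the problem statement asks for. A sub-query version might look like the sketch below, where the inner `SELECT MAX(hardship_index)` supplies the value that the outer query filters on. It runs against an in-memory SQLite stand-in for `chicago_census`. Only the Riverdale row reuses a value from the result above, and the other rows are hypothetical.

```python
import sqlite3

HIGHEST_HARDSHIP_SQL = """
    SELECT community_area_name, hardship_index
    FROM chicago_census
    WHERE hardship_index = (SELECT MAX(hardship_index) FROM chicago_census)
"""

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE chicago_census (community_area_name TEXT, hardship_index INTEGER)"
)
demo.executemany(
    "INSERT INTO chicago_census VALUES (?, ?)",
    [
        ("Riverdale", 98),   # value shown in the notebook output
        ("Area A", 40),      # hypothetical row
        ("Area B", None),    # hypothetical row; MAX() ignores NULLs
    ],
)
highest = demo.execute(HIGHEST_HARDSHIP_SQL).fetchall()
print(highest)
demo.close()
```

Because `MAX` skips NULL values, this form needs no explicit `IS NOT NULL` filter, and it returns every area that shares the maximum if there is a tie.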


### Problem 10

##### Use a sub-query to determine the community area name with the most crimes.


In [15]:
# The sub-query finds the area number with the most crime records;
# the outer query looks up its name in the census table.
df = pd.read_sql("""
    SELECT community_area_name, community_area_number FROM chicago_census
    WHERE community_area_number IN (
        SELECT community_area_number FROM chicago_crime
        WHERE community_area_number IS NOT NULL
        GROUP BY community_area_number ORDER BY COUNT(*) DESC LIMIT 1)
""", conn)
df







|   | community_area_name | community_area_number |
|---|---------------------|-----------------------|
| 0 | Austin | 25 |


In [None]:
conn.close()