# Introduction

I have been hired by an organization that strives to improve educational outcomes for children and young people in Chicago. My job is to analyze the census, crime, and school data for a given neighborhood or district. 

I will identify factors that affect the enrollment, safety, health, and environment ratings of schools.

## Understand the datasets

I will be using three datasets that are available on the city of Chicago's Data Portal:

### 1. Socioeconomic Indicators in Chicago

This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)

### 2. Chicago Public Schools

This dataset shows all school-level performance data used to create CPS School Report Cards for the 2011-2012 school year. This dataset is provided by the city of Chicago's Data Portal.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t](https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)

### 3. Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

A detailed description of this dataset and the original dataset can be obtained from the Chicago Data Portal at:
[https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01&cm_mmc=Email_Newsletter-\_-Developer_Ed%2BTech-\_-WW_WW-\_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)


### Download the datasets

The analysis uses three tables, each populated with a subset of the corresponding full dataset.

The subsets are available as .CSV (comma-separated values) files. Use the links below to download and save them:

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Census Data</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Public Schools</a>

*   <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCrimeData.csv?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDB0201ENSkillsNetwork22-2022-01-01" target="_blank">Chicago Crime Data</a>


# Method

First, I created the three tables in a PostgreSQL database.

Then I loaded the data from the .CSV files into the tables.

The tables are now ready to be analyzed.
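The create-and-load step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL, the helper name `load_csv` is hypothetical, and the two sample rows are placeholders rather than rows copied from the real `ChicagoCensusData.csv`.

```python
import csv
import io
import sqlite3

# Placeholder CSV text standing in for ChicagoCensusData.csv.
# The column names and the second row are assumptions made for this sketch.
SAMPLE_CSV = """community_area_number,community_area_name,hardship_index
54,Riverdale,98
0,Example Area,10
"""


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert every data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names are interpolated directly, so this is only
    # appropriate for trusted, hand-written names.
    conn.execute(f"CREATE TABLE {table} ({columns})")
    rows = list(reader)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


demo = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
inserted = load_csv(demo, "chicago_census", SAMPLE_CSV)
row_count = demo.execute("SELECT COUNT(*) FROM chicago_census").fetchone()[0]
print(inserted, row_count)
```

With psycopg2 the same pattern would use `%s` placeholders instead of `?`, and the columns would normally be declared with explicit types.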

### Connect to the database


In [1]:
import pandas as pd
import psycopg2 as pg2

# Connect to the local PostgreSQL database that holds the three tables.
conn = pg2.connect(database='Assignment IBM', user='postgres', password='password')
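The connection above hardcodes its credentials in the notebook. One possible alternative, sketched here as an assumption rather than as what this notebook actually does, is to read the settings from the standard libpq environment variables (`PGDATABASE`, `PGUSER`, `PGPASSWORD`). The helper names `connection_settings` and `open_connection` are hypothetical.

```python
import os


def connection_settings(env=None):
    """Build keyword arguments for psycopg2.connect() from environment variables."""
    env = os.environ if env is None else env
    return {
        # Fall back to the values used in this notebook when a variable is unset.
        "dbname": env.get("PGDATABASE", "Assignment IBM"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
    }


def open_connection():
    """Open a connection; requires psycopg2 and a reachable PostgreSQL server."""
    import psycopg2  # imported lazily so connection_settings() works without the driver

    return psycopg2.connect(**connection_settings())


# Demonstrate the helper with an explicit mapping instead of the real environment.
settings = connection_settings({"PGPASSWORD": "example-only"})
print(sorted(settings))
```

Calling `open_connection()` would then replace the hardcoded `pg2.connect(...)` call, keeping the password out of the saved notebook.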

## Problems


### Problem 1

##### Write and execute a SQL query to list the school names and community names for communities with a hardship index of 98.

In [12]:
# Schools in community areas with a hardship index of 98 (attendance columns included for context).
query = """
    SELECT s.name_of_school, s.community_area_name,
           s.average_teacher_attendance, s.average_student_attendance
    FROM chicago_public_school5 AS s
    INNER JOIN chicago_census AS c ON c.community_area_number = s.community_area_number
    WHERE c.hardship_index = 98
"""
df = pd.read_sql(query, conn)
df



| | name_of_school | community_area_name | average_teacher_attendance | average_student_attendance |
|---|---|---|---|---|
| 0 | GeorgeWashingtonCarverMilitaryAcademyHighSchool | RIVERDALE | 96.40% | 91.60% |
| 1 | GeorgeWashingtonCarverPrimarySchool | RIVERDALE | 94.70% | 90.90% |
| 2 | IraFAldridgeElementarySchool | RIVERDALE | 96.30% | 92.90% |
| 3 | WilliamEBDuboisElementarySchool | RIVERDALE | 94.40% | 93.30% |
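A note on the join in Problem 1: when the `WHERE` clause filters on a column of the right-hand table, a `LEFT JOIN` returns the same rows as an `INNER JOIN`, because unmatched rows carry `NULL` in that column and fail the filter. The sketch below illustrates this with Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table names, school names, and rows are hypothetical and kept tiny on purpose.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE schools (name TEXT, community_area_number INTEGER);
CREATE TABLE census  (community_area_number INTEGER, hardship_index INTEGER);
-- 'School B' sits in an area that has no census row.
INSERT INTO schools VALUES ('School A', 54), ('School B', 99);
INSERT INTO census  VALUES (54, 98);
""")

BASE = """
SELECT s.name
FROM schools AS s
{join} census AS c ON c.community_area_number = s.community_area_number
{where}
ORDER BY s.name
"""


def names(join, where=""):
    """Run the query with the given join type and optional WHERE clause."""
    sql = BASE.format(join=join, where=where)
    return [row[0] for row in demo.execute(sql)]


left_all = names("LEFT JOIN")
left_filtered = names("LEFT JOIN", "WHERE c.hardship_index = 98")
inner_filtered = names("INNER JOIN", "WHERE c.hardship_index = 98")

print(left_all)        # ['School A', 'School B']
print(left_filtered)   # ['School A']
print(inner_filtered)  # ['School A']
```

Only the unfiltered `LEFT JOIN` keeps the unmatched school, so the choice of join type matters only when schools without a census row should appear in the result.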


### Problem 2

##### Write and execute a SQL query to list all crimes that took place at a school. Include case number, crime type and community name.

In [14]:
# Crimes whose location description mentions a school, with the community name from the census table.
query = """
    SELECT c.community_area_name, cr.case_number, cr.primary_type
    FROM chicago_census AS c
    INNER JOIN chicago_crime2 AS cr ON cr.community_area_number = c.community_area_number
    WHERE cr.location_description ILIKE '%school%'
"""
df = pd.read_sql(query, conn)
df



| | community_area_name | case_number | primary_type |
|---|---|---|---|
| 0 | Lincoln Square | HL353697 | BATTERY |
| 1 | Hermosa | HL725506 | BATTERY |
| 2 | Rogers Park | HP716225 | BATTERY |
| 3 | Portage Park | HH639427 | BATTERY |
| 4 | Near North Side | JA460432 | BATTERY |
| 5 | Portage Park | HS200939 | CRIMINAL DAMAGE |
| 6 | West Town | HK577020 | NARCOTICS |
| 7 | Edison Park | HS305355 | NARCOTICS |
| 8 | Jefferson Park | HT315369 | ASSAULT |
| 9 | Near North Side | HR585012 | CRIMINAL TRESPASS |
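The query in Problem 2 relies on `ILIKE`, a PostgreSQL extension for case-insensitive pattern matching. A more portable form lowercases the column and compares it with a lowercase pattern, and the search term can be passed as a query parameter rather than written into the SQL text. The sketch below shows that idea with Python's built-in `sqlite3` as a stand-in. The table, case numbers, and location descriptions are hypothetical, and `cases_matching` is an illustrative helper name.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE crime (case_number TEXT, location_description TEXT)")
demo.executemany(
    "INSERT INTO crime VALUES (?, ?)",
    [
        ("X1", "SCHOOL, PUBLIC, BUILDING"),
        ("X2", "School, Private, Grounds"),
        ("X3", "STREET"),
    ],
)


def cases_matching(fragment):
    """Return case numbers whose location contains `fragment`, ignoring case."""
    pattern = f"%{fragment.lower()}%"
    rows = demo.execute(
        "SELECT case_number FROM crime "
        "WHERE LOWER(location_description) LIKE ? ORDER BY case_number",
        (pattern,),
    )
    return [row[0] for row in rows]


school_cases = cases_matching("School")
print(school_cases)  # ['X1', 'X2']
```

SQLite's `LIKE` is already case-insensitive for ASCII text, so the `LOWER()` call matters mainly on engines such as PostgreSQL, where plain `LIKE` is case-sensitive. With psycopg2 the placeholder would be `%s` instead of `?`.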


In [None]:
conn.close()