## Chicago Public Schools - Progress Report Cards (2011-2012)

The city of Chicago released a dataset showing all school level performance data used to create School Report Cards for the 2011-2012 school year.

This dataset includes a large number of metrics: [https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F?download=true](https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F?download=true)

**NOTE**:

Do not download the datasets directly from the City of Chicago portal. Use the CSV files provided.



### Connect to the database

Let us now load the ipython-sql extension and establish a connection with the database.


In [30]:
import csv, sqlite3

con = sqlite3.connect("RealWorldData.db")
cur = con.cursor()

In [31]:
#!pip install pandas
#!pip install ipython-sql prettytable

import prettytable
prettytable.DEFAULT = 'DEFAULT'

In [32]:
#!pip install ipython-sql
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [33]:
%sql sqlite:///RealWorldData.db

### Store the dataset in a Table

##### To analyze the data using SQL, it first needs to be stored in the database.

##### We will first read the provided CSV files into pandas dataframes.

##### Next, we will use the df.to_sql() function to load each dataframe into a SQLite table.


In [34]:
import pandas
df = pandas.read_csv('ChicagoPublicSchools.csv')
df.to_sql("CHICAGO_PUBLIC_SCHOOLS_DATA", con, if_exists='replace', index=False, chunksize=200, method="multi")


566

In [35]:
# secondary dataset for further questions
df1 = pandas.read_csv("ChicagoCensusData.csv")
df1.to_sql("CENSUS_DATA", con, if_exists='replace', index=False, method="multi")

78

### Verify that the table creation was successful by retrieving the list of all tables and checking whether the CHICAGO_PUBLIC_SCHOOLS_DATA and CENSUS_DATA tables were created


In [36]:
# type in your query to retrieve list of all tables in the database
%sql SELECT name FROM sqlite_master WHERE type='table'

 * sqlite:///RealWorldData.db
Done.


name
CHICAGO_PUBLIC_SCHOOLS_DATA
CENSUS_DATA
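
The same check can be scripted outside the `%sql` magic. The following is a minimal sketch using only the standard `sqlite3` module, assuming an in-memory database with two empty stand-in tables rather than `RealWorldData.db`; the helper name `list_tables` is hypothetical.

```python
import sqlite3

# In-memory database so the sketch does not touch RealWorldData.db.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CHICAGO_PUBLIC_SCHOOLS_DATA (School_ID INTEGER, NAME_OF_SCHOOL TEXT)")
con.execute("CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NUMBER REAL, HARDSHIP_INDEX REAL)")


def list_tables(connection):
    """Return the names of all tables recorded in sqlite_master, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


tables = list_tables(con)

# Report any expected table that is absent from the catalog.
expected = {"CHICAGO_PUBLIC_SCHOOLS_DATA", "CENSUS_DATA"}
missing = expected - set(tables)
print(tables, "missing:", missing)
```

An empty `missing` set means both tables exist; it says nothing about whether the rows were loaded correctly.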


### Query the database system catalog to retrieve column metadata

##### The CHICAGO_PUBLIC_SCHOOLS_DATA table contains a large number of columns. How many columns does this table have?


In [74]:
%sql SELECT count(name) FROM PRAGMA_TABLE_INFO('CHICAGO_PUBLIC_SCHOOLS_DATA');

 * sqlite:///RealWorldData.db
Done.


count(name)
78


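Now retrieve the list of columns in the CHICAGO_PUBLIC_SCHOOLS_DATA table along with their column type (datatype) and length. Note that `length(type)` returns the number of characters in the type name (7 for `INTEGER`, 4 for `TEXT`), not a maximum column length.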


In [73]:
%sql SELECT name,type,length(type) FROM PRAGMA_TABLE_INFO('CHICAGO_PUBLIC_SCHOOLS_DATA');


 * sqlite:///RealWorldData.db
Done.


name,type,length(type)
School_ID,INTEGER,7
NAME_OF_SCHOOL,TEXT,4
"Elementary, Middle, or High School",TEXT,4
Street_Address,TEXT,4
City,TEXT,4
State,TEXT,4
ZIP_Code,INTEGER,7
Phone_Number,TEXT,4
Link,TEXT,4
Network_Manager,TEXT,4
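
To make the meaning of the third column concrete, here is a small sketch that runs the same metadata query from plain Python. It assumes a hypothetical three-column table named `schools_demo` in an in-memory database; only the column names are borrowed from the output above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in table with a few of the column names shown above.
con.execute(
    'CREATE TABLE schools_demo ('
    'School_ID INTEGER, '
    'NAME_OF_SCHOOL TEXT, '
    '"Elementary, Middle, or High School" TEXT)'
)

# pragma_table_info is a table-valued function: one row per column.
columns = con.execute(
    "SELECT name, type, length(type) "
    "FROM pragma_table_info('schools_demo') "
    "ORDER BY cid"
).fetchall()

column_count = len(columns)

for name, col_type, type_name_length in columns:
    # type_name_length is just len(col_type), e.g. len('INTEGER') == 7.
    print(name, col_type, type_name_length)
```

The value in the last column always equals `len(col_type)` in Python, which is why every `TEXT` column reports 4 regardless of how long its stored strings are.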



##### How many Elementary Schools are in the dataset?


In [None]:
%%sql 
SELECT count(*) 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
WHERE "Elementary, Middle, or High School"='ES'

 * sqlite:///RealWorldData.db
Done.


count(*)
462



##### What is the highest Safety Score?


In [None]:
%%sql 
SELECT MAX(Safety_Score) 
AS MAX_SAFETY_SCORE 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA

 * sqlite:///RealWorldData.db
Done.


MAX_SAFETY_SCORE
99.0



##### Which schools have the highest Safety Score?


In [68]:
%%sql 
SELECT Name_of_School, Safety_Score 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
WHERE Safety_Score = (SELECT MAX(Safety_Score) FROM CHICAGO_PUBLIC_SCHOOLS_DATA)


 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,SAFETY_SCORE
Abraham Lincoln Elementary School,99.0
Alexander Graham Bell Elementary School,99.0
Annie Keller Elementary Gifted Magnet School,99.0
Augustus H Burley Elementary School,99.0
Edgar Allan Poe Elementary Classical School,99.0
Edgebrook Elementary School,99.0
Ellen Mitchell Elementary School,99.0
James E McDade Elementary Classical School,99.0
James G Blaine Elementary School,99.0
LaSalle Elementary Language Academy,99.0



##### What are the top 10 schools with the highest "Average Student Attendance"?


In [76]:
%%sql 
SELECT Name_of_School, Average_Student_Attendance 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
ORDER BY Average_Student_Attendance DESC NULLS LAST LIMIT 10 

 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE
John Charles Haines Elementary School,98.40%
James Ward Elementary School,97.80%
Edgar Allan Poe Elementary Classical School,97.60%
Orozco Fine Arts & Sciences Elementary School,97.60%
Rachel Carson Elementary School,97.60%
Annie Keller Elementary Gifted Magnet School,97.50%
Andrew Jackson Elementary Language Academy,97.40%
Lenart Elementary Regional Gifted Center,97.40%
Disney II Magnet School,97.30%
John H Vanderpoel Elementary Magnet School,97.20%



##### Retrieve the list of 5 Schools with the lowest Average Student Attendance sorted in ascending order based on attendance


In [65]:
%%sql 
SELECT Name_of_School, Average_Student_Attendance  
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
ORDER BY Average_Student_Attendance 
LIMIT 5

 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE
Velma F Thomas Early Childhood Center,
Richard T Crane Technical Preparatory High School,57.90%
Barbara Vick Early Childhood & Family Center,60.90%
Dyett High School,62.50%
Wendell Phillips Academy High School,63.00%
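
The first row above has no attendance value. SQLite treats NULL as smaller than any other value, so a missing value sorts first in ascending order. The sketch below illustrates this and one way to exclude such rows, assuming a hypothetical table `attendance_demo` with made-up school names in an in-memory database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE attendance_demo (school TEXT, attendance TEXT)")
con.executemany(
    "INSERT INTO attendance_demo VALUES (?, ?)",
    [
        ("School A", "57.90%"),
        ("School B", None),  # missing attendance, stored as NULL
        ("School C", "60.90%"),
        ("School D", "98.40%"),
    ],
)

# Ascending order puts the NULL row first.
with_nulls = con.execute(
    "SELECT school FROM attendance_demo ORDER BY attendance LIMIT 2"
).fetchall()

# Filtering NULLs out leaves only schools that reported a value.
without_nulls = con.execute(
    "SELECT school FROM attendance_demo "
    "WHERE attendance IS NOT NULL "
    "ORDER BY attendance LIMIT 2"
).fetchall()

print(with_nulls)
print(without_nulls)
```

Whether the missing row should count as "lowest attendance" is a judgment call; the filter only makes that choice explicit.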


##### Remove the '%' sign from the above result set for the Average Student Attendance column


In [64]:
%%sql 
SELECT Name_of_School, REPLACE(Average_Student_Attendance, '%', '') 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
ORDER BY Average_Student_Attendance 
LIMIT 5

 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,"REPLACE(Average_Student_Attendance, '%', '')"
Velma F Thomas Early Childhood Center,
Richard T Crane Technical Preparatory High School,57.9
Barbara Vick Early Childhood & Family Center,60.9
Dyett High School,62.5
Wendell Phillips Academy High School,63.0



##### Which Schools have Average Student Attendance lower than 70%?


In [77]:
%%sql 
SELECT Name_of_School, Average_Student_Attendance  
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
WHERE CAST (REPLACE(Average_Student_Attendance, '%', '') AS DOUBLE) < 70 
ORDER BY Average_Student_Attendance

 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE
Richard T Crane Technical Preparatory High School,57.90%
Barbara Vick Early Childhood & Family Center,60.90%
Dyett High School,62.50%
Wendell Phillips Academy High School,63.00%
Orr Academy High School,66.30%
Manley Career Academy High School,66.80%
Chicago Vocational Career Academy High School,68.80%
Roberto Clemente Community Academy High School,69.60%
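
The cast matters because the attendance column is stored as text, and text is ordered character by character. The following sketch contrasts text ordering with numeric ordering, assuming a hypothetical table `demo` whose values were chosen to expose the difference (they are not taken from the dataset).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (attendance TEXT)")
con.executemany(
    "INSERT INTO demo VALUES (?)",
    [("98.40%",), ("100.00%",), ("9.50%",), ("66.30%",)],
)

# Text comparison: '1...' sorts below '6...' and '9...', so 100.00% lands last.
text_order = [
    row[0]
    for row in con.execute("SELECT attendance FROM demo ORDER BY attendance DESC")
]

# Numeric comparison after stripping '%' and casting to a floating-point value.
numeric_expr = "CAST(REPLACE(attendance, '%', '') AS REAL)"
numeric_order = [
    row[0]
    for row in con.execute(
        f"SELECT attendance FROM demo ORDER BY {numeric_expr} DESC"
    )
]

# The same expression can drive both the filter and the sort.
below_70 = [
    row[0]
    for row in con.execute(
        f"SELECT attendance FROM demo WHERE {numeric_expr} < 70 ORDER BY {numeric_expr}"
    )
]

print(text_order)
print(numeric_order)
print(below_70)
```

In the real dataset the two orderings may happen to agree when every value has two digits before the decimal point, but sorting on the cast expression does not rely on that.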



##### Get the total College Enrollment for each Community Area


In [78]:
%%sql 
SELECT Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
GROUP BY Community_Area_Name 

 * sqlite:///RealWorldData.db
Done.


COMMUNITY_AREA_NAME,TOTAL_ENROLLMENT
ALBANY PARK,6864
ARCHER HEIGHTS,4823
ARMOUR SQUARE,1458
ASHBURN,6483
AUBURN GRESHAM,4175
AUSTIN,10933
AVALON PARK,1522
AVONDALE,3640
BELMONT CRAGIN,14386
BEVERLY,1636



##### Get the 5 Community Areas with the least total College Enrollment, sorted in ascending order


In [79]:
%%sql 
SELECT Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA 
GROUP BY Community_Area_Name 
ORDER BY TOTAL_ENROLLMENT asc 
LIMIT 5 

 * sqlite:///RealWorldData.db
Done.


COMMUNITY_AREA_NAME,TOTAL_ENROLLMENT
OAKLAND,140
FULLER PARK,531
BURNSIDE,549
OHARE,786
LOOP,871
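
As a rough illustration of how the aggregate alias feeds the sort, the sketch below runs the same GROUP BY / ORDER BY / LIMIT pattern on a hypothetical table `enrollment_demo` with made-up area names and enrollment figures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE enrollment_demo (area TEXT, college_enrollment INTEGER)")
con.executemany(
    "INSERT INTO enrollment_demo VALUES (?, ?)",
    [
        ("AREA A", 100),
        ("AREA A", 200),
        ("AREA B", 50),
        ("AREA C", 300),
        ("AREA C", 10),
    ],
)

# One output row per area; the alias TOTAL_ENROLLMENT is reused in ORDER BY.
lowest_two = con.execute(
    "SELECT area, SUM(college_enrollment) AS TOTAL_ENROLLMENT "
    "FROM enrollment_demo "
    "GROUP BY area "
    "ORDER BY TOTAL_ENROLLMENT ASC "
    "LIMIT 2"
).fetchall()

print(lowest_two)
```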



##### List 5 schools with the lowest safety score.


In [80]:
%%sql 
SELECT name_of_school, safety_score 
FROM CHICAGO_PUBLIC_SCHOOLS_DATA  
WHERE safety_score IS NOT NULL 
ORDER BY safety_score 
LIMIT 5

 * sqlite:///RealWorldData.db
Done.


NAME_OF_SCHOOL,SAFETY_SCORE
Edmond Burke Elementary School,1.0
Luke O'Toole Elementary School,5.0
George W Tilton Elementary School,6.0
Foster Park Elementary School,11.0
Emil G Hirsch Metropolitan High School,13.0



##### Get the hardship index for the community area of the school which has College Enrollment of 4368


In [81]:
%%sql 
SELECT hardship_index 
FROM CENSUS_DATA CD, CHICAGO_PUBLIC_SCHOOLS_DATA CPS 
WHERE CD.community_area_number = CPS.community_area_number 
AND college_enrollment = 4368

 * sqlite:///RealWorldData.db
Done.


HARDSHIP_INDEX
6.0
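
The query above joins the two tables implicitly through the WHERE clause. An equivalent formulation with an explicit `JOIN ... ON` and qualified column names is sketched below. It assumes two hypothetical tables, `census_demo` and `schools_demo`, in an in-memory database; the North Center row mirrors values shown in this notebook, and all other rows and school names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE census_demo ("
    "community_area_number REAL, community_area_name TEXT, hardship_index REAL)"
)
con.execute(
    "CREATE TABLE schools_demo ("
    "name_of_school TEXT, community_area_number INTEGER, college_enrollment INTEGER)"
)
con.executemany(
    "INSERT INTO census_demo VALUES (?, ?, ?)",
    [(5.0, "North Center", 6.0), (99.0, "Hypothetical Area", 50.0)],
)
con.executemany(
    "INSERT INTO schools_demo VALUES (?, ?, ?)",
    [("Hypothetical School A", 5, 4368), ("Hypothetical School B", 99, 500)],
)

# The join condition lives in ON; the row filter stays in WHERE.
rows = con.execute(
    """
    SELECT cd.hardship_index
    FROM census_demo AS cd
    JOIN schools_demo AS cps
      ON cd.community_area_number = cps.community_area_number
    WHERE cps.college_enrollment = ?
    """,
    (4368,),
).fetchall()

print(rows)
```

Separating the join condition from the filter makes it harder to drop the join predicate by accident, which would otherwise produce a cross product.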



##### Get the hardship index for the community area which has the highest value for College Enrollment


In [82]:
%%sql 
SELECT community_area_number, community_area_name, hardship_index 
FROM CENSUS_DATA 
WHERE community_area_number IN (SELECT community_area_number FROM CHICAGO_PUBLIC_SCHOOLS_DATA ORDER BY college_enrollment DESC LIMIT 1)

 * sqlite:///RealWorldData.db
Done.


COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,HARDSHIP_INDEX
5.0,North Center,6.0
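
One property of the `ORDER BY ... DESC LIMIT 1` subquery is that it returns a single community area even when several schools tie for the highest enrollment. A variant based on `MAX()` returns every tied area. The sketch below compares the two on hypothetical tables `census_demo` and `schools_demo` with made-up rows that deliberately contain a tie.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census_demo (community_area_number INTEGER, hardship_index REAL)")
con.execute("CREATE TABLE schools_demo (community_area_number INTEGER, college_enrollment INTEGER)")
con.executemany(
    "INSERT INTO census_demo VALUES (?, ?)",
    [(5, 6.0), (7, 20.0), (9, 40.0)],
)
con.executemany(
    "INSERT INTO schools_demo VALUES (?, ?)",
    [(5, 4368), (7, 4368), (9, 100)],  # areas 5 and 7 tie for the maximum
)

# LIMIT 1 keeps exactly one of the tied schools.
limit_rows = con.execute(
    "SELECT community_area_number, hardship_index FROM census_demo "
    "WHERE community_area_number IN ("
    "  SELECT community_area_number FROM schools_demo "
    "  ORDER BY college_enrollment DESC LIMIT 1)"
).fetchall()

# Comparing against MAX() keeps every school that reaches the maximum.
max_rows = con.execute(
    "SELECT community_area_number, hardship_index FROM census_demo "
    "WHERE community_area_number IN ("
    "  SELECT community_area_number FROM schools_demo "
    "  WHERE college_enrollment = (SELECT MAX(college_enrollment) FROM schools_demo)) "
    "ORDER BY community_area_number"
).fetchall()

print(limit_rows)
print(max_rows)
```

When the maximum is unique, as the single-row result above suggests, both forms give the same answer.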
