## Chicago Public Schools - Progress Report Cards (2011-2012)

<p>The city of Chicago released a dataset showing all school-level performance data used to create School Report Cards for the 2011-2012 school year. The dataset is available from <a href="https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t/about_data">the Chicago Data Portal</a>.</p>
<p>This dataset includes a large number of metrics. Start by familiarizing yourself with the types of metrics in the database, which are described <a href="https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F">here</a>.</p>
<p><b>NOTE:</b></p>
<p>Do not download the dataset directly from the City of Chicago portal. Instead, download a static, more database-friendly copy from <a href="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv">this link</a>.</p>
<p>Now review some of its contents.</p>

## Connect to the database

In [1]:
import sqlite3
import os

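# Keep the database file in ./data. Create the folder first, because
# sqlite3.connect() does not create missing directories.
dir_path = os.path.join(".", "data")
os.makedirs(dir_path, exist_ok=True)
db_name = os.path.join(dir_path, "RealWorldData.db")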

In [2]:
con = sqlite3.connect(db_name)
cur = con.cursor()

## Store the dataset in a Table

<p>Datasets are often distributed as .CSV (comma-separated values) files, frequently over the internet. Before the data can be analyzed with SQL, it has to be loaded into a database table.</p>
<p>First, read the CSV file from the given URL into a pandas DataFrame.</p>
<p>Then call <code>df.to_sql()</code> to create a SQLite table from the DataFrame and load the CSV data into it.</p>

In [3]:
import pandas as pd

In [4]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoPublicSchools.csv"
df = pd.read_csv(url)
df.to_sql("chicago_public_schools_data", con=con, if_exists="replace", index=False, method="multi", chunksize=100)

566
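<p>For readers who want to see what the load step amounts to, here is a minimal sketch that uses only the standard library instead of pandas. It is not how <code>df.to_sql()</code> is implemented. The table <code>demo_schools</code>, the connection <code>demo_con</code>, and the two sample rows are hypothetical stand-ins for the real CSV file.</p>

```python
import csv
import io
import sqlite3

# Hypothetical two-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "School_ID,NAME_OF_SCHOOL,SAFETY_SCORE\n"
    "1,Example Elementary School,99\n"
    "2,Sample High School,\n"
)

# In-memory database, so nothing is written to disk.
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE demo_schools "
    "(School_ID INTEGER, NAME_OF_SCHOOL TEXT, SAFETY_SCORE REAL)"
)

reader = csv.DictReader(sample_csv)
rows = [
    # Empty CSV fields are stored as NULL rather than as empty strings.
    (r["School_ID"], r["NAME_OF_SCHOOL"], r["SAFETY_SCORE"] or None)
    for r in reader
]
demo_con.executemany("INSERT INTO demo_schools VALUES (?, ?, ?)", rows)
demo_con.commit()

row_count = demo_con.execute("SELECT COUNT(*) FROM demo_schools").fetchone()[0]
print(row_count)
```

<p>Because the columns are declared as <code>INTEGER</code> and <code>REAL</code>, SQLite converts the numeric-looking CSV strings to numbers on insert.</p>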

## Query the database system catalog to retrieve table metadata

<p>You can verify that the table creation was successful by retrieving the list of all tables in your database and checking whether the <code>chicago_public_schools_data</code> table was created.</p>

In [5]:
cur.execute("SELECT name FROM sqlite_master WHERE type='table';")
cur.fetchall()[0][0]

'chicago_public_schools_data'
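<p>The cell above returns only the first table name, which is enough while the database holds a single table. A slightly more robust check, sketched here under the assumption that you want to look for one specific table, asks <code>sqlite_master</code> for that name directly. The helper <code>table_exists</code> and the connection <code>demo_con</code> are hypothetical names introduced for this example.</p>

```python
import sqlite3

# Throwaway in-memory database with one empty table of the same name.
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE chicago_public_schools_data (School_ID INTEGER)")


def table_exists(connection, table_name):
    """Return True if a table with this exact name is listed in sqlite_master."""
    query = "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?"
    return connection.execute(query, (table_name,)).fetchone() is not None


found_real = table_exists(demo_con, "chicago_public_schools_data")
found_other = table_exists(demo_con, "schools")
print(found_real, found_other)
```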

## Query the database system catalog to retrieve column metadata

<p>The <code>chicago_public_schools_data</code> table contains a large number of columns. How many columns does this table have?</p>

In [6]:
cur.execute("SELECT COUNT(name) FROM PRAGMA_TABLE_INFO('chicago_public_schools_data')")
cur.fetchall()[0][0]

78

<p>Now retrieve the list of columns in the <code>chicago_public_schools_data</code> table, along with each column's declared type (datatype) and the length of that type name.</p>

In [7]:
cur.execute("SELECT name, type, LENGTH(type) FROM PRAGMA_TABLE_INFO('chicago_public_schools_data');")

for row in cur.fetchall():
    print(row)

('School_ID', 'INTEGER', 7)
('NAME_OF_SCHOOL', 'TEXT', 4)
('Elementary, Middle, or High School', 'TEXT', 4)
('Street_Address', 'TEXT', 4)
('City', 'TEXT', 4)
('State', 'TEXT', 4)
('ZIP_Code', 'INTEGER', 7)
('Phone_Number', 'TEXT', 4)
('Link', 'TEXT', 4)
('Network_Manager', 'TEXT', 4)
('Collaborative_Name', 'TEXT', 4)
('Adequate_Yearly_Progress_Made_', 'TEXT', 4)
('Track_Schedule', 'TEXT', 4)
('CPS_Performance_Policy_Status', 'TEXT', 4)
('CPS_Performance_Policy_Level', 'TEXT', 4)
('HEALTHY_SCHOOL_CERTIFIED', 'TEXT', 4)
('Safety_Icon', 'TEXT', 4)
('SAFETY_SCORE', 'REAL', 4)
('Family_Involvement_Icon', 'TEXT', 4)
('Family_Involvement_Score', 'TEXT', 4)
('Environment_Icon', 'TEXT', 4)
('Environment_Score', 'REAL', 4)
('Instruction_Icon', 'TEXT', 4)
('Instruction_Score', 'REAL', 4)
('Leaders_Icon', 'TEXT', 4)
('Leaders_Score', 'TEXT', 4)
('Teachers_Icon', 'TEXT', 4)
('Teachers_Score', 'TEXT', 4)
('Parent_Engagement_Icon', 'TEXT', 4)
('Parent_Engagement_Score', 'TEXT', 4)
('Parent_Environmen
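<p>Note that the third value in each row above is the length of the type name itself (<code>'INTEGER'</code> has 7 characters, <code>'TEXT'</code> has 4), not a maximum length for the column. If you want the length of the data actually stored, you have to measure the values. The sketch below illustrates the difference on a hypothetical one-row table called <code>demo_schools</code>.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE demo_schools "
    "(School_ID INTEGER, NAME_OF_SCHOOL TEXT, SAFETY_SCORE REAL)"
)
demo_con.execute(
    "INSERT INTO demo_schools VALUES (1, 'Example Elementary School', 99.0)"
)

# LENGTH(type) measures the type *name*, e.g. 7 for 'INTEGER'.
type_rows = demo_con.execute(
    "SELECT name, type, LENGTH(type) FROM pragma_table_info('demo_schools')"
).fetchall()

# The length of the stored text has to be computed from the data itself.
longest_name = demo_con.execute(
    "SELECT MAX(LENGTH(NAME_OF_SCHOOL)) FROM demo_schools"
).fetchone()[0]

for row in type_rows:
    print(row)
print(longest_name)
```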

## Problems

<p><b>Problem 1</b> How many Elementary Schools are in the dataset?</p>

In [8]:
cur.execute("SELECT COUNT(*) FROM chicago_public_schools_data WHERE `Elementary, Middle, or High School`='ES';")
cur.fetchall()[0][0]

462
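<p>The column name in this query contains spaces and commas, so it has to be quoted. SQLite accepts backticks for compatibility with MySQL, but double quotes are the standard SQL way to quote an identifier. The following sketch shows the double-quoted form with a bound parameter on a hypothetical three-row table named <code>demo_schools</code>.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    'CREATE TABLE demo_schools ("Elementary, Middle, or High School" TEXT)'
)
demo_con.executemany(
    "INSERT INTO demo_schools VALUES (?)", [("ES",), ("ES",), ("HS",)]
)

# Double quotes mark an identifier; the ? placeholder carries the value.
query = """
SELECT COUNT(*)
FROM demo_schools
WHERE "Elementary, Middle, or High School" = ?
"""
elementary_count = demo_con.execute(query, ("ES",)).fetchone()[0]
print(elementary_count)
```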

<p><b>Problem 2</b> What is the highest Safety Score?</p>

In [9]:
cur.execute("SELECT MAX(Safety_Score) FROM chicago_public_schools_data;")
cur.fetchall()[0][0]

99.0

<p><b>Problem 3</b> Which schools have the highest Safety Score?</p>

In [10]:
statement = """
SELECT NAME_OF_SCHOOL
FROM chicago_public_schools_data
WHERE Safety_Score = (
    SELECT MAX(Safety_Score) FROM chicago_public_schools_data
);
"""
cur.execute(statement)
for row in cur.fetchall():
    for school in row:
        print(school)

Abraham Lincoln Elementary School
Alexander Graham Bell Elementary School
Annie Keller Elementary Gifted Magnet School
Augustus H Burley Elementary School
Edgar Allan Poe Elementary Classical School
Edgebrook Elementary School
Ellen Mitchell Elementary School
James E McDade Elementary Classical School
James G Blaine Elementary School
LaSalle Elementary Language Academy
Mary E Courtenay Elementary Language Arts Center
Northside College Preparatory High School
Northside Learning Center High School
Norwood Park Elementary School
Oriole Park Elementary School
Sauganash Elementary School
Stephen Decatur Classical Elementary School
Talman Elementary School
Wildwood Elementary School


<p><b>Problem 4</b> What are the top 10 schools with the highest <code>Average Student Attendance</code>?</p>

In [11]:
statement = """
SELECT NAME_OF_SCHOOL, AVERAGE_STUDENT_ATTENDANCE
FROM chicago_public_schools_data
ORDER BY AVERAGE_STUDENT_ATTENDANCE DESC NULLS LAST LIMIT 10;
"""
cur.execute(statement)

print("School Name, ", "Average Student Attendance")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

School Name,  Average Student Attendance
John Charles Haines Elementary School,  98.40%
James Ward Elementary School,  97.80%
Edgar Allan Poe Elementary Classical School,  97.60%
Orozco Fine Arts & Sciences Elementary School,  97.60%
Rachel Carson Elementary School,  97.60%
Annie Keller Elementary Gifted Magnet School,  97.50%
Andrew Jackson Elementary Language Academy,  97.40%
Lenart Elementary Regional Gifted Center,  97.40%
Disney II Magnet School,  97.30%
John H Vanderpoel Elementary Magnet School,  97.20%
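<p>One caveat: <code>AVERAGE_STUDENT_ATTENDANCE</code> is stored as text such as <code>98.40%</code>, so <code>ORDER BY</code> compares strings rather than numbers. That gives the expected order as long as every value has two digits before the decimal point, but a value like <code>100.00%</code> would sort below <code>57.90%</code>. The sketch below shows the effect on a hypothetical table, <code>demo_attendance</code>, and one possible way to sort numerically.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE demo_attendance (school TEXT, attendance TEXT)")
demo_con.executemany(
    "INSERT INTO demo_attendance VALUES (?, ?)",
    [("A", "98.40%"), ("B", "100.00%"), ("C", "57.90%")],
)

# Text comparison: '1...' sorts before '5...' and '9...'.
text_order = [
    row[0]
    for row in demo_con.execute(
        "SELECT school FROM demo_attendance ORDER BY attendance DESC"
    )
]

# Numeric comparison: strip the % sign and cast before sorting.
numeric_order = [
    row[0]
    for row in demo_con.execute(
        "SELECT school FROM demo_attendance "
        "ORDER BY CAST(REPLACE(attendance, '%', '') AS REAL) DESC"
    )
]

print(text_order)
print(numeric_order)
```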


<p><b>Problem 5</b> Retrieve the list of 5 schools with the lowest <code>Average Student Attendance</code> sorted in ascending order based on attendance.</p>

In [12]:
statement = """
SELECT NAME_OF_SCHOOL, AVERAGE_STUDENT_ATTENDANCE
FROM chicago_public_schools_data
ORDER BY AVERAGE_STUDENT_ATTENDANCE ASC LIMIT 5;
"""
cur.execute(statement)

print("School Name, ", "Average Student Attendance")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

School Name,  Average Student Attendance
Velma F Thomas Early Childhood Center,  None
Richard T Crane Technical Preparatory High School,  57.90%
Barbara Vick Early Childhood & Family Center,  60.90%
Dyett High School,  62.50%
Wendell Phillips Academy High School,  63.00%
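<p>The first row in this result has no attendance value at all. SQLite treats <code>NULL</code> as smaller than any other value, so it comes first in ascending order. If missing values should not count as "lowest", one option is to filter them out, as in this sketch on a hypothetical table called <code>demo_attendance</code>.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE demo_attendance (school TEXT, attendance TEXT)")
demo_con.executemany(
    "INSERT INTO demo_attendance VALUES (?, ?)",
    [("A", None), ("B", "57.90%"), ("C", "60.90%")],
)

# NULL sorts first in ascending order, so school A leads the list.
with_nulls = demo_con.execute(
    "SELECT school, attendance FROM demo_attendance "
    "ORDER BY attendance ASC LIMIT 2"
).fetchall()

# Excluding NULLs keeps only schools that reported a value.
without_nulls = demo_con.execute(
    "SELECT school, attendance FROM demo_attendance "
    "WHERE attendance IS NOT NULL "
    "ORDER BY attendance ASC LIMIT 2"
).fetchall()

print(with_nulls)
print(without_nulls)
```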


<p><b>Problem 6</b> Now remove the <code>%</code> sign from the above result set for <code>Average Student Attendance</code> column.</p>

In [13]:
statement = """
SELECT NAME_OF_SCHOOL, REPLACE(AVERAGE_STUDENT_ATTENDANCE, '%', '')
FROM chicago_public_schools_data
ORDER BY AVERAGE_STUDENT_ATTENDANCE LIMIT 5
"""
cur.execute(statement)

print("School Name, ", "Average Student Attendance")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

School Name,  Average Student Attendance
Velma F Thomas Early Childhood Center,  None
Richard T Crane Technical Preparatory High School,  57.90
Barbara Vick Early Childhood & Family Center,  60.90
Dyett High School,  62.50
Wendell Phillips Academy High School,  63.00


<p><b>Problem 7</b> Which schools have <code>Average Student Attendance</code> lower than 70%?</p>

In [14]:
statement = """
SELECT NAME_OF_SCHOOL, AVERAGE_STUDENT_ATTENDANCE
FROM chicago_public_schools_data
WHERE CAST(REPLACE(AVERAGE_STUDENT_ATTENDANCE, '%', '') AS DOUBLE PRECISION) < 70
ORDER BY AVERAGE_STUDENT_ATTENDANCE
"""
cur.execute(statement)

print("School Name, ", "Average Student Attendance")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

School Name,  Average Student Attendance
Richard T Crane Technical Preparatory High School,  57.90%
Barbara Vick Early Childhood & Family Center,  60.90%
Dyett High School,  62.50%
Wendell Phillips Academy High School,  63.00%
Orr Academy High School,  66.30%
Manley Career Academy High School,  66.80%
Chicago Vocational Career Academy High School,  68.80%
Roberto Clemente Community Academy High School,  69.60%


<p><b>Problem 8</b> Get the total <code>College Enrollment</code> for each <code>Community Area</code>.</p>

In [15]:
statement = """
SELECT COMMUNITY_AREA_NAME, SUM(COLLEGE_ENROLLMENT)
FROM chicago_public_schools_data
GROUP BY COMMUNITY_AREA_NAME
"""
cur.execute(statement)

print("Community Area Name, ", "Total College Enrollment")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

Community Area Name,  Total College Enrollment
ALBANY PARK,  6864
ARCHER HEIGHTS,  4823
ARMOUR SQUARE,  1458
ASHBURN,  6483
AUBURN GRESHAM,  4175
AUSTIN,  10933
AVALON PARK,  1522
AVONDALE,  3640
BELMONT CRAGIN,  14386
BEVERLY,  1636
BRIDGEPORT,  3167
BRIGHTON PARK,  9647
BURNSIDE,  549
CALUMET HEIGHTS,  1568
CHATHAM,  5042
CHICAGO LAWN,  7086
CLEARING,  2085
DOUGLAS,  4670
DUNNING,  4568
EAST GARFIELD PARK,  5337
EAST SIDE,  5305
EDGEWATER,  4600
EDISON PARK,  910
ENGLEWOOD,  6832
FOREST GLEN,  1431
FULLER PARK,  531
GAGE PARK,  9915
GARFIELD RIDGE,  4552
GRAND BOULEVARD,  2809
GREATER GRAND CROSSING,  4051
HEGEWISCH,  963
HERMOSA,  3975
HUMBOLDT PARK,  8620
HYDE PARK,  1930
IRVING PARK,  7764
JEFFERSON PARK,  1755
KENWOOD,  4287
LAKE VIEW,  7055
LINCOLN PARK,  5615
LINCOLN SQUARE,  4132
LOGAN SQUARE,  7351
LOOP,  871
LOWER WEST SIDE,  7257
MCKINLEY PARK,  1552
MONTCLARE,  1317
MORGAN PARK,  3271
MOUNT GREENWOOD,  2091
NEAR NORTH SIDE,  3362
NEAR SOUTH SIDE,  1378
NEAR WEST SIDE,  797

<p><b>Problem 9</b> Get the 5 <code>Community Areas</code> with the least total <code>College Enrollment</code> sorted in ascending order.</p>

In [16]:
statement = """
SELECT COMMUNITY_AREA_NAME, SUM(COLLEGE_ENROLLMENT) AS TOTAL_COLLEGE_ENROLLMENT
FROM chicago_public_schools_data
GROUP BY COMMUNITY_AREA_NAME
ORDER BY TOTAL_COLLEGE_ENROLLMENT ASC LIMIT 5;
"""
cur.execute(statement)

print("Community Area Name, ", "Total College Enrollment")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

Community Area Name,  Total College Enrollment
OAKLAND,  140
FULLER PARK,  531
BURNSIDE,  549
OHARE,  786
LOOP,  871
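<p>To make the grouping step easier to follow, here is the same pattern (<code>GROUP BY</code>, <code>SUM</code>, then <code>ORDER BY</code> with <code>LIMIT</code>) on a tiny hypothetical table, <code>demo_enrollment</code>, where the totals can be checked by hand. The area names and numbers are invented for illustration.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE demo_enrollment (area TEXT, enrollment INTEGER)")
demo_con.executemany(
    "INSERT INTO demo_enrollment VALUES (?, ?)",
    [("NORTH", 100), ("NORTH", 250), ("SOUTH", 40), ("EAST", 300)],
)

# One output row per area, ordered by the summed enrollment.
query = """
SELECT area, SUM(enrollment) AS total_enrollment
FROM demo_enrollment
GROUP BY area
ORDER BY total_enrollment ASC
LIMIT 2
"""
smallest_areas = demo_con.execute(query).fetchall()
print(smallest_areas)
```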


<p><b>Problem 10</b> List the 5 schools with the lowest <code>Safety Score</code>.</p>

In [17]:
statement = """
SELECT NAME_OF_SCHOOL, SAFETY_SCORE
FROM chicago_public_schools_data
WHERE SAFETY_SCORE IS NOT NULL
ORDER BY SAFETY_SCORE ASC LIMIT 5;
"""
cur.execute(statement)

print("School Name, ", "Safety Score")
print("=" * 40)
for row in cur.fetchall():
    print(f"{row[0]}, ", row[1])

School Name,  Safety Score
Edmond Burke Elementary School,  1.0
Luke O'Toole Elementary School,  5.0
George W Tilton Elementary School,  6.0
Foster Park Elementary School,  11.0
Emil G Hirsch Metropolitan High School,  13.0
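<p>Missing safety scores are stored as SQL <code>NULL</code>, which Python displays as <code>None</code>. They are not the string <code>'None'</code>, so the explicit test for a missing value is <code>IS NOT NULL</code>. The sketch below, on a hypothetical table named <code>demo_scores</code>, shows how SQLite evaluates the different comparisons.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")

# Comparing NULL with anything yields NULL (shown as None in Python),
# while IS NOT NULL always yields 0 or 1.
null_checks = demo_con.execute(
    "SELECT NULL <> 'None', NULL IS NOT NULL, 1.0 <> 'None'"
).fetchone()

demo_con.execute("CREATE TABLE demo_scores (school TEXT, safety_score REAL)")
demo_con.executemany(
    "INSERT INTO demo_scores VALUES (?, ?)",
    [("A", 1.0), ("B", None), ("C", 5.0)],
)

lowest = demo_con.execute(
    "SELECT school, safety_score FROM demo_scores "
    "WHERE safety_score IS NOT NULL "
    "ORDER BY safety_score ASC LIMIT 2"
).fetchall()

print(null_checks)
print(lowest)
```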


<p><b>Problem 11</b> Get the <code>Hardship Index</code> for the <code>Community Area</code> of the school which has <code>College Enrollment</code> of 4368.</p>

In [18]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork/labs/FinalModule_Coursera_V5/data/ChicagoCensusData.csv"
df = pd.read_csv(url)
df.to_sql("chicago_census_data", con=con, if_exists="replace", index=False, method="multi")

78

In [19]:
statement = """
SELECT HARDSHIP_INDEX
FROM chicago_census_data AS ccd, chicago_public_schools_data AS cpsd
WHERE ccd.COMMUNITY_AREA_NUMBER = cpsd.COMMUNITY_AREA_NUMBER AND COLLEGE_ENROLLMENT = 4368
"""
cur.execute(statement)

cur.fetchall()[0][0]

6.0
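<p>The query above joins the two tables by listing both in the <code>FROM</code> clause and matching them in <code>WHERE</code>. The same lookup can be written with an explicit <code>JOIN ... ON</code>, which keeps the join condition separate from the filter. This is a sketch on two hypothetical tables, <code>demo_census</code> and <code>demo_schools</code>, with invented values.</p>

```python
import sqlite3

demo_con = sqlite3.connect(":memory:")
demo_con.executescript(
    """
    CREATE TABLE demo_census (COMMUNITY_AREA_NUMBER INTEGER, HARDSHIP_INDEX REAL);
    CREATE TABLE demo_schools (
        NAME_OF_SCHOOL TEXT,
        COMMUNITY_AREA_NUMBER INTEGER,
        COLLEGE_ENROLLMENT INTEGER
    );
    INSERT INTO demo_census VALUES (1, 10.0), (2, 20.0);
    INSERT INTO demo_schools VALUES ('School A', 1, 500), ('School B', 2, 900);
    """
)

# The ON clause states how rows match; WHERE only filters the result.
query = """
SELECT s.NAME_OF_SCHOOL, c.HARDSHIP_INDEX
FROM demo_schools AS s
JOIN demo_census AS c
  ON c.COMMUNITY_AREA_NUMBER = s.COMMUNITY_AREA_NUMBER
WHERE s.COLLEGE_ENROLLMENT = ?
"""
matches = demo_con.execute(query, (900,)).fetchall()
print(matches)
```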

<p><b>Problem 12</b> Get the <code>Hardship Index</code> for the <code>Community Area</code> which has the highest value for <code>College Enrollment</code>.</p>

In [20]:
statement = """
SELECT COMMUNITY_AREA_NUMBER, COMMUNITY_AREA_NAME, HARDSHIP_INDEX
FROM chicago_census_data
WHERE COMMUNITY_AREA_NUMBER IN (
    SELECT COMMUNITY_AREA_NUMBER
    FROM chicago_public_schools_data
    ORDER BY COLLEGE_ENROLLMENT DESC LIMIT 1
);
"""
cur.execute(statement)
cur.fetchall()[0][2]

6.0

In [21]:
con.close()
