<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png" width = 300, align = "center"></a>

<h1 align=center><font size = 5>Lab: Working with a real-world dataset using SQL and Python</font></h1>

# Introduction

This notebook shows how to work with a real-world dataset using SQL and Python. In this lab you will:
1. Understand the Chicago Public Schools school-level performance dataset
1. Store the dataset in a Db2 database on an IBM Cloud instance
1. Retrieve metadata about tables and columns, and query data from mixed-case columns
1. Solve example problems to practice your SQL skills, including using built-in database functions

## Chicago Public Schools - Progress Report Cards (2011-2012) 

The city of Chicago released a dataset showing all school level performance data used to create School Report Cards for the 2011-2012 school year. The dataset is available from the Chicago Data Portal: https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t

This dataset includes a large number of metrics. Start by familiarizing yourself with the types of metrics in the database: https://data.cityofchicago.org/api/assets/AAD41A13-BE8A-4E67-B1F5-86E711E09D5F?download=true

__NOTE__: Do not download the dataset directly from the City of Chicago portal. Instead, download the more database-friendly static copy from the link below and review some of its contents:
https://ibm.box.com/shared/static/f9gjvj1gjmxxzycdhplzt01qtz0s7ew7.csv



### Store the dataset in a Table
In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.

While it is easier to read the dataset into a Pandas dataframe and then PERSIST it into the database, as we saw in the previous lab, doing so maps the columns to default datatypes that may not be optimal for SQL querying. For example, a long textual field may map to a CLOB instead of a VARCHAR.

Therefore, __it is highly recommended to manually load the table using the database console LOAD tool, as indicated in Week 2 Lab 1 Part II__. The only difference from that lab is that in Step 5 of the instructions you will need to click "(+) New Table", specify the name of the table you want to create, and then click "Next".
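To make the point about explicit datatypes concrete, here is a minimal sketch of creating a table with declared column types and loading CSV rows into it. It is not the Db2 LOAD tool: it assumes an in-memory SQLite database as a stand-in (SQLite treats declared types only as affinities and does not enforce lengths), and it uses a one-row, three-column excerpt of the data rather than the full file.

```python
import csv
import io
import sqlite3

# One-row, three-column excerpt of the SCHOOLS data; the real CSV is much wider.
SAMPLE_CSV = (
    "School_ID,NAME_OF_SCHOOL,SAFETY_SCORE\n"
    "610038,Abraham Lincoln Elementary School,99\n"
)

# Assumption: an in-memory SQLite database stands in for Db2 on Cloud.
conn = sqlite3.connect(":memory:")

# Declare the column types explicitly instead of relying on inferred defaults.
conn.execute(
    "CREATE TABLE SCHOOLS ("
    "School_ID INTEGER, "
    "NAME_OF_SCHOOL VARCHAR(65), "
    "SAFETY_SCORE SMALLINT)"
)

# Convert each CSV field to the Python type that matches its column.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [
    (int(r["School_ID"]), r["NAME_OF_SCHOOL"], int(r["SAFETY_SCORE"]))
    for r in reader
]
conn.executemany("INSERT INTO SCHOOLS VALUES (?, ?, ?)", rows)
conn.commit()

loaded = conn.execute(
    "SELECT School_ID, NAME_OF_SCHOOL, SAFETY_SCORE FROM SCHOOLS"
).fetchall()
print(loaded)
```

The same idea applies in the Db2 console: review the datatypes proposed for each column before finishing the load.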

##### Now open the Db2 console, open the LOAD tool, select or drag in the .CSV file for the CHICAGO PUBLIC SCHOOLS dataset, and load the dataset into a new table called __SCHOOLS__.

<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/uc4xjh1uxcc78ks1i18v668simioz4es.jpg"></a>

### Connect to the database
Let us now load the ipython-sql extension and establish a connection with the database.

In [2]:
%load_ext sql

In [3]:
# Enter the connection string for your Db2 on Cloud database instance below
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
%sql ibm_db_sa://rls52050:6c3x3lkddxwxfm%2Bh@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB

'Connected: rls52050@BLUDB'
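The connection string above contains `%2B`, which is the percent-encoded form of `+`. Special characters in a password generally need to be encoded this way before they go into a URL. The sketch below shows one way to build such a string with the standard library. The helper name `build_db2_url` and the sample credentials are hypothetical, and only the password is encoded here for brevity.

```python
from urllib.parse import quote_plus


def build_db2_url(user, password, host, port, database):
    """Return an ibm_db_sa URL with the password percent-encoded.

    Hypothetical helper for illustration; it is not part of ipython-sql.
    """
    return f"ibm_db_sa://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# Made-up credentials: '+' and '/' would otherwise be misread inside a URL.
url = build_db2_url("my-username", "pa+ss/word", "my-hostname", 50000, "BLUDB")
print(url)
```

You could then pass the resulting string to `%sql` in place of a hand-typed connection string.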

### Query the database system catalog to retrieve table metadata

##### You can verify that the table creation was successful by retrieving the list of all tables in your schema and checking whether the SCHOOLS table was created

In [4]:
# type in your query to retrieve list of all tables in the database for your db2 schema (username)
%sql select tabschema,tabname,create_time from syscat.tables where tabschema = 'RLS52050'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


tabschema,tabname,create_time
RLS52050,EMPLOYEES,2020-02-06 05:42:17.133999
RLS52050,JOB_HISTORY,2020-02-06 05:42:17.323856
RLS52050,DEPARTMENTS,2020-02-06 05:42:17.688302
RLS52050,LOCATIONS,2020-02-06 05:42:17.857721
RLS52050,JOBS,2020-02-06 06:35:13.895643
RLS52050,PETSALE,2020-02-06 08:51:17.284861
RLS52050,INSTRUCTOR,2020-02-11 06:53:18.190273
RLS52050,CHICAGO_SOCIOECONOMIC_DATA,2020-02-11 12:15:22.880948
RLS52050,CHICAGO_CRIME_DATA,2020-02-12 10:23:40.751377
RLS52050,SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO,2020-02-12 10:25:02.464155


In [8]:
%sql select * from syscat.tables limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


tabschema,tabname,owner,ownertype,TYPE,status,base_tabschema,base_tabname,rowtypeschema,rowtypename,create_time,alter_time,invalidate_time,stats_time,colcount,tableid,tbspaceid,card,npages,mpages,fpages,npartitions,nfiles,tablesize,overflow,tbspace,index_tbspace,long_tbspace,parents,children,selfrefs,keycolumns,keyindexid,keyunique,checkcount,datacapture,const_checked,pmap_id,partition_mode,log_attribute,pctfree,append_mode,REFRESH,refresh_time,LOCKSIZE,VOLATILE,row_format,property,statistics_profile,compression,rowcompmode,access_mode,clustered,active_blocks,droprule,maxfreespacesearch,avgcompressedrowsize,avgrowcompressionratio,avgrowsize,pctrowscompressed,logindexbuild,codepage,collationschema,collationname,collationschema_orderby,collationname_orderby,encoding_scheme,pctpagessaved,last_regen_time,secpolicyid,protectiongranularity,auditpolicyid,auditpolicyname,auditexceptionenabled,definer,oncommit,logged,onrollback,lastused,control,temporaltype,tableorg,extended_row_size,pctextendedrows,remarks
SYSIBM,SYSTABLES,SYSIBM,S,T,N,,,,,2018-07-14 07:09:38.251643,2018-07-14 07:09:38.251643,2018-07-14 07:24:10.627030,2020-02-11 15:56:39.550459,83,5,0,14188,292,0,292,-1,-1,-1,40,SYSCATSPACE,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,0,,0,-1,N,,,R,,N,,,N,,F,,0,�,999,0,0.0,508,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2018-07-14 07:09:38.251643,0,,,,N,SYSIBM,,,,2020-02-12,R,N,R,N,-1.0,
SYSIBM,SYSCOLUMNS,SYSIBM,S,T,N,,,,,2018-07-14 07:09:38.251643,2018-07-14 07:09:38.251643,2018-07-14 07:24:06.515291,2020-02-11 15:41:46.893834,45,6,0,224458,2196,0,2204,-1,-1,-1,1235,SYSCATSPACE,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,0,,0,-1,N,,,R,,N,,,N,,F,,0,�,999,0,0.0,248,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2018-07-14 07:09:38.251643,0,,,,N,SYSIBM,,,,2020-02-12,R,N,R,N,-1.0,
SYSIBM,SYSINDEXES,SYSIBM,S,T,N,,,,,2018-07-14 07:09:38.251643,2018-07-14 07:09:38.251643,2018-07-14 07:24:08.373848,2020-02-10 15:17:53.843470,71,7,0,9203,145,0,145,-1,-1,-1,30,SYSCATSPACE,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,0,,0,-1,N,,,R,,N,,,N,,F,,0,�,999,0,0.0,402,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2018-07-14 07:09:38.251643,0,,,,N,SYSIBM,,,,2020-02-12,R,N,R,N,-1.0,
SYSIBM,SYSVIEWS,SYSIBM,S,T,N,,,,,2018-07-14 07:09:38.251643,2018-07-14 07:09:38.251643,2018-07-14 07:24:11.136408,2020-02-06 13:22:30.155163,12,11,0,460,5,0,5,-1,-1,-1,0,SYSCATSPACE,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,0,,0,-1,N,,,R,,N,,,N,,F,,0,�,999,0,0.0,195,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2018-07-14 07:09:38.251643,0,,,,N,SYSIBM,,,,2020-02-09,R,N,R,N,-1.0,
SYSIBM,SYSVIEWDEP,SYSIBM,S,T,N,,,,,2018-07-14 07:09:38.251643,2018-07-14 07:09:38.251643,2018-07-14 07:24:11.092534,2020-01-17 07:57:21.419313,11,12,0,1052,5,0,7,-1,-1,-1,0,SYSCATSPACE,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,0,,0,-1,N,,,R,,N,,,N,,F,,0,�,999,0,0.0,97,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2018-07-14 07:09:38.251643,0,,,,N,SYSIBM,,,,2020-02-12,R,N,R,N,-1.0,


In [9]:
%sql select * from syscat.tables where tabschema='RLS52050'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


tabschema,tabname,owner,ownertype,TYPE,status,base_tabschema,base_tabname,rowtypeschema,rowtypename,create_time,alter_time,invalidate_time,stats_time,colcount,tableid,tbspaceid,card,npages,mpages,fpages,npartitions,nfiles,tablesize,overflow,tbspace,index_tbspace,long_tbspace,parents,children,selfrefs,keycolumns,keyindexid,keyunique,checkcount,datacapture,const_checked,pmap_id,partition_mode,log_attribute,pctfree,append_mode,REFRESH,refresh_time,LOCKSIZE,VOLATILE,row_format,property,statistics_profile,compression,rowcompmode,access_mode,clustered,active_blocks,droprule,maxfreespacesearch,avgcompressedrowsize,avgrowcompressionratio,avgrowsize,pctrowscompressed,logindexbuild,codepage,collationschema,collationname,collationschema_orderby,collationname_orderby,encoding_scheme,pctpagessaved,last_regen_time,secpolicyid,protectiongranularity,auditpolicyid,auditpolicyname,auditexceptionenabled,definer,oncommit,logged,onrollback,lastused,control,temporaltype,tableorg,extended_row_size,pctextendedrows,remarks
RLS52050,EMPLOYEES,RLS52050,U,T,N,,,,,2020-02-06 05:42:17.133999,2020-02-06 05:42:17.302688,2020-02-06 05:42:17.302706,2020-02-06 07:07:22.752935,11,5,741,9,1,0,2,-1,-1,-1,0,rls52050space1,,,0,0,0,1,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,118,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 05:42:17.133999,0,,,,N,RLS52050,,,,2020-02-06,,N,R,N,-1.0,
RLS52050,JOB_HISTORY,RLS52050,U,T,N,,,,,2020-02-06 05:42:17.323856,2020-02-06 05:42:17.503911,2020-02-06 05:42:17.503925,2020-02-06 06:11:40.228100,4,6,741,9,1,0,2,-1,-1,-1,0,rls52050space1,,,0,0,0,2,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,43,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 05:42:17.323856,0,,,,N,RLS52050,,,,2020-02-06,,N,R,N,-1.0,
RLS52050,DEPARTMENTS,RLS52050,U,T,N,,,,,2020-02-06 05:42:17.688302,2020-02-06 05:42:17.844694,2020-02-06 05:42:17.844707,2020-02-06 06:16:39.401755,4,8,741,2,1,0,2,-1,-1,-1,0,rls52050space1,,,0,0,0,1,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,56,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 05:42:17.688302,0,,,,N,RLS52050,,,,2020-02-06,,N,R,N,-1.0,
RLS52050,LOCATIONS,RLS52050,U,T,N,,,,,2020-02-06 05:42:17.857721,2020-02-06 05:42:18.008873,2020-02-06 05:42:18.008886,2020-02-06 07:07:22.949753,2,9,741,2,1,0,2,-1,-1,-1,0,rls52050space1,,,0,0,0,2,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,28,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 05:42:17.857721,0,,,,N,RLS52050,,,,2020-02-06,,N,R,N,-1.0,
RLS52050,JOBS,RLS52050,U,T,N,,,,,2020-02-06 06:35:13.895643,2020-02-06 06:35:14.063338,2020-02-06 06:35:14.063350,2020-02-06 07:07:23.139162,4,7,741,9,1,0,2,-1,-1,-1,0,rls52050space1,,,0,0,0,1,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,53,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 06:35:13.895643,0,,,,N,RLS52050,,,,0001-01-01,,N,R,N,-1.0,
RLS52050,PETSALE,RLS52050,U,T,N,,,,,2020-02-06 08:51:17.284861,2020-02-06 08:51:17.442852,2020-02-06 08:51:17.442866,2020-02-06 09:12:22.337247,5,10,741,9,1,0,1,-1,-1,-1,0,rls52050space1,,,0,0,0,1,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,38,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-06 08:51:17.284861,0,,,,N,RLS52050,,,,2020-02-06,,N,R,N,-1.0,
RLS52050,INSTRUCTOR,RLS52050,U,T,N,,,,,2020-02-11 06:53:18.190273,2020-02-11 06:53:18.422481,2020-02-11 06:53:18.422496,2020-02-11 07:57:25.923655,5,4,741,0,0,0,1,-1,-1,-1,0,rls52050space1,,,0,0,0,1,1,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,0,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-11 06:53:18.190273,0,,,,N,RLS52050,,,,0001-01-01,,N,R,N,-1.0,
RLS52050,CHICAGO_SOCIOECONOMIC_DATA,RLS52050,U,T,N,,,,,2020-02-11 12:15:22.880948,2020-02-11 12:15:22.880948,2020-02-11 12:15:23.672741,2020-02-11 12:16:44.984720,10,11,741,78,1,0,1,-1,-1,-1,0,rls52050space1,,,0,0,0,0,0,0,0,N,YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY,1,,0,-1,N,,,R,,N,,,N,,F,,0,N,999,0,0.0,111,0.0,,1208,SYSIBM,IDENTITY,SYSIBM,IDENTITY,,0,2020-02-11 12:15:22.880948,0,,,,N,RLS52050,,,,2020-02-11,,N,R,N,-1.0,


Double-click __here__ for a hint

<!--
In Db2 the system catalog table called SYSCAT.TABLES contains the table metadata
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select TABSCHEMA, TABNAME, CREATE_TIME from SYSCAT.TABLES where TABSCHEMA='YOUR-DB2-USERNAME'

or, you can retrieve list of all tables where the schema name is not one of the system created ones:

%sql select TABSCHEMA, TABNAME, CREATE_TIME from SYSCAT.TABLES \
      where TABSCHEMA not in ('SYSIBM', 'SYSCAT', 'SYSSTAT', 'SYSIBMADM', 'SYSTOOLS', 'SYSPUBLIC')
      
or, just query for a specific table that you want to verify exists in the database
%sql select * from SYSCAT.TABLES where TABNAME = 'SCHOOLS'

-->
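If you want to try the idea of a catalog query without a Db2 instance, the sketch below does the equivalent check against SQLite, which is a swapped-in engine here. SQLite keeps its table metadata in `sqlite_master` rather than `SYSCAT.TABLES`, and the two tables created are empty placeholders.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; its catalog table is sqlite_master.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCHOOLS (School_ID INTEGER)")
conn.execute("CREATE TABLE CHICAGO_CRIME_DATA (ID INTEGER)")

# List every user table, then check that SCHOOLS is among them.
table_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
schools_exists = "SCHOOLS" in table_names
print(table_names, schools_exists)
```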

### Query the database system catalog to retrieve column metadata

##### The SCHOOLS table contains a large number of columns. How many columns does this table have?

In [5]:
# type in your query to retrieve the number of columns in the SCHOOLS table
%sql select count(*) from SYSCAT.COLUMNS where TABNAME = 'CHICAGO_PUBLIC_SCHOOLS'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
78


Double-click __here__ for a hint

<!--
In Db2 the system catalog table called SYSCAT.COLUMNS contains the column metadata
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select count(*) from SYSCAT.COLUMNS where TABNAME = 'SCHOOLS'

-->

Now retrieve the list of columns in the SCHOOLS table along with their column type (datatype) and length.

In [6]:
# type in your query to retrieve all column names in the SCHOOLS table along with their datatypes and length
%sql select COLNAME, TYPENAME, LENGTH from SYSCAT.COLUMNS where TABNAME = 'CHICAGO_PUBLIC_SCHOOLS'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


colname,typename,length
SAFETY_SCORE,SMALLINT,2
Family_Involvement_Icon,VARCHAR,11
Family_Involvement_Score,VARCHAR,3
Environment_Icon,VARCHAR,11
Environment_Score,SMALLINT,2
Instruction_Icon,VARCHAR,11
Instruction_Score,SMALLINT,2
Leaders_Icon,VARCHAR,11
Leaders_Score,VARCHAR,3
Teachers_Icon,VARCHAR,11


Double-click __here__ for the solution.

<!-- Solution:

%sql select COLNAME, TYPENAME, LENGTH from SYSCAT.COLUMNS where TABNAME = 'SCHOOLS'

or

%sql select distinct(NAME), COLTYPE, LENGTH from SYSIBM.SYSCOLUMNS where TBNAME = 'SCHOOLS'

-->
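Column metadata can be explored the same way on a local stand-in. This sketch assumes SQLite, where `PRAGMA table_info` plays a role comparable to `SYSCAT.COLUMNS`. It recreates three of the columns listed in the output above with the same names and declared types.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; only three columns are recreated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SCHOOLS ("
    "SAFETY_SCORE SMALLINT, "
    '"Family_Involvement_Icon" VARCHAR(11), '
    '"Family_Involvement_Score" VARCHAR(3))'
)

# Each PRAGMA row is (cid, name, declared type, notnull, default, pk).
columns = [
    (name, decl_type)
    for _cid, name, decl_type, *_rest in conn.execute("PRAGMA table_info(SCHOOLS)")
]
column_count = len(columns)
print(column_count, columns)
```

Note that the mixed-case names come back exactly as they were declared, which is why they must be quoted in later queries.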

### Questions
1. Is the column name for the "SCHOOL ID" attribute in upper or mixed case?
1. What is the name of "Community Area Name" column in your table? Does it have spaces?
1. Are there any columns in whose names the spaces and parentheses (round brackets) have been replaced by the underscore character "_"?

## Problems

### Problem 1

##### How many Elementary Schools are in the dataset?

In [11]:
%sql select count(*) from CHICAGO_PUBLIC_SCHOOLS where "Elementary, Middle, or High School" = 'ES'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
462


In [8]:
%sql select "Elementary, Middle, or High School" from CHICAGO_PUBLIC_SCHOOLS limit 10

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


"Elementary, Middle, or High School"
ES
ES
ES
ES
HS
MS
HS
ES
HS
ES


Double-click __here__ for a hint

<!--
Which column specifies the school type e.g. 'ES', 'MS', 'HS'?
-->

Double-click __here__ for another hint

<!--
Does the column name have mixed case, spaces or other special characters?
If so, ensure you use double quotes around the "Name of the Column"
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select count(*) from SCHOOLS where "Elementary, Middle, or High School" = 'ES'

Correct answer: 462

-->
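The double-quoting rule for column names with spaces, commas, or mixed case can be tried out locally. The sketch below assumes SQLite as a stand-in for Db2 and loads only the ten school-type values shown in the sample output above, so the count it produces is for those ten rows and not for the full dataset.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; both accept double-quoted identifiers.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SCHOOLS ("Elementary, Middle, or High School" TEXT)')

# The ten school-type values from the sample output in this notebook.
school_types = ["ES", "ES", "ES", "ES", "HS", "MS", "HS", "ES", "HS", "ES"]
conn.executemany("INSERT INTO SCHOOLS VALUES (?)", [(t,) for t in school_types])

# Double quotes delimit the column name; single quotes delimit the string value.
(es_count,) = conn.execute(
    """
    SELECT COUNT(*) FROM SCHOOLS
    WHERE "Elementary, Middle, or High School" = 'ES'
    """
).fetchone()
print(es_count)
```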

### Problem 2

##### What is the highest Safety Score?

Double-click __here__ for a hint

<!--
Use the MAX() function
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select MAX(Safety_Score) AS MAX_SAFETY_SCORE from SCHOOLS

Correct answer: 99
-->


### Problem 3

##### Which schools have highest Safety Score?

In [13]:
%sql select NAME_OF_SCHOOL,SAFETY_SCORE from CHICAGO_PUBLIC_SCHOOLS where SAFETY_SCORE=(select max(SAFETY_SCORE) from CHICAGO_PUBLIC_SCHOOLS)

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,safety_score
Abraham Lincoln Elementary School,99
Alexander Graham Bell Elementary School,99
Annie Keller Elementary Gifted Magnet School,99
Augustus H Burley Elementary School,99
Edgar Allan Poe Elementary Classical School,99
Edgebrook Elementary School,99
Ellen Mitchell Elementary School,99
James E McDade Elementary Classical School,99
James G Blaine Elementary School,99
LaSalle Elementary Language Academy,99


Double-click __here__ for the solution.

<!-- Solution:
In the previous problem we found out that the highest Safety Score is 99, so we can use that as an input in the where clause:

%sql select Name_of_School, Safety_Score from SCHOOLS where Safety_Score = 99

or, a better way:

%sql select Name_of_School, Safety_Score from SCHOOLS where \
  Safety_Score= (select MAX(Safety_Score) from SCHOOLS)


Correct answer: several schools with a Safety Score of 99.
-->
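The sub-query approach avoids hard-coding the maximum value. Here is a small sketch of it, assuming SQLite as a stand-in for Db2. Two rows are taken from the result above and two are made-up rows (labeled in the code) added so that the filter has something to exclude.

```python
import sqlite3

# Assumption: SQLite stands in for Db2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCHOOLS (NAME_OF_SCHOOL TEXT, SAFETY_SCORE INTEGER)")
conn.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?)",
    [
        ("Abraham Lincoln Elementary School", 99),
        ("Edgebrook Elementary School", 99),
        ("Hypothetical School A", 42),    # made-up row
        ("Hypothetical School B", None),  # made-up row with a missing score
    ],
)

# The inner query finds the maximum; the outer query keeps rows that match it.
top_schools = conn.execute(
    "SELECT NAME_OF_SCHOOL, SAFETY_SCORE FROM SCHOOLS "
    "WHERE SAFETY_SCORE = (SELECT MAX(SAFETY_SCORE) FROM SCHOOLS) "
    "ORDER BY NAME_OF_SCHOOL"
).fetchall()
print(top_schools)
```

Because `MAX()` ignores NULL values, the row with a missing score does not affect the result.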


### Problem 4

##### What are the top 10 schools with the highest "Average Student Attendance"?


In [18]:
%sql select NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE from CHICAGO_PUBLIC_SCHOOLS order by AVERAGE_STUDENT_ATTENDANCE desc nulls last limit 10

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,average_student_attendance
John Charles Haines Elementary School,98.40%
James Ward Elementary School,97.80%
Edgar Allan Poe Elementary Classical School,97.60%
Orozco Fine Arts & Sciences Elementary School,97.60%
Rachel Carson Elementary School,97.60%
Annie Keller Elementary Gifted Magnet School,97.50%
Andrew Jackson Elementary Language Academy,97.40%
Lenart Elementary Regional Gifted Center,97.40%
Disney II Magnet School,97.30%
John H Vanderpoel Elementary Magnet School,97.20%


Double-click __here__ for the solution.

<!-- Solution:

%sql select Name_of_School, Average_Student_Attendance from SCHOOLS \
    order by Average_Student_Attendance desc nulls last limit 10 

-->

### Problem 5

##### Retrieve the list of 5 Schools with the lowest Average Student Attendance sorted in ascending order based on attendance

In [19]:
%%sql

select NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE from CHICAGO_PUBLIC_SCHOOLS 
order by AVERAGE_STUDENT_ATTENDANCE nulls last limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,average_student_attendance
Richard T Crane Technical Preparatory High School,57.90%
Barbara Vick Early Childhood & Family Center,60.90%
Dyett High School,62.50%
Wendell Phillips Academy High School,63.00%
Orr Academy High School,66.30%


Double-click __here__ for the solution.

<!-- Solution:

%sql SELECT Name_of_School, Average_Student_Attendance  \
     from SCHOOLS \
     order by Average_Student_Attendance \
     fetch first 5 rows only

-->


### Problem 6

##### Now remove the '%' sign from the above result set for Average Student Attendance column

In [20]:
%%sql

select NAME_OF_SCHOOL,replace(AVERAGE_STUDENT_ATTENDANCE,'%','') as ATTENDANCE2 from CHICAGO_PUBLIC_SCHOOLS
order by AVERAGE_STUDENT_ATTENDANCE nulls last limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,attendance2
Richard T Crane Technical Preparatory High School,57.9
Barbara Vick Early Childhood & Family Center,60.9
Dyett High School,62.5
Wendell Phillips Academy High School,63.0
Orr Academy High School,66.3


Double-click __here__ for a hint

<!--
Use the REPLACE() function to replace '%' with ''
See documentation for this function at:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0000843.html
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql SELECT Name_of_School, REPLACE(Average_Student_Attendance, '%', '') \
     from SCHOOLS \
     order by Average_Student_Attendance \
     fetch first 5 rows only

-->


### Problem 7

##### Which Schools have Average Student Attendance lower than 70%?

In [27]:
%%sql

select NAME_OF_SCHOOL,AVERAGE_STUDENT_ATTENDANCE
from CHICAGO_PUBLIC_SCHOOLS
where cast(replace(AVERAGE_STUDENT_ATTENDANCE,'%','') as decimal) < 70

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,average_student_attendance
Barbara Vick Early Childhood & Family Center,60.90%
Chicago Vocational Career Academy High School,68.80%
Dyett High School,62.50%
Manley Career Academy High School,66.80%
Orr Academy High School,66.30%
Richard T Crane Technical Preparatory High School,57.90%
Roberto Clemente Community Academy High School,69.60%
Wendell Phillips Academy High School,63.00%


Double-click __here__ for a hint

<!--
The datatype of the "Average_Student_Attendance" column is varchar.
So you cannot use it as is in the where clause for a numeric comparison.
First use the CAST() function to cast it as a DECIMAL or DOUBLE
e.g. CAST("Column_Name" as DOUBLE)
or simply: DECIMAL("Column_Name")
-->

Double-click __here__ for another hint

<!--
Don't forget that the '%' (percentage) sign needs to be removed before casting
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql SELECT Name_of_School, Average_Student_Attendance  \
     from SCHOOLS \
     where CAST ( REPLACE(Average_Student_Attendance, '%', '') AS DOUBLE ) < 70 \
     order by Average_Student_Attendance
     
or,

%sql SELECT Name_of_School, Average_Student_Attendance  \
     from SCHOOLS \
     where DECIMAL ( REPLACE(Average_Student_Attendance, '%', '') ) < 70 \
     order by Average_Student_Attendance

-->
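The REPLACE-then-CAST pattern can be checked on a few rows before running it against the full table. This sketch assumes SQLite as a stand-in for Db2 and casts to `REAL` where the Db2 solutions use `DOUBLE` or `DECIMAL`. The four rows are attendance values that appear in the results above.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; REAL is used where Db2 would use DOUBLE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SCHOOLS (NAME_OF_SCHOOL TEXT, AVERAGE_STUDENT_ATTENDANCE TEXT)"
)
conn.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?)",
    [
        ("John Charles Haines Elementary School", "98.40%"),
        ("Roberto Clemente Community Academy High School", "69.60%"),
        ("Richard T Crane Technical Preparatory High School", "57.90%"),
        ("Dyett High School", "62.50%"),
    ],
)

# Strip the '%' sign, cast the remainder to a number, then filter and sort on it.
low_attendance = conn.execute(
    """
    SELECT NAME_OF_SCHOOL,
           CAST(REPLACE(AVERAGE_STUDENT_ATTENDANCE, '%', '') AS REAL) AS attendance_pct
    FROM SCHOOLS
    WHERE CAST(REPLACE(AVERAGE_STUDENT_ATTENDANCE, '%', '') AS REAL) < 70
    ORDER BY attendance_pct
    """
).fetchall()
for name, pct in low_attendance:
    print(name, pct)
```

Sorting on the numeric value rather than on the original text keeps the order correct even if the text values have different numbers of digits.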


### Problem 8

##### Get the total College Enrollment for each Community Area

In [29]:
%sql select COLLEGE_ENROLLMENT from CHICAGO_PUBLIC_SCHOOLS limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


college_enrollment
813
521
1324
556
302


In [36]:
%%sql
select COMMUNITY_AREA_NAME, sum(COLLEGE_ENROLLMENT) as SUM_OF_COLLEGE_ENROLLMENT from CHICAGO_PUBLIC_SCHOOLS
group by COMMUNITY_AREA_NAME

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_name,sum_of_college_enrollment
ALBANY PARK,6864
ARCHER HEIGHTS,4823
ARMOUR SQUARE,1458
ASHBURN,6483
AUBURN GRESHAM,4175
AUSTIN,10933
AVALON PARK,1522
AVONDALE,3640
BELMONT CRAGIN,14386
BEVERLY,1636


In [None]:
%sql select distinct COMMUNITY_AREA_NAME from CHICAGO_PUBLIC_SCHOOLS

Double-click __here__ for a hint

<!--
Verify the exact name of the Enrollment column in the database
Use the SUM() function to add up the Enrollments for each Community Area
-->

Double-click __here__ for another hint

<!--
Don't forget to group by the Community Area
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name 

-->


### Problem 9

##### Get the 5 Community Areas with the least total College Enrollment, sorted in ascending order

In [38]:
%%sql
select COMMUNITY_AREA_NAME, sum(COLLEGE_ENROLLMENT) as SUM_OF_COLLEGE_ENROLLMENT from CHICAGO_PUBLIC_SCHOOLS
group by COMMUNITY_AREA_NAME order by SUM_OF_COLLEGE_ENROLLMENT nulls last limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_name,sum_of_college_enrollment
OAKLAND,140
FULLER PARK,531
BURNSIDE,549
OHARE,786
LOOP,871


Double-click __here__ for a hint

<!--
Order the previous query and limit the number of rows you fetch
-->

Double-click __here__ for the solution.

<!-- Solution:

%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name \
   order by TOTAL_ENROLLMENT asc \
   fetch first 5 rows only

-->
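Grouping, aggregating, ordering, and limiting can be combined in a single statement. The sketch below illustrates this, assuming SQLite as a stand-in for Db2. The area names and enrollment figures are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; all rows below are made-up values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SCHOOLS (COMMUNITY_AREA_NAME TEXT, COLLEGE_ENROLLMENT INTEGER)"
)
conn.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?)",
    [
        ("AREA A", 100),
        ("AREA A", 250),
        ("AREA B", 40),
        ("AREA C", 500),
        ("AREA C", 10),
    ],
)

# Sum enrollment per area, sort ascending, and keep the two smallest totals.
lowest_totals = conn.execute(
    """
    SELECT COMMUNITY_AREA_NAME, SUM(COLLEGE_ENROLLMENT) AS TOTAL_ENROLLMENT
    FROM SCHOOLS
    GROUP BY COMMUNITY_AREA_NAME
    ORDER BY TOTAL_ENROLLMENT ASC
    LIMIT 2
    """
).fetchall()
print(lowest_totals)
```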

### Problem 10

##### Get the hardship index for the community area which has College Enrollment of 4368

In [69]:
%%sql
select S.name_of_school,S.COMMUNITY_AREA_NAME,S.college_enrollment,C.hardship_index
from CHICAGO_PUBLIC_SCHOOLS S
inner join SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO C on S.COMMUNITY_AREA_NUMBER=C.COMMUNITY_AREA_NUMBER
where college_enrollment = 4368

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,community_area_name,college_enrollment,hardship_index
Albert G Lane Technical High School,NORTH CENTER,4368,6


In [47]:
%%sql 
select COLNAME, TYPENAME, LENGTH from SYSCAT.COLUMNS 
where TABNAME = 'CHICAGO_PUBLIC_SCHOOLS' and colname like '%H%'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


colname,typename,length
NAME_OF_SCHOOL,VARCHAR,65
"Elementary, Middle, or High School",VARCHAR,2
HEALTHY_SCHOOL_CERTIFIED,VARCHAR,3


In [48]:
%sql select * from SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_number,community_area_name,percent_of_housing_crowded,percent_households_below_poverty,percent_aged_16__unemployed,percent_aged_25__without_high_school_diploma,percent_aged_under_18_or_over_64,per_capita_income,hardship_index
1,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39
2,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46
3,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20
4,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17
5,North Center,0.3,7.5,5.2,4.5,26.2,57123,6


Double-click __here__ for the solution.

<!-- Solution:
NOTE: For this solution to work the CHICAGO_SOCIOECONOMIC_DATA table 
      as created in the last lab of Week 3 should already exist

%%sql 
select hardship_index 
   from chicago_socioeconomic_data CD, schools CPS 
   where CD.ca = CPS.community_area_number 
      and college_enrollment = 4368

-->

### Problem 11

##### Get the hardship index for the community area which has the highest value for College Enrollment

In [70]:
%%sql
select S.name_of_school,S.COMMUNITY_AREA_NAME,S.college_enrollment,C.hardship_index
from CHICAGO_PUBLIC_SCHOOLS S
inner join SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO C on S.COMMUNITY_AREA_NUMBER=C.COMMUNITY_AREA_NUMBER
where college_enrollment = (select max(college_enrollment) from CHICAGO_PUBLIC_SCHOOLS)

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,community_area_name,college_enrollment,hardship_index
Albert G Lane Technical High School,NORTH CENTER,4368,6


Double-click __here__ for the solution.

<!-- Solution:
NOTE: For this solution to work the CHICAGO_SOCIOECONOMIC_DATA table 
      as created in the last lab of Week 3 should already exist

%sql select ca, community_area_name, hardship_index from chicago_socioeconomic_data \
   where ca in \
   ( select community_area_number from schools order by college_enrollment desc limit 1 )

-->
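Problems 10 and 11 both depend on joining the schools table to the socioeconomic table on the community area number. A minimal sketch of that join follows, assuming SQLite as a stand-in for Db2. The table name `SOCIOECONOMIC` is a shortened, hypothetical name, and the rows are a small subset of values that appear in the outputs above.

```python
import sqlite3

# Assumption: SQLite stands in for Db2; SOCIOECONOMIC is a shortened table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SCHOOLS ("
    "NAME_OF_SCHOOL TEXT, COMMUNITY_AREA_NUMBER INTEGER, COLLEGE_ENROLLMENT INTEGER)"
)
conn.execute(
    "CREATE TABLE SOCIOECONOMIC ("
    "COMMUNITY_AREA_NUMBER INTEGER, COMMUNITY_AREA_NAME TEXT, HARDSHIP_INDEX INTEGER)"
)
conn.executemany(
    "INSERT INTO SCHOOLS VALUES (?, ?, ?)",
    [
        ("Albert G Lane Technical High School", 5, 4368),
        ("Abraham Lincoln Elementary School", 7, 813),
    ],
)
conn.executemany(
    "INSERT INTO SOCIOECONOMIC VALUES (?, ?, ?)",
    [(1, "Rogers Park", 39), (5, "North Center", 6)],
)

# Join on the area number and keep the school with the highest enrollment.
result = conn.execute(
    """
    SELECT S.NAME_OF_SCHOOL, C.COMMUNITY_AREA_NAME, C.HARDSHIP_INDEX
    FROM SCHOOLS S
    INNER JOIN SOCIOECONOMIC C
        ON S.COMMUNITY_AREA_NUMBER = C.COMMUNITY_AREA_NUMBER
    WHERE S.COLLEGE_ENROLLMENT = (SELECT MAX(COLLEGE_ENROLLMENT) FROM SCHOOLS)
    """
).fetchall()
print(result)
```

An inner join silently drops schools whose community area number has no match in the second table, so it is worth checking for unmatched rows when the totals matter.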

In [71]:
%sql select max(college_enrollment) from CHICAGO_PUBLIC_SCHOOLS

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
4368


## Summary

##### In this lab you learned how to work with a real-world dataset using SQL and Python. You learned how to query columns with spaces or special characters in their names and with mixed-case names. You also used built-in database functions, practiced sorting and limiting result sets, used sub-queries, and worked with multiple tables.

Copyright &copy; 2018 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).


In [74]:
%sql select tabschema,tabname,create_time from syscat.tables where tabschema = 'RLS52050' and create_time > '2020-02-12 00:00:00'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


tabschema,tabname,create_time
RLS52050,CHICAGO_CRIME_DATA,2020-02-12 10:23:40.751377
RLS52050,SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO,2020-02-12 10:25:02.464155
RLS52050,CHICAGO_PUBLIC_SCHOOLS,2020-02-12 10:26:31.078221


In [76]:
%sql select * from CHICAGO_CRIME_DATA limit 1

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


id,case_number,DATE,block,iucr,primary_type,description,location_description,arrest,domestic,beat,district,ward,community_area_number,fbicode,x_coordinate,y_coordinate,YEAR,updatedon,latitude,longitude,location
3512276,HK587712,2004-08-28 17:50:56,047XX S KEDZIE AVE,890,THEFT,FROM BUILDING,SMALL RETAIL STORE,False,False,911,9,14,58,6,1155838,1873050,2004-01-01,2018-02-10 15:50:01,41.8074405,-87.70395585,"(41.8074405, -87.703955849)"


In [77]:
%sql select count(*) from CHICAGO_CRIME_DATA

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
534


In [None]:
%sql select * from CHICAGO_CRIME_DATA limit 10

In [79]:
%sql select count(*) from CHICAGO_CRIME_DATA where arrest = 'TRUE'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
163


In [81]:
%sql select distinct(primary_type) from CHICAGO_CRIME_DATA where location_description = 'GAS STATION'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


primary_type
CRIMINAL TRESPASS
NARCOTICS
ROBBERY
THEFT


In [82]:
%sql select * from SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO limit 1

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_number,community_area_name,percent_of_housing_crowded,percent_households_below_poverty,percent_aged_16__unemployed,percent_aged_25__without_high_school_diploma,percent_aged_under_18_or_over_64,per_capita_income,hardship_index
1,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39


In [83]:
%sql select * from SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO where community_area_name like 'B%'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_number,community_area_name,percent_of_housing_crowded,percent_households_below_poverty,percent_aged_16__unemployed,percent_aged_25__without_high_school_diploma,percent_aged_under_18_or_over_64,per_capita_income,hardship_index
19,Belmont Cragin,10.8,18.7,14.6,37.3,37.3,15461,70
47,Burnside,6.8,33.0,18.6,19.3,42.7,12515,79
58,Brighton Park,14.4,23.6,13.9,45.1,39.3,13089,84
60,Bridgeport,4.5,18.9,13.7,22.2,31.3,22694,43
72,Beverly,0.9,5.1,8.0,3.7,40.5,39523,12


In [84]:
%sql select * from CHICAGO_PUBLIC_SCHOOLS limit 1

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


School_ID,name_of_school,"Elementary, Middle, or High School",Street_Address,City,State,ZIP_Code,Phone_Number,Link,Network_Manager,Collaborative_Name,Adequate_Yearly_Progress_Made_,Track_Schedule,CPS_Performance_Policy_Status,CPS_Performance_Policy_Level,healthy_school_certified,Safety_Icon,safety_score,Family_Involvement_Icon,Family_Involvement_Score,Environment_Icon,Environment_Score,Instruction_Icon,Instruction_Score,Leaders_Icon,Leaders_Score,Teachers_Icon,Teachers_Score,Parent_Engagement_Icon,Parent_Engagement_Score,Parent_Environment_Icon,Parent_Environment_Score,average_student_attendance,Rate_of_Misconducts__per_100_students_,Average_Teacher_Attendance,Individualized_Education_Program_Compliance_Rate,Pk_2_Literacy__,Pk_2_Math__,Gr3_5_Grade_Level_Math__,Gr3_5_Grade_Level_Read__,Gr3_5_Keep_Pace_Read__,Gr3_5_Keep_Pace_Math__,Gr6_8_Grade_Level_Math__,Gr6_8_Grade_Level_Read__,Gr6_8_Keep_Pace_Math_,Gr6_8_Keep_Pace_Read__,Gr_8_Explore_Math__,Gr_8_Explore_Read__,ISAT_Exceeding_Math__,ISAT_Exceeding_Reading__,ISAT_Value_Add_Math,ISAT_Value_Add_Read,ISAT_Value_Add_Color_Math,ISAT_Value_Add_Color_Read,Students_Taking__Algebra__,Students_Passing__Algebra__,9th Grade EXPLORE (2009),9th Grade EXPLORE (2010),10th Grade PLAN (2009),10th Grade PLAN (2010),Net_Change_EXPLORE_and_PLAN,11th Grade Average ACT (2011),Net_Change_PLAN_and_ACT,College_Eligibility__,Graduation_Rate__,College_Enrollment_Rate__,college_enrollment,General_Services_Route,Freshman_on_Track_Rate__,x_coordinate,y_coordinate,Latitude,Longitude,community_area_number,community_area_name,Ward,Police_District,Location
610038,Abraham Lincoln Elementary School,ES,615 W Kemper Pl,Chicago,IL,60614,(773) 534-5720,http://schoolreports.cps.edu/SchoolProgressReport_Eng/Spring2011Eng_610038.pdf,Fullerton Elementary Network,NORTH-NORTHWEST SIDE COLLABORATIVE,No,Standard,Not on Probation,Level 1,Yes,Very Strong,99,Very Strong,99,Strong,74,Strong,66,Strong,65,Strong,70,Strong,56,Average,47,96.00%,2.0,96.40%,95.80%,80.1,43.3,89.6,84.9,60.7,62.6,81.9,85.2,52,62.4,66.3,77.9,69.7,64.4,0.2,0.9,Yellow,Green,67.1,54.5,NDA,NDA,NDA,NDA,NDA,NDA,NDA,NDA,NDA,NDA,813,33,NDA,1171699.458,1915829.428,41.92449696,-87.64452163,7,LINCOLN PARK,43,18,"(41.92449696, -87.64452163)"


In [99]:
%sql select name_of_school,community_area_number,healthy_school_certified from CHICAGO_PUBLIC_SCHOOLS\
where community_area_number between 10 and 15 and healthy_school_certified = 'Yes'

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


name_of_school,community_area_number,healthy_school_certified
Rufus M Hitch Elementary School,10,Yes


In [101]:
%sql select AVG(safety_score) from CHICAGO_PUBLIC_SCHOOLS

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


1
49.504873


In [108]:
%sql select community_area_name, AVG(college_enrollment) as AVG_COLLEGE_ENROLLMENT from CHICAGO_PUBLIC_SCHOOLS\
group by community_area_name order by AVG_COLLEGE_ENROLLMENT desc nulls last limit 5

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_name,avg_college_enrollment
ARCHER HEIGHTS,2411.5
MONTCLARE,1317.0
WEST ELSDON,1233.333333
BRIGHTON PARK,1205.875
BELMONT CRAGIN,1198.833333


In [109]:
%sql select community_area_name, safety_score from CHICAGO_PUBLIC_SCHOOLS\
where safety_score = (select min(safety_score) from CHICAGO_PUBLIC_SCHOOLS)

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_name,safety_score
WASHINGTON PARK,1


In [110]:
%sql select * from SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO limit 1

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_number,community_area_name,percent_of_housing_crowded,percent_households_below_poverty,percent_aged_16__unemployed,percent_aged_25__without_high_school_diploma,percent_aged_under_18_or_over_64,per_capita_income,hardship_index
1,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39


In [115]:
%%sql
select S.community_area_name, S.safety_score, C.per_capita_income
from CHICAGO_PUBLIC_SCHOOLS S,SELECTED_SOCIOECONOMIC_INDICATORS_IN_CHICAGO C
where S.community_area_number = C.community_area_number and S.safety_score = 1

 * ibm_db_sa://rls52050:***@dashdb-txn-sbox-yp-lon02-01.services.eu-gb.bluemix.net:50000/BLUDB
Done.


community_area_name,safety_score,per_capita_income
WASHINGTON PARK,1,13785
