# CMSD Data Analyst Skills Assessment
In this notebook, I outline my overall data preparation process. The environment is set up in Python; however, all of my queries use SQL, and the data is stored in a SQLite database on my computer.

## General Steps
1) Download and verify files from https://reportcard.education.ohio.gov/download
2) Convert files to .csv format and create tables in SQLite (I used DB Browser for SQLite for this step)
3) Create Jupyter SQL environment and connect to db
4) Filter data based on specified requirements - School Years and District IRN (043786)
5) Run the final query and write the output .csv file with the 6 required columns
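Step 2 was done by hand in DB Browser for SQLite. As a rough sketch of a scripted equivalent, the snippet below loads CSV rows into a SQLite table with Python's standard library. The table name `Achievement_Building_Sample`, the three-column layout, and the in-memory database are assumptions made for illustration, not the real file structure.

```python
import csv
import io
import sqlite3

# Hypothetical two-row stand-in for a downloaded report card CSV file.
sample_csv = io.StringIO(
    "BuildingIRN,BuildingName,DistrictIRN\n"
    "224,Adlai Stevenson School,43786\n"
    "489,Almira,43786\n"
)

# In-memory database for the sketch; the notebook itself uses CMSD_db.db on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Achievement_Building_Sample "
    "(BuildingIRN INT, BuildingName TEXT, DistrictIRN INT)"
)

# Read the CSV by header name and insert the rows with bound parameters.
reader = csv.DictReader(sample_csv)
rows = [(r["BuildingIRN"], r["BuildingName"], r["DistrictIRN"]) for r in reader]
conn.executemany("INSERT INTO Achievement_Building_Sample VALUES (?, ?, ?)", rows)

loaded = conn.execute(
    "SELECT BuildingIRN, BuildingName, DistrictIRN "
    "FROM Achievement_Building_Sample ORDER BY BuildingIRN"
).fetchall()
print(loaded)
```

Declaring the IRN columns as `INT` mirrors what the PRAGMA output later in this notebook reports for the imported tables.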

In [22]:
%%capture
!pip install ipython-sql
!pip install sqlalchemy

In [23]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %sql  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%py

In [1]:
%%capture
%load_ext sql
# Establishing SQL environment and connecting to SQLite db
import sqlalchemy
sqlalchemy.create_engine("sqlite:///CMSD_db.db")
%sql sqlite:///CMSD_db.db  

In [2]:
%%js
require(['notebook/js/codecell'], function (codecell) {
    codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = { 'reg': [/%?%sql/] };
    Jupyter.notebook.events.one('kernel_ready.Kernel', function () {
        Jupyter.notebook.get_cells().map(function (cell) {
            if (cell.cell_type == 'code') { cell.auto_highlight(); }
        });
    });
});

<IPython.core.display.Javascript object>

## Achievement Building

In this section, I am going to work on the three tables for Building Achievement.

The general process involves filtering by District IRN, removing unnecessary columns, and inserting a year column. Because this project was completed in one session, I opted to use temporary tables, although the same result is achievable with views.

Overall, I worked on each category separately to ensure my naming conventions were consistent, and that the data was comparable year to year. 
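For comparison, here is a minimal sketch of the view-based alternative mentioned above. It uses a hypothetical three-column miniature of `Achievement_Building_1516` in an in-memory database; the view name `View_Achievement_Building_1516` and the out-of-district row are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature of the source table, with one out-of-district row.
conn.execute(
    'CREATE TABLE Achievement_Building_1516 '
    '("BuildingIRN" INT, "DistrictIRN" INT, "PerformanceIndexScore2015-16" TEXT)'
)
conn.executemany(
    "INSERT INTO Achievement_Building_1516 VALUES (?, ?, ?)",
    [(224, 43786, "43.903"), (489, 43786, "46.384"), (999, 11111, "80.000")],
)

# Same three steps as the temp tables: filter by district, keep selected
# columns under year-neutral names, and add a SchoolYear column.
conn.execute(
    """
    CREATE TEMPORARY VIEW View_Achievement_Building_1516 AS
    SELECT "BuildingIRN",
           "DistrictIRN",
           "PerformanceIndexScore2015-16" AS "PerformanceIndexScore",
           '2015-2016' AS SchoolYear
    FROM Achievement_Building_1516
    WHERE "DistrictIRN" = '043786'
    """
)

view_rows = conn.execute(
    "SELECT * FROM View_Achievement_Building_1516 ORDER BY BuildingIRN"
).fetchall()
print(view_rows)
```

A view re-runs its query each time it is read, so it follows changes to the source table, whereas a temporary table is a snapshot taken when it is created.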

### Creating Temp Table for Achievement Building School Year 2015-2016

In [5]:
%%sql
SELECT *
FROM Achievement_Building_1516
WHERE DistrictIRN = '043786'

/* Just a basic query to check everything is working as intended before making temporary tables */

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,County,Region,Address,CityandZipCode,Phone#,Principal,PerformanceIndexScore2015-16,PerformanceIndexPercent2015-16,LetterGradeofPerformanceIndex,PercentofStudentsNotTested,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,GiftedPerformanceIndexScore2015-16,GiftedPerformanceIndex2015-16,PercentofGiftedStudentsNotTested,PercentofGiftedStudentsBelow,PercentofGiftedStudentsBasic,PercentofGiftedStudentsProficient,PercentofGiftedStudentsAccelerated,PercentofGiftedStudentsAdvanced,PercentofGiftedStudentsAdvancedPlus,PerformanceIndexScore2014-15,PerformanceIndexScore2013-14,Watermark
224,Adlai Stevenson School,43786,Cleveland Municipal,Cuyahoga,Region 3,18300 Woda Avenue,"Cleveland, OH, 44122-6441",(216) 482-2950,Christopher T. Wyland,43.903,36.6,F,0,69.9,19,7.6,2.5,1,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,55.307,68.009,
489,Almira,43786,Cleveland Municipal,Cuyahoga,Region 3,3375 W 99th St,"Cleveland, OH, 44102-4642",(216) 838-6150,Laverne Hooks,46.384,38.7,F,0,66.2,19.2,10.9,3.2,0.5,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,59.407,67.316,
729,Andrew J Rickoff,43786,Cleveland Municipal,Cuyahoga,Region 3,3500 E 147th St,"Cleveland, OH, 44120-4834",(216) 838-4150,Gloriane R. Smith,44.327,36.9,F,0.3,68.5,19.4,8.7,2.4,0.7,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,51.929,62.727,
828,Anton Grdina,43786,Cleveland Municipal,Cuyahoga,Region 3,2955 E 71st St,"Cleveland, OH, 44104-4101",(216) 812-1543,Harold S. Booker,38.904,32.4,F,0,79.8,13.4,5.7,0.9,0.2,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,50.605,54.412,
1040,Artemus Ward,43786,Cleveland Municipal,Cuyahoga,Region 3,4315 W 140th St,"Cleveland, OH, 44135-2128",(216) 920-7055,Chris P. Myslenski,56.816,47.3,F,0,50.8,21.7,18.8,6.8,1.9,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,66.331,73.281,
2378,Benjamin Franklin,43786,Cleveland Municipal,Cuyahoga,Region 3,1905 Spring Rd,"Cleveland, OH, 44109-4460",(216) 749-8580,Rachel J. Snider,64.892,54.1,D,0.1,40.6,22.5,18.6,12,5.9,0.2,94.521,78.8,0,9.6,15.1,30.1,20.5,21.9,2.7,79.004,86.34,
3137,Bolton,43786,Cleveland Municipal,Cuyahoga,Region 3,9803 Quebec Ave,"Cleveland, OH, 44106-3519",(216) 231-2585,Juliet M. King,36.583,30.5,F,0.3,82.5,13.9,2.2,0.6,0.6,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,49.694,57.727,
4234,Buhrer,43786,Cleveland Municipal,Cuyahoga,Region 3,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",(216) 744-2800,michele sanchez,71.772,59.8,D,0,29.4,25,27.3,12.5,5.7,0.0,108.529,90.4,0,0,2.9,29.4,38.2,29.4,0,86.263,90.501,
5066,Case,43786,Cleveland Municipal,Cuyahoga,Region 3,4050 Superior Ave,"Cleveland, OH, 44103-1128",(216) 838-1350,Brandee M. Carson-jones,38.715,32.3,F,0.2,80.3,12.4,6,1.1,0,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,53.141,63.525,
5637,Alfred Benesch,43786,Cleveland Municipal,Cuyahoga,Region 3,5393 Quincy Ave,"Cleveland, OH, 44104-4409",(216) 431-4410,Latosha M. Glass,39.222,32.7,F,0,78.4,15.4,4.5,1,0.8,0.0,NC,NC,NC,NC,NC,NC,NC,NC,NC,44.381,54.352,


In [24]:
%%sql
CREATE TEMPORARY TABLE Temp_Achievement_Building_1516 AS
SELECT "BuildingIRN", "BuildingName", "DistrictIRN", "DistrictName", "PerformanceIndexScore2015-16" AS "PerformanceIndexScore", "PerformanceIndexPercent2015-16" AS "PerformanceIndexPercent", "LetterGradeofPerformanceIndex", "PercentofStudentsBelow", "PercentofStudentsBasic", "PercentofStudentsProficient", "PercentofStudentsAccelerated", "PercentofStudentsAdvanced", "PercentofStudentsAdvancedPlus", '2015-2016' AS SchoolYear
FROM Achievement_Building_1516
WHERE "DistrictIRN" = '043786';
/* In this temp table, I isolated the columns I wanted to keep and added a school year column for organization.
Additionally, I kept some extra columns for checking and possible analysis to explore later on.
Lastly, I used quotation marks around all of the column names for consistency due to irregular naming. */

 * sqlite:///CMSD_db.db
Done.
0 rows affected.


[]

In [56]:
%%sql
DROP TABLE IF EXISTS Temp_Building_Ratings_1617

 * sqlite:///CMSD_db.db
Done.


[]

In [25]:
%%sql
PRAGMA table_info(Temp_Achievement_Building_1516)
/* This temp table has everything I intended. Additional columns might be added later,
but it looks pretty good from here. */

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,BuildingIRN,INT,0,,0
1,BuildingName,TEXT,0,,0
2,DistrictIRN,INT,0,,0
3,DistrictName,TEXT,0,,0
4,PerformanceIndexScore,TEXT,0,,0
5,PerformanceIndexPercent,TEXT,0,,0
6,LetterGradeofPerformanceIndex,TEXT,0,,0
7,PercentofStudentsBelow,TEXT,0,,0
8,PercentofStudentsBasic,TEXT,0,,0
9,PercentofStudentsProficient,TEXT,0,,0
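The PRAGMA output lists `DistrictIRN` as `INT`, while the filters in this notebook compare it with the string `'043786'`. The sketch below, which uses two hypothetical single-column tables, illustrates why the filter still matches: SQLite converts the text operand to a number when the column has integer affinity, but it compares character by character when the column is declared `TEXT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables holding the same IRN, declared with different types.
conn.execute("CREATE TABLE irn_as_int (DistrictIRN INT)")
conn.execute("CREATE TABLE irn_as_text (DistrictIRN TEXT)")
conn.execute("INSERT INTO irn_as_int VALUES (43786)")
conn.execute("INSERT INTO irn_as_text VALUES ('43786')")

# The same filter used throughout the notebook, run against each table.
query = "SELECT COUNT(*) FROM {} WHERE DistrictIRN = '043786'"
int_matches = conn.execute(query.format("irn_as_int")).fetchone()[0]
text_matches = conn.execute(query.format("irn_as_text")).fetchone()[0]

print(int_matches, text_matches)
```

If the IRN columns had been imported as `TEXT` without the leading zero, the same filter would return no rows, so the declared type is worth checking before relying on it.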


In [26]:
%%sql
SELECT *
FROM Temp_Achievement_Building_1516
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,PerformanceIndexScore,PerformanceIndexPercent,LetterGradeofPerformanceIndex,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,SchoolYear
224,Adlai Stevenson School,43786,Cleveland Municipal,43.903,36.6,F,69.9,19,7.6,2.5,1,0.0,2015-2016
489,Almira,43786,Cleveland Municipal,46.384,38.7,F,66.2,19.2,10.9,3.2,0.5,0.0,2015-2016
729,Andrew J Rickoff,43786,Cleveland Municipal,44.327,36.9,F,68.5,19.4,8.7,2.4,0.7,0.0,2015-2016
828,Anton Grdina,43786,Cleveland Municipal,38.904,32.4,F,79.8,13.4,5.7,0.9,0.2,0.0,2015-2016
1040,Artemus Ward,43786,Cleveland Municipal,56.816,47.3,F,50.8,21.7,18.8,6.8,1.9,0.0,2015-2016
2378,Benjamin Franklin,43786,Cleveland Municipal,64.892,54.1,D,40.6,22.5,18.6,12,5.9,0.2,2015-2016
3137,Bolton,43786,Cleveland Municipal,36.583,30.5,F,82.5,13.9,2.2,0.6,0.6,0.0,2015-2016
4234,Buhrer,43786,Cleveland Municipal,71.772,59.8,D,29.4,25,27.3,12.5,5.7,0.0,2015-2016
5066,Case,43786,Cleveland Municipal,38.715,32.3,F,80.3,12.4,6,1.1,0,0.0,2015-2016
5637,Alfred Benesch,43786,Cleveland Municipal,39.222,32.7,F,78.4,15.4,4.5,1,0.8,0.0,2015-2016


In [35]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Achievement_Building_1516
/* Checking to make sure I have the desired results for DistrictIRN and DistrictName. I included DistrictName as
a second layer of checking that I only had Cleveland schools. At this point, due to the low
record count, I also double-checked this result in Excel before moving forward. */

 * sqlite:///CMSD_db.db
Done.


building,District,Name
101,1,1


### Creating Temp Table for Achievement Building School Year 2016-2017

In [19]:
%%sql
SELECT *
FROM Achievement_Building_1617
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,County,Region,Address,CityandZipCode,Phone#,Principal,PerformanceIndexScore2016-17,PerformanceIndexPercent2016-17,LetterGradeofPerformanceIndex,PercentofStudentsNotTested,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,GiftedPerformanceIndexScore2016-17,GiftedPerformanceIndex2016-17,PercentofGiftedStudentsNotTested,PercentofGiftedStudentsBelow,PercentofGiftedStudentsBasic,PercentofGiftedStudentsProficient,PercentofGiftedStudentsAccelerated,PercentofGiftedStudentsAdvanced,PercentofGiftedStudentsAdvancedPlus,PerformanceIndexScore2015-16,PerformanceIndexScore2014-15,Watermark
224,Adlai Stevenson School,43786,Cleveland Municipal,Cuyahoga,Region 3,18300 Woda Avenue,"Cleveland, OH, 44122-6441",(216) 838-5300,Christopher T. Wyland,49.271,41.1,F,0.2,60.4,22.5,10.8,5.4,0.8,0.0,NC,NC,0.0,33.3,11.1,11.1,33.3,11.1,0.0,43.903,55.307,
318,Menlo Park Academy,43786,Cleveland Municipal,Cuyahoga,Region 3,14440 Triskett Rd,"Cleveland, OH, 44111-2263",(440) 925-6365,,106.422,88.7,B,0.0,1.9,8.5,19.9,32.2,32.5,4.9,106.422,88.7,0.0,1.9,8.5,19.9,32.2,32.5,4.9,110.760,100.920,
489,Almira,43786,Cleveland Municipal,Cuyahoga,Region 3,3375 W 99th St,"Cleveland, OH, 44102-4642",(216) 838-6150,James Greene,46.791,39.0,F,1.9,60.9,22.9,10.3,3.2,0.8,0.0,85.000,70.8,0.0,14.3,21.4,35.7,21.4,7.1,0.0,46.384,59.407,
729,Andrew J Rickoff,43786,Cleveland Municipal,Cuyahoga,Region 3,3500 E 147th St,"Cleveland, OH, 44120-4834",(216) 838-4150,SHELBY R. SCHUTT,47.131,39.3,F,2.3,62.1,19.3,11.7,3.9,0.8,0.0,NC,NC,0.0,0.0,0.0,50.0,0.0,50.0,0.0,44.327,51.929,
828,Anton Grdina,43786,Cleveland Municipal,Cuyahoga,Region 3,2955 E 71st St,"Cleveland, OH, 44104-4101",(216) 838-1150,Harold S. Booker,40.361,33.6,F,3.6,72.4,14.4,6.0,3.4,0.2,0.0,NC,NC,0.0,33.3,0.0,0.0,66.7,0.0,0.0,38.904,50.605,
930,Cleveland Entrepreneurship Preparatory School,43786,Cleveland Municipal,Cuyahoga,Region 3,1417 E 36th St Fl 2,"Cleveland, OH, 44114-4116",(216) 456-2080,,75.062,62.6,D,0.0,25.9,24.9,25.2,16.9,7.1,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,73.607,85.536,
1040,Artemus Ward,43786,Cleveland Municipal,Cuyahoga,Region 3,4315 W 140th St,"Cleveland, OH, 44135-2128",(216) 838-6200,Chris P. Myslenski,60.942,50.8,D,0.0,42.2,27.1,19.6,9.0,2.1,0.0,NC,NC,0.0,0.0,0.0,28.6,28.6,42.9,0.0,56.816,66.331,
2378,Benjamin Franklin,43786,Cleveland Municipal,Cuyahoga,Region 3,1905 Spring Rd,"Cleveland, OH, 44109-4460",(216) 838-3150,Rachel J. Snider,64.372,53.6,D,1.1,37.2,25.0,24.3,9.7,2.7,0.0,86.615,72.2,1.5,13.8,16.9,32.3,24.6,10.8,0.0,64.892,79.004,
3137,Bolton,43786,Cleveland Municipal,Cuyahoga,Region 3,9803 Quebec Ave,"Cleveland, OH, 44106-3519",(216) 838-1200,Juliet M. King,41.123,34.3,F,3.2,71.4,15.0,7.8,2.4,0.3,0.0,NC,NC,0.0,25.0,25.0,25.0,0.0,25.0,0.0,36.583,49.694,
4234,Buhrer,43786,Cleveland Municipal,Cuyahoga,Region 3,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",(216) 838-8350,michele sanchez,77.072,64.2,D,0.0,21.5,26.0,31.4,16.6,4.4,0.0,105.143,87.6,0.0,0.0,8.6,28.6,40.0,22.9,0.0,71.772,86.263,


In [27]:
%%sql
CREATE TEMPORARY TABLE Temp_Achievement_Building_1617 AS
SELECT "BuildingIRN", "BuildingName", "DistrictIRN", "DistrictName", "PerformanceIndexScore2016-17" AS "PerformanceIndexScore", "PerformanceIndexPercent2016-17" AS "PerformanceIndexPercent", "LetterGradeofPerformanceIndex", "PercentofStudentsBelow", "PercentofStudentsBasic", "PercentofStudentsProficient", "PercentofStudentsAccelerated", "PercentofStudentsAdvanced", "PercentofStudentsAdvancedPlus", '2016-2017' AS SchoolYear
FROM Achievement_Building_1617
WHERE "DistrictIRN" = '043786';

 * sqlite:///CMSD_db.db
Done.


[]

In [38]:
%%sql
PRAGMA table_info(Temp_Achievement_Building_1617)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,BuildingIRN,INT,0,,0
1,BuildingName,TEXT,0,,0
2,DistrictIRN,INT,0,,0
3,DistrictName,TEXT,0,,0
4,PerformanceIndexScore,TEXT,0,,0
5,PerformanceIndexPercent,TEXT,0,,0
6,LetterGradeofPerformanceIndex,TEXT,0,,0
7,PercentofStudentsBelow,TEXT,0,,0
8,PercentofStudentsBasic,TEXT,0,,0
9,PercentofStudentsProficient,TEXT,0,,0


In [29]:
%%sql
SELECT *
FROM Temp_Achievement_Building_1617
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,PerformanceIndexScore,PerformanceIndexPercent,LetterGradeofPerformanceIndex,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,SchoolYear
224,Adlai Stevenson School,43786,Cleveland Municipal,49.271,41.1,F,60.4,22.5,10.8,5.4,0.8,0.0,2016-2017
318,Menlo Park Academy,43786,Cleveland Municipal,106.422,88.7,B,1.9,8.5,19.9,32.2,32.5,4.9,2016-2017
489,Almira,43786,Cleveland Municipal,46.791,39.0,F,60.9,22.9,10.3,3.2,0.8,0.0,2016-2017
729,Andrew J Rickoff,43786,Cleveland Municipal,47.131,39.3,F,62.1,19.3,11.7,3.9,0.8,0.0,2016-2017
828,Anton Grdina,43786,Cleveland Municipal,40.361,33.6,F,72.4,14.4,6.0,3.4,0.2,0.0,2016-2017
930,Cleveland Entrepreneurship Preparatory School,43786,Cleveland Municipal,75.062,62.6,D,25.9,24.9,25.2,16.9,7.1,0.0,2016-2017
1040,Artemus Ward,43786,Cleveland Municipal,60.942,50.8,D,42.2,27.1,19.6,9.0,2.1,0.0,2016-2017
2378,Benjamin Franklin,43786,Cleveland Municipal,64.372,53.6,D,37.2,25.0,24.3,9.7,2.7,0.0,2016-2017
3137,Bolton,43786,Cleveland Municipal,41.123,34.3,F,71.4,15.0,7.8,2.4,0.3,0.0,2016-2017
4234,Buhrer,43786,Cleveland Municipal,77.072,64.2,D,21.5,26.0,31.4,16.6,4.4,0.0,2016-2017


In [37]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Achievement_Building_1617

 * sqlite:///CMSD_db.db
Done.


building,District,Name
119,1,1


### Creating Temp Table for Achievement Building School Year 2017-2018

In [42]:
%%sql
CREATE TEMPORARY TABLE Temp_Achievement_Building_1718 AS
SELECT "BuildingIRN", "BuildingName", "DistrictIRN", "DistrictName", "PerformanceIndexScore2017-18" AS "PerformanceIndexScore", "PerformanceIndexPercent2017-18" AS "PerformanceIndexPercent", "LetterGradeofPerformanceIndex", "PercentofStudentsLimited" AS PercentofStudentsBelow, "PercentofStudentsBasic", "PercentofStudentsProficient", "PercentofStudentsAccelerated", "PercentofStudentsAdvanced", "PercentofStudentsAdvancedPlus", '2017-2018' AS SchoolYear
FROM Achievement_Building_1718
WHERE "DistrictIRN" = '043786';

 * sqlite:///CMSD_db.db
Done.


[]

Since I included some additional columns for consideration in further analysis, there is one column that I had to rename: PercentofStudentsLimited. In the 2017-2018 Achievement table, this column is named "PercentofStudentsLimited", which deviates from the "PercentofStudentsBelow" name used in the two prior school years. For the sake of continuity, I renamed the 2017-2018 column to match the prior years, and this change has been noted.
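A naming deviation like this one can also be found programmatically. The sketch below is one possible approach: it compares the column names that `PRAGMA table_info` reports for two tables. The two tables are hypothetical, shortened versions of the yearly Achievement tables, and `column_names` is a helper invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, shortened versions of two yearly tables.
conn.execute(
    'CREATE TABLE Achievement_Building_1617 '
    '("BuildingIRN" INT, "PercentofStudentsBelow" TEXT, "PercentofStudentsBasic" TEXT)'
)
conn.execute(
    'CREATE TABLE Achievement_Building_1718 '
    '("BuildingIRN" INT, "PercentofStudentsLimited" TEXT, "PercentofStudentsBasic" TEXT)'
)


def column_names(connection, table):
    """Return the set of column names that PRAGMA table_info reports."""
    # Table names cannot be bound as parameters, so pass only trusted names.
    return {row[1] for row in connection.execute(f'PRAGMA table_info("{table}")')}


cols_1617 = column_names(conn, "Achievement_Building_1617")
cols_1718 = column_names(conn, "Achievement_Building_1718")

# Names present in one year but not the other point to renamed columns.
only_in_1617 = sorted(cols_1617 - cols_1718)
only_in_1718 = sorted(cols_1718 - cols_1617)
print(only_in_1617, only_in_1718)
```

On the full tables, columns with the school year embedded in their names (for example "PerformanceIndexScore2016-17") would also show up as differences and would need to be read past.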

In [43]:
%%sql
PRAGMA table_info(Temp_Achievement_Building_1718)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,BuildingIRN,INT,0,,0
1,BuildingName,TEXT,0,,0
2,DistrictIRN,INT,0,,0
3,DistrictName,TEXT,0,,0
4,PerformanceIndexScore,TEXT,0,,0
5,PerformanceIndexPercent,TEXT,0,,0
6,LetterGradeofPerformanceIndex,TEXT,0,,0
7,PercentofStudentsBelow,TEXT,0,,0
8,PercentofStudentsBasic,TEXT,0,,0
9,PercentofStudentsProficient,TEXT,0,,0


In [44]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Achievement_Building_1718

 * sqlite:///CMSD_db.db
Done.


building,District,Name
123,1,1


### Combining the Achievement Temp Tables into one table

In [45]:
%%sql
CREATE TEMPORARY TABLE Combined_Achievement AS
SELECT * FROM Temp_Achievement_Building_1516
UNION ALL
SELECT * FROM Temp_Achievement_Building_1617
UNION ALL
SELECT * FROM Temp_Achievement_Building_1718;

 * sqlite:///CMSD_db.db
Done.


[]
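`UNION ALL` pairs columns by position rather than by name, which is why each temp table was built with the same column order. The sketch below shows the difference on two hypothetical tables (`year_a`, `year_b`) whose columns are stored in different orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables with the same columns stored in a different order.
conn.execute("CREATE TABLE year_a (BuildingIRN INT, Score TEXT, SchoolYear TEXT)")
conn.execute("CREATE TABLE year_b (BuildingIRN INT, SchoolYear TEXT, Score TEXT)")
conn.execute("INSERT INTO year_a VALUES (224, '43.903', '2015-2016')")
conn.execute("INSERT INTO year_b VALUES (224, '2016-2017', '49.271')")

# SELECT * stacks columns by position, so year_b's values land in the wrong slots.
by_position = conn.execute(
    "SELECT * FROM year_a UNION ALL SELECT * FROM year_b"
).fetchall()

# Listing the columns explicitly keeps every value under the intended name.
by_name = conn.execute(
    "SELECT BuildingIRN, Score, SchoolYear FROM year_a "
    "UNION ALL "
    "SELECT BuildingIRN, Score, SchoolYear FROM year_b"
).fetchall()

print(by_position)
print(by_name)
```

Listing the columns in each `SELECT` is a small safeguard if any of the temp tables is later recreated with a different column order.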

In [46]:
%%sql
SELECT *
FROM Combined_Achievement
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,PerformanceIndexScore,PerformanceIndexPercent,LetterGradeofPerformanceIndex,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,SchoolYear
224,Adlai Stevenson School,43786,Cleveland Municipal,43.903,36.6,F,69.9,19,7.6,2.5,1,0,2015-2016
489,Almira,43786,Cleveland Municipal,46.384,38.7,F,66.2,19.2,10.9,3.2,0.5,0,2015-2016
729,Andrew J Rickoff,43786,Cleveland Municipal,44.327,36.9,F,68.5,19.4,8.7,2.4,0.7,0,2015-2016
828,Anton Grdina,43786,Cleveland Municipal,38.904,32.4,F,79.8,13.4,5.7,0.9,0.2,0,2015-2016
1040,Artemus Ward,43786,Cleveland Municipal,56.816,47.3,F,50.8,21.7,18.8,6.8,1.9,0,2015-2016
2378,Benjamin Franklin,43786,Cleveland Municipal,64.892,54.1,D,40.6,22.5,18.6,12,5.9,0.2,2015-2016
3137,Bolton,43786,Cleveland Municipal,36.583,30.5,F,82.5,13.9,2.2,0.6,0.6,0,2015-2016
4234,Buhrer,43786,Cleveland Municipal,71.772,59.8,D,29.4,25,27.3,12.5,5.7,0,2015-2016
5066,Case,43786,Cleveland Municipal,38.715,32.3,F,80.3,12.4,6,1.1,0,0,2015-2016
5637,Alfred Benesch,43786,Cleveland Municipal,39.222,32.7,F,78.4,15.4,4.5,1,0.8,0,2015-2016


In [48]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name, COUNT (DISTINCT SchoolYear) AS Year
FROM Combined_Achievement

 * sqlite:///CMSD_db.db
Done.


building,District,Name,Year
128,1,1,3


## Building Ratings
Now, I'm going to narrow down the Building Ratings data in the same way as the Achievement data. I'll go through each year, validate the data, ensure that the correct columns are included, and then union the temporary tables to form a combined Building Ratings table, which can be used to produce the final output file.

### Building Ratings 2015-2016

In [49]:
%%sql
SELECT *
FROM Building_Ratings_1516
WHERE DistrictIRN = '043786'

/* Just a basic query to check everything is working as intended before making temporary tables */

 * sqlite:///CMSD_db.db
Done.


DistrictIRN,DistrictName,BuildingIRN,BuildingName,County,Region,Address,CityStateZip,Phone,Principal,Enrollment2015-2016,LetterGradeofAchievementComponent,LetterGradeofPercentStandards,GiftedIndicatorMet/NotMetStatus,PercentofStandardsMet,LetterGradeofPerformanceIndex,PerformanceIndexPercent,LetterGradeofAMO,AMOPoints,LetterGradeofK3Literacy,K3LiteracyPercent,LetterGradeofProgressComponent,LetterGradeofOverallValueAdded,OverallValueAddedGainIndex,LetterGradeofGiftedValueAdded,GiftedValueAddedGainIndex,LetterGradeofStudentswithDisabilitiesValueAdded,StudentswithDisabilitiesValueAddedGainIndex,LetterGradeofLowest20%ValueAdded,Lowest20%ValueAddedGainIndex,LetterGradeofHighMobilityValueAdded,HighMobilityValueAddedGainIndex,LetterGradeofGradRateComponent,LetterGradeof4YearGradRate2015,4YearGradRate2015,LetterGradeof5YearGradRate2014,5YearGradRate2014,LetterGradeofPreparedforSuccessComponent,PercentofPreparedforSuccessComponent,AttendanceRate2015-2016,AttendanceRate2014-2015,AttendanceRate2013-2014,ChronicAbsenteeismPercent2015-2016,Watermark
43786,Cleveland Municipal City,224,Adlai Stevenson School,Cuyahoga,Region 3,18300 Woda Avenue,"Cleveland, OH, 44122-6441",(216) 482-2950,Christopher T. Wyland,430,F,F,NC,0.0,F,36.6,F,0.0,F,15.8,F,F,-6.6,NR,NC,F,-3.5,F,-4.4,F,-6.1,NR,NR,NC,NR,NC,NR,NC,94.0,92.7,93,17.4,
43786,Cleveland Municipal City,318,Menlo Park Academy,Cuyahoga,Region 3,14440 Triskett Rd,"Cleveland, OH, 44111-2263",(440) 925-6365,,367,A,A,Not Met,95.0,A,92.3,A,100.0,NR,NC,C,C,-0.5,C,-0.5,NR,NC,NR,NC,NR,NC,NR,NR,NC,NR,NC,NR,NC,95.6,95.9,96,6.8,
43786,Cleveland Municipal City,489,Almira,Cuyahoga,Region 3,3375 W 99th St,"Cleveland, OH, 44102-4642",(216) 838-6150,Laverne Hooks,499,F,F,NC,0.0,F,38.7,F,0.0,F,11.2,F,F,-6.2,NR,NC,F,-5.1,F,-4.8,NR,NC,NR,NR,NC,NR,NC,NR,NC,92.2,89.9,90,28.1,
43786,Cleveland Municipal City,729,Andrew J Rickoff,Cuyahoga,Region 3,3500 E 147th St,"Cleveland, OH, 44120-4834",(216) 838-4150,Gloriane R. Smith,477,F,F,NC,0.0,F,36.9,F,0.4,F,8.6,F,F,-6.4,NR,NC,F,-5.7,F,-4.7,NR,NC,NR,NR,NC,NR,NC,NR,NC,90.5,89.8,90,31.7,
43786,Cleveland Municipal City,828,Anton Grdina,Cuyahoga,Region 3,2955 E 71st St,"Cleveland, OH, 44104-4101",(216) 812-1543,Harold S. Booker,371,F,F,NC,0.0,F,32.4,F,0.0,F,3.3,D,F,-7.5,NR,NC,D,-1.5,F,-5.5,NR,NC,NR,NR,NC,NR,NC,NR,NC,89.3,89.0,87,38.1,
43786,Cleveland Municipal City,930,Cleveland Entrepreneurship Preparatory School,Cuyahoga,Region 3,1417 E 36th St Fl 2,"Cleveland, OH, 44114-4116",(216) 456-2080,Tiara S. Jordan,295,D,F,NC,0.0,D,61.3,F,0.0,NR,NC,A,A,5.4,NR,NC,A,2.4,A,5.1,NR,NC,NR,NR,NC,NR,NC,NR,NC,94.7,94.0,93,15.0,
43786,Cleveland Municipal City,1040,Artemus Ward,Cuyahoga,Region 3,4315 W 140th St,"Cleveland, OH, 44135-2128",(216) 920-7055,Chris P. Myslenski,491,F,F,NC,0.0,F,47.3,F,1.7,F,15.3,F,F,-4.7,NR,NC,F,-3.2,F,-2.2,NR,NC,NR,NR,NC,NR,NC,NR,NC,94.3,92.0,92,17.8,
43786,Cleveland Municipal City,2378,Benjamin Franklin,Cuyahoga,Region 3,1905 Spring Rd,"Cleveland, OH, 44109-4460",(216) 749-8580,Rachel J. Snider,602,F,F,Not Met,0.0,D,54.1,F,2.3,D,27.7,D,F,-12.2,D,-1.8,D,-1.5,F,-5.3,NR,NC,NR,NR,NC,NR,NC,NR,NC,93.4,92.4,93,19.0,
43786,Cleveland Municipal City,3137,Bolton,Cuyahoga,Region 3,9803 Quebec Ave,"Cleveland, OH, 44106-3519",(216) 231-2585,Juliet M. King,346,F,F,NC,0.0,F,30.5,F,0.0,F,8.2,F,F,-7.2,NR,NC,F,-4.0,F,-5.6,F,-6.7,NR,NR,NC,NR,NC,NR,NC,92.7,91.4,87,25.8,
43786,Cleveland Municipal City,4234,Buhrer,Cuyahoga,Region 3,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",(216) 744-2800,michele sanchez,393,D,F,Not Met,11.8,D,59.8,F,0.0,F,10.0,D,F,-3.4,C,0.1,D,-1.9,F,-2.6,NR,NC,NR,NR,NC,NR,NC,NR,NC,96.3,95.9,96,7.5,


In [136]:
%%sql
CREATE TEMPORARY TABLE Temp_Building_Ratings_1516 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "Address", "CityStateZip", "Enrollment2015-2016" AS "Enrollment", "LetterGradeofAchievementComponent", "LetterGradeofPercentStandards", "PercentofStandardsMet", "LetterGradeofPerformanceIndex", "PerformanceIndexPercent", '2015-2016' AS SchoolYear
FROM Building_Ratings_1516
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


[]

In [137]:
%%sql
PRAGMA table_info(Temp_Building_Ratings_1516)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,TEXT,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0


In [138]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Building_Ratings_1516

 * sqlite:///CMSD_db.db
Done.


building,District,Name
117,1,1


The first potential issue has arisen. Based on the entries remaining in my temp tables for 2015-2016, more schools are present in the Building Ratings table (117) than in the Achievement table (101). The difference of 16 entries is small in absolute terms, but it is significant.
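To see which buildings account for a gap like this, one option is an anti-join that lists buildings present in one table but absent from the other. The sketch below uses two hypothetical miniature tables (`achievement_sample`, `ratings_sample`); the sample values echo rows shown earlier, but the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature tables standing in for the two 2015-2016 temp tables.
conn.execute("CREATE TABLE achievement_sample (BuildingIRN INT)")
conn.execute("CREATE TABLE ratings_sample (BuildingIRN INT, BuildingName TEXT)")
conn.executemany("INSERT INTO achievement_sample VALUES (?)", [(224,), (489,)])
conn.executemany(
    "INSERT INTO ratings_sample VALUES (?, ?)",
    [
        (224, "Adlai Stevenson School"),
        (318, "Menlo Park Academy"),
        (489, "Almira"),
    ],
)

# LEFT JOIN keeps every ratings row; rows with no achievement match have NULLs.
unmatched = conn.execute(
    """
    SELECT r.BuildingIRN, r.BuildingName
    FROM ratings_sample AS r
    LEFT JOIN achievement_sample AS a
           ON a.BuildingIRN = r.BuildingIRN
    WHERE a.BuildingIRN IS NULL
    ORDER BY r.BuildingIRN
    """
).fetchall()
print(unmatched)
```

Swapping the two tables in the query gives the reverse check: buildings that appear in the Achievement data but not in the Building Ratings data.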

### Building Ratings 2016-2017

In [171]:
%%sql
DROP TABLE Temp_Building_Ratings_1617

 * sqlite:///CMSD_db.db
Done.


[]

In [172]:
%%sql
CREATE TEMPORARY TABLE Temp_Building_Ratings_1617 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "Address", "CityStateZip", "Enrollment2016-2017" AS "Enrollment", "LetterGradeofAchievementComponent", "LetterGradeofPercentStandards", "PercentofStandardsMet", "LetterGradeofPerformanceIndex", "PerformanceIndexPercent", '2016-2017' AS SchoolYear
FROM Building_Ratings_1617
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


[]

In [173]:
%%sql
PRAGMA table_info(Temp_Building_Ratings_1617)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,TEXT,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0


In [174]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Building_Ratings_1617

 * sqlite:///CMSD_db.db
Done.


building,District,Name
116,1,1


In [147]:
%%sql
SELECT *
FROM Temp_Building_Ratings_1617
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


DistrictIRN,DistrictName,BuildingIRN,Address,CityStateZip,Enrollment,LetterGradeofAchievementComponent,LetterGradeofPercentStandards,PercentofStandardsMet,LetterGradeofPerformanceIndex,PerformanceIndexPercent,SchoolYear
43786,Cleveland Municipal City,224,18300 Woda Avenue,"Cleveland, OH, 44122-6441",445,F,F,0.0,F,41.1,2015-2016
43786,Cleveland Municipal City,318,14440 Triskett Rd,"Cleveland, OH, 44111-2263",405,B,A,90.5,B,88.7,2015-2016
43786,Cleveland Municipal City,489,3375 W 99th St,"Cleveland, OH, 44102-4642",491,F,F,0.0,F,39.0,2015-2016
43786,Cleveland Municipal City,729,3500 E 147th St,"Cleveland, OH, 44120-4834",457,F,F,0.0,F,39.3,2015-2016
43786,Cleveland Municipal City,828,2955 E 71st St,"Cleveland, OH, 44104-4101",361,F,F,0.0,F,33.6,2015-2016
43786,Cleveland Municipal City,930,1417 E 36th St Fl 2,"Cleveland, OH, 44114-4116",311,D,F,0.0,D,62.6,2015-2016
43786,Cleveland Municipal City,1040,4315 W 140th St,"Cleveland, OH, 44135-2128",514,F,F,6.3,D,50.8,2015-2016
43786,Cleveland Municipal City,2378,1905 Spring Rd,"Cleveland, OH, 44109-4460",652,F,F,0.0,D,53.6,2015-2016
43786,Cleveland Municipal City,3137,9803 Quebec Ave,"Cleveland, OH, 44106-3519",318,F,F,0.0,F,34.3,2015-2016
43786,Cleveland Municipal City,4234,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",391,D,F,5.9,D,64.2,2015-2016


A similar issue appears for 2016-2017, but in the opposite direction: the Building Ratings table has 116 buildings, 3 fewer than the 119 in the Achievement table. Based on the number of missing entries for 2017-2018, I think I'll just remove these schools and make note of the issue in the final report.
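One way to remove the unmatched schools is to keep only rows that have a counterpart for the same building and school year in the other table. The sketch below is an assumption about how that could be done, not the method used for the final report; the table names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of the combined tables.
conn.execute("CREATE TABLE achievement_sample (BuildingIRN INT, SchoolYear TEXT)")
conn.execute("CREATE TABLE ratings_sample (BuildingIRN INT, SchoolYear TEXT)")
conn.executemany(
    "INSERT INTO achievement_sample VALUES (?, ?)",
    [(224, "2016-2017"), (318, "2016-2017"), (489, "2016-2017")],
)
conn.executemany(
    "INSERT INTO ratings_sample VALUES (?, ?)",
    [(224, "2016-2017"), (489, "2016-2017"), (489, "2017-2018")],
)

# Keep an achievement row only if the same building has a ratings row
# for the same school year.
kept = conn.execute(
    """
    SELECT a.BuildingIRN, a.SchoolYear
    FROM achievement_sample AS a
    WHERE EXISTS (
        SELECT 1
        FROM ratings_sample AS r
        WHERE r.BuildingIRN = a.BuildingIRN
          AND r.SchoolYear = a.SchoolYear
    )
    ORDER BY a.BuildingIRN
    """
).fetchall()
print(kept)
```

Matching on both `BuildingIRN` and `SchoolYear` matters here, because a building can be present in one year's files and absent from another's.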

### Building Ratings 2017-2018

In [60]:
%%sql
SELECT *
FROM Building_Overview_1718
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


DistrictIRN,DistrictName,BuildingIRN,BuildingName,County,Region,Address,CityStateZip,Phone,Principal,Enrollment2017-2018,OverallGrade,AchievementComponentGrade,IndicatorsMetGrade,GiftedIndicatorMet/NotMetStatus,PercentofIndicatorsMet,PerformanceIndexGrade,PerformanceIndexPercent,GapClosingGrade,GapClosingPoints,ImprovingAt-RiskK3ReadersGrade,ImprovingAt-RiskK3ReadersPercent,ProgressComponentGrade,OverallSubgroupValue-AddedGrade,OverallSubgroupValue-AddedGainIndex,GiftedSubgroupValue-AddedGrade,GiftedSubgroupValue-AddedGainIndex,SWDSubgroupValue-AddedGrade,SWDSubgroupValue-AddedGainIndex,Lowest20%SubgroupValue-AddedGrade,Lowest20%SubgroupValue-AddedGainIndex,GradRateComponentGrade,LetterGradeof4YearGradRate2017,4YearGradRate2017,LetterGradeof5YearGradRate2016,5YearGradRate2016,PreparedforSuccessComponentGrade,PreparedforSuccessComponentPercent,AttendanceRate2017-2018,AttendanceRate2016-2017,AttendanceRate2015-2016,ChronicAbsenteeismPercent2017-2018,Watermark
43786,Cleveland Municipal,224,Adlai Stevenson School,Cuyahoga,Region 3,18300 Woda Avenue,"Cleveland, OH, 44122-6441",(216) 838-5300,Christopher T. Wyland,443,D,F,F,NR,0.0,F,40.1,D,62.5,C,37.5,D,F,-8.39,NR,NC,C,-1,F,-5.7,NR,NR,NC,NR,NC,NR,NC,93.5,94.5,94.0,17.6,
43786,Cleveland Municipal,318,Menlo Park Academy,Cuyahoga,Region 3,2149 W 53rd St,"Cleveland, OH, 44102-2263",(440) 925-6365,Stacy J. Stuhldreher,418,B,A,B,Not Met,85.0,A,90.7,A,100.0,NR,NC,F,F,-4.99,F,-4.97,NR,NC,NR,NC,NR,NR,NC,NR,NC,NR,NC,94.4,95.2,95.6,15.5,
43786,Cleveland Municipal,489,Almira,Cuyahoga,Region 3,3375 W 99th St,"Cleveland, OH, 44102-4642",(216) 838-6150,James Greene,547,D,F,F,Not Met,0.0,F,41.6,C,72.3,D,19.3,F,F,-7.71,NR,NC,F,-5.33,F,-5.03,NR,NR,NC,NR,NC,NR,NC,88.4,91.2,92.2,44.1,
43786,Cleveland Municipal,729,Andrew J Rickoff,Cuyahoga,Region 3,3500 E 147th St,"Cleveland, OH, 44120-4834",(216) 838-4150,SHELBY R. SCHUTT,441,D,F,F,NR,0.0,F,41.6,C,79.5,F,8.4,F,F,-5.56,NR,NC,F,-5.06,F,-4.82,NR,NR,NC,NR,NC,NR,NC,88.8,88.9,90.5,41.5,
43786,Cleveland Municipal,828,Anton Grdina,Cuyahoga,Region 3,2955 E 71st St,"Cleveland, OH, 44104-4101",(216) 838-1150,Harold S. Booker,396,F,F,F,NR,0.0,F,35.6,F,39.3,F,7.5,D,F,-7.57,NR,NC,D,-1.45,F,-5.4,NR,NR,NC,NR,NC,NR,NC,89.4,91.0,89.3,43.3,
43786,Cleveland Municipal,930,Cleveland Entrepreneurship Preparatory School,Cuyahoga,Region 3,1417 E 36th St Fl 2,"Cleveland, OH, 44114-4116",(216) 456-2080,,322,C,D,F,NR,9.1,D,63.9,D,62.5,NR,NC,B,A,5.31,NR,NC,C,-0.74,A,2.69,NR,NR,NC,NR,NC,NR,NC,93.1,94.7,94.7,21.2,
43786,Cleveland Municipal,936,Promise Academy,Cuyahoga,Region 3,1701 E 13th St,"Cleveland, OH, 44114-3227",(216) 443-0500,,209,F,F,F,NR,0.0,F,37.2,B,89.9,NR,NC,NR,NR,NC,NR,NC,NR,NC,NR,NC,F,F,19.3,F,13.2,F,0.3,33.9,NC,NC,100.0,
43786,Cleveland Municipal,1040,Artemus Ward,Cuyahoga,Region 3,4315 W 140th St,"Cleveland, OH, 44135-2128",(216) 838-6200,Chris P. Myslenski,513,C,F,F,NR,0.0,D,54.4,B,80.7,D,32.3,C,C,0.64,NR,NC,F,-2.66,B,1.37,NR,NR,NC,NR,NC,NR,NC,90.6,92.3,94.3,34.5,
43786,Cleveland Municipal,2378,Benjamin Franklin,Cuyahoga,Region 3,1905 Spring Rd,"Cleveland, OH, 44109-4460",(216) 838-3150,Rachel J. Snider,615,D,D,F,Not Met,0.0,D,58.2,B,81.7,C,35.1,D,F,-8.05,C,-0.67,C,0.39,F,-2.42,NR,NR,NC,NR,NC,NR,NC,91.6,92.6,93.4,27.2,
43786,Cleveland Municipal,3137,Bolton,Cuyahoga,Region 3,9803 Quebec Ave,"Cleveland, OH, 44106-3519",(216) 838-1200,Juliet M. King,330,F,F,F,NR,0.0,F,31.9,F,0.0,F,7.4,F,F,-8.53,NR,NC,F,-5.09,F,-6.22,NR,NR,NC,NR,NC,NR,NC,91.6,91.2,92.7,30.6,


In [149]:
%%sql
CREATE TEMPORARY TABLE Temp_Building_Ratings_1718 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "Address", "CityStateZip", "Enrollment2017-2018" AS "Enrollment", "AchievementComponentGrade" AS "LetterGradeofAchievementComponent", "IndicatorsMetGrade" AS "LetterGradeofPercentStandards", "PercentofIndicatorsMet" AS "PercentofStandardsMet", "PerformanceIndexGrade" AS "LetterGradeofPerformanceIndex", "PerformanceIndexPercent", '2017-2018' AS SchoolYear
FROM Building_Overview_1718
WHERE DistrictIRN = '043786'
/* I might drop the IndicatorsMetGrade column. It appears to be the same as LetterGradeofPercentStandards from the previous two years, but I can't confirm this
with certainty, and the column isn't essential. To give myself some
flexibility, I'll keep it for now and drop it from the unioned table later on. */

 * sqlite:///CMSD_db.db
Done.


[]

In [150]:
%%sql
PRAGMA table_info(Temp_Building_Ratings_1718)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,REAL,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0


In [151]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Temp_Building_Ratings_1718

 * sqlite:///CMSD_db.db
Done.


building,District,Name
123,1,1


After checking the data types with my PRAGMA query earlier, I can see that PercentofStandardsMet is stored as REAL and PerformanceIndexPercent as TEXT. Neither is consistent with my other temp tables, so they need to be recast.

The query below previews the recast values: PercentofStandardsMet as TEXT (although it could arguably stay REAL) and PerformanceIndexPercent as REAL. Because a SELECT with CAST does not alter the table itself, the PRAGMA results that follow still show PercentofStandardsMet as REAL. The stored types get fixed once I have finished forming the union table, and I can recast again in the final output file if necessary.

In [152]:
%%sql
SELECT CAST(PercentofStandardsMet AS TEXT) AS PercentofStandardsMet, CAST(PerformanceIndexPercent AS REAL) AS PerformanceIndexPercent
FROM Temp_Building_Ratings_1718

 * sqlite:///CMSD_db.db
Done.


PercentofStandardsMet,PerformanceIndexPercent
0.0,40.1
85.0,90.7
0.0,41.6
0.0,41.6
0.0,35.6
9.1,63.9
0.0,37.2
0.0,54.4
0.0,58.2
0.0,31.9


In [153]:
%%sql
PRAGMA table_info(Temp_Building_Ratings_1718)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,REAL,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0
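
A minimal sketch of why this PRAGMA output matches the earlier one: a SELECT with CAST only converts values in its result set and leaves the table definition alone. The in-memory table `demo` and its single row are made up for illustration; only the column name comes from the notebook.

```python
import sqlite3

# Hypothetical in-memory table; only the column name mirrors the notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (PercentofStandardsMet REAL)")
conn.execute("INSERT INTO demo VALUES (85.0)")

# CAST converts the value in this result set only.
cast_value = conn.execute(
    "SELECT CAST(PercentofStandardsMet AS TEXT) FROM demo"
).fetchone()[0]

# The declared column type is unchanged after the SELECT.
declared_type = conn.execute("PRAGMA table_info(demo)").fetchone()[2]

# The stored value still has its original storage class.
stored_class = conn.execute(
    "SELECT typeof(PercentofStandardsMet) FROM demo"
).fetchone()[0]

print(repr(cast_value), declared_type, stored_class)
```

To change the declared type itself, the table has to be rebuilt with explicit column types.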


### Combining Building Ratings into one table

In [175]:
%%sql
DROP TABLE Combined_Building_Ratings

 * sqlite:///CMSD_db.db
Done.


[]

In [176]:
%%sql
CREATE TEMPORARY TABLE Combined_Building_Ratings AS
SELECT * FROM Temp_Building_Ratings_1516
UNION ALL
SELECT * FROM Temp_Building_Ratings_1617
UNION ALL
SELECT * FROM Temp_Building_Ratings_1718;

 * sqlite:///CMSD_db.db
Done.


[]

In [177]:
%%sql
PRAGMA table_info(Combined_Building_Ratings)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0


Because of SQLite's flexible typing, two columns do not have the correct declared types. I'll fix them by creating a new table with explicit column types and then repopulating it from the unioned data.

In [178]:
%%sql
CREATE TABLE Combined_Building_Ratings_cast (
    DistrictIRN INT,
    DistrictName TEXT,
    BuildingIRN INT,
    Address TEXT,
    CityStateZip TEXT,
    Enrollment INT,
    LetterGradeofAchievementComponent TEXT,
    LetterGradeofPercentStandards TEXT,
    PercentofStandardsMet TEXT,
    LetterGradeofPerformanceIndex TEXT,
    PerformanceIndexPercent REAL,
    SchoolYear TEXT
);

 * sqlite:///CMSD_db.db
Done.


[]

In [179]:
%%sql
INSERT INTO Combined_Building_Ratings_cast
SELECT CAST(DistrictIRN AS INT)
, CAST(DistrictName AS TEXT)
, CAST(BuildingIRN AS INT)
, CAST(Address AS TEXT)
, CAST(CityStateZip AS TEXT)
, CAST(Enrollment AS INT)
, CAST(LetterGradeofAchievementComponent AS TEXT)
, CAST(LetterGradeofPercentStandards AS TEXT)
, CAST(PercentofStandardsMet AS TEXT)
, CAST(LetterGradeofPerformanceIndex AS TEXT)
, CAST(PerformanceIndexPercent AS REAL)
, CAST(SchoolYear AS TEXT)
FROM Combined_Building_Ratings;

 * sqlite:///CMSD_db.db
356 rows affected.


[]

In [180]:
%%sql
DROP TABLE Combined_Building_Ratings;

 * sqlite:///CMSD_db.db
Done.


[]

In [181]:
%%sql
ALTER TABLE Combined_Building_Ratings_cast RENAME TO Combined_Building_Ratings;

 * sqlite:///CMSD_db.db
Done.


[]

In [182]:
%%sql
PRAGMA table_info(Combined_Building_Ratings)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,Address,TEXT,0,,0
4,CityStateZip,TEXT,0,,0
5,Enrollment,INT,0,,0
6,LetterGradeofAchievementComponent,TEXT,0,,0
7,LetterGradeofPercentStandards,TEXT,0,,0
8,PercentofStandardsMet,TEXT,0,,0
9,LetterGradeofPerformanceIndex,TEXT,0,,0


In [162]:
%%sql
SELECT COUNT (DISTINCT BuildingIRN) AS building, COUNT (DISTINCT DistrictIRN) AS District, COUNT (DISTINCT DistrictName) AS Name
FROM Combined_Building_Ratings

 * sqlite:///CMSD_db.db
Done.


building,District,Name
128,1,2


In [183]:
%%sql
SELECT DistrictName, COUNT(*) AS Count
FROM Combined_Building_Ratings
GROUP BY DistrictName;

 * sqlite:///CMSD_db.db
Done.


DistrictName,Count
Cleveland Municipal,123
Cleveland Municipal City,233


In [184]:
%%sql
SELECT DistrictName, COUNT(*) AS Count
FROM Temp_Building_Ratings_1516
GROUP BY DistrictName;

 * sqlite:///CMSD_db.db
Done.


DistrictName,Count
Cleveland Municipal City,117


In [185]:
%%sql
SELECT DistrictName, COUNT(*) AS Count
FROM Temp_Building_Ratings_1617
GROUP BY DistrictName;

 * sqlite:///CMSD_db.db
Done.


DistrictName,Count
Cleveland Municipal City,116


In [186]:
%%sql
SELECT DistrictName, COUNT(*) AS Count
FROM Temp_Building_Ratings_1718
GROUP BY DistrictName;

 * sqlite:///CMSD_db.db
Done.


DistrictName,Count
Cleveland Municipal,123


In [187]:
%%sql
UPDATE Combined_Building_Ratings
SET DistrictName = 'Cleveland Municipal City'
WHERE DistrictName = 'Cleveland Municipal';

 * sqlite:///CMSD_db.db
123 rows affected.


[]

In [188]:
%%sql
SELECT *
FROM Combined_Building_Ratings

 * sqlite:///CMSD_db.db
Done.


DistrictIRN,DistrictName,BuildingIRN,Address,CityStateZip,Enrollment,LetterGradeofAchievementComponent,LetterGradeofPercentStandards,PercentofStandardsMet,LetterGradeofPerformanceIndex,PerformanceIndexPercent,SchoolYear
43786,Cleveland Municipal City,224,18300 Woda Avenue,"Cleveland, OH, 44122-6441",430,F,F,0.0,F,36.6,2015-2016
43786,Cleveland Municipal City,318,14440 Triskett Rd,"Cleveland, OH, 44111-2263",367,A,A,95.0,A,92.3,2015-2016
43786,Cleveland Municipal City,489,3375 W 99th St,"Cleveland, OH, 44102-4642",499,F,F,0.0,F,38.7,2015-2016
43786,Cleveland Municipal City,729,3500 E 147th St,"Cleveland, OH, 44120-4834",477,F,F,0.0,F,36.9,2015-2016
43786,Cleveland Municipal City,828,2955 E 71st St,"Cleveland, OH, 44104-4101",371,F,F,0.0,F,32.4,2015-2016
43786,Cleveland Municipal City,930,1417 E 36th St Fl 2,"Cleveland, OH, 44114-4116",295,D,F,0.0,D,61.3,2015-2016
43786,Cleveland Municipal City,1040,4315 W 140th St,"Cleveland, OH, 44135-2128",491,F,F,0.0,F,47.3,2015-2016
43786,Cleveland Municipal City,2378,1905 Spring Rd,"Cleveland, OH, 44109-4460",602,F,F,0.0,D,54.1,2015-2016
43786,Cleveland Municipal City,3137,9803 Quebec Ave,"Cleveland, OH, 44106-3519",346,F,F,0.0,F,30.5,2015-2016
43786,Cleveland Municipal City,4234,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",393,D,F,11.8,D,59.8,2015-2016


All done with the Building Ratings! The columns and data types are correct, and the table looks good. The only potential issue is the difference in record counts between this table and my Achievement table, but I'll revisit that after the next section.

# Overall Value Added Grade

Judging by the columns in the Overall Value Added data, I only need a few of them. I'll take the required columns plus OverallComposite (I don't yet know exactly what this measures, but it's numeric and fluctuates, so it may be useful). Aside from that, the gifted, subgroup composite, and percentile values are likely outside the scope of my analysis.

In [93]:
%%sql
SELECT *
FROM VA_org_1516
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


DistrictIRN,DistrictName,BuildingIRN,BuildingName,County,Region,OverallValueAddedGrade,OverallComposite,GiftedValueAddedGrade,GiftedComposite,StudentswithDisabilitiesValueAddedGrade,StudentswithDisabilitiescomposite,Lowest20%ValueAddedGrade,Lowest20%ValueAddedComposite,HighMobilityValueAddedGrade,HighMobilityComposite,Watermark
43786,Cleveland Municipal City,224,Adlai Stevenson School,Cuyahoga,Region 3,F,-6.56,NR,NC,F,-3.48,F,-4.40,F,-6.09,
43786,Cleveland Municipal City,318,Menlo Park Academy,Cuyahoga,Region 3,C,-0.54,C,-0.49,NR,NC,NR,NC,NR,NC,
43786,Cleveland Municipal City,489,Almira,Cuyahoga,Region 3,F,-6.16,NR,NC,F,-5.10,F,-4.75,NR,NC,
43786,Cleveland Municipal City,729,Andrew J Rickoff,Cuyahoga,Region 3,F,-6.4,NR,NC,F,-5.68,F,-4.65,NR,NC,
43786,Cleveland Municipal City,828,Anton Grdina,Cuyahoga,Region 3,F,-7.53,NR,NC,D,-1.53,F,-5.54,NR,NC,
43786,Cleveland Municipal City,930,Cleveland Entrepreneurship Preparatory School,Cuyahoga,Region 3,A,5.38,NR,NC,A,NC,A,5.06,NR,NC,
43786,Cleveland Municipal City,1040,Artemus Ward,Cuyahoga,Region 3,F,-4.72,NR,NC,F,-3.22,F,-2.16,NR,NC,
43786,Cleveland Municipal City,2378,Benjamin Franklin,Cuyahoga,Region 3,F,-12.2,D,-1.77,D,-1.53,F,-5.27,NR,NC,
43786,Cleveland Municipal City,3137,Bolton,Cuyahoga,Region 3,F,-7.15,NR,NC,F,NC,F,-5.56,F,-6.65,
43786,Cleveland Municipal City,4234,Buhrer,Cuyahoga,Region 3,F,-3.43,C,0.13,D,-1.86,F,-2.60,NR,NC,


In [94]:
%%sql
PRAGMA table_info(VA_org_1516)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INTEGER,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INTEGER,0,,0
3,BuildingName,TEXT,0,,0
4,County,TEXT,0,,0
5,Region,TEXT,0,,0
6,OverallValueAddedGrade,TEXT,0,,0
7,OverallComposite,REAL,0,,0
8,GiftedValueAddedGrade,TEXT,0,,0
9,GiftedComposite,TEXT,0,,0


In [103]:
%%sql
CREATE TEMPORARY TABLE Temp_VA_1516 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "OverallValueAddedGrade", "OverallComposite", '2015-2016' AS SchoolYear
FROM VA_org_1516
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


[]

In [104]:
%%sql
PRAGMA table_info(Temp_VA_1516)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,OverallValueAddedGrade,TEXT,0,,0
4,OverallComposite,REAL,0,,0
5,SchoolYear,,0,,0


In [112]:
%%sql
CREATE TEMPORARY TABLE Temp_VA_1617 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "OverallValueAddedGrade", "OverallComposite", '2016-2017' AS SchoolYear
FROM VA_org_1617
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


[]

In [113]:
%%sql
PRAGMA table_info(Temp_VA_1617)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,OverallValueAddedGrade,TEXT,0,,0
4,OverallComposite,REAL,0,,0
5,SchoolYear,,0,,0


In [115]:
%%sql
CREATE TEMPORARY TABLE Temp_VA_1718 AS
SELECT "DistrictIRN", "DistrictName", "BuildingIRN", "OverallValueAddedGrade", "OverallComposite", '2017-2018' AS SchoolYear
FROM VA_ORG_1718
WHERE DistrictIRN = '043786'

 * sqlite:///CMSD_db.db
Done.


[]

In [116]:
%%sql
PRAGMA table_info(Temp_VA_1718)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,OverallValueAddedGrade,TEXT,0,,0
4,OverallComposite,REAL,0,,0
5,SchoolYear,,0,,0


In [117]:
%%sql
CREATE TEMPORARY TABLE Combined_VA AS
SELECT * FROM Temp_VA_1516
UNION ALL
SELECT * FROM Temp_VA_1617
UNION ALL
SELECT * FROM Temp_VA_1718;

 * sqlite:///CMSD_db.db
Done.


[]

In [118]:
%%sql
PRAGMA table_info(Combined_VA)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,DistrictIRN,INT,0,,0
1,DistrictName,TEXT,0,,0
2,BuildingIRN,INT,0,,0
3,OverallValueAddedGrade,TEXT,0,,0
4,OverallComposite,REAL,0,,0
5,SchoolYear,,0,,0


# Final Output File
At this point, I have three unioned tables, one for each category: Achievement Building, Building Rating, and Value Added. The only missing piece is that SchoolYear is not declared as TEXT in Combined_Achievement or Combined_VA. However, I can address this once the tables are joined.
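
One way to avoid the blank SchoolYear type, sketched here on the assumption that the year label is added as a string literal in a CREATE TABLE ... AS SELECT statement (as in the temp tables above): wrapping the literal in CAST gives the expression TEXT affinity, which SQLite records as the column's declared type. The table names `plain` and `casted` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A bare string literal has no type affinity, so the column gets no declared type.
conn.execute(
    "CREATE TABLE plain AS SELECT 1 AS BuildingIRN, '2015-2016' AS SchoolYear"
)

# CAST gives the expression TEXT affinity, which CREATE TABLE AS records.
conn.execute(
    "CREATE TABLE casted AS "
    "SELECT 1 AS BuildingIRN, CAST('2015-2016' AS TEXT) AS SchoolYear"
)

def declared_types(table):
    # Map column name -> declared type (table is a fixed, trusted name here).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

plain_types = declared_types("plain")
casted_types = declared_types("casted")
print(repr(plain_types["SchoolYear"]), repr(casted_types["SchoolYear"]))
```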

In [189]:
%%sql
DROP TABLE Final_Output_Table

 * sqlite:///CMSD_db.db
Done.


[]

In [190]:
%%sql
CREATE TABLE Final_Output_Table AS
SELECT 
    CA.BuildingIRN, 
    CA.BuildingName, 
    CA.DistrictIRN, 
    CA.DistrictName, 
    CA.PerformanceIndexScore, 
    CA.PerformanceIndexPercent,
    CA.LetterGradeofPerformanceIndex,
    CA.PercentofStudentsBelow,
    CA.PercentofStudentsBasic,
    CA.PercentofStudentsProficient,
    CA.PercentofStudentsAccelerated,
    CA.PercentofStudentsAdvanced,
    CA.PercentofStudentsAdvancedPlus,
    CBR.Address,
    CBR.CityStateZip,
    CBR.Enrollment,
    CBR.LetterGradeofAchievementComponent, 
    CVA.OverallValueAddedGrade, 
    CVA.OverallComposite,
    CA.SchoolYear
FROM 
    Combined_Achievement CA
INNER JOIN 
    Combined_Building_Ratings CBR 
    ON CA.BuildingIRN = CBR.BuildingIRN AND CA.SchoolYear = CBR.SchoolYear
INNER JOIN 
    Combined_VA CVA 
    ON CA.BuildingIRN = CVA.BuildingIRN AND CA.SchoolYear = CVA.SchoolYear


 * sqlite:///CMSD_db.db
Done.


[]

In [191]:
%%sql
PRAGMA table_info(Final_Output_Table)

 * sqlite:///CMSD_db.db
Done.


cid,name,type,notnull,dflt_value,pk
0,BuildingIRN,INT,0,,0
1,BuildingName,TEXT,0,,0
2,DistrictIRN,INT,0,,0
3,DistrictName,TEXT,0,,0
4,PerformanceIndexScore,TEXT,0,,0
5,PerformanceIndexPercent,TEXT,0,,0
6,LetterGradeofPerformanceIndex,TEXT,0,,0
7,PercentofStudentsBelow,TEXT,0,,0
8,PercentofStudentsBasic,TEXT,0,,0
9,PercentofStudentsProficient,TEXT,0,,0


In [192]:
%%sql
SELECT *
FROM Final_Output_Table

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,PerformanceIndexScore,PerformanceIndexPercent,LetterGradeofPerformanceIndex,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,Address,CityStateZip,Enrollment,LetterGradeofAchievementComponent,OverallValueAddedGrade,OverallComposite,SchoolYear
224,Adlai Stevenson School,43786,Cleveland Municipal,43.903,36.6,F,69.9,19,7.6,2.5,1,0,18300 Woda Avenue,"Cleveland, OH, 44122-6441",430,F,F,-6.56,2015-2016
489,Almira,43786,Cleveland Municipal,46.384,38.7,F,66.2,19.2,10.9,3.2,0.5,0,3375 W 99th St,"Cleveland, OH, 44102-4642",499,F,F,-6.16,2015-2016
729,Andrew J Rickoff,43786,Cleveland Municipal,44.327,36.9,F,68.5,19.4,8.7,2.4,0.7,0,3500 E 147th St,"Cleveland, OH, 44120-4834",477,F,F,-6.4,2015-2016
828,Anton Grdina,43786,Cleveland Municipal,38.904,32.4,F,79.8,13.4,5.7,0.9,0.2,0,2955 E 71st St,"Cleveland, OH, 44104-4101",371,F,F,-7.53,2015-2016
1040,Artemus Ward,43786,Cleveland Municipal,56.816,47.3,F,50.8,21.7,18.8,6.8,1.9,0,4315 W 140th St,"Cleveland, OH, 44135-2128",491,F,F,-4.72,2015-2016
2378,Benjamin Franklin,43786,Cleveland Municipal,64.892,54.1,D,40.6,22.5,18.6,12,5.9,0.2,1905 Spring Rd,"Cleveland, OH, 44109-4460",602,F,F,-12.2,2015-2016
3137,Bolton,43786,Cleveland Municipal,36.583,30.5,F,82.5,13.9,2.2,0.6,0.6,0,9803 Quebec Ave,"Cleveland, OH, 44106-3519",346,F,F,-7.15,2015-2016
4234,Buhrer,43786,Cleveland Municipal,71.772,59.8,D,29.4,25,27.3,12.5,5.7,0,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",393,D,F,-3.43,2015-2016
5066,Case,43786,Cleveland Municipal,38.715,32.3,F,80.3,12.4,6,1.1,0,0,4050 Superior Ave,"Cleveland, OH, 44103-1128",361,F,F,-8.75,2015-2016
5637,Alfred Benesch,43786,Cleveland Municipal,39.222,32.7,F,78.4,15.4,4.5,1,0.8,0,5393 Quincy Ave,"Cleveland, OH, 44104-4409",376,F,F,-2.57,2015-2016


### Checking the final output table and data validation

In [193]:
%%sql
SELECT COUNT(*) AS TotalRows FROM Final_Output_Table;

 * sqlite:///CMSD_db.db
Done.


TotalRows
340
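
The building-ratings insert earlier reported 356 rows, while the final table has 340, so the inner joins dropped some building-year pairs. Below is a minimal sketch of how the unmatched pairs could be listed with a LEFT JOIN anti-join. The two small tables and their rows are invented for illustration; only the column names BuildingIRN and SchoolYear come from the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for Combined_Building_Ratings and Combined_VA.
    CREATE TABLE ratings (BuildingIRN INT, SchoolYear TEXT);
    CREATE TABLE va (BuildingIRN INT, SchoolYear TEXT);
    INSERT INTO ratings VALUES
        (224, '2015-2016'), (224, '2016-2017'), (936, '2017-2018');
    INSERT INTO va VALUES
        (224, '2015-2016'), (224, '2016-2017');
""")

# Keep every ratings row, then filter to those with no partner in va.
unmatched = conn.execute("""
    SELECT r.BuildingIRN, r.SchoolYear
    FROM ratings r
    LEFT JOIN va v
        ON r.BuildingIRN = v.BuildingIRN AND r.SchoolYear = v.SchoolYear
    WHERE v.BuildingIRN IS NULL
    ORDER BY r.BuildingIRN, r.SchoolYear
""").fetchall()

print(unmatched)
```

Running the same pattern against each pair of combined tables would show which buildings are missing from which source.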


In [194]:
%%sql
SELECT COUNT(DISTINCT DistrictName) AS UniqueDistrictNames, COUNT(DISTINCT SchoolYear) AS UniqueSchoolYears FROM Final_Output_Table;

 * sqlite:///CMSD_db.db
Done.


UniqueDistrictNames,UniqueSchoolYears
1,3


In [129]:
%%sql
SELECT COUNT(*) AS NullCount FROM Final_Output_Table WHERE BuildingIRN IS NULL;

 * sqlite:///CMSD_db.db
Done.


NullCount
0


In [130]:
%%sql
SELECT BuildingIRN, BuildingName, COUNT(*) AS NumOccurrences 
FROM Final_Output_Table 
GROUP BY BuildingIRN, BuildingName 
HAVING COUNT(*) > 3; -- or change to "= 3" to check if there are exactly 3 occurrences

 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,NumOccurrences
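
A complementary check, sketched with made-up rows: if each building should appear at most once per school year, grouping by both BuildingIRN and SchoolYear flags any pair that the joins duplicated. The table `final_demo` is a hypothetical stand-in for Final_Output_Table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE final_demo (BuildingIRN INT, SchoolYear TEXT);
    INSERT INTO final_demo VALUES
        (224, '2015-2016'), (224, '2016-2017'),
        (489, '2015-2016'), (489, '2015-2016');
""")

# Any (building, year) pair with more than one row is a duplicate.
duplicates = conn.execute("""
    SELECT BuildingIRN, SchoolYear, COUNT(*) AS NumRows
    FROM final_demo
    GROUP BY BuildingIRN, SchoolYear
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicates)
```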


In [195]:
%%sql
SELECT *
FROM Final_Output_Table 


 * sqlite:///CMSD_db.db
Done.


BuildingIRN,BuildingName,DistrictIRN,DistrictName,PerformanceIndexScore,PerformanceIndexPercent,LetterGradeofPerformanceIndex,PercentofStudentsBelow,PercentofStudentsBasic,PercentofStudentsProficient,PercentofStudentsAccelerated,PercentofStudentsAdvanced,PercentofStudentsAdvancedPlus,Address,CityStateZip,Enrollment,LetterGradeofAchievementComponent,OverallValueAddedGrade,OverallComposite,SchoolYear
224,Adlai Stevenson School,43786,Cleveland Municipal,43.903,36.6,F,69.9,19,7.6,2.5,1,0,18300 Woda Avenue,"Cleveland, OH, 44122-6441",430,F,F,-6.56,2015-2016
489,Almira,43786,Cleveland Municipal,46.384,38.7,F,66.2,19.2,10.9,3.2,0.5,0,3375 W 99th St,"Cleveland, OH, 44102-4642",499,F,F,-6.16,2015-2016
729,Andrew J Rickoff,43786,Cleveland Municipal,44.327,36.9,F,68.5,19.4,8.7,2.4,0.7,0,3500 E 147th St,"Cleveland, OH, 44120-4834",477,F,F,-6.4,2015-2016
828,Anton Grdina,43786,Cleveland Municipal,38.904,32.4,F,79.8,13.4,5.7,0.9,0.2,0,2955 E 71st St,"Cleveland, OH, 44104-4101",371,F,F,-7.53,2015-2016
1040,Artemus Ward,43786,Cleveland Municipal,56.816,47.3,F,50.8,21.7,18.8,6.8,1.9,0,4315 W 140th St,"Cleveland, OH, 44135-2128",491,F,F,-4.72,2015-2016
2378,Benjamin Franklin,43786,Cleveland Municipal,64.892,54.1,D,40.6,22.5,18.6,12,5.9,0.2,1905 Spring Rd,"Cleveland, OH, 44109-4460",602,F,F,-12.2,2015-2016
3137,Bolton,43786,Cleveland Municipal,36.583,30.5,F,82.5,13.9,2.2,0.6,0.6,0,9803 Quebec Ave,"Cleveland, OH, 44106-3519",346,F,F,-7.15,2015-2016
4234,Buhrer,43786,Cleveland Municipal,71.772,59.8,D,29.4,25,27.3,12.5,5.7,0,1600 Buhrer Ave,"Cleveland, OH, 44109-1747",393,D,F,-3.43,2015-2016
5066,Case,43786,Cleveland Municipal,38.715,32.3,F,80.3,12.4,6,1.1,0,0,4050 Superior Ave,"Cleveland, OH, 44103-1128",361,F,F,-8.75,2015-2016
5637,Alfred Benesch,43786,Cleveland Municipal,39.222,32.7,F,78.4,15.4,4.5,1,0.8,0,5393 Quincy Ave,"Cleveland, OH, 44104-4409",376,F,F,-2.57,2015-2016


To save the final output table as a CSV file, run the following command in a terminal (or Command Prompt on Windows; I did this project on a Mac).

In [None]:
sqlite3 -header -csv 'CMSD_db.db' "SELECT * FROM Final_Output_Table;" > final_output.csv
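
If the sqlite3 command-line shell is not available, a similar export can be sketched from Python with the standard `sqlite3` and `csv` modules. This is an alternative, not the method used for this project; the helper name `export_query_to_csv` and the demo table are hypothetical.

```python
import csv
import io
import sqlite3

def export_query_to_csv(conn, query, out_file):
    """Write the result of `query` to `out_file` as CSV with a header row."""
    cursor = conn.execute(query)
    writer = csv.writer(out_file)
    # cursor.description holds one tuple per result column; item 0 is the name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

# Demo against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (BuildingIRN INT, SchoolYear TEXT)")
conn.execute("INSERT INTO demo VALUES (224, '2015-2016')")

buffer = io.StringIO()
export_query_to_csv(conn, "SELECT * FROM demo", buffer)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

For the real database, one would connect to `CMSD_db.db`, query Final_Output_Table, and pass a file opened with `open("final_output.csv", "w", newline="")` instead of the in-memory buffer.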