# Data Schema Notebook

This notebook is a reference for quickly checking the schema and starting point of each dataset we import into our notebook in Stage 2.

In [3]:
#Setup 
import pandas as pd

## CIP Course Data

This dataframe contains a list of the CIP (Classification of Instructional Programs) courses taught at middle schools and high schools in the state of Washington. Please note that this list is incomplete, as data was not available for all Washington schools in 2017.

*Note: You will need to edit the file name to match the year of the file you want to process. For example, change the file name from 'cip_course_statistics_2017.csv' to 'cip_course_statistics_2018.csv'.*
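One way to avoid editing the string by hand each year is to build the path from a year variable. This is a minimal sketch that assumes every year's file keeps the 2017 naming pattern and folder; `cip_course_path` and `CIP_DIR` are hypothetical names, not part of the notebook.

```python
# Hypothetical helper: assumes each year's CIP file follows the 2017
# naming pattern and lives in the same folder.
CIP_DIR = "data/labelled_data/CIP_Data"


def cip_course_path(year: int) -> str:
    """Return the expected path of the CIP course statistics file for `year`."""
    return f"{CIP_DIR}/cip_course_statistics_{year}.csv"


print(cip_course_path(2018))
```

With this helper, the import cell would read `pd.read_csv(cip_course_path(2018))`.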

**Here is a quick explanation of what each column means.**

| **COLUMN NAME** | **COLUMN DESCRIPTION** |
| ----------- | ----------- |
| **DistrictCode:** | Code of the school district the school belongs to (e.g. 2420) |
| **DistrictName:** | Name of the school district the school belongs to (e.g. Asotin-Anatone School District) |
| **SchoolCode:** | Code of the school (e.g. 2434) |
| **SchoolName:** | Name of the school (e.g. Asotin Jr Sr High) |
| **term:** | Semester in which the course was taught (e.g. SEM1) |
| **cipcode:** | National CIP course code under which the class falls (e.g. 110201) |
| **courseTitle:** | Title of the course (e.g. AP Computer Science Principles) |
| **letterGrade:** | Letter grade (e.g. A, A-, B) |
| **count:** | Total number of students who received that letter grade in the course |
| **cs_course:** | Whether the course is a computer science course (yes or no) |
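Because each row holds the number of students who earned one letter grade in one course and term, per-course totals come from summing `count` rather than counting rows. The sketch below is built from the first five rows of the 2017 output further down and illustrates this reading of the schema, which is an interpretation of the column descriptions rather than a documented rule.

```python
import pandas as pd

# First five rows of the 2017 CIP output shown in this notebook
# (district and code columns omitted for brevity).
sample = pd.DataFrame({
    "SchoolName": ["Asotin Jr Sr High"] * 5,
    "term": ["SEM1", "SEM1", "SEM1", "SEM1", "SEM2"],
    "courseTitle": ["AP Computer Science Principles", "CSS ENGINEERING",
                    "CSS ENGINEERING", "CSS ENGINEERING",
                    "AP Computer Science Principles"],
    "letterGrade": ["A", "A-", "A", "D", "A"],
    "count": [3, 1, 3, 1, 3],
    "cs_course": ["yes", "yes", "yes", "yes", "yes"],
})

# Keep computer science courses, then add up the per-grade counts to get
# the number of graded students in each course offering.
enrollment = (
    sample[sample["cs_course"] == "yes"]
    .groupby(["SchoolName", "term", "courseTitle"], as_index=False)["count"]
    .sum()
)
print(enrollment)
```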

In [4]:
#Importing and Saving Student Results for CIP Courses
#Note: this is where you will want to change the file name for the new CIP Student Results Dataset
cip_courses  = pd.read_csv("data/labelled_data/CIP_Data/cip_course_statistics_2017.csv")
cip_courses.head()

| Unnamed: 0 | DistrictCode | DistrictName | SchoolCode | SchoolName | term | cipcode | courseTitle | letterGrade | count | cs_course |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2420 | Asotin-Anatone School District | 2434 | Asotin Jr Sr High | SEM1 | 110201 | AP Computer Science Principles | A | 3 | yes |
| 1 | 2420 | Asotin-Anatone School District | 2434 | Asotin Jr Sr High | SEM1 | 110201 | CSS ENGINEERING | A- | 1 | yes |
| 2 | 2420 | Asotin-Anatone School District | 2434 | Asotin Jr Sr High | SEM1 | 110201 | CSS ENGINEERING | A | 3 | yes |
| 3 | 2420 | Asotin-Anatone School District | 2434 | Asotin Jr Sr High | SEM1 | 110201 | CSS ENGINEERING | D | 1 | yes |
| 4 | 2420 | Asotin-Anatone School District | 2434 | Asotin Jr Sr High | SEM2 | 110201 | AP Computer Science Principles | A | 3 | yes |
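The `Unnamed: 0` column in the output above most likely holds a row index that was written into the CSV when it was saved; this is an assumption, since the notebook does not say how the files were produced. If so, passing `index_col=0` to `pd.read_csv` loads that column as the index instead of as a data column. The sketch uses a two-row in-memory CSV in the same layout so it runs without the data files.

```python
import io

import pandas as pd

# Two rows in the same layout as the CIP file: the first column has a
# blank header, as happens when a DataFrame index is written to CSV.
csv_text = """,DistrictCode,SchoolCode,term,cipcode,letterGrade,count,cs_course
0,2420,2434,SEM1,110201,A,3,yes
1,2420,2434,SEM1,110201,A-,1,yes
"""

# Default read: pandas names the blank-header column "Unnamed: 0".
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns[0])

# Reading the first column as the index removes it from the data columns.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
```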


## State Course Code Data
This dataframe lists the courses, identified by state course code, that are taught at schools in the state of Washington. Please note that this list is incomplete, as data is not available for all Washington State schools as of 2017.

*Note: You will need to edit the file name to match the year of the file you want to process. For example, change the file name from 'state_course_code_statistics_2017.csv' to 'state_course_code_statistics_2018.csv'.*

**Here is a quick explanation of what each column means.**

| **COLUMN NAME** | **COLUMN DESCRIPTION** |
| ----------- | ----------- |
| **DistrictCode:** | Code of the school district the school belongs to (e.g. 2420) |
| **DistrictName:** | Name of the school district the school belongs to (e.g. Asotin-Anatone School District) |
| **SchoolCode:** | Code of the school (e.g. 2434) |
| **SchoolName:** | Name of the school (e.g. Asotin Jr Sr High) |
| **term:** | Semester in which the course was taught (e.g. SEM1) |
| **stateCourseCodeId:** | State course code under which the class falls (e.g. 837) |
| **courseTitle:** | Title of the course (e.g. AP Computer Science Principles) |
| **letterGrade:** | Letter grade (e.g. A, A-, B) |
| **count:** | Total number of students who received that letter grade in the course |
| **cs_course:** | Whether the course is a computer science course (yes or no) |

In [5]:
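#Importing Student Results for State Course Code Courses
#Note: this is where you will want to change the file name for a new State Course Code dataset
scc_courses = pd.read_csv("data/labelled_data/State_Course_Code_Data/state_course_code_statistics_2017.csv")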
scc_courses.head()

| Unnamed: 0 | DistrictCode | DistrictName | SchoolCode | SchoolName | term | stateCourseCodeId | courseTitle | letterGrade | count | cs_course |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 17407 | Riverview School District | 3524 | Cedarcrest High School | SEM1 | 837 | PRG/GAMES/SIM A | A- | 2 | no |
| 1 | 17407 | Riverview School District | 3524 | Cedarcrest High School | SEM1 | 837 | PRG/GAMES/SIM A | A | 6 | no |
| 2 | 17407 | Riverview School District | 3524 | Cedarcrest High School | SEM1 | 837 | PRG/GAMES/SIM A | B | 1 | no |
| 3 | 17407 | Riverview School District | 3524 | Cedarcrest High School | SEM1 | 837 | PRG/GAMES/SIM A | B+ | 1 | no |
| 4 | 17407 | Riverview School District | 3524 | Cedarcrest High School | SEM1 | 837 | PRG/GAMES/SIM A | C | 1 | no |
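The two course datasets share every column except the course-code column (`cipcode` versus `stateCourseCodeId`). If a later stage needs them in one dataframe, one option is to rename both code columns to a shared name and stack the rows. This is a sketch, not the notebook's method: the names `courseCode` and `codeSystem` are hypothetical, and the sample rows are taken from the two heads shown above.

```python
import pandas as pd

# One row from each dataset head above (district columns omitted).
cip = pd.DataFrame({"SchoolCode": [2434], "term": ["SEM1"], "cipcode": [110201],
                    "courseTitle": ["AP Computer Science Principles"],
                    "letterGrade": ["A"], "count": [3], "cs_course": ["yes"]})
scc = pd.DataFrame({"SchoolCode": [3524], "term": ["SEM1"], "stateCourseCodeId": [837],
                    "courseTitle": ["PRG/GAMES/SIM A"],
                    "letterGrade": ["A-"], "count": [2], "cs_course": ["no"]})

# Hypothetical shared schema: rename each code column to `courseCode` and
# tag each row with the coding system it came from, then stack the rows.
combined = pd.concat(
    [
        cip.rename(columns={"cipcode": "courseCode"}).assign(codeSystem="CIP"),
        scc.rename(columns={"stateCourseCodeId": "courseCode"}).assign(codeSystem="SCC"),
    ],
    ignore_index=True,
)
print(combined[["SchoolCode", "courseCode", "codeSystem", "cs_course"]])
```

Keeping `codeSystem` matters because a CIP code and a state course code with the same number would refer to different courses.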


## List of All High Schools in Washington State

This file was sourced from the [OSPI School Directory website](https://eds.ospi.k12.wa.us/directoryeds.aspx). 

**Note:** This file was modified in the following ways:

- Only the columns most relevant to our visualization dataset were kept: 'LEACode', 'LEAName', 'SchoolCode', 'SchoolName', 'LowestGrade', 'HighestGrade', 'PrincipalName', 'Email', 'Phone', 'OrgCategoryList', 'GradeCategory', and 'City'.

- Only schools whose highest grade is 9, 10, 11, or 12 were kept. Some of these are alternative schools, jails, detention centres, and learning centres; we retained them so as to be inclusive.

- The original file was missing the names of the last three schools listed in this CSV file. I added them manually using information found through a Google search. Please be aware that the OSPI data is not always complete.
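The grade filter described in the list above can be sketched in pandas as follows. The rows are hypothetical stand-ins for the raw OSPI export (only the Colfax and Colton values come from the output further down; "Example Elementary" and code 9999 are invented), and only a subset of the twelve kept columns is shown. Grades are compared as strings because the directory mixes numbers with labels such as `PK`; if the raw file stored `HighestGrade` as floats (e.g. `12.0`), the comparison set would need adjusting.

```python
import pandas as pd

# Hypothetical rows standing in for the raw OSPI directory export.
directory = pd.DataFrame({
    "SchoolCode": [3366, 2588, 9999],
    "SchoolName": ["Colfax High School", "Colton School", "Example Elementary"],
    "LowestGrade": ["7", "PK", "K"],
    "HighestGrade": ["12", "12", "5"],
    "City": ["Colfax", "Colton", "Example City"],
})

# Compare as strings, since grade columns can mix numbers and labels.
HIGH_SCHOOL_TOP_GRADES = {"9", "10", "11", "12"}
# A subset of the twelve kept columns, for brevity.
KEEP = ["SchoolCode", "SchoolName", "LowestGrade", "HighestGrade", "City"]

high = directory.loc[
    directory["HighestGrade"].astype(str).isin(HIGH_SCHOOL_TOP_GRADES), KEEP
]
print(high)
```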

Below is a description of each column.

| **Column Name** | **Column Description** | 
| ----------- | ----------- |
| LEACode | Local Education Agency code, e.g. 38300 |
| LEAName | Local Education Agency name, e.g. Colfax School District |
| SchoolCode | Washington State school code of the school, e.g. 3366 |
| SchoolName | Name of the school, e.g. Colfax High School |
| LowestGrade | Lowest grade in the school, e.g. 7 or PK |
| HighestGrade | Highest grade in the school, e.g. 12 |
| PrincipalName | Name of the school principal, e.g. David Gibb |
| Email | Email of the principal, e.g. david.gibb@csd300.com |
| Phone | Phone number of the principal, e.g. 509.830.2347 |
| OrgCategoryList | Categories the school falls under, e.g. Public School, Regular School |
| GradeCategory | Type of school, e.g. High School, PK-12 |
| City | City name, e.g. Colfax |

In [6]:
#Importing the List of Washington High Schools
high_schools = pd.read_csv("data/labelled_data/School_Data/High_Schools_WA_Information.csv")
high_schools.head()

| Unnamed: 0 | LEACode | LEAName | SchoolCode | SchoolName | LowestGrade | HighestGrade | PrincipalName | Email | Phone | OrgCategoryList | GradeCategory | City |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 38300 | Colfax School District | 3366 | Colfax High School | 7 | 12 | David Gibb | david.gibb@csd300.com | 509.830.2347 | Public School, Regular School | High School | Colfax |
| 1 | 38301 | Palouse School District | 2634 | Palouse High School | 9 | 12 | Mike Jones | mjones@garpal.net | 509.878.1921 | Public School, Regular School | High School | Palouse |
| 2 | 38306 | Colton School District | 2588 | Colton School | PK | 12 | Tim Casey | tcasey@colton.k12.wa.us | 509.229.3386 | Public School, Regular School | PK-12 | Colton |
| 3 | 38320 | Rosalia School District | 3204 | Rosalia Elementary & Secondary School | PK | 12 | Matthew McLain | mmclain@rosaliaschools.org | 509.523.3061 | Public School, Regular School | PK-12 | Rosalia |
| 4 | 38322 | St. John School District | 3068 | St John/Endicott High | 9 | 12 | Mark Purvine | mpurvine@stjohn.wednet.edu | 509.648.3336 | Public School, Regular School | High School | Saint John |
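Because `LowestGrade` mixes numbers with `PK`, pandas reads it as a text (object) column, so grades cannot be sorted or compared numerically as loaded. A minimal sketch of one fix maps the labels to numbers before calling `pd.to_numeric`; the encoding `PK = -1`, `K = 0` is a hypothetical choice, not something the source file defines.

```python
import pandas as pd

# LowestGrade values from the head above, as pandas would load them (strings).
lowest = pd.Series(["7", "9", "PK", "PK", "9"], name="LowestGrade")

# Hypothetical ordinal encoding for non-numeric grade labels.
GRADE_ORDER = {"PK": -1, "K": 0}

# Replace known labels, leave other values untouched, then convert to numbers.
lowest_num = pd.to_numeric(lowest.map(lambda g: GRADE_ORDER.get(g, g)))
print(lowest_num.tolist())
```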


## GeoData and School Metadata File

**To map the schools, we need geospatial data such as latitude and longitude. We sourced this geodata for K-12 schools from the [Washington State Data Gov Website](https://geo.wa.gov/datasets/OSPI::k-12-schools).**

**While this website claims to have information on all Washington State K-12 schools, it was missing 8 schools from the 2017 dataset:**

1. Sage Hills High School
2. Marysville Mountain View High School
3. Nooksack Reengagement
4. Selah Academy BPL
5. College Place Open Doors Program
6. Tonasket Choice High School
7. Tonasket Outreach School
8. Moses Lake Big Picture

**I manually added information about these schools to the dataset used below. You can see the new additions by comparing the `WA_K12_Schools_Geo_Data_With_Manual_Additions.csv` file with the `WA_K12_Schools_Geo_Data.csv` file. For these additions I referred to the [Washington School Directory](https://eds.ospi.k12.wa.us/directoryeds.aspx); if I could not find the information there, I checked the OSPI website and Google. I took the coordinates from Google Maps.**
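The comparison between the two geodata files can be done in pandas with a left merge and `indicator=True`, which labels rows found only in the left dataframe as `left_only`. This sketch assumes the manually added schools have a `SchoolCode`; the two small dataframes are hypothetical stand-ins for the CSV files (the first three codes and cities come from the output below, while school code `9001` and "Example City" are invented).

```python
import pandas as pd

# Hypothetical stand-ins; in the notebook these would come from pd.read_csv
# on WA_K12_Schools_Geo_Data.csv and the "_With_Manual_Additions" file.
original = pd.DataFrame({"SchoolCode": [4007, 5549, 5534],
                         "City": ["KENNEWICK", "Puyallup", "Camas"]})
with_additions = pd.DataFrame({"SchoolCode": [4007, 5549, 5534, 9001],
                               "City": ["KENNEWICK", "Puyallup", "Camas", "Example City"]})

# Rows marked "left_only" exist only in the file with manual additions.
compared = with_additions.merge(original[["SchoolCode"]], on="SchoolCode",
                                how="left", indicator=True)
new_rows = compared[compared["_merge"] == "left_only"].drop(columns="_merge")
print(new_rows)
```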

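**Below is a description of each column. Several column names are cut off at 10 characters (e.g. `LowestGrad` for the lowest grade and `HighestGra` for the highest grade).**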

| **Column Name** | **Column Description** | 
| --- | --- | 
| X | Longitude of the school location |
| Y | Latitude of the school location | 
| FID | Unique ID of the school in this dataset |
| SchoolCode | Washington State school code of this school |
| Latitude | Latitude of the school | 
| Longitude | Longitude of the school |
| ESDCode | Educational Service District code of the school | 
| ESDName | Educational Service District name of the school | 
| LEACode | Local Education Agency code of the school | 
| LEAName | Local Education Agency name of the school, e.g. Kennewick School District |
| SchoolName | Name of the school | 
| LowestGrad | Lowest grade of the school |
| HighestGra | Highest grade of the school |
| AddressLin | Address line 1 of the school | 
| AddressL_1 | Address line 2 of the school (optional) | 
| City | City of the school |
| State | State of the school | 
| ZipCode | Zip code of the school | 
| PrincipalN | Principal name of the school |
| Email | Email address of the principal | 
| Phone | Phone number of the school | 
| OrgCategor | Type of school by organization type, e.g. Public School, Re-Engagement School | 
| AYPCode | Adequate Yearly Progress code | 
| GradeCateg | Grade category of the school, e.g. High School |
| OrgCateg_1 | Organization category of the school, e.g. Public |

In [7]:
#Importing Data
wa_school_geo_data = pd.read_csv("data/labelled_data/School_Data/WA_K12_Schools_Geo_Data_With_Manual_Additions.csv")

#Printing the Head of the Dataframe
wa_school_geo_data.head()

| Unnamed: 0 | X | Y | FID | SchoolCode | Latitude | Longitude | ESDCode | ESDName | LEACode | LEAName | ... | City | State | ZipCode | PrincipalN | Email | Phone | OrgCategor | AYPCode | GradeCateg | OrgCateg_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | -119.195783 | 46.224367 | 2001 | 4007 | 46.224373 | -119.195797 | 11801 | Educational Service District 123 | 3017 | Kennewick School District | ... | KENNEWICK | Washington | 99336-1300 | Dennis Boatman | dennis.boatman@ksd.org | 509.222.6522 | Detention Center | J | Other | Public |
| 1 | -122.354846 | 47.211844 | 2002 | 5549 | 47.21185 | -122.35486 | OSPI | Office of Superintendent of Public Instruction | 27901 | Chief Leschi Tribal Compact | ... | Puyallup | Washington | 98371 | Bruce Leonardy | bruce.leonardy@leschischools.org | 253.445.6000 | Not Affiliated With District, Tribal School | Q | K-12 | Tribal |
| 2 | -122.460763 | 45.593231 | 2003 | 5534 | 45.593237 | -122.460777 | 6801 | Educational Service District 112 | 6117 | Camas School District | ... | Camas | Washington | 98607 | Aaron J Smith | aaronj.smith@camas.wednet.edu | 360-833-5780 | Affiliated With District, Public School | P | Middle School | Public |
| 3 | -117.558706 | 47.808964 | 2004 | 5417 | 47.80897 | -117.55872 | 32801 | Educational Service District 101 | 32325 | Nine Mile Falls School District | ... | Nine Mile Falls | Washington | 99026 | Willard B Osborn | bosborn@9mile.org | 509.340.4200 | Public School, Re-Engagement School | R | High School | Public |
| 4 | -122.917265 | 46.994554 | 2005 | 5305 | 46.99456 | -122.91728 | OSPI | Office of Superintendent of Public Instruction | 34801 | Capital Region ESD 113 | ... | Tumwater | Washington | 98512 | Gerald Grubbs | ggrubbs@esd113.org | 360.927.6232 | Public School, Re-Engagement School | R | High School | Public |
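`X`/`Y` and `Longitude`/`Latitude` appear to describe the same points. On the five rows above they differ only around the fifth or sixth decimal place, which is on the order of a metre on the ground. This sketch measures the largest difference in degrees between the two pairs; which pair the mapping stage should use is left open.

```python
import pandas as pd

# X, Y, Latitude and Longitude for the five rows shown above.
geo = pd.DataFrame({
    "X": [-119.195783, -122.354846, -122.460763, -117.558706, -122.917265],
    "Y": [46.224367, 47.211844, 45.593231, 47.808964, 46.994554],
    "Latitude": [46.224373, 47.21185, 45.593237, 47.80897, 46.99456],
    "Longitude": [-119.195797, -122.35486, -122.460777, -117.55872, -122.91728],
})

# Largest absolute disagreement, in degrees, for each coordinate.
max_lon_diff = (geo["X"] - geo["Longitude"]).abs().max()
max_lat_diff = (geo["Y"] - geo["Latitude"]).abs().max()
print(max_lon_diff, max_lat_diff)
```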


## List of School Zipcodes and Their Counties

**While we have gathered the geodata and course statistics for each school, we are still missing county data. To gather this information I used the [ArcGIS Public Schools Locations web page](https://hub.arcgis.com/datasets/87376bdb0cb3490cbda39935626f6604_0), whose dataset lists all the schools in the United States along with their zipcodes and counties.**

*Note: Since I wanted only the schools in Washington, I filtered for them in Excel and removed any duplicate zipcodes. I then kept only the ZipCode, CountyID, and County columns and saved the result as a new file called `WA_School_Counties_List.csv`.*
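The Excel steps in the note above could also be scripted in pandas, which makes the filtering repeatable. This is a sketch on hypothetical rows: `State` is an assumed column name in the ArcGIS export, while `ZipCode`, `CountyID` and `County` are the columns named in the note. The final rename to `Zip_Code` and `County_Name` only mirrors the column names in the saved file.

```python
import pandas as pd

# Hypothetical stand-in for the national ArcGIS export ("State" is an
# assumed column name; the Oregon row is an out-of-state example).
schools_us = pd.DataFrame({
    "State": ["WA", "WA", "WA", "OR"],
    "ZipCode": ["98010", "98010", "99138", "97201"],
    "CountyID": [53033, 53033, 53019, 41051],
    "County": ["King County", "King County", "Ferry County", "Multnomah County"],
})

wa_counties = (
    schools_us[schools_us["State"] == "WA"]          # keep Washington only
    .drop_duplicates(subset="ZipCode")               # one row per zip code
    .loc[:, ["ZipCode", "CountyID", "County"]]       # drop the other columns
    .rename(columns={"ZipCode": "Zip_Code", "County": "County_Name"})
)
print(wa_counties)
```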

Below is a description of each column, followed by the head of the dataframe for easy viewing.

| **Column Name** | **Column Description** | 
| --- | --- | 
| Zip_Code | Zip code, e.g. 98010 |
| CountyID | Unique ID (FIPS code) of the county in the USA, e.g. 53033 |
| County_Name | Name of the county, e.g. King County |

In [9]:
# Importing and Saving the School County List
school_county_list = pd.read_csv("data/labelled_data/School_Data/WA_School_Counties_List.csv")
school_county_list.head()

| Unnamed: 0 | Zip_Code | CountyID | County_Name |
| --- | --- | --- | --- |
| 0 | 98010 | 53033 | King County |
| 1 | 98022 | 53033 | King County |
| 2 | 99138 | 53019 | Ferry County |
| 3 | 98626 | 53015 | Cowlitz County |
| 4 | 98632 | 53015 | Cowlitz County |
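To attach counties to schools, the zip codes need a common format: the geodata mixes 5-digit codes with ZIP+4 codes such as `99336-1300`, while this list stores 5-digit integers. A minimal sketch, assuming the first five characters are always the 5-digit zip code, trims the geodata codes and performs a left merge. The `schools` rows and their codes are hypothetical; the county rows come from the head above.

```python
import pandas as pd

# County list rows from the head above.
counties = pd.DataFrame({
    "Zip_Code": [98010, 98022, 99138, 98626, 98632],
    "CountyID": [53033, 53033, 53019, 53015, 53015],
    "County_Name": ["King County", "King County", "Ferry County",
                    "Cowlitz County", "Cowlitz County"],
})

# Hypothetical schools mixing ZIP+4 and 5-digit codes.
schools = pd.DataFrame({
    "SchoolCode": [1001, 1002, 1003],
    "ZipCode": ["98010-1234", "99138", "98371"],
})

# Keep the first five characters so both tables share an integer key.
schools["Zip_Code"] = schools["ZipCode"].astype(str).str[:5].astype(int)

# Left merge keeps every school; unmatched zip codes get a missing county.
with_county = schools.merge(counties, on="Zip_Code", how="left")
print(with_county[["SchoolCode", "ZipCode", "County_Name"]])
```

Blank or malformed zip codes would make `.astype(int)` fail, so the real geodata may need a cleaning step first.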
