# Predicting school STAAR scores

In [1]:
import pandas as pd
import numpy as np
import prepare as pr
#import explore as ex
#import modeling as mo

## Project Description
Our youth matters. In Texas, the State of Texas Assessments of Academic Readiness (STAAR) exams are used to measure student learning at the end of the school year. Scores on these exams are used to calculate school accountability ratings, which are meant to ensure that only high-performing schools stay open. We want to use the publicly available data to identify the school features that have the largest impact on STAAR exam results. After exploration, we use a machine learning algorithm to predict the most likely STAAR exam outcome from the features we identified as having the largest impact. The scope of this project is limited to Texas high schools, but the approach may be applied to other types of schools as well.

## Project Goals:
* Identify drivers of high schools' percent of students passing each STAAR subject
* Build a regression model to predict passing percentage of STAAR scores 
* Deliver results in a final notebook
* Deliver presentation to stakeholders

## Acquisition and Data Preparation
* Web-scraping functions were written to extract data from the Texas Education Agency's website
* Schools whose records contained special characters were removed from the analysis
    * special characters (*, -, ?, n/a)
* Nulls were removed:
    * Null or masked values were encoded as the special characters above and removed
    * All percent signs, dollar signs, and commas were removed from values
* Columns were combined into desired features
    * high_edu was generated by combining the percentages of teachers who hold a master's or doctorate degree
    * Features for teacher_exp_0to5, teacher_exp_6to10, and teacher_exp_11_plus were generated as follows:
        * Beginning teachers and teachers with 1-5 years of experience were combined into teacher_exp_0to5
        * Teachers with 11+ years of experience were combined into teacher_exp_11_plus
        * teacher_exp_6to10 was left unchanged (teachers with 6-10 years of experience)
* The initial dataset had 1571 rows
    * After preparation and cleaning, 1391 rows remain
* The data was split into train, validate, and test datasets
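The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a made-up five-row frame, not the project's `prepare` module: the raw column names (`masters_pct`, `doctorate_pct`, `beginning_pct`, `exp_1to5_pct`), the helper names `clean` and `split`, and the 60/20/20 split proportions are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical raw scrape: every value arrives as a string.
raw = pd.DataFrame({
    "school_id": ["1", "2", "3", "4", "5"],
    "algebra": ["95%", "*", "76%", "88%", "90%"],
    "total_expend": ["$10,656", "$11,177", "n/a", "$9,870", "$10,885"],
    "masters_pct": ["20.0%", "25.4%", "30.0%", "18.5%", "33.0%"],
    "doctorate_pct": ["2.0%", "4.0%", "7.0%", "-", "4.0%"],
    "beginning_pct": ["5.0%", "7.2%", "12.2%", "6.0%", "10.0%"],
    "exp_1to5_pct": ["13.0%", "20.0%", "30.0%", "15.0%", "32.2%"],
})

MASKED = ["*", "-", "?", "n/a"]  # special characters that stand for missing data


def clean(df):
    """Drop masked rows, strip symbols, and build the combined features."""
    out = df.copy()
    value_cols = out.columns.drop("school_id")
    # A cell that is exactly one of the special characters becomes NaN.
    out[value_cols] = out[value_cols].replace(MASKED, np.nan)
    # Remove every school that has at least one missing value.
    out = out.dropna()
    # Strip percent signs, dollar signs, and commas, then convert to numbers.
    for col in value_cols:
        out[col] = pd.to_numeric(out[col].str.replace(r"[%$,]", "", regex=True))
    # Combine columns into the desired features.
    out["high_edu"] = out["masters_pct"] + out["doctorate_pct"]
    out["teacher_exp_0to5"] = out["beginning_pct"] + out["exp_1to5_pct"]
    combined = ["masters_pct", "doctorate_pct", "beginning_pct", "exp_1to5_pct"]
    return out.drop(columns=combined).reset_index(drop=True)


def split(df, seed=42):
    """Shuffle, then cut into 60% train, 20% validate, 20% test (assumed ratios)."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * 0.6)
    n_validate = int(len(shuffled) * 0.2)
    train = shuffled.iloc[:n_train]
    validate = shuffled.iloc[n_train:n_train + n_validate]
    test = shuffled.iloc[n_train + n_validate:]
    return train, validate, test


cleaned = clean(raw)
# The split is shown on a separate hypothetical 10-row frame so each part is non-empty.
train, validate, test = split(pd.DataFrame({"school_id": range(10)}))
print(cleaned)
print(len(train), len(validate), len(test))
```

Dropping masked rows before converting to numbers keeps the conversion strict: any unexpected string that survives will raise in `pd.to_numeric` instead of silently becoming a null.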

## Data Dictionary  

| Feature | Definition |
|:--------|:-----------|
|school_id| The id number of the school from TEA|
|english_1| English I, percent of students at approaches grade level or above for English I|
|english_2| English II, percent of students at approaches grade level or above for English II|
|algebra| Algebra, percent of students at approaches grade level or above for Algebra|
|biology| Biology, percent of students at approaches grade level or above for Biology|
|history| U.S. History, percent of students at approaches grade level or above for U.S. History|
|bilingual_or_english_learner| EB/EL Current and Monitored, percent of students in the dual-language program that enables emergent bilingual (EB) students/English learners (ELs) to become proficient in listening, speaking, reading, and writing in the English language through the development of literacy and academic skills in the primary language and English.|
|teacher_exp_0to5| Percent of teachers with 0-5 years of experience|
|teacher_exp_6to10| Percent of teachers with 6-10 years of experience|
|teacher_exp_11_plus| Percent of teachers with 11 or more years of experience|
|extracurricular_expend| The amount of funds (in dollars) spent on extracurriculars per student|
|total_expend| The average total amount of funds (in dollars) spent per student|
|econdis| Percent of students from homes that are below the poverty line|
|salary| Average Actual Salary, Average amount teachers are being paid in dollars|
|high_edu| Percent of teachers with a masters or doctorate degree|
|student_teacher_ratio| Number of students per teacher|

In [3]:
#loading in cleaned data
df=pr.clean_df()
df.head(3)

Unnamed: 0,school_id,english_1,english_2,algebra,biology,history,bilingual_or_english_learner,econdis,salary,teacher_exp_6to10,extracurricular_expend,total_expend,student_teacher_ratio,teacher_exp_0to5,teacher_exp_11_plus,high_edu
0,1902001,67.0,82.0,95.0,88.0,93.0,1.2,34.3,55259.0,16.4,1852.0,10656.0,10.1,18.0,65.6,22.0
1,1903001,75.0,87.0,76.0,92.0,93.0,1.3,34.5,48689.0,21.6,2056.0,11177.0,8.3,27.2,51.2,29.4
2,1904001,78.0,80.0,90.0,87.0,88.0,4.7,42.2,51538.0,15.2,2151.0,10885.0,8.8,42.2,42.6,37.0


## Exploration
For exploration, we have five target variables, one per STAAR exam subject. Each subject was explored through a series of questions, listed below by subject.

## English 1

<div class="alert alert-info">
    <header>
    <h2>English 1 Exploration Question 1:</h2>
    </header>
    <dl>
        <dt>Do schools with more economically disadvantaged students have a lower average STAAR score for biology than all schools?</dt>
        <dd>- $H_0$: There is no difference in the average STAAR score for biology between schools with 50% or more economically disadvantaged students and all schools</dd>
        <dd>- $H_a$: The average STAAR score for biology is <strong>significantly lower</strong> in schools with 50% or more economically disadvantaged students than in all schools</dd>
    </dl>
</div>

In [None]:
# english 1, english 2, biology
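A question of this form (one subgroup compared against all schools, with a directional alternative) is commonly tested with a one-sample, one-tailed t-test. The sketch below shows the mechanics only. The data is synthetic and generated with a built-in negative relationship, so the outcome says nothing about the real schools; the choice of test and the `alpha` of 0.05 are assumptions, not confirmed project settings.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the cleaned data (NOT real TEA values).
rng = np.random.default_rng(0)
n = 200
econdis = rng.uniform(5, 95, n)
biology = np.clip(95 - 0.2 * econdis + rng.normal(0, 4, n), 0, 100)
schools = pd.DataFrame({"econdis": econdis, "biology": biology})

alpha = 0.05  # assumed significance level

# Subgroup: schools with 50% or more economically disadvantaged students.
high_econdis = schools.loc[schools["econdis"] >= 50, "biology"]
overall_mean = schools["biology"].mean()

# ttest_1samp returns a two-sided p-value.
t_stat, p_two_sided = stats.ttest_1samp(high_econdis, overall_mean)

# For a "lower than" alternative, halve the p-value and require a negative t.
reject_null = (t_stat < 0) and (p_two_sided / 2 < alpha)
print(f"t = {t_stat:.2f}, one-tailed p = {p_two_sided / 2:.4f}, reject H0: {reject_null}")
```

The same pattern applies to any of the target columns by swapping `"biology"` for another subject.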

<div class="alert alert-info">
    <header>
    <h2>English 1 Exploration Question 2:</h2>
    </header>
    <dl>
        <dt>Do schools with teachers that have more years of experience have a better average STAAR score in biology?</dt>
        <dd>- $H_0$: There is no difference in the average STAAR score for biology between schools where 50% or fewer of the teachers have 11 or more years of experience and schools where more than 50% do</dd>
        <dd>- $H_a$: The average STAAR score for biology is <strong>significantly lower</strong> in schools where 50% or fewer of the teachers have 11 or more years of experience than in schools where more than 50% do</dd>
    </dl>
</div>
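Comparing two independent groups of schools with a directional alternative can be sketched with a two-sample t-test. The example below uses Welch's version (`equal_var=False`), which does not assume equal variances; that choice, the `alpha` of 0.05, and the synthetic data are assumptions made for illustration, not the project's results.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the cleaned data (NOT real TEA values).
rng = np.random.default_rng(1)
n = 200
exp_11_plus = rng.uniform(10, 90, n)
biology = np.clip(75 + 0.15 * exp_11_plus + rng.normal(0, 5, n), 0, 100)
schools = pd.DataFrame({"teacher_exp_11_plus": exp_11_plus, "biology": biology})

alpha = 0.05  # assumed significance level

# Split schools at 50% of teachers having 11 or more years of experience.
is_less_experienced = schools["teacher_exp_11_plus"] <= 50
less_experienced = schools.loc[is_less_experienced, "biology"]
more_experienced = schools.loc[~is_less_experienced, "biology"]

# Welch's t-test; the returned p-value is two-sided.
t_stat, p_two_sided = stats.ttest_ind(less_experienced, more_experienced, equal_var=False)

# H_a says the first group scores lower, so require t < 0 and halve the p-value.
reject_null = (t_stat < 0) and (p_two_sided / 2 < alpha)
print(f"t = {t_stat:.2f}, one-tailed p = {p_two_sided / 2:.4f}, reject H0: {reject_null}")
```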

## English 2

<div class="alert alert-info">
    <header>
    <h2>Exploration Question:</h2>
    </header>
    <dl>
        <dt>Among schools with an above-average percent of economically disadvantaged students, do the schools with higher average STAAR scores get more funding per student?</dt>
        <dd>- $H_0$: There is no difference in the average funding per student between schools with an above-average percent of economically disadvantaged students and an above-average percent passing STAAR exams and schools with an above-average percent of economically disadvantaged students and a below-average percent passing STAAR exams</dd>
        <dd>- $H_a$: The average funding per student in schools with an above-average percent of economically disadvantaged students and an above-average percent passing STAAR exams is <strong>significantly more</strong> than in schools with an above-average percent of economically disadvantaged students and a below-average percent passing STAAR exams</dd>
    </dl>
</div>

## Algebra

Is there a relationship between average extracurricular expenditure per student and algebra scores?

Is there a significant difference in funding per student between schools with an above-average and a below-average percent of economically disadvantaged students?

## U.S. History

Is there a correlation between student-teacher ratio and U.S. History scores?
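A linear relationship between two continuous columns, such as student-teacher ratio and history scores, can be checked with a Pearson correlation. The sketch below assumes Pearson's r is the intended measure and runs on synthetic data with a negative relationship built in, so the numbers it prints are not findings about real schools.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the cleaned data (NOT real TEA values).
rng = np.random.default_rng(2)
n = 200
ratio = rng.uniform(6, 20, n)
history = np.clip(98 - 0.5 * ratio + rng.normal(0, 3, n), 0, 100)
schools = pd.DataFrame({"student_teacher_ratio": ratio, "history": history})

# pearsonr returns the correlation coefficient and a two-sided p-value
# for the null hypothesis that the two columns are uncorrelated.
r, p_value = stats.pearsonr(schools["student_teacher_ratio"], schools["history"])
print(f"r = {r:.2f}, p = {p_value:.4g}")
```

Pearson's r only captures linear association; if a scatter plot suggests a monotonic but curved relationship, a rank-based measure such as `scipy.stats.spearmanr` may be a better fit.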