# CAST Dashboard Predictor

This Jupyter notebook takes the raw CAST score file and calculates the Distance From Standard (DFS) for each student, which allows a dashboard prediction for all students and for the various subgroups.

In [None]:
import numpy as np
import pandas as pd

## Loading Data

__Aeries Query__: LIST STU SC ID CID NM GR LF SPECIALED DISADVANTAGED STU.ETH? STU.RC1? LAC LAC.RD1

After running the query in Aeries to get roster and subgroup information, copy the file path of the result below, along with the path of the raw CAST file. The two files are joined with an inner merge, which leaves only the students who appear on the enrollment roster.

In [None]:
# LIST STU SC ID CID NM GR LF SPECIALED DISADVANTAGED STU.ETH? STU.RC1? LAC LAC.RD1

roster = pd.read_excel(r"C:\Users\derek.castleman\Desktop\CAST Roster.xlsx") # Aeries Query file

cast = pd.read_csv(r"C:\Users\derek.castleman\Desktop\CERS_SBAC_2023-24_6_20_24ALL.csv") # Raw CAST file

In [None]:
roster

In [None]:
cast

In [None]:
merge = pd.merge(roster, cast, how='inner', left_on='State Student ID', right_on='StudentIdentifier')
merge

In [None]:
merge = merge.drop_duplicates(subset=['State Student ID', 'Subject', 'AssessmentType'])
merge
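
To illustrate the merge and de-duplication steps above, here is a minimal sketch on made-up data. The student IDs, names, and scores are invented for illustration; only the column names `State Student ID`, `StudentIdentifier`, `Subject`, and `AssessmentType` come from the cells above.

```python
import pandas as pd

# Toy roster: three enrolled students (IDs are made up for illustration)
toy_roster = pd.DataFrame({
    'State Student ID': [101, 102, 103],
    'Student Name': ['A', 'B', 'C'],
})

# Toy score file: student 104 is not enrolled, and student 101 has a duplicated row
toy_scores = pd.DataFrame({
    'StudentIdentifier': [101, 101, 102, 104],
    'Subject': ['CAST', 'CAST', 'CAST', 'CAST'],
    'AssessmentType': ['Summative', 'Summative', 'Summative', 'Summative'],
    'ScaleScore': [220, 220, 410, 600],
})

# Inner merge keeps only students present in both files
toy_merge = pd.merge(toy_roster, toy_scores, how='inner',
                     left_on='State Student ID', right_on='StudentIdentifier')

# Keep one row per student, subject, and assessment type
toy_merge = toy_merge.drop_duplicates(subset=['State Student ID', 'Subject', 'AssessmentType'])

print(toy_merge[['State Student ID', 'ScaleScore']])
```

Student 103 has no score row and student 104 is not on the roster, so neither appears in the result, and the duplicated row for student 101 is collapsed to one.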

## Redesignated English Learners

Redesignated English learners are counted toward the EL subgroup on the dashboard during their first four years after redesignation. To identify this group, the number of years between the redesignation date and the test date is calculated. Students with fewer than four years between redesignation and testing are given a new category, RL, so they can be selected later.

In [None]:
merge['SubmitDateTime'] = merge['SubmitDateTime'].apply(lambda x: x.split(' ', 1)[0])
merge

In [None]:
# Change submission to datetime
merge['SubmitDateTime']= pd.to_datetime(merge['SubmitDateTime']) 
merge

In [None]:
# Selecting only redesignated students
reclassified = merge.dropna(subset=['Redes Date']).copy() # Copy so later column assignments do not raise SettingWithCopyWarning
reclassified

In [None]:
# Turn redesignated date to a datetime
reclassified['Redes Date']= pd.to_datetime(reclassified['Redes Date']) 
reclassified

In [None]:
# Find number of years between redesignation and test
reclassified['Years'] = reclassified['SubmitDateTime'].dt.year - reclassified['Redes Date'].dt.year

In [None]:
reclassified

In [None]:
# Selecting students under four years of redesignation
newly_reclassified = reclassified[reclassified['Years'] < 4].copy() # Copy so LangFlu can be reassigned safely
newly_reclassified

In [None]:
# Selecting students with four or more years after redesignation
old_reclassified = reclassified[reclassified['Years'] >= 4]
old_reclassified

In [None]:
# Categorize newly redesignated as RL
newly_reclassified['LangFlu'] = 'RL'
newly_reclassified

In [None]:
# Concats the two redesignated dataframes back together
reclassified = pd.concat([newly_reclassified, old_reclassified])
reclassified

In [None]:
# Drop the years column
reclassified = reclassified.drop('Years', axis=1)
reclassified

In [None]:
# Find the non-reclassified students
non_reclassified = merge[merge['Redes Date'].isna()]
non_reclassified

In [None]:
# Combined non-reclassified with the fixed reclassified dataframe
merge = pd.concat([non_reclassified, reclassified])
merge
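
The split-and-recombine logic above can also be sketched in one pass with a boolean mask. This is an alternative sketch on invented data, not the notebook's own cells: the helper name `flag_recent_redesignations` is hypothetical, and the `LangFlu` codes `R` and `E` in the toy data are placeholders. Like the cells above, it compares calendar years only, so a student redesignated in December 2020 and tested in May 2024 counts as four years.

```python
import pandas as pd

def flag_recent_redesignations(df, max_years=4):
    """Label students redesignated fewer than `max_years` calendar years before testing as 'RL'."""
    out = df.copy()
    redes = pd.to_datetime(out['Redes Date'])
    tested = pd.to_datetime(out['SubmitDateTime'])
    # Difference in calendar years; missing redesignation dates give NaN, which compares as False
    years = tested.dt.year - redes.dt.year
    out.loc[years < max_years, 'LangFlu'] = 'RL'
    return out

# Invented rows: two redesignated students and two with no redesignation date
toy = pd.DataFrame({
    'LangFlu': ['R', 'R', 'L', 'E'],
    'Redes Date': ['2022-03-01', '2019-03-01', None, None],
    'SubmitDateTime': ['2024-05-01', '2024-05-01', '2024-05-01', '2024-05-01'],
})

print(flag_recent_redesignations(toy)['LangFlu'].tolist())
```

Only the first student, redesignated two calendar years before the test, is relabeled; students without a redesignation date keep their original code.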

## Selects Group of Interest

Two inputs then select the schools of interest and, optionally, a subgroup to examine.

In [None]:
x = input('What are you interested in (All, Elementary, Secondary, Middle, High)? ') # Input choice of school level

In [None]:
if x == 'All': # Selects all the schools
    merge = merge[merge['School'].isin([1, 2, 4, 6, 7, 8])]
elif x == 'Elementary': # Selects just the elementary schools
    merge = merge[merge['School'].isin([4, 6])]
elif x == 'Secondary': # Selects the middle and high schools
    merge = merge[merge['School'].isin([1, 2, 7, 8])]
elif x == 'Middle': # Selects just the middle schools
    merge = merge[merge['School'].isin([2, 7])]
elif x == 'High': # Selects just the high schools
    merge = merge[merge['School'].isin([1, 8])]

In [None]:
merge

In [None]:
y = input('All or subgroup (All, EL, LTEL, SPED, SED, Hispanic, White, Filipino)? ') # Input subgroup

In [None]:
if y == 'All':
    merge = merge
elif y == 'EL': # Selects EL and newly redesignated students
    merge = merge[(merge['LangFlu'] == 'L') | (merge['LangFlu'] == 'RL') ]
elif y == 'SPED': # Selects SPED students
    merge = merge[merge['SPECIALED Value'] == 'Yes']
elif y == 'SED': # Selects socioeconomic disadvantaged
    merge = merge[merge['DISADVANTAGED Value'] == 'Yes']
elif y == 'Hispanic': # Selects Hispanic students
    merge = merge[merge['HispanicOrLatinoEthnicity'] == 'Yes']
elif y == 'White': # Selects white students
    merge = merge[merge['White'] == 'Yes']
elif y == 'Filipino': # Selects Filipino students
    merge = merge[merge['Filipino'] == 'Yes']
elif y == 'LTEL': # Selects EL students above grade 6 as long-term English learners
    merge = merge[(merge['LangFlu'] == 'L') & (merge['Grade'] > 6) ]

In [None]:
merge

## DFS for CAST

The summative CAST results are selected first. The DFS is then calculated for each student by subtracting the threshold for the grade level at which the student was assessed from the scale score.

In [None]:
cast = merge[merge['Subject'] == 'CAST'] # Select CAST
cast

In [None]:
cast = cast[cast['AssessmentType'] == 'Summative'] # Select the summative CAST
cast

In [None]:
cast = cast[['School', 'Student ID', 'Student Name', 'GradeLevelWhenAssessed', 'Subject', 'LangFlu', 'SPECIALED Value', 
          'DISADVANTAGED Value', 'Description_STU_ETH', 'Description_STU_RC1', 'ScaleScoreAchievementLevel', 
          'ScaleScore']] # Cut it down to columns of interest
cast

In [None]:
# Scale score threshold subtracted for each tested grade
standard_met = {'05': 214, '08': 415, '10': 615, '11': 615, '12': 615}

# Tested grades that belong to each school-level selection
grades_by_level = {
    'All': ['05', '08', '10', '11', '12'],
    'Elementary': ['05'],
    'Middle': ['08'],
    'High': ['10', '11', '12'],
    'Secondary': ['08', '10', '11', '12'],
}

if x in grades_by_level:
    # Keep only the tested grades for the selection; copy to avoid SettingWithCopyWarning
    cast = cast[cast['GradeLevelWhenAssessed'].isin(grades_by_level[x])].copy()
    # DFS is the scale score minus the threshold for the student's grade
    cast['DFS'] = cast['ScaleScore'] - cast['GradeLevelWhenAssessed'].map(standard_met)

In [None]:
cast
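
As a worked example of the DFS calculation, the sketch below subtracts a per-grade threshold from each scale score. The thresholds (214, 415, 615) are the values used in the cell above; the scale scores are invented.

```python
import pandas as pd

# Thresholds taken from the notebook cell above, keyed by grade string
thresholds = {'05': 214, '08': 415, '11': 615}

toy = pd.DataFrame({
    'GradeLevelWhenAssessed': ['05', '08', '11'],
    'ScaleScore': [220, 400, 615],  # invented scores
})

# DFS = scale score minus the threshold for the student's grade
toy['DFS'] = toy['ScaleScore'] - toy['GradeLevelWhenAssessed'].map(thresholds)

print(toy['DFS'].tolist())  # [6, -15, 0]
```

A positive DFS means the student scored above the threshold, a negative DFS means below, and zero means exactly at the threshold.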

## Site Calculations

The site of interest is selected through an input. If any loss students need to be included, their count is entered through a second input, and each is counted as a DFS of -65. The average DFS is then calculated, and a file with the raw data for each student is generated.

In [None]:
q = input('What site are you interested in (All, Delano, or Lost Hills)? ') # Input choice of site

In [None]:
if q == 'Delano': # Selects the Delano schools
    cast = cast[cast['School'].isin([1, 2, 4])]
elif q == 'Lost Hills': # Selects the Lost Hills schools
    cast = cast[cast['School'].isin([6, 7, 8])]
elif q == 'All': # Selects the schools at both sites
    cast = cast[cast['School'].isin([1, 2, 4, 6, 7, 8])]

In [None]:
cast

In [None]:
m = cast['DFS'].sum() # Sums the DFS column
m

In [None]:
n = input("How many CAST loss students: ") # Allows an input for loss to be added in
n = int(n)
n

In [None]:
loss = -65 * n # Creates a total for loss based on input
loss

In [None]:
o = len(cast) # Calculates the number of students that took the test
o

In [None]:
final_count = o + n # Creates a final count that includes tested students and the loss count
final_count

In [None]:
cast_dashboard = (m + loss) / (final_count) # Calculates dashboard prediction by dividing sum of DFS by count
cast_dashboard
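
The arithmetic in the cells above can be collected into a single function. This is a sketch with a hypothetical name, `predict_dashboard`, and invented numbers; the -65 per loss student is the value used in the notebook.

```python
def predict_dashboard(dfs_values, loss_count, loss_dfs=-65):
    """Average DFS over tested students plus `loss_count` students counted at `loss_dfs`."""
    total = sum(dfs_values) + loss_dfs * loss_count
    count = len(dfs_values) + loss_count
    if count == 0:
        raise ValueError('No students to average over')
    return total / count

# Three tested students with DFS 10, -20, and 5, plus two loss students
print(predict_dashboard([10, -20, 5], loss_count=2))  # -27.0
```

Here the tested students sum to -5, the two loss students add -130, and the total of -135 is divided by five students.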

In [None]:
import base64
from IPython.display import HTML

def create_download_link(df, title="CAST DFS", filename="CAST DFS.csv"): # .csv extension so the download opens as a CSV file
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)

create_download_link(cast)