## Overview

Previous years' documents showed that the per-school YouScience results provided by the county included students who were not in 8th grade. Within the algorithm of the original notebook, the relevant purpose of this step was to filter the YouScience files to 8th graders and to track the 8th graders who did not have YouScience results.

### Constraints, Conditions

<table>
    <tr>
        <td>(T) Troubleshooting</td>
        <td>(E) Export constraint</td>
        <td>(F) Future Development</td>
    </tr>
</table>

<ol>
    <li>(T) Known issue: the raw YouScience files contain non-8th-grade students. The files do have a "grad_year" feature, so if you know the graduation year you can filter on it.</li><br>
    <li>(T) Known issue: currently, at least one school's Skyward file contains duplicated students (resolve by dropping duplicates on the student email feature).</li><br>
    <li>(T) Be aware: related to the second issue, future Skyward rosters should be filtered to the 3rd-quarter records, which might resolve the duplicated students; as is, point 2 won't break anything regardless.</li><br>
    <li>(E) Export: incoming YouScience files have titles in the format "YouScience_Cluster_Advising_{school}.xlsx"; exported updated files are expected to be titled like "{school}_YouScience.xlsx".</li><br>
    <li>(F) Future exploration: while irrelevant to the current iteration of the algorithm, the YouScience file does include sex (binary) and ethnicity features. Long term, I initially thought to use these features in part to create metrics for the efficacy of the sorting algorithm.</li><br>
</ol>
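
Points 1 and 2 above can be tried on toy data. The following is a minimal sketch, not the notebook's own code: the helper names `filter_to_eighth_grade` and `dedupe_roster` and the sample emails are hypothetical, while the `grad_year`, `email`, and `Student's School Email` columns and the `event_year + 4` rule are taken from the cells in this notebook.

```python
import pandas as pd

def filter_to_eighth_grade(ys_df, event_year):
    # Rule used in this notebook's cells: 8th graders have grad_year == event_year + 4.
    return ys_df[ys_df["grad_year"] == event_year + 4]

def dedupe_roster(sky_df, email_col="Student's School Email"):
    # Keep the first row for each student email and renumber the index.
    return sky_df.drop_duplicates(subset=email_col).reset_index(drop=True)

# Hypothetical sample data, for illustration only.
ys = pd.DataFrame({
    "email": ["a@x.org", "b@x.org", "c@x.org"],
    "grad_year": [2026, 2027, 2026],
})
roster = pd.DataFrame({
    "Student's School Email": ["a@x.org", "a@x.org", "c@x.org"],
})

eighth = filter_to_eighth_grade(ys, event_year=2022)
clean_roster = dedupe_roster(roster)
```

Here `eighth` keeps only the two rows with `grad_year` 2026, and `clean_roster` keeps one row per email.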

In [1]:
import pandas as pd
import numpy as np
import random
import datetime
import os

# implementing random seed for control
np.random.seed(42)

# school names for consistency 
schools = ['Oakland Middle School',
    'Siegel Middle School',
    'Whitworth-Buchanan Middle School',
    'Christiana Middle School',
    'Smyrna Middle School',
    'Stewarts Creek Middle School',
    'Rockvale Middle School',
    'Rocky Fork Middle School',
    'Blackman Middle School',
    # 'Thurman Francis Arts Academy',
    'Rock Springs Middle School',
    'LaVergne Middle School'
]

In [2]:
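# get the event year; 8th graders are later selected as grad_year == event_year + 4
# the year is entered by hand because the calendar year at run time differs between a fall run and a spring run
try:
    event_year = int(input('Enter 4-digit year of YouScience Event: '))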
except ValueError:
    event_year = 2022
print(f"Running for YS Career Fair {event_year}")

# establishing file folder path
sky_p = f'../YouScienceData/Skyward/{event_year-1}-{event_year}/'
ys_p = f'../YouScienceData/YouScience/{event_year}/'

# create the Updated_YouScience and Missing_YS output folders if they do not exist yet
year_folder_path = f"../YouScienceData/Updated_YouScience/{event_year}"
year_missing_path = f"../YouScienceData/Missing_YS/{event_year}"
os.makedirs(year_folder_path, exist_ok=True)
os.makedirs(year_missing_path, exist_ok=True)

Running for YS Career Fair 2022
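
The folder layout used by this cell could also be gathered into one helper, so that later cells do not rebuild path strings by hand. This is only a sketch under assumptions: `build_paths` and `ensure_output_dirs` are hypothetical names, and the demo writes into a temporary directory rather than `../YouScienceData`.

```python
import tempfile
from pathlib import Path

def build_paths(event_year, root="../YouScienceData"):
    # Mirrors the folder names used in the cell above.
    root = Path(root)
    return {
        "skyward": root / "Skyward" / f"{event_year - 1}-{event_year}",
        "youscience": root / "YouScience" / str(event_year),
        "updated": root / "Updated_YouScience" / str(event_year),
        "missing": root / "Missing_YS" / str(event_year),
    }

def ensure_output_dirs(paths):
    # Only the two output folders are created; input folders must already exist.
    for key in ("updated", "missing"):
        paths[key].mkdir(parents=True, exist_ok=True)

# Demo in a throwaway directory so nothing is written next to the notebook.
with tempfile.TemporaryDirectory() as tmp:
    paths = build_paths(2022, root=tmp)
    ensure_output_dirs(paths)
    created = paths["updated"].is_dir() and paths["missing"].is_dir()
```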


In [4]:
# collecting YS results filed under a school whose roster does not list the student
for school in schools:
    school_sky_p = f'{sky_p}Skyward_{school}.xlsx'
    skyward_df = pd.read_excel(school_sky_p)
    skyward_df.drop_duplicates("Student\'s School Email", inplace=True)

    school_ys_p = f'{ys_p}YouScience_cluster_advising_{school}.csv'
    youscience_all_df = pd.read_csv(school_ys_p)
    # filter to 8th graders only
    youscience_df = youscience_all_df[youscience_all_df.grad_year == event_year + 4]

    # emails with YS results filed under this school but absent from its roster
    ys_emails = set(youscience_df.email)
    sky_emails = set(skyward_df['Student\'s School Email'])
    diff_emails = ys_emails.difference(sky_emails)

    # keep only the rows that belong to those off-roster students
    to_be_stored_df = youscience_df[youscience_df.email.isin(diff_emails)]

    # accumulate the off-roster rows across all schools
    if school == schools[0]:
        storage_df = to_be_stored_df
    else:
        storage_df = pd.concat([storage_df, to_be_stored_df])
    
    

In [5]:
# finding roster students whose YS results were filed under a different school
misplaced_students = []
# build the lookup once instead of recomputing it for every roster row
stored_emails = set(storage_df.email)

for school in schools:
    school_sky_p = f'{sky_p}Skyward_{school}.xlsx'
    skyward_df = pd.read_excel(school_sky_p)
    skyward_df.drop_duplicates("Student\'s School Email", inplace=True)
    for email in skyward_df['Student\'s School Email']:
        if email in stored_emails:
            misplaced_students.append(email)



In [6]:
missing_YS_results = {}
# adding back students with YS results sorted incorrectly
missed_emails = set(misplaced_students)
for school in schools:
    school_sky_p = f'{sky_p}Skyward_{school}.xlsx'
    skyward_df = pd.read_excel(school_sky_p)
    skyward_df.drop_duplicates("Student\'s School Email", inplace=True)

    school_ys_p = f'{ys_p}YouScience_cluster_advising_{school}.csv'
    youscience_all_df = pd.read_csv(school_ys_p)
    # filter to 8th graders only
    youscience_df = youscience_all_df[youscience_all_df.grad_year == event_year + 4]

    sky_emails = set(list(skyward_df['Student\'s School Email']))
    catch_em_all = missed_emails.intersection(sky_emails)
    if len(catch_em_all) > 0:
        print(f"Found {len(catch_em_all)} additional students for {school}.")
        # append the recovered students' rows to this school's results
        recovered_df = storage_df[storage_df.email.isin(catch_em_all)]
        youscience_df = pd.concat([youscience_df, recovered_df]).reset_index(drop=True)

    # roster students still without results after the recovered rows are added
    ys_emails = set(youscience_df.email)
    missing_YS_results_email = list(sky_emails.difference(ys_emails))
    missing_YS_results[school] = missing_YS_results_email

    #export updated ys file
    export_p = f"../YouScienceData/Updated_YouScience/{event_year}/{school}_YouScience.csv"
    youscience_df.to_csv(export_p)

Found 5 additional students for Oakland Middle School.
Found 9 additional students for Siegel Middle School.
Found 6 additional students for Whitworth-Buchanan Middle School.
Found 2 additional students for Christiana Middle School.
Found 6 additional students for Smyrna Middle School.
Found 4 additional students for Stewarts Creek Middle School.
Found 1 additional students for Rockvale Middle School.
Found 3 additional students for Rocky Fork Middle School.
Found 12 additional students for Blackman Middle School.
Found 1 additional students for Rock Springs Middle School.
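
The three passes above (collect off-roster results, match them to the roster that lists them, recompute who is still missing) can be summarised with plain sets. This is a sketch of the logic only, assuming each student is identified by email; the function name `reconcile`, the school names "North" and "South", and the single-letter emails are all hypothetical.

```python
def reconcile(rosters, ys_results):
    """rosters: school -> set of roster emails.
    ys_results: school -> set of emails with results filed under that school."""
    # Pass 1: results filed under a school whose roster does not list the student.
    stray = set()
    for school, emails in ys_results.items():
        stray |= emails - rosters[school]

    updated, missing = {}, {}
    for school, roster in rosters.items():
        # Pass 2: stray results that belong to a student on this school's roster.
        recovered = stray & roster
        # As in the cells above, off-roster results stay in the school's own file.
        updated[school] = ys_results[school] | recovered
        # Pass 3: roster students who still have no results.
        missing[school] = roster - updated[school]
    return updated, missing

# Hypothetical toy data: "d" is on South's roster but filed under North.
rosters = {"North": {"a", "b", "c"}, "South": {"d", "e"}}
ys_results = {"North": {"a", "d"}, "South": {"e", "z"}}

updated, missing = reconcile(rosters, ys_results)
```

In this toy case "d" is added to South's results, "b" and "c" are reported missing for North, and "z" stays in South's file because nothing ties it to another roster.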


In [7]:
# prepping and exporting student info with missing results
for school in schools:
    # read in school roster
    school_sky_p = f'{sky_p}Skyward_{school}.xlsx'
    skyward_df = pd.read_excel(school_sky_p)
    skyward_df.drop_duplicates("Student\'s School Email", inplace=True)
    # get students that don't have YS results
    students_without = missing_YS_results[school]

    # set table schematic
    export_missing_results = {
        'First':[],
        'Last':[],
        'Email':[],
        'School':[],
    }

    # find and build table
    for email in students_without:
        row = skyward_df.loc[skyward_df["Student\'s School Email"] == email]
        export_missing_results['First'].append(row["Student First Name"].values[0])
        export_missing_results['Last'].append(row["Student Last Name"].values[0])
        export_missing_results['Email'].append(row["Student\'s School Email"].values[0])
        export_missing_results['School'].append(school)
    
    # export table
    export_missing_p = f"{year_missing_path}/{school}_missingYS.csv"
    pd.DataFrame(export_missing_results, columns=export_missing_results.keys()).to_csv(export_missing_p)
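
The row-by-row table build above can also be written as a single filter and rename. The following is a hedged sketch rather than a drop-in replacement: `missing_results_table` is a hypothetical helper, the sample roster is invented, and the output rows follow roster order rather than the order of the missing-email list.

```python
import pandas as pd

EMAIL_COL = "Student's School Email"

def missing_results_table(skyward_df, missing_emails, school):
    # Select the roster rows for students without YS results.
    rows = skyward_df[skyward_df[EMAIL_COL].isin(missing_emails)]
    # Keep and rename the three identifying columns used in the export.
    table = rows[["Student First Name", "Student Last Name", EMAIL_COL]].rename(
        columns={
            "Student First Name": "First",
            "Student Last Name": "Last",
            EMAIL_COL: "Email",
        }
    )
    # Add the school name as a constant column and renumber the rows.
    return table.assign(School=school).reset_index(drop=True)

# Hypothetical roster, for illustration only.
roster = pd.DataFrame({
    "Student First Name": ["Ann", "Bo", "Cy"],
    "Student Last Name": ["Lee", "Diaz", "Roe"],
    EMAIL_COL: ["a@x.org", "b@x.org", "c@x.org"],
})

table = missing_results_table(roster, ["b@x.org"], "Oakland Middle School")
```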
    

In [11]:
len(storage_df)/3, len(misplaced_students), round(len(misplaced_students)/(len(storage_df)/3),2)

(262.0, 49, 0.19)

In [16]:
# comparing lengths of original data with more current data
n = 0
for school in schools:
    up_og_path = f'../YouScienceData/Updated_YouScience/{school}_YouScience.csv'
    up_new_path = f'../YouScienceData/Updated_YouScience/2022/{school}_YouScience.csv' 
    og_df = pd.read_csv(up_og_path)
    new_df = pd.read_csv(up_new_path)
    x = int((len(new_df) - len(og_df))/3)
    print(x, school)
    n += x 

print(f'Total students recovered: {n}')

25 Oakland Middle School
34 Siegel Middle School
26 Whitworth-Buchanan Middle School
17 Christiana Middle School
57 Smyrna Middle School
19 Stewarts Creek Middle School
27 Rockvale Middle School
20 Rocky Fork Middle School
37 Blackman Middle School
0 Thurman Francis Arts Academy
25 Rock Springs Middle School
21 LaVergne Middle School
Total students recovered: 308
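
The comparison above divides row counts by 3, which appears to assume three result rows per student; that reading is an inference from the code, not something the notebook states. A count based on unique emails avoids the assumption. In this sketch the helper names and the sample frames are hypothetical, and only the `email` column name comes from the cells above.

```python
import pandas as pd

def count_students(df, email_col="email"):
    # Count distinct students regardless of how many rows each one has.
    return df[email_col].nunique()

def students_recovered(og_df, new_df, email_col="email"):
    return count_students(new_df, email_col) - count_students(og_df, email_col)

# Hypothetical frames: the recovered student "c" has only two rows.
og = pd.DataFrame({"email": ["a@x.org"] * 3 + ["b@x.org"] * 3})
new = pd.DataFrame({"email": ["a@x.org"] * 3 + ["b@x.org"] * 3 + ["c@x.org"] * 2})

recovered = students_recovered(og, new)
by_row_count = int((len(new) - len(og)) / 3)
```

With these frames the unique-email count reports one recovered student, while the row-count division rounds down to zero.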
