### This notebook selects the relevant outcome-oriented data columns and applies the necessary transformations to them.  

The notebook performs the following steps:  
1. Load merged data from GitHub
2. Transform the STAAR test score data into mutually exclusive performance levels
3. Calculate average dropout rates for grades 07-08 and 09-12 by student group
4. Select the other relevant columns, based on the year of the dataset being used
---

In [18]:
import pandas as pd

1. Load merged data from GitHub

In [29]:
def load_data_from_github(year):
    """
    Loads the merged district-level education data for a given year from the HERC GitHub repository.

    Parameters:
    -----------
    year : int or str
        The year for which to load the data (e.g., 2020).

    Returns:
    --------
    df : pandas.DataFrame or None
        The cleaned district-level dataset for the specified year. Charter schools are filtered out,
        and negative values in numeric columns are replaced with NaN. None is returned if the
        file cannot be loaded.

    Notes:
    ------
    - The data is sourced from the RiceD2KLab/HERC_Sp25 GitHub repository.
    - Assumes the file follows the naming convention 'merged_<year>.csv' and is located under
      0_Datasets/1.0MergedData on the main branch.
    """
    base_url = "https://raw.githubusercontent.com/RiceD2KLab/HERC_Sp25/refs/heads/main/0_Datasets/1.0MergedData"
    csv_url = f"{base_url}/merged_{year}.csv"
    print(csv_url)
    try:
        df = pd.read_csv(csv_url)
    except Exception as e:
        print(f"Data for the year {year} does not exist or cannot be accessed: {e}")
        return None

    # Keep only non-charter districts; copy so the assignment below does not write to a view.
    if 'Charter School (Y/N)' in df.columns:
        df = df[df['Charter School (Y/N)'] == 'N'].copy()

    # Replace negative values in numeric columns with NaN (the default fill value of mask).
    numeric_cols = df.select_dtypes(include='number').columns
    df[numeric_cols] = df[numeric_cols].mask(df[numeric_cols] < 0)

    return df

In [34]:
df24 = load_data_from_github(2024)
df24.head(5)

https://raw.githubusercontent.com/RiceD2KLab/HERC_Sp25/refs/heads/main/0_Datasets/1.0MergedData/merged_2024.csv


Unnamed: 0,DISTRICT_id,DISTNAME,District 2023 Graduates: All Students Count (excluding FHSP/Texas First-DLA),District 2023 Graduates: RHSP/DAP All Students Count_x,District 2023 Graduates: FHSP-E All Students Count_x,District 2023 Graduates: FHSP-DLA All Students Count_x,District 2023 Graduates: RHSP/DAP/FHSP-E/FHSP-DLA/Texas First-DLA All Students Count,District 2023 Graduates: All Students Count All Students Denominator,District 2023 Graduates: FHSP All Students Count All Students Denominator,District 2023 Graduates: African American Count (excluding FHSP/Texas First-DLA),...,"District 2024 Domain 1A: Approaches Grade Level STD, Grade 8, Non-Continuous Enrollee, STAAR Science Rate","District 2024 Domain 1A: Meets Grade Level STD, Grade 8, Non-Continuous Enrollee, STAAR Science Rate","District 2024 Domain 1A: Masters Grade Level STD, Grade 8, Non-Continuous Enrollee, STAAR Science Rate",District,District Number,TEA District Type,TEA Description,NCES District Type,NCES Description,Charter School (Y/N)
0,1902,CAYUGA ISD,,,0.0,39.0,39.0,39.0,39.0,,...,50.0,42.0,0.0,CAYUGA ISD,1902,H,Rural,43,Rural-Remote,N
1,1903,ELKHART ISD,,,3.0,53.0,56.0,66.0,66.0,,...,88.0,58.0,12.0,ELKHART ISD,1903,G,Non-metropolitan Stable,42,Rural-Distant,N
2,1904,FRANKSTON ISD,,,1.0,49.0,50.0,54.0,54.0,,...,50.0,35.0,5.0,FRANKSTON ISD,1904,H,Rural,42,Rural-Distant,N
3,1906,NECHES ISD,,,4.0,21.0,25.0,26.0,26.0,,...,86.0,86.0,0.0,NECHES ISD,1906,H,Rural,42,Rural-Distant,N
4,1907,PALESTINE ISD,,,16.0,217.0,233.0,235.0,235.0,,...,60.0,35.0,6.0,PALESTINE ISD,1907,E,Independent Town,32,Town-Distant,N
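
The two cleaning steps inside the loader (dropping charter schools and blanking negative values) can be tried without network access. The sketch below uses a made-up three-row frame; the `Some Rate` column and its values are hypothetical and only stand in for the numeric columns of the merged file.

```python
import pandas as pd

# Hypothetical toy frame: the charter flag column mirrors the merged file,
# the district names and rates are invented for illustration.
toy = pd.DataFrame({
    "DISTNAME": ["A ISD", "B ISD", "C CHARTER"],
    "Charter School (Y/N)": ["N", "N", "Y"],
    "Some Rate": [12.5, -1.0, 40.0],
})

# Keep non-charter districts; .copy() avoids assigning into a view of `toy`.
cleaned = toy[toy["Charter School (Y/N)"] == "N"].copy()

# Replace negative numeric values with NaN (mask fills with NaN by default).
numeric_cols = cleaned.select_dtypes(include="number").columns
cleaned[numeric_cols] = cleaned[numeric_cols].mask(cleaned[numeric_cols] < 0)

print(cleaned)
```

After this runs, only the two non-charter rows remain and the negative rate has become NaN, so it is skipped by later averages instead of distorting them.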


2. Clean STAAR data so that performance levels are mutually exclusive within each grade level and subject.  
The original dataset reports "Approaches Grade Level", "Meets Grade Level", and "Masters Grade Level" rates for STAAR results. These three groupings are cumulative rather than mutually exclusive: "Approaches Grade Level" also counts students who reached "Meets" or "Masters", and "Meets Grade Level" also counts students who reached "Masters". The function below recomputes the values so that each student falls into exactly one bucket, and adds a new bucket called "Did Not Meet Grade Level".
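
As a worked example of this arithmetic, the sketch below converts cumulative rates into exclusive buckets by subtracting each level from the one below it. The numbers are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Illustrative cumulative rates (percent of students); values are made up.
cumulative = pd.DataFrame({
    "Approaches": [74.0, 90.0],
    "Meets": [38.0, 60.0],
    "Masters": [2.0, 25.0],
})

exclusive = pd.DataFrame({
    "Did Not Meet Grade Level": 100 - cumulative["Approaches"],
    "Approaches Grade Level": cumulative["Approaches"] - cumulative["Meets"],
    "Meets Grade Level": cumulative["Meets"] - cumulative["Masters"],
    "Masters Grade Level": cumulative["Masters"],
})

# The four exclusive buckets partition the tested students,
# so every row should sum to 100.
row_totals = exclusive.sum(axis=1)
print(exclusive)
print(row_totals)
```

Checking that each row sums to 100 is a cheap sanity test to run on the real output as well, allowing for rounding in the published rates.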

In [36]:
def get_subject_level_exclusive_scores(df, subject):
    """
    Returns mutually exclusive STAAR scores (Approaches only, Meets only, Masters, Did Not Meet) by grade level
    for a given subject.

    Args:
        df (pd.DataFrame): Raw district-level STAAR dataset.
        subject (str): One of ['Mathematics', 'Reading/ELA', 'Writing', 'Science', 'Social Studies'].

    Returns:
        pd.DataFrame: Long-format dataframe with DISTNAME, DISTRICT_id, Grade, and exclusive performance levels.
    """
    # Step 1: Build level mapping dynamically
    level_mapping = {
        'Approaches': [col for col in df.columns if subject in col and 'Approaches Grade Level' in col and "Rate" in col and "All Students" in col],
        'Meets': [col for col in df.columns if subject in col and 'Meets Grade Level' in col and "Rate" in col and "All Students" in col],
        'Masters': [col for col in df.columns if subject in col and 'Masters Grade Level' in col and "Rate" in col and "All Students" in col],
    }

    # All three levels are required to compute the exclusive buckets.
    if not all(level_mapping.values()):
        print(f"Warning: Incomplete or missing data for subject '{subject}'.")
        return None

    # Step 2: Create long DataFrames per level
    def melt_level(level):
        cols = level_mapping[level]
        df_level = df[['DISTNAME', 'DISTRICT_id'] + cols].copy()
        df_long = df_level.melt(id_vars=['DISTNAME', 'DISTRICT_id'], value_vars=cols,
                                var_name='raw_column', value_name=level)
        df_long['Grade'] = df_long['raw_column'].str.extract(r'Grade (\d+)')
        return df_long.drop(columns='raw_column')

    df_approaches = melt_level('Approaches')
    df_meets = melt_level('Meets')
    df_masters = melt_level('Masters')

    # Step 3: Merge the levels on DISTNAME, DISTRICT_id, and Grade
    merged = df_approaches.merge(df_meets, on=['DISTNAME', 'DISTRICT_id', 'Grade'], how='inner')
    merged = merged.merge(df_masters, on=['DISTNAME', 'DISTRICT_id', 'Grade'], how='inner')

    # Step 4: Compute mutually exclusive performance levels
    merged['Masters Grade Level'] = merged['Masters']
    merged['Meets Grade Level'] = merged['Meets'] - merged['Masters']
    merged['Approaches Grade Level'] = merged['Approaches'] - merged['Meets']
    merged['Did Not Meet Grade Level'] = 100 - merged['Approaches']

    # Step 5: Select the output columns and round the values
    result = merged[['DISTNAME', 'DISTRICT_id', 'Grade', 'Approaches Grade Level', 'Meets Grade Level', 'Masters Grade Level', 'Did Not Meet Grade Level']]
    return result.round(2)

In [37]:
math_scores = get_subject_level_exclusive_scores(df24, "Mathematics")
math_scores.head(5)

Unnamed: 0,DISTNAME,DISTRICT_id,Grade,Approaches Grade Level,Meets Grade Level,Masters Grade Level,Did Not Meet Grade Level
0,CAYUGA ISD,1902,3,36.0,36.0,2.0,26.0
1,ELKHART ISD,1903,3,30.0,29.0,13.0,28.0
2,FRANKSTON ISD,1904,3,18.0,44.0,21.0,17.0
3,NECHES ISD,1906,3,34.0,35.0,3.0,28.0
4,PALESTINE ISD,1907,3,30.0,26.0,15.0,29.0
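
The reshaping step in the function relies on pulling the grade number out of each column name. The sketch below isolates that step on a hypothetical wide frame. The column names are assumptions modelled on the pattern visible in the merged file, and the "All Grades" column is included only to show what happens when a name has no grade number.

```python
import pandas as pd

prefix = "District 2024 Domain 1A: Meets Grade Level STD, "
suffix = ", All Students, STAAR Mathematics Rate"

# Hypothetical wide frame: one rate column per grade, plus an aggregate column.
wide = pd.DataFrame({
    "DISTRICT_id": [1, 2],
    prefix + "Grade 3" + suffix: [40.0, 55.0],
    prefix + "Grade 8" + suffix: [35.0, 50.0],
    prefix + "All Grades" + suffix: [38.0, 52.0],
})

# Wide to long: one row per (district, original column).
long = wide.melt(id_vars=["DISTRICT_id"], var_name="raw_column", value_name="Meets")

# Extract the digits after "Grade "; names without a grade number yield NaN.
long["Grade"] = long["raw_column"].str.extract(r"Grade (\d+)", expand=False)
long = long.drop(columns="raw_column")

print(long)
```

Two points are worth noting: the extracted `Grade` values are strings, and any column without a "Grade N" token ends up with a missing grade, so such rows may need to be filtered out or labelled before merging.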


3. Calculate Dropout Rates.  
In the raw data, dropout rates are spread across many separate columns: one for each combination of grade range (07-08 or 09-12) and student identity group (e.g., 'African American', 'Economically Disadvantaged', etc.).

After running this function, the data is consolidated: for each student identity group, a single column such as "Hispanic Dropout Rate" is created by averaging the available 07-08 and 09-12 dropout rates. Only the averaged dropout rates (plus district name and ID) are kept, making the dataset much more compact and analysis-ready.
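
To make "averaging the available rates" concrete, the sketch below shows how a row-wise pandas mean treats missing values. The short column labels and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical dropout rates for three districts; None marks a missing rate.
rates = pd.DataFrame({
    "07-08": [0.0, None, None],
    "09-12": [0.5, 1.7, None],
})

# mean(axis=1) skips NaN by default, so a district with only one rate keeps
# that rate, and a district with no rates stays NaN.
avg = rates.mean(axis=1)
print(avg)
```

This is a simple unweighted mean of the two grade ranges. It does not account for differences in enrollment between grades 07-08 and 09-12, which is worth keeping in mind when interpreting the combined rate.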




In [38]:
def compute_dropout_rates(df, year):
    """
    Average the grade 07-08 and 09-12 annual dropout rates for each student group.

    Args:
        df (pd.DataFrame): Raw district-level dataframe with dropout columns.
        year (int): The current reporting year. Dropout rates are based on year - 1.

    Returns:
        pd.DataFrame: DataFrame with combined dropout rates by identity and district.
    """
    identities = ['All Students', 'Male', 'Female', 'African American', 'American Indian', 'Asian',
                  'Hispanic', 'Pacific Islander', 'Two or More Races', 'White', 'Econ Disadv', 
                  'Special Ed', 'At Risk', 'EB/EL']

    dropout_columns = [
        f'District {year - 1} Annual Dropout for Grades 07-08: {id_} Rate'
        for id_ in identities
    ] + [
        f'District {year - 1} Annual Dropout for Grades 09-12: {id_} Rate'
        for id_ in identities
    ]

    existing_columns = ['DISTNAME', 'DISTRICT_id'] + [col for col in dropout_columns if col in df.columns]
    df_dropout = df[existing_columns].copy()

    for identity in identities:
        col1 = f'District {year - 1} Annual Dropout for Grades 07-08: {identity} Rate'
        col2 = f'District {year - 1} Annual Dropout for Grades 09-12: {identity} Rate'

        cols_to_avg = [col for col in [col1, col2] if col in df_dropout.columns]
        if cols_to_avg:
            df_dropout.loc[:, f'{identity} Dropout Rate'] = df_dropout[cols_to_avg].mean(axis=1)

    df_dropout.drop(columns=dropout_columns, inplace=True, errors='ignore')
    return df_dropout

In [40]:
dropout_rates = compute_dropout_rates(df24, 2024)
dropout_rates.head(5)

Unnamed: 0,DISTNAME,DISTRICT_id,All Students Dropout Rate,Male Dropout Rate,Female Dropout Rate,African American Dropout Rate,American Indian Dropout Rate,Asian Dropout Rate,Hispanic Dropout Rate,Pacific Islander Dropout Rate,Two or More Races Dropout Rate,White Dropout Rate,Econ Disadv Dropout Rate,Special Ed Dropout Rate,At Risk Dropout Rate,EB/EL Dropout Rate
0,CAYUGA ISD,1902,0.25,0.0,0.6,0.0,,,0.0,,0.0,0.35,0.6,1.7,0.0,
1,ELKHART ISD,1903,0.0,0.0,0.0,0.0,,,0.0,,0.0,0.0,0.0,0.0,0.0,
2,FRANKSTON ISD,1904,0.35,0.0,0.75,0.0,,,0.0,,0.0,0.45,0.6,0.0,0.0,0.0
3,NECHES ISD,1906,0.0,0.0,0.0,0.0,,,0.0,,,0.0,0.0,0.0,0.0,
4,PALESTINE ISD,1907,0.05,0.0,0.1,0.2,,0.0,0.0,,0.0,0.0,0.05,0.0,0.1,0.0


4. A wide variety of other outcome-oriented columns will be useful for later analysis. This function pulls those columns, resolving their names from the year of the dataset being used and keeping only the ones that exist in that year's file.

In [42]:
# --- Gather the Remaining Outcomes Data ---
def get_existing_columns(df, year):
    """
    Selects columns from a master DataFrame that exist and are relevant to performance indicators.

    Args:
        df (pd.DataFrame): The master district-level DataFrame.
        year (int): Reporting year to resolve dynamic column names.

    Returns:
        pd.DataFrame: Subset of the original DataFrame with only the relevant and existing columns.
    """
    additional_columns = [

    # DREF:

   'DFLCHART',
   'DFLALTED',
   'ASVAB_STATUS',

   # DTYPE:

   'TEA Description',
   'NCES Description',
   'Charter School (Y/N)',

    # Demography
   f'District {year} Student Membership: All Students Count',
   f'District {year} Student Membership: Male Percent',
   f'District {year} Student Membership: Female Percent',
   f'District {year} Student Membership: African American Percent',
   f'District {year} Student Membership: American Indian Percent',
   f'District {year} Student Membership: Asian Percent',
   f'District {year} Student Membership: Hispanic Percent',
   f'District {year} Student Membership: Pacific Islander Percent',
   f'District {year} Student Membership: Two or More Races Percent',
   f'District {year} Student Membership: White Percent',
   f'District {year} Student Membership: Econ Disadv Percent',
   f'District {year} Student Membership: Special Ed Percent',
   f'District {year} Student Membership: Gifted & Talented Percent',
   f'District {year} Student Membership: EB/EL Percent',
   f'District {year} Student Membership: At Risk Percent',
   f'District {year} Student Membership: Immigrant Percent',
   f'District {year} Staff: Teacher Student Ratio',

    # CCMR Rates
   f'District {year - 1} College, Career, & Military Ready Graduates: All Students Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Male Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Female Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: African American Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Hispanic Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: White Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: American Indian Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Asian Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Pacific Islander Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Two or More Races Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Econ Disadv Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: Special Ed Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: EB/EL Rate',
   f'District {year - 1} College, Career, & Military Ready Graduates: At Risk Rate',


    # Attendance Rates
   f'District {year - 1} Attendance: All Students Rate',
   f'District {year - 1} Attendance: Two or More Races Rate',
   f'District {year - 1} Attendance: Asian Rate',
   f'District {year - 1} Attendance: Pacific Islander Rate',
   f'District {year - 1} Attendance: African American Rate',
   f'District {year - 1} Attendance: Hispanic Rate',
   f'District {year - 1} Attendance: White Rate',
   f'District {year - 1} Attendance: American Indian Rate',
   f'District {year - 1} Attendance: Econ Disadv Rate',
   f'District {year - 1} Attendance: Special Ed Rate',
   f'District {year - 1} Attendance: Female Rate',
   f'District {year - 1} Attendance: Male Rate',
   f'District {year - 1} Attendance: EB/EL Rate',
   f'District {year - 1} Attendance: At Risk Rate',

    # Chronic Absenteeism Rates
   f'{year - 1} district Chronic Absenteeism All Students Group: Rate',
   f'{year - 1} district Chronic Absenteeism African American Group: Rate',
   f'{year - 1} district Chronic Absenteeism Hispanic Group: Rate',
   f'{year - 1} district Chronic Absenteeism White Group: Rate',
   f'{year - 1} district Chronic Absenteeism American Indian Group: Rate',
   f'{year - 1} district Chronic Absenteeism Asian Group: Rate',
   f'{year - 1} district Chronic Absenteeism Pacific Islander Group: Rate',
   f'{year - 1} district Chronic Absenteeism Two or More Races Group: Rate',
   f'{year - 1} district Chronic Absenteeism Econ Disadv Group: Rate',
   f'{year - 1} district Chronic Absenteeism Special Ed Group: Rate',
   f'{year - 1} district Chronic Absenteeism EL Group: Rate',
   f'{year - 1} district Chronic Absenteeism At Risk Group: Rate',

    # 4-Year Longitudinal
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for All Students Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Female Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Male Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for African American Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for American Indian Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Asian Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Hispanic Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Pacific Islander Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for White Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Two or More Races Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Econ Disadv Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for Special Ed Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for EB/EL Rate',
   f'District {year - 1} 4-Year Longitudinal: [FHSP-DLA Graduates] for At Risk Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for All Students Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Male Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Female Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for African American Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for American Indian Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Asian Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Hispanic Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Pacific Islander Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for White Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Two or More Races Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Econ Disadv Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for Special Ed Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for EB/EL Rate',
   f'District {year - 1} 4-Year Longitudinal: [RHSP/DAP or FHSP-E/DLA Graduates] for At Risk Rate',

    # AP/IB
   f'District {year - 1} AP/IB Course Completion Graduates: All Students Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: African American Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Hispanic Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: White Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: American Indian Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Asian Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Pacific Islander Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Two or More Races Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Male Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Female Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Econ Disadv Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: Special Ed Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: EB/EL Rate',
   f'District {year - 1} AP/IB Course Completion Graduates: At Risk Rate',
   f'District {year - 1} AP/IB: All Students (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Male (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Female (All Subjects) % Taking',
   f'District {year - 1} AP/IB: African American (All Subjects) % Taking',
   f'District {year - 1} AP/IB: American Indian (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Asian (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Hispanic (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Two or More Races (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Pacific Islander (All Subjects) % Taking',
   f'District {year - 1} AP/IB: White (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Special Ed (All Subjects) % Taking',
   f'District {year - 1} AP/IB: Econ Disadv (All Subjects) % Taking',
   f'District {year - 1} AP/IB: EB/EL (All Subjects) % Taking',
   f'District {year - 1} AP/IB: At Risk (All Subjects) % Taking',
   f'District {year - 1} AP/IB: All Students (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Female (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Male (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: African American (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: American Indian (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Asian (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Hispanic (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Two or More Races (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Pacific Islander (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: White (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Special Ed (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: Econ Disadv (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: EB/EL (All Subjects) % Students Above Criterion',
   f'District {year - 1} AP/IB: At Risk (All Subjects) % Students Above Criterion',

    # SAT/ACT
   f'District {year - 1} SAT/ACT: All Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Female Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Male Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: African American Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: American Indian Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Asian Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Hispanic Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Two or More Races Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Pacific Islander Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: White Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Special Ed Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: Econ Disadv Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: EL Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: At Risk Students, % Above Criterion',
   f'District {year - 1} SAT/ACT: All Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Female Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Male Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: African American Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: American Indian Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Asian Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Hispanic Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Two or More Races Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Pacific Islander Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: White Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Special Ed Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: Econ Disadv Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: EL Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: At Risk Students, % Test-Taking',
   f'District {year - 1} SAT/ACT: All Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Male Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Female Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: African American Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Hispanic Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: White Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: American Indian Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Asian Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Pacific Islander Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Two or More Races Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Econ Disadv Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: At Risk Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: EL Students, % Graduates Above Criterion',
   f'District {year - 1} SAT/ACT: Special Ed Students, % Graduates Above Criterion',
]
    existing_cols = ['DISTNAME', 'DISTRICT_id'] + [col for col in additional_columns if col in df.columns]
    return df[existing_cols].copy()

In [45]:
outcome_data = get_existing_columns(df24, 2024)
outcome_data.head(5)

Unnamed: 0,DISTNAME,DISTRICT_id,DFLCHART,DFLALTED,TEA Description,NCES Description,Charter School (Y/N),District 2024 Student Membership: All Students Count,District 2024 Student Membership: Male Percent,District 2024 Student Membership: Female Percent,...,"District 2023 SAT/ACT: Hispanic Students, % Graduates Above Criterion","District 2023 SAT/ACT: White Students, % Graduates Above Criterion","District 2023 SAT/ACT: American Indian Students, % Graduates Above Criterion","District 2023 SAT/ACT: Asian Students, % Graduates Above Criterion","District 2023 SAT/ACT: Pacific Islander Students, % Graduates Above Criterion","District 2023 SAT/ACT: Two or More Races Students, % Graduates Above Criterion","District 2023 SAT/ACT: Econ Disadv Students, % Graduates Above Criterion","District 2023 SAT/ACT: At Risk Students, % Graduates Above Criterion","District 2023 SAT/ACT: EL Students, % Graduates Above Criterion","District 2023 SAT/ACT: Special Ed Students, % Graduates Above Criterion"
0,CAYUGA ISD,1902,N,N,Rural,Rural-Remote,N,588,52.0,48.0,...,,34.4,,,,,20.0,8.3,,
1,ELKHART ISD,1903,N,N,Non-metropolitan Stable,Rural-Distant,N,1196,51.1,48.9,...,37.5,9.6,,,,,13.3,8.7,,0.0
2,FRANKSTON ISD,1904,N,N,Rural,Rural-Distant,N,802,51.5,48.5,...,50.0,34.1,,,,,16.7,0.0,,14.3
3,NECHES ISD,1906,N,N,Rural,Rural-Distant,N,326,49.4,50.6,...,,9.5,,,,,15.4,,,
4,PALESTINE ISD,1907,N,N,Independent Town,Town-Distant,N,3244,53.0,47.0,...,20.9,31.0,,,,0.0,13.9,3.6,0.0,0.0
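
The selection pattern used above (build year-dependent names, then keep only those present) can be sketched on a toy frame. The variant below also de-duplicates the requested names and records which ones were missing; the `present` and `missing` lists are additions for illustration and are not part of the notebook's function.

```python
import pandas as pd

year = 2024

# Hypothetical one-row frame containing a single prior-year attendance column.
toy = pd.DataFrame({
    "DISTNAME": ["A ISD"],
    "DISTRICT_id": [1],
    f"District {year - 1} Attendance: All Students Rate": [95.1],
})

wanted = [
    f"District {year - 1} Attendance: All Students Rate",
    f"District {year - 1} Attendance: Male Rate",          # absent from `toy`
    f"District {year - 1} Attendance: All Students Rate",  # accidental repeat
]

# dict.fromkeys drops repeated names while preserving order, which prevents
# the same column from being selected twice.
wanted = list(dict.fromkeys(wanted))

present = [col for col in wanted if col in toy.columns]
missing = [col for col in wanted if col not in toy.columns]

subset = toy[["DISTNAME", "DISTRICT_id"] + present].copy()
print(subset.columns.tolist())
print("Not found:", missing)
```

Reporting the names that were not found makes it easier to spot columns that were renamed between reporting years, rather than having them disappear silently.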
