### Subject-Level Insights
1. Which subject shows the highest proficiency rates for elementary/middle school students?
2. Are there subjects where a significant percentage of students consistently score at Level 1?
3. How do proficiency rates in Science compare to Math and ELA in middle school?
4. What are the top- and bottom-performing subjects in the Regents Exams?
5. Is there a significant difference in performance between Math Regents and ELA Regents?

#### Which subject shows the highest proficiency rates for elementary/middle school students?

In [2]:
import pandas as pd

In [4]:
# Load datasets
data_ela = pd.read_excel(r'E:\Data Analytics\NYSE Report Card\Annual_EM_ELA.xlsx')

In [6]:
data_math = pd.read_excel(r'E:\Data Analytics\NYSE Report Card\Annual_EM_MATH.xlsx')

In [9]:
data_science = pd.read_excel(r'E:\Data Analytics\NYSE Report Card\Annual_EM_SCIENCE.xlsx')

In [12]:
# Ensure PER_PROF is numeric
data_ela['PER_PROF'] = pd.to_numeric(data_ela['PER_PROF'], errors='coerce')
data_math['PER_PROF'] = pd.to_numeric(data_math['PER_PROF'], errors='coerce')
data_science['PER_PROF'] = pd.to_numeric(data_science['PER_PROF'], errors='coerce')

In [14]:
# Calculate the mean proficiency rate for each subject
ela_proficiency = data_ela['PER_PROF'].mean()
math_proficiency = data_math['PER_PROF'].mean()
science_proficiency = data_science['PER_PROF'].mean()

In [16]:
# Determine the subject with the highest proficiency rate
subjects = ["ELA", "Math", "Science"]
proficiencies = [ela_proficiency, math_proficiency, science_proficiency]

In [18]:
highest_subject = subjects[proficiencies.index(max(proficiencies))]
highest_rate = max(proficiencies)

In [20]:
print(f"The subject with highest proficiency rate is {highest_subject} with a rate of {highest_rate:.2f}%.")

The subject with highest proficiency rate is Math with a rate of 48.31%.


#### Are there subjects where a significant percentage of students consistently score at Level 1?

In [24]:
# Ensure LEVEL1_%TESTED is numeric in each subject's data
data_ela['LEVEL1_%TESTED'] = pd.to_numeric(data_ela['LEVEL1_%TESTED'], errors='coerce')
data_math['LEVEL1_%TESTED'] = pd.to_numeric(data_math['LEVEL1_%TESTED'], errors='coerce')
data_science['LEVEL1_%TESTED'] = pd.to_numeric(data_science['LEVEL1_%TESTED'], errors='coerce')

In [26]:
# Calculate the mean percentage of students scoring at Level 1
ela_level1 = data_ela['LEVEL1_%TESTED'].mean()
math_level1 = data_math['LEVEL1_%TESTED'].mean()
science_level1 = data_science['LEVEL1_%TESTED'].mean()

In [28]:
# Flag subjects whose mean Level 1 percentage exceeds 30%
level1_data = {'ELA': ela_level1, "Math": math_level1, "Science": science_level1}
significant_level1 = {subject: pct for subject, pct in level1_data.items() if pct > 30}

In [30]:
print("Subjects with significant Level 1 Scores:")
print(significant_level1)

Subjects with significant Level 1 Scores:
{}
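
The empty dictionary only says that no subject's overall mean exceeds the 30% cutoff. It does not show the means themselves, nor whether a subject stays high in every year, which is what "consistently" suggests. A minimal sketch of a per-year check follows. It runs on made-up rows and assumes the elementary/middle files carry a `YEAR` column; `SUBJECT_LABEL` and `consistently_above` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Made-up rows standing in for the combined ELA/Math/Science data.
# YEAR is assumed to exist in the real files; SUBJECT_LABEL is a hypothetical
# column that would be added when concatenating the three DataFrames.
sample = pd.DataFrame({
    "SUBJECT_LABEL": ["ELA", "ELA", "Math", "Math", "Science", "Science"],
    "YEAR": [2022, 2023, 2022, 2023, 2022, 2023],
    "LEVEL1_%TESTED": ["35", "32", "28", "41", "s", "12"],
})

# Coerce suppressed or non-numeric entries (such as "s") to NaN.
sample["LEVEL1_%TESTED"] = pd.to_numeric(sample["LEVEL1_%TESTED"], errors="coerce")


def consistently_above(df, threshold=30.0):
    """Return subjects whose mean Level 1 share exceeds `threshold` in every year with data."""
    yearly = df.groupby(["SUBJECT_LABEL", "YEAR"])["LEVEL1_%TESTED"].mean().dropna()
    above_every_year = (yearly > threshold).groupby(level="SUBJECT_LABEL").all()
    return sorted(above_every_year[above_every_year].index)


# Show the per-year means so the reader sees the values, not only the flag.
print(sample.groupby(["SUBJECT_LABEL", "YEAR"])["LEVEL1_%TESTED"].mean())
print(consistently_above(sample))
```

Printing the per-year means alongside the flagged subjects makes it easier to judge how sensitive the result is to the chosen cutoff.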


#### How do proficiency rates in Science compare to Math and ELA in middle school?

In [34]:
# Filter by assessment name (PER_PROF was converted to numeric above).
# Note: these patterns cover grades 3-8, not only the middle school grades.
data_ela_middle = data_ela[data_ela["ASSESSMENT_NAME"].str.contains("ELA3|ELA4|ELA5|ELA6|ELA7|ELA8", na=False)]
data_math_middle = data_math[data_math["ASSESSMENT_NAME"].str.contains("Math3|Math4|Math5|Math6|Math7|Math8", na=False)]
data_science_middle = data_science[data_science["ASSESSMENT_NAME"].str.contains("Science5|Science8", na=False)]

In [36]:
# Calculate mean proficiency rates
ela_middle_proficiency = data_ela_middle["PER_PROF"].mean()
math_middle_proficiency = data_math_middle["PER_PROF"].mean()
science_middle_proficiency = data_science_middle["PER_PROF"].mean()

In [38]:
print("Middle School Proficiency Rates:")
print(f"ELA: {ela_middle_proficiency:.2f}%")
print(f"Math: {math_middle_proficiency:.2f}%")
print(f"Science: {science_middle_proficiency:.2f}%")

Middle School Proficiency Rates:
ELA: 43.36%
Math: 90.08%
Science: 38.71%
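
The filter patterns above list grades 3 through 8, so the figures may mix elementary and middle school results. `str.contains` is also case-sensitive by default. The sketch below narrows the filter to grades 6-8 and ignores case. It runs on invented rows and assumes `ASSESSMENT_NAME` values consist of a subject prefix followed by the grade digit (for example `ELA6` or `MATH7`); the real naming may differ, and `middle_school_rows` is a hypothetical helper.

```python
import pandas as pd

# Invented rows; the real ASSESSMENT_NAME values are assumed to be a subject
# prefix followed by the grade digit.
sample = pd.DataFrame({
    "ASSESSMENT_NAME": ["ELA3", "ELA6", "ELA8", "MATH5", "Math7", None],
    "PER_PROF": [40.0, 50.0, 60.0, 30.0, 45.0, 99.0],
})


def middle_school_rows(df, subject_prefix):
    """Keep rows whose assessment name is `subject_prefix` followed by grade 6, 7, or 8."""
    pattern = rf"^{subject_prefix}[6-8]$"
    mask = df["ASSESSMENT_NAME"].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]


ela_middle = middle_school_rows(sample, "ELA")
math_middle = middle_school_rows(sample, "Math")

print(f"ELA (grades 6-8): {ela_middle['PER_PROF'].mean():.2f}%")
print(f"Math (grades 6-8): {math_middle['PER_PROF'].mean():.2f}%")
```

Checking how many rows each filter keeps (for example with `len(ela_middle)`) before averaging helps catch a pattern that matches far fewer rows than expected.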


#### What are the top- and bottom-performing subjects in the Regents Exams?

In [41]:
# Load Regents data
data_regents = pd.read_excel(r'E:\Data Analytics\NYSE Report Card\Annual_Regents_Exams.xlsx')

In [44]:
data_regents["PER_PROF"] = pd.to_numeric(data_regents["PER_PROF"], errors="coerce")

In [46]:
# Calculate mean proficiency rates by subject
regents_proficiency = data_regents.groupby("SUBJECT")["PER_PROF"].mean()

In [48]:
# Find top and bottom performing subjects
top_subject = regents_proficiency.idxmax()
top_rate = regents_proficiency.max()

In [50]:
bottom_subject = regents_proficiency.idxmin()
bottom_rate = regents_proficiency.min()

In [52]:
print(f"Top Performing Subject: {top_subject} with {top_rate:.2f}%")
print(f"Bottom Performing Subject: {bottom_subject} with {bottom_rate:.2f}%")

Top Performing Subject: Regents Common Core English Language Art with 76.44%
Bottom Performing Subject: Regents Common Core Geometry with 56.09%
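
The means above weight every row equally, so a small school counts as much as a whole district, and rows for different subgroups are averaged together. One alternative, sketched below on invented numbers, keeps only rows where `SUBGROUP_NAME` is "All Students" and computes a pooled rate per subject as total `NUM_PROF` divided by total `TESTED`. Those column names come from the Regents file; the function name and sample values are hypothetical, and whether pooling suits the question is a judgment call.

```python
import pandas as pd

# Invented rows using the Regents column names. In the real file these counts
# may need pd.to_numeric(..., errors="coerce") before summing.
sample = pd.DataFrame({
    "SUBJECT": ["Algebra I", "Algebra I", "Geometry", "Geometry", "Geometry"],
    "SUBGROUP_NAME": ["All Students", "All Students", "All Students", "All Students", "Female"],
    "TESTED": [1000, 10, 500, 20, 250],
    "NUM_PROF": [500, 10, 200, 20, 150],
})


def pooled_proficiency(df):
    """Percent proficient per subject, pooled over 'All Students' rows only."""
    all_students = df[df["SUBGROUP_NAME"] == "All Students"]
    totals = all_students.groupby("SUBJECT")[["NUM_PROF", "TESTED"]].sum()
    return 100 * totals["NUM_PROF"] / totals["TESTED"]


rates = pooled_proficiency(sample)
print(rates.round(2))
print("Top:", rates.idxmax(), "| Bottom:", rates.idxmin())
```

Note that the real file also contains aggregate rows (such as the "High Need/Resource Category" entities), so pooling across all entities could count the same students more than once unless those rows are filtered out first.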


#### Is there a significant difference in performance between Math Regents and ELA Regents?

In [56]:
data_regents.head()

Unnamed: 0,INSTITUTION_ID,ENTITY_CD,ENTITY_NAME,YEAR,SUBJECT,SUBGROUP_NAME,TESTED,NUM_LEVEL1,PER_LEVEL1,NUM_LEVEL2,PER_LEVEL2,NUM_LEVEL3,PER_LEVEL3,NUM_LEVEL4,PER_LEVEL4,NUM_LEVEL5,PER_LEVEL5,NUM_PROF,PER_PROF
0,,1,High Need/Resource Category: New York City Dis...,2023,Regents US History&Gov't (Framework),All Students,57050,7571,13,8982,16,17478,31,15720,28,7299,13,40497,71.0
1,,1,High Need/Resource Category: New York City Dis...,2023,Regents NF Global History,All Students,67848,11613,17,11037,16,23687,35,9180,14,12331,18,45198,67.0
2,,1,High Need/Resource Category: New York City Dis...,2023,Regents Common Core Algebra II,All Students,40168,16005,40,5620,14,8932,22,4689,12,4922,12,18543,46.0
3,,1,High Need/Resource Category: New York City Dis...,2023,Regents Common Core Geometry,All Students,50999,22131,43,8360,16,11561,23,3065,6,5882,12,20508,40.0
4,,1,High Need/Resource Category: New York City Dis...,2023,Regents Common Core Algebra I,All Students,97407,24388,25,18712,19,33709,35,10256,11,10342,11,54307,56.0


In [58]:
data_regents['SUBJECT']

0             Regents US History&Gov't (Framework)
1                        Regents NF Global History
2                   Regents Common Core Algebra II
3                     Regents Common Core Geometry
4                    Regents Common Core Algebra I
                            ...                   
809215    Regents Common Core English Language Art
809216              Regents Common Core Algebra II
809217               Regents Common Core Algebra I
809218                           Regents Algebra I
809219                   Regents Phy Set/Earth Sci
Name: SUBJECT, Length: 809220, dtype: object

In [60]:
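# Filter Regents data for Math and ELA subjects.
# Note: none of the SUBJECT values displayed above contains "Math" (the math exams
# appear as Algebra I, Algebra II, Geometry), so the first filter likely matches no rows.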
math_regents = data_regents[data_regents["SUBJECT"].str.contains("Math", na=False)]
ela_regents = data_regents[data_regents["SUBJECT"].str.contains("English Language Art", na=False)]

In [62]:
# Calculate mean proficiency rates
math_regents_proficiency = math_regents["PER_PROF"].mean()
ela_regents_proficiency = ela_regents["PER_PROF"].mean()

In [64]:
# Compute the difference
difference = abs(math_regents_proficiency - ela_regents_proficiency)

In [66]:
print(f"Math Regents Proficiency: {math_regents_proficiency:.2f}%")
print(f"ELA Regents Proficiency: {ela_regents_proficiency:.2f}%")
print(f"Difference: {difference:.2f}%")

Math Regents Proficiency: nan%
ELA Regents Proficiency: 76.44%
Difference: nan%
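
The `nan` values are what an empty filter produces, since the mean of zero rows is NaN. None of the `SUBJECT` values displayed earlier contains the text "Math"; the math exams appear under names such as Algebra I, Algebra II, and Geometry. A sketch of one possible fix follows, run on a few invented rows. Treating any subject containing "Algebra" or "Geometry" as a Math Regents exam is an assumption, and the full file may include other math exams not visible in the truncated listing.

```python
import pandas as pd

# Invented rows; SUBJECT strings are copied from the listing above,
# PER_PROF values are made up for illustration.
sample = pd.DataFrame({
    "SUBJECT": [
        "Regents Common Core Algebra I",
        "Regents Common Core Geometry",
        "Regents Algebra I",
        "Regents Common Core English Language Art",
        "Regents Phy Set/Earth Sci",
    ],
    "PER_PROF": [56.0, 40.0, 60.0, 76.0, 65.0],
})

# Assumed pattern for math exams; extend it if other math subjects exist.
MATH_PATTERN = "Algebra|Geometry"

math_regents = sample[sample["SUBJECT"].str.contains(MATH_PATTERN, na=False)]
ela_regents = sample[sample["SUBJECT"].str.contains("English Language Art", na=False)]

# Fail loudly instead of silently propagating NaN from an empty selection.
if math_regents.empty or ela_regents.empty:
    raise ValueError("A subject filter matched no rows; check the SUBJECT patterns.")

math_mean = math_regents["PER_PROF"].mean()
ela_mean = ela_regents["PER_PROF"].mean()
difference = abs(math_mean - ela_mean)

print(f"Math Regents Proficiency: {math_mean:.2f}%")
print(f"ELA Regents Proficiency: {ela_mean:.2f}%")
print(f"Difference: {difference:.2f} percentage points")
```

A raw gap between two means does not by itself establish statistical significance; that would require a test that accounts for the spread and number of rows in each group.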
