## PyCitySchools Analysis
* Schools with higher budgets did not achieve better academic results. Surprisingly, schools with higher spending per student underperformed those with lower per-capita expenditure.
* As a whole, small and medium-sized schools significantly outperformed large schools on maths score, reading score and overall passing rate.
* Overall, Independent schools showed a higher level of academic success than Government schools across all metrics. However, further analysis would be required to establish whether this difference is due to available resources or to the teacher-to-student ratio; the data shows a large difference in size between Independent and Government schools.

Dataset limitations:
Upon viewing the data, I observed an imbalance, with 8 Independent schools compared to 7 Government schools. Ideally, for a fair comparison, each school type would be represented by an equal number of schools.

Additionally, there is a notable difference in school sizes, with Government schools being much larger. For a more equitable analysis, it would be beneficial to generate insights based on an equal number of students and their achievements. This matters because average scores and overall passing percentages are more sensitive to fluctuations when based on a lower number of students: a single poor result has more influence on the overall academic success rate.
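
To illustrate this sensitivity, here is a minimal sketch using the smallest (427) and largest (4,976) school sizes reported in this notebook. The helper `pass_rate_shift` is hypothetical and not part of the analysis; it simply shows how many percentage points a school's passing rate moves when one student changes from pass to fail.

```python
def pass_rate_shift(school_size: int) -> float:
    """Percentage-point change in passing rate when one student flips from pass to fail."""
    return 100 / school_size


small_shift = pass_rate_shift(427)   # smallest school in the dataset
large_shift = pass_rate_shift(4976)  # largest school in the dataset

print(f"427 students:  {small_shift:.3f} percentage points per student")
print(f"4976 students: {large_shift:.3f} percentage points per student")
```

Under these assumptions a single result carries roughly ten times more weight in the smallest school than in the largest one.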

In [1]:
# Dependencies and Setup
import pandas as pd
from pathlib import Path

# File to Load (Remember to Change These)
school_data_to_load = Path("../Resources/schools_complete.csv")
student_data_to_load = Path("../Resources/students_complete.csv")

# Read School and Student Data File and store into Pandas DataFrames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)

# Combine the data into a single dataset.  
school_data_complete = pd.merge(student_data, school_data, how="left", on="school_name")
school_data_complete.head() 

Unnamed: 0,Student ID,student_name,gender,year,school_name,reading_score,maths_score,School ID,type,size,budget
0,0,Paul Bradley,M,9,Huang High School,96,94,0,Government,2917,1910635
1,1,Victor Smith,M,12,Huang High School,90,43,0,Government,2917,1910635
2,2,Kevin Rodriguez,M,12,Huang High School,41,76,0,Government,2917,1910635
3,3,Richard Scott,M,12,Huang High School,89,86,0,Government,2917,1910635
4,4,Bonnie Ray,F,9,Huang High School,87,69,0,Government,2917,1910635


## Local Government Area Summary
* There are 15 schools.
* The total number of students is 39,170.
* The total budget for the area is $24,649,428.
* The average maths score across the area's schools is 70.34.
* The average reading score across the area's schools is 69.98.
* The percentage of students with a passing maths score (50 or greater) is 86.08%.
* The percentage of students with a passing reading score (50 or greater) is 84.43%.
* The percentage of students who passed both reading and maths is 72.81%.
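
As a compact alternative to counting rows and dividing, the passing percentages can be derived from boolean masks. The sketch below assumes only the five sample rows shown in the merged table above, so its results differ from the area-wide figures; `PASS_MARK` is a hypothetical constant introduced for illustration.

```python
import pandas as pd

# Five sample rows copied from the merged table above (not the full dataset)
sample = pd.DataFrame({
    "reading_score": [96, 90, 41, 89, 87],
    "maths_score": [94, 43, 76, 86, 69],
})

PASS_MARK = 50  # hypothetical name for the passing threshold used in this notebook

passed_maths = sample["maths_score"] >= PASS_MARK
passed_reading = sample["reading_score"] >= PASS_MARK

# The mean of a boolean Series is the fraction of True values
pct_passing_maths = passed_maths.mean() * 100
pct_passing_reading = passed_reading.mean() * 100
pct_passing_both = (passed_maths & passed_reading).mean() * 100

print(pct_passing_maths, pct_passing_reading, pct_passing_both)
```

Taking the mean of a mask avoids the separate count and division steps, and it cannot accidentally use a denominator from a different DataFrame.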

In [2]:
#Counting the number of schools
school_count = len(school_data_complete["school_name"].unique())
school_count

15

In [3]:
#Counting the number of students
student_count = len(student_data["Student ID"].unique())
student_count

39170

In [4]:
#Calculating total budget
budget_total = school_data["budget"].sum()
budget_total

24649428

In [5]:
#Calculating average maths score
avreage_maths_score = student_data["maths_score"].mean()
avreage_maths_score

70.33819249425581

In [6]:
#Calculating average reading score
average_reading_score = student_data["reading_score"].mean()
average_reading_score

69.98013786060761

In [7]:
#Calculating the percentage of students with a passing score (50 or higher) in maths
passing_score = student_data[(student_data["maths_score"]>=50)].count()["student_name"]
passing_score_percentage = passing_score/float(student_count)*100
passing_score_percentage

86.07863160582077

In [8]:
#Calculating the percentage of students with a passing score (50 or higher) in reading
reading_passing_score = school_data_complete[(school_data_complete["reading_score"]>=50)].count()["student_name"]
reading_passing_score_percentage = reading_passing_score/float(student_count)*100
reading_passing_score_percentage

84.42685728874139

In [9]:
#Calculating the percentage of students with a passing score (50 or higher) in reading and maths
both_passing_score = school_data_complete[(school_data_complete["reading_score"]>=50) & (school_data_complete["maths_score"]>=50)].count()["student_name"]
both_passing_score_percentage = both_passing_score/float(student_count)*100
both_passing_score_percentage

72.80827163645647

In [10]:
#Creating dataframe to hold the results
area_summary = pd.DataFrame({"Total Schools" : [school_count],
                             "Total Students" : [student_count],
                             "Total Budget" : [budget_total],
                             "Average Maths Score" : [avreage_maths_score],
                             "Average Reading Score" : [average_reading_score],
                             "% Passing Maths" : [passing_score_percentage],
                             "% Passing Reading" : [reading_passing_score_percentage],
                             "% Overall Passing" : [both_passing_score_percentage]})

#Copying the summary to keep the original for future reference
area_summary_copy = area_summary.copy()

#Setting the right format
area_summary_copy["Total Students"] = area_summary_copy["Total Students"].map("{:,}".format)
area_summary_copy["Total Budget"] = area_summary_copy["Total Budget"].map("${:,.2f}".format)
area_summary_copy["Average Maths Score"] = area_summary_copy["Average Maths Score"].map("{:,.6f}".format)
area_summary_copy["Average Reading Score"] = area_summary_copy["Average Reading Score"].map("{:,.6f}".format)
area_summary_copy["% Passing Maths"] = area_summary_copy["% Passing Maths"].map("{:,.3f}%".format)
area_summary_copy["% Passing Reading"] = area_summary_copy["% Passing Reading"].map("{:,.3f}%".format)
area_summary_copy["% Overall Passing"] = area_summary_copy["% Overall Passing"].map("{:,.3f}%".format)

area_summary_copy

Unnamed: 0,Total Schools,Total Students,Total Budget,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
0,15,39170,"$24,649,428.00",70.338192,69.980138,86.079%,84.427%,72.808%


## School Summary
* Government schools are significantly larger than Independent schools.

    For Independent setup - 427 to 2283 students.
    
    For Government setup - 2739 to 4976 students.
    
* A surprising finding is that the budget per student in Independent schools is not higher, contrary to what the general population might assume. According to this data, Government schools appear to receive better funding per capita.

    For Independent setup - $578 to $638 per student.
    
    For Government setup - $628 to $655 per student.

While Government Schools often have larger budgets, this does not necessarily translate into higher academic performance.
* Schools operating independently, with a few exceptions, generally outperformed government schools in passing rates of mathematics, reading and overall success. The data indicates that better outcomes are more prevalent in private educational setups.
* Passing rates span a similar range in both subjects: roughly 80-91% in mathematics and 81-89% in reading, so the highest school-level passing rates are slightly higher in mathematics.
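
The size and per-student budget ranges quoted above can be derived with a grouped aggregation. This is a minimal sketch that assumes only the five schools visible in the outputs of cells 11-13, so the resulting ranges are narrower than the full-dataset ranges; the column name `per_student_budget` is introduced here for illustration.

```python
import pandas as pd

# Five schools copied from the outputs of cells 11-13 (not the full dataset)
schools = pd.DataFrame({
    "school_name": ["Huang High School", "Figueroa High School", "Shelton High School",
                    "Hernandez High School", "Griffin High School"],
    "type": ["Government", "Government", "Independent", "Government", "Independent"],
    "size": [2917, 2949, 1761, 4635, 1468],
    "budget": [1910635, 1884411, 1056600, 3022020, 917500],
})

schools["per_student_budget"] = schools["budget"] / schools["size"]

# Minimum and maximum size and per-student budget within each school type
ranges = schools.groupby("type")[["size", "per_student_budget"]].agg(["min", "max"])
print(ranges)
```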



In [11]:
#Sourcing schooltypes from the dataset (School name and type)
school_types = school_data.set_index(["school_name"])["type"]
school_types.head()


school_name
Huang High School         Government
Figueroa High School      Government
Shelton High School      Independent
Hernandez High School     Government
Griffin High School      Independent
Name: type, dtype: object

In [12]:
#Calculating number of students per school
students_per_school = school_data.set_index(["school_name"])["size"]
students_per_school.head()

school_name
Huang High School        2917
Figueroa High School     2949
Shelton High School      1761
Hernandez High School    4635
Griffin High School      1468
Name: size, dtype: int64

In [13]:
#Sourcing total school budget
budget_per_school = school_data.set_index(["school_name"])["budget"]
budget_per_school.head()

school_name
Huang High School        1910635
Figueroa High School     1884411
Shelton High School      1056600
Hernandez High School    3022020
Griffin High School       917500
Name: budget, dtype: int64

In [14]:
#Calculating budget per student
budget_per_student = budget_per_school/students_per_school
budget_per_student.head()

school_name
Huang High School        655.0
Figueroa High School     639.0
Shelton High School      600.0
Hernandez High School    652.0
Griffin High School      625.0
dtype: float64

In [15]:
#Grouping the school
grouped_school = school_data_complete.groupby(["school_name"])
#Calculating average maths score for each school
ave_maths_score = grouped_school["maths_score"].mean()
ave_maths_score.head()



school_name
Bailey High School      72.352894
Cabrera High School     71.657158
Figueroa High School    68.698542
Ford High School        69.091274
Griffin High School     71.788147
Name: maths_score, dtype: float64

In [16]:
#Calculating average reading score for each school
ave_reading_score = grouped_school["reading_score"].mean()

ave_reading_score.head()

school_name
Bailey High School      71.008842
Cabrera High School     71.359526
Figueroa High School    69.077993
Ford High School        69.572472
Griffin High School     71.245232
Name: reading_score, dtype: float64

In [17]:
#Sourcing data of students who passed in maths
pass_math_school = school_data_complete[school_data_complete["maths_score"]>=50]
pass_math_school.head()

Unnamed: 0,Student ID,student_name,gender,year,school_name,reading_score,maths_score,School ID,type,size,budget
0,0,Paul Bradley,M,9,Huang High School,96,94,0,Government,2917,1910635
2,2,Kevin Rodriguez,M,12,Huang High School,41,76,0,Government,2917,1910635
3,3,Richard Scott,M,12,Huang High School,89,86,0,Government,2917,1910635
4,4,Bonnie Ray,F,9,Huang High School,87,69,0,Government,2917,1910635
5,5,Bryan Miranda,M,9,Huang High School,88,93,0,Government,2917,1910635


In [18]:
#Sourcing data of students who passed in reading
pass_reading_school = school_data_complete[school_data_complete["reading_score"]>=50]
pass_reading_school.head()

Unnamed: 0,Student ID,student_name,gender,year,school_name,reading_score,maths_score,School ID,type,size,budget
0,0,Paul Bradley,M,9,Huang High School,96,94,0,Government,2917,1910635
1,1,Victor Smith,M,12,Huang High School,90,43,0,Government,2917,1910635
3,3,Richard Scott,M,12,Huang High School,89,86,0,Government,2917,1910635
4,4,Bonnie Ray,F,9,Huang High School,87,69,0,Government,2917,1910635
5,5,Bryan Miranda,M,9,Huang High School,88,93,0,Government,2917,1910635


In [19]:
#Sourcing data of students who passed in both
pass_both_school = school_data_complete[(school_data_complete["maths_score"]>=50) & (school_data_complete["reading_score"]>=50) ]
pass_both_school.head()

Unnamed: 0,Student ID,student_name,gender,year,school_name,reading_score,maths_score,School ID,type,size,budget
0,0,Paul Bradley,M,9,Huang High School,96,94,0,Government,2917,1910635
3,3,Richard Scott,M,12,Huang High School,89,86,0,Government,2917,1910635
4,4,Bonnie Ray,F,9,Huang High School,87,69,0,Government,2917,1910635
5,5,Bryan Miranda,M,9,Huang High School,88,93,0,Government,2917,1910635
6,6,Sheena Carter,F,11,Huang High School,73,60,0,Government,2917,1910635


In [20]:
#Calculating the percentages of students who passed maths
pass_math_school_percentage = pass_math_school.groupby(["school_name"]).count()["student_name"]/ students_per_school*100
pass_math_school_percentage.head()

school_name
Bailey High School      91.639871
Cabrera High School     90.850377
Figueroa High School    81.654798
Ford High School        82.438846
Griffin High School     91.212534
dtype: float64

In [21]:
#Calculating the percentages of students who passed reading
pass_reading_school_percentage = pass_reading_school.groupby(["school_name"]).count()["student_name"]/ students_per_school*100
pass_reading_school_percentage.head()

school_name
Bailey High School      87.379421
Cabrera High School     89.074273
Figueroa High School    82.807731
Ford High School        82.219788
Griffin High School     88.487738
dtype: float64

In [22]:
#Calculating the percentages of students who passed both in maths and reading
pass_both_school_percentage = pass_both_school.groupby(["school_name"]).count()["student_name"]/ students_per_school*100
pass_both_school_percentage

school_name
Bailey High School       80.084405
Cabrera High School      80.785791
Figueroa High School     67.650051
Ford High School         67.469880
Griffin High School      81.335150
Hernandez High School    66.364617
Holden High School       78.922717
Huang High School        66.712376
Johnson High School      67.191766
Pena High School         79.209979
Rodriguez High School    79.419855
Shelton High School      78.875639
Thomas High School       69.480122
Wilson High School       67.455103
Wright High School       79.722222
dtype: float64

In [23]:

#Creating dataframe to hold the results
school_area_summary = pd.DataFrame({"School Type" : school_types,
                             "Total Students" : students_per_school,
                             "Total School Budget" : budget_per_school,
                             "Per Student Budget" : budget_per_student,
                             "Average Maths Score" : ave_maths_score,
                             "Average Reading Score" : ave_reading_score,
                             "% Passing Maths" : pass_math_school_percentage,
                             "% Passing Reading" : pass_reading_school_percentage,
                             "% Overall Passing" : pass_both_school_percentage})

#Copying the summary to keep the original for future reference
school_area_summary_copy = school_area_summary.copy()

#Setting the right format
school_area_summary_copy["Total School Budget"] = school_area_summary_copy["Total School Budget"].map("${:,.2f}".format)
school_area_summary_copy["Per Student Budget"] = school_area_summary_copy["Per Student Budget"].map("${:,.2f}".format)
school_area_summary_copy["Average Maths Score"] = school_area_summary_copy["Average Maths Score"].map("{:,.6f}".format)
school_area_summary_copy["Average Reading Score"] = school_area_summary_copy["Average Reading Score"].map("{:,.6f}".format)
#Formatting the existing "% Passing Maths" column (the key "% Passing Math" would create a new column instead)
school_area_summary_copy["% Passing Maths"] = school_area_summary_copy["% Passing Maths"].map("{:,.3f}%".format)
school_area_summary_copy["% Passing Reading"] = school_area_summary_copy["% Passing Reading"].map("{:,.3f}%".format)
school_area_summary_copy["% Overall Passing"] = school_area_summary_copy["% Overall Passing"].map("{:,.3f}%".format)


#Displaying the table
school_area_summary_copy



school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
Bailey High School,Government,4976,"$3,124,928.00",$628.00,72.352894,71.008842,91.640%,87.379%,80.084%
Cabrera High School,Independent,1858,"$1,081,356.00",$582.00,71.657158,71.359526,90.850%,89.074%,80.786%
Figueroa High School,Government,2949,"$1,884,411.00",$639.00,68.698542,69.077993,81.655%,82.808%,67.650%
Ford High School,Government,2739,"$1,763,916.00",$644.00,69.091274,69.572472,82.439%,82.220%,67.470%
Griffin High School,Independent,1468,"$917,500.00",$625.00,71.788147,71.245232,91.213%,88.488%,81.335%
Hernandez High School,Government,4635,"$3,022,020.00",$652.00,68.874865,69.186408,80.949%,81.877%,66.365%
Holden High School,Independent,427,"$248,087.00",$581.00,72.583138,71.660422,89.930%,88.525%,78.923%
Huang High School,Government,2917,"$1,910,635.00",$655.00,68.935207,68.910525,81.694%,81.454%,66.712%
Johnson High School,Government,4761,"$3,094,650.00",$650.00,68.8431,69.039277,82.063%,81.979%,67.192%
Pena High School,Independent,962,"$585,858.00",$609.00,72.088358,71.613306,91.684%,86.590%,79.210%


## Highest-Performing Schools (by % Overall Passing)
The five schools with the highest overall passing rates include three Independent schools and two Government schools. The comparison reveals variations in academic performance and budget utilisation: the Independent schools in this group achieve their results with lower per-student budgets, while the Government schools face the challenge of serving considerably larger student populations.

In [24]:
top_schools = school_area_summary.sort_values(by='% Overall Passing', ascending = False)
top_schools.head()

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
Griffin High School,Independent,1468,917500,625.0,71.788147,71.245232,91.212534,88.487738,81.33515
Cabrera High School,Independent,1858,1081356,582.0,71.657158,71.359526,90.850377,89.074273,80.785791
Bailey High School,Government,4976,3124928,628.0,72.352894,71.008842,91.639871,87.379421,80.084405
Wright High School,Independent,1800,1049400,583.0,72.047222,70.969444,91.777778,86.666667,79.722222
Rodriguez High School,Government,3999,2547363,637.0,72.047762,70.935984,90.797699,87.396849,79.419855


## Lowest-Performing Schools (by % Overall Passing)
The analysis sheds light on the academic performance of the five schools with the lowest overall passing rates. Government schools have greater difficulty achieving a high overall passing percentage: four of the bottom five schools are Government schools and only one is Independent, a 4:1 ratio in favour of Independent institutions. This is consistent with Independent schools generally performing at a higher level.

Surprisingly, the lowest-performing schools tend to have a higher budget per capita (four of the five receive $644 or more per student), which suggests that a higher budget does not by itself lead to better performance.
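
One way to put a number on the relationship between budget and performance is a Pearson correlation coefficient. The sketch below is illustrative only: it assumes just the ten schools that appear in the highest- and lowest-performing tables, so the five middle schools are missing, and the column names are introduced here rather than taken from the notebook.

```python
import pandas as pd

# Per-student budget and % overall passing for the ten schools listed in the
# highest- and lowest-performing tables (the five middle schools are excluded)
schools = pd.DataFrame({
    "per_student_budget": [625, 582, 628, 583, 637, 652, 655, 650, 578, 644],
    "pct_overall_passing": [81.335150, 80.785791, 80.084405, 79.722222, 79.419855,
                            66.364617, 66.712376, 67.191766, 67.455103, 67.469880],
})

# A negative coefficient means higher spending goes together with lower passing rates
r = schools["per_student_budget"].corr(schools["pct_overall_passing"])
print(f"Pearson r = {r:.2f}")
```

A negative coefficient here describes an association only; it does not show that spending causes lower results, since school type and size differ along with the budget.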

In [25]:
bottom_schools = school_area_summary.sort_values(by='% Overall Passing', ascending = True)
bottom_schools.head()

school_name,School Type,Total Students,Total School Budget,Per Student Budget,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
Hernandez High School,Government,4635,3022020,652.0,68.874865,69.186408,80.949299,81.877023,66.364617
Huang High School,Government,2917,1910635,655.0,68.935207,68.910525,81.693521,81.453548,66.712376
Johnson High School,Government,4761,3094650,650.0,68.8431,69.039277,82.062592,81.978576,67.191766
Wilson High School,Independent,2283,1319574,578.0,69.170828,68.876916,82.785808,81.29654,67.455103
Ford High School,Government,2739,1763916,644.0,69.091274,69.572472,82.438846,82.219788,67.46988


## Maths Scores by Year
* Average maths score for 9th year ranges from 68.47 to 72.93.
* Average maths score for 10th year ranges from 67.99 to 75.10.
* Average maths score for 11th year ranges from 68.37 to 73.27.
* Average maths score for 12th year ranges from 68.61 to 73.40.

Overall, students in all year levels have similar results, with school averages ranging from about 68 to 75.
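
The school-by-year table can also be built in a single step with `pivot_table`, instead of filtering each year separately. This is a sketch on a hypothetical miniature dataset (two invented schools, two year levels) that mirrors the column names of `student_data`.

```python
import pandas as pd

# Hypothetical miniature of student_data: two invented schools, two year levels
students = pd.DataFrame({
    "school_name": ["A High", "A High", "A High", "B High", "B High", "B High"],
    "year": [9, 9, 10, 9, 10, 10],
    "maths_score": [70, 80, 65, 90, 60, 70],
})

# One row per school, one column per year, mean maths score in each cell
maths_by_year = students.pivot_table(
    index="school_name", columns="year", values="maths_score", aggfunc="mean"
)
print(maths_by_year)
```

The same call with `values="reading_score"` would produce the reading table, which avoids repeating the four year filters.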


In [26]:

#Separating data based on a year
ninth_year = student_data[student_data["year"] == 9]
tenth_year = student_data[student_data["year"] == 10]
eleventh_year = student_data[student_data["year"] == 11]
twelfth_year = student_data[student_data["year"] == 12]

#Grouping data by the school name and taking the average score in maths
ninth_year_maths_score = ninth_year.groupby(['school_name'])['maths_score'].mean()
tenth_year_maths_score = tenth_year.groupby(['school_name'])['maths_score'].mean()
eleventh_year_maths_score = eleventh_year.groupby(['school_name'])['maths_score'].mean()
twelfth_year_maths_score = twelfth_year.groupby(['school_name'])['maths_score'].mean()

#Creating a dataframe to hold the results
maths_scores_by_year = pd.DataFrame({"9th": ninth_year_maths_score,
                                    "10th": tenth_year_maths_score,
                                    "11th": eleventh_year_maths_score,
                                    "12th": twelfth_year_maths_score
                                    })

#Displaying and setting the index
maths_scores_by_year.index.name = None
maths_scores_by_year 



Unnamed: 0,9th,10th,11th,12th
Bailey High School,72.493827,71.897498,72.3749,72.675097
Cabrera High School,72.32197,72.437768,71.008299,70.604712
Figueroa High School,68.477804,68.331586,68.811001,69.325282
Ford High School,69.021609,69.387006,69.248862,68.617811
Griffin High School,72.789731,71.093596,71.692521,71.469178
Hernandez High School,68.586831,68.867156,69.154412,68.985075
Holden High School,70.543307,75.105263,71.640777,73.409639
Huang High School,69.081754,68.533246,69.431345,68.639316
Johnson High School,69.469286,67.99022,68.63773,69.287393
Pena High School,71.996364,72.396,72.523438,71.187845


## Reading Scores by Year
* Average reading score for 9th year ranges from 68.48 to 72.03.
* Average reading score for 10th year ranges from 67.68 to 72.32.
* Average reading score for 11th year ranges from 68.42 to 73.31.
* Average reading score for 12th year ranges from 67.99 to 72.20.

These ranges are taken from the ten schools displayed in the output below. Overall, students in all year levels have similar results, with school averages between about 67 and 74. The lowest average occurs in 10th year and the highest in 11th year.
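
Year-level ranges like these can be read off the by-year table programmatically rather than by eye. The sketch below assumes only the ten schools displayed in the output of cell 27, so the ranges it reports cover those rows and not necessarily all 15 schools; `reading_scores_shown` is a name introduced for illustration.

```python
import pandas as pd

# Average reading scores for the ten schools displayed in the output of cell 27
reading_scores_shown = pd.DataFrame(
    {
        "9th": [70.90192, 71.172348, 70.261682, 69.615846, 72.026895,
                68.477569, 71.598425, 68.670616, 68.719286, 70.949091],
        "10th": [70.848265, 71.328326, 67.677588, 68.988701, 70.746305,
                 70.621842, 71.096491, 69.516297, 69.295029, 72.324],
        "11th": [70.317346, 71.201245, 69.152327, 70.735964, 72.385042,
                 68.418199, 73.31068, 68.740638, 69.969115, 71.703125],
        "12th": [72.195525, 71.856021, 69.082126, 68.849722, 69.434932,
                 69.244136, 70.481928, 68.671795, 67.992521, 71.513812],
    },
    index=["Bailey High School", "Cabrera High School", "Figueroa High School",
           "Ford High School", "Griffin High School", "Hernandez High School",
           "Holden High School", "Huang High School", "Johnson High School",
           "Pena High School"],
)

# Lowest and highest school average for each year level
year_ranges = reading_scores_shown.agg(["min", "max"])
print(year_ranges.round(2))

# Which school holds the lowest 10th-year average
print(reading_scores_shown["10th"].idxmin())
```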

In [27]:
#Separating data based on a year
ninth_year = student_data[student_data["year"] == 9]
tenth_year = student_data[student_data["year"] == 10]
eleventh_year = student_data[student_data["year"] == 11]
twelfth_year = student_data[student_data["year"] == 12]

#Grouping data by the school name and taking the average score in reading
ninth_year_reading_score = ninth_year.groupby(['school_name'])['reading_score'].mean()
tenth_year_reading_score = tenth_year.groupby(['school_name'])['reading_score'].mean()
eleventh_year_reading_score = eleventh_year.groupby(['school_name'])['reading_score'].mean()
twelfth_year_reading_score = twelfth_year.groupby(['school_name'])['reading_score'].mean()

#Creating a dataframe to hold the results
reading_scores_by_year = pd.DataFrame({"9th": ninth_year_reading_score,
                                    "10th": tenth_year_reading_score,
                                    "11th": eleventh_year_reading_score,
                                    "12th": twelfth_year_reading_score
                                    })

#Displaying and setting the index
reading_scores_by_year.index.name = None 
reading_scores_by_year 

Unnamed: 0,9th,10th,11th,12th
Bailey High School,70.90192,70.848265,70.317346,72.195525
Cabrera High School,71.172348,71.328326,71.201245,71.856021
Figueroa High School,70.261682,67.677588,69.152327,69.082126
Ford High School,69.615846,68.988701,70.735964,68.849722
Griffin High School,72.026895,70.746305,72.385042,69.434932
Hernandez High School,68.477569,70.621842,68.418199,69.244136
Holden High School,71.598425,71.096491,73.31068,70.481928
Huang High School,68.670616,69.516297,68.740638,68.671795
Johnson High School,68.719286,69.295029,69.969115,67.992521
Pena High School,70.949091,72.324,71.703125,71.513812


## Scores by School Spending
* Schools with a budget per student below $585 show reasonable average maths and reading scores. Their passing rates are relatively high, giving an overall passing rate of 76.72%.
* Performance peaks in the $585-630 range (79.88% overall passing) and declines as spending per student increases beyond it, down to 66.76% in the $645-680 range. This indicates that higher spending does not necessarily correlate with improved academic outcomes.
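
The spending ranges are assigned with `pd.cut`, and the behaviour at the bin edges is worth checking. The sketch below reuses the bin edges and labels defined in cell 28 and applies them to a few per-student budgets from the school summary, plus 585.0 as a hypothetical boundary case that does not occur in the data.

```python
import pandas as pd

spending_bins = [0, 585, 630, 645, 680]
bin_labels = ["<$585", "$585-630", "$630-645", "$645-680"]

# Budgets from the school summary, plus 585.0 as a hypothetical boundary case
budgets = pd.Series([578.0, 585.0, 600.0, 639.0, 655.0])

# pd.cut uses right-closed intervals by default: (0, 585], (585, 630], ...
assigned = pd.cut(budgets, spending_bins, labels=bin_labels, include_lowest=True)
print(assigned.tolist())
```

Because the intervals are closed on the right, a school spending exactly $585 would land in the `<$585` bin, so the labels are slightly looser than the bin edges.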


In [28]:
# Establishing the bins
spending_bins = [0, 585, 630, 645, 680]
bin_labels = ["<$585", "$585-630", "$630-645", "$645-680"]


In [29]:

#Creating a copy of dataframe
spending_summary = school_area_summary.loc[:,['Average Maths Score',
                                         'Average Reading Score',
                                         '% Passing Maths',
                                         '% Passing Reading',
                                         '% Overall Passing',]]

#Setting new column and assigning bins
spending_summary["Spending Ranges (Per Student)"] = pd.cut(school_area_summary["Per Student Budget"],
                                                      spending_bins, labels = bin_labels, include_lowest = True)
spending_summary = spending_summary.groupby("Spending Ranges (Per Student)").mean()

spending_summary
                   
                                        

Spending Ranges (Per Student),Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
<$585,71.364587,70.716577,88.835926,86.390517,76.721458
$585-630,72.065868,71.031297,91.518824,87.292423,79.876293
$630-645,69.854807,69.838814,84.686139,83.763585,71.004977
$645-680,68.884391,69.045403,81.56847,81.769716,66.756253


## Scores by School Size
The analysis suggests that small and medium-sized schools tend to outperform larger schools in terms of average scores and passing rates. The data supports the idea that school size can have a significant impact on academic success; this could be due to a more favourable teacher-to-student ratio in smaller institutions compared to larger schools.

#### Small-sized Schools - show the strongest academic performance, possibly reflecting more individual attention and resources per student.

    Maths Score: 72.34
    Reading Score: 71.64
    % Overall Passing: 79.07% (highest passing rate)

#### Medium-sized Schools - strike a balance between individualised attention and the ability to provide a diverse range of resources.

    Maths Score: 71.42
    Reading Score: 70.72
    % Overall Passing: 78.04%

#### Larger-sized Schools - might face challenges in resource distribution and personalized attention, impacting academic outcomes.

    Maths Score: 69.75
    Reading Score: 69.58
    % Overall Passing: 70.29% (lowest passing rate)

As the size of the school increases, the performance of students in every aspect (maths scores, reading scores and overall passing rate) declines.
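
As a cross-check of the large-school figure, the group mean can be recomputed from the per-school passing rates. This sketch assumes the eight schools whose listed size exceeds 2,000 students, with values copied from the outputs of cells 22, 24 and 25; the names `large_schools` and `large_mean` are introduced for illustration.

```python
from statistics import mean

# % Overall Passing for the eight schools with more than 2,000 students,
# copied from the per-school output of cell 22
large_schools = {
    "Bailey High School": 80.084405,
    "Figueroa High School": 67.650051,
    "Ford High School": 67.469880,
    "Hernandez High School": 66.364617,
    "Huang High School": 66.712376,
    "Johnson High School": 67.191766,
    "Rodriguez High School": 79.419855,
    "Wilson High School": 67.455103,
}

# Unweighted mean of the school-level rates, as groupby(...).mean() computes it
large_mean = mean(large_schools.values())
print(f"{large_mean:.2f}%")  # prints 70.29%
```

The result matches the `Large (2000-5000)` row of the `size_summary` output (70.293507).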


In [30]:
# Establishing the bins
spending_bins_by_size = [0, 1000, 2000, 5000]
bin_labels_by_size = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]

In [31]:
#Creating a copy of dataframe
size_summary = school_area_summary.loc[:,['Average Maths Score',
                                         'Average Reading Score',
                                         '% Passing Maths',
                                         '% Passing Reading',
                                         '% Overall Passing',]]

#Setting new column and assigning bins
size_summary["School Size"] = pd.cut(school_area_summary["Total Students"],
                                                      spending_bins_by_size, labels = bin_labels_by_size, include_lowest = True)
size_summary = size_summary.groupby("School Size").mean()

size_summary


School Size,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
Small (<1000),72.335748,71.636864,90.806867,87.557513,79.066348
Medium (1000-2000),71.42165,70.720164,89.84656,86.714149,78.039785
Large (2000-5000),69.751809,69.576052,84.252804,83.301185,70.293507


## Scores by School Type
#### Government Schools
* Government schools have a lower average maths score (69.83) and reading score (69.68) than Independent schools.
* The overall passing rate is 70.70%.
#### Independent Schools
* Independent schools show a higher average maths score (71.37) and reading score (70.72) than Government schools.
* The overall passing rate is 76.97%.

In summary, the analysis indicates that Independent schools outperform Government schools in terms of average scores and overall passing percentages. The data suggests that school type is associated with academic performance, with Independent schools displaying a notable advantage; because the two types also differ in size and per-student budget, the data alone cannot show that school type is the cause.
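
The figures by school type are means of school-level rates, so every school counts equally regardless of its size. The sketch below contrasts that with a student-weighted rate for the seven Government schools, using sizes and passing rates copied from the summary tables; the variable names are introduced for illustration.

```python
# (school size, % overall passing) for the seven Government schools,
# copied from the school summary tables in this notebook
government = [
    (4976, 80.084405),  # Bailey
    (2949, 67.650051),  # Figueroa
    (2739, 67.469880),  # Ford
    (4635, 66.364617),  # Hernandez
    (2917, 66.712376),  # Huang
    (4761, 67.191766),  # Johnson
    (3999, 79.419855),  # Rodriguez
]

# Mean of school-level rates: every school counts equally (what groupby(...).mean() does)
unweighted = sum(rate for _, rate in government) / len(government)

# Student-weighted rate: every student counts equally
total_students = sum(size for size, _ in government)
weighted = sum(size * rate for size, rate in government) / total_students

print(f"unweighted: {unweighted:.2f}%, student-weighted: {weighted:.2f}%")
```

The unweighted value reproduces the Government row of `type_summary`. The student-weighted value is somewhat higher because the two strongest Government schools are also among the largest, which shows that the choice of averaging method affects the size of the gap between school types.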

In [32]:
#Creating a copy of dataframe
type_summary = school_area_summary[['School Type',
                                        'Average Maths Score',
                                         'Average Reading Score',
                                         '% Passing Maths',
                                         '% Passing Reading',
                                         '% Overall Passing',]]

#Filtering through data and grouping the results
type_summary = type_summary.groupby('School Type').mean()

type_summary

School Type,Average Maths Score,Average Reading Score,% Passing Maths,% Passing Reading,% Overall Passing
Government,69.834806,69.675929,84.462375,83.587562,70.698993
Independent,71.368822,70.718933,89.204043,86.247789,76.97334
