# Australian Primary and Secondary Education Visualisation Project

## Summary
Recently I was fortunate enough to become the father of a beautiful girl and, as all parents do in the late, sleepless hours, my mind started wandering to the topic of schooling.
It is a very important topic that one can never begin to ponder too early, but I digress. 

The question was *"what school do we send her to?"* which transformed into another question, and another, so on and so forth.
The questions kept rolling, changing, becoming more complex, and to answer them I needed a solution.

Enter the **Australian Primary and Secondary Education Visualisation Project** or APSEV for short. 

The aim of APSEV is to compare schools in Australia across a number of metrics and to visualise those comparisons using Python and other tools within JupyterLab.

## Data

### Source

The data we will be using for this project is supplied by ACARA, the **Australian Curriculum, Assessment and Reporting Authority**.

They are the authoritative source of advice on, and delivery of, the national curriculum, assessment and reporting for all Australian education ministers.  They are also the developers of the Australian Curriculum, which is used in schools across Australia and delivered to students in all year levels.

More information about ACARA can be found **here:** <https://www.acara.edu.au/about-us>

ACARA have a program known as the *Data Access Program* which has a number of datasets and information available for various research and reporting use cases.

Due to the sensitive nature of content surrounding student outcomes, data and reporting, the vast majority of the data is secured behind an application-based request system.  To gain access to NAPLAN data, for example, an application form needs to be filled out and submitted to ACARA, who verify the validity, security and nature of the project before granting access.

This sensitive nature of content is a recurring theme throughout education in Australia, and rightly so.

Fortunately, some datasets are available to the general public, one of which we will be using for our APSEV project.

### School Profile Report

We will be utilising the School Profile report (2008-2019) offered by ACARA on their Data Access Program web page **here:** <https://www.acara.edu.au/contact-us/acara-data-access>

This is a vast collection of information on all schools in Australia from 2008 to 2019 and gives us plenty to work with.

***Link to specific dataset used can be found here:*** <https://www.acara.edu.au/docs/default-source/default-document-library/school-profile-2008-2019e35613404c94637ead88ff00003e0139.xlsx?sfvrsn=6dc27007_0>


## ICSEA - What is it?

Before continuing there is a measure that needs to be clarified: the ICSEA, which stands for the Index of Community Socio-Educational Advantage.  This is a scale that enables users to compare a selected school with schools whose students have a similar background, based on the level of educational advantage or disadvantage that the students bring to their academic studies.

It is not a rating of the school, staff or teaching programs, nor is it a measure of student performance in testing programs.

For more information you can read the full guide here: <https://www.myschool.edu.au/media/1820/guide-to-understanding-icsea-values.pdf>

## Why are we using it?

For our project we will be using it to try to better understand the learning outcomes provided by a school. ICSEA values are calculated on a scale that has a median of 1000.  They range from about 500, for schools with extremely disadvantaged student backgrounds, to around 1300, for schools with extremely advantaged student backgrounds.

## Did it work?

**Spoilers:** it didn't.  ICSEA shouldn't be used as a measure of academic performance. All we ended up doing was comparing socio-economic status between schools, so instead of comparing learning outcomes we were on the verge of pitting advantaged schools against disadvantaged schools.

We will continue the journey anyway so you can come along for the ride and see how we went.


## Cleaning

The first thing we had to tackle was converting the dataset from XLSX to CSV format and checking its integrity.  Using the pandas built-in functionality of `pd.read_excel` and `.to_csv` we were able to do this pretty easily.
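As a rough sketch of that conversion step (not the exact notebook code), something like the following would do it. The function name `xlsx_to_csv`, the file names in the usage note and the assumption that the data sits on the first worksheet are all ours for illustration.

```python
import pandas as pd


def xlsx_to_csv(xlsx_path, csv_path, sheet_name=0):
    """Read one worksheet from an Excel workbook and save it as a CSV file."""
    # sheet_name=0 reads the first worksheet. The real workbook may keep the
    # data on a different sheet, so treat this default as an assumption.
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    # index=False stops pandas writing its row index as an extra column.
    df.to_csv(csv_path, index=False)
    return df
```

Calling `xlsx_to_csv("school-profile.xlsx", "school_profile.csv")` (hypothetical file names) would then leave a CSV that `pd.read_csv` can load. Reading `.xlsx` files with pandas typically needs the `openpyxl` package installed.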

From there we loaded the newly created CSV into a dataframe and had a look:

![alt](pictures/1.jpg)

We began exploring the metrics available to us:
![alt](pictures/2.jpg)

There were a few columns we knew straight away we did not need, so we got rid of them and were left with the following (the count on the right is the number of NaNs in each column):

![alt](pictures/3.jpg)
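A minimal sketch of dropping columns and counting NaNs is shown here. The dataframe is a tiny made-up stand-in for the School Profile data, and the column names (including the choice to drop `School URL`) are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for the real School Profile data.
df = pd.DataFrame({
    "School Name": ["School A", "School B", "School C"],
    "School URL": ["http://a.example", None, None],
    "ICSEA": [1010.0, np.nan, 985.0],
    "Total Enrolments": [420.0, 95.0, np.nan],
})

# Drop a column we do not need for the analysis (hypothetical choice).
df = df.drop(columns=["School URL"])

# Count missing values per column, largest first, so the most
# impactful gaps can be dealt with before the minor ones.
nan_counts = df.isna().sum().sort_values(ascending=False)
print(nan_counts)
```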

Moving forward we primarily dealt with the NaN values, working from the most impactful to the least, and along the way we found a few more metrics we didn't need.

Arguably there were some we didn't use for our project, such as the School Name, Governing Body etc., but these will be valuable for future visualisations so we left them in.

We interrogated some of the data and ensured each data type was correct, whether categorical, numerical etc.

![alt](pictures/4.jpg)
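The type checks above might look roughly like this in pandas. The sample values and column names are made up for illustration, and the decision to drop rows without an ICSEA value is an assumption rather than a record of what the notebook did.

```python
import pandas as pd

# Made-up sample in which everything has been read in as text.
df = pd.DataFrame({
    "School Sector": ["Government", "Catholic", "Independent"],
    "Calendar Year": ["2017", "2018", "2019"],
    "ICSEA": ["1010", "n/a", "985"],
})

# Text columns with only a handful of distinct values suit the category dtype.
df["School Sector"] = df["School Sector"].astype("category")
df["Calendar Year"] = df["Calendar Year"].astype(int)

# errors="coerce" turns anything unparseable into NaN instead of raising.
df["ICSEA"] = pd.to_numeric(df["ICSEA"], errors="coerce")

# Drop rows with no usable ICSEA value (an assumption for this sketch).
df = df.dropna(subset=["ICSEA"])
print(df.dtypes)
```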

Finally reaching equilibrium, the dataset was clean and we were left with the following:

![alt](pictures/5.jpg)

On to the analysis!

# Analysis

## Questions

The questions we were interested in answering were as follows:

### - Question 1 - Public vs Private, is there a measurable difference based on ICSEA Score?

Interestingly enough, when we first approached this question we found out that there was more to it than public and private. The dataset instead uses three sectors, known as **Government, Catholic and Independent**.

So the question should really be depicted as:

### - Question 1.1 - Government vs Catholic vs Independent, is there a measurable difference based on ICSEA score?

At face value the difference between sectors looks quite vast, but the ICSEA scores are much tighter than they first appear. On top of that, this view lumps primary, high and combined schools together, which makes the comparison rather imprecise to begin with.

![alt](pictures/ICSEA_overview.png)

We also identified that we could visualise a general score value that included every year, so we found the median of our values across all years and laid them out.

![alt](pictures/Median_ICSEA_score_per_sector.png)
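For anyone wanting to reproduce a median-per-sector summary, a groupby along these lines should work. The numbers are invented for illustration and the column names `School Sector` and `ICSEA` are assumed.

```python
import pandas as pd

# Invented values: two years of data for one school in each sector.
df = pd.DataFrame({
    "Calendar Year": [2018, 2019, 2018, 2019, 2018, 2019],
    "School Sector": ["Government", "Government", "Catholic",
                      "Catholic", "Independent", "Independent"],
    "ICSEA": [980, 1000, 1020, 1040, 1050, 1070],
})

# Pool every year together, then take the median ICSEA within each sector.
median_by_sector = (
    df.groupby("School Sector")["ICSEA"]
    .median()
    .sort_values()
)
print(median_by_sector)
```

Calling `median_by_sector.plot.bar()` on the result would give a simple bar chart of the same numbers.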

Looking at the above we can see the sectors perform very closely. While an argument could be made for which school to go to based on ICSEA scores alone, it doesn't seem to paint the full picture.


### - Question 2 - Does location or region matter to educational outcomes?

For this question we grouped the locations together by region.  Education regions are divided by Major Cities, Inner Regional Towns, Outer Regional Towns, Remote locations and lastly Very Remote locations. 

We gathered some descriptive statistics for these and noticed two distinct things:

- The number of entries for Major Cities dwarfed all other geolocations

- The median didn't paint the whole picture

Because the counts would skew the representation a bit we decided to go with a stacked bar plot so we could view the Min, Mean, Median and Max at a glance.  

This is what we found: 

![alt](pictures/6.jpg)
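The descriptive statistics behind a chart like this can be gathered in one groupby. This is a sketch on made-up numbers, with `Geolocation` and `ICSEA` as assumed column names.

```python
import pandas as pd

# Made-up sample: more rows for Major Cities, mimicking the imbalance.
df = pd.DataFrame({
    "Geolocation": ["Major Cities"] * 4 + ["Remote"] * 2,
    "ICSEA": [1000, 1050, 1100, 950, 900, 940],
})

# One row per region with the count alongside min, mean, median and max.
stats = df.groupby("Geolocation")["ICSEA"].agg(
    ["count", "min", "mean", "median", "max"]
)
print(stats)
```

Keeping `count` in the same table makes the imbalance between regions visible next to the scores. Something like `stats.drop(columns="count").plot.bar(stacked=True)` would then draw the four statistics as a stacked bar per region.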

Using ICSEA as a scoring method we found that, on average, major cities were not that far ahead, and very remote locations scored the highest max value (possibly an outlier).

### - Question 3 - Which state performs better?
When talking about location, the region is only part of the picture.  We also looked at how each state fared against the others.

![alt](pictures/7.jpg)

First we grouped the states together to see which state had the most entries to better understand if there is any bias to our findings, which there was. 

![alt](pictures/8.jpg)

Looking at this, the ACT, NT and TAS didn't have nearly as many entries as the big three: QLD, NSW and VIC.

Tables are cool, but graphs are better so we had a look and confirmed our above findings. 

![alt](pictures/Number_of_data_entries_per_state.png)
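Counting entries per state is a one-liner with `value_counts`. The sketch below uses a made-up handful of rows and assumes the column is called `State`.

```python
import pandas as pd

# Made-up rows, one per school record.
df = pd.DataFrame({"State": ["QLD", "QLD", "QLD", "NSW", "NSW", "ACT"]})

# Number of records per state, largest first.
entries_per_state = df["State"].value_counts()

# The same counts as a share of all records, handy for spotting imbalance.
share_per_state = df["State"].value_counts(normalize=True)

print(entries_per_state)
print(share_per_state.round(2))
```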

Knowing this, we then dove deeper into the visualisation and used a boxplot to see if we could separate the school types and see if this had any effect on the overall scores.

![alt](pictures/ICSEA_Score_by_School_Type_across_Australia.png)
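A boxplot split by state and school type can be sketched with the plotting that ships with pandas. This is not necessarily how the figure above was produced. The helper name `plot_icsea_by_school_type` and the default column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_icsea_by_school_type(df, state_col="State",
                              type_col="School Type", score_col="ICSEA"):
    """Draw one box per (state, school type) pair and return the Axes."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # Grouping by two columns gives a separate box for each combination.
    df.boxplot(column=score_col, by=[state_col, type_col], ax=ax, rot=90)
    ax.set_ylabel(score_col)
    ax.set_title("ICSEA by state and school type")
    # pandas adds its own "Boxplot grouped by ..." figure title; clear it.
    fig.suptitle("")
    return ax
```

In a notebook, `plot_icsea_by_school_type(df)` followed by `plt.show()` would render the chart for a dataframe that has those three columns.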

By separating the school types out we could see some interesting outliers and some interesting consistencies. One of note is the performance of Combined schools across most states.

The winner was the **ACT**.

The states are fairly close together, but we saw that the ACT edges ahead.  This didn't paint the full picture, however: as noted above, the ACT, NT and TAS had the fewest entries, so this result was definitely skewed.

Furthermore, as a QLD resident, moving to these locations wasn't feasible, and this had to be taken into account.

Staying on the east coast was the way to go, so we compared the top three contenders: QLD, NSW and VIC.

![alt](pictures/ICSEA_Score_by_School_Type_across_QLD_NSW_VIC.png)

Interestingly enough, although the three were much closer, VIC edged ahead just a little, with Combined schools (P-12) being the highest performing school type overall.

We then dove further to see how each School Sector performed within Victoria (I love box plots by the way, can you tell?)

![alt](pictures/ICSEA_Score_by_Sector_and_School_Type_in_VIC.png)
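To put numbers next to a chart like this, the data can be filtered to one state and summarised with a pivot table. The rows below are invented and the column names are assumed.

```python
import pandas as pd

# Invented rows. The NSW row is there to show that the filter removes it.
df = pd.DataFrame({
    "State": ["VIC", "VIC", "VIC", "VIC", "NSW"],
    "School Sector": ["Catholic", "Catholic", "Government",
                      "Government", "Catholic"],
    "School Type": ["Combined", "Combined", "Primary", "Primary", "Combined"],
    "ICSEA": [1060, 1080, 1000, 1020, 900],
})

# Keep Victorian schools only.
vic = df[df["State"] == "VIC"]

# Median ICSEA for every sector / school type combination.
# Combinations with no schools show up as NaN.
median_table = vic.pivot_table(
    index="School Sector",
    columns="School Type",
    values="ICSEA",
    aggfunc="median",
)
print(median_table)
```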

We noted the outliers but for the most part could ascertain the information we needed. Judging by the ICSEA score, combined Catholic schools in Victoria perform better overall.


### - Question 4 - Does gender play a role in learning outcomes?

While gender should not be a defining factor in deciding a school, we were interested to see if it has a measurable difference based on the ICSEA score. 

Looking at the plot below, there appeared to be a trend where the more students that were enrolled, regardless of gender, the greater the ICSEA score became on average.  This looks to reach an upper threshold, with diminishing returns afterwards.

![alt](pictures/9.jpg)

*Check out the interactive version in the **Code** folder*
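One way to check a trend like this without a scatter plot is to bin schools by enrolment and compare the median ICSEA in each band. The sketch below uses made-up numbers and hypothetical band edges, and `Total Enrolments` is an assumed column name.

```python
import pandas as pd

# Made-up schools, purely to show the mechanics.
df = pd.DataFrame({
    "Total Enrolments": [50, 120, 300, 450, 800, 1500],
    "ICSEA": [930, 960, 1000, 1020, 1050, 1055],
})

# Hypothetical enrolment bands. Each interval includes its right edge.
bins = [0, 200, 500, 1000, 2000]
df["Enrolment Band"] = pd.cut(df["Total Enrolments"], bins=bins)

# Median ICSEA per band. observed=True skips bands with no schools.
median_by_band = df.groupby("Enrolment Band", observed=True)["ICSEA"].median()
print(median_by_band)
```

Running the same grouping on the boys and girls enrolment columns separately would show whether the pattern differs by gender.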


We were curious to see if gender played any part when comparing school sectors. To visualise it a touch better, the x and y axes were flipped, which made it easier to understand.

![alt](pictures/10.jpg)

Interestingly enough, the trend between sectors for both male and female students appeared to be pretty consistent.  There were a few outliers here and there, but for the most part we could say that gender does not conclusively result in different learning outcomes when measuring by the ICSEA score.


### - Question 5 - Which school would I choose for my Daughter?

If we are going by the above data then we would choose a Catholic Combined school in a Victorian major city.  The caveat to this is that an ICSEA score does not paint a picture of how good a school will be for a child.  There are many other factors that come into play: travel time, extracurricular offerings, beliefs, affordability etc.  The list goes on.

While not a conclusive way to show "Which school is best", it is interesting to dive into various aspects of Australian schooling between 2008 and 2019.

We have barely scratched the surface, however, and in the future I would love to dive further into education in Australia and see what we can visually sculpt.

## Conclusion

Even though we found some very interesting results from our data visualisation, the fact of the matter is that using a single score to judge how well a school performs is a terrible idea.  What a school offers a child goes beyond learning outcomes: extracurricular activities, travel time, public transport, friends attending the same school, the primary focus of the school etc.

At face value the decision would be "A Combined Catholic School located in Victoria", but that is not a definitive answer.  

The correct answer can be discovered with time. For now I will enjoy the early formative years.
