Data Sources
Rental Price Data, as inferred from the value of the lodged bond (typically 4 weeks' rent)
Monthly Data December 2019
https://www.fairtrading.nsw.gov.au/about-fair-trading/data-and-statistics/rental-bond-data
RentalBond_Lodgements_December_2019.xlsx

Annual 2019 Data for NSW Postcodes
https://www.fairtrading.nsw.gov.au/about-fair-trading/data-and-statistics/rental-bond-data
RentalBond_Lodgements_Year2019.xlsx
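
Because the bond is typically four weeks' rent, a weekly figure can be recovered by dividing the bond by four. The sketch below only illustrates that arithmetic. The helper name `weekly_rent_from_bond` is hypothetical and the bond amount is an illustrative value, not taken from the dataset.

```python
def weekly_rent_from_bond(bond_amount, weeks_of_rent=4):
    """Estimate weekly rent from a lodged bond.

    Assumes the bond equals `weeks_of_rent` weeks of rent (typically 4);
    a bond lodged at a different multiple would give a misleading figure.
    """
    if weeks_of_rent <= 0:
        raise ValueError("weeks_of_rent must be positive")
    return bond_amount / weeks_of_rent


# Illustrative value only: a bond of 2320 implies a weekly rent of 580.0
estimated_rent = weekly_rent_from_bond(2320)
print(estimated_rent)
```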

Post Code Area Data
https://data.mongabay.com/igapo/australia/postcodes/sydney-numeric.html

Rent-Tables-Jun-Quarter-2019.xlsx

Postcode API
https://postcodeapi.com.au/
curl http://v0.postcodeapi.com.au/suburbs/3066.json -H 'Accept: application/json; indent=4'
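
The curl call above can also be prepared from Python. This is a minimal sketch using only the standard library. The helper name `build_suburb_request` is hypothetical, and the URL pattern and Accept header are copied from the curl example rather than from the API documentation. The network call is left commented out, since the service may be unavailable or may require credentials.

```python
import json
import urllib.request

# URL pattern taken from the curl example; treat it as an assumption
BASE_URL = "http://v0.postcodeapi.com.au/suburbs/{postcode}.json"


def build_suburb_request(postcode):
    """Build (but do not send) the GET request used in the curl example."""
    url = BASE_URL.format(postcode=postcode)
    return urllib.request.Request(
        url, headers={"Accept": "application/json; indent=4"}
    )


request = build_suburb_request("3066")
print(request.full_url)

# Sending the request is optional and depends on the service being reachable:
# with urllib.request.urlopen(request, timeout=10) as response:
#     suburbs = json.loads(response.read().decode("utf-8"))
```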


In [1]:
# Install Libraries
!conda install -c conda-forge folium --yes
!conda install -c conda-forge wget --yes

Collecting package metadata (current_repodata.json): done
Solving environment: - 
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::certifi-2019.9.11-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2, defaults/osx-64::certifi-2019.9.11-py37_0
  - anaconda/osx-64::ca-certificates-2019.8.28-0, anaconda/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - defaults/osx-64::ca-certificates-2019.8.28-0, defaults/osx-64::certifi-2019.9.11-py37_0, defaults/osx-64::openssl-1.1.1d-h1de35cc_2
  - anaconda/osx-64::certifi-2019.9.11-py37_0, anaconda/osx-64::openssl-1.1.1d-h1de35cc_2, defaults/os

In [2]:
# Import Libraries
import pandas as pd
import requests
import io
from bs4 import BeautifulSoup

Scrape postcode data from the mongabay site and build a data frame keyed on postcode.
Because a single postcode can cover several neighbourhoods, we concatenate their names into one
comma-separated string, giving a data frame with one row per postcode and a description of its constituent neighbourhoods.
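
The split-and-concatenate step can be tried on a few rows without touching the website. This sketch assumes each scraped line has the form "postcode, space, area name"; the sample lines reuse postcodes and areas that appear in the output further down.

```python
import pandas as pd

# A few lines in the assumed "<postcode> <area name>" form
lines = [
    "2000 Circular Quay",
    "2007 Broadway",
    "2007 Ultimo",
    "2008 Chippendale",
    "2008 Darlington",
    "2009 Pyrmont",
]

df = pd.DataFrame(lines, columns=["data"])

# Split on the first space only, so multi-word area names stay intact
df[["Postcode", "Area"]] = df["data"].str.split(" ", n=1, expand=True)
df = df.drop(columns="data")

# One row per postcode, with the area names joined by commas
df_grouped = df.groupby("Postcode")["Area"].apply(",".join).reset_index()
print(df_grouped)
```

Splitting with `n=1` matters here: without it, "Circular Quay" would be broken into two columns.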

In [3]:
# Scrape Post Codes from mongabay.com
page = requests.get("https://data.mongabay.com/igapo/australia/postcodes/sydney-numeric.html")

#Get the content from the page
soup = BeautifulSoup(page.content, "html.parser")

# Post codes are in a table structure which is not identified 
# by an id or any particular CSS structure. Hence we address
# it as the tenth table on the page.
table = soup.find_all("table")[10]

# Within the table all postcodes are held as newline-separated text in a single cell,
# which is the third cell (td) of the table
# Get the cell text
text = table.find_all("td")[2].get_text()

# Convert the text to a list by splitting on newline
data = text.split("\n")

# Trim the blank first and last elements, which are caused by additional
# newline markers in the data
data = data[1:-1]

# Convert postcode data to dataframe with columns of postcode and description
df_postcode = pd.DataFrame(data, columns=["data"])

# Split the data column into two separate columns (postcode and description) on the first space separator
df_postcode[["Postcode", "Area"]]= df_postcode["data"].str.split(" ",n=1,expand=True)

# Drop the original data column now we have performed the split
df_postcode.drop("data", axis=1, inplace=True)

# Set Index to the PostCode column 
df_postcode = df_postcode.set_index("Postcode")

# Lastly join the data description so we have a single key for each postal district
df_postcode = df_postcode.groupby(["Postcode"])["Area"].apply(','.join).reset_index()

df_postcode.head()


Unnamed: 0,Postcode,Area
0,2000,"Australia Square Post Office,Circular Quay,Cla..."
1,2006,Sydney University
2,2007,"Broadway,Ultimo"
3,2008,"Chippendale,Darlington"
4,2009,Pyrmont


The mongabay website is not always available, so I have saved a local copy of the data.
The code below loads that file and applies the same transforms to create a dataframe with
the same structure as above.
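
One way to combine the two approaches is to try the website first and fall back to the local file only when that fails. This is a sketch under assumptions: `load_postcode_lines` and `unavailable_site` are hypothetical names, and a temporary file stands in for "Postal Codes.csv" so the example can run anywhere.

```python
import os
import tempfile


def load_postcode_lines(fetch_remote, local_path):
    """Return postcode lines from fetch_remote(), or from local_path if that fails.

    fetch_remote is any zero-argument callable returning a list of strings,
    for example a wrapper around the scraping code.
    """
    try:
        return fetch_remote()
    except (OSError, IndexError):
        # OSError covers connection failures (requests' exceptions derive from it);
        # IndexError covers a page whose table layout has changed.
        with open(local_path, "r") as f:
            return [line.rstrip("\n") for line in f]


def unavailable_site():
    # Stand-in for the scraper when the site is down
    raise ConnectionError("site unavailable")


# Demonstrate the fallback with a temporary stand-in for the local file
with tempfile.TemporaryDirectory() as tmp_dir:
    local_file = os.path.join(tmp_dir, "postcodes.txt")
    with open(local_file, "w") as f:
        f.write("2009 Pyrmont\n2007 Ultimo\n")
    lines = load_postcode_lines(unavailable_site, local_file)

print(lines)
```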

In [4]:
# Open a local postcode file and read all lines
# (the with block closes the file even if reading fails)
with open("Postal Codes.csv", "r") as f:
    data = f.readlines()

# Remove "\n" characters from the description 
data = [item.replace("\n","") for item in data]

# Convert postcode data to dataframe with columns of postcode and description
df_postcode = pd.DataFrame(data, columns=["data"])

# Split the data column into two separate columns (postcode and description) on the first space separator
df_postcode[["Postcode", "Area"]]= df_postcode["data"].str.split(" ",n=1,expand=True)

# Drop the original data column now we have performed the split
df_postcode.drop("data", axis=1, inplace=True)

# Set Index to the PostCode column 
df_postcode = df_postcode.set_index("Postcode")

# Lastly join the data description so we have a single key for each postal district
df_postcode = df_postcode.groupby(["Postcode"])["Area"].apply(','.join).reset_index()

df_postcode.head()

Unnamed: 0,Postcode,Area
0,2000,"Australia Square Post Office,Circular Quay,Cla..."
1,2006,Sydney University
2,2007,"Broadway,Ultimo"
3,2008,"Chippendale,Darlington"
4,2009,Pyrmont


In [5]:
df_rentalprice = pd.read_excel("RentalBond_Lodgements_Year_2019.xlsx", skiprows=2, header=0, sheet_name=0)
df_rentalprice.shape

(348925, 5)

In [6]:
df_rentalprice.head()

Unnamed: 0,Lodgement Date,Postcode,Dwelling Type,Bedrooms,Weekly Rent
0,2019-03-01,2000,F,0,580
1,2019-03-06,2000,F,0,595
2,2019-03-04,2000,F,0,500
3,2019-03-05,2000,F,0,520
4,2019-03-05,2000,F,0,550


In [7]:
# Decode the Category Data With a Description and Add a New Column
df_rentalprice["Dwelling Type"].value_counts()
dwelling_dicts = {"F":"Flat", "H":"House", "T":"Townhouse", "U":"Unknown", "O":"Other"}
df_rentalprice["Dwelling Desc"] = df_rentalprice["Dwelling Type"].apply(lambda x: dwelling_dicts[x])

# Move the dwelling type columns next to each other
col_order = df_rentalprice.columns[0:3].to_list()
col_order = col_order + df_rentalprice.columns[-1:].to_list()
col_order = col_order + df_rentalprice.columns[3:-1].to_list()
df_rentalprice = df_rentalprice[col_order]

# Check the result
df_rentalprice.head()


Unnamed: 0,Lodgement Date,Postcode,Dwelling Type,Dwelling Desc,Bedrooms,Weekly Rent
0,2019-03-01,2000,F,Flat,0,580
1,2019-03-06,2000,F,Flat,0,595
2,2019-03-04,2000,F,Flat,0,500
3,2019-03-05,2000,F,Flat,0,520
4,2019-03-05,2000,F,Flat,0,550
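
The decoding in In [7] looks each code up directly in the dictionary, so a code missing from `dwelling_dicts` would raise a KeyError. As a hedged alternative, `Series.map` returns NaN for unmapped codes, which can then be given a visible placeholder. The code "X" and the label "Unmapped" below are made up for illustration.

```python
import pandas as pd

dwelling_dicts = {"F": "Flat", "H": "House", "T": "Townhouse", "U": "Unknown", "O": "Other"}

# Toy codes; "X" is a made-up code that is not in the lookup
codes = pd.Series(["F", "H", "X", "T"], name="Dwelling Type")

# map() returns NaN for unmapped codes; fillna() supplies a visible placeholder
decoded = codes.map(dwelling_dicts).fillna("Unmapped")
print(decoded.tolist())
```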


In [8]:
# Drop rows where the weekly rent is unspecified ("U") as these add no useful data
# We are only interested in rows that carry a rental price
rows = df_rentalprice[df_rentalprice["Weekly Rent"]=="U"].index
df_rentalprice.drop(rows, axis=0, inplace=True)
df_rentalprice.shape

# Drop rows where the number of bedrooms is unspecified ("U") as these add no useful data
rows = df_rentalprice[df_rentalprice["Bedrooms"]=="U"].index
df_rentalprice.drop(rows, axis=0, inplace=True)
df_rentalprice.shape


(325448, 6)

In [9]:
# Change the Data Type of the Weekly Rent Column
df_rentalprice[["Weekly Rent"]] = df_rentalprice[["Weekly Rent"]].astype("int64")

# Change the Data Type of the Bedrooms Column
df_rentalprice[["Bedrooms"]] = df_rentalprice[["Bedrooms"]].astype("int64")

# Change the Data Type of the Postcode Column
df_rentalprice[["Postcode"]] = df_rentalprice[["Postcode"]].astype("str")

df_rentalprice.dtypes

Lodgement Date    datetime64[ns]
Postcode                  object
Dwelling Type             object
Dwelling Desc             object
Bedrooms                   int64
Weekly Rent                int64
dtype: object
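
The drop-then-convert steps in In [8] and In [9] can also be written with `pd.to_numeric`, which turns any non-numeric marker (not only "U") into NaN before the rows are dropped. This is a sketch on made-up toy rows, not a replacement for the cells above.

```python
import pandas as pd

# Toy rows; "U" marks an unspecified value as in the source data
df = pd.DataFrame({
    "Bedrooms": [0, "U", 2, 3],
    "Weekly Rent": [580, 450, "U", 700],
})
cols = ["Bedrooms", "Weekly Rent"]

clean = df.copy()
for col in cols:
    # Non-numeric entries become NaN instead of raising an error
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Drop rows with a missing value in either column, then restore integer types
clean = clean.dropna(subset=cols).astype({col: "int64" for col in cols})
print(clean)
print(clean.dtypes)
```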

In [10]:
# Merge the rental price data with the postcode data
# Use an inner join, which discards rows whose postcode does not appear in both the postcode lookup and the rental price data
# Justification: the postcode lookup covers Greater Sydney, which already incorporates areas that would be far away
df_rentalprice = pd.merge(df_rentalprice, df_postcode, left_on="Postcode", right_on="Postcode", how="inner")
df_rentalprice.isnull().sum()

Lodgement Date    0
Postcode          0
Dwelling Type     0
Dwelling Desc     0
Bedrooms          0
Weekly Rent       0
Area              0
dtype: int64
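
An inner join drops unmatched rows silently, so it can be worth checking what was discarded. The sketch below uses toy frames (the rent values are illustrative) and the `indicator` option of `pd.merge` to list rental rows whose postcode is missing from the lookup. Both key columns are strings, matching the conversion done in In [9].

```python
import pandas as pd

# Toy data: postcode 2650 is not in the lookup
rent = pd.DataFrame({
    "Postcode": ["2000", "2000", "2650"],
    "Weekly Rent": [580, 595, 300],
})
lookup = pd.DataFrame({
    "Postcode": ["2000", "2007"],
    "Area": ["Circular Quay", "Broadway,Ultimo"],
})

# The inner join keeps only postcodes present in both frames
inner = pd.merge(rent, lookup, on="Postcode", how="inner")

# An outer join with indicator=True records where each row came from
audit = pd.merge(rent, lookup, on="Postcode", how="outer", indicator=True)
dropped_rentals = audit[audit["_merge"] == "left_only"]

print(len(inner))
print(dropped_rentals[["Postcode", "Weekly Rent"]])
```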

In [97]:
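# Summarise weekly rent for three-bedroom dwellings by postcode
df_three_bed = df_rentalprice[df_rentalprice["Bedrooms"]==3]
grps = df_three_bed.groupby("Postcode")[["Weekly Rent"]]
df_grps = grps.describe()
# Sort by the median ("50%") weekly rent, highest first
df_grps = df_grps["Weekly Rent"].sort_values(by="50%", ascending=False)
df_grps.head(20)
# Show the statistics for a selection of postcodes
df_grps.loc[["2000","2092","2093","2094","2095","2096","2097"]]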


Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2000,269.0,1485.95539,637.489115,290.0,1100.0,1300.0,1695.0,5000.0
2092,51.0,1020.490196,478.484331,560.0,805.0,950.0,1087.5,3980.0
2093,150.0,1023.3,305.466872,520.0,820.0,950.0,1100.0,2350.0
2094,92.0,1161.032609,326.986354,670.0,942.5,1087.5,1328.75,2200.0
2095,269.0,1319.583643,428.761603,650.0,1050.0,1200.0,1500.0,3454.0
2096,115.0,1107.165217,331.95093,150.0,905.0,1025.0,1275.0,2800.0
2097,91.0,884.395604,142.349014,560.0,775.0,895.0,975.0,1250.0
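
The "50%" column produced by `describe()` is the median. When only the count and median are needed, they can be requested directly with `agg`. This is a sketch on made-up toy rows, so the figures do not correspond to the table above.

```python
import pandas as pd

# Toy rentals; the last row is a two-bedroom dwelling and is filtered out
rentals = pd.DataFrame({
    "Postcode": ["2000", "2000", "2000", "2097", "2097", "2097"],
    "Bedrooms": [3, 3, 3, 3, 3, 2],
    "Weekly Rent": [1100, 1300, 1695, 775, 895, 600],
})

three_bed = rentals[rentals["Bedrooms"] == 3]

# Count and median weekly rent per postcode, highest median first
summary = (
    three_bed.groupby("Postcode")["Weekly Rent"]
    .agg(["count", "median"])
    .sort_values(by="median", ascending=False)
)
print(summary)
```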


Load the Schools Datasets
These are obtained from the Australian Curriculum, Assessment and Reporting Authority (ACARA)
School Profile : https://www.acara.edu.au/docs/default-source/default-document-library/school-profile-2018.xlsx?sfvrsn=0
School Location: https://www.acara.edu.au/docs/default-source/default-document-library/school-locations-20189cf512404c94637ead88ff00003e0139.xlsx?sfvrsn=0

In [98]:
!wget -O acara-school-profile-2018.xlsx "https://www.acara.edu.au/docs/default-source/default-document-library/school-profile-2018.xlsx?sfvrsn=0"
!wget -O acara-school-locs-2018.xlsx "https://www.acara.edu.au/docs/default-source/default-document-library/school-locations-20189cf512404c94637ead88ff00003e0139.xlsx?sfvrsn=0"

--2020-01-29 18:52:20--  https://www.acara.edu.au/docs/default-source/default-document-library/school-profile-2018.xlsx?sfvrsn=0
Resolving www.acara.edu.au (www.acara.edu.au)... 2606:4700:10::6814:ed18, 2606:4700:10::6814:ec18, 104.20.237.24, ...
Connecting to www.acara.edu.au (www.acara.edu.au)|2606:4700:10::6814:ed18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1696292 (1.6M) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
Saving to: ‘acara-school-profile-2018.xlsx’


2020-01-29 18:52:20 (2.86 MB/s) - ‘acara-school-profile-2018.xlsx’ saved [1696292/1696292]

--2020-01-29 18:52:21--  https://www.acara.edu.au/docs/default-source/default-document-library/school-locations-20189cf512404c94637ead88ff00003e0139.xlsx?sfvrsn=0
Resolving www.acara.edu.au (www.acara.edu.au)... 2606:4700:10::6814:ed18, 2606:4700:10::6814:ec18, 104.20.237.24, ...
Connecting to www.acara.edu.au (www.acara.edu.au)|2606:4700:10::6814:ed18|:443... connected.
HTTP reque

In [99]:
# Read the School Profile Data Into a dataframe
df_schools = pd.read_excel("acara-school-profile-2018.xlsx", sheet_name=1)
df_schools.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,Teaching Staff,Full Time Equivalent Teaching Staff,Non-Teaching Staff,Full Time Equivalent Non-Teaching Staff,Total Enrolments,Girls Enrolments,Boys Enrolments,Full Time Equivalent Enrolments,Indigenous Enrolments (%),Language Background Other Than English (%)
0,2018,40000,3.0,Corpus Christi Catholic School,Bellerive,TAS,7018,Catholic,Primary,School Single Entity,...,29.0,20.8,18.0,10.3,380.0,179.0,201.0,380.0,2.0,3.0
1,2018,40001,4.0,Fahan School,Sandy Bay,TAS,7005,Independent,Combined,School Single Entity,...,41.0,35.0,27.0,19.0,390.0,390.0,0.0,390.0,1.0,7.0
2,2018,40002,5.0,Geneva Christian College,Latrobe,TAS,7307,Independent,Combined,School Single Entity,...,23.0,16.0,29.0,15.6,208.0,89.0,119.0,208.0,6.0,5.0
3,2018,40003,7.0,Holy Rosary Catholic School,Claremont,TAS,7011,Catholic,Primary,School Single Entity,...,28.0,23.5,24.0,11.3,399.0,176.0,223.0,399.0,5.0,1.0
4,2018,40004,9.0,Immaculate Heart of Mary Catholic School,Lenah Valley,TAS,7008,Catholic,Primary,School Single Entity,...,15.0,11.3,10.0,4.8,200.0,107.0,93.0,200.0,11.0,15.0


In [100]:
# Read the school locations data into a dataframe
df_school_locs = pd.read_excel("acara-school-locs-2018.xlsx", sheet_name=1)
df_school_locs.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,Latitude,Longitude,Statistical Area 1,Statistical Area 2,Name of Statistical Area 2,Statistical Area 3,Name of Statistical Area 3,Statistical Area 4,Name of Statistical Area 4,ABS Remoteness Area
0,2018,40000,3.0,Corpus Christi Catholic School,BELLERIVE,TAS,7018,Catholic,Primary,School Single Entity,...,-42.871256,147.371473,6100410,61004,Bellerive - Rosny,60102,Hobart - North East,601,Hobart,Inner Regional Australia
1,2018,40001,4.0,Fahan School,SANDY BAY,TAS,7005,Independent,Combined,School Single Entity,...,-42.916158,147.352764,6103105,61031,Sandy Bay,60105,Hobart Inner,601,Hobart,Inner Regional Australia
2,2018,40002,5.0,Geneva Christian College,LATROBE,TAS,7307,Independent,Combined,School Single Entity,...,-41.226741,146.438726,6108720,61087,Latrobe,60402,Devonport,604,West and North West,Outer Regional Australia
3,2018,40003,7.0,Holy Rosary Catholic School,CLAREMONT,TAS,7011,Catholic,Primary,School Single Entity,...,-42.789375,147.248306,6101510,61015,Claremont (Tas.),60103,Hobart - North West,601,Hobart,Inner Regional Australia
4,2018,40004,9.0,Immaculate Heart of Mary Catholic School,LENAH VALLEY,TAS,7008,Catholic,Primary,School Single Entity,...,-42.865543,147.290159,6102812,61028,Lenah Valley - Mount Stuart,60105,Hobart Inner,601,Hobart,Inner Regional Australia


In [101]:
df_schools.columns

Index(['Calendar Year', 'ACARA SML ID', 'AGE ID', 'School Name', 'Suburb',
       'State', 'Postcode', 'School Sector', 'School Type', 'Campus Type',
       'Rolled Reporting Description', 'School URL', 'Governing Body',
       'Governing Body URL', 'Year Range', 'Geolocation', 'ICSEA',
       'Bottom SEA Quarter (%)', 'Lower Middle SEA Quarter (%)',
       'Upper Middle SEA Quarter (%)', 'Top SEA Quarter (%)', 'Teaching Staff',
       'Full Time Equivalent Teaching Staff', 'Non-Teaching Staff',
       'Full Time Equivalent Non-Teaching Staff', 'Total Enrolments',
       'Girls Enrolments', 'Boys Enrolments',
       'Full Time Equivalent Enrolments', 'Indigenous Enrolments (%)',
       'Language Background Other Than English (%)'],
      dtype='object')

In [102]:
df_school_locs.columns

Index(['Calendar Year', 'ACARA SML ID', 'AGE ID', 'School Name', 'Suburb',
       'State', 'Postcode', 'School Sector', 'School Type', 'Campus Type',
       'Rolled Reporting Description', 'Latitude', 'Longitude',
       'Statistical Area 1', 'Statistical Area 2',
       'Name of Statistical Area 2', 'Statistical Area 3',
       'Name of Statistical Area 3', 'Statistical Area 4',
       'Name of Statistical Area 4', 'ABS Remoteness Area'],
      dtype='object')

In [103]:
# Check Each Row Has A unique Identifier
print(df_schools.shape)
print(len(df_schools["ACARA SML ID"].unique()))

(9535, 31)
9535


In [104]:
print(df_school_locs.shape)
print(len(df_school_locs["ACARA SML ID"].unique()))

(10491, 21)
10491
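
The two checks above compare the row count with the number of distinct IDs. The same test can be expressed with `Series.is_unique`, and `duplicated` can list the offending rows if it ever fails. The sketch uses toy rows in which the last entry is a made-up duplicate.

```python
import pandas as pd

key = "ACARA SML ID"

# Toy rows; the final row deliberately repeats an ID
schools = pd.DataFrame({
    key: [40000, 40001, 40002, 40001],
    "School Name": [
        "Corpus Christi Catholic School",
        "Fahan School",
        "Geneva Christian College",
        "Duplicate row (made up)",
    ],
})

# True only when every ID appears exactly once
print(schools[key].is_unique)

# keep=False marks every member of a duplicated group, not just the repeats
duplicates = schools[schools[key].duplicated(keep=False)]
print(duplicates)
```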


In [105]:
# Join the datasets together to create a larger set
# Left join on the school profile

# First get the columns in the locations dataframe that are not also represented in the school profile dataframe
# The easiest way to do this is to push the column names into sets and take the difference
# However, we then need to add the common key back in to join the data
set_prf_cols = set(df_schools.columns)
set_loc_cols = set(df_school_locs.columns)
extract_cols = list(set_loc_cols.difference(set_prf_cols))
extract_cols.sort()
extract_cols.insert(0, "ACARA SML ID")
extract_cols


['ACARA SML ID',
 'ABS Remoteness Area',
 'Latitude',
 'Longitude',
 'Name of Statistical Area 2',
 'Name of Statistical Area 3',
 'Name of Statistical Area 4',
 'Statistical Area 1',
 'Statistical Area 2',
 'Statistical Area 3',
 'Statistical Area 4']
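
A set difference loses the original column order, which is why the list above comes out alphabetical after sorting. If the order of the locations file is preferred, a list comprehension gives the same columns in their original sequence. The sketch uses shortened column lists for illustration.

```python
# Shortened column lists for illustration
profile_cols = ["ACARA SML ID", "School Name", "Suburb", "State", "Postcode"]
location_cols = [
    "ACARA SML ID", "School Name", "Suburb", "State", "Postcode",
    "Latitude", "Longitude", "Statistical Area 1", "ABS Remoteness Area",
]
key = "ACARA SML ID"

# Membership tests against a set are fast; the list keeps the file's order
profile_set = set(profile_cols)
extra_cols = [col for col in location_cols if col not in profile_set]

# Put the join key first so the extract can be merged back on it
extract_cols = [key] + extra_cols
print(extract_cols)
```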

In [106]:
df_schools_merged = pd.merge(df_schools, df_school_locs[extract_cols], left_on="ACARA SML ID", right_on="ACARA SML ID", how="left")
df_schools_merged.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,ABS Remoteness Area,Latitude,Longitude,Name of Statistical Area 2,Name of Statistical Area 3,Name of Statistical Area 4,Statistical Area 1,Statistical Area 2,Statistical Area 3,Statistical Area 4
0,2018,40000,3.0,Corpus Christi Catholic School,Bellerive,TAS,7018,Catholic,Primary,School Single Entity,...,Inner Regional Australia,-42.871256,147.371473,Bellerive - Rosny,Hobart - North East,Hobart,6100410,61004,60102,601
1,2018,40001,4.0,Fahan School,Sandy Bay,TAS,7005,Independent,Combined,School Single Entity,...,Inner Regional Australia,-42.916158,147.352764,Sandy Bay,Hobart Inner,Hobart,6103105,61031,60105,601
2,2018,40002,5.0,Geneva Christian College,Latrobe,TAS,7307,Independent,Combined,School Single Entity,...,Outer Regional Australia,-41.226741,146.438726,Latrobe,Devonport,West and North West,6108720,61087,60402,604
3,2018,40003,7.0,Holy Rosary Catholic School,Claremont,TAS,7011,Catholic,Primary,School Single Entity,...,Inner Regional Australia,-42.789375,147.248306,Claremont (Tas.),Hobart - North West,Hobart,6101510,61015,60103,601
4,2018,40004,9.0,Immaculate Heart of Mary Catholic School,Lenah Valley,TAS,7008,Catholic,Primary,School Single Entity,...,Inner Regional Australia,-42.865543,147.290159,Lenah Valley - Mount Stuart,Hobart Inner,Hobart,6102812,61028,60105,601


In [107]:
df_schools_nsw = df_schools_merged[df_schools_merged["State"]=="NSW"]
df_schools_nsw.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,ABS Remoteness Area,Latitude,Longitude,Name of Statistical Area 2,Name of Statistical Area 3,Name of Statistical Area 4,Statistical Area 1,Statistical Area 2,Statistical Area 3,Statistical Area 4
228,2018,40275,28510.0,Saint Mary MacKillop College Albury,Jindera,NSW,2642,Independent,Combined,School Single Entity,...,Inner Regional Australia,-35.902879,146.826712,Albury Region,Albury,Murray,1117503,11175,10901,109
229,2018,40276,5298.0,St Dominic Savio School,Rockdale,NSW,2216,Independent,Primary,School Single Entity,...,Major Cities of Australia,-33.950985,151.149577,Rockdale - Banksia,Kogarah - Rockdale,Sydney - Inner South West,1138116,11381,11904,119
230,2018,40277,26768.0,Saint Mary MacKillop Colleges Limited,Wagga Wagga,NSW,2650,Independent,Combined,School Single Entity,...,Inner Regional Australia,-35.128967,147.347981,Wagga Wagga - South,Wagga Wagga,Riverina,1126940,11269,11303,113
268,2018,40366,1409.0,Kinma School,Terrey Hills,NSW,2084,Independent,Primary,School Single Entity,...,Major Cities of Australia,-33.686607,151.216969,Terrey Hills - Duffys Forest,Warringah,Sydney - Northern Beaches,1143210,11432,12203,122
269,2018,40367,1411.0,Knox Grammar School,Wahroonga,NSW,2076,Independent,Combined,School Head Campus,...,Major Cities of Australia,-33.723114,151.119418,Wahroonga (East) - Warrawee,Ku-ring-gai,Sydney - North Sydney and Hornsby,1141211,11412,12103,121


In [108]:
# Understand some of the data classifications
print(df_schools_nsw.shape)
print(df_schools_nsw["School Sector"].value_counts())
print(df_schools_nsw["School Type"].value_counts())
print(df_schools_nsw.shape)


(3155, 41)
Government     2206
Catholic        550
Independent     399
Name: School Sector, dtype: int64
Primary      2093
Secondary     544
Combined      321
Special       197
Name: School Type, dtype: int64
(3155, 41)


In [109]:
# Define A filter for Primary Schools or combined schools
primary_combined = df_schools_nsw["School Type"].isin(["Primary","Combined"])

# Define a filter that approximates secular schools by excluding the Catholic sector
# (Independent schools, which may also be denominational, are kept)
secular = df_schools_nsw["School Sector"]!="Catholic"

#Apply the Filters
df_primary_secular_nsw=df_schools_nsw[primary_combined & secular]

df_primary_secular_nsw.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,ABS Remoteness Area,Latitude,Longitude,Name of Statistical Area 2,Name of Statistical Area 3,Name of Statistical Area 4,Statistical Area 1,Statistical Area 2,Statistical Area 3,Statistical Area 4
228,2018,40275,28510.0,Saint Mary MacKillop College Albury,Jindera,NSW,2642,Independent,Combined,School Single Entity,...,Inner Regional Australia,-35.902879,146.826712,Albury Region,Albury,Murray,1117503,11175,10901,109
229,2018,40276,5298.0,St Dominic Savio School,Rockdale,NSW,2216,Independent,Primary,School Single Entity,...,Major Cities of Australia,-33.950985,151.149577,Rockdale - Banksia,Kogarah - Rockdale,Sydney - Inner South West,1138116,11381,11904,119
230,2018,40277,26768.0,Saint Mary MacKillop Colleges Limited,Wagga Wagga,NSW,2650,Independent,Combined,School Single Entity,...,Inner Regional Australia,-35.128967,147.347981,Wagga Wagga - South,Wagga Wagga,Riverina,1126940,11269,11303,113
268,2018,40366,1409.0,Kinma School,Terrey Hills,NSW,2084,Independent,Primary,School Single Entity,...,Major Cities of Australia,-33.686607,151.216969,Terrey Hills - Duffys Forest,Warringah,Sydney - Northern Beaches,1143210,11432,12203,122
269,2018,40367,1411.0,Knox Grammar School,Wahroonga,NSW,2076,Independent,Combined,School Head Campus,...,Major Cities of Australia,-33.723114,151.119418,Wahroonga (East) - Warrawee,Ku-ring-gai,Sydney - North Sydney and Hornsby,1141211,11412,12103,121
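
The two filters in In [109] are boolean Series combined with `&`. When the conditions are written inline rather than stored in variables, each comparison needs its own parentheses, because `&` binds more tightly than `!=` or `==`. The sketch uses toy rows, and "Example Secondary School" is a made-up entry.

```python
import pandas as pd

schools = pd.DataFrame({
    "School Name": [
        "Seaforth Public School",
        "Knox Grammar School",
        "Holy Rosary Catholic School",
        "Example Secondary School",  # made-up row
    ],
    "School Sector": ["Government", "Independent", "Catholic", "Government"],
    "School Type": ["Primary", "Combined", "Primary", "Secondary"],
})

# Parentheses around the comparison are required when combining with &
selected = schools[
    schools["School Type"].isin(["Primary", "Combined"])
    & (schools["School Sector"] != "Catholic")
]
print(selected["School Name"].tolist())
```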


In [114]:
# Group the data to get some basic counts per postcode area
grp = df_primary_secular_nsw.groupby(["Name of Statistical Area 4","Name of Statistical Area 3", "Name of Statistical Area 2", "Postcode"])["Postcode"].count()
# Sort the counts (a Series indexed by the grouping keys) in descending order
grp = grp.sort_values(ascending=False)
grp.head(50)

Name of Statistical Area 4              Name of Statistical Area 3        Name of Statistical Area 2          Postcode
Richmond - Tweed                        Richmond Valley - Hinterland      Lismore Region                      2480        21
New England and North West              Armidale                          Armidale                            2350        11
Sydney - Inner South West               Bankstown                         Greenacre - Mount Lewis             2190         9
Richmond - Tweed                        Tweed Valley                      Murwillumbah Region                 2484         9
Mid North Coast                         Taree - Gloucester                Taree                               2430         9
Capital Region                          Goulburn - Mulwaree               Goulburn                            2580         8
Coffs Harbour - Grafton                 Clarence Valley                   Grafton Region                      2460         8
Richmo
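
Note that `reset_index` returns a new object rather than changing the Series in place, so its result has to be assigned to take effect. If a flat table with a named count column is wanted instead of the indexed Series shown above, one possible form is sketched here on toy rows.

```python
import pandas as pd

# Toy rows: three schools in 2480, two in 2099, one in 2093
schools = pd.DataFrame({
    "Name of Statistical Area 4": [
        "Richmond - Tweed", "Richmond - Tweed", "Richmond - Tweed",
        "Sydney - Northern Beaches", "Sydney - Northern Beaches",
        "Sydney - Northern Beaches",
    ],
    "Postcode": [2480, 2480, 2480, 2099, 2099, 2093],
})

# size() counts rows per group; reset_index(name=...) turns the result
# into a regular DataFrame with a "Count" column
counts = (
    schools.groupby(["Name of Statistical Area 4", "Postcode"])
    .size()
    .reset_index(name="Count")
    .sort_values(by="Count", ascending=False)
)
print(counts)
```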

In [168]:
# Filter to schools in a selection of postcodes
postcodes = df_primary_secular_nsw["Postcode"].isin([2092,2093,2094,2095,2096,2097,2098,2099])
a = df_primary_secular_nsw[postcodes]
a.head()

Unnamed: 0,Calendar Year,ACARA SML ID,AGE ID,School Name,Suburb,State,Postcode,School Sector,School Type,Campus Type,...,ABS Remoteness Area,Latitude,Longitude,Name of Statistical Area 2,Name of Statistical Area 3,Name of Statistical Area 4,Statistical Area 1,Statistical Area 2,Statistical Area 3,Statistical Area 4
499,2018,40845,28880.0,Karuna Montessori School,Narraweena,NSW,2099,Independent,Combined,School Single Entity,...,Major Cities of Australia,-33.7478,151.276,Beacon Hill - Narraweena,Warringah,Sydney - Northern Beaches,1142416,11424,12203,122
503,2018,40850,28974.0,Farmhouse Montessori School,North Balgowlah,NSW,2093,Independent,Combined,School Head Campus,...,Major Cities of Australia,-33.783352,151.246409,Manly Vale - Allambie Heights,Warringah,Sydney - Northern Beaches,1143031,11430,12203,122
731,2018,41197,7691.0,Seaforth Public School,Seaforth,NSW,2092,Government,Primary,School Single Entity,...,Major Cities of Australia,-33.793262,151.251235,Balgowlah - Clontarf - Seaforth,Manly,Sydney - Northern Beaches,1141822,11418,12201,122
740,2018,41206,7703.0,Dee Why Public School,Dee Why,NSW,2099,Government,Primary,School Single Entity,...,Major Cities of Australia,-33.749025,151.285209,Dee Why - North Curl Curl,Warringah,Sydney - Northern Beaches,1142640,11426,12203,122
751,2018,41217,7698.0,Harbord Public School,Freshwater,NSW,2096,Government,Primary,School Single Entity,...,Major Cities of Australia,-33.772662,151.28579,Freshwater - Brookvale,Warringah,Sydney - Northern Beaches,1142907,11429,12203,122
