## Group Details

Group Name: ACG

Project Title: Best Suburb in Christchurch 

## Student Details

Student Name: Annabelle Bos

Class: DATA422 - Data Wrangling

Student ID: 61694371

## Step 1
Load the necessary libraries.

In [1]:
library(tidyverse)
library(dplyr)
library(skimr)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.4     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   2.0.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Step 2
Obtain the directory of New Zealand schools from this website: https://www.educationcounts.govt.nz/directories/list-of-nz-schools#
Click the "Download the whole Directory" button. There does not appear to be a way to automate this step, because the link behind the button is the same as the link for the page itself; reading from that URL directly returns the web page rather than the data. The user must therefore download the directory manually and save it as directory.csv in the same folder as this notebook.
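
Because the download is manual, it may help to fail early with a clear message when the file is missing. The following is a minimal sketch only; `check_directory_file()` is a hypothetical helper that is not part of the original notebook.

```r
# Hypothetical helper: stop with a clear message if the manual download
# step has not been done yet; otherwise return the path invisibly.
check_directory_file <- function(path = "directory.csv") {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "'. Download the whole directory from the ",
         "Education Counts page and save it as directory.csv in this folder.")
  }
  invisible(path)
}
```

Calling `check_directory_file()` before `read_csv()` would turn a missing file into an explicit instruction rather than a generic read error.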

The first 15 lines are skipped because they contain information about the dataset rather than the data itself. There is also a blank row between the column headers and the data; however, it drops out when the data is filtered to just the Christchurch schools.

In [2]:
schools_df <- read_csv(file = "directory.csv", skip = 15) 

schools_df

Rows: 2559 Columns: 42

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (28): School Name, Telephone, Fax, Email^, Principal*, School Website, S...
dbl (14): School Number, Postal Code, Community of Learning ID, Latitude, Lo...


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



School Number,School Name,Telephone,Fax,Email^,Principal*,School Website,Street,Suburb,Town / City,⋯,Isolation Index,Decile,Total School Roll,European / Pākehā,Māori,Pacific,Asian,MELAA,Other,International
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
,,,,,,,,,,⋯,,,,,,,,,,
1,Te Kura o Te Kao,09 409 7813,,office@tekao.school.nz,Hemi Takawe,http://www.tekuraotekao.school.nz,6603 Far North Road,,Te Kao,⋯,4.18,2,42,1,41,0,0,0,0,0
2,Taipa Area School,09 406 0159,09 406 1096,office@taipa.school.nz,David Lowe,http://www.taipa.school.nz,578 State Highway 10,,Taipa,⋯,2.88,2,331,34,286,5,5,0,1,0
3,Kaitaia College,09 408 0190,09 408 0193,admin@kaitaiacollege.school.nz,Louise Anaru-Tangira,http://www.kaitaiacollege.school.nz/,53 Redan Road,,Kaitaia,⋯,2.74,2,829,164,636,14,11,1,3,0
4,Whangaroa College,09 405 0199,09 405 0288,office@whc.school.nz,Jack Anderson,http://www.whangaroacollege.school.nz,4157 State Highway 10,,Kaeo,⋯,2.61,1,128,20,108,0,0,0,0,0
5,Kerikeri High School,09 407 8916,09 407 9323,enquiries@kerikerihigh.ac.nz,Elizabeth Forgie,http://www.kerikerihigh.ac.nz,48 Hone Heke Road,,Kerikeri,⋯,2.41,6,1535,924,505,29,37,17,14,9
6,Broadwood Area School,09 409 5878,09 409 5877,admin@broadwood.school.nz,Danelle Tatana,http://broadwood.school.nz,1041 Broadwood Road,,Broadwood,⋯,2.87,1,94,6,86,1,1,0,0,0
7,Okaihau College,09 401 9030,09 401 9793,admin@okaihau-college.school.nz,Thomas Davison,http://www.okaihau-college.school.nz,58 Settlers Way,,Okaihau,⋯,2.34,2,372,103,254,3,9,1,2,0
8,Bay of Islands College,09 404 1055,09 404 1048,accounts@boic.school.nz,Edith Painting-Davis,http://www.boic.school.nz,1-9 Derrick Road,,Kawakawa,⋯,1.80,2,383,42,328,6,5,0,2,0
9,Northland College,09 401 3200,09 401 2378,admin@northlandcollege.school.nz,Duane Allen,http://www.northlandcollege.school.nz,62 Mangakahia Road,,Kaikohe,⋯,2.37,1,306,5,292,8,1,0,0,0
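
The blank row between the headers and the data appears as the all-missing first row in the output above. As a sketch of how it could be removed explicitly, rather than relying on the later Christchurch filter, the example below uses a small made-up data frame (`toy` and its values are illustrative, not taken from the directory).

```r
# Toy stand-in for the directory: the first row is entirely NA,
# like the blank line that follows the column headers in directory.csv.
toy <- data.frame(
  school_number = c(NA, 1, 2),
  school_name   = c(NA, "School A", "School B"),
  stringsAsFactors = FALSE
)

# Keep only rows that have at least one non-missing value.
toy_clean <- toy[rowSums(!is.na(toy)) > 0, ]

nrow(toy_clean)
```

Rows that are only partly missing are kept, so this removes nothing except fully empty rows.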


## Re-Naming Columns
As a group, we decided to use a standard naming convention of lowercase letters and underscores. The cell below renames every column to match this convention.

In [3]:
names(schools_df)[names(schools_df) == 'School Number' ] <- 'school_number'
names(schools_df)[names(schools_df) == 'School Name' ] <- 'school_name'
names(schools_df)[names(schools_df) == 'Telephone' ] <- 'telephone'
names(schools_df)[names(schools_df) == 'Fax' ] <- 'fax'
names(schools_df)[names(schools_df) == 'Email^'] <- 'email'
names(schools_df)[names(schools_df) == 'Principal*'] <- 'principal'
names(schools_df)[names(schools_df) == 'School Website'] <- 'school_website'
names(schools_df)[names(schools_df) == 'Street' ] <- 'street'
names(schools_df)[names(schools_df) == 'Suburb'] <- 'suburb'
names(schools_df)[names(schools_df) == 'Town / City'] <- 'town_city'
names(schools_df)[names(schools_df) == 'Postal Address'] <- 'postal_address'
names(schools_df)[names(schools_df) == 'Postal Address Suburb'] <- 'postal_suburb'
names(schools_df)[names(schools_df) == 'Postal Address City'] <- 'postal_city'
names(schools_df)[names(schools_df) == 'Postal Code'] <- 'postal_code'
names(schools_df)[names(schools_df) == 'Urban Area'] <- 'urban_area'
names(schools_df)[names(schools_df) == 'School Type'] <- 'school_type'
names(schools_df)[names(schools_df) == 'Definition' ] <- 'definition'
names(schools_df)[names(schools_df) == 'Authority' ] <- 'authority'
names(schools_df)[names(schools_df) == 'Donations' ] <- 'donations'
names(schools_df)[names(schools_df) == 'Gender of Students'] <- 'gender_of_students'
names(schools_df)[names(schools_df) == 'Territorial Authority'] <- 'territorial_authority'
names(schools_df)[names(schools_df) == 'Regional Council'] <- 'regional_council'
names(schools_df)[names(schools_df) == 'Ministry of Education Local Office'] <- 'moe_local_office'
names(schools_df)[names(schools_df) == 'Education Region'] <- 'education_region'
names(schools_df)[names(schools_df) == 'General Electorate'] <- 'general_electorate'
names(schools_df)[names(schools_df) == 'Māori Electorate'] <- 'māori_electorate'
names(schools_df)[names(schools_df) == 'Area Unit'] <- 'area_unit'
names(schools_df)[names(schools_df) == 'Ward'] <- 'ward'
names(schools_df)[names(schools_df) == 'Community of Learning ID'] <- 'comm_of_learning_id'
names(schools_df)[names(schools_df) == 'Community of Learning Name'] <- 'comm_of_learning_name'
names(schools_df)[names(schools_df) == 'Latitude'] <- 'latitude'
names(schools_df)[names(schools_df) == 'Longitude'] <- 'longitude'
names(schools_df)[names(schools_df) == 'Isolation Index'] <- 'isolation_index'
names(schools_df)[names(schools_df) == 'Decile'] <- 'decile'
names(schools_df)[names(schools_df) == 'Total School Roll'] <- 'total_school_roll'
names(schools_df)[names(schools_df) == 'European / Pākehā'] <- 'european_pākehā'
names(schools_df)[names(schools_df) == 'Māori'] <- "māori"
names(schools_df)[names(schools_df) == 'Pacific'] <- "pacific"
names(schools_df)[names(schools_df) == 'Asian'] <- "asian"
names(schools_df)[names(schools_df) == 'MELAA'] <- "m_e_l_a_a"
names(schools_df)[names(schools_df) == 'Other'] <- "other" 
names(schools_df)[names(schools_df) == 'International'] <- "international"
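
The same renaming could be written more compactly with a named lookup vector passed to `dplyr::rename()`. This is a sketch on a three-column toy tibble; `toy` and `lookup` are hypothetical names, and only a few of the 42 columns are shown.

```r
library(dplyr)

# Toy tibble carrying three of the original column names.
toy <- tibble(`School Number` = 1, `Town / City` = "Te Kao", `MELAA` = 0)

# Named vector: names are the new column names, values are the old ones.
lookup <- c(
  school_number = "School Number",
  town_city     = "Town / City",
  m_e_l_a_a     = "MELAA"
)

# all_of() raises an error if an old name is missing, which catches typos.
toy_renamed <- toy %>% rename(all_of(lookup))

names(toy_renamed)
```

Keeping the mapping in one vector makes it easier to review and to share with the rest of the group than 42 separate assignments.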

## Filtering for Christchurch Only
This dataset is very clean in its original form, so filtering town_city for the Christchurch rows was a simple task. I checked against this list - https://geographic.org/streetview/new_zealand/canterbury_west_coast/christchurch/index.html - to see whether it would be better to filter by suburb rather than by town_city. I found that some suburb names are shared with other cities; for example, there is also a Northcote and an Avondale in Auckland. I therefore chose to filter by city.

In [4]:
chch_schools <- schools_df %>%
  filter(town_city == 'Christchurch')

chch_schools

school_number,school_name,telephone,fax,email,principal,school_website,street,suburb,town_city,⋯,isolation_index,decile,total_school_roll,european_pākehā,māori,pacific,asian,m_e_l_a_a,other,international
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
82,Aidanfield Christian School,03 338 8153,03 339 0821,enrol@aidanfield.school.nz,Mark Richardson,http://www.aidanfield.school.nz/,2 Nash Road,Aidanfield,Christchurch,⋯,0.16,8,386,154,15,16,156,35,3,7
315,St Bedes College,03 375 0647,03 352 0345,office@stbedes.school.nz,Justin Boyle,http://www.stbedes.school.nz,210 Main North Road,Papanui,Christchurch,⋯,0.13,9,811,602,95,32,51,7,3,21
316,Papanui High School,03 352 6119,03 352 6117,admin@papanui.school.nz,Jeffrey Smith,http://www.papanui.school.nz/,30 Langdons Road,Papanui,Christchurch,⋯,0.12,7,1578,990,244,54,231,40,11,8
317,Christchurch Adventist School,03 352 9173,03 352 3470,accounts@cas.school.nz,Evan Ellis,http://www.cas.school.nz,15 Grants Road,Papanui,Christchurch,⋯,0.09,5,271,87,17,63,73,24,6,1
318,St Andrew's College (Christchurch),03 940 2000,03 940 2060,reception@stac.school.nz,Christine Leighton,http://www.stac.school.nz,347 Papanui Road,Bryndwr,Christchurch,⋯,Not applicable,10,1588,1247,114,23,164,8,17,15
319,Burnside High School,03 358 8383,03 358 8380,,Phillip Holstein,http://www.burnside.school.nz,151 Greers Road,Burnside,Christchurch,⋯,0.15,8,2455,1181,219,74,800,84,11,86
320,Mairehau High School,03 385 3145,03 385 3143,admin@mairehau.school.nz,Harry Romana,http://www.mairehau.school.nz,Hills Road,,Christchurch,⋯,0.11,4,359,180,105,28,37,5,4,0
321,Shirley Boys' High School,03 375 7057,03 385 3934,jmf@shirley.school.nz,Tim Grocott,http://www.shirley.school.nz,209 Travis Road,,Christchurch,⋯,0.08,6,1247,794,255,78,72,21,4,23
324,Avonside Girls' High School,03 389 7199,03 389 9250,shume@avonside.school.nz,Susan Hume,http://www.avonside.school.nz,209 Travis Road,North New Brighton,Christchurch,⋯,0.06,6,1027,631,234,74,50,14,7,17
325,Rangi Ruru Girls' School,03 983 3700,03 983 3766,office@rangiruru.school.nz,Sandra Hastie,http://www.rangiruru.school.nz,59 Hewitts Road,Merivale,Christchurch,⋯,Not applicable,10,672,531,52,6,60,1,5,17
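
The overlap in suburb names between cities can be checked by counting how many distinct cities each suburb appears in. This is a sketch on toy rows only (the `toy` tibble is made up for illustration); the same pipeline could be pointed at `schools_df` after renaming.

```r
library(dplyr)

# Toy rows only: two suburb names appear under more than one city.
toy <- tibble(
  suburb    = c("Avondale", "Avondale", "Northcote", "Northcote", "Papanui"),
  town_city = c("Auckland", "Christchurch", "Auckland", "Christchurch", "Christchurch")
)

# Suburb names that occur in more than one city.
shared_suburbs <- toy %>%
  filter(!is.na(suburb)) %>%
  group_by(suburb) %>%
  summarise(n_cities = n_distinct(town_city)) %>%
  filter(n_cities > 1)

shared_suburbs
```

Any suburb returned here would be ambiguous if the data were filtered by suburb alone.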


In [5]:
chch_schools %>% skim()

── Data Summary ────────────────────────
                           Values    
Name                       Piped data
Number of rows             138       
Number of columns          42        
_______________________              
Column type frequency:               
  character                28        
  numeric                  14        
________________________             
Group variables            None      

── Variable type: character ────────────────────────────────────────────────────
   skim_variable         n_missing complete_rate   min   max empty n_unique
 1 school_name                   0         1        11    54     0      138
 2 telephone                     0         1        11    11     0      136
 3 fax                          22         0.841    11    11     0      115
 4 email                         6         0.957    19    39     0      131
 5 principal                     0         1         8    33     0      136
 6 school_website                1       

## Missing Suburb Values
There are 138 schools in Christchurch itself. Of these, eight have no value in the suburb field. I researched each of these individually and found that the Postal Address Suburb is in the same vicinity in every case, so the blank suburbs are filled from postal_suburb.

In [6]:
chch_schools <- chch_schools %>% 
    mutate(suburb = coalesce(suburb,postal_suburb))
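
To show what `coalesce()` does here, the sketch below applies the same call to a made-up three-row tibble (the suburb names are placeholders, not directory values). The first non-missing value across the two columns is kept for each row.

```r
library(dplyr)

# Toy data: row 2 has a missing suburb but a postal suburb,
# row 3 is missing both.
toy <- tibble(
  suburb        = c("Suburb A", NA, NA),
  postal_suburb = c("Suburb A", "Suburb B", NA)
)

toy_filled <- toy %>%
  mutate(suburb = coalesce(suburb, postal_suburb))

toy_filled$suburb
```

A row stays missing only when both columns are missing, so counting `is.na(suburb)` afterwards is a quick check that every blank was filled.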

## Trimming White Space
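There does not appear to be any stray whitespace in the dataset but, to be certain, I have trimmed leading and trailing whitespace from all strings. The result is assigned back to chch_schools so that the trimmed values are kept.

In [7]:
chch_schools <- chch_schools %>%
   mutate(across(where(is.character), str_trim))

chch_schools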

school_number,school_name,telephone,fax,email,principal,school_website,street,suburb,town_city,⋯,isolation_index,decile,total_school_roll,european_pākehā,māori,pacific,asian,m_e_l_a_a,other,international
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
82,Aidanfield Christian School,03 338 8153,03 339 0821,enrol@aidanfield.school.nz,Mark Richardson,http://www.aidanfield.school.nz/,2 Nash Road,Aidanfield,Christchurch,⋯,0.16,8,386,154,15,16,156,35,3,7
315,St Bedes College,03 375 0647,03 352 0345,office@stbedes.school.nz,Justin Boyle,http://www.stbedes.school.nz,210 Main North Road,Papanui,Christchurch,⋯,0.13,9,811,602,95,32,51,7,3,21
316,Papanui High School,03 352 6119,03 352 6117,admin@papanui.school.nz,Jeffrey Smith,http://www.papanui.school.nz/,30 Langdons Road,Papanui,Christchurch,⋯,0.12,7,1578,990,244,54,231,40,11,8
317,Christchurch Adventist School,03 352 9173,03 352 3470,accounts@cas.school.nz,Evan Ellis,http://www.cas.school.nz,15 Grants Road,Papanui,Christchurch,⋯,0.09,5,271,87,17,63,73,24,6,1
318,St Andrew's College (Christchurch),03 940 2000,03 940 2060,reception@stac.school.nz,Christine Leighton,http://www.stac.school.nz,347 Papanui Road,Bryndwr,Christchurch,⋯,Not applicable,10,1588,1247,114,23,164,8,17,15
319,Burnside High School,03 358 8383,03 358 8380,,Phillip Holstein,http://www.burnside.school.nz,151 Greers Road,Burnside,Christchurch,⋯,0.15,8,2455,1181,219,74,800,84,11,86
320,Mairehau High School,03 385 3145,03 385 3143,admin@mairehau.school.nz,Harry Romana,http://www.mairehau.school.nz,Hills Road,Mairehau,Christchurch,⋯,0.11,4,359,180,105,28,37,5,4,0
321,Shirley Boys' High School,03 375 7057,03 385 3934,jmf@shirley.school.nz,Tim Grocott,http://www.shirley.school.nz,209 Travis Road,Shirley,Christchurch,⋯,0.08,6,1247,794,255,78,72,21,4,23
324,Avonside Girls' High School,03 389 7199,03 389 9250,shume@avonside.school.nz,Susan Hume,http://www.avonside.school.nz,209 Travis Road,North New Brighton,Christchurch,⋯,0.06,6,1027,631,234,74,50,14,7,17
325,Rangi Ruru Girls' School,03 983 3700,03 983 3766,office@rangiruru.school.nz,Sandra Hastie,http://www.rangiruru.school.nz,59 Hewitts Road,Merivale,Christchurch,⋯,Not applicable,10,672,531,52,6,60,1,5,17


## Areas of Interest
As a group we decided on the following areas of interest for each suburb:

Total school roll

Number of primary schools 

Number of intermediate schools

Number of secondary schools 

Average decile 

Flag for single sex schools 

Count of authorities (i.e. private / integrated / public) 

Flags were added for schools that met each of the criteria. I have deliberately been inclusive and counted every school that meets a specification. For example, when capturing whether a school is available for intermediate-aged children, I included dedicated intermediate schools, composite schools, primary schools that run to Year 8, and secondary schools that start at Year 7.

In [8]:
chch_schools <- chch_schools %>%
   mutate(primary_schools = ifelse((school_type == "Composite" | 
                                    school_type == "Contributing" | 
                                    school_type == "Full Primary"), 1, 0)) %>%
   mutate(intermediate_schools = ifelse((school_type == "Composite" | 
                                         school_type == "Full Primary" | 
                                         school_type == "Intermediate" |
                                         school_type == "Secondary (Year 7-15)"), 1, 0)) %>%
   mutate(secondary_schools = ifelse((school_type == "Composite" | 
                                         school_type == "Secondary (Year 9-15)" |
                                         school_type == "Secondary (Year 7-15)"), 1, 0)) %>%
   mutate(private_schools = ifelse(authority == "Private : Fully Registered", 1, 0)) %>%
   mutate(state_schools = ifelse((authority == "State" |
                                  authority == "State : Integrated"), 1, 0)) %>%
   mutate(co_educational = ifelse(gender_of_students == "Co-Educational", 1, 0)) %>%
   mutate(single_sex_girls = ifelse(gender_of_students == "Single Sex (Girls School)", 1, 0)) %>%
   mutate(single_sex_boys = ifelse(gender_of_students == "Single Sex (Boys School)", 1, 0)) 


chch_schools

school_number,school_name,telephone,fax,email,principal,school_website,street,suburb,town_city,⋯,other,international,primary_schools,intermediate_schools,secondary_schools,private_schools,state_schools,co_educational,single_sex_girls,single_sex_boys
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
82,Aidanfield Christian School,03 338 8153,03 339 0821,enrol@aidanfield.school.nz,Mark Richardson,http://www.aidanfield.school.nz/,2 Nash Road,Aidanfield,Christchurch,⋯,3,7,1,1,1,0,1,1,0,0
315,St Bedes College,03 375 0647,03 352 0345,office@stbedes.school.nz,Justin Boyle,http://www.stbedes.school.nz,210 Main North Road,Papanui,Christchurch,⋯,3,21,0,0,1,0,1,0,0,1
316,Papanui High School,03 352 6119,03 352 6117,admin@papanui.school.nz,Jeffrey Smith,http://www.papanui.school.nz/,30 Langdons Road,Papanui,Christchurch,⋯,11,8,0,0,1,0,1,1,0,0
317,Christchurch Adventist School,03 352 9173,03 352 3470,accounts@cas.school.nz,Evan Ellis,http://www.cas.school.nz,15 Grants Road,Papanui,Christchurch,⋯,6,1,1,1,1,0,1,1,0,0
318,St Andrew's College (Christchurch),03 940 2000,03 940 2060,reception@stac.school.nz,Christine Leighton,http://www.stac.school.nz,347 Papanui Road,Bryndwr,Christchurch,⋯,17,15,1,1,1,1,0,1,0,0
319,Burnside High School,03 358 8383,03 358 8380,,Phillip Holstein,http://www.burnside.school.nz,151 Greers Road,Burnside,Christchurch,⋯,11,86,0,0,1,0,1,1,0,0
320,Mairehau High School,03 385 3145,03 385 3143,admin@mairehau.school.nz,Harry Romana,http://www.mairehau.school.nz,Hills Road,Mairehau,Christchurch,⋯,4,0,0,0,1,0,1,1,0,0
321,Shirley Boys' High School,03 375 7057,03 385 3934,jmf@shirley.school.nz,Tim Grocott,http://www.shirley.school.nz,209 Travis Road,Shirley,Christchurch,⋯,4,23,0,0,1,0,1,0,0,1
324,Avonside Girls' High School,03 389 7199,03 389 9250,shume@avonside.school.nz,Susan Hume,http://www.avonside.school.nz,209 Travis Road,North New Brighton,Christchurch,⋯,7,17,0,0,1,0,1,0,1,0
325,Rangi Ruru Girls' School,03 983 3700,03 983 3766,office@rangiruru.school.nz,Sandra Hastie,http://www.rangiruru.school.nz,59 Hewitts Road,Merivale,Christchurch,⋯,5,17,0,1,1,1,0,0,1,0
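
The chained `==` comparisons above could also be expressed with `%in%` against a vector of qualifying school types. This is a sketch for the intermediate flag only, on a toy tibble; `intermediate_types` and `toy` are hypothetical names.

```r
library(dplyr)

# School types treated as catering for intermediate-aged children,
# copied from the conditions used in the cell above.
intermediate_types <- c("Composite", "Full Primary", "Intermediate",
                        "Secondary (Year 7-15)")

toy <- tibble(school_type = c("Contributing", "Full Primary",
                              "Secondary (Year 9-15)", "Secondary (Year 7-15)"))

# %in% returns TRUE/FALSE, which as.integer() converts to 1/0.
toy_flagged <- toy %>%
  mutate(intermediate_schools = as.integer(school_type %in% intermediate_types))

toy_flagged$intermediate_schools
```

One difference to be aware of: `%in%` returns FALSE for a missing school_type, whereas `ifelse()` with `==` returns NA, so the two approaches only agree when the column has no missing values.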


## Missing Decile Values
Where no current decile information is available for a school, the value is recorded as 99. Any attempt to summarise by mean decile would therefore be skewed by schools carrying this placeholder value. To review these schools, I pulled them out of the dataset for inspection.

In [9]:
missing_value_schools <- subset(chch_schools, decile==99)
missing_value_schools

school_number,school_name,telephone,fax,email,principal,school_website,street,suburb,town_city,⋯,other,international,primary_schools,intermediate_schools,secondary_schools,private_schools,state_schools,co_educational,single_sex_girls,single_sex_boys
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
610,Seven Oaks School,03 377 8603,,office@sevenoaks.school.nz,Jeremy Orczy,http://www.sevenoaks.school.nz/,77 Murphys Road,Halswell,Christchurch,⋯,1,0,1,1,0,1,0,1,0,0
695,Seven Oaks Secondary School,03 377 8603,03 377 8603,office@sevenoaks.school.nz,Jeremy Orczy,http://www.sevenoaks.school.nz,77 Murphys Road,Halswell,Christchurch,⋯,0,0,0,0,1,1,0,1,0,0
2126,Jean Seabrook Memorial School,03 381 5383,03 381 5385,school@seabrookmckenzie.net,Mary Gillies,,68 London Street,Richmond,Christchurch,⋯,0,0,1,1,0,1,0,1,0,0
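
To illustrate why the 99 placeholder matters, the sketch below uses made-up decile values (not taken from the directory) to show how a single 99 distorts a mean. It also shows that recoding 99 to NA would be one way of keeping it out of the calculation.

```r
# Made-up deciles for four schools; the third has the 99 placeholder.
decile <- c(8, 9, 99, 7)

# The placeholder drags the mean far outside the valid 1-10 range.
mean_with_placeholder <- mean(decile)

# Recode the placeholder to NA and ignore missing values in the mean.
decile_na <- replace(decile, decile == 99, NA)
mean_without_placeholder <- mean(decile_na, na.rm = TRUE)

c(mean_with_placeholder, mean_without_placeholder)
```

Here the mean falls from 30.75 to 8 once the placeholder is treated as missing.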


In [10]:
missing_value_schools %>% skim()

── Data Summary ────────────────────────
                           Values    
Name                       Piped data
Number of rows             3         
Number of columns          50        
_______________________              
Column type frequency:               
  character                28        
  numeric                  22        
________________________             
Group variables            None      

── Variable type: character ────────────────────────────────────────────────────
   skim_variable         n_missing complete_rate   min   max empty n_unique
 1 school_name                   0         1        17    29     0        3
 2 telephone                     0         1        11    11     0        2
 3 fax                           1         0.667    11    11     0        2
 4 email                         0         1        26    27     0        2
 5 principal                     0         1        12    12     0        2
 6 school_website                1       

Given how few schools meet this criterion and how small their rolls are (a maximum of 61 students at the largest school), I have opted to exclude these schools from the dataset completely.

In [11]:
chch_schools <- subset(chch_schools, decile!=99)
chch_schools

school_number,school_name,telephone,fax,email,principal,school_website,street,suburb,town_city,⋯,other,international,primary_schools,intermediate_schools,secondary_schools,private_schools,state_schools,co_educational,single_sex_girls,single_sex_boys
<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
82,Aidanfield Christian School,03 338 8153,03 339 0821,enrol@aidanfield.school.nz,Mark Richardson,http://www.aidanfield.school.nz/,2 Nash Road,Aidanfield,Christchurch,⋯,3,7,1,1,1,0,1,1,0,0
315,St Bedes College,03 375 0647,03 352 0345,office@stbedes.school.nz,Justin Boyle,http://www.stbedes.school.nz,210 Main North Road,Papanui,Christchurch,⋯,3,21,0,0,1,0,1,0,0,1
316,Papanui High School,03 352 6119,03 352 6117,admin@papanui.school.nz,Jeffrey Smith,http://www.papanui.school.nz/,30 Langdons Road,Papanui,Christchurch,⋯,11,8,0,0,1,0,1,1,0,0
317,Christchurch Adventist School,03 352 9173,03 352 3470,accounts@cas.school.nz,Evan Ellis,http://www.cas.school.nz,15 Grants Road,Papanui,Christchurch,⋯,6,1,1,1,1,0,1,1,0,0
318,St Andrew's College (Christchurch),03 940 2000,03 940 2060,reception@stac.school.nz,Christine Leighton,http://www.stac.school.nz,347 Papanui Road,Bryndwr,Christchurch,⋯,17,15,1,1,1,1,0,1,0,0
319,Burnside High School,03 358 8383,03 358 8380,,Phillip Holstein,http://www.burnside.school.nz,151 Greers Road,Burnside,Christchurch,⋯,11,86,0,0,1,0,1,1,0,0
320,Mairehau High School,03 385 3145,03 385 3143,admin@mairehau.school.nz,Harry Romana,http://www.mairehau.school.nz,Hills Road,Mairehau,Christchurch,⋯,4,0,0,0,1,0,1,1,0,0
321,Shirley Boys' High School,03 375 7057,03 385 3934,jmf@shirley.school.nz,Tim Grocott,http://www.shirley.school.nz,209 Travis Road,Shirley,Christchurch,⋯,4,23,0,0,1,0,1,0,0,1
324,Avonside Girls' High School,03 389 7199,03 389 9250,shume@avonside.school.nz,Susan Hume,http://www.avonside.school.nz,209 Travis Road,North New Brighton,Christchurch,⋯,7,17,0,0,1,0,1,0,1,0
325,Rangi Ruru Girls' School,03 983 3700,03 983 3766,office@rangiruru.school.nz,Sandra Hastie,http://www.rangiruru.school.nz,59 Hewitts Road,Merivale,Christchurch,⋯,5,17,0,1,1,1,0,0,1,0


In [12]:
chch_schools %>% skim()

── Data Summary ────────────────────────
                           Values    
Name                       Piped data
Number of rows             135       
Number of columns          50        
_______________________              
Column type frequency:               
  character                28        
  numeric                  22        
________________________             
Group variables            None      

── Variable type: character ────────────────────────────────────────────────────
   skim_variable         n_missing complete_rate   min   max empty n_unique
 1 school_name                   0         1        11    54     0      135
 2 telephone                     0         1        11    11     0      134
 3 fax                          21         0.844    11    11     0      113
 4 email                         6         0.956    19    39     0      129
 5 principal                     0         1         8    33     0      134
 6 school_website                0       

## Writing the Tidied Dataset to CSV
We decided to include the tidied dataset in the final product, so I exported a copy to CSV before summarising.

In [15]:
chch_schools %>%
  write_csv("invidual_schools_dataset.csv")

## Summarising the Dataset
Following the criteria determined by the group, I summarised the data by suburb and returned it in alphabetical order by suburb. I then wrote this to a CSV using the naming convention required for the next stage of the project, where all the datasets produced will be wrangled together.

In [None]:
school_summary_by_suburb <- chch_schools %>%
        select(suburb,
               total_school_roll,
               primary_schools,
               intermediate_schools,
               secondary_schools,
               decile,
               co_educational,
               single_sex_girls,
               single_sex_boys,
               private_schools,
               state_schools) %>%
        group_by(suburb) %>%
        summarise(number_of_schools = sum(single_sex_girls, single_sex_boys, co_educational),
                  total_school_roll = sum(total_school_roll), 
                  primary_schools = sum(primary_schools),
                  intermediate_schools = sum(intermediate_schools),
                  secondary_schools = sum(secondary_schools),
                  average_decile = mean(decile),
                  co_educational = sum(co_educational),
                  single_sex_girls = sum(single_sex_girls),
                  single_sex_boys = sum(single_sex_boys),
                  private_schools = sum(private_schools),
                  state_schools = sum(state_schools)) %>%
       arrange(suburb)

school_summary_by_suburb
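
In the summary above, number_of_schools is the sum of the three gender flags, which equals the number of schools only if every school has exactly one of those flags set. The sketch below checks that assumption on made-up rows by comparing the sum with `n()`; `toy` and `n_rows` are hypothetical names.

```r
library(dplyr)

# Made-up rows: two schools in one suburb, one in another.
toy <- tibble(
  suburb           = c("Suburb A", "Suburb A", "Suburb B"),
  decile           = c(9, 7, 10),
  co_educational   = c(0, 1, 0),
  single_sex_girls = c(0, 0, 1),
  single_sex_boys  = c(1, 0, 0)
)

toy_summary <- toy %>%
  group_by(suburb) %>%
  summarise(
    number_of_schools = sum(single_sex_girls, single_sex_boys, co_educational),
    n_rows            = n(),        # direct row count per suburb
    average_decile    = mean(decile)
  )

# TRUE when the flag-based count matches the row count in every suburb.
all(toy_summary$number_of_schools == toy_summary$n_rows)
```

If the comparison were FALSE on the real data, some school would have a gender_of_students value outside the three expected categories.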

In [None]:
school_summary_by_suburb %>% skim()

In [None]:
school_summary_by_suburb %>%
  write_csv("suburb_level_schools_summary.csv")

## Graphing
To add to the presentation, I wanted to provide an overall summary graph of the dataset. I have opted to show the total number of schools per suburb, ordered by the average school decile.

This required changing the order of the suburb factor levels from alphabetical to ascending average decile.

In [None]:
school_summary_by_suburb$suburb <- factor(school_summary_by_suburb$suburb, levels = school_summary_by_suburb$suburb[order(school_summary_by_suburb$average_decile)])
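
The effect of this re-levelling can be seen on a small example. The sketch below uses base R's `reorder()` as an alternative way of getting the same ordering; the suburbs and deciles are made up for illustration.

```r
# Made-up summary rows, initially in alphabetical order.
toy <- data.frame(
  suburb         = c("Suburb A", "Suburb B", "Suburb C"),
  average_decile = c(8, 8.5, 4),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument.
toy$suburb <- reorder(toy$suburb, toy$average_decile)

levels(toy$suburb)
```

ggplot2 draws a discrete axis in factor-level order, so the suburb with the lowest average decile is plotted first.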

In [None]:
summary_graph <- school_summary_by_suburb %>%
  ggplot(aes(x = number_of_schools, y = suburb, fill = average_decile)) + 
    geom_bar(stat="identity", width=.5)  +
    labs(x = 'Count of Schools',y='Suburbs',
         title= "Number of Schools by Suburb and Average Decile")

summary_graph

In [None]:
summary_graph %>%
  ggsave("school_summary_plt.jpg", ., device = NULL, scale = 1.0)