# Vulnerability Ireland 

The vulnerability index for Ireland is derived from the 2022 Census. To use this data, the relevant variables must first be identified and then normalised. The method used to do this is described below.

## Census Data

The Central Statistics Office (CSO) has produced a dataset of [small area statistics](https://www.cso.ie/en/media/csoie/census/census2022/) for the 2022 Census. This is the main data source for the Irish Vulnerability Assessment.


### R Libraries

The relevant R libraries are loaded into the kernel:

In [2]:
# load the libraries
library(tidyverse)

### Import the csv data

The data can be imported directly from the CSO website (this is the default) or using a local version.

In [3]:
#get the data from the CSO website
#smallAreaCSOData <- read.csv("https://www.cso.ie/en/media/csoie/census/census2022/SAPS_2022_Small_Area_270923.csv",  header=TRUE, sep=",")

#get the data locally
smallAreaCSOData <- read.csv('C:/Users/wcamaro/Documents/Reachout/ISVEHI-main_2024/1_InputData/CSOData/SAPS_2022_Small_Area_270923.csv', header=TRUE, sep=",", stringsAsFactors = FALSE)
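
The choice between the local copy and the CSO website can also be made at run time. The following is a minimal sketch rather than the notebook's own method: `loadSmallAreaData` is a hypothetical helper that reads the local file when it exists and otherwise falls back to a remote URL. The demonstration uses a temporary file and a placeholder URL so that it runs without network access.

```r
# Hypothetical helper: prefer a local copy, otherwise read from a remote URL.
loadSmallAreaData <- function(localPath, remoteUrl) {
  dataSource <- if (file.exists(localPath)) localPath else remoteUrl
  message("Reading small area data from: ", dataSource)
  read.csv(dataSource, header = TRUE, sep = ",", stringsAsFactors = FALSE)
}

# Self-contained demonstration: a temporary CSV stands in for the real file
demoPath <- tempfile(fileext = ".csv")
write.csv(data.frame(GUID = c("a", "b"), T1_1AGETT = c(10, 20)),
          demoPath, row.names = FALSE)

# The remote URL is a placeholder and is not contacted because the local file exists
demoData <- loadSmallAreaData(demoPath, remoteUrl = "https://example.invalid/unused.csv")
```

In the notebook itself, the two arguments would be the local path and the CSO URL from the cell above.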


## Select only the relevant data

In total there are 799 different variables in the small area dataset. However, only a subset of these is useful for our purposes. We therefore need to extract the relevant variables, then combine them to create our vulnerability indicators.

The dataset also includes data that is at the persons level (number of people in a small area) and the household level (number of households in a small area). As the preprocessing is slightly different for each, they are treated differently below.

### Small Area ID

First, we need to get the unique ID data for each of the small areas:

In [6]:
smallAreaID <- smallAreaCSOData[, c('GUID'), drop = FALSE]
colnames(smallAreaID)[colnames(smallAreaID) == "GUID"] ="SA_GUID__1"
head(smallAreaID)

SA_GUID__1
IE0
00b00ae4-229d-455d-84f1-d6face4876b1
03003797-1fcd-4fcf-8dde-b2188e3fb1db
06650182-eeaa-4c6c-847c-f85ddaf5361b
08e82f06-46ee-4141-aa07-79a793a12b27
0920215b-86d3-4a53-9fc0-6008ae5c91f9
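
Because this column is used as the key for every later table, it can be worth checking that the IDs are complete and unique. The sketch below is an illustration only: `checkSmallAreaIDs` is a hypothetical helper, and the GUID pattern is an assumption based on the IDs shown above. It also lists any IDs that do not look like GUIDs, such as `IE0`, which may be a summary row rather than a small area.

```r
# Hypothetical helper: basic sanity checks on an ID column.
checkSmallAreaIDs <- function(ids) {
  # Assumed GUID shape: 8-4-4-4-12 hexadecimal characters
  guidPattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
  list(
    anyMissing   = any(is.na(ids) | ids == ""),
    anyDuplicate = any(duplicated(ids)),
    nonGuidIDs   = ids[!grepl(guidPattern, ids, ignore.case = TRUE)]
  )
}

# Demonstration with the first three IDs shown above
demoIDs <- c("IE0",
             "00b00ae4-229d-455d-84f1-d6face4876b1",
             "03003797-1fcd-4fcf-8dde-b2188e3fb1db")
idCheck <- checkSmallAreaIDs(demoIDs)
idCheck$nonGuidIDs
```

Any row flagged this way is carried through the later percentage and z-score steps unless it is removed, so it is worth deciding early whether it belongs in the analysis.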


### Persons Level Data

We then get the persons level data and combine the variables together to create indicators:

In [7]:
#PERSONS DATA

#POPULATION TOTAL
populationTotalData <- smallAreaCSOData[, 'T1_1AGETT', drop = FALSE]
names(populationTotalData)[1] <- 'populationTotal'

#AGE - YOUNG 
ageYoungVariables <- c(
    'T1_1AGE0T', #Age 0 - Total
    'T1_1AGE1T', #Age 1 - Total
    'T1_1AGE2T', #Age 2 - Total
    'T1_1AGE3T', #Age 3 - Total
    'T1_1AGE4T', #Age 4 - Total
    'T1_1AGE5T'  #Age 5 - Total
)

ageYoungData <- smallAreaCSOData[,ageYoungVariables, drop = FALSE]
ageYoungData$young <- apply(ageYoungData,1,sum)
ageYoungData <- select(ageYoungData, 'young')

#AGE - OLD (Only over 75)
ageOldVariables <- c(
    'T1_1AGE75_79T', #Age 75 - 79 - Total
    'T1_1AGE80_84T', #Age 80 - 84 - Total
    'T1_1AGEGE_85T'  #Age 85 and over - Total
)
ageOldData <- smallAreaCSOData[, ageOldVariables, drop = FALSE]
ageOldData$old <- apply(ageOldData,1,sum)
ageOldData <- select(ageOldData, 'old')

#PRIMARY SCHOOL AGE
primarySchoolAgeVariables <- c(
    'T1_1AGE4T',  #Age 4 - Total
    'T1_1AGE5T',  #Age 5 - Total
    'T1_1AGE6T',  #Age 6 - Total
    'T1_1AGE7T',  #Age 7 - Total
    'T1_1AGE8T',  #Age 8 - Total
    'T1_1AGE9T',  #Age 9 - Total
    'T1_1AGE10T', #Age 10 - Total
    'T1_1AGE11T', #Age 11 - Total
    'T1_1AGE12T'  #Age 12 - Total
)

primarySchoolAgeData <- smallAreaCSOData[, primarySchoolAgeVariables, drop = FALSE]
primarySchoolAgeData$priSch <- apply(primarySchoolAgeData,1,sum)
primarySchoolAgeData <- select(primarySchoolAgeData, 'priSch')

#VOLUNTEERS IN THE COMMUNITY
volunteersData <- smallAreaCSOData[, 'T7_1_VOL', drop = FALSE] 
volunteersData$volunteers <- apply(volunteersData,1,sum)
volunteersData <- select(volunteersData, 'volunteers')

#HEALTH -  BAD HEALTH (Choice of: Very good, Good, Fair, Bad, Very bad, and Not stated) 
healthVariables <- c(
    'T12_3_BT', #Bad - Total
    'T12_3_VBT' #Very bad - Total
)
healthData <- smallAreaCSOData[, healthVariables, drop = FALSE]
healthData$poorHealth <- apply(healthData,1,sum)
healthData <- select(healthData, 'poorHealth')

#DISABILITY 
disabilitiesData <- smallAreaCSOData[, 'T12_1_T', drop = FALSE] 
disabilitiesData$disability <- apply(disabilitiesData,1,sum)
disabilitiesData <- select(disabilitiesData, 'disability')

#UNEMPLOYMENT 
unemploymentVariables <- c(
    'T8_1_LFFJT',   #Looking for first regular job - Total
    #'T8_1_ULGUPJT', #Unemployed having lost or given up previous job - Total
    'T8_1_UTWSDT',  #Unable to work due to permanent sickness or disability - Total - MAY CORRELATE WITH HEALTH TOO MUCH
    'T8_1_LAHFT',   #Looking after home/family - Total - NOT SURE ABOUT THIS ONE
    'T8_1_STUT',    #Short Term Unemployed  - Total 
    'T8_1_LTUT'     #Long Term Unemployed - Total
)

unemploymentData <- smallAreaCSOData[, unemploymentVariables, drop = FALSE]
unemploymentData$unemploy <- apply(unemploymentData,1,sum)
unemploymentData <- select(unemploymentData, 'unemploy')

#LOW SKILLED EMPLOYMENT
lowSkilledEmploymentVariables <- c(
    'T9_2_PE', #E Manual skilled (No. of persons)
    'T9_2_PF', #F Semi-skilled (No. of persons)
    'T9_2_PG'  #G Unskilled (No. of persons)
)

lowSkilledEmploymentData <- smallAreaCSOData[, lowSkilledEmploymentVariables, drop = FALSE]
lowSkilledEmploymentData$lowSkill <- apply(lowSkilledEmploymentData,1,sum)
lowSkilledEmploymentData <- select(lowSkilledEmploymentData, 'lowSkill')

#FARMERS
farmingEmploymentVariables <- c(
    'T9_2_PI' #I Farmers (No. of persons)
    #'T9_2_PJ'  #J Agricultural workers (No. of persons) Forestry and fishing also included
)

farmingEmploymentData <- smallAreaCSOData[, farmingEmploymentVariables, drop = FALSE]
farmingEmploymentData$farming<- apply(farmingEmploymentData,1,sum)
farmingEmploymentData <- select(farmingEmploymentData, 'farming')


#TENURE - Permanent private households by type of occupancy 
rentVariables <- c(
    'T6_3_RPLP',  #Rented from private landlord (No. of persons) 
    'T6_3_RLAP',  #Rented from Local Authority (No. of persons)
    'T6_3_RVCHBP' #Rented from voluntary/co-operative housing body (No. of persons)
)

rentData <- smallAreaCSOData[, rentVariables, drop = FALSE]
rentData$rent <- apply(rentData,1,sum)
rentData <- select(rentData, 'rent')

#EDUCATION 
educationVariables <- c(
    'T10_4_NFT' #No formal education - Total
#     'T10_4_PT'   #Primary education - Total
)

educationData <- smallAreaCSOData[, educationVariables, drop = FALSE]
educationData$education <- apply(educationData,1,sum)
educationData <- select(educationData, 'education')

#ENGLISH ABILITY - Speakers of foreign languages by ability to speak English
englishVariables <- c(
    'T2_6NW', #Not well
    'T2_6NAA' #Not at all
)

englishData <- smallAreaCSOData[, englishVariables, drop = FALSE] 
englishData$engLang <- apply(englishData,1,sum)
englishData <- select(englishData, 'engLang')

#NEW RESIDENTS - Usually resident population aged 1 year and over by usual residence 1 year before Census Day
newResidentsVariables <- c(
    'T2_3EI', #Elsewhere in Ireland
    'T2_3OI'  #Outside Ireland
)

newResidentsData <- smallAreaCSOData[, newResidentsVariables, drop = FALSE] 
newResidentsData$newRes <- apply(newResidentsData,1,sum)
newResidentsData <- select(newResidentsData, 'newRes')

#TRAVEL TIME - Population aged 5 years and over by journey time to work, school or college 
travelTimeVariables <- c(
    'T11_3_D5', #1 hour - under 1 1/2 hours
    'T11_3_D6'  #1 1/2 hours and over
)

travelTimeData <- smallAreaCSOData[, travelTimeVariables, drop = FALSE] 
travelTimeData$travelTime <- apply(travelTimeData,1,sum)
travelTimeData <- select(travelTimeData, 'travelTime')

#combine all the data into one table
personsData <- cbind(smallAreaID,
                     populationTotalData,
                     ageYoungData,
                     ageOldData,
                     primarySchoolAgeData,
                     volunteersData,
                     healthData,
                     disabilitiesData,
                     unemploymentData,
                     lowSkilledEmploymentData,
                     farmingEmploymentData,
                     rentData,
                     educationData,
                     englishData,
                     newResidentsData,
                     travelTimeData
                    )

#get the number of columns in the data
personsDataColLength = ncol(personsData)

head(personsData)

#output the data as a csv
write.csv(personsData, "../1_InputData/1a_CensusData/persons/personsSmallAreaRawData2022.csv", row.names = FALSE)

SA_GUID__1,populationTotal,young,old,priSch,volunteers,poorHealth,disability,unemploy,lowSkill,farming,rent,education,engLang,newRes,travelTime
IE0,5149139,359441,335287,629689,711379,89399,1109557,672428,1015975,158622,1382845,81280,95170,166195,276264
00b00ae4-229d-455d-84f1-d6face4876b1,376,44,2,92,23,5,82,65,130,0,290,3,3,2,13
03003797-1fcd-4fcf-8dde-b2188e3fb1db,310,24,41,34,66,8,83,54,67,37,26,11,4,5,23
06650182-eeaa-4c6c-847c-f85ddaf5361b,375,29,5,55,43,3,54,22,44,0,33,3,4,16,34
08e82f06-46ee-4141-aa07-79a793a12b27,225,25,2,43,14,3,45,38,64,0,148,3,14,3,17
0920215b-86d3-4a53-9fc0-6008ae5c91f9,344,48,0,68,36,3,42,25,46,4,109,0,3,26,43
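
Each indicator above is built with the same three steps: select the source columns, sum them across each row, and keep only the summed column. These steps could be wrapped in one function. The sketch below is one possible way to do so, not the notebook's own code: `sumIndicator` is a hypothetical helper, it uses base R `rowSums` in place of `apply(..., 1, sum)`, and it stops with a clear message if a requested column is missing. The demonstration values are made up.

```r
# Hypothetical helper: sum a set of census columns row-wise into one indicator.
sumIndicator <- function(data, variables, indicatorName) {
  missingVars <- setdiff(variables, names(data))
  if (length(missingVars) > 0) {
    stop("Columns not found: ", paste(missingVars, collapse = ", "))
  }
  # rowSums gives one total per row across the selected columns
  result <- data.frame(rowSums(data[, variables, drop = FALSE]))
  names(result) <- indicatorName
  result
}

# Demonstration on made-up counts (not census values)
demoCensus <- data.frame(T1_1AGE75_79T = c(1, 20),
                         T1_1AGE80_84T = c(1, 15),
                         T1_1AGEGE_85T = c(0, 6))
demoOld <- sumIndicator(demoCensus,
                        c("T1_1AGE75_79T", "T1_1AGE80_84T", "T1_1AGEGE_85T"),
                        "old")
demoOld
```

The result is a one-column data frame, so it can be passed to `cbind` in the same way as the tables built above.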


### Household Level Data

We then get the household level data and combine the variables together to create indicators:

In [8]:
#HOUSEHOLD DATA

#HOUSEHOLDS TOTAL
householdsTotalData <- smallAreaCSOData[, 'T5_1T_H', drop = FALSE] #Total households (No. of households)
names(householdsTotalData)[1] <- 'householdsTotal'


#NO HEATING - Permanent private households by central heating - Households
noHeatingData <- smallAreaCSOData[, 'T6_5_NCH', drop = FALSE]  #No central heating
noHeatingData$noHeating <- apply(noHeatingData,1,sum)
noHeatingData <- select(noHeatingData, 'noHeating')

#RENEWABLE ENERGY SOURCE – Households with renewable energy source
renewableEnergyHousesVariables <- c(
    'T6_10_NS', #Renewable energy source not stated
    'T6_10_RE'  #Has at least one renewable energy source of any type
)

renewablenergyData <- smallAreaCSOData[, renewableEnergyHousesVariables, drop = FALSE]
renewablenergyData$renewableEnergyHouses <- apply(renewablenergyData,1,sum)
renewablenergyData <- select(renewablenergyData, 'renewableEnergyHouses')


#YEAR PROPERTY BUILT - Permanent private households by year built (Pre 1919, 1919-1945, 1946-1960, 1961-1970, 
#1971-1980, 1981-1990, 1991-2000, 2001-2010, 2011 or Later, Not stated)

yearBuiltVariables <- c(
    'T6_2_PRE19H', #Pre 1919 (No. of households)
    'T6_2_19_45H'  #1919 - 1945 (No. of households)
)

yearBuiltData <- smallAreaCSOData[, yearBuiltVariables, drop = FALSE]
yearBuiltData$yearBuilt <- apply(yearBuiltData,1,sum)
yearBuiltData <- select(yearBuiltData, 'yearBuilt')


#CARAVAN/MOBILE HOME (House/Bungalow, Flat/Apartment Bed-Sit, Caravan/Mobile home, Not stated)
mobileHomeData <- smallAreaCSOData[, 'T6_1_CM_H', drop = FALSE] #Caravan/Mobile home (No. of households)
mobileHomeData$mobHome <- apply(mobileHomeData,1,sum)
mobileHomeData <- select(mobileHomeData, 'mobHome')

#UNOCCUPIED DWELLINGS - Occupancy Status of Permanent Dwellings on Census Night
unoccupiedDwellingsVariables <- c(
    'T6_8_UHH', #Unoccupied holiday homes (No. of households)
    'T6_8_TA',  #Temporarily absent (No. of households)
    'T6_8_OVD'  #Other vacant dwellings (No. of households)
)

unoccupiedDwellingsData <- smallAreaCSOData[, unoccupiedDwellingsVariables, drop = FALSE]
unoccupiedDwellingsData$unoccupiedDwellings <- apply(unoccupiedDwellingsData,1,sum)
unoccupiedDwellingsData <- select(unoccupiedDwellingsData, 'unoccupiedDwellings')


#ONE PARENT HOUSEHOLDS
oneParentVariables <- c(
    'T5_1OPFC_H', #One parent family (father) with  children households (No. of households)
    'T5_1OPMC_H', #One parent family (mother) and children households (No. of households)
    'T5_1OPFCO_H',#One parent family (father) with children and others households (No. of households)
    'T5_1OPMCO_H' #One parent family (mother) with children and others households (No. of households)
)

oneParentData <- smallAreaCSOData[, oneParentVariables, drop = FALSE]
oneParentData$oneParent <- apply(oneParentData,1,sum)
oneParentData <- select(oneParentData, 'oneParent')

#ONE PERSON HOUSEHOLDS
onePersonData <- smallAreaCSOData[, 'T5_1OP_H', drop = FALSE] #One person households (No. of households)
onePersonData$onePerson <- apply(onePersonData,1,sum)
onePersonData <- select(onePersonData, 'onePerson')

#CAR OWNERSHIP
noCarData <- smallAreaCSOData[, 'T15_1_NC', drop = FALSE] #No motor car (No. of households)
noCarData$noCar <- apply(noCarData,1,sum)
noCarData <- select(noCarData, 'noCar')


#NO INTERNET
noInternetData <- smallAreaCSOData[, 'T15_2_NO', drop = FALSE] #No internet (No. of households)
noInternetData$noInternet <- apply(noInternetData,1,sum)
noInternetData <- select(noInternetData, 'noInternet')

#WATER SUPPLY - private water supplies at risk of disease due to reduced quality control - *BIG ASSUMPTION*
waterSupplyVariables <- c(
    'T6_6_GSP', #Group scheme with private source
    'T6_6_OP'   #Other private source
)

waterSupplyData <- smallAreaCSOData[, waterSupplyVariables, drop = FALSE]
waterSupplyData$priWater <- apply(waterSupplyData,1,sum)
waterSupplyData <- select(waterSupplyData, 'priWater')


#FAMILY UNITS - FAMILIES WITH 3 OR MORE CHILDREN
familyUnitsVariables <- c(
    'T4_2_3CT',  #Families with 3 children - Total
    'T4_2_4CT',  #Families with 4 children - Total
    'T4_2_GE5CT'  #Families with 5+ children - Total
)

familyUnitsData <- smallAreaCSOData[, familyUnitsVariables, drop = FALSE]
familyUnitsData$familyUnits <- apply(familyUnitsData,1,sum)
familyUnitsData <- select(familyUnitsData, 'familyUnits')





#combine all the data into one table
householdData <- cbind(smallAreaID,
                       householdsTotalData,
                       noHeatingData,
                       renewablenergyData,
                       yearBuiltData,
                       mobileHomeData,
                       unoccupiedDwellingsData,
                       oneParentData,
                       onePersonData,
                       noCarData,
                       noInternetData,
                       waterSupplyData,
                       familyUnitsData
                    )
#inspect the table
head(householdData)

#get the number of columns in the data
householdDataColLength = ncol(householdData)

#output the data as a csv
write.csv(householdData, "../1_InputData/1a_CensusData/household/householdSmallAreaRawData2022.csv", row.names = FALSE)

SA_GUID__1,householdsTotal,noHeating,renewableEnergyHouses,yearBuilt,mobHome,unoccupiedDwellings,oneParent,onePerson,noCar,noInternet,priWater,familyUnits
IE0,1841152,21254,623033,268078,4424,265183,209523,425974,245455,159866,244925,215165
00b00ae4-229d-455d-84f1-d6face4876b1,116,0,101,2,0,2,38,14,15,4,0,23
03003797-1fcd-4fcf-8dde-b2188e3fb1db,119,1,65,34,0,11,7,34,8,17,90,16
06650182-eeaa-4c6c-847c-f85ddaf5361b,114,0,54,0,0,4,12,7,2,1,0,26
08e82f06-46ee-4141-aa07-79a793a12b27,86,0,12,0,0,12,29,13,16,3,0,10
0920215b-86d3-4a53-9fc0-6008ae5c91f9,98,0,54,1,0,9,12,12,2,5,0,15


## Percentages

The raw counts are not suitable for use within the vulnerability assessment because small areas differ in size. They need to be normalised by the number of people or households within each small area. The data is therefore converted to percentages of the total persons or households in each small area.
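
As a worked example of this calculation, take the first small area in the persons table above, which has 44 young children in a population of 376. The sketch below is for illustration only. It also shows what R returns when the denominator is zero, which would matter if any area had no people or households (whether such areas exist in the dataset is not checked here).

```r
# Values for the first small area, taken from the persons table above
young <- 44
populationTotal <- 376

# Percentage of the population in the 'young' group
youngPct <- young / populationTotal * 100
round(youngPct, 6)

# A zero denominator gives NaN (0/0) or Inf (x/0) in R rather than an error,
# so such rows would need separate handling before further analysis
emptyAreaPct <- 0 / 0 * 100
is.nan(emptyAreaPct)
```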

### Persons Percentages

In [9]:
#PERSONS DATA

#Copy the data
personsDataPct <- personsData

#Calculate the percentages for each of the relevant columns - starting at the 3rd column
for(col in names(personsDataPct)[3:personsDataColLength]) {
  personsDataPct[paste0(col, "_pct")] = (personsDataPct[col] / personsDataPct$populationTotal)*100
}

#remove the original data to leave only the percentages
personsDataPct <- personsDataPct[-c(2:personsDataColLength)]
head(personsDataPct)

#output the data as a csv
write.csv(personsDataPct, "../1_InputData/1a_CensusData/persons/personsSmallAreaPctData2022.csv", row.names = FALSE)

SA_GUID__1,young_pct,old_pct,priSch_pct,volunteers_pct,poorHealth_pct,disability_pct,unemploy_pct,lowSkill_pct,farming_pct,rent_pct,education_pct,engLang_pct,newRes_pct,travelTime_pct
IE0,6.980604,6.5115158,12.22902,13.815494,1.736193,21.5484,13.059038,19.73097,3.080554,26.855849,1.5785163,1.8482702,3.227627,5.365247
00b00ae4-229d-455d-84f1-d6face4876b1,11.702128,0.5319149,24.46809,6.117021,1.329787,21.80851,17.287234,34.57447,0.0,77.12766,0.7978723,0.7978723,0.5319149,3.457447
03003797-1fcd-4fcf-8dde-b2188e3fb1db,7.741935,13.2258065,10.96774,21.290323,2.580645,26.77419,17.419355,21.6129,11.935484,8.387097,3.5483871,1.2903226,1.6129032,7.419355
06650182-eeaa-4c6c-847c-f85ddaf5361b,7.733333,1.3333333,14.66667,11.466667,0.8,14.4,5.866667,11.73333,0.0,8.8,0.8,1.0666667,4.2666667,9.066667
08e82f06-46ee-4141-aa07-79a793a12b27,11.111111,0.8888889,19.11111,6.222222,1.333333,20.0,16.888889,28.44444,0.0,65.777778,1.3333333,6.2222222,1.3333333,7.555556
0920215b-86d3-4a53-9fc0-6008ae5c91f9,13.953488,0.0,19.76744,10.465116,0.872093,12.2093,7.267442,13.37209,1.162791,31.686047,0.0,0.872093,7.5581395,12.5


### Household Percentages

In [10]:
#HOUSEHOLD DATA

#Copy the data
householdDataPct <- householdData

#Calculate the percentages for each of the relevant columns - starting at the 3rd column
for(col in names(householdDataPct)[3:ncol(householdDataPct)]) {
  householdDataPct[paste0(col, "_pct")] = (householdDataPct[col] / householdDataPct$householdsTotal)*100
}

#remove the original data to leave only the percentages
householdDataPct <- householdDataPct[-c(2:householdDataColLength)]
head(householdDataPct)

#output the data as a csv
write.csv(householdDataPct, "../1_InputData/1a_CensusData/household/householdSmallAreaNormalisedData2022.csv", row.names = FALSE)

SA_GUID__1,noHeating_pct,renewableEnergyHouses_pct,yearBuilt_pct,mobHome_pct,unoccupiedDwellings_pct,oneParent_pct,onePerson_pct,noCar_pct,noInternet_pct,priWater_pct,familyUnits_pct
IE0,1.1543859,33.8393,14.56034,0.2402843,14.403102,11.379995,23.136276,13.331599,8.682933,13.30281,11.68643
00b00ae4-229d-455d-84f1-d6face4876b1,0.0,87.06897,1.724138,0.0,1.724138,32.758621,12.068966,12.931034,3.448276,0.0,19.82759
03003797-1fcd-4fcf-8dde-b2188e3fb1db,0.8403361,54.62185,28.571429,0.0,9.243697,5.882353,28.571429,6.722689,14.285714,75.63025,13.44538
06650182-eeaa-4c6c-847c-f85ddaf5361b,0.0,47.36842,0.0,0.0,3.508772,10.526316,6.140351,1.754386,0.877193,0.0,22.80702
08e82f06-46ee-4141-aa07-79a793a12b27,0.0,13.95349,0.0,0.0,13.953488,33.72093,15.116279,18.604651,3.488372,0.0,11.62791
0920215b-86d3-4a53-9fc0-6008ae5c91f9,0.0,55.10204,1.020408,0.0,9.183673,12.244898,12.244898,2.040816,5.102041,0.0,15.30612


## Z-Scores

The percentage data still needs to be standardised before it is used within the vulnerability assessment, so that all indicators are on a common scale. The data is therefore converted to z-scores. Z-scores are:

>"A statistical measurement of a score's relationship to the mean (average value) in a group of scores. A Z-score of 0 means the score is the same as the mean (average value). A Z-score can be positive or negative, indicating whether it is above or below the mean and by how many standard deviations. Z-score standardisation represents the deviation of a raw score from its mean in standard deviation units." (Kazmierczak et al., 2015)
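
In other words, z = (x - mean) / standard deviation. The sketch below checks that a manual calculation agrees with base R's `scale()`. It uses the `young_pct` values of the first five small areas from the table above purely as an illustration, so the resulting numbers will not match z-scores computed over the full dataset.

```r
# young_pct values for the first five small areas shown above (illustration only)
youngPct <- c(11.702128, 7.741935, 7.733333, 11.111111, 13.953488)

# Manual z-score: deviation from the mean in standard deviation units
zManual <- (youngPct - mean(youngPct)) / sd(youngPct)

# scale() centres on the mean and divides by the sample standard deviation
# (n - 1 denominator); it returns a one-column matrix, so as.numeric()
# flattens it for comparison
zScale <- as.numeric(scale(youngPct))

all.equal(zManual, zScale)
```

After standardisation each column has a mean of 0 and a standard deviation of 1, which is what allows indicators measured on different scales to be compared.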

### Persons Z-scores

In [11]:
#PERSONS DATA

#Copy the data
personsDataZ <- personsDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(personsDataZ)[2:ncol(personsDataZ)]) {
  personsDataZ[paste0(col, "_Z")] = scale(personsDataZ[col])
}


#remove the original data to leave only the z scores
personsDataZ <- personsDataZ[-c(2:ncol(personsDataPct))]
# summary(personsDataZ)
head(personsDataZ)

#output the data as a csv
write.csv(personsDataZ, "../1_InputData/1a_CensusData/persons/personsSmallAreaZData2022.csv", row.names = FALSE)

SA_GUID__1,young_pct_Z,old_pct_Z,priSch_pct_Z,volunteers_pct_Z,poorHealth_pct_Z,disability_pct_Z,unemploy_pct_Z,lowSkill_pct_Z,farming_pct_Z,rent_pct_Z,education_pct_Z,engLang_pct_Z,newRes_pct_Z,travelTime_pct_Z
IE0,0.07791627,-0.06884322,0.1116042,-0.03610055,-0.04031203,-0.07180963,-0.03964614,-0.02547265,-0.03950277,-0.02310875,-0.05037719,0.01797747,-0.009458907,0.04965956
00b00ae4-229d-455d-84f1-d6face4876b1,1.71314101,-1.20605969,2.9123835,-1.59443632,-0.32806286,-0.03079729,0.75770342,1.52721113,-0.57755666,2.12798605,-0.54077383,-0.46083143,-0.71068987,-0.58410883
03003797-1fcd-4fcf-8dde-b2188e3fb1db,0.34159136,1.20809849,-0.1770247,1.47696478,0.55759202,0.7521512,0.78261864,0.17138453,1.50711182,-0.81337345,1.18708577,-0.236355,-0.429494229,0.73203148
06650182-eeaa-4c6c-847c-f85ddaf5361b,0.33861214,-1.05364379,0.6694344,-0.51155354,-0.70317233,-1.19891098,-1.39597713,-0.86205425,-0.57755666,-0.79570561,-0.53943725,-0.33830536,0.260824702,1.27926616
08e82f06-46ee-4141-aa07-79a793a12b27,1.50845182,-1.13816942,1.6864977,-1.5731414,-0.32555209,-0.31594853,0.68258384,0.88598847,-0.57755666,1.64233273,-0.20439992,2.0117813,-0.502218266,0.77727717
0920215b-86d3-4a53-9fc0-6008ae5c91f9,2.49286403,-1.30722068,1.8366919,-0.71428878,-0.65212774,-1.54432236,-1.13182018,-0.69063405,-0.374462,0.18357194,-1.04199324,-0.42699899,1.117029892,2.41981469


### Households Z-scores

In [12]:
#HOUSEHOLD DATA

#Copy the data
householdDataZ <- householdDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(householdDataZ)[2:ncol(householdDataZ)]) {
  householdDataZ[paste0(col, "_Z")] = scale(householdDataZ[col])
}

#remove the original data to leave only the z scores
householdDataZ <- householdDataZ[-c(2:ncol(householdDataPct))]
# summary(householdDataZ)
head(householdDataZ)

#output the data as a csv
write.csv(householdDataZ, "../1_InputData/1a_CensusData/household/householdSmallAreaZData2022.csv", row.names = FALSE)

SA_GUID__1,noHeating_pct_Z,renewableEnergyHouses_pct_Z,yearBuilt_pct_Z,mobHome_pct_Z,unoccupiedDwellings_pct_Z,oneParent_pct_Z,onePerson_pct_Z,noCar_pct_Z,noInternet_pct_Z,priWater_pct_Z,familyUnits_pct_Z
IE0,-0.02461051,-0.007598731,-0.03768719,-0.0003867792,-0.08024804,0.02462081,-0.03921855,0.0074691,-0.05142154,-0.02750751,0.0094049267
00b00ae4-229d-455d-84f1-d6face4876b1,-0.77842976,3.305097246,-0.7424689,-0.1877975292,-0.65098238,3.24176742,-1.11598164,-0.02238398,-0.84090136,-0.53905021,1.2664539346
03003797-1fcd-4fcf-8dde-b2188e3fb1db,-0.22968646,1.285782682,0.73160253,-0.1877975292,-0.31249488,-0.80268772,0.48957951,-0.48507659,0.79357788,2.36921477,0.2809978975
06650182-eeaa-4c6c-847c-f85ddaf5361b,-0.77842976,0.834372707,-0.83713403,-0.1877975292,-0.57064838,-0.10384443,-1.69278968,-0.85535193,-1.22866657,-0.53905021,1.7264982598
08e82f06-46ee-4141-aa07-79a793a12b27,-0.77842976,-1.245172933,-0.83713403,-0.1877975292,-0.10048707,3.38657985,-0.81950176,0.40045663,-0.83485413,-0.53905021,0.0003680707
0920215b-86d3-4a53-9fc0-6008ae5c91f9,-0.77842976,1.315666966,-0.78110773,-0.1877975292,-0.31519682,0.15477513,-1.09886478,-0.83400498,-0.59148407,-0.53905021,0.5683093868


## Combine Data

The persons level and household level data are then combined into a single CSV:

In [13]:
#Combine the RAW persons and household data
personsHouseholdDataCombined <- cbind(personsData,
                                       householdData[2:ncol(householdData)])

#output the data as a csv
write.csv(personsHouseholdDataCombined, "../1_InputData/1a_CensusData/censusData.csv", row.names = FALSE)

#Combine the % persons and household data
personsHouseholdPctDataCombined <- cbind(personsDataPct,
                                       householdDataPct[2:ncol(householdDataPct)])

#output the data as a csv
write.csv(personsHouseholdPctDataCombined, "../1_InputData/1a_CensusData/censusDataPercent.csv", row.names = FALSE)


#Combine the Z-score persons and household data
personsHouseholdZDataCombined <- cbind(personsDataZ,
                                       householdDataZ[2:ncol(householdDataZ)])

names(personsHouseholdZDataCombined) <- gsub("_pct_Z","",names(personsHouseholdZDataCombined))

#output the data as a csv
write.csv(personsHouseholdZDataCombined, "../1_InputData/1a_CensusData/censusDataZ.csv", row.names = FALSE)
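
Note that `cbind` joins tables by row position, so it relies on both tables having their rows in the same order. That holds here because both tables are derived from the same data frame. If the tables were ever reordered or read back from the CSV files, joining on the ID column would be safer. The sketch below shows this with base R `merge` on made-up tables. It is an alternative for illustration, not the method used in this notebook.

```r
# Made-up tables standing in for the persons and household data
demoPersons <- data.frame(SA_GUID__1 = c("a", "b", "c"),
                          young = c(1, 2, 3),
                          stringsAsFactors = FALSE)
demoHousehold <- data.frame(SA_GUID__1 = c("c", "a", "b"),  # different row order
                            noCar = c(30, 10, 20),
                            stringsAsFactors = FALSE)

# Join on the shared ID column rather than on row position
demoCombined <- merge(demoPersons, demoHousehold, by = "SA_GUID__1")
demoCombined
```

By default `merge` sorts the result by the key column, so the row order of the output may differ from that of the inputs.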


**END**