# Logroño 



## Census Data

The Central Statistics Office (CSO) has produced a dataset of [small area statistics](https://www.cso.ie/en/census/census2016reports/census2016smallareapopulationstatistics/) for the 2016 Census. This will be the main data source for use with the Irish Vulnerability Assessment.


### R Libraries

The relevant R libraries are imported into the kernel:

In [1]:
# Load R libraries, installing pacman first if it is missing
if (!require("pacman")) {
    install.packages("pacman")
    library("pacman")
}

p_load("dplyr", "sf", "purrr", "tidyverse")

print("Loaded Packages:")
p_loaded()

Loading required package: pacman



[1] "Loaded Packages:"


The Spanish 2021 Census data is available at:
https://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736176992&menu=resultados&idp=1254735572981

In [2]:
# create the pipeline directory if it does not exist
pipelineDir <- file.path("../..","2_pipeline","Logrono","1a_CensusData","2021")
if(!dir.exists(pipelineDir)){
    dir.create(pipelineDir, recursive = TRUE)
    print(paste0(pipelineDir, " created"))
}

# create the output directory if it does not exist
outputDir <- file.path("../..","3_outputs","Logrono","2021")
if(!dir.exists(outputDir)){
    dir.create(outputDir, recursive = TRUE)
    print(paste0(outputDir, " created"))
}

In [3]:
#Current Spanish 2021 Census is a mixture of:
#  1) a static CSV or Excel downloaded from the Census website
#  2) extra data from missing indicators appended into this CSV or Excel
# in our case we read the CSV version
census <- read.csv("../../0_data/Logrono/SpanishCensus/2021/Census_Data_2023-07-03_Census.csv")

### Import Census Output Areas 

The spatial output areas need to be ingested.

In [4]:
#set the path to the census OA directory
censusOAPath <- "../../0_data/Logrono/OA/2021/Seccionado_2021/SECC_CE_20210101.shp"

#read the OA data
censusOA <- st_read(censusOAPath)

Reading layer `SECC_CE_20210101' from data source 
  `/Volumes/2023_WD18TB/REACHOUT/teams/Cities/0_data/Logrono/OA/2021/Seccionado_2021/SECC_CE_20210101.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 36333 features and 18 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -1004502 ymin: 3132130 xmax: 1126932 ymax: 4859240
Projected CRS: ETRS89 / UTM zone 30N


### Join Data to OA

We need to join the census data to the spatial data.

In [5]:
#change column types
censusOA$CCA <- as.integer(as.character(censusOA$CCA))
censusOA$CPRO <- as.integer(as.character(censusOA$CPRO))
censusOA$CMUN <- as.integer(as.character(censusOA$CMUN))
censusOA$CDIS <- as.integer(as.character(censusOA$CDIS))
censusOA$CSEC <- as.integer(as.character(censusOA$CSEC))
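
The two-step conversion `as.integer(as.character(...))` matters if the code columns were read as factors: converting a factor directly returns its internal level indices, not the codes. Whether the shapefile columns arrive as factor or character depends on how `st_read` is configured, so the sketch below only illustrates the difference using made-up codes that are not taken from the shapefile.

```r
# Hypothetical province codes stored as a factor, as can happen when
# strings are converted to factors on import
codes <- factor(c("26", "01", "26"))

# Converting the factor directly returns the internal level indices
levelIndices <- as.integer(codes)

# Converting via character returns the codes themselves (leading zeros are dropped)
codeValues <- as.integer(as.character(codes))

print(levelIndices)  # 2 1 2
print(codeValues)    # 26 1 26
```

Going through `as.character` is harmless when the columns are already character, which is probably why the cell applies it unconditionally.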

In [6]:
# merge the data
censusDataOA <- merge(censusOA, census,
                      by.x = c("CCA", "CPRO", "CMUN", "CDIS", "CSEC"),
                      by.y = c("ccaa", "CPRO", "CMUN", "dist", "secc"))

#drop geometry
censusDataOANoGeom <- st_drop_geometry(censusDataOA)

In [7]:
# check that the join kept every census row by comparing row counts
nrow(censusDataOA)==nrow(census)
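
A row-count comparison can pass even when the join is imperfect, for example if a duplicated key adds one row while an unmatched key drops another. A slightly stricter check, sketched here in base R with made-up toy tables (`oa` and `cen` are hypothetical stand-ins for `censusOA` and `census`), lists the census rows without a matching output area and counts duplicated keys.

```r
# Toy stand-ins for the output-area and census tables (values are made up)
oa <- data.frame(CPRO = c(26L, 26L, 26L),
                 CMUN = c(1L, 10L, 100L),
                 CSEC = c(1L, 1L, 1L))
cen <- data.frame(CPRO = c(26L, 26L, 26L),
                  CMUN = c(1L, 10L, 999L),
                  secc = c(1L, 1L, 1L),
                  pop  = c(236L, 92L, 67L))

# Composite keys built from the join columns
oaKey  <- paste(oa$CPRO, oa$CMUN, oa$CSEC, sep = "_")
cenKey <- paste(cen$CPRO, cen$CMUN, cen$secc, sep = "_")

# Census rows with no matching output area, and duplicated keys on each side
unmatchedCensus <- cen[!(cenKey %in% oaKey), ]
duplicatedOA    <- sum(duplicated(oaKey))
duplicatedCen   <- sum(duplicated(cenKey))

# The same inner join as in the notebook, on the toy tables
joined <- merge(oa, cen,
                by.x = c("CPRO", "CMUN", "CSEC"),
                by.y = c("CPRO", "CMUN", "secc"))

print(unmatchedCensus)
print(nrow(joined))
```

In this toy case the census row with `CMUN` 999 is reported as unmatched and the join returns two rows, so the mismatch is visible directly and not only through the count.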

In [8]:
#output the joined data for backup
st_write(obj = censusDataOA, file.path(outputDir, "national_census_spain_2021.geojson"), delete_dsn=TRUE)

Deleting source `../../3_outputs/Logrono/2021/national_census_spain_2021.geojson' using driver `GeoJSON'
Writing layer `national_census_spain_2021' to data source 
  `../../3_outputs/Logrono/2021/national_census_spain_2021.geojson' using driver `GeoJSON'
Writing 343 features with 59 fields and geometry type Multi Polygon.


## Select only the relevant data

We only require a subset of the census data for our purposes. We therefore need to extract the relevant data, then combine these to create our vulnerability indicators.


### Persons Level Data

We then get the persons level data and combine the variables together to create indicators:

In [9]:
#PERSONS DATA

#POPULATION TOTAL
populationTotalData <- censusDataOANoGeom[, c('CUSEC', 't1_1'), drop = FALSE]
names(populationTotalData)[2] <- 'pop'

# variables used
# t4_1 - Percentage of persons under 16 years of age
# t5_1 - Percentage of foreign nationalities
# t8_1 - Percentage of population attending university studies (escur = 09 10 11 12) over population aged 16 and over
# t9_1 - Percentage of people with higher education (esreal_cneda = 08 09 10 11 12) over population aged 16 and over
#        invert t9_1 to get percentage of people with no higher education
# t10_1 - Percentage of unemployed population over active population = Parados /Activos
# t13_1 - Percentage of pensioner population due to disability over population of 16 years or more
# t20_2 - Dwellings in property
# t22_1 - Households of 1 person
# c_ab0t4 - Age 0-4 Boys
# c_ag0t4 - Age 0-4 Girls
# c_amo75 - Age-Male_75+
# c_afo75 - Age Female_75+
# c_ac5t9 - Children Age 5--9

### Domain - Age 
###   Boys under 5 years of age
###   Girls under 5 years of age
###   Men over 75 years of age
###   Women over 75 years of age
ageIndicators <- c('c_ab0t4', 'c_ag0t4', 'c_amo75', 'c_afo75')
ageDomain <- censusDataOANoGeom[, ageIndicators, drop = FALSE]

### Domain - Health
###  People with a disability preventing work
healthIndicators <- c('t13_1')
healthDomain <- censusDataOANoGeom[, healthIndicators, drop = FALSE]

### Domain - Mobility
###   Dependants Rate
mobilityIndicators <- c('t4_1')
mobilityDomain <- censusDataOANoGeom[, mobilityIndicators, drop = FALSE]

### Domain - Income
###   People with no higher education (invert t9_1)
###   Population attending university
###   Unemployment
incomeIndicators <- c('t9_1', 't8_1', 't10_1')
incomeDomain <- censusDataOANoGeom[, incomeIndicators, drop = FALSE]
head(incomeDomain$t9_1)
incomeDomain$t9_1 <- (1-incomeDomain$t9_1)
colnames(incomeDomain)[1] <- "invert_t9_1"

## invert the households owned score
#householdDataZ$homesOwned_pct_Z <- householdDataZ$homesOwned_pct_Z*-1

### Domain - Social Network
###   Primary School Age Children
###   Households with one person
socialNetworkIndicators <- c('c_ac5t9', 't22_1')
socialNetworkDomain <- censusDataOANoGeom[, socialNetworkIndicators, drop = FALSE]

### Domain - Local Knowledge
###   Percentage of foreign nationals
localKnowledgeIndicators <- c('t5_1')
localKnowledgeDomain <- censusDataOANoGeom[, localKnowledgeIndicators, drop = FALSE]

### Domain - Tenure
###   Households renting
tenureIndicators <- c('t20_2')
tenureDomain <- censusDataOANoGeom[, tenureIndicators, drop = FALSE]

#combine all the data into one table
personsData<- cbind(populationTotalData,
                    ageDomain,
                    healthDomain,
                    mobilityDomain,
                    incomeDomain,
                    socialNetworkDomain,
                    localKnowledgeDomain,
                    tenureDomain
                   )
head(personsData)

# output the data as a csv
write.csv(personsData, file.path(pipelineDir, "persons_oa_raw.csv"), row.names = FALSE)

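| | CUSEC | pop | c_ab0t4 | c_ag0t4 | c_amo75 | c_afo75 | t13_1 | t4_1 | invert_t9_1 | t8_1 | t10_1 | c_ac5t9 | t22_1 | t5_1 | t20_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` |
| 1 | 2600101001 | 236 | 3 | 3 | 20 | 18 | 0.03 | 0.1 | 0.77 | 0.05 | 0.04 | 10 | 46.0 | 0.11 | 13.0 |
| 2 | 2601001001 | 92 | 1 | 1 | 11 | 9 | | | | | | 2 | | | |
| 3 | 2610001001 | 67 | 1 | 0 | 5 | 4 | | | | | | 1 | | | |
| 4 | 2610101001 | 36 | 0 | 0 | 2 | 3 | | | | | | 0 | | | |
| 5 | 2610201002 | 628 | 10 | 6 | 42 | 65 | 0.02 | 0.1 | 0.78 | 0.03 | 0.17 | 16 | 103.0 | 0.09 | 20.0 |
| 6 | 2610201002 | 777 | 17 | 16 | 49 | 82 | 0.02 | 0.14 | 0.76 | 0.01 | 0.15 | 40 | 138.0 | 0.11 | 15.0 |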
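
Several of the domains above consist of a single column, which is why every selection passes `drop = FALSE`. Without it, base R simplifies a one-column selection to a bare vector, and the column name is lost when the pieces are bound together. A minimal sketch with a made-up two-row table (`toy` is hypothetical, not the census data):

```r
# Made-up table with an identifier and one indicator column
toy <- data.frame(CUSEC = c("a", "b"), t13_1 = c(0.03, 0.02))

# Default behaviour: a single column is simplified to a numeric vector
asVector <- toy[, "t13_1"]

# With drop = FALSE the result stays a one-column data frame
asFrame <- toy[, "t13_1", drop = FALSE]

# Binding the data-frame version keeps the original column name
combined <- cbind(toy["CUSEC"], asFrame)

print(is.data.frame(asVector))  # FALSE
print(is.data.frame(asFrame))   # TRUE
print(names(combined))          # "CUSEC" "t13_1"
```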


### Household Level Data

We then get the household level data and combine the variables together to create indicators:

In [10]:
# TEMP below while no household data

##HOUSEHOLD DATA
#
##HOUSEHOLDS TOTAL
#householdsTotalData <- censusDataOANoGeom[, c('CUSEC', 't21_1'), drop = FALSE] #Total households (No. of households)
#names(householdsTotalData)[2] <- 'householdsTotal'
#
##Renting 
#rentingData <- censusDataOANoGeom[, 't18_4', drop = FALSE]  
#rentingData$renting <- apply(rentingData,1,sum)
#rentingData <- select(rentingData, 'renting')
#
##Free housing 
#freeHousingData <- censusDataOANoGeom[, 't18_5', drop = FALSE]  
#freeHousingData$freeHousing <- apply(freeHousingData,1,sum)
#freeHousingData <- select(freeHousingData, 'freeHousing')
#
## Homes owned
#homesOwnedVariables <- c(
#    't18_1', #Homes owned, by purchase, fully paid for
#    't18_2', #Homes owned, by purchase, with payments due
#    't18_3' #Homes owned, inherited or donated
#
#)
#
#homesOwnedData <- censusDataOANoGeom[, homesOwnedVariables, drop = FALSE]
#homesOwnedData$homesOwned <- apply(homesOwnedData,1,sum)
#homesOwnedData <- select(homesOwnedData, 'homesOwned')
#
## #ONE PERSON HOUSEHOLDS
#onePersonData <- censusDataOANoGeom[, 't22_1', drop = FALSE] #One person households (No. of households)
#onePersonData$onePerson <- apply(onePersonData,1,sum)
#onePersonData <- select(onePersonData, 'onePerson')
#
#
##combine all the data into one table
#householdData <- cbind(householdsTotalData,
#                       rentingData,
#                       homesOwnedData,
#                       freeHousingData,
#                       onePersonData
#                    )
##inspect the table
#head(householdData)
#
##get the number of columns in the data
#householdDataColLength = ncol(householdData)
#
## output the data as a csv
#write.csv(householdData, file.path(pipelineDir,"households_oa_raw_2011.csv"), row.names = FALSE)

## Percentages

The raw counts are not suitable for use within the vulnerability assessment. They need to be normalised by the number of people/households within each small area, so the data is converted to percentages of the total persons/households in each small area.

### Persons Percentages

In [11]:
#PERSONS DATA

# Some variables are already supplied as rates; the columns listed here are counts
# and need to be converted to percentages of the population.
makepercentage <- c('c_ab0t4', 'c_ag0t4', 'c_amo75', 'c_afo75', 'c_ac5t9', 't22_1', 't20_2')

#Copy the data
personsDataPct <- personsData

#get the number of columns in the data
personsDataColLength = ncol(personsData)

#Calculate the percentages for each of the relevant columns - starting at the 3rd column
for(col in names(personsDataPct)[3:personsDataColLength]) {
  if( col %in% makepercentage ){
      personsDataPct[paste0(col, "_pct")] = (personsDataPct[col] / personsDataPct$pop)*100
  } else {
      # already as percentage
      personsDataPct[paste0(col, "_pct")] = personsDataPct[col]
  }
}
#remove the original data to leave only the percentages
personsDataPct <- personsDataPct[-c(2:personsDataColLength)]
#head(personsDataPct)

#output the data as a csv
write.csv(personsDataPct, file.path(pipelineDir,"persons_oa_pct_data.csv"), row.names = FALSE)
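
As a worked check of the conversion, the sketch below recomputes the percentage of boys under 5 for three rows of the persons table shown earlier (the `pop` and `c_ab0t4` values of rows 1, 5 and 6). The variable names are illustrative only.

```r
# Population and boys aged 0-4 for rows 1, 5 and 6 of the persons table
pop        <- c(236, 628, 777)
boysUnder5 <- c(3, 10, 17)

# Count divided by total population, expressed as a percentage
boysUnder5Pct <- boysUnder5 / pop * 100

print(round(boysUnder5Pct, 2))  # 1.27 1.59 2.19
```

The variables that are copied across unchanged (for example `t4_1`, shown as 0.1 in the table) appear to be stored on a 0 to 1 scale, so the `_pct` file mixes the two scales. This does not affect the z-scores computed later, because standardisation removes the unit, but it is worth keeping in mind if the percentage file is used directly.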

### Household Percentages

In [12]:
# TEMP below while no household data

##HOUSEHOLD DATA
#
##Copy the data
#householdDataPct <- householdData
#
##Calculate the percentages for each of the relevant columns - starting at the 4th column
#for(col in names(householdDataPct)[3:ncol(householdDataPct)]) {
#  householdDataPct[paste0(col, "_pct")] = (householdDataPct[col] / householdDataPct$householdsTotal)*100
#}
#
##remove the original data to leave only the percentages
#householdDataPct <- householdDataPct[-c(2:householdDataColLength)]
## head(householdDataPct)
#
## output the data as a csv
#write.csv(householdDataPct, file.path(pipelineDir,"households_oa_pct_data_2011.csv"), row.names = FALSE)

## Z-Scores

The raw data is not suitable for use within the vulnerability assessment; it needs to be standardised, so the data is converted to z-scores. Z-scores are:

>"A statistical measurement of a score's relationship to the mean (average value) in a group of scores. A Z-score of 0 means the score is the same as the mean (average value). A Z-score can be positive or negative, indicating whether it is above or below the mean and by how many standard deviations. Z-score standardisation represents the deviation of a raw score from its mean in standard deviation units." (Kazmierczak et al., 2015)

### Persons Z-scores

In [13]:
#PERSONS DATA

#Copy the data
personsDataZ <- personsDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(personsDataZ)[2:ncol(personsDataZ)]) {
  personsDataZ[paste0(col, "_Z")] = scale(personsDataZ[col])
}

#remove the original data to leave only the z scores
personsDataZ <- personsDataZ[-c(2:ncol(personsDataPct))]
# summary(personsDataZ)
# head(personsDataZ)

# output the data as a csv
write.csv(personsDataZ, file.path(pipelineDir,"persons_oa_z_data.csv"), row.names = FALSE)
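
The standardisation relies on base R's `scale()`, which subtracts the column mean and divides by the sample standard deviation, ignoring missing values when computing both. The sketch below, using a made-up vector, compares it with the formula written out by hand and shows that multiplying the input by 100 leaves the z-scores unchanged.

```r
# Made-up values, including one missing observation
x <- c(2, 4, 6, NA)

# z-score by hand: deviation from the mean in standard deviation units
zManual <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# The same result from scale(), flattened from a one-column matrix to a vector
zScale <- as.numeric(scale(x))

# Rescaling the input (e.g. proportion to percentage) does not change the z-scores
zRescaled <- as.numeric(scale(x * 100))

print(zManual)  # -1  0  1 NA
print(isTRUE(all.equal(zManual, zScale)))
print(isTRUE(all.equal(zScale, zRescaled)))
```

Output areas with missing indicator values therefore keep `NA` in the z-score table and do not distort the mean and standard deviation used for the others.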

### Household Z-scores

In [14]:
# TEMP below while no household data

##HOUSEHOLD DATA
#
##Copy the data
#householdDataZ <- householdDataPct
#
##Calculate the z scores for each of the relevant columns - starting at the 3rd column
#for(col in names(householdDataZ)[2:ncol(householdDataZ)]) {
#  householdDataZ[paste0(col, "_Z")] = scale(householdDataZ[col])
#}
#
##remove the original data to leave only the z scores
#householdDataZ <- householdDataZ[-c(2:ncol(householdDataPct))]
## summary(householdDataZ)
## head(householdDataZ)
#
## invert the households owned score
#householdDataZ$homesOwned_pct_Z <- householdDataZ$homesOwned_pct_Z*-1
#
##output the data as a csv
#write.csv(householdDataZ, file.path(pipelineDir,"households_oa_z_data_2011.csv"), row.names = FALSE)

## Combine Data

The persons level and household level data are then combined into a single CSV:

In [15]:
##Combine the RAW persons and household data
#personsHouseholdDataCombined <- cbind(personsData,
#                                       householdData[2:ncol(householdData)])
#
##output the data as a csv
#write.csv(personsHouseholdDataCombined, file.path(pipelineDir, "census_oa_raw_data_2011.csv"), row.names = FALSE)
#
##Combine the % persons and household data
#personsHouseholdPctDataCombined <- cbind(personsDataPct,
#                                       householdDataPct[2:ncol(householdDataPct)])
#
##output the data as a csv
#write.csv(personsHouseholdPctDataCombined, file.path(pipelineDir, "census_oa_pct_data_2011.csv"), row.names = FALSE)
#
##Combine the Z-score persons and household data
#personsHouseholdZDataCombined <- cbind(personsDataZ,
#                                       householdDataZ[2:ncol(householdDataZ)])
#
#names(personsHouseholdZDataCombined) <- gsub("_pct_Z","",names(personsHouseholdZDataCombined))
#
##output the data as a csv
#write.csv(personsHouseholdZDataCombined, file.path(pipelineDir, "census_oa_z_data_2011.csv"), row.names = FALSE)
#
##output geojson
#personsHouseholdDataCombinedSFOA <- subset(censusDataOA, select = c('CUSEC', 'geometry'))
#personsHouseholdDataCombinedSF <- merge(personsHouseholdDataCombinedSFOA,personsHouseholdZDataCombined, by = "CUSEC")
#
#st_write(obj= personsHouseholdDataCombinedSF, file.path(pipelineDir, "census_persons_oa_z_data_2011.geojson"), delete_dsn=TRUE)

# TEMP below while no household data
# output geojson
personsHouseholdDataCombinedSFOA <- subset(censusDataOA, select = c('CUSEC', 'geometry'))
personsHouseholdDataCombinedSF <- merge(personsHouseholdDataCombinedSFOA,personsDataZ, by = "CUSEC")

st_write(obj= personsHouseholdDataCombinedSF, file.path(pipelineDir, "census_persons_oa_z_data.geojson"), delete_dsn=TRUE)

Deleting source `../../2_pipeline/Logrono/1a_CensusData/2021/census_persons_oa_z_data.geojson' using driver `GeoJSON'
Writing layer `census_persons_oa_z_data' to data source 
  `../../2_pipeline/Logrono/1a_CensusData/2021/census_persons_oa_z_data.geojson' using driver `GeoJSON'
Writing 343 features with 14 fields and geometry type Multi Polygon.


## Logroño

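Restrict the analysis to the Logroño study area. The filter below keeps every output area in La Rioja, the autonomous community of which Logroño is the capital.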


In [16]:
# get the OAs for La Rioja, the region containing Logroño
logOA <- censusOA %>% filter(NCA == 'La Rioja')

In [17]:
## get the log data
#logData <- filter(personsHouseholdDataCombined, CUSEC %in% logOA$CUSEC)
#logDataPct <- filter(personsHouseholdPctDataCombined, CUSEC %in% logOA$CUSEC)

# TEMP below while no household data
# get the log data
logData <- filter(personsData, CUSEC %in% logOA$CUSEC)
logDataPct <- filter(personsDataPct, CUSEC %in% logOA$CUSEC)

In [18]:
# calculate the z scores based on just the region

#Copy the data
logDataZ <- logDataPct

#Calculate the z scores for each of the relevant columns - starting at the 2nd column
for(col in names(logDataZ)[2:ncol(logDataZ)]) {
  logDataZ[paste0(col, "_Z")] = scale(logDataZ[col])
}

#remove the original data to leave only the z scores
logDataZ <- logDataZ[-c(2:ncol(logDataPct))]

## invert the households owned score
#logDataZ$homesOwned_pct_Z <- logDataZ$homesOwned_pct_Z*-1
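
Recomputing the z-scores on the regional subset changes the reference group: each output area is then compared with the regional mean, not with the mean of the full table. A small sketch with made-up percentages for two hypothetical regions shows how the same values score differently under the two reference groups.

```r
# Made-up percentages for two regions; region "A" has systematically low values
national <- data.frame(region = c("A", "A", "A", "B", "B", "B"),
                       pct    = c(1, 2, 3, 7, 8, 9))

# z-scores against the whole table
national$zNational <- as.numeric(scale(national$pct))

# z-scores recomputed against region "A" only
regionA <- national[national$region == "A", ]
regionA$zRegional <- as.numeric(scale(regionA$pct))

print(regionA)
```

Against the whole table every area in region "A" scores below zero, whereas within the region the scores are -1, 0 and 1. Which reference group is appropriate depends on whether vulnerability is meant to be read relative to the wider dataset or relative to the study area alone.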

In [19]:
#output the data as a csv
write.csv(logData, file.path(pipelineDir, "census_oa_raw_data_log.csv"), row.names = FALSE)

write.csv(logDataPct, file.path(pipelineDir, "census_oa_pct_data_log.csv"), row.names = FALSE)

write.csv(logDataZ, file.path(pipelineDir, "census_oa_z_data_log.csv"), row.names = FALSE)

logCombined <- subset(logOA, select = c('CUSEC', 'geometry'))
logCombined <- merge(logCombined,logData, by = "CUSEC")
logCombined <- merge(logCombined,logDataPct, by = "CUSEC")
logCombined <- merge(logCombined,logDataZ, by = "CUSEC")

#output geojson
st_write(obj= logCombined, file.path(pipelineDir, "census_persons_oa_z_data_log.geojson"), delete_dsn=TRUE)

# personsHouseholdDataCombinedSFOA <- subset(censusDataOA, select = c('CUSEC', 'geometry'))
# personsHouseholdDataCombinedSF <- merge(personsHouseholdDataCombinedSFOA,personsHouseholdZDataCombined, by = "CUSEC")

Deleting source `../../2_pipeline/Logrono/1a_CensusData/2021/census_persons_oa_z_data_log.geojson' using driver `GeoJSON'
Writing layer `census_persons_oa_z_data_log' to data source 
  `../../2_pipeline/Logrono/1a_CensusData/2021/census_persons_oa_z_data_log.geojson' using driver `GeoJSON'
Writing 343 features with 41 fields and geometry type Multi Polygon.


**END**