## Chapter 7. Exploratory data analysis
#### Notebook for R. Additional notebook to clean the data and create the file eurobarometer.csv

Van Atteveldt, W., Trilling, D. & Arcila, C. (2022). <a href="https://cssbook.net" target="_blank">Computational Analysis of Communication</a>. Wiley.

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/ccs-amsterdam/ccsbook/blob/master/chapter07/cleaning_eurobarometer_r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
</table>

In [1]:
library(tidyverse)
# read_csv2() reads semicolon-delimited files that use a comma as decimal mark
url = "https://media.githubusercontent.com/media/ccs-amsterdam/ccsbook/master/docs/d/ZA6928_v1-0-0.csv"
d = read_csv2(url, col_names = TRUE)
print("Shape of my original data:")
dim(d)

# Select and rename columns
d2 = d %>% select(survey, uniqid, p1, tnscntry, d7, d8, d10, d11, d15a, d25, d40a, qd9_4, qd9_1)
d2 = d2 %>% rename(date = p1, country = tnscntry, marital_status = d7, educational = d8, gender = d10, age = d11, occupation = d15a, type_community = d25, household_composition = d40a, support_refugees = qd9_4, support_migrants = qd9_1)

print("Shape of my filtered data:")
print(dim(d2))

print("Variables:")
print(names(d2))

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.3     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

“package ‘readr’ was built under R version 4.0.5”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

Rows: 33193 Columns: 705

── Colu

[1] "Shape of my original data:"


[1] "Shape of my filtered data:"
[1] 33193    13
[1] "Variables:"
 [1] "survey"                "uniqid"                "date"                 
 [4] "country"               "marital_status"        "educational"          
 [7] "gender"                "age"                   "occupation"           
[10] "type_community"        "household_composition" "support_refugees"     
[13] "support_migrants"     
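
The select-then-rename steps in the cell above can also be written as a single call, because `select()` accepts `new_name = old_name` pairs. A minimal sketch on a made-up three-row tibble: the column names `uniqid`, `tnscntry` and `d11` come from the data set, while the values and the `unused` column are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the Eurobarometer data; all values are invented.
toy <- tibble(
  uniqid   = c(1, 2, 3),
  tnscntry = c("FRANCE", "ITALIA", "SUOMI"),
  d11      = c("34 years", "51 years", "15 years"),
  unused   = c("a", "b", "c")
)

# select() keeps only the listed columns and renames them in the same step.
toy2 <- toy %>% select(uniqid, country = tnscntry, age = d11)

print(names(toy2))
```

Whether one call or two reads better is a matter of taste; the two-step version in the notebook keeps the list of original variable codes visible on its own line.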


In [2]:
# Replace some categories with missing values
d2$support_refugees = na_if(d2$support_refugees, "DK")
d2$support_refugees = na_if(d2$support_refugees, "Inap. (not 1 in eu28)")

# Recode the labelled age values to plain digit strings, then convert to numeric
d2$age = recode(d2$age,
                "15 years" = "15",
                "98 years" = "98",
                "99 years (and older)" = "99")
d2$age = as.numeric(d2$age)

# Transform date, support_refugees and support_migrants into new numeric variables

# Number the interview days in chronological order
d2$date_n = recode(d2$date,
                   "Sunday, 5th November 2017" = "1",
                   "Monday, 6th November 2017" = "2",
                   "Tuesday, 7th November 2017" = "3",
                   "Wednesday, 8th November 2017" = "4",
                   "Thursday, 9th November 2017" = "5",
                   "Friday, 10th November 2017" = "6",
                   "Saturday, 11th November 2017" = "7",
                   "Sunday, 12th November 2017" = "8",
                   "Monday, 13th November 2017" = "9",
                   "Tuesday, 14th November 2017" = "10")
d2$date_n = as.numeric(d2$date_n)

# Level of support for refugees, from 1 to 4
d2$support_refugees_n = recode(d2$support_refugees,
                               "Totally disagree" = "1",
                               "Tend to disagree" = "2",
                               "Tend to agree" = "3",
                               "Totally agree" = "4")
d2$support_refugees_n = as.numeric(d2$support_refugees_n)

# Level of support for migrants, from 1 to 4
d2$support_migrants_n = recode(d2$support_migrants,
                               "Totally disagree" = "1",
                               "Tend to disagree" = "2",
                               "Tend to agree" = "3",
                               "Totally agree" = "4")
d2$support_migrants_n = as.numeric(d2$support_migrants_n)

# Recode country names to the standard names used by the maps library
d2$country = recode(d2$country,
                    "ÖSTERREICH" = "Austria",
                    "ITALIA" = "Italy",
                    "BELGIQUE" = "Belgium",
                    "PORTUGAL" = "Portugal",
                    "ESPANA" = "Spain",
                    "FRANCE" = "France",
                    "DANMARK" = "Denmark",
                    "HRVATSKA" = "Croatia",
                    "DEUTSCHLAND WEST" = "Germany",
                    "DEUTSCHLAND OST" = "Germany",
                    "GREAT BRITAIN" = "UK",
                    "NORTHERN IRELAND" = "UK",
                    "NEDERLAND" = "Netherlands",
                    "POLSKA" = "Poland",
                    "SLOVENIJA" = "Slovenia",
                    "CESKA REPUBLIKA" = "Czech republic",
                    "SLOVENSKA REPUBLIC" = "Slovakia",
                    "MAGYARORSZAG" = "Hungary",
                    "ELLADA" = "Greece",
                    "SUOMI" = "Finland",
                    "IRELAND" = "Ireland",
                    "LUXEMBOURG" = "Luxemburg",
                    "SVERIGE" = "Sweden",
                    "BALGARIJA" = "Bulgaria",
                    "LATVIA" = "Latvia",
                    "EESTI" = "Estonia",
                    "LIETUVA" = "Lithuania",
                    "MALTA" = "Malta",
                    "ROMANIA" = "Romania",
                    "KYPROS" = "Cyprus")

# Transform educational into a numeric variable
d2$educational_n = d2$educational
d2$educational_n = na_if(d2$educational_n, "DK")
d2$educational_n = na_if(d2$educational_n, "Still studying")
d2$educational_n = na_if(d2$educational_n, "No full-time education")
d2$educational_n = na_if(d2$educational_n, "Refusal")
d2$educational_n = recode(d2$educational_n,
                          "2 years" = "2",
                          "75 years" = "75")
d2$educational_n = as.numeric(d2$educational_n)

“NAs introduced by coercion”
“NAs introduced by coercion”
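
The coercion warnings above are what `as.numeric()` emits when it meets a string that is not a number, for example a leftover category label such as "DK". A possible alternative that avoids the warning is a lookup table: a named vector indexed by the labels. The sketch below uses base R only; the `answers` vector is invented for illustration and `scale_levels` is a hypothetical name, not something defined in the notebook.

```r
# Hypothetical answers; the labels match the survey categories used above.
answers <- c("Totally agree", "Tend to disagree", "DK", "Totally disagree")

# Named vector mapping each label to its score (a lookup table).
scale_levels <- c(
  "Totally disagree" = 1,
  "Tend to disagree" = 2,
  "Tend to agree"    = 3,
  "Totally agree"    = 4
)

# Indexing by name returns the score; labels that are missing from the
# table (here "DK") come back as NA, and no coercion warning is raised.
scores <- unname(scale_levels[answers])
print(scores)
```

With this approach every label that should count as missing has to be absent from the table on purpose, so it is worth checking `table(answers, useNA = "ifany")` before and after to confirm that only the intended categories turned into `NA`.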


In [15]:
# Save the cleaned data to a CSV file (uncomment to write it)
#write.csv(d2,"eurobarom_nov_2017.csv", row.names = FALSE)
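
To see what `row.names = FALSE` does in the call above, the following sketch writes a small data frame to a temporary file and reads it back. The data frame `toy` and its values are invented for illustration, and the temporary path is used only so that nothing in the working directory is overwritten.

```r
# Toy data frame standing in for d2; all values are invented.
toy <- data.frame(country = c("Spain", "Malta"), age = c(34, 51))

# Write to a temporary file so nothing in the working directory is touched.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file back: without row names, only the two data columns return.
back <- read.csv(path)
print(names(back))
```

Leaving out `row.names = FALSE` would write the row names as an extra unnamed first column, which `read.csv()` then reads back as a column called `X`.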