# Public attitudes towards immigration 2021

This dataset supports comparative analysis of anti-immigrant and anti-refugee attitudes, news and social media consumption, and political attitudes (e.g., social dominance orientation, right-wing authoritarianism) among the adult population of seven European countries (Austria, Belgium, Germany, Hungary, Italy, Spain, Sweden), the United States, and Colombia in 2021 (N = 13,645).

__Note:__ Almost all variables in this dataset are categorical or ordinal (not discrete counts or continuous measurements), so modelling choices and interpretation need careful consideration.

The data, codebook, and original questionnaires are publicly accessible at the link below and are also available on Learn. The codebook and questionnaire give complete information about the variables and the original questions asked: https://data.mendeley.com/datasets/8mgpmdstp2/2


In [2]:
# import packages you need
import pandas as pd
import numpy as np

In [3]:
#read in the data
data = pd.read_csv('Dataset_DeConinck_R.csv', encoding='latin-1')

In [4]:
#see the first 5 rows
data.head()

Unnamed: 0,cntry,ans_id,V001,V002,V002bea,V002beb,V002at,V002de,V002es,V002it,...,V058_5,V058_6,V058_7,V058_8,V058_9,V058_10,V059,Leeftijd3N,Diploma2,Weging
0,4,100,2,62,,,,,7.0,,...,,,,,,,,3,1,1.400395
1,4,233,2,43,,,,,14.0,,...,,,,,,,,2,2,0.720534
2,4,373,2,34,,,,,17.0,,...,,,,,,,,1,2,0.716512
3,4,405,2,53,,,,,14.0,,...,,,,,,,,2,1,1.374635
4,4,462,1,40,,,,,7.0,,...,,,,,,,,2,2,0.689156


In [5]:
#check the dimension of the dataset
data.shape

(13645, 635)

In [6]:
#there are about 1500 individuals interviewed from each country
data[["cntry"]].value_counts()

cntry
9        1543
3        1521
2        1520
7        1517
6        1514
4        1512
5        1510
1        1505
8        1503
dtype: int64

In [7]:
#change the variable/column names to meaningful words for easier handling
#for example:

data = data.rename(columns = {'cntry' : 'Country',
                              'V001' : 'Gender',
                              'V002' : 'Age',
                              'V003' : 'MaritalSt',
                              'V013' : 'PoliAff',
                              'V021' : 'Religion',
                              'V046' : 'FeelingRefugees'})
                   
#depending on your study questions, choose the columns that are needed and make a smaller dataset to use.
#for example
subset_data = data[['Country','Gender','Age','MaritalSt','PoliAff', 'Religion','FeelingRefugees']]
subset_data.head()                        

Unnamed: 0,Country,Gender,Age,MaritalSt,PoliAff,Religion,FeelingRefugees
0,4,2,62,3,2,4,7
1,4,2,43,2,2,4,5
2,4,2,34,3,6,3,8
3,4,2,53,3,5,2,10
4,4,1,40,3,2,2,10


In [8]:
#check the type of variables
subset_data.dtypes

Country            int64
Gender             int64
Age                int64
MaritalSt          int64
PoliAff            int64
Religion           int64
FeelingRefugees    int64
dtype: object

__Note:__

- Most variables are ordinal. You can sum related items to create a new variable, for example a single score for feelings towards refugees obtained by summing V047 to V052.
- Because most variables are ordinal, the Pearson correlation may not be a good measure of association between them. Consider Spearman's rank correlation instead.
- When using categorical variables in a model, make sure their type is category; if it is not, convert them to category.
- If needed, you can recode a variable into two categories (0/1) and use it as the response variable in a logistic regression model or another classification method. For example, for the emotion item on anger towards immigrants, code 1-4 as no (0) and 5-7 as yes (1).
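
The notes above can be tried out on a small made-up table before applying them to the real data. This is only a sketch: the values in `toy` are invented, `Anger` is a hypothetical column name standing in for the relevant emotion item, and `RefugeeScore` and `AngerBinary` are illustrative names. Before summing real items, check in the codebook that they share the same scale and direction.

```python
import pandas as pd

# Hypothetical toy responses; these values are invented for illustration
# and are NOT taken from the survey.
toy = pd.DataFrame({
    "V047": [1, 2, 3, 4, 5, 2],
    "V048": [2, 2, 3, 5, 5, 1],
    "V049": [1, 3, 3, 4, 4, 2],
    "V050": [2, 1, 4, 4, 5, 2],
    "V051": [1, 2, 2, 5, 5, 1],
    "V052": [1, 2, 3, 4, 5, 3],
    "Anger": [1, 4, 5, 7, 2, 6],  # hypothetical name for a 1-7 emotion item
})

# 1. Composite score: row-wise sum of V047 to V052.
items = ["V047", "V048", "V049", "V050", "V051", "V052"]
toy["RefugeeScore"] = toy[items].sum(axis=1)

# 2. Spearman rank correlation between one item and the composite score.
spearman = toy[["V047", "RefugeeScore"]].corr(method="spearman")
print(spearman)

# 3. Binary recode: codes 1-4 become 0 (no), codes 5-7 become 1 (yes).
toy["AngerBinary"] = (toy["Anger"] >= 5).astype(int)
print(toy[["Anger", "AngerBinary"]])
```

Note that `sum(axis=1)` skips missing values by default, so a respondent with unanswered items gets a lower total; passing `skipna=False` makes the total missing instead.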

In [9]:
#change the type of 'MaritalSt' to category and assign meaningful category names
subset_data['MaritalSt'] = subset_data['MaritalSt'].astype('category')
subset_data['MaritalSt'] = subset_data['MaritalSt'].cat.rename_categories({1: 'Unmarried', 2: 'Cohabitation', 3: 'Married', 4: 'Divorced', 5: 'Widowed'})
subset_data['MaritalSt'].value_counts()
## pandas raises a SettingWithCopyWarning here because subset_data was created by selecting columns from data;
## creating subset_data with .copy() avoids the warning

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subset_data['MaritalSt'] = subset_data['MaritalSt'].astype('category')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subset_data['MaritalSt'] = subset_data['MaritalSt'].cat.rename_categories({1: 'Unmarried', 2: 'Cohabitation', 3: 'Married', 4: 'Divorced', 5: 'Widowed'})


Married         6186
Unmarried       3533
Cohabitation    2563
Divorced        1154
Widowed          209
Name: MaritalSt, dtype: int64
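
One way to avoid the warning shown above is to make the subset an explicit copy before modifying it. The sketch below uses a small invented table (`data_demo` and `subset_demo` are hypothetical names, and the values are made up) so that it runs without the survey file; the same `.copy()` call can be added where `subset_data` is created.

```python
import pandas as pd

# Hypothetical miniature of the full dataset; values are invented.
data_demo = pd.DataFrame({
    "Country": [4, 4, 1, 1],
    "MaritalSt": [3, 2, 1, 5],
    "Age": [62, 43, 29, 55],
})

# .copy() makes the subset an independent DataFrame, so later
# assignments do not trigger SettingWithCopyWarning.
subset_demo = data_demo[["Country", "MaritalSt"]].copy()

# Convert to category and rename the codes in one chained step.
labels = {1: "Unmarried", 2: "Cohabitation", 3: "Married",
          4: "Divorced", 5: "Widowed"}
subset_demo["MaritalSt"] = (
    subset_demo["MaritalSt"]
    .astype("category")
    .cat.rename_categories(labels)
)

print(subset_demo["MaritalSt"].value_counts())
```

Because the subset is a copy, the original `data_demo` keeps its numeric codes, which is useful if the raw values are needed again later.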

In [10]:
#select one country to make a smaller dataset, for example Belgium (Country == 1)
data_Belgium = data.loc[data['Country'] == 1]
data_Belgium

Unnamed: 0,Country,ans_id,Gender,Age,V002bea,V002beb,V002at,V002de,V002es,V002it,...,V058_5,V058_6,V058_7,V058_8,V058_9,V058_10,V059,Leeftijd3N,Diploma2,Weging
9094,1,3,2,29,6511.0,8.0,,,,,...,,,,,,,,1,2,0.642257
9095,1,4,1,55,7860.0,8.0,,,,,...,,,,,,,,3,1,1.266764
9096,1,5,2,55,1170.0,3.0,,,,,...,,,,,,,,3,2,0.621797
9097,1,6,1,65,9830.0,4.0,,,,,...,,,,,,,,3,2,0.526566
9098,1,8,1,58,1640.0,2.0,,,,,...,,,,,,,,3,2,0.526566
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10594,1,2172,2,56,1030.0,3.0,,,,,...,,,,,,,,3,1,1.495863
10595,1,2173,1,62,1160.0,3.0,,,,,...,,,,,,,,3,1,1.266764
10596,1,2192,1,60,9960.0,4.0,,,,,...,,,,,,,,3,1,1.266764
10597,1,2202,1,32,8500.0,5.0,,,,,...,,,,,,,,1,2,0.925290
