# Immigration perception 


|Name|SNR|
|----|-------|
|Mattia Malerba|2008050|
|Lorenzo Mattesini|2014924|




# Execution

## Import packages and libraries

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy 
from scipy import stats

## Data

In [31]:
# Preliminary data cleaning was done separately in Excel. Specifically, a new variable
# 'Southern' was defined, equal to 0 for Western countries and 1 for Southern countries,
# and the variable 'Nationality' was dropped.

df = pd.read_csv('Dataset2.csv', sep=";")
df.head(10)  # show the first 10 rows of the dataset

| |Gender|Age|Southern|Skills|Economy|Jobs|Budget|Culture|Policy|
|---|------|---|--------|------|-------|----|------|-------|------|
|0|Male|25|1|2|2|3|2|3|2|
|1|Male|23|1|4|4|3|3|4|3|
|2|Female|23|0|4|4|1|3|5|2|
|3|Male|22|0|1|3|4|4|1|3|
|4|Female|23|0|4|4|2|5|4|1|
|5|Female|24|1|2|2|1|3|2|2|
|6|Female|23|1|5|4|1|2|3|2|
|7|Male|25|0|4|4|2|2|4|1|
|8|Male|25|1|3|4|2|2|3|1|
|9|Male|25|1|4|4|2|3|4|2|


## Descriptive statistics

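Comparing demographics across the two samples, mean age is similar (23.9 years in Southern Europe and 24.7 in Western Europe), although ages are more dispersed in the Western sample (standard deviation 6.1 versus 2.1, maximum age 58). Gender is evenly split in the Western sample (18 females, 18 males), whereas the Southern sample contains twice as many males (24) as females (12).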




In [5]:
# Split the dataset into Southern and Western subsamples and compute descriptive statistics

south = df[df.Southern == 1]
south1 = south[south.columns.difference(['Southern', 'Budget', 'Culture', 'Economy', 'Jobs', 'Policy', 'Skills'])]
south1.describe().T

| |count|mean|std|min|25%|50%|75%|max|
|---|-----|----|---|---|---|---|---|---|
|Age|36.0|23.944444|2.123938|16.0|23.0|25.0|25.0|28.0|


In [6]:
west = df[df.Southern == 0]
west1 = west[west.columns.difference(['Southern', 'Budget', 'Culture', 'Economy', 'Jobs', 'Policy', 'Skills'])]
west1.describe().T

| |count|mean|std|min|25%|50%|75%|max|
|---|-----|----|---|---|---|---|---|---|
|Age|36.0|24.694444|6.131043|19.0|22.75|24.0|25.0|58.0|


In [7]:
#Frequency table for Western Countries by Gender
west2 = west[west.columns.difference(['Budget', 'Culture', 'Economy', 'Jobs', 'Policy', 'Skills','Age'])]
west2.groupby('Gender').count()

|Gender|Southern|
|------|--------|
|Female|18|
|Male|18|


In [8]:
#Frequency table for Southern Countries by Gender
south2 = south[south.columns.difference(['Budget', 'Culture', 'Economy', 'Jobs', 'Policy', 'Skills', 'Age'])]
south2.groupby('Gender').count()

|Gender|Southern|
|------|--------|
|Female|12|
|Male|24|
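
The two gender frequency tables above could also be obtained in a single step with `pd.crosstab`. The sketch below is only an illustration: it rebuilds the ten preview rows shown in the Data section (the name `preview` is hypothetical), so its counts are those of the preview, not of the full samples.

```python
import pandas as pd

# Ten preview rows from the Data section (illustration only, not the full dataset).
preview = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female",
               "Female", "Female", "Male", "Male", "Male"],
    "Southern": [1, 1, 0, 0, 0, 1, 1, 0, 1, 1],
})

# Rows: region (0 = Western, 1 = Southern); columns: gender; cells: respondent counts.
counts = pd.crosstab(preview["Southern"], preview["Gender"])
print(counts)
```

In the notebook itself, the same call on the full data would be `pd.crosstab(df["Southern"], df["Gender"])`.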


The tables below show some descriptive statistics of our variables of interest, for Western Europe and Southern Europe respectively. Furthermore, differences between means are summarized in a bar graph.

In [9]:
west2 = west[west.columns.difference(['Age', 'Southern'])]
west2.describe().T

| |count|mean|std|min|25%|50%|75%|max|
|---|-----|----|---|---|---|---|---|---|
|Budget|36.0|3.222222|1.07201|1.0|2.0|3.5|4.0|5.0|
|Culture|36.0|3.305556|1.327069|1.0|2.0|4.0|4.0|5.0|
|Economy|36.0|3.555556|1.106976|1.0|3.0|4.0|4.0|5.0|
|Jobs|36.0|2.694444|1.009086|1.0|2.0|2.0|4.0|5.0|
|Policy|36.0|2.638889|1.07312|1.0|2.0|2.0|4.0|5.0|
|Skills|36.0|3.5|1.133893|1.0|2.75|4.0|4.0|5.0|


In [10]:
south2 = south[south.columns.difference(['Age', 'Southern'])]
south2.describe().T

| |count|mean|std|min|25%|50%|75%|max|
|---|-----|----|---|---|---|---|---|---|
|Budget|36.0|2.666667|1.264911|1.0|2.0|2.5|3.25|5.0|
|Culture|36.0|3.833333|0.971008|2.0|3.0|4.0|4.25|5.0|
|Economy|36.0|3.722222|0.778684|2.0|3.0|4.0|4.0|5.0|
|Jobs|36.0|2.166667|1.108409|1.0|1.0|2.0|3.0|5.0|
|Policy|36.0|2.416667|1.052209|1.0|2.0|2.0|3.0|5.0|
|Skills|36.0|3.527778|1.081959|1.0|3.0|4.0|4.0|5.0|
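
The text introducing these tables mentions a bar graph of the group means. A minimal sketch of such a plot, using the means reported in the two tables above, might look like the following. The bar width, labels, figure size, and axis range are illustrative assumptions and are not taken from the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

variables = ["Budget", "Culture", "Economy", "Jobs", "Policy", "Skills"]
# Group means copied from the descriptive statistics tables above.
west_means = [3.222222, 3.305556, 3.555556, 2.694444, 2.638889, 3.500000]
south_means = [2.666667, 3.833333, 3.722222, 2.166667, 2.416667, 3.527778]

x = np.arange(len(variables))  # one slot per variable
width = 0.38                   # assumed bar width

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(x - width / 2, west_means, width, label="Western Europe")
ax.bar(x + width / 2, south_means, width, label="Southern Europe")
ax.set_xticks(x)
ax.set_xticklabels(variables)
ax.set_ylabel("Mean response")
ax.set_ylim(0, 5)  # responses in the data range from 1 to 5
ax.legend()
fig.tight_layout()
```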


## Statistical analysis

In [25]:
# Define the subsamples needed for the statistical tests:
# one Series per variable and region.
# Budget
west_budget, south_budget = west['Budget'], south['Budget']
# Culture
west_culture, south_culture = west['Culture'], south['Culture']
# Economy
west_economy, south_economy = west['Economy'], south['Economy']
# Jobs
west_jobs, south_jobs = west['Jobs'], south['Jobs']
# Policy
west_policy, south_policy = west['Policy'], south['Policy']
# Skills
west_skills, south_skills = west['Skills'], south['Skills']

To address our research question, we need to determine whether perceptions of immigration differ between the two samples. Our null and alternative hypotheses can therefore be stated as follows:

H0: Perceptions of the effects of immigration on the economy and society in Western Europe are the same as those in Southern Europe.

Ha: Perceptions of the effects of immigration on the economy and society in Western Europe are different from those in Southern Europe.

To test these hypotheses, we compare the distribution of each variable of interest between Southern and Western Europe. Since our variables are ordinal and the two samples are independent of each other, we perform, for each variable, a Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test). The outputs are reported below.
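
As a rough illustration of this procedure, the sketch below loops over the six variables instead of calling the test once per cell. It uses only the ten preview rows shown in the Data section, so its numbers will not match the full-sample outputs reported below. It also calls `scipy.stats.mannwhitneyu` next to `ranksums`, because `ranksums` does not handle ties, which are common in ordinal data. The names `preview` and `results` are hypothetical.

```python
import pandas as pd
from scipy import stats

# Ten preview rows from the Data section (illustration only, not the full dataset).
preview = pd.DataFrame({
    "Southern": [1, 1, 0, 0, 0, 1, 1, 0, 1, 1],
    "Skills":   [2, 4, 4, 1, 4, 2, 5, 4, 3, 4],
    "Economy":  [2, 4, 4, 3, 4, 2, 4, 4, 4, 4],
    "Jobs":     [3, 3, 1, 4, 2, 1, 1, 2, 2, 2],
    "Budget":   [2, 3, 3, 4, 5, 3, 2, 2, 2, 3],
    "Culture":  [3, 4, 5, 1, 4, 2, 3, 4, 3, 4],
    "Policy":   [2, 3, 2, 3, 1, 2, 2, 1, 1, 2],
})

variables = ["Budget", "Culture", "Economy", "Jobs", "Policy", "Skills"]
west = preview[preview["Southern"] == 0]
south = preview[preview["Southern"] == 1]

rows = []
for var in variables:
    # Rank-sum test as used in the notebook (normal approximation, no tie correction).
    rs = stats.ranksums(west[var], south[var])
    # Mann-Whitney U test, which accounts for ties in its asymptotic p-value.
    mw = stats.mannwhitneyu(west[var], south[var], alternative="two-sided")
    rows.append({
        "variable": var,
        "ranksums_z": rs.statistic,
        "ranksums_p": rs.pvalue,
        "mwu_U": mw.statistic,
        "mwu_p": mw.pvalue,
    })

results = pd.DataFrame(rows).set_index("variable")
print(results.round(3))
```

On the full data, replacing `preview` with `df` would run the same comparison for every variable in one pass.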


### 1. Budget

In [23]:
scipy.stats.ranksums(west_budget, south_budget)

RanksumsResult(statistic=1.9427449628985221, pvalue=0.05204698069306693)

### 2. Culture

In [26]:
scipy.stats.ranksums(west_culture, south_culture)

RanksumsResult(statistic=-1.5035156669388563, pvalue=0.1327061199722343)

### 3. Economy

In [27]:
scipy.stats.ranksums(west_economy, south_economy)

RanksumsResult(statistic=-0.32660639955975157, pvalue=0.74396560262698386)

### 4. Jobs

In [28]:
scipy.stats.ranksums(west_jobs, south_jobs)

RanksumsResult(statistic=2.0722612937584235, pvalue=0.03824108000678763)

### 5. Policy

In [29]:
scipy.stats.ranksums(west_policy, south_policy)

RanksumsResult(statistic=0.85030286781935316, pvalue=0.3951567225679119)

### 6. Skills

In [30]:
scipy.stats.ranksums(west_skills, south_skills)

RanksumsResult(statistic=0.0056311448199957165, pvalue=0.99550702023373527)
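
For easier comparison, the six results above can be gathered into one table. The sketch below copies the reported statistics and p-values by hand. The 0.05 threshold and the names `results` and `alpha` are assumptions made for illustration, since the notebook does not state a significance level.

```python
import pandas as pd

# Test statistics and p-values copied from the six outputs above.
results = pd.DataFrame({
    "variable": ["Budget", "Culture", "Economy", "Jobs", "Policy", "Skills"],
    "statistic": [1.9427449628985221, -1.5035156669388563, -0.32660639955975157,
                  2.0722612937584235, 0.85030286781935316, 0.0056311448199957165],
    "pvalue": [0.05204698069306693, 0.1327061199722343, 0.74396560262698386,
               0.03824108000678763, 0.3951567225679119, 0.99550702023373527],
}).set_index("variable")

alpha = 0.05  # assumed significance level, not stated in the notebook
results["below_alpha"] = results["pvalue"] < alpha
print(results.round(4))
```

With this assumed threshold, only Jobs has a p-value below `alpha`, while Budget lies just above it.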