# CIVICA Hackathon: Europe Revisited

### Team pynuts: Bergami Michele (LSE), Kapoor Naman (Sciences Po), Moawad Simone (LSE), Pashkina Oleksandra (Sciences Po)

#### Our project objective is to investigate the election results of major parties in Europe over the last 10 years, to understand how we can better divide parties and, as a consequence, the political needs of European citizens. The strength of our approach is that it identifies clusters of parties by looking at the predominance of certain themes in their manifestos. This allows us to reflect on the current drivers of consensus better than traditional ways of dividing parties and political thought do.

#### Nevertheless, our work suffers from two important limitations. First, abstentions are not considered. This can introduce substantial bias into our conclusions, as a significant proportion of people is left out of our classification and cannot be assumed to be indifferent to the national political scene. Second, there are no measures of the relevance of more recent themes, such as technology issues related to AI, data and privacy, which might concern European citizens now more than ever.

We start by importing the dataset

In [5]:
import pandas as pd

mp_data = pd.read_csv('MPDataset_MPDS2023a.csv')
mp_data


Unnamed: 0,country,countryname,oecdmember,eumember,edate,date,party,partyname,partyabbrev,parfam,...,per608_3,per703_1,per703_2,rile,planeco,markeco,welfare,intpeace,datasetversion,id_perm
0,11,Sweden,0,0.0,17/09/1944,194409,11220,Communist Party of Sweden,SKP,20,...,,,,9.600,1.900,1.900,0.000,1.900,2023a,JN1LZH
1,11,Sweden,0,0.0,17/09/1944,194409,11320,Social Democratic Labour Party,SAP,30,...,,,,-37.800,3.300,2.200,33.400,5.600,2023a,CMR7F6
2,11,Sweden,0,0.0,17/09/1944,194409,11420,People’s Party,FP,40,...,,,,9.500,3.200,6.400,14.300,1.600,2023a,Z6OL6C
3,11,Sweden,0,0.0,17/09/1944,194409,11620,Right Party,,60,...,,,,28.000,1.800,22.800,10.600,0.000,2023a,YMKVN2
4,11,Sweden,0,0.0,17/09/1944,194409,11810,Agrarian Party,,80,...,,,,23.810,0.000,19.048,0.000,4.762,2023a,U4SCRD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5084,181,South Africa,0,0.0,08/05/2019,201905,181510,African Christian Democratic Party,ACDP,50,...,0.0,5.000,0.0,12.500,0.833,7.500,14.167,0.000,2023a,HSQT6Y
5085,181,South Africa,0,0.0,08/05/2019,201905,181520,African Transformation Movement,ATM,50,...,0.0,1.042,0.0,-6.771,3.385,1.042,22.135,1.562,2023a,B6QCSE
5086,181,South Africa,0,0.0,08/05/2019,201905,181710,Freedom Front Plus,FF+,70,...,0.0,3.497,0.0,19.580,0.699,10.664,11.364,0.000,2023a,OS725O
5087,181,South Africa,0,0.0,08/05/2019,201905,181910,Inkatha Freedom Party,IFP,90,...,0.0,3.289,0.0,-6.579,0.658,0.658,28.947,0.000,2023a,XKM7J7


We check the dimensions of the dataset.

In [6]:
mp_data.shape

(5089, 175)

There are clearly too many rows and columns for our purposes. We restrict the observations to EU member states (`eumember` equal to 10).

In [7]:
eu_data=mp_data[mp_data['eumember']==10]
eu_data

Unnamed: 0,country,countryname,oecdmember,eumember,edate,date,party,partyname,partyabbrev,parfam,...,per608_3,per703_1,per703_2,rile,planeco,markeco,welfare,intpeace,datasetversion,id_perm
92,11,Sweden,10,10.0,21/09/1998,199809,11110,Green Ecology Party,MP,10,...,,,,-36.111,8.333,2.778,18.056,5.556,2023a,HJGVK6
93,11,Sweden,10,10.0,21/09/1998,199809,11220,Left Party,V,20,...,,,,-35.952,9.970,0.604,26.586,1.813,2023a,AV7RGX
94,11,Sweden,10,10.0,21/09/1998,199809,11320,Social Democratic Labour Party,SAP,30,...,,,,-3.516,0.000,10.547,23.438,0.000,2023a,RQVJ4Y
95,11,Sweden,10,10.0,21/09/1998,199809,11420,Liberal People’s Party,FP,40,...,,,,14.286,0.000,13.095,10.714,3.571,2023a,BT8287
96,11,Sweden,10,10.0,21/09/1998,199809,11520,Christian Democrats,Kd,50,...,,,,4.790,0.000,16.168,26.347,0.000,2023a,QFVSEP
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4602,97,Slovenia,10,10.0,03/06/2018,201806,97460,Party of Alenka Bratušek,SAB,40,...,0.0,1.261,0.0,-8.403,0.000,13.025,24.790,1.261,2023a,CERAR3
4603,97,Slovenia,10,10.0,03/06/2018,201806,97461,Modern Centre Party,SMC,40,...,0.0,3.150,0.0,-8.661,0.000,2.362,14.961,0.787,2023a,OXSQCN
4604,97,Slovenia,10,10.0,03/06/2018,201806,97522,New Slovenian Christian People’s Party,Nsi,50,...,0.0,9.384,0.0,10.537,0.278,11.054,10.258,0.358,2023a,GCR1EB
4605,97,Slovenia,10,10.0,03/06/2018,201806,97710,Slovenian National Party,SNS,70,...,0.0,0.000,0.0,7.937,0.794,1.587,9.524,1.587,2023a,6M38Z5


We list the available columns.

In [8]:
list(eu_data.columns)

['country',
 'countryname',
 'oecdmember',
 'eumember',
 'edate',
 'date',
 'party',
 'partyname',
 'partyabbrev',
 'parfam',
 'candidatename',
 'coderid',
 'manual',
 'coderyear',
 'testresult',
 'testeditsim',
 'pervote',
 'voteest',
 'presvote',
 'absseat',
 'totseats',
 'progtype',
 'datasetorigin',
 'corpusversion',
 'total',
 'peruncod',
 'per101',
 'per102',
 'per103',
 'per104',
 'per105',
 'per106',
 'per107',
 'per108',
 'per109',
 'per110',
 'per201',
 'per202',
 'per203',
 'per204',
 'per301',
 'per302',
 'per303',
 'per304',
 'per305',
 'per401',
 'per402',
 'per403',
 'per404',
 'per405',
 'per406',
 'per407',
 'per408',
 'per409',
 'per410',
 'per411',
 'per412',
 'per413',
 'per414',
 'per415',
 'per416',
 'per501',
 'per502',
 'per503',
 'per504',
 'per505',
 'per506',
 'per507',
 'per601',
 'per602',
 'per603',
 'per604',
 'per605',
 'per606',
 'per607',
 'per608',
 'per701',
 'per702',
 'per703',
 'per704',
 'per705',
 'per706',
 'per1011',
 'per1012',
 'per1013',
 '

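We keep only the relevant columns: country, election date, party identifiers, vote share (`pervote`) and the main manifesto categories `per101` to `per706`.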

In [9]:
eu_data_clean=eu_data[['countryname','date','edate','partyname','partyabbrev','pervote','per101','per102','per103',
                       'per104','per105','per106','per107','per108','per109','per110','per201','per202','per203',
                       'per204','per301','per302','per303','per304','per305','per401','per402','per403','per404',
                       'per405','per406','per407','per408','per409','per410','per411','per412','per413','per414',
                       'per415','per416','per501','per502','per503','per504','per505','per506','per507','per601',
                       'per602','per603','per604','per605','per606','per607','per608','per701','per702','per703',
                       'per704','per705','per706']]
eu_data_clean

Unnamed: 0,countryname,date,edate,partyname,partyabbrev,pervote,per101,per102,per103,per104,...,per605,per606,per607,per608,per701,per702,per703,per704,per705,per706
92,Sweden,199809,21/09/1998,Green Ecology Party,MP,4.500,0.000,0.000,0.00,0.000,...,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.0,0.000
93,Sweden,199809,21/09/1998,Left Party,V,11.990,0.000,0.000,0.00,0.000,...,0.000,0.000,0.000,0.000,1.208,0.000,0.000,0.000,0.0,3.625
94,Sweden,199809,21/09/1998,Social Democratic Labour Party,SAP,36.390,2.734,0.000,0.00,0.000,...,1.563,7.031,0.000,0.000,1.172,0.000,0.000,0.000,0.0,1.953
95,Sweden,199809,21/09/1998,Liberal People’s Party,FP,4.720,0.000,0.000,0.00,0.000,...,3.571,1.190,0.000,0.000,1.190,0.000,0.000,0.000,0.0,2.381
96,Sweden,199809,21/09/1998,Christian Democrats,Kd,11.770,0.000,0.000,0.00,0.000,...,9.581,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.0,3.593
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4602,Slovenia,201806,03/06/2018,Party of Alenka Bratušek,SAB,5.105,0.000,0.000,0.00,3.361,...,1.261,3.782,0.000,0.000,2.101,0.000,1.261,0.000,0.0,0.000
4603,Slovenia,201806,03/06/2018,Modern Centre Party,SMC,9.748,0.000,0.787,0.00,1.575,...,4.724,0.000,0.000,0.000,3.937,0.000,3.150,0.000,0.0,0.000
4604,Slovenia,201806,03/06/2018,New Slovenian Christian People’s Party,Nsi,7.159,0.119,0.040,0.04,3.260,...,4.175,0.755,0.119,0.517,2.306,1.471,9.384,0.278,0.0,0.080
4605,Slovenia,201806,03/06/2018,Slovenian National Party,SNS,4.173,0.000,0.000,0.00,1.587,...,2.381,0.794,0.000,3.175,3.175,0.000,0.000,0.000,0.0,0.000


We keep elections from 2012 onwards, roughly the last 10 years of data (`date` is coded as YYYYMM).

In [10]:
eu_data_clean_10y=eu_data_clean[eu_data_clean['date']>201200].copy()
eu_data_clean_10y

Unnamed: 0,countryname,date,edate,partyname,partyabbrev,pervote,per101,per102,per103,per104,...,per605,per606,per607,per608,per701,per702,per703,per704,per705,per706
121,Sweden,201409,14/09/2014,Green Ecology Party,MP,6.889,0.000,0.000,0.000,0.000,...,4.225,3.219,2.817,0.000,10.664,0.000,1.207,0.000,0.000,1.006
122,Sweden,201409,14/09/2014,Left Party,V,5.718,0.000,0.000,0.278,0.000,...,0.556,1.944,0.000,0.000,17.778,0.000,2.778,0.000,0.000,1.389
123,Sweden,201409,14/09/2014,Social Democratic Labour Party,SAP,31.015,0.000,0.000,0.739,0.924,...,2.957,2.403,0.000,0.000,17.375,0.000,0.185,0.000,0.000,1.294
124,Sweden,201409,14/09/2014,Liberal People’s Party,FP,5.420,0.000,0.000,0.459,2.982,...,8.945,0.688,1.835,0.917,9.862,0.229,0.000,1.147,0.688,3.670
125,Sweden,201409,14/09/2014,Christian Democrats,Kd,4.570,0.000,0.000,0.000,5.913,...,7.455,10.026,0.514,0.000,4.113,0.000,0.000,0.000,0.000,5.913
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4602,Slovenia,201806,03/06/2018,Party of Alenka Bratušek,SAB,5.105,0.000,0.000,0.000,3.361,...,1.261,3.782,0.000,0.000,2.101,0.000,1.261,0.000,0.000,0.000
4603,Slovenia,201806,03/06/2018,Modern Centre Party,SMC,9.748,0.000,0.787,0.000,1.575,...,4.724,0.000,0.000,0.000,3.937,0.000,3.150,0.000,0.000,0.000
4604,Slovenia,201806,03/06/2018,New Slovenian Christian People’s Party,Nsi,7.159,0.119,0.040,0.040,3.260,...,4.175,0.755,0.119,0.517,2.306,1.471,9.384,0.278,0.000,0.080
4605,Slovenia,201806,03/06/2018,Slovenian National Party,SNS,4.173,0.000,0.000,0.000,1.587,...,2.381,0.794,0.000,3.175,3.175,0.000,0.000,0.000,0.000,0.000
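The date filter can be illustrated on a toy frame. This is only a sketch: the party labels are placeholders, and the `year` and `month` columns are assumptions added for readability, not part of the original analysis. Because `date` is an integer in YYYYMM form, integer division and remainder recover the two components, and the threshold `201200` keeps every election from January 2012 onwards. The `.copy()` makes the result an independent frame, so later column assignments do not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Toy data: placeholder party labels, dates in the dataset's YYYYMM coding
toy = pd.DataFrame({
    "partyname": ["A", "B", "C", "D"],
    "date": [199809, 201203, 201409, 202109],
})

# Split the YYYYMM integer into its components (illustrative helper columns)
toy["year"] = toy["date"] // 100
toy["month"] = toy["date"] % 100

# Same threshold as the cell above; .copy() gives an independent frame
recent = toy[toy["date"] > 201200].copy()
print(recent)
```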


We group the manifesto content-analysis variables (the `per` columns) into seven domains: foreign relations, democracy and freedom, political system, economy, welfare, society, and social groups.

In [11]:
eu_data_clean_10y['for_relations']=eu_data_clean_10y['per101']+eu_data_clean_10y['per102']\
                                  +eu_data_clean_10y['per103']+eu_data_clean_10y['per104']\
                                  +eu_data_clean_10y['per105']+eu_data_clean_10y['per106']\
                                  +eu_data_clean_10y['per107']+eu_data_clean_10y['per108']\
                                  +eu_data_clean_10y['per109']+eu_data_clean_10y['per110']

eu_data_clean_10y['demo']=eu_data_clean_10y['per201']+eu_data_clean_10y['per202']\
                         +eu_data_clean_10y['per203']+eu_data_clean_10y['per204']

eu_data_clean_10y['pol_sys']=eu_data_clean_10y['per301']+eu_data_clean_10y['per302']\
                            +eu_data_clean_10y['per303']+eu_data_clean_10y['per304']+eu_data_clean_10y['per305']

eu_data_clean_10y['economy']=eu_data_clean_10y['per401']+eu_data_clean_10y['per402']\
                            +eu_data_clean_10y['per403']+eu_data_clean_10y['per404']\
                            +eu_data_clean_10y['per405']+eu_data_clean_10y['per406']\
                            +eu_data_clean_10y['per407']+eu_data_clean_10y['per408']\
                            +eu_data_clean_10y['per409']+eu_data_clean_10y['per410']\
                            +eu_data_clean_10y['per411']+eu_data_clean_10y['per412']\
                            +eu_data_clean_10y['per413']+eu_data_clean_10y['per414']\
                            +eu_data_clean_10y['per415']+eu_data_clean_10y['per416']

eu_data_clean_10y['welfare']=eu_data_clean_10y['per501']+eu_data_clean_10y['per502']\
                            +eu_data_clean_10y['per503']+eu_data_clean_10y['per504']\
                            +eu_data_clean_10y['per505']+eu_data_clean_10y['per506']\
                            +eu_data_clean_10y['per507']

eu_data_clean_10y['society']=eu_data_clean_10y['per601']+eu_data_clean_10y['per602']\
                            +eu_data_clean_10y['per603']+eu_data_clean_10y['per604']\
                            +eu_data_clean_10y['per605']+eu_data_clean_10y['per606']\
                            +eu_data_clean_10y['per607']+eu_data_clean_10y['per608']

eu_data_clean_10y['soc_groups']=eu_data_clean_10y['per701']+eu_data_clean_10y['per702']\
                               +eu_data_clean_10y['per703']+eu_data_clean_10y['per704']\
                               +eu_data_clean_10y['per705']+eu_data_clean_10y['per706']

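The same seven totals can be written more compactly by selecting columns by the first digit of their code. The sketch below is one possible alternative, not the code used for the results in this notebook; the helper `add_domain_totals` and the toy values are hypothetical. It uses `skipna=False` so that a missing category gives a missing total, as the chained `+` above does, and it matches three-digit codes only, so sub-categories such as `per1011` are ignored.

```python
import re

import numpy as np
import pandas as pd

# Domain name -> first digit of the three-digit per codes (mirrors the cell above)
DOMAINS = {
    "for_relations": "1", "demo": "2", "pol_sys": "3", "economy": "4",
    "welfare": "5", "society": "6", "soc_groups": "7",
}

def add_domain_totals(df):
    """Return a copy of df with one summed column per domain."""
    out = df.copy()
    for name, digit in DOMAINS.items():
        cols = [c for c in df.columns if re.fullmatch(rf"per{digit}\d\d", c)]
        # skipna=False: any missing category makes the total missing, like '+'
        out[name] = df[cols].sum(axis=1, skipna=False)
    return out

# Toy frame with made-up shares; per1011 must not be counted
toy = pd.DataFrame({
    "per101": [1.0, np.nan], "per102": [2.0, 0.5], "per201": [3.0, 1.0],
    "per301": [0.0, 1.0], "per401": [4.0, 1.0], "per501": [5.0, 1.0],
    "per601": [6.0, 1.0], "per701": [7.0, 1.0], "per1011": [9.0, 9.0],
})
totals = add_domain_totals(toy)
print(totals[list(DOMAINS)])
```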

In [12]:
eu_data_10y_grouped=eu_data_clean_10y[['countryname','date','edate','partyname','partyabbrev','pervote',
                                      'for_relations','demo','pol_sys','economy','welfare','society','soc_groups']]
eu_data_10y_grouped

Unnamed: 0,countryname,date,edate,partyname,partyabbrev,pervote,for_relations,demo,pol_sys,economy,welfare,society,soc_groups
121,Sweden,201409,14/09/2014,Green Ecology Party,MP,6.889,3.018,2.213,3.219,12.071,52.515,13.883,12.877
122,Sweden,201409,14/09/2014,Left Party,V,5.718,2.500,1.667,5.834,10.835,50.555,6.389,21.945
123,Sweden,201409,14/09/2014,Social Democratic Labour Party,SAP,31.015,8.502,1.110,3.512,11.647,50.093,6.284,18.854
124,Sweden,201409,14/09/2014,Liberal People’s Party,FP,5.420,6.651,2.064,8.945,8.715,40.368,17.431,15.596
125,Sweden,201409,14/09/2014,Christian Democrats,Kd,4.570,6.941,4.113,3.342,9.511,42.417,23.650,10.026
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4602,Slovenia,201806,03/06/2018,Party of Alenka Bratušek,SAB,5.105,7.984,7.143,7.143,36.555,32.774,5.043,3.362
4603,Slovenia,201806,03/06/2018,Modern Centre Party,SMC,9.748,8.661,7.873,8.661,33.858,28.347,5.511,7.087
4604,Slovenia,201806,03/06/2018,New Slovenian Christian People’s Party,Nsi,7.159,5.845,8.112,13.359,28.269,16.383,14.353,13.519
4605,Slovenia,201806,03/06/2018,Slovenian National Party,SNS,4.173,11.111,12.699,5.556,16.668,24.604,25.397,3.175


Summary statistics show how much weight each of these groups carries in the party manifestos.

In [13]:
eu_data_10y_grouped.describe()

Unnamed: 0,date,pervote,for_relations,demo,pol_sys,economy,welfare,society,soc_groups
count,532.0,529.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0
mean,201626.479323,11.032058,7.787675,6.47208,11.220222,23.872964,29.955567,11.885785,7.971479
std,258.459243,10.501644,5.637502,4.954746,8.672403,9.334803,11.671834,9.110128,4.498826
min,201203.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,201410.0,3.466,3.9375,3.0315,5.45425,18.04325,22.342,5.70725,5.05975
50%,201606.0,7.283,7.0935,5.5805,9.0165,23.76,29.753,9.456,7.164
75%,201904.0,15.745,10.16675,8.96275,14.14775,28.705,38.00675,15.6375,10.223
max,202109.0,58.63,37.874,34.645,82.353,65.455,86.011,57.692,30.556


We consider only the major parties, defined here as those with more than 5% of the vote:

In [14]:
eu_data_10y_grouped_maj = eu_data_10y_grouped[eu_data_10y_grouped['pervote']>5]
eu_data_10y_grouped_maj.describe()

Unnamed: 0,date,pervote,for_relations,demo,pol_sys,economy,welfare,society,soc_groups
count,339.0,339.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0
mean,201607.056047,15.858797,8.25606,5.738226,10.177247,24.028521,29.898685,12.927545,8.280476
std,258.163003,10.294474,5.441257,3.991299,6.699047,9.08461,10.527399,9.186883,4.415733
min,201203.0,5.018617,0.0,0.0,0.0,0.0,1.569,0.0,0.0
25%,201407.0,7.6705,5.00475,2.6215,5.3365,18.76325,23.05725,6.7415,5.30075
50%,201605.0,12.101,7.523,5.1655,8.923,24.2745,29.8565,10.5995,7.508
75%,201810.0,22.2315,10.87275,8.24375,13.355,29.35475,36.66825,16.37875,10.46575
max,202109.0,58.63,29.071,17.61,45.162,52.274,68.706,56.25,25.444


Looking at the medians, welfare (29.9) and economy (24.3) play the major role in the manifestos, followed by society (10.6), political system (8.9), foreign relations (7.5) and social groups (7.5). Democracy and freedom comes last (5.2).
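The ordering can be checked by sorting the medians. The values below are copied from the 50% row of the major-party table above; on the real data the same series would come from calling `.median()` on the seven grouped columns. This is a small illustrative check rather than part of the original notebook.

```python
import pandas as pd

# Medians of the seven domains for major parties (50% row of the table above)
medians = pd.Series({
    "for_relations": 7.523,
    "demo": 5.1655,
    "pol_sys": 8.923,
    "economy": 24.2745,
    "welfare": 29.8565,
    "society": 10.5995,
    "soc_groups": 7.508,
})

# Largest median first
ranking = medians.sort_values(ascending=False)
print(ranking)
```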

Our next objective is to build a measure to classify these parties, so we look more closely at the meaning of the variables.

We notice that democracy and freedom, together with the political system categories that concern the exposure of political corruption and of inefficiency in the bureaucratic apparatus, do not really capture differences between parties. For this reason we leave these variables out of our analysis. We believe that parties use statements on these topics to motivate their candidacy and their will to improve the political framework of the countries where they operate.

For foreign relations we identify two types of parties, open and closed towards international collaboration, so we start by splitting these variables accordingly. In addition, we include the share of text about protectionism in this division: support for protectionism (`per406`) goes into `fr_close`, opposition to it (`per407`) into `fr_open`.

In [15]:
eu_data_clean_10y['fr_open']=eu_data_clean_10y['per101']+eu_data_clean_10y['per103']+eu_data_clean_10y['per105']\
                            +eu_data_clean_10y['per106']+eu_data_clean_10y['per107']+eu_data_clean_10y['per108']\
                            +eu_data_clean_10y['per407']

eu_data_clean_10y['fr_close']=eu_data_clean_10y['per102']+eu_data_clean_10y['per104']\
                             +eu_data_clean_10y['per109']+eu_data_clean_10y['per110']+eu_data_clean_10y['per406']



For the political system group we identify parties that aim at decentralization and others that prefer centralization.

In [16]:
eu_data_clean_10y['dec']=eu_data_clean_10y['per301']
eu_data_clean_10y['centr']=eu_data_clean_10y['per302']



For the economy group we identify two divisions. The first is between parties that favour a free market economy and those that favour a regulated one. The second is between short-term oriented parties, which aim at policies based on incentives and bonuses, and long-term oriented parties, which aim more generally at economic growth, debt reduction and investment in technology and infrastructure.

Moreover, we assign some of the welfare categories to these indices: environmental protection (`per501`) and education expansion (`per506`) count towards a long-term orientation, education limitation (`per507`) towards a short-term one, and welfare limitation (`per505`) towards the free market. Culture expansion (`per502`) was also meant to count as long-term and welfare expansion (`per504`) as regulated market, but the cell below does not add these two columns, so the figures that follow exclude them.

In [17]:
eu_data_clean_10y['free_market']=eu_data_clean_10y['per401']+eu_data_clean_10y['per505']
eu_data_clean_10y['reg_market']=eu_data_clean_10y['per403']+eu_data_clean_10y['per405']\
                               +eu_data_clean_10y['per412']+eu_data_clean_10y['per413']+eu_data_clean_10y['per415']

eu_data_clean_10y['short_term']=eu_data_clean_10y['per402']+eu_data_clean_10y['per409']+eu_data_clean_10y['per507']
eu_data_clean_10y['long_term']=eu_data_clean_10y['per404']+eu_data_clean_10y['per410']\
                              +eu_data_clean_10y['per411']+eu_data_clean_10y['per414']\
                              +eu_data_clean_10y['per416']+eu_data_clean_10y['per501']+eu_data_clean_10y['per506']


For the fabric of society domain we identify a division between identitarian and liberal parties. The liberal index also includes equality (`per503`) from the welfare domain and two categories from the social groups domain (`per705` and `per706`).

In [18]:
eu_data_clean_10y['identity']=eu_data_clean_10y['per601']+eu_data_clean_10y['per603']+eu_data_clean_10y['per608']
eu_data_clean_10y['liberal']=eu_data_clean_10y['per602']+eu_data_clean_10y['per604']+eu_data_clean_10y['per607']\
                            +eu_data_clean_10y['per503']+eu_data_clean_10y['per705']+eu_data_clean_10y['per706']

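The ten indices defined in the cells above can be summarised as a single mapping from index name to manifesto codes. The sketch below copies the code lists from those cells; the names `POLES` and `add_poles` are hypothetical, and the toy frame simply sets every share to 1 so that each index equals the number of codes it contains. Keeping the mapping in one place also makes it easy to check that no code is counted under two indices.

```python
import pandas as pd

# Index name -> manifesto codes, copied from the cells above
POLES = {
    "fr_open": ["per101", "per103", "per105", "per106", "per107", "per108", "per407"],
    "fr_close": ["per102", "per104", "per109", "per110", "per406"],
    "dec": ["per301"],
    "centr": ["per302"],
    "free_market": ["per401", "per505"],
    "reg_market": ["per403", "per405", "per412", "per413", "per415"],
    "short_term": ["per402", "per409", "per507"],
    "long_term": ["per404", "per410", "per411", "per414", "per416", "per501", "per506"],
    "identity": ["per601", "per603", "per608"],
    "liberal": ["per602", "per604", "per607", "per503", "per705", "per706"],
}

def add_poles(df, poles=POLES):
    """Return a copy of df with one summed column per index."""
    out = df.copy()
    for name, cols in poles.items():
        # skipna=False propagates missing values, like the chained '+' above
        out[name] = df[cols].sum(axis=1, skipna=False)
    return out

# Sanity check: every code is assigned to exactly one index
all_codes = [code for cols in POLES.values() for code in cols]
assert len(all_codes) == len(set(all_codes))

# Toy frame: one row with every share set to 1.0 (made-up values)
toy = pd.DataFrame({code: [1.0] for code in all_codes})
poles = add_poles(toy)
print(poles[list(POLES)].T)
```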


In [19]:
eu_data_10y_group_clust=eu_data_clean_10y[['countryname','date','edate','partyname','partyabbrev','pervote',
                                           'fr_open','fr_close','dec','centr','free_market','reg_market',
                                           'short_term','long_term','identity','liberal']]
eu_data_10y_group_clust_maj = eu_data_10y_group_clust[eu_data_10y_group_clust['pervote']>5].copy()
eu_data_10y_group_clust_maj.describe()

Unnamed: 0,date,pervote,fr_open,fr_close,dec,centr,free_market,reg_market,short_term,long_term,identity,liberal
count,339.0,339.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0
mean,201607.056047,15.858797,4.968083,4.052366,1.534887,0.22053,2.079372,4.80914,3.660414,22.623994,5.427003,8.574116
std,258.163003,10.294474,4.161235,4.169494,1.691089,0.659284,3.04894,4.401932,3.263122,8.869521,6.940255,6.245905
min,201203.0,5.018617,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,201407.0,7.6705,2.03975,1.18625,0.18325,0.0,0.02625,1.563,1.44225,18.18875,0.864,4.0675
50%,201605.0,12.101,4.251,2.69,1.067,0.0,0.8805,3.53,2.803,22.7085,2.86,7.3405
75%,201810.0,22.2315,6.68775,5.66575,2.3305,0.17575,3.024,7.1435,4.82925,27.76675,7.524,11.859
max,202109.0,58.63,25.807,26.363,9.242,6.117,25.0,23.871,23.333,49.223,46.861,39.287


We create columns with the differences between the opposing variables. A positive value means that the first term of the difference carries more weight in the manifesto, a negative value that the second term does.

In [20]:
eu_data_10y_group_clust_maj['openness']=eu_data_10y_group_clust_maj['fr_open']-eu_data_10y_group_clust_maj['fr_close']
eu_data_10y_group_clust_maj['dec_vs_centr']=eu_data_10y_group_clust_maj['dec']-eu_data_10y_group_clust_maj['centr']
eu_data_10y_group_clust_maj['reg_vs_free']=eu_data_10y_group_clust_maj['reg_market']-eu_data_10y_group_clust_maj['free_market']
eu_data_10y_group_clust_maj['horizon']=eu_data_10y_group_clust_maj['long_term']-eu_data_10y_group_clust_maj['short_term']
eu_data_10y_group_clust_maj['lib_vs_id']=eu_data_10y_group_clust_maj['liberal']-eu_data_10y_group_clust_maj['identity']

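The five differences can also be driven by a small table of pairs. The sketch below is illustrative: `AXES` and `add_axes` are hypothetical names, and the input is the row of means copied from the `describe()` output above. Since taking a difference is linear and all ten indices share the same 336 non-missing rows, applying the differences to the means should reproduce the means of the five new columns reported in the next `describe()` output.

```python
import pandas as pd

# New column -> (first term, second term); positive means the first term dominates
AXES = {
    "openness": ("fr_open", "fr_close"),
    "dec_vs_centr": ("dec", "centr"),
    "reg_vs_free": ("reg_market", "free_market"),
    "horizon": ("long_term", "short_term"),
    "lib_vs_id": ("liberal", "identity"),
}

def add_axes(df, axes=AXES):
    """Return a copy of df with one difference column per axis."""
    out = df.copy()
    for axis, (first, second) in axes.items():
        out[axis] = df[first] - df[second]
    return out

# Mean row of the ten indices, copied from the describe() output above
means = pd.DataFrame([{
    "fr_open": 4.968083, "fr_close": 4.052366,
    "dec": 1.534887, "centr": 0.22053,
    "free_market": 2.079372, "reg_market": 4.80914,
    "short_term": 3.660414, "long_term": 22.623994,
    "identity": 5.427003, "liberal": 8.574116,
}])
axes_of_means = add_axes(means)[list(AXES)]
print(axes_of_means.T)
```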

In [21]:
eu_data_10y_group_clust_maj.describe()

Unnamed: 0,date,pervote,fr_open,fr_close,dec,centr,free_market,reg_market,short_term,long_term,identity,liberal,openness,dec_vs_centr,reg_vs_free,horizon,lib_vs_id
count,339.0,339.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0,336.0
mean,201607.056047,15.858797,4.968083,4.052366,1.534887,0.22053,2.079372,4.80914,3.660414,22.623994,5.427003,8.574116,0.915717,1.314357,2.729768,18.96358,3.147113
std,258.163003,10.294474,4.161235,4.169494,1.691089,0.659284,3.04894,4.401932,3.263122,8.869521,6.940255,6.245905,5.881511,1.884202,5.978748,9.159057,10.607297
min,201203.0,5.018617,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-26.06,-6.117,-25.0,-6.667,-46.413
25%,201407.0,7.6705,2.03975,1.18625,0.18325,0.0,0.02625,1.563,1.44225,18.18875,0.864,4.0675,-1.668,0.0,-0.265,13.8295,-2.03725
50%,201605.0,12.101,4.251,2.69,1.067,0.0,0.8805,3.53,2.803,22.7085,2.86,7.3405,1.0655,0.883,2.0785,19.2575,3.4955
75%,201810.0,22.2315,6.68775,5.66575,2.3305,0.17575,3.024,7.1435,4.82925,27.76675,7.524,11.859,3.9395,2.239,6.04975,24.10675,9.09225
max,202109.0,58.63,25.807,26.363,9.242,6.117,25.0,23.871,23.333,49.223,46.861,39.287,20.408,8.582,23.871,48.446,39.287


In [22]:
# Only numeric columns are passed: recent pandas versions raise on text columns in .corr()
eu_data_10y_group_clust_maj[['openness','dec_vs_centr','reg_vs_free','horizon','lib_vs_id']].corr()

Unnamed: 0,openness,dec_vs_centr,reg_vs_free,horizon,lib_vs_id
openness,1.0,-0.042976,0.071625,0.241165,0.417575
dec_vs_centr,-0.042976,1.0,0.008917,0.072838,0.133159
reg_vs_free,0.071625,0.008917,1.0,0.011095,0.230822
horizon,0.241165,0.072838,0.011095,1.0,0.344875
lib_vs_id,0.417575,0.133159,0.230822,0.344875,1.0


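As the summary statistics and correlations show, the decentralization vs centralization measure is not particularly useful for identifying differences among parties: it has by far the smallest spread (standard deviation 1.88, against 5.88 to 10.61 for the other four measures) and is almost uncorrelated with them (absolute correlation of at most 0.13). We believe this is because only a minor part of the manifestos covers this issue (the median share for centralization is 0), so we drop it from the final table.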
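One way to make this screening explicit is to put the spread of each measure next to its strongest correlation with any other measure. The sketch below rebuilds both from the numbers printed in the two tables above; the `screen` table is an illustrative addition, not a step of the original notebook.

```python
import numpy as np
import pandas as pd

dims = ["openness", "dec_vs_centr", "reg_vs_free", "horizon", "lib_vs_id"]

# Correlation matrix copied from the output above
corr = pd.DataFrame(
    [
        [1.0, -0.042976, 0.071625, 0.241165, 0.417575],
        [-0.042976, 1.0, 0.008917, 0.072838, 0.133159],
        [0.071625, 0.008917, 1.0, 0.011095, 0.230822],
        [0.241165, 0.072838, 0.011095, 1.0, 0.344875],
        [0.417575, 0.133159, 0.230822, 0.344875, 1.0],
    ],
    index=dims,
    columns=dims,
)

# Standard deviations copied from the describe() output above
std = pd.Series([5.881511, 1.884202, 5.978748, 9.159057, 10.607297], index=dims)

# Mask the diagonal, then take the largest absolute correlation per measure
off_diag = corr.abs().where(~np.eye(len(dims), dtype=bool))
screen = pd.DataFrame({"std": std, "max_abs_corr": off_diag.max()}).sort_values("std")
print(screen)
```

With these inputs, `dec_vs_centr` has both the smallest standard deviation and the smallest maximum correlation.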

In [23]:
final = eu_data_10y_group_clust_maj[['countryname','date','edate','partyname','partyabbrev','pervote','openness','reg_vs_free','horizon','lib_vs_id']]

In [24]:
final

Unnamed: 0,countryname,date,edate,partyname,partyabbrev,pervote,openness,reg_vs_free,horizon,lib_vs_id
121,Sweden,201409,14/09/2014,Green Ecology Party,MP,6.889,2.214,-0.001,37.224,13.279
122,Sweden,201409,14/09/2014,Left Party,V,5.718,1.944,2.223,16.390,21.111
123,Sweden,201409,14/09/2014,Social Democratic Labour Party,SAP,31.015,6.469,0.370,27.727,8.318
124,Sweden,201409,14/09/2014,Liberal People’s Party,FP,5.420,2.752,-2.982,20.412,15.827
126,Sweden,201409,14/09/2014,Moderate Coalition Party,MSP,23.325,3.918,-3.244,23.072,12.791
...,...,...,...,...,...,...,...,...,...,...
4600,Slovenia,201806,03/06/2018,Slovenian Democratic Party,SDS,24.918,6.731,-4.166,27.243,-8.973
4601,Slovenia,201806,03/06/2018,List of Marjan Šarec,LMŠ,12.597,0.699,0.000,27.273,-1.400
4602,Slovenia,201806,03/06/2018,Party of Alenka Bratušek,SAB,5.105,1.683,-1.681,29.832,2.941
4603,Slovenia,201806,03/06/2018,Modern Centre Party,SMC,9.748,3.937,0.000,32.284,3.150
