# BW \#74 UK elections
Granted, the UK was going to have elections at some point in the coming months, but Rishi Sunak's call for an election on July 4th came as a surprise. It was particularly surprising given how dismally the ruling Conservative Party had been polling for quite some time; why would Sunak call an early election that he was almost certainly going to lose? And lose he did: the Labour Party swept into power with numbers that haven't been seen in a long time.

It's now common for countries to make election data downloadable in CSV or similar format for analysis. 

And with rich data readily available, I thought it would be interesting to look at the results -- along the way, perhaps learning a bit more about the UK, its parliament and parties, and (of course) the winners and losers in the most recent election.

## Data and seven questions
This week's data comes from the House of Commons library, part of the UK's Parliament. The full research briefing, including charts and graphs, can be read at

https://commonslibrary.parliament.uk/research-briefings/cbp-10009/

They provide two Excel spreadsheets. One describes the winners of the recent election, with one row per electoral district (known as a "constituency"). A companion document lists the members who were defeated. You can download them from here:

https://researchbriefings.files.parliament.uk/documents/CBP-10009/Winning-members.xlsx

https://researchbriefings.files.parliament.uk/documents/CBP-10009/Defeated-MPs.xlsx

## Challenges
This week's learning goals include joins, groups, pivot tables, plotting, and working with strings.

- Load the two files into a single data frame, with one row for each constituency, and the index being the `ons_id` column from each. Columns from the defeated file should have the `_defeated` suffix attached to their names.
- Which party won the greatest number of seats in each region? Which party lost the greatest number of seats in each region?

In [1]:
import pandas as pd

In [31]:
winners = pd.read_excel('MPs-elected.xlsx', index_col='ons_id')
defeated = pd.read_excel('Defeated-MPs.xlsx', index_col='ons_id')

In [24]:
defeated.columns

Index(['ons_id', 'constituency_name', 'region_name', 'country_name',
       'constituency_type', 'result', 'party_abbreviation', 'party_name',
       'title', 'firstname', 'surname', 'middlenames', 'mnis_id', 'gender',
       'defeated_mp', 'former_mp'],
      dtype='object')

In [25]:
winners.columns


Index(['ons_id', 'constituency_name', 'region_name', 'country_name',
       'constituency_type', 'result', 'party_abbreviation', 'party_name',
       'title', 'firstname', 'surname', 'middlenames', 'mnis_id', 'gender',
       'candidate_type', 'old_constituency'],
      dtype='object')

In [39]:
pd.merge(winners, defeated, on = 'ons_id', suffixes = ('', '_defeated'), how = 'outer')

ons_id,constituency_name,region_name,country_name,constituency_type,result,party_abbreviation,party_name,title,firstname,surname,...,party_abbreviation_defeated,party_name_defeated,title_defeated,firstname_defeated,surname_defeated,middlenames_defeated,mnis_id_defeated,gender_defeated,defeated_mp,former_mp
W07000081,Aberafan Maesteg,Wales,Wales,County,Lab hold,Lab,Labour,,Stephen,Kinnock,...,,,,,,,,,,
S14000060,Aberdeen North,Scotland,Scotland,Borough,SNP hold,SNP,Scottish National Party,,Kirsty,Blackman,...,,,,,,,,,,
S14000061,Aberdeen South,Scotland,Scotland,Borough,SNP hold,SNP,Scottish National Party,,Stephen,Flynn,...,,,,,,,,,,
S14000062,Aberdeenshire North and Moray East,Scotland,Scotland,County,SNP gain from Con,SNP,Scottish National Party,,Seamus,Logan,...,Con,Conservative,,Douglas,Ross,Gordon,4627.0,Male,Yes,Yes
S14000063,Airdrie and Shotts,Scotland,Scotland,County,Lab gain from SNP,Lab,Labour,,Kenneth,Stevenson,...,SNP,Scottish National Party,,Anum,Qaisar,,4917.0,Female,Yes,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
E14001602,Wythenshawe and Sale East,North West,England,Borough,Lab hold,Lab,Labour,,Mike,Kane,...,,,,,,,,,,
E14001603,Yeovil,South West,England,County,LD gain from Con,LD,Liberal Democrat,,Adam,Dance,...,Con,Conservative,Mr,Marcus,Fysh,John Hudson,4446.0,Male,Yes,Yes
W07000112,Ynys Môn,Wales,Wales,County,PC gain from Con,PC,Plaid Cymru,,Llinos,Medi,...,Con,Conservative,,Virginia,Crosbie,Ann,4859.0,Female,Yes,Yes
E14001604,York Central,Yorkshire and The Humber,England,Borough,Lab hold,Lab,Labour and Co-operative,,Rachael,Maskell,...,,,,,,,,,,


Correction: since both data frames share the same index (`ons_id`), we can use `join` to combine them horizontally.

In [38]:
df = winners.join(defeated, rsuffix='_defeated')
df

ons_id,constituency_name,region_name,country_name,constituency_type,result,party_abbreviation,party_name,title,firstname,surname,...,party_abbreviation_defeated,party_name_defeated,title_defeated,firstname_defeated,surname_defeated,middlenames_defeated,mnis_id_defeated,gender_defeated,defeated_mp,former_mp
E14001063,Aldershot,South East,England,Borough,Lab gain from Con,Lab,Labour,,Alex,Baker,...,Con,Conservative,,Leo,Docherty,,4600.0,Male,Yes,Yes
E14001064,Aldridge-Brownhills,West Midlands,England,Borough,Con hold,Con,Conservative,,Wendy,Morton,...,,,,,,,,,,
E14001065,Altrincham and Sale West,North West,England,Borough,Lab gain from Con,Lab,Labour,Mr,Connor,Rand,...,,,,,,,,,,
E14001066,Amber Valley,East Midlands,England,County,Lab gain from Con,Lab,Labour,,Linsey,Farnsworth,...,Con,Conservative,,Nigel,Mills,John,4136.0,Male,Yes,Yes
E14001067,Arundel and South Downs,South East,England,County,Con hold,Con,Conservative,,Andrew,Griffith,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
W07000108,Swansea West,Wales,Wales,County,Lab hold,Lab,Labour,,Torsten,Bell,...,,,,,,,,,,
W07000109,Torfaen,Wales,Wales,County,Lab hold,Lab,Labour,,Nick,Thomas-Symonds,...,,,,,,,,,,
W07000110,Vale of Glamorgan,Wales,Wales,County,Lab gain from Con,Lab,Labour,,Kanishka,Narayan,...,Con,Conservative,,Alun,Cairns,Hugh,4086.0,Male,Yes,Yes
W07000111,Wrexham,Wales,Wales,County,Lab gain from Con,Lab,Labour,,Andrew,Ranger,...,Con,Conservative,,Sarah,Atherton,Elizabeth,4855.0,Female,Yes,Yes
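
To make the relationship between `join` and `merge` concrete, here is a minimal sketch on two tiny, made-up data frames (the `ons_id` values `A1` to `A3` and the party assignments are hypothetical, not taken from the spreadsheets). It assumes, as in the real data, that both frames are indexed by `ons_id` and share column names.

```python
import pandas as pd

# Hypothetical miniature versions of the two spreadsheets (illustrative values only)
winners_toy = pd.DataFrame(
    {'party_name': ['Labour', 'Conservative', 'Labour']},
    index=pd.Index(['A1', 'A2', 'A3'], name='ons_id'),
)
defeated_toy = pd.DataFrame(
    {'party_name': ['Conservative']},
    index=pd.Index(['A1'], name='ons_id'),
)

# join aligns on the index and defaults to a left join;
# rsuffix is applied only to overlapping column names from the right frame
joined = winners_toy.join(defeated_toy, rsuffix='_defeated')

# The same result spelled out with merge
merged = pd.merge(
    winners_toy, defeated_toy,
    left_index=True, right_index=True,
    how='left', suffixes=('', '_defeated'),
)

print(joined)
```

Because `join` keeps every row of the left frame by default, constituencies without a defeated MP simply get `NaN` in the `_defeated` columns, which is what the output above shows.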


## Which party won the greatest number of seats in each region?

In [40]:
df.columns

Index(['constituency_name', 'region_name', 'country_name', 'constituency_type',
       'result', 'party_abbreviation', 'party_name', 'title', 'firstname',
       'surname', 'middlenames', 'mnis_id', 'gender', 'candidate_type',
       'old_constituency', 'constituency_name_defeated',
       'region_name_defeated', 'country_name_defeated',
       'constituency_type_defeated', 'result_defeated',
       'party_abbreviation_defeated', 'party_name_defeated', 'title_defeated',
       'firstname_defeated', 'surname_defeated', 'middlenames_defeated',
       'mnis_id_defeated', 'gender_defeated', 'defeated_mp', 'former_mp'],
      dtype='object')

When the idea is to group by two different categorical columns, a pivot table is usually the clearest way to view the result. We'll invoke `pivot_table`, telling Pandas to use party names for the rows (index) and region names for the columns.

We'll then use the `count` aggregation method. It doesn't really matter which column we count, so long as it has no missing values, so I chose `country_name`. First comes a quick check of the missing values per column, then the query itself:

In [74]:
df.isna().sum()

constituency_name                0
region_name                      0
country_name                     0
constituency_type                0
result                           0
party_abbreviation               0
party_name                       0
title                          582
firstname                        0
surname                          0
middlenames                    415
mnis_id                        337
gender                           0
candidate_type                   0
old_constituency               352
constituency_name_defeated     434
region_name_defeated           434
country_name_defeated          434
constituency_type_defeated     434
result_defeated                434
party_abbreviation_defeated    434
party_name_defeated            434
title_defeated                 620
firstname_defeated             434
surname_defeated               434
middlenames_defeated           529
mnis_id_defeated               434
gender_defeated                434
defeated_mp         

In [54]:
(
    df
    .pivot_table(index='party_name',
                 columns='region_name',
                 values='country_name',
                 aggfunc='count')
)

party_name,East Midlands,East of England,London,North East,North West,Northern Ireland,Scotland,South East,South West,Wales,West Midlands,Yorkshire and The Humber
Alliance,,,,,,1.0,,,,,,
Conservative,15.0,23.0,9.0,1.0,3.0,,5.0,30.0,11.0,,15.0,9.0
Democratic Unionist Party,,,,,,5.0,,,,,,
Green,,1.0,,,,,,1.0,1.0,,1.0,
Independent,1.0,,1.0,,1.0,1.0,,,,,1.0,1.0
Labour,27.0,23.0,49.0,24.0,56.0,,35.0,34.0,22.0,26.0,36.0,38.0
Labour and Co-operative,2.0,4.0,10.0,2.0,9.0,,3.0,2.0,2.0,2.0,2.0,5.0
Liberal Democrat,,7.0,6.0,,3.0,,6.0,24.0,22.0,1.0,2.0,1.0
Plaid Cymru,,,,,,,,,,4.0,,
Reform UK,2.0,3.0,,,,,,,,,,
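
The `NaN` cells in the table above are party/region combinations that never occur. As a minimal sketch on made-up rows (the toy data frame below is hypothetical), passing `fill_value=0` to `pivot_table` turns those gaps into zeros, and `pd.crosstab` is an alternative that counts co-occurrences directly without needing a `values` column.

```python
import pandas as pd

# Hypothetical rows using the same column names as the real data frame
toy = pd.DataFrame({
    'party_name': ['Labour', 'Labour', 'Conservative', 'Labour', 'Green'],
    'region_name': ['London', 'London', 'London', 'Wales', 'Wales'],
    'country_name': ['England', 'England', 'England', 'Wales', 'Wales'],
})

# Same call as in the text, with missing combinations filled with 0
counts = toy.pivot_table(index='party_name',
                         columns='region_name',
                         values='country_name',
                         aggfunc='count',
                         fill_value=0)

# crosstab counts how often each (party, region) pair appears
tab = pd.crosstab(toy['party_name'], toy['region_name'])

print(counts)
print(tab)
```

Both tables should hold the same counts here; which form reads better is largely a matter of taste.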


I didn't want all of the results for all of the regions, though; I wanted to know which party won the most seats in each region. Fortunately, I can use the `idxmax` method on the data frame, which tells me, for each column (i.e., region), the index label of the row with the highest value:

In [67]:
(
    df
    .pivot_table(index='party_name',
                 columns='region_name',
                 values='country_name',
                 aggfunc='count')
    .idxmax()
)

region_name
East Midlands                     Labour
East of England             Conservative
London                            Labour
North East                        Labour
North West                        Labour
Northern Ireland               Sinn Fein
Scotland                          Labour
South East                        Labour
South West                        Labour
Wales                             Labour
West Midlands                     Labour
Yorkshire and The Humber          Labour
dtype: object
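
One caveat worth illustrating: when several rows share the maximum, `idxmax` returns only the first matching label. In the pivot table above, East of England shows 23 seats for both Conservative and Labour, so the result depends on row order. The sketch below copies a few counts from that table for two regions and lists every tied party; the variable names are my own, and this is one possible way to surface ties rather than the only one.

```python
import pandas as pd

# Seat counts copied from the pivot table above (two regions, four parties)
counts = pd.DataFrame(
    {'East of England': [23, 23, 4, 7], 'London': [9, 49, 10, 6]},
    index=['Conservative', 'Labour', 'Labour and Co-operative', 'Liberal Democrat'],
)

# idxmax reports only the first label that reaches the column maximum
first_max = counts.idxmax()

# Mark every cell equal to its column's maximum, then collect the labels
is_top = counts.eq(counts.max())
tied = {
    region: is_top.index[is_top[region].to_numpy()].tolist()
    for region in counts.columns
}

print(first_max)
print(tied)
```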

In [78]:
df.groupby(['party_name', 'region_name'])['result'].size().unstack().idxmax()
# unstack() reshapes the grouped result into a pivot-table layout (parties as rows, regions as columns)
# idxmax() returns the index label of the maximum in each column

region_name
East Midlands                     Labour
East of England             Conservative
London                            Labour
North East                        Labour
North West                        Labour
Northern Ireland               Sinn Fein
Scotland                          Labour
South East                        Labour
South West                        Labour
Wales                             Labour
West Midlands                     Labour
Yorkshire and The Humber          Labour
dtype: object
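
Note that `party_name` distinguishes "Labour" from "Labour and Co-operative", while the rows shown earlier (York Central, for example) carry the same `party_abbreviation`, `Lab`, for both. Whether to treat them as one party is an analysis choice; the sketch below, on hypothetical rows, shows how grouping by `party_abbreviation` instead changes the counts.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the real data
toy = pd.DataFrame({
    'region_name': ['London', 'London', 'London', 'Wales'],
    'party_abbreviation': ['Lab', 'Lab', 'Con', 'Lab'],
    'party_name': ['Labour', 'Labour and Co-operative', 'Conservative', 'Labour'],
})

# Counting by full party name splits the two Labour labels
by_name = toy.groupby(['party_name', 'region_name']).size().unstack(fill_value=0)

# Counting by abbreviation combines them
by_abbr = toy.groupby(['party_abbreviation', 'region_name']).size().unstack(fill_value=0)

print(by_name)
print(by_abbr.idxmax())
```

In this toy example the London seats are a three-way tie when counted by name, but `Lab` is clearly ahead when counted by abbreviation.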

## Which party lost the greatest number of seats in each region?

In [79]:
(
    df
    .pivot_table(index='party_name_defeated',
                 columns='region_name',
                 values='country_name',
                 aggfunc='count')
    .idxmax()
)

region_name
East Midlands                            Conservative
East of England                          Conservative
London                                   Conservative
North East                               Conservative
North West                               Conservative
Northern Ireland            Democratic Unionist Party
Scotland                      Scottish National Party
South East                               Conservative
South West                               Conservative
Wales                                    Conservative
West Midlands                            Conservative
Yorkshire and The Humber                 Conservative
dtype: object
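
Counting `party_name_defeated` counts defeated MPs, which is not quite the same as seats lost: Altrincham and Sale West, for instance, is a "Lab gain from Con" with no entry in the defeated columns. As a hedged alternative, the losing party can be pulled out of the `result` string itself. The sketch below uses five `region_name`/`result` pairs copied from the rows shown earlier; the `lost_by` column name and the regular expression are my own assumptions about how the `result` text is structured.

```python
import pandas as pd

# region_name and result values copied from rows displayed earlier
sample = pd.DataFrame({
    'region_name': ['South East', 'West Midlands', 'North West', 'Scotland', 'South West'],
    'result': ['Lab gain from Con', 'Con hold', 'Lab gain from Con',
               'Lab gain from SNP', 'LD gain from Con'],
})

# Capture whatever follows "gain from"; rows that are holds yield NaN
sample['lost_by'] = sample['result'].str.extract(r'gain from (.+)$', expand=False)

# Count seats lost per region and losing party, ignoring holds
lost = (
    sample
    .dropna(subset=['lost_by'])
    .groupby(['region_name', 'lost_by'])
    .size()
)

print(lost)
```

Applied to the full data frame, the same idea could feed into `unstack().idxmax()` as in the previous section.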
