<a href="https://colab.research.google.com/github/AkiraNom/data-analysis-notebook/blob/main/Japan_election_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Japan General Election Analysis (2017-2022)

In Japan's 2024 general election (the 50th House of Representatives election), the Liberal Democratic Party (LDP), which was in power, lost its parliamentary majority. According to news reports, the LDP and its coalition partner Komeito lost more than 6 million votes compared with the previous election: about 5 million for the LDP and more than 1 million for Komeito.

The Ministry of Internal Affairs and Communications has not yet released the final vote counts by prefecture, so it is not yet possible to determine where the coalition lost the most votes. This notebook prepares the analysis in advance, so that it can be rerun once the data is published, and aims to show how votes shift from the LDP and its coalition partner to the other parties.

Source: [Ministry of Internal Affairs and Communications](https://www.soumu.go.jp/senkyo/senkyo_s/data/index.html)



## library import

In [None]:
import pandas as pd
import numpy as np
import geopandas as gpd
import plotly.express as px
from shapely import set_precision

## Data Processing

In [None]:
class DataProcessing:

  @staticmethod
  def read_data(filename):
    return pd.read_csv(filename, index_col=0)

  @staticmethod
  def filter_parties(df, parties_list:list):
    df_mask = df['party'].isin(parties_list)
    return df[df_mask].copy()

  @staticmethod
  def calculate_votes_ratio(df):
    # divide each row by its row total (all parties) and express it as a percentage
    return df.div(df.sum(axis=1), axis=0) * 100

  @staticmethod
  def map_location_id(df, ref_df, location, location_id):
    return df[location].map(lambda x: ref_df[ref_df[location]==x][location_id].values[0])

  @classmethod
  def preprocessing(cls, filename, year:int, location_id: bool=False):
    df = cls.read_data(filename)
    df = df.reset_index().rename({'index':'location'}, axis=1)
    df.insert(1,'year', year)
    if location_id:
      # for sorting add location id
      df.insert(1,'location_id', list(range(len(df['location']))))

    return df

  @staticmethod
  def unpivot_dataframe(df, id_vars, value_vars, value_name, var_name):
    return pd.melt(df,
                   id_vars = id_vars,
                   value_vars= value_vars,
                   value_name=value_name,
                   var_name= var_name)

  @staticmethod
  def calculate_difference(df):

    years_list = list(df['year'].unique())
    location_ids_list = list(df['location_id'].unique())
    parties_list = list(df.columns[3:])

    results =[]
    temp = df.copy()

    for i in range(len(years_list)-1):

      year1 = years_list[i]
      year2 = years_list[i+1]

      for party in parties_list:
        for location_id in location_ids_list:

          results.append({
              'range': f'{year1}-{year2}',
              'party': party,
              'location': temp[temp['location_id']== location_id]['location'].values[0],
              'location_id': location_id,
              # change = value in the later year (year2) minus value in the earlier year (year1)
              'change': temp[(temp['year'] == year2)&(temp['location_id'] == location_id)][party].values[0] \
                        - temp[(temp['year'] == year1)&(temp['location_id'] == location_id)][party].values[0]
              })
    return results
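
The nested loops in `calculate_difference` can also be written as a grouped `diff`. The following is a minimal sketch on toy data, assuming the same wide layout the notebook uses (`location`, `location_id`, `year`, then one column per party). The prefecture names, party names, and numbers are made up, and unlike the loop version it pairs each year with the previous year available for that location.

```python
import pandas as pd

# Toy data (made-up numbers) in the same wide layout as the notebook:
# location, location_id, year, then one column per party.
toy = pd.DataFrame({
    'location': ['A', 'B', 'A', 'B'],
    'location_id': [0, 1, 0, 1],
    'year': [2017, 2017, 2021, 2021],
    'party_x': [100, 200, 150, 180],
    'party_y': [50, 60, 40, 90],
})

party_cols = list(toy.columns[3:])
toy_sorted = toy.sort_values(['location_id', 'year'])

# Within each location, subtract the previous election's value (new - previous).
changes = toy_sorted.groupby('location_id')[party_cols].diff()

diff_wide = toy_sorted[['location', 'location_id', 'year']].join(changes)
diff_wide['prev_year'] = toy_sorted.groupby('location_id')['year'].shift()

# The first election of each location has no predecessor, so drop it.
diff_wide = diff_wide.dropna(subset=['prev_year']).copy()
diff_wide['range'] = (diff_wide['prev_year'].astype(int).astype(str)
                      + '-' + diff_wide['year'].astype(str))

# Long format with the same columns that calculate_difference produces.
diff_long = diff_wide.melt(id_vars=['range', 'location', 'location_id'],
                           value_vars=party_cols,
                           var_name='party', value_name='change')
print(diff_long)
```

On the full data this avoids one boolean filter per party, prefecture, and year pair, which is where the loop version spends most of its time.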

## Geodata Handling

In [None]:
class GeoData:
  @staticmethod
  def read_geo_data(filename):
    return gpd.read_file(filename)

## Data Analysis

### set parameters

In [None]:
filenames_list = ['votes_party_by_state_2017_cleaned.csv', 'votes_party_by_state_2021_cleaned.csv', 'votes_party_by_state_2022_cleaned.csv']
years_list = [2017, 2021, 2022]
parties_list = ['自由民主党', '立憲民主党', '公明党', '日本維新の会']

### data load and process

In [None]:
for i, (filename, year) in enumerate(zip(filenames_list, years_list)):
  if i == 0:
    location_id = True
    df = DataProcessing.preprocessing(filename, year, location_id)
  else:
    df_temp = DataProcessing.preprocessing(filename, year)
    df_temp.loc[:,'location_id'] = DataProcessing.map_location_id(df_temp, df, 'location', 'location_id')
    df = pd.concat([df, df_temp])
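
`map_location_id` raises an `IndexError` when a prefecture name in a later file does not appear in the first file, for example because of a spelling difference. A minimal sketch of checking for such mismatches first, assuming a `location` column as above; the frames `ref` and `later` and the misspelled name are hypothetical.

```python
import pandas as pd

# Hypothetical reference table (first election) and a later file with a stray name.
ref = pd.DataFrame({'location': ['北海道', '青森県', '岩手県'],
                    'location_id': [0, 1, 2]})
later = pd.DataFrame({'location': ['北海道', '青森', '岩手県']})

# Build the lookup once; unmatched names map to NaN instead of raising.
lookup = ref.drop_duplicates('location').set_index('location')['location_id']
later['location_id'] = later['location'].map(lookup)

# List the names that could not be matched so they can be fixed in the CSV.
unmatched = later.loc[later['location_id'].isna(), 'location'].tolist()
print(unmatched)
```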


## Total votes by year

Calculate the total number of proportional-representation votes cast in each election year.

In [None]:
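# total votes per election year (sum over prefectures, then over parties)
total_votes = df.iloc[:,2:].groupby(['year']).sum().sum(axis=1)
# round the axis limit up to the next 10 million so the tallest bar is never clipped
max_votes = np.ceil(total_votes.max() / 1e7) * 1e7
fig = px.bar(total_votes,
              range_y=(0,max_votes),
              title = '<b>Total Votes by Year</b>'
              )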
fig.update_layout(showlegend=False)
fig.update_yaxes(title='Votes')
fig.show()

The total number of votes shows no large increase or decrease across the three elections.

## Exploratory Data Analysis

### Change in votes by prefecture, party, and year (bar plot)

Analysis based on the number of votes by prefecture and party.

In [None]:
df_long = DataProcessing.unpivot_dataframe(df, ['location','location_id','year'], None, 'votes', 'party').sort_values(['year','location_id'])
df_long.head()

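|     | location | location_id | year | party | votes |
| --- | --- | --- | --- | --- | --- |
| 0 | 北海道 | 0 | 2017 | 自由民主党 | 779903.0 |
| 141 | 北海道 | 0 | 2017 | 立憲民主党 | 714032.0 |
| 282 | 北海道 | 0 | 2017 | 希望の党 | 331463.0 |
| 423 | 北海道 | 0 | 2017 | 公明党 | 298573.0 |
| 564 | 北海道 | 0 | 2017 | 日本共産党 | 230316.0 |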


In [None]:
px.bar(df_long, x='location', y='votes', color='party', barmode='group',
       facet_row='year')

### Change in votes by prefecture, party, and year (line plot)

Subset the data to the major parties and compare the number of votes by prefecture.

In [None]:
parties_list = ['自由民主党', '立憲民主党', '公明党', '日本維新の会', '国民民主党']
_df_long = DataProcessing.filter_parties(df_long, parties_list)

px.line(_df_long,
        x='location',
        y='votes',
        color='year',
        facet_col='party'
        )

LDP (自由民主党): the nationwide total has stayed about the same over the past three elections, but votes in the Kansai region are trending downward. <br>
Nippon Ishin (日本維新の会): votes increased markedly in the Kansai region and also grew in the Kanto region. <br>
CDP (立憲民主党): lost a significant number of votes in the Kanto region in 2022. <br>

### Vote ratio analysis

Analysis based on vote share by prefecture and party.


In [None]:
df_ratio = df.copy()
df_ratio.iloc[:, 3:] = DataProcessing.calculate_votes_ratio(df.iloc[:,3:])
df_ratio_long = DataProcessing.unpivot_dataframe(df_ratio, ['location','location_id','year'], None, 'vote_ratio', 'party').sort_values(['year','location_id'])
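
The vote ratio is each party's share of the row total, so the shares in every prefecture and year sum to 100. A minimal sketch on made-up numbers (the prefecture and party names are placeholders, not columns from the CSV files):

```python
import pandas as pd

# Made-up vote counts: one row per prefecture, one column per party.
votes = pd.DataFrame(
    {'party_x': [300, 100], 'party_y': [100, 100], 'party_z': [100, 200]},
    index=['pref_a', 'pref_b'],
)

# Divide each row by its own total so that every row sums to 100 (%).
share = votes.div(votes.sum(axis=1), axis=0) * 100
print(share)
```

Because the denominator is the sum over the party columns present in the table, the shares depend on which parties the cleaned CSV files include.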

### Change in vote ratio by prefecture, party, and year (bar plot)

In [None]:
px.bar(df_ratio_long, x='location', y='vote_ratio', color='party', barmode='group',
       facet_row='year')

### Change in vote ratio by prefecture, party, and year (line plot)

Subset the data to the major parties and compare vote share by prefecture.

In [None]:
_df_ratio_long = DataProcessing.filter_parties(df_ratio_long, parties_list)

px.line(_df_ratio_long,
        x='location',
        y='vote_ratio',
        color='year',
        facet_col='party'
        )

LDP (自由民主党): vote share fell by about 7 percentage points in Osaka, and its influence across the Kansai region declined. <br>
Nippon Ishin (日本維新の会): vote share rose by more than 10 percentage points in the Kansai region and by 7-8 points in the Kanto region. <br>
CDP (立憲民主党): vote share fell by about 7-8 points in many prefectures in 2022. <br>

## Difference in votes by year

Change in the number of votes in each prefecture (increase or decrease since the previous election).

In [None]:
df_diff_num = pd.DataFrame(DataProcessing.calculate_difference(df))

_df_diff_num = DataProcessing.filter_parties(df_diff_num, parties_list)

px.line(_df_diff_num,
        x='location',
        y='change',
        color='range',
        facet_col='party'
        )

## Difference in vote rates by year

Change in vote share in each prefecture (increase or decrease since the previous election).

In [None]:
df_diff_ratio = pd.DataFrame(DataProcessing.calculate_difference(df_ratio))
_df_diff_ratio = DataProcessing.filter_parties(df_diff_ratio, parties_list)

px.line(_df_diff_ratio,
        x='location',
        y='change',
        color='range',
        facet_col='party'
        )

Most parties lost both votes and vote share. The decline of the CDP (立憲民主党) is the most pronounced, while the DPP (国民民主党) gained compared with 2021.
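
The `change` values computed from the ratio data are differences between two percentages, so they are percentage points rather than relative changes. A worked example with made-up shares:

```python
# Made-up vote shares (in %) for one party in one prefecture.
share_prev = 30.0
share_new = 23.0

# Difference between the two shares, in percentage points.
# This is the kind of value calculate_difference returns for ratio data.
point_change = share_new - share_prev

# Relative change with respect to the earlier share, for comparison.
relative_change = point_change / share_prev * 100

print(f'{point_change:+.1f} points, {relative_change:+.1f}% relative')
```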

### geo data load

In [None]:
# geo data
geojson_filename = 'N03-20240101.geojson'

geo_data = GeoData.read_geo_data(geojson_filename)

geo_data = geo_data[['N03_001','geometry']]

# merge polygons
# all prefectures
geo_data = geo_data.dissolve(by='N03_001').copy().reset_index()

# change precision of polygon
precision = 0.001
geo_data['geometry'] = geo_data['geometry'].map(lambda x: set_precision(x, grid_size=precision))

# make geojson file
geojson = geo_data.to_geo_dict()
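
To get a feel for what `set_precision` does to the coordinates, here is a minimal sketch on a single point, assuming Shapely 2.x. The coordinates are illustrative and are not taken from the GeoJSON file.

```python
from shapely import Point, set_precision

# Illustrative longitude/latitude pair with six decimal places.
p = Point(139.691706, 35.689487)

# Snap the coordinates to a grid of 0.001 degrees, as in the cell above.
snapped = set_precision(p, grid_size=0.001)
print(snapped.x, snapped.y)
```

A coarser grid keeps fewer decimals per coordinate, which makes the GeoJSON passed to Plotly smaller at the cost of less detailed prefecture outlines.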

### visualization

In [None]:
class DataPlotting:

  @staticmethod
  def choropleth_map(df, selected_party:str, color:str, frame:str, color_continuous_scale='Viridis', title:str=None, range_color=None):

    fig = px.choropleth_map(df[(df['party']==selected_party)],
                        geojson=geojson,
                        color=color,
                        locations='location',
                        featureidkey='properties.N03_001',
                        animation_frame= frame,
                        color_continuous_scale=color_continuous_scale,
                        range_color=range_color,
                        map_style="carto-positron",
                        zoom=4,
                        center = {"lat": 40, "lon": 139.8},
                        opacity=0.8,
                        hover_data=['location', 'party', color],
                        title= title,
                        width = 700,
                        height = 800
                        )
    # fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
    fig.show()

## Animated plot

### Votes count data

Rendering the vote-count and vote-ratio maps in a single `for` loop terminates the session with the error "Buffered data was truncated after reaching the output size limit.", so the loops below are left commented out.

In [None]:
# selected_party = '自由民主党'

# dfs_list = [df_long, df_ratio_long]
# plot_elements_list = ['votes','vote_ratio']
# frames_list = ['year', 'year']
# color_scales_list = [None, None]
# range_colors_list = [(0,2000000), (0, 50)]
# titles_list = [f'<b>Votes Count for {selected_party}</b>',f'<b>Voting Rate for {selected_party}</b>']

# for _df, plot_element, frame, color_scale, range_color, title in zip(dfs_list, plot_elements_list, frames_list, color_scales_list, range_colors_list, titles_list):
#   DataPlotting.choropleth_map(_df, selected_party, color = plot_element, frame=frame, color_continuous_scale=color_scale, title = title, range_color = range_color)

### Difference in votes and vote ratio

In [None]:
# dfs_list = [df_diff_num, df_diff_ratio]
# plot_elements_list = ['change','change']
# frames_list = ['range', 'range']
# color_scales_list = ['RdBu_r', 'RdBu_r']
# range_colors_list = [(-100000, 100000), (-20, 20)]
# titles_list = [f'<b>Difference in Votes Count for {selected_party}</b>',f'<b>Difference in Voting Rate for {selected_party}</b>']

# for _df, plot_element, frame, color_scale, range_color, title in zip(dfs_list, plot_elements_list, frames_list, color_scales_list, range_colors_list, titles_list):
#   DataPlotting.choropleth_map(_df, selected_party, color = plot_element, frame=frame, color_continuous_scale=color_scale, title = title, range_color = range_color)

In [None]:
# ! pip install -U kaleido

import io
import PIL.Image  # a bare `import PIL` does not guarantee that the Image submodule is loaded

In [None]:
selected_party = '自由民主党'
title_map = {'自由民主党': 'LDP'}
title = f'<b>Difference in Voting Rate for {title_map[selected_party]}</b>'
fig = px.choropleth_map(df_diff_ratio[(df_diff_ratio['party']==selected_party)],
                    geojson=geojson,
                    color='change',
                    locations='location',
                    featureidkey='properties.N03_001',
                    animation_frame= 'range',
                    color_continuous_scale='RdBu_r',
                    range_color=(-20, 20),
                    map_style="carto-positron",
                    zoom=4,
                    center = {"lat": 40, "lon": 139.8},
                    opacity=0.8,
                    hover_data=['location', 'party', 'change'],
                    title= title,
                    width = 700,
                    height = 800
                    )
frames = []
for s, fr in enumerate(fig.frames):
    # set main traces to appropriate traces within plotly frame
    fig.update(data=fr.data)
    # move slider to correct place
    fig.layout.sliders[0].update(active=s)
    # generate image of current state
    frames.append(PIL.Image.open(io.BytesIO(fig.to_image(format="png"))))

# create animated GIF
frames[0].save(
        "animated_vote_ratio_change.gif",
        save_all=True,
        append_images=frames[1:],
        optimize=True,
        duration=500,
        loop=0,
    )