## 1. ⚡ Upload the CSV file

- `Google Colab` provides **temporary cloud storage** for uploaded files.  
- When you use _`files.upload()`_, your dataset is stored in Colab’s session storage.  
- Uploaded datasets are only available during the current session.  
- Once the runtime resets or the session ends, all uploaded files are deleted.  
- For long-term use, mount **Google Drive** and load your dataset from there.  
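
`files.upload()` returns a dictionary that maps each uploaded filename to its raw bytes, so the data can also be read straight from memory without knowing the name the file was saved under. The snippet below is a minimal sketch, assuming a single CSV was uploaded. `read_uploaded_csv` is a hypothetical helper name, and `sample_upload` only stands in for the real upload result.

```python
import io
import pandas as pd

def read_uploaded_csv(uploaded):
    """Read the first CSV found in a {filename: bytes} mapping into a DataFrame."""
    for name, content in uploaded.items():
        if name.lower().endswith(".csv"):
            return pd.read_csv(io.BytesIO(content))
    raise ValueError("No CSV file found in the upload")

# Stand-in for the dictionary returned by files.upload() (illustrative data only)
sample_upload = {"example.csv": b"Election_Year,Region\n2004,Central\n2009,Southern\n"}
sample_df = read_uploaded_csv(sample_upload)
print(sample_df.shape)
```

In Colab the call would look like `df = read_uploaded_csv(uploaded)`, using the `uploaded` variable from the cell below.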


In [9]:
from google.colab import files
uploaded = files.upload()


Saving malawi_elections_2004_2025.csv to malawi_elections_2004_2025 (1).csv


## 2. 📌 Install Required Libraries in Google Colab

Google Colab supports most Python data libraries by default.  
Some (like **Plotly**) may need manual installation.  

#### 🔹 Installation (if missing):
```python
!pip install plotly
```
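
To see which libraries are actually missing before installing anything, the runtime can be queried first. This is a small sketch using only the standard library. `find_missing` is a hypothetical helper name, and the package list is taken from the install cell in this notebook.

```python
import importlib.util

def find_missing(packages):
    """Return the packages from `packages` that cannot be imported in this runtime."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

required = ["pandas", "numpy", "plotly", "seaborn", "matplotlib"]
missing = find_missing(required)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```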


In [None]:
!pip install plotly seaborn matplotlib numpy pandas



## 3. 🔹 **Libraries Overview:**

* pandas → load, clean, and analyze datasets
* numpy → numerical operations and array handling
* matplotlib.pyplot → basic plotting and static visualizations
* seaborn → statistical and advanced plots (built on matplotlib)
* plotly.express / graph_objects / subplots / figure_factory → interactive & animated charts
* IPython.display (display, HTML) → show rich outputs inside notebook (tables, HTML, animations)


In [1]:
# Malawi Elections 2004-2025 Data Analysis
# Interactive Visualizations with Plotly

# Required Libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML


In [2]:
# Set dark theme for plots
import plotly.io as pio
pio.templates.default = "plotly_dark"

## 4. 📌 Load the Dataset

- Use **pandas** to read CSV files into a DataFrame.  
- The dataset must be uploaded or available in the working directory.

In [3]:
# Suppress all warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
# 📌 Load the Dataset
df = pd.read_csv("malawi_elections_2004_2025.csv")


## 5. ✨ Dataset Statistics & Overview

- Check dataset size with **shape** (rows × columns).  
- List all **columns** to understand available features.  
- Preview the first rows using **head()**.  
- Identify **missing values** with `isnull().sum()`.  
- Get basic numerical **statistics** (mean, std, min, max, quartiles) using `describe()`.  
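
The same checks can be bundled into one reusable summary. The sketch below uses a tiny made-up DataFrame (`toy`) rather than the real dataset, and `overview` is a hypothetical helper name.

```python
import pandas as pd

def overview(frame):
    """Collect shape, total missing values, and numeric column names in one dict."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "missing_total": int(frame.isnull().sum().sum()),
        "numeric_columns": frame.select_dtypes(include="number").columns.tolist(),
    }

# Illustrative data only; two cells are deliberately left empty
toy = pd.DataFrame({
    "Election_Year": [2004, 2009, 2014],
    "Region": ["Central", None, "Northern"],
    "GDP_per_Capita": [947.0, 508.0, None],
})
print(overview(toy))
```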


In [4]:
# Display basic information about the dataset
print("Dataset Shape:", df.shape)
print("\nDataset Columns:", df.columns.tolist())
print("\nFirst few rows:")
display(df.head())

print("\nMissing Values:")
print(df.isnull().sum())

print("\nBasic Statistics:")
display(df.describe())

Dataset Shape: (5000, 10)

Dataset Columns: ['Election_Year', 'Region', 'Ethnic_Group', 'Language', 'Education_Level', 'Population_Density', 'GDP_per_Capita', 'Candidate_Origin_Region', 'Voted_For', 'Election_Winner_Party']

First few rows:


|   | Election_Year | Region | Ethnic_Group | Language | Education_Level | Population_Density | GDP_per_Capita | Candidate_Origin_Region | Voted_For | Election_Winner_Party |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | Central | Lomwe | Chichewa | Primary | 179 | 947 | Southern | DPP | UDF |
| 1 | 2004 | Central | Lambya/Nyiha | Chichewa | Secondary | 140 | 508 | Central | MCP | UDF |
| 2 | 2004 | Central | Yao | Chichewa | Primary | 374 | 744 | Southern | AFORD | UDF |
| 3 | 2004 | Central | Tumbuka | Chichewa | Secondary | 127 | 537 | Southern | AFORD | UDF |
| 4 | 2004 | Southern | Lomwe | Chilomwe | Primary | 394 | 1035 | Central | DPP | UDF |



Missing Values:
Election_Year              0
Region                     0
Ethnic_Group               0
Language                   0
Education_Level            0
Population_Density         0
GDP_per_Capita             0
Candidate_Origin_Region    0
Voted_For                  0
Election_Winner_Party      0
dtype: int64

Basic Statistics:


|   | Election_Year | Population_Density | GDP_per_Capita |
|---|---|---|---|
| count | 5000.0 | 5000.0 | 5000.0 |
| mean | 2014.2 | 229.9548 | 653.3712 |
| std | 7.360084 | 100.270524 | 221.89927 |
| min | 2004.0 | 20.0 | 100.0 |
| 25% | 2009.0 | 157.0 | 498.0 |
| 50% | 2014.0 | 223.0 | 645.0 |
| 75% | 2019.0 | 294.0 | 801.0 |
| max | 2025.0 | 608.0 | 1539.0 |


## 6. 📊 Visualizations Overview


##### 1. Barplot - Votes Count by Party
- Shows how many votes each **political party** received.  
- Useful for comparing party popularity. 

In [11]:
# Party performance over time: vote counts per party for each election year
votes_by_year = df.groupby(['Election_Year', 'Voted_For']).size().reset_index(name='Vote_Count')

fig3 = px.line(votes_by_year, x='Election_Year', y='Vote_Count', 
               color='Voted_For',
               title='Party Performance in Malawi Elections (2004-2025)',
               markers=True)
fig3.update_layout(yaxis_title='Number of Votes')
fig3.show()

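# Additional: Show winners clearly
# Build the winners table here so this cell does not depend on a later cell
winners_by_year = df.groupby(['Election_Year', 'Election_Winner_Party']).size().reset_index(name='Count')
print("🏆 ELECTION WINNERS TIMELINE:")
for _, row in winners_by_year.iterrows():
    print(f"{row['Election_Year']}: {row['Election_Winner_Party']} won")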

🏆 ELECTION WINNERS TIMELINE:
2004: UDF won
2009: DPP won
2014: DPP won
2019: MCP won
2025: DPP won


In [13]:

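# 1. Barplot - Votes Count by Party
# Aggregate first so each party is drawn as a single bar
party_votes = df.groupby('Voted_For').size().reset_index(name='Vote_Count')
fig1 = px.bar(party_votes, x='Voted_For', y='Vote_Count', color='Voted_For',
              title='Votes Distribution by Political Party',
              labels={'Voted_For': 'Political Party', 'Vote_Count': 'Number of Votes'})
fig1.update_layout(showlegend=True)
fig1.show()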


##### 2. GDP per Capita Analysis by Region
- Boxplot of **GDP per capita** across regions.  
- Helps understand economic differences between regions.  
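
The numbers behind a boxplot can also be computed directly. The sketch below uses made-up values (`toy`) and a hypothetical helper `box_stats` to derive the quartiles per region and count points beyond the usual 1.5 × IQR fences. Plotly's default quartile method may differ slightly from pandas' `quantile`, so treat the figures as an approximation of what the chart draws.

```python
import pandas as pd

# Illustrative data only, not taken from the real dataset
toy = pd.DataFrame({
    "Region": ["Central"] * 5 + ["Southern"] * 5,
    "GDP_per_Capita": [400, 500, 600, 700, 2000, 300, 350, 400, 450, 500],
})

def box_stats(values):
    """Quartiles plus the number of points outside the 1.5 * IQR fences."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return pd.Series({"q1": q1, "median": median, "q3": q3, "n_outliers": len(outliers)})

# One row of statistics per region
rows = {region: box_stats(group) for region, group in toy.groupby("Region")["GDP_per_Capita"]}
stats = pd.DataFrame(rows).T
print(stats)
```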


In [6]:

# 2. GDP per Capita Analysis by Region
fig2 = px.box(df, x='Region', y='GDP_per_Capita', color='Region',
              title='GDP per Capita Distribution by Region')
fig2.show()

##### 3. Election Winners Over Time
- Line plot showing **winning parties by year**.  
- Identifies political trends over election years.  

In [7]:

# 3. Election Winners Over Time
winners_by_year = df.groupby(['Election_Year', 'Election_Winner_Party']).size().reset_index(name='Count')
fig3 = px.line(winners_by_year, x='Election_Year', y='Count', color='Election_Winner_Party',
               title='Election Winners Over Time')
fig3.show()

##### 4. Population Density vs GDP by Region
- Scatter plot comparing **population density** with **GDP per capita**.  
- Highlights how GDP per capita relates to population density.  

In [8]:
# 4. Population Density vs GDP by Region
fig4 = px.scatter(df, x='Population_Density', y='GDP_per_Capita', color='Region',
                 size='GDP_per_Capita', hover_data=['Ethnic_Group'],
                 title='Population Density vs GDP per Capita by Region')
fig4.show()

##### 5. Education Level Distribution
- Pie chart of voters’ **education levels**.  
- Shows the educational background of the population.  

In [9]:
# 5. Education Level Distribution
fig5 = px.pie(df, names='Education_Level', title='Education Level Distribution of Voters')
fig5.show()

##### 6. Language Distribution
- Barplot of **languages spoken by voters**.  
- Useful for analyzing linguistic diversity.

In [10]:
# 6. Language Distribution
# Name the columns explicitly so the cell works regardless of pandas version
language_counts = df['Language'].value_counts().rename_axis('Language').reset_index(name='Count')
fig6 = px.bar(language_counts, x='Language', y='Count',
              title='Language Distribution in Elections')
fig6.show()


##### 7. Ethnic Group Distribution by Region
- Heatmap of **ethnic groups** across regions.  
- Helps visualize cultural and demographic patterns.  
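
Raw counts in a cross-tab are dominated by the largest regions, so it can help to look at row shares as well. The sketch below uses made-up rows (`toy`) and the `normalize="index"` option of `pd.crosstab`, which makes each region's row sum to 1.

```python
import pandas as pd

# Illustrative data only, not taken from the real dataset
toy = pd.DataFrame({
    "Region": ["Central", "Central", "Central", "Central", "Southern", "Southern"],
    "Ethnic_Group": ["Lomwe", "Lomwe", "Lomwe", "Yao", "Yao", "Tumbuka"],
})

# Share of each ethnic group within a region (each row sums to 1)
shares = pd.crosstab(toy["Region"], toy["Ethnic_Group"], normalize="index")
print(shares)
```

The resulting table can be passed to `px.imshow` in the same way as the count-based cross-tab.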

In [None]:
# 7. Ethnic Group Distribution by Region
ethnic_region = pd.crosstab(df['Region'], df['Ethnic_Group'])
fig7 = px.imshow(ethnic_region, title='Ethnic Group Distribution by Region')
fig7.show()

##### 8. Animated Plot - GDP Growth Over Years by Region
- Animated scatter plot of **GDP per capita over time**.  
- Displays how regions’ economies evolved during elections.  


In [None]:
# 8. Animated Plot - GDP Growth Over Years by Region
fig8 = px.scatter(df, x='Population_Density', y='GDP_per_Capita', animation_frame='Election_Year',
                 color='Region', size='GDP_per_Capita', hover_name='Ethnic_Group',
                 title='GDP per Capita and Population Density Over Time (Animated)',
                 range_x=[df['Population_Density'].min(), df['Population_Density'].max()],
                 range_y=[df['GDP_per_Capita'].min(), df['GDP_per_Capita'].max()])
fig8.show()


##### 9. Candidate Origin vs Voting Pattern
- Barplot comparing **candidate origin region** with votes.  
- Shows how regional origins influence voter preferences.

In [None]:
# 9. Candidate Origin vs Voting Pattern
origin_vote = pd.crosstab(df['Candidate_Origin_Region'], df['Voted_For'])
fig9 = px.bar(origin_vote, barmode='group', title='Voting Pattern by Candidate Origin Region')
fig9.show()

##### 10. Interactive Heatmap - Correlation Matrix
- Correlation matrix of **numeric features**.  
- Helps find relationships between variables (positive/negative).
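
Each cell of the matrix is a Pearson correlation coefficient. As a rough illustration of what `DataFrame.corr()` computes by default, the sketch below calculates the coefficient by hand on made-up values (`toy`) and compares it with the pandas result. `pearson_r` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Illustrative data only, not taken from the real dataset
toy = pd.DataFrame({
    "Population_Density": [100, 150, 200, 250, 300],
    "GDP_per_Capita": [500, 520, 610, 640, 700],
})

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

r_manual = pearson_r(toy["Population_Density"], toy["GDP_per_Capita"])
r_pandas = toy.corr().loc["Population_Density", "GDP_per_Capita"]
print(round(r_manual, 4), round(r_pandas, 4))
```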

In [None]:
# 10. Interactive Heatmap - Correlation Matrix
numeric_cols = df.select_dtypes(include=[np.number]).columns
corr_matrix = df[numeric_cols].corr()
fig10 = px.imshow(corr_matrix, title='Correlation Matrix Heatmap')
fig10.show()

## 7. 🔮 Combined Interactive Animated Plot

This section brings together **9 different types of plots** into a single **3x3 grid** for a comprehensive overview of the dataset.  
It helps compare multiple perspectives of the Malawi Elections dataset at once.  

---

##### 📌 9 Plots in the Combined Grid:

1. **Boxplot - GDP by Region**  
   - Shows economic variation across regions.  
   - Detects outliers and spread of income levels.  

2. **Histogram - Population Density**  
   - Distribution of population density across the dataset.  
   - Useful for identifying highly populated vs. sparsely populated regions.  

3. **Line Plot - Election Results Over Time**  
   - Displays number of election entries per year.  
   - Reveals participation trends and frequency of elections.  

4. **Histogram - GDP Distribution**  
   - Highlights the overall distribution of GDP per capita.  
   - Helps find whether GDP values are skewed or normally distributed.  

5. **Pie Chart - Party Distribution**  
   - Percentage share of votes for each political party.  
   - Quick comparison of dominant vs. minor parties.  

6. **Donut Chart - Education Levels**  
   - Voter education level breakdown.  
   - Inner hole emphasizes proportional differences.  

7. **Barplot - Ethnic Group Distribution (Top 10)**  
   - Shows most common ethnic groups in the dataset.  
   - Helps analyze diversity patterns.  

8. **Heatmap - Regional Voting Patterns**  
   - Cross-tab of regions vs. parties.  
   - Identifies regional strongholds of political parties.  

9. **Violin Plot - GDP by Education**  
   - Distribution of GDP per capita based on education levels.  
   - Combines boxplot + density for deeper insights.  
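
`make_subplots` addresses each cell by a 1-based `(row, col)` pair. The sketch below shows one way to map the nine plots onto the 3x3 grid, filling rows left to right. `grid_position` is a hypothetical helper and is not used by the notebook code.

```python
def grid_position(index, n_cols=3):
    """Map a 0-based plot index to a 1-based (row, col), filling rows left to right."""
    row, col = divmod(index, n_cols)
    return row + 1, col + 1

plot_names = [
    "Boxplot - GDP by Region", "Histogram - Population Density",
    "Line Plot - Election Results Over Time", "Histogram - GDP Distribution",
    "Pie Chart - Party Distribution", "Donut Chart - Education Levels",
    "Barplot - Ethnic Group Distribution", "Heatmap - Regional Voting Patterns",
    "Violin Plot - GDP by Education",
]
for i, name in enumerate(plot_names):
    print(grid_position(i), name)
```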

---

### 🎥 Additional Visualization: Animated Plot
- **Evolution of Voting Patterns Over Elections**  
- Animated bar chart shows how party vote counts changed over time.  
- Helps visualize political shifts across multiple elections.  

---

✅ Together, these combined plots provide a **holistic EDA (Exploratory Data Analysis)** view — covering economics, demographics, education, ethnicity, and voting trends in Malawi’s elections dataset.  


In [None]:
# COMBINED PLOTS SECTION - 9 Different Visualizations
print("\n" + "="*80)
print("EXPLORATORY DATA ANALYSIS - 9 COMBINED VISUALIZATIONS")
print("="*80)

# Create a 3x3 subplot grid
fig_combined = make_subplots(
    rows=3, cols=3,
    subplot_titles=('1. Boxplot - GDP by Region', '2. Histogram - Population Density', 
                    '3. Line Plot - Election Results Over Time', '4. Histogram - GDP Distribution',
                    '5. Pie Chart - Party Distribution', '6. Donut Chart - Education Levels',
                    '7. Barplot - Ethnic Group Distribution', '8. Heatmap - Regional Patterns',
                    '9. Violin Plot - GDP by Education'),
    # Both the pie and the donut (a pie with a hole) need a "pie" cell
    specs=[[{"type": "box"}, {"type": "histogram"}, {"type": "scatter"}],
           [{"type": "histogram"}, {"type": "pie"}, {"type": "pie"}],
           [{"type": "bar"}, {"type": "heatmap"}, {"type": "violin"}]]
)

# 1. Boxplot - GDP by Region
for i, region in enumerate(df['Region'].unique()):
    region_data = df[df['Region'] == region]['GDP_per_Capita']
    fig_combined.add_trace(go.Box(y=region_data, name=region, legendgroup=region),
                          row=1, col=1)

# 2. Histogram - Population Density
fig_combined.add_trace(go.Histogram(x=df['Population_Density'], name='Population Density'),
                      row=1, col=2)

# 3. Line Plot - Election Results Over Time
yearly_results = df.groupby('Election_Year').size()
fig_combined.add_trace(go.Scatter(x=yearly_results.index, y=yearly_results.values,
                                 mode='lines+markers', name='Election Entries'),
                      row=1, col=3)

# 4. Histogram - GDP Distribution
fig_combined.add_trace(go.Histogram(x=df['GDP_per_Capita'], name='GDP Distribution'),
                      row=2, col=1)

# 5. Pie Chart - Party Distribution
party_counts = df['Voted_For'].value_counts()
fig_combined.add_trace(go.Pie(labels=party_counts.index, values=party_counts.values,
                             name='Party Distribution'),
                      row=2, col=2)

# 6. Donut Chart - Education Levels (using pie chart with hole)
education_counts = df['Education_Level'].value_counts()
fig_combined.add_trace(go.Pie(labels=education_counts.index, values=education_counts.values,
                             hole=0.4, name='Education Levels'),
                      row=2, col=3)

# 7. Barplot - Ethnic Group Distribution
ethnic_counts = df['Ethnic_Group'].value_counts().head(10)  # Top 10 ethnic groups
fig_combined.add_trace(go.Bar(x=ethnic_counts.index, y=ethnic_counts.values,
                             name='Ethnic Groups'),
                      row=3, col=1)

# 8. Heatmap - Regional Patterns (simplified)
region_party = pd.crosstab(df['Region'], df['Voted_For'])
fig_combined.add_trace(go.Heatmap(z=region_party.values, x=region_party.columns,
                                 y=region_party.index, name='Region-Party Heatmap'),
                      row=3, col=2)

# 9. Violin Plot - GDP by Education
education_levels = df['Education_Level'].unique()
for i, level in enumerate(education_levels):
    level_data = df[df['Education_Level'] == level]['GDP_per_Capita']
    fig_combined.add_trace(go.Violin(y=level_data, name=level, legendgroup=level),
                          row=3, col=3)

fig_combined.update_layout(height=1200, title_text="Comprehensive EDA of Malawi Elections Dataset")
fig_combined.show()

# Additional Animated Plot - Election Results Evolution
fig_animated = px.bar(df.groupby(['Election_Year', 'Voted_For']).size().reset_index(name='Count'),
                     x='Voted_For', y='Count', color='Voted_For',
                     animation_frame='Election_Year',
                     title='Evolution of Voting Patterns Over Elections (Animated)')
fig_animated.show()

## 8. 👨‍💻 Advanced Visualizations: Animated & Pair Plot

This section includes **animated plots** to show trends over time and a **pair plot** to highlight relationships among variables.

---

#### 🎥 Animated Plots

**1. Election Results Evolution (Bar Animation)**  
Displays how vote counts for different political parties **change across election years**, making political shifts easy to visualize.

**2. Regional Voting Patterns (Sunburst Chart)**  
Breaks down results into a **hierarchy of Year → Region → Party**. The chart is not animated, but clicking a segment drills into it, which helps show the regional distribution of votes.

---

##### 🔗 Pair Plot (Scatter Matrix)

The **Pair Plot** compares numerical variables such as:  
- **Population Density**  
- **GDP per Capita**  
- **Election Year**

✔ Shows **correlations and clusters** among these variables.  
✔ Highlights **regional differences** with color coding.  
✔ Powerful for **exploratory analysis** — helps link **economic/demographic factors** with election outcomes.

👉 The Pair Plot is especially useful here because it shows several pairwise relationships at once, which single plots cannot.


In [None]:
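# Regional analysis: sunburst of Year -> Region -> Party
# Aggregate first so that `values` has exactly one entry per path
region_votes = (df.groupby(['Election_Year', 'Region', 'Voted_For'])
                  .size().reset_index(name='Vote_Count'))
fig_region_animated = px.sunburst(region_votes, path=['Election_Year', 'Region', 'Voted_For'],
                                 values='Vote_Count',
                                 title='Regional Voting Patterns Over Time (Sunburst Chart)')
fig_region_animated.show()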

In [None]:
# Pair Plot for Numerical Variables (using Plotly Express)
# Include all three numeric columns described above
numeric_dims = ['Population_Density', 'GDP_per_Capita', 'Election_Year']
fig_pair = px.scatter_matrix(df, dimensions=numeric_dims,
                            color='Region', title='Pair Plot of Numerical Variables')
fig_pair.show()

In [None]:

# Advanced Analysis: Trend of GDP Growth by Region Over Time
gdp_trend = df.groupby(['Election_Year', 'Region'])['GDP_per_Capita'].mean().reset_index()
fig_gdp_trend = px.line(gdp_trend, x='Election_Year', y='GDP_per_Capita', color='Region',
                       title='GDP per Capita Trend by Region Over Time',
                       markers=True)
fig_gdp_trend.show()

## 9. 🖥️ Interactive Dashboard Visualization

This section demonstrates how to make data **interactive** using Plotly with dropdown menus.

---

##### 🔽 Dropdown-Based Party Analysis

- A **dropdown menu** is added to switch between different **political parties**.  
- For each selected party, the plot shows the **GDP per Capita** values of that party's voters across **Election Years**.  
- Makes it easy to **focus on one party at a time** instead of cluttered multi-party plots.  
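
The dropdown works by giving every button a list of booleans, one per trace, that says which traces to show. The sketch below builds those button dictionaries in plain Python. `dropdown_buttons` is a hypothetical helper, and the party names are only sample labels.

```python
def dropdown_buttons(trace_names):
    """Build one 'update' button per trace that shows only that trace."""
    buttons = []
    for i, name in enumerate(trace_names):
        # True only at position i, False everywhere else
        visible = [j == i for j in range(len(trace_names))]
        buttons.append({"label": name, "method": "update", "args": [{"visible": visible}]})
    return buttons

buttons = dropdown_buttons(["DPP", "MCP", "UDF"])
for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

A list built this way can be passed as `buttons=` inside `updatemenus`, provided the traces were added to the figure in the same order as the names.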

---

##### 🎯 Why This Matters?

✔ Adds an **interactive dashboard feel** to the notebook.  
✔ Helps compare **economic trends per party** without visual overload.  
✔ User can explore insights dynamically instead of static charts.


In [None]:
# Create a dropdown-based interactive plot (one scatter trace per party)

fig_interactive = go.Figure()

for party in df['Voted_For'].unique():
    party_data = df[df['Voted_For'] == party]
    fig_interactive.add_trace(go.Scatter(
        x=party_data['Election_Year'],
        y=party_data['GDP_per_Capita'],
        mode='markers',
        name=party,
        visible=False
    ))

# Set first trace visible
fig_interactive.data[0].visible = True

# Create buttons for dropdown
buttons = []
for i, party in enumerate(df['Voted_For'].unique()):
    args = [False] * len(df['Voted_For'].unique())
    args[i] = True
    button = dict(label=party,
                  method="update",
                  args=[{"visible": args}])
    buttons.append(button)

fig_interactive.update_layout(
    updatemenus=[dict(type="dropdown",
                      direction="down",
                      x=0.1,
                      y=1.15,
                      buttons=buttons)],
    title="Interactive Analysis: GDP per Capita by Political Party (Use Dropdown)",
    xaxis_title="Election Year",
    yaxis_title="GDP per Capita"
)

fig_interactive.show()


## 10. 🎮 Conclusion

- The analysis is complete with **15+ interactive visualizations**.  
- Dataset covers election years from **2004 to 2025**.  
- Insights explored include **political parties, regions, ethnic groups, and economic indicators**.  
- Overall, this notebook provided both **interactive and animated views** for a deeper understanding of Malawi elections data.


In [None]:
print("_________________________________________________________________________")
print("=========================================================================")
print("-------------------------------------------------------------------------")
print("\n" + "="*80)
print("ANALYSIS COMPLETE!")
print("="*80)
print("Total Visualizations Created: 15+ interactive plots")
print(f"Dataset Period: {df['Election_Year'].min()} - {df['Election_Year'].max()}")
print(f"Political Parties: {df['Voted_For'].nunique()}")
print(f"Regions: {df['Region'].nunique()}")
print(f"Ethnic Groups: {df['Ethnic_Group'].nunique()}")
print('1. Upload the CSV file')
print('         ✅completed')
print("2. Install required libraries")
print('         ✅completed')
print('3. Import libraries')
print('         ✅completed')
print('4. Load the dataset')
print('         ✅completed')
print('5. Dataset statistics and overview')
print('         ✅completed')
print('6. Visualizations overview')
print('         ✅completed')
print('7. Combined interactive animated plot')
print('         ✅completed')
print('8. Advanced visualizations: animated & pair plot')
print('         ✅completed')
print('9. Interactive dashboard visualization')
print('         ✅completed')
print('10. Conclusion')
print('         ✅completed')
print("__________________________________________________________________________")
print("==========================================================================")
print("--------------------------------------------------------------------------")