<u><b><h1 style="text-align:center; line-height:1.5; color:#000000; background:#EFEFEF; border: 1px solid #FF6B6B ; padding:20px;">Visual Exploration of a Dataset: 2022 French Presidential Election - First-Round Results by Department</h1></b></u>

This analysis examines the first-round results of the 2022 French presidential election (10 April 2022), obtained from the official French government open-data portal (<a href="https://www.data.gouv.fr/fr/datasets/election-presidentielle-des-10-et-24-avril-2022-resultats-definitifs-du-1er-tour/">data.gouv.fr</a>).

### General approach

<b><h2 style="padding: 10px; border-left: 3px solid #FF6B6B;">Dataset Overview</h2></b>

### Dataset Characteristics:

- Geographic Coverage: 107 rows covering the French departments (including overseas departments and territories) and French citizens living abroad
- Electoral Data: Registered voters, abstentions, blank votes, invalid votes, and expressed (valid) votes
- Candidate Coverage: All 12 presidential candidates across the political spectrum
- Data Granularity: Department-level aggregation with both absolute counts and calculated percentages

### Candidate Political Spectrum:
The dataset includes candidates ranging from Nathalie Arthaud (far-left) to Marine Le Pen (far-right), including incumbent President Emmanuel Macron (center).

### Data Structure:
For each department-candidate combination, the dataset provides vote counts and percentages calculated relative to both registered voters and expressed votes, facilitating multi-dimensional electoral analysis.
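As a consistency check, both percentage families can be recomputed from the raw counts: a candidate's votes divided by the registered voters, and the same votes divided by the expressed votes. The sketch below assumes a hypothetical `total_expressed_votes` column and a `candidate_1_votes` column alongside `total_registered_voters`; the toy numbers are made up and are not election results.

```python
import pandas as pd

def candidate_percentages(df: pd.DataFrame, votes_col: str,
                          registered_col: str = "total_registered_voters",
                          expressed_col: str = "total_expressed_votes") -> pd.DataFrame:
    """Share of a candidate's votes relative to registered voters and to expressed votes."""
    return pd.DataFrame({
        # votes / registered voters, in percent
        "pct_registered": df[votes_col] * 100 / df[registered_col],
        # votes / expressed (valid) votes, in percent
        "pct_expressed": df[votes_col] * 100 / df[expressed_col],
    })

# Toy data with made-up numbers; "total_expressed_votes" is a hypothetical column name
toy = pd.DataFrame({
    "total_registered_voters": [1000, 2000],
    "total_expressed_votes": [700, 1500],
    "candidate_1_votes": [140, 300],
})
print(candidate_percentages(toy, "candidate_1_votes"))
```

Comparing these recomputed values with the percentage columns shipped in the file is a quick way to confirm which denominator each column uses.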

### Initial Data Quality Assessment

Basic information about the dataset and the percentage of null values in each column were retrieved and exported to an Excel file in a dedicated output folder for easier interpretation.

In [None]:
# `df` is the DataFrame loaded earlier in the notebook
# Get a summary of the dataset (df.info() prints the report itself and returns None)
df.info()

# Percentage of missing values in each column
missing_percentage = df.isnull().mean() * 100

In [None]:
# Console output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 107 entries, 0 to 106
Data columns (total 89 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   department_code                107 non-null    object 
 1   department_name                107 non-null    object 
 2   status                         107 non-null    object 
 3   total_registered_voters        107 non-null    int64  
...
 88  candidate_12_votes_pct_voters  107 non-null    float64
dtypes: float64(32), int64(18), object(39)
memory usage: 74.5+ KB
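The export step mentioned above could look like the following minimal sketch, which gathers the dtype, the non-null count and the percentage of missing values per column into one table. The function names, the output folder and the file name are hypothetical, and writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

```python
from pathlib import Path

import pandas as pd

def build_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count and percentage of missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null_count": df.notna().sum(),
        "missing_pct": (df.isnull().mean() * 100).round(2),
    })

def export_report(report: pd.DataFrame, output_dir: Path,
                  filename: str = "data_quality_report.xlsx") -> Path:
    """Write the report to an Excel file inside output_dir (hypothetical file name)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    report.to_excel(path, index_label="column")  # needs an Excel engine, e.g. openpyxl
    return path
```

For example, `export_report(build_quality_report(df), Path("output"))` would write the report into a hypothetical `output/` folder.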

<div style="background-color: #f8f9fa; color: #000000; padding: 15px; border-radius: 8px; border: 1px solid #6c757d; margin: 15px 0;">
<p><strong>💡 Conclusion:</strong></p>
<ul>
<li>107 rows and 89 columns: each row represents one French department or territory (including the row for French citizens living abroad), and six columns are dedicated to each of the 12 presidential candidates.</li>
<li>The memory usage of 74.5+ KB indicates a dataset small enough to process comfortably in memory.</li>
<li>No null values in any column.</li>
<li>The data types appear appropriate: numerical voting data are stored as integers and floats, while categorical information (candidate names, genders, department names) is stored as object types.</li>
</ul></div>

<div style="background-color: #fff3e0; color: #000000; padding: 15px; border-radius: 8px; border: 1px solid #ff9800; margin: 15px 0;">
<p><strong>⚠️ Potential issues and challenges:</strong></p>
<ul>
<li>The wide format spreads candidate information across multiple columns, so the data must be reshaped (wide to long) to bring all candidates into a single set of columns. The reshape must keep the department-level fields attached to every candidate row so that candidate-specific analysis remains possible for each department.</li>
<li>In the raw file, column headers for 11 of the 12 candidates appear to be missing.</li>
<li>The dataset includes a row for "French citizens living abroad", which should be excluded from the geographic, department-level analysis while remaining relevant for overall national statistics.</li>
</ul></div>
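One way to handle the reshape and the abroad row is sketched below. It assumes the candidate columns follow a `candidate_<i>_<field>` naming pattern (as in `candidate_12_votes_pct_voters` from the `df.info()` output) and that the abroad row can be identified by its department code; the `"ZZ"` code and the function names are assumptions to check against the actual data.

```python
import pandas as pd

ABROAD_CODE = "ZZ"  # assumption: department code of the "French citizens living abroad" row

def drop_abroad(df: pd.DataFrame, code_col: str = "department_code",
                abroad_code: str = ABROAD_CODE) -> pd.DataFrame:
    """Keep only geographic units (the abroad row still matters for national totals)."""
    return df[df[code_col] != abroad_code]

def reshape_candidates_long(df: pd.DataFrame, n_candidates: int = 12,
                            id_cols=("department_code", "department_name")) -> pd.DataFrame:
    """Stack the candidate_<i>_* column groups into one row per department-candidate pair."""
    frames = []
    for i in range(1, n_candidates + 1):
        # The trailing underscore keeps "candidate_1_" from matching "candidate_12_..."
        prefix = f"candidate_{i}_"
        cand_cols = [c for c in df.columns if c.startswith(prefix)]
        part = (df[list(id_cols) + cand_cols]
                .rename(columns={c: c[len(prefix):] for c in cand_cols})
                .assign(candidate_number=i))
        frames.append(part)
    return pd.concat(frames, ignore_index=True)
```

A loop over candidate groups is used instead of `pd.wide_to_long` because that function expects the varying number at the end of the column name (e.g. `votes_1`), whereas here it sits in the middle.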

## Preparing the dataset for analysis
1. Column titles were translated into English and converted to a Python-friendly format (e.g. `department_code`), which makes them easier to manipulate during the analysis.
2. The results for each candidate were spread across six columns, some of them without a header: Candidate 1 gender | Candidate 1 surname | Candidate 1 first name | Candidate 1 number of votes | Candidate 1 % of registered voters | Candidate 1 % of valid votes. The same pattern repeats for the other 11 candidates. All column headers were therefore named manually in Excel so that, once the file is loaded in Python, the details of every candidate can be consolidated into a single set of columns.
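The manual renaming in Excel could also be done in Python. The sketch below assumes the candidate block occupies the last 72 columns (12 candidates × 6 fields, leaving 89 - 72 = 17 fixed columns) and uses hypothetical field suffixes; only `votes_pct_voters` is confirmed by the `df.info()` output.

```python
import pandas as pd

# Hypothetical field suffixes, in the column order described above
CANDIDATE_FIELDS = ["gender", "surname", "first_name", "votes",
                    "votes_pct_registered", "votes_pct_voters"]

def candidate_headers(n_candidates: int = 12, fields=CANDIDATE_FIELDS) -> list:
    """Build candidate_<i>_<field> names, candidate by candidate."""
    return [f"candidate_{i}_{field}"
            for i in range(1, n_candidates + 1) for field in fields]

def rename_candidate_columns(df: pd.DataFrame, n_candidates: int = 12,
                             fields=CANDIDATE_FIELDS) -> pd.DataFrame:
    """Rename the trailing candidate block; assumes candidate columns come last."""
    new_names = candidate_headers(n_candidates, fields)
    n_fixed = df.shape[1] - len(new_names)
    if n_fixed < 0:
        raise ValueError("DataFrame has fewer columns than expected candidate headers")
    out = df.copy()
    out.columns = list(df.columns[:n_fixed]) + new_names
    return out
```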

<b><h2 style="padding: 10px; border-left: 3px solid #FF6B6B;">Working hypothesis / questions</h2></b>

- What are the results at the national and departmental levels?
- What about voters who did not choose a candidate (blank ballots, invalid ballots, abstention)?
- Do smaller departments vote differently from larger ones?
- Is abstention more common in low-density areas?
- Do certain candidates perform better in urban vs rural areas?
- Are there differences between metropolitan France and the overseas territories (outre-mer)?
- Is there a correlation between abstention and the candidate who received the most votes?
- Is there a candidate that won significantly more departments?
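For the last question, once the data is in long format (one row per department-candidate pair), the winner of each department can be counted. A minimal sketch with hypothetical column names `surname` and `votes`; ties go to the first row encountered, because `idxmax` returns the first maximum.

```python
import pandas as pd

def departments_won(long_df: pd.DataFrame, dept_col: str = "department_code",
                    name_col: str = "surname", votes_col: str = "votes") -> pd.Series:
    """Count, for each candidate, the departments where they received the most votes."""
    # Row label of the top candidate in each department (requires a unique index)
    winner_rows = long_df.groupby(dept_col)[votes_col].idxmax()
    winners = long_df.loc[winner_rows, [dept_col, name_col]]
    return winners[name_col].value_counts()
```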

<b><h2 style="padding: 10px; border-left: 3px solid #FF6B6B;">Initial Steps</h2></b>

### Required libraries
- pandas
- matplotlib
- plotly
- seaborn
- requests
- pathlib
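A quick check that these libraries are available before running the notebook could look like the sketch below; `pathlib` is part of the Python standard library, so only the other five may need installing. The helper name is hypothetical.

```python
import importlib.util

REQUIRED_LIBRARIES = ["pandas", "matplotlib", "plotly", "seaborn", "requests", "pathlib"]

def missing_libraries(names=REQUIRED_LIBRARIES) -> list:
    """Return the libraries that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_libraries()
if missing:
    print("Install before running the notebook:", ", ".join(missing))
else:
    print("All required libraries are available.")
```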