
In [None]:
import pandas as pd
from geopy.distance import geodesic
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the datasets
election_data = pd.read_csv('/content/ZAMFARA_crosschecked.csv')
coordinates_data = pd.read_csv('/content/zamfara___coordinates.csv')

In [None]:
election_data.info()
coordinates_data.info()

In [None]:
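# Merge datasets on the shared 'State' column.
# Caution: if both files cover a single state, every row has the same 'State'
# value, so this join pairs each election row with every coordinates row.
merged_data = pd.merge(election_data, coordinates_data, on='State')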
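
The effect of the join key on the merged row count can be checked on toy data. The sketch below is illustrative only: `PU_Code` is a hypothetical per-unit key and is not a column confirmed to exist in the original CSV files.

```python
import pandas as pd

# Toy frames; 'PU_Code' is a hypothetical per-unit key (an assumption, not from the source files).
votes = pd.DataFrame({'State': ['ZAMFARA'] * 3,
                      'PU_Code': ['A', 'B', 'C'],
                      'APC': [10, 20, 30]})
coords = pd.DataFrame({'State': ['ZAMFARA'] * 3,
                       'PU_Code': ['A', 'B', 'C'],
                       'Latitude': [12.1, 12.2, 12.3],
                       'Longitude': [6.1, 6.2, 6.3]})

# Joining on 'State' alone is many-to-many: every vote row meets every coordinate row.
by_state = pd.merge(votes, coords, on='State')

# Joining on a per-unit key keeps one row per unit; validate raises if a key repeats.
by_unit = pd.merge(votes, coords, on=['State', 'PU_Code'], validate='one_to_one')

print(len(by_state), len(by_unit))
```

If the real files share a unit-level identifier, joining on it (with `validate`) would be one way to guard against duplicated rows.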

In [None]:
merged_data.tail()

In [None]:
merged_data.describe()

In [None]:

# Drop polling units with missing coordinates; they cannot be matched to neighbours
merged_data = merged_data.dropna(subset=['Latitude', 'Longitude'])

# Define a radius (1 km)
radius = 1  # in km


In [None]:
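# Return the index labels of all rows in df within `radius` km of (lat, lon).
# The point itself is at distance 0, so its own row is always included.
def find_neighbours(lat, lon, df, radius):
    neighbours = []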
    for index, row in df.iterrows():
        distance = geodesic((lat, lon), (row['Latitude'], row['Longitude'])).km
        if distance <= radius:
            neighbours.append(index)
    return neighbours

# Apply the function to each polling unit
merged_data['neighbours'] = merged_data.apply(lambda row: find_neighbours(row['Latitude'], row['Longitude'], merged_data, radius), axis=1)
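
The row-by-row search above calls `geodesic` once per pair of units, which can be slow for a large state. A possible alternative, sketched here under the assumption that a spherical (haversine) approximation is acceptable in place of geopy's ellipsoidal distance, computes all pairwise distances at once with NumPy. The function names `haversine_km` and `neighbours_within` are hypothetical helpers, not part of the original notebook.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; haversine treats the Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are degrees and may be broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def neighbours_within(lats, lons, radius_km):
    """For each point, list the positions of the other points within radius_km."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    # n x n distance matrix via broadcasting (memory grows with n squared)
    dist = haversine_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    within = dist <= radius_km
    np.fill_diagonal(within, False)  # a unit is not counted as its own neighbour
    return [np.flatnonzero(row).tolist() for row in within]

# Two points a few tens of metres apart and one far away
lats = [12.1700, 12.1705, 12.3000]
lons = [6.6600, 6.6603, 6.9000]
result = neighbours_within(lats, lons, radius_km=1)
print(result)
```

This returns positions rather than index labels, so with a DataFrame they would need mapping back through `df.index`. Unlike `find_neighbours`, it leaves each point out of its own list.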


In [None]:
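# Function to calculate outlier scores
def calculate_outlier_score(row, df):
    # Exclude the unit itself; find_neighbours always returns it (distance 0)
    neighbours = df.loc[row['neighbours']].drop(index=row.name, errors='ignore')
    scores = {}
    for party in ['APC', 'LP', 'PDP', 'NNPP']:  # Replace with actual party columns
        # Absolute deviation from the neighbours' mean; NaN if there are no neighbours
        score = abs(row[party] - neighbours[party].mean())
        scores[party + '_outlier_score'] = score
    return pd.Series(scores)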
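
A small worked example may make the score easier to check by hand. The toy frame below is invented for illustration, and `apc_score` is a hypothetical single-party version of the function above; the neighbour lists here already exclude the unit itself.

```python
import pandas as pd

# Three toy units: 0 and 1 are neighbours of each other, unit 2 has no neighbours.
toy = pd.DataFrame({
    'APC': [100, 20, 50],
    'neighbours': [[1], [0], []],
})

def apc_score(row, df):
    """Absolute difference between a unit's APC votes and its neighbours' mean."""
    others = df.loc[row['neighbours'], 'APC']
    # With no neighbours the mean is NaN, so the score is undefined rather than zero.
    return abs(row['APC'] - others.mean())

toy['APC_outlier_score'] = toy.apply(lambda row: apc_score(row, toy), axis=1)
scores = toy['APC_outlier_score'].tolist()
print(scores)
```

Units 0 and 1 each differ from their single neighbour by 80 votes, while the isolated unit gets a missing score. Isolated units therefore need separate handling before any ranking.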


In [None]:
# Apply the function to each polling unit
outlier_scores = merged_data.apply(lambda row: calculate_outlier_score(row, merged_data), axis=1)
merged_data = pd.concat([merged_data, outlier_scores], axis=1)

# Save the dataset with outlier scores (create the output directory if it is missing)
import os
os.makedirs('/mnt/data', exist_ok=True)
merged_data.to_csv('/mnt/data/selected_state_with_outlier_scores.csv', index=False)

In [None]:
# Plot the outlier scores for one party as an example
sns.scatterplot(data=merged_data, x='Longitude', y='Latitude', hue='APC_outlier_score', size='APC_outlier_score', sizes=(20, 200))
plt.title('Outlier Scores for APC')
plt.savefig('/mnt/data/APC_outlier_scores.png')



In [None]:
# Sort polling units by outlier score: APC first, then LP, PDP and NNPP scores break ties
sorted_data = merged_data.sort_values(by=['APC_outlier_score', 'LP_outlier_score', 'PDP_outlier_score', 'NNPP_outlier_score'], ascending=False)
sorted_data.to_excel('/mnt/data/sorted_outlier_scores.xlsx', index=False)



In [None]:
# Generate a detailed report
report = """
# Outlier Detection Report

## Methodology

The analysis uses geodesic distances between polling-unit coordinates to identify neighbouring polling units within a 1 km radius. For each polling unit, the votes received by each party are compared with the votes in its neighbours. The outlier score for a party is the absolute difference between the unit's votes and the mean votes for that party across its neighbours.

## Summary of Findings

The dataset was sorted by the outlier scores for each party to identify the most significant outliers. The top 3 outliers for each party are highlighted below with their closest polling units.

## Top 3 Outliers for APC

1. Polling Unit 1: [Details]
2. Polling Unit 2: [Details]
3. Polling Unit 3: [Details]

## Top 3 Outliers for LP

1. Polling Unit 1: [Details]
2. Polling Unit 2: [Details]
3. Polling Unit 3: [Details]

## Top 3 Outliers for PDP

1. Polling Unit 1: [Details]
2. Polling Unit 2: [Details]
3. Polling Unit 3: [Details]

## Top 3 Outliers for NNPP

1. Polling Unit 1: [Details]
2. Polling Unit 2: [Details]
3. Polling Unit 3: [Details]

## Conclusion

The analysis flags polling units whose results deviate most from those of nearby units. These deviations may indicate irregularities, and further investigation of the flagged polling units is recommended to ensure the transparency and integrity of the election results.
"""

with open('/mnt/data/outlier_detection_report.txt', 'w') as file:
    file.write(report)
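
The `[Details]` placeholders in the report could be generated from the scored data rather than typed by hand. The sketch below shows one possible way to do that. `top_outliers_section` is a hypothetical helper, and `PU_Name` is an assumed identifier column that is not confirmed to exist in the original dataset.

```python
import pandas as pd

def top_outliers_section(df, party, n=3, id_col=None):
    """Format the n largest outlier scores for one party as a numbered list."""
    score_col = f'{party}_outlier_score'
    top = df.nlargest(n, score_col)
    lines = [f'## Top {n} Outliers for {party}', '']
    for rank, (idx, row) in enumerate(top.iterrows(), start=1):
        label = row[id_col] if id_col else idx  # fall back to the row index label
        lines.append(f'{rank}. {label}: {party} votes = {row[party]}, '
                     f'outlier score = {row[score_col]:.1f}')
    return '\n'.join(lines)

# Toy data for illustration; 'PU_Name' is a hypothetical identifier column.
demo = pd.DataFrame({
    'PU_Name': ['Unit A', 'Unit B', 'Unit C', 'Unit D'],
    'APC': [120, 40, 75, 60],
    'APC_outlier_score': [65.0, 15.0, 20.0, 5.0],
})
section = top_outliers_section(demo, 'APC', n=3, id_col='PU_Name')
print(section)
```

Calling the helper once per party and joining the results would produce the four "Top 3 Outliers" sections from the actual scores.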


In [None]:
import zipfile

with zipfile.ZipFile('/mnt/data/outlier_detection_results.zip', 'w') as zipf:
    zipf.write('/mnt/data/selected_state_with_outlier_scores.csv', arcname='selected_state_with_outlier_scores.csv')
    zipf.write('/mnt/data/APC_outlier_scores.png', arcname='APC_outlier_scores.png')
    zipf.write('/mnt/data/sorted_outlier_scores.xlsx', arcname='sorted_outlier_scores.xlsx')
    zipf.write('/mnt/data/outlier_detection_report.txt', arcname='outlier_detection_report.txt')
