# Political Donations EDA 💸

This is a quick starter Exploratory Data Analysis (EDA) of the Political Donations by American Sports Owners Dataset. Although I do use quite a lot of one-liners in this EDA, I think there is enough commenting and analysis for anyone to follow along and understand the code.

<h3> Table of Contents </h3>

<ol>
    <li> <a href="https://www.kaggle.com/ironicninja/political-donations-eda#Essential-Imports"> Essential Imports </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/political-donations-eda#Data-Preprocessing"> Data Preprocessing </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/political-donations-eda#Analysis-of-Individual-Donations"> Analysis of Individual Donations </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/political-donations-eda#Analysis-of-Individual-Donators"> Analysis of Individual Donators </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/political-donations-eda#Analysis-of-Donations-to-Specific-Political-Parties"> Analysis of Donations to Specific Political Parties </a> </li>
</ol>

<p style="font-size: 16px"> If you like this notebook, please give it an <span style="color: green; font-weight: bold"> upvote!</span> Let's jump right into the analysis. </p>

# Essential Imports

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.offline as py
# Render Plotly figures inline in the notebook
py.init_notebook_mode(connected=True)

The variable below, ```LOOK_AT```, sets how many entries (bars) each "Top N" chart in this notebook displays. If you fork this notebook and want more or fewer entries per chart, the easiest way is to change the value of ```LOOK_AT``` below.

In [None]:
LOOK_AT = 10

# Data Preprocessing

In [None]:
df = pd.read_csv("../input/political-donations-by-american-sports-owners/sports-political-donations.csv")
df

One of the most important preprocessing steps is converting each column in the dataset to its appropriate type. The "Amount" column is read in as text containing dollar signs and thousands separators, so we strip those characters and convert the column to integers. This lets us use the vectorized operations Pandas DataFrames offer. We do so with the one-liner shown below.

In [None]:
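# "Amount" is read as text with "$" and "," characters: strip them and cast to integers.
# The dtype check makes this cell safe to re-run after the conversion has happened.
if df['Amount'].dtype != 'int64':
    df['Amount'] = (df['Amount'].str.replace('$', '', regex=False)).str.replace(',', '', regex=False).astype('int64')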

df
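The one-liner above assumes every value parses cleanly; if any entry were malformed, `astype('int64')` would raise. A slightly more defensive variant, sketched below as a hypothetical helper (`parse_usd` and the sample rows are illustrative, not part of the dataset), uses `pd.to_numeric(errors="coerce")` so unparseable values show up as `NaN` for inspection instead of stopping the notebook.

```python
import pandas as pd

# Hypothetical sample mimicking the CSV's "Amount" strings (not real data).
raw = pd.DataFrame({"Owner": ["A", "B", "C"],
                    "Amount": ["$4,000", "$1,500,000", "$250"]})

def parse_usd(col: pd.Series) -> pd.Series:
    """Strip '$' and ',' from currency strings and convert them to numbers."""
    if pd.api.types.is_numeric_dtype(col):
        return col  # already converted, e.g. when the cell is re-run
    cleaned = col.str.replace("$", "", regex=False).str.replace(",", "", regex=False)
    # errors="coerce" turns anything unparseable into NaN instead of raising
    return pd.to_numeric(cleaned, errors="coerce")

raw["Amount"] = parse_usd(raw["Amount"])
print(raw["Amount"].tolist())  # [4000, 1500000, 250]
```

Because of the dtype check, calling `parse_usd` on an already-numeric column is a no-op, which mirrors the `!= 'int64'` guard in the cell above.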

# Analysis of Individual Donations

<h3> Primary Questions </h3>

* What is the distribution of donation values?
* What are the largest individual donations in terms of amount donated?

<h3> Distribution of Donation Values </h3>

In [None]:
amount = df['Amount']
fig = px.histogram(amount)
fig.update_layout(barmode='group', title={'text': f"Distribution of Donation Values", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Amount, in USD", yaxis_title="Count", showlegend=False)
fig.show()
print("Number of Donations: %d\nMean Donation Value: $%.2f\nMedian Donation Value: $%.2f" % (len(amount), amount.mean(), amount.median()))
print("\nAdditional Statistical Measures:\nStandard Deviation: $%.3f\nSkew: %.3f\nKurtosis: %.3f\n95%% of Data is Between $%.3f and $%.3f" % (amount.std(), amount.skew(),
                                                                                                        amount.kurtosis(), np.quantile(amount, 0.025), np.quantile(amount, 0.975)))

"A general guideline for skewness is that if the number is greater than +1 or lower than –1, this is an indication of a substantially skewed distribution. For kurtosis, the general guideline is that if the number is greater than +1, the distribution is too peaked" (<a href="https://www.smartpls.com/documentation/functionalities/excess-kurtosis-and-skewness">Source</a>). 

The skew and kurtosis printed above exceed these thresholds, which matches the histogram: most donations are small, while a few very large ones form a long right tail. Since we are not making predictions with this data, removing these outliers isn't necessary, but it is still worth noting that **the top donators in this dataset are statistical outliers.**
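The same block of statistics gets printed several times in this notebook, so a small helper could compute them once and apply the rule of thumb quoted above. This is a sketch with a hypothetical function name (`summary_stats`) and made-up sample values. One detail worth knowing: Pandas' `Series.kurtosis()` reports *excess* kurtosis (a normal distribution scores about 0), which is the scale the quoted guideline uses.

```python
import numpy as np
import pandas as pd

def summary_stats(s: pd.Series) -> dict:
    """Collect the statistics this notebook prints for each distribution."""
    lo, hi = np.quantile(s, [0.025, 0.975])  # bounds of the central 95%
    skew = s.skew()        # sample skewness (bias-adjusted)
    kurt = s.kurtosis()    # EXCESS kurtosis: a normal distribution is ~0
    return {
        "n": len(s), "mean": s.mean(), "median": s.median(), "std": s.std(),
        "skew": skew, "kurtosis": kurt, "central_95": (lo, hi),
        # Rule of thumb quoted above: |skew| > 1 or kurtosis > 1 is notable
        "substantially_skewed": abs(skew) > 1,
        "too_peaked": kurt > 1,
    }

# Hypothetical donations: mostly small gifts plus one very large one.
sample = pd.Series([500, 1000, 1000, 2500, 2700, 5000, 1_000_000])
stats = summary_stats(sample)
print(stats["substantially_skewed"], stats["too_peaked"])
```

For the real data you would call `summary_stats(df['Amount'])`; a symmetric sample such as `[1, 2, 3, 4, 5]` has skew 0 and negative excess kurtosis, so neither flag is raised.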

<h3> Top 10 Individual Donations </h3>

In [None]:
donations = df.sort_values('Amount', ascending=False)
fig = px.bar(donations[:LOOK_AT], x="Owner", y="Amount")
fig.update_layout(barmode='group', title={'text': f"Top {LOOK_AT} Individual Donations", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Amount, in USD", showlegend=False)
fig.show()

Yep, you read that right. San Francisco Giants owner Charles Johnson holds the top 3 donations and accounts for 5 of the top 10 donations by amount donated. He is clearly an outlier in this analysis (if you want to look at more or other owners in your own analysis, feel free to fork the notebook and increase the value of ```LOOK_AT``` and/or drop Charles Johnson from the data entirely).
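Dropping an owner is a one-line boolean mask. The sketch below uses a small hypothetical frame (`df_demo`, `LOOK_AT_DEMO`) in place of the notebook's `df` and `LOOK_AT`; `nlargest` is equivalent to sorting descending and taking the first rows.

```python
import pandas as pd

# Hypothetical rows; in the notebook you would use `df` and LOOK_AT instead.
df_demo = pd.DataFrame({
    "Owner": ["Charles Johnson", "Owner X", "Charles Johnson", "Owner Y", "Owner Z"],
    "Amount": [1_000_000, 250_000, 500_000, 100_000, 50_000],
})
LOOK_AT_DEMO = 3

# Boolean mask keeps every row whose owner is NOT the outlier
without_cj = df_demo[df_demo["Owner"] != "Charles Johnson"]

# Top donations among the remaining owners
top = without_cj.nlargest(LOOK_AT_DEMO, "Amount")
print(top["Owner"].tolist())  # ['Owner X', 'Owner Y', 'Owner Z']
```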

# Analysis of Individual Donators

<h3> Primary Questions </h3>

* Which owner has donated the most TOTAL money?
* Which owner has donated the most times?
* Which owner donates, ON AVERAGE, the most money?

<h3> Top Donators by Gross Value Donated </h3>

In [None]:
# Total donated per owner; select "Amount" first so text columns are not summed
sum_df = df.groupby('Owner')[['Amount']].sum().sort_values('Amount', ascending=False)
sum_donated = sum_df['Amount']
fig = px.bar(sum_donated[:LOOK_AT])
fig.update_layout(barmode='group', title={'text': f"Top {LOOK_AT} Individual Donators, Gross Value", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Amount, in USD", showlegend=False)
fig.show()

Unsurprisingly, Charles Johnson tops the list again with a whopping **$11.03 MILLION** donated. Let's see what percentage of donated money he claims.

In [None]:
pie_df = sum_df.reset_index()
fig = px.pie(pie_df, values="Amount", names="Owner")
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide', title={'text': f"Pie Chart of Individual Donators (Total Donated: ${sum_donated.sum()})", 'x': 0.4,
                             'xanchor': 'center', 'font': {'size': 20}})
fig.show()
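The pie chart shows each owner's share visually; dividing each owner's total by the grand total gives the exact percentages. This sketch uses hypothetical totals in place of `sum_donated`.

```python
import pandas as pd

# Hypothetical per-owner totals standing in for `sum_donated`.
totals = pd.Series({"Owner A": 11.0, "Owner B": 20.0, "Owner C": 13.0, "Owner D": 6.0},
                   name="Amount")

# Each owner's fraction of all donated money, largest first
share = (totals / totals.sum()).sort_values(ascending=False)
print(share.map("{:.1%}".format))
```

With the real data, `(sum_donated / sum_donated.sum())['Charles Johnson']` would give his exact share.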

Charles Johnson accounts for almost a quarter of all the money donated! Now, let's look at the distribution of the total amount donated per owner.

In [None]:
amount = df.groupby('Owner')['Amount'].sum()  # total donated per owner
fig = px.histogram(amount)
fig.update_layout(barmode='group', title={'text': f"Distribution of Total Donated per Donator", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Amount, in USD", yaxis_title="Count", showlegend=False)
fig.show()
print("Number of Unique Donators: %d\nMean Total per Donator: $%.2f\nMedian Total per Donator: $%.2f" % (len(amount), amount.mean(), amount.median()))
print("\nAdditional Statistical Measures:\nStandard Deviation: $%.3f\nSkew: %.3f\nKurtosis: %.3f\n95%% of Data is Between $%.3f and $%.3f" % (amount.std(), amount.skew(),
                                                                                                        amount.kurtosis(), np.quantile(amount, 0.025), np.quantile(amount, 0.975)))

Ok, Charles Johnson is quite obviously an outlier. Let's take a closer look at his donations.

In [None]:
cj_donations = df.loc[df['Owner'] == 'Charles Johnson']
cj_donations

In [None]:
fig = px.histogram(cj_donations['Amount'])
fig.update_layout(barmode='group', title={'text': f"Charles Johnson's Donations", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Amount, in USD", yaxis_title="Count", showlegend=False)
fig.show()

It looks like most of his donations are comparatively small, but he has a couple of extremely large donations.

<h3> Number of Donations per Donator </h3>

In [None]:
count_donated = df.groupby('Owner').size().sort_values(ascending=False)
fig = px.bar(count_donated[:LOOK_AT])
fig.update_layout(barmode='group', title={'text': f"Top {LOOK_AT} Donators by Number of Donations", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)
fig.show()

What a surprise, it's Charles Johnson at the top AGAIN. It seems like he really loves to donate to political groups, huh.

In [None]:
fig = px.histogram(count_donated, nbins=200)
fig.update_layout(barmode='group', title={'text': f"Distribution of Number of Donations Per Donator", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, xaxis_title="Donated Count", yaxis_title="Count", showlegend=False)
fig.show()

print("Mean Donations per Donator: %.3f\nMedian Donations per Donator: %.1f" % (count_donated.mean(), count_donated.median()))
print("\nAdditional Statistical Measures:\nStandard Deviation: %.3f\nSkew: %.3f\nKurtosis: %.3f\n95%% of Data is Between %d Donation(s) and %d Donation(s)" % (count_donated.std(), 
                                            count_donated.skew(), count_donated.kurtosis(), np.quantile(count_donated, 0.025), np.quantile(count_donated, 0.975)))

Most sports team owners in this dataset donate fewer than 5 times, but there are still quite a few statistical outliers (I'm looking at you, Charles Johnson).
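To put a number on "most", the mean of a boolean Series gives the fraction of owners that satisfy a condition. The donation log below is hypothetical; with the real data, `count_donated` plays the role of `per_owner`.

```python
import pandas as pd

# Hypothetical donation log: one row per donation, as in `df`.
log = pd.DataFrame({"Owner": ["A"] * 12 + ["B"] * 2 + ["C"] * 1 + ["D"] * 4})

# Donations per owner (same idea as count_donated in the notebook)
per_owner = log.groupby("Owner").size()

# Mean of a boolean Series = fraction of owners where the condition holds
frac_under_5 = (per_owner < 5).mean()
print(f"{frac_under_5:.0%} of owners donated fewer than 5 times")
```

`log["Owner"].value_counts()` would produce the same counts, already sorted in descending order.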

<h3> Highest Average Donator </h3>

In [None]:
# Average donation per owner; select "Amount" first (mean() fails on text columns in recent pandas)
avg_donated = df.groupby('Owner')['Amount'].mean().sort_values(ascending=False)
fig = px.bar(avg_donated[:LOOK_AT])
fig.update_layout(barmode='group', title={'text': f"Top {LOOK_AT} Donators by Average Donation", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Average Donation, in USD", showlegend=False)
fig.show()

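No Charles Johnson here, very cool. That fits what we saw earlier: most of his donations are comparatively small, so his average is pulled down even though his total is the largest.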
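Averages favor owners who gave only once or twice but gave big, so it can help to look at the count, sum, and mean side by side. This sketch computes all three in one `groupby` pass and applies an optional minimum-donation filter; the data and the `MIN_DONATIONS` threshold are hypothetical.

```python
import pandas as pd

# Hypothetical donation log.
log = pd.DataFrame({
    "Owner": ["A", "A", "A", "A", "B", "C", "C"],
    "Amount": [100, 200, 300, 1_000_000, 500_000, 2_000, 4_000],
})

# One groupby pass computes the three per-owner views used in this section
per_owner = log.groupby("Owner")["Amount"].agg(["count", "sum", "mean"])

# An average over a single donation is fragile; optionally require a minimum count
MIN_DONATIONS = 2  # hypothetical threshold, tune to taste
ranked = per_owner[per_owner["count"] >= MIN_DONATIONS].sort_values("mean", ascending=False)
print(ranked)
```

Here owner B has the highest raw average but only one donation, so the filter removes B from the ranking.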

# Analysis of Donations to Specific Political Parties

Let's add the feature "Party" to the mix.

<h3> Primary Questions </h3>

* Which party receives the most donations?
* Which party receives the most money?
* Are there any owners that donate to multiple parties?

<h3> Which Party Receives the Most Donations? </h3>

In [None]:
df['Party'].unique()

In [None]:
COLOR_MAP = {'Democrat': 'blue', 'Bipartisan': 'yellow', 'Republican': 'red', 'Bipartisan, but mostly Republican': 'orange', 'Bipartisan, but mostly Democratic': 'cyan',
            'Independent': 'green'}
# Number of donations per party, as a two-column frame (Party, Count)
party_df = df.groupby('Party').size().sort_values(ascending=False).reset_index(name="Count")
fig = px.bar(party_df, x="Party", y="Count", color="Party", color_discrete_map=COLOR_MAP)
fig.update_layout(title={'text': f"Political Parties Which Receive the Most Donations", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Count", showlegend=False)
fig.show()

Republicans receive the most donations by a decent margin.
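Since the charts in this section color bars through ```COLOR_MAP```, any party label missing from the map should fall back to Plotly's default color sequence and look inconsistent. A quick set difference catches this; the map and labels below are a hypothetical subset, and with the real data you would compare `df['Party'].unique()` against ```COLOR_MAP```.

```python
import pandas as pd

# Hypothetical subset of the notebook's color map and party labels
COLOR_MAP_DEMO = {"Democrat": "blue", "Republican": "red"}
parties = pd.Series(["Democrat", "Republican", "Independent", "Republican"])

# Party labels present in the data that have no explicit color assigned
unmapped = set(parties.unique()) - set(COLOR_MAP_DEMO)
print(unmapped)  # {'Independent'}
```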

<h3> Which Party Receives the Most Money? </h3>

In [None]:
COLOR_MAP = {'Democrat': 'blue', 'Bipartisan': 'yellow', 'Republican': 'red', 'Bipartisan, but mostly Republican': 'orange', 'Bipartisan, but mostly Democratic': 'cyan',
            'Independent': 'green'}
# Total amount per party; select "Amount" first so text columns are not summed
party_sum_df = df.groupby('Party')[['Amount']].sum().sort_values('Amount', ascending=False).reset_index()
fig = px.bar(party_sum_df, x="Party", y="Amount", color="Party", color_discrete_map=COLOR_MAP)
fig.update_layout(title={'text': f"Political Parties Which Receive the Most Money", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Amount, in USD", showlegend=False)
fig.show()

The Republican Party also receives the most money from sports owners.
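The two charts answer different questions (how often vs. how much), so putting count, total, and share of money in one table makes it easier to spot a party that receives a few large gifts rather than many small ones. This is a sketch on hypothetical rows using Pandas named aggregation.

```python
import pandas as pd

# Hypothetical donation rows.
log = pd.DataFrame({
    "Party": ["Republican", "Democrat", "Republican", "Bipartisan", "Democrat", "Republican"],
    "Amount": [1000, 2500, 500, 10_000, 250, 4000],
})

# Count and total per party side by side (named aggregation)
by_party = log.groupby("Party")["Amount"].agg(count="count", total="sum")

# Each party's share of all money donated
by_party["share_of_money"] = by_party["total"] / by_party["total"].sum()
print(by_party.sort_values("total", ascending=False))
```

In this toy data, "Republican" leads on count while "Bipartisan" leads on money, which is exactly the kind of difference the combined table surfaces.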

<h3> Polarity of Donations </h3>

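We measure the polarity of an owner's donations as ```sum - max(donations)```: the owner's total number of donations minus the number of donations to the party they donated to most often. In other words, it counts every donation that did NOT go to the donator's primary party, so a value of 0 means all of that owner's donations went to a single party label.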

In [None]:
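# Number of donations per (owner, party) pair
polar_df = df.groupby(['Owner', 'Party']).size()
mcc_owner = {}
for owner in df['Owner'].unique():
    # Donations that did not go to this owner's most-frequent party
    mcc = polar_df[owner].sum() - polar_df[owner].max()
    mcc_owner[owner] = mcc
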
mcc_df = pd.DataFrame.from_dict(mcc_owner, orient='index', columns=["Value"]).sort_values("Value", ascending=False)
mcc_df
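The loop above can also be written without Python-level iteration by grouping the (owner, party) counts on the `Owner` index level. This sketch uses hypothetical rows; `counts` plays the role of `polar_df`.

```python
import pandas as pd

# Hypothetical donation rows.
log = pd.DataFrame({
    "Owner": ["A", "A", "A", "A", "B", "B", "C"],
    "Party": ["Democrat", "Democrat", "Republican", "Bipartisan",
              "Republican", "Republican", "Democrat"],
})

# Donations per (owner, party), same shape as polar_df in the notebook
counts = log.groupby(["Owner", "Party"]).size()

# Per owner: total donations minus donations to the most-frequent party,
# i.e. how many donations went somewhere other than the owner's main party
by_owner = counts.groupby(level="Owner")
polarity = (by_owner.sum() - by_owner.max()).sort_values(ascending=False)
print(polarity.to_dict())
```

Owner A gave 2 of 4 donations outside their main party, so A scores 2; B and C each gave to a single party and score 0.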

In [None]:
fig = px.bar(mcc_df[:LOOK_AT])
fig.update_layout(title={'text': f"Which Donators Have the Most Polar Donations", 'x': 0.5,
                             'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Donations Outside Primary Party", showlegend=False)
fig.show()

Let's take a look at the person with the most polar donations, Micky Arison.

In [None]:
polar_df['Micky Arison']

Interestingly, Micky Arison donates to both Democrat AND Republican groups.
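To answer the third primary question more directly ("Are there any owners that donate to multiple parties?"), `nunique` counts the distinct party labels per owner, and a set check finds owners who gave to both major parties. The rows below are hypothetical. Note that in the real data some labels are already "Bipartisan", so an owner with a single label can still have given to both sides.

```python
import pandas as pd

# Hypothetical donation rows.
log = pd.DataFrame({
    "Owner": ["A", "A", "B", "B", "C"],
    "Party": ["Democrat", "Republican", "Republican", "Republican", "Democrat"],
})

# Number of distinct party labels each owner donated to
n_parties = log.groupby("Owner")["Party"].nunique()
multi = n_parties[n_parties > 1]

# Owners whose donations include BOTH "Democrat" and "Republican" labels
both = log.groupby("Owner")["Party"].apply(
    lambda p: {"Democrat", "Republican"}.issubset(p)).astype(bool)
print(multi.index.tolist(), both[both].index.tolist())  # ['A'] ['A']
```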

And that's all for this notebook for now! There's certainly more you could do with this data, and you could also extend the analysis beyond donations by American sports owners. Once again, if you liked this notebook, please give it an upvote! It would be greatly appreciated.