# <a style="color: red;">filter for mvm emails</a>

# <u>What makes someone engaged?</u>

### Determine engagement based on `donations` and `actions` 
### Rank that engagement using a <a style="color: red;">R</a><a style="color: orange;">A</a><a style="color: green;">G</a> system


### `Donations`
 - Frequency:   How often is this person donating?
 - Size:   How much is this person donating?


### `Actions`
 - Petitions
 - Surveys
 - Email engagement
 - Volunteering

## Donations

In [111]:
import pandas as pd 
import os  

In [112]:
donations = pd.read_csv("../data/raw/an_group_transaction_report_make-votes-matter_2025-11-30-12-34.csv")
report = pd.read_csv("../data/raw/an_report_all_col_eval_1_2025-11-30-13-36.csv")

donations.columns



Index(['First name', 'Last name', 'Email', 'Address', 'City', 'State/Province',
       'State/Province Abbreviated', 'Zip code', 'Country', 'Language',
       'Mobile Number', 'Mobile Opt-In', 'Recipient', 'Donation Amount', 'Net',
       'Fee', 'Status', 'Transaction ID', 'Recurring', 'Recurring Status',
       'Page Name', 'Administrative Title', 'Page Tags', 'Referrer Code',
       'Source Code', 'Timestamp (ET)', 'ch_response', 'test_mode'],
      dtype='object')

A lot of people will never donate, so the <a style='color : red'>red</a> group can be everyone who hasn't donated.<br>
Since <a style='color : red'>red</a> is anyone who hasn't donated, <a style='color : orange'>amber</a> can be anyone who *has* donated.

In [113]:
unique_donators = donations["Email"].nunique()
print(f"Unique donators: {unique_donators}")

Unique donators: 3729


In [114]:
# Sum all donations per supporter, using email as the unique identifier
columns_to_group = ['Email', 'Donation Amount']
donations_by_email = donations[columns_to_group].groupby('Email').sum().reset_index()

# Average donation metrics. Note: "donation size" here is each donor's total, not a single transaction
average_donation_size = donations_by_email["Donation Amount"].mean()
median_donation_size = donations_by_email["Donation Amount"].median()

count_of_each_email = donations.groupby("Email").size()
average_donation_count = count_of_each_email.mean()
median_donation_count = count_of_each_email.median()
max_donation_count = count_of_each_email.max()

quartiles = count_of_each_email.quantile([0.25, 0.5, 0.75])

print(f"Average donation size: £{average_donation_size:.2f}")
print(f"Median donation size: £{median_donation_size:.2f}")
print(f"Average donation count: {average_donation_count:.0f}")
print(f"Median donation count: {median_donation_count:.0f}")
print(f"Max donation count: {max_donation_count:.0f}")

print(quartiles)

Average donation size: £51.07
Median donation size: £33.00
Average donation count: 6
Median donation count: 4
Max donation count: 45
0.25     1.0
0.50     4.0
0.75    11.0
dtype: float64


In [115]:

'''
Make a new column in main report for donation count and total value 
Remove old total value as it is outside the time scope of this project/data
Populate donation_count and total_donation_value columns
'''

donations_by_email.rename(columns={'Email': 'email'}, inplace=True)

# email + total value is already part of donations_by_email;
# copy it so later changes do not touch donations_by_email
donation_subset_to_join = donations_by_email.copy()

# for each email, count the number of donations made in `donations`
donation_subset_to_join["donation_count"] = donation_subset_to_join["email"].map(donations["Email"].value_counts())

# left join keeps every supporter in `report`; non-donors get NaN in the new columns
report = report.merge(donation_subset_to_join, on="email", how="left")



In [116]:
report.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110179 entries, 0 to 110178
Data columns (total 10 columns):
 #   Column                                              Non-Null Count   Dtype  
---  ------                                              --------------   -----  
 0   email                                               110158 non-null  object 
 1   can2_user_tags                                      101633 non-null  object 
 2   uuid                                                110179 non-null  object 
 3   can2_subscription_status                            110179 non-null  object 
 4   can2_lifetime_value                                 110179 non-null  int64  
 5   Select your age group                               189 non-null     object 
 6   Please tell us the party you support                186 non-null     object 
 7   Do_you_feel_the_current_government_will_Deliver_PR  5831 non-null    object 
 8   Donation Amount                                     3167 non-nul

The threshold for the <a style='color : green'>green</a> group is somewhat arbitrary, but after meeting with stakeholders we decided on the top **25%** of donators.
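
A minimal sketch of how these donation groups could be assigned. It assumes the ranking metric is `donation_count` (total donation value would work the same way), that the top 25% is measured among donators only, and that ties at the threshold count as green. `donation_rag` is a hypothetical helper, not part of the analysis above.

```python
import pandas as pd


def donation_rag(metric: pd.Series) -> pd.Series:
    """Label each supporter red/amber/green from a donation metric.

    Assumptions (not confirmed by the analysis above):
    - NaN or 0 means no donation on record -> red
    - the green threshold is the 75th percentile among donators only
    - values equal to the threshold count as green
    """
    donated = metric.fillna(0) > 0
    threshold = metric[donated].quantile(0.75)

    rag = pd.Series("red", index=metric.index)
    rag[donated] = "amber"
    rag[donated & (metric >= threshold)] = "green"
    return rag


# toy data, not taken from the real report
toy_counts = pd.Series([None, 1, 1, 4, 11, 45], index=list("abcdef"))
toy_rag = donation_rag(toy_counts)
print(toy_rag)
```

On the real data this would be called as `donation_rag(report["donation_count"])`. Because many donators share the same count, the green group may end up slightly larger than 25%.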

## Actions

To obtain 'actions' (petitions, events, ticketed events, forms, surveys, letter campaigns and call campaigns), I would need every instance of each of these so that I could count actions for every supporter. <br>
Because of the ActionNetwork subscription tier MVM has, I cannot access this data directly.<br>
ActionNetwork offers a service on its site that can be used to work around this and, given the time constraints of this project, I will make full use of it.

This will require generating reports with no-code filtering solutions and manually assessing the values to build an understanding of overall action data, which is not something MVM currently has.

After meeting with stakeholders, we decided on the following thresholds for the <a style="color: red;">R</a><a style="color: orange;">A</a><a style="color: green;">G</a> system, covering data since 01/01/25:

<a style="color: red;">Red:</a> No actions, or their number of completed actions is in the lowest quartile<br/>
<a style="color: orange;">Amber:</a> Their number of completed actions is between the first and third quartiles<br/>
<a style="color: green;">Green:</a> Their number of completed actions is in the top quartile
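
The thresholds above can be sketched as follows. This is only an illustration: it assumes the quartiles are computed over supporters with at least one action, that "lowest quartile" means a count at or below Q1, and that "top quartile" means a count at or above Q3. `action_rag` is a hypothetical helper.

```python
import pandas as pd


def action_rag(actions: pd.Series) -> pd.Series:
    """Label each supporter red/amber/green from their action count.

    Assumptions (not confirmed by the stakeholder definition):
    - quartiles are taken over supporters with at least one action
    - count <= Q1 is red, count >= Q3 is green, anything between is amber
    - if Q1 equals Q3, green takes precedence over red
    """
    active = actions[actions > 0]
    q1, q3 = active.quantile([0.25, 0.75])

    rag = pd.Series("amber", index=actions.index)
    rag[(actions == 0) | (actions <= q1)] = "red"
    rag[actions >= q3] = "green"
    return rag


# toy data, not taken from the real report
toy_actions = pd.Series([0, 1, 1, 2, 2, 3, 5])
toy_action_rag = action_rag(toy_actions)
print(toy_action_rag)
```

On the real data this would be called as `action_rag(report["actions_taken"])` once that column has been populated.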

I will also need to remove any MVM native emails, as they often have a high number of actions, which will skew the data.
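
One way this filter could look, assuming "MVM native" means addresses on the organisation's own email domain. The domain `mvm.example` is a placeholder, not the real one, and `drop_org_emails` is a hypothetical helper.

```python
import pandas as pd

# placeholder domain (assumption): replace with the organisation's real domain(s)
ORG_DOMAINS = ("mvm.example",)


def drop_org_emails(df: pd.DataFrame, domains=ORG_DOMAINS, column: str = "email") -> pd.DataFrame:
    """Return df without rows whose email address is on one of `domains`.

    Rows with a missing email are kept, since they cannot be matched either way.
    """
    # take everything after the last "@", ignoring case and stray whitespace
    domain = df[column].str.strip().str.lower().str.rsplit("@", n=1).str[-1]
    return df[~domain.isin(domains)]


# toy data, not taken from the real report
toy_report = pd.DataFrame(
    {"email": ["staff@mvm.example", "supporter@example.com", None, "Admin@MVM.EXAMPLE"]}
)
kept = drop_org_emails(toy_report)
print(kept)
```

Running this on `report` before computing the action quartiles would keep the organisation's own accounts out of the thresholds.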

In [117]:
'''
Extracting the action data and putting it into the full report as a new column 
'''

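file_path = "../data/raw/action_data/"

# add a new column to `report`, defaulting to 0 actions
report["actions_taken"] = 0

# each file holds the supporters who share one action count;
# the count is the part of the file name before the first "-"
for file_name in os.listdir(file_path):
    if file_name.startswith("."):
        continue  # skip hidden files such as .DS_Store
    action_count = int(file_name.split("-")[0])
    df = pd.read_csv(os.path.join(file_path, file_name))
    df.dropna(inplace=True)  # drops rows with a missing value in any column

    # if an email appears in several files, the file read last wins
    report.loc[report['email'].isin(df["email"]), "actions_taken"] = action_count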
    
# report.loc[report["actions_taken"] == 11, ["email", "actions_taken"]]




In [118]:
'''
Finding statistics around actions taken 
'''
actions_without_0 = report.loc[report["actions_taken"] > 0, "actions_taken"]

average_action_count = actions_without_0.mean()
median_action_count = actions_without_0.median()

action_quartiles = actions_without_0.quantile([0.25, 0.5, 0.75])

print(f"Average actions taken {average_action_count} when excluding 0 actions taken")
print(f"Median actions taken {median_action_count} when excluding 0 actions taken")

print(action_quartiles)



Average actions taken 2.6107795427417133 when excluding 0 actions taken
Median actions taken 2.0 when excluding 0 actions taken
0.25    1.0
0.50    2.0
0.75    3.0
Name: actions_taken, dtype: float64


## Evaluating Engagement 

In [None]:
'''
Add two new columns showing where each person sits in the distribution for donation engagement and for action engagement
Add a third column which combines the two somehow
'''

# total donated by each person, expressed as a percentile score from -100 to 100 and added to 'report'
donations_by_email["donation_size_score_100"] = 200 * (donations_by_email["Donation Amount"].rank(pct=True) - 0.5)

# .copy() so the subset is an independent frame rather than a view of donations_by_email
score_subset_to_join = donations_by_email[["email", "donation_size_score_100"]].copy()

report = report.merge(score_subset_to_join, on="email", how="left")

report.columns

Index(['email', 'can2_user_tags', 'uuid', 'can2_subscription_status',
       'can2_lifetime_value', 'Select your age group',
       'Please tell us the party you support',
       'Do_you_feel_the_current_government_will_Deliver_PR', 'Donation Amount',
       'donation_count', 'actions_taken', 'donation_size_score_100'],
      dtype='object')
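
The plan above leaves the action score and the combined score open. The sketch below shows one possible approach, not a decided method: it reuses the same percentile formula for actions and takes the plain mean of the two scores. Treating supporters with no donation on record as -100 is an assumption, as are the column names `action_score_100` and `engagement_score_100` and the helper `percentile_score`.

```python
import pandas as pd


def percentile_score(values: pd.Series) -> pd.Series:
    """Map values to a percentile-rank score that tops out at 100.

    Missing values stay missing; ties share their average rank.
    """
    return 200 * (values.rank(pct=True) - 0.5)


# toy data, not taken from the real report
toy = pd.DataFrame(
    {
        "email": ["a@example.com", "b@example.com", "c@example.com", "d@example.com"],
        "Donation Amount": [10.0, None, 50.0, 200.0],
        "actions_taken": [0, 2, 3, 1],
    }
)

toy["donation_size_score_100"] = percentile_score(toy["Donation Amount"])
toy["action_score_100"] = percentile_score(toy["actions_taken"])

# assumption: a supporter with no donation on record gets the minimum score of -100
score_columns = ["donation_size_score_100", "action_score_100"]
toy["engagement_score_100"] = toy[score_columns].fillna(-100).mean(axis=1)

print(toy[["email"] + score_columns + ["engagement_score_100"]])
```

A plain mean weights donations and actions equally. If stakeholders value one more than the other, a weighted mean would be a small change to the last step.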