<a href="https://colab.research.google.com/github/MashaKubyshina/solving_work_data_analytics_problems/blob/main/Metrics_WI_Vaccine_Initiative.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Quick GoToVacc data from FB ads**

**The goal of this template** is to facilitate reporting of GoToVacc (get out to vaccinate) metrics. This script pulls a customized report with key GoToVacc metrics from 2 different data sources in less than 5 minutes.
It used to take my team around 2 hours to pull this report by using Excel and going through each data sheet manually.

This template can be used for any state. In this template I am using the state of Wisconsin.

**Key questions it will help you answer**

This report will answer questions such as:


*   What is the number of vaccination response engagements?
*   What percent of users clicking on our ads are already vaccinated?
*   How many users in our audience will share the vaccination message with their family and friends on FB?


**What you need before running the code**

To use this template you will need the following data sheets downloaded on your machine:


*   Chatfuel data in csv (in the People tab, select the user segment using its unique defining attribute, then export the users with all attributes by clicking on "select all")
*   Facebook Ads data in xlsx (go to the ad campaign and create a report using only "ad set" as a parameter; don't include "ad name". Pay attention to the date range you select in FB Ads: the results will be wrong or different if the wrong dates are selected)


**Adding the data to your code**

After you have downloaded all the data to your machine, add it to your files folder by clicking on the file icon on the right. Click on the option "Upload to session storage" and upload both sheets there. The 2 data sheets will appear next to the "sample data" folder.
Next, copy the path of each sheet by clicking on the 3 dots next to each file name and clicking on "copy path". You will paste this path in the spaces provided in the top portion of this code.

Now you can go through the code and follow the instructions hashed in green.
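Before running the cells, it can help to confirm that both uploads are where the code expects them. The sketch below is not part of the original template: `missing_files` is a hypothetical helper and the two paths are placeholders, so replace them with the paths you copied.

```python
from pathlib import Path

def missing_files(paths):
    """Return the subset of paths that do not exist as files on disk."""
    return [p for p in paths if not Path(p).is_file()]

# Placeholder paths (assumptions); use the ones copied from the file browser.
expected = ["/content/chatfuel_export.csv", "/content/fb_ads_report.xlsx"]
missing = missing_files(expected)
if missing:
    print("Upload these files before running the notebook:", missing)
else:
    print("Both data sheets were found.")
```

If anything is listed as missing, upload it again; files in session storage are removed when the Colab runtime is recycled.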

In [10]:
# Import libraries

import pandas as pd
import numpy as np
from datetime import date

In [None]:
# Copy the path of the Chatfuel data by clicking on the 3 dots next to the file name
# Paste it in place of the path below (you will replace the "/content/Localyst_2021_11_10_00_09_53.csv" bit)
# If needed please read the instructions above

cf_data=pd.read_csv("/content/Localyst_2021_11_10_00_09_53.csv")
cf_data.head(5)

In [None]:
# Copy the path of the Facebook Ads data by clicking on the 3 dots next to the file name
# Paste it in place of the path below (you will replace the "/content/WI_Vaccine_FB_11_07_2021-Jun-1-2021-to-Nov-7-2021.xlsx" bit)
# If needed please read the instructions above

fb_data=pd.read_excel("/content/WI_Vaccine_FB_11_07_2021-Jun-1-2021-to-Nov-7-2021.xlsx")
fb_data.head(3)

If you have successfully copied and pasted the 2 paths, you can run the whole script. To run the script, go to "Runtime" in the top menu and click on "Run all". The script will pause when it asks you to enter your authentication to mount the drive where the produced report will be placed. You will find more instructions at that step.

In [13]:
# set bold style for headers

class style:
   BOLD = '\033[1m'
   END = '\033[0m'

In [None]:
# Drop first row from the dataframe in Facebook ads data

fb_data.drop([0], inplace = True)
fb_data.head(3)

In [None]:
# Grab the first row and make it the dataframe header in Facebook Ads data

new_header = fb_data.iloc[0] #grab the first row for the header
fb_data.columns = new_header #set the header row as the df header
fb_data.drop([1], inplace = True) # drop the row that was just promoted to the header (only use this line if it is an extra text header)
fb_data.head(10)
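The two cells above remove a title row and then promote the next row to the column header. As a minimal sketch on made-up data, the same idea can be written as one function; `promote_row_to_header` and the toy values are illustrative assumptions, not part of the original script.

```python
import pandas as pd

def promote_row_to_header(df, row_position=0):
    """Use the row at `row_position` as column names and keep only the rows after it."""
    promoted = df.iloc[row_position + 1:].copy()
    promoted.columns = df.iloc[row_position].tolist()
    return promoted.reset_index(drop=True)

# Toy export with one title row above the real header (made-up values).
raw = pd.DataFrame([
    ["Report title", None, None],
    ["Ad Set Name", "Results", "Amount Spent (USD)"],
    ["set_a", 10, 50.0],
    ["set_b", 5, 30.0],
])
clean = promote_row_to_header(raw, row_position=1)
print(clean)
```

Working by position rather than by index label keeps the step valid even if the number of title rows in the export changes.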

In [16]:
# Check columns in Facebook Ads data

fb_data.columns

Index([                        nan,               'Ad Set Name',
                   'Campaign Name',                   'Ad Name',
                               nan,           'Delivery Status',
                  'Delivery Level',             'Campaign Name',
             'Attribution Setting',               'Result Type',
                         'Results',                     'Reach',
                     'Impressions',           'Cost per Result',
                 'Quality Ranking',   'Engagement Rate Ranking',
         'Conversion Rate Ranking',        'Amount Spent (USD)',
       'New Messaging Connections',               'Link Clicks',
                'Reporting Starts',            'Reporting Ends'],
      dtype='object', name=1)

In [17]:
# Rename certain columns in Facebook Ads data to avoid spaces

fb_data = fb_data.rename(columns={
    'Ad Name':'ad_name',
    'Campaign Name':'campaign_name',
    'Ad Set Name':'adset_name',
    'New Messaging Connections':'new_messaging_connections',
    'Cost per Result':'cost_per_result',
    'Amount Spent (USD)':'amount_spent_usd',
    'Link Clicks':'link_clicks',
})
fb_data.columns

Index([                        nan,                'adset_name',
                   'campaign_name',                   'ad_name',
                               nan,           'Delivery Status',
                  'Delivery Level',             'campaign_name',
             'Attribution Setting',               'Result Type',
                         'Results',                     'Reach',
                     'Impressions',           'cost_per_result',
                 'Quality Ranking',   'Engagement Rate Ranking',
         'Conversion Rate Ranking',          'amount_spent_usd',
       'new_messaging_connections',               'link_clicks',
                'Reporting Starts',            'Reporting Ends'],
      dtype='object', name=1)

In [None]:
# Delete rows in Facebook Ads Data where the "Results" or "adset_name" value is NaN (this allows us to delete summary rows)

fb_data = fb_data.dropna(subset=['Results', 'adset_name'])
fb_data
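With its default `how='any'`, `dropna(subset=[...])` removes a row when either of the listed columns is missing, which is what removes the summary rows. A small sketch on made-up data (the values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Toy report: the first row is a summary row with no ad set name,
# the last row is an ad set that produced no results.
report = pd.DataFrame({
    "adset_name": [np.nan, "set_a", "set_b", "set_c"],
    "Results": [15, 10, 5, np.nan],
})

# A row is dropped if EITHER column is NaN (default how='any').
rows_kept = report.dropna(subset=["Results", "adset_name"])
print(rows_kept)
```

Note that an ad set with no results is dropped as well, so it contributes nothing to the spend totals calculated later.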

In [None]:
# delete adset for Racine

fb_data = fb_data.loc[fb_data['adset_name'] != 'GTVac_Racine_set']
fb_data

In [20]:
# fb results sum of all adsets

fb_results=fb_data['Results'].sum()
fb_results

3810

In [21]:
# fb mean cost per result across all adsets

fb_cost_per_result_r=fb_data['cost_per_result'].mean()
fb_cost_per_result='${:0,.2f}'.format(fb_data['cost_per_result'].mean())
fb_cost_per_result

'$5.59'

In [22]:
# alternative way to get cost per result: total spend divided by total results

fb_cost_per_result_alt=fb_data['amount_spent_usd'].sum()/fb_data['Results'].sum()
fb_cost_per_result_alt_for='${:0,.2f}'.format(fb_cost_per_result_alt)
fb_cost_per_result_alt_for

'$5.73'
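The two cost-per-result figures differ because the first is the mean of the per-adset costs, where every ad set counts equally, while the second divides total spend by total results, so ad sets with more results weigh more. A minimal sketch with made-up numbers shows the gap:

```python
import pandas as pd

# Two toy ad sets (made-up values): one large and cheap, one small and expensive.
adsets = pd.DataFrame({
    "amount_spent_usd": [100.0, 20.0],
    "Results": [50, 2],
})
adsets["cost_per_result"] = adsets["amount_spent_usd"] / adsets["Results"]

# Unweighted mean of the per-adset costs: (2.00 + 10.00) / 2
mean_of_ratios = adsets["cost_per_result"].mean()

# Total spend over total results: 120 / 52
ratio_of_totals = adsets["amount_spent_usd"].sum() / adsets["Results"].sum()

print(mean_of_ratios, ratio_of_totals)
```

The report further down uses the second figure, which matches what was actually paid per result across the whole campaign.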

In [None]:
# fb ad spend all adsets

fb_total_ad_spend_r=fb_data['amount_spent_usd'].sum()
fb_total_ad_spend='${:0,.2f}'.format(fb_data['amount_spent_usd'].sum())
fb_total_ad_spend

In [24]:
# fb new messaging conversations started

fb_new_messaging_connections=fb_data['new_messaging_connections'].sum()
fb_new_messaging_connections

3216

In [25]:
# check all unique adset names

fb_data['adset_name'].unique()

array(['GTVac_WISC_Misc_set', 'GTVac_Milwaukee_set', 'GTVac_Dane_set',
       'GTVac_Waukesha_set', 'GTVac_Brown_set', 'GTVac_Rock_set',
       'GTVac_Kenosha_set', 'GTVac_Outagamie_set', 'GTVac_Winnebago_set',
       'GTVac_WISC_100_adset', 'GTVac_La_Crosse_set',
       'GTVac_Marathon_set', 'GTVac_Fond_du_Lac_set',
       'GTVac_Washington_set', 'GTVac_Eau_Claire_set',
       'GTVac_Sheboygan_set', 'GTVac_Walworth_set'], dtype=object)

In [26]:
# Save total number of Chatfuel subscribers (counted before Racine users are removed further down)

total_cf_subscribers=len(cf_data)
total_cf_subscribers

3506

In [27]:
cf_data.shape

(3506, 483)

In [28]:
# Check the county attribute in CF

cf_data['county_localyst'].value_counts()

your state     1510
Milwaukee       435
Racine          397
Dane            211
Waukesha        125
Brown           104
Rock             99
Outagamie        92
Winnebago        69
Kenosha          65
La Crosse        64
Marathon         61
Fond du Lac      58
Eau Claire       56
Sheboygan        47
your county      44
Washington       42
Walworth         23
Name: county_localyst, dtype: int64

In [29]:
# take away Racine users from CF

cf_data = cf_data.loc[cf_data['county_localyst'] != 'Racine']
cf_data.shape

(3109, 483)

In [30]:
# Check the values for vaccination status attribute in CF

cf_data['Vaccinated_Wisc_from_ad'].value_counts()

yes    2706
no      400
Name: Vaccinated_Wisc_from_ad, dtype: int64

In [31]:
# Check the county attribute in CF to make sure there are no Racine users

cf_data['county_localyst'].value_counts()

your state     1510
Milwaukee       435
Dane            211
Waukesha        125
Brown           104
Rock             99
Outagamie        92
Winnebago        69
Kenosha          65
La Crosse        64
Marathon         61
Fond du Lac      58
Eau Claire       56
Sheboygan        47
your county      44
Washington       42
Walworth         23
Name: county_localyst, dtype: int64

In [32]:
cf_data['wi_GTVac_action'].value_counts()

already_vaccinated                      2584
not_vaccinated_yet                       323
no_action_taken_$100                      55
already_vaccinated_$100_not_eligible      43
need_vaccine_info                         27
need_vaccine_appt                         25
already_vaccinated_$100                   24
have_appointment                          14
not_vaccinated_yet_$100                    8
share_ask_by_champion                      2
no_action_taken                            1
Name: wi_GTVac_action, dtype: int64

In [33]:
cf_data['share_ask_received'].value_counts()

wi_GTVac_2021                             2649
vaccine_Racine_2021                          9
requested_vaccine_appointment_WI_GTVac       9
gotv_apr_2021_wisc                           2
cpd_voting_rights                            2
aart_legalize_test_1                         1
marijuana_2021_loc_main_phone                1
Name: share_ask_received, dtype: int64

In [34]:
cf_data['share_card_received'].value_counts()

wi_go_to_vac_backup      805
wi_GTVac_2021             26
gotv_apr_2021_wisc        13
racine_vaccine_backup      5
vaccine_Racine_2021        3
cpd_voting_rights          2
aart_legalize_test_1       1
Name: share_card_received, dtype: int64

In [35]:
# Save total number of live chat users

live_chat=sum(cf_data['live_chat_test'].value_counts())
live_chat

143

In [36]:
# Save total number of phone number asks

phones_cf=sum(cf_data['phone_number'].value_counts())
phones_cf

34

In [37]:
# Save total number of share agrees
share_agree=sum(cf_data['share_card_received'].value_counts())
share_agree

855

In [None]:
# Save a slice of data with phone numbers that are digits (not words)

phone_list=cf_data[cf_data['phone_number'].apply(lambda x: str(x).isdigit())]
phone_list

In [39]:
# Save the number of unique phone numbers submitted as digits

number_phones_submitted=len(phone_list['phone_number'].value_counts())
number_phones_submitted

19
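The digit filter keeps only values made up entirely of digits, and `len(value_counts())` then counts distinct numbers rather than rows. A small sketch with fictional phone values (all assumptions) shows both effects:

```python
import pandas as pd

# Fictional entries: a repeated number, free text, a formatted number, and a missing value.
phones = pd.Series(["4145550101", "call me", "414-555-0102", None, "4145550101"])

# str(x).isdigit() is False for text, for missing values, and for numbers with dashes.
is_digits = phones.apply(lambda x: str(x).isdigit())
digit_only = phones[is_digits]

rows_passing = len(digit_only)                   # rows that pass the filter
unique_numbers = len(digit_only.value_counts())  # distinct numbers among those rows
print(rows_passing, unique_numbers)
```

Numbers typed with dashes, spaces, or a leading "+" are excluded by this filter, so the count is a lower bound on the phone numbers actually submitted.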

In [40]:
cf_data['wi_GTVac_action'].value_counts()

already_vaccinated                      2584
not_vaccinated_yet                       323
no_action_taken_$100                      55
already_vaccinated_$100_not_eligible      43
need_vaccine_info                         27
need_vaccine_appt                         25
already_vaccinated_$100                   24
have_appointment                          14
not_vaccinated_yet_$100                    8
share_ask_by_champion                      2
no_action_taken                            1
Name: wi_GTVac_action, dtype: int64

In [41]:
# total users who reported being vaccinated (all "already_vaccinated" variants)

vaccinated_users=cf_data['wi_GTVac_action'].isin(['already_vaccinated', 'already_vaccinated_$100_not_eligible', 'already_vaccinated_$100']).sum()
vaccinated_users

2651

In [42]:
# filter for vaccinated users that clicked on "I will share"

filtered_values = cf_data[(cf_data['wi_GTVac_action']=='already_vaccinated') & (cf_data['share_card_received'] != 'not set')]
filtered_values['share_card_received'].value_counts()

wi_go_to_vac_backup      780
wi_GTVac_2021             25
gotv_apr_2021_wisc        12
racine_vaccine_backup      2
aart_legalize_test_1       1
cpd_voting_rights          1
vaccine_Racine_2021        1
Name: share_card_received, dtype: int64

In [43]:
# total vaccinated action takers

vaccinated_action_takers=sum(filtered_values['share_card_received'].value_counts())
vaccinated_action_takers

822

In [44]:
# total not vaccinated users who asked for info, asked for an appointment, or already have one

not_vac_action_takers=cf_data['wi_GTVac_action'].isin(['need_vaccine_info', 'need_vaccine_appt', 'have_appointment']).sum()
not_vac_action_takers

66

In [45]:
total_action_takers=not_vac_action_takers+vaccinated_action_takers
total_action_takers

888

In [46]:
# Create, format and save all the values we need for our report

vaccinated_cf=len(cf_data[cf_data['Vaccinated_Wisc_from_ad']== 'yes'])
not_vaccinated_cf=len(cf_data[cf_data['Vaccinated_Wisc_from_ad']== 'no'])
total_subscribers=vaccinated_cf+not_vaccinated_cf
percent_vaccinated='{:.0%}'.format(vaccinated_cf/total_subscribers)
percent_not_vaccinated='{:.0%}'.format(not_vaccinated_cf/total_subscribers)
cost_per_action_taker='${:0,.2f}'.format(fb_total_ad_spend_r/total_action_takers)
percent_per_action_taker='{:.0%}'.format(total_action_takers/total_cf_subscribers)
percent_per_action_taker_vac='{:.0%}'.format(vaccinated_action_takers/vaccinated_cf)
percent_per_action_taker_not_vac='{:.0%}'.format(not_vac_action_takers/not_vaccinated_cf)
cost_per_result_cf='${:0,.2f}'.format(fb_total_ad_spend_r/total_cf_subscribers)
cost_per_share_agree='${:0,.2f}'.format(fb_total_ad_spend_r/share_agree)
percent_share_agree='{:.0%}'.format(share_agree/total_cf_subscribers)
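The formatting above repeats two patterns: a whole-number percent and a dollar amount. As a sketch only, they could be wrapped in two small helpers that also guard against an empty denominator; `format_percent` and `format_usd` are hypothetical names, not part of the original script.

```python
def format_percent(numerator, denominator):
    """Format a ratio as a whole-number percent, or 'n/a' when the denominator is zero."""
    if denominator == 0:
        return "n/a"
    return "{:.0%}".format(numerator / denominator)

def format_usd(amount):
    """Format a number as US dollars with thousands separators and two decimals."""
    return "${:0,.2f}".format(amount)

# Counts taken from the outputs above: 2706 "yes" and 400 "no" answers.
print(format_percent(2706, 2706 + 400))
print(format_percent(400, 2706 + 400))
print(format_usd(1234.5))
```

The guard matters when the template is reused for a state or segment with no unvaccinated respondents, where the plain division would raise `ZeroDivisionError`.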

In [None]:
# initialize list of lists for the report

new_df = [
          ['Total Ad Spend (USD)', fb_total_ad_spend],
          ['Total Subscribers Acquired FB (Results)', fb_results],
          ['Total Subscribers Acquired CF', total_cf_subscribers],
          ['Cost Per Result FB Ads (USD)', fb_cost_per_result_alt_for],
          ['Cost Per Acquisition CF', cost_per_result_cf],
          ['Total Vaccinated', vaccinated_cf],
          ['% of Vaccinated', percent_vaccinated],
          ['Total Not Vaccinated', not_vaccinated_cf],
          ['% of Not Vaccinated', percent_not_vaccinated],
          ['Vaccinated Action Takers', vaccinated_action_takers],
          ['% of Vaccinated Action Takers (from Vaccinated Users)', percent_per_action_taker_vac],
          ['Not Vaccinated Action Takers', not_vac_action_takers],
          ['% of Not Vaccinated Action Takers (from Not Vaccinated Users)', percent_per_action_taker_not_vac],
          ['Total Action Takers', total_action_takers],
          ['% of Action Takers', percent_per_action_taker],
          ['Cost Per Action Taker', cost_per_action_taker],
          ['Total Agree to Relational Shares', share_agree],
          ['% of Total Agree to Relational Shares', percent_share_agree],
          ['Cost Per Agree to Relational share', cost_per_share_agree],
          ['Live Chat engagements', live_chat],
        ]
 
# Create the pandas DataFrame
GoToVacc_metrics = pd.DataFrame(new_df, columns = ['Description', 'Metric'])
GoToVacc_metrics

In [None]:
# EDA: vaccination status by county

cf_county = cf_data[['county_localyst', 'Vaccinated_Wisc_from_ad']]
breakdown_county_cf=cf_county.value_counts()
breakdown_county_cf
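`value_counts()` on two columns returns one long list of (county, status) pairs. If a county-by-status table is easier to read, `pd.crosstab` is one option. The sketch below uses the same column names on made-up rows; the `percent_yes` column is an illustrative addition.

```python
import pandas as pd

# Made-up rows using the same column names as the Chatfuel export.
toy = pd.DataFrame({
    "county_localyst": ["Dane", "Dane", "Dane", "Brown", "Brown"],
    "Vaccinated_Wisc_from_ad": ["yes", "yes", "no", "yes", "no"],
})

# Rows are counties, columns are vaccination answers, cells are user counts.
county_table = pd.crosstab(toy["county_localyst"], toy["Vaccinated_Wisc_from_ad"])
county_table["percent_yes"] = county_table["yes"] / (county_table["yes"] + county_table["no"])
print(county_table)
```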

In [None]:
# EDA: share agrees by county

cf_share = cf_data[['county_localyst', 'share_card_received']]
cf_share.value_counts()

**Here you need to click on the link, which will create an authentication token that you will paste in the space provided in the code**
Click on the link that appears after "Go to this URL in a browser:". Choose your Google account, sign in, and copy the token. Paste the token in the slot provided in the script and press "enter".

In [50]:
# Mount drive from google

from google.colab import drive
drive.mount('drive')

Mounted at drive


In [51]:
from datetime import datetime
from pytz import timezone
import pytz

def get_pst_time():
    date_format='%m_%d_%Y_%H_%M_%S_%Z'
    date = datetime.now(tz=pytz.utc)
    date = date.astimezone(timezone('US/Pacific'))
    pstDateTime=date.strftime(date_format)
    return pstDateTime


date_PDT=get_pst_time()
date_PDT

'11_09_2021_16_23_15_PST'

In [52]:
today = date.today()
today = today.strftime("%b-%d-%Y")
today

'Nov-10-2021'
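The two date strings above disagree on the day ('11_09' versus 'Nov-10') because `date.today()` uses the runtime's own clock, which judging by these outputs appears to be UTC, while `get_pst_time()` converts to Pacific time. A minimal sketch, assuming Python 3.9+ and using the standard-library `zoneinfo` module in place of `pytz`, shows one instant in both forms; `pacific_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def pacific_timestamp(now_utc=None):
    """Format a UTC instant as a US Pacific timestamp suitable for file names."""
    if now_utc is None:
        now_utc = datetime.now(tz=timezone.utc)
    pacific = now_utc.astimezone(ZoneInfo("America/Los_Angeles"))
    return pacific.strftime("%m_%d_%Y_%H_%M_%S_%Z")

# An instant shortly after midnight UTC is still the previous day in Pacific time.
instant = datetime(2021, 11, 10, 0, 23, 15, tzinfo=timezone.utc)
utc_label = instant.strftime("%b-%d-%Y")
pacific_label = pacific_timestamp(instant)
print(utc_label, pacific_label)
```

Using a single time zone for every date in the report avoids file names and report dates that are a day apart.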

In [53]:
# Export metrics report back to files
# The file will be in the same folder where you uploaded the data, and a copy is placed on the mounted drive

report_name=f'{date_PDT}-Wisconsin GoToVacc Metrics.csv'
GoToVacc_metrics.to_csv(report_name)
!cp "{report_name}" "drive/My Drive/"
