# Elections Ad Spending Analysis in Argentina using Python (2023)

#### We collected data from Google Ads on how much money political parties spent on text, video, and image ads during the 2023 elections in Argentina. In this article, we perform an exploratory data analysis (EDA) of election ad spending using Python, and look at how it impacted voting patterns.

## Data Collection

We extract the dataset from the Ads Transparency Center, which provides it as a Google Sheet, and download it in .csv format. The dataset includes the following variables: Ad_Type, Regions, Advertiser_ID, Advertiser_Name, Ad_Campaigns_List, Date_Range_Start, Date_Range_End, Num_of_Days, Impressions, Spend_USD, First_Served_Timestamp, Last_Served_Timestamp, Age_Targeting, Gender_Targeting, Geo_Targeting_Included, Geo_Targeting_Excluded, Spend_Range_Min_USD, Spend_Range_Max_USD, Spend_Range_Min_ARS and Spend_Range_Max_ARS.

We import the pandas library for DataFrame manipulation.

In [1]:
import pandas as pd

df_ads = pd.read_csv("/home/andy/Descargas/ds_googleads.csv")  # TODO: change this so that only the folder name appears
print("Shape of dataset: ", df_ads.shape)

#show the first 5 rows
df_ads.head()

Shape of dataset:  (5813, 22)


|  | Ad_ID | Ad_URL | Ad_Type | Regions | Advertiser_ID | Advertiser_Name | Ad_Campaigns_List | Date_Range_Start | Date_Range_End | Num_of_Days | ... | First_Served_Timestamp | Last_Served_Timestamp | Age_Targeting | Gender_Targeting | Geo_Targeting_Included | Geo_Targeting_Excluded | Spend_Range_Min_USD | Spend_Range_Max_USD | Spend_Range_Min_ARS | Spend_Range_Max_ARS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CR10079170796300795905 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR14823408713892626433 | Fernando Rossetto |  | 2023-04-20 | 2023-05-09 | 17 | ... | 2023-04-20T19:36:00Z | 2023-05-09T10:43:00Z |  |  | Tucuman,Argentina |  | 0 | 10000 | 30.000,00 | 45.000,00 |
| 1 | CR07383048713104523265 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR14823408713892626433 | Fernando Rossetto |  | 2023-04-12 | 2023-04-19 | 8 | ... | 2023-04-12T07:26:00Z | 2023-04-19T19:16:00Z |  |  | Tucuman,Argentina |  | 0 | 10000 | 45.000,00 | 60.000,00 |
| 2 | CR08190079252775829505 | https://adstransparency.google.com/advertiser/... | TEXT | AR | AR17825122736721100801 | Guillermo Carricavur |  | 2023-03-31 | 2023-04-15 | 15 | ... | 2023-03-31T00:21:00Z | 2023-04-15T04:09:00Z |  |  | Rio Negro,Argentina |  | 0 | 10000 | 000 | 15.000,00 |
| 3 | CR06020378872824987649 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR17825122736721100801 | Guillermo Carricavur |  | 2023-03-29 | 2023-03-29 | 2 | ... | 2023-03-29T00:20:00Z | 2023-03-29T09:13:00Z |  |  | Rio Negro,Argentina |  | 0 | 10000 | 000 | 15.000,00 |
| 4 | CR01904118294063874049 | https://adstransparency.google.com/advertiser/... | TEXT | AR | AR16558549250636513281 | NUEVAS NOTICIAS COOPERATIVA DE TRABAJO LIMITADA |  | 2022-09-12 | 2023-05-09 | 238 | ... | 2022-09-12T07:00:00Z | 2023-05-09T11:03:00Z |  |  | Argentina, Buenos Aires Province,Argentina, Bu... |  | 0 | 10000 | 000 | 15.000,00 |
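
The absolute path used in the loading cell only works on one machine. A minimal sketch of loading the file from a path relative to the project instead, assuming a hypothetical data/ folder inside the project (the folder name, the temporary directory, and the tiny stand-in CSV are illustrative and not part of the original dataset):

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical layout: <project>/data/ds_googleads.csv
# A temporary directory stands in for the project folder so the sketch runs anywhere.
project_dir = Path(tempfile.mkdtemp())
data_dir = project_dir / "data"
data_dir.mkdir()

# Tiny stand-in file using two of the real column names (illustrative values).
csv_path = data_dir / "ds_googleads.csv"
csv_path.write_text("Ad_Type,Num_of_Days\nIMAGE,17\nTEXT,15\n", encoding="utf-8")

# The code now refers only to the folder and file name, not a machine-specific path.
df_demo = pd.read_csv(csv_path)
print("Shape of dataset: ", df_demo.shape)
print(csv_path.relative_to(project_dir))
```

In a real notebook, only the last part matters: build the path with pathlib from a folder that lives next to the notebook and pass it to pd.read_csv.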


### Data Cleaning & Data Preprocessing



Let's drop the columns that contain NaN values, such as Ad_Campaigns_List, Age_Targeting, Gender_Targeting, and Geo_Targeting_Excluded.

In [2]:
print(type(df_ads))  # This should print <class 'pandas.core.frame.DataFrame'>
print(df_ads.head())  # Print the first few rows to ensure it's a valid DataFrame


<class 'pandas.core.frame.DataFrame'>
                    Ad_ID                                             Ad_URL  \
0  CR10079170796300795905  https://adstransparency.google.com/advertiser/...   
1  CR07383048713104523265  https://adstransparency.google.com/advertiser/...   
2  CR08190079252775829505  https://adstransparency.google.com/advertiser/...   
3  CR06020378872824987649  https://adstransparency.google.com/advertiser/...   
4  CR01904118294063874049  https://adstransparency.google.com/advertiser/...   

  Ad_Type Regions           Advertiser_ID  \
0   IMAGE      AR  AR14823408713892626433   
1   IMAGE      AR  AR14823408713892626433   
2    TEXT      AR  AR17825122736721100801   
3   IMAGE      AR  AR17825122736721100801   
4    TEXT      AR  AR16558549250636513281   

                                   Advertiser_Name  Ad_Campaigns_List  \
0                                Fernando Rossetto                NaN   
1                                Fernando Rossetto              

In [3]:
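# Drop every column that contains at least one NaN value
df_ads = df_ads.dropna(axis=1)
# Alternative (not used): drop only the rows with NaN in specific columns
#df_ads = df_ads.dropna(subset=['Ad_Campaigns_List', 'Age_Targeting', 'Gender_Targeting', 'Geo_Targeting_Excluded'])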
print("Shape of dataset after cleaning: ", df_ads.shape)
df_ads.head(5)

Shape of dataset after cleaning:  (5813, 16)


|  | Ad_ID | Ad_URL | Ad_Type | Regions | Advertiser_ID | Advertiser_Name | Date_Range_Start | Date_Range_End | Num_of_Days | Impressions | First_Served_Timestamp | Last_Served_Timestamp | Spend_Range_Min_USD | Spend_Range_Max_USD | Spend_Range_Min_ARS | Spend_Range_Max_ARS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | CR10079170796300795905 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR14823408713892626433 | Fernando Rossetto | 2023-04-20 | 2023-05-09 | 17 | 300000-350000 | 2023-04-20T19:36:00Z | 2023-05-09T10:43:00Z | 0 | 10000 | 30.000,00 | 45.000,00 |
| 1 | CR07383048713104523265 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR14823408713892626433 | Fernando Rossetto | 2023-04-12 | 2023-04-19 | 8 | 2000000-2250000 | 2023-04-12T07:26:00Z | 2023-04-19T19:16:00Z | 0 | 10000 | 45.000,00 | 60.000,00 |
| 2 | CR08190079252775829505 | https://adstransparency.google.com/advertiser/... | TEXT | AR | AR17825122736721100801 | Guillermo Carricavur | 2023-03-31 | 2023-04-15 | 15 | 100000-125000 | 2023-03-31T00:21:00Z | 2023-04-15T04:09:00Z | 0 | 10000 | 000 | 15.000,00 |
| 3 | CR06020378872824987649 | https://adstransparency.google.com/advertiser/... | IMAGE | AR | AR17825122736721100801 | Guillermo Carricavur | 2023-03-29 | 2023-03-29 | 2 | 6000-7000 | 2023-03-29T00:20:00Z | 2023-03-29T09:13:00Z | 0 | 10000 | 000 | 15.000,00 |
| 4 | CR01904118294063874049 | https://adstransparency.google.com/advertiser/... | TEXT | AR | AR16558549250636513281 | NUEVAS NOTICIAS COOPERATIVA DE TRABAJO LIMITADA | 2022-09-12 | 2023-05-09 | 238 | 3000-4000 | 2022-09-12T07:00:00Z | 2023-05-09T11:03:00Z | 0 | 10000 | 000 | 15.000,00 |
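
dropna(axis=1) removes every column that has at least one missing value, which is why the column count falls from 22 to 16 while all 5813 rows are kept. A minimal sketch on a toy DataFrame (the values are invented, only the column names mirror the dataset) of how to check which columns contain NaN before dropping them, and how this differs from dropping rows with subset:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Ad_Type": ["IMAGE", "TEXT", "VIDEO"],
    "Num_of_Days": [17, 8, 15],
    # Entirely missing column
    "Age_Targeting": [np.nan, np.nan, np.nan],
    # Partly missing column
    "Geo_Targeting_Included": ["Tucuman,Argentina", np.nan, "Rio Negro,Argentina"],
})

# Count missing values per column before deciding what to drop.
nan_counts = toy.isna().sum()
print(nan_counts)

# axis=1 drops COLUMNS that contain any NaN; all rows survive.
by_column = toy.dropna(axis=1)
print(by_column.shape)

# subset= drops ROWS with NaN in the listed columns; all columns survive.
by_row = toy.dropna(subset=["Geo_Targeting_Included"])
print(by_row.shape)
```

Dropping by column also discards columns that are only partly empty. This is consistent with Geo_Targeting_Included being absent from the cleaned table even though its first rows have values.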


In [4]:
print(type(df_ads.describe()))

<class 'pandas.core.frame.DataFrame'>


Now let's use the .describe() method to see summary statistics of relevant columns, for example Spend_Range_Max_ARS. First, we check the type of that column:

In [5]:
print(type(df_ads['Spend_Range_Max_ARS']))

<class 'pandas.core.series.Series'>


We try to convert its values to float:

In [6]:
# Convert the 'Spend_Range_Max_ARS' column to float
df_ads['Spend_Range_Max_ARS'] = df_ads['Spend_Range_Max_ARS'].astype(float)

# Display the updated data types
print(df_ads.dtypes)


ValueError: could not convert string to float: '45.000,00'

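The conversion fails because the values in the column 'Spend_Range_Max_ARS' are strings that use a period as the thousands separator and a comma as the decimal separator (for example, '45.000,00'). We fix it in the following way: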

In [7]:
# Remove the thousand separator and replace the decimal separator
df_ads['Spend_Range_Max_ARS'] = df_ads['Spend_Range_Max_ARS'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False)

# Convert the column to float
df_ads['Spend_Range_Max_ARS'] = df_ads['Spend_Range_Max_ARS'].astype(float)

# Display the updated DataFrame and data types
print(df_ads)
print(df_ads.dtypes)


                       Ad_ID  \
0     CR10079170796300795905   
1     CR07383048713104523265   
2     CR08190079252775829505   
3     CR06020378872824987649   
4     CR01904118294063874049   
...                      ...   
5808  CR01746279345447501825   
5809  CR06369083625454960641   
5810  CR14634596316563374081   
5811  CR03606406729039872001   
5812  CR12197502405318803457   

                                                 Ad_URL Ad_Type Regions  \
0     https://adstransparency.google.com/advertiser/...   IMAGE      AR   
1     https://adstransparency.google.com/advertiser/...   IMAGE      AR   
2     https://adstransparency.google.com/advertiser/...    TEXT      AR   
3     https://adstransparency.google.com/advertiser/...   IMAGE      AR   
4     https://adstransparency.google.com/advertiser/...    TEXT      AR   
...                                                 ...     ...     ...   
5808  https://adstransparency.google.com/advertiser/...   IMAGE      AR   
5809  https://a

In [8]:
print(df_ads['Spend_Range_Max_ARS'].describe())

count    5.813000e+03
mean     3.287115e+04
std      9.486278e+04
min      1.500000e+04
25%      1.500000e+04
50%      1.500000e+04
75%      1.500000e+04
max      3.000000e+06
Name: Spend_Range_Max_ARS, dtype: float64
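
The separator fix can be wrapped in a small reusable function so that it applies to any of the ARS columns. The sketch below uses a hypothetical helper name, parse_ars, and made-up sample values in the same format. It also shows the thousands and decimal parameters of pd.read_csv, which may let pandas parse such numbers while loading the file:

```python
import io

import pandas as pd


def parse_ars(series: pd.Series) -> pd.Series:
    """Convert strings like '45.000,00' to floats like 45000.0 (hypothetical helper)."""
    # Already numeric: return unchanged so the helper is safe to run twice.
    if pd.api.types.is_numeric_dtype(series):
        return series
    cleaned = (
        series.astype(str)
        .str.replace(".", "", regex=False)   # remove the thousands separator
        .str.replace(",", ".", regex=False)  # comma becomes the decimal point
    )
    # Values that still cannot be parsed become NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


sample = pd.DataFrame({
    "Spend_Range_Min_ARS": ["30.000,00", "45.000,00"],
    "Spend_Range_Max_ARS": ["45.000,00", "60.000,00"],
})
for col in ["Spend_Range_Min_ARS", "Spend_Range_Max_ARS"]:
    sample[col] = parse_ars(sample[col])
print(sample.dtypes)

# Alternative: let read_csv handle the separators at load time.
raw = 'Ad_Type,Spend_Range_Max_ARS\nIMAGE,"45.000,00"\nTEXT,"15.000,00"\n'
loaded = pd.read_csv(io.StringIO(raw), thousands=".", decimal=",")
print(loaded["Spend_Range_Max_ARS"].tolist())
```

The read_csv options apply to every numeric-looking column in the file, so it is worth checking the resulting dtypes before relying on them.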


To find the mode (most frequent value) of the column 'Spend_Range_Max_ARS', we count how often each value occurs:

In [11]:
df_ads['Spend_Range_Max_ARS'].value_counts()

Spend_Range_Max_ARS
15000.0      4730
30000.0       400
45000.0       166
60000.0       112
75000.0        82
200000.0       43
250000.0       39
120000.0       38
90000.0        35
105000.0       33
135000.0       29
150000.0       21
350000.0       17
300000.0       14
450000.0        9
400000.0        9
700000.0        8
500000.0        6
900000.0        6
600000.0        5
1500000.0       4
800000.0        3
3000000.0       2
1000000.0       2
Name: count, dtype: int64
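
value_counts() sorts values by frequency, so the mode is the first entry: 15000.0 appears in 4730 of the 5813 rows (about 81%). A minimal sketch, on a small made-up Series, of reading the mode directly with Series.mode() and of getting relative frequencies with normalize=True:

```python
import pandas as pd

# Illustrative values only; the column name mirrors the dataset.
spend = pd.Series(
    [15000.0, 15000.0, 15000.0, 30000.0, 45000.0],
    name="Spend_Range_Max_ARS",
)

# mode() returns a Series because several values can tie for most frequent.
mode_value = spend.mode().iloc[0]
print(mode_value)

# The first index of value_counts() is the same value, with its count.
counts = spend.value_counts()
print(counts.index[0], counts.iloc[0])

# normalize=True gives the share of rows instead of raw counts.
shares = spend.value_counts(normalize=True)
print(shares.loc[15000.0])
```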

For practical purposes, we calculate the standard deviation of the numerical columns Num_of_Days and Spend_Range_Max_ARS:

In [10]:
print(df_ads.describe().loc['std'])

Num_of_Days               24.966256
Spend_Range_Max_ARS    94862.780324
Name: std, dtype: float64
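
The same figures can be obtained by selecting the columns and calling .std() directly. pandas computes the sample standard deviation by default (ddof=1), that is, it divides by n - 1. A short sketch on illustrative values, with a manual check of the formula for one column:

```python
import math

import pandas as pd

# Illustrative values only; the column names mirror the dataset.
toy = pd.DataFrame({
    "Num_of_Days": [2, 8, 15, 17],
    "Spend_Range_Max_ARS": [15000.0, 15000.0, 45000.0, 60000.0],
})

# Standard deviation of selected numeric columns (sample std, ddof=1).
std_direct = toy[["Num_of_Days", "Spend_Range_Max_ARS"]].std()
print(std_direct)

# describe() reports the same statistic in its 'std' row.
std_from_describe = toy.describe().loc["std"]

# Manual check for one column: sqrt(sum((x - mean)^2) / (n - 1)).
days = toy["Num_of_Days"]
manual = math.sqrt(((days - days.mean()) ** 2).sum() / (len(days) - 1))
print(manual)
```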
