### Welcome to the Case Project Notebook on Digital Marketing 

**Problem Statement:**
* In the competitive landscape of online advertising, understanding the performance metrics of advertising campaigns is crucial for optimizing strategies and maximizing return on investment (ROI). 
* This dataset provides insights into the advertising performance of "Company X" from April 1, 2020, to June 30, 2020, including user engagement, ad sizes, placements, costs, clicks, revenue, and post-click conversions.

The _goal_ of this analysis is to uncover trends in user engagement, assess the impact of ad sizes on clicks, evaluate publisher placements, investigate the correlation between costs and revenue, identify high-performing campaigns, and analyze conversion rates. This will provide actionable insights for enhancing future advertising strategies and optimizing campaign performance.
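Several of these goals rest on a few per-row ratios. The sketch below is an illustration under assumed definitions, not the notebook's own method:

- CTR = `clicks / displays`
- conversion rate = `post_click_conversions / clicks`
- CPC = `cost / clicks`
- ROI = `(revenue - cost) / cost`

The helper `add_rate_metrics` is hypothetical. The three sample rows reuse values from rows 0, 1 and 3 of the `data.head(10)` output.

```python
import numpy as np
import pandas as pd


def add_rate_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with per-row rate metrics (assumed definitions)."""
    out = df.copy()
    # Turn zero denominators into NaN so empty rows give NaN rather than inf.
    displays = out["displays"].replace(0, np.nan)
    clicks = out["clicks"].replace(0, np.nan)
    cost = out["cost"].replace(0, np.nan)
    out["ctr"] = out["clicks"] / displays                      # click-through rate
    out["conversion_rate"] = out["post_click_conversions"] / clicks
    out["cpc"] = out["cost"] / clicks                          # cost per click
    out["roi"] = (out["revenue"] - out["cost"]) / cost         # assumed ROI definition
    return out


# Rows 0, 1 and 3 of data.head(10), reduced to the columns used here.
sample = pd.DataFrame({
    "displays": [4, 20170, 171259],
    "cost": [0.006, 26.7824, 216.875],
    "clicks": [0, 158, 1796],
    "revenue": [0.0, 28.9717, 329.4518],
    "post_click_conversions": [0, 23, 617],
})

metrics = add_rate_metrics(sample)
print(metrics[["ctr", "conversion_rate", "cpc", "roi"]])

# Pearson correlation between cost and revenue across rows.
print(metrics["cost"].corr(metrics["revenue"]))
```

Replacing zero denominators with NaN keeps rows such as row 0 (no clicks) in the frame without producing infinite rates. On the full dataset the same function would be applied to `data`.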

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Data Loading and Exploration:**
   - Load the dataset (`online_advertising_performance_data.csv`) into a DataFrame using Pandas.
   - Explore the structure of the dataset (number of rows and columns, data types, unique values) and identify patterns to guide the analysis.
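The exploration step can be wrapped in a small helper that reports the dtype, the non-null count and the number of distinct values for each column. This is a minimal sketch. `summarize_structure` is a hypothetical name, and a three-row hand-made frame stands in for the CSV so the snippet runs on its own. In the notebook it would be called on `data`.

```python
import pandas as pd


def summarize_structure(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "n_unique": df.nunique(),  # nunique ignores NaN by default
    })


# Hypothetical stand-in for `data`, with one missing placement.
sample = pd.DataFrame({
    "month": ["April", "April", "May"],
    "placement": ["abc", None, "def"],
    "clicks": [0, 158, 158],
})

print(sample.shape)  # (rows, columns)
print(summarize_structure(sample))
```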

In [3]:
data = pd.read_csv('./online_advertising_performance_data.csv')
#pd.set_option('display.max_columns', None)
data.head(10)

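| | month | day | campaign_number | user_engagement | banner | placement | displays | cost | clicks | revenue | post_click_conversions | post_click_sales_amount | Unnamed: 12 | Unnamed: 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | April | 1 | camp 1 | High | 160 x 600 | abc | 4 | 0.006 | 0 | 0.0 | 0 | 0.0 | NaN | NaN |
| 1 | April | 1 | camp 1 | High | 160 x 600 | def | 20170 | 26.7824 | 158 | 28.9717 | 23 | 1972.4602 | NaN | NaN |
| 2 | April | 1 | camp 1 | High | 160 x 600 | ghi | 14701 | 27.6304 | 158 | 28.9771 | 78 | 2497.2636 | NaN | NaN |
| 3 | April | 1 | camp 1 | High | 160 x 600 | mno | 171259 | 216.875 | 1796 | 329.4518 | 617 | 24625.3234 | NaN | NaN |
| 4 | April | 1 | camp 1 | Low | 160 x 600 | def | 552 | 0.067 | 1 | 0.1834 | 0 | 0.0 | NaN | NaN |
| 5 | April | 1 | camp 1 | Low | 160 x 600 | ghi | 16 | 0.0249 | 0 | 0.0 | 0 | 0.0 | NaN | NaN |
| 6 | April | 1 | camp 1 | Low | 160 x 600 | mno | 2234 | 0.4044 | 10 | 1.8347 | 3 | 101.7494 | NaN | NaN |
| 7 | April | 1 | camp 1 | Medium | 160 x 600 | def | 2963 | 1.8899 | 4 | 0.7338 | 4 | 100.5044 | NaN | NaN |
| 8 | April | 1 | camp 1 | Medium | 160 x 600 | ghi | 580 | 0.9917 | 9 | 1.6512 | 0 | 0.0 | NaN | NaN |
| 9 | April | 1 | camp 1 | Medium | 160 x 600 | mno | 20152 | 11.1678 | 185 | 33.9397 | 13 | 653.1896 | NaN | NaN |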


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15408 entries, 0 to 15407
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   month                    15408 non-null  object 
 1   day                      15408 non-null  int64  
 2   campaign_number          15408 non-null  object 
 3   user_engagement          15408 non-null  object 
 4   banner                   15408 non-null  object 
 5   placement                14995 non-null  object 
 6   displays                 15408 non-null  int64  
 7   cost                     15408 non-null  float64
 8   clicks                   15408 non-null  int64  
 9   revenue                  15408 non-null  float64
 10  post_click_conversions   15408 non-null  int64  
 11  post_click_sales_amount  15408 non-null  float64
 12  Unnamed: 12              0 non-null      float64
 13  Unnamed: 13              0 non-null      float64
dtypes: float64(5), int64(4), object(5)

In [5]:
data.describe()

| | day | displays | cost | clicks | revenue | post_click_conversions | post_click_sales_amount | Unnamed: 12 | Unnamed: 13 |
|---|---|---|---|---|---|---|---|---|---|
| count | 15408.0 | 15408.0 | 15408.0 | 15408.0 | 15408.0 | 15408.0 | 15408.0 | 0.0 | 0.0 |
| mean | 15.518886 | 15512.573014 | 11.370262 | 161.788487 | 17.929943 | 42.300623 | 2123.288058 | NaN | NaN |
| std | 8.740909 | 44392.39289 | 45.369499 | 728.276911 | 96.781834 | 213.68566 | 10523.029607 | NaN | NaN |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN |
| 25% | 8.0 | 78.0 | 0.024 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN |
| 50% | 15.0 | 1182.0 | 0.33985 | 6.0 | 0.48395 | 0.0 | 0.0 | NaN | NaN |
| 75% | 23.0 | 8960.25 | 2.536225 | 53.0 | 3.8398 | 3.0 | 163.3512 | NaN | NaN |
| max | 31.0 | 455986.0 | 556.7048 | 14566.0 | 2096.2116 | 3369.0 | 199930.318 | NaN | NaN |


In [6]:
# Count missing (NaN) values per column and show only the columns that have any
missing_values_count = data.isnull().sum()
print(missing_values_count[missing_values_count > 0])

placement        413
Unnamed: 12    15408
Unnamed: 13    15408
dtype: int64
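Raw counts are easier to judge as a share of the 15,408 rows. The 413 missing `placement` values are about 2.7% of the data, while both `Unnamed` columns are 100% empty. A minimal sketch of such a report follows. It uses a hypothetical helper, `missing_report`, and a toy frame in place of `data`.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values for columns that have any."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": 100 * counts / len(df),
    })
    # Keep only columns with at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical stand-in for `data`: one missing placement, one all-empty column.
toy = pd.DataFrame({
    "placement": ["abc", None, "def", "mno"],
    "clicks": [0, 158, 158, 1796],
    "Unnamed: 12": [np.nan] * 4,
})
print(missing_report(toy))
```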


The column listing and the `info()` output give us the data types and show where the null values are. We can now move on to Data Cleaning and Preprocessing for further analysis:
1) Remove the `Unnamed: 12` and `Unnamed: 13` columns, which contain no non-null values.
2) Handle the 413 NaN values in `placement`.
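Both cleaning steps map directly onto pandas calls. The sketch below assumes the `placement` gaps are kept and labelled `'unknown'` rather than dropped. That choice, the `'unknown'` label and the helper name `clean_ads` are illustrative assumptions. If the 413 rows should be discarded instead, `dropna(subset=["placement"])` is the alternative.

```python
import numpy as np
import pandas as pd


def clean_ads(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper following the two steps listed above."""
    # 1) Drop the two all-empty export columns; errors="ignore" skips any already gone.
    out = df.drop(columns=["Unnamed: 12", "Unnamed: 13"], errors="ignore")
    # 2) Keep rows with a missing placement but label them explicitly (assumed strategy).
    out = out.assign(placement=out["placement"].fillna("unknown"))
    return out


# Hypothetical stand-in for `data`.
raw = pd.DataFrame({
    "placement": ["abc", None, "def"],
    "clicks": [0, 158, 1796],
    "Unnamed: 12": [np.nan] * 3,
    "Unnamed: 13": [np.nan] * 3,
})
cleaned = clean_ads(raw)
print(cleaned)

# Alternative: discard rows without a placement instead of labelling them.
dropped = raw.dropna(subset=["placement"])
print(len(raw), len(cleaned), len(dropped))
```

Labelling keeps the clicks, cost and revenue of those rows in totals. Dropping them is simpler but loses those values from every later aggregate.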
