<a href="https://colab.research.google.com/github/ImCYMBIOT/NetElixir-AIgnition/blob/main/AIAdBudgetAllocator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction

In today's competitive digital landscape, understanding the customer journey and optimizing ad spend across multiple channels is crucial for maximizing conversions and improving return on investment (ROI). The challenge lies in effectively analyzing large volumes of customer interaction data and ad spend metrics to identify the most impactful channels for initiating and closing conversions.

Our solution addresses this challenge by combining exploratory data analysis with a machine learning model. We begin by collecting and processing customer journey data, covering touchpoints such as clicks, impressions, and conversions across channels like Google, Meta, and direct traffic. In parallel, we analyze ad spend data to understand how spend relates to impressions, clicks, and the resulting revenue.

We then apply a machine learning model to identify trends and patterns in channel performance at different stages of the customer journey. This shows which channels are most effective at driving conversions and where budget can be reallocated. The end result is a data-driven media investment plan that shifts budget toward the channels expected to deliver the most customer acquisitions and conversions.


## Libraries and Versions

### Libraries Used

- **pandas**: Version `1.3.3` - Used for data manipulation and analysis.
- **scikit-learn**: Version `0.24.2` - Used for implementing and evaluating the machine learning model.
- **matplotlib**: Version `3.4.3` - Used for data visualization and plotting.
- **os**: Python standard library - Used to list the dataset files and build their paths.
- **google.colab**: Preinstalled in Google Colab - Used to mount Google Drive, where the datasets are stored.

### Installation Commands

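To install the required libraries from a terminal, run:

```bash
pip install pandas==1.3.3
pip install scikit-learn==0.24.2
pip install matplotlib==3.4.3
```

In a Colab or Jupyter cell, use the `%pip` magic instead of `pip` (for example, `%pip install pandas==1.3.3`).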
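Because the versions above are pinned, it can help to confirm what the running environment actually provides before executing the later cells. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The `PINNED` mapping restates the pins listed above, and `check_pins` is a hypothetical helper that is not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Restates the pins from the installation commands above
PINNED = {"pandas": "1.3.3", "scikit-learn": "0.24.2", "matplotlib": "3.4.3"}


def check_pins(pinned):
    """Return {distribution: (pinned, installed_or_None, matches)} for each pin."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # the distribution is not installed at all
        report[dist] = (wanted, installed, installed == wanted)
    return report


for dist, (wanted, installed, ok) in check_pins(PINNED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{dist}: pinned {wanted}, installed {installed or 'not installed'} [{status}]")
```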


```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib
import pandas as pd
import os
```

## Input Section

### New Budget as Input

Specify the total budget available for reallocation. This input will be used to determine how the budget is distributed across various channels.

### Select and Read the Datasets

Load the CSV files from the two dataset folders on Google Drive. These files contain ad performance or customer interaction data and are used for the analysis and model training below.


#### Define the path to datasets

```python
import os

import pandas as pd
from google.colab import drive

# Mount Google Drive so the dataset folders are readable from the notebook
drive.mount("/content/drive")

# Total budget to reallocate across channels
total_budget = int(input("Enter New Budget: "))

# Folders holding the two datasets on Google Drive
data_dir1 = '/content/drive/My Drive/NetelixirDatasets/Netelixir AIgnition Dataset 1'
data_dir2 = '/content/drive/My Drive/NetelixirDatasets/Netelixir AIgnition Dataset 2'


# Load datasets
def load_datasets(data_dir):
    """Read every CSV file in data_dir into a dict keyed by file name."""
    datasets = {}
    for file_name in os.listdir(data_dir):
        if file_name.endswith('.csv'):
            datasets[file_name] = pd.read_csv(os.path.join(data_dir, file_name))
    return datasets


datasets1 = load_datasets(data_dir1)
datasets2 = load_datasets(data_dir2)
```
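The cell above converts the budget with `int(input(...))`, which raises `ValueError` for entries such as `200,000` or `200000.50`. A slightly more forgiving parser is sketched below. `parse_budget` is a hypothetical helper, and treating commas as thousands separators is an assumption about how budgets will be typed.

```python
import math


def parse_budget(text):
    """Parse a budget such as '200000', '200,000' or '2e5' into a positive float."""
    # Assumption: commas are thousands separators, not decimal marks
    cleaned = text.strip().replace(",", "")
    budget = float(cleaned)  # raises ValueError on non-numeric input
    if not math.isfinite(budget) or budget <= 0:
        raise ValueError(f"Budget must be a positive, finite number; got {text!r}")
    return budget
```

In the input cell, `total_budget = parse_budget(input("Enter New Budget: "))` would take the place of the `int(...)` call.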

### Display the first few rows of each dataset

```python
# Example: Display the first few rows of each dataset
for name, df in datasets1.items():
    print(f"{name} (Dataset 1):")
    print(df.head())
    print()

for name, df in datasets2.items():
    print(f"{name} (Dataset 2):")
    print(df.head())
    print()
```



googleads-performance.csv (Dataset 1):

| | Date | Campaign type | Impressions | Clicks | Cost | Conversions | Revenue |
|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | Cross-network | 143669.0 | 896.0 | 656.3 | 6.5 | 1410.3 |
| 1 | 2024-01-01 | Display Network | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2024-01-01 | Search Network | 3701.0 | 251.0 | 496.5 | 4.5 | 576.4 |
| 3 | 2024-01-01 | YouTube | 36211.0 | 8.0 | 115.2 | 0.0 | 0.0 |
| 4 | 2024-01-02 | Cross-network | 183496.0 | 1172.0 | 1525.0 | 8.8 | 3565.7 |

microsoftads-performance.csv (Dataset 1):

| | Date | Campaign type | Impressions | Clicks | Cost | Conversions |
|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | Audience | 9132.0 | 50.0 | 26.8 | 0.0 |
| 1 | 2024-01-01 | Performance max | 897.0 | 9.0 | 7.0 | 0.0 |
| 2 | 2024-01-01 | Search & content | 95977.0 | 561.0 | 846.5 | 1.0 |
| 3 | 2024-01-01 | Shopping | 59860.0 | 343.0 | 215.2 | 1.0 |

…

### Perform Data Analysis

```python
def analyze_data(datasets):
    for name, df in datasets.items():
        print(f"Analysis for {name}:")
        print("Describe:", df.describe())
        print()

analyze_data(datasets1)
analyze_data(datasets2)
```


Analysis for googleads-performance.csv:

| | Impressions | Clicks | Cost | Conversions | Revenue |
|---|---|---|---|---|---|
| count | 714.000000 | 714.000000 | 714.000000 | 714.000000 | 714.000000 |
| mean | 103249.289916 | 668.372549 | 1194.405182 | 19.543697 | 4380.389916 |
| std | 169925.328248 | 925.830005 | 1457.968112 | 23.106973 | 5466.346579 |
| min | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3614.500000 | 0.000000 | 54.600000 | 0.000000 | 0.000000 |
| 50% | 14576.500000 | 225.500000 | 414.850000 | 6.500000 | 1126.150000 |
| 75% | 101861.000000 | 870.750000 | 2127.500000 | 38.975000 | 8763.650000 |
| max | 698237.000000 | 3690.000000 | 6218.300000 | 90.200000 | 24422.700000 |

Analysis for microsoftads-performance.csv:

| | Impressions | Clicks | Cost | Conversions | Revenue |
|---|---|---|---|---|---|
| count | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 |
| mean | 45503.911234 | 227.690707 | 260.636338 | 4.460472 | 685.468516 |

…
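The summary statistics above can be turned into the efficiency ratios commonly used to compare channels: click-through rate (CTR), cost per click (CPC), conversion rate (CVR), cost per acquisition (CPA), and return on ad spend (ROAS). The sketch below uses a hypothetical helper, `channel_kpis`, that is not part of the original notebook. Every column in each `describe()` table has the same count, so the ratio of two means equals the ratio of the corresponding totals. That means the `mean` row can be passed in directly.

```python
def channel_kpis(impressions, clicks, cost, conversions, revenue):
    """Aggregate efficiency ratios; returns NaN where a denominator is zero."""

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "CTR": ratio(clicks, impressions),   # clicks per impression
        "CPC": ratio(cost, clicks),          # cost per click
        "CVR": ratio(conversions, clicks),   # conversions per click
        "CPA": ratio(cost, conversions),     # cost per conversion
        "ROAS": ratio(revenue, cost),        # revenue per unit of spend
    }


# Mean row of the Google Ads summary above (equal counts, so means behave like totals)
google_kpis = channel_kpis(
    impressions=103249.289916,
    clicks=668.372549,
    cost=1194.405182,
    conversions=19.543697,
    revenue=4380.389916,
)
for name, value in google_kpis.items():
    print(f"{name}: {value:.4f}")
```

To work from the DataFrames directly, `df[['Impressions', 'Clicks', 'Cost', 'Conversions', 'Revenue']].sum()` gives the per-channel totals to pass in.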

## Approach and Methodology

### Data Processing

- **Dataset Loading**: The datasets are loaded from Google Drive. Every CSV file in each of the two dataset folders is read into a pandas DataFrame and stored under its file name, so the data is available for analysis and modeling.
- **Data Exploration**: Initial exploration of the data is performed using methods like `head()` to inspect the first few rows, and `describe()` to generate summary statistics. This helps in understanding the distribution of data and identifying any potential issues such as missing values.
- **Data Preprocessing**: The input data is prepared for modeling by selecting the feature columns (`Impressions`, `Clicks`, `Cost`) and the target column (`Conversions`). The current code applies no further transformations, such as scaling or imputation of missing values.
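If the raw exports contain blanks or non-numeric entries, `LinearRegression.fit` will reject them. A more defensive preprocessing step would coerce the model columns to numbers and drop incomplete rows first. The sketch below assumes that approach. `prepare_model_frame` is a hypothetical helper, and dropping rows (rather than imputing them) is a design choice the original notebook does not make.

```python
import pandas as pd

FEATURE_COLS = ["Impressions", "Clicks", "Cost"]
TARGET_COL = "Conversions"


def prepare_model_frame(df, feature_cols=FEATURE_COLS, target_col=TARGET_COL):
    """Keep only the model columns, coerce them to numbers, and drop incomplete rows."""
    cols = list(feature_cols) + [target_col]
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    # Non-numeric strings become NaN instead of raising
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    return numeric.dropna().reset_index(drop=True)
```

The cleaned frame can then be passed to the model in place of the raw DataFrame.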

### Algorithm

- **Machine Learning Model**: A Linear Regression model is implemented to predict conversions based on input features such as `Impressions`, `Clicks`, and `Cost`.
  - **Model Training**: The data is split into training and testing sets using `train_test_split`. The Linear Regression model is then trained on the training data to learn the relationship between features and the target variable (`Conversions`).
  - **Prediction**: The trained model is used to predict conversions on the test data. The model’s performance is evaluated using the Mean Squared Error (MSE), which measures the average squared difference between the actual and predicted values.
  
- **Budget Allocation**: The predicted conversions are used to allocate the total budget across different channels:
  - **Minimum Budget Allocation**: A fixed minimum budget (10% of the total budget) is allocated to each channel.
  - **Remaining Budget Distribution**: The remaining budget is distributed based on the predicted conversions for each channel. Channels with higher predicted conversions receive a larger share of the remaining budget.
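Written out with notation introduced here, the allocation rule gives channel $c$ a budget $B_c$. It uses the total budget $T$, the minimum share $m = 0.10$, the number of channels $n$, and the predicted conversions $p_c$:

$$
B_c = mT + \frac{p_c}{\sum_{k=1}^{n} p_k}\,(1 - nm)\,T
$$

Summing over all channels gives $nmT + (1 - nm)T = T$, so the whole budget is spent. The rule only works while $nm \le 1$, which means at most 10 channels at $m = 0.10$.

As a worked example, take $T = 200{,}000$ and four channels with predicted conversions of 150, 100, 80 and 70 (the placeholder values used in the budget allocation cell). The first channel then receives:

$$
B_{\text{google}} = 20{,}000 + \frac{150}{400} \times 120{,}000 = 20{,}000 + 45{,}000 = 65{,}000
$$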

### Assumptions

- **Historical Performance as Predictor**: The model assumes that historical performance data (impressions, clicks, and cost) is indicative of future conversions.
- **Linear Relationship**: The Linear Regression model assumes a linear relationship between the input features and the target variable (`Conversions`).
- **Data Integrity**: It is assumed that the datasets are accurate and complete, with no significant outliers or missing values that could affect the model’s performance.
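- **Fixed Minimum Budget**: A fixed minimum budget (10% of the total) is assumed to suit every channel, regardless of its predicted performance. Each channel takes 10% up front, so the rule can fund at most 10 channels before the minimums alone use up the budget.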


## Algorithm Implementation

This section provides the code for preprocessing (selecting the model's feature and target columns), the machine learning model, and the budget allocation algorithm.

### Machine Learning Analysis


```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def perform_ml_analysis(df, target_col, feature_cols):
    """Fit a linear regression of target_col on feature_cols and report test-set MSE."""
    X = df[feature_cols]
    y = df[target_col]

    # Hold out 20% of the rows for evaluation; fixed seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    print("Mean Squared Error:", mse)
    return model

# Example: Perform ML analysis for Google Ads data
googleads1_df = datasets1['googleads-performance.csv']
model = perform_ml_analysis(googleads1_df, 'Conversions', ['Impressions', 'Clicks', 'Cost'])
```


Mean Squared Error: 76.63962585503431
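An MSE of about 76.6 corresponds to a root mean squared error of roughly 8.75 conversions per row, which is easier to read against the Conversions column in the summary statistics. Whether that is good depends on how much better the model does than simply predicting the average. The sketch below computes MSE, RMSE, R² and the MSE of a mean-only baseline from scratch. `regression_report` is a hypothetical helper; in the notebook it could be called inside `perform_ml_analysis` as `regression_report(list(y_test), list(y_pred))`.

```python
import math


def regression_report(y_true, y_pred):
    """MSE, RMSE, R^2, and the MSE of always predicting the mean of y_true."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_y = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    baseline_mse = sum((t - mean_y) ** 2 for t in y_true) / n
    # R^2 = 1 - SS_res / SS_tot; undefined when y_true is constant
    r2 = 1 - mse / baseline_mse if baseline_mse else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "baseline_mse": baseline_mse, "r2": r2}
```

An R² near zero (or negative) would mean the features add little beyond the average. scikit-learn's `sklearn.metrics.r2_score` computes the same R² value.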


### Budget Allocation Algorithm

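```python
def allocate_budget(total_budget, predictions):
    """Give every channel a fixed 10% floor, then split the rest by predicted conversions."""
    n_channels = len(predictions)
    if n_channels * 0.10 > 1:
        # With a 10% floor, the minimums alone would exceed the budget
        raise ValueError("At most 10 channels can each receive a 10% minimum.")
    min_budget = total_budget * 0.10
    remaining_budget = total_budget - (n_channels * min_budget)

    # Compute the denominator once instead of on every loop iteration
    total_predicted = sum(predictions.values())
    allocation = {}
    for channel, predicted_conversions in predictions.items():
        if total_predicted > 0:
            share = predicted_conversions / total_predicted
        else:
            share = 1 / n_channels  # no signal: split the remainder evenly
        allocation[channel] = min_budget + share * remaining_budget

    return allocation

# Example: Predictions (placeholder values)
predictions = {
    'google': 150,
    'meta': 100,
    'microsoft': 80,
    'website': 70
}

budget_allocation = allocate_budget(total_budget, predictions)
print("Budget Allocation:", budget_allocation)
```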


Budget Allocation: {'google': 65000.0, 'meta': 50000.0, 'microsoft': 44000.0, 'website': 41000.0}
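The allocation above runs on placeholder predictions. One way to connect it to the fitted model is to evaluate the regression on each channel's planned impressions, clicks and cost. The sketch below does this by hand from an intercept and coefficients; a fitted scikit-learn `LinearRegression` exposes these as `model.intercept_` and `model.coef_`. It also clips negative predictions to zero, because a linear model can predict fewer than zero conversions and a negative value would produce a negative budget share. Applying a model trained only on Google Ads rows to other channels assumes they share the same relationship. `predict_channel_conversions` and all numbers in the example are hypothetical.

```python
FEATURES = ("Impressions", "Clicks", "Cost")


def predict_channel_conversions(intercept, coefs, channel_features):
    """Evaluate intercept + coefs . features per channel, clipping negatives to 0."""
    predictions = {}
    for channel, features in channel_features.items():
        raw = intercept + sum(c * features[name] for c, name in zip(coefs, FEATURES))
        predictions[channel] = max(raw, 0.0)  # conversions cannot be negative
    return predictions


# Hypothetical coefficients and planned per-channel volumes, for illustration only
example_intercept = -1.0
example_coefs = (0.00001, 0.02, 0.001)
planned = {
    "google": {"Impressions": 100000, "Clicks": 700, "Cost": 1200},
    "microsoft": {"Impressions": 45000, "Clicks": 230, "Cost": 260},
    "website": {"Impressions": 0, "Clicks": 0, "Cost": 0},
}
print(predict_channel_conversions(example_intercept, example_coefs, planned))
```

The resulting dictionary has the same shape as `predictions`, so it can be passed to `allocate_budget` in its place.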


### Save budget allocation to a CSV file in Google Drive

```python
budget_allocation_df = pd.DataFrame(list(budget_allocation.items()), columns=['Channel', 'Allocated Budget'])
output_path = '/content/drive/My Drive/NetelixirDatasets/budget_allocation.csv'
budget_allocation_df.to_csv(output_path, index=False)

print(f"Budget allocation saved to {output_path}")
```


Budget allocation saved to /content/drive/My Drive/NetelixirDatasets/budget_allocation.csv
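As a final check, you can read the file back and confirm that the allocations add up to the entered budget. This catches rounding or copy errors before the plan is shared. The sketch uses only the standard library and relies on the `Channel` and `Allocated Budget` column names written by the cell above. `verify_saved_allocation` is a hypothetical helper.

```python
import csv
import math


def verify_saved_allocation(path, total_budget, rel_tol=1e-9):
    """Read the saved allocation CSV and check that it sums to total_budget."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    allocation = {row["Channel"]: float(row["Allocated Budget"]) for row in rows}
    saved_total = sum(allocation.values())
    if not math.isclose(saved_total, total_budget, rel_tol=rel_tol):
        raise ValueError(f"Saved allocations sum to {saved_total}, expected {total_budget}")
    return allocation
```

In the notebook, `verify_saved_allocation(output_path, total_budget)` would run right after the save cell.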
