---

# **Unicron's OptiBudget**

## **Introduction**

This notebook analyzes marketing campaign data and forecasts revenue. It combines performance data from Google Ads, Meta Ads, and Microsoft Ads with website landing data, trains classifiers to predict conversions, allocates budget across campaign types and traffic sources, and forecasts revenue with Facebook's Prophet time series model. The analysis covers data preprocessing, exploratory data analysis, forecasting, and visualization of results.

---



## **Data Loading and Preprocessing**


In [None]:
!pip install pandas scikit-learn xgboost catboost lightgbm imbalanced-learn shap matplotlib seaborn plotly statsmodels prophet "dask[dataframe]"

### Importing Required Libraries

We begin by importing the necessary Python libraries for data manipulation, visualization, and machine learning.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots


### Loading Datasets


We load multiple datasets containing information about website landings and ad performance from different platforms.


In [None]:
def load_and_preprocess_data(dataset_choice):
    base_path = f'dataset{dataset_choice}/'

    website_landings = pd.read_csv(f'{base_path}website-landings.csv')
    google_ads = pd.read_csv(f'{base_path}googleads-performance.csv')
    meta_ads = pd.read_csv(f'{base_path}metaads-performance.csv')
    microsoft_ads = pd.read_csv(f'{base_path}microsoftads-performance.csv')

    # Preprocess data
    def preprocess_data(df):
        date_columns = ['Website Landing Time', 'Date']
        for col in date_columns:
            if col in df.columns:
                df[col] = pd.to_datetime(df[col], errors='coerce')

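        for col in df.columns:
            if df[col].isna().all():
                continue  # no observed value to impute from
            # Assign the result back instead of using inplace=True on a column selection
            if df[col].dtype == 'object':
                df[col] = df[col].fillna(df[col].mode()[0])
            else:
                df[col] = df[col].fillna(df[col].median())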

        return df

    website_landings = preprocess_data(website_landings)
    google_ads = preprocess_data(google_ads)
    meta_ads = preprocess_data(meta_ads)
    microsoft_ads = preprocess_data(microsoft_ads)

    return website_landings, google_ads, meta_ads, microsoft_ads
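
The imputation step can be illustrated on a toy frame. The sketch below follows the same idea (most frequent value for text columns, median for numeric columns) but assigns the result back instead of relying on `inplace=True`. The helper name `fill_missing` and the sample values are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_missing(df):
    """Return a copy with NaNs filled: median for numeric columns, mode otherwise."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().all():
            continue  # no observed value to impute from
        if is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            fill_value = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill_value)
    return out


# Hypothetical sample data for illustration only
toy = pd.DataFrame({
    'Source': ['google', None, 'google', 'bing'],
    'Cost': [1.0, None, 3.0, 5.0],
})
filled = fill_missing(toy)
print(filled)
```

Working on a copy keeps the raw frame available for comparison, which can help when checking how much of a column was imputed.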

# Feature Engineering Function for Website Visit Data

In [None]:
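def engineer_features(df):
    # Order visits chronologically per user so cumcount() and diff() reflect visit history
    df = df.sort_values(['User Id', 'Website Landing Time']).copy()

    # Calendar features derived from the landing timestamp
    df['Hour'] = df['Website Landing Time'].dt.hour
    df['Day'] = df['Website Landing Time'].dt.day
    df['Month'] = df['Website Landing Time'].dt.month
    df['DayOfWeek'] = df['Website Landing Time'].dt.dayofweek
    df['Is_weekend'] = df['DayOfWeek'].isin([5, 6]).astype(int)
    # right=False yields bins [0, 6), [6, 12), [12, 18), [18, 24), so hour 0 is labelled 'Night'
    df['Time_of_day'] = pd.cut(df['Hour'], bins=[0, 6, 12, 18, 24], right=False,
                               labels=['Night', 'Morning', 'Afternoon', 'Evening'])

    # Per-user visit history
    df['CumulativeVisits'] = df.groupby('User Id').cumcount()
    visit_gap = df.groupby('User Id')['Website Landing Time'].diff()
    df['DaysSinceLastVisit'] = visit_gap.dt.days.fillna(0)
    # Mean gap between visits in hours; NaN for users with a single visit
    df['AvgTimeBetweenVisits'] = visit_gap.dt.total_seconds() / 3600
    df['AvgTimeBetweenVisits'] = df.groupby('User Id')['AvgTimeBetweenVisits'].transform('mean')

    return df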
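
One detail of the time-of-day binning is worth checking in isolation. `pd.cut` uses right-closed intervals by default, so with bins `[0, 6, 12, 18, 24]` the first interval is `(0, 6]` and hour 0 receives no label. The sketch below compares the default with `right=False` on a few hypothetical hours.

```python
import pandas as pd

hours = pd.Series([0, 5, 6, 12, 18, 23])  # hypothetical landing hours
bins = [0, 6, 12, 18, 24]
labels = ['Night', 'Morning', 'Afternoon', 'Evening']

# Default: intervals (0, 6], (6, 12], (12, 18], (18, 24]; hour 0 falls outside
default_cut = pd.cut(hours, bins=bins, labels=labels)

# Left-closed: intervals [0, 6), [6, 12), [12, 18), [18, 24); every hour 0-23 is covered
left_closed = pd.cut(hours, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'hour': hours, 'default': default_cut, 'right=False': left_closed}))
```

Passing `include_lowest=True` is an alternative that keeps right-closed bins while still covering hour 0.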

# Feature Preparation Function for Machine Learning Model

In [None]:
def prepare_features(df):
    categorical_columns = ['Source', 'Channel', 'Campaign Type', 'Time_of_day']
    df = pd.get_dummies(df, columns=categorical_columns, dummy_na=True)

    numeric_features = ['Hour', 'Day', 'Month', 'DayOfWeek', 'Is_weekend', 'CumulativeVisits', 'DaysSinceLastVisit', 'AvgTimeBetweenVisits']
    categorical_features = [col for col in df.columns if col.startswith(tuple(categorical_columns))]

    features = numeric_features + categorical_features
    features = [f for f in features if f in df.columns]

    X = df[features]
    y = df['Is Converted']

    return X, y, features

# Model Training and Evaluation Function

In [None]:
def train_and_evaluate_model(model, X_train, y_train, X_test, y_test, model_name):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    accuracy = accuracy_score(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_pred_proba)

    print(f"\n{model_name} Results:")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"ROC AUC Score: {roc_auc:.4f}")
    print("Classification Report:\n", classification_report(y_test, y_pred))

    return model, accuracy, roc_auc

# Budget Allocation Function for Ad Campaigns

In [None]:
def allocate_budget(model, data, ad_performance, features, total_budget):
    probabilities = model.predict_proba(data[features])[:, 1]

    roi = ad_performance.groupby('Campaign type')['Revenue'].sum() / ad_performance.groupby('Campaign type')['Cost'].sum()

    campaign_type_columns = [col for col in data.columns if col.startswith('Campaign Type_')]
    max_campaign_type = data[campaign_type_columns].idxmax(axis=1)

    campaign_roi = max_campaign_type.map(lambda x: roi.get(x.replace('Campaign Type_', ''), roi.median()))

    combined_score = probabilities * campaign_roi
    budget_allocation = combined_score / combined_score.sum() * total_budget

    return budget_allocation
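
The allocation rule in `allocate_budget` weights each row by predicted conversion probability times the campaign's revenue-to-cost ratio, then scales the weights so they sum to the total budget. A minimal numeric sketch of that rule follows; the function name `proportional_allocation` and the input values are assumptions made for illustration.

```python
import numpy as np


def proportional_allocation(probabilities, roi, total_budget):
    """Split total_budget in proportion to probability * ROI for each row."""
    scores = np.asarray(probabilities, dtype=float) * np.asarray(roi, dtype=float)
    total_score = scores.sum()
    if total_score <= 0:
        raise ValueError("Combined scores must sum to a positive value")
    return scores / total_score * total_budget


# Hypothetical inputs: three visits with their campaign's revenue-to-cost ratio
example = proportional_allocation(
    probabilities=[0.10, 0.20, 0.05],
    roi=[3.0, 1.5, 0.0],
    total_budget=1000.0,
)
print(example)
```

A campaign with a ratio of zero receives no budget under this rule, regardless of how likely its visitors are to convert.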

# Budget Allocation Visualization Function

In [None]:
def visualize_budget_allocation(budget_allocations, X_test):
    campaign_types = [col for col in X_test.columns if col.startswith('Campaign Type_')]
    campaign_budgets = {}
    for ct in campaign_types:
        active_rows = X_test[ct] == 1
        campaign_budgets[ct] = budget_allocations[active_rows].sum()

    clean_campaign_types = [ct.replace('Campaign Type_', '') for ct in campaign_budgets.keys()]

    fig = px.pie(values=list(campaign_budgets.values()), names=clean_campaign_types,
                 title="Budget Allocation by Campaign Type")
    fig.show()

# Main Execution Script for Ad Campaign Budget Allocation

In [None]:
# User input for dataset choice
dataset_choice = input("Enter the dataset number (1 or 2): ")
while dataset_choice not in ['1', '2']:
    dataset_choice = input("Invalid input. Please enter 1 or 2: ")

# Load and preprocess data
website_landings, google_ads, meta_ads, microsoft_ads = load_and_preprocess_data(dataset_choice)

# Feature engineering
website_landings = engineer_features(website_landings)

# Prepare features
X, y, features = prepare_features(website_landings)

# Remove rows with NaN values
X_clean = X.dropna()
y_clean = y[X_clean.index]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_clean, y_clean, test_size=0.2, random_state=42, stratify=y_clean)

# Scale features
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

# Train models
models = [
    (RandomForestClassifier(n_estimators=100, random_state=42), "Random Forest"),
    (XGBClassifier(n_estimators=100, random_state=42), "XGBoost"),
    (LGBMClassifier(n_estimators=100, random_state=42), "LightGBM"),
    (CatBoostClassifier(n_estimators=100, random_state=42, verbose=0), "CatBoost")
]

results = []
for model, name in models:
    result = train_and_evaluate_model(model, X_train_scaled, y_train, X_test_scaled, y_test, name)
    results.append(result)

# Select best model
best_model = max(results, key=lambda x: x[2])[0]

# User input for budget allocation
total_budget = float(input("Enter the total budget for allocation: "))







Enter the dataset number (1 or 2): 1

Random Forest Results:
Accuracy: 0.9720
ROC AUC Score: 0.7851
Classification Report:
               precision    recall  f1-score   support

           0       0.97      1.00      0.99    172577
           1       0.27      0.03      0.05      4750

    accuracy                           0.97    177327
   macro avg       0.62      0.51      0.52    177327
weighted avg       0.96      0.97      0.96    177327


XGBoost Results:
Accuracy: 0.9732
ROC AUC Score: 0.8411
Classification Report:
               precision    recall  f1-score   support

           0       0.97      1.00      0.99    172577
           1       0.52      0.00      0.01      4750

    accuracy                           0.97    177327
   macro avg       0.74      0.50      0.50    177327
weighted avg       0.96      0.97      0.96    177327

[LightGBM] [Info] Number of positive: 19002, number of negative: 690305
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhea
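
The reports above pair high accuracy with very low recall for class 1. A quick check with the support counts from those reports (172577 negatives, 4750 positives) shows what a trivial classifier that always predicts 0 would score. The variable names are illustrative.

```python
# Support counts copied from the classification reports above
negatives = 172577
positives = 4750
total = negatives + positives

# Accuracy of a baseline that predicts "not converted" for every row
baseline_accuracy = negatives / total
positive_rate = positives / total

print(f"Test rows: {total}")
print(f"Share of converted rows: {positive_rate:.4f}")
print(f"Always-negative baseline accuracy: {baseline_accuracy:.4f}")
```

The baseline lands at about 0.9732, in line with the accuracies reported above, which suggests that ROC AUC and the class-1 precision and recall are the more informative numbers for this dataset.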

# Feature Preparation and Data Processing for Ad Campaign Analysis

## Feature Preparation Function

In [None]:
def prepare_features(df):
    categorical_columns = ['Source', 'Channel', 'Campaign Type', 'Time_of_day']
    df = pd.get_dummies(df, columns=categorical_columns, dummy_na=True)

    numeric_features = ['Hour', 'Day', 'Month', 'DayOfWeek', 'Is_weekend', 'CumulativeVisits', 'DaysSinceLastVisit', 'AvgTimeBetweenVisits']
    categorical_features = [col for col in df.columns if col.startswith(tuple(categorical_columns))]

    features = numeric_features + categorical_features
    features = [f for f in features if f in df.columns]

    X = df[features]
    y = df['Is Converted']

    return X, y, features, categorical_features

# In the main execution cell:

# Prepare features
X, y, features, categorical_features = prepare_features(website_landings)

# Remove rows with NaN values
X_clean = X.dropna()
y_clean = y[X_clean.index]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_clean, y_clean, test_size=0.2, random_state=42, stratify=y_clean)

# Scale only numeric features
numeric_features = [f for f in features if f not in categorical_features]
scaler = StandardScaler()
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()
X_train_scaled[numeric_features] = scaler.fit_transform(X_train[numeric_features])
X_test_scaled[numeric_features] = scaler.transform(X_test[numeric_features])


# Ad Campaign Performance Analysis and Visualization

## Data Preparation and Aggregation


In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Combine ad performance data
ad_performance = pd.concat([google_ads, meta_ads, microsoft_ads])

# Calculate ROI as revenue per unit of cost (Revenue / Cost, i.e. return on ad spend)
roi = ad_performance.groupby('Campaign type')['Revenue'].sum() / ad_performance.groupby('Campaign type')['Cost'].sum()

# Calculate campaign performance
campaign_performance = ad_performance.groupby('Campaign type').agg({
    'Impressions': 'sum',
    'Clicks': 'sum',
    'Cost': 'sum',
    'Revenue': 'sum'
})

# Ensure we have the budget allocation data
if 'campaign_budgets' not in locals():
    print("Warning: campaign_budgets not found. Using a sample budget allocation.")
    campaign_budgets = {
        'Audience': 50000,
        'Cross-network': 30000,
        'Display Network': 20000,
        'Performance max': 40000,
        'Search & content': 25000,
        'Search Network': 20000,
        'Shopping': 10000,
        'YouTube': 5000
    }

budget_allocation = pd.Series(campaign_budgets)

# 1. ROI by Campaign Type
fig1 = go.Figure(data=[go.Bar(x=roi.index, y=roi.values)])
fig1.update_layout(
    title='ROI by Campaign Type',
    xaxis_title='Campaign Type',
    yaxis_title='ROI'
)
fig1.show()

# 2. Overall Campaign Performance
fig2 = make_subplots(rows=2, cols=2, subplot_titles=('Impressions', 'Clicks', 'Cost', 'Revenue'))

metrics = ['Impressions', 'Clicks', 'Cost', 'Revenue']
positions = [(1, 1), (1, 2), (2, 1), (2, 2)]

for metric, pos in zip(metrics, positions):
    fig2.add_trace(
        go.Bar(x=campaign_performance.index, y=campaign_performance[metric]),
        row=pos[0], col=pos[1]
    )

fig2.update_layout(height=800, width=1000, title_text="Overall Campaign Performance")
fig2.show()

# 3. Budget Allocation by Campaign Type
fig3 = go.Figure(data=[go.Bar(x=budget_allocation.index, y=budget_allocation.values)])
fig3.update_layout(
    title='Budget Allocation by Campaign Type',
    xaxis_title='Campaign Type',
    yaxis_title='Allocated Budget ($)'
)
fig3.show()

# Print some summary statistics
print("\nROI by Campaign Type:")
print(roi)

print("\nCampaign Performance Summary:")
print(campaign_performance)

print("\nBudget Allocation:")
print(budget_allocation)


ROI by Campaign Type:
Campaign type
Audience            3.240362
Cross-network       3.074635
Display Network     0.000000
Performance max     1.515460
Search & content    2.203183
Search Network      5.220708
Shopping            4.599263
YouTube             0.256410
dtype: float64

Campaign Performance Summary:
                  Impressions    Clicks      Cost    Revenue
Campaign type                                               
Audience            2078054.0   10966.0    4383.8    14205.1
Cross-network      68183796.0  385470.0  577335.4  1775095.4
Display Network        9355.0      25.0      57.6        0.0
Performance max      631902.0    6972.0   10921.5    16551.1
Search & content   17378344.0   89825.0  137903.7   303827.1
Search Network      1559847.0   90721.0  258220.7  1348094.9
Shopping           12720020.0   56402.0   34709.8   159639.5
YouTube             3966995.0    1002.0   17191.6     4408.1

Budget Allocation:
Audience               832.074056
Cross-network       1
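
The "ROI" printed above is revenue divided by cost, which is often called return on ad spend (ROAS). Net ROI subtracts cost first, so the two differ by exactly 1. The sketch below uses the Audience row from the summary table; the function names are illustrative.

```python
def roas(revenue, cost):
    """Revenue earned per unit of ad spend."""
    return revenue / cost


def net_roi(revenue, cost):
    """Profit earned per unit of ad spend."""
    return (revenue - cost) / cost


# Audience row from the campaign performance summary above
audience_revenue = 14205.1
audience_cost = 4383.8

audience_roas = roas(audience_revenue, audience_cost)
audience_roi = net_roi(audience_revenue, audience_cost)

print(f"ROAS: {audience_roas:.6f}")
print(f"Net ROI: {audience_roi:.6f}")
```

Either definition ranks campaigns in the same order, but a ROAS below 1 (a net ROI below 0) marks a campaign that earned less than it cost.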

# Budget Allocation Analysis Based on Website Traffic

## Data Loading and Preparation

In [None]:
import pandas as pd
import plotly.graph_objects as go

# Load the website landings data
dataset_choice = "1"  # or "2", depending on which dataset you're using
base_path = f'dataset{dataset_choice}/'
website_landings = pd.read_csv(f'{base_path}website-landings.csv')

# Define the mapping of sources to platforms
platform_mapping = {
    'google': 'Google Ads',
    'bing': 'Microsoft Ads',
    'facebook': 'Meta Ads',
    'instagram': 'Meta Ads',
    'youtube': 'Google Ads'
    # Add more mappings if needed
}

# Map sources to platforms
website_landings['Platform'] = website_landings['Source'].map(platform_mapping).fillna('Other')

# Calculate traffic distribution by platform
traffic_distribution = website_landings['Platform'].value_counts(normalize=True)

# Set the total budget
total_budget = 200000

# Calculate budget allocation based on traffic distribution
budget_allocation = traffic_distribution * total_budget

# Sort budget allocation from highest to lowest
budget_allocation = budget_allocation.sort_values(ascending=False)

# Create a bar graph for budget allocation by platform
fig = go.Figure(data=[go.Bar(
    x=budget_allocation.index,
    y=budget_allocation.values,
    text=[f'${cost:,.2f}' for cost in budget_allocation.values],
    textposition='auto',
)])

fig.update_layout(
    title='Budget Allocation by Platform (Total: $200,000)',
    xaxis_title='Platform',
    yaxis_title='Allocated Budget ($)',
    yaxis_tickformat='$,.0f'
)

fig.show()

# Print the budget allocation summary
print("\nBudget Allocation Summary:")
for platform, cost in budget_allocation.items():
    print(f"{platform}: ${cost:,.2f}")

# Calculate and print the percentages
print("\nBudget Allocation Percentages:")
for platform, cost in budget_allocation.items():
    percentage = (cost / total_budget) * 100
    print(f"{platform}: {percentage:.2f}%")

# Verify total budget
actual_total = budget_allocation.sum()
print(f"\nTotal Budget: ${actual_total:,.2f}")

# Additional insights
print("\nAdditional Insights:")
print(f"Total website landings: {len(website_landings)}")
print("\nTop 10 traffic sources:")
print(website_landings['Source'].value_counts().head(10))


Budget Allocation Summary:
Other: $85,654.42
Google Ads: $85,580.16
Microsoft Ads: $21,521.97
Meta Ads: $7,243.46

Budget Allocation Percentages:
Other: 42.83%
Google Ads: 42.79%
Microsoft Ads: 10.76%
Meta Ads: 3.62%

Total Budget: $200,000.00

Additional Insights:
Total website landings: 1586314

Top 10 traffic sources:
Source
google        659026
bing          170703
facebook       55139
criteo         23744
youtube        19759
tiktok         13454
pinterest      12001
taboola         4449
duckduckgo      3243
instagram       2313
Name: count, dtype: int64
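
The Google Ads share of 42.79% above matches (659026 + 19759) / 1586314, so the platform shares are computed over all landings. `map(...).fillna('Other')` places both unmapped sources and rows with a missing `Source` in the Other bucket, whereas `value_counts(normalize=True)` on the raw column drops missing values by default. The toy series below is a hypothetical illustration of the difference.

```python
import pandas as pd

# Hypothetical sources: two landings have no recorded source
sources = pd.Series(['google', 'google', 'bing', None, None, 'criteo'])
platform_mapping = {'google': 'Google Ads', 'bing': 'Microsoft Ads'}

# Shares over rows with a known source only (NaN is excluded by default)
by_source = sources.value_counts(normalize=True)

# Shares over all rows: unmapped and missing sources both become 'Other'
by_platform = sources.map(platform_mapping).fillna('Other').value_counts(normalize=True)

print(by_source)
print(by_platform)
```

Whether landings without a source should attract ad budget is a modelling decision; filtering them out before computing shares is one option.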


# Detailed Budget Allocation Analysis Based on Traffic Sources

## Data Loading and Initial Processing

In [None]:
import pandas as pd
import plotly.graph_objects as go

# Load the website landings data
dataset_choice = "1"  # or "2", depending on which dataset you're using
base_path = f'dataset{dataset_choice}/'
website_landings = pd.read_csv(f'{base_path}website-landings.csv')

# Calculate traffic distribution by source
traffic_distribution = website_landings['Source'].value_counts(normalize=True)

# Set the total budget
total_budget = 200000

# Calculate budget allocation based on traffic distribution
budget_allocation = traffic_distribution * total_budget

# Sort budget allocation from highest to lowest
budget_allocation = budget_allocation.sort_values(ascending=False)

# Select top 15 sources for visualization (to keep the graph readable)
top_15_budget = budget_allocation.head(15)

# Create a bar graph for budget allocation by source
fig = go.Figure(data=[go.Bar(
    x=top_15_budget.index,
    y=top_15_budget.values,
    text=[f'${cost:,.2f}' for cost in top_15_budget.values],
    textposition='auto',
)])

fig.update_layout(
    title='Budget Allocation by Traffic Source (Top 15, Total: $200,000)',
    xaxis_title='Traffic Source',
    yaxis_title='Allocated Budget ($)',
    yaxis_tickformat='$,.0f'
)

fig.show()

# Print the budget allocation summary
print("\nBudget Allocation Summary (Top 15):")
for source, cost in top_15_budget.items():
    print(f"{source}: ${cost:,.2f}")

# Calculate and print the percentages
print("\nBudget Allocation Percentages (Top 15):")
for source, cost in top_15_budget.items():
    percentage = (cost / total_budget) * 100
    print(f"{source}: {percentage:.2f}%")

# Verify total budget
actual_total = budget_allocation.sum()
print(f"\nTotal Budget: ${actual_total:,.2f}")

# Additional insights
print("\nAdditional Insights:")
print(f"Total website landings: {len(website_landings)}")
print(f"Total unique traffic sources: {len(traffic_distribution)}")
print("\nTop 15 traffic sources:")
print(website_landings['Source'].value_counts().head(15))

# Calculate budget for sources not in top 15
other_budget = total_budget - top_15_budget.sum()
print(f"\nBudget allocated to other sources: ${other_budget:,.2f}")


Budget Allocation Summary (Top 15):
google: $135,676.74
bing: $35,143.42
facebook: $11,351.72
criteo: $4,888.29
youtube: $4,067.88
tiktok: $2,769.84
pinterest: $2,470.70
taboola: $915.94
duckduckgo: $667.65
instagram: $476.19
yahoo!: $454.16
reddit: $357.19
outlook.com: $305.72
outbrain: $190.02
gmail: $146.79

Budget Allocation Percentages (Top 15):
google: 67.84%
bing: 17.57%
facebook: 5.68%
criteo: 2.44%
youtube: 2.03%
tiktok: 1.38%
pinterest: 1.24%
taboola: 0.46%
duckduckgo: 0.33%
instagram: 0.24%
yahoo!: 0.23%
reddit: 0.18%
outlook.com: 0.15%
outbrain: 0.10%
gmail: 0.07%

Total Budget: $200,000.00

Additional Insights:
Total website landings: 1586314
Total unique traffic sources: 54

Top 15 traffic sources:
Source
google         659026
bing           170703
facebook        55139
criteo          23744
youtube         19759
tiktok          13454
pinterest       12001
taboola          4449
duckduckgo       3243
instagram        2313
yahoo!           2206
reddit           1735
outloo

# Comprehensive Revenue Analysis and Forecast

## Data Loading and Preprocessing

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from prophet import Prophet
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Load the data
dataset_choice = "1"  # or "2", depending on which dataset you're using
base_path = f'dataset{dataset_choice}/'

website_landings = pd.read_csv(f'{base_path}website-landings.csv')
google_ads = pd.read_csv(f'{base_path}googleads-performance.csv')
meta_ads = pd.read_csv(f'{base_path}metaads-performance.csv')
microsoft_ads = pd.read_csv(f'{base_path}microsoftads-performance.csv')

# Combine ad performance data
ad_performance = pd.concat([google_ads, meta_ads, microsoft_ads])

# Convert date columns to datetime
website_landings['Website Landing Time'] = pd.to_datetime(website_landings['Website Landing Time'])
ad_performance['Date'] = pd.to_datetime(ad_performance['Date'])

# Aggregate data by date
website_daily = website_landings.groupby(website_landings['Website Landing Time'].dt.date).size().reset_index(name='Landings')
website_daily['Website Landing Time'] = pd.to_datetime(website_daily['Website Landing Time'])

ad_daily = ad_performance.groupby('Date').agg({
    'Impressions': 'sum',
    'Clicks': 'sum',
    'Cost': 'sum',
    'Revenue': 'sum'
}).reset_index()

# Merge datasets
merged_data = pd.merge(website_daily, ad_daily, left_on='Website Landing Time', right_on='Date', how='outer')
merged_data = merged_data.sort_values('Date').reset_index(drop=True)
merged_data = merged_data.dropna()

# Prepare data for Prophet
prophet_data = merged_data[['Date', 'Revenue']].rename(columns={'Date': 'ds', 'Revenue': 'y'})

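# Create and fit the model
# Note: the run below covers 182 days, so the yearly component is fitted on about six months
# of history and should be read with caution
model = Prophet(yearly_seasonality=True, weekly_seasonality=True, daily_seasonality=False)
model.fit(prophet_data)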

# Create future dates for prediction
future_dates = model.make_future_dataframe(periods=30)

# Make predictions
forecast = model.predict(future_dates)

# Create subplots
fig = make_subplots(rows=3, cols=2,
                    subplot_titles=('Correlation Heatmap', 'Historical and Forecasted Revenue',
                                    'Trend', 'Yearly Seasonality',
                                    'Weekly Seasonality', 'Forecast Components'),
                    specs=[[{"type": "heatmap"}, {"type": "scatter"}],
                           [{"type": "scatter"}, {"type": "scatter"}],
                           [{"type": "scatter"}, {"type": "scatter"}]],
                    vertical_spacing=0.1,
                    horizontal_spacing=0.05)

# 1. Correlation Heatmap
correlation_matrix = merged_data[['Landings', 'Impressions', 'Clicks', 'Cost', 'Revenue']].corr()
fig.add_trace(
    go.Heatmap(z=correlation_matrix.values,
               x=correlation_matrix.index,
               y=correlation_matrix.columns,
               colorscale='RdBu',
               zmin=-1, zmax=1),
    row=1, col=1
)

# 2. Historical and Forecasted Revenue
fig.add_trace(go.Scatter(x=merged_data['Date'], y=merged_data['Revenue'], name='Historical'), row=1, col=2)
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], name='Forecast'), row=1, col=2)
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_lower'], name='Lower Bound', line=dict(width=0)), row=1, col=2)
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_upper'], name='Upper Bound',
                         fill='tonexty', fillcolor='rgba(0,100,80,0.2)', line=dict(width=0)), row=1, col=2)

# 3. Trend
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['trend'], name='Trend'), row=2, col=1)

# 4. Yearly Seasonality
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yearly'], name='Yearly Seasonality'), row=2, col=2)

# 5. Weekly Seasonality
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['weekly'], name='Weekly Seasonality'), row=3, col=1)

# 6. Forecast Components
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['trend'], name='Trend'), row=3, col=2)
fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yearly'] + forecast['weekly'], name='Seasonality'), row=3, col=2)

# Update layout with increased width
fig.update_layout(height=1500, width=1800, title_text="Comprehensive Revenue Analysis and Forecast")

# Update x-axis labels to be more readable
for i in fig['layout']['annotations']:
    i['font'] = dict(size=12)

fig.update_xaxes(tickangle=45, tickfont=dict(size=10))

# Show the plot
fig.show()

# Compute and print in-sample performance metrics, aligning actuals and predictions on date
evaluation = prophet_data.merge(forecast[['ds', 'yhat']], on='ds', how='inner')
residuals = evaluation['y'] - evaluation['yhat']
mse = np.mean(residuals**2)
r2 = 1 - np.sum(residuals**2) / np.sum((evaluation['y'] - evaluation['y'].mean())**2)
print(f"\nMean squared error (in-sample): {mse:.2f}")
print(f"R-squared (in-sample): {r2:.4f}")


# Additional insights
print("\nAdditional Insights:")
print(f"Total days of data: {len(merged_data)}")
print(f"Date range: from {merged_data['Date'].min()} to {merged_data['Date'].max()}")
print(f"\nAverage daily revenue: ${merged_data['Revenue'].mean():.2f}")
print(f"Highest daily revenue: ${merged_data['Revenue'].max():.2f} on {merged_data.loc[merged_data['Revenue'].idxmax(), 'Date']}")
print(f"Lowest daily revenue: ${merged_data['Revenue'].min():.2f} on {merged_data.loc[merged_data['Revenue'].idxmin(), 'Date']}")

# Identify top 5 days with highest revenue
top_5_revenue_days = merged_data.nlargest(5, 'Revenue')
print("\nTop 5 days with highest revenue:")
for _, row in top_5_revenue_days.iterrows():
    print(f"{row['Date']}: ${row['Revenue']:.2f}")





Additional Insights:
Total days of data: 182
Date range: from 2024-01-01 00:00:00 to 2024-06-30 00:00:00

Average daily revenue: $22638.83
Highest daily revenue: $43241.20 on 2024-02-12 00:00:00
Lowest daily revenue: $2457.20 on 2024-01-01 00:00:00

Top 5 days with highest revenue:
2024-02-12 00:00:00: $43241.20
2024-02-13 00:00:00: $42637.20
2024-04-11 00:00:00: $42413.90
2024-04-09 00:00:00: $42327.80
2024-04-12 00:00:00: $41253.60
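
When comparing forecasts with actuals, pandas subtracts Series by index label rather than by position, so two frames with different indexes can silently produce NaN residuals. The sketch below aligns on the date column before computing MSE and R². The column names follow Prophet's `ds`, `y`, and `yhat` convention; the values are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])

# Actuals keep a non-contiguous index, as can happen after dropna()
actual = pd.DataFrame({'ds': dates, 'y': [10.0, 20.0, 30.0]}, index=[5, 7, 9])
predicted = pd.DataFrame({'ds': dates, 'yhat': [12.0, 18.0, 30.0]})

# Index-based subtraction finds no shared labels and yields only NaN
misaligned = actual['y'] - predicted['yhat']

# Aligning on the date column gives one residual per day
aligned = actual.merge(predicted, on='ds', how='inner')
residuals = aligned['y'] - aligned['yhat']

mse = float(np.mean(residuals ** 2))
ss_total = float(np.sum((aligned['y'] - aligned['y'].mean()) ** 2))
r2 = 1.0 - float(np.sum(residuals ** 2)) / ss_total

print(f"MSE: {mse:.4f}, R^2: {r2:.4f}")
```

These are in-sample figures; holding out the most recent weeks would give a more honest view of forecast accuracy.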


# Comprehensive Marketing Analytics Dashboard

## Data Loading and Preprocessing

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Load the data
dataset_choice = "1"  # or "2", depending on which dataset you're using
base_path = f'dataset{dataset_choice}/'

website_landings = pd.read_csv(f'{base_path}website-landings.csv')
google_ads = pd.read_csv(f'{base_path}googleads-performance.csv')
meta_ads = pd.read_csv(f'{base_path}metaads-performance.csv')
microsoft_ads = pd.read_csv(f'{base_path}microsoftads-performance.csv')

# Combine ad performance data
ad_performance = pd.concat([google_ads, meta_ads, microsoft_ads])

# Convert date columns to datetime
website_landings['Website Landing Time'] = pd.to_datetime(website_landings['Website Landing Time'])
ad_performance['Date'] = pd.to_datetime(ad_performance['Date'])

# Calculate traffic distribution by source
traffic_distribution = website_landings['Source'].value_counts(normalize=True)

# Set the total budget
total_budget = 200000

# Calculate budget allocation based on traffic distribution
budget_allocation = traffic_distribution * total_budget

# Sort budget allocation from highest to lowest
budget_allocation = budget_allocation.sort_values(ascending=False)

# Select top 15 sources for visualization
top_15_budget = budget_allocation.head(15)

# Aggregate data by date
website_daily = website_landings.groupby(website_landings['Website Landing Time'].dt.date).size().reset_index(name='Landings')
website_daily['Website Landing Time'] = pd.to_datetime(website_daily['Website Landing Time'])

ad_daily = ad_performance.groupby('Date').agg({
    'Impressions': 'sum',
    'Clicks': 'sum',
    'Cost': 'sum',
    'Revenue': 'sum'
}).reset_index()

# Merge datasets
merged_data = pd.merge(website_daily, ad_daily, left_on='Website Landing Time', right_on='Date', how='outer')
merged_data = merged_data.sort_values('Date').reset_index(drop=True)
merged_data = merged_data.dropna()

# Create subplots (the bottom row spans both columns and shows revenue and landings together)
fig = make_subplots(rows=2, cols=2,
                    subplot_titles=('Budget Allocation by Traffic Source (Top 15)',
                                    'Correlation Heatmap',
                                    'Revenue and Landings Over Time'),
                    specs=[[{"type": "bar"}, {"type": "heatmap"}],
                           [{"type": "scatter", "colspan": 2}, None]],
                    vertical_spacing=0.1,
                    horizontal_spacing=0.05)

# 1. Budget Allocation by Traffic Source
fig.add_trace(
    go.Bar(x=top_15_budget.index, y=top_15_budget.values,
           text=[f'${cost:,.2f}' for cost in top_15_budget.values],
           textposition='auto',
           name='Budget Allocation'),
    row=1, col=1
)

# 2. Correlation Heatmap
correlation_matrix = merged_data[['Landings', 'Impressions', 'Clicks', 'Cost', 'Revenue']].corr()
fig.add_trace(
    go.Heatmap(z=correlation_matrix.values,
               x=correlation_matrix.index,
               y=correlation_matrix.columns,
               colorscale='RdBu',
               zmin=-1, zmax=1,
               name='Correlation'),
    row=1, col=2
)

# 3. Revenue Over Time
fig.add_trace(
    go.Scatter(x=merged_data['Date'], y=merged_data['Revenue'],
               mode='lines',
               name='Revenue'),
    row=2, col=1
)

# 4. Landings Over Time (drawn on the same bottom-row axes as revenue)
fig.add_trace(
    go.Scatter(x=merged_data['Date'], y=merged_data['Landings'],
               mode='lines',
               name='Landings'),
    row=2, col=1
)

# Update layout
fig.update_layout(height=1000, width=1200, title_text="Marketing Analytics Dashboard")
fig.update_xaxes(title_text="Traffic Source", row=1, col=1)
fig.update_yaxes(title_text="Allocated Budget ($)", row=1, col=1)
fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="Value", row=2, col=1)

# Show the plot
fig.show()


# Calculate budget for sources not in top 15
other_budget = total_budget - top_15_budget.sum()
print(f"\nBudget allocated to other sources: ${other_budget:,.2f}")






Budget allocated to other sources: $117.76
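
The merge step in this notebook uses an outer join followed by `dropna()`. When the source columns contain no missing values of their own, this keeps only dates present in both tables, which is what an inner join returns directly. The frames below are hypothetical and only illustrate the equivalence.

```python
import pandas as pd

landings = pd.DataFrame({
    'Website Landing Time': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'Landings': [100, 120, 90],
})
ads = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
    'Revenue': [500.0, 450.0, 700.0],
})

# Outer join keeps unmatched dates with NaN, which dropna() then removes
outer_then_drop = (
    pd.merge(landings, ads, left_on='Website Landing Time', right_on='Date', how='outer')
    .sort_values('Date')
    .dropna()
    .reset_index(drop=True)
)

# Inner join keeps only dates present in both tables
inner = (
    pd.merge(landings, ads, left_on='Website Landing Time', right_on='Date', how='inner')
    .sort_values('Date')
    .reset_index(drop=True)
)

print(outer_then_drop)
print(inner)
```

One visible difference is that the outer join converts `Landings` to float to hold NaN, while the inner join preserves the integer dtype.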


# Campaign Performance Analysis and Budget Recommendation Dashboard

## Data Loading and Preprocessing

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Load the data
dataset_choice = "1"  # or "2", depending on which dataset you're using
base_path = f'dataset{dataset_choice}/'

google_ads = pd.read_csv(f'{base_path}googleads-performance.csv')
meta_ads = pd.read_csv(f'{base_path}metaads-performance.csv')
microsoft_ads = pd.read_csv(f'{base_path}microsoftads-performance.csv')

# Combine ad performance data
ad_performance = pd.concat([google_ads, meta_ads, microsoft_ads])

# Calculate campaign performance metrics
campaign_performance = ad_performance.groupby('Campaign type').agg({
    'Impressions': 'sum',
    'Clicks': 'sum',
    'Cost': 'sum',
    'Revenue': 'sum'
}).reset_index()

# Calculate ROI and CTR
campaign_performance['ROI'] = (campaign_performance['Revenue'] - campaign_performance['Cost']) / campaign_performance['Cost']
campaign_performance['CTR'] = campaign_performance['Clicks'] / campaign_performance['Impressions']

# Sort by ROI descending
campaign_performance = campaign_performance.sort_values('ROI', ascending=False)

# Create subplots
fig = make_subplots(rows=3, cols=2,
                    subplot_titles=('ROI by Campaign Type', 'Revenue vs. Cost',
                                    'Clicks vs. Impressions', 'CTR by Campaign Type',
                                    'Recommended Budget Allocation', 'Performance Overview'),
                    specs=[[{"type": "bar"}, {"type": "scatter"}],
                           [{"type": "scatter"}, {"type": "bar"}],
                           [{"type": "pie"}, {"type": "table"}]],
                    vertical_spacing=0.1,
                    horizontal_spacing=0.05)

# 1. ROI by Campaign Type
fig.add_trace(go.Bar(x=campaign_performance['Campaign type'], y=campaign_performance['ROI'],
                     name='ROI', marker_color='lightblue'),
              row=1, col=1)

# 2. Revenue vs. Cost
fig.add_trace(go.Scatter(x=campaign_performance['Cost'], y=campaign_performance['Revenue'],
                         mode='markers+text', text=campaign_performance['Campaign type'],
                         textposition='top center', name='Revenue vs Cost'),
              row=1, col=2)

# 3. Clicks vs. Impressions
fig.add_trace(go.Scatter(x=campaign_performance['Impressions'], y=campaign_performance['Clicks'],
                         mode='markers+text', text=campaign_performance['Campaign type'],
                         textposition='top center', name='Clicks vs Impressions'),
              row=2, col=1)

# 4. CTR by Campaign Type
fig.add_trace(go.Bar(x=campaign_performance['Campaign type'], y=campaign_performance['CTR'],
                     name='CTR', marker_color='lightgreen'),
              row=2, col=2)

# 5. Recommended Budget Allocation
# Allocate budget based on ROI, with a minimum allocation for underperforming campaigns
total_budget = 200000  # Example total budget
min_allocation = 5000  # Minimum allocation for underperforming campaigns

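# Calculate budget allocation: positive-ROI campaigns share the budget in proportion to ROI,
# all other campaigns receive the fixed minimum.
# NOTE: ROI.sum() includes the negative ROIs, so the positive shares add up to more than 1
# and the allocations can exceed total_budget.
campaign_performance['Budget Allocation'] = np.where(
    campaign_performance['ROI'] > 0,
    campaign_performance['ROI'] / campaign_performance['ROI'].sum() * (total_budget - min_allocation * sum(campaign_performance['ROI'] <= 0)),
    min_allocation
)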

fig.add_trace(go.Pie(labels=campaign_performance['Campaign type'],
                     values=campaign_performance['Budget Allocation'],
                     name='Recommended Budget Allocation'),
              row=3, col=1)

# 6. Performance Overview Table
fig.add_trace(go.Table(
    header=dict(values=['Campaign Type', 'ROI', 'Revenue', 'Cost', 'Clicks', 'Impressions', 'CTR', 'Recommended Budget'],
                fill_color='paleturquoise',
                align='left'),
    cells=dict(values=[campaign_performance['Campaign type'],
                       campaign_performance['ROI'].round(2),
                       campaign_performance['Revenue'].round(2),
                       campaign_performance['Cost'].round(2),
                       campaign_performance['Clicks'],
                       campaign_performance['Impressions'],
                       campaign_performance['CTR'].round(4),
                       campaign_performance['Budget Allocation'].round(2)],
               fill_color='lavender',
               align='left')
),
              row=3, col=2)

# Update layout
fig.update_layout(height=1800, width=1800, title_text="Campaign Performance Analysis and Budget Recommendation")
fig.update_xaxes(tickangle=45)

# Show the plot
fig.show()

# Print additional insights
print("Key Insights and Recommendations:")
print("1. Top performing campaigns by ROI:")
for i in range(3):
    campaign = campaign_performance.iloc[i]
    print(f"   {i+1}. {campaign['Campaign type']} (ROI: {campaign['ROI']:.2f})")

print("\n2. Campaigns needing attention (negative ROI):")
for _, campaign in campaign_performance[campaign_performance['ROI'] < 0].iterrows():
    print(f"   - {campaign['Campaign type']} (ROI: {campaign['ROI']:.2f})")

print("\n3. Recommended actions:")
print("   - Allocate more budget to top-performing campaigns (Search Network, Shopping, Audience)")
print("   - Optimize or reconsider strategy for underperforming campaigns (YouTube, Display Network)")
print("   - Investigate Performance max campaigns for improvement opportunities")
print("   - Consider expanding Audience campaigns given their good ROI and relatively low cost")

print("\n4. Budget Allocation Summary:")
for _, campaign in campaign_performance.iterrows():
    print(f"   - {campaign['Campaign type']}: ${campaign['Budget Allocation']:.2f}")

Key Insights and Recommendations:
1. Top performing campaigns by ROI:
   1. Search Network (ROI: 4.22)
   2. Shopping (ROI: 3.60)
   3. Audience (ROI: 2.24)

2. Campaigns needing attention (negative ROI):
   - YouTube (ROI: -0.74)
   - Display Network (ROI: -1.00)

3. Recommended actions:
   - Allocate more budget to top-performing campaigns (Search Network, Shopping, Audience)
   - Optimize or reconsider strategy for underperforming campaigns (YouTube, Display Network)
   - Investigate Performance max campaigns for improvement opportunities
   - Consider expanding Audience campaigns given their good ROI and relatively low cost

4. Budget Allocation Summary:
   - Search Network: $66220.74
   - Shopping: $56470.58
   - Audience: $35150.13
   - Cross-network: $32549.95
   - Search & content: $18877.32
   - Performance max: $8087.31
   - YouTube: $5000.00
   - Display Network: $5000.00
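
The allocations printed above add up to $227,356.03, which is more than the $200,000 `total_budget`. The ROI shares are divided by a sum that includes the negative ROIs, so the positive shares total more than 1. A minimal sketch of one possible fix follows: normalise over the positive-ROI campaigns only. The helper name `allocate_budget` is hypothetical, and the ROI values are the rounded figures from the output, so the result is illustrative rather than a rerun of the notebook.

```python
import numpy as np
import pandas as pd

def allocate_budget(roi: pd.Series, total_budget: float, min_allocation: float) -> pd.Series:
    """Split total_budget in proportion to positive ROI; other campaigns get a fixed minimum."""
    positive = roi > 0
    # Budget left after reserving the minimum for every campaign with ROI <= 0
    remaining = total_budget - min_allocation * (~positive).sum()
    # Normalise over positive ROIs only, so the shares sum to 1
    shares = roi.where(positive, 0.0) / roi[positive].sum()
    return pd.Series(np.where(positive, shares * remaining, min_allocation), index=roi.index)

# Rounded ROI values copied from the printed output (an assumption of this sketch)
roi = pd.Series(
    [4.22, 3.60, 2.24, 2.07, 1.20, 0.52, -0.74, -1.00],
    index=['Search Network', 'Shopping', 'Audience', 'Cross-network',
           'Search & content', 'Performance max', 'YouTube', 'Display Network'],
)

allocation = allocate_budget(roi, total_budget=200_000, min_allocation=5_000)
print(allocation.round(2))
print(f"Total allocated: ${allocation.sum():,.2f}")
```

The sketch assumes at least one campaign has a positive ROI; otherwise the share calculation divides by zero and would need a separate fallback.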


# Budget to Revenue Flow Visualization using Sankey Diagram

## Data Preparation

In [None]:
import plotly.graph_objects as go
import pandas as pd

df = pd.DataFrame([
    ['Total Budget', 'Search Network', 66220.74],
    ['Total Budget', 'Shopping', 56470.58],
    ['Total Budget', 'Audience', 35150.13],
    ['Total Budget', 'Cross-network', 32549.95],
    ['Total Budget', 'Search & content', 18877.32],
    ['Total Budget', 'Performance max', 8087.31],
    ['Total Budget', 'YouTube', 5000.00],
    ['Total Budget', 'Display Network', 5000.00],
    ['Search Network', 'Revenue', 1348094.9],
    ['Shopping', 'Revenue', 159639.5],
    ['Audience', 'Revenue', 14205.1],
    ['Cross-network', 'Revenue', 1775095.4],
    ['Search & content', 'Revenue', 303827.1],
    ['Performance max', 'Revenue', 16551.1],
    ['YouTube', 'Revenue', 4408.1],
    ['Display Network', 'Revenue', 0.0]
], columns=['source', 'target', 'value'])

# Node labels; Sankey link endpoints must be positions in this list
labels = ["Total Budget", "Search Network", "Shopping", "Audience", "Cross-network",
          "Search & content", "Performance max", "YouTube", "Display Network", "Revenue"]
label_index = {label: i for i, label in enumerate(labels)}

fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = labels,
      color = "blue"
    ),
    link = dict(
      # Map each source/target name to its node index (not its row position in df)
      source = df['source'].map(label_index),
      target = df['target'].map(label_index),
      value = df['value']
  ))])

fig.update_layout(title_text="Budget to Revenue Flow", font_size=10)
fig.show()

# Campaign Budget Allocation and ROI Visualization using Treemap

## Data Preparation

In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content', 'Performance max', 'YouTube', 'Display Network'],
    'Budget': [66220.74, 56470.58, 35150.13, 32549.95, 18877.32, 8087.31, 5000.00, 5000.00],
    'ROI': [4.22, 3.60, 2.24, 2.07, 1.20, 0.52, -0.74, -1.00]
})

fig = px.treemap(df, path=['Campaign'], values='Budget',
                 color='ROI', hover_data=['Budget', 'ROI'],
                 color_continuous_scale='RdYlGn',
                 title='Campaign Budget Allocation and ROI')
fig.show()

# Campaign Performance Visualization: Cost vs Revenue

## Data Preparation

In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content', 'Performance max', 'YouTube', 'Display Network'],
    'Cost': [258220.7, 34709.8, 4383.8, 577335.4, 137903.7, 10921.5, 17191.6, 57.6],
    'Revenue': [1348094.9, 159639.5, 14205.1, 1775095.4, 303827.1, 16551.1, 4408.1, 0.0],
    'Budget': [66220.74, 56470.58, 35150.13, 32549.95, 18877.32, 8087.31, 5000.00, 5000.00],
    'ROI': [4.22, 3.60, 2.24, 2.07, 1.20, 0.52, -0.74, -1.00]
})

fig = px.scatter(df, x='Cost', y='Revenue', size='Budget', color='ROI',
                 hover_name='Campaign', text='Campaign',
                 size_max=60, color_continuous_scale='RdYlGn')

fig.update_traces(textposition='top center')
fig.update_layout(title='Campaign Performance: Cost vs Revenue')
fig.show()

# Campaign Performance Comparison using Radar Chart

## Data Preparation

In [None]:
import plotly.graph_objects as go
import pandas as pd

df = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content'],
    'ROI': [4.22, 3.60, 2.24, 2.07, 1.20],
    'CTR': [0.058, 0.004, 0.005, 0.006, 0.005],
    'Conversion Rate': [0.1, 0.08, 0.06, 0.05, 0.07],
    'Cost Efficiency': [0.19, 0.22, 0.31, 0.33, 0.45]
})

fig = go.Figure()

for campaign in df['Campaign']:
    fig.add_trace(go.Scatterpolar(
        r=df.loc[df['Campaign'] == campaign, ['ROI', 'CTR', 'Conversion Rate', 'Cost Efficiency']].values[0],
        theta=['ROI', 'CTR', 'Conversion Rate', 'Cost Efficiency'],
        fill='toself',
        name=campaign
    ))

fig.update_layout(
    polar=dict(radialaxis=dict(visible=True, range=[0, 5])),
    showlegend=True,
    title='Campaign Performance Comparison'
)

fig.show()

# Campaign Performance Analysis: Impressions, Clicks, and ROI

## Data Preparation

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd

df = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content'],
    'Impressions': [1559847, 12720020, 2078054, 68183796, 17378344],
    'Clicks': [90721, 56402, 10966, 385470, 89825],
    'ROI': [4.22, 3.60, 2.24, 2.07, 1.20]
})

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Bar(x=df['Campaign'], y=df['Impressions'], name='Impressions', marker_color='lightblue'),
    secondary_y=False,
)

fig.add_trace(
    go.Bar(x=df['Campaign'], y=df['Clicks'], name='Clicks', marker_color='darkblue'),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=df['Campaign'], y=df['ROI'], name='ROI', marker_color='red'),
    secondary_y=True,
)

fig.update_layout(
    title_text='Campaign Performance: Impressions, Clicks, and ROI',
    barmode='stack'
)

fig.update_yaxes(title_text='Count', secondary_y=False)
fig.update_yaxes(title_text='ROI', secondary_y=True)

fig.show()

# Campaign Performance Analysis and Budget Allocation Model

## Key Findings and Insights

1. **Search Network** consistently demonstrates the highest ROI (4.22) and strong performance across metrics:
   - Highest click-through rate (CTR) at 5.8%
   - Highest conversion rate at 10%
   - Most cost-efficient with a cost efficiency ratio of 0.19

2. **Shopping Campaigns** show strong potential with the second-highest ROI (3.60):
   - High conversion rate (8%)
   - Good cost efficiency (0.22)
   - However, it has a low CTR (0.4%), indicating potential for optimization in ad creatives or targeting

3. **Cross-network Campaigns** generate the highest revenue but with moderate ROI (2.07):
   - Highest number of impressions (68,183,796) and clicks (385,470)
   - Moderate conversion rate (5%)
   - Further analysis needed to improve cost efficiency (0.33)

4. **Audience Campaigns** show promise with a good ROI (2.24) despite low volume:
   - Low impressions and clicks, but decent CTR (0.5%)
   - Good conversion rate (6%)
   - Potential for scaling with increased budget allocation

5. **YouTube and Display Network** campaigns are underperforming:
   - Negative ROI (-0.74 and -1.00 respectively)
   - Require immediate attention and possible strategy overhaul
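
The ROI, CTR, and cost efficiency figures quoted above can be reproduced from the cost, revenue, impression, and click totals used in the chart cells. The sketch below does that for the five campaigns that appear in all of those cells. The notebook does not define "cost efficiency"; the quoted values match cost divided by revenue (lower is better), so that definition is an inference. Conversion rates cannot be recomputed from the figures shown, so they are left out.

```python
import pandas as pd

# Totals copied from the hard-coded data frames in the chart cells
metrics = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content'],
    'Cost': [258220.7, 34709.8, 4383.8, 577335.4, 137903.7],
    'Revenue': [1348094.9, 159639.5, 14205.1, 1775095.4, 303827.1],
    'Impressions': [1559847, 12720020, 2078054, 68183796, 17378344],
    'Clicks': [90721, 56402, 10966, 385470, 89825],
}).set_index('Campaign')

# ROI: net return per unit of spend
metrics['ROI'] = (metrics['Revenue'] - metrics['Cost']) / metrics['Cost']
# CTR: clicks per impression
metrics['CTR'] = metrics['Clicks'] / metrics['Impressions']
# Assumed definition: cost per unit of revenue, so lower is better
metrics['Cost Efficiency'] = metrics['Cost'] / metrics['Revenue']

print(metrics[['ROI', 'CTR', 'Cost Efficiency']].round(3))
```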

## Budget Allocation Strategy

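The model allocates budget in proportion to ROI, with a fixed minimum for campaigns whose ROI is zero or negative. The eight allocations sum to $227,356.03, which exceeds the $200,000 `total_budget`, because the ROI shares are normalised by a sum that includes the negative ROIs. The ranking of campaigns is unaffected: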

1. Highest allocation to **Search Network** ($66,220.74) aligns with its superior performance
2. **Shopping** receives the second-highest budget ($56,470.58), reflecting its strong ROI
3. **Audience** and **Cross-network** campaigns receive moderate budgets ($35,150.13 and $32,549.95 respectively)
4. Underperforming campaigns (**YouTube** and **Display Network**) receive minimum allocations ($5,000 each) for continued testing and potential optimization

## Visualizations and Their Insights

1. **Sankey Diagram**: Clearly illustrates the flow from budget allocation to revenue generation, highlighting the efficiency of each campaign type.

2. **Treemap**: Provides an intuitive view of budget allocation in proportion to campaign performance, with color-coding for quick ROI assessment.

3. **Scatter Plot**: Effectively visualizes the relationship between cost and revenue, with additional dimensions of budget (size) and ROI (color).

4. **Radar Chart**: Offers a multi-dimensional view of campaign performance across ROI, CTR, Conversion Rate, and Cost Efficiency.

5. **Combined Bar and Line Chart**: Presents a comprehensive view of Impressions, Clicks, and ROI across campaigns, allowing for easy performance comparison.
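
The radar chart plots all four metrics on one 0 to 5 radial axis, so CTR, conversion rate, and cost efficiency (all below 0.5) sit close to the centre and are hard to compare. One option, sketched here as a suggestion rather than the notebook's method, is to min-max scale each metric to [0, 1] before plotting. The sketch also flips cost efficiency on the assumption that it is cost divided by revenue, where lower is better.

```python
import pandas as pd

# Values copied from the radar chart cell
radar = pd.DataFrame({
    'Campaign': ['Search Network', 'Shopping', 'Audience', 'Cross-network', 'Search & content'],
    'ROI': [4.22, 3.60, 2.24, 2.07, 1.20],
    'CTR': [0.058, 0.004, 0.005, 0.006, 0.005],
    'Conversion Rate': [0.1, 0.08, 0.06, 0.05, 0.07],
    'Cost Efficiency': [0.19, 0.22, 0.31, 0.33, 0.45],
}).set_index('Campaign')

# Min-max scale each metric column to [0, 1] so every axis uses its full range
scaled = (radar - radar.min()) / (radar.max() - radar.min())

# Assumption: lower cost efficiency is better, so invert it to make "larger = better" on every axis
scaled['Cost Efficiency'] = 1 - scaled['Cost Efficiency']

print(scaled.round(2))
```

Each row of `scaled` can then be passed as `r` to `go.Scatterpolar`, with the radial axis range set to `[0, 1]`.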

## Recommendations

1. **Increase investment** in Search Network and Shopping campaigns, given their high ROI and strong overall performance.

2. **Optimize Cross-network campaigns** to improve cost efficiency while maintaining high volume.

3. **Scale Audience campaigns** gradually, monitoring performance to ensure ROI remains strong with increased spend.

4. **Conduct in-depth analysis** of YouTube and Display Network campaigns to identify causes of poor performance. Consider pausing or significantly restructuring these campaigns if improvements aren't observed.

5. **Implement A/B testing** for ad creatives and targeting options, especially for campaigns with low CTR.

## Future Improvements

1. Incorporate more advanced features such as:
   - Customer Lifetime Value (CLV) to prioritize high-value customer acquisition
   - Seasonality factors to adjust budget allocation based on historical trends
   - Competitive data to identify market opportunities and threats

2. Experiment with machine learning models like Random Forest or XGBoost for more nuanced prediction of conversion probabilities.

3. Implement real-time budget allocation adjustments based on ongoing performance data.

4. Develop a more granular view of performance by breaking down campaigns into ad groups or even individual ads.

5. Integrate attribution modeling to better understand the impact of each touchpoint in the customer journey.

6. Incorporate external factors such as market trends, competitor activities, and economic indicators for more contextual budget allocation decisions.

By leveraging these insights and implementing the recommended strategies, we can expect to see improved overall campaign performance, higher ROI, and more efficient use of the marketing budget.

# Project Setup

## Requirements

1. Create a `requirements.txt` file with the following content:

       pandas==1.3.3
       numpy==1.21.2
       plotly==5.3.1
       seaborn==0.11.2
       matplotlib==3.4.3
       prophet==1.0.1
       scikit-learn==0.24.2
       xgboost==1.4.2
       lightgbm==3.2.1
       catboost==0.26.1

2. Install the required packages by running:

       pip install -r requirements.txt


## Folder Structure

Create the following folder structure in your project directory:

    project_root/
    │
    ├── dataset1/
    │   ├── website-landings.csv
    │   ├── googleads-performance.csv
    │   ├── metaads-performance.csv
    │   └── microsoftads-performance.csv
    │
    ├── dataset2/
    │   ├── website-landings.csv
    │   ├── googleads-performance.csv
    │   ├── metaads-performance.csv
    │   └── microsoftads-performance.csv
    │
    └── requirements.txt


Ensure that you have the appropriate CSV files in each dataset folder. The CSV files should contain the following data:

1. `website-landings.csv`: Data about website visits and conversions
2. `googleads-performance.csv`: Performance data for Google Ads campaigns
3. `metaads-performance.csv`: Performance data for Meta (Facebook) Ads campaigns
4. `microsoftads-performance.csv`: Performance data for Microsoft Ads campaigns
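
Before running the notebook, it can help to confirm that the chosen dataset folder contains all four files. The following is a minimal sketch using only the standard library; the helper name `missing_files` and the `root` parameter are hypothetical and not part of the notebook.

```python
from pathlib import Path

# File names taken from the list above
EXPECTED_FILES = [
    'website-landings.csv',
    'googleads-performance.csv',
    'metaads-performance.csv',
    'microsoftads-performance.csv',
]

def missing_files(dataset_choice: str, root: Path = Path('.')) -> list:
    """Return the expected CSV files that are absent from dataset{dataset_choice}/."""
    base_path = root / f'dataset{dataset_choice}'
    return [name for name in EXPECTED_FILES if not (base_path / name).is_file()]

missing = missing_files("1")
if missing:
    print(f"Missing from dataset1/: {', '.join(missing)}")
else:
    print("All expected CSV files found in dataset1/")
```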

## Usage

When running the analysis, you can choose which dataset to use by setting the `dataset_choice` variable in your Python script:

```python
dataset_choice = "1"  # use "1" for dataset1 or "2" for dataset2
base_path = f'dataset{dataset_choice}/'
```

This determines which folder (`dataset1` or `dataset2`) the script reads the CSV files from. File paths in the code are built from the `base_path` variable, so adjust them if your folders are named differently.

By following these setup instructions, you'll have the necessary folder structure, data files, and Python packages installed to run your marketing campaign analysis and budget allocation model.




