
## **Project Brief: AdEase Data Science Team - Wikipedia Page Views Forecasting**

## Project Overview
AdEase, a leading ads and marketing company, is dedicated to maximizing clicks at minimum cost for businesses. The Data Science team at AdEase is starting a project to understand and forecast daily per-page view counts for different Wikipedia pages over a 550-day period. The primary objective is to predict and optimize ad placements for clients across diverse regions, which requires insight into how ads perform on pages in different languages.

## Dataset
[Link to Dataset](https://drive.google.com/drive/folders/1mdgQscjqnCtdg7LGItomyK0abN6lcHBb)

### Data Files
1. **train_1.csv:** Contains daily page view data for 145,000 Wikipedia pages. Each row corresponds to a specific article, and each column represents a date.

2. **Exog_Campaign_eng.csv:** Binary data indicating the presence (1) or absence (0) of campaigns or significant events for specific dates, affecting views for pages in English.

## Concepts to be Tested
- Exploratory Data Analysis (EDA)
- Time Series Forecasting: ARIMA, SARIMAX, and Prophet

## Tasks

### 1. Exploratory Data Analysis
- Import the dataset and conduct exploratory analysis.
- Check dataset structure, characteristics, and null values.
- Understand the page name format and extract relevant information.
- Visualize data and draw insights.
- Convert data to a format suitable for ARIMA modeling.
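
The page-name parsing and reshaping steps above could be sketched as follows. This is a minimal illustration, assuming page names follow the pattern `<title>_<language>.wikipedia.org_<access type>_<agent>` seen in the sample rows (for example `2NE1_zh.wikipedia.org_all-access_spider`). The helper names `parse_page` and `to_long` are hypothetical, and pages hosted on other domains would not match this pattern.

```python
import re
import pandas as pd

# Assumed pattern: <title>_<lang>.wikipedia.org_<access>_<agent>
PAGE_RE = re.compile(
    r"^(?P<title>.+)_(?P<lang>[a-z]{2})\.wikipedia\.org_(?P<access>[^_]+)_(?P<agent>[^_]+)$"
)

def parse_page(page):
    """Split a page name into title, language, access type and agent."""
    match = PAGE_RE.match(page)
    if match is None:
        # Names that do not fit the assumed pattern are kept, with unknown metadata
        return {"title": page, "lang": None, "access": None, "agent": None}
    return match.groupdict()

def to_long(df):
    """Reshape the wide table (one column per date) into one row per page and date."""
    # Drop an index-like column if the CSV carries one
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    long_df = df.melt(id_vars="Page", var_name="date", value_name="views")
    long_df["date"] = pd.to_datetime(long_df["date"])
    return long_df

# Two rows and two dates copied from the sample output, for illustration only
sample = pd.DataFrame({
    "Page": [
        "2NE1_zh.wikipedia.org_all-access_spider",
        "沉默_(電影)_zh.wikipedia.org_all-access_spider",
    ],
    "2015-07-01": [18.0, 2.0],
    "2015-07-02": [11.0, 1.0],
})
meta = pd.DataFrame([parse_page(p) for p in sample["Page"]])
long_df = to_long(sample)
```

Aggregating `long_df` by language and date (for instance with `groupby`) would then give one series per language, which is the shape ARIMA-style models expect.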

### 2. Stationarity Check
- Assess data stationarity using the Dickey-Fuller test.
- Apply methods for achieving stationarity, such as decomposition and differencing.
- Plot AutoCorrelation Function (ACF) and Partial AutoCorrelation Function (PACF) plots.

### 3. Modeling
- Develop an ARIMA model for forecasting.
- Utilize the exogenous variable from Exog_Campaign_eng.csv to train a SARIMAX model.
- Implement Facebook Prophet for forecasting.
- Explore methods (e.g., grid search) to find optimal parameters for at least one modeling approach.

### 4. Evaluation
- Define functions for all tasks to ensure reproducibility.
- Compare results across different languages.
- Calculate Mean Absolute Percentage Error (MAPE) for model evaluation.
- Provide insights and recommendations based on the comparison.
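
For the evaluation metric, a minimal MAPE helper might look like the sketch below. Skipping days whose actual value is zero is an assumption made here because the percentage error is undefined for them, and the sample output does contain zero-view days; other treatments are possible.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent, over days with non-zero actuals."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0  # zero actuals make the percentage error undefined
    errors = np.abs((actual[mask] - predicted[mask]) / actual[mask])
    return float(np.mean(errors) * 100)

# Each forecast below is off by 10% of its actual value
score = mape([100, 200], [110, 180])
```

Computing this per language on the same hold-out window keeps the cross-language comparison like for like.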

## Success Metrics
- MAPE in the 4-8% range, in line with the results achieved by previous batches.
- Successful implementation of ARIMA, SARIMAX, and Prophet models.
- Clear insights and recommendations derived from the comparison.

## Team Collaboration
Regular collaboration with other teams at AdEase, especially the AI modules team, to integrate forecasting insights into the overall advertising solution.

## Deliverables
- Comprehensive EDA report.
- ARIMA, SARIMAX, and Prophet forecasting models.
- Comparison report across different languages.
- Recommendations for optimizing ad placements based on forecasting insights.

By successfully executing this project, the Data Science team at AdEase aims to enhance the precision and efficiency of ad placements, providing our clients with data-driven strategies to maximize their advertising impact across diverse regions and languages.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# !gdown --folder https://drive.google.com/drive/folders/1MkxqOoaNMmXl3sGX2mEvKyDiJzlPDS31
# The URL points to a Drive folder with multiple files, so a plain `gdown <url>` may not work; `--folder` asks gdown to download the whole folder.

Downloading...
From: https://drive.google.com/drive/folders/1MkxqOoaNMmXl3sGX2mEvKyDiJzlPDS31
To: /content/1MkxqOoaNMmXl3sGX2mEvKyDiJzlPDS31
1.09MB [00:00, 64.9MB/s]


In [13]:
df = pd.read_csv("/train_1.csv")
df

Unnamed: 0,Page,2015-07-01,2015-07-02,2015-07-03,2015-07-04,2015-07-05,2015-07-06,2015-07-07,2015-07-08,2015-07-09,...,2016-12-22,2016-12-23,2016-12-24,2016-12-25,2016-12-26,2016-12-27,2016-12-28,2016-12-29,2016-12-30,2016-12-31
0,2NE1_zh.wikipedia.org_all-access_spider,18.0,11.0,5.0,13.0,14.0,9.0,9.0,22.0,26.0,...,32.0,63.0,15.0,26.0,14.0,20.0,22.0,19.0,18.0,20.0
1,2PM_zh.wikipedia.org_all-access_spider,11.0,14.0,15.0,18.0,11.0,13.0,22.0,11.0,10.0,...,17.0,42.0,28.0,15.0,9.0,30.0,52.0,45.0,26.0,20.0
2,3C_zh.wikipedia.org_all-access_spider,1.0,0.0,1.0,1.0,0.0,4.0,0.0,3.0,4.0,...,3.0,1.0,1.0,7.0,4.0,4.0,6.0,3.0,4.0,17.0
3,4minute_zh.wikipedia.org_all-access_spider,35.0,13.0,10.0,94.0,4.0,26.0,14.0,9.0,11.0,...,32.0,10.0,26.0,27.0,16.0,11.0,17.0,19.0,10.0,11.0
4,52_Hz_I_Love_You_zh.wikipedia.org_all-access_s...,,,,,,,,,,...,48.0,9.0,25.0,13.0,3.0,11.0,27.0,13.0,36.0,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1466,世界地球日_zh.wikipedia.org_all-access_spider,40.0,14.0,8.0,3.0,3.0,8.0,3.0,5.0,11.0,...,11.0,12.0,6.0,11.0,7.0,13.0,12.0,11.0,6.0,15.0
1467,沉默_(電影)_zh.wikipedia.org_all-access_spider,2.0,1.0,2.0,6.0,4.0,3.0,2.0,3.0,10.0,...,6.0,7.0,11.0,10.0,7.0,11.0,13.0,19.0,13.0,17.0
1468,CM-11勇虎式戰車_zh.wikipedia.org_all-access_spider,3.0,2.0,8.0,5.0,1.0,3.0,4.0,6.0,3.0,...,13.0,12.0,13.0,11.0,8.0,7.0,3.0,12.0,13.0,4.0
1469,陳嘉君_(社運人士)_zh.wikipedia.org_all-access_spider,1.0,2.0,1.0,3.0,1.0,2.0,3.0,2.0,2.0,...,4.0,4.0,2.0,9.0,11.0,9.0,4.0,3.0,11.0,5.0
