# Data Scientist Hiring Challenge

## Background
An e-commerce website has hundreds of thousands of visitors every day. The visitors come from several marketing channels, such as digital campaigns on social media, referrals from publishers, organic search and CRM.


The main business here is collecting traffic from different sources/channels and converting those visitors into leads. A marketing lead is **a person who shows interest in a brand's products or services**, which makes that visitor a potential customer for the seller/service provider. The primary goal is to generate as many leads as possible and, ultimately, to increase conversion rates further down the sales funnel.


The main aim of this challenge is to develop a model that forecasts the number of leads for the next day.

 

## Datasets and Features
There is one main dataset and three auxiliary datasets:

1. Transactions dataset is the main dataset. It stores the number of leads for each minute, broken down by product and channel. The data goes back to 29.09.2020.
    
    Features:

    * pk: primary key and unique ID in the database table
    * ga_transactionid: the ID of the transaction in Google Analytics
    * ga_datehour: the time of the transaction in yyyymmddHH format
    * ga_products: name of the products (Product A, Product B, Product C, Product D, Product E, Product F)
    * ga_channels: the channel a visitor comes from (Facebook, Google Ads, Organic search, Direct, CRM)
    * ga_itemquantity: number of leads

2. Economic calendar dataset keeps a record of all events that may affect economic variables such as currency exchange rates, interest rates and stocks in the market.
    
    Features:

    * pk: primary key and unique ID in the database table  
    * date: the date of the event; records start from 28.04.2021
    * time: when the event takes place
    * country: where the event happened
    * indicator: the name of the event
    * priority: there are three levels (1, 2, 3) where 3 is the highest priority
    * exception: anticipated market impact  
    * previous: the previous market impact, either positive or negative
 

3. Economic variables dataset tracks changes in important economic variables such as USDTRY and BIST100. The variables are recorded daily at three different hours (09, 12 and 15).
    
    Features:

    * pk: primary key and unique ID in the database table
    * date: starts from 28.04.2021
    * hour: the recording hour (09, 12 or 15)
    * bist100: BIST 100 index of the Borsa Istanbul stock exchange
    * usdtry: USD/TRY exchange rate
    * eurtry: EUR/TRY exchange rate
    * eurusd: EUR/USD exchange rate
    * faiz: interest rate in Turkey
    * xau: gold price per ounce
    * brent: Brent crude oil price (the Atlantic basin benchmark)

4. Live digital campaigns dataset stores the number of live digital campaigns for each day since 29.09.2020.
    
    Features:

    * date: since 29.09.2020
    * live_campaigns: the number of live digital campaigns on that date
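
Because the transactions table is recorded at a finer granularity than the daily forecasting target, and the economic variables are sampled three times a day, a natural first step is to bring every source to one value per day. The following is a minimal sketch using only the Python standard library, assuming each transaction is a dict keyed by the column names above, that `ga_datehour` is a zero-padded `yyyymmddHH` string, and that the live campaigns table has been loaded as a mapping from date to count. The helper names (`parse_datehour`, `daily_leads_by_channel`, `join_campaigns`, `daily_mean`) and all sample values are hypothetical; in a notebook the same steps map onto pandas `to_datetime` and `groupby`.

```python
from collections import defaultdict
from datetime import date, datetime


def parse_datehour(value):
    """Parse a zero-padded ga_datehour string in yyyymmddHH format, e.g. '2020092914'."""
    return datetime.strptime(value, "%Y%m%d%H")


def daily_leads_by_channel(rows):
    """Sum ga_itemquantity per (calendar day, channel)."""
    totals = defaultdict(int)
    for row in rows:
        day = parse_datehour(row["ga_datehour"]).date()
        totals[(day, row["ga_channels"])] += int(row["ga_itemquantity"])
    return dict(totals)


def join_campaigns(daily, campaigns):
    """Attach the day's live_campaigns count to each (day, channel) total; None if missing."""
    return {
        (day, channel): {"leads": leads, "live_campaigns": campaigns.get(day)}
        for (day, channel), leads in daily.items()
    }


def daily_mean(readings):
    """Average the intraday readings (hours 09, 12, 15) of one economic variable per date."""
    grouped = defaultdict(list)
    for day, _hour, value in readings:
        grouped[day].append(value)
    return {day: sum(values) / len(values) for day, values in grouped.items()}


# Hypothetical sample rows shaped like the transactions table.
rows = [
    {"ga_datehour": "2020092909", "ga_channels": "Facebook", "ga_itemquantity": 3},
    {"ga_datehour": "2020092914", "ga_channels": "Facebook", "ga_itemquantity": 2},
    {"ga_datehour": "2020092914", "ga_channels": "CRM", "ga_itemquantity": 1},
    {"ga_datehour": "2020093010", "ga_channels": "Facebook", "ga_itemquantity": 4},
]
# Hypothetical live_campaigns counts keyed by date.
campaigns = {date(2020, 9, 29): 12, date(2020, 9, 30): 15}
# Hypothetical usdtry readings as (date, hour, value).
usdtry = [(date(2021, 4, 28), 9, 8.0), (date(2021, 4, 28), 12, 8.25), (date(2021, 4, 28), 15, 8.5)]

daily = daily_leads_by_channel(rows)
for (day, channel), values in sorted(join_campaigns(daily, campaigns).items()):
    print(day, channel, values)
print(daily_mean(usdtry))
```

Averaging the three intraday readings is only one choice; the 15:00 reading or the day-over-day change may be more informative, depending on the analysis.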


## Tasks
1. Analyse the relationship between the economic events and variables and their impact on the daily number of visitors.
2. Using the transactions dataset, forecast the next day's leads for each channel (Facebook, CRM and so on).
3. Using the number of live digital campaigns and the other auxiliary datasets, try to optimise the performance of your forecasting model (or develop a new model).
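
For Tasks 1 and 2, a reasonable starting point is a correlation check and a simple baseline that any model should beat. The following sketch assumes daily series that are already aligned by date and computes a Pearson correlation (Task 1) and a seasonal-naive next-day forecast, where tomorrow is predicted as the same weekday last week (Task 2), scored with a walk-forward mean absolute error. The weekly season, the MAE metric, the helper names and the sample numbers are all assumptions for illustration, not requirements of the challenge.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def seasonal_naive(history, season=7):
    """Forecast the next day as the value observed `season` days earlier."""
    if len(history) >= season:
        return history[-season]
    return history[-1]  # too little history: repeat the last observed value


def walk_forward_mae(series, start, season=7):
    """Forecast each day t >= start from days before t only; return the mean absolute error."""
    errors = [
        abs(series[t] - seasonal_naive(series[:t], season))
        for t in range(start, len(series))
    ]
    return mean(errors)


# Hypothetical daily leads for one channel over two weeks.
leads = [10, 12, 11, 13, 15, 8, 6, 11, 12, 12, 14, 15, 9, 6]
# Hypothetical daily economic variable aligned to the same days.
econ = [8.0, 8.1, 8.1, 8.2, 8.3, 8.2, 8.2, 8.3, 8.4, 8.4, 8.5, 8.5, 8.6, 8.6]

print("correlation:", round(pearson(econ, leads), 3))
print("walk-forward MAE:", round(walk_forward_mae(leads, start=7), 3))
print("next-day forecast:", seasonal_naive(leads))
```

For Task 3, one way to judge whether `live_campaigns` or the economic variables help is to compare a richer model's MAE against this baseline over the same walk-forward window. Correlations between trending daily series can be spurious, so checking differenced or lagged series is also worthwhile.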

## Deliverables


Write your solution in Jupyter notebooks, one per task (analysis and model development), and clearly explain what you are doing at each step.


Your Jupyter notebooks should be named in the following format: Task1.ipynb, Task2.ipynb and Task3.ipynb.


Make sure that your code is reproducible and that you document your approach and code clearly.