[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aynetdia/Budget_Constrained_Bidding/blob/master/Tutorial.ipynb) 

# Setting up the notebook in Colab

In [None]:
# Mount on GDrive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Cloning the repo (only when setting up for the first time)
%cd drive/MyDrive
!git clone https://github.com/aynetdia/budget_constrained_bidding.git

Here's a link to the dataset: https://drive.google.com/drive/folders/1YYyxGMDW0EuZA2BI-uR_2j60lpngKgy8?usp=sharing

How you set up the dataset depends on where you run the notebook. To run it locally, download the linked folder containing the dataset and put it into `/budget_constrained_bidding/data/ipinyou/`. To run it in Colab, add a GDrive shortcut (go to: Shared with me -> Right click on the dataset folder -> Add shortcut to Drive) and select the same `/budget_constrained_bidding/data/ipinyou/` folder as the location for the shortcut.
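Before moving on, it can help to confirm that the data actually ended up where the notebook expects it. The following is a minimal sketch, not part of the project code: the helper name `dataset_is_in_place` is hypothetical, and the relative path assumes the current working directory is the one that contains the cloned repository.

```python
from pathlib import Path


def dataset_is_in_place(data_dir):
    """Return True if data_dir exists and contains at least one entry."""
    path = Path(data_dir)
    return path.is_dir() and any(path.iterdir())


# Assumed location, relative to the directory that contains the cloned repo.
DATA_DIR = Path("budget_constrained_bidding") / "data" / "ipinyou"
print(f"Dataset found in {DATA_DIR}: {dataset_is_in_place(DATA_DIR)}")
```

If the check prints `False`, the folder or shortcut is most likely in a different location than the one assumed here.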

In [None]:
# Go into the project folder and pull the latest changes (do this every time before running the notebook)
# Note: %cd changes the notebook's working directory; !cd would only affect a temporary subshell
%cd budget_constrained_bidding
!git pull

In [36]:
import sys
import os

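module_path = os.path.abspath(os.path.join('..'))
# Build the path with os.path.join so that it works on Windows, Linux, and macOS
sys.path.append(os.path.join(module_path, 'budget_constrained_bidding', 'src',
                             'gym-auction_emulator', 'gym_auction_emulator', 'envs'))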
from split_time_interval import Split
import configparser
import pandas as pd



# Table of Contents

1. Introduction to the application domain
2. Methods
3. DRLB framework
4. Implementation
5. Results

# 1. Introduction

# 2. Methods

## 2.1 Introduction and Methods Comparison


Reinforcement learning (RL) is an approach to sequential decision-making and a highly active research field within machine learning. It is widely applied, for example to real-time bidding in advertising auctions.

RL addresses a control problem in which an agent sequentially interacts with an unknown environment to maximize its cumulative reward [1]. It resembles the way humans learn in dynamic environments without supervision: the agent repeatedly interacts with the environment and learns which action is best in each state. Roughly speaking, the agent’s goal is to collect as much reward as possible over the long run.

For instance, playing a board game is a classic dynamic decision-making process. Players interact with their opponents, and the game combines immediate rewards with intuitive judgments; every action affects the final outcome. The cumulative reward decides whether the player wins or loses [1].

In practice, such scenarios comprise several subproblems that different classes of methods can solve, but not all of these methods pursue the objective of maximizing reward. In RL, each action the agent takes affects not only the immediate reward but also the future states. In other words, RL deals with a dynamic decision-making problem: the agent continually receives rewards and penalties, adjusts its behavior according to the environment's feedback, and ultimately aims to maximize the cumulative reward.
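To make "cumulative reward" concrete, here is a minimal sketch of how a sequence of rewards is often aggregated into a single number. It assumes a discount factor `gamma`, a common convention in RL that is not introduced in the text above; with `gamma=1.0` the function reduces to a plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_k gamma**k * rewards[k], the discounted cumulative reward.

    gamma is an assumed discount factor: values below 1 make rewards that
    arrive later count less than immediate ones.
    """
    total = 0.0
    # Work backwards so each step only needs one multiplication and one addition.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward that arrives two steps in the future is worth less than one received now.
print(discounted_return([1.0, 0.0, 0.0]))
print(discounted_return([0.0, 0.0, 1.0]))
```

The second call prints a smaller value than the first, which illustrates why an action's effect on future states matters as much as its immediate reward.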

To make the process clearer, we explain some basic concepts of RL:

- Policy: the rule by which the agent chooses an action in each state
- Agent: the one who takes the actions
- Environment: the world in which the agent acts and which responds to the agent's actions
- Action: a move the agent makes to interact with the environment
- State: the situation the agent perceives
- Reward: the feedback on the agent’s action

Image 1 illustrates the whole process of RL.

![Image 1: Reinforcement Learning Process](process_RL.png)
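The interaction between agent and environment can also be sketched in a few lines of code. The example below is a toy illustration under stated assumptions, not the bidding setup used later in this notebook: `CoinFlipEnv` and `EpsilonGreedyAgent` are hypothetical classes, the environment has a single state, and the agent uses a simple epsilon-greedy rule as its policy.

```python
import random


class CoinFlipEnv:
    """Toy environment (hypothetical): guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.8, episode_length=100, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # there is only one state in this toy problem

    def step(self, action):
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0  # reward: feedback on the action
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done


class EpsilonGreedyAgent:
    """Keeps a running average reward per action and mostly picks the best one."""

    def __init__(self, n_actions=2, epsilon=0.1, seed=1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Policy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Incremental update of the average reward observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run_episode(env, agent):
    """One pass through the loop: observe state, act, receive reward, learn."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        agent.learn(action, reward)
        total_reward += reward
    return total_reward


env, agent = CoinFlipEnv(), EpsilonGreedyAgent()
print("cumulative reward:", run_episode(env, agent))
```

Each iteration of the `while` loop corresponds to one cycle in the diagram: the agent picks an action based on the state, the environment returns a reward and the next state, and the agent updates its estimates.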
	

	
Besides RL, supervised learning and unsupervised learning are the classic machine learning paradigms. In this section, we explain why RL is the better fit for dynamic decision-making problems. The core differences among the three methods are shown in Table 1.

![Table 1: Machine Learning Methods Comparison](comparison_methods.png)
	



Supervised learning is mainly used to solve regression and classification problems. It learns the relation between a labeled target variable and the input variables and produces a model that predicts the target for new observations.

In comparison, unsupervised learning takes unlabeled observations and searches for the hidden structure in the input data. Neither method is suited to optimizing behavior in a dynamic process. Unlike supervised and unsupervised learning, which do not account for capacity limits or other constraints of the environment, reinforcement learning starts from a clearly defined initial setting, a goal, and an interacting agent. Every action the agent takes influences the environment. Over the course of training, the agent learns to take better actions and thereby collects more and more reward, with the aim of maximizing the cumulative reward.

Reinforcement learning models a problem as a Markov Decision Process (MDP) and learns a solution for it. In theory, whenever a problem can be mapped to an MDP, we can expect reinforcement learning to be a useful tool for solving it. In the next section, we explain how RL works with Markov Decision Processes and give more details about Deep Q-Learning, which is a core concept in our project.
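As a brief sketch of what such a mapping involves, an MDP is usually written as a tuple. The notation below is the standard textbook one and is an assumption here, since it is not defined elsewhere in this section:

$$
(\mathcal{S}, \mathcal{A}, P, R, \gamma)
$$

Here $\mathcal{S}$ is the set of states, $\mathcal{A}$ the set of actions, $P(s' \mid s, a)$ the probability of moving to state $s'$ after taking action $a$ in state $s$, $R(s, a)$ the expected immediate reward, and $\gamma \in [0, 1)$ a discount factor. Under these assumptions, the agent looks for a policy $\pi$ that maximizes the expected discounted cumulative reward, where $r_t$ is the reward received at step $t$:

$$
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]
$$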

	





# 3. DRLB framework

# 4. Implementation

In [41]:
# Read the data and extract the minute interval from the timestamp
fields = ['click', 'weekday', 'hour', 'bidid', 'timestamp', 'logtype', 'ipinyouid', 'useragent',
          'IP', 'region', 'city', 'adexchange', 'domain', 'url', 'urlid', 'slotid', 'slotwidth', 'slotheight',
          'slotvisibility', 'slotformat', 'slotprice', 'creative', 'bidprice', 'payprice', 'keypage',
          'advertiser', 'usertag']

# insert the ctr in this row
cfg = configparser.ConfigParser(allow_no_value=True)
# env_dir needs to be modified (the path depends on the local setup)
env_dir = os.path.join(module_path, 'budget_constrained_bidding', 'src',
                       'gym-auction_emulator', 'gym_auction_emulator', 'envs')
cfg.read(os.path.join(env_dir, 'config.cfg'))
data_src = cfg['data']['dtype']
if data_src == 'ipinyou':
    file_in = env_dir + str(cfg['data']['ipinyou_path'])
else:
    # Fail early instead of hitting an undefined file_in further down
    raise ValueError(f"Unsupported data source: {data_src}")
metric = str(cfg['data']['metric'])

# The iPinYou logs are tab-separated; load only the columns listed above
bid_requests = pd.read_csv(file_in, sep="\t", usecols=fields)

# Convert timestamps to strings so that Split.get_time_interval can parse them
bid_requests["timestamp"] = bid_requests["timestamp"].apply(str)
bid_requests["minute"] = bid_requests["timestamp"].apply(Split.get_time_interval)
bid_requests

         click  weekday  hour                             bidid  \
0            0        4     0  81aced04baad90f9358aa39a4521cd6f   
1            0        4     0    572fa35095e8b6c30b1aa871e52b2d   
2            0        4     0   e1e44b8a725b957a626991ecb15b56f   
3            0        4     0  f250658c8cfc615c824f9552497d6325   
4            0        4     0  125f1b570af0b42341eb60a532544f0b   
5            0        4     0  3a799517748639990b899fc611ec8830   
6            0        4     0  f4cc8a2ac2dd52b5cff548e5bbf45677   
7            0        4     0  85f33913696667652389ad45d1131977   
8            0        4     0  c854e61d821a2076054328212704906d   
9            0        4     0  8b089b1437f9937d4cd3bc8dcfc00285   
10           0        4     0  72879b068fef3c7bc26c2d3a6c2afd51   
11           0        4     0  badb9e7c89f2bd6f44c1c1297374ead8   
12           0        4     0  44e9969505423047c9d16d53bd4c3dfd   
13           0        4     0  c65a5c2c3b9ddcd0aed408fdd1263d8

# 5. Results