# Lambda cost simulator for web-based traffic

This notebook simulates a synthetic month of requests based on the shape of Wikipedia traffic. Tune `monthly_scale_factor` to adjust the total number of requests for the month.

After the simulation, the total cost of serving the requests on AWS Lambda and on several EC2 instance types is calculated and plotted with [plotly](https://plot.ly/).

We have chosen the English Wikipedia as the data source, as it can be a fair representation of worldwide traffic. Other Wikipedia language editions can be used to localize the simulation further.

## 0. Initial setup

### Imports

In [None]:
# dataframe-related imports
from datetime import datetime
import pandas as pd
import numpy as np

import webish_simulator
import wikimedia_scraper as ws

### Plotly setup

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

requests_date_layout = go.Layout(
    title='Requests distribution',
    legend=dict(orientation="h"),
    height=500,
    xaxis=dict(
        title='Date',
        autorange=True,
        type='date',
    ),
    yaxis=dict(
        title='Requests'
    )
)

day_cost_layout = go.Layout(
    title='Monthly Cost by traffic',
    legend=dict(orientation="h"),
    height=500,
    xaxis=dict(
        title='Day of month',
        autorange=True,
    ),
    yaxis=dict(
        title='Cost ($)'
    )
)

breakeven_scale_layout = go.Layout(
    title='Break-even - scale',
    legend=dict(orientation="h"),
    height=500,
    xaxis=dict(
        title='Mean requests per second',
        autorange=True,
#         type='log',
    ),
    yaxis=dict(
        title='% total monthly reqs to hit breakeven'
    )
)

## 1. Get data from the Wikipedia source

`ws.get_traffic_generator` connects to Wikimedia and downloads traffic logs for the selected date range. Data is cached for faster access in subsequent executions.

**Warning:** The first execution can take a long time!

Parameters to tune:
- `project`: Wikipedia project (language) to use.
- `start_date` and `end_date`: date range to extract from Wikipedia.

In [None]:
project = 'en'

start_date = datetime(2017, 1,  1)
end_date   = datetime(2017, 12, 31)

ws.output_notebook()

traffic_generator = ws.get_traffic_generator(start_date, end_date, projects=(project,))
df = pd.DataFrame(list(traffic_generator))

In [None]:
# Change DF index to a datetime index

df = df.set_index(pd.DatetimeIndex(df['date']))
df = df.drop(['date'], axis=1)
#df = df.loc[df['project']==project]

# Calculate rolling mean (week)
df['rolling'] = df['hits'].rolling(window=24*7, min_periods=3).mean()

df.head()
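
The weekly rolling mean above uses `window=24*7` with `min_periods=3`, so the first two hours have no value and the mean only covers a full week once 168 hours are available. A minimal sketch of that behaviour on a small series, with a shorter window so the numbers are easy to follow (the hit counts are hypothetical, not Wikimedia data):

```python
import pandas as pd

# Hypothetical hourly hit counts; the real values come from the Wikimedia download.
index = pd.date_range("2017-01-01", periods=6, freq=pd.Timedelta(hours=1))
hits = pd.Series([10, 20, 30, 40, 50, 60], index=index)

# Window of 3 hours, but emit a value as soon as 2 observations are available.
rolling = hits.rolling(window=3, min_periods=2).mean()

print(rolling)
```

The first entry is `NaN` because only one observation is available; later entries average up to the last three hours.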

### 1.1 A (random) week of Wikipedia requests

In [None]:
data = []

#df2 = df.loc['2017-01-23':'2017-01-29'].hits
#df2['rolling'] = df2['hits'].rolling(window=24, min_periods=1).mean()

requests_trace = go.Scatter(
    x=df.loc['2017-01-23':'2017-01-29'].index,
    y=df.loc['2017-01-23':'2017-01-29'].hits,
    name='Requests (EN)'
)

data.append(requests_trace)

rolling_trace = go.Scatter(
    x=df.loc['2017-01-23':'2017-01-29'].index,
    y=df.loc['2017-01-23':'2017-01-29'].hits.rolling(window=24, min_periods=0).mean(),
    name='Requests (EN - day mean)'
)

data.append(rolling_trace)

# Customize date format
requests_date_layout.title="Wikipedia requests distribution (one week)"
requests_date_layout.xaxis=dict(
    tickformat='%a'
)

fig = go.Figure(data=data, layout=requests_date_layout)
iplot(fig)

### 1.2 Plot a whole year of English Wikipedia requests

In [None]:
data = []

requests_trace = go.Scatter(
    x=df.index,
    y=df['hits'],
    name='Requests (EN)'
)

data.append(requests_trace)

rolling_trace = go.Scatter(
    x=df.index,
    y=df['rolling'],
    name='Requests (EN - week mean)'
)

data.append(rolling_trace)

# Customize date format
requests_date_layout.title="Wikipedia requests"
requests_date_layout.xaxis=dict(
    tickformat='%b'
)

fig = go.Figure(data=data, layout=requests_date_layout)
iplot(fig)

## 2. Build a synthetic month of requests

Taking the hourly Wikipedia traffic shape, simulate a _mean month_ whose request distribution has the shape of the selected Wikipedia project.

The scale (i.e. the total number of requests in the month) is configurable through the `monthly_scale_factor` parameter of `webish_simulator.simulate`.
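
One way such a mean month could be built is sketched below. This is not the code of `webish_simulator.simulate`, whose internals are not shown in this notebook. The function `scale_to_monthly_total`, the hour-of-week averaging, the 30-day length and the start date are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def scale_to_monthly_total(hourly_hits, monthly_total, days=30):
    """Return hourly requests for a 'mean month' that sum to monthly_total.

    Hypothetical helper: the shape is the mean of hourly_hits for each
    (weekday, hour) slot, so the input should cover at least one full week.
    """
    # Average hits for every (weekday, hour) slot across the input data.
    profile = hourly_hits.groupby(
        [hourly_hits.index.dayofweek, hourly_hits.index.hour]
    ).mean()

    # Arbitrary month start, only needed to obtain an hourly index.
    month_index = pd.date_range(
        "2017-01-02", periods=days * 24, freq=pd.Timedelta(hours=1)
    )
    shape = np.array([profile.loc[(ts.dayofweek, ts.hour)] for ts in month_index])

    # Normalize the shape and rescale so the month adds up to monthly_total.
    requests = shape / shape.sum() * monthly_total
    return pd.Series(requests, index=month_index, name="requests")


# Two weeks of made-up traffic with a daily cycle.
index = pd.date_range("2017-01-02", periods=14 * 24, freq=pd.Timedelta(hours=1))
hits = pd.Series(1000 + 500 * np.sin(2 * np.pi * index.hour.to_numpy() / 24), index=index)

month = scale_to_monthly_total(hits, monthly_total=90_000_000)
print(len(month), round(month.sum()))
```

Normalizing before scaling keeps the daily and weekly pattern intact while letting the monthly total be chosen freely.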

In [None]:
#month_df = webish_simulator.simulate(df)
monthly_scale_factor=90000000
month_df = webish_simulator.simulate(df, monthly_scale_factor=monthly_scale_factor)


month_df.head()

### 2.1 Plot synthetic month

In [None]:
data = []

requests_trace = go.Scatter(
    x=month_df.index,
    y=month_df.requests,
    name='Requests'
)

data.append(requests_trace)

rolling_trace = go.Scatter(
    x=month_df.index,
    y=month_df.requests.rolling(window=24, min_periods=0).mean(),
    name='Requests (day mean)'
)

data.append(rolling_trace)

# Customize date format
requests_date_layout.title="Requests distribution (synthetic)"
requests_date_layout.xaxis=dict(
    tickformat='%d'
)

fig = go.Figure(data=data, layout=requests_date_layout)
iplot(fig)

### 2.2 Calculate costs

In [None]:
# Maximum requests per second assumed for one instance of each EC2 flavor
ec2_flavors = {
    'm3.medium': 1000,
    'm4.large': 1500,
    'm4.xlarge': 2000,
}

month_df = webish_simulator.get_lambda_cost(month_df, MB_per_request=128, ms_per_req=200)

# For each EC2 flavor, compute its cost and the Lambda-vs-EC2 cost difference
for flavor, reqs in ec2_flavors.items():
    month_df = webish_simulator.get_ec2_cost(month_df, flavor=flavor, max_reqs_per_second=reqs)
    month_df[flavor + '_break_even'] = month_df['lambda_sum'] - month_df[flavor + '_sum']

month_df.tail()
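
The arithmetic behind a comparison like this can be sketched as follows. This is not the implementation of `get_lambda_cost` or `get_ec2_cost`. The two Lambda prices are assumed list prices that should be checked against current AWS pricing, the free tier is ignored, and `price_per_hour` is a hypothetical value supplied by the caller.

```python
import math

# Assumed Lambda list prices in USD; verify against the AWS pricing page.
LAMBDA_PRICE_PER_REQUEST = 0.20 / 1_000_000
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667


def lambda_cost(requests, mb_per_request=128, ms_per_req=200):
    """Cost of `requests` invocations: request charge plus compute charge."""
    gb_seconds = requests * (ms_per_req / 1000) * (mb_per_request / 1024)
    return requests * LAMBDA_PRICE_PER_REQUEST + gb_seconds * LAMBDA_PRICE_PER_GB_SECOND


def ec2_hourly_cost(requests_in_hour, max_reqs_per_second, price_per_hour):
    """Cost of one hour on EC2, sizing the fleet by the hour's mean request rate."""
    mean_rps = requests_in_hour / 3600
    # At least one instance stays up even when there is no traffic.
    instances = max(1, math.ceil(mean_rps / max_reqs_per_second))
    return instances * price_per_hour


print(f"Lambda, 90M requests: ${lambda_cost(90_000_000):.2f}")
print(f"EC2, busy hour: ${ec2_hourly_cost(5_400_000, 1000, price_per_hour=0.10):.2f}")
```

Lambda cost grows linearly with the number of requests, while EC2 cost grows in steps of whole instances. That difference is what produces a break-even point.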

### 2.3 Plot costs (synthetic month)

In [None]:
# TODO: Show number of instances in 
data = []

lambda_trace = go.Scatter(
    x=month_df.index,
    y=month_df['lambda_sum'],
    name='Lambda'
)

data.append(lambda_trace)

for flavor in ec2_flavors.keys():
    ec2_trace = go.Scatter(
        x=month_df.index,
        y=month_df[flavor+'_sum'],
        name=flavor
    )

    data.append(ec2_trace)

# Customize date format
day_cost_layout.xaxis=dict(
    tickformat='%d'
)
    
fig = go.Figure(data=data, layout=day_cost_layout)
iplot(fig)

## 3. Simulate multiple scenarios with different total requests in a month

In [None]:
x, y = webish_simulator.get_breakeven(df, range(0, 200000000, 500000), ec2_flavors)

# devices_list = np.logspace(0, 100000000, num=10, endpoint=True, base=10.0)
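
The break-even plot uses mean requests per second on its x axis, while the scenarios above are defined by monthly totals. A small sketch of the conversion, assuming a 30-day month (the helper name and the month length are illustrative, and `get_breakeven` may compute its x values differently):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def mean_requests_per_second(monthly_requests):
    """Convert a monthly request total into a mean request rate."""
    return monthly_requests / SECONDS_PER_MONTH


for total in (500_000, 90_000_000, 200_000_000):
    print(f"{total:>11,d} requests/month = {mean_requests_per_second(total):6.2f} req/s")
```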

In [None]:
data = []
for flavor, breakeven_points in y.items():
    breakeven_trace = go.Scatter(
        x=x,
        y=breakeven_points,
        name=flavor,
    )
    data.append(breakeven_trace)
    
fig = go.Figure(data=data, layout=breakeven_scale_layout)

iplot(fig)