# Lambda cost simulator for web-based traffic

This notebook simulates a synthetic month of requests based on the shape of Wikipedia traffic. Tune `monthly_scale_factor` to adjust the total number of requests for the month.

After the simulation, the total cost of the requests is calculated and plotted with [plotly](https://plot.ly/).

We have chosen English Wikipedia as the source data because it can serve as a fair representation of worldwide traffic. Other Wikipedia language editions can be used to localize the traffic shape further.

In [None]:
# dataframe-related imports
import wikimedia_scraper as ws
from datetime import datetime
import pandas as pd
import numpy as np
import webish_simulator

# Get data from the Wikipedia source

Parameters to tune:
- `project`: the Wikipedia project (language edition) to use.
- `start_date` and `end_date`: the date range to extract from Wikipedia.

In [None]:
project = 'en'

start_date = datetime(2016, 11,  1)
end_date   = datetime(2017, 1, 31)

ws.output_notebook()

traffic_generator = ws.get_traffic_generator(start_date, end_date, projects=(project,))
df = pd.DataFrame(list(traffic_generator))

df = df.set_index(pd.DatetimeIndex(df['date']))
df = df.drop(['date'], axis=1)
df = df.loc[df['project']==project]

df.head()

## Simulation of a synthetic month

Given the hourly shape of the Wikipedia traffic, simulate a _mean month_ that has the same scale (i.e. the same total number of requests in a month) as the selected Wikipedia project.

Note: The scale (total requests in a month) can be controlled with the `monthly_scale_factor` parameter of `webish_simulator.simulate`.
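
As a rough illustration of the idea (not the actual `webish_simulator.simulate` implementation, whose internals are not shown in this notebook), a mean month could be built by averaging requests per hour-of-week slot and then rescaling to a target monthly total. The helper name `mean_month_sketch`, the 30-day month, and the arbitrary start date are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd


def mean_month_sketch(hourly, monthly_total=None, days=30):
    """Build a synthetic month from an hourly request series.

    Hypothetical helper, not part of webish_simulator.
    `hourly` is a pandas Series of request counts with an hourly
    DatetimeIndex covering at least one full week.
    """
    # Average the observed traffic for each of the 168 hour-of-week slots.
    slot = np.asarray(hourly.index.dayofweek * 24 + hourly.index.hour)
    profile = hourly.groupby(slot).mean()

    # Tile the weekly profile over a synthetic month (start date is arbitrary).
    index = pd.Timestamp("2017-01-01") + pd.to_timedelta(
        np.arange(days * 24), unit="h"
    )
    month_slot = np.asarray(index.dayofweek * 24 + index.hour)
    month = pd.Series(
        profile.reindex(month_slot).to_numpy(), index=index, name="requests"
    )

    # Optionally rescale so the month sums to the requested total.
    if monthly_total is not None:
        month = month * (monthly_total / month.sum())
    return month
```

With this approach the shape of the traffic (daily and weekly peaks) is preserved, while `monthly_total` only changes its magnitude.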

In [None]:
# monthly_scale_factor=1000000000
# month_df = webish_simulator.simulate(df, monthly_scale_factor=monthly_scale_factor)

month_df = webish_simulator.simulate(df)

month_df.head()

## Calculate costs

In [None]:
month_df = webish_simulator.get_cost(month_df, MB_per_request=512, ms_per_req=200, max_reqs_per_second=1000)

month_df.head()
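
For intuition about what such a cost calculation involves, here is a minimal sketch of a per-hour cost model. It is not the code inside `webish_simulator.get_cost`. It assumes that `MB_per_request` is the memory configured for the function, that `ms_per_req` is the billed duration, and that `max_reqs_per_second` is the capacity of a single EC2 instance. The function names and all prices are illustrative assumptions (the free tier is ignored); check current AWS pricing before relying on the numbers.

```python
import math


def lambda_cost_sketch(requests, mb_per_request=512, ms_per_req=200,
                       price_per_gb_second=0.0000166667,
                       price_per_million_requests=0.20):
    """Estimate the Lambda cost in USD for a number of requests.

    Hypothetical helper; both prices are assumed values.
    """
    # Compute charge: memory (GB) x duration (s) per request.
    gb_seconds = requests * (mb_per_request / 1024) * (ms_per_req / 1000)
    compute_cost = gb_seconds * price_per_gb_second
    # Request charge: flat fee per million invocations.
    request_cost = requests / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


def ec2_cost_sketch(requests_in_hour, max_reqs_per_second=1000,
                    price_per_instance_hour=0.10):
    """Estimate the EC2 cost in USD for one hour of traffic.

    Hypothetical helper; the hourly price is a placeholder. Instances are
    sized on the mean request rate of the hour, with a minimum of one.
    """
    mean_rps = requests_in_hour / 3600
    instances = max(1, math.ceil(mean_rps / max_reqs_per_second))
    return instances * price_per_instance_hour
```

Under these assumptions Lambda cost grows linearly with the number of requests, while EC2 cost grows in steps of one instance, which is why the comparison depends on the scale of the traffic.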

## Plotly setup

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

day_cost_layout = go.Layout(
    title='WW Traffic Monthly Cost',
    legend=dict(orientation="h"),
    xaxis=dict(
        title='Day of month',
        autorange=True
    ),
    yaxis=dict(
        title='Cost ($)'
    )
)

breakeven_scale_layout = go.Layout(
    title='Break-even - scale',
    legend=dict(orientation="h"),
    xaxis=dict(
        title='Mean requests per second',
        autorange=True,
#         type='log'
    ),
    yaxis=dict(
        title='% of total requests at break-even'
    )
)

## Plot Wikipedia costs

In [None]:
# TODO: Show number of instances in 
data = []

lambda_trace = go.Scatter(
    x=month_df.index,
    y=month_df['lambda_sum'],
    name='Lambda'
)

data.append(lambda_trace)

ec2_trace = go.Scatter(
    x=month_df.index,
    y=month_df['ec2_sum'],
    name='EC2'
)

data.append(ec2_trace)

fig = go.Figure(data=data, layout=day_cost_layout)
iplot(fig)

# Simulate multiple scenarios with different total requests in a month

In [None]:
x, y = webish_simulator.get_breakeven(df, range(0, 100000000, 500000))

# Alternative: log-spaced scales from 1 to 10**8 (np.logspace takes exponents, not values)
# devices_list = np.logspace(0, 8, num=10, endpoint=True, base=10.0)

# x, y = webish_simulator.get_breakeven(df, devices_list)

In [None]:
data = []

breakeven_trace = go.Scatter(
    x=x,
    y=y
)
data.append(breakeven_trace)
    
fig = go.Figure(data=data, layout=breakeven_scale_layout)

iplot(fig)