## Cost Simulator

This notebook estimates the cost of running a JupyterHub on an autoscaling Kubernetes (k8s) cluster in the cloud.

Let's start with what JupyterHub and Kubernetes are.

- [JupyterHub](https://github.com/jupyterhub/jupyterhub) is a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
- [Kubernetes](https://kubernetes.io/docs/concepts/) (commonly abbreviated as k8s) is a popular open source platform that lets users build application services across multiple containers, schedule those containers across a cluster, scale them, and manage their health over time.

#### JupyterHub Distribution

1. For a simple setup with a small number of users (0-100) on a single server, use [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub) distribution.
1. To support more users with a dynamic number of servers in the cloud, use [Zero to JupyterHub with Kubernetes](https://z2jh.jupyter.org/en/latest/).
    For documentation on deploying JupyterHub on Kubernetes, refer to [Deploy a JupyterHub on Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s).

We focus on deploying JupyterHub with Kubernetes, so let's get familiar with some Kubernetes terminology to better understand the simulation process:

- Node:
A node is a single machine with a set of CPU/RAM resources that can be utilized. A cluster in Kubernetes is a group of nodes.
 - On an autoscaling cluster, Kubernetes will **automatically scale up** your cluster by adding nodes as soon as you need them, and **scale it back down** when you no longer need them.
- User pod:
A pod is the smallest and simplest unit in the Kubernetes object model that you create or deploy. It represents processes running on your cluster. In this setup, each user's notebook server runs in its own user pod.
 - JupyterHub will **automatically delete** any user pods that have had no activity for a period of time. This frees up computational resources and keeps costs down if you are using an autoscaling cluster.
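
To make the relationship between nodes and user pods concrete, here is a minimal sketch of how many user pods fit on one node and how many nodes an autoscaling cluster would then need. The function names `pods_per_node` and `nodes_needed` are hypothetical and are not part of the `z2jh_cost_simulator` package. The sketch also assumes that the whole node is available to user pods, whereas real nodes reserve some resources for system components.

```python
import math


def pods_per_node(node_cpu, node_memory_gb, pod_cpu, pod_memory_mb):
    """Return how many user pods fit on one node.

    The scarcer of the two resources (CPU or memory) sets the limit.
    """
    node_memory_mb = node_memory_gb * 1024
    fit_by_cpu = math.floor(node_cpu / pod_cpu)
    fit_by_memory = math.floor(node_memory_mb / pod_memory_mb)
    return min(fit_by_cpu, fit_by_memory)


def nodes_needed(active_pods, capacity_per_node, min_nodes, max_nodes):
    """Return the node count for `active_pods`, kept within the autoscaling limits."""
    needed = math.ceil(active_pods / capacity_per_node)
    return max(min_nodes, min(needed, max_nodes))


# Example: a 4 CPU / 26 GB node, user pods requesting 0.5 CPU and 1024 MB.
capacity = pods_per_node(node_cpu=4, node_memory_gb=26, pod_cpu=0.5, pod_memory_mb=1024)
nodes_for_20_users = nodes_needed(20, capacity, min_nodes=0, max_nodes=5)
```

In this example CPU is the limiting resource, so lowering the CPU request per user pod would let more users share a node.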

#### Input provided to the simulation
- Minute-by-minute user activity, generated from the estimated number of users on the JupyterHub hour by hour.
- Configuration of the node (memory/CPU), the user pod (memory/CPU), and the cost per month.
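
As an illustration of the step from hour-by-hour estimates to minute-by-minute data, the sketch below repeats each hourly user count for every minute of that hour. It is a simplification under stated assumptions: `expand_hourly_to_minutes` is a hypothetical helper, not the package's `generate_user_activity` function, which produces activity per user rather than plain counts.

```python
MINUTES_PER_HOUR = 60


def expand_hourly_to_minutes(hourly_users):
    """Repeat each hourly user count once for every minute of that hour."""
    per_minute = []
    for users_in_hour in hourly_users:
        per_minute.extend([users_in_hour] * MINUTES_PER_HOUR)
    return per_minute


# Made-up hourly estimates: 24 values per day, 5 weekdays and 2 weekend days.
weekday = [2, 3, 4] + [3] * 21
weekend = [2] * 24
full_week = weekday * 5 + weekend * 2

per_minute_users = expand_hourly_to_minutes(full_week)
```

A full week of hourly estimates (168 values) expands to 10080 per-minute values, which is the time resolution the simulation works at.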

#### Output of the simulation
- Minute-by-minute utilization data for the cluster.
- A graph that visualizes the utilization of the cluster for any day of the week.
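
One way minute-by-minute data could be turned into a cost estimate is to count node-minutes and scale by the monthly price of a node. This is a sketch under stated assumptions, not the package's method: `estimate_cost` is a hypothetical helper, and a billing month is assumed to have 30 days.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day billing month


def estimate_cost(nodes_running_per_minute, cost_per_node_month):
    """Estimate the cost from the number of nodes running in each minute."""
    node_minutes = sum(nodes_running_per_minute)
    return node_minutes * cost_per_node_month / MINUTES_PER_MONTH


# Example: a single node running for a full week at 120 USD per node per month.
one_week = [1] * (7 * 24 * 60)
weekly_cost = estimate_cost(one_week, cost_per_node_month=120.0)
```

Because the cost follows node-minutes, any minute in which the autoscaler has removed an idle node reduces the estimate directly.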


In [None]:
# autoreload lets us make changes to the z2jh_cost_simulator package
# and works around module caching, so that our changes take effect.
%load_ext autoreload
%autoreload 2

In [None]:
# Import the simulator and generate_user_activity modules from the z2jh_cost_simulator package.

from z2jh_cost_simulator import simulator
from z2jh_cost_simulator import generate_user_activity

## Set the configurations for the simulation.
1. Information about the single node pool where the user pods run
    - CPU / memory, for example a node with 4 CPU cores / 26 GB memory
    - Autoscaling limits, for example 0-5 nodes
    - Cost, for example 120 USD per node per month
    - Cluster autoscaler details: how long a node must be inactive before it is removed (for example 10 minutes)
1. The resource requests (guaranteed resources) for the users
1. How long a user pod may be inactive before it is culled
1. The maximum lifetime of a user pod before it is culled
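
The two culling rules can be sketched as a single check. This is a hedged illustration and not the simulator's implementation: `should_cull` is a hypothetical function, all times are minutes since the start of the simulation, and a maximum lifetime of 0 is assumed to mean that the lifetime limit is disabled.

```python
def should_cull(now, started_at, last_active_at, max_inactivity, max_lifetime=0):
    """Return True if a user pod should be culled at minute `now`.

    Assumption: max_lifetime == 0 disables the lifetime limit.
    """
    inactive_too_long = (now - last_active_at) >= max_inactivity
    too_old = max_lifetime > 0 and (now - started_at) >= max_lifetime
    return inactive_too_long or too_old


# A pod started at minute 0 and last active at minute 30, checked at minute 100
# with a 60 minute inactivity limit and no lifetime limit.
cull_idle_pod = should_cull(now=100, started_at=0, last_active_at=30, max_inactivity=60)
```

A pod is removed as soon as either rule applies, so a short maximum lifetime can cull a pod that is still in active use.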


In [None]:
import ipywidgets as widgets
from ipywidgets import Layout, VBox

style_for_widget = {"description_width": "250px"}
layout_for_widget = {"width": "500px"}
box_layout = Layout(border="solid", width="600px")

heading = widgets.HTML(value="<h3 style=text-align:center;>Select the simulation configuration.</h3>")
max_min_nodes = widgets.IntRangeSlider(min=0, max=10, step=1, description='Number of nodes:', value=(1,3), style=style_for_widget, layout=layout_for_widget,)
node_cpu        = widgets.IntSlider(min=1, max=16, description="Node CPU", value=4, style=style_for_widget, layout=layout_for_widget,)
user_pod_cpu    = widgets.FloatSlider(min=0, max=5, step =.05, value=.04, description="User pod CPU", style=style_for_widget, layout=layout_for_widget,)
node_memory     = widgets.FloatSlider(min=0, max=128.0, step =1, value=5.0, description="Node Memory(in GB)", style=style_for_widget, layout=layout_for_widget,)
user_pod_memory = widgets.FloatSlider(min=0, max=5000.0, step=64, value=1408, description="User pod Memory(in MB)", style=style_for_widget, layout=layout_for_widget,)
cost_per_month  = widgets.FloatSlider(min=0, max=150.0, step=5.5, value=12.8, description="Cost per month(in USD)", style=style_for_widget, layout=layout_for_widget,)
node_stop_time  = widgets.IntSlider(min=1, max=100, step=1, value=10, description="Node stop time(in min)", style=style_for_widget, layout=layout_for_widget,)
pod_culling_max_inactivity_time = widgets.IntSlider(min=0, max=300, step=5, value=60, description="Pod culling for inactivity(in min)", style=style_for_widget, layout=layout_for_widget,)
pod_culling_max_lifetime         = widgets.IntSlider(min=0, max=300, step=20, value=0, description="Pod culling for max lifetime(in min)", style=style_for_widget, layout=layout_for_widget,)

configuration_widgets = [
    heading,
    max_min_nodes,
    node_cpu,
    user_pod_cpu,
    node_memory,
    user_pod_memory,
    cost_per_month,
    node_stop_time,
    pod_culling_max_inactivity_time,
    pod_culling_max_lifetime,
]

VBox(configuration_widgets, layout=box_layout)

In [None]:
# The configurations that will be passed to the Simulation object.
simulation_configurations = {
    'min_nodes': max_min_nodes.lower,
    'max_nodes': max_min_nodes.upper,
    'node_cpu': node_cpu.value,
    'node_memory': node_memory.value,
    'user_pod_cpu': user_pod_cpu.value,
    'user_pod_memory': user_pod_memory.value,
    'cost_per_month': cost_per_month.value,
    'pod_inactivity_time': pod_culling_max_inactivity_time.value,
    'pod_max_lifetime': pod_culling_max_lifetime.value,
    'node_stop_time': node_stop_time.value,
}


In [None]:
# Hardcoded sample values, to be replaced later.
# A full week's worth of input to the user activity generator is provided here.

# Number of users per hour for one weekday.
no_users_on_weekday = [2,3,4,2,3,4,5,5,4,5,5,5,5,3,3,3,3,3,4,4,4,4,4,4]
# Number of users per hour for one weekend day.
no_users_on_weekend = [2,2,2,2,2,3,3,3,3,3,4,4,4,3,3,3,3,3,5,5,6,6,6,6]

hour_wise_users_full_week = []
hour_wise_users_full_week.extend(no_users_on_weekday * 5)
hour_wise_users_full_week.extend(no_users_on_weekend * 2)

# Generate the user activity from the hourly user counts by calling the generate_user_activity function.
user_activity = generate_user_activity.generate_user_activity(hour_wise_users_full_week)


In [None]:
# Create a Simulation object from the configurations and the user activity.
sim = simulator.Simulation(simulation_configurations, user_activity)

# The run method runs the simulation for one week.
sim.run()

# The cluster utilization data stores the per-node utilization for each minute.
cluster_utilization_data = sim.create_utilization_data()

## Visualize the simulation
A line chart shows the utilization of each node for the selected day of the week.
Use the **slider** widget to select a particular day.
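
Selecting a day comes down to slicing the week of per-minute data into blocks of 1440 minutes. The sketch below shows that index arithmetic on a plain list. `day_slice` is a hypothetical helper introduced only for illustration, with days numbered 1 to 7 as on the slider.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def day_slice(day_of_week):
    """Return the slice of minute indices that covers day 1..7 of the week."""
    if not 1 <= day_of_week <= 7:
        raise ValueError("day_of_week must be between 1 and 7")
    start = (day_of_week - 1) * MINUTES_PER_DAY
    return slice(start, start + MINUTES_PER_DAY)


# One value per minute of the week; here each value is just its own index.
week_of_minutes = list(range(7 * MINUTES_PER_DAY))
third_day = week_of_minutes[day_slice(3)]
```

The same slice can be applied to each node's utilization column, so every trace in the chart covers the same 24 hours.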

In [None]:
# Imports required for the visualization of the simulation.
import ipywidgets as widgets
import plotly.graph_objs as go

widget_style = {"description_width": "initial"}
selected_day = widgets.IntSlider(
    min=1, max=7, description="Day of the week", style=widget_style
)
# Columns holding the utilization percentage of each node.
list_nodes = [node for node in cluster_utilization_data.columns if "percent" in node]

# One trace per node, initially showing the first day (1440 minutes).
data = []
for index, node in enumerate(list_nodes):
    data.append(
        go.Scatter(
            x=cluster_utilization_data["time"][0:1440],
            y=cluster_utilization_data[node][0:1440],
            mode="lines",
            name="Node" + str(index + 1),
        )
    )

figure = go.FigureWidget(
    data=data,
    layout=go.Layout(
        title=dict(text="Cluster utilization data"),
        xaxis=dict(
            title="Time (hours)",
            tickmode="array",
            tickangle=45,
            tickvals=list(range(0, 1440, 120)),
            ticktext=[str(i) + " hours" for i in range(0, 24, 2)],
        ),
        yaxis=dict(title="Utilization(%)", tickformat="0%"),
    ),
)


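def response(change):
    """Update each node's trace to show the data of the selected day."""
    start = (selected_day.value - 1) * 1440
    end = selected_day.value * 1440
    for index, node in enumerate(list_nodes):
        figure.data[index].y = cluster_utilization_data[node][start:end]


# React only to changes of the slider's value, not to every trait change.
selected_day.observe(response, names="value")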
graph_data = widgets.HBox([selected_day])
widgets.VBox([graph_data, figure])