<a href="https://colab.research.google.com/github/artemkurylev/Interactive-Statistics-Notebooks/blob/master/Binomial_distribution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# METADATA

- [Fastpages](https://fastpages.fast.ai/fastpages/jupyter/2020/02/21/introducing-fastpages.html) - a serving solution that presents notebooks as blog posts, with code highlighting, visualizations enabled, etc. (e.g. [1](https://drscotthawley.github.io/devblog3/2019/02/08/My-1st-NN-Part-3-Multi-Layer-and-Backprop.html)); it may be too complex for our needs so far

- [This blog post](https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6) explains how to use ipywidgets for interactive controls in Jupyter notebooks

- Awesome visualization library [Altair](https://altair-viz.github.io/gallery/index.html)

In [1]:
! pip install ipywidgets
! jupyter nbextension enable --py widgetsnbextension

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


# < Topic Title >

- table of contents
- links to other notebooks
- image of the mind map and where are we

## Intro

- We are going to consider the Bernoulli and binomial distributions and their properties.
- Expected outcome: an understanding of the Bernoulli and binomial distributions and their properties


## Setup

- install required packages
- etc

## Data load / synthesis

- introduce theoretical background for synthetic data
- state hypothesis upfront if you take data from real life (why it has the properties you will use, e.g. normality/t-distribution/etc.)

#### I use synthesized data for both the Bernoulli and binomial distributions, since it makes them easier to explore with different parameters.
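
As a minimal sketch of how such synthetic samples can be generated reproducibly, the snippet below draws Bernoulli samples and then builds binomial samples as sums of independent Bernoulli trials. The seed, the parameter values, and the variable names are illustrative assumptions and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, chosen only for reproducibility

p, n, size = 0.3, 10, 100_000  # illustrative parameters

# Bernoulli(p) samples: each entry is 1 with probability p, otherwise 0.
bern = (rng.random(size) < p).astype(int)

# Binomial(n, p) samples built as the sum of n independent Bernoulli(p) trials.
trials = rng.random((size, n)) < p
binom_from_bern = trials.sum(axis=1)

# The same distribution drawn directly from NumPy.
binom_direct = rng.binomial(n, p, size)

print("Bernoulli sample mean:", bern.mean(), "(theory:", p, ")")
print("Sum-of-Bernoulli mean:", binom_from_bern.mean(), "(theory:", n * p, ")")
print("Direct binomial mean: ", binom_direct.mean(), "(theory:", n * p, ")")
```

Both binomial sample means should land close to n*p, which illustrates why one synthetic generator is enough to explore either distribution.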


In [2]:
import numpy as np
import seaborn as sns
import itertools
from scipy.stats import bernoulli
import matplotlib.pyplot as plt
sns.set(color_codes=True)



### Bernoulli distribution
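
For reference, the theoretical mean and variance of a Bernoulli variable follow directly from its definition. The short derivation below is a standard result, written with the symbol X for the random variable (a notation assumed here, not used in the notebook):

$$
\begin{aligned}
X &\sim \mathrm{Bernoulli}(p): \quad P(X = 1) = p, \quad P(X = 0) = 1 - p \\
\mathbb{E}[X] &= 1 \cdot p + 0 \cdot (1 - p) = p \\
\mathbb{E}[X^2] &= 1^2 \cdot p + 0^2 \cdot (1 - p) = p \\
\mathrm{Var}(X) &= \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2 = p - p^2 = p(1 - p)
\end{aligned}
$$

The sample variance in the code is computed the same way, as the mean of the squared values minus the squared mean.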


In [3]:
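@interact
def bernoulli_dist(p=0.5, size=100):
  # The auto-generated sliders also allow invalid values (p outside [0, 1], size < 1); clamp them.
  p, size = float(np.clip(p, 0.0, 1.0)), max(size, 1)
  bern_data = bernoulli.rvs(p, size=size)
  # histplot replaces the deprecated distplot (requires seaborn >= 0.11)
  sns.histplot(bern_data, discrete=True)
  print('Theoretical mean = p:', p)
  print('Sample mean is:', np.mean(bern_data))
  print('Theoretical variance = p*(1-p):', p*(1-p))
  # Sample variance computed as E[X^2] - (E[X])^2
  print('Sample variance is:', np.mean(np.square(bern_data)) - np.mean(bern_data)**2)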


### Binomial distribution
The interactive function below shows the binomial distribution for different values of p, n and the sample size.
It also lets us explore the probability of observing different numbers of successes k.
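
The probability of exactly k successes can be checked outside the interactive widget. The sketch below is one possible way to do it: `binomial_pmf` is a hypothetical helper name, `math.comb` assumes Python 3.8 or newer, and the values n = 10 and p = 0.5 are illustrative.

```python
from math import comb

import numpy as np
from scipy.stats import binom


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed from the closed-form formula."""
    if k < 0 or k > n:
        return 0.0  # k lies outside the support {0, ..., n}
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The hand-written formula agrees with SciPy's implementation ...
assert np.allclose(pmf, binom.pmf(np.arange(n + 1), n, p))
# ... and the probabilities over the whole support sum to 1.
assert np.isclose(sum(pmf), 1.0)

print(f"P(X = 5) = {binomial_pmf(5, n, p):.4f}")  # prints P(X = 5) = 0.2461
```

Comparing against `scipy.stats.binom.pmf` is a cheap sanity check before trusting a hand-written formula on other parameter values.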

In [4]:
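from scipy.special import comb

@interact
def binomial(p=0.5, n=10, size=50, k=5):
  # The auto-generated sliders also allow invalid values (p outside [0, 1], negative n or size); clamp them.
  p, n, size = float(np.clip(p, 0.0, 1.0)), max(n, 0), max(size, 1)
  bin_data = np.random.binomial(n, p, size)
  # histplot replaces the deprecated distplot (requires seaborn >= 0.11)
  sns.histplot(bin_data, discrete=True)
  q = 1 - p
  print('Expected value via formula E[X] = n*p =', n*p)
  print('Sample mean is:', np.mean(bin_data))
  # Binomial coefficient C(n, k); the probability is zero when k is outside [0, n]
  bin_coef = comb(n, k, exact=True)
  pmf = bin_coef*(p**k)*(q**(n-k)) if 0 <= k <= n else 0.0
  print('Theoretical P(X = k) using formula bin_coef*(p**k)*(q**(n-k)) is:', pmf)
  prob = (bin_data == k).sum()/len(bin_data)
  print('Empirical P(X = k) is:', prob)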


## Application
 
- a sample task / application
- parameters and what they mean/affect (if applicable)
- hypothesis tests
- statistical tests
- visualization


#### Task: 
We need to improve the ROI (Return on Investment) of our company’s call center, where employees cold call potential customers and try to get them to purchase our product.
We have the following data:


*   The typical call center employee completes on average 50 calls per day.
*   The probability of a conversion (purchase) for each call is 4%.
*   The average revenue to our company for each conversion is $20.
*   The call center we are analyzing has 100 employees.
*   Each employee is paid $200 per day of work.


We can model the number of daily conversions of each employee as a binomially distributed random variable with the following parameters:

*   n = 50 (calls per day)
*   p = 4% (conversion probability per call)
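
Before simulating, the expected daily profit can be worked out analytically from these figures. The sketch below assumes that all calls are independent and share the same conversion probability, so the total number of daily conversions is itself binomial; the variable names are illustrative.

```python
import math

# Figures from the task description.
n_employees = 100
calls_per_day = 50            # n
p_conversion = 0.04           # p
revenue_per_conversion = 20   # dollars
wage_per_day = 200            # dollars

# Under the independence assumption, total daily conversions follow
# Binomial(n_employees * calls_per_day, p_conversion).
total_calls = n_employees * calls_per_day
expected_conversions = total_calls * p_conversion
std_conversions = math.sqrt(total_calls * p_conversion * (1 - p_conversion))

# Profit = revenue from conversions minus total wages.
expected_profit = expected_conversions * revenue_per_conversion - n_employees * wage_per_day
std_profit = std_conversions * revenue_per_conversion

# Conversion rate at which one employee's expected revenue equals the daily wage.
break_even_p = wage_per_day / (calls_per_day * revenue_per_conversion)

print(f"Expected conversions per day: {expected_conversions:.0f}")
print(f"Expected daily profit: {expected_profit:.0f}")
print(f"Std. dev. of daily profit: {std_profit:.2f}")
print(f"Break-even conversion rate: {break_even_p:.2f}")
```

With a revenue of 100 per conversion, the default value in the simulation cell, the expected revenue per employee is 50 * 0.04 * 100 = 200, which exactly matches the daily wage.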

The simulation below models this call center and shows how the different parameters influence our profits.

In [5]:
@interact(p=(0,0.1,0.01))
def simulate_call_center(n_employees=100, wage=200, num_of_calls=50, p=0.04, revenue=100, sims=1000):
  # Call center simulation. revenue is the revenue per conversion;
  # set its slider to 20 to reproduce the figures from the task description.
  # The auto-generated integer sliders also allow negative values; clamp them to valid sizes.
  n_employees, num_of_calls, sims = max(n_employees, 1), max(num_of_calls, 0), max(sims, 1)
  # Each row is one simulated day; each entry is one employee's conversions (a binomial draw).
  conversions = np.random.binomial(num_of_calls, p, size=(sims, n_employees))
  # Daily profit = total conversions * revenue per conversion - total wages.
  sim_profits = conversions.sum(axis=1)*revenue - n_employees*wage
  # Plot the simulated daily profits as a histogram
  # (histplot replaces the deprecated distplot and requires seaborn >= 0.11).
  fig, ax = plt.subplots(figsize=(14,7))
  sns.histplot(sim_profits, bins=20, label='call center simulation results', color='red', ax=ax)
  ax.set_xlabel("Profits", fontsize=16)
  ax.set_ylabel("Frequency", fontsize=16)
  ax.legend()


## Conclusion

We studied the Bernoulli and binomial distributions and saw
how they can be applied to practical tasks.

## References / Acknowledgements

An interactive version of this notebook can be found here:

https://colab.research.google.com/drive/1zsb5Au5uJapblvG5kM7AzCqcf0WMNmII

Other useful links:

https://en.wikipedia.org/wiki/Bernoulli_distribution

https://en.wikipedia.org/wiki/Binomial_distribution

https://towardsdatascience.com/fun-with-the-binomial-distribution-96a5ecabf65b
