before oct stats class
ADGEfficiency committed Oct 23, 2019
1 parent cb8d0ed commit f8a2d4b
Showing 12 changed files with 151 additions and 217 deletions.
4 changes: 3 additions & 1 deletion backprop/readme.md
@@ -1,6 +1,6 @@
# Backpropagation

<img src="assets/reddit.png" alt="" width="600"/>
<img src="assets/reddit.png" alt="" width="800"/>

## Lecture

@@ -36,6 +36,8 @@ The practical for this class is to:

[CS231n - Backprop](http://cs231n.github.io/optimization-2/)

[Derivatives, Backpropagation, and Vectorization - Justin Johnson - 2017](http://cs231n.stanford.edu/handouts/derivatives.pdf)

## Classification neural net from scratch

[dennybritz/nn-from-scratch](https://github.com/dennybritz/nn-from-scratch/blob/master/nn-from-scratch.ipynb)
Binary file modified numpy/assets/reddit.png
2 changes: 1 addition & 1 deletion python/basics/readme.md
@@ -1,4 +1,4 @@
A series of notebooks designed to teach Python from the top down. The notes are designed for students with no Python experience.
A series of notebooks designed to teach Python from the bottom up. The notes are designed for students with no Python experience.

## Further reading

9 changes: 8 additions & 1 deletion python/readme.md
@@ -3,7 +3,14 @@ Python
- allows quick iteration over ideas
- can be put into production


## Further resources

[See here](https://github.com/ADGEfficiency/programming-resources/tree/master/python)

[Google Python course](https://developers.google.com/edu/python/introduction)

[Learn Python the Hard Way](https://learnpythonthehardway.org)

[codewars](https://www.codewars.com/)

[coderbyte](https://coderbyte.com/)
55 changes: 19 additions & 36 deletions statistics/bandits.ipynb
@@ -2,21 +2,9 @@
"cells": [
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
"ename": "ImportError",
"evalue": "cannot import name 'ucb'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mImportError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-11-db7a22c23c06>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0manswers\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mexpectation\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mucb\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrun_ucb_expt\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_line_magic\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'matplotlib'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'inline'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mImportError\u001b[0m: cannot import name 'ucb'"
]
}
],
"outputs": [],
"source": [
"from collections import namedtuple\n",
"\n",
@@ -25,6 +13,7 @@
"import pandas as pd\n",
"\n",
"from answers import expectation, ucb, run_ucb_expt\n",
"from common import generate_bandit_dataset\n",
"\n",
"%matplotlib inline"
]
@@ -37,7 +26,8 @@
"\n",
"## Resources\n",
"\n",
"Chapter 3 of [Practical Statistics for Data Scientists](https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/), Chapter 2 of [Sutton & Barto - Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf).\n",
"- Chapter 3 of [Practical Statistics for Data Scientists](https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/)\n",
"- Chapter 2 of [Sutton & Barto - Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf).\n",
"\n",
"## When is classical hypothesis testing not enough?\n",
"\n",
@@ -52,11 +42,9 @@
"- in a real business, there is no 'experiment over' date\n",
"- the real world is non-stationary - the results we collect might be from a distribution that is changing\n",
"\n",
"We want an experiment where we can **take advantage of the results as we learn**\n",
"- not have to wait\n",
"We want an experiment where we can **take advantage of the results as we learn** - not have to wait until the experiment results can be used\n",
"\n",
"In a business context we are not concerned with statistical significance\n",
"- we are concerned with optimizing \"user experience\" as quickly as possible\n",
"In a business context we are not concerned with statistical significance - we are (often) concerned with optimizing money as quickly as possible\n",
"\n",
"## The mulit-armed bandit\n",
"\n",
@@ -65,7 +53,7 @@
"- reach conclusions faster\n",
"\n",
"The term bandit comes from slot machines \n",
"- known as one armed bandits for their ability to extract money from gambles\n",
"- one armed bandits for their ability to extract money from gamblers\n",
"\n",
"The goal of a multi armed bandit problem is to vin as much money as possible\n",
"- this is the same as figuring out which arm is best as quick as possible\n",
@@ -104,7 +92,7 @@
"\n",
"## Example\n",
"\n",
"You have the following results from comparing three different landing pages:"
"You have the following results from comparing different landing pages:"
]
},
{
@@ -113,9 +101,7 @@
"metadata": {},
"outputs": [],
"source": [
"from common import generate_bandit_dataset\n",
"\n",
"params, results = generate_bandit_dataset()\n",
"params, results = generate_bandit_dataset(arms=20, samples=3)\n",
"\n",
"results"
]
@@ -126,7 +112,8 @@
"source": [
"## Practical \n",
"\n",
"Write a function to take an expectation over the results:"
"Write a function to take an expectation over the results\n",
"- one number for each arm"
]
},
{
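The practical above asks for one expected value per arm, and the greedy approach then sends all users to the arm with the highest estimate. A minimal sketch, assuming `results` behaves like a mapping from arm name to a list of observed rewards. The real structure returned by `generate_bandit_dataset` is not shown in this diff, and `expectation_per_arm` and `greedy_arm` are hypothetical names, not the functions in `answers`:

```python
import numpy as np


def expectation_per_arm(results):
    """Return the sample mean of the observed rewards for each arm."""
    return {arm: float(np.mean(samples)) for arm, samples in results.items()}


def greedy_arm(results):
    """Return the arm whose sample mean is highest (the greedy choice)."""
    means = expectation_per_arm(results)
    return max(means, key=means.get)


# Toy data (made up for illustration): three arms, two samples each
toy_results = {"0": [1.0, 3.0], "1": [4.0, 6.0], "2": [2.0, 2.0]}

means = expectation_per_arm(toy_results)
best = greedy_arm(toy_results)
```

With only two or three samples per arm these means are noisy estimates, which is why a purely greedy choice can lock onto the wrong arm.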
Expand All @@ -150,7 +137,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One approach would be to conclude that one option is optimal and send all our users there\n",
"One approach would be to conclude that one of the arms is optimal and send all our users there\n",
"- this is a **greedy** solution to the exploration & exploitation dilemma\n",
"\n",
"## Practical\n",
@@ -170,14 +157,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"which me- not this one\n",
"The problem with a greedy stragety is that we might have noise in our samples that\n",
"- our expectation has variance\n",
"\n",
"The problem with a greedy stragety is that we might not \n",
"## Question to class\n",
"\n",
"Another solution would be to favour the option that appears optimal, while still sampling from the options that appear sub-optimal.\n",
"Is the expectation above biased?\n",
"\n",
"## epsilon-greedy\n",
"\n",
"Another solution would be to favour the option that appears optimal, while still sampling from the options that appear sub-optimal.\n",
"\n",
"A simple algorithm to tackle the exploration-exploitation dilemma is known as **epsilon-greedy** - it is the method used for exploration in DeepMind's 2013 DQN.\n",
"\n",
"The algorithm has a single parameter $\\epsilon$, which controls how greedy we are. \n",
@@ -309,13 +299,6 @@
"#plt.plot(ucb_performance, label='ucb')\n",
"#_ = plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
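The epsilon-greedy rule described in the bandits notebook can be sketched as follows: with probability epsilon pick an arm uniformly at random (explore), otherwise pick the arm with the highest estimated mean (exploit). This is an illustrative sketch only. The function name `epsilon_greedy`, the `means` mapping, and the toy values are assumptions, not the notebook's or `answers` module's implementation:

```python
import numpy as np


def epsilon_greedy(means, epsilon, rng):
    """Choose an arm given estimated mean rewards.

    means   -- dict mapping arm name to its estimated mean reward
    epsilon -- probability of exploring (choosing a random arm)
    rng     -- a numpy.random.Generator
    """
    arms = list(means)
    if rng.random() < epsilon:
        # Explore: any arm, including the current best, may be chosen
        return arms[rng.integers(len(arms))]
    # Exploit: the arm with the highest estimate
    return max(arms, key=means.get)


rng = np.random.default_rng(0)
toy_means = {"0": 2.0, "1": 5.0, "2": 3.5}  # made-up estimates

always_greedy = [epsilon_greedy(toy_means, 0.0, rng) for _ in range(20)]
always_random = [epsilon_greedy(toy_means, 1.0, rng) for _ in range(20)]
```

Setting `epsilon=0.0` recovers the greedy strategy, and `epsilon=1.0` ignores the estimates entirely. Values in between trade off exploration against exploitation.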
7 changes: 4 additions & 3 deletions statistics/common.py
@@ -22,6 +22,7 @@ def load_iris():
print('target.shape = {}'.format(target.shape))
return Data(features, target, pd.DataFrame(dataset.target, columns=['class']))


def load_forest_fires():
os.makedirs('./data', exist_ok=True)

@@ -61,16 +62,16 @@ def make_cdf(samples):
return [(percentile_rank(s, samples), s) for s in sorted(samples)]


def generate_bandit_dataset():
def generate_bandit_dataset(arms=20, samples=2):
np.random.seed(42)

Param = namedtuple('Parameter', ['loc', 'scale', 'initial_size'])
start = 10
end = 50
num_options = 20
num_options = arms

params = {
str(option): Param(loc, scale, 2)
str(option): Param(loc, scale, samples)
for option, (loc, scale)
in enumerate(zip(np.linspace(start, end, num_options), np.random.uniform(10, size=num_options)))
}
7 changes: 5 additions & 2 deletions statistics/correlation.ipynb
@@ -19,9 +19,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Relationships between variables\n",
"# Correlation\n",
"\n",
"Chapter 7 of [Think Stats 2nd Edition](https://greenteapress.com/wp/think-stats-2e/)."
"Chapter 7 of [Think Stats 2nd Edition](https://greenteapress.com/wp/think-stats-2e/).\n",
"\n",
"**Relationships between variables**\n",
"- linear and/or monotonic"
]
},
{
32 changes: 24 additions & 8 deletions statistics/distributions.ipynb
@@ -20,16 +20,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distributions\n",
"# Distributions\n",
"\n",
"Chapter 2-4, 6 of [Think Stats 2nd Edition](https://greenteapress.com/wp/think-stats-2e/).\n",
"\n",
"In this notebook we wil be working with tensors of at two dimensions (batch_size, num_features)\n",
"## Two dimensional tensors\n",
"\n",
"In this notebook we wil be working with tensors of at two dimensions `(batch_size, num_features)`\n",
"\n",
"- this is to get you used to having multiple samples in a single array \n",
"- when training neural networks your `x_train`, `y_train` will have at leat two dimensions\n",
"\n",
"Sample data from **two normal** distributions and **two uniform** distributions (four in total):"
"## What is a distribution?\n",
"\n",
"A distribution is\n",
"- a set of values (continuous or discrete)\n",
"- probabilities of those values (between 0 and 1)\n",
"\n",
"Let's sample data from **two normal** distributions and **two uniform** distributions (four in total):"
]
},
{
@@ -55,14 +63,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is a statistic?\n",
"\n",
"Infomation about a distribution\n",
"\n",
"Scalars\n",
"\n",
"## Central tendency"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The **mean** - also known as the **expected value** (*on expectation* == on average):"
"The **mean** - also known as the **expected value** (expectation == on average)"
]
},
{
@@ -78,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The **median** is a percentile based statistic (more on them later). It is informative when you have outliers:"
"The **median** is a percentile based statistic (50th percentile) informative when you have outliers"
]
},
{
@@ -112,7 +126,8 @@
"source": [
"## Spread / variability of the data\n",
"\n",
"**Variance** - how far away a variable is from its mean\n",
"**Variance** \n",
"- how far away a variable is from its mean\n",
"\n",
"$$ \\sigma^2_x = \\frac{1}{n} \\sum(x_n - \\mu_x)^2 $$"
]
@@ -130,9 +145,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Standard deviation** - square root of the variance (in the same units as the data):\n",
"**Standard deviation** \n",
"- square root of the variance (in the same units as the data):\n",
"\n",
"$$ \\sigma^2_x = \\sqrt{\\sigma^2_x}$$"
"$$ \\sigma = \\sqrt{\\sigma^2_x}$$"
]
},
{
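The variance and standard deviation formulas in the distributions notebook can be checked numerically on a two dimensional array of shape `(batch_size, num_features)`. A small sketch with made-up data, using the population form (divide by `n`) that matches the formula in the notebook:

```python
import numpy as np

# Toy data of shape (batch_size=4, num_features=2)
x = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

# Mean of each feature, taken over the batch dimension
mu = x.mean(axis=0)

# Variance: average squared distance from the mean (1/n, not 1/(n-1))
variance = ((x - mu) ** 2).mean(axis=0)

# Standard deviation: square root of the variance, in the units of the data
std = np.sqrt(variance)
```

Here `np.var(x, axis=0)` and `np.std(x, axis=0)` give the same numbers, because NumPy defaults to `ddof=0`. Passing `ddof=1` would give the sample (n - 1) version instead.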