# Run this cell first

In [None]:
# this code enables the automated feedback. If you remove this, you won't get any feedback
# so don't delete this cell!
try:
  import AutoFeedback
except (ModuleNotFoundError, ImportError):
  !pip install git+https://github.com/abrown41/AutoFeedback@notebook
  import AutoFeedback

try:
  from testsrc import test_main
except (ModuleNotFoundError, ImportError):
  !pip install "git+https://github.com/autofeedback-exercises/exercises.git#subdirectory=New-SOR3012/Trends"
  from testsrc import test_main

def runtest(tlist):
  import unittest
  from contextlib import redirect_stderr
  from os import devnull
  with redirect_stderr(open(devnull, 'w')):
    suite = unittest.TestSuite()
    for tname in tlist:
      suite.addTest(eval(f"test_main.UnitTests.{tname}"))
    runner = unittest.TextTestRunner()
    try:
      runner.run(suite)
    except AssertionError:
      pass


# Introduction

The random variables that have been introduced in previous blocks contain parameters. For example, the Bernoulli random variable has a parameter $p$, the binomial random variable has two parameters $n$ and $p$, and the uniform random variable has two parameters $a$ and $b$. Many stochastic models have parameters like these that the modeller can vary. When we investigate such models, a useful exercise is to draw a graph that shows how the output of the model changes as a parameter is changed. In other words, we draw a graph with the dependent variable (the random variable) on the y-axis and the independent variable (the parameter) on the x-axis.

When the model is stochastic, drawing these graphs is slightly more complicated because we must include estimates of the error on any random variables. As you will see at the end of this series of exercises, you already know everything you need to estimate and plot these errors. In other words, this week's exercises consolidate material that you have already seen in a slightly different guise.

As always, start by running the cell below to load the libraries that we need for the exercises.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

# Simulating a random walk

The exercises you are going to complete for this block involve simulating a random walk in one dimension. During this walk you take discrete steps either forward or backward by one unit. The decision to move forward or backward on each turn is made by generating a Bernoulli random variable with parameter `p`. If the Bernoulli random variable is equal to 1, the walker moves forward one step. If the Bernoulli random variable is equal to 0, the walker moves backward one step. This process of generating a Bernoulli random variable and moving in response is then repeated multiple times.

This idea, and the code you will need to simulate the motion of this walker, is explained in the following video:

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/QlwATPnVESU?si=I8OPSBpUmS8cWmA1" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

Now write a function called `random_walk` to simulate the random walker in the cell below.  This function should take three arguments:

- `startpoint` is the initial location of the walker on the line.
- `nsteps` is the number of random steps that you would like the walker to take.
- `p` is the probability that the walker takes a step forward (i.e. in the positive direction).

Your function should return the final position that the walker arrives at after his series of `nsteps` random steps.
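A minimal sketch of one way `random_walk` could be written, assuming each Bernoulli variable is drawn with NumPy's `np.random.binomial(1, p)`. The video may generate the Bernoulli variable differently, and the autofeedback test may check details not shown here, so treat this as an illustration rather than the reference solution.

```python
import numpy as np


def random_walk(startpoint, nsteps, p):
    """Return the walker's position after nsteps steps of +1 or -1."""
    position = startpoint
    for _ in range(nsteps):
        # One Bernoulli(p) draw decides the direction of this step
        if np.random.binomial(1, p) == 1:
            position += 1  # forward (positive direction)
        else:
            position -= 1  # backward
    return position


# Example: 10 steps of a fair walk starting from 0
print(random_walk(0, 10, 0.5))
```

A quick sanity check: with `p=1` every step is forward, so `random_walk(0, 10, 1.0)` must return 10.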

In [None]:
runtest(['test_random_walk'])

# Simulating a gambler

What you have just learned about modelling a 1D random walk can be used to model how a gambler plays a game such as roulette in a casino. The idea in such games is that the gambler places a small stake on each game. Let's suppose this is a stake of one pound. When the game is played, the gambler either loses his stake, in which case the total amount of money he has decreases by one unit, or he wins the game, in which case he gets back his stake plus a prize, which we will set as one pound. If he wins a game, he thus has one pound more than he had before that game. As you can see, if the gambler repeats this process of staking money and playing, the amount of money he has undergoes a random walk in one dimension.

There is, however, an important difference between the gambler and the 1D random walk. The gambler usually only has a finite amount of money to gamble with. If he loses a large number of games, he is therefore forced to stop playing. Similarly, the gambler may also have a target for how much money he would like to win. In other words, he has some figure of N pounds in mind, which is more than the amount of money he entered the casino with, and he will stop gambling once he has N pounds in his pocket.

The amount of money that the gambler has is an example of a stochastic process. A stochastic process is, for want of a better description, a time-dependent random variable. Furthermore, the amount of money the gambler has is an example of a special kind of stochastic process known as a Markov chain, because the amount he has after the next game depends only on how much he has now and not on how he got there.

A question that might be of interest is whether the gambler leaves the casino with zero pounds or with N pounds. We can answer this question by running a simulation of the gambler. Essentially, we start the gambler with `s` pounds and simulate the process of him playing the game, stopping only once he has zero or `n` pounds. We can then set a random variable equal to 1 if he finishes with 0 pounds and 0 if he finishes with `n` pounds. The following video explains how we can write a program to estimate this quantity.

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/WCCV7lXhnUI?si=CK0UT7B8tI2alvII" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

__Your task in this exercise is to use the ideas from the video to write a function called `gambler` that simulates this procedure.__ Your function should take three arguments:

1. `start` the amount of money the gambler starts with.  This should be a positive number
2. `n` the target amount of money that the gambler wants to win.  The gambler should stop playing once he has `n` pounds or when he runs out of money and has zero pounds.
3. `p` the probability of winning each individual game the gambler plays. If the gambler wins a game, the amount of money he has increases by one pound. If he loses, the amount of money he has decreases by one pound.

Within the function, the 1D random walk should be simulated until the walker arrives in state 0 or in state `n`. If the walk finishes with the walker in state 0, the function should return 1. If the walk finishes with the walker in state `n`, the function should return 0.
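A minimal sketch of one way `gambler` could be written, again assuming Bernoulli draws via `np.random.binomial(1, p)`. The `while` condition keeps the walk going only while the gambler is strictly between the two absorbing states, 0 and `n`.

```python
import numpy as np


def gambler(start, n, p):
    """Simulate one gambler: return 1 if he is ruined, 0 if he reaches n pounds."""
    money = start
    # Play until the walk hits one of the absorbing states, 0 or n
    while 0 < money < n:
        if np.random.binomial(1, p) == 1:
            money += 1  # won: stake returned plus a one-pound prize
        else:
            money -= 1  # lost the one-pound stake
    return 1 if money == 0 else 0


# One realisation of the Bernoulli "ruin" variable
print(gambler(5, 10, 0.4))
```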

In [None]:
runtest(['test_gambler'])

# Calculating the hitting time

When we use the 1D random walk to model how a gambler plays a game such as roulette in a casino, we can use simulation to determine the likelihood that the gambler loses all his money. However, we can also use simulation to determine how many spins of the wheel the gambler will play before leaving the casino. In this exercise we are thus going to learn how to write a program to calculate the number of games the gambler plays before leaving the casino.

Remember that the gambler usually only has a finite amount of money to gamble with. If he loses a large number of games, he is therefore forced to stop playing. Similarly, the gambler may also have a target for how much money he would like to win. In other words, he has some figure of N pounds in mind, which is more than the amount of money he entered the casino with, and he will stop gambling once he has N pounds in his pocket.

To calculate the number of spins of the wheel the gambler will bet on, we thus need to calculate the number of steps the random walk takes before arriving in state 0 or state `n`. __Your task in this exercise is to write a function called `nplays` that simulates the changes in the amount of money the gambler has and that returns the number of spins of the wheel that take place.__ Your function should take three arguments:

1. `start` the amount of money the gambler starts with. This should be a positive number
2. `n` the target amount of money that the gambler wants to win. The gambler should stop playing once he has n pounds or when he runs out of money and has zero pounds.
3. `p` the probability of winning each individual game the gambler plays. If the gambler wins a game, the amount of money he has increases by one pound. If he loses, the amount of money he has decreases by one pound.

Within the function, the 1D random walk should be simulated until the walker arrives in state 0 or in state `n`. The number of steps the walker takes should be counted, and this number of steps should be returned at the end of the function.
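A minimal sketch of `nplays` under the same assumptions as before (Bernoulli draws via `np.random.binomial(1, p)`). It is the same loop as in `gambler`, with a counter incremented once per game instead of a 0/1 outcome being returned.

```python
import numpy as np


def nplays(start, n, p):
    """Return the number of games played before the gambler has 0 or n pounds."""
    money = start
    games = 0
    while 0 < money < n:
        # Each pass through the loop is one spin of the wheel
        if np.random.binomial(1, p) == 1:
            money += 1
        else:
            money -= 1
        games += 1
    return games


print(nplays(5, 10, 0.4))
```

A useful check: with `p=1` the gambler wins every game, so `nplays(start, n, 1.0)` should be exactly `n - start`.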


In [None]:
runtest(['test_gambler_1'])

# Calculating the average probability of ruin

We can use simulation to determine the likelihood that the gambler loses all his money when he repeatedly plays a game such as roulette. As we have seen, it is straightforward to write programs that simulate the amount of money the gambler has and that return the final outcome of the series of games played. For instance, we can write a function that simulates a 1D random walk that stops once the walker enters state 0 or state `n`. This function can also be made to return 1 if the walker finishes in state 0 and 0 if the walker finishes in state `n`. Because the return value of such a function is 0 or 1, what has been generated here is a Bernoulli random variable. The method by which this random variable is generated is more complicated than the methods we have typically used for generating Bernoulli random variables, but the return value is still a random variable that can only equal 0 or 1. It is thus a Bernoulli random variable.

You know that a Bernoulli random variable has only one parameter, `p`. Here that parameter is the probability of ruin, which is not the same thing as the argument `p` below (the probability of winning each game). In this exercise we are going to learn how to estimate the probability of ruin by performing a series of simulations of the walker. To complete the exercise you will need to complete the function called `sample_mean`. This function should take four arguments:

1. `start` the amount of money the gambler starts with.  This should be a positive number
2. `n` the target amount of money that the gambler wants to win.  The gambler should stop playing once he has `n` pounds or when he runs out of money and has zero pounds.
3. `p` the probability of winning each individual game the gambler plays. If the gambler wins a game, the amount of money he has increases by one pound. If he loses, the amount of money he has decreases by one pound.
4. `m` the number of times to call the `gambler` function that you wrote earlier in this sequence of exercises.

Each time you call the `gambler` function, you generate a sample of the random variable of interest. You can thus calculate a sample mean and a sample variance from all the Bernoulli random variables you have generated. The sample mean you calculate in this way is an estimate of the parameter of the Bernoulli random variable (the probability of ruin). Because you have estimated this parameter by sampling, you need to quote a confidence interval around your estimate using what you have learned about the variance of the sample mean in other exercises. Your `sample_mean` function should therefore return three numbers: `lower` is the lower end of a 90% confidence interval for the estimate, `mean` is the value of the sample mean, and `upper` is the upper end of the 90% confidence interval.

This exercise should be revision of material from block 2.
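A minimal sketch of `sample_mean`, assuming the 90% interval is built from a normal (central limit theorem) approximation: the sample mean of `m` samples has variance approximately $\sigma^2/m$, and the 5th and 95th percentiles of a normal lie $z \approx 1.645$ standard errors either side of the mean. Block 2 may have used a slightly different convention (for example the population rather than the unbiased sample variance), and the autofeedback may check details not shown here. The `gambler` function is redefined so that the block runs on its own.

```python
import numpy as np
import scipy.stats


def gambler(start, n, p):
    """Return 1 if the gambler is ruined (reaches 0), 0 if he reaches n."""
    money = start
    while 0 < money < n:
        money += 1 if np.random.binomial(1, p) == 1 else -1
    return 1 if money == 0 else 0


def sample_mean(start, n, p, m):
    """Estimate P(ruin) from m simulated gamblers with a 90% confidence interval.

    Assumes m >= 2 and uses a normal (central limit theorem) approximation.
    """
    samples = np.array([gambler(start, n, p) for _ in range(m)])
    mean = samples.mean()
    # Unbiased sample variance of the individual Bernoulli samples
    variance = samples.var(ddof=1)
    # The sample mean has variance of about variance / m; the 90% two-sided
    # interval runs between the 5th and 95th percentiles of a normal
    half_width = scipy.stats.norm.ppf(0.95) * np.sqrt(variance / m)
    return mean - half_width, mean, mean + half_width


lower, mean, upper = sample_mean(5, 10, 0.3, 200)
print(lower, mean, upper)
```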

In [None]:
def sample_mean(start,n,p,m) :
  # Your code to calculate the sample mean for m random variables that are generated by calling gambler goes here


  # When completed this function should return
  # lower = the 5th percentile of the distribution for the sample mean
  # mean = your sample mean
  # upper = the 95th percentile of the distribution for the sample mean
  return lower, mean, upper


l, m, u = sample_mean( 5, 10, 0.3, 200 )
print('200 random walks were generated for a chain with length 10 and a probability of winning of 0.3')
print('These random walks all started from state 5')
print('A fraction',m,'of these walks finished in state 0')
print('Our simulations show that there is a 90% chance that the probability of ruin lies between',l,'and',u)


In [None]:
runtest(['test_random_walker', 'test_mean'])

# Graphing the probability of ruin

We now know how to use simulation to determine the likelihood that the gambler loses all his money when he repeatedly plays a game such as roulette. We are thus in a position to look at how this probability depends on the values of the parameters of the model. This probability depends on the three parameters of the model:

* The amount of money the gambler enters the casino with.
* The amount of money he would like to win.
* The probability that he wins each individual game.

We would generally investigate one of these parameters at a time. The parameter that we vary is often referred to as the independent variable. The output that is calculated through simulation (the probability of ruin in this case) is then referred to as the dependent variable. Any other parameters that might affect the final result are kept fixed in all the simulations we perform and are referred to as control variables.

Let's suppose that we want to use the amount of money the gambler enters the casino with, `s`, as our independent variable. We would run a series of simulations with `s` values of 1, 2, 3, 4, 5, 6, 7, 8 and 9. In all these simulations we might fix the amount of money the gambler wants to win, `n`, at 10, as this is a control variable. Furthermore, we would also fix the probability of winning each individual game, `p`, at 0.4, as this is also a control variable. By plotting a graph with the various values of `s` on the x-axis and the averages that we got from the simulations on the y-axis, we can see how the probability of ruin depends on the amount of money the gambler has at the start of the game. Furthermore, because we have fixed `n` and `p` in all these simulations, any variations we see in the probability of ruin are a consequence of the different `s` values that were used.

With all this in mind, __your task is to draw a graph that shows how the probability of ruin depends on the amount of money the gambler enters the casino with.__ You should set the amount of money the gambler wants to win equal to 10 in all your simulations and the probability of winning each game equal to 0.4. Furthermore, each estimate of the probability of ruin should be calculated by performing 200 simulations of the random walk. To pass the test you will need to have calculated the probability of ruin for `s` values of 1, 2, 3, 4, 5, 6, 7, 8 and 9. The values of this parameter should be put on the x-axis and your estimates of the probability of ruin should appear on the y-axis. The x-axis label should be 'start point' and the y-axis label should be 'probability of ruin'.
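A minimal sketch of one way to produce this graph, assuming the `gambler` function from earlier (redefined here so the block runs on its own). The names `start_points` and `ruin_prob` are illustrative, the marker style `'ko'` is a free choice, and the exercise cell still needs its `fighand = plt.gca()` line for the autofeedback.

```python
import numpy as np
import matplotlib.pyplot as plt


def gambler(start, n, p):
    """Return 1 if the gambler is ruined (reaches 0), 0 if he reaches n."""
    money = start
    while 0 < money < n:
        money += 1 if np.random.binomial(1, p) == 1 else -1
    return 1 if money == 0 else 0


start_points = np.arange(1, 10)  # independent variable: s = 1, 2, ..., 9
n, p, m = 10, 0.4, 200           # control variables and simulations per point

ruin_prob = np.zeros(len(start_points))
for i, s in enumerate(start_points):
    # Sample mean of m Bernoulli "ruin" variables estimates P(ruin) for this s
    ruin_prob[i] = np.mean([gambler(s, n, p) for _ in range(m)])

plt.plot(start_points, ruin_prob, 'ko')
plt.xlabel('start point')
plt.ylabel('probability of ruin')
```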


In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Ln2PPEyLh7Y?si=LHlmfzKz4QrmtqWD" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

In [None]:
import matplotlib.pyplot as plt
import numpy as np


# This code is required for the autofeedback- don't delete it!
fighand = plt.gca()

In [None]:
runtest(['test_plot'])

# Gambler's ruin plots with errors

The previous exercise showed you how to plot a dependent variable as a function of an independent variable. However, one important factor was left out of the previous exercise. __The dependent variable in the plot that we drew is a random variable, so we need to provide a confidence interval around the estimate of this quantity that we have extracted via simulation.__ A confidence interval of this type is almost always drawn around dependent variables in graphs because there are almost always variables that cannot be controlled in the experiment. These uncontrolled variables affect the outcome of the experiment and thus introduce some randomness into the final value that is extracted.

__In this exercise I would like you to draw another graph of a dependent variable as a function of an independent variable.__ This time, however, I want you to also use what you have learned elsewhere to calculate and plot suitable error bars on each of your estimates of the dependent variable. The dependent variable will be the probability of ruin that you computed in the previous exercise. Once again you should simulate the gambler's ruin problem, stopping the walk when the walker arrives in state 0 or in state `n=4`. `n` will thus be one of your control variables. The second control variable will be the start point, which will be set as `s=2`. The independent variable in your simulations is the probability of winning each game, `p`. You should consider `p` values of 0.3, 0.4, 0.5 and 0.7.
For each of these values of `p` you should estimate the probability of ruin by simulating 200 random walks and calculating a mean. Notice that you will also need to calculate a sample variance from these 200 random variables, as you will need it to calculate the error bar for a 90% confidence interval.

I have written the following plot commands for you in the cell below:

```python
plt.errorbar( x, y, yerr=error, fmt='ko' )
plt.xlabel('Probability of winning each game')
plt.ylabel('Probability of ruin')
```

Notice that you need to set the elements of the NumPy array called `error` equal to the half-width of the 90% confidence interval (the distance from the sample mean to either end of the interval), because `yerr` draws each bar that far above and below the point. The variable `error` must be present and set in order to pass the tests, as I check for its existence.

Notice, furthermore, that when you complete this exercise for real you can extend the range of values you investigate. I kept the range artificially small here to make the calculation less expensive.
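A minimal sketch of how `x`, `y` and `error` could be filled in before the given plot commands, assuming the same normal-approximation interval as in the `sample_mean` exercise (half-width $z \sqrt{\text{variance}/m}$ with $z = 1.645$, the 95th percentile of a standard normal). The `gambler` function is redefined so the block runs on its own; the exercise cell still needs its `fighand = plt.gca()` line.

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt


def gambler(start, n, p):
    """Return 1 if the gambler is ruined (reaches 0), 0 if he reaches n."""
    money = start
    while 0 < money < n:
        money += 1 if np.random.binomial(1, p) == 1 else -1
    return 1 if money == 0 else 0


x = np.array([0.3, 0.4, 0.5, 0.7])  # independent variable: p
s, n, m = 2, 4, 200                 # control variables and samples per point
z = scipy.stats.norm.ppf(0.95)      # about 1.645 for a two-sided 90% interval

y = np.zeros(len(x))
error = np.zeros(len(x))
for i, p in enumerate(x):
    samples = np.array([gambler(s, n, p) for _ in range(m)])
    y[i] = samples.mean()
    # Half-width of the 90% interval: z times the standard error of the mean
    error[i] = z * np.sqrt(samples.var(ddof=1) / m)

plt.errorbar(x, y, yerr=error, fmt='ko')
plt.xlabel('Probability of winning each game')
plt.ylabel('Probability of ruin')
```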


In [None]:
import matplotlib.pyplot as plt
import scipy.stats
import numpy as np

# Your code to simulate the walker and to calculate the errors goes here


plt.errorbar( x, y, yerr=error, fmt='ko' )
plt.xlabel('Probability of winning each game')
plt.ylabel('Probability of ruin')


# This code is required for the autofeedback- don't delete it!
fighand = plt.gca()


In [None]:
runtest(['test_plot_1', 'test_errors'])

# Taking it further

You can demonstrate that you have understood the ideas in this block by making a graph that shows the value of a random variable generated from a stochastic model as a function of one of the parameters of the model. If you want to consider the model for the gambler's ruin problem, you could plot any of:

* The probability of winning 
* The probability of losing
* The number of games played

as a function of:

* The amount of money that the gambler wants to win
* The amount of money the gambler starts with
* The probability that the gambler wins each game

Alternatively, you can draw a histogram illustrating the distribution of the number of turns the gambler takes before losing all his money or winning.
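A minimal sketch of such a histogram, assuming the `nplays` function from earlier (redefined here). The parameter values `start=5`, `n=10`, `p=0.4` and `m=1000` are hypothetical choices for illustration. Because every path from 5 to either 0 or 10 takes an odd number of games, the sketch uses bins two games wide so that no bin is systematically empty; it also prints the sample mean with a 90% interval, in keeping with the advice below about always quoting errors.

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt


def nplays(start, n, p):
    """Return the number of games played before the gambler has 0 or n pounds."""
    money, games = start, 0
    while 0 < money < n:
        money += 1 if np.random.binomial(1, p) == 1 else -1
        games += 1
    return games


# Illustrative parameter values (hypothetical choices, not from the exercises)
start, n, p, m = 5, 10, 0.4, 1000
durations = np.array([nplays(start, n, p) for _ in range(m)])

# Sample mean of the number of games with a 90% confidence interval
half_width = scipy.stats.norm.ppf(0.95) * np.sqrt(durations.var(ddof=1) / m)
print('mean number of games:', durations.mean(), '+/-', half_width)

# From start=5 with n=10 every game count is odd, so bins are two games wide
bins = np.arange(start - 1, durations.max() + 3, 2)
plt.hist(durations, bins=bins)
plt.xlabel('number of games played')
plt.ylabel('number of simulations')
```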

Be aware that because you are estimating the random variables by sampling you **should always include estimates for errors on any sample means you compute.**

If you want to make your project more interesting, you can make the behaviour of the gambler more complicated. For example, you can model a random walk where there is a finite probability of staying still or a finite probability of taking more than one step forward or back.
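A minimal sketch of the "staying still" variant. The function `lazy_gambler` and its parameters `p_win` and `p_stay` are hypothetical names introduced for illustration: each game is won with probability `p_win`, drawn (money unchanged) with probability `p_stay`, and lost otherwise. A single uniform draw on $[0, 1)$ chooses among the three outcomes.

```python
import numpy as np


def lazy_gambler(start, n, p_win, p_stay):
    """Hypothetical variant of gambler with a chance of a draw on each game.

    Returns (ruined, games): ruined is 1 if the gambler reaches 0 and 0 if
    he reaches n. Assumes p_win + p_stay <= 1 and p_stay < 1.
    """
    money, games = start, 0
    while 0 < money < n:
        u = np.random.uniform()  # uniform on [0, 1)
        if u < p_win:
            money += 1           # win
        elif u < p_win + p_stay:
            pass                 # draw: stay still
        else:
            money -= 1           # loss
        games += 1
    return (1 if money == 0 else 0), games


print(lazy_gambler(5, 10, 0.3, 0.25))
```

One check you could make: draws do not change which boundary is eventually reached, so the probability of ruin for this walk should match the ordinary walk with `p = p_win / (1 - p_stay)`; only the number of games played changes.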

If you don't want to study a gambler's ruin problem, you can plot a graph showing how the mean of a random variable changes as one of its parameter values is changed. In other words, you can calculate a sample mean of a random variable for a number of different parameter values (along with the error on your estimate) and plot this estimate of the true mean on a graph. Please remember that we have expressions for the true mean of all the random variables that you have studied in this module. If you take this option, you should include a line showing how the true expectation changes with the parameter value for comparison.
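A minimal sketch of this option, using a binomial random variable as an illustrative choice: the number of trials is fixed at 10 (hypothetical), `p` is varied, and 100 samples are drawn at each value with `np.random.binomial`. Error bars use the same normal-approximation 90% interval as before, and the comparison line uses the true expectation $E[X] = np$ of a binomial random variable.

```python
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# Illustrative choice: a binomial random variable with 10 trials, varying p
n_trials, m = 10, 100
p_values = np.linspace(0.1, 0.9, 9)
z = scipy.stats.norm.ppf(0.95)  # about 1.645 for a two-sided 90% interval

means = np.zeros(len(p_values))
errors = np.zeros(len(p_values))
for i, p in enumerate(p_values):
    samples = np.random.binomial(n_trials, p, size=m)
    means[i] = samples.mean()
    # Half-width of a 90% confidence interval on the sample mean
    errors[i] = z * np.sqrt(samples.var(ddof=1) / m)

plt.errorbar(p_values, means, yerr=errors, fmt='ko', label='sample mean')
# The true expectation of a binomial(n, p) random variable is n * p
plt.plot(p_values, n_trials * p_values, 'r-', label='true mean')
plt.xlabel('p')
plt.ylabel('mean of binomial random variable')
plt.legend()
```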