# 1.2.5: Bike Share (Improving the Code through Iteration)

<br>



---



*Modeling and Simulation in Python*

Copyright 2021 Allen Downey, (License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/))

Revised, Mike Augspurger (2021-present)

<br>



---



We've done our investigation, abstraction, and even implementation.  So we now have a functioning code.   But it's not perfect yet, so now we enter the validation-iteration-implementation feedback loop.  

In [1]:
# Import libraries
import pandas as pd
import numpy.random as npr



---



## Improving the Code

First, we'll consider how to improve an already functioning simulation by making it more fundamentally sound and flexible.

### Adding flexibility to functions through generalization

We already used generalization to make `change_func()` able to accept arguments.   But we can push this process further.  As is, `bike_to_moline()` is *hard-coded* to only change a Series called "bikeshare":

In [2]:
def bike_to_moline():
    bikeshare.augie -= 1
    bikeshare.moline += 1

When this function is called, it modifies a particular object called `bikeshare`. As long as there
is only one state object, that's fine.  But what if there is more than
one bike share system in the world? Or what if we want to run more than
one simulation?

<br>

We want to be able to use this function with any `bikeshare`-like object.  We can do this by allowing it to take a `Series` object as a
argument. Here's what that looks like:

In [3]:
def bike_to_moline(state):
    state.augie -= 1
    state.moline += 1

The name of the parameter is `state`, rather than `bikeshare`, as a
reminder that the value of `state` could be any state `Series` object, not just the one we called `bikeshare`.

<br>

This version of `bike_to_moline` requires a `Series` object as a
parameter, so we have to provide one when we call it:

In [4]:
import pandas as pd
bikeshare = pd.Series(dict(augie=10,moline=2),name="Number of Bikes")
bike_to_moline(bikeshare)

Again, the argument we provide gets assigned to the parameter, so this
function call has the same effect as:

```
state = bikeshare
state.augie -= 1
state.moline += 1
```

Now we can create as many `State` objects as we want:

In [5]:
bikeshare1 = pd.Series(dict(augie=10,moline=2),name="Number of Bikes")
bikeshare2 = pd.Series(dict(augie=2,moline=10),name="Number of Bikes")

And update them independently:

In [6]:
bike_to_moline(bikeshare1)
bike_to_moline(bikeshare2)
bikeshare2

augie      1
moline    11
Name: Number of Bikes, dtype: int64

Changes in `bikeshare1` do not affect `bikeshare2`, and vice versa.

### Adding return values to functions

It is clear that generalizing opens new possibilities.  Another flaw with the current code is harder to see.  In `bike_to_moline()`, we are modifying a global variable `bikeshare` inside a function.  This isn't producing any current errors, but code like this limits future iteration and can lead to hard-to-debug errors.

<br>  

So we want to not only bring the state object *into* the function as an argument, but we want to deliberately *output* it from the function as a *returned* value.

<br>

We have used several functions that return values.
For example, when you run `sqrt`, it returns a number you can assign to a variable.

In [7]:
from numpy import sqrt
root_2 = sqrt(2)
root_2

1.4142135623730951

Notice what this does.  On the right side of line two, we call `sqrt()`.
 Colab runs that function, which returns a value (1.414).  This all happens on the right side of the equation.  The assignment indicated by the `=` then happens, and 1.414 is assigned to the variable `root_2`.

 <br>

 A similar process occurs When we run `pd.Series()`: the right side of the line returns a new state object, and then this is assigned to a variable:

In [9]:
bikeshare = pd.Series(dict(augie=10,moline=2),name='Number of Bikes')

To write our own functions that return values, we can use a `return` statement, like this:

In [10]:
def add_five(x):
    return x + 5

`add_five` takes a argument, `x`, which could be any number. The function
computes `x + 5` and returns the result. When the function is called, the returned result is assigned to the variable on the left side of the assignment statement:

In [None]:
x = add_five(3)
x

8

### Adding a return values to our functions

When `bike_to_moline()` changes `state`, we want to clearly return this changed data object to the global environment:

In [11]:
def bike_to_moline(state):
    state.augie -= 1
    state.moline += 1
    return state

def bike_to_augie(state):
    state.augie += 1
    state.moline -= 1
    return state

When we call these functions, we need to assign this returned Series to a variable in `change_func`:

In [14]:
def change_func(state, ptm, pta):
    if npr.random() < ptm:
        state = bike_to_moline(state)

    if npr.random() < pta:
        state = bike_to_augie(state)
    return state

# Now call the function
change_func(bikeshare, 0.5, 0.4)

augie     10
moline     2
Name: Number of Bikes, dtype: int64

Notice the "nested" nature of the arguments and returned values.  
* When we call `change_func()`, the argument pulls our `bikeshare` into that function, but calls it `state` inside the function.
* When the function reaches line 3, `bike_to_moline()` pulls the `state` Series into the `bike_to_moline()` environment, and after altering it, returns it to the `change_func()` environment.
* After repeating this with `bike_to_moline()`, `change_func` returns the now twice altered `state` Series back to the global environment.



The usefulness of these becomes clear when we create a function that can run an entire simulation.  `run_simulation`
creates a state object, runs a simulation, and then returns the
state object.  

In [19]:
def run_simulation(ptm, pta, iAug, iMol, num_steps):
    state = pd.Series(dict(augie=iAug,moline=iMol),name="Number of Bikes")

    for i in range(num_steps):
        state = change_func(state, ptm, pta)

    return state

We can call `run_simulation` like this:

In [20]:
final_state = run_simulation(0.5, 0.4, 10, 2, 60)
final_state

augie     5
moline    7
Name: Number of Bikes, dtype: int64

Notice that we enter our independent variables (the state variables, `augie=10` and `moline=2`) as well as the parameters for our model (`ptm=0.5`, `pta=0.4`, and `num_steps=60`) without interacting with the code at all.  This is very efficient!

---

## Improving the Model

We've made some improvements to our code.  Let's turn to the more substantive issues.  The model we have so far is simple, but it is based on unrealistic
assumptions. What weaknesses did you identify in the exercises for the previous notebook?

<br>

Here are some of the weaknesses you might have found:

-   In the model, a student is equally likely to arrive during any
    15-minute period. In reality, this probability varies depending on time of day, day of the week, etc.

-   The model does not account for travel time from one bike station to another.

-   The model does not allow more than one student to arrive in a given 15-minute period.

-   The model does not check whether a bike is available, so it's
    possible for the number of bikes to be negative (as you might have
    noticed in some of your simulations).

Some of these modeling decisions are better than others:
* the first assumption might be reasonable if we simulate the system for a short period of time, like one hour.
* the second and third assumptions are not very realistic, but they might not affect the results very much, depending on what we use the model for.

* the last assumption seems problematic, so let's start there.

This process, starting with a simple model, identifying the most
important problems, and making gradual improvements, is called
*iterative modeling*. It often takes several
iterations to develop a model that is good enough for the intended
purpose, but no more complicated than necessary.

### Eliminating Negative Bikes

Currently the simulation does not check whether a bike is available when a customer arrives, so the number of bikes at a location can be
negative. That's not very realistic. Here's a version of `bike_to_augie` that fixes the problem:

In [None]:
def bike_to_augie(state):
    if state.moline > 0:
        state.moline -= 1
        state.augie += 1
    return state

The first line checks whether the number of bikes at Moline is greater than zero. If not, it skips to the return line of the function.  So if there are no bikes at Moline, the state is unchanged.

<br>

We can test it by initializing a state with no bikes at Moline and calling `bike_to_augie`.

In [None]:
bikeshare = pd.Series(dict(augie=12,moline=0),name="Number of Bikes")
bike_to_augie(bikeshare)

augie     12
moline     0
Name: Number of Bikes, dtype: int64

The state of the system should be unchanged.  No more negative bikes (at least at Moline)!

## Exercises

---

<br>

🟨 🟨

### Exercise 1

Here is a copy of `run_simulation`.  Add an inline comment (with #) above each of the lines of code.  Each comment should explain what that line does.  Remember that a comment should have its own line, and should be *above* the line it is commenting on.

In [None]:
def run_simulation(ptm, pta, iAug, iMol, num_steps):
    state = pd.Series(dict(augie=iAug,moline=iMol),name="Number of Bikes")

    for i in range(num_steps):
        state = change_func(state, ptm, pta)

    return state

---

<br>

🟨 🟨

### Exercise 2

Modify `bike_to_moline` so it checks whether a bike is available at Augustana.  To test it, create a `bikeshare` state object, run `bike_to_moline`, and check the result.  Hint: use the updated `bike_to_augie` as a model.

In [None]:
# Define new bike_to_moline


In [None]:
# Test function


---

<br>

🟨 🟨

### Exercise 3

Now run the simulation with parameters `ptm=0.5`, `pta=0.4`, and `num_steps=60` (that is, 15 hours), and confirm that the number of bikes is never negative.

<br>

Start with this initial state listed below.  You may need to use a higher value for num_steps to check the solution (or play with the values for `ptm` and `pta`).

In [None]:
# This is the initial state
bikeshare = pd.Series(dict(augie=10,moline=2),name="Number of Bikes")

In [None]:
# Test run_simulation
run_simulation(0.5, 0.4, 10, 2, 60)
pd.DataFrame(bikeshare)

Unnamed: 0,Number of Bikes
augie,10
moline,2
augie_empty,0
moline_empty,0
