### PNI Biomath Bootcamp 2016
# Day 2 Exercises
---
### Using Jupyter Notebooks:
To run a cell and advance to the next cell, press `Shift + Return`

To run a cell without advancing to the next cell, press `Control + Return` 

You can find a variety of shortcuts at **Keyboard Shortcuts** in the Help menu above

**If you're confused:** Google and Python are the best of friends! Throw a few words describing your problem into Google and click on the first Stack Overflow link — this will solve 95% of your problems!

If you would simply like to know more about a particular function, press `Shift + Tab` while inside the function to bring up a snippet of documentation. Press `Tab` again (while still holding `Shift`) to bring up an even larger box of documentation. A third press of `Tab` will turn the bottom half of your screen into a window with the full documentation for your function, including definitions of the function's inputs, outputs, parameters and their default settings, and often some example code!

---

Before we start, let's run some magic commands to automatically save our progress once a second (with `%autosave 1`) and to force all graphics from the `matplotlib` package to be displayed inline (with `%matplotlib inline`).

In [None]:
%autosave 1
%matplotlib inline

**1)** Use the Python function given to you below, `myrand`, to write your own n-sided `dice` function. That is, write a
function that outputs an integer from 1 to *n* with equal probability. Allow the user to specify the value *n*.

In [None]:
### Using the function myrand(), define your own dice(n) function below
from Day2_Helper import myrand


**2)** Expand on your dice function to add a second input parameter *K* that allows the user to
specify the number of rolls of the dice to be returned. The rolls should be returned as
a vector. For example, if you ask for 6 rolls of a 3-sided dice, you might get

> \>\> dice(3,6)

> [1 3 2 2 1 3]

NumPy has the function `np.random.randint()` that does this, but you should write your function using `myrand`.
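
For comparison only, here is a minimal sketch of the NumPy call mentioned above. Note that `np.random.randint` excludes its upper bound, so an *n*-sided dice needs an upper bound of `n + 1`. The names `n`, `k`, and `rolls` and their values are chosen here purely for illustration.

```python
import numpy as np

n = 3  # number of sides (illustrative value)
k = 6  # number of rolls (illustrative value)

# randint draws integers from low (inclusive) up to high (exclusive),
# so high must be n + 1 for the rolls to include n itself
rolls = np.random.randint(1, n + 1, size=k)
print(rolls)
```

You can use a call like this to sanity-check the output of your own `myrand`-based function.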

In [None]:
### Expand on your previous function to define dice(n,k) below


**3) Histogramming some of your results.**
You'll use `matplotlib`'s `hist()` function in this problem. This function takes a vector of numbers, puts them into bins, and makes a bar plot of how many are in each bin. By default it uses 10 equally spaced bins, but you can use the optional `bins` argument to tell it where to put the bin edges. For example, calling it with `bins=np.arange(-0.5, 6, 1)` will put bin edges at `-0.5, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5`; correspondingly, the bin *centers* will be at `0, 1, 2, 3, 4, 5`. To see an example, run:

        x = [0, 0, 1, 2, 3, 4, 5, 5, 5, 5]
        plt.hist(x, bins=np.arange(-0.5, 6, 1))
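
To look at the bin counts as numbers rather than bars, one option is `np.histogram`, which accepts the same kind of `bins` argument. This is only a sketch on the example vector above; the names `counts` and `edges` are chosen here for illustration.

```python
import numpy as np

x = [0, 0, 1, 2, 3, 4, 5, 5, 5, 5]
edges = np.arange(-0.5, 6, 1)  # -0.5, 0.5, ..., 5.5 (7 edges, 6 bins)

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(x, bins=edges)
print(counts)  # [2 1 1 1 1 4]
```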

**The problem itself:** Use the function you just wrote to roll your Python dice *K* times, where *K* is some
large number. Use `plt.hist()` to make a histogram of the number of times you got `dice=1`, `dice=2`, `dice=3`, ... , `dice=n`. If the distribution is uniform, the fraction of rolls that came out in each of these bins should be 1/*n*. Are you close to that?


In [None]:
### Call your function K times and then plot the results as a histogram below
from matplotlib import pyplot as plt


**4)** To assess the answer to the previous question quantitatively, compute *E*, the mean
squared difference between the expected fraction (1/*n*) and each of the values you obtained. In other words, compute the mean, over all the histogram bins, of 

$$\left(\frac{1}{n} - \frac{\text{bin height}}{K}\right)^2$$

To do this, you can use the `np.mean()` function, as well as the fact that NumPy arrays allow arithmetic operations on vectors (for example, if `vec = [1 2 3]`, `vec + 1 = [2 3 4]`).
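
As a small sketch of that elementwise arithmetic (the names `vec` and `as_list` are illustrative), note that the same expression on a plain Python list raises an error, which is why converting to a NumPy array helps:

```python
import numpy as np

vec = np.array([1, 2, 3])

print(vec + 1)         # [2 3 4]
print((vec - 2) ** 2)  # [1 0 1]
print(np.mean(vec))    # 2.0

as_list = [1, 2, 3]
try:
    as_list + 1  # plain lists do not support elementwise addition
except TypeError as err:
    print("TypeError:", err)
```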

In [None]:
### Write code to calculate E below
### TIP: It may be useful to change the output of dice from a list to a numpy array
import numpy as np


**5)** Compute *E* for many different *K*s, and make a plot of *E* as a function of *K*. What do
you observe?

In [None]:
### Write code to compute E's and plot below
### TIP: since you're computing E many times, it would be useful if your solution to #4 were a function


**6)** The Day_2 folder containing this notebook also contains a file called `joyce.mat`. Within Python, you can easily load MATLAB data using the SciPy function `loadmat()`, as done for you below. This will create a variable called `mystring` that contains some text. We also define a new string called `vowels`.


Using a loop through the elements of your `vowels` string, compute how many times
each vowel appears in the text of `mystring` (i.e., how many ‘a’, how many ‘e’, how many ‘i’,
etc.). 

In [None]:
### Here we load the string from the .mat file for you (try printing mystring to see what it says!)
from scipy.io import loadmat
mystring = loadmat('joyce.mat')['str'][0]
vowels = 'aeiou'

In [None]:
### Count how many times each vowel appears in mystring below


**7)** Write a `first_sentence` function that takes a string as input, then extracts and returns the first
sentence of the string (i.e., all the characters up to and including the first period; if the
string contains no period, then return the original string).

In [None]:
### Define your function first_sentence below


**8)** Write a function `replacer` that takes a string as input,
replaces every letter “e” with an “X”, and returns the new string. Use your
`first_sentence` to test your new `replacer` function with the first sentence of the
text. Then use `replacer` on the entire text.

In [None]:
### Define your function replacer below


**8b)** Write a second version of your solution to problem 8, but this time the function should use list comprehension.

Hint 1: in Python, strings are iterables, meaning you can iterate through the letters like you would through the elements of a list. 

Hint 2: if you have a list of characters, you can convert it to a string using the `join` method. Here is an example:

In [None]:
my_string = 'hello world'
print(my_string)

for char in my_string:
    print(char)

my_string_as_a_list = list(my_string)
print(my_string_as_a_list)

my_string_as_a_string_again = ''.join(my_string_as_a_list)
print(my_string_as_a_string_again)

In [None]:
# write and test another solution to problem 8 using list comprehension


**9)** Write a function `replacerD` that takes a string as input,
then replaces every letter “e” with an “X”, **but only for those “e”s that are followed
by a “d”**, and returns the new string. Once again, test it on the first sentence of the
text, and then run it on the entire text.

In [None]:
### Define your function replacerD below


**10)** We’re going to examine the text you loaded from the `joyce.mat` file again. Expand
your function from problem 6 so that now you count the frequency of occurrence of
every character in the text file. To help you do this, the function `unique` has been provided below.
This function takes a string and returns its unique characters in sorted order, with each character appearing only once. For example,

> \>\> unique(‘open sesame’)

> ‘ aemnops’

Note how the space character, ' ', is included. All special characters will be included, and uppercase letters will be treated as different from lowercase letters.

*Extra Credit:* Modify the `unique` function in the accompanying `Day2_Helper.py` file so that it returns only letters and treats uppercase and lowercase the same. Be sure to reload the module in the notebook after making changes (for example with `importlib.reload`, or by restarting the kernel)!

In [None]:
### Use the provided function to calculate the frequency of every character in mystring below
from Day2_Helper import unique


**11)** Using a series of *FOR* loops and *IF* statements, write code equivalent to the list comprehension discussed in lecture:

> \>\> N = 3

> mylist = [[[x,y] for x in range(0,N)] for y in range(0,N)]

Remember the order of execution of *FOR* statements in a list comprehension.
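
As a reminder of that ordering, here is a sketch on a different, flat comprehension (not the nested one above). The `for` clauses run left to right, with the leftmost as the outermost loop, and an `if` clause becomes an `if` statement inside the innermost loop. All names are illustrative.

```python
# Flat comprehension with two for clauses and a filter
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]

# Equivalent explicit loops: the leftmost for clause is the outer loop
pairs_loop = []
for x in range(3):
    for y in range(3):
        if x < y:
            pairs_loop.append((x, y))

print(pairs)                # [(0, 1), (0, 2), (1, 2)]
print(pairs == pairs_loop)  # True
```

In a nested comprehension such as the one in this problem, the inner bracketed comprehension is evaluated once for every iteration of the outer `for`.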