# Libraries

This notebook contains several examples of using the numerical calculations package, numpy, and the plotting package, matplotlib. 

## Essential Exercises

### Exercise 1

Explore the numpy random sampling [documentation](https://numpy.org/doc/stable/reference/random/index.html).

Below is an example of how to compute a histogram using numpy. Create an array of shape (1000,) containing 1000 uniformly distributed random values, and compute the histogram of your array.

In [None]:
import numpy as np

datapoints = [95, 3, 4, 24, 18, 23, 84, 76, 58, 2, 34, 67, 52, 1, 8, 99, 38, 74, 50, 62]
hist_values, bin_edges = np.histogram(datapoints, bins=10)
print(hist_values)
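As a starting point for the random sampling part, here is a minimal sketch of drawing uniformly distributed values with numpy's Generator interface. The seed value, the sample size of 5, and the variable names are assumptions chosen for illustration; they are not part of the exercise.

```python
import numpy as np

# Seeded generator so the example is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(seed=42)

# Five uniformly distributed values in the half-open interval [0, 1).
samples = rng.random(5)
print(samples.shape)

# np.histogram returns the count in each bin and the bin edges.
counts, edges = np.histogram(samples, bins=5, range=(0.0, 1.0))
print(counts, edges)
```

Note that a histogram with 5 bins has 6 bin edges, so `edges` is always one element longer than `counts`.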

### Exercise 2

Explore the numpy statistics [documentation](https://numpy.org/doc/stable/reference/routines.statistics.html).

Using numpy, calculate the mean and variance of the random values you generated in Exercise 1.
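A minimal sketch of the relevant numpy calls on a small, hand-checkable array (the values are made up for illustration). One detail worth knowing when comparing against a hand-written variance: np.var divides by N by default (population variance), and by N - 1 only when ddof=1 is passed.

```python
import numpy as np

# Small illustrative data set; the sum is 40, so the mean is 5.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(values)
population_var = np.var(values)       # default ddof=0: divides by N
sample_var = np.var(values, ddof=1)   # ddof=1: divides by N - 1

print(mean, population_var, sample_var)
```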



### Exercise 3

Now create your own functions that take a list and return its mean and variance. (A numpy array, my_array, can be converted into a list using my_array.tolist().)

### Exercise 4

Matplotlib is a powerful, but complex, plotting package. Fortunately, it includes a module, pyplot, which provides a simple interface for generating quality plots. If you've ever used MATLAB, you'll find the pyplot interface very familiar. Below is an example of plotting a [histogram](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html) using pyplot.

Using pyplot, create a bar plot showing the histogram of the random values generated in Exercise 1. Next, create a [bar plot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html) showing the difference between your implementations of the mean and variance and those of numpy. Verify that both implementations yield the same results.

In [None]:
import matplotlib.pyplot as plt

datapoints = [95, 3, 4, 24, 18, 23, 84, 76, 58, 2, 34, 67, 52, 1, 8, 99, 38, 74, 50, 62]
 
plt.hist(datapoints) 
plt.title("Histogram")
plt.ylabel("Count")
plt.xlabel("Bins")
plt.show()


### Exercise 5

Documentation is key to understanding how to use a library. Write some documentation for your mean and variance functions from Exercise 3, describing what the functions do, what arguments they take, and what values they return. A user should be able to read your documentation and use your function without seeing any of the code for your function.
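One common way to document a Python function is a docstring. The sketch below uses a hypothetical function, value_range, which is not part of the exercises, and follows the numpy docstring layout (Parameters, Returns, Raises); other layouts are equally valid.

```python
def value_range(values):
    """Return the spread between the largest and smallest value.

    Parameters
    ----------
    values : list of float
        A non-empty list of numbers.

    Returns
    -------
    float
        ``max(values) - min(values)``.

    Raises
    ------
    ValueError
        If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return max(values) - min(values)


# help() prints the docstring, so a user never needs to read the body.
help(value_range)
print(value_range([3, 1, 4]))
```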

## Optional Exercises

### Exercise 6

Keyword arguments are named arguments of a function that can be passed in any order. Keywords are often used to specify optional behaviour, such as which method a function should use, and can be given a default value that is used when the keyword is not specified in the call.
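To illustrate defaults and ordering, here is a small sketch using a hypothetical function, describe, which is unrelated to the exercise itself. Both keywords have defaults, and the last two calls show that keyword order does not matter.

```python
def describe(values, label="values", precision=2):
    """Format the smallest and largest entries of ``values`` as a string."""
    low = round(min(values), precision)
    high = round(max(values), precision)
    return f"{label}: min={low}, max={high}"


data = [1.234, 5.678, 3.0]

# No keywords given: both defaults are used.
print(describe(data))

# Keywords may be passed in any order.
print(describe(data, precision=1, label="heights"))
print(describe(data, label="heights", precision=1))
```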

Write a single function that takes an array of values, together with a 'statistic' keyword argument that can be either 'mean', 'median', 'mode', or 'histogram', and computes the appropriate statistic for the values. You can use the following function definition to get started.

In [None]:
def compute_statistic(values, statistic='mean'):
    pass

### Exercise 7

Revisiting Lesson 8 on multi-dimensional arrays and raster graphics, use the numpy code provided below to load the set of pixel values in the giraffe.pgm file (remember, you'll need to upload the file to Google Drive, or use google.colab.files.upload()). Then use your newfound matplotlib skills to graph the histogram of the file.


### Exercise 8

The marks and labels along the axes of a plot are stored as sets of variables known as 'ticks'. For example, 'xticks' determines the position of the markers, while 'xticklabels' determines the labels displayed at each marker. Using the [documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.xticks.html) as a starting point, revisit the bar plot from Exercise 4, and plot one bar for your implementation and another for the numpy implementation. Modify the xticks to label the bars 'Mine' and 'Numpy' respectively. You can use the keyword arguments of xticks() to modify the properties of the labels, such as the font size and rotation.
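A minimal sketch of setting tick positions and labels with pyplot. The bar heights are placeholders for illustration, not real results, and the font size and rotation values are arbitrary choices.

```python
import matplotlib.pyplot as plt

# Placeholder heights for illustration only; these are not real results.
heights = [1.0, 1.0]
positions = [0, 1]
labels = ["Mine", "Numpy"]

fig, ax = plt.subplots()
ax.bar(positions, heights)

# xticks(ticks, labels, **kwargs) acts on the current axes; the extra
# keywords are applied to the label text.
plt.xticks(positions, labels, fontsize=12, rotation=45)
ax.set_ylabel("Value")
plt.show()
```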

### Exercise 9

Modify your function from Exercise 6 so that it plots a histogram if the 'histogram' option is used.

### Exercise 10

Use the [time](https://docs.python.org/3/library/time.html) module from the Python standard library to determine how long your implementation of mean/variance takes, and compare it to the time taken by the numpy implementation. Repeat the time test for 100 iterations, and plot the timing differences as a bar plot with well-labelled axes. You can use the following code snippet to get started (you might also want to think about why the time difference for the snippet below is not exactly two seconds).

In [None]:
import time

start_time = time.time()

# Run a set of code/functions
# For now, simply sleep for 2 seconds
time.sleep(2)

total_time = time.time() - start_time

print(total_time)
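One possible way to repeat a measurement is sketched below. The helper name time_call is hypothetical, and the built-in sum stands in for the function being timed. The sketch swaps time.time() for time.perf_counter(), which is also part of the time module and is intended for measuring short durations; either works for this exercise.

```python
import time


def time_call(func, *args, repeats=100):
    """Return the elapsed seconds of each of `repeats` calls to func(*args)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return durations


# Time the built-in sum on a list of 1000 integers, 100 times.
durations = time_call(sum, list(range(1000)), repeats=100)
print(min(durations), max(durations))
```

Collecting every duration, rather than only a total, keeps the per-iteration values available for plotting.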


### Exercise 11

Repeat the timing test in Exercise 10 for arrays containing different numbers of random values. How does the timing difference change for arrays of size 1,000, 10,000, 100,000, and 1,000,000? Create a plot to visualise your results, and label the axes appropriately.