Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [1]:
NAME = "Adam Ten Hoeve"
COLLABORATORS = ""

---

## Introduction

The purpose of this notebook is to:
1. Verify you can run Jupyter notebooks and have all the necessary packages
2. Provide some practice problems for brushing up on Python, Matplotlib, and Numpy

## Environment Verification

The two cells below load the required modules. If a module fails to load, you should `conda install` it if you are using Anaconda. If you are using COLAB, consult the COLAB docs on how to run a `pip install` command from the notebook.
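
Before installing anything, it can help to see which modules are actually missing. The snippet below is a minimal sketch of such a check using only the standard library. The `REQUIRED` list and the helper name `missing_modules` are assumptions made for illustration, not part of the assignment.

```python
import importlib.util

# Top-level package names used by this notebook (assumed list; adjust as needed).
REQUIRED = ["numpy", "matplotlib", "sklearn", "scipy"]

def missing_modules(names):
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means every module was found. Anything listed is a candidate for `conda install` or `pip install`.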

In [2]:
# If you need extra Python import statements, or pip install (for COLAB), you can add them to the below cell.

In [3]:
import numpy as np
import matplotlib as mpl
import sklearn
import functools
import math
import time
from scipy.stats import norm
import matplotlib.pyplot as plt

In [4]:
print("Numpy: {}".format(np.__version__))
print("Matplotlib: {}".format(mpl.__version__))
print("SkLearn: {}".format(sklearn.__version__))


Numpy: 1.16.3
Matplotlib: 2.2.2
SkLearn: 0.20.3


## Python and Numpy Review

This section consists of several problems that will help you review basic Python scripting. If you are completely new to Python or need a refresher, you should explore the thousands of tutorials online or see the TAs for references. We are using Python 3 (specifically >=3.6). The exercises below are not exhaustive.

The practice problems focus on things that are potentially useful and basic:

1. Open a file and read data from storage (a file on the local disk)
2. Store the data in some data structure that allows us to inspect and iterate over the data
3. Plot the results (using Matplotlib)

### Basic File I/O, Functions, Arrays, Dictionaries 

To test our system, we need data, but we don't have any on hand. So we will write some functions to generate fake data. This is more work, but it gives us more flexibility for testing. You will:
1. Write a function to generate some fake data.
2. Use the above function to write the fake data to a file.
3. Write a function to read the data back from the file.

Use only Python functions (not Numpy yet) to accomplish the above. The function specs are given for you.

### Problem 1.1: Generate synthetic test data

In [11]:
random.normalvariate(10, 1)

2.0

In [14]:
# hint: use Python's default "random" module
import random
import math

''' generate_data
Args:
    mean : Python float
    variance : Python float
    num_samples: How many samples to return (default = 1)
    
Returns: 
    A list of numbers drawn from a normal distribution with mean 'mean' and variance 'variance'
    
Hints: Basic list manipulations, random number generation
Hints: Use list comprehension, can be done in one or two lines of code.
'''
def generate_data(mean: float, variance: float, num_samples: int = 1):
    data = []
    # YOUR CODE HERE
    # random.normalvariate takes a standard deviation, so convert the variance first
    data = [random.normalvariate(mean, math.sqrt(variance)) for _ in range(num_samples)]
    return data

Each problem in HW assignments will ask you to write code in the above form. We also provide test cases that verify your code. Passing the test is part of the grade. There may be additional held-out tests, so be sure to test your code sufficiently (you can add additional code in the same cell above.)
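
As one example of additional testing, the provided test only checks the sample mean, so it would not catch a function that passes the variance where a standard deviation is expected. The sketch below checks the sample variance instead. It is self-contained, so `generate_data_sketch` is a hypothetical stand-in for your own `generate_data`, and the sample count and 10% tolerance are arbitrary choices.

```python
import math
import random
import statistics

def generate_data_sketch(mean, variance, num_samples=1):
    # Hypothetical stand-in for generate_data so this cell runs on its own.
    sigma = math.sqrt(variance)
    return [random.normalvariate(mean, sigma) for _ in range(num_samples)]

def variance_looks_right(target_var, num_samples=20000, rel_tol=0.1):
    # Compare the sample variance with the requested variance.
    samples = generate_data_sketch(0.0, target_var, num_samples)
    sample_var = statistics.variance(samples)
    return abs(sample_var - target_var) / target_var < rel_tol

random.seed(0)
print(variance_looks_right(4.0))
```

If the variance were passed directly as the standard deviation, a target variance of 4.0 would produce a sample variance near 16, and this check would fail.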

In [17]:
# A simple (but not the only) test for 1.1
def simple_test_1_1(test_mean, test_var, num_iterations=2000):
    num_samples = 100
    print("Test: mean: {}, var: {}, num_samples: {}".format(test_mean, test_var, num_samples))
    samp_mean_var = test_var / num_samples
    num_pass = 0
    for i in range(num_iterations):
        test_data = generate_data(test_mean, test_var, num_samples)
        stat = functools.reduce( lambda x,y: x+y, test_data, 0.0) / num_samples
        stat_error = abs((stat - test_mean)/math.sqrt(samp_mean_var))
        error_bounds = norm.ppf(0.975)
        run_result = stat_error < error_bounds
        if(run_result): 
            num_pass += 1
    pass_rate = num_pass / num_iterations
    print("Pass rate:", pass_rate)
    return pass_rate

assert(simple_test_1_1(1.0, 2.0) > 0.94)
assert(simple_test_1_1(2.0, 2) > 0.94)
assert(simple_test_1_1(-1.0, 0.4) > 0.94)
assert(simple_test_1_1(-3.0, 0.01) > 0.94)
assert(simple_test_1_1(-3.0, 10.0) > 0.94)

Test: mean: 1.0, var: 2.0, num_samples: 100
Pass rate: 0.9475
Test: mean: 2.0, var: 2, num_samples: 100
Pass rate: 0.955
Test: mean: -1.0, var: 0.4, num_samples: 100
Pass rate: 0.9595
Test: mean: -3.0, var: 0.01, num_samples: 100
Pass rate: 0.9525
Test: mean: -3.0, var: 10.0, num_samples: 100
Pass rate: 0.944


### Problem 1.2: Write synthetic test data to a local file 

In [None]:
''' write_data_file
Args:
    data_list: a Python list of floats
    
Actions: 
    Write the elements of data_list as default-formatted floating-point strings to 
    the file "./test.txt", one element per line.
'''
def write_data_file(data_list : list):
    # Your code here
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def test_file_write(test_list):
    write_data_file(test_list)
    with open("./test.txt", "r") as infile:
        numbers = infile.readlines()
    num_list = [float(num.strip()) for num in numbers]
    if(len(num_list) != len(test_list)):
        return False
    for idx, num in enumerate(num_list):
        if(num != test_list[idx]):
            return False
    return True

assert(test_file_write([1.0, 1.1, 2.0, 3.4, 4.3]))
assert(test_file_write([1.4333]))

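One way to approach the file-writing step in Problem 1.2 is sketched below. This is an illustration under assumptions, not the reference solution: `write_floats_sketch` is a hypothetical name, and it writes to a temporary path rather than "./test.txt" so that it does not interfere with the tests above.

```python
import os
import tempfile

def write_floats_sketch(values, path):
    # One default-formatted float per line; str() on a float round-trips exactly in Python 3.
    with open(path, "w") as outfile:
        for value in values:
            outfile.write("{}\n".format(value))

# Usage: write to a temporary file (a hypothetical path, not ./test.txt) and read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_floats_sketch([1.0, 1.1, 2.0, 3.4, 4.3], demo_path)
with open(demo_path) as infile:
    round_trip = [float(line) for line in infile]
print(round_trip)
```

Using the default float formatting matters here because the test compares the values read back with `!=`, so any rounding in the written text would make it fail.
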
### Problem 1.3: Read synthetic data and create a dictionary from it.

The function opens "./test.txt" and reads in the data array.
You can use the `test_file_write` function above for reference.
However, we would now like to return a list of dictionaries. 
The list should contain one dictionary for each element read from the file.
Each dictionary should have the following format:

```
{
    "data": (float read from file),
    "index": (index of file line starting from 0),
    "time": (a time stamp using python's time.time(), this returns unix time in seconds as a float)
}
```
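
The format above can be built with a single list comprehension over the lines of the file. The sketch below is one possible shape, not the reference solution. `read_records_sketch` and the temporary demo file are assumptions for illustration, and the function takes a path argument, unlike `read_data_file`, which takes none.

```python
import os
import tempfile
import time

def read_records_sketch(path):
    # Build one dictionary per line: the parsed float, its 0-based line index, and a read timestamp.
    with open(path, "r") as infile:
        return [
            {"data": float(line), "index": idx, "time": time.time()}
            for idx, line in enumerate(infile)
        ]

# Create a small demo file so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as outfile:
    outfile.write("1.0\n2.5\n-1.1\n")

records = read_records_sketch(demo_path)
print([(r["index"], r["data"]) for r in records])
```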
    

In [None]:
'''read_data_file
Args:
    none
    
Returns:
    See above
    
Hints: Can be accomplished in two to four lines.
'''
def read_data_file():
    data_dict_list = []
    #Your code here
    # YOUR CODE HERE
    raise NotImplementedError()
    return data_dict_list

In [None]:
test_data = [1.0, 2.0, 3.0, -1.1, 2.3]
write_data_file(test_data)
test_result = read_data_file()
assert(type(test_result) is list)
for idx, i in enumerate(test_result):
    assert(type(i) is dict)
    assert(type(i["time"]) is float)
    print(i["time"])
    assert(i["index"] == idx)
    assert(i["data"] == test_data[idx])

### Part 2: Matplotlib Familiarization

Now we will use Matplotlib to create some graphs. This section is manually graded: we will inspect the graphs you have plotted for completeness.

Each graph should have the following:

1. A title
2. Labels on the x- and y-axes
3. Multiple plots answering one question should be combined into one plot using subplot whenever possible.

Since we are just making up these data and plots, use generic names for the labels, as shown in the worked example below.

Plot 1 (worked example): Generate some random data using generate_data (~100 samples), write_data_file and read_data_file. Plot the data as a line graph using the time attribute in milliseconds as the x-axis and the "data" attribute as the y axis. The x axis should be an offset from $t_0$ where $t_0$ is the first time stamp in the dictionary list.

In [None]:
example_data = generate_data(10.0, 1.0, 100)
write_data_file(example_data)
example_data = read_data_file()

t_0 = example_data[0]["time"]
time_stamps = [(entry["time"]-t_0)*1e3 for entry in example_data]
data_series = [entry["data"] for entry in example_data]

plt.plot(time_stamps, data_series)
plt.xlabel("Time offset (msec)")
plt.ylabel("Signal mu=10, sigma=1.0")
plt.title("Signal over time, 100 samples")
plt.show()

### Plots 2-3
Do the same as in the plot above, but plot two different "signals" with the same mean and different standard deviations. Create one plot with both lines on the same axes. Then create a second plot that has two sub-graphs ("subplots") stacked vertically, with one line per subplot.
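
The two layouts could be sketched roughly as follows. This is an illustration under assumptions, not the expected answer: `make_signal` is a hypothetical stand-in that uses the sample index as a fake time offset instead of the real time stamps from `read_data_file`, and `plot_overlay_and_stack` is a made-up helper name.

```python
import math
import random

import matplotlib.pyplot as plt

def make_signal(mean, variance, num_samples):
    # Hypothetical stand-in for the generate/write/read round trip: (offset_ms, value) pairs.
    sigma = math.sqrt(variance)
    return [(float(i), random.normalvariate(mean, sigma)) for i in range(num_samples)]

def plot_overlay_and_stack(sig_a, sig_b, label_a, label_b):
    t_a, y_a = zip(*sig_a)
    t_b, y_b = zip(*sig_b)

    # Plot 2: both signals on one set of axes.
    fig_overlay, ax = plt.subplots()
    ax.plot(t_a, y_a, label=label_a)
    ax.plot(t_b, y_b, label=label_b)
    ax.set_xlabel("Time offset (msec)")
    ax.set_ylabel("Signal")
    ax.set_title("Two signals, same mean")
    ax.legend()

    # Plot 3: one signal per subplot, stacked vertically with a shared x-axis.
    fig_stack, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
    ax_top.plot(t_a, y_a)
    ax_top.set_ylabel(label_a)
    ax_top.set_title("Two signals, stacked")
    ax_bottom.plot(t_b, y_b)
    ax_bottom.set_ylabel(label_b)
    ax_bottom.set_xlabel("Time offset (msec)")
    fig_stack.tight_layout()
    return fig_overlay, fig_stack

random.seed(0)
fig_overlay, fig_stack = plot_overlay_and_stack(
    make_signal(10.0, 1.0, 100), make_signal(10.0, 4.0, 100),
    "sigma=1.0", "sigma=2.0",
)
```

Calling `plt.show()` afterwards displays both figures. Sharing the x-axis in the stacked version keeps the two time scales aligned, which makes the difference in spread easier to compare.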

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Plot 4

Again, create two different series of random data, now with different means and different standard deviations. Create a scatter plot using the first series for the x-axis and the second series for the y-axis. Make sure all points less than a certain distance (decided by you) from the center of mass of the distribution are marked with triangles. All other points should be marked by circles. Choose the distance so roughly half the points are triangles and half are circles.
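
One simple way to get a roughly even split is to use the median distance from the center of mass as the cutoff. The sketch below shows only that computation, using the standard library. `split_by_median_distance` is a hypothetical helper, and the means and standard deviations are arbitrary example values.

```python
import math
import random
import statistics

def split_by_median_distance(xs, ys):
    # Center of mass of the point cloud (mean of each coordinate).
    cx, cy = statistics.mean(xs), statistics.mean(ys)
    # Euclidean distance of every point from the center.
    dists = [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]
    # The median distance splits the points into two roughly equal groups.
    cutoff = statistics.median(dists)
    near = [(x, y) for x, y, d in zip(xs, ys, dists) if d < cutoff]
    far = [(x, y) for x, y, d in zip(xs, ys, dists) if d >= cutoff]
    return near, far, cutoff

random.seed(1)
xs = [random.normalvariate(0.0, 1.0) for _ in range(100)]
ys = [random.normalvariate(5.0, 2.0) for _ in range(100)]
near, far, cutoff = split_by_median_distance(xs, ys)
print(len(near), len(far))
```

Each group can then be drawn with its own `plt.scatter` call, using `marker="^"` for the near points and `marker="o"` for the far points.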

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Numpy Familiarization
The best way to learn NumPy is to work through the entire [quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html). Once you are comfortable with that, you should be ready for nearly anything we throw at you. NumPy can replace nearly everything you would do in plain Python to manipulate homogeneously typed arrays of numbers. The exercise below mimics Problem 1.1. Look through the NumPy docs to find the correct function call to make. It can be done in one line.
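
As a rough illustration of the kind of call involved, the sketch below draws a normally distributed array of a given shape with `np.random.normal`. The function name `normal_array_sketch` and the example parameters are assumptions, and this is only one possible approach. The point worth noticing is that `scale` is a standard deviation, so a variance has to be converted first.

```python
import math

import numpy as np

def normal_array_sketch(mean, variance, shape):
    # np.random.normal takes the standard deviation (scale), not the variance.
    return np.random.normal(loc=mean, scale=math.sqrt(variance), size=shape)

np.random.seed(0)
sample = normal_array_sketch(1.0, 4.0, (200, 50))
print(sample.shape, sample.mean(), sample.var())
```

With 10,000 samples, the printed mean and variance should land close to the requested 1.0 and 4.0.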

In [None]:
''' generate_data_numpy
Args:
    mean : Python float
    variance : Python float
    shape : tuple giving the shape of the array to return
    
Returns: 
    A NumPy array of the given shape, with entries drawn from a normal distribution with mean 'mean' and variance 'variance'
    
Hints: Use np.random module. Can be done in one line of code.
'''
def generate_data_numpy(mean: float, variance: float, shape: tuple):
    data = np.zeros(shape)
    # Your code here
    # YOUR CODE HERE
    raise NotImplementedError()
    return data

In [None]:
test_data = generate_data_numpy(1.0, 1.0, (2, 3))
assert(test_data.shape == (2, 3))


def simple_test_3_1(test_mean, test_var, shape=(10,10), num_iterations=2000):
    num_samples = np.prod(shape)
    print("Test: mean: {}, var: {}, num_samples: {}".format(test_mean, test_var, num_samples))
    samp_mean_var = test_var / num_samples
    num_pass = 0
    for i in range(num_iterations):
        test_data = generate_data_numpy(test_mean, test_var, shape)
        stat = np.sum(test_data)/num_samples
        stat_error = abs((stat - test_mean)/math.sqrt(samp_mean_var))
        error_bounds = norm.ppf(0.975)
        run_result = stat_error < error_bounds
        if(run_result): 
            num_pass += 1
    pass_rate = num_pass / num_iterations
    print("Pass rate:", pass_rate)
    return pass_rate

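# Compare the pass rate against a threshold, as in Problem 1.1; a bare float is truthy for any nonzero rate.
assert(simple_test_3_1(-4, 2.0, (100, 10)) > 0.94)
assert(simple_test_3_1(10, 0.1, (10, 10)) > 0.94)
assert(simple_test_3_1(3, .01, (1, 10)) > 0.94)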