# Data Fundamentals (H)
John H. Williamson -- Session 2018/2019

----
<font color="red"> Read the submission instructions at the bottom of this notebook **carefully** before submitting </font> 

**This submission must be your own work; you will have to make a Declaration of Originality on submission.**

Note that marks shown when tests pass are **provisional** and could change after grading.

In [2]:
NAME = "Ross Angus" ## fill these in 
STUDENT_ID = "2244073A"  ## e.g. 2222222

---

## Lab 3: Assessed
# Scientific visualisation


### Notes
It is recommended to keep the lecture notes open while doing this lab exercise.

I recommend reading the lecture notes supplement *"Criticising Visualisations"* on Moodle as a quick reference summary. 

**This exercise is assessed**. Make sure you upload your solution by the deadline. See the notes at the bottom of this notebook for submission guidance.  This exercise is manually graded. Marks are out of 60. Each cell shows the marks available at the top (e.g. `## 5 marks`). `summarise_marks()` will not do anything in this lab, as there is no automarking.

### References
If you are stuck, the following resources are very helpful:

* [Introduction to Matplotlib](https://jakevdp.github.io/PythonDataScienceHandbook/04.00-introduction-to-matplotlib.html)
* [Matplotlib command summary](https://matplotlib.org/api/pyplot_summary.html)




In [3]:
# Standard imports
# Make sure you run this cell!
from __future__ import print_function, division
import numpy as np  # NumPy

# make the plots look good inline
%matplotlib notebook
# Set up Matplotlib
import matplotlib as mpl   
import matplotlib.pyplot as plt
print("Everything imported OK")


Everything imported OK


## Purpose of this lab
This lab should help you:
* understand how to use Matplotlib for basic plotting tasks
* create simple, clean and correct 2D plots of two variables 
* create plots with multiple conditions
* plot basic statistics of datasets, representing uncertainty appropriately
* explicitly criticise existing visualisations, and suggest and implement concrete improvements


# matplotlib Tutorial
We'll go through the first example from the lecture notes. You'll need to apply these ideas yourself later, so make sure you understand what happens here. This part of the lab is for information, and is not part of the assessment.

**Follow this tutorial carefully before attempting the lab exercise below**

In this example, the plotting commands are split up among notebook cells so that each step can be explained. In your code, just have all of the commands in one cell, to avoid having to scroll up and down as you make changes.

## Some data
This data is synthetic. It's a simple trigonometric function; the details don't particularly matter.

In [4]:
# a simple function, returns pulses with a shape determined by k
def pulse(x, k):
    return np.cos(x) * np.exp(np.cos(x) * k - k)

## generate an x value to be transformed
x = np.linspace(-3*np.pi, 3 * np.pi, 500)

## Figures
To begin any plotting we must create a **figure**, which is a "blank canvas" onto which we can add visualisations. **Important: the visualisation will always appear in the output of whichever cell has the `plt.figure()` call.** As a consequence, all of the commands below will affect the output of the cell below.

When you go through the various steps below, scroll back up to this cell to see their effect. Note that usually all plotting commands go in *one* cell, so we don't end up scrolling about.

In [5]:
fig = plt.figure()  # create a new figure. It will be blank.

## If you want a different size of figure, you can use:
# fig = plt.figure(figsize=(3,3)) # quite small
# the default size set here is good for this exercise


## Axes
To draw anything, we must define **axes**. Each axes is a facet of a plot. It has a coordinate system which can be used to draw data. 

The call to create a new axes is `fig.add_subplot(rows, columns, index)`, which creates a subplot in a `rows` x `columns` grid of axes at the position given by `index`. The index counts along each row from left to right, then moves down to the next row, and starts from *1* (not 0!)

For example, we could create a 3x2 array of plots, and select the middle-left plot
using `fig.add_subplot(3, 2, 3)`

       ---------
       | 1 | 2 |
       | 3 | 4 |
       | 5 | 6 |
       ---------

Most of the time, though, we just want one axes that fills the figure and `fig.add_subplot(1,1,1)` does that. The object it returns is what we use for all subsequent plotting.
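
As a rough sketch of how the index maps to a position, the snippet below lays out the 3x2 grid described above and titles each axes with its index. The `Agg` backend and the `axes_by_index` dictionary are assumptions made so that the example runs outside a notebook; neither is part of the lab code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend; drop this line inside the notebook
import matplotlib.pyplot as plt

rows, columns = 3, 2
fig = plt.figure(figsize=(4, 6))

# hypothetical helper dictionary: subplot index -> axes object
axes_by_index = {}
for index in range(1, rows * columns + 1):   # indices start from 1, not 0
    ax = fig.add_subplot(rows, columns, index)
    ax.set_title("index %d" % index)
    axes_by_index[index] = ax

fig.tight_layout()

# index 3 is the left-hand axes of the middle row
middle_left = axes_by_index[3]
```

Saving the figure with `fig.savefig(...)` or displaying it in the notebook should show the indices running left to right along each row, then down to the next row.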

In [6]:
ax = fig.add_subplot(1, 1, 1)  
# create a new subplot, returning a set of axes
# look above ^ ^ at the figure. You should see the axes appear


We make a line plot of `x` against `f(x, k)` for a few fixed values of `k`.
Each subsequent plot will be a new color, and all of the plots will be overlaid on the axes

`ax.plot(x,y)` is the basic line plotting command. It is called on an axes object.

Note that the `label=` gives a label that the `legend` command will use to label the graph afterwards. Always label plots if you want readers to be able to distinguish them.

In [7]:
ax.cla()  # cla means to CLear Axes. 
# it does nothing the first time we run it, but it will clear the plot and redraw if
# you run this cell multiple times. Try commenting it out and running this cell twice!


ax.plot(x, pulse(x,1), label='k=1')
ax.plot(x, pulse(x, 5), label='k=5')
ax.plot(x, pulse(x, 100), label='k=100')

# you can adjust the styling of the plot manually: 
#   here the color is black ("k") 
#   and the linestyle is dotted (":")
ax.plot(x, pulse(x, 500), label='k=500', color='k', linestyle=':')

## note that the default color cycle provides ten built-in colors called
# C0, C1, C2, ... up to C9
# they will generally look good
# try changing the 'k' above to 'C6'

[<matplotlib.lines.Line2D at 0x7fbe916dff60>]

**Look above to see the result**. Notice the different colouring. Repeated plots on one axis create new **layers** in the visualisation.

Now we can add **labels** to the plot. There should always be a label for the x-axis, y-axis and a title for the axes. We should also have a **legend** if multiple layers are used. `ax.legend()` will draw one. It can be configured in many ways, but the defaults are fine here.

In [8]:
# label the plot 
ax.set_xlabel("Phase (radians)")  # x-axis label
ax.set_ylabel("Amplitude")        # y-axis label
ax.set_title("Pulse wave function for various $k$")  # title of plot (appears above plot)

# create a legend (key) for the plot, using the labels specified
# in the ax.plot() calls, like ax.plot(x,y, label="xy")
ax.legend()

<matplotlib.legend.Legend at 0x7fbe91620cf8>

By default, the scaling of the axis will be adjusted to fit the data. This isn't always a good idea, so you can adjust it manually. The axis limits are set by `ax.set_xlim(min,max)` and `ax.set_ylim(min,max)` and these adjust the scaling of the axes. This configures the **coords** used to draw data.

You can try changing these to see different parts of the curve.

**Note that you don't need to call the plot commands again when you update the axis limits.**

In [9]:
## set the limits of the plot
# (if this is omitted, sensible autoscaling will be applied)
ax.set_xlim(np.min(x), np.max(x))
ax.set_ylim(-0.25, 1.2)

(-0.25, 1.2)

### Tweaking
We can tweak the plot in many ways. Try some of the below.

In [10]:
# grid 
ax.grid(True) # or False to turn it off

In [11]:
# frame
ax.set_frame_on(False) # or True to turn it back on

In [12]:
ax.set_xticks([-10, -5, 0, 5, 10]) # Tick positions on the x axis
ax.set_yticks([0,0.5,1.0]) # and on the y-axis

[<matplotlib.axis.YTick at 0x7fbe91976160>,
 <matplotlib.axis.YTick at 0x7fbe91976390>,
 <matplotlib.axis.YTick at 0x7fbe916c33c8>]

In [13]:
# this is a fancier tick adjustment
# Tick positions on the x axis
ax.set_xticks([-2*np.pi, -np.pi, 0, np.pi, 2*np.pi]) 

# we can relabel the ticks using the same order. 
# LaTeX formulae work if inside $ symbols; raw strings (r"...") stop
# Python from treating the backslashes as escape sequences
ax.set_xticklabels([r"$-2\pi$", r"$-\pi$", "0", r"$\pi$", r"$2\pi$"])

[Text(-10,0,'$-2\\pi$'),
 Text(-5,0,'$-\\pi$'),
 Text(0,0,'0'),
 Text(5,0,'$\\pi$'),
 Text(10,0,'$2\\pi$')]

The standard colours in matplotlib are shown below (you can also specify custom colours)

In [14]:
## Standard colours
fig = plt.figure(figsize=(10,2))
ax = fig.add_subplot(1,1,1)
fig.set_facecolor("#f0f0f0")  # can always use Hex colors, or floating point arrays
for i,col in enumerate(["C0", "C1", "C2", "C3", "C4", "C5", "C6", "C7", 
                        "r", "g", "b", "c", "m", "y", "k", "w"]):
    
    # plot, and add some simple text
    ax.plot(i, 0.5, c=col, marker='s', markersize=20)
    # alpha sets opacity of rendering
    ax.text(i, 0.5+0.15, col, ha='center', color=col, alpha=0.5)
    
ax.set_ylim(0,1) # set axis limits    
ax.axis("off") # remove axis; there are no units to show
 
    


(-0.75, 15.75, 0.0, 1.0)
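
Custom colours can be written in several equivalent forms. The following is a small sketch, not part of the lab, that uses `matplotlib.colors.to_rgba` to show what a hex string, a single-letter name and a tuple of floats resolve to. The particular colour values are arbitrary examples.

```python
from matplotlib import colors as mcolors

# a hex string: two hex digits each for red, green and blue
light_grey = mcolors.to_rgba("#f0f0f0")

# a single-letter name: "k" is black
black = mcolors.to_rgba("k")

# a tuple of floats between 0 and 1, with the opacity (alpha) given separately
half_transparent = mcolors.to_rgba((0.2, 0.4, 0.6), alpha=0.5)

print(light_grey)
print(black)
print(half_transparent)
```

Any of these forms can be passed wherever Matplotlib accepts a colour, for example as `color=` in `ax.plot()`.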

---------

# 1. Simple plots [30 minutes]

For these exercises, you need to plot graphs showing the data which is provided to you. To get full credit you must:
    
* choose the right kind of plot (line, scatter, bar, histogram). There may be more than one right choice.
* plot the data correctly
* make sure all the details are sensible (axes, labelling, etc.)
* **write a short caption for the data in the cell provided.**

You will get the name of the file with the data, along with a comment that explains the format of the data. You can use `np.loadtxt()` to load the datasets.

You will have to look at the lecture notes and/or the documentation to complete this exercise.


A)
* Data file: `data/cherry_trees.txt`
* Description: Height and volume of black cherry trees  measured in an orchard.
* Columns:
  
       Height (ft)  Volume (ft^3)

Note: plot your graph in **metric units**. 1 ft = 0.3048m
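
A minimal sketch of the unit conversion, assuming the two-column layout described above. The numbers in `example` are made up for illustration and are not taken from `data/cherry_trees.txt`. Because a volume scales with the cube of a length, the cubic-foot factor is the foot factor cubed.

```python
import numpy as np

FT_TO_M = 0.3048              # given in the exercise
FT3_TO_M3 = FT_TO_M ** 3      # cubic feet to cubic metres, about 0.0283

# made-up rows in the same layout: height (ft), volume (ft^3)
example = np.array([[70.0, 10.3],
                    [65.0, 10.3],
                    [80.0, 51.0]])

height_m = example[:, 0] * FT_TO_M
volume_m3 = example[:, 1] * FT3_TO_M3

print(height_m)
print(volume_m3)
```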
       
    

In [15]:
## 4 marks
# YOUR CODE HERE

cherry = np.loadtxt("data/cherry_trees.txt")

# convert to metric units: 1 ft = 0.3048 m, so 1 ft^3 = 0.3048^3 m^3
FT_TO_M = 0.3048
cherry_height = cherry[:, 0] * FT_TO_M
cherry_volume = cherry[:, 1] * FT_TO_M ** 3

# height of each tree, in the order the trees appear in the file
fig1 = plt.figure()
ax1 = fig1.add_subplot(1, 1, 1)
ax1.plot(cherry_height, color='black', label='Height (m)')
ax1.set_xlabel("Tree number")
ax1.set_ylabel("Height (m)")
ax1.set_title("Height of black cherry trees")
ax1.legend()

# volume of each tree, on a separate figure because the scale differs
fig2 = plt.figure()
ax2 = fig2.add_subplot(1, 1, 1)
ax2.plot(cherry_volume, color='red', label='Volume (m$^3$)')
ax2.set_xlabel("Tree number")
ax2.set_ylabel("Volume (m$^3$)")
ax2.set_title("Volume of black cherry trees")
ax2.legend()


<matplotlib.legend.Legend at 0x7fbe90514b70>

### Caption [1 mark]
There doesn't seem to be much correlation between tree volume and tree height. The tallest tree does have the greatest volume, but the rest of the data doesn't correlate so clearly. I split the data into two graphs because plotting both on one figure wasn't clear: their y values differ greatly in scale.

B)
* Data file: `data/air_passengers.txt`
* Description: The number of international air passengers, each month, 1949 to 1960.
* Columns:

      year   passenger_count


In [6]:
## 4 marks
# YOUR CODE HERE
passengers = np.loadtxt("data/air_passengers.txt")
passengers_year = passengers[:, 0]
passengers_count = passengers[:, 1]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# one point per month; the year column is used directly as the x coordinate
ax.plot(passengers_year, passengers_count, color='red', marker=".", label='Passenger count')

ax.set_xlabel("Year")
ax.set_ylabel("Passengers per month")
ax.set_title("The number of international air passengers, each month, 1949 to 1960")
ax.legend()



<matplotlib.legend.Legend at 0x7f5ac1f3d358>

### Caption [1 mark]
There is a clear upward trend in monthly passenger numbers. Within each year there is an obvious repeating pattern, with crests that appear to fall in summer and troughs in winter.

C) 
* Data file: `data/rivers.txt`
* Description: Length of major rivers in the United States (miles)
* Columns:
   
       river_length



In [18]:
## 6 marks
# YOUR CODE HERE

rivers = np.loadtxt("data/rivers.txt")

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# histogram: count how many rivers fall into each of 50 length bins
ax.hist(rivers, bins=50, color="navy", label='River length')

ax.set_xlabel("River length (miles)")
ax.set_ylabel("Number of rivers")
ax.set_title("Length of major rivers in the United States")
ax.legend()



<matplotlib.legend.Legend at 0x7fbe903a6128>

### Caption [1 mark]
The histogram shows the lengths of a selection of major rivers in the US, from short to long. Most of them fall within the lowest quarter of the range of lengths, though.


# 2. Layered and faceted plots [45 minutes]

A **layered** plot has more than one set of geoms overlaid on the same coordinate system. A **faceted** plot uses multiple coordinate systems to show different views of the data.

For the dataset, appropriately use layering, faceting and reduction operations to show the dataset. 


<img src="imgs/chocolate.jpg"> <br><br>*[[Image](https://flickr.com/photos/myhsu/3146912657 "Black As Chocolate") by [myhsu](https://flickr.com/people/myhsu) shared [CC BY-ND](https://creativecommons.org/licenses/by-nd/2.0/)]*

A)
* Data file `data/cake.txt`
* Description: 
>Data on the breakage angle of chocolate cakes made with three different
recipes and baked at six different temperatures. The angle of breakage is affected by the recipe and temperature. The experiment was repeated 15 times (replicates).

* Columns:

        replicate(1-15)    recipe(0-2)    temp(deg F)    angle(deg)

Use this model:
* Facet `recipes`
* Layer `replicates`

* Colour each replicate identically, and use lowered opacity.

* As well as the layered replicates, clearly show the mean and standard deviation of the breakage angle in each facet as a line geom and a ribbon geom.

* Convert Fahrenheit to Celsius before plotting. 

* `plt.tight_layout()` will fix layout of facets. Set a super-title across all facets using `fig.suptitle()`. 

* You will need one or more `for` loops (probably) to solve this problem.
* Use Boolean arrays to perform `group by` like operations.
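
One possible shape for this model is sketched below. It is an illustration under stated assumptions, not a model answer: the data is synthetic (generated with made-up numbers, not loaded from `data/cake.txt`), and the `Agg` backend, the helper `fahrenheit_to_celsius` and all variable names are choices made for this example. It facets by recipe, layers one faint line per replicate, and overlays the mean as a line and one standard deviation either side as a ribbon.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend; drop this line inside the notebook
import matplotlib.pyplot as plt

# Synthetic stand-in for the cake data (NOT the real measurements), same columns:
# replicate (1-15), recipe (0-2), temp (deg F), angle (deg)
rng = np.random.RandomState(0)
temps_f = np.array([175.0, 185.0, 195.0, 205.0, 215.0, 225.0])  # six made-up temperatures
rows = []
for replicate in range(1, 16):
    for recipe in range(3):
        for temp in temps_f:
            angle = 30.0 + 0.05 * (temp - 175.0) + recipe + rng.normal(0, 3)
            rows.append([replicate, recipe, temp, angle])
data = np.array(rows)

def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0

fig = plt.figure(figsize=(9, 3.5))
for recipe in range(3):
    ax = fig.add_subplot(1, 3, recipe + 1)      # facet: one axes per recipe
    in_recipe = data[:, 1] == recipe            # Boolean mask acts as a "group by"
    curves = []
    for replicate in np.unique(data[:, 0]):
        sel = data[in_recipe & (data[:, 0] == replicate)]
        sel = sel[np.argsort(sel[:, 2])]        # order the points by temperature
        temps_c = fahrenheit_to_celsius(sel[:, 2])
        # layer: every replicate has the same colour and a lowered opacity
        ax.plot(temps_c, sel[:, 3], color="C0", alpha=0.2)
        curves.append(sel[:, 3])
    curves = np.array(curves)                   # shape: (replicates, temperatures)
    mean = curves.mean(axis=0)
    std = curves.std(axis=0)
    # ribbon geom for the spread, line geom for the mean
    ax.fill_between(temps_c, mean - std, mean + std, color="C1", alpha=0.3,
                    label="mean +/- 1 std. dev.")
    ax.plot(temps_c, mean, color="C1", label="mean")
    ax.set_xlabel("Temperature (deg C)")
    ax.set_ylabel("Breakage angle (deg)")
    ax.set_title("Recipe %d" % recipe)
    ax.legend()

fig.suptitle("Breakage angle by recipe (synthetic data)")
fig.tight_layout(rect=[0, 0, 1, 0.93])  # leave room at the top for the super-title
```

The same structure should carry over to the real file by replacing the synthetic `data` array with the result of `np.loadtxt()`, provided each replicate has one measurement per temperature.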



In [8]:
# 10 marks

def celsius(f):
    # convert degrees Fahrenheit to degrees Celsius
    return 100.0 * ((f - 32) / (212-32))

cake = np.loadtxt("data/cake.txt")
# columns: replicate (1-15), recipe (0-2), temp (deg F), angle (deg)

fig = plt.figure(figsize=(10,10))

# one facet per recipe; rows are selected with a Boolean mask on the recipe column
for ix,recipe_id,recipe in zip([1,2,3],[0,1,2],['recipe1', 'recipe2', 'recipe3']):
    ax = fig.add_subplot(2,2,ix)
    rows = cake[cake[:,1] == recipe_id]
    x = np.arange(len(rows))  # one bar position per row of this recipe
    
    ax.bar(x, rows[:,0]*3, width = 1, alpha = 0.7, color = "pink", label = 'replicate * 3')
    ax.bar(x, celsius(rows[:,2])/10, width = 0.8, color = "red", label = 'temperature (C) / 10')
    ax.bar(x, rows[:,3], width = 1, alpha = 0.6, color = "orange", label = 'breakage angle (degrees)')

    ax.set_xlabel("Row index within recipe") 
    ax.set_ylabel("Replicate * 3, temperature (C) / 10, breakage angle (degrees)") 
    ax.set_title(recipe)  
    ax.legend()
    
    # standard deviation of the breakage angle for this recipe (computed, not yet plotted)
    std = np.std(rows[:,3], axis = 0)



# YOUR CODE HERE


### Caption
* I won't lie, I'm unsure what I want this to look like. I'm going to leave it for now and hope for inspiration.

# 3. Some uncertainty [30 minutes]

You are provided with data on the effect of five insecticide sprays on populations of pest insects. Compare and contrast these sprays, **appropriately representing uncertainty**.


* Data file: `data/insects.txt`
* Description: The counts of insects on each leaf of a plant in agricultural experimental units treated with
different insecticides.
* Columns:

            insect_count spray_id (0-4)


* Plot the data, on three separate figures, using:
    * A simple bar chart of the mean insect counts (grouped by spray).
    * A bar chart showing the mean counts (grouped), and half a standard deviation above and below the mean. Find a way to show this interval (hint: look at the [`plt.bar` documentation](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.bar.html)). The standard deviation of an array can be computed by `np.std(x, axis)`, just like `np.mean()`.
    * A box plot of the insect counts.

* Mark the ticks on the x axis using the names of the sprays.

        0 = Insecticator
        1 = Placebo
        2 = BuzzNoMore
        3 = Aprotex
        4 = DieOff

* For this section, you don't need to write the caption. Assume the caption is:

> Effectiveness of insecticides in a farm environment. Five different aerosol insecticides were tested.



In [9]:
# 4 marks
# load and group the data
# YOUR CODE HERE

insects = np.loadtxt("data/insects.txt")

insect_count = insects[:,0]
spray_id = insects[:,1]



In [10]:
# 2 marks
# plot the means
# YOUR CODE HERE
import matplotlib.pyplot as plt  # explicit import instead of wildcard imports

# Boolean masks, one per spray, in spray_id order
Insecticator = (spray_id == 0)
Placebo = (spray_id == 1)
BuzzNoMore = (spray_id == 2)
Aprotex = (spray_id == 3)
DieOff = (spray_id == 4)

spray_masks = [Insecticator, Placebo, BuzzNoMore, Aprotex, DieOff]
spray_names = ['Insecticator', 'Placebo', 'BuzzNoMore', 'Aprotex', 'DieOff']
spray_colours = ['red', 'orange', 'yellow', 'green', 'blue']

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# masks, names and colours are listed in the same order, so each bar gets the right label
for ix, (mask, name, colour) in enumerate(zip(spray_masks, spray_names, spray_colours)):
    ax.bar(ix, np.mean(insect_count[mask]), width = 0.8, alpha = 0.5, label = name, color = colour)

# mark the x ticks with the names of the sprays
ax.set_xticks(range(len(spray_names)))
ax.set_xticklabels(spray_names)
ax.set_xlabel("Spray") 
ax.set_ylabel("Mean insect count") 
ax.set_title("Effectiveness of insecticides in a farm environment")  
ax.legend()



<matplotlib.legend.Legend at 0x7f5ac1b25358>

In [21]:
# 2 marks
# plot the means with std. devs.
# YOUR CODE HERE

import matplotlib.pyplot as plt  # explicit import instead of wildcard imports

# names and masks in spray_id order, so each bar gets the right label
spray_names = ['Insecticator', 'Placebo', 'BuzzNoMore', 'Aprotex', 'DieOff']
spray_masks = [(spray_id == i) for i in range(len(spray_names))]
spray_colours = ['red', 'orange', 'yellow', 'green', 'blue']

# mean and standard deviation of the insect count for each spray
means = np.array([np.mean(insect_count[mask]) for mask in spray_masks])
stds = np.array([np.std(insect_count[mask]) for mask in spray_masks])

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# yerr draws an error bar from (mean - yerr) to (mean + yerr), so
# yerr = std / 2 shows half a standard deviation either side of the mean
positions = np.arange(len(spray_names))
for pos, mean, std, name, colour in zip(positions, means, stds, spray_names, spray_colours):
    ax.bar(pos, mean, yerr = std / 2, capsize = 5, width = 0.8, alpha = 0.5, label = name, color = colour)

# mark the x ticks with the names of the sprays
ax.set_xticks(positions)
ax.set_xticklabels(spray_names)
ax.set_xlabel("Spray") 
ax.set_ylabel("Mean insect count") 
ax.set_title("Effectiveness of insecticides in a farm environment")  
ax.legend()




<matplotlib.legend.Legend at 0x7fbe8aaa3f28>

In [11]:
# 2 marks
# a Box plot
# YOUR CODE HERE
import matplotlib.pyplot as plt  # explicit import instead of wildcard imports

fig = plt.figure()
ax = fig.add_subplot(1,1,1)

# one box per spray, in spray_id order; the notches show a bootstrapped
# confidence interval around each median
ax.boxplot(
    (insect_count[Insecticator], insect_count[Placebo], insect_count[BuzzNoMore],
     insect_count[Aprotex], insect_count[DieOff]),
    labels=["Insecticator", "Placebo", "BuzzNoMore", "Aprotex", "DieOff"],
    notch=True, bootstrap=1000)

ax.set_xlabel('Spray')
ax.set_ylabel('Insect count')
ax.set_title("Insect Count Boxplot")


Text(0.5,1,'Insect Count Boxplot')

# 4. Constructively criticising visualisations [1.25 hour]
Write a short criticism of the plot below each one. Your criticism should reflect upon the scientific and aesthetic quality of the plots. You are provided with the code which generates the plots. **Copy the cell** that generates the plot and improve the flaws you found.

Your criticism should be *a few bullet points* and not more. Note that you can format a bulleted list by using an asterisk at the start of a line:


    * this
    * will
    * be 
    * bulleted
    
when you edit the criticism cell.


## A: Earthquakes in California

* Dataset: Acceleration measurements at seismic stations placed around California, measuring the peak acceleration experienced during earthquakes, along with the distance of the station to the hypocenter of that earthquake.
* File: `data/cali_earthquakes.txt`
* Columns

       earthquake_id magnitude(Richter) station_id distance_to_hypocenter(km) acceleration(g) 

* Caption:

>    This plot shows the variation in acceleration at seismic monitoring stations as a function of distance to hypocentre of earthquakes in California. More distant stations measure smaller signals with some variation according to the strength of the originating earthquake.
    


In [None]:
earthquakes = np.loadtxt("data/cali_earthquakes.txt")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
# name columns
earthquake, mag, station, distance, accel = [0,1,2,3,4]
plt.scatter(earthquakes[:,mag], earthquakes[:, accel], c=earthquakes[:, distance], cmap='Pastel2', s=300)


### Criticism [6 marks]

* The markers overlap along the x axis; not a terrible thing, but they are probably just a bit too big.
* They also overlap along the y axis. Some overlap is probably unavoidable, but the markers should be more transparent so that each one can be seen.
* The axes aren't labelled, so it is unclear what the graph is showing.
* There is no title.
* `Pastel2` is a qualitative colour map with no natural ordering, so it is a poor choice for showing the magnitude of a continuous quantity such as distance.

In [25]:
# 5 marks
# YOUR CODE HERE

import matplotlib.pyplot as plt  # explicit import instead of wildcard imports

earthquakes = np.loadtxt("data/cali_earthquakes.txt")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)

# name columns
earthquake, mag, station, distance, accel = [0,1,2,3,4]

# replace the pastel cmap with viridis, a sequential colour map, so the ordering
# of distances can be read from the colours; smaller, semi-transparent markers
# keep overlapping points visible
sc = ax.scatter(earthquakes[:,mag], earthquakes[:, accel], c=earthquakes[:, distance], cmap='viridis', s=200, alpha= 0.5)

# a colour bar explains what the colours encode
cbar = fig.colorbar(sc)
cbar.set_label('Distance to hypocentre (km)')

ax.set_xlabel('Earthquake magnitude (Richter)')
ax.set_ylabel('Acceleration (g)')
ax.set_title("Variation in acceleration at seismic monitoring stations")

plt.tight_layout()

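
The caption for this dataset describes acceleration as a function of distance, so another possible arrangement puts distance on the x axis and uses colour for magnitude. The sketch below only illustrates that layout: the data is synthetic (the relationship between the variables is invented, not taken from `data/cali_earthquakes.txt`), and the log scales, marker size and `Agg` backend are assumptions rather than requirements of the exercise.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend; drop this line inside the notebook
import matplotlib.pyplot as plt

# synthetic stand-in data (NOT the real measurements)
rng = np.random.RandomState(1)
n = 60
magnitude = rng.uniform(5.0, 7.5, n)
distance_km = rng.uniform(1.0, 300.0, n)
# invented relationship: weaker signals further away, stronger for larger earthquakes
accel_g = 0.05 * magnitude / (1.0 + 0.05 * distance_km) * np.exp(rng.normal(0, 0.2, n))

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# x: distance, y: acceleration, colour: magnitude on a sequential colour map
points = ax.scatter(distance_km, accel_g, c=magnitude, cmap="viridis", s=40, alpha=0.7)
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Magnitude (Richter)")

# log scales spread out values that span several orders of magnitude
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Distance to hypocentre (km)")
ax.set_ylabel("Peak acceleration (g)")
ax.set_title("Acceleration against distance (synthetic data)")
fig.tight_layout()
```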

## B: Reaction times and sleep

* Dataset: The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject.

* File `data/sleep_study.txt`
* Columns

         reaction time (ms)    sleep_deprivation (days)  subject_id (id)

* Caption:
> This plot shows how visual reaction time varies as subjects are deprived of sleep. Up to 10 days of sleep deprivation were tested.
    


In [26]:
sleep_study = np.loadtxt("data/sleep_study.txt")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
reaction, sleep, subject = 0,1,2

# group each day
grouped =np.array([sleep_study[sleep_study[:,sleep]==i] for i in range(10)])
# take mean for each day and plot it
mean_reactions = np.mean(grouped, axis=1)[:,0] 
ax.plot(mean_reactions, np.arange(10))

# adjust axes
ax.set_xlim(0,1000)
ax.set_ylim(-2, 15)

ax.set_ylabel("Sleep deprivation")
ax.set_xlabel("Reaction time")


Text(0.5,0,'Reaction time')

### Criticism [6 marks]
* The plot is very hard to read, as the curve itself appears small at this axis scale.
* I don't like the colour; it's pretty uninteresting. I'm not going to change it though.
* There are no units on the axes.
* It's just a really boring graph; I'll try to think of some way to make it more interesting.
* There is no title.
* Three columns are given, but only one is plotted.
* Why use the mean when you could plot the actual data? At least do both, not just the mean.

In [27]:
# 6 marks
# YOUR CODE HERE

import matplotlib.pyplot as plt  # explicit import instead of wildcard imports

sleep_study = np.loadtxt("data/sleep_study.txt")
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
reaction, sleep, subject = 0,1,2

# group each day
grouped =np.array([sleep_study[sleep_study[:,sleep]==i] for i in range(10)])
# take mean for each day and plot it
mean_reactions = np.mean(grouped, axis=1)[:,0] 
ax.plot(mean_reactions, np.arange(10), marker = '.')
#plt.scatter(sleep_study[:,sleep], sleep_study[:,subject], c = sleep_study[:,reaction], s=200, cmap='viridis', )

# adjust axes so that the curve fills the plot
ax.set_xlim(255,355)
ax.set_ylim(-0.2, 10)

# mean reaction time is on the x axis, days of sleep deprivation on the y axis
ax.set_ylabel("Sleep deprivation (days)")
ax.set_xlabel("Mean reaction time (ms)")
ax.set_title("the variation of visual reaction time against sleep deprivation")


Text(0.5,1,'the variation of visual reaction time against sleep deprivation')
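
The criticism above suggests plotting the actual data as well as the mean. A minimal sketch of one way to do that follows: one faint line per subject, with the mean across subjects layered on top, and days of sleep deprivation on the x axis. The data is synthetic (18 hypothetical subjects with invented baselines and slopes, not the contents of `data/sleep_study.txt`), and the `Agg` backend is assumed only so that the example runs outside a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend; drop this line inside the notebook
import matplotlib.pyplot as plt

# synthetic stand-in data (NOT the real study), same column order:
# reaction time (ms), sleep deprivation (days), subject id
rng = np.random.RandomState(2)
n_subjects, n_days = 18, 10
days = np.arange(n_days)
rows = []
for subject_id in range(n_subjects):
    baseline = rng.normal(250, 20)   # invented per-subject baseline
    slope = rng.normal(10, 5)        # invented per-subject slowdown per day
    for day in days:
        rows.append([baseline + slope * day + rng.normal(0, 10), day, subject_id])
study = np.array(rows)
reaction, sleep, subject = 0, 1, 2

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# layer 1: the raw data, one semi-transparent line per subject
for s in np.unique(study[:, subject]):
    one = study[study[:, subject] == s]
    ax.plot(one[:, sleep], one[:, reaction], color="C0", alpha=0.25)

# layer 2: the mean reaction time on each day, drawn on top
mean_by_day = np.array([study[study[:, sleep] == d, reaction].mean() for d in days])
ax.plot(days, mean_by_day, color="k", marker="o", label="Mean over subjects")

ax.set_xlabel("Sleep deprivation (days)")
ax.set_ylabel("Reaction time (ms)")
ax.set_title("Reaction time against sleep deprivation (synthetic data)")
ax.legend()
```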

-----

# Submission instructions

### Checking your work
## Mark summary
You should check the marks you've got before submitting. To do this, 
* Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and matriculation number at the top.
* SAVE THE NOTEBOOK, 
* Go to `Cell/Restart and Run All` in the menu.
* Check the output of the cell here.

Note that this is an estimated mark, and if you don't do the above procedure *carefully* you may get nonsense estimates.


In [None]:
summarise_marks()

### Formatting the submission
* **WARNING**: If you do not submit the correct file, you will not get any marks.
* Submit this file **only** on Moodle. It will be named `week_<xxx>.ipynb`.


## Penalties (only for assessed labs)
<font color="red">
    
**Malformatted submissions**
</font>
These assignments are processed with an automatic tool; failure to follow instructions *precisely* will lead to you automatically losing two bands in grade regardless of whether the work is correct (not to mention a long delay in getting your work back). **If you submit a file without your work in it, it will be marked and you will get 0 marks.**

<font color="red">**Late submission**</font>
Be aware that there is a two band penalty for every *day* of late submission, starting the moment of the deadline.

<font color="red">
    
**Plagiarism**
</font> Any form of plagiarism will be subject to the Plagiarism Policy. The penalties are severe.