# Day 6 In-class Assignment: Exploring Great Lakes Water Levels using NumPy

In today's activity, we're going to use NumPy and Matplotlib to work with data on the water levels of the Great Lakes, most of which border Michigan, USA. You'll also get to practice some of the plot modifications that you learned about in the last class.

![picture](https://upload.wikimedia.org/wikipedia/commons/5/57/Great_Lakes_from_space_crop_labeled.jpg)

In [2]:
# Although there are some exceptions, it is generally a good idea to keep all of your
# imports in one place so that you can easily manage them. Doing so also makes it easy
# to copy all of them at once and paste them into a new notebook you are starting.

# Bring in NumPy and Matplotlib, allowing for plots inside of the notebook.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

To use this notebook for your in-class assignment, you will need these files:
* `mhu.csv`
* `sup.csv`
* `eri.csv`
* `ont.csv`

These files are provided with this notebook. You will read data from them, so it is important that they are in the correct place on your computer; that is, the Jupyter notebook needs to know where the files are stored on your laptop's disk. The `loadtxt` command used below refers to each file by name only, so the simplest choice is to keep the files in the same folder as this notebook. Take some time right now to be sure you have the files on your laptop and that you know where they are. Work with other members of your group to be sure everyone knows where the files are. For example, on a Mac (which runs macOS, a Unix-like operating system) downloaded files will be in the Downloads folder by default; with the Finder, you can move them wherever you wish.

---
## Part 1: Reading Data

We are going to use NumPy to read data from files and look at it. A standard NumPy function for doing this is `loadtxt`. In principle, `loadtxt` is simple: it loads your data into NumPy arrays for you to use. Unfortunately, data almost never comes in an entirely clean form, so you will need to supply several options that depend on the file. You will need to look at your file to know which options you need; for today, we have already looked at the file and written the full `loadtxt` command for you. Here it is; run this cell:

In [6]:
# Use NumPy to read data from a csv file (example for the mhu.csv file).
mhu_date, mhu_level = np.loadtxt("mhu.csv", usecols=(0, 1), unpack=True, skiprows=1, delimiter=',')

In order, here is what it is doing:
* finding the file `mhu.csv` for reading
* ignoring all columns except the first two, which are called $0$ and $1$, since Python starts counting at $0$
* using `unpack=True`, which is explained below
* skipping the first row of the file, which is a header that tells us what is in the file rather than data
* the data in the file is separated by comma delimiters; this is why the file is called "csv" for *comma separated values*

The `unpack=True` parameter tells `loadtxt` to unpack the columns into separate arrays so that you can store each one in its own variable. This is extremely handy, as you will see many times this semester.
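To see the effect of `unpack`, here is a small sketch that reads the same text twice, once without and once with `unpack=True`. The data are made-up numbers held in a string (not the real lake files), and the names `fake_csv`, `table`, `date`, and `level` are only for illustration.

```python
import io
import numpy as np

# Made-up stand-in for a small csv file (NOT the real lake data).
fake_csv = "date,level\n2000.0,176.5\n2001.0,176.3\n2002.0,176.1\n"

# Without unpack, loadtxt returns one 2-D array: one row per line of the file.
table = np.loadtxt(io.StringIO(fake_csv), delimiter=",", skiprows=1, usecols=(0, 1))
print(table.shape)

# With unpack=True, the result is transposed so each column can go into its own variable.
date, level = np.loadtxt(io.StringIO(fake_csv), delimiter=",", skiprows=1,
                         usecols=(0, 1), unpack=True)
print(date)
print(level)
```

Without `unpack=True` you would have to slice the columns out yourself, for example `table[:, 0]` and `table[:, 1]`.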

(As you can see, **there is a lot in that one line**. Reading and writing files is never very clean and easy -- take your time thinking through it every time you do it. Although we won't do it today, there is another command `savetxt` for saving into a file: if you use these consistently, they will make files that are easy to read in later. If you get data from some other source, you never know what you are getting.)
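Although it is not needed today, here is a minimal sketch of how `savetxt` and `loadtxt` can be paired. The arrays hold made-up values, and the file name `example_levels.csv` and the temporary folder are assumptions chosen only so the example does not touch your real data files.

```python
import os
import tempfile
import numpy as np

# Made-up example arrays (NOT the real lake data).
years = np.array([2000.0, 2001.0, 2002.0])
levels = np.array([176.5, 176.3, 176.1])

# Hypothetical output file in the system's temporary folder.
out_path = os.path.join(tempfile.gettempdir(), "example_levels.csv")

# Stack the two arrays as columns and write them with a one-line header.
# comments="" stops savetxt from putting "# " in front of the header.
np.savetxt(out_path, np.column_stack((years, levels)),
           delimiter=",", header="year,level", comments="")

# Read the file back with the same options used earlier in this notebook.
years_back, levels_back = np.loadtxt(out_path, delimiter=",", skiprows=1, unpack=True)
print(np.allclose(levels, levels_back))
```

Because the file was written with a known delimiter and a single header row, the matching `loadtxt` call needs only `delimiter` and `skiprows`.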

Once you have your data, it is always a good idea to look at some of it to be sure it is what you think it is. You could use a print statement, or just type the data variable name in an empty cell.
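As a rough sketch of what "looking at the data" might involve, the cell below checks the size, the first few values, and the range of an array. The array `example_level` holds made-up numbers; in the notebook you would inspect `mhu_level` (or your other arrays) instead.

```python
import numpy as np

# Made-up stand-in array; in the notebook you would inspect mhu_level instead.
example_level = np.array([176.5, 176.3, 176.1, 176.4, 176.6, 176.2])

print(example_level.shape)                        # how many values were read
print(example_level[:5])                          # the first five values
print(example_level.min(), example_level.max())   # quick sanity check on the range
```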

Next, write some code in this cell to read the data from the other files. Use descriptive variable names to store the results.

In [2]:
# Read in data from the remaining files.


___
## Part 2: Statistics
___

Now that you have read in the data, use NumPy's [statistics operations](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html) from the pre-class assignment to compare the following properties of the water levels for all of the lakes:
* mean
* median
* standard deviation
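As a reminder of how these three operations are called, here is a short sketch on a made-up array (`example_level` is not one of the lake datasets). You would apply the same calls to each of your own level arrays.

```python
import numpy as np

# Made-up example values (NOT the real lake data).
example_level = np.array([176.5, 176.3, 176.1, 176.4, 176.6])

mean_level = np.mean(example_level)
median_level = np.median(example_level)
std_level = np.std(example_level)   # population standard deviation by default (ddof=0)

print(f"mean = {mean_level:.2f}, median = {median_level:.2f}, std = {std_level:.3f}")
```

Printing the values with a label and a fixed number of decimals makes it easier to compare the lakes side by side.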


In [3]:
# Put your code here to print the statistical properties of each lake's water levels.


___
Now, let's see what is in the files by plotting the second column versus the first column using `matplotlib`. Do this for all of the files. This is our first example of doing some (very simple!) data science - looking at some real data. Just so you know, the data came from [here](https://www.glerl.noaa.gov/data/wlevels/#observations); if you ever find data like this in the real world, you could build a notebook like this one to examine it. In fact, your projects at the end of the semester might be much larger versions of this. 

In [5]:
# plot the water levels here


Plots like this are not very useful. If you showed them to someone else, they would have no idea what is in them. In fact, if *you* looked at them next week, you wouldn't remember what is in them. Let's use a little more `matplotlib` to make them of professional quality. There are two things that every plot should have: a label on the $x$-axis and a label on the $y$-axis. There are also many other options:
* [grid](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.grid.html)
* [title](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.title.html)
* [markers](https://matplotlib.org/examples/lines_bars_and_markers/marker_reference.html)
* [legend](https://matplotlib.org/users/legend_guide.html)
* and many more we will see over time...
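To show how these options fit together, here is a minimal sketch on made-up data. The arrays, the label text, and the units ("m") are all assumptions for illustration; your own plots should use the real data and labels that describe it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data (NOT the real lake data).
years = np.arange(2000, 2010)
example_level = 176.0 + 0.3 * np.sin(years - 2000)

plt.figure()
plt.plot(years, example_level, marker="o", linestyle="-", label="Example lake")
plt.xlabel("Year")
plt.ylabel("Water level (m, assumed units)")
plt.title("Example water level over time")
plt.grid(True)
plt.legend()   # uses the label= text given in plt.plot
```

In a notebook with `%matplotlib inline`, the figure appears below the cell once it runs.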

Remake separate figures for each of the datasets you read in and include in the plots: $x$-axis labels, $y$-axis labels, grid lines, a legend, markers, and a title. Then, make all of them in the *same* plot using the same formatting techniques you used in the separate plots. We are not going to tell you how to do this directly! But we're here to help you figure it out. If you find yourself waiting for help from an instructor, you can also try using Google to answer your questions. Searching the internet for coding tips and tricks is a very common practice!

The Matplotlib developers also provide helpful resources: they have created a comprehensive gallery of just about any plot you can think of, each with an example *and the code that goes with it*. That gallery is [here](https://matplotlib.org/gallery.html) and you should be able to find many examples of how to make your plots look professional. (You just might want to bookmark that webpage.)

In [6]:
# Put your code here to make each plot separately. You might need to create multiple notebook cells or use "subplots"
# Make sure they are professionally constructed using all of the options above.


In [7]:
# plot the water levels here all in the same plot


**What observations about the data do you have?** **Do you see any overall trends? Put your answer here**:

< Answer here >

___
## Part 3: Correlations in the data.
___

In the plots you have made so far, you have plotted water levels versus time. This is fairly intuitive and corresponds to the way the data was given to us. Next, we are going to do something a little more abstract to seek correlations in the data, a standard goal in data science. As you have seen, there are a lot of fluctuations in the data. What do they tell us? For example, do the levels go up at certain times of year? In certain years that had more rain? Can we see evidence of global warming? While we won't answer these questions at this point, we can look for patterns across the lakes to see if the fluctuations in levels might correspond to trends. To do this, we will plot the level of one lake versus the level of another lake. Note that we somewhat lose the time information because we aren't using that array anymore.

Look at these plots. Think about what they are telling us.

![correlations](https://c1.staticflickr.com/5/4195/34878771446_3d1e5a1173_o.jpg)

Now, in the cell below, plot the level of one lake versus the level of another lake - do this for several combinations. Put them in separate cells if you need to - otherwise each will be in the same plot, which might be less useful. (If you're feeling comfortable using [subplots](https://matplotlib.org/examples/pylab_examples/subplots_demo.html) feel free to use those.)
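As one possible starting point, the sketch below plots one made-up array against another. The arrays `lake_a` and `lake_b` are randomly generated (they are not lake data), and the choice of a scatter plot with point markers is just one reasonable option, not the required one.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up, randomly generated "levels" for two hypothetical lakes.
rng = np.random.default_rng(0)
lake_a = 176.0 + rng.normal(0.0, 0.2, size=50)
lake_b = 183.0 + 0.8 * (lake_a - 176.0) + rng.normal(0.0, 0.05, size=50)

# Both arrays must have the same length so the values can be paired up.
plt.figure()
plt.scatter(lake_a, lake_b, marker=".", label="one point per measurement")
plt.xlabel("Lake A level (m, assumed units)")
plt.ylabel("Lake B level (m, assumed units)")
plt.title("Lake B level versus Lake A level (made-up data)")
plt.grid(True)
plt.legend()
```

Points that fall along a line suggest the two levels rise and fall together; a shapeless cloud suggests they do not.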

In [10]:
# add your plots here (with labels, titles, legend, grid)
# what line type should you use? what are the best markers (or what kind of plot?) to use?


In [9]:
# next lake here, and so on.....


**In this cell**, write your observations. What can you say about the lake levels?

I observed.....



< Answer here >

___
## Part 4: Saving Plots
___
Finally, you will need to use your plots for something. In your other classes and labs you often will need to make plots for your assignments and lab reports - now is the time to start using Python for that! Modify the code above to write the plot into a file in PNG format. Here are a couple of examples for how you can save files as a PNG file and as a PDF file:

`plt.savefig('foo.png')`

`plt.savefig('foo.pdf')`

**Put your name in the filename** so that we can keep track of your work.
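Here is a minimal sketch of saving a figure, using made-up data and a hypothetical file name (`water_levels_yourname.png`); replace both with your own. In a notebook, it is safest to call `plt.savefig` in the same cell that draws the plot, since the inline backend typically closes the figure when the cell finishes.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

# Made-up example data (NOT the real lake data).
years = np.arange(2000, 2010)
example_level = 176.0 + 0.3 * np.sin(years - 2000)

plt.figure()
plt.plot(years, example_level, marker="o")
plt.xlabel("Year")
plt.ylabel("Water level (m, assumed units)")

# Hypothetical file name; put your own name in it.
filename = "water_levels_yourname.png"
plt.savefig(filename)

# Confirm that the file was written and is not empty.
print(os.path.getsize(filename) > 0)
```

The file is written to the folder the notebook is running in unless you give a longer path.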

In [11]:
# Put your code here


---
### Assignment wrap-up

I hope you enjoyed all of these exercises! Make sure you try everything (it doesn't matter if you fail along the way!) and **take notes** on what you're confused about.

Be sure to **send me an email or text** listing *all* the things you understand and, most importantly, the things you don't understand! I'll make sure to address them and give them more emphasis in our in-class session.

-----
# Congratulations, you're done with your in-class assignment!

&#169; Copyright 2020,  Amani Ahnuar