# Laboratory Experiment 2: Measuring Markers of Polluted Air
## CHM410 / CHM1410

Welcome to the data analysis workshop for this experiment. We will be using this Jupyter notebook to gently dip our toes into the pool that is scientific computing.  

Here is an outline of what we'll be doing:  
1. Introduction to programming with Python and using this notebook
    1. [Basics](#basics)
    2. [Comments](#comments)
    3. [Variables](#variables)
    4. [Types](#types)
    5. [Lists](#lists)
    6. [Packages](#packages)
    7. [Functions and Methods](#methods)
    8. [Numpy](#nparrays)
2. [Loading your data](#loading)
3. [Making basic plots](#plotting)
4. [Maths and stats](#stats)
5. [Creating publication quality figures](#figures)
6. [Plotting on maps](#maps)
7. [Saving/exporting your data](#saving)
8. [Space for you to create your plots](#free_space)

The table of contents also provides links for ease of navigation.  
If you have prior experience with coding, Python, and/or Jupyter notebooks, feel free to take what you want from the following introduction and move on to the second section.  

----

<a id='basics'></a>

## Just the basics
Below this is a cell of code. Try running it by clicking inside the cell and then pressing __shift__ + __enter__ or clicking the __Run__ button on the toolbar at the top of the page. There is also a run button on the left of the cell that appears when your mouse hovers over the cell.

In [None]:
print("Hello world")

You should see the printed phrase appear below the cell. Ultimately, this is what we use computers to do: provide us with output based on the input we give them. To get the expected output for our input, we are using a programming language called Python, and it is in fact a language! It has rules of syntax, grammar, and usage just like any human language. We're going to go over some basics of this language, just enough to get you used to looking at the code and making some changes. You'll learn enough to be able to make some great figures by the end!  

---

Your computer will try to interpret everything you tell it. So in programming you need to use language with precision, or else your code will give you an error. Try running the cells below to look at some errors.

In [None]:
print(Hello world)

In [None]:
print "Hello world"

In [None]:
print("Hello world)

In [None]:
pirnt("Hello world")

As you can see, there's not much forgiveness in doing something as simple as printing out a statement. We're going to go through a few Python concepts so you can figure out how to make your code run without errors.  

It may be helpful going forward to click on "View" on the toolbar and select the option to "toggle line numbers". This will display line numbers on the left side of the code, which can be very useful.

<a id='comments'></a>

## Comments

Anyone who writes or uses code needs to know about comments. Comments are bits of text that programmers write into their code that your computer will ignore. Comments are incredibly important for understanding your own code and interpreting other people's code, and they are also useful for making your computer ignore some lines of code you don't want to use right now.  

Comments are denoted with the __\#__ symbol. Anything that appears in the line after the \# symbol will be ignored by your computer. Look at the examples below to see how it works.

In [None]:
# This is a comment
print("This is not a comment") # but this is a comment!

As mentioned above, comments are important for communication, but are also just useful to "turn off" certain lines of code. We sometimes call this "commenting out" these lines of code. Try commenting out some lines in the code below by adding in the \# symbol.

In [None]:
var = 5 # this is a variable, which you'll see in the next example
# if you comment out the line above, the rest of the code will give you an error.

var = var + 8

print(var)

var += 1
var -= 3

print(var)

var = var**2

print(var)

The rest of this notebook contains really helpful information in the comments. Feel free to add your own comments in wherever you'd like!

<a id='variables'></a>

## Declaring variables

A variable is a place that Python stores information in. Variables are useful not only to store information, but also to make shortcuts for yourself. Continuing with the example we've been using...

In [None]:
statement = "Hello world"

print(statement)

We have stored some information in the variable named "statement" and then used that to tell your computer to print out the output. We declare the variable following the rule:  

_variable name_ __=__ _information_  

The name always goes to the left of the equal sign. Once we've declared the variable, we can just use the variable name to refer to the information stored in it. We can store new information in the same variable, but this will destroy the old information.

In [None]:
var = 5
print(var)
var = 6
print(var)

We can also modify the information in the variable as we please. Python uses arithmetic operators similarly to how you might use them in an Excel formula. Check out the examples below. Feel free to test them out for yourself!

In [None]:
var0 = 5
var1 = 13

var0 = var0 + 2
var0 = var0 + var1
var0 = var0 - 5

print("var0 = ", var0) # print commands can use commas to combine different objects or variables

var1 = var1 / 2
var1 = var1 * 3
var1 = var1**2 # this is an exponent
print("var1 = ", var1)


#There are also some convenient ways to do arithmetic and overwrite a variable at the same time
var0 = 5
var0 += 6 # this will add 6 to the value of var0 and store it in var0
print(var0)

<a id='types'></a>

## Types of information

The concept of _type_ is important to understanding a lot of issues you may face when programming. A _type_ is a specific kind of information. Python will interpret types in different ways. We can think of this in terms of different kinds of data. Some data are integers, some data are text, some data are decimals, et cetera. Try running the code below to see these _types_.


In [None]:
# using the type() command will have the computer tell you what type of information is inside the parentheses

print(type(1)) # this 1 is an integer

print(type(1.0)) # this is called a float, which is basically a decimal number

print(type("1.0")) # this is called a string, which is interpreted as text
#strings are created by placing quote marks around the information
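
Types matter most when a value looks like one kind of data but is stored as another, for example a number that was read in as text. The cell below is a minimal sketch of converting between types with the built-in `float()`, `int()`, and `str()` functions. The variable names and values are made up for illustration.

```python
# Assumption: "reading" is a made-up value standing in for a number that was read in as text
reading = "412.5"            # a string, even though it looks like a number
print(type(reading))

value = float(reading)       # convert the string to a float
print(type(value))
print(value + 1)             # arithmetic works on the float

count = int("7")             # strings holding whole numbers can be converted to integers
print(count * 2)

label = str(3.14)            # numbers can be converted to strings, too
print("pi is roughly " + label)   # strings can be joined together with +
```

If you try `reading + 1` without converting first, Python will give you an error, because it does not know how to add text to a number.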

<a id='lists'></a>

## Lists
Next we'll look at some other very useful types that are a little more complicated.  
A _list_ is an array of data, in a certain order. The data in a list can be many different types.

In [None]:
numbers = [1, 2, 3] # a list is made by placing information inside square brackets. The data are separated with commas
print(numbers)
print(type(numbers))

elements = ["H", "He", "Li"] # the data in this array are all strings (text)
print(elements)

mixture = [1, "He", 3.14] # the data in this array are of various types

One of the important properties of lists is called _indexing_. Each entry in the list has a position. By convention, computer scientists count from zero. Your knowledge of this fact can be used to impress your computer scientist friends.  

By telling the computer to use a certain _index_, you can access specific entries in your list. The syntax for _indexing_ is done with square brackets:

_list name_\[__index__\]  

Check out the example below. Try changing the index to access different entries in your list.  
Try changing the index to a number larger than the number of entries in the list, remembering we start at zero!
Try changing the index to -1 and to other negative numbers.

In [None]:
elements = ["H", "He", "Li", "Be", "B"]

entry = elements[3] # the index appears inside the square brackets

print(entry)
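
If you'd like to compare your experiments with a worked version, the cell below sketches what happens with negative indices and with an index that is too large. The `try` and `except` lines are just one way to catch the error so the cell keeps running. You won't need them elsewhere in this notebook.

```python
elements = ["H", "He", "Li", "Be", "B"]

print(elements[0])    # first entry
print(elements[-1])   # negative indices count backwards from the end, so this is the last entry
print(elements[-2])   # second-to-last entry

print(len(elements))  # len() gives the number of entries; the largest valid index is len(elements) - 1

try:
    print(elements[5])   # there is no position 5 in a list of five entries
except IndexError as error:
    print("IndexError:", error)
```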

We can index across a list to get many parts of the list at once. This action is sometimes called a "slice". The syntax for a slice is:  

_list name_\[__start:end__\]

The value in the list at the __end__ position will not be included in the slice. Check out the example below to see how slicing works. Try changing the numbers in the slice.

In [None]:
numbers = [1, 2, 3, 4, 5]

entry = numbers[1:4] # this slice will go from position 1 up to, but not including, position 4

print(entry)

The : is a useful symbol for indicating ranges of values. You don't have to specify an end point or a beginning point, either.

In [None]:
print(numbers[:2]) # this slice will include the values at positions 0 and 1, but nothing at 2 or after
print(numbers[3:])
print(numbers[:]) # a : with no other numbers will reproduce the whole list

Understanding lists and indexes opens up a lot of possibilities. The example in the cell below shows how you can use the integers in one list ("numbers") as indices for another list ("elements").

In [None]:
numbers = [1, 2, 3, 4, 5]
elements = ["H", "He", "Li", "Be", "B"]

entry = elements[numbers[0]]

print(entry)

<a id='packages'></a>

## Packages

Now that we have an understanding of some of the basics, we can find out how to make the most of our coding experience. We could spend hours and hours manipulating the basic structures to give us our desired output, or we can rely on other smart people who have already done a lot of the work for us!  

Packages are bodies of code that we can use to make our work faster and more convenient. There are many, many packages in Python for all kinds of applications. We'll be looking at and using some common science packages.  

First let's learn how to use them! Ordinarily, you would first need to check to make sure a package is installed on your computer, but the University's Jupyter notebook service comes with many already installed. Thanks, IT department!  

We need to "import" the package before we can use any of the contents.

In [None]:
import math          # here we imported the "math" package
import numpy as np   # here we imported a package called "numpy" and gave it a shorthand name for convenience
from scipy import constants   # here we imported part of the "scipy" package called "constants"

In [None]:
# Package contents can be accessed using a . and the name of the particular function you'd like to use
print(math.log(4))

# Ok try it out! In the space below, type in "constants."
# and then press the tab key. This will show you all the contents of the package available!
# Choose some different ones by pressing enter or clicking on them and see if you can get them to work
# Make sure you add a print() statement around it to see the output





# just so you know, unfortunately the tab shortcut only works for some packages


# Packages open up the possibilities of powerful computations
sample_array = np.arange(0,100) # Here we have used the numpy package to produce an array of numbers from 0 to 99

print(sample_array)

print(np.sum(sample_array)) # numpy has many mathematical functions that make it easy to do operations on lists of numbers


<a id='methods'></a>
## Methods and functions and other package contents

In the above example of the math package, we used a . to access the contents of the package. Some of these contents we used as __functions__ or __methods__, meaning we used a set of parentheses after it to "call" the function or method. The items inside the parentheses are called __arguments__, and they are required to be used in a specified order. This order has been indicated in the notebook for you where appropriate. Arguments can be __positional__, meaning they occur in a certain sequence inside the parentheses. Arguments can also be __keywords__, meaning they use a specific name and an equal sign to designate themselves.  

Many of the package features we'll use today will be like these functions and methods. The packages also contain other objects. Look at the example below of a useful object in the numpy package:

In [None]:
print(np.pi)

This object is not a function or method, so we don't use parentheses. If you use a function without parentheses, Python gives you back the function itself instead of a result, and if you try to use parentheses on an object that is not a function, you'll encounter an error. Keep this in mind as you start writing your own code.
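
To make the difference between positional and keyword arguments concrete, here is a small sketch using two numpy functions, `np.linspace` and `np.round`. The numbers are arbitrary and chosen only for illustration.

```python
import numpy as np

# positional arguments: start, stop, and number of points, in that order
x_positional = np.linspace(0, 1, 5)
print(x_positional)

# the same call with the third argument given as a keyword
x_keyword = np.linspace(0, 1, num=5)
print(x_keyword)

# when every argument is named, the order no longer matters
x_named = np.linspace(num=5, stop=1, start=0)
print(x_named)

# a positional argument followed by a keyword argument
rounded = np.round(np.pi, decimals=2)
print(rounded)
```

All three `np.linspace` calls produce the same array. Positional arguments always have to come before keyword arguments inside the parentheses.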

<a id='nparrays'></a>

## Arrays using numpy

We've already seen the numpy package a little, but you'll need to get a little more familiar with it before we move on.  Numpy arrays are great, and they're a lot like the basic python lists we saw above.  

The difference is that numpy arrays allow us to use "array operations", where the basic list type did not. Array operations means that we can perform mathematical operations and other such transformations to the entirety of a data set in one line of code. If we used a list, we'd have to apply the operation to each item in the list explicitly. If you want to learn more about that type of programming, you should read about python ["for loops"](https://www.dataquest.io/blog/python-for-loop-tutorial/). For loops will not be necessary for work in this notebook.

In [None]:
# in the lines below, we create an array and then add 1 to all the elements in the array
a = np.array([1,2,3,4,5,6])
print(a)

a = a + 1

print("a + 1 =",a)

In [None]:
# if we try the same thing with a list, it will cause an error
a = list([1,2,3,4,5,6])
print(a)

a = a + 1
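
For comparison only, here is a sketch of what "applying the operation to each item explicitly" looks like with a plain list and a for loop, next to the one-line array version. The variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]

# with a list, the addition has to be applied to each entry in turn
values_plus_one = []
for v in values:
    values_plus_one.append(v + 1)   # add 1 to this entry and store it in the new list
print(values_plus_one)

# converting the list to an array allows the one-line array operation instead
array_plus_one = np.array(values) + 1
print(array_plus_one)
```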

Arrays can be one dimensional (like one column of data), or they can have multiple dimensions. Two dimensional arrays can be very useful in doing array operations.
One feature of numpy is being able to "reshape" an array. In this notebook, you may want to turn your 1-d array of data into a 2-d array so that you can do some operations on it.  

The next cell shows an example of reshaping an array and then doing an operation on the array. 

In [None]:
a = np.array([1,2,3,4,5,6])
print(a)
print("size =", a.shape)  # the .shape attribute will tell you the length of your array in each dimension

a = np.reshape(a,(3,2))
print(a)
print("reshaped size=", a.shape)

print("mean = ", a.mean(axis=1))  # this line demonstrates an operation on the reshaped array

The array above was 6 elements long, and then we changed it to a 3 by 2 array. Take note of how the output is shown with brackets. The mean on the last line was taken along the __axis__ we indicated, which tells the method whether to average each row or each column of the 2-d array. In this case, axis=1 gives the mean of each row. You can probably predict what will happen if we use axis=0:

In [None]:
print("mean = ", a.mean(axis=0))

There are many ways to write your code to do the same thing. Understanding array dimensions and axes can be pretty abstract, but you can always achieve the result you want. Have a look at the example below, where we reshape the array into a 2 by 3 array:

In [None]:
a = np.array([1,2,3,4,5,6])

a = np.reshape(a,(2,3))
print(a)
print("reshaped size=", a.shape)

print("mean = ", a.mean(axis=0))  # what output will this produce?


Maybe that was the result we wanted, but maybe we were trying to get the mean of every 3 values. This can either be changed in the reshaping, or we can do a transpose of the array:

In [None]:
print(a)

b = np.transpose(a)  # the transpose function is applied to a

print("transposed array= ", b)

print("mean = ", b.mean(axis=0))

That's just one small example of something you might encounter in doing your analysis, or in other scientific programming spaces.  

One more aspect of numpy arrays you may want to know is how to use the index. In a 1-d array it is the same as the list type from earlier. With a 2-d array, you can use a comma to indicate how you'd like to treat the columns and rows.

In [None]:
print(b)
print(b[:,0])  # This will print all the values in the first column
print(b[0,:])  # This will print all the values in the first row
print(b[1,1])  # second column, second row value

That's the end of this introduction to Python. In the next section you'll get to actually work with your data and make some visualizations, using what you learned so far. Much of the code you need has been written for you; you'll need to make some adjustments to names, indexes, etc. and commenting/uncommenting lines of code as you need them.

<a id='loading'></a>

# Loading your data

Before we go further, you'll need to upload your data onto the Jupyter hub. Go back to your browser tab where the folder __Lab2.git__ is open. In the upper right corner there is an __Upload__ button. Click it and use your system dialog to select the files you'd like to use. These should be the raw data provided in the .csv file format. Your data files should then appear in the list of files. You should also see a file called "sample.csv" in the same folder.


The package we'll use for loading your data is called _pandas_. [Consult wikipedia](https://en.wikipedia.org/wiki/Pandas_(software)) if you're wondering why it's called that. The files are in .csv format, which you can open in most analysis software, including Microsoft Excel.  

The next cell shows you an example of loading in a csv file.

In [None]:
# import pandas package
import pandas as pd

filename = "sample.csv"

data = pd.read_csv(filename) # this function will open the csv file and load it into your workspace

print(data.columns) # this prints the names of the columns in the sample data set.
print(data.head(5)) # this prints the first five rows of your data set.
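
Once a file is loaded, you can pull out individual columns by name. The cell below is a minimal sketch that builds a tiny csv in memory so it runs without any file. The column names and numbers are invented for illustration and are not the contents of sample.csv.

```python
import io
import pandas as pd

# Assumption: these column names and numbers are invented for illustration only
csv_text = """time_s,temperature_C,humidity_pct
0,21.5,40
10,21.7,41
20,22.0,43
"""

demo = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts a file name or a file-like object

print(demo.columns)                        # names of the columns
print(demo["temperature_C"])               # one column, selected by name

temps = demo["temperature_C"].to_numpy()   # convert a column to a numpy array
print(temps.mean())
```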


In practice, the files from the sensors used in the experiment have a slightly more complicated format than the sample above, but it is important to see how easy it is to read in a csv file.  So, your TA has prepared functions to load data from the Aeroqual sensors and the Airbeam2 sensor. These functions are stored in the Lab2_Functions python file. In the cell below, please change the variable called "filename" to the exact name of your uploaded Airbeam data file. Other than that, the cell is already set up to load your data. The function used to load your data from the airbeam is called OpenAirBeam2()  
  
You should try printing the first few lines of some of the data to make sure it worked.

In [None]:
import Lab2_Functions as lab2  # This import statement will allow you to use the functions your TA has written

filename = "your file name.csv"

# The next line illustrates the use of the lab2 library
pm_datetimes, pm_rel_time, pm_temp, pm1, pm10, pm2, pm_rh, pm_lats, pm_longs  = lab2.OpenAirBeam2(filename)
#all of the objects on the left of the = are arrays containing your data
# pm_datetimes contains formatted date information of the absolute time
# pm_rel_time contains float values starting with 0 seconds counting up the relative time



A note for these functions: they always supply you with the same data in order like that, but you can change the variable names to whatever you want. E.g. you can change "pm_rel_time" to "time", but you can't change that it comes second in the order.
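
The point about order can be seen with a small stand-in function. In this sketch, `load_demo` is a made-up function written only for this example. It is not part of Lab2_Functions, and the data are invented.

```python
# Assumption: load_demo is a made-up stand-in, not one of the functions in Lab2_Functions
def load_demo():
    times = [0, 10, 20]
    temperatures = [21.5, 21.7, 22.0]
    return times, temperatures      # values always come back in this fixed order

# the names on the left are yours to choose; the first name always receives the times
t, temp = load_demo()
print(t)
print(temp)

# swapping the names does not swap the data: "temp_wrong" now holds the times
temp_wrong, t_wrong = load_demo()
print(temp_wrong)
```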

Let's load the rest of your data into arrays. The next two cells are set up similarly to the one above, showing the function for loading your Aeroqual monitor data. You'll need to change the file name.

In [None]:
# Use this cell for loading CO2 monitor data

filename = "your file name.csv"

CO2_datetimes, CO2_rel_time, CO2_vmr = lab2.OpenAeroqual(filename)
# absolute time and relative time are again included, and the concentration is in the CO2_vmr object
# vmr stands for volume mixing ratio

In [None]:
# Use this cell for loading O3 monitor data

filename = "your file name.csv"

O3_datetimes, O3_rel_time, O3_vmr = lab2.OpenAeroqual(filename)


If you have more data you'd like to load, you can add more OpenAeroqual or OpenAirBeam2 lines above, or you can use the empty cell below. You don't need to use this cell if you don't have more data. You'll need to make sure you use the right filename, and you should change the variable name so you can differentiate between data sets.  

__PRO TIP:__  
If you want to work quickly without having to remember specific variable names, just type the first part of the name in, and press the tab key. A list of variables appears, and you can select the one you want with the arrow keys and then press enter. If there's only one variable it could possibly be, the variable name will autocomplete.

In [None]:
# empty cell for loading more data



<a id='plotting'></a>

# Plotting your data

If you've successfully loaded your data, next comes the fun part. We'll quickly try plotting some of these data. Exciting! The next cell of code is set up to plot your data in an interactive window. It should pop up on your screen, or you may have to click on the new window to see it. There are tools in the interactive window for you to zoom in and move about, as well as to adjust some of the appearance. If you find views of your data you think are interesting, you can save the current view with the save button.

In [None]:
# The matplotlib package is a ubiquitous Python plotting package. 
import matplotlib.pyplot as plt
# It also has helpful code for dealing with different kinds of data, such as dates
import matplotlib.dates as mdates


# These next lines determine how the plot will appear on your computer.
# Comment and uncomment the lines starting with "%" to try them out

# The next line is a bit of magic that makes the plot appear in a new interactive window
%matplotlib notebook 

# This next line will make the plot appear in the page, but you won't be able to interact with it.
#%matplotlib inline     


# These next lines are the actual plotting code.

#fig,ax = plt.subplots()                          # This creates a blank canvas to plot on

plt.plot(pm_rel_time, pm1, '.-')  # This "plot" function is what puts your data on the figure
plt.plot(pm_rel_time, pm2, '.-')
plt.plot(pm_rel_time, pm10, '.-')

#plt.show()                                       # This line displays the plot

The plt.plot function used above is very powerful and you'll be using it a bit going forward. The usage is:  

plt.plot(__x axis data__, __y axis data__, _symbol code_)  

The symbol code can change how the data appear on your figure. Try changing it to the following and replotting.  
'x'  
'--'  
'x--g'  

Using the plt.plot function on several lines allows you to add multiple data to the plot, and it automatically colours the points. You can also change the colour with the symbol code, and there are more details about colour in the next section.  

When looking at your data, you might want to know exactly which data point you're looking at. In this case you can add labels to your plot to help you identify the x value, y value, and x index of individual points. The Lab2_Functions package has a function that will add point labels for you. See the example below:

In [None]:
# interactive plot style
%matplotlib notebook 


x = np.linspace(0, 2.2, 100)  # this creates an array of 100 x values from 0 to 2.2

y = x**2 +3*x + 0.3  # creating a y series

fig,ax = plt.subplots()    # create a blank canvas

plt.plot(x, y, '.')    # plot x vs y in points

lab2.PointLabels(x, y, 5, plot_index=False)  
# the function requires arguments in order: x values, y values, number of points between labels (every nth label)
# the plot_index argument can be set to True or to False
# this changes the label from showing the x value to the x index instead

plt.show()

The next cell includes lines allowing you to adjust the figure size and appearance and assign axis labels.

In [None]:
fig,ax = plt.subplots(figsize=(4,4))
# creating a blank canvas with a size 4 inch by 4 inch

fig.set_size_inches(6.5,3)
# this will change the already created canvas size, as an alternative to the above

plt.rcParams.update({"font.size":12})
# change the default font size for the plot

fig.set_dpi(300)
# change the resolution of the figure in dots per inch

plt.ylabel("velocity, m s-1")
# add a label to the y axis

plt.xlabel("time, s")
# add a label to the x axis

plt.tight_layout()
# a function that automatically reduces empty space around the canvas



Histograms can be very useful for understanding distributions of data. They are easy to create, and you can adjust the number of histogram bins easily.

In [None]:
plt.hist(y, bins=20)

Do you want to use the absolute date time data on the x axis? This isn't always necessary since relative time is often easier to deal with, but if you need to indicate a specific time of day or for any other reason, you can follow the example:

In [None]:
import matplotlib.dates as mdates  # this package is a date plotting helper

fig, ax = plt.subplots()  # blank canvas

plt.plot(CO2_datetimes, CO2_vmr, '.')  #plot using absolute time


# The next three lines should all be used when plotting with absolute time
time_format = mdates.DateFormatter('%H:%M:%S')  # This sets the time to show hours:minutes:seconds
ax.xaxis.set_major_formatter(time_format)  # This applies the format to the x axis
fig.autofmt_xdate()  # This automatically rotates the labels to fit

plt.show()

Take some time to explore your data. You can create more plots in the empty cell below, or just edit the one above. Use point labels to record important ranges. Remember to save any figures you think are interesting. If you use the interactive plot (%matplotlib notebook), you can save with the button. If you use the inline plot (%matplotlib inline) you can right click and save the image. You can also use the savefig function to do this, as shown below:

In [None]:
plt.savefig("test_figure.png", format = 'png')  # you can save an image of your plot
# specifying the filename in the first argument

plt.savefig("test_figure.svg", format='svg', dpi=300)
# you can also save as a 'pdf', 'svg', 'eps', or 'jpeg' file
# you can specify the resolution to save at if you didn't change it before

___

# Take a break: what is your data telling you?
Now that you've had a detailed look at your data set, it may be useful to reflect on what is notable or surprising about your data and how that fits with your hypothesis. You are encouraged to take a moment to put pen to paper and write down your thoughts. Remember that in the end you are using your data to tell a story about your hypothesis, but your data can often have their own story to tell.

___

<a id='stats'></a>

# Calculations and Statistics

There are basically endless amounts of maths and statistics you can do with python, but let's cover the basics. These are the most important for summarizing your data and performing your analysis.  

This is not a statistics course, so you should only use statistical methods that you know inside and out since they almost always come with many assumptions.

We already got to look at a little bit of taking means when we went over [reshaping numpy arrays](#nparrays), but here's more you can easily do to enrich your analysis:

In [None]:
from scipy import stats  # this package contains additional stats functions
# remember you can type "stats." and then press tab to check them all out

data = np.array([1,2,3,4,1,6,2,3,2,1,4])

#there are two ways to take a mean
np.mean(data)  # one is a function in numpy
data.mean()    # one is a method of an array

#similarly, there are two ways to take the standard deviation
np.std(data)
data.std()

np.median(data)  # median is a very useful statistic

stats.mode(data)  # mode is an underrated statistic

stats.iqr(data, rng=(5,95))  # an inter-percentile range is a robust dispersion descriptor
#the rng argument is a pair of percentiles; this case shows the range from the 5th to the 95th percentile of the data
#if you leave rng out, you get the interquartile range (25th to 75th percentile)

# performing a linear regression is quick and easy, and provides you with several useful statistics!
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
# see if you can make a plot using the slope and intercept!
# you can use the r_value to look for correlations between your different measured variables

# remember if you want to see the output to use the print() function
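
If you'd like to check your regression against a worked case, here is a minimal sketch with made-up data. The values in `x_demo` and `y_demo` are invented for illustration. With your own data you would pass in two of your measured arrays instead.

```python
import numpy as np
from scipy import stats

# Assumption: made-up data that lie close to a straight line
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

slope, intercept, r_value, p_value, std_err = stats.linregress(x_demo, y_demo)
print("slope =", slope)
print("intercept =", intercept)
print("r squared =", r_value**2)

# the fitted line evaluated at each x value; this is what you could pass to plt.plot
y_fit = slope * x_demo + intercept
print(y_fit)
```

Plotting `y_fit` against `x_demo` on top of the original points is one way to see how well the line describes the data.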

There's so much you can do with stats if you're interested. Some additional references for functions: [descriptive stats](https://docs.scipy.org/doc/scipy/reference/stats.html#summary-statistics), [more tests](https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests), and [correlations](https://docs.scipy.org/doc/scipy/reference/stats.html#correlation-functions).  

Using means of your data can help to reduce the number of points in your figure, effectively summarizing important information. You'd need to decide how many points to average into a single point. Review the reshaping arrays section if you need help.
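
One possible way to do this, sketched with made-up numbers, is to reshape the data so that each row holds the points you want to average together, and then take the mean of each row. The reshape only works if the number of points divides evenly, so this sketch drops any leftover points at the end. That is a choice you may or may not want for your own data.

```python
import numpy as np

# Assumption: made-up data; 10 readings that we want to average in groups of 3
readings = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
n = 3                                     # number of points to average into one

n_groups = len(readings) // n             # whole groups that fit (// is division rounded down)
trimmed = readings[:n_groups * n]         # drop the leftover points at the end so the reshape works

grouped = np.reshape(trimmed, (n_groups, n))   # one group of n points per row
block_means = grouped.mean(axis=1)             # mean of each row
print(grouped)
print(block_means)
```

If you average your measurements this way, remember to average the matching time array in the same way so that the x and y data still have the same length.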

<a id='figures'></a>

# Making more effective plots

We already got a brief look at plotting in python, but now we're going to dive deeper. What and how you plot your data depends on your hypothesis and research questions. We'll go over various options and scenarios for plotting your data, and then you'll be able to create your own script for plotting your data the way you'd like it.


Adjusting the range of the x and y axes is important. If you're comparing different plots, it can be vital to a reader's interpretation that you use the same y scale on each plot.

In [None]:
plt.plot()

plt.xlim(0, 40)  # set the range of x value shown, in x units
plt.ylim(100, 1000)  # set the range of y values shown, in y units
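
When you want two plots side by side with the same y scale, one option is the `sharey` argument of `plt.subplots`. The sketch below uses made-up data, and the names `morning` and `evening` are only placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: two made-up series with different ranges
x_demo = np.arange(0, 10)
morning = x_demo * 2
evening = x_demo * 5

# one row, two columns of plots; sharey=True makes both panels use the same y range
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True)

ax_left.plot(x_demo, morning, '.-')
ax_left.set_title("morning")

ax_right.plot(x_demo, evening, '.-')
ax_right.set_title("evening")

ax_left.set_ylim(0, 50)   # setting the range on one panel sets it on both
```

Because the y axis is shared, the smaller series is not stretched to fill its panel, which makes the two panels directly comparable.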

### Colour
Colour is another very important aspect of plotting. As mentioned before, the plot function will automatically cycle through colours as you plot separate series of data. The colours used are from a colour-blind friendly set.  

You can also manually choose from these colours using the __color__ argument and values of "C0", "C1", "C2", and so on.

If you are interested in more ways to specify colour, check out [this page](https://matplotlib.org/3.1.0/tutorials/colors/colors.html) that lists them.

In [None]:
# try changing the colour argument 
plt.plot(O3_rel_time, O3_vmr, color="C4")

### Legends
Adding a legend to your plot when you're showing multiple series of data is easy. Add a __label__ argument to your plot function and then call plt.legend(). The legend will be placed automatically in the spot where it overlaps least with the data. The label value must be a text string.

In [None]:
plt.plot(O3_rel_time, O3_vmr, '.-', color="C0", label="O3")
plt.plot(CO2_rel_time, CO2_vmr, '.-', color="C1", label="CO2")
plt.legend()
plt.show()

### Vertical and horizontal lines
Adding in vertical lines can be helpful for indicating separations or events in your timeseries.  
Horizontal lines can be useful for indicating limits, minima, or threshold values.

In [None]:
x_location = 1
y_min = 0
y_max = 1

plt.vlines(x_location, y_min, y_max)  # this demonstrates the arguments used in the vlines function

plt.vlines(2, 0, 1, linestyles='dashed') # you can also just add the numbers in directly
# the linestyles argument can change it to "dashed" or "dotted", but the default is "solid"

In [None]:
y_location = 1
x_min = 0
x_max = 1
plt.hlines(y_location, x_min, x_max)

plt.hlines(3, 0, 1, linestyles='dotted')

### Shaded regions
You can also create shaded rectangular areas on your graphs to indicate events occurring over a certain duration.

In [None]:
x_min = 0
x_max = 2.5
plt.axvspan(x_min, x_max, alpha=0.1)

### Gridlines
If you want to indicate a regular time interval, you can consider using vertical gridlines; if you want to assist the viewer in distinguishing between y-values, you can consider using horizontal gridlines. If overused, gridlines can make your plot look cluttered, and sometimes the absence of gridlines can convey a professional quality.

In [None]:
plt.plot([1,2,3,4], '.')


plt.grid(axis="both")  # creates gridlines on both x and y axes

plt.grid(axis="x")  # creates gridlines on x axis

plt.grid(axis="y")  #  creates gridlines on y axis

### Filling between values
Filling in a shaded region between values can be useful to indicate a range of values or some statistical information like a 95% confidence interval. Here is a small example:

In [None]:
x = np.arange(0,10)  # create a range of x values
y1 = x**3  
y2 = x**2 - 10

plt.plot(x, y1)
plt.plot(x, y2)

plt.fill_between(x, y1, y2, color = 'C5', alpha=0.3)
# the alpha argument controls the transparency of the shaded region. Try changing it
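
To connect this to the confidence interval mentioned above, here is a minimal sketch using made-up repeated measurements. It assumes the measurement noise is roughly normal and uses 1.96 standard errors as an approximate 95% interval; for a small number of repeats, a t-value would give a slightly wider band.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data: 20 repeated measurements at each of 10 time points
rng = np.random.default_rng(0)
time = np.arange(10)
measurements = 5 + 0.5 * time + rng.normal(0, 1, size=(20, 10))

mean = measurements.mean(axis=0)                       # mean at each time point
sem = measurements.std(axis=0, ddof=1) / np.sqrt(20)   # standard error of the mean
half_width = 1.96 * sem                                # approximate 95% interval (normal assumption)

plt.plot(time, mean, color='C0')
plt.fill_between(time, mean - half_width, mean + half_width, color='C0', alpha=0.3)
```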

### Secondary y axis
Sometimes you need to show data in two different ranges or different units, in which case you can use a secondary axis. A plot with a secondary axis can be difficult to interpret though, so you should do this carefully. You'll need to manually specify the colour when you use the secondary axis, otherwise both series are drawn in the same default colour.

In [None]:
plt.plot([1,2,3,2],'.',color='C0')
plt.ylabel("Data 1")

plt.twinx()  # this function switches the axis
#everything you do will change the right side axis now, including how you add labels

plt.plot([100,200,350,200],'.', color='C1')
plt.ylabel("Data 2")

plt.show()
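
One possible way to make a secondary axis easier to read is to colour each axis label to match its data series. The sketch below uses Matplotlib's figure and axes objects instead of the plt functions; the names ax_left and ax_right are only illustrative choices.

```python
import matplotlib.pyplot as plt

fig, ax_left = plt.subplots()
ax_left.plot([1, 2, 3, 2], '.', color='C0')
ax_left.set_ylabel("Data 1", color='C0')       # label colour matches the series

ax_right = ax_left.twinx()                     # new axis sharing the same x axis
ax_right.plot([100, 200, 350, 200], '.', color='C1')
ax_right.set_ylabel("Data 2", color='C1')
```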

Adding a legend to a plot with a secondary axis is more complicated than before, because the two series belong to different axes. See below for an example:

In [None]:
lines1 = plt.plot([1,2,3,2],'.',color='C0', label="series 1") # we'll need the variable stored in lines1 for the legend
plt.ylabel("Data 1")

plt.twinx()  # this function switches the axis
#everything you do will change the right side axis now, including how you add labels

lines2 = plt.plot([100,200,350,200],'.', color='C1', label="series 2") # we need lines2, just like lines1
plt.ylabel("Data 2")

lines = lines1 + lines2  # combine the list of lines from the plot function
labels = [l.get_label() for l in lines] # generate labels
plt.legend(lines, labels) # create the legend with the lines and labels generated

plt.show()

### Using variable colour and size

Sometimes you can show an additional dependent variable in your plot using the colour or size of the markers. Much like the secondary axis, this can sometimes be difficult to interpret, so make sure you're not overloading the plot with information.

The __plt.scatter()__ function allows you to change the colour and size of the markers according to a variable. If you use colour, you'll need to include a colourbar to indicate what the colour map means. (The __colour map__ is the scale of colours used.) By default, Matplotlib uses a "perceptually uniform" colour map ('viridis'), in which equal steps in the data appear as equal steps in colour, so the scale is less likely to mislead the eye. You can read more about colour scales [here](https://colorcet.holoviz.org/). The names of Matplotlib's perceptually uniform colour maps are listed here:
1. 'viridis'
2. 'plasma'
3. 'inferno'
4. 'magma'
5. 'cividis'

In [None]:
x = [1,2,3,4,5]
y = [4,5,3,1,9]

#the colours should be the other set of y-values you want to show
colours = [1,2,6,5,3]

# the scatter function takes the x and y data along with the colours and the colourmap name
plt.scatter(x, y, c=colours, cmap='viridis')

# this will display the colour bar, with label next to it
plt.colorbar(label="Scale (units)")

Changing the size of the points in your scatter plot can also be a way of showing another value. This style can be difficult to interpret as well, especially quantitatively. You can set the area of the marker to be proportional to a value. If you set the size parameter to the square of the value, this effectively scales the radius of the points instead of the area. You can also set the size to an exponential for an even more dramatic visualization.

In [None]:
x = [1,2,3,4,5]
y = [6,5,3,2,8]

sizes = np.array([4, 6, 8, 2, 12])

plt.scatter(x, y, s=sizes)

In [None]:
plt.scatter(x, y, s=sizes**2)
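
The exponential option mentioned above is not shown in the cells, so here is one possible sketch. The base of 2 and the division by 2 are arbitrary choices made to keep the largest marker a reasonable size; you would need to tune them for your own data.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [6, 5, 3, 2, 8]
sizes = np.array([4, 6, 8, 2, 12])

# exponential scaling: every increase of 2 in the value doubles the marker area
exp_sizes = 2.0 ** (sizes / 2)

plt.scatter(x, y, s=exp_sizes)
```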

If you would like to use sizes more quantitatively, you can create a legend with points of different sizes corresponding to the values you'd like to represent.

In [None]:
# first we'll create the point markers that will go inside the legend
# this example has three points 
point0 = plt.scatter(0,0, s=2**2, color='C0')
point1 = plt.scatter(0,0, s=5**2, color='C0')
point2 = plt.scatter(0,0, s=10**2, color='C0')

plt.clf() # this command will clear the figure that was automatically created by the above lines

# the remaining lines will be where you'll create the actual plot

fig, ax = plt.subplots()          # creates a new figure
plt.scatter(x, y, s=sizes**2)     # the same kind of scatter plot as shown in the previous cell

points = [point0, point1, point2]       # makes a list of the point markers
labels = ['2 ppm', '5 ppm', '10 ppm']   # makes a list of labels for the legend
plt.legend(points,labels)               # displays the legend with the points and labels

### Log axes
Does your data span a wide range that is difficult to show on a linear axis? You can consider using log axes. But you must use this power responsibly, since log axes can be difficult to interpret.

In [None]:
# the semilogy function works just like the plot function, but automatically creates a log scale on the y axis
x=[1,2,3,4]
y=[4,40,400,500000]

plt.semilogy(x, y, '.')
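
If you have already made a plot with the regular plot function, a log scale can also be switched on afterwards. This is a small sketch of that alternative, using the same example values as above.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [4, 40, 400, 500000]

plt.plot(x, y, '.')
plt.yscale('log')   # switch the y axis of the current plot to a log scale
# plt.xscale('log') would do the same for the x axis
```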

### Box plots
Box plots are often used to summarize the statistical distribution of the data. They are quite easy to make:

In [None]:
box_data = [CO2_vmr, O3_vmr] # make a list of the data series you'd like in the plot

plt.boxplot(box_data, showmeans=True) # the showmeans argument can be True or False
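
By default the boxes are only numbered along the x axis. As a sketch of one way to name them, the example below uses made-up random data in place of your measured series and replaces the tick labels with plt.xticks().

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data standing in for two measured series
rng = np.random.default_rng(1)
series_a = rng.normal(400, 10, size=50)
series_b = rng.normal(30, 5, size=50)

plt.boxplot([series_a, series_b], showmeans=True)

# the boxes are drawn at x = 1, 2, ... so the tick labels can be replaced by name
plt.xticks([1, 2], ["Series A", "Series B"])
plt.ylabel("Value (units)")
```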


### Error Bars

The __plt.errorbar()__ function will allow you to add error bars to your data set. It will also just plot your data for you. If you want to just make error bars without points, change the _markersize_ argument to zero.

In [None]:
x_data = [1,2,3,4,5]
y_data = [5,7,9,2,1]

# these next lines will determine the magnitude of your error bars
x_errors = 0
y_errors = [1,4,2,1,0.2]


# the errorbar function has a lot of arguments to play around with.
# try changing ecolor and elinewidth and capsize
# the xerr and yerr arguments are set to accept the error variables from above
plt.errorbar(x_data, y_data, xerr = x_errors, yerr = y_errors, fmt='.', markersize = 8, ecolor='black', elinewidth = 1, capsize=2)
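
The error values above are typed in by hand. If you have repeated readings, one common choice is to plot the mean with the standard deviation as the error bar. The sketch below assumes three made-up readings at each x value; whether the standard deviation is the right measure of error depends on your experiment.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up data: three repeated readings at each of four x values
x_data = [1, 2, 3, 4]
replicates = np.array([[5.1, 4.8, 5.3],
                       [7.2, 6.9, 7.4],
                       [8.8, 9.3, 9.0],
                       [2.1, 1.8, 2.3]])

y_means = replicates.mean(axis=1)            # average of each row
y_errors = replicates.std(axis=1, ddof=1)    # sample standard deviation of each row

plt.errorbar(x_data, y_means, yerr=y_errors, fmt='.', markersize=8,
             ecolor='black', elinewidth=1, capsize=2)
```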

<a id='maps'></a>

# Map Plotting
If your data has a geospatial component to it, you might be interested in looking at your data on a map. __If your data does not depend on location, you get to skip this part.__ Your TA has set up the following cells for you to use and make your maps. This requires special topographical files called shapefiles. Your TA can provide these for you, and you can upload them to your lab2 folder for use in the cell below.

In [None]:
import geopandas as gpd    # this is the mapping package we'll use

# the next two lines will load the shapefiles you uploaded
toronto_map = gpd.read_file("./toronto-centreline-wgs84-latitude-longitude/CENTRELINE_WGS84.shp")
peel_map = gpd.read_file("./Street_Centre_Line-shp-Peel/StreetCentreLine.shp")

# the coordinate reference systems of these two maps are different,
# so these next two lines make them the same
crs = toronto_map.crs
peel_map = peel_map.to_crs(crs)

print("Maps loaded.")

The following cell has some code your TA has prepared to make your mapping experience a little smoother. Mostly you need to change the plt.scatter() arguments to whatever data you want to plot.

In [None]:
# the map plots only work in inline plotting mode
%matplotlib inline   


fig, ax = plt.subplots()    # creates a new figure
fig.set_dpi(300)            # sets a high resolution


# the next two lines plot the map data we loaded in the previous cell
# you can try changing the color parameter and the linewidth parameter.
# the zorder parameter forces these elements to be drawn beneath anything else you plot
toronto_map.plot(ax=ax, color='k', facecolor='w', linewidth=0.1, zorder=0)
peel_map.plot(ax=ax, color='k', facecolor='w', linewidth=0.1, zorder=0)


# make a scatter plot of your data below!
# you need to change the arguments to meet your plotting needs
# remember s will set the size and c will change the colours
# cmap can set the colour map
plt.scatter(pm_longs, pm_lats, s=5, c=pm2, marker='.')

# add a colour bar
# the shrink parameter can change the size relative to the figure
plt.colorbar(shrink = 0.8, label="PM (units)")


# the next few lines set the x and y limits
# this means it matches the latitude and longitude window of your data
xmin = np.min(pm_longs) - 0.001         # find the smallest longitude
xmax = np.max(pm_longs) + 0.001         # find the largest longitude
ymin = np.min(pm_lats) - 0.001          # find the smallest latitude
ymax = np.max(pm_lats) + 0.001          # find the largest latitude

plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)


# the next few lines make sure the longitude and latitude scales don't
# get distorted based on the size of your figure.
x_dimension = 6.5     # this x dimension is in inches, you can set this however you like

x_aspect_ratio = np.abs(xmin - xmax) / x_dimension     # find the ratio of longitude to inches
y_dimension = np.abs(ymin - ymax) / x_aspect_ratio     # use the ratio to get the y dimension inches

fig.set_size_inches(x_dimension, y_dimension)          # set the figure size


plt.show()     # show your map!

# Example of advanced plotting
Below is an example of plotting more complicated things. You are not expected to recreate this. Just inspiration!

In [None]:
import numpy as np

fig,ax = plt.subplots(figsize=(4,4))
plt.rcParams.update({"font.size":12})
fig.set_size_inches(6.5,3)
fig.set_dpi(300)

string = 'C2'
boxprops = {'color':string,'alpha':0.3,'facecolor':string}
whiskerprops = {'color':string,'alpha':0.7}
capprops = {'color':string,'alpha':0.7}
medianprops = {'color':string,'alpha':1.0}
meanprops = {'marker':'.','color':string,'alpha':1.0,'markersize':2}
flierprops = {'markeredgecolor':string,'alpha':0.5,'markersize':1}

#length = 300
#array = np.random.randn(length)
length = (pm10.shape[0] // 100) * 100   # largest multiple of 100 that fits in the data
array = np.asarray(pm10[0:length])      # trim the leftover points so the data divides evenly into bins

#time = np.linspace(0,120,length)


n_bins = int(length/100)
print(n_bins)
print(array.shape)
binned_time = np.asarray(pm_rel_time[0:length:100])   # time at the start of each bin

binned_data = np.transpose(np.reshape(array, (n_bins,int(length/n_bins))))

#binned_time = np.reshape(time, (n_bins,int(length/n_bins)))
#binned_time = np.mean(binned_time, axis=1)

print(binned_data.shape)
print(binned_time.shape)

plt.boxplot(binned_data, showmeans=True, patch_artist=True,\
           boxprops = boxprops, whiskerprops = whiskerprops, capprops = capprops,\
            medianprops = medianprops, meanprops = meanprops, flierprops = flierprops)

from matplotlib.ticker import FormatStrFormatter
plt.gca().xaxis.set_major_formatter(FormatStrFormatter('%1.f'))

plt.show()


<a id='saving'></a>

# Saving your data
The last important step is to export your data from this notebook to a csv file. You can save individual arrays like so:

In [None]:
save_filename = "sample0.csv"   # change this to a descriptive file name

np.savetxt(save_filename, CO2_vmr, delimiter=',')
#second argument should be the array you want to save
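
If you would like several arrays in one file, one possible approach is to stack them into columns before saving. In this sketch the arrays, the file name "sample_columns.csv", and the column titles are all made up for illustration, and it assumes the arrays have the same length.

```python
import numpy as np

# made-up arrays standing in for a time series and its measured values
rel_time = np.array([0.0, 1.0, 2.0])
vmr = np.array([410.2, 411.5, 409.8])

# stack the arrays side by side so each one becomes a column
table = np.column_stack((rel_time, vmr))

# header adds a title row; comments='' stops numpy from putting '# ' in front of it
np.savetxt("sample_columns.csv", table, delimiter=',', header="time,vmr", comments='')
```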

Your TA has also included a function to save all your airbeam data and aeroqual data together in Lab2 Functions. If you've imported multiple data sets, you'll need to use multiple function calls and use the appropriate variable names.

In [None]:
# change these to descriptive file names
filename_pm = "sample0.csv"
filename_CO2 = "sample1.csv"
filename_O3 = "sample2.csv"


#The following function will save your Airbeam2 data
lab2.SaveAirbeam2(filename_pm, pm_datetimes, pm_rel_time, pm1, pm2, pm10, pm_temp, pm_rh)


#The following function will save your Aeroqual monitor data
lab2.SaveAeroqual(filename_CO2, CO2_datetimes, CO2_rel_time, CO2_vmr)

lab2.SaveAeroqual(filename_O3, O3_datetimes, O3_rel_time, O3_vmr)

Once you've run the save functions, your data should appear in the Lab2 folder on syzygy. From here you should select them using the check boxes on the left side of the list and click the download button.

<a id='free_space'></a>

# Use the free cells below to make plots and do your analysis

Remember you can add additional cells with the + button on the toolbar.

# Conclusion: making your plots work for you

Now that you've made a customized plot, it may be helpful for the discussion in the synchronous session to begin reflecting on how your data presentation has changed from the initial plots to your latest figures. What changes did you make to emphasize the evidence for or against your hypothesis? What kind of story does your figure tell about the data? Again, you are encouraged to take a moment to write down your thoughts.