## CAOS Workshop Series: Introduction to Python Programming

### Chris McCray - 21 February 2018

Basic Python syntax (from https://www.stavros.io/tutorials/python/)

* Python has no mandatory statement termination characters (e.g. semicolons)
* Single-line comments start with a #
* Variable values are assigned using =
* Python automatically sets the data type (e.g. int, float, str)
* Basic data structures: lists, tuples, dictionaries (index of the first item is 0)

Python allows for a great deal of freedom, but does have a style guide (PEP 8) to make your code generally cleaner and more user-friendly: https://www.python.org/dev/peps/pep-0008/

Many good introductory Python resources can be found online (check them out for more detail than we can go through here), e.g., the official Python tutorial: https://docs.python.org/3/tutorial/

Python code can be run as scripts, in an IDE (e.g. Spyder), or in Jupyter notebooks like this one.

Python 2.7 vs. Python 3.6?

Jupyter Notebook:
* Allows code to be run in blocks/"cells"
* Allows for easy presentation of figures and code
* Can be hosted remotely (e.g. on a server, closer to your data)

### Variables, data types, and printing

Variable types are automatically assigned - no type declarations needed

Variable names
* Can start with a lowercase or uppercase letter or an underscore (but not a digit or any other symbol!)
* Can contain letters, digits, and underscores
* Words should be separated by underscores (the PEP 8 convention)
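
One way to check a name against these rules is the built-in `str.isidentifier()` method together with the standard `keyword` module. This is a minimal sketch; the candidate names are made up for illustration.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the naming rules
candidates = ['my_integer', 'Temperature2', '_private',
              '2nd_value', 'wind-speed', 'class']

valid_names = []
for name in candidates:
    # A usable name must be a valid identifier and must not be a reserved keyword
    is_valid = name.isidentifier() and not keyword.iskeyword(name)
    if is_valid:
        valid_names.append(name)
    print(name, is_valid)
```

Names such as `class` pass the identifier check but are reserved by the language, which is why the keyword test is included.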

In [69]:
#Integers
my_integer = 3
print(type(my_integer))

#Floats
my_float = 1.23
print(type(my_float))

#Strings
string_1 = 'This is a string.'
string_2 = "This is also a string!"
print(type(string_1))

<class 'int'>
<class 'float'>
<class 'str'>


In Python 2, we can print like below. Python 3 requires parentheses in the print call:

In [70]:
print my_integer

SyntaxError: Missing parentheses in call to 'print'. Did you mean print(my_integer)? (<ipython-input-70-212291a567bf>, line 1)

In [71]:
print(my_integer)
print('Test')
print('Test',my_integer)

3
Test
Test 3


In [72]:
print(string_1+my_integer)

TypeError: must be str, not int

In [73]:
print(string_1+' '+str(my_integer))

This is a string. 3


In [74]:
print(string_1, my_integer)

This is a string. 3
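
String formatting is another way to combine text and numbers without calling `str()` by hand. The sketch below redefines the variables used above so that it runs on its own; f-strings assume Python 3.6 or newer.

```python
my_integer = 3
my_float = 1.23
string_1 = 'This is a string.'

# str.format() fills each {} placeholder in order
line_1 = '{} {}'.format(string_1, my_integer)
print(line_1)

# f-strings (Python 3.6+) evaluate the expression inside the braces
line_2 = f'{string_1} {my_integer}'
print(line_2)

# A format specification after the colon controls rounding
line_3 = f'my_float to one decimal place: {my_float:.1f}'
print(line_3)
```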


### Lists and dictionaries

##### Lists

In [75]:
list_one = ['a','b','c','d','e','f']
print(list_one[0])
print(list_one[-1])

a
f


In [76]:
list_one.append('g')

In [77]:
list_one

['a', 'b', 'c', 'd', 'e', 'f', 'g']

In [79]:
#Cell 78 is not shown; the output below implies 7 was appended there, e.g. with list_one.append(7)
list_one

['a', 'b', 'c', 'd', 'e', 'f', 'g', 7]
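
Beyond single indices, lists support slicing with `start:stop:step`. The following is a short sketch using the original six-item list, redefined here so the example is self-contained.

```python
list_one = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = list_one[:3]    # items at indices 0, 1, 2 (the stop index is excluded)
last_two = list_one[-2:]      # the final two items
every_other = list_one[::2]   # every second item, starting at index 0

print(first_three)
print(last_two)
print(every_other)
```

Slicing returns a new list, so `list_one` itself is unchanged.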


##### Dictionaries

In [80]:
ages = {'Jim': 23, 
        'Sarah': 25,
        'Tom': 30}


print(ages['Jim'])

23
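
Dictionaries can also be modified and looped over. This sketch reuses the `ages` dictionary; the extra names `'Ana'` and `'Bob'` are made up for illustration.

```python
ages = {'Jim': 23, 'Sarah': 25, 'Tom': 30}

# Add a new key and update an existing one
ages['Ana'] = 41      # hypothetical entry, for illustration only
ages['Jim'] += 1

# .get() returns a default value instead of raising KeyError for a missing key
unknown_age = ages.get('Bob', 'unknown')
print(unknown_age)

# .items() yields (key, value) pairs
for name, age in ages.items():
    print(name, age)
```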


### Control Flow Tools/Loops

##### While loop

In [81]:
b = 10
while b <= 20:
    print(b)
    b += 1

10
11
12
13
14
15
16
17
18
19
20


##### If statements

In [82]:
x = int(input("Please enter an integer:"))
if x < 0:
    print(x,'is less than 0.')
elif x == 0:
    print(x,'is 0.')
else:
    print(x,'is greater than 0.')

Please enter an integer:20
20 is greater than 0.


##### For loop

In [83]:
words = ['cat', 'horse', 'chicken']
for w in words:
    print(w, len(w))

cat 3
horse 5
chicken 7


In [84]:
w

'chicken'
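
As the cell above shows, the loop variable keeps its last value after the loop ends. A few other common loop patterns, sketched with the same `words` list:

```python
words = ['cat', 'horse', 'chicken']

# range(n) yields the integers 0, 1, ..., n-1
squares = []
for i in range(4):
    squares.append(i**2)
print(squares)

# enumerate() pairs each item with its index
for i, w in enumerate(words):
    print(i, w)

# A list comprehension builds a list from a loop in a single line
lengths = [len(w) for w in words]
print(lengths)
```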

### Modules and functions

* Python source files (ending in .py) are known as **modules**
* Modules contain **functions**, which can be imported and called from outside the module
* Below is an example of what's inside the module _welcome.py_ (modified from https://developers.google.com/edu/python/introduction)
* Try running welcome.py by typing ```python welcome.py``` followed by your name on your command line

```python
#!/usr/bin/env python

# import modules used here -- sys is a very standard one
import sys

# Gather our code in a main() function
def main():
    print('Hello there', sys.argv[1])
    # Command line args are in sys.argv[1], sys.argv[2] ...
    # sys.argv[0] is the script name itself and can be ignored

def string_information(string):
    num_characters = len(string)
    num_no_whitespace = len(string.replace(" ", ""))
    # Passing several arguments to print() avoids a line continuation inside the string
    print('This string has', num_characters, 'characters including',
          'whitespace and', num_no_whitespace, 'without whitespace.')

# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == '__main__':
    main()

```

Functions can be imported into other modules:

In [88]:
from welcome import string_information
#An alias can be given with "as", e.g.: from welcome import string_information as str_info

In [None]:
#Incomplete cell: in Jupyter, pressing Tab after "from welcome import " lists the names that can be imported

In [86]:
test_string = 'This Is a Test'
string_information(test_string)

This string has 14 characters including whitespace and 11 without whitespace.
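
Functions are often more reusable when they return values instead of printing them. The sketch below is a hypothetical variant of `string_information` (the name `count_characters` is not part of welcome.py) that returns both counts as a tuple.

```python
def count_characters(string):
    """Return (total characters, characters excluding spaces)."""
    num_characters = len(string)
    num_no_whitespace = len(string.replace(' ', ''))
    return num_characters, num_no_whitespace

# The returned tuple can be unpacked into two variables
total, no_spaces = count_characters('This Is a Test')
print(total, no_spaces)
```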


### Basic Math

In [90]:
print( 3.1 + 3.6 )
print( 3.1/392   )
print( 3.1*3.2   )
print( 4**2  )


6.7
0.007908163265306122
9.920000000000002
16


In [91]:
6/5

1.2

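In [None]:
#Incomplete cell: after "import math", typing "math." and pressing Tab in Jupyter lists the module's contents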

In [93]:
del my_float

In [92]:
import math
from math import cos
print( cos(2*math.pi) )
print( math.sqrt(4) )

1.0
2.0


#### Warning: In Python 2, integer division is "floor division"
* 5/6 = 0
* 6/5 = 1
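
In Python 3, `/` always performs true division, and floor division is available explicitly through the `//` operator. A brief sketch:

```python
true_division = 6 / 5      # always returns a float in Python 3
floor_division = 6 // 5    # rounds down to the nearest integer
remainder = 6 % 5          # remainder after floor division
negative_floor = -6 // 5   # floor division rounds toward negative infinity

print(true_division, floor_division, remainder, negative_floor)
```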

#### NumPy
From http://www.numpy.org/
> NumPy is the fundamental package for scientific computing with Python. It contains among other things
> * a powerful N-dimensional array object
> * sophisticated (broadcasting) functions
> * tools for integrating C/C++ and Fortran code
> * useful linear algebra, Fourier transform, and random number capabilities

In [None]:
import numpy as np

In [94]:
a = np.array([[1,2,3,4],[5,6,7,8]])
print(a)

[[1 2 3 4]
 [5 6 7 8]]


In [95]:
#Subtract 3 from each array element
print( a-3 )
# Get the cosine of each array element
print( np.cos(a) )
# Calculate e^x for each array element x
print( np.exp(a) )
# Transpose a
print( a.T )

[[-2 -1  0  1]
 [ 2  3  4  5]]
[[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]
 [ 0.28366219  0.96017029  0.75390225 -0.14550003]]
[[  2.71828183e+00   7.38905610e+00   2.00855369e+01   5.45981500e+01]
 [  1.48413159e+02   4.03428793e+02   1.09663316e+03   2.98095799e+03]]
[[1 5]
 [2 6]
 [3 7]
 [4 8]]


NumPy is a very important part of scientific Python, and forms an integral part of nearly all other scientific packages. You should look through https://docs.scipy.org/doc/numpy-dev/user/quickstart.html to get some background in how it works.
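
A few more array operations that come up constantly are indexing, reductions along an axis, and boolean masks. This is a small sketch using the same array `a`, redefined here so it runs on its own.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(a.shape)                 # (rows, columns)
first_row = a[0, :]            # all columns of row 0
last_column = a[:, -1]         # last column of every row
column_sums = a.sum(axis=0)    # sum down each column
row_means = a.mean(axis=1)     # mean across each row
large_values = a[a > 5]        # a boolean mask selects the matching elements

print(first_row, last_column)
print(column_sums, row_means)
print(large_values)
```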

### Plotting with matplotlib

The most common plotting library for Python is currently [matplotlib](https://matplotlib.org/).

* There are many ways to create figures and subplots with matplotlib, but here we'll just go over a basic example using the pyplot interface.
* Pyplot provides MATLAB-like functionality

In [96]:
import matplotlib.pyplot as plt

In [97]:
'''
This allows for an interactive figure interface within the jupyter notebook. 
If you just want to show the figure without interactivity, use %matplotlib inline
'''
%matplotlib notebook

In [98]:
x = np.arange(0,100,0.01)
y1 = np.cos(x)
y2 = np.sin(x)

In [99]:
x

array([  0.00000000e+00,   1.00000000e-02,   2.00000000e-02, ...,
         9.99700000e+01,   9.99800000e+01,   9.99900000e+01])

In [105]:
plt.figure()
plt.plot(x,y1, label='cos(x)')
plt.figure()
plt.plot(x,y2, label='sin(x)')
plt.axhline(y=0, color='k')

<matplotlib.lines.Line2D at 0x7f12c007ff98>

In [101]:
#Zoom in on the plot
plt.xlim([0,10])

(0, 10)

In [102]:
#Add labels to the axes
plt.xlabel("x")
plt.ylabel("y")
#Add a title
plt.title(r"$\sin(x)$ and $\cos(x)$")
#Add a grid
plt.grid()

In [103]:
#Plot a basic legend
plt.legend()

<matplotlib.legend.Legend at 0x7f12c2062a58>

In [104]:
plt.close()
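
Besides the pyplot state-based calls used above, matplotlib offers an object-oriented interface in which plotting methods are called on an explicit Axes object. The following is a sketch of the same kind of plot, with both curves drawn on a single Axes; the variable names `fig` and `ax` are just common conventions.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10, 0.01)

# plt.subplots() returns a Figure and an Axes; plotting methods are called on the Axes
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, np.cos(x), label='cos(x)')
ax.plot(x, np.sin(x), label='sin(x)')
ax.axhline(y=0, color='k')    # horizontal reference line at y = 0

ax.set_xlim(0, 10)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title(r'$\sin(x)$ and $\cos(x)$')
ax.grid()
ax.legend()
```

Keeping a reference to `ax` makes it clear which plot each command modifies, which helps once a figure has several subplots.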

### Exercise: Working with real data (CSV format) in Pandas

* There are many ways to read CSV data. Python's standard library includes the "csv" module
* "Pandas" (https://pandas.pydata.org/) is one of the key packages in scientific Python and data science
* Pandas makes reading CSVs easy, handles missing data well, and allows for quick calculations and plotting


In [None]:
import pandas as pd

We'll read in a CSV file that contains daily weather data for each day in 2017 from Environment and Climate Change Canada for CYUL (Montreal-Trudeau Airport)

In [106]:
cyul_2017 = pd.read_csv('http://www.cdmccray.com/python_tutorial/eng-daily-01012017-12312017.csv')

In [112]:
cyul_2017[cyul_2017.Day==4]

Date,Year,Month,Day,Data Quality,Tmax,Max Temp Flag,Tmin,Min Temp Flag,Tmean,Mean Temp Flag,...,Snow,Total Snow Flag,Precip,Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Max_gust,Spd of Max Gust Flag
2017-01-04,2017,1,4,‡,2.5,,-5.6,,-1.6,,...,9.0,,8.0,,12.0,,24.0,,85,
2017-02-04,2017,2,4,‡,-3.4,,-13.2,,-8.3,,...,0.0,T,0.0,T,6.0,,27.0,,46,
2017-03-04,2017,3,4,‡,-12.3,,-17.6,,-15.0,,...,0.0,T,0.0,T,0.0,T,29.0,,48,
2017-04-04,2017,4,4,‡,5.6,,2.1,,3.9,,...,0.0,,36.2,,,,12.0,,48,
2017-05-04,2017,5,4,‡,16.2,,2.2,,9.2,,...,0.0,,0.2,,,,23.0,,35,
2017-06-04,2017,6,4,‡,22.3,,9.2,,15.8,,...,0.0,,0.0,,,,20.0,,33,
2017-07-04,2017,7,4,‡,26.4,,13.2,,19.8,,...,0.0,,0.0,,,,27.0,,30,
2017-08-04,2017,8,4,‡,29.0,,19.4,,24.2,,...,0.0,,34.0,,,,13.0,,78,
2017-09-04,2017,9,4,‡,23.8,,12.7,,18.3,,...,0.0,,2.4,,,,20.0,,48,
2017-10-04,2017,10,4,‡,25.8,,12.9,,19.4,,...,0.0,,6.8,,,,22.0,,63,


In [109]:
#Set the index of the Pandas DataFrame to Date
cyul_2017.set_index('Date', inplace=True)

In [110]:
cyul_2017

Date,Year,Month,Day,Data Quality,Tmax,Max Temp Flag,Tmin,Min Temp Flag,Tmean,Mean Temp Flag,...,Snow,Total Snow Flag,Precip,Total Precip Flag,Snow on Grnd (cm),Snow on Grnd Flag,Dir of Max Gust (10s deg),Dir of Max Gust Flag,Max_gust,Spd of Max Gust Flag
2017-01-01,2017,1,1,‡,-2.4,,-8.7,,-5.6,,...,2.8,,2.4,,12.0,,28.0,,54,
2017-01-02,2017,1,2,‡,1.8,,-10.2,,-4.2,,...,0.0,,0.0,,6.0,,,,<31,
2017-01-03,2017,1,3,‡,0.5,,-5.8,,-2.7,,...,3.6,,17.4,,6.0,,5.0,,43,
2017-01-04,2017,1,4,‡,2.5,,-5.6,,-1.6,,...,9.0,,8.0,,12.0,,24.0,,85,
2017-01-05,2017,1,5,‡,-5.3,,-10.7,,-8.0,,...,0.0,T,0.0,T,13.0,,24.0,,76,
2017-01-06,2017,1,6,‡,-7.7,,-16.4,,-12.1,,...,0.0,T,0.0,T,13.0,,10.0,,41,
2017-01-07,2017,1,7,‡,-9.6,,-18.9,,-14.3,,...,0.0,,0.0,,13.0,,,,<31,
2017-01-08,2017,1,8,‡,-10.9,,-20.3,,-15.6,,...,0.0,,0.0,,12.0,,30.0,,37,
2017-01-09,2017,1,9,‡,-7.0,,-21.9,,-14.5,,...,0.4,,0.4,,11.0,,15.0,,41,
2017-01-10,2017,1,10,‡,1.3,,-8.3,,-3.5,,...,1.0,,2.2,,11.0,,15.0,,63,


In [113]:
cyul_2017.loc['2017-05-29']

Year                         2017
Month                           5
Day                            29
Data Quality                    ‡
Tmax                         19.6
Max Temp Flag                 NaN
Tmin                         12.7
Min Temp Flag                 NaN
Tmean                        16.2
Mean Temp Flag                NaN
Heat Deg Days (°C)            1.8
Heat Deg Days Flag            NaN
Cool Deg Days (°C)              0
Cool Deg Days Flag            NaN
Rain                            8
Total Rain Flag               NaN
Snow                            0
Total Snow Flag               NaN
Precip                          8
Total Precip Flag             NaN
Snow on Grnd (cm)             NaN
Snow on Grnd Flag             NaN
Dir of Max Gust (10s deg)      14
Dir of Max Gust Flag          NaN
Max_gust                       52
Spd of Max Gust Flag          NaN
Name: 2017-05-29, dtype: object

Let's see what the warmest and coldest temperatures of 2017 were:

In [114]:
cyul_2017['Tmax'].nlargest(5)

Date
2017-06-18    32.1
2017-09-25    31.5
2017-06-12    31.4
2017-09-27    31.2
2017-09-24    30.6
Name: Tmax, dtype: float64

In [115]:
cyul_2017['Tmin'].nsmallest(5)

Date
2017-12-28   -26.6
2017-12-30   -24.9
2017-12-29   -24.8
2017-12-27   -24.5
2017-12-31   -23.8
Name: Tmin, dtype: float64

#### Now, let's try plotting some of this data

First, we'll grab the individual columns we want for our x and y data

In [116]:
max_temps = cyul_2017['Tmax']

In [119]:
max_temps.index

Index(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
       '2017-01-06', '2017-01-07', '2017-01-08', '2017-01-09', '2017-01-10',
       ...
       '2017-12-22', '2017-12-23', '2017-12-24', '2017-12-25', '2017-12-26',
       '2017-12-27', '2017-12-28', '2017-12-29', '2017-12-30', '2017-12-31'],
      dtype='object', name='Date', length=365)
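
The index above has `dtype='object'`, meaning the dates are stored as plain strings. Converting them to timestamps with `pd.to_datetime` gives a `DatetimeIndex`, which understands dates. The sketch below uses a small made-up table rather than the real CSV; `pd.read_csv` also accepts `parse_dates` and `index_col` arguments that can do this conversion at load time.

```python
import pandas as pd

# Small made-up table standing in for the real CSV (values are illustrative only)
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-02', '2017-02-01'],
                   'Tmax': [-2.0, 1.5, -3.0]})
df = df.set_index('Date')

# Convert the string labels to timestamps
df.index = pd.to_datetime(df.index)
print(df.index)

# A DatetimeIndex exposes date components and supports partial-string selection
months = df.index.month
january = df.loc['2017-01']
print(january)
```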

In [120]:
plt.figure(figsize=[7,4])
plt.plot(max_temps)

[<matplotlib.lines.Line2D at 0x7f12c2062978>]

Pandas has many convenience functions that allow us to quickly plot our data in a much prettier way!

In [121]:
plt.close()
plt.figure()
cyul_2017['Tmax'].plot(color='red', figsize=[8,5])
cyul_2017['Tmin'].plot(color='blue')

<matplotlib.axes._subplots.AxesSubplot at 0x7f12bb5ef668>

#### Now, add a title, axis labels, and legend

In [122]:
#Let's add a 30-day rolling average to the plot so that we smooth out the variations
cyul_2017['Tmean'].rolling(30,min_periods=2,center=True).mean().plot(c='k', label='30-day avg. Tmean')

plt.grid()

In [123]:
plt.legend()

<matplotlib.legend.Legend at 0x7f12bb919a58>

#### With pandas, you can do quick statistics/calculations as well. This is what makes it really powerful for data analysis

In [125]:
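#numeric_only=True skips the text columns (e.g. the flag columns); recent pandas versions raise an error without it
cyul_2017.median(numeric_only=True)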

Year                         2017.00
Month                           7.00
Day                            16.00
Tmax                           14.15
Tmin                            5.00
Tmean                           9.30
Heat Deg Days (°C)              8.70
Cool Deg Days (°C)              0.00
Rain                            0.00
Snow                            0.00
Precip                          0.00
Snow on Grnd (cm)               5.00
Dir of Max Gust (10s deg)      22.00
dtype: float64

In [126]:
cyul_2017[cyul_2017.Month==12].describe()

,Year,Month,Day,Tmax,Tmin,Tmean,Heat Deg Days (°C),Cool Deg Days (°C),Rain,Snow,Precip,Snow on Grnd (cm),Dir of Max Gust (10s deg)
count,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,24.0,26.0
mean,2017.0,12.0,16.0,-5.612903,-12.064516,-8.858065,26.858065,0.0,0.409677,1.980645,2.409677,7.083333,20.076923
std,0.0,0.0,9.092121,8.416006,8.786526,8.493675,8.493675,0.0,1.808932,4.290876,4.69243,5.012304,9.142967
min,2017.0,12.0,1.0,-20.5,-26.6,-23.6,10.2,0.0,0.0,0.0,0.0,0.0,2.0
25%,2017.0,12.0,8.5,-11.05,-18.95,-14.1,19.5,0.0,0.0,0.0,0.0,4.0,16.25
50%,2017.0,12.0,16.0,-6.4,-12.2,-8.6,26.6,0.0,0.0,0.0,0.2,5.5,24.0
75%,2017.0,12.0,23.5,1.2,-3.75,-1.5,32.1,0.0,0.0,1.7,1.6,10.0,27.0
max,2017.0,12.0,31.0,10.0,5.6,7.8,41.6,0.0,9.8,18.6,17.2,19.0,29.0
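
Filtering and grouping combine naturally for this kind of summary. The sketch below uses a tiny made-up DataFrame (not the real CYUL data) to show a per-month mean with `groupby` and a filter that combines two conditions.

```python
import pandas as pd

# Made-up values standing in for the real data (illustrative only)
df = pd.DataFrame({'Month': [1, 1, 2, 2, 2],
                   'Tmax': [-2.0, 0.0, 1.0, 3.0, 5.0],
                   'Precip': [2.4, 0.0, 0.0, 1.2, 0.0]})

# Mean of each column within each month
monthly_mean = df.groupby('Month').mean()
print(monthly_mean)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
wet_feb = df[(df.Month == 2) & (df.Precip > 0)]
print(wet_feb)
```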


### Try making a plot of one of the other fields in cyul_2017

In [None]:
plt.close()
#######

## Other useful AOS-related Python packages

* Xarray - reading, analyzing, editing, and creating NetCDF files

* MetPy - https://unidata.github.io/MetPy/latest/
    * Meteorological functions and calculations ($\theta_e$, RH, dew point, etc.)
    * Plotting Skew-T diagrams, station plots, radar data, etc.

* Cartopy - http://scitools.org.uk/cartopy/
    * Plotting data on maps
    * You may also come across matplotlib's Basemap toolkit; it is no longer being developed, and Cartopy is currently its successor