# Python Tutorial 01: Introduction to Python

This notebook provides you with an overview of the basic capabilities of Python and is the first of a series of tutorials. If you spot any mistakes or issues, please report them to christoph.renkl@dal.ca

## Simple Operations

The hash symbol `#` starts a comment: Python ignores everything to the right of it on that line. Use comments to leave yourself notes in your code.

You can use Python as a calculator:

### Basic operators:

In [None]:
# addition
2 + 2

In [None]:
# subtraction
5 - 3

In [None]:
# multiplication
3 * 4

In [None]:
# division
7 / 2

In [None]:
# division (floor)
7 // 2

In [None]:
# modulo
7 % 2

In [None]:
# exponentiation
2 ** 8
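Floor division `//` and modulo `%` belong together: `//` gives the whole-number quotient and `%` the remainder. The short sketch below checks this relationship and uses the built-in `divmod()` function, which returns both results at once. Note that `//` rounds down, which matters for negative numbers:

```python
# Floor division and modulo always satisfy:
# dividend == (dividend // divisor) * divisor + (dividend % divisor)
dividend, divisor = 7, 2
quotient = dividend // divisor   # 3
remainder = dividend % divisor   # 1
print(quotient * divisor + remainder == dividend)  # True

# divmod() returns both results at once as a pair
print(divmod(dividend, divisor))  # (3, 1)

# Floor division rounds down (towards negative infinity), not towards zero
print(-7 // 2)  # -4
print(-7 % 2)   # 1
```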

### Variables

Use the equals sign `=` to assign a value to a variable:

In [None]:
x = 12

Let's do some math with the variable:

In [None]:
15 - x # = 15 - 12

You can store the result of that calculation in a new variable:

In [None]:
y = 15 - x # The variable y now has the value 3 and can be used for further calculations

Multiply the variables `x` and `y`:

In [None]:
z = x * y  # = 12 * 3

# If you want to see the result of this calculation, use the print() function
print(z)

We will learn more about functions and how to use them soon.
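Note that a variable stores a value, not a formula: `y` was computed from `x` once, so changing `x` afterwards does not update `y`. A short sketch:

```python
x = 12
y = 15 - x   # y is computed once from the current value of x
x = 20       # re-assigning x ...
print(x, y)  # ... leaves y unchanged: 20 3

# A variable can also be updated using its own current value
x = x + 1
print(x)     # 21
```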

## Data Types

Variables don't have to be numbers:

In [None]:
txt = "Hello World" # the variable `txt` is of data type string (str)

Strings are defined by single (' ') or double (" ") quotes.
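Both kinds of quotes produce the same string. Having both is handy when the text itself contains a quote character, as this sketch shows:

```python
s1 = 'Hello World'   # single quotes
s2 = "Hello World"   # double quotes
print(s1 == s2)      # True: both define the same string

# Enclose text that contains one kind of quote with the other kind
s3 = "It's a string"
s4 = 'She said "hi"'
print(s3)
print(s4)
```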

There are many more data types. Here are a few common ones:

In [None]:
a = 5       # integer (int), i.e., a whole number without decimals
b = 3.14159 # floating-point number (float), i.e., a number with decimals
c = "cat"   # string (str)
d = True    # boolean (bool), i.e., True or False

You can check the data type of any variable or object using the function `type()`:

In [None]:
type(c)
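Applying `type()` to each of the variables above (redefined here so the snippet runs on its own) shows their data types:

```python
a = 5
b = 3.14159
c = "cat"
d = True

print(type(a))  # <class 'int'>
print(type(b))  # <class 'float'>
print(type(c))  # <class 'str'>
print(type(d))  # <class 'bool'>
```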

Each of the variables above holds a single value. Values can also be collected and organized in container data types such as lists, tuples, and dictionaries:

In [None]:
# List - sequence of multiple values
lst1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
type(lst1)

Lists don't have to contain only numbers; you can even mix data types:

In [None]:
lst2 = ["cow", "pig", "horse", "tugboat", "pumpkin", "hamster", "onion", "bean"]
lst3 = [1, 2.3, c, False]

You can add (concatenate) two lists:

In [None]:
lst1 + lst2

Multiplying a list by an integer creates copies of the list and concatenates them:

In [None]:
lst1 * 3

Lists are what's called "mutable", which means that you can add, remove, or change their elements. In the next tutorial, we will learn how to access and modify individual items in a list.
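As a preview, the sketch below uses the list methods `append()` and `remove()` to change a list in place. In contrast, `+` creates a new list and leaves the original untouched:

```python
animals = ["cow", "pig", "horse"]

animals.append("hamster")  # add an element to the end (modifies the list in place)
animals.remove("pig")      # remove the first element equal to "pig"
print(animals)             # ['cow', 'horse', 'hamster']

# Concatenation with + returns a NEW list; the original is unchanged
more = animals + ["onion"]
print(animals)             # ['cow', 'horse', 'hamster']
print(more)                # ['cow', 'horse', 'hamster', 'onion']
print(len(more))           # 4
```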

In [None]:
# Tuple - a "read-only" (immutable) list; tuples become important when returning values from functions
tpl1 = (1, 2, 3)
tpl2 = ("Halifax", "Montreal", "Toronto", "Vancouver")
type(tpl1)
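Unlike a list, a tuple cannot be changed after it has been created. Trying to modify an element raises a `TypeError`, as this sketch shows (it previews the square-bracket indexing covered in the next tutorial; the `try`/`except` block only catches the error so the snippet keeps running):

```python
tpl1 = (1, 2, 3)
lst = [1, 2, 3]

lst[0] = 100        # fine: lists are mutable
print(lst)          # [100, 2, 3]

try:
    tpl1[0] = 100   # not allowed: tuples are immutable
except TypeError as err:
    print("TypeError:", err)

print(tpl1)         # (1, 2, 3), unchanged
```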

In [None]:
# Dictionary - pairs of keys and values
dct = {"Apple"  : "Fruit",
       "Carrot" : "Vegetable",
       "Pear"   : "Fruit",
       "Peach"  : "Fruit",
       "Potato" : "Vegetable",
       "Banana" : "Fruit"}
dct

In [None]:
print(dct["Carrot"]) # print value of specific key
print(dct["Banana"])
print(dct.keys())   # print all keys in dictionary

The values of dictionaries can be any data type.
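For example, a value can itself be a list. The sketch below uses a hypothetical `pantry` dictionary with made-up entries; it also shows how to add a new key-value pair and how `get()` returns a default value when a key is missing:

```python
# Hypothetical example: dictionary values of different data types
pantry = {"fruits": ["Apple", "Pear", "Banana"],  # list
          "count": 3,                              # int
          "organic": False}                        # bool

print(type(pantry["fruits"]))   # <class 'list'>

pantry["vegetables"] = ["Carrot", "Potato"]  # add a new key-value pair

# get() returns a default value instead of raising a KeyError for missing keys
print(pantry.get("spices", "not found"))  # not found
print(len(pantry))                        # 4
```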

## Packages

Python relies heavily on the concept of "modular programming", which is just a fancy term for dividing the entire code base into smaller chunks called modules (files of Python code) and packages (collections of modules). They are essentially collections of functions, and later in the tutorials we will learn how to write our own modules. An advantage of modular programming is that it makes code easier to maintain and, more importantly, to reuse.

Python itself only comes with a limited set of modules (the standard library). However, because Python is open source, many people have dedicated their (spare) time to developing modules and packages with sophisticated functionality and have made them available to all Python users.

If you installed Python through the Anaconda distribution, many packages are already pre-installed, but you still have to import (load) them in every script or notebook in which you want to use them.

There are different ways of importing a package. Typically this is done at the beginning of each notebook or script:

In [None]:
# Load the full `os` package, which is helpful for path and filename operations.
import os

# Load the full `pandas` package, which is an excellent tool for time series analysis.
# We assign an abbreviation to it, which makes it easier to use its functions:
import pandas as pd

# Only load `Path` from the `pathlib` package (part of Python's standard library)
from pathlib import Path
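The same three import styles work for any package. Here is a sketch using the standard-library `math` module, chosen only because it needs no installation:

```python
import math                 # full module: functions are accessed as math.<name>
import math as m            # full module under an abbreviation
from math import sqrt       # only the sqrt function, usable without a prefix

print(math.sqrt(16))  # 4.0
print(m.sqrt(16))     # 4.0
print(sqrt(16))       # 4.0
```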

Sometimes you will see the following import statement:

```
from pandas import *
```

which means that you import all (public) names from a package. However, this syntax is **not recommended**, because different packages may define functions that share the same name but have a different functionality or behavior, and a later import silently replaces the earlier one.
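A concrete illustration with two standard-library modules: both `math` and `cmath` (math for complex numbers) define a function called `sqrt`. After two star imports, the name `sqrt` refers to whichever module was imported last, without any warning:

```python
from math import *    # imports math.sqrt (among many other names)
from cmath import *   # cmath.sqrt silently replaces math.sqrt

print(sqrt(4))        # (2+0j): a complex number, because cmath.sqrt won

# Explicit imports avoid the ambiguity
import math
import cmath
print(math.sqrt(4))   # 2.0
print(cmath.sqrt(-1)) # 1j
```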

Missing packages can be installed using the conda package manager. The exact procedure to install packages depends on your operating system. We will cover installation in the next tutorials.

## Reading Data

We now want to read a comma-separated values (CSV) file that contains a subset of the data collected through the Bedford Basin Monitoring Program. First, we specify the path to the directory where the file is stored using `Path()` from the `pathlib` package, which makes it very convenient to work with paths. For example, we can extend a path using a forward slash `/` (see below).

Before you execute the next cell, change the path in the variable `datadir` to reflect your file structure.

In [None]:
# path
datadir = Path("/home/chrenkl/Projects/DISP/python_tutorial/data/raw")

# full file name including path (append the file name to the path of `datadir` using "/")
fname = datadir / "D18667042_subset.csv"
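Besides joining paths with `/`, a `Path` object gives easy access to the individual parts of a path. The sketch below uses a hypothetical relative directory and new variable names (`example_dir`, `example_file`) so it does not overwrite `datadir` and `fname`:

```python
from pathlib import Path

# Hypothetical relative directory; replace it with your own data directory
example_dir = Path("python_tutorial") / "data" / "raw"
example_file = example_dir / "D18667042_subset.csv"

print(example_file.name)         # D18667042_subset.csv
print(example_file.stem)         # D18667042_subset
print(example_file.suffix)       # .csv
print(example_file.parent.name)  # raw
print(example_file.exists())     # True only if the file exists relative to the working directory
```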

We imported the `pandas` package above as `pd`. To use its `read_csv()` function, we type the package abbreviation and the function name separated by a period, i.e., `pd.read_csv()`. Now it is obvious why we assigned a short abbreviation to it.

In [None]:
# read the file: 
ctd = pd.read_csv(fname)

The data from the file `D18667042_subset.csv`, which is stored in the `data/raw` subdirectory, are now assigned to the variable `ctd`. This variable is a `pandas.DataFrame`, which you can think of as a table with column headers. Let's have a look at the structure of the data by printing the first rows (five by default):

In [None]:
ctd.head()

The bold numbers at the beginning of each row form the index, and each column has a name that we can use to access its data.

In [None]:
# print one column:
ctd["pressure"]

In [None]:
# Check the data type of the temperature column
type(ctd["temperature"])

A `pandas.DataFrame` typically consists of one or more `pandas.Series` (columns) that share the same index. As we will see in the upcoming tutorials, this index doesn't have to consist of integers. You can also print summary statistics of each numeric column (count, mean, standard deviation, minimum, quartiles, and maximum) that can help you explore the dataset.

In [None]:
ctd.describe()
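To see how a `DataFrame` is built from columns that share one index, here is a sketch that constructs a tiny `DataFrame` by hand. The values are made up for illustration and are not from the Bedford Basin data:

```python
import pandas as pd

# Made-up values for illustration only (NOT the Bedford Basin data)
toy = pd.DataFrame({"pressure": [1.0, 5.0, 10.0, 20.0],
                    "temperature": [15.2, 12.1, 8.4, 5.3]})

# Each column is a pandas.Series
print(type(toy["temperature"]))

# All columns share the DataFrame's index (0, 1, 2, 3 by default)
print(toy.index.equals(toy["temperature"].index))  # True

# The index does not have to consist of integers
toy.index = ["a", "b", "c", "d"]
print(toy)

# Summary statistics of each numeric column
print(toy.describe())
```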

## Plotting

Python itself has no built-in plotting functions, but several packages help you create figures. The most common one is `matplotlib`, which is typically imported like this:

In [None]:
import matplotlib.pyplot as plt

Plot the temperature column of our `DataFrame` in the variable `ctd`:

In [None]:
plt.plot(ctd["temperature"])

The figure shows the index on the x-axis and temperature on the y-axis. We know that these are temperatures at different depths in the water column, and the `DataFrame` also contains the pressure of each measurement (in the ocean, pressure is closely related to depth). It would therefore be more intuitive to have temperature on the x-axis and pressure on the y-axis. We can achieve that by providing more arguments to the `plot()` function. Note that columns whose names are valid Python names can also be accessed as attributes, i.e., `ctd.temperature` is equivalent to `ctd["temperature"]`:

In [None]:
plt.plot(ctd.temperature, ctd.pressure)

By default, the origin is in the lower left corner, which makes sense for most data. In oceanography, however, depth is typically the (positive) distance from the sea surface, so we want to invert the y-axis. We will also pass more arguments to the plot function to make the figure prettier:

In [None]:
plt.plot(ctd.temperature, ctd.pressure,  # plot temperature (on the x-axis) as function of pressure (y-axis)
         color="g",                      # specify the color of the line (green)
         linestyle="--",                 # specify the line style (dashed)
         linewidth=2)                    # make line thicker

# invert y-axis
plt.gca().invert_yaxis()

# add axis labels and title
plt.xlabel("Temperature")
plt.ylabel("Pressure")
plt.title("CTD Cast - Bedford Basin Monitoring Program")
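The same kind of figure can also be made with matplotlib's object-oriented interface, which works on an explicit `Axes` object instead of the "current axes" returned by `plt.gca()`. This is an alternative style, not the one used above; the data below are made up so the sketch runs without the CSV file:

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only
temperature = [15.2, 12.1, 8.4, 5.3]
pressure = [1.0, 5.0, 10.0, 20.0]

fig, ax = plt.subplots()  # create a figure with one set of axes
ax.plot(temperature, pressure, color="g", linestyle="--", linewidth=2)
ax.invert_yaxis()         # pressure (depth) increases downwards
ax.set_xlabel("Temperature")
ax.set_ylabel("Pressure")
ax.set_title("CTD Cast (example data)")
```

Working with `ax` directly becomes handy once a figure contains several panels, because each panel has its own `Axes` object.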

The `pandas` package comes with its own plotting capabilities. These are based on the standard plotting package `matplotlib` which gets called under the hood:

In [None]:
ctd.plot(x="temperature", y="pressure",
         color="g",
         linestyle="--",
         linewidth=2,
         xlabel="Temperature",
         ylabel="Pressure",
         title="CTD Cast - Bedford Basin Monitoring Program",
         ylim=[70, 0])

This recreates the figure we plotted with `matplotlib` before. I prefer to use `matplotlib` directly because it gives you more flexibility, but the plotting capabilities of `pandas` are very useful when you just want to have a quick look at your data.

## Getting help

In order to get information about a package or function, you can use the help tab in the top right pane or the `help()` function, e.g.,

In [None]:
help(print)
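`help()` displays an object's docstring, which is also stored in its `__doc__` attribute. A small sketch with a hypothetical function `celsius_to_kelvin` (functions are covered properly in a later tutorial):

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15

help(celsius_to_kelvin)           # shows the signature and the docstring
print(celsius_to_kelvin.__doc__)  # the docstring as a plain string
```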

If you want to get a deeper understanding of how Python works and a more detailed guide on how to use it, read through the [SciPy Lectures](https://www.scipy-lectures.org/).

For more specific problems, consult the following resources:

1. Other students (!!!)
2. Stackoverflow (or other code forums)
3. GitHub (or related code repositories)
4. Personal blogs
5. Open-source courses (Coursera, etc.)

Google is your friend. Well-articulated searches will (hopefully) send you to the right place. Good luck, and don't be afraid to ask for help!