<a href="https://colab.research.google.com/github/bmill42/streaming-data/blob/main/Introduction_to_Colab_and_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Attribution note: This tutorial is modified from the original [notebook](https://colab.research.google.com/github/data-psl/lectures2020/blob/master/notebooks/01_python_basics.ipynb) prepared by Mathieu Blondel, publicly available on GitHub. Here is a link to the [Apache 2.0 license](https://github.com/mrArpanM/learn-python/blob/main/LICENSE) for the original.

# Colab Notebooks

For the rest of the semester, we will be programming in Python using **Jupyter notebooks** hosted through **Google Colab**. A Jupyter notebook is a file that can contain both Python code and rich text contents - it is a user-friendly way to introduce code with accompanying instructions and commentary. It is not a great way to build an app that needs to run independently, but it is perfect for learning and for working with code that is exploratory or experimental, hence the notebook's popularity in the field of machine learning.

Colab is a service from Google that runs notebooks on the cloud for free, which means that you don't have to install Python on your own computer. This saves us a lot of trouble, though it does introduce some limitations: Google will end your programming session after a certain amount of time, and our code is running on a computer that we don't control.

Because we're working in a notebook, you can both follow along with my examples and run/modify code directly. Notebooks consist of **code cells**, which are blocks of one or more Python instructions. For example, here is a code cell that stores the result of a computation (the number of seconds in a day) in a variable and displays its value:

In [None]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

Click on the "play" button to execute the cell, or press Shift + Enter while the cursor is inside the cell. You should see the result appear just below the cell. Alternatively, you can press Ctrl + Enter if you are on Windows / Linux or Command + Enter if you are on a Mac, which runs the cell without moving on to the next one.

The entire notebook is a single active Python environment, so variables that you defined in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

Note that the order of execution is important. For instance, if we do not run the cell storing *seconds_in_a_day* first, the above cell will raise an error, as it depends on this variable. You can always start from a clean slate by going to Runtime > Restart Session in the menu and then re-running the cells in order from the top.

**Exercise.** Add a cell below this cell: click on this cell then click on "+ Code". In the new cell, compute the number of seconds in a year by reusing the variable *seconds_in_a_day*. Run the new cell.

# Python

Python is one of the most popular programming languages for data science, both in academia and in industry. It is also relatively user-friendly and thus a good first language if you are new to text-based programming.

In this session we'll go over some of the basics of Python. The best way to work through this notebook (and most of the others we'll use) will be to run each cell as you get to it, after reading the accompanying text.

## Arithmetic operations and code comments

Python supports the usual arithmetic operators: `+` (addition), `-` (subtraction), `*` (multiplication), `/` (division), `**` (power/exponent), `//` (integer division), `%` (modulo, the remainder left over after integer division).

Many Python environments, including Jupyter notebooks, are *interactive*, which means that you can treat them almost like a calculator - if you want to know the value of a variable or the result of a calculation, you just type the operation in directly and run the cell:

In [None]:
81 + 42

In a lot of other programming languages, you would need to do this in multiple steps: assign a variable and then use some kind of `print` command to write the variable's value to the screen. Python can operate this way, too:

In [None]:
new_variable = 2**10
print(new_variable)

It's often helpful to add **comments** to code. Comments don't affect the program's operation, but they help explain what's going on. In Python, the main way to add comments is to use the `#` sign; anything after the sign on that line is ignored when the program runs.

In [None]:
45 // 4     # Integer division tells us how many times a number fits into another not counting any remainder
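
To compare the arithmetic operators side by side, here is a small sketch that applies each one to the same pair of numbers. The names `a` and `b` (and the result names) are made up for this illustration and are not used elsewhere in the notebook.

```python
# Each arithmetic operator applied to the same two numbers
a = 17
b = 5

total = a + b         # addition: 22
difference = a - b    # subtraction: 12
product = a * b       # multiplication: 85
quotient = a / b      # division: 3.4 (the result is a decimal number)
power = a ** 2        # exponent: 289
whole_part = a // b   # integer division: 3
remainder = a % b     # what is left over after integer division: 2

print(total, difference, product, quotient, power, whole_part, remainder)
```

Notice that the results of `//` and `%` fit together: 3 times 5, plus the leftover 2, gets us back to 17.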

## Lists

All objects in Python have a **type**. There are integers (whole numbers), floating point numbers (decimals), and strings (characters of text), among others.
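
If you are ever unsure what type something is, you can ask Python directly. The sketch below uses the built-in `type` function, which is not otherwise covered in this notebook; the values are arbitrary examples.

```python
# type() reports the type of whatever is placed inside the parentheses
print(type(42))        # <class 'int'>: an integer
print(type(3.14))      # <class 'float'>: a floating point number
print(type("hello"))   # <class 'str'>: a string
```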

One of the most important types is a **list**. Lists are containers for ordered sequences of elements. Lists can be initialized empty using square brackets with nothing in them:

In [None]:
my_list = []

or with some initial elements:

In [None]:
my_list = ['a', 'b', 'c'] # anything in quotation marks has the *string* data type

Lists have a dynamic size, which means that elements can be added (appended) to them or removed. Python objects typically have **methods** associated with them---operations that can modify their contents or perform other actions.

Methods are applied using the object name, a dot (`.`), and the method name with parentheses after it, e.g. `object.method()`. Sometimes the method takes in new information as an **argument**, as when we want to add something to a list. The argument goes inside the parentheses.

In [None]:
my_list.append('d')
my_list
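
Removing elements works through methods as well. Here is a minimal sketch using the list methods `remove` and `pop`; it works on a separate, made-up list called `colors` so that `my_list` stays unchanged for the cells that follow.

```python
# `colors` is a hypothetical list used only for this illustration
colors = ['red', 'green', 'blue']

colors.remove('green')      # remove the first item equal to 'green'
print(colors)               # ['red', 'blue']

last_color = colors.pop()   # remove the last item and hand it back to us
print(last_color)           # blue
print(colors)               # ['red']
```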

We can access individual elements of a list using square brackets with a number inside. The number is an *index* that identifies where in the list an item sits. Python indices always start from zero, which is common in programming, so the third thing in the list is item `[2]`:

In [None]:
my_list[2]

We can access *slices* of a list using `my_list[i:j]` where `i` is the start of the slice (again, indexing starts from 0) and `j` the end of the slice. The slightly confusing part is that the end index is not included. For instance:

In [None]:
my_list[1:3]

If you've dealt with intervals in math, a list slice behaves like a half-open interval: closed on the low end and open on the high end. Confusing? Yes. These kinds of situations are so common in computing that there's a name for the most typical kind of mistake you're likely to make: an *off-by-one error*. Here's an example, where I want to get the first four items of my list. What went wrong?

In [None]:
my_list[1:4]

Sometimes, we may want everything in the list starting from a certain point. Omitting the second index means that the slice will run until the end of the list:

In [None]:
my_list[1:]

We can check if an element is in the list using `in`. This gives us a Boolean value, `True` or `False`:

In [None]:
'e' in my_list

In [None]:
'd' in my_list

The length of a list, which is simply the number of items in it, can be obtained using the `len` function - this one is very useful.

In [None]:
len(my_list)

## Strings

Strings are used to store text. They can be delimited using either single quotes or double quotes, but be consistent! You can't mix them up for the same string.

In [None]:
string1 = "some text"
string2 = 'some other text'
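
One practical reason to have both kinds of quotes: a string delimited by one kind can contain the other kind as ordinary text. A small sketch, with variable names invented for this illustration:

```python
# Double quotes on the outside let the text contain an apostrophe
apostrophe_text = "it's some text"

# Single quotes on the outside let the text contain double quotes
quoted_text = 'she said "some text"'

print(apostrophe_text)
print(quoted_text)
```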

A string behaves much like a list of characters. As such, we can access individual characters in the same way as list elements, by indexing the string variable:

In [None]:
string1[3] # asking for the fourth character in the string

and similarly for slices:

In [None]:
string1[5:]

String concatenation is performed using the `+` operator. What's going on with the `" "`?

In [None]:
string1 + " " + string2

## Conditionals

As their name indicates, conditionals are a way to execute code depending on whether a condition is True or False. As in other languages, Python supports `if` and `else` but `else if` is contracted into `elif`, as the example below demonstrates.

Try changing the value of `my_variable` to test out the different conditions:

In [None]:
my_variable = 5

if my_variable < 0:
    print("negative")
elif my_variable == 0:
    print("null")
else: # my_variable > 0
    print("positive")

Here `<` and `>` are the strict `less than` and `greater than` operators, while `==` is the equality operator (the single `=` is reserved solely for *assigning* variables). The operators `<=` and `>=` are used for `less/greater than or equal to`.
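
A comparison can also be run on its own, outside of an `if` statement, in which case it simply gives back `True` or `False`. The sketch below uses a made-up variable, `temperature`, to show the difference between the strict and non-strict operators.

```python
# `temperature` is a hypothetical variable used only for this illustration
temperature = 20

print(temperature < 20)    # False: 20 is not strictly less than 20
print(temperature <= 20)   # True: "less than or equal to" includes 20 itself
print(temperature == 20)   # True: == compares values, it does not assign
print(temperature >= 25)   # False
```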

Unlike most other languages, in Python blocks of code are delimited using indentation, i.e. spaces (or tabs) at the start of a line. That means that it's *crucial* that the code that goes along with each if/else condition is indented by one level, and by the same amount on every line of the block. Try un-indenting one of the lines (or indenting one that shouldn't be) and see what error you get.
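
Indentation also determines where a block *ends*. In this sketch (the variable `score` and the messages are invented for illustration), the two indented lines belong to the `if`, while the last line is outside of it:

```python
# `score` is a hypothetical variable used only for this illustration
score = 3

if score > 10:
    print("high score")      # indented: runs only when the condition is True
    print("well done")       # indented: part of the same block
print("finished checking")   # not indented: runs no matter what
```

With `score = 3` only the last message is printed. Try changing `score` to a number above 10 and run the cell again.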

## Loops

Loops allow us to execute a block of code multiple times. There are two main types of loops: `while` loops and `for` loops.

### While loops

A `while` loop has a condition that is checked each time the loop begins. If the condition is `True`, the loop continues. If not, it stops.

In [None]:
i = 0
while i < len(my_list):
    print(my_list[i])
    i += 1 # equivalent to i = i + 1

**Question:** Why did we set `i = 0` at the beginning, and why is `i` being modified in the loop even though it isn't being printed out?

### For loops

A `for` loop takes in some kind of list and performs an operation once for each item in the list. There is always a special **loop variable** that holds onto the item currently being referenced in the loop.

The loop variable is named right after `for` and assigned the values that come after `in`.

If the goal is simply to iterate over a list, we just assign the loop variable using the list itself:

In [None]:
for element in my_list:
    print(element)

Alternatively, we can use the loop variable as a **counter** that tracks where we are in the list. The code for this looks a bit messier, but first run it and see what it does:

In [None]:
for i in range(len(my_list)):
    print(i, my_list[i])

See how two different things printed on each line? The first is `i`, the loop variable. The second is the actual list item, accessed using `i` as an index.

**Exercise:** In the next cell, isolate the code that uses the `range` function and figure out what it outputs.

In [None]:
# Test out the range() function

## Functions

To improve code readability and reliability, it is common to assign a name to an operation so that we can call on it later without having to type out all the code again.

Fundamentally, a function takes some input(s) and processes them to produce an output. The output is always indicated with the `return` keyword, and we start a new function definition with `def` (for 'define').

A function's inputs, or **arguments**, are defined between the parentheses after the function's name. When we write a custom function, the name and arguments can be called anything we want.

In [None]:
def square(x):
    return x ** 2

def multiply(a, b):
    return a * b

As with a loop or an `if` statement, a colon follows the first line of the definition and the rest of the code is indented.

Once the functions are defined, we can use them:

In [None]:
multiply(3, 2)

In [None]:
square(5)

Functions can also be *composed*, which just means that the output of one function can be the input to another:

In [None]:
square(multiply(3, 2))

We can get the exact same result in a way that might be slightly more readable (but takes more code) by using some *intermediate variables*:

In [None]:
mult_result = multiply(3, 2)
square_result = square(mult_result)
print(square_result)

Sometimes for readability it can be useful to explicitly name the arguments when you call the function. The names come from the function's definition (for built-in functions that come with Python, you can find these names in the [documentation](https://docs.python.org/3/)):

In [None]:
square(multiply(a=3, b=2))
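
Naming the arguments has another effect: the order in which they are written no longer matters. Multiplication gives the same answer either way, so this sketch defines a hypothetical `subtract` function (not part of the notebook so far) where the order is visible in the result.

```python
# `subtract` is a hypothetical function defined only for this illustration
def subtract(a, b):
    return a - b

print(subtract(10, 3))       # 7: positions decide which value is a and which is b
print(subtract(b=3, a=10))   # 7: names decide, so the written order doesn't matter
print(subtract(3, 10))       # -7: swapping unnamed arguments changes the result
```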

## Exercises

**Exercise 1.** Write a function that takes in a first and last name and returns a single string of the form `"<last>, <first>"`.

In [None]:
def full_name(first_name, last_name):
    # Write your function here
    return

full_name(first_name="Taylor", last_name="Swift")

**Exercise 2.** Write a function that takes in a musical pitch (C, D, E, etc.) and returns the scale degree number of that pitch in C major.

If you don't know what that means, it's ok: C major contains the seven notes `C, D, E, F, G, A, B`. The *scale degree* just refers to a note's location in that list: C is scale degree 1, G is scale degree 5, etc.

Your function should take in a letter and return a number `1-7`. Return `0` if the provided note is not in the scale.

In [None]:
def C_scale_degree(pitch):
    # Write your function here
    return

C_scale_degree('F')