# CS 124: Python and Jupyter Tutorial

**Created by** Krishna Patel (Winter 2020)
**Updated by** Bryan Kim (Winter 2021) and Dilara Soylu (Fall 2021)

Welcome to `CS 124`! This notebook consists of 3 parts:
* `Jupyter Notebooks`: We will walk you through how to use `Jupyter Notebooks`, which we will use for our assignments in `CS 124`. 
* `Python Exercises`: We will go through some `Python` exercises to reinforce a couple of important concepts. 
* `Regular Expression Exercises`: We will work through some `Regular Expression` examples as a preparation for our next assignment, `PA1`. 

We will end with a final note touching on some quirks of `Jupyter Notebooks`, which will be important for debugging your assignments. Let's dive in!

## Part 1. Jupyter Notebooks

__Feel free to skip this section if you already have experience with Jupyter Notebooks.__ `Jupyter Notebooks` are similar to `Python` scripts, except that they are composed
of cells of `Python` code or text that can be run and modified separately and
out of order. They provide an easy way to modify and test code snippets, or to
provide explanations and visualizations to accompany your code. `Jupyter Notebooks` contain two types of cells:

* `Markdown` Cells
* `Code` Cells

__Editing Markdown Cells__ You can edit `Markdown` cells by double clicking them or hitting the `Enter` key while they are selected. In general, you shouldn't need to do this on any of the assignments, since we will only grade your code.

This cell is an example `Markdown` cell. 

__Running Code Cells__ You can run `Code` cells by selecting them and clicking the run button (`▶ Run`) in the top toolbar. You can also run `Code` cells by hitting `CTRL + Enter` or `Command + Enter` if you are on a `Mac`. If you also want your cursor to advance to the next cell, you can use `Shift + Enter`. The output of the code, such as `print` statements, will be shown below the cell after it is run. You can use the arrow keys to navigate between cells.

In [None]:
# This cell is an example Code cell
print("I can run code!")

Alternatively, if the last line of a cell is an expression, such as a variable name, its value will be displayed as the output. 

In [None]:
var = "I am a string."
var

__Adding New Cells__ To add new cells, you can hit the `+` button in the top toolbar, or select the `Insert Cell Above` or `Insert Cell Below` options in the `Insert` menu item. You can also use the shortcuts `A` to insert a cell above the current cell, and `B` to insert a cell below. By default, a newly created cell will be a code cell. You can change the type of a cell via the dropdown in the toolbar or by selecting it and hitting `Y` for `Code`, and `M` for `Markdown`. You can also quickly delete cells by hitting `D` twice (`D + D`). 

__Notebook Kernel__ The notebook `Kernel` is the program that executes the code in your notebook. You can see the current `Kernel` on the top right of the notebook, which should say `Python [conda env:cs124]` if you started `Jupyter` after you activated the `cs124 conda` environment in your terminal. This means that the notebook is using the `Python` installation in our `conda` environment, which is the version of `Python` (`Python 3.8`) we expect you to use in the assignments. This also ensures that the versions of the external libraries used in the assignments, such as `NumPy`, are the same for everyone in the class.

One of the common mistakes students run into is forgetting to activate our `conda` environment before starting to work on the assignments. In such cases, you may see `Python` complaining about not finding a module, or a specific method from a specific version of a module. It is always a good idea to ensure that your notebook is running the correct `Kernel` in such cases.

__Interrupting the Kernel__ You may want to interrupt the execution of your code in some cases, like the example below. To stop the `Kernel`, you can click on the interrupt button (`■`) in the toolbar or select the `Interrupt` option in the `Kernel` menu item. 

In [None]:
# Code that will run forever
while True:
    pass

__Restarting the Kernel__ Because you can execute cells in any order you want in a notebook, you may find that some cells work differently when you run them after running other cells. We have provided one such example below. It is good to restart your `Kernel` in such cases to start with a clean state. You can restart your `Kernel` by clicking on the restart button (`⟳`) in the toolbar or by selecting the `Restart` option in the `Kernel` menu item. Once you are done with your assignments, we recommend restarting your `Kernel` and running your cells in sequential order to make sure your code works as you intended. 

You can restart the kernel and run all the cells in order in one go by using the fast-forward button on the toolbar, or selecting the `Restart & Run All` option in the `Kernel` menu item.

In [None]:
# Initialize a variable
a = 2

In [None]:
# Perform an operation on the variable assuming it exists in the memory
4 / a

In [None]:
# Delete the variable and execute the cell immediately above this one
# You will get a NameError
del a

__Saving Your Work__ `Jupyter Notebook` automatically saves your changes once every `120 seconds`, but
to be safe we recommend saving frequently as you make changes. You can
save with the save button in the top toolbar or the usual shortcut
(`CTRL + S` or `Command + S`).

__Congratulations on Finishing this Part!__ This should cover everything you need to know to get started! There is more information on some of the confusing aspects of `Jupyter` in the `Note on Jupyter Runtime` section at the end of this notebook.

You can find the list of all shortcuts by hitting `H` if you ever need a reference!

## Part 2. Python Exercises

We will cover some `Python` exercises in this section as a refresher! Let's get started. 

### Environment Check

You will see an environment check at the beginning of your assignment notebooks. As explained earlier, this is important to ensure that you are running the correct version of `Python` in the right environment!

In [76]:
# Check the name of the conda environment
import os
assert os.environ['CONDA_DEFAULT_ENV'] == "cs124"

# Check that the Python version is 3.8
import sys
assert sys.version_info.major == 3 and sys.version_info.minor == 8

If the above cell causes an error, it means that you're using the wrong environment or `Python` version! If you get an error, go to the terminal from which you ran the `jupyter notebook` command, and kill the notebook server by hitting `CTRL + C` (on a `Mac` too, this is `Control + C`, not `Command + C`). You will be asked whether you want to shut down the notebook server, to which you should respond with a `y`. 

You can then activate the correct environment by running the below command in your terminal.

`$ conda activate cs124`

Once you are in the correct environment, you can start the notebook again with the below command. 

`$ jupyter notebook`

If you get the same error, you should go back and follow the setup and installation
instructions in the README.md file that came with this notebook.
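
Note that the assertion in the environment check raises a `KeyError`, not an `AssertionError`, when `CONDA_DEFAULT_ENV` is not set at all (for example, when no `conda` environment is active). Below is a minimal sketch of a gentler check that reports what it found instead. The helper name `check_environment` is hypothetical and is not part of the assignment code.

```python
import os
import sys


def check_environment(expected_env="cs124", expected_version=(3, 8)):
    """Return a list of human-readable problems (empty if everything matches)."""
    problems = []

    # .get() returns None instead of raising KeyError when the variable is unset
    env_name = os.environ.get("CONDA_DEFAULT_ENV")
    if env_name != expected_env:
        problems.append(f"conda environment is {env_name!r}, expected {expected_env!r}")

    # sys.version_info[:2] is the (major, minor) pair, e.g. (3, 8)
    version = tuple(sys.version_info[:2])
    if version != expected_version:
        problems.append(f"Python version is {version}, expected {expected_version}")

    return problems


for problem in check_environment():
    print("Environment problem:", problem)
```

If the loop prints nothing, both the environment name and the `Python` version match.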

__which command__ Another way to check whether `Jupyter` is running your intended installation of `Python` is through the `which` terminal command, which shows the location of the program that our terminal runs for a given command name. Note that `Jupyter Notebook` cells are intended to contain `Python` code. We can tell `Jupyter` that we want to run a line as a terminal command instead by prefixing it with an exclamation mark (`!`). The output of the below cell shows us the location of the `Python` installation used to run the cells in the notebook. 

In [77]:
!which python

/Users/dilara/.miniconda3/miniconda3/envs/cs124/bin/python


The expected output of the above cell will be different based on your username and your `miniconda` install location, but it will contain a substring similar to `/miniconda3/envs/cs124/bin/python`. 
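
Similar information is available from inside `Python` itself, which can be handy on systems without a `which` command. The sketch below uses only the standard library: `sys.executable` is the interpreter running the current code, and `shutil.which` looks a program name up on the `PATH`, much like the terminal command does.

```python
import shutil
import sys

# Path of the Python interpreter that is running this code
print(sys.executable)

# Path that the name "python" resolves to on the PATH this process sees,
# or None if no such program is found
print(shutil.which("python"))
```

In a correctly activated environment, both paths would be expected to point inside the `cs124` environment folder.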

### Exercise 1: `Booleans`

Explore the __truthiness__ and __falseness__ of different values using the `bool()` function, which converts inputs into `Boolean` values. 

In [78]:
# Boolean value of True is True
bool(True)

True

In [79]:
# Boolean value of False is False
bool(False)

False

In [80]:
# True is equivalent to 1 in Python
True == 1

True

In [81]:
# False is equivalent to 0 in Python
False == 0

True

In [82]:
# Boolean value of any non-zero number is True
bool(8)

True

In [83]:
# This holds for negative numbers too
bool(-1)

True

In [84]:
# Boolean value of None is False
bool(None) 

False

In [85]:
# Boolean value of an empty object (empty list etc.) is False
bool([])

False

In [86]:
# Boolean value of a non-empty object is True
bool([1,2])

True

In [87]:
# See how these properties come in handy for list operations

# If x is a list:
# * It must contain at least one element for its truth value to be True
# * It must be None or empty for its truth value to be False

x = []
print(f"The boolean value of x is {bool(x)}.")

if x:
  print("x is True")
else:
  print("x is False")

The boolean value of x is False.
x is False
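
One consequence of these rules is that `if not x` cannot tell `None` apart from an empty list. When that difference matters, a sketch like the following checks for `None` first. The function name `describe` is hypothetical and only used for illustration.

```python
def describe(items):
    """Distinguish a missing list (None) from an empty one."""
    if items is None:
        return "missing"
    if not items:          # True for [], "", {}, 0, ...
        return "empty"
    return f"{len(items)} item(s)"


print(describe(None))    # missing
print(describe([]))      # empty
print(describe([0]))     # 1 item(s): a list holding a falsy value is still non-empty
```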


### Exercise 2: Looping Issues

Explore the following code to see why modifying the object you are looping over is dangerous.

In [88]:
# This code attempts to print the list element by element
# while deleting each element after it is printed 

greetings = ['hello', 'hi', 'salve', 'ciao', 'bonjour', 'hola', 'merhaba', 'hallo']

for elem in greetings:                         
    print(elem) 
    # Remove first element           
    greetings.pop(0)
    print("Current state of greetings: " + str(greetings))
  
print("Final state of greetings: " + str(greetings))

hello
Current state of greetings: ['hi', 'salve', 'ciao', 'bonjour', 'hola', 'merhaba', 'hallo']
salve
Current state of greetings: ['salve', 'ciao', 'bonjour', 'hola', 'merhaba', 'hallo']
bonjour
Current state of greetings: ['ciao', 'bonjour', 'hola', 'merhaba', 'hallo']
merhaba
Current state of greetings: ['bonjour', 'hola', 'merhaba', 'hallo']
Final state of greetings: ['bonjour', 'hola', 'merhaba', 'hallo']


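__What went wrong?__ The `for` loop keeps track of its position in the list by index. Each `pop(0)` shifts the remaining elements one place to the left, so the loop skips every other element and stops early, leaving half of the list behind. It is rarely a good idea to modify a list as you are
looping through it!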
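
Two common ways to get the intended behavior are sketched below, using shorter lists than the example above: loop over a copy of the list, or use a `while` loop that runs until the list is empty.

```python
greetings = ['hello', 'hi', 'salve', 'ciao']

# Option 1: loop over a copy (greetings[:]) so removals from the original
# do not disturb the iteration
printed = []
for elem in greetings[:]:
    printed.append(elem)
    greetings.pop(0)

print(printed)    # ['hello', 'hi', 'salve', 'ciao']
print(greetings)  # []

# Option 2: loop until the list is empty, without a for loop at all
queue = ['bonjour', 'hola']
drained = []
while queue:
    drained.append(queue.pop(0))

print(drained)    # ['bonjour', 'hola']
```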

### Exercise 3: Overwriting Built-in Names

In this exercise we will explore what happens when we __overwrite__ built-in names in `Python`. (True reserved keywords such as `for` cannot be assigned to at all; built-in names like `list` can.)

In [89]:
# list function usually creates a list
list()

[]

What would happen if we created a variable named `list`? A `TypeError`!

In [90]:
# We name our list as list
list = [1, 2, 3]
print(list)

# Let's make a new list
print("Attempting to make a new list ...")
new_list = list()

[1, 2, 3]
Attempting to make a new list ...


TypeError: 'list' object is not callable

__Don't Use Built-in Names as Variable Names__ To undo the above issue, we can delete our poorly named `list` variable, which makes the built-in `list` function visible again. 

__Note:__ Running the below cell multiple times will raise an error because we can only delete `list` once.

In [91]:
del list
new_list = list()
print(new_list)

[]
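
Strictly speaking, `list` is a built-in name rather than a reserved keyword, which is why `Python` lets us overwrite it without complaint. The sketch below uses the standard `keyword` and `builtins` modules to tell the two apart. The helper `is_risky_name` is hypothetical, one possible way to check a name before using it for a variable.

```python
import builtins
import keyword

# True keywords cannot be assigned to at all: `for = 3` is a SyntaxError
print(keyword.iskeyword("for"))     # True
print(keyword.iskeyword("list"))    # False

# `list` is a built-in name, which Python lets you shadow silently
print(hasattr(builtins, "list"))    # True


def is_risky_name(name):
    """Hypothetical helper: True if `name` is a keyword or shadows a built-in."""
    return keyword.iskeyword(name) or hasattr(builtins, name)


print(is_risky_name("greetings"))   # False
print(is_risky_name("str"))         # True
```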


### Exercise 4: Slicing `Python` Lists

In this exercise, we will practice slicing lists in `Python`.

In [52]:
nums = [0, 1, 2, 3, 4, 5]

In [53]:
# Slice from index 2 inclusive to 5 exclusive
nums[2:5]

[2, 3, 4]

In [54]:
# Slice from the beginning to index 3 exclusive
nums[:3]

[0, 1, 2]

In [55]:
# Slice from index 3 inclusive to the end
nums[3:]

[3, 4, 5]

In [57]:
# "Slice" the entire list
nums[:]

[0, 1, 2, 3, 4, 5]
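
Slices also accept negative indices, which count from the end of the list, and an optional third value, the step. A few more cases, shown here on the same `nums` list:

```python
nums = [0, 1, 2, 3, 4, 5]

print(nums[-2:])    # [4, 5]             last two elements
print(nums[:-1])    # [0, 1, 2, 3, 4]    everything except the last element
print(nums[::2])    # [0, 2, 4]          every second element
print(nums[::-1])   # [5, 4, 3, 2, 1, 0] reversed copy
print(nums[2:100])  # [2, 3, 4, 5]       out-of-range bounds are clipped, no error
```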

### Exercise 5: Copying Data Structures

Run the following code to determine how data structures are copied in `Python`.

In [58]:
# Create a dictionary
d = {'one': 1, 'two': 2}

d

{'one': 1, 'two': 2}

In [59]:
# Try to copy d
copy = d

copy

{'one': 1, 'two': 2}

In [60]:
# Modify the copy
copy['three'] = 3

copy

{'one': 1, 'two': 2, 'three': 3}

In [61]:
# See if d remains unchanged
d

{'one': 1, 'two': 2, 'three': 3}

__Copying vs. Aliasing__ When we assign a variable containing a data structure (`d`) to a new variable (`copy`), `Python` simply makes the latter an alias of the former. Therefore, changing the copy changes the original data structure. Below is an example avoiding this problem.

In [62]:
# Proper copy
copy = {k:v for k, v in d.items()}

copy

{'one': 1, 'two': 2, 'three': 3}

In [65]:
# Modify the copy
copy['four'] = 4

copy

{'one': 1, 'two': 2, 'three': 3, 'four': 4}

In [66]:
# d remains unchanged
d

{'one': 1, 'two': 2, 'three': 3}
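
The dictionary comprehension above makes a shallow copy: the new dictionary is separate, but any nested objects inside it are still shared with the original. The sketch below contrasts that with a deep copy, using the built-in `dict.copy()` method and the standard `copy` module. The module is imported under the alias `copy_module` only because this notebook already uses `copy` as a variable name.

```python
import copy as copy_module  # aliased because this notebook already uses the name `copy`

original = {'numbers': [1, 2], 'name': 'd'}

shallow = original.copy()                # new dict, but values are shared
deep = copy_module.deepcopy(original)    # new dict and new nested objects

# Adding a key to a copy never affects the original
shallow['extra'] = True
print('extra' in original)               # False

# Mutating a nested list through the shallow copy DOES affect the original
shallow['numbers'].append(3)
print(original['numbers'])               # [1, 2, 3]
print(deep['numbers'])                   # [1, 2]
```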

__Copying through Slicing__ We have seen a method to copy a `list` in the previous exercise!

In [75]:
# One way to properly copy a list is through slicing
copy = nums[:] 

# Another alternative, similar to the dictionary example from before
copy = [i for i in nums]

# Modify the copy
copy[5] = 10

# Observe that nums didn't change
print(f"copy is {copy}")
print(f"nums is {nums}")

copy is [0, 1, 2, 3, 4, 10]
nums is [0, 1, 2, 3, 4, 5]


## Part 3. `Regular Expression` Exercises

`Regular Expressions` (`RegEx`) are usually used to search for patterns in strings, or check if a string matches a pattern. `Python` has a regular expression module that helps us execute regular expressions on bodies of text. 

In [93]:
# Import the standard Python RegEx library
import re

In [95]:
# As an example, we can use a comma as our RegEx pattern and use this pattern
# to split a string

input_str = "a::b,c.d,e;:f,g"

# It is a good habit to mark RegEx patterns with the "r" prefix. In this
# case it doesn't matter, but "r" is needed for the RegEx to be read
# correctly when using special RegEx characters like \b, \w, etc. 

# This pattern matches a single comma
pattern = r","

# re.split splits the input string at the matching patterns
tokens = re.split(pattern, input_str)

tokens

['a::b', 'c.d', 'e;:f', 'g']
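
To see why the `r` prefix matters, compare how `Python` reads `"\b"` with and without it. The sentence used below is made up for illustration.

```python
import re

# Without the r prefix, Python turns "\b" into a single backspace character
print(len("\b"))     # 1
print(len(r"\b"))    # 2 (a backslash followed by the letter b)

sentence = "my dog likes hotdogs"

# The raw pattern reaches the re module intact, so \b means "word boundary"
print(re.findall(r"\bdog", sentence))   # ['dog']

# The non-raw pattern looks for a backspace character followed by "dog"
print(re.findall("\bdog", sentence))    # []
```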

In [97]:
# We could be a bit fancier, and allow our pattern to be any character in
# a set. Bracket notation [] indicates that we can match any of the characters
# in the brackets.

# This matches any ONE character in the set (a period, comma, semicolon, or
# colon). Note that outside of brackets, period (".") is a special character in
# RegExes, so there we would need to "escape" it by putting a backslash "\"
# before it to indicate that we mean a normal period. Inside brackets a period
# is already treated as a normal period, so the backslash below is optional.
pattern = r"[\.,;:]"

tokens = re.split(pattern, input_str)

tokens

['a', '', 'b', 'c', 'd', 'e', '', 'f', 'g']

In [98]:
# We could even use special operators to describe more specific patterns.
# For example, the "+" operator means that it will match the object to its left
# at least once, but possibly multiple times. Note that if the object to the left
# is a set, it could be a different character from that set each time.

# This matches any sequence of one or more characters from the set
# [.,;:].
pattern = r"[\.,;:]+"

tokens = re.split(pattern, input_str)

tokens

['a', 'b', 'c', 'd', 'e', 'f', 'g']

In [99]:
# We can also use RegExes to find all the times a specific pattern appears
# in a string. For example, what if we wanted to find all the instances of
# the word "dog" in a text?

text = """
I love my dog Spot! Spot is the best dog in the world. He likes playing
with other dogs at the park. But he doesn't like cats, he is scared of them.
Today I will take him to the dog park. One time he saw a cat there and got so
scared he wanted to go home.
"""

# This matches just the 3-character sequence "dog"
pattern = r"dog"

tokens = re.findall(pattern, text)

tokens

['dog', 'dog', 'dog', 'dog']

In [100]:
# There are also other operators we can use, like ? to match the object to its
# left 0 or 1 times (in other words, it is optional).

# This matches "dog" followed by an optional s.
pattern = r"dog[s]?"

tokens = re.findall(pattern, text)

tokens

['dog', 'dog', 'dogs', 'dog']

In [103]:
# Or we can match multiple possibilities (i.e. A or B)

# This matches "dog" or "cat".
pattern = r"(dog|cat)"

tokens = re.findall(pattern, text)

tokens

['dog', 'dog', 'dog', 'cat', 'dog', 'cat']
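
One detail worth knowing: when a pattern contains a capturing group, `re.findall` returns the text of the group rather than the whole match. A group that starts with `?:` is non-capturing and avoids this. A small sketch on a made-up sentence:

```python
import re

sentence = "two dogs, three cats and one dog"

# Capturing group: findall returns only the text inside the parentheses
print(re.findall(r"(dog|cat)s?", sentence))     # ['dog', 'cat', 'dog']

# Non-capturing group (?:...): findall returns the whole match
print(re.findall(r"(?:dog|cat)s?", sentence))   # ['dogs', 'cats', 'dog']
```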

In [104]:
# A particularly common operator is the star "*" operator. It matches 0 or more
# of the object to its left. For example, we can use it to match any word that
# starts with an a (an a or A at the start of a word, followed by 0 or more
# word characters).

# We can use the \b symbol to match a word boundary (here, the start of a word)
# and the \w symbol to match a word character (a letter, digit, or underscore).
# See https://docs.python.org/3/library/re.html for details and other
# special symbols.

# This matches any word starting with an a or A
pattern = r"\b[aA]\w*"

tokens = re.findall(pattern, text)

tokens

['at', 'a', 'and']
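
Keep in mind that `\w` matches digits and underscores as well as letters. When only letters should match, an explicit set is one option, as in this sketch on a made-up string:

```python
import re

sample = "cs_124 rocks in 2021!"

# \w matches letters, digits, and the underscore
print(re.findall(r"\w+", sample))         # ['cs_124', 'rocks', 'in', '2021']

# An explicit set restricts the match to ASCII letters only
print(re.findall(r"[A-Za-z]+", sample))   # ['cs', 'rocks', 'in']
```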

In [105]:
# Beyond the re.findall() function, we also often use re.search() if we want
# more flexible control over how we match/search
# (example from https://www.w3schools.com/python/python_regex.asp)

# re.search() is slightly more complicated than re.findall(), because it
# doesn't just return a list of matches as strings. It instead returns a Python
# match object that contains more detailed information about the match.

# NOTE: period (".") is a special character indicating any character (except
# a newline). "^" matches the start of a string and "$" matches the end of a
# string.

text = r"The rain in Spain"
match = re.search(r"^The.*Spain$", text)

# Print the match object
print(match)

# Get the full matched text (index 0 refers to the entire match)
print(match[0])

# Get the original string back
print(match.string)

# Get the group (the entire part of the string where the match happened)
print(match.group())

# Find the span, the tuple (start_index, end_index).
# In other words the positions (from the start of the string, counting from
# 0), where the match starts and ends.
print("span: " + str(match.span()))

<re.Match object; span=(0, 17), match='The rain in Spain'>
The rain in Spain
The rain in Spain
The rain in Spain
span: (0, 17)
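
When the pattern is not found, `re.search()` returns `None` instead of a match object, so calling `.group()` on the result directly would raise an `AttributeError`. A minimal sketch of the usual guard follows. The helper name `first_word_starting_with` is hypothetical.

```python
import re


def first_word_starting_with(letter, text):
    """Hypothetical helper: return the first word starting with `letter`, or None."""
    # re.escape guards against `letter` being a special RegEx character
    match = re.search(r"\b" + re.escape(letter) + r"\w*", text)
    if match is None:        # no match anywhere in the text
        return None
    return match.group()


print(first_word_starting_with("S", "The rain in Spain"))   # Spain
print(first_word_starting_with("z", "The rain in Spain"))   # None
```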


In [106]:
# We can also use capture groups (in parentheses) to indicate
# parts of the match that we want to save so that we can use them separately

# Using your knowledge of RegExes from the previous examples, what does this
# pattern mean? What do you expect it to do/match from text? What parts of text
# will fall in each of the 3 capture groups?
ai_match = re.search(r"(.*?)ai(.*?)ai(.*)", text)

In [107]:
# Let's check if your guesses were correct:

# Entire match
ai_match.group(0)

'The rain in Spain'

In [108]:
# First capturing group 
ai_match.group(1)

'The r'

In [109]:
# Second capturing group
ai_match.group(2)

'n in Sp'

In [110]:
# Third capturing group
ai_match.group(3)

'n'
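
The `?` after `.*` in the pattern above makes the match non-greedy, meaning it takes as few characters as possible. The sketch below compares greedy and non-greedy versions of a simpler two-group pattern on the same text.

```python
import re

text = "The rain in Spain"

# Greedy: the first (.*) grabs as much as it can, so "ai" matches the LAST occurrence
greedy = re.search(r"(.*)ai(.*)", text)
print(greedy.groups())   # ('The rain in Sp', 'n')

# Non-greedy: (.*?) grabs as little as it can, so "ai" matches the FIRST occurrence
lazy = re.search(r"(.*?)ai(.*)", text)
print(lazy.groups())     # ('The r', 'n in Spain')
```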

Let's try a more difficult example: 

Try to extract the inside of an HTML Tag that follows the following rules:

*   Start tags start with "<", have at least 1 alphanumeric character,
and then end with ">"
*   End tags start with "</", have at least 1 alphanumeric character,
and then end with ">"
*   There can be any character between the two tags

Here are some examples of the text:

`<html>`this is what we want to extract`</html>`

don't want`<h1>`what we want `</h1>` don't want

---

If you want to make it harder, make sure you pass this test case, since
technically HTML tags must match (the contents of the start tag must match
the contents of the end tag):

`<html>`this is what we want to extract `</h1>` `</html>` 

You should extract: this is what we want to extract `</h1>`
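
As a hint for the harder variant, and not a solution to it, one `RegEx` feature that may help is the backreference: `\1` matches whatever text the first capture group matched. The sketch below uses it on an unrelated, made-up task (finding doubled words).

```python
import re

sentence = "this is is a test of of the backreference"

# (\w+) captures a word; \1 then requires that exact same text to appear again
doubled = re.findall(r"\b(\w+) \1\b", sentence)
print(doubled)   # ['is', 'of']
```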

In [111]:
# Try to solve this here

test_str1 = "<html>this is what we want to extract</html>"
test_str2 = "don't want<h1>what we want </h1> don't want"

hard_case = "<html>this is what we want to extract </h1> </html>"

Solved it? You can find the answer [here](https://colab.research.google.com/drive/1IQ8LqxGY8B0A0ecqgHQrSfb34pCMUQFE) 

Let's try a longer example: 

Try to match a URL that follows the following rules:

* the URL must start with http or https followed by ://
* the domain name can only be alphanumeric or contain "." or "-"
* can contain a port specification (http://abc.com:80) (you can assume ports
go from 0 to 99)
* after the port, the URL can contain any number of alphanumeric characters,
dots and hyphens

In [112]:
# Try to solve this here

test_str1 = "http://www.google.com"
test_str2 = "https://www.gmail.com:88/hello-hi"
test_str3 = "http://abd-fh.8rhgyt.org:90/h-"


Done? You can find the answer [here](https://colab.research.google.com/drive/1BrNRHLnpu_1W9bsKZzS1Vk9P6xuE7f_i)!

## Note on Jupyter Runtime

Before we wrap up with the `Python` review, there is one more thing we would like to
show you. This will be pretty important for your future homework assignments,
so we encourage you to read until the end and not skip this part!

If you've gotten this far, that means you have figured out the basics of how to
walk through a `Jupyter Notebook`. As we are sure you've noticed, `Jupyter Notebooks`
are a bit different from the sort of `Python` scripting you may be familiar
with.

As a result, there are a few things that can cause confusion or trip students
up and are worth keeping in mind:

The most important thing to remember about `Jupyter Notebooks` is that
every time you run a cell, you are changing the global state of the notebook.
In other words, every time a cell is run, it can create or change variables that every other cell can see.

For example, try running this cell:

In [None]:
print(this_variable_has_not_been_defined_yet)

The cell should give an error, complaining that

```
NameError: name 'this_variable_has_not_been_defined_yet' is not defined
```

Now try running this cell:

In [None]:
this_variable_has_not_been_defined_yet = "until right now!"

And now go back and run the previous cell. It works now!
What gives?

The reason it worked this time is that when you ran the previous cell, it
changed the state of the notebook, creating the new variable. This new variable
now exists and is accessible from __ALL__ cells
in the notebook that you run from now on! This applies to any future cell
you run anywhere in the notebook, whether it be "above" or "below" the
current cell.

It doesn't matter what order the cells appear in the notebook; the only
thing that matters is the order that the cells were run in! This is something
important to keep in mind as your notebooks become longer and more complex,
with lots of cells. If you run into weird or unexpected behavior in your
notebook, it probably has something to do with the order in which you
executed your cells!
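
The same effect can be imitated in plain `Python`. The sketch below is only a simulation for illustration, not how `Jupyter` is actually implemented: one dictionary plays the role of the notebook's global state, and each "cell" is a string of code run against it with `exec`. The names `run_cell` and `notebook_state` are hypothetical.

```python
def run_cell(source, namespace):
    """Run one 'cell' of source code against a shared namespace."""
    try:
        exec(source, namespace)
        return "ok"
    except NameError as err:
        return f"NameError: {err}"


# One dictionary plays the role of the notebook's global state
notebook_state = {}

cell_1 = "shout = message.upper()"
cell_2 = "message = 'until right now!'"

print(run_cell(cell_1, notebook_state))   # NameError: name 'message' is not defined
print(run_cell(cell_2, notebook_state))   # ok
print(run_cell(cell_1, notebook_state))   # ok
print(notebook_state["shout"])            # UNTIL RIGHT NOW!
```

The first cell fails or succeeds depending only on what was run before it, not on where it appears.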

Another corollary of the above fact is that you should be mindful of the side
effects of your cells! We'll show you what we mean with an example.

Let's create a variable:

In [None]:
counter = 0

And now we'll try incrementing it once and printing it. Simple, right? What
result would you expect if you ran the cell below? (Yes, it's obvious. Please
just humor us for now.)

In [None]:
counter += 1
print(counter)

As you no doubt expected, it printed a 1! Now try running the same cell again a
few more times.
What do you expect the result to be? What actually happened?

You should find that it prints 2, 3, 4, ... and so on!

This may or may not surprise you. Often students who haven't worked with
Jupyter notebooks before can find this counter-intuitive, as it seems
natural to expect that the same part of your code should produce the same
output every time.

That isn't true here because of the global
state issue that we showed above. Every time you run a cell, it's not
just redoing the same calculation from the same starting point. It's also
updating the global state of the notebook! Any changes you make to variables
persist, and will affect every future cell that you run.

In this particular case, it's not too confusing, and it's pretty easy to
grasp what's going on. But in a huge notebook with many cells, each containing
possibly complex logic with complicated or subtle side effects, this behavior
can often cause problems!

As a result, a good tip when working with `Jupyter Notebooks` is to try to avoid
making the side effects of your cells too complicated. Or at least try to keep
clear in your head which cells have side effects (and so can't be run many times
reproducibly) and which don't.
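
One way to keep a cell safe to re-run, sketched below, is to set its starting values inside the cell and to compute new values with a function instead of modifying a variable in place. The function name `incremented` is hypothetical.

```python
def incremented(value):
    """Return value + 1 without touching any notebook-level variable."""
    return value + 1


counter = 0

# Re-running these lines always prints 1, because the cell sets its own
# starting point and does not modify `counter` in place
result = incremented(counter)
print(result)   # 1
```

Running this cell any number of times gives the same output, unlike the `counter += 1` cell above.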

Obviously it's impossible to totally avoid cross-cell side effects, because if
you did it would defeat the purpose of using a Jupyter notebook at all!
You'd have to have one giant cell with all of your code, or have each cell only
do self-contained work, which wouldn't be very interesting or useful! The global
state maintained by notebooks is what makes them useful and powerful, but
it can also be a major point of pain and confusion if you're not careful.

__TLDR:__ If you run into weird behavior or notebook issues, try thinking
carefully about how things are affecting the global state of your
notebook.


Best of luck with your future `Jupyter` adventures! `Jupyter
Notebooks` are awesome tools for visualization, working with data,
and doing quick prototyping. We hope they can become a useful tool in your
toolbox if they aren't already!

If you're interested in learning a little more Jupyter-fu, check out some of
these links. Even if you're relatively experienced with Jupyter,
it's likely that many of the tips below will be new to you!

* https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
* https://towardsdatascience.com/optimizing-jupyter-notebook-tips-tricks-and-nbextensions-26d75d502663
* https://towardsdatascience.com/15-tips-and-tricks-to-use-jupyter-notebook-more-efficiently-ef05ede4e4b9


## Bonus

Want more practice with regular expressions? Try [this question](https://www.learnpython.org/en/Regular_Expressions) with your neighbor!

Done with that as well? Complete the `Python` tutorial [here](https://www.learnpython.org/en/Welcome). We recommend looking at the following sections:

* Variables and Types
* Lists
* Basic Operators
* String Formatting
* Basic String Operations
* Conditions
* Loops
* Functions
* Dictionaries
* Classes
* Modules and Packages
* Numpy Arrays
* Generators
* List Comprehensions
* Regular Expressions
* Sets
* Decorators