Welcome To CS 124's Python and Jupyter Tutorial!
---
Created by Krishna Patel (kpatel7@stanford.edu)

Updated by Bryan Kim (bskim96@stanford.edu), Winter 2020-2021.

### Jupyter Notebooks

If you're reading this, it means you got your Jupyter notebook server up
and running and were able to open the notebook.

It's possible that this is your first time working with an IPython/Jupyter
notebook. If it isn't, feel free to skip this section. If it is, here's a quick
rundown:

Jupyter notebooks are similar to Python scripts, except that they are made up
of cells of Python code or text that can be run and modified separately and
out-of-order. They provide an easy way to modify and test code snippets, or to
provide explanations and visualizations to accompany your code.

Jupyter notebooks contain two types of cells:

* Text (markdown) cells like this one
* Executable (Python) cells like the one below

Text cells can be edited by double-clicking them. In general, you shouldn't
need to do this on any of the notebooks, though, as only your code will
be graded.

On the other hand, executable cells, like the one below, contain Python code
that you can run. You can run a cell by selecting it and clicking the "run"
button in the toolbar above (with the play symbol), or, more easily, by pressing
CTRL-Enter. You can try it on the cell below:

In [None]:
print("I'm an executable cell")

The output of any cell (e.g. from print statements) will be shown below it
after it is run. Alternatively, if the last line of a cell is a variable (or
any other expression), the notebook will display its value. For example:

In [None]:
var = "I am a string."

var

Notebooks will save your changes automatically every 2 minutes, but
to be safe we recommend saving frequently as you edit/make changes. You can
save with the save button on the toolbar or the usual shortcut
(CTRL-s or CMD-s).

You can use the arrow keys to navigate between cells.

As you're editing the notebook, you may want to add new cells. You can do this
from the toolbar with the '+' button, from the Insert menu, or with the
shortcuts 'a' (to insert a cell above) or 'b' (to insert a cell below). You can
try this here (select this cell and hit 'b').

Note that by default, a newly created cell will be a code cell. You can change
the type of a cell by selecting it and hitting 'y' for code or 'm' for
text/markdown (or by using the Cell menu). Try it on your newly created cell.

If the state of your notebook gets messed up, you'd like to stop your currently
running code, or you want to run things from scratch, you may want to:

* Stop the notebook kernel (the stop button in the toolbar, or from the Kernel
tab up in the toolbar), or
* Restart the notebook kernel (the circular arrow button in the toolbar,
or from the Kernel tab up in the toolbar).

Optionally, you may want to restart the kernel (reset the notebook back to a
clean slate) and then run all the cells in-order in one go. You can do this
using the fast-forward button on the toolbar, or from the Kernel tab up in
the toolbar.

Feel free to try stopping the kernel, restarting it, or restarting it and
running all cells now to see what happens.

That should cover everything you need to know to get started! There's a little
more info on some of the confusing aspects of Jupyter notebooks at the
end if you get to Exercise 7.

P.S. If you'd like to see the full list of shortcuts, just hit 'h'.

__NOTE__: You may see that the top toolbar of your notebook says
"Not Trusted", just to the left of the name
of your environment ("Python [conda env: cs124]"). You don't need to worry about
this for this or any future notebooks; you can just leave it as untrusted.

### Installation Check

Before we write any code, let's check that you're running the correct version of
Python and in the right environment!

In [None]:
import os
# .get() avoids a KeyError when no conda environment is active at all
assert os.environ.get('CONDA_DEFAULT_ENV') == "cs124"

import sys
assert sys.version_info.major == 3 and sys.version_info.minor == 8

If the above cell complains (prints out an error when run),
it means that you're using the wrong environment or Python version!

If so, please exit this notebook (you can just close the browser tab),
go to the terminal window that you ran `jupyter notebook` in,
and kill the notebook server by typing CTRL-C (and then "y" when
asked if you want to shut down). Then,
try running

$ conda activate cs124

and restarting your notebook server with

$ jupyter notebook

If that doesn't work, you should go back and follow the setup + installation
instructions in the README.md file that came with this notebook.
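
If it isn't obvious which of the two checks failed, a small diagnostic cell along these lines can help. This is only a sketch: the variable names `env_name` and `python_version` are made up for illustration, and it assumes nothing beyond the standard library.

```python
import os
import sys

# .get() returns None instead of raising KeyError when the variable is unset
# (for example, when no conda environment is active).
env_name = os.environ.get("CONDA_DEFAULT_ENV")
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

print("conda environment:", env_name)
print("Python version:", python_version)
print("Python executable:", sys.executable)
```

If the environment shown is not `cs124`, the notebook server was most likely started from a terminal where the environment had not been activated.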

## Exercises

Below are some quick Python exercises to get you warmed up.

**Exercise 1 -- True and False**

Using the bool() function, which converts inputs into Boolean values,
we'll explore the "truthiness" and "falseness" of different values.

In [None]:
bool(True) # should be True

In [None]:
bool(False) # should be False

In [None]:
True == 1 # True is equivalent to 1 in Python

In [None]:
False == 0 # False is equivalent to 0 in Python

In [None]:
bool(8) # Any number that is not 0 will evaluate to True

In [None]:
bool(None) # None is "False"

In [None]:
bool([]) # Objects that are empty (e.g. an empty list) are also False

In [None]:
bool([1,2])

In [None]:
# See how we can utilize these properties in an 
# if statement by changing the values of x

x = -1
print("The boolean value of x is " + str(bool(x)))

if x:
  print("x is True")
else:
  print("x is False")
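
As a quick summary of the cells above, the sketch below collects a few values that are "falsy" and a few that are "truthy". The list names `falsy_values` and `truthy_values` are just illustrative, and the lists are not exhaustive.

```python
# Values that Python treats as False in a boolean context.
falsy_values = [0, 0.0, "", [], {}, (), set(), None, False]

# Values that are True, including a few that sometimes surprise people:
# a non-empty string is True even if it spells "0" or "False", and a
# list is True as soon as it contains anything at all.
truthy_values = [-1, "0", "False", [0], {"a": None}, " "]

for value in falsy_values:
    print(repr(value), "->", bool(value))

for value in truthy_values:
    print(repr(value), "->", bool(value))
```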


**Exercise 2 -- Looping Issues**

Explore the following code to see why modifying something you're
looping over is dangerous.

In [None]:
# This code attempts to print the list element by element
# while deleting each element after it is printed 

greetings = ['hello', 'hi', 'salve', 'ciao', 'bonjour', 'hola']

for elem in greetings:                         
    print(elem) 
    # Remove first element           
    greetings.pop(0)
    print("Current state of greetings: " + str(greetings))
  
print("Final state of greetings: " + str(greetings))

What went wrong? (It's never a good idea to modify a list as you're
looping through it)
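
The loop above keeps track of a position in the list. Each `pop(0)` shifts the remaining elements one step to the left, so the loop ends up skipping every other element and stops early. One possible way around this, shown here as a sketch with made-up variable names (`words`, `printed`, `queue`, `drained`), is to loop over a copy, or to loop until the list is empty:

```python
words = ['hello', 'hi', 'salve', 'ciao', 'bonjour', 'hola']

# Loop over a copy (words[:]) so that removing items from the original
# list cannot shift elements underneath the loop.
printed = []
for word in words[:]:
    printed.append(word)
    words.pop(0)  # remove the element we just handled

print(printed)  # all six greetings, in their original order
print(words)    # []

# Alternative: keep going until the list is empty.
queue = ['hello', 'hi', 'salve']
drained = []
while queue:
    drained.append(queue.pop(0))

print(drained)  # ['hello', 'hi', 'salve']
```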

**Exercise 3 -- Using Reserved Names**

In this exercise we'll explore what happens when we "overwrite" names in Python.

In [None]:
# This function usually creates a list
list()

#### The following cell will raise a TypeError

In [None]:
# But we want to name our list list
list = [1, 2, 3]
print(list)

# Let's make a new list
print("Attempting to make a new list ...")
new_list = list()

To prevent this, stay away from using the names of default Python functions
as variables.

To fix it, let's delete our poorly named list and try it again.

__Note:__ Running this cell multiple times will raise an error because you can
only delete `list` once.

In [None]:
del list
new_list = list()
print(new_list)
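
If you're unsure whether a name is already taken by Python, one option is to look it up in the standard `builtins` module. The helper `shadows_builtin` below is a hypothetical name used only for this sketch.

```python
import builtins

def shadows_builtin(name):
    """Return True if `name` is also the name of a Python built-in."""
    return hasattr(builtins, name)

# "list", "str" and "max" are built-ins; "greetings" and "list_" are not.
for candidate in ["list", "str", "greetings", "max", "list_"]:
    print(candidate, shadows_builtin(candidate))
```

A common convention when a built-in name really is the most natural choice is to add a trailing underscore (for example `list_`) or to pick a more descriptive name such as `greetings`.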

**Exercise 4 -- Slicing Python Lists**

In this exercise, we'll practice slicing lists in Python.

In [None]:
nums = [0, 1, 2, 3, 4, 5]

In [None]:
# slice from index 2 inclusive to 5 exclusive
nums[2:5]

In [None]:
# slice from the beginning to index 3 exclusive
nums[:3]

In [None]:
# slice from index 3 inclusive to the end
nums[3:]

In [None]:
# "slice" the entire list / copy the list
nums[:]
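
Slices also accept negative indices (counting from the end) and an optional step. The sketch below uses a separate list called `digits` (an illustrative name) so that it doesn't interfere with `nums` above.

```python
digits = [0, 1, 2, 3, 4, 5]

print(digits[-1])    # 5, the last element
print(digits[-2:])   # [4, 5], the last two elements
print(digits[1:-1])  # [1, 2, 3, 4], everything except the first and last
print(digits[::2])   # [0, 2, 4], every second element
print(digits[::-1])  # [5, 4, 3, 2, 1, 0], a reversed copy

# A slice always builds a new list, so changing it leaves the original alone.
digits_copy = digits[:]
digits_copy.append(6)
print(digits)        # [0, 1, 2, 3, 4, 5]
```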

**Exercise 5 -- Copying Data Structures**

Run the following code to determine how data structures are copied in Python.

In [None]:
# Create a dictionary
d = {'one': 1, 'two': 2}

d

In [None]:
# Try to copy d

copy = d

copy

In [None]:
# Modify the copy

copy['three'] = 3

copy

In [None]:
# See if d remains unchanged
d

What happened here?
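
Assigning a dictionary to a second name does not copy it; both names refer to the same object. The sketch below contrasts that with `dict.copy()` (a shallow copy) and `copy.deepcopy()` from the standard library. The variable names `original`, `alias`, `shallow` and `deep` are chosen only for this example.

```python
import copy

original = {'one': 1, 'two': 2, 'nested': [1, 2]}

alias = original                # same object, just a second name
shallow = original.copy()       # new dict, but nested objects are shared
deep = copy.deepcopy(original)  # new dict and new nested objects

alias['three'] = 3              # also visible through `original`
shallow['four'] = 4             # only changes the shallow copy
shallow['nested'].append(3)     # the nested list is shared with `original`

print(alias is original)        # True
print('three' in original)      # True
print('four' in original)       # False
print(original['nested'])       # [1, 2, 3]
print(deep['nested'])           # [1, 2]
```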

**Exercise 6 -- Regular Expressions**

In [None]:
# Let's import the Python standard regular expression library
import re

As we recall from the modules, regular expressions are a way of defining
patterns that we're interested in. They are usually used to
search for patterns in strings, or check if a string matches a pattern.

In [None]:
# For example, you could use a comma as your regex pattern and use that pattern
# to split a string.

input_str = "a::b,c.d,e;:f,g"

# It's a good habit to mark regex strings with the "r" prefix. In this
# case it doesn't matter, but when using some special regex (escape) characters
# like \b, \w, etc. it is needed for the regex to be read correctly.

# This matches a single comma
pattern = r","

tokens = re.split(pattern, input_str)

tokens
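
To see why the "r" prefix matters for escapes like `\b`, here is a small sketch. The names `normal_pattern`, `raw_pattern` and `sentence` are illustrative only.

```python
import re

# In a normal string, "\b" is a single backspace character.
normal_pattern = "\bcat\b"
# In a raw string the backslash is kept, so the regex engine sees \b,
# which it interprets as a word boundary.
raw_pattern = r"\bcat\b"

print(len(normal_pattern))  # 5
print(len(raw_pattern))     # 7

sentence = "the cat sat"
print(re.findall(raw_pattern, sentence))     # ['cat']
print(re.findall(normal_pattern, sentence))  # []
```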

In [None]:
# Or we could be a bit fancier, and allow our pattern to be any character in
# a set. Bracket notation [] indicates that we can match any of the characters
# in the brackets.

# This matches any ONE character in the set (a period, comma, semicolon, or
# colon). Note that because period (".") is a special character in regexes, we
# need to tell regex that we mean a normal period, not the special character.
# To do this, we "escape" the period character by putting a backslash "\"
# before it.
pattern = r"[\.,;:]"

tokens = re.split(pattern, input_str)

tokens

In [None]:
# Or we could even use special operators to describe more specific patterns.
# For example, the "+" operator means that it will match the thing to its left
# at least once, but possibly multiple times. Note that if the thing to the left
# is a set, it could be a different character from that set each time.

# This matches any sequence of one or more characters from the set
# [.,;:].
pattern = r"[\.,;:]+"

tokens = re.split(pattern, input_str)

tokens

In [None]:
# We can also use regexes to find all the times a specific pattern appears
# in a string.
# For example, what if we wanted to find all the instances of the word "dog"
# in a text?

text = """
I love my dog Spot! Spot is the best dog in the world. He likes playing
with other dogs at the park. But he doesn't like cats, he is scared of them.
Today I will take him to the dog park. One time he saw a cat there and got so
scared he wanted to go home.
"""

# This matches just the 3-character sequence "dog"
pattern = r"dog"

tokens = re.findall(pattern, text)

tokens

In [None]:
# There are also other operators we can use, like ? to match the thing to its
# left 0 or 1 times (in other words, it is optional).

# This matches "dog" followed by an optional s.
pattern = r"dog[s]?"

tokens = re.findall(pattern, text)

tokens

In [None]:
# Or we can match multiple possibilities (i.e. A or B)

# This matches "dog" or "cat".
pattern = r"(dog|cat)"

tokens = re.findall(pattern, text)

tokens
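
One detail worth knowing: when a pattern contains a capturing group, `re.findall()` returns what the group captured rather than the whole match. The sketch below illustrates this on a made-up string called `sample`, and shows the non-capturing form `(?:...)` as one way to group without capturing.

```python
import re

sample = "I have a dog, two cats and another dog."

# No group: findall returns the full matches.
print(re.findall(r"dog|cat", sample))        # ['dog', 'cat', 'dog']

# One capturing group: findall returns only what the group captured,
# so the optional "s" outside the group is dropped.
print(re.findall(r"(dog|cat)s?", sample))    # ['dog', 'cat', 'dog']

# Non-capturing group (?:...): groups the alternatives without capturing,
# so the full match (including the optional "s") is returned.
print(re.findall(r"(?:dog|cat)s?", sample))  # ['dog', 'cats', 'dog']
```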

In [None]:
# A particularly common operator is the star "*" operator. It matches 0 or more
# of the thing to its left. For example, we can use it to match any word that
# starts with an a (an a or A at the start of a word, followed by 0 or more of
# any letter).

# We can use the \b symbol to match a word boundary (here, the start of a
# word) and the \w symbol to match a "word character" (a letter, a digit, or
# an underscore).
# See https://docs.python.org/3/library/re.html for details and other
# special symbols.

# This matches any word starting with an a or A
pattern = r"\b[aA]\w*"

tokens = re.findall(pattern, text)

tokens
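
To get a feel for what `\b` and `\w` contribute, the sketch below runs the same idea on a made-up string, `sample_words`, with and without the `\b`.

```python
import re

sample_words = "alpha a1_b apple-pie banana Area51"

# With \b, a match can only begin at the start of a word. \w also matches
# digits and underscores, but not hyphens, so "apple-pie" yields "apple".
with_boundary = re.findall(r"\b[aA]\w*", sample_words)
print(with_boundary)     # ['alpha', 'a1_b', 'apple', 'Area51']

# Without \b, a match can begin at any "a", even in the middle of a word.
without_boundary = re.findall(r"[aA]\w*", "banana")
print(without_boundary)  # ['anana']
```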

In [None]:
# Beyond the re.findall() function, we also often use re.search() if we want
# more flexible control over how we match/search
# (example from https://www.w3schools.com/python/python_regex.asp)

# re.search() is slightly more complicated than re.findall(), because it
# doesn't just return a list of matches as strings. It instead returns a Python
# match object that contains more detailed information about the match.

# NOTE: period (".") is a special character indicating any character (except
# a newline). "^" matches the start of a string and "$" matches the end of a
# string.

text = r"The rain in Spain"
match = re.search(r"^The.*Spain$", text)

# Print the match object
print(match)

# Get the entire matched text (index 0 is the whole match)
print(match[0])

# Get the original string back
print(match.string)

# Get the group (the entire part of the string where the match happened)
print(match.group())

# Find the span, the tuple (start_index, end_index).
# In other words the positions (from the start of the string, counting from
# 0), where the match starts and ends.
print("span: " + str(match.span()))
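
Note that `re.search()` returns `None` when the pattern is not found, so calling `.group()` on the result would raise an error in that case. A minimal sketch of checking for this is below; `describe_match` is a hypothetical helper written just for this example.

```python
import re

def describe_match(pattern, string):
    """Return a short description of the first match, or 'no match'."""
    match = re.search(pattern, string)
    if match is None:
        return "no match"
    start, end = match.span()
    return f"matched {match.group()!r} at [{start}, {end})"

print(describe_match(r"ai", "The rain in Spain"))      # matched 'ai' at [5, 7)
print(describe_match(r"^Spain", "The rain in Spain"))  # no match
```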

In [None]:
# We can also use capture groups (in parentheses) to indicate
# parts of the match that we want to save so that we can use them separately

# Using your knowledge of regexes from the previous examples, what does this
# pattern mean? What do you expect it to do/match from text? What parts of text
# will fall in each of the 3 capture groups?
ai_match = re.search(r"(.*?)ai(.*?)ai(.*)", text)

In [None]:
# Let's check if your guesses were correct:

# Entire match
ai_match.group(0)

In [None]:
# First capturing group 
ai_match.group(1)

In [None]:
# Second capturing group
ai_match.group(2)

In [None]:
# Third capturing group
ai_match.group(3)
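
The difference between `.*` and `.*?` in the pattern above is that `.*` is "greedy" (it matches as much as it can) while `.*?` is "lazy" (it matches as little as it can). The sketch below shows the difference on a made-up string called `settings`.

```python
import re

settings = "x=1; y=2; z=3"

# Greedy: .* grabs as much as possible, so the match runs to the LAST ";".
greedy = re.search(r"(.*);", settings)
print(greedy.group(1))  # x=1; y=2

# Lazy: .*? grabs as little as possible, so the match stops at the FIRST ";".
lazy = re.search(r"(.*?);", settings)
print(lazy.group(1))    # x=1
```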

Let's try a more difficult example: 

Try to extract the inside of an HTML Tag that follows the following rules:

*   Start tags start with "<", have at least 1 alphanumeric character,
and then end with ">"
*   End tags start with "</", have at least 1 alphanumeric character,
and then end with ">"
*   There can be any character between the two tags

Here are some examples of the text:

`<html>`this is what we want to extract`</html>`

don't want`<h1>`what we want `</h1>` don't want

---

If you want to make it harder, make sure you pass this test case, since
technically HTML tags must match (the contents of the start tag must match
the contents of the end tag):

`<html>`this is what we want to extract `</h1>` `</html>` 

You should extract: this is what we want to extract `</h1>`

In [None]:
# Try to solve this here

test_str1 = "<html>this is what we want to extract</html>"
test_str2 = "don't want<h1>what we want </h1> don't want"

hard_case = "<html>this is what we want to extract </h1> </html>"

Solved it? You can find the answer [here](https://colab.research.google.com/drive/1IQ8LqxGY8B0A0ecqgHQrSfb34pCMUQFE) 

Let's try a longer example: 

Try to match a URL that follows the following rules:

* the URL must start with http or https followed by ://
* the domain name can only be alphanumeric or contain "." or "-"
* can contain a port specification (http://abc.com:80) (you can assume ports
go from 0 to 99)
* after the port, the URL can contain any number of alphanumeric characters,
dots and hyphens

In [None]:
# Try to solve this here

test_str1 = "http://www.google.com"
test_str2 = "https://www.gmail.com:88/hello-hi"
test_str3 = "http://abd-fh.8rhgyt.org:90/h-"


Done? You can find the answer [here](https://colab.research.google.com/drive/1BrNRHLnpu_1W9bsKZzS1Vk9P6xuE7f_i)!

**Exercise 7 -- Jupyter Quirks**

Before we wrap up with the Python review, there's one more thing we'd like to
show you. This will be pretty important for your future homework assignments,
so we encourage you to read till the end and not skip this part!

If you've gotten this far, that means you've figured out the basics of how to
walk through a Jupyter notebook. As we're sure you've noticed, Jupyter notebooks
are a bit different from the sort of Python scripting you may be familiar
with.

As a result, there are a few things that can cause confusion or trip students
up and are worth keeping in mind:

The most important thing to remember about Jupyter/IPython notebooks is that
every time you run a cell, you're changing the global state of the notebook.
In other words, every time a cell is run, it can create or change variables
that every other cell in the notebook can then see.

For example, try running this cell:

In [None]:
print(this_variable_has_not_been_defined_yet)

The cell should give an error, complaining that

```
NameError: name 'this_variable_has_not_been_defined_yet' is not defined
```

Now try running this cell:

In [None]:
this_variable_has_not_been_defined_yet = "until right now!"

And now go back and run the previous cell. It works now!
What gives?

The reason it worked this time is that when you ran the previous cell, it
changed the state of the notebook, creating the new variable. This new variable
now exists and is accessible from __ALL__ cells
in the notebook that you run from now on! This applies to any future cell
you run anywhere in the notebook, whether it be "above" or "below" the
current cell.

It doesn't matter what order the cells appear in the notebook; the only
thing that matters is the order that the cells were run in! This is something
important to keep in mind as your notebooks become longer and more complex,
with lots of cells. If you run into weird or unexpected behavior in your
notebook, it probably has something to do with the order in which you
executed your cells!

Another corollary of the above fact is that you should be mindful of the side
effects of your cells! We'll show you what we mean with an example.

Let's create a variable:

In [None]:
counter = 0

And now we'll try incrementing it once and printing it. Simple, right? What
result would you expect if you ran the cell below? (Yes, it's obvious. Please
just humor us for now.)

In [None]:
counter += 1
print(counter)

As you no doubt expected, it printed a 1! Now try running the same cell again a
few more times.
What do you expect the result to be? What actually happened?

You should find that it prints 2, 3, 4, ... and so on!

This may or may not surprise you. Often students who haven't worked with
Jupyter notebooks before can find this counter-intuitive, as it seems
natural to expect that the same part of your code should produce the same
output every time.

The reason that isn't true here is the global
state issue that we showed above. Every time you run a cell, it's not
just redoing the same calculation from the same starting point. It's also
updating the global state of the notebook! Any changes you make to variables
persist, and will affect every future cell that you run.

In this particular case, it's not too confusing, and it's pretty easy to
grasp what's going on. But in a huge notebook with many cells, each containing
possibly complex logic with complicated or subtle side effects, this behavior
can often cause problems!

As a result, a good tip when working with Jupyter notebooks is to try to avoid
making the side effects of your cells too complicated. Or at least try to keep
clear in your head which cells have side effects (and so can't be run many times
reproducibly) and which don't.
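
One possible way to keep a cell rerunnable, sketched below, is to set the starting value and do the update in the same place, for example inside a function. The function name `count_events` and the sample data are hypothetical.

```python
# A cell containing only "counter += 1" gives a different result on every
# run, because it depends on (and changes) a value left over from earlier
# runs. Here the starting value is set inside the function instead, so
# every call begins from the same state.
def count_events(events):
    """Return the number of events, starting from zero on every call."""
    counter = 0
    for _ in events:
        counter += 1
    return counter

first_run = count_events(["a", "b", "c"])
second_run = count_events(["a", "b", "c"])
print(first_run, second_run)  # 3 3
```

Running this cell any number of times prints the same result, because nothing it computes depends on state left behind by a previous run.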

Obviously it's impossible to totally avoid cross-cell side effects, because if
you did it would defeat the purpose of using a Jupyter notebook at all!
You'd have to have one giant cell with all of your code, or have each cell only
do self-contained work, which wouldn't be very interesting or useful! The global
state maintained by notebooks is what makes them useful and powerful, but
it can also be a major point of pain and confusion if you're not careful.

__TLDR:__ If you run into weird behavior or notebook issues, try thinking
carefully about how things are affecting the global state of your
notebook.


Best of luck with your future Jupyter adventures! Jupyter/IPython
notebooks are an awesome tool for visualization, working with data,
and doing quick prototyping. We hope they can become a useful tool in your
toolbox if they aren't already!

If you're interested in learning a little more Jupyter-fu, check out some of
these links. Even if you're relatively experienced with Jupyter,
it's likely that many of the tips below will be new to you!

* https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
* https://towardsdatascience.com/optimizing-jupyter-notebook-tips-tricks-and-nbextensions-26d75d502663
* https://towardsdatascience.com/15-tips-and-tricks-to-use-jupyter-notebook-more-efficiently-ef05ede4e4b9


## Bonus

Want more practice with regular expressions? Try [this question](https://www.learnpython.org/en/Regular_Expressions) with your neighbor!

Done with that as well? Complete the Python tutorial [here](https://www.learnpython.org/en/Welcome). We recommend looking at the following sections:

* Variables and Types
* Lists
* Basic Operators
* String Formatting
* Basic String Operations
* Conditions
* Loops
* Functions
* Dictionaries
* Modules and Packages
* Numpy Arrays
* Generators
* List Comprehensions
* Regular Expressions
* Sets
* Decorators