# Python

Ajay Raj (araj@berkeley.edu)

Calvin Chen (chencalvin99@berkeley.edu)

# Part 1: Basic Python Syntax



The purpose of this Jupyter Notebook is to help you learn to:
1. navigate Jupyter notebooks (like this one);
2. write and evaluate some basic *expressions* in Python;
3. call *functions* to use code other people have written;
4. break down Python code into smaller parts to understand it.

Fun fact: The creator of the Jupyter Notebooks (formerly known as the IPython Notebook), Fernando Perez, is a UC Berkeley Department of Statistics professor and has attended several of SUSA's faculty dinners!

# 1. Jupyter Notebooks

Each rectangle containing text is a *cell*.

There are two types of cells:

A. *Text*
    - simply cells with text (like this one)
    - can be edited by double-clicking on the cell.

**Practice:** Delete the following sentence and replace it with the future name of your firstborn daughter:

Java is better than Python.

B. *Code*
    - cells that contain code in the programming language Python 3
    - running a code cell will execute all of the code it contains
    - can be edited by clicking on the cell; it'll be highlighted with a little green or blue rectangle
    - to run the code in a cell, press the ▶| button or press `shift` + `enter`

Try running this cell:

In [None]:
print("SUSA is the greatest club on campus.")

 - Some shortcuts:
     - `Esc` - `a`: creates a cell above current cell
     - `Esc` - `b`: creates a cell below current cell
     - Alternatives to `shift` - `enter`:
         - `ctrl` - `enter`: run cell, keeps cursor in cell
         - `opt` - `enter`: run cell, create new cell below

# 2. Code

Code is broken down into different lines. When a cell is run, the computer executes the code line by line in the order they are written. Run the next cell and notice the order of the output.

In [None]:
print("My favorite animal is a")
print("human")

**Practice:** Change the cell above so that the output looks like this:

    My favorite animal is a
    cat, but I am a
    human

# 3. Errors

Python is a language, and like spoken languages, it has rules.  The rules are very simple and after seeing a little bit of code, you'll start to understand it and over time will gain reasonable proficiency. 

However, the rules are strictly enforced. Unlike spoken languages where small mistakes do not inhibit understanding, programming languages cannot work with even the slightest mistakes.

Whenever you write code, you'll make mistakes — it is inevitable; everybody makes mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong. When you make an error, you just have to find the source of the problem, fix it, and move on.

We have made an error in the next cell.  Run it and see what happens.

In [None]:
print"Trump says the system is broken")

The last line of the error output attempts to tell you what went wrong.  The *syntax* of a language is its structure or the set of rules that governs it. `SyntaxError` tells you that you have created an illegal structure, or in other words, you broke the rules.

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message, you can often get by without deciphering it.

**Practice**: Now fix the line so that it runs.

# 4. Numbers and Arithmetic

In addition to representing commands that print out lines, lines of code, called *expressions*, can represent numbers and ways of combining numbers. The expression `12.10000` evaluates to the number 12.1. (Run the cell and see.)

In [None]:
12.10000

Notice that we didn't have to print. When you run a notebook cell, if the last line has a value, then Jupyter helpfully prints out that value for you. However, it won't print out prior lines automatically.

In [None]:
print(1)
2
3

Above, you see that 3 is the value of the last expression, 1 is printed, but 2 is lost forever because it was neither printed nor last.

The line in the next cell subtracts.  Its value is what you'd expect.  Run it.

In [None]:
3 - 5

Many basic arithmetic operations are built into Python. Here are some common arithmetic operators:

Addition: `+`

In [None]:
4 + 5

Subtraction: `-`

In [None]:
5 - 1

Multiplication: `*`

In [None]:
3 * 8

Division: `/`

In [None]:
4/1

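Floor Division: `//` (divides, then rounds down to the nearest whole number; compare it with `/` in the first cell below)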

In [None]:
5 / 2

In [None]:
5 // 2

Remainder: `%`

In [None]:
8 % 3

Exponentiation: `**`

In [None]:
2 ** 5

Note: `2 * * 5` will return an error.
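The operators `/`, `//`, and `%` are closely related. The small sketch below (the values 17 and 5 are arbitrary) shows that for integers, the floor-division quotient and the remainder always recombine into the original number.

```python
a = 17
b = 5

quotient = a // b    # floor division: how many whole times b fits into a
remainder = a % b    # what is left over after taking out those whole b's

print(a / b)         # true division always gives a float: 3.4
print(quotient)      # 3
print(remainder)     # 2

# the two pieces recombine into the original number
print(quotient * b + remainder == a)   # True
```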

The order of operations is the same as the one you commonly use, and Python also has parentheses.  Need proof? Run the cell below.

In [None]:
12+4*7-6*3**4*2**5/3*6
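Here are a few smaller expressions, chosen for illustration, that isolate the precedence rules at work in the long expression above:

```python
# multiplication happens before addition...
print(2 + 3 * 4)      # 14
# ...unless parentheses say otherwise
print((2 + 3) * 4)    # 20

# ** binds more tightly than a leading minus sign, so this is -(2 ** 2)
print(-2 ** 2)        # -4

# ** groups from right to left: 2 ** (3 ** 2) is 2 ** 9
print(2 ** 3 ** 2)    # 512

# * and / have equal precedence and are evaluated left to right
print(12 / 3 * 2)     # 8.0
```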

# 5. Variables

In computer science, variables are used to store information so that the programmer can refer to it easily. When working with data, variables are especially useful for naming values and simplifying code.

In Python, variables are defined through *assignment statements*. An assignment statement has the name of the variable on the left side of an `=` sign and an expression to be evaluated on the right. When you run an assignment statement, Python evaluates the expression to the right of the `=` and then stores the resulting value in the variable on the left.

In [None]:
thesis = "UC Berkeley is the greatest public school in the world"

Notice there is no output, because the value of the expression was put in the name `thesis`. Run the following cell:

In [None]:
thesis

If no value has been assigned to a name, Python will raise an error (a `NameError`):

In [None]:
topic_sentence

**Practice:** Assign a value to the variable above so that there are no errors.

You can also assign numbers or mathematical expressions to variables.

A common pattern in Jupyter notebooks is to assign a value to a name and then immediately evaluate the name in the last line in the cell so that the value is displayed as output. 

In [None]:
cos_pi_over_4 = 1 / 2**0.5
cos_pi_over_4

Another common use of variables in Jupyter notebooks is to build up a complex computation in stages across several lines of a single cell, naming the intermediate results.

In [None]:
rent = 1580
months = 17 + 12
total_rent_paid = rent * months
total_rent_paid
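Because Python evaluates the right side of `=` before storing anything, a name can appear on both sides of an assignment, and a stored value does not change when the names it was computed from change later. The variable names below are made up for illustration:

```python
count = 0
count = count + 1    # the right side uses the old value (0), then stores 1
count = count + 1    # now 1 + 1 is computed, then 2 is stored
print(count)         # 2

price = 10
doubled = price * 2  # computed once, right now: 20
price = 50           # reassigning price does NOT recompute doubled
print(doubled)       # 20
```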

# 6. Commenting

A *comment* is text in a code cell that the computer doesn't run. Comments explain what a piece of code is doing, which is extremely important when people collaborate and need to understand each other's code.

In Python, when you put a `#` symbol on a line, everything to the right of it on that line is ignored.

In [None]:
# Calculating an approximation for pi
about_pi = 22/7
about_pi

# 7. Functions

*Functions* are reusable pieces of code, often written by others, that you can freely use. A function is called through its name followed by its argument or arguments in parentheses. An argument is a value that you pass into the function, and the function does something with it or to it. The syntax of a function call is as follows:

`f(27)`

This is called a *call expression*: `f` is the name of the function being called and `27` is the argument passed in.

Python has a set of built-in functions that are simple to use and very useful. For example, the `abs` function takes one argument and evaluates to the absolute value of that argument.

Jupyter Notebooks also provide some nice functionality for Python functions. If you put your cursor after a function's name and press *shift-tab*, a description of what the function does will pop up.

In [None]:
abs(-3)

In [None]:
n = 6
abs(n)

Some functions take in multiple arguments, separated by commas. For example, the `max` function returns the largest of the arguments passed in.

In [None]:
max(2, 5)

In [None]:
max(-2, 7, 13, 2, 6, -10, -23, 0)

You can also pass expressions into functions, as can be seen below:

In [None]:
num_rotations = 4
# round() rounds a value to the nearest integer
approx_radians_turned = round(2 * about_pi * num_rotations)
approx_radians_turned

*Nesting* is the term for passing expressions, including other call expressions, into functions as arguments. In other words, an expression is nested if one of its arguments is itself an expression that has to be evaluated first. For example:

In [None]:
abs(4 - max(2, 3))
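To see the inside-out order in which Python evaluates the nested expression above, here is the same computation split into steps (the names `step1` to `step3` are only for illustration):

```python
step1 = max(2, 3)      # innermost call first: 3
step2 = 4 - step1      # then the subtraction: 1
step3 = abs(step2)     # finally the outer call: 1
print(step3)           # 1

# same result as the one-line nested version
print(step3 == abs(4 - max(2, 3)))   # True
```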

# 8. Data Types

In Python, there are several types of data. Here are a few of the most common:

1. Numbers: `int` for whole numbers and `float` for numbers with a decimal point

`[1, 42, -7]` and `[1.25, 0.34, 3.0, 2.0]`

2. `str` (string) -
This represents a "string" of characters. In other words, it's a piece of text that could be one letter, a word, a phrase, a sentence, or an entire text. Strings can be written with either single or double quotes.

`['has', "The English language is really complicated.", "L", '4']`

3. `bool` (boolean) -
This represents a boolean value, which is a fancy computer way of saying `True` or `False`. In Python, `bool` is a special kind of `int`: `True` behaves like `1` and `False` behaves like `0` in arithmetic and comparisons (for example, `True + True` is `2`, and `True == 1` is `True`).

Here are some examples:

In [None]:
6.0000/2

In [None]:
6/2

In [None]:
1 * 3

In [None]:
'Everything' == "Everything"

In [None]:
my_variable = 3

In [None]:
my_variable == 3

In [None]:
print("I <3", 'numbers')
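If you are ever unsure what type a value has, the built-in `type` function will tell you. A quick sketch using values similar to the cells above:

```python
print(type(3))         # <class 'int'>
print(type(3.0))       # <class 'float'>
print(type(6 / 2))     # <class 'float'>: / always produces a float
print(type("3"))       # <class 'str'>
print(type(3 == 3))    # <class 'bool'>

# bool is a kind of int, so True acts like 1 in arithmetic
print(True + True)     # 2
```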

You can convert between data types quite easily in Python using the following built-in functions:

In [None]:
bool(0)

In [None]:
int(True)

In [None]:
int("4")

In [None]:
float(4)

In [None]:
float(False)

In [None]:
float("3.52 ")

In [None]:
"45" + "3.15"
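The last cell shows that `+` joins two strings rather than adding numbers. The sketch below contrasts that with converting first, and shows two conversion details worth knowing: `int` on a float drops the decimal part, and converting a string that doesn't look like the target type raises a `ValueError`.

```python
# + on two strings concatenates (joins) them
print("45" + "3.15")                 # 453.15

# converting to numbers first gives numeric addition instead (roughly 48.15)
print(float("45") + float("3.15"))

# int() on a float drops the decimal part (it truncates toward zero)
print(int(3.9))                      # 3
print(int(-3.9))                     # -3

# a string must look like the target type, or Python raises a ValueError
try:
    int("3.52")
except ValueError as err:
    print("ValueError:", err)
```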

# 9. Back to functions

There are three places where functions come from.

1. Built-in functions 

    These are the ones we already talked about. Built-in functions come with the language of Python and can be used at any time.
    
    
2. Functions from modules

    Most programming involves work that is very similar to work that has been done before.  Since writing code is time-consuming, it's good to rely on others' published code when you can.  Rather than copy-pasting, Python allows us to **import** other code, creating a **module** that contains all of the names created by that code.

Python includes many useful modules that are just an `import` away. Let's look at a couple real quick:

In [None]:
import math

# math comes with many useful values for constants, such as pi or e
print(math.pi)
print(math.e)

# math also comes with many functions to manipulate values. Here are some examples below
print(math.sqrt(16))
print(math.factorial(5))
print(math.log(1))
print(math.sin(math.pi/2))
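`import math` is not the only way to import. As a brief sketch, here are the other common forms; the `from ... import ...` form is also used in the Challenge Problem at the end of this notebook.

```python
# import the whole module and use dot notation...
import math
print(math.sqrt(16))       # 4.0

# ...import only specific names, which can then be used without the prefix...
from math import sqrt, pi
print(sqrt(16))            # 4.0
print(pi == math.pi)       # True

# ...or give the module a shorter nickname
import math as m
print(m.factorial(5))      # 120
```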

In the next cell, we import the `IPython.display` module.
Then we call the `Image` function from that module, using the same dot notation we used for `math.sqrt`. The `Image` function takes in the URL of a picture as a string and returns an object that Jupyter displays as the image. The cell below passes in this link:
    `https://i.ytimg.com/vi/IcWtX1UvIks/maxresdefault.jpg`

In [None]:
import IPython.display
art = IPython.display.Image("https://i.ytimg.com/vi/IcWtX1UvIks/maxresdefault.jpg")
art

3. Defining functions

The final way to get useful functions is to create them yourself! Being able to define functions is incredibly useful because it lets you reuse code you have already written over and over again very easily.

First let's look at what defining a function looks like and then we'll break down what's happening:

In [None]:
def quadratic_solver(a, b, c):
    '''Returns the two roots of the quadratic equation a*x**2 + b*x + c = 0.'''
    # The triple-quoted string above is the function's docstring; when you hit
    # shift-tab on a user-created function, this is what gets displayed
    discriminant = b ** 2 - 4 * a * c
    # returns two different answers (the two roots), which we can use as shown below;
    # the parentheses around (2 * a) make sure we divide by the whole denominator
    # (if the discriminant is negative, ** 0.5 produces complex numbers)
    return (-b + discriminant ** 0.5) / (2 * a), (-b - discriminant ** 0.5) / (2 * a)

The first thing you see is a `def`. It stands for 'define' and it lets Python know that you're about to define a function. `def` is always bolded because it is a *keyword* that Python looks for, just like `import`, `True`, `False` and `return` (we'll come back to `return`).

Next comes the name of the function. In this case we called the function `quadratic_solver`. Make sure that the name of the function does not exist somewhere else in your code or in a module that you are using, so that you do not confuse Python and, more importantly, yourself.

Following the name come the function's *parameters*, written inside parentheses (the name together with its parameters is called the function's *signature*). Inside the parentheses, you state which arguments you want your function to take in and what you want to refer to them as. In `quadratic_solver`, we wanted three arguments named `a`, `b`, and `c`.

At the end of the first line, there is a `:`. This tells Python that the function header is finished and that the indented lines that follow make up the function itself.

Note that we included *documentation* of what the function does. Functions can do complicated things, so you should write an explanation of what your function does. For small functions, this is less important, but it's a good habit to learn from the start. Conventionally, Python functions are documented by writing a triple-quoted string, called a *docstring*, as the first statement in the function's body.

Then finally comes the body of the function — the actual code. This is the code that you don't want to have to type over and over again.

The keyword `return` in a function's body tells Python to make the value of the function call equal to whatever comes right after `return`. In `quadratic_solver`, we return the two roots of the equation separated by a comma; Python packs them into a single value called a *tuple*.

Now that you've defined your function, you can call it just like any other function. Because it returns two values, we can assign them to two names at once:

In [None]:
root1, root2 = quadratic_solver(1, 5, 6)
print(root1)
print(root2)
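A common point of confusion is the difference between `return` and `print`. In the sketch below (both function names are hypothetical), only the function that returns its result gives the caller a value to store:

```python
def add_and_print(x, y):
    print(x + y)     # displays the sum, but does not hand it back

def add_and_return(x, y):
    return x + y     # hands the sum back to whoever called the function

a = add_and_print(2, 3)    # prints 5
b = add_and_return(2, 3)   # prints nothing

print(a)   # None: a function without a return statement returns None
print(b)   # 5
```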

**Practice:** Define a function that takes in four numbers, calculates their average, and then returns `True` if the average is larger than Euler's number ($e$) and `False` if it is not. Make sure to test your function.

Note: It'll be useful to know some comparison *operators*:

`==` returns `True` if the values on either side of it are equal to each other

`!=` returns `True` if the values on either side of it are *not* equal to each other

`>` returns `True` if the value on the left is larger than the one on the right

`>=` returns `True` if the value on the left is greater than or equal to the one on the right

`<` and `<=` do what you would expect.
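A few quick examples of these operators, plus two details that often trip people up: comparisons can be chained as in math, and a single `=` assigns while `==` compares (the variable names are arbitrary):

```python
x = 7
print(x == 7)       # True
print(x != 7)       # False
print(x > 3)        # True
print(x >= 7)       # True
print(x < 7)        # False
print(x <= 7)       # True

# comparisons can be chained, like in math
print(0 < x < 10)   # True

# = assigns a value, == compares two values
y = 3               # assignment: no output
print(y == 3)       # comparison: True
```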

In [None]:
...

# 10. Arrays and Lists

Arrays and lists are two ways of storing and manipulating a sequence of data. Arrays (such as NumPy arrays) have to be imported from a library; this section focuses on lists.

A *list* is a sequence of values, like an array, but its values can have different types. Lists come with Python and do not have to be imported.

In [None]:
my_favorite_things = ['L', 36, 17/6, 3 == 3, "family", "https://www.nationalgeographic.com/content/dam/travel/2017-digital/torres-del-paine-patagonia/torres-del-paine-national-park-patagonia.jpg"]
my_favorite_things

You can easily retrieve any value in a list by using its *index*, which is the item's position in the list. Be careful, though: the first item in a list has index 0, the second item has index 1, the third item has index 2, and so on.

In [None]:
my_favorite_things[1] # to get stuff from data containers, use brackets!
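Because indexing starts at 0, the last valid index is one less than the length of the list. A short sketch with a made-up list:

```python
letters = ['a', 'b', 'c', 'd']

print(letters[0])      # a  (first item: index 0)
print(letters[3])      # d  (fourth item: index 3)
print(len(letters))    # 4, so the valid indices are 0 to 3

# asking for an index past the end raises an IndexError
try:
    letters[4]
except IndexError as err:
    print("IndexError:", err)
```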

You can add an element to the end of a list by using the `append` method:

In [None]:
my_favorite_things.append('Python')

In [None]:
my_favorite_things

*Anything* can go in lists, including other lists:

In [None]:
lst = [[1, 2], [3, 4]]

We query it as follows:

In [None]:
lst[0][1] # gets first element of lst, then the second element of that list

# 11. Dictionaries

A **dictionary** is a lookup table with key-value pairs.

Say you wanted to store your friends' email addresses in a variable in Python, and would like to look up a friend's email address quickly.

If you used a list, it would look something like this:

In [None]:
email_addresses_lst = [['Ajay Raj', 'araj@berkeley.edu'], ['Calvin Chen', 'chencalvin99@berkeley.edu']]

This is quite annoying to work with, because you need to know where each friend's entry sits in the list:

In [None]:
email_addresses_lst[0][1] # gets Ajay's email address

There's a much easier (and quicker) way.

In [None]:
email_addresses = { # curly braces for dictionaries
    'Ajay Raj': 'araj@berkeley.edu',
    'Calvin Chen': 'chencalvin99@berkeley.edu'
}

You can query this variable in the following way:

In [None]:
email_addresses['Ajay Raj']

You can add new entries to your **dictionary**, or update existing ones, in the following way:

In [None]:
email_addresses['Patrick Chao'] = 'prc@berkeley.edu'

In [None]:
email_addresses['Patrick Chao']

In [None]:
email_addresses['Ajay Raj'] = 'ajayspersonal@notberkeley.something'

In [None]:
email_addresses['Ajay Raj']
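Looking up a key that isn't in a dictionary raises a `KeyError`. Two built-in ways to handle that are sketched below with a hypothetical phone book: checking membership with `in`, or using the dictionary's `get` method with a default value.

```python
phone_book = {'Ajay Raj': '5551234'}   # hypothetical data

# `in` checks whether a key exists
print('Ajay Raj' in phone_book)        # True
print('Calvin Chen' in phone_book)     # False

# looking up a missing key with [] raises a KeyError...
try:
    phone_book['Calvin Chen']
except KeyError:
    print("KeyError: no entry for Calvin Chen")

# ...while .get() returns a default value instead
print(phone_book.get('Calvin Chen', 'not found'))   # not found
```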

You can create an empty dictionary (with no key-value pairs) like this:

In [None]:
d = {}

In [None]:
d

# 12. Conditional statements and loops

A *conditional statement* is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression. A conditional statement always begins with an `if` header, which is a single line followed by an indented body. The body is only executed if the expression directly following `if` (called the *if expression*) evaluates to `True`. If the if expression evaluates to `False`, then the body  is skipped.

In [None]:
x = 6 # change this to make the conditional work!
if (x < 5):
    y = math.e 
y

We can add an `elif` clause (`elif` is Python's shorthand for the phrase "else, if"). Python only checks an `elif` expression if every expression above it was `False`.
We can also add an `else` clause, whose body runs only if all of the `if` and `elif` expressions are `False`.

Note: `if`, `elif`, and `else` are all Python keywords.
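As a sketch (the score and grade cutoffs are made up), Python checks the clauses from top to bottom and runs only the first body whose expression is `True`, even if later expressions would also be `True`:

```python
score = 85   # hypothetical exam score

if score >= 90:
    grade = "A"
elif score >= 80:     # checked only because the first expression was False
    grade = "B"
elif score >= 70:     # also True, but never checked: a branch already ran
    grade = "C"
else:
    grade = "F"

print(grade)          # B
```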

In [None]:
direction = "West"

if (direction == "North"):
    is_direction = True
elif (direction == "South"):
    is_direction = True
elif (direction == "West"):
    is_direction = True
elif (direction == "East"):
    is_direction = True
else:
    is_direction = False
    
is_direction

It is useful to know the `and` and `or` operators (more Python keywords).
The `and` operator takes in two expressions on either side of it and evaluates to `True` only if both are `True`.
The `or` operator takes in two expressions on either side of it and evaluates to `True` if at least one of them is `True`.

In [None]:
True or False

In [None]:
False and False

In [None]:
(4 == 2 or ((3 > 4 and 5 == 5) or 2 < 7))

We can write the previous code block more compactly using `or`, like this:

In [None]:
d = "West"

if d == "West" or d == "South" or d == "North" or d == "East":
    is_direction = True
else:
    is_direction = False
    
is_direction

**Practice:** Define a function named `type_of_triangle` that takes in the lengths of the three sides of a triangle and returns whether the triangle is an equilateral, isosceles, or scalene triangle.

For example, `type_of_triangle(4, 4, 6)` should return `"isosceles"`.

Try to use as few lines as you can.

In [None]:
...

In programming, we often want to run the same code over and over again. Functions help with that, but the scale of data in computer science is far too large to process by hand, so we use *iteration*, also known as *loops*. This is especially useful in data science, when trying to analyze millions and millions of pieces of data.

In Python, we use a `for` statement to loop over the contents of a sequence. A `for` statement begins with the word `for`, followed by a name we want to give each item in the sequence, followed by the word `in`, and ending with an expression that evaluates to a sequence. The indented body of the `for` statement is executed once for each item in that sequence.

In [None]:
my_favorite_things

In [None]:
for favorite in my_favorite_things:
    print(favorite, "is my favorite!")

The `range` function is also useful for loops: say you want to go through all the integers from $i$ (inclusive) up to $j$ (exclusive). You can do that with `range(i, j)`.

In [None]:
for i in range(10): # shorthand for range(0, 10)
    print(i)
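A few more `range` calls, wrapped in `list` so you can see every number they produce. The optional third argument, the step, is shown as an extra:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]: starts at 2, stops before 6
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]: the third argument sets the step
print(len(range(2, 6)))        # 4, which is 6 - 2
```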

In [None]:
def sum_of_numbers(max_value):
    ''' This function finds the sum of all numbers from 0 up to, but not including, max_value'''
    total = 0
    for i in range(max_value):
        # add the current number i to the running total
        total = total + i
    return total
    
sum_of_numbers(10) # should equal 45. Why isn't it 55?

**Practice:** Use what you've learned to fill in the following code:

In [None]:
# Define a function num_long_words that takes in a list of words
# and returns the number of words in the list that have more
# than 4 characters

# Hint: to get the length of a word s, use len(s)
len("Ajay")

In [None]:
def num_long_words(...):
    num_words = 0
    ...
    return num_words

num_long_words(["Hello", "this", "is", "Ajay"]) # should equal 1 (Hello)

### Congrats! You've learned the fundamentals of Python and programming!

## Exercises

**Practice**: You dropped your phone in the toilet, so you've lost all of your contacts! You've heard about this mythical technology called *The Yellow Pages*, which has the contacts of everyone in Berkeley. Someone has stored all of the contacts in *The Yellow Pages* in Python in the following way:

It is a dictionary with 26 keys, one for each letter of the alphabet. The value for each letter is a list of all contacts whose first name starts with that letter. Each item in that list is another dictionary that stores the information about one person.

Here's an abbreviated example of this:

In [None]:
yellow_pages_abbrev = {
    'A': [
        {
            'first': 'Ajay',
            'last': 'Raj',
            'phone_number': '5551234',
            'address': 'Not My Address 94709'
        },
        {
            'first': 'Ajay',
            'last': 'NotRaj',
            'phone_number': '5554321',
            'address': 'Fake Address 94720'
        }
    ]
}

In [None]:
def get_phone_number(yellow_pages, first_name, last_name):
    """Searches The Yellow Pages (stored in yellow_pages) for the phone number of someone whose first name and last name are given. 
    
    >>> get_phone_number(yellow_pages_abbrev, 'Ajay', 'Raj')
    '5551234'
    """
    pass

**Practice**: The Fibonacci sequence is defined as follows: $$f_n = f_{n-1} + f_{n-2}$$ for $n \geq 2$, where $f_i$ is the $i$-th element of the sequence. The 0th and 1st elements of the sequence are 0 and 1.

Fill in the following function to compute the $n$th element of the sequence.

In [None]:
# Hint: this is how the range() function works
for i in range(1, 5):
    print(i)

In [None]:
# If you are having some trouble, take a look back at the Arrays and Lists section!
def fib(n):
    """
    >>> [fib(i) for i in range(8)]
    [0, 1, 1, 2, 3, 5, 8, 13]
    """
    f = [_, _]
    for i in range(2, n + 1):
        f.______(____________)
    return f[_]

In [None]:
# you can check your answer by running this cell; if it runs without an error, your code is correct
assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]

**Practice**: While working on your Jupyter Notebook for learning Python, you suddenly teleported to the ESPN sports center, where people now believe that you’re a basketball analyst! Luckily, you’ve watched basketball for a while and can speak on players’ performances, but a lot of your analysis also comes from the stats that players have in the game. However, the stats recorded for each player aren’t as clean as you might want them to be: they’re only recorded by quarter! Here’s an example of what the stats look like (they also only contain points per quarter):

In [None]:
game_stats = {
   'Ajay Raj': [7,5,5,9],
   'Calvin Chen': [10,2,3,5],
   'Patrick Chao': [3,12,4,8],
   'Isabelle Townley-Smith': [2,4,3,26]
}

For now, the only metric you care about as an analyst is points. However, just from the points, you can answer all sorts of questions!

Determine the number of points each player got throughout the game.

In [None]:
def points(name, stats):
   """
   >>> points('Ajay Raj', game_stats)
   26

   >>> points('Isabelle Townley-Smith', game_stats)
   35
   """

Determine who had the most points throughout the game. (Hint: you may need one variable to store who has had the highest number of points so far, and another to store that player's total so you can compare it with the result of the `points` function above.)

In [None]:
def highest_points(stats):
   """
   >>> highest_points(game_stats)
   'Isabelle Townley-Smith'
   """
   # YOUR CODE HERE

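Determine who stayed the most consistent throughout the game by calculating the standard deviation of each player's points per quarter; the player with the lowest standard deviation is the most consistent. (Hint: standard deviation can be calculated with `np.std`, after running `import numpy as np`.)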

In [None]:
def most_consistent(stats):
   """
   >>> most_consistent(game_stats)
   'Ajay Raj'
   """
   # YOUR CODE HERE

# Challenge Problem

**Note: this is for people who already know the Python basics.**

Some "refresher" documentation:
- https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
- https://docs.python.org/3/library/itertools.html#itertools.combinations
- https://docs.python.org/3/library/functions.html

In [None]:
from geopy.geocoders import Nominatim

In [None]:
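# recent versions of geopy require a user_agent naming your application;
# "susa-python-tutorial" is a hypothetical placeholder, so substitute your own
geolocator = Nominatim(user_agent="susa-python-tutorial")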

In [None]:
building_names = []
with open('buildings.txt', 'r') as f:
    for line in f.readlines():
        building_names.append(line.strip())

In [None]:
import time

names_and_coordinates = {}
for i, building_name in enumerate(building_names):
    location = geolocator.geocode(building_name)
    if location and 'Berkeley' in location.address:
        names_and_coordinates[building_name] = (location.latitude, location.longitude)
    print('Halls scanned: {0}/{1}'.format(i + 1, len(building_names)), end='\r')
    # the public Nominatim service allows at most one request per second
    time.sleep(1)

In [None]:
from itertools import combinations
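As a refresher for the problem below, here is a sketch of `combinations`, which produces every unordered pair (or larger group) from a sequence, and of `max` with a `key` function. The data is made up, and this is not a solution to the problem:

```python
from itertools import combinations

buildings = ['Soda', 'Cory', 'Evans']   # hypothetical sample data

# every unordered pair, each pair listed exactly once
pairs = list(combinations(buildings, 2))
print(pairs)   # [('Soda', 'Cory'), ('Soda', 'Evans'), ('Cory', 'Evans')]

# max() can rank items by a key function instead of comparing them directly
words = ['walk', 'to', 'campus']
print(max(words, key=len))   # campus
```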

In [None]:
from math import sin, cos, sqrt, atan2, radians
def haversine(lat1, lon1, lat2, lon2):
    """Calculates the straight line distance between two latitude-longitude coordinates"""
    # approximate radius of earth in miles
    R = 3959.0

    lat1_r = radians(lat1)
    lon1_r = radians(lon1)
    lat2_r = radians(lat2)
    lon2_r = radians(lon2)

    dlon = lon2_r - lon1_r
    dlat = lat2_r - lat1_r

    a = sin(dlat / 2)**2 + cos(lat1_r) * cos(lat2_r) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    return R * c
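For reference, the `haversine` function above computes the haversine formula. With latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians, $\Delta\varphi = \varphi_2 - \varphi_1$, $\Delta\lambda = \lambda_2 - \lambda_1$, and $R \approx 3959$ miles:

$$a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right)$$

$$c = 2\,\operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1-a}\right), \qquad d = R\,c$$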

You are trying to create the most evil schedule on campus, which contains back-to-back classes in buildings that are the furthest apart on campus. You are given the buildings and coordinates of those locations in a dictionary that can be queried as follows:

In [None]:
names_and_coordinates['Soda Hall'], names_and_coordinates['Cory Hall']

Write a function that returns a tuple containing the two buildings that are furthest apart on campus and the distance in miles between them.

In [None]:
# Note: the solution has no loops or the word for
def furthest_walk_on_campus(names_and_coordinates):
    def key_fn(entry):
        """Could be useful with the built-in max() function"""
    """YOUR CODE HERE"""

In [None]:
print('The furthest walk on campus is {0} to {1}, which is {2} miles'.format(*furthest_walk_on_campus(names_and_coordinates)))

**Extension**: Distance isn't always the best measure of the worst walk. Use the walking times returned by the Google Maps Distance Matrix API (https://developers.google.com/maps/documentation/distance-matrix/intro) to rank the walks instead.