# Python Basics Training

Welcome to Python!

Python:

* is an open-source programming language
* is very popular for general-purpose programming
    - Highly readable syntax, often reading like plain English
* has a huge open source community, providing tools to perform almost any task
* tends to be faster than R, but it is still an interpreted language. Compiled languages like C, Go, Rust, etc. will all generally perform faster than Python.

There isn't one single "RStudio" IDE that is the go-to for writing Python. There are 3 standout options at the Board:

* Spyder, which the Board has installed on SALT
    - Versions Spyder Py3 and Spyder4 Py4 run Python 3.7.7 and 3.8.8, respectively (but this rarely matters).
    - Spyder5 runs the latest version of Python
    - These are most similar to RStudio.

* Jupyter notebook
   - The environment used to create this! Very similar to R Markdown, allowing for markdown to help explain your code as you go.
         * Able to run chunks of code, potentially in any order.
         * Files use the `.ipynb` extension instead of `.py` (analogous to `.R` -> `.Rmd`, between R and R Markdown).
         * Can be exported to html for embedding in other web products.
         * To open, call `jupyter3 notebook` or `jupyter3 lab` in the Linux console inside the directory that you want to work in.

* VS Code
    - Available both on Linux and Windows
    - Is a text editor, which means that you can open most scripts in it, such as `.inp` files
        - Also means that, as long as you install the right extension and interpreter, you can write in almost any language in it.
    - Highly customizable, with easy-to-install extensions
    - Capable of working with `.py` and `.ipynb` files
    - Can work in the browser https://vscode.dev/ (not recommended)
    
### Helpful Jupyter Notebook Shortcuts


* `ctrl + enter`: runs the code in the current cell
* `shift + enter`: runs the code in the current cell and jumps to the cell below it or creates a new cell below it if there is not yet a cell below
* `esc -> b`: creates a new cell below the current cell (b for below)
* `esc -> a`: creates a new cell above the current cell (a for above)
* `esc -> dd`: deletes the current cell
* `esc -> h`: opens a list of many jupyter notebook shortcuts

# 1 Basic Datatypes

These are some of the datatypes that can be used within Python. They each represent information in different ways. Most mainstream languages have each of these in some form, with some naming differences.

In [None]:
string_example = "I am a string"

integer_example = 7

float_example = 7.49

boolean_example = True

list_example = [7,7.49,"I am a string",True]

tuple_example = (1,2,3)

dictionary_example = {
    "float_example":7.49,
    "string_example":"I am a string",
    "boolean_example":True,
    "list_example":list_example
    }

# Type the variable that you want to print between the ()
print(tuple_example)

## 1.1 Strings

Python only has one type for strings: `str`. All text is a string. There is no `char` type for a single character.

Create a string with `'text here'`, `"text here"`, `'''text here'''`, or `"""text here"""`. Single quotes or double quotes can be used, as long as you start and end the string with the same one.

For example:

In [None]:
print("I am a string")
print('''I am also a string''')

You can define a string over multiple lines using the triple-quote syntax, useful for things like SQL queries:

In [None]:
my_hadoop_query = """SELECT *
    FROM if_optionsmetrics.europe_opprc op
    LEFT JOIN if_optionsmetrics.europe_opinf ec
    ON op.securityid = ec.securityid
        AND op.exchange = ec.exchange
        AND op.optionid = ec.optionid
    WHERE op.securityid = 712879
        AND op.tabledate > 20220101
        AND ec.expiration < 20250101
        AND abs(op.delta) BETWEEN .4 AND .6
        AND op.impliedvolatility > 0
    ORDER BY op.tabledate
    """
print(my_hadoop_query)

### 1.1.1 Common Basic String Operations

Manipulating strings is intuitive in Python!

* `+`: Concatenation (`"hel" + "lo"` returns `"hello"`)
* `*` : repetition (`'3' * 3` returns `'333'`)
* `string_name[...]` indexing (`'hello'[0]` returns `'h'`)
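
As a quick check of the three operations above, here is a minimal sketch. The variable names are made up for illustration, and the negative index on the last line is an extra detail not covered in the list.

```python
# Illustrative names only; none of these come from the training material.
greeting = "hel" + "lo"     # concatenation
repeated = "3" * 3          # repetition of the string "3", not arithmetic
first_char = "hello"[0]     # indexing starts at 0
last_char = "hello"[-1]     # negative indices count back from the end

print(greeting, repeated, first_char, last_char)
```

Note that `"3" * 3` repeats text, while `3 * 3` multiplies numbers: the operator's behavior depends on the type it is applied to.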

A convenient feature is the f-string, which stands for **formatted string literal**. The idea is that we want to substitute variables or the results of code inside of our string (i.e. interpolate) without writing code that looks like `"..." + "..." + "..."` (which works, though it can get really hard to read), but rather something that looks like we're filling out a template.

In [None]:
name_of_file = "filename.txt"
fstring = f"/path/to/this/data/{name_of_file}"
regular_string = "/path/to/this/data/" + name_of_file
print(fstring)
print(regular_string)
# Now imagine if we wanted to interpolate in the middle of the string!

var1 = 1
var2 = 3
fstring = f"the sum of var1 and var2 is {var1 + var2}"
print(fstring)
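
f-strings also accept an optional format specification after a colon inside the braces. The sketch below shows two specifiers that tend to come up often; the variable names and values are hypothetical.

```python
rate = 0.123456   # hypothetical value
month = 7         # hypothetical value

rounded = f"rate: {rate:.2f}"     # fixed-point with two decimal places
padded = f"month: {month:02d}"    # integer zero-padded to width 2

print(rounded)
print(padded)
```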

Some common methods on strings:

* `s.replace("!", "?")`: Return a new string where `!` is replaced with `?`.

* `s.lower()`: Return a new string, which is the lowercase version of `s`. Example: `"HELLO".lower() == "hello"`

* `s.strip()`: Return a new string, where leading and trailing whitespace have been trimmed off. Example: `"  hello! ".strip() == "hello!"`

* `s.zfill(n)`: Return a new string, zero-padded on the left to a total width of `n` characters. For example, `"1".zfill(2) == "01"`. This can be useful for accessing files named after dates.
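
The methods above can be chained, since each one returns a new string. A small sketch, assuming a made-up messy input and a hypothetical date-based file naming pattern:

```python
raw = "  Monthly REPORT  "   # hypothetical messy input

# Each method returns a new string, so calls can be chained left to right
cleaned = raw.strip().lower()
print(cleaned)

# Hypothetical file naming pattern: data_<year><two-digit month>.csv
month = "7"
file_name = f"data_2022{month.zfill(2)}.csv"
print(file_name)
```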

Let's test a replacement:

In [None]:
s = "Hi!"
print(s.replace("!", "?"))
print(s)

Notice that, after calling the `replace` method on `s`, the change was "reverted" (more accurately, it was never stored back into `s`!). The `replace` method actually returns a **new copy** of the string, with the modifications made. That is, it does not overwrite `s`, but returns an entirely new string.

In [None]:
s_replaced = s.replace("!", "?")
print(s_replaced)

## 1.2 Integers and Floats

In [None]:
print(f"Addition: {2 + 3}")

print(f"Subtraction: {5.5 - 3.3}")

print(f"Multiplication: {2 * 3}")

print(f"Division: {6 / 3}")

print(f"Floor Division: {7 // 3}") # Divides, then rounds down to the nearest whole number

print(f"Exponents: {2 ** 3}")

print(f"Modulus: {7 % 3}") # Returns the remainder of 7 / 3

print(f"Scientific Notation: {1e-4}")

print(f"Absolute Value: {abs(-0.4827)}")
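
Floor division and modulus are two halves of the same operation: for integers `a` and `b`, `a == (a // b) * b + a % b`. The sketch below uses arbitrary numbers to show this, and also shows that floor division rounds toward negative infinity rather than toward zero.

```python
a, b = 7, 3

# divmod is a built-in that returns (a // b, a % b) in one call
quotient, remainder = divmod(a, b)
print(quotient, remainder)
print(a == quotient * b + remainder)

# With a negative number, the quotient is rounded down, not toward zero
neg_quotient = -7 // 3
neg_remainder = -7 % 3
print(neg_quotient, neg_remainder)
```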

## 1.3 Boolean/Conditionals

Boolean variables work very similarly to how they do in R (and most other programming languages).

Conditional statements based on these types work as you would expect, but instead of enclosing the logic in curly braces `{ }`, you simply indent the line (read: **indentation matters in Python**).

In [None]:
x = True
y = False

# Example 1
if x:
    print('x is True.')

# Example 2
if x:
    print('x is True, not False.')
else:
    print('x is False, not True.')

# Example 3
if x and y:
    print('x and y is True.')
else:
    print('x and y is False.')

# Example 4
if x or y:
    print('x or y is True.')
else:
    print('Both x and y are False.')

# Example 5
if not y:
    print("not y is True, therefore y is False")

In table form:

| x     | y     | x and y | x or y |
|-------|-------|---------|--------|
| True  | True  | True    | True   |
| True  | False | False   | True   |
| False | True  | False   | True   |
| False | False | False   | False  |

The `not` operator reverses the value of a boolean variable: `not True` is `False`, and `not False` is `True`.
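
The truth table can also be generated in code as a sanity check. This is a minimal sketch; it uses `for` loops, which are covered in section 2, and the `rows` list is an illustrative name.

```python
rows = []
for x in (True, False):
    for y in (True, False):
        # Columns: x, y, x and y, x or y, not x
        rows.append((x, y, x and y, x or y, not x))

for row in rows:
    print(row)
```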


We also have `elif`, which stands for `else if`. As soon as one of these conditions triggers, that will break you out of the entire `if/else` block. Try changing the `elif` to `if` in the cell below. Can you see what happens, and why?

In [None]:
grade = 80.8

if grade >= 90:
    letter = "A"
elif grade >= 80:
    letter = "B"
elif grade >= 70:
    letter = "C"
else:
    letter = "F"

print(f"Grade {grade} translates to letter grade {letter}.")
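
To see why `elif` matters, the sketch below writes similar logic twice: once as a chain, and once as independent `if` statements. The variable names `letter_elif` and `letter_if` are made up for illustration. With separate `if`s every condition is checked, so a later match can overwrite an earlier one.

```python
grade = 95

# Chained version: stops at the first condition that is True
if grade >= 90:
    letter_elif = "A"
elif grade >= 80:
    letter_elif = "B"
else:
    letter_elif = "F"

# Independent ifs: every condition is checked in turn
letter_if = "F"
if grade >= 90:
    letter_if = "A"
if grade >= 80:
    letter_if = "B"   # 95 >= 80 is also True, so this overwrites "A"

print(letter_elif, letter_if)
```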

## 1.4 Lists

Lists are one of the most important objects in Python, similar to what an array would be in other languages. They hold a collection of other objects.

Indexing starts at 0, so the last element of a list of `n` elements is always at index `n-1`.

In [None]:
example_list = ["Justin", "John", "Jerry"]

print(f"Size of List: {len(example_list)}")
print(f"Second element of List: {example_list[1]}")

In [None]:
# Appending to a list
print("------------ First way to Append ------------")
print(example_list + ["Julia"]) # This returns a new list -- it does not modify the existing list!

example_list.append("Julia") # This method modifies the existing list.
print("------------ Second way to Append ------------")
print(example_list)

Mutability is a property that means an object can be altered in place. Most "atomic" Python types (`int`, `float`, `str`, ...) are immutable. So, when you reassign a variable (declaring `x = 5` and later declaring `x = 3`), the original value is not modified; the name `x` is simply bound to a different object, which is generally stored at a different location in memory.

Lists, however, are mutable. This has to do with how they are stored under the hood, but generally remember: if you apply a method on a list, it may (and usually will) change the list, **even if you don't reassign anything**.

In [None]:
# Mutability
example = [0, 1, 2]
print(f"Before applying methods: {example}")
example[1] = 3 # Changes second element to 3
example.append(4) # Does not return anything; changes example object to have a 4th element = 4

print(f"After applying methods: {example}")

In [None]:
# Mutability part 2
x = [0, 1, 2]
y = x # y points to the same place in memory as x
x.append(3)

# Appending to x appended to both!
print(f"x: {x}")
print(f"y: {y}")
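
If you want a separate list rather than a second name for the same list, one option is the list's `copy()` method. The sketch below contrasts an alias with a copy, and then shows that an immutable string behaves differently. All variable names here are illustrative.

```python
x = [0, 1, 2]
alias = x                # a second name for the same list object
independent = x.copy()   # a new list with the same elements

x.append(3)
print(alias)             # changed along with x
print(independent)       # unaffected by the append
print(alias is x, independent is x)

# Strings are immutable, so "changing" one binds the name to a new string
s = "hi"
t = s
s += "!"
print(s, t)
```

Note that `copy()` makes a shallow copy: the new list is separate, but any mutable objects inside it are still shared.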

In [None]:
# Slicing
print(f"Slice of a List: {example_list[0:2]}")
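
A slice `my_list[start:stop]` includes the element at `start` and excludes the one at `stop`. A short sketch with a made-up list, including the forms where one end is omitted:

```python
letters = ["a", "b", "c", "d", "e"]   # illustrative list

middle = letters[1:3]       # indices 1 and 2; index 3 is excluded
first_two = letters[:2]     # omitting start means "from the beginning"
from_third = letters[2:]    # omitting stop means "to the end"
last_two = letters[-2:]     # negative indices count back from the end

print(middle, first_two, from_third, last_two)
```

Slicing returns a new list, so `letters` itself is unchanged.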

In [None]:
# Combining Lists
first_list = [1,2,3]
second_list = [5,6,7]
print(first_list + second_list)

In [None]:
# Check if an element is in a list
my_list = [1, 3, 4, 10]
print(3 in my_list)
print(7 in my_list)

In [None]:
# Get the length of a list
my_list = [1, 5, 7, 10, 11]
print(f"Length of my_list: {len(my_list)}")

## 1.5 Dictionaries

A dictionary is a collection of key-value pairs (known more broadly as a hash table or hash map). Dictionaries are asymptotically very efficient as lookup tables, and they act similarly to **lists** in R. Note that dictionaries are not sorted by key or value in any way.

> Since Python 3.7, regular dictionaries preserve the order in which keys were inserted. The [collections](https://docs.python.org/3/library/collections.html) module additionally provides `OrderedDict`, which offers extra order-related functionality.

In [None]:
simple_dict = {
    "key" : "value",
    "key2" : "value2",
    "key3" : "value3"
}

# Get the value for a specific key
print(simple_dict['key2'])

# Add a new key-value pair to the dictionary
simple_dict["new_key"] = "new_value"
print(simple_dict)

In [None]:
# Print all Keys
print(simple_dict.keys())

# Print all Values
print(simple_dict.values())
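
Keys and values can also be visited together with the `.items()` method. The sketch below uses a `for` loop (covered in section 2) and a separate, made-up dictionary so it can be run on its own.

```python
lookup = {"key": "value", "key2": "value2"}   # illustrative dictionary
lookup["new_key"] = "new_value"

# .items() yields (key, value) pairs
pairs = []
for key, value in lookup.items():
    pairs.append(f"{key} -> {value}")
    print(f"{key} -> {value}")

# The `in` operator checks keys, not values
print("key2" in lookup)
print("value2" in lookup)
```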

**Exercise:** Use the following `params` dictionary to print a string in the format specified below. 

a. Compute the following in python:
$$
    \frac{(alpha * beta + gamma)^2}{discount\_rate}
$$

b. Print out the computed value in a string with the format `"Example Calculation = Value from a."`

In [None]:
"""
Write answer to exercise in this cell block
"""

params = {
    "alpha" : 1,
    "beta" : 2.55,
    "gamma" : 0.3,
    "discount_rate" : 0.95
}

# part a

# part b

A common pattern you may see is, rather than accessing an element with `d["key"]`, you will see it accessed as `d.get("key", some_default_value)`. The `.get` method works the same way as accessing an item with `d["key"]`, but returns a specified default value if the item doesn't exist. Usually, this is `None`, but other defaults can be useful, for example, in applications where you need to translate specific mnemonics between data sources.

In [None]:
# Try changing this to "alpha". Also try changing the default value!
test_value = params.get("xi", None)
print(test_value)
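
As a sketch of the mnemonic-translation use case mentioned above, the example below maps codes from one hypothetical data source to another. The mapping and the `"UNKNOWN"` default are invented for illustration.

```python
# Hypothetical mapping between two data sources' country codes
code_map = {"us": "USA", "ca": "CAN"}

known = code_map.get("us", "UNKNOWN")      # key exists, default is ignored
missing = code_map.get("eu", "UNKNOWN")    # key is absent, default is returned
no_default = code_map.get("eu")            # with no default given, returns None
passthrough = code_map.get("eu", "eu")     # fall back to the original code

print(known, missing, no_default, passthrough)
```

By contrast, `code_map["eu"]` would raise a `KeyError`, which is why `.get` is convenient when missing keys are expected.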

## 1.6 Tuples

As an honorable mention, the tuple is an immutable version of a list. Tuples don't come up as often as lists, but they are used as the return object for many functions. Bonus points: they can be hashed! (Don't worry if you don't know what this means.) We won't cover them in detail, but they do exist!

In [None]:
# Create a tuple using the () literal
test_tuple = (1,2,3)
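
A brief sketch of the three properties mentioned above: tuples can be unpacked, they come back from functions, and they can serve as dictionary keys because they are hashable. The `prices` dictionary and its values are hypothetical.

```python
point = (1, 2, 3)
x, y, z = point                      # unpacking into separate variables
print(x, y, z)

# divmod is a built-in function that returns a tuple
quotient, remainder = divmod(7, 3)
print(quotient, remainder)

# Tuples can be dictionary keys (hypothetical data)
prices = {("us", 2022): 1.5, ("eu", 2022): 2.5}
print(prices[("us", 2022)])

# point[0] = 5 would raise a TypeError, because tuples are immutable
```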

# 2 Loops

## 2.1 For

Python has the expected loops (for, while) as a means for iteration.

In [None]:
# Access elements in a list
cc_list = ["us", "eu", "ca"]
for i in range(0,3): # Remember that the second number is not included in the range.
    print(cc_list[i])

In [None]:
# Another way to access elements in a list
cc_list = ["us", "eu", "ca"]
for i in cc_list:
    print(i)

In [None]:
# If you wanted to access the list elements and indices
cc_list = ["us", "eu", "ca"]
for index, i in enumerate(cc_list):
    print(f"index: {index}, country: {i}")

In [None]:
country_list = ["United States", "United Kingdom", "Spain"]
gini_list = [1.05,2.10,3.30]

for x, y in zip(country_list, gini_list):
    print(f"Country: {x}, Gini Index: {y}")

In [None]:
# Sometimes it's helpful to just iterate over the length of a list. Here's how to do that.
# Same as the first example, without hard-coding the length of the list
names = ["John", "Jane", "Jacob"]
for i in range(len(names)):
    print(f"{names[i]} is at index {i}.")

## 2.2 While

A `while` loop will iterate until its condition is no longer `True`. Make sure you have a stopping condition! You can create infinite loops if you don't.

In [None]:
# Simple Example
name_list = ["Alice", "David", "George"]

index = 0

while index < len(name_list):
    print(name_list[index])
    index += 1

In [None]:
number = 104
while True:
    print(f"Current Number: {number}")
    # Recall from 1.2 that % is the modulus operator
    if number % 3 == 0:
        break # Terminates the loop
    else:
        # Recall from 1.2 that this is floor division
        number //= 3

**Exercise:**

Use the following `value_dict` dictionary and parameters to solve a toy optimization problem. Using `parameter_1` and `parameter_2`, update your guess for `parameter_2` until `parameter_1 ** parameter_2 == act_val` by iterating until you find a value of `parameter_2` with an acceptable error (tolerance). Follow the steps below to accomplish this:

a. Create two variables called `param_1` and `param_2` to store the values associated with the keys `parameter_1` and `parameter_2`, respectively.

b. Initialize the error and iteration count:
* Set error to a value greater than tolerance (e.g., error = 100).
* Set count to 0.

c. Follow these steps to find the optimal value:
1. While the error is greater than the tolerance and the iteration count is less than `max_iterations`, repeat the steps below.
2. Calculate the estimated value: `estim_val = param_1 ** param_2`. 
3. Find the error which is the absolute difference between the estimated and actual value.
4. Update `param_2` by adding 0.01.
5. Increment `count` by 1
6. If `count` is divisible by 50 fill in the #### and print out the following statement: 
    * "Iteration: ####, Error: ####"

In [None]:
"""
Write answer to exercise in this cell block
"""

value_dict = {
    "act_val" : 1000,
    "parameter_1" : 1000,
    "parameter_2" : 0.01
}

tolerance = 1e-10
max_iterations = 10000

In [None]:
"""
SOLUTION BELOW
"""

In [None]:
value_dict = {
    "act_val" : 1000,
    "parameter_1" : 1000,
    "parameter_2" : 0.01
}

tolerance = 1e-10
max_iterations = 10000

error = 100
iter_count = 0
param_1 = value_dict["parameter_1"]
param_2 = value_dict["parameter_2"]

while error > tolerance and iter_count < max_iterations:
    estim_val = param_1 ** param_2
    error = abs(estim_val - value_dict["act_val"])
    param_2 += 0.01
    iter_count += 1
    if iter_count % 50 == 0:
        print(f"Iteration: {iter_count}, Error: {error}")

print(f"Final Error: {error}, Parameter 2 Value: {param_2}")

# 3 Functions, Methods, and Scope

Functions are the same in Python as they are in other languages. That is, they are reusable bits of code that take input(s), operate on them, and `return` an output. Think of a function in math: $f(x) = x^2$ is a function named $f$ that takes an input $x$ and returns an output $x^2$. Similar to math, a function can have multiple inputs (and even return multiple things, though we won't cover that here). We often call inputs **arguments** to that function, or `args` for short.

The distinction between a **function** and a **method** is simple: a **method** is attached to a specific **data type**, often called with `object_name.method(input1, input2, ...)`. Think back to the `replace()` method on strings and `append()` on lists. We won't cover how to create your own types with attached methods, but understanding the distinction between functions and methods is important to understanding how Python works.

**Functions**, on the other hand, are called with the syntax `function_name(input1, input2, ...)`. Notice the lack of the `.` syntax!
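
The difference in call syntax can be seen with built-ins alone. In this sketch (the list contents are illustrative), `len` and `sorted` are functions that receive the list as an argument, while `sort` and `replace` are methods called on an object.

```python
names = ["Jerry", "Justin", "John"]   # illustrative list

# Functions: the object is passed in as an argument
count = len(names)
sorted_copy = sorted(names)    # returns a new sorted list; names is unchanged
print(count, sorted_copy, names)

# Methods: called on the object with dot syntax
names.sort()                   # sorts the list in place and returns None
replaced = "Hi!".replace("!", "?")
print(names, replaced)
```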

## 3.1 Methods

Methods are functions that are defined on a class. All data types in Python are also called classes. Classes consist of attributes and methods.

* **Attributes** are values assigned to a specific instance of a class
    - Value for `int` might be `1`.
    - Length of `list` might be `15`.

* **Methods** are functions defined on a class that operate on an instance's value and provide other functionality specific to that type
    - Instances of the `str` type have many methods that perform tasks such as filtering, find and replace, and truncating leading/trailing characters -- many of which we covered earlier

#### Example: Attributes and Methods of a Dog

If we think of a dog:


The attributes would be things that describe the dog.

* name
* breed
* fur color

The methods act like verbs, things that the dog can do on its own.

* run
* jump
* sleep

This becomes clearer the more you use them. Remember, in Python, **methods** are called by `object_name.method()`, as opposed to functions, which are covered next.
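
Purely as an illustration of the analogy, and not something this training covers in detail, the dog could be sketched as a class like the one below. The `Dog` class, its attribute names, and the example values are all hypothetical.

```python
class Dog:
    """Hypothetical class used only to illustrate the analogy above."""

    def __init__(self, name, breed, fur_color):
        # Attributes: values that describe this particular dog
        self.name = name
        self.breed = breed
        self.fur_color = fur_color

    def run(self):
        # Method: something the dog can do
        return f"{self.name} is running."


rex = Dog("Rex", "Labrador", "yellow")
print(rex.name)    # attribute access: no parentheses
print(rex.run())   # method call: parentheses
```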

## 3.2 Functions

Functions work just like methods, but generally don't need to be associated with a class. Define a function with:

```python
def function_name(argument1, argument2, ...):
    """
    A docstring, which is a description in text of what the function does
    """
    # do some operations
    result = some_operation(argument1, argument2, ...)
    return result
```

In [None]:
def my_func(x, y):
    if x > y:
        return "x is greater than y"
    elif y > x:
        return "y is greater than x"
    else:
        return "y is equal to x"

# Arguments will take the same order as the function definition by default
# These are called positional arguments
print(my_func(1, 3))

# We can also pass each argument by name; these are called keyword arguments
# In Python, we often call these kwargs for short
print(my_func(x=2, y=1))

# When using keyword arguments, order doesn't matter
print(my_func(y=2, x=1))

In [None]:
# But be careful! Specifying a keyword argument before a positional argument raises a SyntaxError
print(my_func(y=2, 1))

In [None]:
# Positional arguments should always go before keyword arguments.
print(my_func(1, y=2))

Function arguments can also have default values:

```python
def my_function(arg1, arg2 = 10, arg3 = "Hi mom!"):
    pass
```

In [None]:
def add_numbers(a, b = 10):
    """Adds two integers, a and b."""
    return a + b

print(add_numbers(5, 5))
print(add_numbers(5, 10))
print(add_numbers(5))

Python is a dynamically typed language, which means that functions can be defined to work on any type. This is nice for developing quickly, since you often don't need to think about what type things are and encode that into your functions. However, you do have the option of providing **type annotations** (also called **type hints**) in your function definitions, which can make your code a lot easier to read and collaborate on:

In [None]:
# The type hint reads: this function takes 2 integers, with 1 default, and returns another integer
def add_numbers(a: int, b: int = 10) -> int:
    """Adds two integers, a and b."""
    return a + b

print(add_numbers(5,5))

Some languages will enforce these argument and return types. Without any external tools, this is purely cosmetic in Python, but it helps communicate exactly what the function is trying to do, allows IDEs to provide better autocomplete suggestions, and often even lets them catch errors before you run your code!
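
To make "purely cosmetic" concrete, the sketch below calls the annotated function with strings. Python itself does not check the hints at runtime, so the call succeeds and `+` concatenates the strings. A static type checker such as mypy is one kind of external tool that would flag this call.

```python
def add_numbers(a: int, b: int = 10) -> int:
    """Adds two integers, a and b."""
    return a + b

# The hints say int, but nothing stops us from passing strings
result = add_numbers("5", "5")
print(result)

# The hints are stored on the function, where tools can read them
print(add_numbers.__annotations__)
```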

**Exercise:**
Create a function `sum_list` that takes a list of numbers as input and returns the sum of those numbers. Try writing it using a `for` loop. Bonus points: add type annotations!

Run this function on `num_list` and verify that the output is 272.

In [None]:
"""
Write answer to exercise in this cell block
"""
num_list = [1,2,9,40,220]

## 3.3 Scope

Scope is the region of a program where a variable's binding to a value is active. A variable with _global_ scope is a variable whose binding is active in all lower scopes. Lower scopes usually include the scope within your user-defined functions.

> It's good practice to treat global variables as constants, i.e. variables that never change throughout the execution of your code, even though you can alter the value of these global variables in lower scopes. It is common practice to define global variables using SCREAMING_SNAKE_CASE syntax at the top of your script.

A good example of using a global variable as a constant would be the main directory (`MAIN_DIR`) of a project in a python script which should never change, and can thus be accessed in your functions, instead of having to pass it in as a formal parameter.

In [None]:
MAIN_DIR = "/my/main/dir"

def print_dir(sub_dir):
    print(f"{MAIN_DIR}/{sub_dir}")

print_dir("sub_dir")

If the same variable name is bound in two or more nested scopes, the innermost binding takes precedence and _masks_ the outer ones:

In [None]:
main_dir = "/my/main/dir1"

def print_dir(sub_dir):
    """ The local main_dir masks the global one above"""
    main_dir = "/my/main/dir2"
    print(f"{main_dir}/{sub_dir}")

print_dir("sub_dir")

But since you should be using SCREAMING_SNAKE_CASE for all of your global variables, this type of masking shouldn't really occur.
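
Masking also means that a plain assignment inside a function does not change the global variable. A minimal sketch, with a made-up function name `try_to_change`:

```python
MAIN_DIR = "/my/main/dir"

def try_to_change():
    """Assigning here creates a new local variable; the global is untouched."""
    MAIN_DIR = "/somewhere/else"
    return MAIN_DIR

local_value = try_to_change()
print(local_value)
print(MAIN_DIR)   # still the original value
```

Rebinding a global from inside a function requires the `global` statement, which fits poorly with treating globals as constants and is usually best avoided.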