# Deliverables
Before turning in the tutorial, do the following to make sure your code works properly:
1. Uncomment the `assert` statements at the end of the exercises.
2. Restart the kernel (Kernel > Restart & Run All).
3. Make sure that you pass the `assert` statements at the end of each exercise.

As usual, turn in the tutorial via GitHub!

# Introduction
Welcome to the Intro to Python tutorial! We'll be going over the nuts and bolts of the language, as well as some parts of it that are especially useful for scientific programming. If you're already familiar with Python, you can skip the basics. Either way, follow along in this Jupyter notebook!

To run code in the Jupyter Notebook, click the Run button in the toolbar at the top, or press Ctrl+Enter (Windows) or Cmd+Enter (Mac). This runs the entire cell at once, not one line at a time.


# The Basics
Python is much more similar to a "normal" programming language (e.g. C++, Java) than R is, so if you've gone through the core CS courses, you'll probably be a lot more comfortable with Python. Here's a quick run-through of the syntax used to implement "standard" programming procedures. In general, Python tries to make things as easy as possible: if something looks like it should work, it probably will.

## Unique Python Syntax: Indentation
In the other programming languages you've used before, such as R, you defined code blocks using curly braces. Python is different: it uses **indentation** to mark where a code block begins and ends. You'll see this in the looping, control flow, and function parts of the guide. As a side effect, Python forces you to keep your code neatly formatted and readable.
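Here's a minimal sketch of indentation at work (the variable names are just for illustration): the indented line belongs to the `for` loop and runs on every pass, while the unindented line after it runs only once, after the loop finishes.

```python
inside = []
after = []

for n in range(3):
    inside.append(n)   # indented: part of the loop body, runs 3 times
after.append("done")   # not indented: outside the loop, runs once

print(inside)  # [0, 1, 2]
print(after)   # ['done']
```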



## Defining Variables
There's nothing particularly special about defining a variable; just use the equals sign. Like R (and unlike C++), you don't declare a variable's type: the type comes from whatever value you assign to it.

In [1]:
x = 3
sentence = "hello world!"

x

3
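Because types aren't declared, the same name can later hold a value of a different type. A quick way to check what a variable currently holds is the built-in `type()` function:

```python
x = 3
print(type(x))      # <class 'int'>

x = "hello world!"  # rebinding x to a string is allowed
print(type(x))      # <class 'str'>
```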

## Accessing and Modifying Variables
In Python, we have an abbreviated way to modify variables. For example, if we want to add `2` to a variable, we have two options:

In [2]:
x = 3
x = x + 2
print("x is", x)

y = 3
y += 2
print("y is", y)

x is 5
y is 5


We have equivalent shorthands for subtraction (`-=`), multiplication (`*=`), division (`/=`), and so on. For numbers, there's no real difference between writing `x = x / y` and `x /= y`; the short form just saves typing, which helps more and more as variable names get longer.
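The same pattern works for the other arithmetic operators. A short sketch with floor division, exponentiation, and remainder; note also that `/=` always produces a float, even when the division comes out even:

```python
n = 17
n //= 5   # floor division: n = n // 5  -> 3
n **= 2   # exponent:       n = n ** 2  -> 9
n %= 4    # remainder:      n = n % 4   -> 1
print(n)  # 1

m = 6
m /= 3    # true division always gives a float
print(m)  # 2.0
```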

**Exercise.** Below each comment, write the short version of the assignment it describes. The first one is done for you.

In [3]:
x = 4
y = 2

# 1. y = y / x (example is filled in below)
y /= x

# 2. y = y * 3
y *= 3

# 3. x = x - y
x -= y

print(x,y)

assert((x, y) == (2.5, 1.5))

2.5 1.5



## Array Data Structures

In Python, you will have to worry less about vectorized operations (though they're still much faster; more on that in the scientific programming section). There are four main built-in collection types in base Python: lists, tuples, sets, and dictionaries.
* `lists` are your standard array data structure in Python (being ordered and changeable). **These are the ones that you'll use the most**. You declare them using square brackets (`my_list = [1, 2, 3]`).
* `tuples` are like `lists`, but they are **unchangeable** (immutable). They can be slightly more efficient than lists, but in most cases you can just use `lists`. You declare them using parentheses (`my_tuple = (1, 2, 3)`).
* `sets` are **unordered** collections that can't contain duplicates. You can add and remove elements, but because there's no order, you can't access an element by its index. Again, you can probably ignore sets and just use lists most of the time. You declare them using curly brackets (`my_set = {1, 2, 3}`).
* `dictionaries` are an associative data structure that stores `key`-`value` pairs (if you've taken 104, they're `maps`/hash tables). Essentially, the keys form a `set`, where each key is paired with some value. As such, dictionaries are also defined using curly brackets (`my_dict = {1:"one", 2:"two", 3:"three"}`). You're most likely to use them to associate some input with an output, which makes them very useful.
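A small sketch contrasting the four collections (the values are arbitrary): lists can be changed in place, tuples raise an error if you try, sets silently drop duplicates, and dictionaries look values up by key.

```python
my_list = [1, 2, 3]
my_list[0] = 10               # lists are changeable
print(my_list)                # [10, 2, 3]

my_tuple = (1, 2, 3)
try:
    my_tuple[0] = 10          # tuples are not: this raises a TypeError
except TypeError as err:
    print("tuple error:", err)

my_set = {3, 1, 3, 2, 1}      # duplicates are removed automatically
my_set.add(4)                 # sets can still grow
print(2 in my_set)            # True: checks membership

my_dict = {1: "one", 2: "two", 3: "three"}
print(my_dict[2])             # two: the value stored under key 2
my_dict[4] = "four"           # add a new key/value pair
```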

### A Foreword: Objects
We've managed to avoid talking about objects in the context of R, but Python requires us to understand what objects are (because **literally everything** in Python is an object, including packages and functions). Very abstractly, an `object` is a value of some specially defined data type, and it has the following two kinds of attributes (i.e. it stores the following information):
* Data attributes: these store variables.
* Methods: these are functions.

To access a data attribute, use `object_name.attribute` (note the lack of parentheses). To call a function that belongs to an object (a method), use `object_name.function()` (note the parentheses). We'll see this soon when working with lists, which are, in fact, examples of objects.
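Lists aren't the only objects with attributes and methods. As one illustration, Python's built-in complex numbers expose data attributes like `.real`, and strings have methods like `.upper()`:

```python
z = 3 + 4j              # a complex number is an object
print(z.real)           # data attribute, no parentheses: 3.0
print(z.conjugate())    # method call, with parentheses: (3-4j)

word = "python"
print(word.upper())     # PYTHON
```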


### Accessing Values
To access the element at a specific index in a list, use square brackets (for the `first_list` defined below, `first_list[1]` is `1`). *Note that Python is 0-indexed, like most other languages and unlike R.* This means that `first_list[1]` is the *second* element in the list.

To get a range of values, use the colon (`:`). Note that the range is inclusive of the first index, but not of the second (as is standard in Python).

In [4]:
first_list = [0, 1, 2, 3, 4, 5, 6]
first_list[1:3] # note that 3 is NOT included

[1, 2]

If you leave out an index, Python goes all the way to the beginning or end of the list, depending on which index you omit (and if you omit both, you get the entire list).

In [5]:
print(first_list[1:])
print(first_list[:3])
print(first_list[:])

[1, 2, 3, 4, 5, 6]
[0, 1, 2]
[0, 1, 2, 3, 4, 5, 6]
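Because the end index is excluded, a slice `[a:b]` has `b - a` elements (when both indices are within the list), and splitting a list at any index `k` into `[:k]` and `[k:]` gives back the whole list with nothing repeated:

```python
first_list = [0, 1, 2, 3, 4, 5, 6]

a, b = 2, 5
print(len(first_list[a:b]))   # 3, which is b - a

k = 3
rejoined = first_list[:k] + first_list[k:]   # + on lists concatenates them
print(rejoined == first_list)                # True
```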


You can also give Python negative indices: index `-i` retrieves the $i^{th}$-to-last element. Contrast this with R, where a negative index excludes that element.

In [6]:
print(first_list[-2]) #  second last
print(first_list[:-1]) # everything except the last element (stops before index -1)
print(first_list[-2:]) # start with second-last to the end

5
[0, 1, 2, 3, 4, 5]
[5, 6]
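One way to think about a negative index: `-i` is shorthand for `len(list) - i`. A quick check:

```python
first_list = [0, 1, 2, 3, 4, 5, 6]
n = len(first_list)

print(first_list[-1], first_list[n - 1])   # 6 6
print(first_list[-2], first_list[n - 2])   # 5 5
```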


**Exercise**. Predict the output of the following print statements:

In [7]:
first_list = [0, 1, 2, 3, 4, 5, 6]
print(first_list[3:5])
# prediction [3, 4]
print(first_list[2])
# prediction 2
print(first_list[0])
# prediction 0

[3, 4]
2
0


**Exercise**. Get the third, fourth, and fifth elements of `first_list`, and store them in `short_list`. Fill in the ellipses!

In [8]:
first_list = [0, 1, 2, 3, 4, 5, 6]
short_list = first_list[2:5]
# print (short_list)
assert(short_list == [2,3,4])

## Useful List Functions
To get the length of a list, use the `len()` function. For example, `len(first_list)` would be 7.

We can also modify a list element the same way we access it. The following code adds 3 to the first element and subtracts 3 from the last element.

In [9]:
first_list = [0, 1, 2, 3, 4, 5, 6]
first_list[0] += 3
first_list[-1] -= 3

print(first_list)

[3, 1, 2, 3, 4, 5, 3]


**Exercise**: double the third value in `first_list`, then print the length of `first_list`. 

In [10]:
first_list = [0, 1, 2, 3, 4, 5, 6]
# double the third value in first_list
first_list[2] *= 2 

# print the length of first_list
print(len(first_list))
print(first_list)

assert(first_list == [0, 1, 4, 3, 4, 5, 6])

7
[0, 1, 4, 3, 4, 5, 6]


The other method you're likely to use on a list is `append()`, which adds an element to the end of the list. This is an example of calling a function *from* an object: `first_list.append(7)` adds `7` to the end of the list. This is how most functions in Python are used; `len()` is somewhat of an exception.

In [11]:
first_list.append(7) # notice the object.function() syntax
first_list

[0, 1, 4, 3, 4, 5, 6, 7]
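Lists have several other methods that use the same `object.method()` syntax. A few common ones, sketched on a throwaway list:

```python
my_list = [3, 1, 2]
my_list.insert(0, 9)   # insert 9 at index 0 -> [9, 3, 1, 2]
last = my_list.pop()   # remove and return the last element -> 2
my_list.sort()         # sort in place -> [1, 3, 9]
print(my_list, last)   # [1, 3, 9] 2
```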

**Exercise:** Starting with an empty list, `list_ex`, use `append()` to add 9, 8, 7 to the back, in that order. Then print `list_ex`.

In [12]:
list_ex = []
list_ex.append(9)
list_ex.append(8)
list_ex.append(7)
print(list_ex)

assert(list_ex == [9,8,7])

[9, 8, 7]


## Control Flow
Much like R, you have `if` statements, but there are three main differences:
1. Rather than using curly brackets, you have colons and indentation.
2. You don't need parentheses around the `if` condition.
3. Instead of `else if`, you have the abbreviated `elif`.

The following example should illustrate the differences:

In [13]:
x = -10

if x > 0:
    print('x is positive!')
elif x == 0:
    print('x is 0!')
else:
    print('x is negative!')

x is negative!
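Conditions are combined with the words `and`, `or`, and `not` rather than R's `&&`, `||`, and `!`. Python also allows chained comparisons. A small sketch:

```python
x = 5

if x > 0 and x < 10:     # R: x > 0 && x < 10
    print("x is between 0 and 10")

if 0 < x < 10:           # chained comparison, same test as above
    print("still between 0 and 10")

if not x == 0:           # R: !(x == 0)
    print("x is nonzero")

if x < 0 or x > 3:       # R: x < 0 || x > 3
    print("x is negative or greater than 3")
```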


## Looping
We have `for` and `while` loops again, which look very similar to the ones you briefly encountered in R.

### For Loops
In Python, as in R, all `for` loops are "for-each" loops, meaning they step through the elements of a collection such as a list. For example, the following chunk of code prints each element in `a_new_list` on a separate line. Note that, as with `if` statements, you don't need parentheses around the `for` condition:

In [14]:
a_new_list = [1, 'fish', 2, 'fish']
for x in a_new_list:
    print(x)

1
fish
2
fish


To repeat a task some number of times, use the `range()` function:

In [15]:
# this loop will print 10 times
for i in range(10):
    print("looping")

looping
looping
looping
looping
looping
looping
looping
looping
looping
looping


In fact, the `range(n)` function generates numbers from 0 to $n-1$ (so `range(10)` is actually 0, 1,..., 9). We can show this below:

In [16]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


A common use of `range()` is to fill some list with numbers. For example:

In [17]:
a_list = []
for i in range(10):
    a_list.append(i)
print(a_list)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
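`range()` also accepts a start value and a step size, as in `range(start, stop, step)`; the stop value is still excluded. Wrapping a range in `list()` shows the numbers it produces (so `list(range(10))` builds the same list as the loop above):

```python
print(list(range(2, 6)))       # [2, 3, 4, 5]: start at 2, stop before 6
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]: count by 3
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]: a negative step counts down
```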


**Exercise**: Fill in the ellipses to calculate the mean of the elements in `nums`.

In [18]:
nums = [1,2,3,4,5,6]
total = 0

# you have two options here; either is ok
for i in nums:
    total += i

mean_value = total / len(nums) # DO NOT fill in 6 (use a function instead)
# print(mean_value)
# assert(total == 3.5) I think this is a typo so I re-wrote the assert
assert(mean_value == 3.5)
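The two options the comment refers to are presumably looping over the elements directly or looping over indices with `range()`. Both are sketched below, along with Python's built-in `sum()`, which does the accumulation in one call:

```python
nums = [1, 2, 3, 4, 5, 6]

# Option 1: loop over the elements themselves
total = 0
for value in nums:
    total += value

# Option 2: loop over the indices
total_by_index = 0
for i in range(len(nums)):
    total_by_index += nums[i]

print(total / len(nums))           # 3.5
print(total_by_index / len(nums))  # 3.5
print(sum(nums) / len(nums))       # 3.5, using the built-in sum()
```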

**Exercise**: Add every element from `a_new_list` onto the end of `num_list` using a for loop. Do *not* use the `range()` function. Refer to the previous examples if necessary!

In [19]:
a_new_list = [1, 'fish', 2, 'fish']
num_list = [0,1,2,3,4,5,6]

## Put your code here
for i in a_new_list:
    num_list.append(i)

# print(num_list)
assert(num_list == [0,1,2,3,4,5,6,1,'fish',2,'fish'])

**Exercise**: Same as before, add every element from `a_new_list` onto the end of `num_list` using a for loop. However, this time, *use* the `range()` function. Hint: for this to work, you'll have to get the length of `a_new_list`.

In [20]:
a_new_list = [1, 'fish', 2, 'fish']
num_list = [0,1,2,3,4,5,6]

## Put your code here
for i in range(len(a_new_list)):
    num_list.append(a_new_list[i])

# print(num_list)
assert(num_list == [0,1,2,3,4,5,6,1,'fish',2,'fish'])

**Exercise**: Given the following list of strings `string_list`, copy all strings that start with the letter "A" into `starts_A_list` using `append()`. Hint: you can get the first letter of a string just by treating it as an array of characters.

In [21]:
# example of string indexing
my_string = "Tree"
print(my_string[0])

T


In [22]:
string_list = ["Apple", "Banana", "Alligator", "Anteater", "Potato", "Water", "Aardvark"]
starts_A_list = []

for i in string_list:
    if i[0]=="A":
        starts_A_list.append(i)

# print(starts_A_list)
assert(starts_A_list == ["Apple", "Alligator", "Anteater", "Aardvark"])
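Strings also have a `startswith()` method that does the same check. One advantage over `word[0]` is that it simply returns `False` for an empty string instead of raising an `IndexError` (an empty string is added to the list below just to show this):

```python
string_list = ["Apple", "Banana", "Alligator", "Anteater", "Potato", "Water", "Aardvark", ""]
starts_A_list = []

for word in string_list:
    if word.startswith("A"):   # safe even for the empty string ""
        starts_A_list.append(word)

print(starts_A_list)   # ['Apple', 'Alligator', 'Anteater', 'Aardvark']
```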

### While Loops
While loops work just like those in R, apart from the Python syntax.

In [23]:
i = 1
while i < 256:
    i *= 2  # note: this is equivalent to writing i = i * 2
    print(i)

2
4
8
16
32
64
128
256


**Exercise**: Given a list of numbers `some_nums`, add the numbers to `total` until `total` becomes greater than 100, then print `total`.

In [24]:
some_nums = [20, -3, 54, 4, -10, 23, 3, 33, 23]
total = 0 # what should total start out as?
i = 0
while total <= 100 and i < len(some_nums):  # also stop if we run out of numbers
    total += some_nums[i]
    i += 1
# print(total)
assert(total == 124)

## Writing Functions
Writing functions in Python uses the `def` keyword. Unlike R, you don't assign the function to a variable; instead, you follow the `def func_name(params):` template. Note that, like R, you don't declare types for the parameters or the return value. Calling a function uses the same syntax as in R and most other languages, as illustrated below:

In [25]:
# defines a function called sum_two_numbers
def sum_two_numbers(a, b):
    return a + b

sum_two_numbers(10, 23)

33
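Since parameters aren't typed, the same function works on any values that support `+`. With the `sum_two_numbers` function above (redefined here so the snippet runs on its own), `+` adds numbers but concatenates strings and lists:

```python
def sum_two_numbers(a, b):
    return a + b

print(sum_two_numbers(10, 23))       # 33
print(sum_two_numbers(1.5, 2))       # 3.5
print(sum_two_numbers("ab", "cd"))   # abcd
print(sum_two_numbers([1], [2, 3]))  # [1, 2, 3]
```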

**Exercise.** Fill out the following code to write a function that doubles every element in a list. (You can't simply write `nums * 2` on a plain Python list: that repeats the list twice instead of doubling each element.)

In [26]:
my_nums = [5,4,3,2,1]

def double_every_number(nums):
    results = list(nums)  # make a copy, so the caller's list isn't modified
    for i in range(len(nums)):
        # store double the original element in the copy
        results[i] = nums[i] * 2
    return results

# print(double_every_number(my_nums))
assert(double_every_number(my_nums) == [10, 8, 6, 4, 2])

**Exercise**. Write a function called `max(nums)`, which returns the largest element in the `nums` list. (Hint: you'll have to use a for loop.)

In [27]:
my_nums = [1,4,2,5,3]

# define and write your function here
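def max(nums):  # note: this name shadows Python's built-in max()
    m = nums[0]  # start from a real element so all-negative lists work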
    for i in nums:
        if i>m:
            m = i
    return m

# print(max(my_nums))
assert(max(my_nums) == 5)
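Starting the running maximum at `0` would give the wrong answer for a list of all-negative numbers, so it's safer to start from the first element. Here's a sketch using the hypothetical name `largest` (so it doesn't shadow Python's built-in `max()`):

```python
def largest(nums):
    """Return the largest element of a non-empty list."""
    m = nums[0]              # start from a real element, not from 0
    for value in nums[1:]:   # compare against the remaining elements
        if value > m:
            m = value
    return m

print(largest([1, 4, 2, 5, 3]))   # 5
print(largest([-5, -2, -9]))      # -2 (starting at 0 would wrongly return 0)
```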

## Print Formatting
Formatted strings are a nice way to make your print statements readable. For example, you might be used to doing something like this to print:

In [28]:
def count_letters(word):
    word_len = len(word)
    print(word, "has", word_len, "letters in it!")
    
count_letters("bananas")

bananas has 7 letters in it!


This works, but it's a little hard to read and annoying to type. Luckily, there's a better way: if you put `f` in front of the string's opening quote (single or double) and write variable names inside curly braces, Python substitutes each variable's value into the string. These are called f-strings. Here's a neater way to write the above function:

In [29]:
def count_letters_2(word):
    word_len = len(word)
    print(f"{word} has {word_len} letters in it!")
    
count_letters_2("bananas")

bananas has 7 letters in it!
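The curly braces in an f-string can hold any expression, not just a variable name, and an optional format spec after a colon controls how the value is displayed. A few examples:

```python
pi_approx = 3.14159
count = 7

print(f"pi is about {pi_approx:.2f}")      # pi is about 3.14 (2 decimal places)
print(f"{count} squared is {count ** 2}")  # 7 squared is 49
print(f"[{count:>4}]")                     # [   7] (right-aligned, width 4)
```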


**Exercise.** Write a function, `print_args(a, b)` that prints `a` and `b` using the string formatting trick. For example, `print_args("red", "blue")` will print `a is red, b is blue`.

In [30]:
# write your function here
def print_args(a, b):
    print(f"a is {a}, b is {b}")

# print_args("red", "blue")

## Extra Looping
Python has some special features that make looping easier. Here are a few that save the most time:

### enumerate
`enumerate` lets you loop through the indices and the elements of a list at the same time, which is helpful if you also need to look up the matching entry in another list. Just remember that the order in the `for` loop is `index`, `object`. For example:

In [31]:
colors = ["red", "yellow", "brown", "orange", "purple"]
fruits = ["apple", "banana", "pear", "orange", "plum"]

# i is index, fruit is object
for i, fruit in enumerate(fruits):
    print(f"{i}: this fruit is a {colors[i]} {fruit}")

0: this fruit is a red apple
1: this fruit is a yellow banana
2: this fruit is a brown pear
3: this fruit is a orange orange
4: this fruit is a purple plum
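`enumerate()` counts from 0 by default; its optional `start` argument lets you begin somewhere else, which is handy for numbered lists meant for people to read:

```python
fruits = ["apple", "banana", "pear", "orange", "plum"]

for number, fruit in enumerate(fruits, start=1):
    print(f"{number}. {fruit}")   # 1. apple, 2. banana, and so on
```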


### zip
`zip` lets you loop through two lists of the same size in parallel, taking one element from each at every step. Its usage is very similar to `enumerate`. For example:

In [32]:
for color, fruit in zip(colors, fruits):
    print(f"{fruit} is {color}-colored!")

apple is red-colored!
banana is yellow-colored!
pear is brown-colored!
orange is orange-colored!
plum is purple-colored!
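One caveat: `zip()` doesn't check that the lists have the same length. It quietly stops at the end of the shortest one, as this sketch shows (Python 3.10 and later also accept `zip(..., strict=True)`, which raises an error on a length mismatch):

```python
colors = ["red", "yellow", "brown"]
fruits = ["apple", "banana"]

pairs = list(zip(colors, fruits))   # each pair is a tuple
print(pairs)   # [('red', 'apple'), ('yellow', 'banana')]; "brown" is dropped
```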


**Exercise**. Write a function, `add_to_list(from_list, to_list)`, to do the following:
1. Check that `from_list` and `to_list` are the same length. Do nothing if they are not the same length.
2. Using `enumerate`, add every element in `from_list` to `to_list`. *Use both the index and the element in `from_list`*.
3. Return `to_list`. (Because lists are mutable, `to_list[i] += ...` actually changes the caller's list as well; returning it simply makes the result easy to use, e.g. in the `assert` below.)

For example, `add_to_list([0,1], [2,3])` should be `[2,4]`.

In [33]:
# Write the add_to_list function here
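def add_to_list(from_list, to_list):
    if len(from_list) == len(to_list):
        # add each element of from_list to the element at the same index in to_list
        for i, element in enumerate(from_list):
            to_list[i] += element
    # if the lengths differ, to_list comes back unchanged ("do nothing")
    return to_list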

# print(add_to_list([0,1,2,3,4], [5,6,7,8,9]))
assert(add_to_list([0,1,2,3,4], [5,6,7,8,9]) == [5,7,9,11,13])
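It's worth seeing how list arguments behave. Because lists are mutable, assigning to `to_list[i]` inside a function changes the caller's list, even if the function never returns it. A sketch with a hypothetical helper, `add_in_place`:

```python
def add_in_place(from_list, to_list):
    # hypothetical helper: same loop as add_to_list, but with no return statement
    for i, element in enumerate(from_list):
        to_list[i] += element

original = [5, 6, 7]
add_in_place([1, 1, 1], original)
print(original)   # [6, 7, 8]: the caller's list was modified
```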

# Scientific Programming (NumPy)
Python on its own doesn't offer much in the way of scientific/statistical programming compared to R, but thankfully it has a huge ecosystem of packages that implement these super-useful functionalities.

## Importing Packages
Importing packages uses the `import` keyword (vs. `library()` in R). Let's import the first package we're going to use, `numpy`. We'll use the `as` keyword to call it `np` to save typing, which is a standard abbreviation.

In [34]:
import numpy as np

Because we imported it as `np`, you have to prefix everything from `numpy` with `np.`. For example, `numpy` includes the constant `pi` and the sine function `sin()` (a `data attribute` and a `function` of the `numpy` object, respectively). Here's how you would find the sine of 90 degrees ($\pi/2$ radians):

In [35]:
np.sin(np.pi/2)
# if we had simply typed import numpy, we would have to type a bit more to get the same result:
# numpy.sin(numpy.pi/2)

1.0
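NumPy's trig functions work in radians. If you'd rather start from degrees, `np.deg2rad()` does the conversion; because of floating-point rounding, it's safest to compare the result with `np.isclose()` rather than `==`:

```python
import numpy as np

angle_deg = 90
angle_rad = np.deg2rad(angle_deg)          # 90 degrees -> pi/2 radians
print(np.sin(angle_rad))                   # very close to 1.0
print(np.isclose(np.sin(angle_rad), 1.0))  # True
```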

**The two main takeaways regarding packages are**:

1. You must `import` them before using them, and
2. You have to put the package name and a period (`np.`) in front of anything that the package contains (in this case, `sin()` and `pi`), since both of these come from the numpy package that we're calling `np`.

You might also see more complicated ways to import packages. For example, a very common way to import the plotting package `matplotlib` is as follows:

In [36]:
import matplotlib.pyplot as plt

This imports only the plotting interface, `pyplot`, which is a submodule of the larger `matplotlib` package, and gives it the short name `plt`.

One last way to import specific functionality from a package is the `from` keyword. This lets you skip putting the package name in front. Again, you typically only want to do this if you need just a few specific functions from a package. For example:

In [37]:
# note that we already imported numpy, so this example is just to show the syntax
# in reality you should just import all of numpy
from numpy import pi
from numpy import sin
# instead of typing np.sin(np.pi/2), we can just write:
sin(pi/2)

1.0
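`from` can also pull in several names at once, separated by commas. The sketch below uses Python's standard `math` module instead of NumPy so it runs on its own; the syntax is the same:

```python
from math import pi, sqrt   # import two names in one line

print(sqrt(16))        # 4.0
print(round(pi, 2))    # 3.14
```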

## Numpy Arrays
While `numpy` has a bunch of useful functions, one particular draw of `numpy` is the array structure it implements, called the `ndarray`. It differs from a normal list in the following ways:

* Fixed size (no appending).
* Its contents must be the same data type.
* **Vectorization!**

First, let's look at a numpy array. You can create one by calling `np.array` and passing it a Python list (which converts the list into an `ndarray`).

In [38]:
nd_nums = np.array([0, 1, 2])
nd_nums

array([0, 1, 2])
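Every array records the single data type it holds in `.dtype` and its dimensions in `.shape`. If you mix integers and floats, NumPy converts everything to one common type (here, floats). The exact name of the integer dtype can vary by platform, e.g. `int64` or `int32`:

```python
import numpy as np

ints = np.array([0, 1, 2])
print(ints.dtype, ints.shape)   # an integer dtype, and shape (3,)

mixed = np.array([1, 2.5])      # the int 1 is converted to the float 1.0
print(mixed.dtype)              # float64
```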

As previously alluded to, these arrays are vectorized, so arithmetic applies to every element at once, just like with vectors in R.

In [39]:
nd_nums = np.array([0, 1, 2])
print(nd_nums + 1)
print(nd_nums * 10)

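# these don't do what you might expect with plain lists; uncomment them and see!
# [0, 1, 2] + 1    # raises a TypeError: you can't add an int to a list
# [0, 1, 2] * 10   # runs, but repeats the list 10 times instead of multiplying each element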

[1 2 3]
[ 0 10 20]
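Vectorized operations also work between two arrays of the same shape, pairing up elements by position. Compare that with `+` on two plain lists, which concatenates them:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)                      # [11 22 33]: element-wise addition
print(a ** 2)                     # [1 4 9]: element-wise power
print([1, 2, 3] + [10, 20, 30])   # [1, 2, 3, 10, 20, 30]: lists concatenate
```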


**Exercise**. We're returning to boolean indexing! Given a numpy array of numbers, fill in the dots to remove all negative numbers:

1. Determine if each element is either positive or negative (treat 0 as positive). Fill in `positive_mask` with this information.
2. Use the mask to remove the negative numbers in `nd_nums`.

In [40]:
nd_nums = np.array([5, -2, -1, 0, 6, 1])

positive_mask = nd_nums>=0
nd_nums_positive = nd_nums[positive_mask]

# print(nd_nums_positive)
assert(np.all(nd_nums_positive == [5, 0, 6, 1]))
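Masks can be combined element-wise with `&` (and) and `|` (or); Python's `and`/`or` keywords don't work element-wise on arrays. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`:

```python
import numpy as np

nd_nums = np.array([5, -2, -1, 0, 6, 1])

between = (nd_nums > 0) & (nd_nums < 6)   # both conditions true
print(nd_nums[between])                   # [5 1]

outside = (nd_nums < 0) | (nd_nums > 5)   # either condition true
print(nd_nums[outside])                   # [-2 -1  6]
```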

**Exercise.** (Removing NA values from a 1D structure) Use `np.isnan()` to remove NA (`np.nan`) values from `nd_nums_with_NAs`. As in R, you would do the following steps:

1. Create a boolean mask using `np.isnan()` to detect which values are NA.
2. Flip the truth values of the mask so that NA values are false and everything else is true. For NumPy boolean arrays, use the `~` symbol in place of R's exclamation mark (`!`).
3. Apply the mask as the index to `nd_nums_with_NAs` and store the result.

See the previous example for the syntax!

In [41]:
nd_nums_with_NAs = np.array([5, -2, np.nan])
nd_nums_with_NAs = nd_nums_with_NAs[~np.isnan(nd_nums_with_NAs)]
# print (nd_nums_with_NAs)
assert(np.all(nd_nums_with_NAs == [5, -2]))
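Why `np.isnan()` instead of `== np.nan`? NaN never compares equal to anything, not even itself. For summaries, NumPy also has NaN-ignoring variants such as `np.nanmean()`, which can sometimes save the filtering step:

```python
import numpy as np

vals = np.array([5, -2, np.nan])

print(np.nan == np.nan)     # False: so == can't be used to find NaNs
print(np.mean(vals))        # nan: a single NaN makes the whole mean NaN
print(np.nanmean(vals))     # 1.5: mean of the non-NaN values only
```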

Another benefit is that arrays come with extra math methods. For example, you can quickly find the mean and variance of the values without having to write those functions yourself (each of these methods also has an equivalent `np.` function).

In [42]:
nd_nums = np.array([5, -2, -1, 0, 6, 1])

print(nd_nums.mean())
print(nd_nums.var())

### NOTE: the following are equivalent to the above, and they also accept a regular list
### (a list is converted to an array first, which adds some overhead)
# np.mean(nd_nums)
# np.var(nd_nums)

1.5
8.916666666666666
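One difference from R worth knowing: NumPy's `var()` divides by `n` (the population variance) by default, while R's `var()` divides by `n - 1` (the sample variance). Passing `ddof=1` gives the R-style result:

```python
import numpy as np

nd_nums = np.array([5, -2, -1, 0, 6, 1])

print(nd_nums.var())         # 8.916666666666666: sum of squared deviations / n
print(nd_nums.var(ddof=1))   # 10.7: sum of squared deviations / (n - 1), like R's var()
```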


There's not much else you need to know about `numpy` arrays for now, since most of your data will be in a **data frame**, which the next tutorial covers in depth. When you're done, go back to the top of the tutorial, restart the kernel and run all the code again, and make sure you don't get any errors!