# Welcome!

Welcome to your self-guided tour through David Degnan and Lisa Bramer's R and Python Bootcamp for data science! We are data scientists at Pacific Northwest National Laboratory and bioinformatics/biostatistics professors at the University of Oregon. This lecture series has been adapted from several R/Python teaching workshops. If you experience any issues, please reach out to David at david.degnan@pnnl.gov.

*More about the authors:* [David Degnan](https://www.pnnl.gov/people/david-degnan), [Lisa Bramer](https://www.pnnl.gov/people/lisa-bramer)

## How does this work?

You are currently in a Jupyter notebook. These notebooks are divided into two kinds of sections: text sections (like the one you are currently reading) and code sections (examples below). Code sections allow you to run code by hitting the play button. You may write your own code in these sections too. Note that you are on your own notebook server at this moment, which means no one else can access or see what you are doing. It also means you are free to play with any code you want, and you will not mess up other folks' notebooks. However, any notes you type in here will not be saved, so write them elsewhere. This notebook is in Python and contains the Python introductory tutorials.

## Table of Contents

[R Lecture Series](https://colab.research.google.com/drive/1KxavcX4LKEUp2krw8rbxaLSiO6kEDvEP?usp=sharing) - Click the table of contents on the left to skip to lectures

[Python Lecture Series - This Notebook](https://colab.research.google.com/drive/1h7mr_7IAkueWS2tT1D9XEoKFTvOzpq7v?usp=sharing)



---



# Python Lecture 1: Variables, Expressions, and Statements

See: https://greenteapress.com/thinkpython2/html/index.html for the full book

## Section 1.1 Python as a Calculator

Just like R, Python can be used as a calculator. The exponentiation symbol is different, but for the most part, the math symbols are the same:


| **Operation**         | **R Symbol**    | **Python Symbol** |
|-----------------------|-----------------|--------------------|
| Addition              | `+`            | `+`               |
| Subtraction           | `-`            | `-`               |
| Multiplication        | `*`            | `*`               |
| Division              | `/`            | `/`               |
| Exponentiation        | `^`            | `**`              |

Note that Python also follows the PEMDAS order of operations, so be careful! Comments are written with the pound sign (#), just like in R.

In [1]:
# Raise 3 to the power of the product of 2 and 4
3 ** (2*4)

6561

In [24]:
# Calculate the whole-number quotient (floor division drops the remainder)
105 // 60

1

In [25]:
# Calculate the remainder
105 % 60

45
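
As a small worked example of the two operators above, floor division and the remainder can be combined to convert a number of minutes into hours and minutes. This is only a sketch: the variable names are illustrative, and `divmod` is a Python built-in that returns both results at once.

```python
# Convert 105 minutes into whole hours and leftover minutes
total_minutes = 105

hours = total_minutes // 60     # floor division gives the whole hours
minutes = total_minutes % 60    # the remainder gives the leftover minutes
print(hours, "hour and", minutes, "minutes")

# divmod() returns the quotient and the remainder together
print(divmod(total_minutes, 60))
```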

## Section 1.2 Assigning Variables

Variable names cannot start with a number, cannot include symbols (other than the underscore), and cannot be one of the words that Python reserves as keywords:

> False, class, finally, is, return, None, continue, for, lambda, try, True, def, from, nonlocal, while, and, del, global, not, with, as, elif, if, or, yield, assert, else, import, pass, break, except, in, raise

These are some examples of unacceptable variable names:

76trombones = 'big parade'
more@ = 1000000
class = 'Advanced Theoretical Zymurgy'

And some examples of good ones:

In [2]:
message = 'And now for something completely different'
n = 17
pi = 3.1415926535897932
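
If you are unsure whether a name is allowed, Python can check it for you. The sketch below uses the string method `isidentifier()` and the standard-library `keyword` module on the example names from this section. Note that `isidentifier()` only checks the characters, so reserved keywords need the separate `keyword.iskeyword()` check.

```python
import keyword

# Names that break the character rules are not identifiers
print("76trombones".isidentifier())   # starts with a number
print("more@".isidentifier())         # contains a symbol

# "class" has valid characters, but it is a reserved keyword
print("class".isidentifier())
print(keyword.iskeyword("class"))

# A usable name passes the first check and fails the second
print("message".isidentifier() and not keyword.iskeyword("message"))
```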

In [3]:
# To see your variable, call it or print it
message

'And now for something completely different'

In [4]:
print(n)

17


In [5]:
# Note that if you simply call variables, the notebook only shows the value of the last line
n
pi

3.141592653589793

In [6]:
# To avoid this, print both
print(n)
print(pi)

17
3.141592653589793


In [7]:
# You can use a saved variable in an equation
miles = 26.2
miles * 1.61

42.182

## Section 1.3 Python Types

| **Type**      | **Description**                                           | **Example**          |
|---------------|-----------------------------------------------------------|----------------------|
| **int**       | Whole numbers                             | `x = 42`            |
| **float**     | Decimals                              | `x = 3.14`          |
| **str**       | String of text or characters               | `x = "hello"`        |
| **bool**      |  `True` or `False`.          | `x = True`          |
| **NoneType**  | Represents the absence of a value or a "null" value.     | `x = None`          |

In [8]:
type(42)

int

In [9]:
type(42.0)

float

In [10]:
# Make an integer with int. Note: it drops the decimal part (truncates toward zero) instead of rounding
int(3.999999)

3
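
The `int` conversion above drops the fractional part, which matters most for negative numbers. As a quick illustration (the values are chosen only for demonstration), compare `int` with `math.floor` and the built-in `round`:

```python
import math

print(int(3.999999))           # drops the decimals
print(int(-3.999999))          # also moves toward zero, not down
print(math.floor(-3.999999))   # floor always moves down
print(round(3.999999))         # round goes to the nearest integer
```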

In [11]:
# If it can convert an object type, it will
float('3.14159')

3.14159

In [12]:
str(32)

'32'

In [23]:
# None is its own type
type(None)

NoneType

# Python Lecture 2: Functions & Advanced Math

Oftentimes, we need to do more than just the basic math operations in a coding language.

## Section 2.1 Loading Other Code Packages

Let's say we want to use a Python function that is not available by default. To do so, we have to load a code package, which we can do with `import`.

In [13]:
import math

To install packages, we would use `pip` or `conda`. Once you finish the tutorials here, see [Getting Started with pip](https://pip.pypa.io/en/stable/getting-started/) to set up Python locally and install your own packages.

## Section 2.2 Advanced Math

In [14]:
radians = 0.7

# math is the name of the package (module), and sin is the name of the function
print(math.sin(radians))

0.644217687237691


In [15]:
degrees = 45
radians = degrees / 180.0 * math.pi
math.sin(radians)

0.7071067811865475

In [16]:
# Note that log is the natural log (base e), and exp is the same as e ** x (e^x)
math.exp(math.log(22+1))

23.0
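
The `math` module also covers other common bases and roots. A brief sketch with a few functions from the standard library (the input values are arbitrary examples):

```python
import math

print(math.log(math.e))    # natural log (base e)
print(math.log10(1000))    # base-10 log
print(math.log2(8))        # base-2 log
print(math.sqrt(16))       # square root
```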

## Section 2.3 Defining Functions

A core part of Python is defining your own functions.

In [17]:
# Define a function with no inputs
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

print_lyrics()

I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.


In [18]:
# Define a function that calls an existing function
def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()

I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.


In [19]:
# Define a function that takes an input
def make_song(name):
    print("Hello " + name + "! It's nice to meet you.")
    repeat_lyrics()

make_song("John Doe")

Hello John Doe! It's nice to meet you.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.
I'm a lumberjack, and I'm okay.
I sleep all night and I work all day.


In [20]:
# In functions, it's useful to provide a type hint for each input.
# Here, name is a string, so str can be used:
def say_hello(name: str):
    print("Hello " + name + "! It's nice to meet you.")

say_hello("Tom")

Hello Tom! It's nice to meet you.
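
Type hints can describe the value a function gives back as well as its inputs. The sketch below uses a hypothetical `miles_to_km` function that reuses the 1.61 conversion factor from Section 1.2. It uses `return` to hand a value back to the caller instead of printing it, so the result can be stored or used in further calculations.

```python
# "-> float" hints that the function gives back a decimal number
def miles_to_km(miles: float) -> float:
    return miles * 1.61

# The returned value can be saved in a variable
marathon_km = miles_to_km(26.2)
print(marathon_km)

# Or used directly in another expression
print(miles_to_km(1) + miles_to_km(2))
```

Type hints are not enforced when the code runs; they serve as documentation for readers and tools.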


In [22]:
# A function with two parameters that returns a value instead of printing it
def cat_twice(part1, part2):
    cat = "My cat breeds are " + part1 + " and "  + part2 + ". I don't know why I said that."
    return cat

cat_twice("Calico", "Tortoise shell")

"My cat breeds are Calico and Tortoise shell. I don't know why I said that."

# Python Lecture 3: Conditionals and Recursion

## Section 3.1 Boolean Expressions

Just like R, Python has boolean expressions, and the comparison operators are the same. Unlike in R, these operators are not vectorized and will not inherently work element by element on a series of values (in Python, a list).

| **Operation**         | **R Symbol**    | **Python Symbol** |
|-----------------------|-----------------|--------------------|
| Greater than          | `>`            | `>`               |
| Less than             | `<`            | `<`               |
| Greater than or equal | `>=`           | `>=`              |
| Less than or equal    | `<=`           | `<=`              |
| Equal to              | `==`           | `==`              |
| Not equal to          | `!=`           | `!=`              |

In [26]:
# Does 5 equal 5?
5 == 5

True

In [27]:
# Does 5 equal 6?
5 == 6

False

In [28]:
x = 3
y = 5

print(x != y)
print(x > y)
print(x < y)
print(x >= y)
print(x <= y)

True
False
True
False
True


`and` returns True when both conditions are True. `or` returns True when at least one condition is True. Otherwise, False is returned.

In [29]:
x != y and x > y

False

In [30]:
x != y or x > y

True

In [31]:
# Python may also do things you don't expect!
42 and True

True
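
The result above follows from how Python treats values as "truthy" or "falsy": zero, empty strings, and `None` count as False, and most other values count as True. In addition, `and` and `or` return one of their operands rather than always returning a boolean. A short sketch of this behavior:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
print(42 and True)
print(0 and True)

# `or` returns the first truthy operand
print(42 or True)

# bool() shows how a value is interpreted in a condition
print(bool(42), bool(0), bool(""), bool("cats"))
```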

## Section 3.2 If, elif, else

Conditionals can be used in `if`, `elif`, `else` statements.

In [34]:
first = "cats"
second = "dogs"

# Only the block under the first True condition will run
if first == "cats":
    print("cats were first")
elif second == "dogs":
    print("dogs were second")
else:
    print("cats were not first and dogs were not second")

cats were first


In [35]:
# Compare that with two separate if statements, which are checked independently

if first == "cats":
    print("cats were first")

if second == "dogs":
    print("dogs were second")
else:
    print("cats were not first and dogs were not second")

cats were first
dogs were second


In [37]:
# If-else statements can also be nested

if first == "cats":
  if second == "dogs":
    print("cats were first, dogs were second")
  else:
    print("cats were first")
else:
  print("cats were not first")

cats were first, dogs were second


In [38]:
# Another nesting example
print("x is", x)
print("y is", y)

if x == y:
    print("x and y are equal")
else:
    if x < y:
        print("x is less than y")
    else:
        print("x is greater than y")

x is 3
y is 5
x is less than y
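
Nested conditionals like the one above can often be flattened with `elif`, which tends to be easier to read. A minimal sketch of the same comparison, storing the message in an illustrative `result` variable:

```python
x = 3
y = 5

# Same logic as the nested version, written as a single chain
if x == y:
    result = "x and y are equal"
elif x < y:
    result = "x is less than y"
else:
    result = "x is greater than y"

print(result)
```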


## Section 3.3 Recursion

In python, functions can call themselves within their function definition.

In [39]:
def countdown(n: int):
    if n <= 0:
        print('Blastoff!')
    else:
        print(n)
        countdown(n-1)

countdown(5)

5
4
3
2
1
Blastoff!


In [40]:
countdown(3)

3
2
1
Blastoff!
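
Recursive functions can also return values. Every recursive function needs a base case that stops the recursion, just like the `n <= 0` check in `countdown`. As a sketch, here is the classic factorial example (the function name is our own):

```python
def factorial(n: int) -> int:
    # Base case: stops the recursion
    if n <= 1:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * factorial(n - 1)

print(factorial(5))
```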


# Python Lecture 4: Iteration

Unlike R, base Python has no built-in vectorization, so operations on multiple values must be written as explicit iterations. Thus, `for` loops and `while` loops are commonly used.

## Section 4.1 While loops

`while` loops keep running until their condition is no longer True. This means that if you set a loop to simply `while True`, it will never stop running on its own. Be careful with `while` loops; we recommend `for` loops for those beginning in Python.

In [41]:
# In python, we can use triple quotations ''' ''' to write documentation for functions
def countdown(n: int):
    '''
    Counts down from n to 1 and then says "blastoff"

    Parameters
    ----------
    n
        The number to start counting down from

    Returns
    -------
    None. Prints the numbers from n down to 1, followed by "Blastoff!"
    '''
    while n > 0:
        print(n)
        n = n - 1
    print('Blastoff!')

countdown(5)

5
4
3
2
1
Blastoff!


In [42]:
# Here is a more complicated function with a while loop
def sequence(n: int):
    while n != 1:
        print("input:", n)
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n*3 + 1
        print("print:", n)

sequence(5)

input: 5
print: 16
input: 16
print: 8
input: 8
print: 4
input: 4
print: 2
input: 2
print: 1
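
A `while` loop is a natural fit when the number of iterations is not known in advance. As a sketch, the hypothetical function below follows the same even/odd rule as `sequence` but counts the steps needed to reach 1 instead of printing each value:

```python
def count_steps(n: int) -> int:
    steps = 0
    while n != 1:
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n * 3 + 1
        steps = steps + 1     # track how many updates were made
    return steps

print(count_steps(5))
```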


## Section 4.2 Break

Sometimes, code needs to `break` out of a `for` or a `while` loop. This statement stops the loop immediately.

In [43]:
# Initialize x
x = 10

# Iterate with a while loop
while x > 0:

  # Print the value
  print("x is now", x)

  # Add one with this shorthand, which means the same as x = x + 1
  x += 1

  # This loop cannot break on its own since it is counting up, so let's use break
  if (x == 15):
    break

x is now 10
x is now 11
x is now 12
x is now 13
x is now 14
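
`break` works the same way inside a `for` loop. A small sketch, with an arbitrary search range chosen for illustration, that stops as soon as the first multiple of 7 is found:

```python
first_multiple = None

# Look through 50, 51, ..., 99
for number in range(50, 100):
    if number % 7 == 0:
        first_multiple = number
        break   # stop at the first match instead of checking the rest

print(first_multiple)
```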


## Section 4.3 For loops

In [47]:
# Python starts counting from 0, so range(3) will iterate over 0, 1, and 2
for i in range(3):
    print(i+1)

1
2
3


In [48]:
# For loops can also be nested
for i in range(3):
    for j in range(5):
        print("i is " + str(i) + " and j is " + str(j))

i is 0 and j is 0
i is 0 and j is 1
i is 0 and j is 2
i is 0 and j is 3
i is 0 and j is 4
i is 1 and j is 0
i is 1 and j is 1
i is 1 and j is 2
i is 1 and j is 3
i is 1 and j is 4
i is 2 and j is 0
i is 2 and j is 1
i is 2 and j is 2
i is 2 and j is 3
i is 2 and j is 4


# Python Lecture 5: Strings

In R, you used the `substr()` function to subset strings. In Python, strings can be subset directly with square brackets. Strings are also objects with built-in methods.

## Section 5.1 Strings as Sequences

In [50]:
# Extract position 2 (3rd in order)
fruit = 'banana'
fruit[2]

'n'

In [49]:
# Extract from position 0 to position 1, but don't include position 1
fruit[:1]

'b'

In [51]:
# Strings can be iterated through with a for loop
for i in range(len(fruit)):
    print(fruit[i])

b
a
n
a
n
a


In [52]:
# A for loop can also iterate over the characters directly
for i in fruit:
    print(i)

b
a
n
a
n
a


In [53]:
# From position 1 to position 3, not including 3
fruit[1:3]

'an'

In [54]:
# From position 3 forward
fruit[3:]

'ana'

In [55]:
# To position 3 but not including position 3
fruit[:3]

'ban'

In [64]:
# Negative numbers count from the end, so this drops the last two characters
fruit[:-2]

'bana'
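
Slices follow the pattern `[start:stop:step]`, and negative values count from the end of the string. A few more variations on the same `fruit` string, as a quick sketch:

```python
fruit = 'banana'

print(fruit[-1])     # the last character
print(fruit[-3:])    # the last three characters
print(fruit[::2])    # every second character
print(fruit[::-1])   # the whole string, reversed
```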

## Section 5.2 Updating String Values

Strings are immutable, meaning a chunk of code like this won't work:

> greeting = 'Hello, world!'

> greeting[0] = 'J'

Instead, we would use the `.replace()` method. Methods are functions specific to an object, in this case, strings.

In [57]:
greeting = 'Hello, world!'
greeting.replace("H", "J")

'Jello, world!'
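
Because strings are immutable, `.replace()` returns a new string and leaves the original untouched. The sketch below shows this, along with the optional third argument of `.replace()` that limits how many matches are replaced (the variable name `new_greeting` is our own):

```python
greeting = 'Hello, world!'
new_greeting = greeting.replace("H", "J")

print(greeting)        # the original string is unchanged
print(new_greeting)    # the result must be saved to be kept

# By default, every match is replaced
print('banana'.replace("a", "o"))

# A third argument limits the number of replacements
print('banana'.replace("a", "o", 1))
```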

## Section 5.3 String Methods and In

In [58]:
# Use .upper() to uppercase a word
word = 'banana'
new_word = word.upper()
new_word

'BANANA'

In [59]:
# Use .find() to return the first instance of a match
word.find('a')

1

In [60]:
# Use .split() to break a string into a list of strings using a delimiter
word.split('a')

['b', 'n', 'n', '']

In [61]:
# Use .join() to join a list together
" ".join(["Hello", "world!"])

'Hello world!'

In [62]:
# Use `in` to see if one string is in another
'a' in 'banana'

True

In [63]:
'ana' in 'banana'

True

# Python Lecture 6: Lists, Dictionaries, Numpy

If you are more familiar with Python or R, this list of comparisons between objects that hold multiple values may be helpful. If not, no worries. We will go through each of these object types.


| **Python**                  | **R**                                |
|-----------------------------|--------------------------------------|
| **List**                    | **Vector**                          |
| **Dictionary**              | **List**                            |
| **DataFrame** (`pandas`)    | **data.frame** (`tibble`, `data.frame`, `data.table`) |
| **2D Array/Matrix** (`numpy`) | **matrix** (mostly used for matrix math) |

## Section 6.1 Lists

In [79]:
# lists can be declared with square brackets
print(["This", "is", "a", "list"])

['This', 'is', 'a', 'list']


In [66]:
# Unlike vectors in R, lists can hold values of different types.
["cats", 2, True]

['cats', 2, True]

In [67]:
# Index a list the same way you would index a string
my_list = ["Indiana Jones", "Jurassic Park", "Star Wars"]
print(my_list[2])

Star Wars


In [68]:
# Unlike strings, lists are mutable
my_list[2] = "Star Trek"
print(my_list)

['Indiana Jones', 'Jurassic Park', 'Star Trek']


Add a single item to a list with `append()`. If you want to add the items of another list without nesting a list within a list, use `extend()`.

In [69]:
# Note that this updates my_list without needing to overwrite the variable
my_list.append("Supernatural")
print(my_list)

['Indiana Jones', 'Jurassic Park', 'Star Trek', 'Supernatural']


In [70]:
my_list.extend(["My Little Pony", "Last of Us", "Dune"])
my_list

['Indiana Jones',
 'Jurassic Park',
 'Star Trek',
 'Supernatural',
 'My Little Pony',
 'Last of Us',
 'Dune']

In [71]:
# If you use append on a list with a list, it becomes a list in a list
nested_list = [1,2,3]
nested_list.append([4,5,6])

print(nested_list)

[1, 2, 3, [4, 5, 6]]


In [74]:
# To query a list in a list, use a single bracket for the first list and a
# single bracket for the second
nested_list[3][1]

5

In [75]:
# Lists are often used in conjunction with loops

second_list = []

for val in my_list:
    second_list.append(val)

print(second_list)

['Indiana Jones', 'Jurassic Park', 'Star Trek', 'Supernatural', 'My Little Pony', 'Last of Us', 'Dune']


In [76]:
# In R, two vectors can simply be added together. That is not the case in python,
# where we must iteratively add two items together.
list1 = [1,2,3,4]
list2 = [2,3,4,5]
list3 = []

# Loop over every index position in the lists
for x in range(len(list1)):
    list3.append(list1[x] + list2[x])

print(list3)

[3, 5, 7, 9]
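
Two related points are worth seeing in code. First, the `+` operator on lists joins them end to end rather than adding their values. Second, the built-in `zip()` pairs up items from two lists, which avoids indexing by position. A short sketch, where `combined` and `sums` are illustrative names:

```python
list1 = [1, 2, 3, 4]
list2 = [2, 3, 4, 5]

# + concatenates the two lists
combined = list1 + list2
print(combined)

# zip() walks through both lists at the same time
sums = []
for a, b in zip(list1, list2):
    sums.append(a + b)

print(sums)
```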


A quick note on sets. In Python, a `set` is an unordered collection that does not allow duplicate entries. Because sets are unordered, the order in which entries are displayed is not guaranteed.

In [85]:
# Duplicate entries are dropped when a set is created
set(["dogs", "dogs", "cats"])

{'cats', 'dogs'}
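
Sets are handy for membership checks and for comparing groups of values. The sketch below uses two made-up sets, `pets` and `farm`. Since sets have no guaranteed order, `sorted()` is used whenever a predictable order is needed.

```python
pets = set(["dogs", "dogs", "cats"])
farm = {"cows", "dogs"}          # sets can also be written with {}

print(len(pets))                 # duplicates are only counted once
print("dogs" in pets)            # membership check
print(sorted(pets))              # sorted() returns an ordered list

print(pets & farm)               # intersection: entries in both sets
print(sorted(pets | farm))       # union: entries in either set
```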

## Section 6.2 Dictionaries

You can think of a Python dictionary as a named list. The name is called the "key" and the item stored under the key is called the "value".

In [80]:
# Define a dictionary with {}. The format is key: value
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# The dictionary can be queried by key (name) now
eng2sp["two"]

'dos'

In [81]:
# Get all the keys with the keys() method
eng2sp.keys()

dict_keys(['one', 'two', 'three'])

In [82]:
# Use a for loop to get all the values
for key in eng2sp.keys():
    print(eng2sp[key])

uno
dos
tres


In [86]:
# The `in` operator only checks keys
"two" in eng2sp

True

In [87]:
# And not values!
"dos" in eng2sp

False
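
To work with the values, dictionaries provide the `.values()` and `.items()` methods, and new entries are added by assigning to a new key. A brief sketch using the same `eng2sp` dictionary (the added "four" entry and the fallback text are our own examples):

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# .items() gives each key together with its value
for key, value in eng2sp.items():
    print(key, "->", value)

# Check the values explicitly with .values()
print("dos" in eng2sp.values())

# Add a new key-value pair by assignment
eng2sp["four"] = "cuatro"

# .get() returns a fallback instead of raising an error for a missing key
print(eng2sp.get("five", "not found"))
```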

## Section 6.3 Numpy

In [89]:
import numpy as np

# Numpy is for fast computations of numbers
my_arr = np.array([3,4,5])
my_arr

array([3, 4, 5])

In [92]:
# It comes with several functions (only the last result is displayed)
np.mean(my_arr)
np.std(my_arr)

np.float64(0.816496580927726)

In [93]:
# And the ability to make matrices
my_matrix = np.array([[1, 2, 3], [4, 5, 6]])
my_matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [94]:
# Get the dimensions
my_matrix.shape

(2, 3)

In [95]:
# Get the size (total number of elements)
my_matrix.size

6
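
Unlike plain lists, NumPy arrays are vectorized, so arithmetic and comparisons apply to every element at once, much like vectors in R. A minimal sketch, assuming `numpy` is installed (the array names are illustrative):

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([2, 3, 4, 5])

# Element-wise operations need no loop
print(arr1 + arr2)
print(arr1 * 2)
print(arr1 > 2)

# Matrices support transposing and column-wise summaries
my_matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(my_matrix.T.shape)        # transposed dimensions
print(my_matrix.sum(axis=0))    # sum of each column
```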

# Python Lecture 7: Object-Oriented Programming

You can define your own classes with your own attributes and methods. This is called object-oriented programming (OOP).

In [105]:
# Define a class that other classes will inherit from. __init__ defines how to
# make an object of this class
class Pet:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def show(self):
        print("My name is", self.name, "and I am", self.age, "years old.")
    def speak(self):
        print("I don't have a sound.")

my_old_dog = Pet(name = "Jesse", age = 13)

# Displaying an object doesn't show much by itself
my_old_dog

<__main__.Pet at 0x7cf08b5e0e90>

In [106]:
# See the attributes/methods available to an object with dir
dir(my_old_dog)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'age',
 'name',
 'show',
 'speak']

In [107]:
# Access an attribute
my_old_dog.age

13

In [108]:
# Access a method
my_old_dog.show()

My name is Jesse and I am 13 years old.


In [109]:
# Let's define subclasses that inherit from the main class

class Dog(Pet):
    def speak(self):
        print("Woof! Woof!")


class Cat(Pet):
    def speak(self):
        print("Meow!")
    def scratch(self, times):
        print("The cat scratches you " + str(times) + " times!")

my_other_dog = Dog("Toby", 5)
my_other_dog.speak()

Woof! Woof!


In [111]:
my_cat = Cat("Whiskers", 12)
my_cat.show()
my_cat.speak()
my_cat.scratch(6)

My name is Whiskers and I am 12 years old.
Meow!
The cat scratches you 6 times!
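
Two further features are often useful with classes. Defining `__repr__` controls what is shown when an object is displayed, and `super().__init__()` lets a subclass reuse the parent's setup while adding its own attributes. The sketch below redefines `Pet` with a `__repr__` method and adds a hypothetical `Bird` subclass; neither is part of the original examples.

```python
class Pet:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # __repr__ returns the text shown when the object is displayed
    def __repr__(self):
        return type(self).__name__ + "(name=" + repr(self.name) + ", age=" + str(self.age) + ")"


class Bird(Pet):
    def __init__(self, name, age, can_fly):
        # Reuse the Pet setup, then add a new attribute
        super().__init__(name, age)
        self.can_fly = can_fly


my_bird = Bird("Kiwi", 3, can_fly=False)
print(my_bird)
print(isinstance(my_bird, Pet))   # a Bird is also a Pet
```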


# Python Lecture 8: Embedded for loops

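Embedded for loops, which Python calls list comprehensions, are useful for building a list in one line. They are a compact way to apply the same operation to every element, similar in spirit to vectorization.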

In [115]:
my_list = [2,3,4,5,6]

# One way to write a loop
new_list = []

for el in my_list:
  if el % 2 == 0:
    new_list.append("cats")
  else:
    new_list.append("dogs")

# This is a lot of steps for a seemingly easy function
print(new_list)

['cats', 'dogs', 'cats', 'dogs', 'cats']


In [116]:
# Embedding puts all of this in one line!
["cats" if el % 2 == 0 else "dogs" for el in my_list]

['cats', 'dogs', 'cats', 'dogs', 'cats']
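
The same one-line pattern can transform values, filter them with a trailing `if`, or build a dictionary. A short sketch with illustrative variable names:

```python
my_list = [2, 3, 4, 5, 6]

# Transform every element
squares = [el ** 2 for el in my_list]
print(squares)

# Keep only the elements that pass a condition
evens = [el for el in my_list if el % 2 == 0]
print(evens)

# Curly braces with key: value build a dictionary instead
labels = {el: ("cats" if el % 2 == 0 else "dogs") for el in my_list}
print(labels)
```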