# Python Tutorial


Here is a short list of popular Python resources and guides:
* [A Byte of Python](https://python.swaroopch.com/)
* [A collection of useful scripts, tutorials, and other Python\-related things](https://github.com/rasbt/python_reference#a-collection-of-useful-scripts-tutorials-and-other-python-related-things)
* [Automate the Boring Stuff with Python](http://automatetheboringstuff.com/)
* [Google's Python Class](https://developers.google.com/edu/python/)
* Python courses on [LinkedIn Learning](https://www.linkedin.com/learning/) (which Yale students have free access to)
* [The Hitchhiker's Guide to Python!](https://docs.python-guide.org/)
* [The Python Tutorial](https://docs.python.org/3/tutorial/index.html) (the official tutorial on Python's own website; you can switch Python versions at the top of the page)
* [Real Python](https://realpython.com/)

## Colab


* Colab Notebooks: New notebooks will automatically save to a new top-level directory in your Google Drive called "Colab Notebooks"
  * Even if you move them elsewhere in your Drive, Colab will be able to find them
  * I suggest saving them in your Google Drive then mounting it to have easy access to them

  ```python
  from google.colab import drive
  drive.mount('/gdrive')
  ```
* You can also save them to a GitHub repository
* You don't have to be signed in to view a notebook, but you should log in with a Google Account to connect to the runtime
* Your runtime is temporary: when it disconnects (e.g. after you close the notebook or leave it idle too long), variables, unsaved files, installed packages, etc. will all be lost
* Shut down your notebook once you're done to free up resources
  
* Colab has some significant differences from Jupyter, including but not limited to:
  * Built-in Table of Contents
  * Some useful code snippets
  * Code execution history tab
  * A SearchStackOverflow button that pops up for errors
  * Slightly different Markdown formatting
  * Access to the terminal in Colab Pro
  * Commenting and sharing
  * Revision history
  * Fun features under Settings -> Miscellaneous

In [None]:
# mounting your Google Drive
from google.colab import drive
drive.mount('/gdrive')

# now you can see it in the Files section of the left sidebar (after authorization)
# keep in mind this is actually your drive -> changes (e.g. deleted files) will propagate

In [None]:
# Colab comes pre-installed with many useful tools (numpy, scipy, pandas, pytorch, etc.)
!pip freeze # to see available libraries and their versions

In [None]:
# the ! is part of IPython magic -> it lets you access the command line
!python --version # prints the Python version this runtime uses

In [None]:
# if you need to install a package, run
%pip install package_name

# you can also use !apt-get install to install system packages with the Linux package manager


## Fundamentals


### Commenting and Printing

Comments (notes in your code that Python ignores) start with a '#': everything after the '#' on that line is skipped when the code runs.

In [None]:
# comment statements look like this: these green lines will not affect your code
# commenting code is VERY IMPORTANT - it lets me, Samah, and most importantly you understand what your code does
# a good comment explains what a function does, what a loop accomplishes, and so forth
# all code for this class must be commented, so make it a habit

# print statements look like this
print('Hello World!')

# print multiple things with commas
print('hello', 'students') # adds spaces by default
print('hello' + 'students') # + joins strings with no space between them, and only works between strings

# you can even get fancy
print("We're having {} with Python!".format("fun"))

greeting = "Howdy"
print(f"{greeting}!")
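A quick sketch of two more printing tricks: `print()` accepts `sep=` (what goes between items) and `end=` (what goes after the last item, a newline by default), and f-strings can format numbers, here to 2 decimal places. The variable names are just for illustration.

```python
# sep= replaces the default space between items
print('hello', 'students', sep='-')   # prints hello-students

# end= replaces the default newline, so the next print continues on the same line
print('no newline here', end=' ')
print('...continued')                 # prints no newline here ...continued

# a format spec after ':' inside an f-string controls how a number is shown
pi_approx = 3.14159
message = f"pi is about {pi_approx:.2f}"  # .2f means 2 digits after the decimal point
print(message)                            # prints pi is about 3.14
```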

### Pythonic Coding

In [None]:
import this

___________________________________________________________________________________________________________________________
## Data Types
Python is an object-oriented programming language: every value in Python, from numbers to functions, is an object. Other object-oriented languages include Java, C++, Ruby, PHP and many more.

### Numbers and Strings

In [None]:
# objects are your computer's way of representing data
# objects come in many flavors; some basic ones include:
    # integer: int
    # decimal: float
    # text (a sequence of characters): str
    # boolean: bool
    # (Python has no separate char type: a single character is just a str of length 1)
# create objects by giving them a name and assigning them a value using an '='
# you do not need to explicitly tell Python what type of object you are creating; it infers it!

x = 4                               # numerics (int and float)
y = -5.55
z = 'word'                          # strings
my_bool = False                     # boolean

print(x)
print(z)

In [None]:
# perform basic arithmetic with numbers
    # add: +
    # subtract: -
    # multiply: *
    # divide: /
    # floor divide (divide, then round down): //
    # power: **
    # remainder (mod): %
    
print(x + y)
print(x ** y)
print(x % 3)

# doing arithmetic with different types of objects can be tricky
# as shown above, ints and floats are compatible; they are both numeric
# if you try to add an int to a str, you get a TypeError
print(x + z)
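Division deserves a closer look, since it is a common source of surprises. This sketch (variable names are illustrative) compares `/`, which always returns a float, with `//` (floor division, which rounds down) and `%`, and shows that casting with `str()` first is the usual fix for the TypeError above.

```python
true_div = 7 / 2        # 3.5: '/' always returns a float
even_div = 8 / 4        # 2.0: still a float, even when it divides evenly
floor_div = 7 // 2      # 3: '//' divides, then rounds down
neg_floor = -7 // 2     # -4: "down" means toward negative infinity, not toward zero
remainder = 7 % 2       # 1: what is left over after floor division

# casting the int to a str first lets you join it to another str
joined = str(4) + 'word'   # '4word'

print(true_div, even_div, floor_div, neg_floor, remainder, joined)
```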

### Datetime

In [None]:
# there is also a datetime data type
# datetime objects are great because writing code to deal with dates is actually very difficult (so many special cases)
# for this we need to import Python's datetime module
# we will learn more about packages and modules later
import datetime

DOB = datetime.datetime(1995, 5, 17) # arguments are year, month, day (hour, minute, second are optional)
print(DOB)
print(type(DOB))

In [None]:
# you can access any part of the datetime object as well
print(DOB.year)
print(DOB.month)

In [None]:
# you can also subtract datetimes: the result is a timedelta (a length of time)
present = datetime.datetime.now()
print(present-DOB)
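Date arithmetic goes through `datetime.timedelta` objects, which represent durations. A small sketch using the same birthday as above; the 30-day offset and the second date are arbitrary values chosen for illustration.

```python
import datetime

DOB = datetime.datetime(1995, 5, 17)

# adding a timedelta to a datetime moves it forward in time
one_month_later = DOB + datetime.timedelta(days=30)
print(one_month_later)        # 1995-06-16 00:00:00

# subtracting two datetimes gives a timedelta; .days pulls out the whole days
fifth_birthday = datetime.datetime(2000, 5, 17)
gap = fifth_birthday - DOB
print(gap.days)               # 1827 (5 years, including the leap days in 1996 and 2000)
```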

In [None]:
# you can check the type of an object by using Python's type function
print(type(x))
print(type(z))

### Lists

In [None]:
# lists are objects that can hold other objects
# lists are ordered and start at an index of 0 (not true for all languages)

# create a list with an '=' again, and using square brackets '[]'
# a list can be empty, like our example below
empty_list = []
my_list = [11,12,13,14]
print(my_list)

In [None]:
# if your list is full of just numeric data types, you can easily ask some questions about it
# like find the maximum value
print('max:', max(my_list))

# or the minimum value
print('min:', min(my_list))

# or the sum of all elements in the list
print('sum:', sum(my_list))

In [None]:
# a list can contain any type of Python objects
# you can look at things in list by position, and change them
print(my_list[0])
my_list[0] = 'eleven'
print(my_list)

In [None]:
# add things to a list using .append(*thing*)
my_list.append(False)
print(my_list)

In [None]:
# create lists of lists
my_new_list = [my_list, my_list, my_list]
print(my_new_list)
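One subtlety with `my_new_list`: it holds three references to the *same* list, not three copies, so changing `my_list` changes every entry. A minimal sketch of this aliasing behavior with illustrative names, plus `.copy()` to get independent lists:

```python
inner = [1, 2]
nested = [inner, inner]            # two references to the SAME list object
inner.append(3)
print(nested)                      # [[1, 2, 3], [1, 2, 3]]: both entries changed

independent = [inner.copy(), inner.copy()]   # .copy() makes new, separate lists
inner.append(4)
print(independent)                 # [[1, 2, 3], [1, 2, 3]]: unaffected by the append
print(nested)                      # [[1, 2, 3, 4], [1, 2, 3, 4]]
```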

In [None]:
# lists can be sliced: list[start:stop] returns a new list with the elements from index start up to, but not including, stop
list1 = [5, 4, 3, 2, 1]
sliced_list = list1[1:3]
print(sliced_list)
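Slices have a few more handy forms: negative indices count from the end, either bound can be left out, and an optional third number sets the step. A short sketch on the same list:

```python
list1 = [5, 4, 3, 2, 1]

print(list1[-1])     # 1: negative indices count from the end
print(list1[:2])     # [5, 4]: leaving out start means "from the beginning"
print(list1[2:])     # [3, 2, 1]: leaving out stop means "to the end"
print(list1[::2])    # [5, 3, 1]: a third number is the step size
print(list1[::-1])   # [1, 2, 3, 4, 5]: a negative step walks backwards
```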

In [None]:
# lists can be combined
combined_list = list1 + sliced_list
print(combined_list)

In [None]:
# you can ask the length of a list with the len function: len()
print(len(combined_list))

# this only counts the top-level items, not the contents of any lists within your list
print(len(my_new_list))

In [None]:
# a tuple is like a list, but it cannot be changed after it is created (it is immutable)
# tuples are valuable when you don't want to accidentally change an important piece of data
# tuples are created using '()'
our_tuple = ('one fish','two fish','red fish','blue fish')
print(type(our_tuple))

# trying to change a tuple like we did with lists results in a TypeError
our_tuple[0] = 'no fish'

### Dictionaries

In [None]:
# dictionaries are handy objects with keys that correspond to values.
# since Python 3.7, dicts remember the order keys were added, but you look values up by key, not by position
# because of behind-the-scenes hashing, looking up values in dicts is very quick
# create a dict using curly brackets '{}' with pairs of objects {key:value}
# dictionaries can hold any objects as values
# dictionaries can, like lists, be empty
empty_dictionary = {}
patient = {'first name': 'joe',
           'last_name': 'Smo',
           'age': 24,
           'diagnosis': 'diabetes',
           'prescription': ['metformin', 'insulin'],
           'DOB': DOB
          }

In [None]:
# access (and change) things in a dictionary by their key
print('unchanged:', patient['first name'])
patient['first name'] = 'Joe'
print('changed: {}'.format(patient['first name']))
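Looking up a key that does not exist raises a `KeyError`. A sketch of two common ways to avoid that, using a trimmed-down version of the patient dict (the `'weight'` key is a hypothetical missing field):

```python
patient = {'first name': 'Joe', 'age': 24}

# 'in' checks whether a KEY exists (it does not search the values)
has_age = 'age' in patient            # True
has_24 = 24 in patient                # False: 24 is a value, not a key

# .get() returns None (or a default you choose) instead of raising a KeyError
weight = patient.get('weight')                        # None
weight_or_default = patient.get('weight', 'unknown')  # 'unknown'
print(has_age, has_24, weight, weight_or_default)
```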

In [None]:
# you can also access all of the keys, values, and (key, value) pairs of a dictionary using some built-in methods
# here is how you access the keys -> it gives you a dict_keys object
# I recommend casting these output objects as lists, as the dict_keys object isn't very useful
print('1:', patient.keys())
patient_keys = list(patient.keys())
print('2:', patient_keys)

# you can also get the values using .values()
# they will be in the same order as the keys from .keys()
patient_values = list(patient.values())
print('3:', patient_values)

# finally, you can retrieve the (key, value) pairs as tuples using .items()
patient_pairs = list(patient.items())
print('4:', patient_pairs)

In [None]:
# you can ask the length of a dict with the len function: len()
print(len(patient))

### Changing Types

In [None]:
# casting objects to different types
    # int -> float: just adds a .0 to it
    # float -> int: drops the decimal part (truncates toward zero, so int(-2.7) is -2)
    # anything (including lists, dicts, tuples etc.) -> string: whatever print(*thing*) would output becomes your new string
    # string -> int/float: only works if the string looks like a number; otherwise you get a ValueError
    # tuple -> list and vice versa: exactly how it sounds

x = 5
print(x, type(x))

x = float(x)
print(x, type(x))

x = str(x)
print(x, type(x))

# casting objects can sometimes give you a ValueError
# here is a str of numerals
a = '88'
# we can cast it as an int, because all the characters in it are numerals
print(int(a))
# here is a string with numerals and a letter
b = '88z'
# casting this as an int gives us a ValueError message
print(int(b))
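Two details of casting are easy to trip over: `int()` on a float truncates toward zero (which only matches rounding down for positive numbers), and a failed cast raises a `ValueError` that you can catch. This sketch uses the standard-library `math.floor` for comparison; the `try`/`except` syntax is covered under Error Messages below.

```python
import math

truncated = int(-2.7)         # -2: int() just drops the decimal part
floored = math.floor(-2.7)    # -3: math.floor always rounds down
rounded = round(2.7)          # 3: round() goes to the nearest integer

# a failed cast can be caught instead of crashing the program
try:
    int('88z')
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print('ValueError:', err)
print(truncated, floored, rounded, cast_failed)
```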

___________________________________________________________________________________________________________________________
## Basic Coding
These basic coding concepts, in some form or another, are found across all common programming languages.

### If/Then and Else Statements

In [None]:
# IF THEN statements allow the computer to ask a question and make a decision
# semantically, it sounds like "if this is true, then do this thing"
# logically, it uses comparison operators like
    # ==   equal to
    # >    greater than
    # <    less than
    # >=   greater than or equal to
    # <=   less than or equal to
    # !=   not equal to
    # and a few other things
# in Python, the THEN is implied by a ':' followed by a new, indented line (4 spaces by convention)

a = 5
b = 4
if a > b:
    print(a, 'is greater than', b)

In [None]:
# ELSE statement is added after an if statement, and will trigger when the condition is not met
c = 'dog'
d = 'cat'
if c == d:
    print(c, 'equals', d)
else:
    print(c, 'does not equal', d)

In [None]:
# any code can go inside the body of an IF statement
# make an IF statement that checks if something is not equal to 10, and if it is not, sets it equal to 10

if a != 10:
    a = 10
    print(a)

In [None]:
# if statements work well with lists
our_list = ['hello','my','good','friends']
# we can use IN to ask if there is something in our list
if 'good' in our_list:
    print("we have 'good' in our list")

In [None]:
# we can add logical statements to make our conditions a little more flexible
# combine conditions with AND, this requires both conditions to be met
if ('hello' in our_list) and ('goodbye' in our_list):
    print('passed')
else:
    print("failed")

# combine conditions with OR, this requires only one condition to be met
if ('hello' in our_list) or ('goodbye' in our_list):
    print('passed')
else:
    print("failed")

In [None]:
# negate conditions with NOT, this requires the condition to be FALSE to pass
if 'goodbye' not in our_list: # 'not in' reads more naturally than 'not ... in' and does the same thing
    print('passed')
else:
    print("failed")
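When there are more than two cases, `elif` (short for "else if") chains conditions together: Python checks them top to bottom and runs only the first block whose condition is True. A minimal sketch; the temperature thresholds are arbitrary.

```python
temperature = 22

if temperature > 30:
    label = 'hot'
elif temperature > 15:      # only checked if the first condition was False
    label = 'mild'
else:                       # runs only if every condition above was False
    label = 'cold'
print(label)                # mild
```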

### While Loops

In [None]:
# a WHILE loop executes over and over again as long as its condition is True, and stops once it becomes False
# these can be dangerous, as they can make INFINITE loops
    # infinite loops happen when the variables never change in a way that makes the condition False
    # and Python will not do anything about it; it will just keep running until you stop it manually

# example problem: your 6 year old really wants a dog, but you do not (for some terrible reason)
# you tell them they have to wait until they are 12 to get a dog
# we can represent this programmatically
childs_age = 6
while childs_age < 12:
    childs_age = childs_age + 1 # forgetting this line will cause an infinite loop
    print('Happy', str(childs_age) + 'th birthday!')
print('You made it, here is your dog')

# what would happen if your child is Benjamin Button, that is, they age backwards?
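To answer the Benjamin Button question: if `childs_age` decreases, `childs_age < 12` never becomes False, so the loop would run forever. One defensive pattern, sketched below, is to cap the number of iterations and leave the loop early with `break`; the cap of 100 is an arbitrary choice.

```python
childs_age = 6
birthdays = 0
MAX_BIRTHDAYS = 100    # arbitrary safety cap for this sketch

while childs_age < 12:
    childs_age = childs_age - 1    # ages backwards, so the condition stays True forever
    birthdays = birthdays + 1
    if birthdays >= MAX_BIRTHDAYS:
        print('Giving up: this loop would never end on its own')
        break                      # break exits the loop immediately
print(childs_age, birthdays)       # -94 100
```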

### For Loops

In [None]:
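# FOR loops run once for each item in a collection, so they cannot accidentally loop forever the way a WHILE loop can
# FOR loops iterate over every item in a list (or any other iterable, like a string, dict, or range)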

our_list = ['this','is','a','sentence','.']
for word in our_list:
    print(word)

In [None]:
# you can also loop over dictionary items using .items()
patient = {'first name': 'joe',
           'last_name': 'Smo',
           'age': 24,
           'diagnosis': 'diabetes',
           'prescription': ['metformin','insulin'],
           'DOB': DOB
          }
for pair in patient.items():
    print(pair)
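Each item from `.items()` is a `(key, value)` tuple, so you can unpack it into two loop variables directly. A sketch with a shortened patient dict:

```python
patient = {'first name': 'Joe', 'age': 24, 'diagnosis': 'diabetes'}

lines = []
for key, value in patient.items():       # unpack each (key, value) tuple
    lines.append(f'{key}: {value}')
    print(key, '->', value)
```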

#### Range

In [None]:
# RANGE is a nice way of getting a range of numbers
print(range(5))
for i in range(5):
    print(i)

# RANGE does not have to start at zero
print(range(5, 10))
for i in range(5, 10):
    print(i)
    
# RANGE can skip numbers as well, using a step size as the third input
print(range(0, 50, 10))
for i in range(0, 50, 10):
    print(i)

In [None]:
# you can do anything in a FOR loop
# to show you are super stoked about frogs,
# write a FOR loop that adds 10 '!'s to the end of the word 'frogs', one at a time
word = 'frogs'
for i in range(10):
    word += '!' # += is an alternative to writing word = word + ...
    print(word)

In [None]:
# you can nest loops of any kind as well
# for this, we again will think of a nice toy example
# you have a rigorous indoor plant water schedule to keep your plants healthy and happy
    # you water the schefflera every 3rd day but only if it is also sunny
    # you water the fig leaf 3 times every 4th day
# have the computer tell you each day for two weeks when to water a plant
forecast = ['sunny','sunny','sunny','cloudy','cloudy','sunny','sunny',
            'cloudy','sunny','sunny','cloudy','cloudy','sunny','sunny']

for day in range(14):
    if day % 3 == 0:
        if forecast[day] == 'sunny':
            print(f"Day {day}: water schefflera")
    if day % 4 == 0:
        for i in range(3):
            print(f"Day {day}: water fig leaf")

#### Enumerate

In [None]:
# ENUMERATE lets you keep track of where you are in a FOR loop
# ENUMERATE gives you both the items you are looping over, and their index
animals = ['dog','cat','fish','gibbon','ox','donkey','ibis']
for i, animal in enumerate(animals):
    print(i, animal)
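`enumerate` counts from 0 by default; its `start` parameter lets you count from another number, which is handy for human-friendly numbered lists. A short sketch:

```python
animals = ['dog', 'cat', 'fish']

numbered = []
for position, animal in enumerate(animals, start=1):   # count from 1 instead of 0
    numbered.append(f'{position}. {animal}')
print(numbered)    # ['1. dog', '2. cat', '3. fish']
```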

### Defining Functions

In [None]:
# functions are nice containers for code
# functions usually have an input and RETURN an output, though they do not have to have either
# Python has built-in functions like print(): print takes an input but does not return anything useful (technically it returns None)
# new functions are created in python using DEF, followed by the function name and '(*input*):'
# functions are later called using the defined function name followed by function's input in '()'

def hello_world():
    print('hello world!')

hello_world()

In [None]:
# you can fill a function with anything
# write a function that calculates both solutions of the quadratic equation a*x**2 + b*x + c = 0
# this is the quadratic formula: (-b +/- (b**2 - 4*a*c)**.5) / (2*a)

def quadratic_function(a,b,c):
    first_solution = (-b + (b**2 - 4*a*c)**.5) / (2*a)
    second_solution = (-b - (b**2 - 4*a*c)**.5) / (2*a)
    
    return first_solution, second_solution

# check the solutions for [2,-11,5] and [1,-3,4]
s1, s2 = quadratic_function(2,-11,5) # real solutions
print(s1, s2)

s1, s2 = quadratic_function(1,-3,4)  # complex solutions
print(s1, s2)
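Functions can also give their inputs default values, and a docstring (a string right under the `def` line) documents what the function does; `help(greet)` will display it. A small sketch; `greet` is a hypothetical example function.

```python
def greet(name, greeting='Hello'):
    """Return a greeting for name; greeting defaults to 'Hello' if not given."""
    return f'{greeting}, {name}!'

print(greet('students'))                    # Hello, students!
print(greet('students', greeting='Howdy'))  # Howdy, students!
print(greet.__doc__)                        # the docstring text
```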

___________________________________________________________________________________________________________________________
## Intermediate Coding
Here are a few things that you could get through a Python course without knowing, but I wouldn't recommend it.

### String Manipulations

In [None]:
# strings can be joined (concatenated) with '+', but they cannot be subtracted
string_1 = 'hello'
string_2 = 'there'
print(string_1 + string_2)

In [None]:
# we can get the length of strings easily
print(len(string_1))

In [None]:
# unwanted characters can easily be removed from both ends with .strip() (with no input, it removes whitespace)
string_3 = 'friend*'
print(string_3.strip('*'))

In [None]:
# strings can be split up by a given character, and tossed into a list
string_4 = 'hello there friend'
print(string_4.split(' '))
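A few more built-in string methods: `join` is the reverse of `split`, and `replace` and `upper` return modified copies (a string itself can never be changed in place). A sketch reusing the sentence above:

```python
sentence = 'hello there friend'
words = sentence.split(' ')            # ['hello', 'there', 'friend']

rejoined = '-'.join(words)             # 'hello-there-friend': the string before .join is the glue
swapped = sentence.replace('friend', 'students')   # 'hello there students'
shouted = sentence.upper()             # 'HELLO THERE FRIEND'

print(rejoined, swapped, shouted, sep='\n')
print(sentence)                        # unchanged: string methods return new strings
```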

### List Comprehensions

In [None]:
# list comprehensions are a great way of making your code more clean, crisp, and PYTHONIC
# they essentially combine lists and FOR loops together

# there are other ways of making FOR loop code pythonic, like itertools
# feel free to follow the zen of python any way you want; here is just one example

# EXAMPLE 1: say you have a list of numbers, but want a list with those numbers each with 5 added to it
our_list = [2, 3, 8, 22, 100]

# we could make a FOR loop to do this (old method)
new_list = []
for x in our_list:
    new_list.append(x + 5)
print(new_list)

# or we could use a list comprehension to turn 3 lines of code into 1 
new_list = [x + 5 for x in our_list]
print(new_list)

In [None]:
# EXAMPLE 2: they can be used to make sequences of numbers easily, like the squares of 0 through 9
squares = [x**2 for x in range(10)]
print(squares)
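The same idea with curly braces and a `key: value` pair builds a dictionary instead of a list (a dict comprehension). A sketch mapping each number to its square:

```python
squares_by_number = {x: x**2 for x in range(5)}
print(squares_by_number)     # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
print(squares_by_number[3])  # 9
```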

In [None]:
# Example 3 (a bit of a story)

# list comprehensions can be embedded as well, just as you can embed for loops
# to show this, let's phrase this as a problem
# say you have some important words from a few electronic health records
# we have a list of 3 EHRs, each as a list of words
EHRs = [['fell','tylenol','disease','height','april','emergency','hydrocodone','bruising'],
        ['lung','diabetes','drug','smoking','warfarin'],
        ['tylenol','disease','height','april','acetaminophen']
       ]

# from these 3 EHRs, you wish to pull out all of the instances of words that exist within a list of drugs
drugs = ['metformin','aspirin','ibuprofen','hydrocodone','acetaminophen','warfarin','tylenol']

# but, you actually want the drug codes for each drug, not the name, and this is found in a drug codes dict
drug_codes = {
    'metformin':'MET',
    'aspirin':'ASP',
    'ibuprofen':'IBU',
    'warfarin':'WAR',
    'acetaminophen':'ACE',
    'hydrocodone':'HYD',
    'tylenol':'TYL'
}

# you do not care which EHR the drug came from, just the counts

# to do this, you first need to collapse the EHRs into one list of words
# you can do this with a FOR loop like so
EHR_words_1 = []
for EHR in EHRs:
    for word in EHR:
        EHR_words_1.append(word)
print('len EHR_words_1:',len(EHR_words_1))

# you can also do this with a list comprehension
EHR_words_2 = [word for EHR in EHRs for word in EHR]
print('len EHR_words_2:',len(EHR_words_2))

# let's see if they are identical
print('collapsed lists are the same?', EHR_words_1 == EHR_words_2)
print(EHR_words_1)

# this may seem like a jumbled mess, but if you break it down it begins to make sense
# it is precisely the nested FOR loops we made earlier, even in the same order
# we are trying to get the 'word' within the innermost loop
# so we do 'word' for...
# then we spell out the 2 loops that provide us with that word
    # first loop: for EHR in EHRs
    # second loop: for word in EHR
# we put it all together and get [word for EHR in EHRs for word in EHR]

In [None]:
# now that we have collapsed our EHRs into one list, we need to, for each word, assess if it is a drug
# we can do this with a FOR loop like so
drug_words_1 = []
for word in EHR_words_2:
    if word in drugs:
        drug_words_1.append(word)
print('len drug_words_1:',len(drug_words_1))

# we can also be slick, cool, pythoners and do this with list comprehension
# here we find that the IF statement found its way into our list
# ...
drug_words_2 = [word for word in EHR_words_2 if word in drugs]
print('len drug_words_2:',len(drug_words_2))

# let's look if these are the same
print('drug words lists are the same?', drug_words_1 == drug_words_2)

In [None]:
# now before we do the same thing as before and compare a FOR loop to a list comprehension,
# let's be clever
# we can get the codes we want within the same list comprehension above
drug_codes_2 = [drug_codes[word] for word in EHR_words_2 if word in drugs]

# let's compare what we did here to the previous iteration
print(drug_words_2)
print(drug_codes_2)

In [None]:
# so we were able to turn a bunch of FOR loops into 2 list comprehensions like so
EHR_words_2 = [word for EHR in EHRs for word in EHR]
drug_words_2 = [word for word in EHR_words_2 if word in drugs]

# we can be proud of ourselves for living the zen of python life

# ... but wait, the python guru looks down upon us and says "ugh, 2 lines? I'd do that in one"
drug_words_3 = [word for word in [word for EHR in EHRs for word in EHR] if word in drugs]
drug_words_4 = [word for EHR in EHRs for word in EHR if word in drugs]
print('drug_words_2:', drug_words_2)
print('drug_words_3:', drug_words_3)
print('drug_words_4:', drug_words_4)

# and that is when list comprehensions went too far
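The story began with wanting drug *counts*, not just the list of codes. One way to finish the job is `collections.Counter` from the standard library, which tallies how often each item appears. A sketch, with the codes written out by hand to match what the `drug_codes_2` comprehension produces for these three EHRs:

```python
from collections import Counter

# the codes found in the 3 example EHRs, in order of appearance
drug_codes_found = ['TYL', 'HYD', 'WAR', 'TYL', 'ACE']

counts = Counter(drug_codes_found)
print(counts['TYL'])           # 2
print(counts['MET'])           # 0: missing items count as zero instead of raising a KeyError
print(counts.most_common(1))   # [('TYL', 2)]
```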

### Reading and Writing Files

In [None]:
# there are multiple ways to read and write files within Python
# different methods are designed for different needs: some methods work better with really long files

# one very simple example is as follows
# we can create a file object using OPEN()
# the inputs to OPEN() are 1) the file name, and 2) the mode, i.e. what we want to do with the file
# "w" in the file object means we are writing to the file

fout = open('testfile.txt','w')

# fout.write(*text*) adds the input text to the end of the file WITHOUT a new line, 
# new lines must be added with a '\n'
fout.write('0,') 
fout.write('cat\n')
fout.write('1,dog\n2,pig')

# close the file object after you are done with it each time
fout.close()

# check to see that the file was made

In [None]:
# this is often done within a WITH statement (it is not a loop); don't be alarmed, it is just another way to write it
with open('testfile.txt', 'w') as fout:
    fout.write('0,') 
    fout.write('cat\n')
    fout.write('1,dog\n2,pig')
    
# this will create the same file as the last cell
# and will automatically close the file for you

In [None]:
# a file object will read a file when it is passed an "r" instead of a "w"
fin = open('testfile.txt', 'r') 
print(fin.read())
fin.close()

In [None]:
# you can easily write lists to file using a loop
animals = ['dog', 'cat', 'fish', 'gibbon', 'ox', 'donkey', 'ibis']
# write this list to a file called 'animals.csv'

fout = open('animals.csv','w')

for i, animal in enumerate(animals):
    fout.write(str(i)+','+animal+'\n') 
fout.close() 

# let's read the file to see what it output
fin = open('animals.csv', 'r') 
print(fin.read()) 
fin.close()
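Writing commas by hand works, but the standard-library `csv` module handles the separators (and values that themselves contain commas) for you. A sketch that writes and reads back a small file; the filename `animals_demo.csv` is hypothetical, chosen so it does not overwrite `animals.csv`:

```python
import csv

animals = ['dog', 'cat', 'fish']

# newline='' is what the csv docs recommend when opening files for the csv module
with open('animals_demo.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    for i, animal in enumerate(animals):
        writer.writerow([i, animal])      # writes a row like 0,dog

with open('animals_demo.csv', newline='') as fin:
    rows = list(csv.reader(fin))          # each row comes back as a list of strings

print(rows)    # [['0', 'dog'], ['1', 'cat'], ['2', 'fish']]
```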

### Error Messages

In [None]:
# there are a few types of error messages that may pop up when Python gets mad at you
# here we will go over a few

# we have seen a common one: the TypeError
# this happens when you make objects of different types interact, but those types do not play nicely together
print(8 + 'hello')

# Python usually does a good job telling you what went wrong, and these are usually easy to fix with casting

In [None]:
# another type of error is a SyntaxError
# these happen when your code is not valid Python (not wrong as in ugly or non-pythonic, wrong as in breaking the language's grammar rules)
# a common source for these errors is forgetting the ':' after a conditional or loop
for x in range(3)
    print(x)

In [None]:
# we also can be given an IndentationError
# this is a type of SyntaxError that occurs when you don't indent properly in your loops and conditionals
for x in range(3):
print(x)

# for these last 2 error messages, you don't need to reevaluate your coding logic, just look for typos

In [None]:
# NameError occurs when you try to use a variable you have not defined yet
# these are often caused by typos, or can become an issue if your project contains many parts that interact
# here we see the variable named "variable" is not yet defined
print(variable)

In [None]:
# ZeroDivisionError occurs when you divide by zero
print(1/0)

### Try/Except

In [None]:
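# you can deal with errors with a TRY EXCEPT
# the TRY attempts to do everything within it,
# but if an error is raised, it does everything within the EXCEPT instead
# naming the error you expect (here ZeroDivisionError) keeps the EXCEPT from silently hiding unrelated bugs
try:
    print(1/0)
except ZeroDivisionError:
    print('naw')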

In [None]:
# you can also add in a pass within the EXCEPT if you want the EXCEPT to do nothing
try:
    print(1/0)
except ZeroDivisionError:
    pass
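A TRY can also have an `else` block (runs only if no error happened) and a `finally` block (always runs, error or not), and `as err` gives you the error object itself. A sketch; `safe_divide` is a hypothetical helper:

```python
def safe_divide(a, b):
    """Return a / b, or None if b is zero."""
    try:
        result = a / b
    except ZeroDivisionError as err:     # only catch the error we expect
        print('could not divide:', err)
        return None
    else:                                # runs only if the try block succeeded
        return result
    finally:                             # runs no matter what, even after a return
        print('finished trying', a, '/', b)

print(safe_divide(6, 3))   # 2.0
print(safe_divide(1, 0))   # None
```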

In [None]:
# errors don't always have to break your code, sometimes you can use them if you expect them
# say you want to sum values up, but some of the values are missing as 'NaN'
nums = [2,4,2,8,7,'NaN',5,5,10,34,'NaN']

# you could check each value to see if it is numeric, and if it is add it to the total
total = 0
for num in nums:
    if num != 'NaN':
        total = total + num
print(total)
# but the extra check in each step takes up a little time
# (not enough to matter for small cases, but possibly enough to care about for bigger ones)

In [None]:
# instead, we could expect the error, pass over it with a TRY EXCEPT, and move on
total = 0
for num in nums:
    try:
        total = total + num
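    except TypeError: # adding the str 'NaN' to an int raises a TypeError
        print('found NaN, but who cares')
print(total)

# this style is often called EAFP ("easier to ask forgiveness than permission")
# a TRY block costs very little when no error is raised, so this can be faster for large lists where NaNs are rare,
# but actually raising an error is relatively slow, so time both versions on your own data if speed matters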

___________________________________________________________________________________________________________________________
## Packages
Installed packages can be imported using ```import *package*```

This allows you to use software not native to vanilla python.
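Some packages, like json and random, are part of Python's standard library and come with every install of Python; others, like pandas and NumPy, come preinstalled in distributions such as Anaconda and in Colab.
Still others may not be installed in your environment and have to be installed using pip or conda (for example, scikit-learn, which you import as sklearn).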
You will not have to worry about installing packages in this class, only importing them.

### JSON

In [None]:
# JSON (JavaScript Object Notation) is a very handy data format for multi-type data
# JSON is widely used for exchanging medical data (for example, the HL7 FHIR standard defines a JSON format)
# JSON often exists as a file type, a .json file, but can exist within Python as well
# JSON most closely resembles a dictionary in Python
# with this package, we turn dictionaries (or lists of dictionaries) into JSON-formatted strings using json.dumps
import json

# here is some patient information as dictionary
# they are enumerated with patient IDs, each corresponding to some information
patients = {0:{"first name": "John",
               "last name": 'Shmo',
               "age": 24,
               "city": "New York",
               "diagnosis": "diabetes"
              },
            1:{"first name": "Deborah",
               "last name": 'Doe',
               "age": 30,
               "city": "New Haven"
              },
           }
# notice that the fields are not 100% the same, json allows data fields for objects to be different

# convert our dict into a JSON string:
patients_json = json.dumps(patients)
print(patients_json) 

In [None]:
# you can print our json object a little prettier using an indent
print(json.dumps(patients, indent=2))

In [None]:
# we can write our json objects to a file just like before
with open('patient.json', 'w') as fout:
    json.dump(patients, fout)

In [None]:
# we can load a json file using json.load
with open('patient.json') as json_file:
    data = json.load(json_file)

print(data)

import pprint # pprint ("pretty print") is a handy standard-library module for printing nested data
pprint.pprint(data)
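One thing to watch for when saving dictionaries as JSON: JSON object keys are always strings, so the integer patient IDs `0` and `1` come back as `'0'` and `'1'`. A sketch of the round trip using `json.dumps` and `json.loads` (the string versions of `json.dump` and `json.load`), with a shortened patients dict:

```python
import json

patients = {0: {'first name': 'John'}, 1: {'first name': 'Deborah'}}

round_tripped = json.loads(json.dumps(patients))
print(list(round_tripped.keys()))        # ['0', '1']: the int keys became strings
print(round_tripped['0']['first name'])  # John (round_tripped[0] would raise a KeyError)

# one way to restore int keys, if you need them
restored = {int(key): value for key, value in round_tripped.items()}
print(restored == patients)              # True
```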

### Random

In [None]:
# random number generators are very useful in coding
# a popular one in Python is the package random
import random
# it is important to note that this, and really all software random number generators, are actually pseudo-random
# this will not matter for most things you use it for, but for security-sensitive uses (passwords, tokens) use the secrets module instead

# random can be used to give a random integer in a range with RANDINT
# here are 10 random throws of a die
for throws in range(10):
    print(random.randint(1,6))
# IMPORTANT: unlike RANGE, RANDINT includes BOTH endpoints
    # e.g. 6 is included in randint(1,6) but 6 is not included in range(1,6)

In [None]:
# compare: range(1,6) stops at 5
for i in range(1,6):
    print(i)

In [None]:
# random.random() will give you a random float with lots of precision between 0 and 1
for throws in range(10):
    print(random.random())

In [None]:
# random.choice() will randomly choose something from a list that you give it
choices = ['red','blue','green']
for throws in range(10):
    print(random.choice(choices))

In [None]:
# random.shuffle() will shuffle the contents of a list that you give it
# it does not return a list though (it returns None); it shuffles the list you give it in place
    # so make a copy of a list before you shuffle it if the order matters somewhere else
our_list = [0,1,2,3,4,5,6,7,8,9]
for throws in range(10):
    random.shuffle(our_list)
    print(our_list)
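Two more tools from `random` that often come in handy: `random.seed` makes the pseudo-random sequence repeatable (useful when you want others to reproduce your results), and `random.sample` returns a shuffled *copy*, leaving the original list alone. The seed value 42 is arbitrary.

```python
import random

# the same seed gives the same sequence of "random" numbers
random.seed(42)
first = [random.randint(1, 6) for _ in range(5)]
random.seed(42)
second = [random.randint(1, 6) for _ in range(5)]
print(first == second)       # True

# random.sample returns a NEW list; the original keeps its order
our_list = list(range(10))
shuffled_copy = random.sample(our_list, k=len(our_list))
print(our_list)              # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(shuffled_copy)         # the same numbers in a random order
```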

### Matplotlib PyPlot

In [None]:
# matplotlib's PyPlot is probably the easiest plotting package in Python
# this is not me promoting PyPlot (plotnine is very good) but rather showing it because it is easy
# this is also not a comprehensive guide to PyPlot, but rather just a few use cases
import matplotlib.pyplot as plt # you can change what you want to call imported packages

# plotting a scatter plot with PyPlot
# let's create some random 2d data
import random

x = [random.random() for i in range(100)]
y = [random.random() for i in range(100)]

plt.scatter(x,y)

plt.show()

# there are many many things you can do to change this plot, like change the colors of the points, 
# but I will let you figure that out with the matplotlib documentation

### NumPy

In [None]:
# NumPy is a package that works with vectors and matrices of numbers. 
# It is essential to data analysis in Python,
# and is at the core of many other data-oriented packages like sklearn and tensorflow
import numpy as np

# you can create an array of all zeros with the np.zeros function, with the shape of the array (rows, columns) as input
array = np.zeros([5,6])
print(array)

# look at the shape (tuple of # rows, # columns) of an array with .shape
print(array.shape)

In [None]:
# or you can define an array with np.array() with a list as input
# a list of lists will make a 2d array
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array_2d)

In [None]:
# a list of lists of lists will make a 3d array
array_3d = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                     [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
                     [[101, 102, 103], [104, 105, 106], [107, 108, 109]]
                    ])
print(array_3d)
print(array_3d.shape)
# a 3d array doesn't look very pretty on a 2d screen
# but we can think of this 3x3x3 matrix as like a Rubik's cube

In [None]:
# you can index arrays like lists, with one index per dimension separated by commas
print('1st slice:\n',array_3d[1]) # imagine this is the middle cross section of a Rubik's cube
print('2nd slice:\n',array_3d[1,1]) # imagine this is a middle column of a Rubik's cube
print('3rd slice:\n',array_3d[1,1,1]) # imagine this is the core of a Rubik's cube
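Arrays also support true slices with `:` in each dimension, plus boolean masks, which pick out the elements where a condition is True. A sketch on the 3x3 array from earlier, redefined so the cell stands alone:

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_1 = array_2d[:, 1]      # every row, column 1: [2, 5, 8]
top_right = array_2d[0:2, 1:]  # rows 0-1, columns 1 onward: [[2, 3], [5, 6]]

mask = array_2d > 5            # an array of True/False with the same shape
big_values = array_2d[mask]    # only the elements where mask is True: [6, 7, 8, 9]
print(column_1, top_right, big_values, sep='\n')
```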

In [None]:
# NumPy is great for linear algebra

# we will not be learning linear algebra in this course, but it is very important for data science
# however, here are a few matrix manipulations to get you familiar with some NumPy functionality

# first, let's make 2 toy matrices with some shapes
# the first matrix will be all zeros except for a square of ones in the middle
mat1 = np.zeros((100,100)) # to do this, we start with a matrix of zeros
mat1[25:75,25:75] = 1 # then we set the slice of the inner square equal to 1

# now we can visualize our matrix with imshow
import matplotlib.pyplot as plt

plt.imshow(mat1)
plt.title('mat1')
plt.colorbar()
plt.show()
# here, dark purple corresponds to the lowest value: 0, and yellow corresponds to the highest value: 1

In [None]:
# our second toy matrix can be made the same way as the first, but we will have it look like a plus sign
mat2 = np.zeros((100,100))
mat2[33:66] = 1     # rows 33-65 set to 1: the horizontal bar
mat2[:,33:66] = 1   # columns 33-65 set to 1: the vertical bar
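
A single slice such as `mat2[33:66]` selects whole rows. Putting `:` first, as in `mat2[:, 33:66]`, selects whole columns instead. Here is a tiny 5 x 5 plus sign built the same way. The name `small_plus` is illustrative.

```python
import numpy as np

small_plus = np.zeros((5, 5))
small_plus[2:3] = 1      # row 2: the horizontal bar
small_plus[:, 2:3] = 1   # column 2: the vertical bar
print(small_plus)
# the center cell is set twice but stays 1, so 5 + 5 - 1 = 9 cells are ones
print(small_plus.sum())  # 9.0
```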

# let's see what our second toy matrix looks like
plt.imshow(mat2)
plt.title('mat2')
plt.colorbar()
plt.show()

In [None]:
# matrix addition (it is always element-wise)
mat_3 = mat1 + mat2

plt.imshow(mat_3)
plt.title('matrix addition')
plt.colorbar()
plt.show()
# adding the matrices overlays them:
# yellow corresponds to 2 (cells that are 1 in both matrices),
# teal corresponds to 1 (cells that are 1 in only one matrix),
# and dark purple corresponds to 0
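
To check the colors against actual numbers, you can count how many cells hold each value with `np.unique(..., return_counts=True)`. This sketch rebuilds `mat1` and `mat2` exactly as above.

```python
import numpy as np

mat1 = np.zeros((100, 100))
mat1[25:75, 25:75] = 1
mat2 = np.zeros((100, 100))
mat2[33:66] = 1
mat2[:, 33:66] = 1

total = mat1 + mat2

# count how many cells hold each value (0, 1 or 2)
values, counts = np.unique(total, return_counts=True)
for value, count in zip(values, counts):
    print(f'value {value:.0f}: {count} cells')
```

The cells equal to 2 are exactly where the square and the plus sign overlap.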

In [None]:
# matrix multiplication (element-wise)
mat_3 = mat1 * mat2

plt.imshow(mat_3)
plt.title('matrix elementwise multiplication')
plt.colorbar()
plt.show()
# we see the same yellow shape as in matrix addition, but without the teal sections
# this is because only positions where both matrices are 1 stay nonzero, since multiplying by 0 gives 0
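
For matrices that contain only 0s and 1s, element-wise `*` behaves like a logical AND. A 2 x 2 sketch makes this easy to see. `a` and `b` are made-up examples.

```python
import numpy as np

a = np.array([[1, 0], [1, 1]])
b = np.array([[1, 1], [0, 1]])

# '*' multiplies matching positions: [[1*1, 0*1], [1*0, 1*1]]
product = a * b
print(product)

# for 0/1 matrices this matches a logical AND
print(np.array_equal(product, np.logical_and(a, b).astype(int)))  # True
```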

In [None]:
# matrix multiplication (dot)
mat_3 = mat1 @ mat2

plt.imshow(mat_3)
plt.title('matrix multiplication')
plt.colorbar()
plt.show()
# the @ operator in NumPy gives us the matrix product of mat1 and mat2:
# entry (i, j) of the result is the dot product of row i of mat1 and column j of mat2
# this type of matrix multiplication is crucial to data science, so I would recommend
# looking into it if you are unfamiliar (it is honestly not very complicated)
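
Here is the row-times-column rule worked out on 2 x 2 matrices. It computes one entry by hand and compares it with `@`. `np.dot` gives the same result for 2d arrays. `A` and `B` are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A @ B
print(C)   # [[1*5 + 2*7, 1*6 + 2*8], [3*5 + 4*7, 3*6 + 4*8]] = [[19, 22], [43, 50]]

# entry (i, j) is the dot product of row i of A with column j of B
i, j = 0, 1
manual = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
print(manual, C[i, j])   # 22 22

print(np.array_equal(C, np.dot(A, B)))   # True
```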

In [None]:
# matrix multiplication (dot) again
mat_3 = mat2 @ mat1

plt.imshow(mat_3)
plt.title('matrix multiplication')
plt.colorbar()
plt.show()
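# notice that, unlike regular multiplication, matrix multiplication is not commutative:
# mat2 @ mat1 gives a different result than mat1 @ mat2
# also notice the color scales: each entry now sums over a whole row and column, so values go up to 50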
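
Here is a small check that the order of a matrix product matters. It also explains the second plot. `mat1` and `mat2` are both symmetric (each equals its own transpose), so `mat2 @ mat1` is exactly the transpose of `mat1 @ mat2`. `A` and `B` are illustrative. `B` swaps columns when multiplied on the right and rows when multiplied on the left.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # swaps the columns of A -> [[2, 1], [4, 3]]
print(B @ A)   # swaps the rows of A    -> [[3, 4], [1, 2]]
print(np.array_equal(A @ B, B @ A))   # False

# rebuild the toy matrices from above
mat1 = np.zeros((100, 100))
mat1[25:75, 25:75] = 1
mat2 = np.zeros((100, 100))
mat2[33:66] = 1
mat2[:, 33:66] = 1

# both are symmetric, so reversing the order transposes the product
print(np.array_equal(mat2 @ mat1, (mat1 @ mat2).T))   # True
print((mat1 @ mat2).max())   # 50.0
```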

### Pandas

In [None]:
# if you code in R, you may like pandas
# it provides dataframes, similar to R's data frames, whose columns can hold
# more than just numbers (unlike the numeric NumPy arrays above)
# pandas dataframes are not as lightweight as NumPy arrays, but they are certainly prettier
import pandas as pd
import numpy as np

# you can use a dictionary to make a pandas dataframe
d = {'trees': ['conifer', 'aspen','pine','eucalyptus'], 
     'dogs': ['brittany', 'boxer','chihuahua','border collie'],
     'vegetables': ['cucumber','carrot','pea','legume']
    }
df = pd.DataFrame(data=d)

# .head() previews the first 5 rows; in a notebook, a dataframe on the last line of a cell
# is displayed as a formatted table, which looks much nicer than print()
df.head()
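
Each dictionary key becomes a column name, and each list holds that column's values. The lists must all have the same length, or pandas raises a `ValueError`. The rows get a default integer index. Here is a quick look at the same dataframe.

```python
import pandas as pd

d = {'trees': ['conifer', 'aspen', 'pine', 'eucalyptus'],
     'dogs': ['brittany', 'boxer', 'chihuahua', 'border collie'],
     'vegetables': ['cucumber', 'carrot', 'pea', 'legume']}
df = pd.DataFrame(data=d)

print(list(df.columns))   # ['trees', 'dogs', 'vegetables']
print(df.shape)           # (4, 3): 4 rows, 3 columns
print(list(df.index))     # [0, 1, 2, 3]: the default row labels
```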

In [None]:
# give the .head() method an integer for parameter 'n' to tell it how many rows to display
df.head(n=2)

In [None]:
# pandas and NumPy can work together
# turn your 2d NumPy array into a pandas dataframe
array = np.zeros((5,6))

df2 = pd.DataFrame(array)
df2.head()
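
Without names, the columns of a dataframe built from a NumPy array are labeled 0, 1, 2, and so on. You can pass `columns=` to name them, and `.to_numpy()` converts a dataframe back into an array. The column names `'a'` to `'f'` below are placeholders.

```python
import numpy as np
import pandas as pd

array = np.zeros((5, 6))

print(list(pd.DataFrame(array).columns))   # [0, 1, 2, 3, 4, 5]

df2 = pd.DataFrame(array, columns=['a', 'b', 'c', 'd', 'e', 'f'])
print(df2.shape)   # (5, 6)

# and back to a plain NumPy array
back = df2.to_numpy()
print(type(back), back.shape)
```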

In [None]:
# pandas can easily load files such as .csv with pd.read_csv()
# by default, pd.read_csv() uses the first line of the file as the header (the column names)
# let's load the animals.csv file we made earlier

animals_df = pd.read_csv('animals.csv')
animals_df.head()

In [None]:
# if the file has no header row, pass header=None so the first line is read as data (columns are then numbered 0, 1, ...)
animals_df = pd.read_csv('animals.csv', header=None)
animals_df.head()

In [None]:
# or you can name the columns of our animals dataframe yourself with the 'names' parameter
# (the first line is then read as data, just as with header=None)
animals_df = pd.read_csv('animals.csv', names=['count', 'animal'])
animals_df.head()
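
The contents of animals.csv are not shown here. This sketch therefore uses a hypothetical stand-in with no header row, read from a string with `io.StringIO`, to compare the three `read_csv` calls above.

```python
import io
import pandas as pd

# hypothetical stand-in for animals.csv: two columns, no header row
csv_text = "3,cat\n5,dog\n2,parrot\n"

# default: the first line becomes the column names
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.columns))    # ['3', 'cat']
print(len(default_df))             # 2: the first animal was used up as the header

# header=None: every line is data, columns are numbered 0, 1, ...
no_header_df = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(no_header_df.columns))  # [0, 1]

# names=: supply the column names yourself; every line is still data
named_df = pd.read_csv(io.StringIO(csv_text), names=['count', 'animal'])
print(named_df['animal'].tolist()) # ['cat', 'dog', 'parrot']
```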

In [None]:
# pandas can also read JSON files with pd.read_json()
# let's load the JSON file we made earlier
patient_df = pd.read_json('patient.json')
patient_df.head()

# notice what happens when a field is missing
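
patient.json is not shown here, so this sketch uses hypothetical patient records in which one record lacks the `age` field. pandas fills the gap with NaN ("not a number"). Wrapping the string in `io.StringIO` lets `read_json` treat it like a file.

```python
import io
import pandas as pd

# hypothetical records; the second one has no 'age' field
json_text = '''[
    {"name": "patient 1", "age": 54, "diagnosis": "flu"},
    {"name": "patient 2", "diagnosis": "cold"}
]'''

patients = pd.read_json(io.StringIO(json_text), orient='records')
print(patients)
print(patients['age'].isna().tolist())   # [False, True]
```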

In [None]:
# missing data points are common in medical data
# in pandas, you can check for missing values with pd.isna()
# this returns a dataframe of booleans that is True wherever a value is missing
pd.isna(patient_df)
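
`True` counts as 1 when summed, so `.sum()` on the result of `pd.isna()` counts the missing values in each column. The method form `df.isna()` gives the same result. The small `vitals` table is made-up data.

```python
import numpy as np
import pandas as pd

# np.nan and None both count as missing
vitals = pd.DataFrame({'heart_rate': [72, np.nan, 88],
                       'temperature': [36.6, 37.1, None]})

missing = pd.isna(vitals)
print(missing.sum())               # missing values per column
print(int(missing.sum().sum()))    # 2 in total
```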

In [None]:
# you can easily transpose your data (swap rows and columns) with the .T attribute
patient_df.T.head()
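
Transposing swaps the shape, and the old column names become the new row labels. Here is a tiny illustrative example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df.shape)           # (3, 2)
print(df.T.shape)         # (2, 3)
print(list(df.T.index))   # ['a', 'b']
```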

In [None]:
# for the next few methods, let's create a dataframe for a mock drug trial

# we will start with a NumPy array of random numbers between 0 and 1
ran = np.random.rand(10,5)
df = pd.DataFrame(ran)

# rename the columns after some trials
df.columns = ['trial_1','trial_2','trial_3','trial_4','trial_5']

# and change the dataframe's index (the row names) to some drugs I made up
    # please don't make fun of my lack of medical knowledge :)
df.index = ['interferon A','interferon B','interferon C','interferon D','interferon E',
            'inhibitor A','inhibitor B','inhibitor C','inhibitor D','inhibitor E',
           ]

df.head(n=10)
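
np.random.rand() gives different numbers on every run, so the sorted order in the next cells will change too. A seeded generator makes the mock data reproducible. The seed 0 is arbitrary. The column and row names can also be passed straight to the DataFrame constructor, as this sketch does.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
ran = rng.random((10, 5))   # uniform values in [0, 1), like np.random.rand(10, 5)

df = pd.DataFrame(ran,
                  columns=[f'trial_{i}' for i in range(1, 6)],
                  index=[f'{kind} {letter}'
                         for kind in ['interferon', 'inhibitor']
                         for letter in 'ABCDE'])
print(df.shape)   # (10, 5)
```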

In [None]:
# you can sort a dataframe by a column
# sort_values() returns a new, sorted dataframe, so we assign the result back to df
df = df.sort_values(by='trial_2')
df.head(n=10)
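
Two related options are `ascending=False`, which puts the largest values first, and `sort_index()`, which reorders rows by their labels. The small `scores` table is illustrative.

```python
import pandas as pd

scores = pd.DataFrame({'trial_1': [0.2, 0.9, 0.5]},
                      index=['drug A', 'drug B', 'drug C'])

print(scores.sort_values(by='trial_1', ascending=False))   # B, C, A
print(scores.sort_values(by='trial_1').sort_index())       # back to A, B, C
```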

In [None]:
# selecting data from a dataframe

# selecting data from dataframes can be confusing, because there are many ways to do it
# getting a single column is easy though, as you can just select it by name
print(df['trial_3'])
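
A single column name returns a Series, which is one column. A list of names returns a DataFrame. This sketch uses a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({'trial_1': [0.1, 0.2], 'trial_3': [0.3, 0.4], 'trial_5': [0.5, 0.6]})

one = df['trial_3']                    # a Series
several = df[['trial_1', 'trial_5']]   # a DataFrame with 2 columns
print(type(one).__name__, type(several).__name__)   # Series DataFrame
print(several.shape)                   # (2, 2)
```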

In [None]:
# you can select slices of rows similarly to how it is done in NumPy
# however, unlike NumPy, this only slices rows, NOT COLUMNS
# to slice columns, use .loc or .iloc instead
print(df[1:3])

In [None]:
# selection by label
# this is done with the .loc indexer (note: square brackets, not parentheses)
# you can use it to select rows, columns, or both by label
df.loc['interferon A', 'trial_2']
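
`.loc` returns a single value when given one row label and one column label, and a whole row when given only a row label. A `:` in the row position keeps every row. Here is a two-row illustrative table.

```python
import pandas as pd

df = pd.DataFrame({'trial_1': [0.1, 0.2], 'trial_2': [0.3, 0.4]},
                  index=['interferon A', 'interferon B'])

print(df.loc['interferon A', 'trial_2'])   # 0.3
print(df.loc['interferon B'])              # the whole row, as a Series
print(df.loc[:, 'trial_1'])                # every row of one column
```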

In [None]:
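# .loc can also get multiple columns or rows with a list or a slice of labels
# both list and slice shown below
# label slices include BOTH endpoints and follow the current row order, so we sort the index first
# (df was sorted by trial_2 above, which put the rows in a random order)
df.sort_index().loc['interferon A':'interferon D', ['trial_1', 'trial_5']]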
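
A label slice walks the rows in their current order and includes both ends. If the end label comes before the start label, the result is empty. This sketch shows that with a deliberately shuffled index.

```python
import pandas as pd

df = pd.DataFrame({'trial_1': [1, 2, 3]},
                  index=['interferon C', 'interferon A', 'interferon B'])

print(df.loc['interferon C':'interferon A'])        # 2 rows: C, then A
print(len(df.loc['interferon A':'interferon C']))   # 0: A comes after C here
print(df.sort_index().loc['interferon A':'interferon B'])   # A and B
```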

In [None]:
# selection by position
# this is done with the .iloc indexer (again with square brackets)
# like .loc, it can select rows, columns, or both, but by integer position instead of label
df.iloc[1,1]
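
`.iloc` ignores the labels entirely and counts positions from 0. Negative positions count from the end, as with Python lists. This uses a small illustrative table.

```python
import pandas as pd

df = pd.DataFrame({'trial_1': [0.1, 0.2, 0.3], 'trial_2': [0.4, 0.5, 0.6]},
                  index=['drug A', 'drug B', 'drug C'])

print(df.iloc[1, 1])    # 0.5: second row, second column
print(df.iloc[-1, 0])   # 0.3: last row, first column
```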

In [None]:
# like .loc, .iloc accepts slices and lists
df.iloc[[1,2,4],0:3]
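
A common trip-up is the slice endpoint. `.iloc` slices exclude the end position, like Python lists, while `.loc` slices include the end label. The four-row table below is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

print(df.iloc[0:2].index.tolist())     # ['a', 'b']: position 2 is excluded
print(df.loc['a':'c'].index.tolist())  # ['a', 'b', 'c']: label 'c' is included
```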

In [None]:
# iterate over a dataframe

# this works like a for loop over all of the rows
# iterating over the rows is performed using the .iterrows() method
for index, row in df.iterrows():
    print('trial_1:',row['trial_1'],'\ttrial_2:',row['trial_2'])
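
`.iterrows()` yields pairs of (row label, row as a Series). pandas also offers `.itertuples()`, which yields namedtuples with the row label in the `.Index` field, and the pandas docs describe it as generally faster. Both loops below visit the same rows of a small illustrative table.

```python
import pandas as pd

df = pd.DataFrame({'trial_1': [0.1, 0.2], 'trial_2': [0.3, 0.4]},
                  index=['drug A', 'drug B'])

for index, row in df.iterrows():
    print(index, row['trial_1'])

for row in df.itertuples():
    print(row.Index, row.trial_1)
```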

In [None]:
# iterating over columns uses the .items() method
# (older tutorials use .iteritems(), which was removed in pandas 2.0)
for col_name, col in df.items():
    print(col_name)
    print(col,'\n')