# What to gain from the class

By the end of the class I want each of you to have:

* The briefest of introductions to Python, with resources to find out more
* Confidence that Python can help you do your analytics quickly and effectively
* The relief that you survived the initial anxiety of learning a new language

This will be a first pass, so I don’t expect you to remember most of it by the end of the lesson.

I want to convince you that Python is a quick and easy tool to pick up for your analytics. Most people’s reaction after developing a Python script is, “Oh, it was really that easy?” I hope you see that many of the tools are already there for you to use.

# Getting started

Above all else, please check out [the official documentation and tutorial for python](https://docs.python.org/3/tutorial/index.html). This is a well-visited and reviewed piece of documentation, and will give you a clearer and more thorough introduction to the language than I will.

A word of caution on Stack Overflow and ChatGPT: they are excellent tools to get quick answers, but if you find yourself on them more than you are programming yourself, it is an indication that you should slow down and review the official documentation. It will not be a waste of your time to schedule a block to read the documentation comprehensively. You will thank yourself later for learning the fundamentals.

# Interactive Mode – Python’s console

The [interactive interpreter](https://docs.python.org/3/tutorial/interpreter.html) is a convenient tool to run quick python code, install packages, read help documentation, and more. In it, you can type basic python commands and execute them immediately.

The interpreter provides a help tool to the python user, too. You can type `help()` to open an interactive help menu. Within it, you can type in the package names or keywords that it lists to get a description of how to use tools in python.

It also stores a history of commands run. You can use it to scroll up through previous commands (with the up arrow).

Also, it supports tab-completion. Typing a single tab will cause the interpreter to finish the line you were typing. If there are multiple correct options, you can press tab a second time and it will show all possible options.

A word of caution: don’t over-rely on the console to run code. Get into the habit of writing code to file and saving it. Better yet, version-control your python files using a tool like git.

# Python Basics and Keywords

## Defining Variables

Defining variables is simple in python: just place an equals sign between the variable name on the left and its value on the right. You don't have to specify the type of the variable on assignment. Python figures it out under the hood.

In [None]:
num = 3
name = "Clint"
my_list = [1, 2, 3]

## Variable Types

If you're curious to find the type of a variable, use the `type()` function. Use the `print()` function to display lines. In the interactive interpreter, you can omit the print statement.

In [None]:
print(type(num))
print(type(name))
print(type(my_list))

## Numbers of Varying Format

There are multiple types of numbers that can be stored in python. Further, numbers can be written in many ways:

In [None]:
my_integer = 3
my_float = 3.4 # floating point values, representing fractional numbers
imaginary = 1 + 3j # Note, python uses a 'j' instead of an 'i' for imaginary numbers
imaginary_2 = complex(1,3)
large_num = 1.56E50 # This reads 1.56 times 10 to the 50th power

## Operators

Python offers a wide variety of operators. We can look at a few for arithmetic and for comparison below.

In [None]:
print("Basic arithmetic:",
      3 + 3, # addition
      2 - 1, # subtraction
      4 * 5, # multiplication
      6 / 3, # division
      2 ** 3, # exponent
      7 // 2, # integer division
      91 % 4, # modulo or remainder
      sep='\n  '
)

print("Comparison operators:",
      2 == 2, # test equality
      2 != 1, # inequality
      5 > 3, # greater than
      7 <= 2, # less-than or equal to
      sep='\n  ' # define separator between each print entry
)
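
Two of the operators above behave in ways that tend to surprise newcomers, so here is a small sketch of them in isolation. The variable names are only for illustration.

```python
# True division always produces a float, even when the result is a whole number
whole = 6 / 3
print(whole, type(whole))

# Integer (floor) division rounds down, which matters for negative numbers
floor_pos = 7 // 2
floor_neg = -7 // 2
print(floor_pos, floor_neg)

# The remainder pairs with floor division: a == (a // b) * b + (a % b)
a, b = 91, 4
rebuilt = (a // b) * b + (a % b)
print(rebuilt == a)
```

If you need a whole number back from a division, reach for `//` rather than converting the result of `/` afterwards.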

There are shortcut arithmetic operators built into python. Placing the operator before an equals sign `=` will apply the value on the right to the variable on the left. Let's edit the `num` variable that I created earlier.

In [None]:
# The following assignments are equivalent
print("num's original value:", num)

num = num + 3 # add 3 to "num", then assign the result to num
print("add 3 to num:", num)

num += 3 # add 3 to num
print("added 3 again:", num)

# You can do this with most arithmetic operators
num **= 2
print("raised num to power of 2:", num)
num /= 5
print("divided num by 5:", num)

## Text and Strings

There are quite a few methods for creating strings, or text, in python. You can use either single quotes `'` or double quotes `"` to create a string, whichever is more convenient to you.

In [None]:
single_quote = 'Here is some text'
double_quote = "Some more text"

We can create a multi-line text block with triple single quotes `'''` or triple double quotes `"""`. Everything following will be part of the string until it finds a matching triple quote.

In [None]:
multi_line = """Here are multiple lines
of text
and this is completely legitimate
"""

print(multi_line)

Lastly, I'm a frequent user of [the formatted string](https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings). It allows for displaying variables and their values inline with the text. A formatted string is created by placing an `f` before the quotes: e.g. `f"a formatted string"`. To display a variable in a formatted string, place it in curly braces `{}`.

We'll print a few previously-defined variables below with formatted strings.  

In [None]:
print(f'Here is my name: {name}')

print(f'You can display name and value together with "=". Like this: {my_list=}')

# There are tools to change precision of numbers by adding a colon afterwards.
long_float = 34.234324324324
print(f'Adding ":.3f" after the variable name says "show the first 3 decimal places": {long_float:.3f}')

## Comments

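Comments are written with the hash: `#`. Everything that follows it on the line will be ignored by the interpreter. Examples of this are seen above in this document.

Further, a lone triple-quoted string (`"""`) in a file has no effect when it runs: the interpreter evaluates the string and then throws it away. When such a string is the first statement in a module, function, or class, python stores it as that object's docstring, which is why triple quotes are used to formally document code.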

In [None]:
'''A comment: This variable does cool stuff'''
cool_stuff = 59393

## Lists

[Lists](https://docs.python.org/3/tutorial/introduction.html#lists) are python's built-in container to store ordered values. We saw an example above of a list getting created with `my_list`. Let's create another. A list is created with square brackets `[]` with commas separating each entry.

In [None]:
# A single list can store objects of different types
# They can even store other lists!
mixed_list = [1, 3.4, 'a', [3, 2, 1]]

print(f"{mixed_list=}")

# Accessing an element is done with brackets after name
print(f"{mixed_list[0]=}") # indexing starts at 0
print(f"{mixed_list[2]=}")
print(f"{mixed_list[-1]=}") # negative index means start at end of list

In [None]:
mixed_list[2] = 'b' # you can edit a single entry
print(f"After editing: {mixed_list=}")

# you can remove a single element
del mixed_list[3]
print(f"After removal: {mixed_list=}")

In [None]:
# or you can append two lists together
mixed_list = mixed_list + my_list
print(mixed_list)

# to see how many elements are in a list, use len()
print(f"Number of elements: {len(mixed_list)}")

# to see if an element is in a list use in keyword
print("'a' is in our list:", 'a' in mixed_list)
print("'b' is in our list:", 'b' in mixed_list)

## Sets

A [set](https://docs.python.org/3/tutorial/datastructures.html#sets) is an unordered collection of data where there are no duplicate elements. It is created with curly braces `{}` and the elements are comma separated. 

In [None]:
my_set = {'a', 'b', 'c', 'a'} # note two 'a' values
print(my_set)

# like lists, to find if an element is in a set, use 'in' keyword
print("'c' is in my_set:",
      'c' in my_set)

# remove an element with remove() or discard()
my_set.remove('c') # remove throws error if it can't find element
my_set.discard('d') # will not complain if it can't find it
print("Set after removal", my_set)

# you can add elements with add()
my_set.add('f')
print("After adding an element:", my_set)
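
Sets also support the operations you may remember from math class. The sketch below uses made-up example data; `&`, `|` and `-` are the built-in operators for intersection, union and difference.

```python
vowels = {"a", "e", "i", "o", "u"}
word_letters = set("python analytics")  # set() builds a set from any iterable

common = vowels & word_letters    # intersection: in both sets
combined = vowels | {"y"}         # union: in either set
missing = vowels - word_letters   # difference: in the first set only

# sorted() is used only to make the printed order predictable
print(sorted(common))
print(sorted(combined))
print(sorted(missing))

# Careful: empty curly braces create a dictionary, not a set
empty_braces = {}
empty_set = set()
print(type(empty_braces), type(empty_set))
```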

## Dictionary

A [dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) is an associative object, matching a unique key to any value. This is one of python's more famous and heavily used container objects. It is versatile enough to represent most things. Like a set, you create it with curly braces `{}`, but in each comma-separated element you provide a key and a value, separated by a colon `:`.

In [None]:
dict_about_me = {'my_brain':'thinking', 'num_arms':2, 'my_hands':'typing'}

# add new elements with square brackets and =
dict_about_me['my_feet'] = 'dancing'

# access with square brackets, too
print("What is my brain doing?", dict_about_me['my_brain'])

# list keys and values with keys() and values() functions
print("Keys: ", dict_about_me.keys())
print("Values: ", dict_about_me.values())
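
A few more everyday dictionary operations are sketched below on a small made-up `inventory` dictionary: checking for a key, handling a missing key, and removing an entry.

```python
inventory = {"apples": 3, "pears": 0}

# "in" checks the keys of a dictionary, not its values
print("apples" in inventory)
print(3 in inventory)

# Square brackets raise a KeyError when the key is missing...
try:
    inventory["plums"]
except KeyError as err:
    print("No such key:", err)

# ...while get() returns a fallback value instead
plum_count = inventory.get("plums", 0)
print(plum_count)

# del removes a key, just like removing a list element
del inventory["pears"]
print(inventory)
```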

## Indentation

Spaces and indents have special meaning in python. Lines indented by the same amount belong to the same block of logic. You can't arbitrarily indent lines without python raising an error.

The convention is to add 4 spaces to indent, but you can set the number of spaces to just about anything. As long as you are consistent within each block (and, for your readers' sake, throughout the file), python can figure it out.

In [None]:
# If you uncomment the line below, python will complain
#          my_var = 3
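
If you would like to see the complaint without breaking your notebook, one option is to hand the badly indented code to the built-in `compile()` function as text and catch the error. This is only a sketch for demonstration; `bad_source` is a made-up snippet.

```python
# Two lines of code as text; the second one is indented for no reason
bad_source = "x = 1\n    y = 2\n"

try:
    compile(bad_source, "<example>", "exec")
    message = "Python accepted the code"
except IndentationError as err:
    message = f"Python refused the code: {err.msg}"

print(message)
```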

## Classes and Objects

[Classes](https://docs.python.org/3/tutorial/classes.html) are not something that I will review in the lesson, as the scope of the subject is too vast for this short lesson. However, they are important to understand. Instances of those classes, called objects, are what you are interacting with most of the time in python.
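
To make the word "object" a little more concrete without defining a class ourselves, the sketch below pokes at a plain list. The list is an object of the built-in `list` class, and the functions attached to it (its methods) are reached with a dot.

```python
numbers = [3, 1, 2]

# type() reports the class this object was made from
print(type(numbers))

# sort() is a method: a function that belongs to the object and acts on it
numbers.sort()
print(numbers)

# isinstance() asks whether an object belongs to a given class
print(isinstance(numbers, list))
```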

## Basic Logic and Control

Here we will sample a few of the most critical [control-flow](https://docs.python.org/3/tutorial/controlflow.html) expressions in python. These are expressions that allow you to perform checks or repeat code in loops.

### If Statements

An `if` statement only runs the code that follows if the expression it checks is true.

In [None]:
sheep = 'baa'

# put a true/false statement between "if" and the colon
if sheep == 'baa':
    print('That is a sheep') # use a 4 space indent for every line you want to run only if the statement above was True
    print('Nice sheep')

Accompanying the `if` statement are the optional `elif` and `else` statements.

The `elif` stands for "else if", saying "if the previous expressions weren't true, check the following." It will not run if any previous expression in the if/else chain was true.

Finally, the `else` statement will run only if none of the above statements were true. It is the default expression if all else failed. Let's expand our `if` statement from above.

In [None]:
sheep = 'boo' # feel free to edit this value to see what prints!

if sheep == 'baa':
    print('That is a sheep')
    print('Nice sheep')
elif sheep == 'boo':
    print('That is a negative sheep')
elif sheep == 'bro':
    print('That is a cool sheep')
else:
    print('That is not a sheep')

### While loops

A `while` loop is something that will repeatedly run a code block until an expression returns `False`. It is structured like the `if` statement, where you put a true/false expression between `while` and the colon. Let's count some sheep as an example.

In [None]:
num_sheep = 1

while num_sheep <= 5:
    print(num_sheep, "sheep...")
    num_sheep += 1 # increment the number of sheep here

Note that I had to increment the `num_sheep` inside the loop. Without it, the loop would continue forever. An infinite loop can be a nice feature on some projects, but for beginners, it can be a scary resource hog.

### For loops

The `for` loop is a good tool to iterate over all elements of a container, such as a `list`, `set` or `dict`. It will loop until it accesses all elements specified. Below we will iterate over the objects we made earlier. The structure is as follows:

```
for <new_variable_name> in <an_existing_object>:
    <stuff to do>
```

The `new_variable_name` is the name you assign to access each element in the object you're looping over. Note that the variable is not local to the loop: after the loop finishes, it still exists and holds the last element that was visited.

In [None]:
for val in my_list:
    print(val)

for letter in my_set:
    print(letter)

# dictionary has items where you can get key and value at same time in loop
for key, val in dict_about_me.items():
    print("the key is", key, "and the value is", val)
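
Two built-in helpers come up constantly alongside `for` loops: `range()` when you want to count, and `enumerate()` when you want the position of each element as well as its value. A short sketch with made-up data:

```python
# range(4) produces 0, 1, 2, 3 (it stops before the number you give it)
squares = []
for i in range(4):
    squares.append(i ** 2)
print(squares)

# enumerate() hands you the index and the element together
fruits = ["apple", "banana", "cherry"]
labels = []
for position, fruit in enumerate(fruits):
    labels.append(f"{position}: {fruit}")
    print(position, fruit)
```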

## Functions

[Functions](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) are the logical tool to group a set of commands together into one. Like a math function, the goal is to take one or more inputs and provide you with an output.

It is common for academics to string together their code as a long list of commands to run sequentially. It is a better habit to organize your code into logical groups through functions.
* It better communicates motivation.
* It helps avoid repeating the same lines throughout your code (see DRY principle).
* It helps in testing (we will see this later).

We'll define a basic function below. Between `def` and the parentheses, give the function a unique name. The indented block following the colon is what will run.

In [None]:
def hello():
    print("Hello there")

To run it, call the name, followed by parentheses:

In [None]:
hello()

You can pass arguments into the function. We give arguments a name within the parentheses in a comma-separated list. After they are named, we can use them in the function block. 

In [None]:
def bad_conversation(name1, name2):
    print(f"Hello {name1}, how are you?")
    print(f"  Good {name2}, and you?")
    print(f"I've been better...")
    print(f"  Great! See you later!")

To call `bad_conversation`, we must now pass it some values. If you leave an argument out, python will complain.

In [None]:
bad_conversation(name, "Bill")
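
The functions so far only print. To hand a result back to the caller, as a math function would, use the `return` keyword. The sketch below also shows a default value for an argument; `rectangle_area` is a made-up example function.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height falls back to 1.0 if omitted."""
    return width * height

# The returned value can be stored in a variable and reused
area = rectangle_area(3, 4)
print(area)

# Leaving out height uses its default
strip = rectangle_area(5)
print(strip)

# Arguments can be passed by name, in any order
named = rectangle_area(height=2, width=6)
print(named)
```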

# Documenting your code

It is important for yourself and others to document the work you are doing in your code. Sometimes the code alone does not express motivation. If we comment our code properly, many built-in tools can generate interactive documentation for you.

Confusingly, there are many accepted conventions for documenting your code in Python, so it can be hard to know which one is best. I typically follow the sphinx docstring format. Pick the one you like best. The most important rule is to be consistent throughout your files.

We'll review some of the documentation principles in the links below:
* https://peps.python.org/pep-0008/
* https://peps.python.org/pep-0257/

Here are a few docstring guidelines with examples:
* https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html
* https://google.github.io/styleguide/pyguide.html
* https://numpydoc.readthedocs.io/en/latest/format.html
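
As a rough illustration of what a sphinx-style docstring can look like, here is a sketch on a made-up function. The `:param:`, `:type:`, `:return:` and `:rtype:` fields follow the sphinx guideline linked above; the other formats arrange the same information differently.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius.

    :param temp_f: Temperature in degrees Fahrenheit.
    :type temp_f: float
    :return: Temperature in degrees Celsius.
    :rtype: float
    """
    return (temp_f - 32) * 5 / 9

# The docstring is stored on the function and is what help() displays
print(fahrenheit_to_celsius.__doc__)
print(fahrenheit_to_celsius(212))
```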

# Installing and Using Packages

Most of the time, you will be using the code designed by other people to run your analyses. The special algorithms and libraries are stored in [packages](https://docs.python.org/3/tutorial/modules.html#packages).

## Pip

[Pip](https://docs.python.org/3/tutorial/venv.html#managing-packages-with-pip) is the preferred way to install packages and now comes built-in with a python installation. You can check to see if it is installed on your machine by typing the following in a terminal (the `py` launcher used here is specific to Windows).


```
py -m pip --version
py -m pip --help
```

Listing all packages installed in your environment can be done with its `list` command

```
py -m pip list
```

You can even see details on a particular package with the `show` command
```
py -m pip show package_name
```
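
You can also ask about installed packages from inside python itself. The sketch below uses the standard library's `importlib.metadata` module (available in Python 3.8 and later); `installed_version` is a small helper written for this example, not part of pip.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# The last name is deliberately fake to show the "not installed" case
for package in ["pip", "numpy", "a-package-that-does-not-exist"]:
    print(package, "->", installed_version(package))
```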

## Viewing package documentation

You don’t need Stack Overflow to navigate your packages. The best source is always the documentation.

Using the `help()` function:
* `help()`, then type the package name
* `help("package_name")`
* import first, then ask help: `help(package_name)`
* more specific: `help(package_name.function)` or `help(package_name.object)`

Alternatively, you can use a tool installed with python from terminal, called `pydoc3`:
* type `pydoc3 package_name` into your console
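
The text that `help()` shows comes from the docstrings stored on each object, so you can also peek at it directly. The sketch below uses two standard-library modules, `math` and `statistics`, so that it runs without installing anything.

```python
import math
import statistics

# help(math.sqrt) would open this same text in the help viewer
print(math.sqrt.__doc__)

# dir() lists the names a package offers, which is handy for finding
# something to ask help() about; names starting with "_" are internal
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print(public_names[:5])
```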

## Using the packages and modules

After installing a package with `pip`, you can begin using it in your code. The `import` keyword followed by your package name will allow you to use anything in the package.

In [None]:
import matplotlib # importing the matplotlib library

You can change the name of the imported library with the `as` keyword. This is useful for common packages that you are typing often.

In [None]:
import numpy as np # common shortcut for the numpy library

np.pi # you can now call any numpy object with np shortcut

It is good practice to only import what you need from a package. A plain `import` makes everything in the package available behind the package name. If you're only using a few tools from the library, it is often cleaner to use the `from` keyword first, which brings just those names into your file.

In [None]:
# Stats models is huge. Maybe better to import the one thing we're using
from statsmodels.formula.api import ols # Ordinary Least Squares regression

You can import from anywhere in your python file, like even in an `if` block or a function. However, it is best practice to import packages at the top of your file to indicate to someone else what you expect them to have installed on their system in order to run your code.
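
The three import styles discussed in this section are shown side by side below, using standard-library modules so the example runs without installing anything.

```python
import math                   # access names as math.<name>
import statistics as st       # same idea, with a shorter alias
from math import pi, sqrt     # bring specific names in directly

radius = 2
area_via_package = math.pi * radius ** 2   # via the package name
area_via_name = pi * radius ** 2           # via the directly imported name
print(area_via_package, area_via_name)

root = sqrt(16)
average = st.mean([1, 2, 3, 4])
print(root, average)
```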

# Data Structure Packages

Now that we have you introduced to the basics of python, it is time to introduce you to the most common science packages python has to offer.

## Numpy

[Numpy](https://numpy.org/) is one of the more commonly used packages in just about any project. It gives you access to n-dimensional arrays where all elements are the same type. Additionally, it gives you tools to manipulate those arrays.

We’ll do the following simple examples:
* Create a 1-dimensional array
  * Look at length: e.g. `arr.size`
  * Look at type
* Create a 2-dimensional array
  * Find the mean
  * Look briefly at all the tools it has

Let's create a basic one-dimensional array.

In [None]:
arr = np.array([5, 6, 7])
print(arr)

# every array has one type for all elements, stored in dtype
print(arr.dtype)

Numpy has many tools available, such as calculating the mean, finding the maximum and minimum, and so much more. Here is an example of creating a 2-dimensional array and calculating the mean of each row.

In [None]:
arr_2d = np.array([[1,2,3],[4,5,6]])

print(f"Shape of array: {arr_2d.shape}")
print(f"The mean of each row: {arr_2d.mean(axis=1)}") # axis=0 averages down each column, axis=1 averages across each row
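
To round out the picture, here is a sketch (assuming `numpy` is installed) that contrasts the two axes and shows two other things arrays are good at: arithmetic on every element at once, and picking out elements with a true/false condition. The array `table` is made-up data.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# size is the total number of elements; shape is (rows, columns)
print(table.size, table.shape)

column_means = table.mean(axis=0)   # one value per column
row_means = table.mean(axis=1)      # one value per row
print(column_means, row_means)

# Arithmetic applies to every element, no loop required
doubled = table * 2
print(doubled)

# A comparison inside the brackets keeps only the matching elements
above_three = table[table > 3]
print(above_three)
```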

## Pandas

What happens when you want to import tables, or objects with heterogeneous data types? This is where [pandas](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html) comes in. For those familiar with R’s data.frame, the pandas library will feel very familiar.

We'll load a dataset from another package `seaborn` (mentioned later) to create a simple `pandas` data frame. We'll see very quickly a pretty table with a bunch of information.

In [None]:
from seaborn import load_dataset

dat = load_dataset('iris')

print(f"Type of dat: {type(dat)}")
dat

We can create a `pandas.DataFrame` from a python dictionary. Let the key be the column name and the value be a list of the rows.

In [None]:
import pandas as pd

contacts_dict = {'first name':['Bob','Linda','Tina','Gene','Louise'],
                 'age':[46,44,13,11,9],
                 'email':['burgerbob@example.com','lbelcher@example.com', 'ponies.delux@example.com', 'im.with.ken@example.com', 'top.secret@example.com']
                }

contacts_df = pd.DataFrame(contacts_dict)
contacts_df


## Loading Data from file and saving to file

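Of course, the data structures above can be loaded from a file and saved to a file.
* `pandas` has a `read_csv()` function you can use, and every data frame has a matching `to_csv()` method for saving.
* `numpy` has the `genfromtxt()` function, where you can specify a delimiter, and `savetxt()` for writing an array back out.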
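
Here is a sketch of the `pandas` round trip, assuming `pandas` is installed. To keep the example self-contained it reads from and writes to an in-memory text buffer (`io.StringIO`) instead of a real file; in your own work you would pass a file path such as `"contacts.csv"` (a hypothetical name) in its place.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file on disk
csv_text = "name,age\nBob,46\nLinda,44\nTina,13\n"

# read_csv accepts a file path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))
print(people)
print(people.dtypes)   # each column gets its own type

# to_csv writes the table back out; index=False skips the row numbers
buffer = io.StringIO()
people.to_csv(buffer, index=False)
print(buffer.getvalue())
```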



# Data Visualization

## matplotlib’s pyplot

[Matplotlib](https://matplotlib.org/stable/users/explain/quick_start.html) is the most common plotting library used in python. In particular, its module `pyplot` can be used in interactive work to display your information quickly. This tool gives you plots very similar to the ones you’d find in MATLAB.

The tool is extensive and can do a potentially overwhelming number of things to a plot. As with many python libraries, matplotlib can start to feel like its own language. However, getting a single graph up and running is pretty simple. We’ll focus on displaying information with the bare minimum of code.

To get a better understanding of plotting more advanced images, I’d check out the examples in the scikit-learn library.
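
As a sketch of how little is needed for a single graph, the example below (assuming `matplotlib` is installed) draws one line with labeled axes. The data and the output file name `first_plot.png` are made up for illustration.

```python
import matplotlib.pyplot as plt

x_values = [0, 1, 2, 3, 4]
y_values = [x ** 2 for x in x_values]

# subplots() returns the figure (the whole image) and an axes (one plot in it)
fig, ax = plt.subplots()
ax.plot(x_values, y_values, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.set_title("A first plot")

# In Jupyter the figure appears below the cell; in a script, plt.show()
# opens it in a window. Saving to a file works in either setting.
fig.savefig("first_plot.png")
```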

## seaborn

The [seaborn library](https://seaborn.pydata.org/tutorial/introduction.html) builds on matplotlib for statistics and integrates closely with the `pandas` library. Some of the plots that are common in seaborn are:
* Relational plots with `relplot`.
* Linear regression models fit to scatterplots with `lmplot`.
* Distribution plots with `displot`.
* Categorical plots with `catplot`. We can do box-and-whisker or violin plots.

# Basic stats packages

## Scipy

[Scipy](https://docs.scipy.org/doc/scipy/tutorial/index.html) is built to extend numpy and provides a wide range of algorithms. This was by far my most used stats package in the engineering workforce. As scientists, I imagine you’ll get familiar with the following modules:
* Linear algebra – linalg
* Spatial algorithms – spatial
* Statistics algorithms – stats

## Statsmodels

The [statsmodels](https://www.statsmodels.org/stable/) library is the heavy-weight stats analysis package. If you want quick summary data on a model you're running, this is the one to use. It serves as a bridge between R and python, and uses the `pandas` library heavily to run its analyses.

Links:
* https://www.statsmodels.org/stable/user-guide.html
* https://www.statsmodels.org/stable/examples/notebooks/generated/tsa_filters.html

# Machine Learning

## Scikit-learn

The [scikit-learn](https://scikit-learn.org/stable/user_guide.html) package is another library built on `numpy` and `scipy`, and adds supervised and unsupervised learning algorithms to the list of things you can do in python. This library could be an entire class all on its own, but fortunately the documentation is laid out clearly enough that you can start exploring and experimenting on your own.

# Testing Your Code

Not often taught to scientists is the importance of [testing](https://docs.python.org/3/library/unittest.html) your code. Your analysis needs to be reproducible, and testing the packages and methods you use makes that much easier to achieve.

This will give you confidence:
* That the code you ran 6 months ago still runs the same way now
* That your code you ran on an earlier version of the packages still runs the same way on the newer version
* That your code you gave to somebody else will run the same way for them

We'll create a small test case using `unittest`. For this package, our tests live inside a `class`. We didn't review any syntax for creating classes, so I advise following one of the simple templates laid out in the documentation. Basically, we will define test functions within the `class` (with names starting with `test`, which is how `unittest` finds them), and we will use statements that start with `self.assert` to check values.

In [None]:
from unittest import TestCase

# We create a test class as follows
class TestCumulativeSum(TestCase):
    def test_ones(self):
        """Pass some ones into the function and see if it increments"""
        ones = [1,1,1,1,1]
        res = np.cumsum(ones)
        self.assertTrue(np.array_equal(res, [1,2,3,4,5]))
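
Defining a test class does not run it. One way to run the tests from inside a notebook or script is sketched below, using only the standard library. The function under test, `cumulative_sum`, is a made-up stand-in written for this example so that the snippet runs on its own.

```python
import unittest

def cumulative_sum(values):
    """Hypothetical helper: return the running total of a list of numbers."""
    totals = []
    running = 0
    for value in values:
        running += value
        totals.append(running)
    return totals

class TestCumulativeSumPlain(unittest.TestCase):
    def test_ones(self):
        self.assertEqual(cumulative_sum([1, 1, 1]), [1, 2, 3])

    def test_empty(self):
        self.assertEqual(cumulative_sum([]), [])

# Collect the test functions from the class, then run them and report
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCumulativeSumPlain)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("All tests passed:", result.wasSuccessful())
```

When your tests live in their own file, running `py -m unittest` from the terminal will find and run them for you.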


# Improving your Python Experience

## Ipython

The [ipython](https://ipython.readthedocs.io/en/stable/overview.html) terminal is an enhanced interactive interpreter that improves on the base python. The most immediate benefits come from previewing previous lines you’ve used. It also offers a nice drop-down menu of available options when you are typing (just press \<Tab>!)

Other features:
* Create scripts for it to run each time you start it up
* Built-in commands that make navigating on the machine easier (“magic functions”)
* Quick access to a python debugger, so you can see what is going wrong line-by-line

## Jupyter

[Jupyter](https://docs.jupyter.org/en/latest/) combines formatted text and code together on one page. You can use markdown or LaTeX to format your jupyter notebook, and create code blocks right after that you can run in the document. It is great for demonstrations and tutorials. As a matter of fact, I’m using them in the class to teach the material!

## Nose tests

We briefly discussed creating test functions for your product code. The [nosetest](https://nose.readthedocs.io/en/latest/man.html) package is a suite of tools that can improve your testing experience significantly. Nosetests provides an easier interface for testing your code, and it integrates well with many of the main development tools you’ll use. Be aware that nose's own documentation describes the project as being in maintenance mode and suggests that new projects consider alternatives such as nose2 or pytest.

# IDEs

An integrated development environment, or IDE, is a tool that provides a clean graphical interface and allows for more natural code editing and writing. The following are some options for python.

## Spyder

[Spyder](https://www.spyder-ide.org/) is marketed towards scientists, and mimics an environment similar to RStudio. Like RStudio, it allows you to run your file one line at a time. It shows all variables you have defined in your environment, too.

## Visual Studio Code

Microsoft released a free IDE that can support many languages. It offers thousands of extensions built by the community and Microsoft, and is very feature-rich for Python. This is the one I recommend for this class, as it requires no subscription and comes pre-installed on many university computers.

## PyCharm

Jetbrains offers a suite of language-specific IDEs. Their python IDE is [PyCharm](https://www.jetbrains.com/pycharm/). People who use their tools swear by them, but they do cost money. I'd check to see if you have a student license, if you're interested.

## Notepad++

[Notepad++](https://notepad-plus-plus.org/) is marketed to the software engineer. It feels lightweight to some, and even barebones to others, but it has a lot of powerful tools underneath the hood, if you are willing to take the time to learn it. I personally don't have much experience with it. 

# Static Code Analysis Linters

Python is an interpreted language, meaning it evaluates each line when it is run. This means most checks on your code happen at the moment the code is run, aside from some basic syntax checks.

To solve this, many teams have created static code checking tools, called linters. These will evaluate your code as you create it, not as you run it, and report warnings and errors that they can find without running the code directly.

There are many options, and I’ve found all of them to be chatty and annoying at first in python, but once you configure one to your liking, it will catch mistakes you make far more accurately and quickly than manually running the code each time.

Some common code analyzers:
* [pylint](https://pylint.readthedocs.io/en/stable/)
* [mypy](https://mypy.readthedocs.io/en/stable/getting_started.html)
* [black](https://black.readthedocs.io/en/stable/index.html) (a code formatter rather than a linter, but often used alongside them)
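
To give a feel for the kind of mistake these tools are after, the sketch below defines a made-up function with type hints (the `list[float]` notation assumes Python 3.9 or later) and then calls it with the wrong kind of value. Python itself only notices once the line runs; a static type checker such as mypy is designed to report this kind of mismatched call before you run anything.

```python
def average(values: list[float]) -> float:
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Correct use
print(average([1.0, 2.0, 3.0]))

# Incorrect use: a string instead of a list of numbers
try:
    average("123")
except TypeError as err:
    runtime_error = str(err)
    print("Only discovered at run time:", runtime_error)
```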

# Using virtual environments

A python [virtual environment](https://docs.python.org/3/tutorial/venv.html) is a contained mini python world that you run locally. It allows you to save python packages in a directory local to your project, access and manipulate them, and save a state for that specific work.

If you have multiple projects on the same computer, sometimes they need different versions of the same packages to run. Or, if you don't have admin privileges on the machine you're using, you might be blocked from installing packages system-wide. These are two cases where you should start up a virtual environment.

Probably most importantly, by creating a virtual environment for each project, you can save important environment variables, imports and themes that only apply to that project.

To start the virtual environment, run the following from a terminal:

```
py -m venv lesson
```

A virtual environment should now be created in your directory. Look for a folder named `lesson` and see what's inside. Look in Windows Explorer, or type the following to see the contents:

```
tree lesson
```

Inside a directory called `Scripts`, there is an `activate` file. In the terminal, run it by typing its path: `lesson\Scripts\activate`. Now, your virtual environment is activated, as indicated by the text `(lesson)` at the front of the terminal entry line.
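
If you are ever unsure whether the python you are running belongs to a virtual environment, you can ask it. The sketch below relies on the standard library's `sys` module: inside an environment created with `venv`, `sys.prefix` points at the environment folder while `sys.base_prefix` points at the original installation.

```python
import sys

# The two prefixes differ only when a virtual environment is active
in_virtualenv = sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Environment root:", sys.prefix)
print("Running inside a virtual environment:", in_virtualenv)
```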

## Cloning your virtual environment

Say another person is about to help you with your work on their machine. How can you ensure that the two of you are using all the same packages and same versions?

If you've installed all your packages into your virtual environment, this can be quite easy.

Activate your environment, like mentioned above (e.g. `lesson\Scripts\activate`). Then, run the following command:

```
pip freeze
```

This will display all packages installed, like `pip list`, but the output is structured in something called [requirements format](https://pip.pypa.io/en/stable/reference/requirements-file-format/). Copy the output to a file, usually `requirements.txt`. Give that file to your partner, and they can install all the same packages as you in their own environment with the following:

```
pip install -r requirements.txt
```