# Welcome to python!


First things first: when you run a python program, python will read and execute your code line by line starting at the top of the file. This means that, outside of syntax errors, you will find most of your bugs at 'runtime' when the program reaches an instruction it doesn't know how to handle. It will give you an error message that will tell you which line is broken and help you figure out what needs to be fixed.

This notebook will review the following:

### Data structures

    - Int, float, str
    - List
    - Set
    - Array
    - Dict

### Functions

    - Return types
    - Signature
    - Calling

### Importing

    - Libraries
    - Your code

### Libraries

    - Numpy
    - Pandas
    - Matplotlib
    - sklearn
    - seaborn

### Using the terminal

    - Common commands (don't worry about memorizing these right now)

### Conda

    - Download and install
    - New environment
    - Entering/exiting
    - View environments
    - View installed packages

### Workflow

    - Standard flow
    - Writing your first programs
    - Larger projects

### Best practices

    - Comments
    - Variables
    - Function length
    - Type hints

### Pycharm (forthcoming)

    - Creating a project
    - Creating a configuration
    - Breakpoints
    - Debugging

### Jupyter (forthcoming)

    - Opening a notebook
    - Running a cell

### Objects (actually skipped this one for now)

    - Init
    - Class variables
    - Interfaces
    
### Computer basics (forthcoming)

    - Core utilization and memory


This notebook itself is a jupyter notebook. It's a way of running python code in little snippets that is useful when you want to maintain the state of the program without rerunning it. This will make more sense later but for now it is a useful way to communicate ideas and see the syntax in its native habitat.



Now, let's get into it!

***


# Data structures


In [28]:
# Anything following a # is a comment and will not be run by the python interpreter

# Python automatically infers the type of a variable (int, str, float, ...) from the value you assign to it
a = 'a'     # str
b = 1       # integer
c = 1.0     # float, or decimal value
d = (a, b)  # tuple
e = False   # boolean
f = None    # N/A

# You can manually convert from one data type to another (if it makes sense)
g = str(b)  # Convert b to a string
h = int(c)  # Convert c to an int, this drops the fractional part (truncates toward zero) rather than rounding
i = int(a)  # This won't work

# The following is the error message:

ValueError: invalid literal for int() with base 10: 'a'
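As a minimal sketch of what the conversions above do (the variable names here are illustrative only): `int()` on a float drops the fractional part, `round()` is what rounds to the nearest integer, and a conversion that does not make sense raises a `ValueError` that you can catch.

```python
# int() truncates toward zero; round() rounds to the nearest integer
truncated = int(2.7)    # 2
rounded = round(2.7)    # 3
negative = int(-2.7)    # -2, not -3

# A conversion that does not make sense raises ValueError
try:
    number = int('a')
except ValueError as err:
    number = None
    print("Could not convert:", err)

# type() shows which type python inferred for a value
print(type(truncated), type(2.7), type('a'))
```

Catching the error is optional while you are learning; letting the program stop with the error message is often the quickest way to find the broken line.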

### Lists

Python sequences such as lists, tuples, and strings are zero-indexed, meaning the first element is at index 0.

In [18]:
# Create an empty list
lst = []

# Add an element to the end. A list can hold multiple data types
lst.append('a')
lst.append(1)
lst.append(True)
print(lst)

['a', 1, True]


In [17]:
# Add an element to the 0'th index, or the front
lst.insert(0, 'b')

# Access an element by its index
lst[1]

# Get the index of some element
lst.index('b')

# Remove an element by its index. pop() returns the removed element and, like append() and insert(),
# mutates the original list (indexing and index() only read from it)
print(lst)
print(lst.pop(0))
print(lst)


['b', 'a', 1, True]
b
['a', 1, True]
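To make the difference between reading a list and changing it concrete, here is a small sketch. The list `letters` is made up for illustration.

```python
letters = ['a', 'b', 'c']

# Reading does not change the list
first = letters[0]               # 'a'
position = letters.index('c')    # 2

# append, insert and pop all change the list in place
letters.append('d')              # ['a', 'b', 'c', 'd']
letters.insert(0, 'z')           # ['z', 'a', 'b', 'c', 'd']
removed = letters.pop(0)         # removes and returns 'z'

# Slicing returns a new list and leaves the original alone
first_two = letters[:2]          # ['a', 'b']
print(letters, first_two, removed)
```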


### Arrays

Arrays are an efficient way to store data of a uniform shape and type. Think of them like matrices of arbitrary dimension. Python's built-in lists are not arrays; the most common array library is called numpy, which provides fast element-wise and matrix operations on arrays. You'll use it everywhere.

In [15]:
import numpy as np

# 1D array from a list
x = np.array([1, 2, 3, 4, 5])
print(x)

[1 2 3 4 5]


In [14]:
# 2D array from a list
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(y)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [13]:
# Get the shape of an array
y.shape

# Get the sum of the array
y.sum()

# Access array elements with []

# Get the third element of x
x[2]

# Get the center element of y
y[1, 1]

# Use the : operator to slice arrays

# Get only the second column of y.
# This is saying, 'get all rows and the column at index 1'
print(y[:, 1])


[2 5 8]


In [12]:
# Notice that accessing array elements does not modify the array
print(y)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [11]:
# Get last element of the second row of y
y[1, -1]

# Get the third element onward in x
x[2:]

# Get the elements from the third from the end to the end
x[-3:]

# Multiplication

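# If your arrays have compatible shapes, you can multiply them element by element.
# Here the 3-element array a is applied to each 3-element row of b (numpy calls this 'broadcasting')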

a = np.array([0, 1, 2])
b = np.array([[3, 4, 5], [6, 7, 8]])
c = np.multiply(a, b)
print(c)
print(c.shape)

[[ 0  4 10]
 [ 0  7 16]]
(2, 3)
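`np.multiply` above is element-wise multiplication, which is not the same thing as a matrix product. The sketch below reuses the same two arrays to contrast the two; the names `elementwise` and `matrix_product` are just for illustration.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([[3, 4, 5], [6, 7, 8]])

# Element-wise: the * operator does the same thing as np.multiply
elementwise = a * b
print(elementwise)      # same values as the output above, shape (2, 3)

# Matrix product: the @ operator multiplies the (2, 3) matrix by the length-3 vector
matrix_product = b @ a
print(matrix_product)   # one value per row of b, shape (2,)
```

Which one you want depends on the math you are implementing, so it is worth checking `.shape` on the result when in doubt.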


### Dictionaries

Dictionaries hold data as key: value pairs. The values can be any data type; the keys can be any hashable (immutable) type, such as strings, numbers, or tuples.

In [19]:
# Define a dictionary
d = {'a': 1, 'b': 2}

# Access the values by querying the dict using the corresponding key
d['a']

# Add dictionary elements by just assigning them
d['c'] = 3
print(d)

{'a': 1, 'b': 2, 'c': 3}
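A few more everyday dictionary operations, as a sketch with made-up data (`ages` and `grid` are hypothetical names): checking for a key, reading with a default, and what happens when a key is not hashable.

```python
ages = {'alice': 30, 'bob': 25}   # hypothetical example data

# Check whether a key exists before using it
has_carol = 'carol' in ages       # False

# .get returns a default instead of raising KeyError for a missing key
carol_age = ages.get('carol', 0)  # 0

# Keys must be hashable: a tuple works, a list does not
grid = {(0, 0): 'origin'}
try:
    grid[[1, 1]] = 'bad key'
except TypeError as err:
    print("Unhashable key:", err)

# .items() yields (key, value) pairs
for name, age in ages.items():
    print(name, age)
```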


You can iterate over any of these iterables with a `for` loop, using the following syntax:

In [20]:
lst = [1, 2, 3, 4, 5]

# Iterate over each element in the list
for elem in lst:
    print(elem)

# ^ Note: elem is just a variable name and can be anything, like i, j, etc.

# Enumerate lets you iterate while keeping track of the current index
for i, elem in enumerate(lst):
    print("Index {}, Element = {}".format(i, elem))  # This is how you can insert variable values into a string

# You can also iterate over a list of tuples
lst_tuples = [(1, 2), (3, 4), (5, 6)]
for a, b in lst_tuples:
    print(a, b)



1
2
3
4
5
Index 0, Element = 1
Index 1, Element = 2
Index 2, Element = 3
Index 3, Element = 4
Index 4, Element = 5
1 2
3 4
5 6


In [21]:
# To generate a sequence of numbers on the fly, use the range function (stop is exclusive, so 10 is not printed)
start = 0
stop = 10
step = 2
for i in range(start, stop, step):
    print(i)

0
2
4
6
8
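`range` produces its numbers lazily rather than building a list. If you do want an actual list, a sketch of two common ways to get one (the variable names are illustrative):

```python
# Wrap range in list() to materialize the numbers
evens = list(range(0, 10, 2))
print(evens)        # [0, 2, 4, 6, 8]

# A list comprehension builds a list from any loop in one line
squares = [n ** 2 for n in range(5)]
print(squares)      # [0, 1, 4, 9, 16]

# A negative step counts down
countdown = list(range(5, 0, -1))
print(countdown)    # [5, 4, 3, 2, 1]
```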


***

# Functions

To organize your code, you'll want to collect related ideas into meaningful functions

In [25]:
# Let's define a function that returns the sum of the squares of two numbers

def sum_squares(a, b):
    
    sq = a**2 + b**2
    
    return sq

# Now, we can use this to get the value whenever we want
# Below, we say that we 'call' the sum_squares function with 'arguments' 2 and 3
print(sum_squares(2, 3))

# Note: python reads your code line by line from top to bottom, so you'll 
# need to define your functions above where you call them


13


In [26]:
# We can also call functions inside of other functions
# Maybe we want a new function that takes the square root of the sum of squares

def sqrt_sum_squares(a, b):
    
    # First, get the sum of squares by calling our own function with this function's arguments
    x = sum_squares(a, b)
    
    # Then take the square root
    sqrt = np.sqrt(x)
    
    return sqrt


# Now, we call the sqrt_sum_squares function
print(sqrt_sum_squares(2, 3))

3.605551275463989
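The outline mentions signatures, return types and type hints. Here is a minimal sketch of what those look like, using a hypothetical function `hypotenuse` that is not part of the notebook's own code. Type hints are optional and python does not enforce them at runtime; they mainly document what the function expects and returns.

```python
import math

def hypotenuse(a: float, b: float = 0.0) -> float:
    """Return the square root of a**2 + b**2."""
    return math.sqrt(a ** 2 + b ** 2)

# Positional arguments are matched in order
print(hypotenuse(3, 4))        # 5.0

# Keyword arguments are matched by name, so the order does not matter
print(hypotenuse(b=4, a=3))    # 5.0

# b has a default value, so it can be left out
print(hypotenuse(3))           # 3.0
```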


***

# Objects

Objects are an important part of writing larger python software systems, but for now you will do fine to just write functions, so in the interest of time I'm going to leave this for another lesson.

***

# Libraries

One of the primary benefits of python is the enormous ecosystem of code and libraries at your disposal. Some of the most important ones are:

[Numpy](https://numpy.org): Super performant, nuts and bolts computational tools for executing efficient mathematical operations and operating on raw data (arrays, mostly).

[SciPy](https://scipy.org): Higher level than numpy, SciPy provides algorithms for larger scientific concepts like optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many more.

[Pandas](https://pandas.pydata.org): Fast and memory-efficient way of ingesting and working with large tabular datasets such as csv files. The API (meaning Pandas' built-in functions that you can use) is extremely intuitive and the pandas backend handles all the data transformations for you. Use this library when working with data.

[Matplotlib](https://matplotlib.org): Workhorse of plotting. Easily make 1D, 2D, 3D, and 4D (video) plots. Most figures you want to make, you can make with matplotlib. For fancier plotting, you can also try [Seaborn](https://seaborn.pydata.org).

[scikit-learn](https://scikit-learn.org/stable/): Python's original learning-models-from-data library. Has intuitive functions for classifying data, clustering, dimensionality reduction, PCA, linear regression and so much more. Start here if you want to fit a model to data.


## Importing

To make these available to your python script, you need to import them. For the preceding packages, you would import them (after you've installed them) like this at the top of your script:

In [1]:
import numpy as np
import scipy as scp
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

You then access their functionality in your code by calling them using their assigned variable names. For example, get the standard deviation of the values in a list with `np.std(your_list)` or load a csv file into a pandas dataframe using `pd.read_csv(your_file_path)`.
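A small sketch of those two calls in action. To keep it self-contained, the csv data is a made-up string read through `io.StringIO` instead of a file on disk; with a real file you would pass its path to `pd.read_csv`.

```python
import io

import numpy as np
import pandas as pd

# Standard deviation of the values in a list
values = [2, 4, 4, 4, 5, 5, 7, 9]
spread = np.std(values)
print(spread)    # 2.0

# Load csv data into a pandas dataframe (hypothetical two-row table)
csv_text = "name,score\nann,1.5\nben,2.5\n"
frame = pd.read_csv(io.StringIO(csv_text))
print(frame)
print(frame['score'].mean())    # 2.0
```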

You can also import your own code. If you have two files in the same directory, say `main.py` and `utils.py` you can put your boring or complex utility functions in the file `utils.py` and then import them into `main.py` using the same syntax: `import utils`. Then you can call those functions with something like `utils.my_boring_function(arg_1, arg_2)`. This helps to keep your main script cleaner and easier to read.
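The sketch below demonstrates importing your own code in a single runnable snippet. It writes a tiny module to a temporary folder and imports it. The module name `demo_utils` and the temporary folder are assumptions made only so the example runs anywhere; in a real project you would simply have `utils.py` next to `main.py` and write `import utils`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny, hypothetical module into a scratch folder
workdir = Path(tempfile.mkdtemp())
(workdir / "demo_utils.py").write_text(
    "def my_boring_function(arg_1, arg_2):\n"
    "    return arg_1 + arg_2\n"
)

# Python searches the folders listed in sys.path when importing.
# The folder of the script you run is on that list automatically,
# which is why 'import utils' works for a file next to main.py.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import demo_utils

result = demo_utils.my_boring_function(1, 2)
print(result)    # 3
```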

## Installation

You can install these libraries easily from the command line in iTerm. I'll show you how to do this the right way in the following section using another tool called `conda`.

But first...

***

# Using the terminal

Before we install conda, a quick review of the basics of using a terminal window (also called a shell). The terminal is just another way of interacting with the computer and its file systems. You can browse files just like you do in windows, but you can also do things like download packages and pages from the internet, push your code to remote repositories (like github), communicate with other computers using a tool called `ssh` and run programs you've written. The terminal looks like the scary matrix but it's going to become your best friend very quickly.

When you open the terminal you're presented with an empty line. This is the command line, through which you will interact with the computer.

Here are some common commands:

`$: pwd`      'print working directory', shows you where you are

`$: ls -lrt`  prints the files in the current directory along with some additional info

`$: cd folder_name` changes your directory into the directory folder_name

`$: cd ..` changes your directory up a level into the parent directory

`$: history` shows you a list of the commands you've recently entered

`$: cat file.txt` prints the contents of a file file.txt line by line

`$: wget target_url` pulls the file located at target_url down to the current directory on your machine

`$: mv file.txt /target/folder` move some file file.txt to another folder /target/folder

`$: mv file.txt new_name.txt` rename an existing file

`$: cp file.txt file_copy.txt` create a copy of a file with another name

`$: rm file.txt` delete a file file.txt (note, this is irreversible)

`$: mkdir new_folder` make a new directory in the current directory

`$: python script.py` run a python file

Don't worry too much about memorizing all these right now. As you get started, it is just good to know what kinds of things you can do in the terminal, so use this list as a quick reference as you practice and familiarize yourself with the terminal. Exciting times!
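Most of these file operations can also be done from inside a python script, which is handy once your programs start reading and writing files. The sketch below mirrors several of the commands above using the standard library modules `pathlib` and `shutil`. It works inside a temporary scratch folder (an assumption of this example) so that nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path

print(Path.cwd())                          # pwd

sandbox = Path(tempfile.mkdtemp())         # scratch folder for this example

new_folder = sandbox / "new_folder"
new_folder.mkdir()                         # mkdir new_folder

original = sandbox / "file.txt"
original.write_text("hello\n")
contents = original.read_text()            # cat file.txt
print(contents)

copy = sandbox / "file_copy.txt"
shutil.copy(original, copy)                # cp file.txt file_copy.txt

moved = new_folder / "file.txt"
shutil.move(str(original), str(moved))     # mv file.txt new_folder

copy.unlink()                              # rm file_copy.txt (irreversible here too)

names = sorted(p.name for p in sandbox.iterdir())   # ls
print(names)
```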

Ok, now on to Conda.

***

# Conda

When writing and executing python scripts, it is crucial that your script has access to the right libraries _and_ the right versions of those libraries. For this to happen, these libraries must be installed and accessible to the program. At first, it will be enough to just install them in the 'global' environment, i.e. just install them on your computer via the command line so that every python script you ever write has access to them. 

While this is convenient at first it QUICKLY becomes a problem as you write more programs that have different dependencies and require different versions of this or that library.

The solution is a tool called a 'virtual environment'. Virtual environments are much less scary than they sound. They're just like little rooms that you'll fill with tools for a particular task. When you install a software library in a virtual environment, it is like putting a tool in that room. Any python script that you run in some environment has access to only those libraries (tools) you've already installed in that environment (room). Because these are virtual spaces, you can make as many of them as you like and the tools from one will never interfere with the tools from another. Using virtual environments is a clean and easy way to keep your space organized and efficient.

I recommend you use a virtual environment manager called `conda`.

### Installation (one time)

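To install conda, run the following commands in iTerm.

First, cd (change directory) into your home directory and download the miniconda installer. macOS does not ship `wget` by default; if the command is not found, `curl -O` followed by the same URL downloads the file as well.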

`$: cd ~/`

`$: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh`

Run the installer using bash and say `y` for yes to agree to any prompts it gives you

`$: bash Miniconda3-latest-MacOSX-x86_64.sh`

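Once complete, reload your shell configuration using the following command (don't worry too much about what this means, it is just reinitializing the state of your terminal so that it recognizes the newly installed conda package). If that file does not exist on your machine (for example because your shell is zsh, the default on recent versions of macOS), closing and reopening the terminal has the same effect.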

`$: source .bashrc`

Now, conda is installed and ready to use, and you can make sure using the following command, which will show the location of the conda executable:

`$: which conda`

### Creating a new conda environment

To create a new environment,

`$: conda create -n env_name python=3`

This will create a new environment called `env_name` and pre-install python3. It will ask you if you want to proceed with `Proceed ([y]/n)?` to which you type `y` and hit enter.

Once complete, you can enter the new environment with the following. You can see that you are in the new environment because `(env_name)` appears in front of your usual terminal prompt.

`$: conda activate env_name`

`(env_name) $: `


Now, any library you install through the command line will be available _only_ in this environment `env_name`. And when you run a python script, that script will only have access to the libraries installed in this virtual environment. 

`(env_name) $: python script.py`

Now, while you're in the environment, you can install all the libraries I mentioned above with a single command, using either the `pip` installer (most common for python) or `conda`:

`(env_name) $: pip install -U numpy scipy pandas matplotlib scikit-learn`

From here, you can run your python scripts as you like. 

When you're ready to leave the environment, use the following command. Similarly, you'll know that you've exited the environment when `(env_name)` no longer appears in front of your usual terminal prompt:

`(env_name) $: conda deactivate`

`$: `

### Viewing conda environments

To view all your conda environments use

`$: conda env list`


To view all the installed libraries within a conda environment, activate the environment then list the libraries:

`$: conda activate env_name`

`(env_name) $: conda list`
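You can also check from inside python which environment a script is actually running in, which helps when an import fails unexpectedly. This is a sketch using only the standard library. The helper name `installed_version` is made up for this example, and the output will differ from machine to machine.

```python
import sys
from importlib import metadata

# The python executable running this script; inside a conda
# environment this path normally points into that environment
print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Report what is available in the current environment
for name in ["numpy", "pandas", "a-package-that-does-not-exist"]:
    print(name, installed_version(name))
```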



***

# Workflow

Now, you have more than everything you need to start writing your first python script. To contextualize it a bit more, here is how my usual programming flow would proceed:

1. Have an idea
2. Find data
3. Create a conda environment
4. Enter that environment
5. Create a python script
6. Start coding, debugging, and installing packages in the virtual environment as needed
7. Get tired enough to start making mistakes
8. Try to push through it
9. Diminishing returns
10. Save everything
11. Exit the conda environment
12. Background process the code

Ok, joking aside, steps 1-6 and 10-11 are the usual flow. 

For now, you can write your python scripts in any text editor you like and run them through the command line using `python my_script.py`. To debug, you can use print statements like I have in this jupyter notebook. When running the script from the terminal, print statements in the python script will also be printed to the terminal.

^ This is a rather simplistic and clunky way to get started, but I don't want to burden your mind with managing a new programming language _and_ a new IDE (integrated development environment). However, in the coming weeks, we can meet again and review how to use an IDE called PyCharm that will make you 5X more productive overnight. Get excited.
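As a sketch of the print-statement debugging described above: the function `normalize` and its input are made up for illustration. The idea is to print intermediate values so you can see where they stop matching what you expect, then delete the prints once the bug is found.

```python
def normalize(values):
    lowest = min(values)
    highest = max(values)
    # Temporary debugging output: check the inputs to the calculation
    print("DEBUG lowest={}, highest={}".format(lowest, highest))

    scaled = [(v - lowest) / (highest - lowest) for v in values]
    # Temporary debugging output: check the result before returning it
    print("DEBUG scaled={}".format(scaled))
    return scaled

result = normalize([2, 4, 6])
print(result)    # [0.0, 0.5, 1.0]
```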

***

# Best Practices

### Comments

It is crucial to comment your code well even if no one else is going to read it. Your future self will not remember why some function is so abstruse and jump to the conclusion its author must be oh-so-very obtuse! So be kind to your future and past (current) self and document things. 

Here is an example from a function I wrote a while ago. Some of these lines would be hard to read without the higher level comments.

In [None]:
import pandas as pd

# Pre-process the financial data
def gen_data(frame):
    
    # Replace each SID with a node index (0, 1, 2, ... in order of first appearance)
    frame = frame.assign(sid=pd.factorize(frame['sid'])[0])

    # Only keep non-NaN ints/floats - not optimal, need to handle these
    frame = frame.select_dtypes(exclude=['object', 'string']).dropna(axis='columns')

    # Fill any remaining NaN values - may not need this
    frame = frame.fillna(0)

    # Normalize float columns across the time dimension (min-max scaling within each sid)
    cols = frame.columns.values
    dtypes = list(frame.dtypes)
    cols = [name for name, dtype in zip(cols, dtypes) if dtype in ['float32', 'float64']]
    normed = frame.groupby('sid')[cols].transform(lambda x: (x - x.min()) / (x.max() - x.min()))
    
    return normed

### Variables

When naming your variables, make sure they are descriptive so you can tell what they are later. The norm for python is to use underscores like

`num_interesting_genes`

`rat_health_status`

`rat_emotional_health_status`

etc.
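A quick before-and-after sketch of why this matters. The data and names are invented for illustration; both halves compute the same thing.

```python
# Hard to follow: what are x, t and r?
x = [0.2, 0.9, 0.7]
t = 0.5
r = [v for v in x if v > t]

# Same logic with descriptive, underscore-separated names
gene_expression_levels = [0.2, 0.9, 0.7]
expression_threshold = 0.5
interesting_gene_levels = [
    level for level in gene_expression_levels if level > expression_threshold
]
num_interesting_genes = len(interesting_gene_levels)
print(num_interesting_genes)    # 2
```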

### Function length

Functions should not be too long or they risk becoming unreadable. As a rough guide, most functions are more than 1 line, 20 lines is getting long, and 100 lines is too long. As you go, try to break up the code into discrete pieces with understandable purposes, then put those pieces into their own functions. It makes your code much more readable and easier to understand.
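Here is a sketch of what breaking code into small pieces can look like. The task (finding names with an above-average score) and all function names are hypothetical. Each function does one thing and can be read, tested and reused on its own.

```python
def parse_scores(lines):
    """Turn lines like 'ann,3.5' into (name, score) tuples."""
    records = []
    for line in lines:
        name, score = line.strip().split(',')
        records.append((name, float(score)))
    return records

def mean_score(records):
    """Average of the scores in a list of (name, score) tuples."""
    return sum(score for _, score in records) / len(records)

def above_mean(records):
    """Names whose score is strictly above the average."""
    threshold = mean_score(records)
    return [name for name, score in records if score > threshold]

raw_lines = ["ann,3.5\n", "ben,1.5\n", "cy,4.0\n"]
top = above_mean(parse_scores(raw_lines))
print(top)    # ['ann', 'cy']
```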

***

# Next

* PyCharm
* Jupyter
* Creating python objects