# Installation Prerequisites

## Anaconda Installation

Anaconda / Conda

* Platform that allows you to manage python versions and packages through "environments"
* Includes pip - python's package manager that can be used to expand the capabilities of a base python environment
* Each environment can run different versions of python
* Each environment has a different set of installed packages

Install Link: https://www.anaconda.com/distribution/

* Download Anaconda Python 3.7 version
* Use default settings for installation
* If your username has a space in it, anaconda will warn that this might cause install issues

## Create Conda Environment
**Windows**: Open "Anaconda Prompt" through Start menu

**Mac**: Open "Terminal" through spotlight search

Enter the following command to create a new environment named "py37" with python version 3.7
```
conda create --name py37 python=3.7
```
Activate the environment you just created
```
conda activate py37
```
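
One way to confirm that the new environment is active is to start `python` from the same prompt and inspect the interpreter. This is a minimal sketch; the path printed depends on where Anaconda was installed, so no expected output is shown.

```python
import sys

# Version of the interpreter that is currently running
version = sys.version_info
print("Python {}.{}".format(version.major, version.minor))

# Location of the interpreter; inside an activated conda environment
# this path normally contains the environment name
print(sys.executable)
```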

The Conda Cheat Sheet is a useful reference: http://bit.ly/2xknudL

## Install Packages
Python ships with many built-in features, and additional packages can be installed to extend its capabilities

These packages can be installed using pip - python's package installer - or conda

### jupyter-notebook
* A web-based interactive development environment (IDE) that allows you to interactively develop python scripts, analyze and visualize data
* A notebook is composed of a series of cells; each cell is either a python code cell or a markdown text cell
* You can execute the contents of the active cell by pressing Shift+Enter
* Code cells are run in the order you execute them

Install jupyter-notebook and add our conda environment as a python kernel (the python environment in which we'll run our code)
```
pip install jupyter
python -m ipykernel install --user --name=py37
```

### Matplotlib
* A package for creating visualizations
```
pip install matplotlib
```

### Pandas
* Used for working with datasets
* Load data into pandas DataFrame
* Can query large datasets
```
pip install pandas
```

### NumPy
* Another useful tool for mathematical operations and analyzing data
```
pip install numpy
```

### Sklearn (scikit-learn)
* A machine learning library; the package is installed as `scikit-learn` and imported as `sklearn`
```
pip install scikit-learn
```

## Start Jupyter
In Terminal / Anaconda Prompt, run:
```
jupyter-notebook
```
This will open the jupyter console in your web browser

### Unix (mac terminal) Commands

Check current directory
```
pwd
```

List directory contents
```
ls
```

Change directory
```
cd < insert directory name without brackets >
```

Make new directory
```
mkdir < directory name >
```

# Getting Started with Python

Import packages that we'll be using

In [None]:
import numpy as np
import pandas as pd
# iPython magic to allow interactive plots
%matplotlib notebook
# Import for 3D plots
from mpl_toolkits.mplot3d import Axes3D
# Plotting library
import matplotlib.pyplot as plt
# Colormap Library
from matplotlib import cm
import sklearn

## Variables

### Integers and Floating Point Numbers

In [None]:
# Define two integers
num1 = 4
num2 = 3
total = num1 + num2
product = num1 * num2
ratio = num1 / num2

# int + float = float
float1 = .125
f1 = num1 + float1

In [None]:
f1

In [None]:
num1

In [None]:
num1 += 2

In [None]:
num1

**Printing**

In [None]:
print("sum:",total, " product:",product, "ratio:", ratio,"\n")
print("sum: {}\nproduct: {}\nratio: {:.3f}".format(total, product, ratio))
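
The same output can also be produced with an f-string, which places the expressions directly inside the string. The sketch below redefines the variables from the earlier cell so that it runs on its own:

```python
# Same values as the cell above, redefined so the block is self-contained
num1 = 4
num2 = 3
total = num1 + num2
product = num1 * num2
ratio = num1 / num2

# An f-string evaluates the expression inside each pair of braces;
# the part after the colon is the same format spec used by str.format
summary = f"sum: {total}\nproduct: {product}\nratio: {ratio:.3f}"
print(summary)
```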

### Bool

In [None]:
print("num1: ",num1, "\nnum1 > 3: ", num1 > 3)
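
Boolean values can be combined with `and`, `or`, and `not`. A short sketch, assuming `num1` is 6 as it is at this point in the notebook:

```python
num1 = 6  # value of num1 after the "num1 += 2" cell

is_big = num1 > 3            # True
is_even = num1 % 2 == 0      # True

both = is_big and is_even    # True only if both are True
either = is_big or num1 < 0  # True if at least one is True
neither = not is_big         # flips True to False

print(both, either, neither)
```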

### Strings
https://www.w3schools.com/python/python_ref_string.asp

In [None]:
first_half = "the quick brown fox"
second_half = "jumped over the lazy dog"
full_sentence = first_half + " " + second_half

In [None]:
full_sentence

In [None]:
first_half.replace("fox","bear")

Strings are immutable, so `replace` returns a new string and leaves `first_half` unchanged. Check the docs to see whether a function returns its result or modifies the value in place

In [None]:
first_half
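
To keep the result of a string method, assign it to a name. The sketch below uses a hypothetical variable `sentence` to show that the original string is never modified:

```python
sentence = "the quick brown fox"

# str.replace returns a new string; the original is left untouched
replaced = sentence.replace("fox", "bear")
print(sentence)   # still ends in "fox"
print(replaced)   # ends in "bear"

# Strings cannot be changed in place, so item assignment raises TypeError
try:
    sentence[0] = "T"
except TypeError as err:
    message = str(err)
    print("Error:", message)
```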

In [None]:
first_half[0]

In [None]:
first_half[4:9]

In [None]:
second_half[ second_half.find("the") + 4 : second_half.find(" dog") ]

### Lists, Tuples
https://www.w3schools.com/python/python_ref_list.asp

In [None]:
list_1 = [1,2,3,4,5]
tup_1 = (6,7,8)
list_2 = list(tup_1)

In [None]:
list_1 + list_2

This is an example of an in-place function

In [None]:
list_1.reverse()

In [None]:
list_1

You can index a list and get a subsequence

In [None]:
list_1[3]

In [None]:
list_1[:2]

In [None]:
list_1[2:-1]
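
Slices include the start position and exclude the stop position, and negative positions count from the end. A small sketch with a hypothetical list `letters`:

```python
letters = ["a", "b", "c", "d", "e"]

first_two = letters[:2]       # positions 0 and 1
middle = letters[1:4]         # positions 1, 2, 3 (4 is excluded)
last = letters[-1]            # last element
all_but_last = letters[:-1]   # everything except the last element
every_other = letters[::2]    # a third value sets the step size

print(first_two, middle, last, all_but_last, every_other)
```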

Lists are mutable

In [None]:
list_1

In [None]:
list_1[2] = 55

In [None]:
list_1

Tuples are not mutable

In [None]:
# Try running this code block; if a TypeError is raised, run the except block instead
try:
    tup_1[1] = 5
except TypeError:
    print("TUPLES ARE NOT MUTABLE")

### Sets

In [None]:
fruits = set(["apple", "orange", "banana"])
vegetables = set(["broccoli", "carrot", "lettuce", "olives"])
food_I_dont_like = set(["olives", "anchovy"])

In [None]:
fruits

In [None]:
fruits.union(vegetables)

In [None]:
vegetables.difference(food_I_dont_like)
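
Sets also support intersection and fast membership tests, and they drop duplicate values. A brief sketch that reuses the sets from above, written with set literals:

```python
vegetables = {"broccoli", "carrot", "lettuce", "olives"}
food_I_dont_like = {"olives", "anchovy"}

# Intersection: items that appear in both sets
disliked_vegetables = vegetables & food_I_dont_like

# Membership test
has_carrot = "carrot" in vegetables

# Building a set from a list removes duplicates
unique_numbers = set([1, 1, 2, 3, 3])

print(disliked_vegetables, has_carrot, unique_numbers)
```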

### Dictionaries

In [None]:
me = {"first name": "Dan", "last name": "Zeiberg", "age": 24}
pedja = {"first name": "Predrag", "last name": "Radivojac"}

In [None]:
people = [me, pedja]

In [None]:
people
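
Values are read from a dictionary by key. The sketch below shows a few common access patterns on the two dictionaries defined above:

```python
me = {"first name": "Dan", "last name": "Zeiberg", "age": 24}
pedja = {"first name": "Predrag", "last name": "Radivojac"}

# Square brackets look up a key and raise KeyError if it is missing
first_name = me["first name"]

# .get returns None (or a chosen default) instead of raising
pedja_age = pedja.get("age")
pedja_age_or_default = pedja.get("age", "unknown")

# "in" tests whether a key is present
has_age = "age" in pedja

# .items() yields (key, value) pairs in insertion order
for key, value in me.items():
    print(key, "->", value)
```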

### NumPy Arrays, Matrices

In [None]:
arr_1 = np.array([6,2,6,8,1,6,22,6,8,999])
arr_2 = np.random.randint(0,25,size=10)

In [None]:
arr_1

You can do arithmetic on arrays

In [None]:
arr_2 / arr_1
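
Arithmetic on arrays is applied element by element, and a single number is applied to every element. A sketch with two small hypothetical arrays `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

elementwise_sum = a + b       # adds matching positions
elementwise_product = a * b   # multiplies matching positions
scaled = a * 2                # the scalar is applied to every element
mean_b = b.mean()             # arrays also provide summary methods

print(elementwise_sum, elementwise_product, scaled, mean_b)
```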

In [None]:
matrix_0 = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [None]:
matrix_0

**Indexing Matrices**

In [None]:
matrix_0[1,2]
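
The index is written as `[row, column]`, both counted from zero, and a colon selects a whole row or column. A sketch on the same 3x3 matrix:

```python
import numpy as np

matrix_0 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = matrix_0[1, 2]       # row 1, column 2
second_row = matrix_0[1]       # the whole of row 1
third_column = matrix_0[:, 2]  # column 2 from every row
top_left = matrix_0[:2, :2]    # 2x2 block from the top-left corner

print(element, second_row, third_column)
print(top_left)
```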

Numpy allows you to draw from statistical distributions

In [None]:
matrix_1 = np.random.normal(loc=0, scale=1, size=[5,4])

In [None]:
matrix_1
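
Random draws differ on every run unless a seed is fixed. The sketch below uses NumPy's `default_rng` generator (available in NumPy 1.17 and later); the seed value 0 is an arbitrary choice for illustration:

```python
import numpy as np

# Two generators created with the same seed produce the same draws
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=0, scale=1, size=(5, 4))

rng_again = np.random.default_rng(seed=0)
sample_again = rng_again.normal(loc=0, scale=1, size=(5, 4))

print(sample.shape)
print(np.array_equal(sample, sample_again))
```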

# Pandas DataFrames

Load DataFrame from file

boston housing dataset: https://www.kaggle.com/puxama/bostoncsv

In [None]:
housing = pd.read_csv("../data/BostonHousing.csv")

In [None]:
housing

In [None]:
people_df = pd.DataFrame(people)

In [None]:
people_df

In [None]:
# Each row in the DataFrame is uniquely identified by its index value
# Let's set the index for the people DataFrame to "last name"
people_df = people_df.set_index("last name")

In [None]:
people_df

Get the `age` value stored in the first row of the housing DataFrame

You can index a DataFrame by row number

In [None]:
housing.iloc[0]["age"]

Or you can index by the index (key) and column name

Get Dan's age

In [None]:
people_df.loc["Zeiberg","age"]
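
The difference between `iloc` (position) and `loc` (label) can be seen on the small people DataFrame, rebuilt here so the sketch runs without the housing file:

```python
import pandas as pd

people_df = pd.DataFrame([
    {"first name": "Dan", "last name": "Zeiberg", "age": 24},
    {"first name": "Predrag", "last name": "Radivojac"},
]).set_index("last name")

# iloc selects by integer position: row 0 is the first row
first_row_name = people_df.iloc[0]["first name"]

# loc selects by index label and column name
dan_age = people_df.loc["Zeiberg", "age"]

# A key missing from one of the dictionaries becomes NaN in the DataFrame
pedja_age_missing = pd.isna(people_df.loc["Radivojac", "age"])

print(first_row_name, dan_age, pedja_age_missing)
```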

Extract Column of DataFrame, convert to numpy array, limit to first 10 values

In [None]:
housing["age"].values[:10]

You can query a DataFrame

In [None]:
housing[housing["age"] > 33]

In [None]:
housing[(housing["age"] > 33) & (housing["tax"] <= 350)]
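
A query like this works by building a boolean Series and using it as a mask. The sketch below uses a hypothetical DataFrame `toy` with made-up numbers (not values from the Boston housing file) to show the intermediate step:

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({"age": [20, 40, 60, 80], "tax": [300, 400, 250, 500]})

# A comparison on a column returns one True/False value per row
older = toy["age"] > 33
print(older.tolist())

# Masks are combined with & (and) or | (or); each comparison needs its own
# parentheses because & binds more tightly than > and <=
filtered = toy[older & (toy["tax"] <= 350)]
print(filtered)
```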

## Joining DataFrames

In [None]:
people_df

In [None]:
publications_df = pd.DataFrame([
    {"title": "Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting",
     "year": 2020,
     "first author": "Zeiberg"},
    {"title": "Prediction of boundaries between intrinsically ordered and disordered protein regions",
     "year": 2003,
     "first author": "Radivojac"}])

In [None]:
publications_df

In [None]:
people_and_publication = people_df.merge(right=publications_df, how="left", left_on="last name", right_on="first author")

In [None]:
people_and_publication

In [None]:
# remove the first author column
people_and_publication = people_and_publication.drop("first author",axis=1)

In [None]:
people_and_publication = people_and_publication.rename(columns={"title":"publication name"})

In [None]:
people_and_publication
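
The `how` argument controls which rows survive a merge. A minimal sketch with two hypothetical DataFrames (`left`, `right`, and the `key` column are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_value": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "right_value": [10, 20, 40]})

# how="left" keeps every row of the left DataFrame;
# rows without a match get NaN in the right-hand columns
left_join = left.merge(right, how="left", on="key")

# how="inner" keeps only the keys found in both DataFrames
inner_join = left.merge(right, how="inner", on="key")

print(left_join)
print(inner_join)
```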

# Conditionals and Loops

In [None]:
randnum = np.random.choice([1,2,3,4])
if randnum == 1:
    print("chose first value")
elif randnum == 2:
    print("chose second value")
elif randnum == 3:
    print("chose third value")
else:
    print("chose fourth value")
    

You can repeat a block of code using for loops and while loops

**For loops**

Iterate over a sequence of values; these can be loop indices or other data

In [None]:
for i in range(10):
    print(i)

In [None]:
for person in people:
    print(person["first name"])
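
When both the position and the item are needed, `enumerate` provides them together. A short sketch using the same `people` list:

```python
people = [
    {"first name": "Dan", "last name": "Zeiberg", "age": 24},
    {"first name": "Predrag", "last name": "Radivojac"},
]

labels = []
# enumerate yields (position, item) pairs, counting from 0
for position, person in enumerate(people):
    label = "{}: {}".format(position, person["first name"])
    labels.append(label)
    print(label)
```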

Loop over matrix values

In [None]:
# Loop over each row
for r in range(matrix_1.shape[0]):
    # Loop over each column
    for c in range(matrix_1.shape[1]):
        print(matrix_1[r,c])
    print()
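
Nested loops like the one above are fine for printing, but numeric work can usually be handed to NumPy without an explicit loop. The sketch below compares the two approaches on a small hard-coded matrix (an assumption for illustration, not `matrix_1`):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Row sums with explicit loops
row_totals = []
for r in range(matrix.shape[0]):
    total = 0
    for c in range(matrix.shape[1]):
        total += matrix[r, c]
    row_totals.append(int(total))

# The same result with a single NumPy call; axis=1 sums across each row
vectorized = matrix.sum(axis=1)

print(row_totals, vectorized)
```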

**While Loops**

Continue executing a block of code while the specified condition is true

In [None]:
values = [1,7,1,6,888,221,5]
idx = 0
while values[idx] < 50:
    idx += 1
print("first big number is ",values[idx])
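
The loop above assumes that at least one value reaches 50; otherwise `idx` runs past the end of the list and raises an `IndexError`. One possible guard, sketched with a hypothetical `threshold` variable:

```python
values = [1, 7, 1, 6, 888, 221, 5]
threshold = 50

idx = 0
# Stop either at the first big value or at the end of the list
while idx < len(values) and values[idx] < threshold:
    idx += 1

if idx < len(values):
    print("first big number is", values[idx])
else:
    print("no value reached the threshold")
```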

# Functions

In [None]:
def add(a,b):
    return a+b

In [None]:
add(4,5)

In [None]:
def factorial(x):
    if x > 0:
        return x * factorial(x-1)
    elif x == 0:
        return 1
    else:
        raise Exception("Input must be non-negative")

In [None]:
factorial(5)
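
The same function can be written with a loop instead of recursion, which avoids Python's recursion depth limit (1000 frames by default) for large inputs. A sketch, with `factorial_iterative` as a hypothetical name, checked against the standard library:

```python
import math

def factorial_iterative(x):
    """Return x! for a non-negative integer x using a loop."""
    if x < 0:
        raise ValueError("Input must be non-negative")
    result = 1
    # Multiply by 2, 3, ..., x; the loop body is skipped for x = 0 or 1
    for k in range(2, x + 1):
        result *= k
    return result

print(factorial_iterative(5))
print(factorial_iterative(5) == math.factorial(5))
```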

# Classes

In [None]:
class Person:
    def __init__(self, name, hair_color, eye_color):
        self.name = name
        self.hair_color = hair_color
        self.eye_color = eye_color

    def introduce(self):
        print("Hi, my name is {}".format(self.name))
    
    def converse(self, other_person):
        print("Hi {}, how are you today?".format(other_person.name))

In [None]:
dan = Person("Dan", "brown", "green")
emily = Person("Emily", "blond", "blue")
dan.introduce()
emily.converse(dan)

# Plotting

In [None]:
ph = pd.read_csv("../data/pH-example.txt")

In [None]:
ph

In [None]:
fig,ax = plt.subplots(1,1)
ax.plot(ph["time"], ph["v"])

In [None]:
fig, ax = plt.subplots(1,1)
hist = ax.hist(housing["age"],bins=30)

3D-Plot

In [None]:
from scipy.stats import multivariate_normal as mvn

In [None]:
grid = np.zeros((20,20))
for i in range(grid.shape[0]):
    for j in range(grid.shape[1]):
        grid[i,j] = mvn.pdf([i,j],[9,9],[[5,0],[0,5]])

In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
X,Y = np.meshgrid(list(range(grid.shape[0])), list(range(grid.shape[1])))
surface = ax.plot_surface(X,Y,grid, cmap=cm.coolwarm)
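
The `np.meshgrid` call builds the coordinate arrays that `plot_surface` expects: one array holding the x coordinate of every grid point and one holding the y coordinate. A small sketch with hypothetical inputs of length 3 and 2 shows the layout:

```python
import numpy as np

x = np.arange(3)  # 0, 1, 2
y = np.arange(2)  # 0, 1

X, Y = np.meshgrid(x, y)

# Both outputs have shape (len(y), len(x))
print(X.shape)
# Each row of X repeats the x values
print(X)
# Each column of Y repeats the y values
print(Y)
```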

# Example

In [None]:
ph_arr = ph.values

In [None]:
delta = ph_arr[1:,1] - ph_arr[:-1,1]
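
Subtracting a shifted slice gives the change between consecutive rows, so `delta` has one fewer entry than `ph_arr`. NumPy's `np.diff` computes the same thing; the sketch below uses made-up readings rather than the pH file:

```python
import numpy as np

# Made-up values for illustration only
readings = np.array([7.0, 7.2, 7.1, 6.8])

# Each element minus the one before it
delta_slicing = readings[1:] - readings[:-1]

# np.diff performs the same consecutive subtraction
delta_diff = np.diff(readings)

print(delta_slicing)
print(np.allclose(delta_slicing, delta_diff))
```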

In [None]:
fig, axes = plt.subplots(2,1, figsize=(6,5))
# adjust vertical spacing between subplots
plt.subplots_adjust(hspace=0.3)
axes[0].plot(ph_arr[:,0],ph_arr[:,1],'--o', linewidth=2, markersize=2, label="pH")
axes[0].set_ylabel("pH")
axes[1].plot(ph_arr[1:,0],delta, color="red", label=r"$\Delta$ pH")
axes[1].set_ylabel(r"$\Delta$ pH")
xlab = axes[1].set_xlabel("Time, h")