# Python Cheat Sheet 
#### This notebook contains an overview of commonly used Python commands and packages. This outline was adapted from Dr. Gail Christenson's (UT Austin) class: Python in Geosciences.

#### Topics Covered: 
* Python Basics 
* Lists
* if/elif/else statements 
* Loops 
* Functions
* Matplotlib
* Numpy
* Pandas
* Machine Learning

## Python Basics

##### Simplest Data Types: Integer, Float, String, Boolean 
* Integer = whole numbers 
* Float = numbers with decimal points or exponents 
* String = sequence of text characters enclosed in single or double quotes 
* Boolean = True or False 

##### Default type
* Python infers a variable's type from the value you assign to it
* You can convert a value to the type you want with int(), str(), or float()
* Check a variable's type with %whos or type()
* Converting a boolean value into a float or integer will result in zero (False) or one (True)
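
As a small sketch of the points above (the variable names are made up for illustration), type() reports the type Python inferred, and int(), float(), and str() convert between types:

```python
# Python infers each type from the assigned value
count = 3
ratio = 0.5
name = 'basalt'
flag = True
type_names = [type(v).__name__ for v in (count, ratio, name, flag)]
print(type_names)

# Explicit conversions
as_float = float('2.5')      # string -> float
as_int = int(2.9)            # float -> int (the decimal part is dropped)
as_text = str(42)            # int -> string
true_as_int = int(True)      # booleans convert to 1 ...
false_as_float = float(False)  # ... and 0.0
```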

##### Addition, subtraction, multiplication
* Between two integers = integer
* Between two floats = float
* Between integer and float = float
* Can have multiple operators on same line
* Multiplication takes precedence over addition and subtraction
* When in doubt about precedence use parentheses
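
The rules above can be checked with a few illustrative expressions (all names here are hypothetical):

```python
int_result = 2 + 3            # int and int gives int
float_result = 2.0 + 3.5      # float and float gives float
mixed_result = 2 * 3.0        # int and float gives float

# Multiplication is evaluated before addition unless parentheses say otherwise
no_parens = 2 + 3 * 4         # 14
with_parens = (2 + 3) * 4     # 20
```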

##### Tab completion
* Jupyter notebooks have 'Tab completion'
* After you type some characters and then hit 'Tab', a menu will appear with all things you might type that start with those characters
* Type '%wh' and then the Tab key in the cell below

|Command|Function|
|:- | :-|
| %who  | see active variables|
| %whos | see active variables + variable type| 
| type()| see variable type| 
|del a,b,c| delete variables| 
| %reset| delete all variables|
|value = input('Enter a value') | ask user for an input (returns string)|
|value = int(input('Enter an integer')) | turns input string into an integer|
|value = float(input('Enter a number')) | turns input string into a float|
|a=b=c=2 or x,y=3,4 | Multiple Variable Assignment|
| / | floating-point (decimal) division|
|// | floor division (rounds down to the nearest whole number, e.g. 7//2 is 3 and -7//2 is -4)|
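
A short sketch of a few commands from the table. The string `entered` is a stand-in for what input() would return, so the example runs without a prompt:

```python
# Multiple assignment
a = b = c = 2
x, y = 3, 4

# The two division operators
true_div = 7 / 2          # 3.5
floor_div = 7 // 2        # 3
neg_floor_div = -7 // 2   # -4 (rounds down, not toward zero)

# input() always returns a string, so convert before doing arithmetic
entered = '42'            # hypothetical user input
value = int(entered)
doubled = value * 2
```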


## Lists
* Python indices start at zero 
* Lists are mutable 
* Can create nested lists
* Can combine lists with + 


|Command| Function|
|:- | :-|
|list1=[]| create a list|
|list2=list() or list3 =list('cat') | create an empty list or turn other data types into lists|
|list1[0] | access the first element of the list| 
|list1[-1] | access the last element of the list|
|list1[0][0]| first value of first element in nested list|
|list1[index1:index2]| slicing a list *note index2 is exclusive (the slice will not contain index2)|
|list1[index1:index2:step]| slicing a list with a step interval|
|list1.remove(7)|remove an item from a list e.g. 7|
|list1.append(7)|append an item to a list e.g.7|
|del list1[0] | delete item by index |
|list1.count(7)| gives you number of times 7 is in the list|
|len(list1) | number of items in the list|
|sorted(list1)| return a sorted copy of the list|
|list1.index(7)| returns index of the first 7 in the list|
|list1.copy()|make a copy of the list|
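
The list commands in the table can be tried on a small made-up list. The comments show the state after each step:

```python
list1 = [5, 7, 2, 7, 9]

first, last = list1[0], list1[-1]   # 5 and 9
middle = list1[1:3]                 # [7, 2], index 3 is excluded
every_other = list1[0:5:2]          # [5, 2, 9]

n_sevens = list1.count(7)           # 2
where_seven = list1.index(7)        # 1, the first match
ordered = sorted(list1)             # [2, 5, 7, 7, 9], list1 is unchanged
backup = list1.copy()               # independent copy

list1.append(4)                     # [5, 7, 2, 7, 9, 4]
list1.remove(7)                     # removes the first 7: [5, 2, 7, 9, 4]
del list1[0]                        # [2, 7, 9, 4]

nested = [[1, 2], [3, 4]]
corner = nested[0][0]               # 1
combined = list1 + [0]              # [2, 7, 9, 4, 0]
```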





## if, elif, else Statements

##### Comparison Operators: Return a Value of True or False
*  == equality
* != inequality
* < less than
* \> greater than
* <= less than or equal to
* \>= greater than or equal to
* in tests membership, e.g. print(5 in a) prints True if 5 is an element of a 

##### Logical (Boolean) Operators 
* and - True if both operands are True
* or - True if either operand is True
* not - return the opposite

##### Example

    vowels = ['a','e','i','o','u','A','E','I','O','U']
    letter = 'M'

    if letter in vowels:
        print(letter,'is a vowel')
    else:
        print(letter,'is not a vowel')
        
##### Example with nested if/else statements

    furry = True
    small = False
    if furry:
        if small:
            print('Is it a cat?')
        else:
            print('Is it a bear?')
    else:
        if small:
            print('Is it a skink?')
        else:
            print('Is it a human?')
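
The nested example above uses only if and else. When the conditions are mutually exclusive, a chained if/elif/else is usually easier to read. A minimal sketch follows; the function name `describe_sign` is made up for illustration:

```python
def describe_sign(value):
    """Return a label for the sign of a number."""
    if value < 0:
        return 'negative'
    elif value == 0:
        return 'zero'
    else:
        return 'positive'

labels = [describe_sign(v) for v in (-2.5, 0, 7)]

# Logical operators combine comparisons
x = 7
in_range = x > 0 and x < 10          # both comparisons are True
outside = not in_range or x == 100   # both operands are False
```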

## Loops

|Command|Function|
|:- | :-|
|for i in list: | use 'for' to iterate through a Python iterable object such as a list|
|range(5,1,-1) | use with 'for' to iterate through 5,4,3,2 (the stop value 1 is excluded)|
|zip() | iterate over multiple lists in parallel|
|while | repeat a loop for as long as a condition is True|
|break | break out of a while or for loop|
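
A brief sketch of each loop construct in the table, using made-up values:

```python
# for with range: counts down from 5, stopping before 1
countdown = []
for i in range(5, 1, -1):
    countdown.append(i)

# zip walks through two lists in step
pairs = []
for letter, number in zip(['a', 'b', 'c'], [1, 2, 3]):
    pairs.append(letter + str(number))

# while with break: add 1, 2, 3, ... until the running total exceeds 10
total = 0
n = 0
while True:
    n += 1
    total += n
    if total > 10:
        break
```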


## Functions

##### Functions let you reuse a series of steps, possibly with different input values.
* Functions are named
* Functions can take any number and type of input parameters
* Functions can return any number and type of output results
* We define a function with def
* Use return statement to provide value back to the user
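
To illustrate the points above, here is a sketch of a function that takes two inputs (one with a default value) and returns three results. The name `summarize` and its parameters are hypothetical:

```python
def summarize(values, scale=1.0):
    """Return the minimum, maximum, and mean of the scaled values."""
    scaled = [v * scale for v in values]
    return min(scaled), max(scaled), sum(scaled) / len(scaled)

# Multiple return values can be unpacked into separate variables
low, high, mean = summarize([2, 4, 9])
low2, high2, mean2 = summarize([2, 4, 9], scale=2)
```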

##### Factorial Example

    def factorial1(n):
        # Multiply the integers from n down to 2; returns 1 when n is 0 or 1
        fact = 1
        mycount = n
        while mycount > 1:
            fact = fact*mycount
            mycount -= 1
        return fact

## Matplotlib

    %matplotlib inline
    import matplotlib.pyplot as plt

|Command|Function|
|:- | :-|
|plt.plot(x,y)| plot x vs y as line|
|plt.rcParams['font.size']=14| set default font size in notebook|
|plt.plot(x,y,'b')|change color of line b=blue, r=red,y=yellow,g=green,k=black,m=magenta,c=cyan|
|plt.plot(x,y,linewidth=2.0)|change linewidth|
|plt.plot(x,y,linestyle='dotted')|change linestyle, can also use short commands i.e. ':' for dotted, '--' dashed, '-.' dot-dash|
|plt.plot(x,y,'s')| 's' for square points, 'o' for circle|
|plt.plot(x,y,marker='x')| add markers to line, can also do '*'|
|plt.plot(x,y1,'s',markerfacecolor='green',markeredgecolor='blue', markersize=10.0,markeredgewidth=1.5)| example of specifications|
|plt.plot(x,y1,label='line1')| add label for legend|
|plt.legend(loc='best')| add legend|
|plt.xlim([0, 300]) and plt.ylim([-1, 2])| limit axes|
|plt.xlabel('Degrees') and plt.ylabel('Value')| label axes|
|plt.xticks([0,90,180,270,360]) and plt.yticks([-1,0,1,2])| specify tick increments|
|plt.tick_params(axis='both',length=10.0)| specify size of axes ticks|
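|plt.gca().invert_xaxis()| invert x axis, use invert_yaxis() to invert y|
|plt.loglog()| log plot|
|plt.semilogx()| semilog plot with a log-scaled x axis, use plt.semilogy() for the y axis|
|plt.style.available| shows available plot styles|
|plt.style.use('seaborn-bright')| specify plot style (in Matplotlib 3.6 and later the seaborn styles carry a 'seaborn-v0_8-' prefix, e.g. 'seaborn-v0_8-bright'; check plt.style.available)|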
|fig=plt.figure(figsize=(10,10))| specify figure size before plotting|
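
Several of the commands above can be combined into one figure. This is a sketch with made-up data; the 'Agg' backend line is only an assumption so the code runs outside a notebook, and it can be dropped when using %matplotlib inline:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Sample data: sine and cosine over one full turn
x = np.linspace(0, 360, 73)
y1 = np.sin(np.radians(x))
y2 = np.cos(np.radians(x))

fig = plt.figure(figsize=(8, 4))
plt.plot(x, y1, 'b', linewidth=2.0, label='sin')
plt.plot(x, y2, 'r', linestyle='--', label='cos')
plt.xlim([0, 360])
plt.ylim([-1.5, 1.5])
plt.xticks([0, 90, 180, 270, 360])
plt.xlabel('Degrees')
plt.ylabel('Value')
plt.legend(loc='best')

ax = plt.gca()  # handle to the current axes, handy for further tweaks
```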


## Numpy 

    import numpy as np

|Command|Function|
|:- | :-|
|arr1=np.array([2,3,4])|create array from list|
|arr1.dtype| get data type numpy used to store the array|
|np.zeros(5)| create array of zeros|
|np.ones(5)| create array of ones|
|np.ones(5,dtype='int')|specify data type in array|
|np.arange(5)|create a sequence of numbers i.e. [0 1 2 3 4]|
|np.arange(5,8)| sequence from 5-7 i.e. [5,6,7]|
|np.arange(0,10,2)| sequence from 0-8 with step of 2 i.e. [0,2,4,6,8]|
|np.linspace(0,5,11)| create a sequence using start value, end value, and number of elements|
|np.zeros((5,2))| can create higher dimensional arrays by passing the shape as a tuple (rows, columns)|
|arr1.shape| get dimensions of array|
|arr1.size| get number of elements|
|arr1.ndim| get how many dimensions|
|arr1.reshape(rows,columns)|reshape an array|
|arr1.T| transpose an array|
|arr1*arr2| not the same as matrix multiplication it is element by element|
|ind=arr==2| boolean indexing, finds where this condition is true in the array|
|arr[ind]| logical indexing, returns the true from above|
|arr2=arr1[4:7]|slices are views of the original array, any changes made to arr2 will be reflected in arr1|
|arr2=arr1[4:7].copy()| need to use .copy() if you don't want to change the original array|
|abs,sqrt,square,exp,log,log10,sign,cos,sin,tan,arccos,arcsin,arctan| can be used on one array e.g. np.abs(arr1)|
|np.ceil(), np.floor(), np.rint()| can be used on one array, ceil-round to higher integer, floor-round to lower int, rint-round to nearest integer and preserve dtype|
|np.pi, np.e| gives you values for pi and e|
|np.random.rand(n1,n2)|returns a n1xn2 array of random numbers between 0 and 1|
|np.random.randn(n1,n2)| returns a n1xn2 array of random numbers with a standard normal distribution centered at 0 with variance 1|
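
A few rows of the table are easy to get wrong, notably the shape tuple, element-wise multiplication, and views versus copies. The sketch below uses made-up arrays to show each:

```python
import numpy as np

arr1 = np.arange(10)                 # 0 through 9
grid = np.zeros((5, 2))              # shape is passed as a tuple
steps = np.linspace(0, 5, 11)        # 11 evenly spaced values, end value included
reshaped = arr1.reshape(2, 5)        # 2 rows, 5 columns

# * multiplies element by element
prod = np.array([1, 2, 3]) * np.array([4, 5, 6])

# Boolean indexing: build a mask, then use it to select
ind = arr1 % 2 == 0
evens = arr1[ind]

# A slice is a view, so writing to it changes the original
view = arr1[4:7]
view[0] = 99

# A copy is independent of the original
safe = arr1[4:7].copy()
safe[1] = -1
```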


## Pandas 

    import pandas as pd

|Command|Function|
|:- | :-|
|df = pd.read_csv('filename.csv')| construct a dataframe from a csv file|
|df.head()|Look at top 5 rows|
|df.tail()|Look at last 5 rows|
|df.shape| Get size of dataframe|
|df.info()| Get info on dataframe|
|df.describe()| Get basic statistics of columns|
|df.columns| Get column labels|
|age = df["Column Name"]| gives you a pandas Series (values plus index) for the column labeled "Column Name"|
|age2=df["Column Name"].values| gives you just the values|
|df2 = df[["Column Name 1","Column Name 2"]]| access multiple columns by putting column names in a list|
|df.iloc[0:5]| access certain parts of dataframe, here it is the first 5 rows|
|df.iloc[:,0:3]| all rows and first three columns (not counting the index)|
|df["Column Name"].unique()| find unique values|
|df.plot(x="Height",y="Weight",kind='scatter',figsize=(5,4),color='red')| Example of plotting with pandas|
|df_gold=df[df["Medal"]=='Gold']| Get a subset from dataframe using conditional operations|
|df_gold_height=df[df["Medal"]=='Gold'][["Name","Team","Year"]]| Get subset with three columns (name team year) for all gold medalists|
|df_goldsilver = df[(df["Medal"]=='Gold') \| (df["Medal"]=='Silver')][['Height','Weight']] | combine two conditions with the or operator (a vertical bar), each condition in its own parentheses|
| .mean(), .median(), .std(), .sem(), .min(), .max(), .count(), .sum()| basic statistics e.g. df['Age'].min()|
| del df['Age']| delete a column|
|df['Age minus 50']=df['Age']-50| create new column|
|df_sort = df.sort_values(by=["Age"])|sort values by column you choose|
|df_gold_sort2 = df_gold.sort_values(by=["Year"],ascending=False)| sorting in descending order|
|df.to_csv('filename.csv',index=False)|write out dataframe to csv (can choose to not have index)|
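
The selection and sorting commands above can be tried without a CSV file. The small DataFrame below is invented for illustration and only borrows the column names used in the table:

```python
import pandas as pd

# Made-up data standing in for a file read with pd.read_csv
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Medal': ['Gold', 'Silver', 'Bronze', 'Gold'],
    'Age': [24, 31, 27, 22],
    'Height': [180, 175, 168, 190],
})

# Conditional subsets; | means "or" and each condition needs parentheses
df_gold = df[df['Medal'] == 'Gold']
df_goldsilver = df[(df['Medal'] == 'Gold') | (df['Medal'] == 'Silver')][['Name', 'Height']]

# New column, basic statistics, sorting, and positional slicing
df['Age minus 50'] = df['Age'] - 50
mean_age = df['Age'].mean()
df_sort = df.sort_values(by=['Age'], ascending=False)
first_two = df.iloc[0:2]
```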






## Machine Learning

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_blobs, make_moons, load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.linear_model import Perceptron
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.manifold import TSNE
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.metrics import accuracy_score, confusion_matrix
    import seaborn as sns; sns.set()

### 1. Supervised learning 
* Data is labeled and model is trained to make correct predictions 
* Regression: used to predict real numerical values e.g. home sales prices, stock market prices 
* Classification: classify things into categories e.g. email spam filters, fraud detection, image classification 

##### K-Nearest Neighbor Classification
* Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.7, random_state=42)
* model = KNeighborsClassifier(n_neighbors=3) 
* y_model = model.fit(Xtrain, ytrain).predict(Xtest)
* accuracy_score(ytest, y_model)
* for i in range(len(ytest)):
    if ytest[i] != y_model[i]:
        plt.plot(Xtest[i,0],Xtest[i,1],'sk',markersize=10) # plot the misclassified points
* matrix = confusion_matrix(ytest, y_model) # visualize accuracy with confusion matrix
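
Put together, the steps above might look like the following sketch. The synthetic data from make_blobs and its parameters are assumptions chosen only so the example runs on its own:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic 2D data with three labeled groups
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Hold out 30% of the points for testing
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.7, random_state=42)

# Fit on the training set, predict on the test set
model = KNeighborsClassifier(n_neighbors=3)
y_model = model.fit(Xtrain, ytrain).predict(Xtest)

# Fraction of correct predictions, and counts per (true, predicted) pair
acc = accuracy_score(ytest, y_model)
matrix = confusion_matrix(ytest, y_model)

# Test points where the prediction disagrees with the label
misclassified = Xtest[ytest != y_model]
```

Each row of the confusion matrix corresponds to a true class and each column to a predicted class, so correct predictions sit on the diagonal.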

##### Random Forest Classifier 
* forest = RandomForestClassifier(n_estimators=5, random_state=2) # define parameters for model
* y_model = forest.fit(Xtrain, ytrain).predict(Xtest)
* sns.pairplot(df, hue='Item to color by', height=1.5) # examine the features
* sns.regplot(), sns.lmplot() # used to visualize linear relationship
* n_features = forest.n_features_in_ # get number of features (older scikit-learn versions used n_features_)
* plt.barh(np.arange(n_features), forest.feature_importances_, align='center') # plot feature importances
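
As a runnable sketch of the random forest steps, the example below uses scikit-learn's built-in iris dataset, which is an assumption made here and not part of the notes above. It assumes a scikit-learn version that provides `n_features_in_`:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
Xtrain, Xtest, ytrain, ytest = train_test_split(
    iris.data, iris.target, train_size=0.7, random_state=42)

# Small forest with a fixed seed for repeatable results
forest = RandomForestClassifier(n_estimators=5, random_state=2)
y_model = forest.fit(Xtrain, ytrain).predict(Xtest)
acc = accuracy_score(ytest, y_model)

# One importance value per feature; the values sum to 1
n_features = forest.n_features_in_
importances = forest.feature_importances_

# Pair each feature name with its importance, largest first
ranked = sorted(zip(iris.feature_names, importances),
                key=lambda pair: pair[1], reverse=True)
```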

##### Other Classifiers
* logreg = LogisticRegression()
* y_pred = logreg.fit(Xtrain, ytrain).predict(Xtest)

* gaussian = GaussianNB()
* y_pred = gaussian.fit(Xtrain, ytrain).predict(Xtest)

* svc = SVC()
* y_pred = svc.fit(Xtrain, ytrain).predict(Xtest)

* perceptron = Perceptron(class_weight='balanced')
* y_pred = perceptron.fit(Xtrain, ytrain).predict(Xtest)

* gbk = GradientBoostingClassifier()
* y_pred = gbk.fit(Xtrain, ytrain).predict(Xtest)

* ada = AdaBoostClassifier(n_estimators=400, learning_rate=0.1)
* y_pred = ada.fit(Xtrain, ytrain).predict(Xtest)
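
Because every classifier above follows the same fit and predict pattern, they can be compared in a single loop. The sketch below does this on synthetic make_moons data; the dataset, its parameters, and the smaller AdaBoost setting are assumptions made to keep the example quick:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Two interleaving half circles with some noise
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.7, random_state=42)

classifiers = {
    'LogisticRegression': LogisticRegression(),
    'GaussianNB': GaussianNB(),
    'SVC': SVC(),
    'Perceptron': Perceptron(class_weight='balanced'),
    'GradientBoosting': GradientBoostingClassifier(),
    'AdaBoost': AdaBoostClassifier(n_estimators=50, learning_rate=0.1),
}

# Same fit/predict/score steps for every model
scores = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(Xtrain, ytrain).predict(Xtest)
    scores[name] = accuracy_score(ytest, y_pred)

best_name = max(scores, key=scores.get)
```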


### 2. Unsupervised learning
* Data is not labeled 
* Model tries to identify patterns without external help 
* Clustering: e.g. grouping similar customers to provide purchase recommendations for an e-commerce website 
* Anomaly Detection: e.g. someone using your credit card 

##### Visualize dataset using the t-SNE manifold learning algorithm 
* tsne = TSNE() 
* data_tsne = tsne.fit_transform(data.data)
* df_data = pd.DataFrame(data_tsne, columns=['TSNE1','TSNE2'])  
* df_data["value"] = data.target
* sns.lmplot(x="TSNE1", y="TSNE2", hue='value', data=df_data, fit_reg=False)
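
A runnable sketch of the t-SNE steps follows. It assumes `data` is the digits dataset from load_digits and keeps only the first 200 samples so it finishes quickly; the subset size and random_state are choices made here:

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

data = load_digits()
n_samples = 200                       # small subset to keep the run short
X_small = data.data[:n_samples]       # each row is a flattened 8x8 image

# Project the 64-dimensional points down to 2 dimensions
tsne = TSNE(n_components=2, random_state=0)
data_tsne = tsne.fit_transform(X_small)

# Store the embedding with the digit labels for plotting
df_data = pd.DataFrame(data_tsne, columns=['TSNE1', 'TSNE2'])
df_data['value'] = data.target[:n_samples]
```

The resulting DataFrame can be passed to sns.lmplot with hue='value' to color each point by its digit.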

##### K-Means Clustering 
* data_km = KMeans(n_clusters=10, random_state=0)
* data_clusters_km = data_km.fit_predict(X_data)
* Can use the elbow method to help choose the number of clusters: look for a small value of k that still gives a low SSE (sum of squared errors)
*   sse=[]

    for i in range(1,20):
        kmeans=KMeans(n_clusters=i,init='k-means++')
        kmeans.fit(X_data)
        sse.append(kmeans.inertia_)
    plt.plot(range(1,20),sse)
    plt.title('Elbow Method')
    plt.xlabel('Number of clusters')
    plt.ylabel('SSE')
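
The elbow calculation can be sketched on synthetic data as follows. The make_blobs dataset with four centers, and the explicit n_init and random_state values, are assumptions chosen to make the run repeatable:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four groups; the labels are ignored for clustering
X_data, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Fit k-means for several cluster counts and record the SSE (inertia_)
ks = list(range(1, 10))
sse = []
for k in ks:
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    kmeans.fit(X_data)
    sse.append(kmeans.inertia_)

# How much the SSE falls when one more cluster is added
drops = [sse[i] - sse[i + 1] for i in range(len(sse) - 1)]
```

Plotting `sse` against `ks` gives the elbow curve; the bend is where adding clusters stops reducing the SSE by much.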


### 3. Reinforcement Learning
* Conceptually similar to human learning processes 
* Learns the best set of actions to take in a given environment in order to get the most reward over time, e.g. recommendations by Netflix 

### 4. Deep Learning 
* Tries to loosely emulate how the human brain works 
* Applications: natural language processing; image, audio, and video analysis; time series forecasting; etc. 
* Typically requires very large labeled datasets and is computationally expensive 


