# Jupyter

Press `Shift` + `Tab` while the cursor is inside a function's parentheses (or right after its name) to view its docstring.

# Python

## List Comprehension

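A list comprehension builds a list by applying an expression to every item of an iterable in a single line, instead of writing a `for` loop that appends to a list. It reads like a `for` loop written backwards: the expression comes first, then the loop.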

In [1]:
s = [x*2 for x in range(4)]
print(s)

[0, 2, 4, 6]
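
For comparison, here is a sketch of the same list built with an explicit loop. It also shows a comprehension with an `if` clause that keeps only some items. The variable names are illustrative.

```python
# Explicit loop: start with an empty list and append each result
doubled_loop = []
for x in range(4):
    doubled_loop.append(x * 2)

# Equivalent list comprehension: expression first, then the loop
doubled_comp = [x * 2 for x in range(4)]
print(doubled_loop == doubled_comp)  # True

# An optional `if` clause filters items before the expression is applied
evens_doubled = [x * 2 for x in range(10) if x % 2 == 0]
print(evens_doubled)  # [0, 4, 8, 12, 16]
```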


## Lambda Expressions

Lambda expressions, also called anonymous functions, define small single-expression functions without giving them a name.

In [2]:
lambda x:x*2

<function __main__.<lambda>(x)>

Lambda expressions are handy as arguments to functions like `map` and `filter`. There you need a short one-off function, and defining a named one with `def` would be overkill.

In [3]:
list(map(lambda x:x**2, range(5)))

[0, 1, 4, 9, 16]

In [4]:
list(filter(lambda x:x%2==0, range(10)))

[0, 2, 4, 6, 8]
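
The `map` call above behaves the same as passing a named function defined with `def`. Both calls can also be written as list comprehensions. This is a minimal sketch; `square` is a hypothetical helper name.

```python
# A named function equivalent to `lambda x: x**2`
def square(x):
    return x ** 2

with_def = list(map(square, range(5)))
with_lambda = list(map(lambda x: x ** 2, range(5)))
print(with_def == with_lambda)  # True

# The same results written as list comprehensions
squares = [x ** 2 for x in range(5)]            # like map
evens = [x for x in range(10) if x % 2 == 0]    # like filter
print(squares)  # [0, 1, 4, 9, 16]
print(evens)    # [0, 2, 4, 6, 8]
```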

## Tuple Unpacking

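Tuple unpacking assigns the elements of a tuple to several variables at once. It's most often seen in a `for` loop over a list of tuples, where each tuple is unpacked into the loop variables.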

In [5]:
x = [(1,2), (3,4), (5,6)]

for a,b in x:
    print(a)

1
3
5
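
Unpacking isn't limited to `for` loops: any assignment with several names on the left unpacks a tuple on the right. Here is a short sketch with made-up values.

```python
# Unpacking works in any assignment, not just in for loops
point = (3, 4)
x, y = point
print(x, y)  # 3 4

# Swapping two variables is tuple packing followed by unpacking
x, y = y, x
print(x, y)  # 4 3

# Inside a loop or comprehension, each tuple is unpacked into a and b
pairs = [(1, 2), (3, 4), (5, 6)]
sums = [a + b for a, b in pairs]
print(sums)  # [3, 7, 11]

# enumerate() yields (index, item) pairs; the item is unpacked as well
for i, (a, b) in enumerate(pairs):
    print(i, a, b)
```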


# NumPy

In [6]:
import numpy as np

## Creating Arrays From Lists

To create an array from a list, pass the list to `np.array()`. A list of lists becomes a 2D array (a matrix).

In [7]:
lst = [1, 2, 3]
np.array(lst)

array([1, 2, 3])

In [8]:
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
np.array(mat)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

## Creating Arrays Using Built-In Methods

### `arange()`

In [9]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
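
`np.arange()` mirrors Python's built-in `range()`. It also takes an optional start and step, and the stop value is excluded. Unlike `range()`, it accepts a float step. Because of rounding, though, `np.linspace()` is usually more predictable for evenly spaced floats. A quick sketch:

```python
import numpy as np

# start (inclusive), stop (exclusive), step
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# Counting down with a negative step
countdown = np.arange(5, 0, -1)
print(countdown)  # [5 4 3 2 1]

# A float step produces a float array
halves = np.arange(0, 2, 0.5)
print(halves)  # [0.  0.5 1.  1.5]
```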

### `zeros()`

In [10]:
np.zeros(4)

array([0., 0., 0., 0.])

In [11]:
np.zeros((5,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### `ones()`

In [12]:
np.ones(3)

array([1., 1., 1.])

In [13]:
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### `linspace()`

In [14]:
# start (inclusive)
# stop (inclusive by default)
# number of evenly spaced points (optional, defaults to 50)

np.linspace(0,5,9)

array([0.   , 0.625, 1.25 , 1.875, 2.5  , 3.125, 3.75 , 4.375, 5.   ])
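
Two optional `np.linspace()` arguments are worth knowing. `endpoint=False` excludes the stop value, and `retstep=True` also returns the spacing between points. Here is a sketch with arbitrary ranges.

```python
import numpy as np

# By default the stop value is included; endpoint=False excludes it
with_end = np.linspace(0, 1, 5)
without_end = np.linspace(0, 1, 5, endpoint=False)
print(with_end)     # [0.   0.25 0.5  0.75 1.  ]
print(without_end)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True returns (points, spacing) instead of just the points
points, step = np.linspace(0, 5, 9, retstep=True)
print(step)  # 0.625
```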

### `eye()`

In [15]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

### `random.rand()`

Returns an array of the given shape filled with random samples from a uniform distribution over [0, 1).

In [16]:
np.random.rand(4)

array([0.62442928, 0.51351865, 0.47632003, 0.48422101])

In [17]:
np.random.rand(3,3)

array([[0.15569856, 0.04731369, 0.42653294],
       [0.02925902, 0.75573999, 0.63025458],
       [0.49338099, 0.93117878, 0.42229167]])

### `random.randn()`

Returns an array of the given shape filled with random samples from the standard normal distribution (mean 0, standard deviation 1).

In [18]:
np.random.randn(3)

array([-1.37458424,  0.23974188, -0.79097505])

In [19]:
np.random.randn(3,3)

array([[ 0.98702107, -0.28320457,  0.92332875],
       [-0.79650427,  2.0653726 , -0.07897936],
       [ 0.05520292, -1.86895691,  0.84944658]])

### `random.randint()`

In [20]:
# low (inclusive)
# high (exclusive)
# size: how many integers to draw

np.random.randint(1,50,3)

array([11, 19, 27])
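
The random outputs in this section change on every run. One way to make them reproducible is NumPy's `Generator` API, created with `np.random.default_rng(seed)` (NumPy 1.17+). Its `random`, `standard_normal`, and `integers` methods roughly correspond to `rand`, `randn`, and `randint`, though they don't produce the same numbers. The seed `42` is arbitrary.

```python
import numpy as np

# Two generators with the same seed produce the same sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

u = rng1.random(4)                # uniform over [0, 1), like rand(4)
n = rng1.standard_normal((3, 3))  # standard normal, like randn(3, 3)
k = rng1.integers(1, 50, 3)       # ints in [1, 50), like randint(1, 50, 3)

print(np.array_equal(u, rng2.random(4)))  # True
```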

## Array Methods

### `reshape()`

In [21]:
a = np.arange(25)
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24]


In [22]:
a.reshape(5,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
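
`reshape()` needs the new shape to hold exactly as many elements as the original array, and it returns a new array rather than changing `a`. Passing `-1` for one dimension lets NumPy infer it. This is a sketch; `nums` is an illustrative name.

```python
import numpy as np

a = np.arange(25)

# -1 lets NumPy infer one dimension from the total number of elements
print(a.reshape(5, -1).shape)  # (5, 5)

nums = np.arange(12)
print(nums.reshape(-1, 4).shape)  # (3, 4)

# reshape returns a new array; a keeps its original shape
print(a.shape)  # (25,)

# 25 elements can't fill a 4 x 6 array, so this raises ValueError
try:
    a.reshape(4, 6)
except ValueError as err:
    print("ValueError:", err)
```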

---

In [23]:
b = np.random.randint(0,50,10)

### `max()`

In [24]:
print(b)
b.max()

[29  6 24 26 21  9 43 13 31 41]


43

### `min()`

In [25]:
print(b)
b.min()

[29  6 24 26 21  9 43 13 31 41]


6

### `argmax()`

Returns the index of the maximum value.

In [26]:
print(b)
b.argmax()

[29  6 24 26 21  9 43 13 31 41]


6

### `argmin()`

Returns the index of the minimum value.

In [27]:
print(b)
b.argmin()

[29  6 24 26 21  9 43 13 31 41]


1
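
On a 2D array, `argmax()` and `argmin()` return an index into the flattened array unless you pass `axis`. `np.unravel_index()` converts that flat index back into a (row, column) pair. The array `m` is made up for illustration.

```python
import numpy as np

m = np.array([[3, 9, 2],
              [7, 1, 8]])

flat = m.argmax()                           # index in the flattened array
row, col = np.unravel_index(flat, m.shape)  # convert to (row, column)
print(flat, row, col)                       # 1 0 1

# With axis, the result is computed per column (axis=0) or per row (axis=1)
print(m.max(axis=0))     # [7 9 8]
print(m.argmax(axis=1))  # [1 2]
```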

### `shape`

In [28]:
print(a)
a.shape

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24]


(25,)

In [29]:
print(b)
b.shape

[29  6 24 26 21  9 43 13 31 41]


(10,)

In [30]:
c = a.reshape(5,5)
print(c)
c.shape

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


(5, 5)

### `dtype`

In [31]:
a.dtype

dtype('int64')
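
The default integer type depends on the platform, so `int64` isn't guaranteed; it has historically been `int32` on Windows. `astype()` returns a copy converted to another dtype. Here is a small sketch.

```python
import numpy as np

a = np.arange(5)
f = a.astype(np.float64)  # converted copy with a floating-point dtype
print(f.dtype)            # float64
print(f)                  # [0. 1. 2. 3. 4.]

# Converting floats to integers truncates toward zero
print(np.array([1.7, -1.7]).astype(int))  # [ 1 -1]

# The original array keeps its integer dtype
print(a.dtype.kind)       # i (signed integer)
```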

### `copy()`

In [32]:
a_copy = a.copy()
print(a_copy)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24]
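
`copy()` matters because slicing a NumPy array returns a view that shares memory with the original. Changing the slice changes the original too, whereas a copy is independent. The names below are illustrative.

```python
import numpy as np

a = np.arange(10)

view = a[:3]      # a slice is a view on a's data
view[0] = 100
print(a[0])       # 100: the original changed too

a = np.arange(10)
independent = a[:3].copy()  # copy() allocates new memory
independent[0] = 100
print(a[0])       # 0: the original is untouched
```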


## Array Indexing

To index into a 2D array, use a single pair of brackets with the row and column separated by a comma: `mat[row, col]`.

In [33]:
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mat

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [34]:
mat[2,1]

8
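
Comma indexing gives the same element as chained brackets (`mat[2][1]`). It also supports a slice per axis, rows first and then columns. Here is a short sketch on the same matrix.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Comma indexing and chained brackets give the same element
print(mat[2, 1] == mat[2][1])  # True

# One slice per axis: rows 0-1, columns 1 onward
print(mat[:2, 1:])  # top-right 2x2 block

# Every row, column 0
print(mat[:, 0])    # [1 4 7]
```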

### Conditional Indexing

We can use boolean arrays to grab values from an array that satisfy a condition.

In [35]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [36]:
boo = arr > 5
boo

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [37]:
arr[boo]

array([ 6,  7,  8,  9, 10])

In [38]:
arr[arr > 5]

array([ 6,  7,  8,  9, 10])
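
Conditions can be combined element-wise with `&` (and) and `|` (or). Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparison operators like `>` and `<`. A sketch on the same `arr`:

```python
import numpy as np

arr = np.arange(1, 11)

# Keep values strictly between 3 and 8
middle = arr[(arr > 3) & (arr < 8)]
# Keep values below 3 or above 8
ends = arr[(arr < 3) | (arr > 8)]

print(middle)  # [4 5 6 7]
print(ends)    # [ 1  2  9 10]
```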

## Universal Functions

https://docs.scipy.org/doc/numpy/reference/ufuncs.html

A universal function is a function that operates on an array element-by-element.

In [39]:
a = np.arange(11)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

### `sqrt()`

In [40]:
np.sqrt(a)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ,
       3.16227766])

### `exp()`

In [41]:
np.exp(a)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03, 2.20264658e+04])

### `sin()`

In [42]:
np.sin(a)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849,
       -0.54402111])

### `log()`

In [43]:
np.log(a)

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,
       2.30258509])
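
Because `a` contains 0, `np.log` returns `-inf` for that element and NumPy emits a divide-by-zero `RuntimeWarning`. If that result is expected, the warning can be silenced locally with the `np.errstate` context manager. Alternatively, you can take the log of only the positive entries. A sketch of both:

```python
import numpy as np

a = np.arange(11)

# Ignore divide-by-zero warnings only inside this block
with np.errstate(divide="ignore"):
    logs = np.log(a)

print(logs[0])               # -inf
print(np.isneginf(logs[0]))  # True

# Alternatively, compute the log only for positive entries
positive = a[a > 0]
print(np.log(positive)[:3])
```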

# Pandas

In [44]:
import pandas as pd

## Series

A Series is a one-dimensional labeled array. You can think of it as a table with two columns: the first holds the index (the labels), and the second holds the data.

### Creating Series From Lists

In [45]:
labels = ['a', 'b', 'c']
data = [1, 2, 3]

pd.Series(data, labels)

a    1
b    2
c    3
dtype: int64

### Creating Series From NumPy Arrays

In [46]:
pd.Series(np.array(data), labels)

a    1
b    2
c    3
dtype: int64

### Creating Series From Dictionaries

In [47]:
d = {'a':1, 'b':2, 'c':3}

pd.Series(d)

a    1
b    2
c    3
dtype: int64
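
However a Series is created, its labels can be used to look up values. Arithmetic between two Series lines values up by label rather than by position, and labels missing from either side produce `NaN`. The Series `s1` and `s2` are made up for illustration.

```python
import pandas as pd

s1 = pd.Series({'a': 1, 'b': 2, 'c': 3})
s2 = pd.Series({'b': 10, 'c': 20, 'd': 30})

print(s1['b'])  # 2

# Values are matched by label; 'a' and 'd' exist in only one Series
total = s1 + s2
print(total)
```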

## DataFrames

A DataFrame is a collection of Series, one per column, that share the same index.

### Creating DataFrames

When you create a DataFrame manually, the positional arguments are, in order, `data`, `index` (row labels), and `columns` (column labels).

In [48]:
df = pd.DataFrame(np.random.randint(1,100,20).reshape(5,4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| C | 29 | 52 | 83 | 8 |
| D | 63 | 60 | 37 | 97 |
| E | 18 | 62 | 71 | 11 |
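
The same constructor also accepts keyword arguments, which avoids relying on argument order. It also accepts a dict of lists, where each key becomes a column. This sketch uses the fixed values from the table above instead of random ones, so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Keyword arguments make the role of each argument explicit
data = np.array([[14, 63, 82, 32],
                 [44, 13, 25, 92],
                 [29, 52, 83, 8],
                 [63, 60, 37, 97],
                 [18, 62, 71, 11]])
df = pd.DataFrame(data=data,
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# A dict of lists: keys become column names, lists become column values
df2 = pd.DataFrame({'W': [14, 44], 'X': [63, 13]}, index=['A', 'B'])
print(df2)
```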


### Grabbing Columns

In [49]:
df['W']

A    14
B    44
C    29
D    63
E    18
Name: W, dtype: int64

To grab multiple columns, pass in a list of the column names.

In [50]:
df[['W', 'Y']]

|   | W | Y |
|---|---|---|
| A | 14 | 82 |
| B | 44 | 25 |
| C | 29 | 83 |
| D | 63 | 37 |
| E | 18 | 71 |


### Adding New Columns

In [51]:
df['&'] = np.random.randint(1,100,5)
df

|   | W | X | Y | Z | & |
|---|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 | 11 |
| B | 44 | 13 | 25 | 92 | 24 |
| C | 29 | 52 | 83 | 8 | 1 |
| D | 63 | 60 | 37 | 97 | 1 |
| E | 18 | 62 | 71 | 11 | 63 |


### Removing Columns

In [52]:
df.drop('&', axis=1, inplace=True) # inplace changes dataframe directly
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| C | 29 | 52 | 83 | 8 |
| D | 63 | 60 | 37 | 97 |
| E | 18 | 62 | 71 | 11 |


### Removing Rows

In [53]:
df.drop('E')

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| C | 29 | 52 | 83 | 8 |
| D | 63 | 60 | 37 | 97 |
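
That cell doesn't pass `inplace=True`, so `drop()` returns a new DataFrame and `df` still contains row `E`; that's why `E` shows up again in later cells. Rows are dropped along the default `axis=0`. Here is a sketch with placeholder values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# Without inplace=True, drop() returns a new DataFrame
smaller = df.drop('E')
print(len(smaller), len(df))  # 4 5

# Several labels can be dropped at once; the index= and columns=
# keywords avoid having to remember axis numbers
trimmed = df.drop(index=['A', 'B'], columns=['Z'])
print(trimmed.shape)  # (3, 3)
```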


### Grabbing Rows

We can use `loc` and pass in the name of the row.

In [54]:
df.loc['A']

W    14
X    63
Y    82
Z    32
Name: A, dtype: int64

Or we can use `iloc` and pass in the integer position of the row.

In [55]:
df.iloc[0]

W    14
X    63
Y    82
Z    32
Name: A, dtype: int64
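
One practical difference between `loc` and `iloc` shows up with slices. Label slices with `loc` include the end label, while position slices with `iloc` exclude the end position, like Python lists. A sketch with placeholder values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

# Label slice: 'C' is included
print(df.loc['A':'C'].index.tolist())  # ['A', 'B', 'C']

# Position slice: position 2 is excluded
print(df.iloc[0:2].index.tolist())     # ['A', 'B']
```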

### Grabbing Single Values

To grab a single value, pass the row label and the column label to `loc`, separated by a comma.

In [56]:
df.loc['B','Z']

92
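
For a single cell, pandas also provides `at` (by label) and `iat` (by integer position). Both accept only scalar lookups. This small DataFrame reuses values from the table above.

```python
import pandas as pd

df = pd.DataFrame({'W': [14, 44], 'Z': [32, 92]}, index=['A', 'B'])

print(df.loc['B', 'Z'])  # 92, by labels
print(df.at['B', 'Z'])   # 92, by labels (single cell only)
print(df.iat[1, 1])      # 92, by position: row 1, column 1
```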

### Grabbing Subsets

To grab a subset of the DataFrame, pass a list of row labels and a list of column labels to `loc`, separated by a comma.

In [57]:
df.loc[['A','C'],['X','Y']]

|   | X | Y |
|---|---|---|
| A | 63 | 82 |
| C | 52 | 83 |


### Conditional Selection

In [58]:
df[df % 2 == 0]

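|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14.0 | NaN | 82.0 | 32.0 |
| B | 44.0 | NaN | NaN | 92.0 |
| C | NaN | 52.0 | NaN | 8.0 |
| D | NaN | 60.0 | NaN | NaN |
| E | 18.0 | 62.0 | NaN | NaN |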


In [59]:
df[df['W'] % 2 == 0]

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| E | 18 | 62 | 71 | 11 |


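To combine multiple conditions, use `&` (and) or `|` (or), and wrap each condition in parentheses. Python's `and`/`or` keywords don't work element-wise on a Series and raise an error.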

In [60]:
df[(df['W'] % 2 == 0) | (df['Y'] > 0)]

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| C | 29 | 52 | 83 | 8 |
| D | 63 | 60 | 37 | 97 |
| E | 18 | 62 | 71 | 11 |
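
Every `Y` value in the table is positive, so the `|` example keeps every row. This sketch uses `&` and the `~` (not) operator on the same `W` and `Y` values, so the filters actually remove rows.

```python
import pandas as pd

df = pd.DataFrame({'W': [14, 44, 29, 63, 18],
                   'Y': [82, 25, 83, 37, 71]},
                  index=['A', 'B', 'C', 'D', 'E'])

# Both conditions must hold: W is even AND Y is above 50
both = df[(df['W'] % 2 == 0) & (df['Y'] > 50)]
print(both.index.tolist())   # ['A', 'E']

# ~ negates a condition: rows where W is NOT even
odd_w = df[~(df['W'] % 2 == 0)]
print(odd_w.index.tolist())  # ['C', 'D']
```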


### Changing Indices

`reset_index()` replaces the index with the default integer index (0, 1, 2, ...) and moves the old row labels into a new column named `index`.

In [61]:
df.reset_index(inplace=True)
df

|   | index | W | X | Y | Z |
|---|---|---|---|---|---|
| 0 | A | 14 | 63 | 82 | 32 |
| 1 | B | 44 | 13 | 25 | 92 |
| 2 | C | 29 | 52 | 83 | 8 |
| 3 | D | 63 | 60 | 37 | 97 |
| 4 | E | 18 | 62 | 71 | 11 |


To use an existing column as the index, pass its name to `set_index()`.

In [62]:
df.set_index('index')

| index | W | X | Y | Z |
|---|---|---|---|---|
| A | 14 | 63 | 82 | 32 |
| B | 44 | 13 | 25 | 92 |
| C | 29 | 52 | 83 | 8 |
| D | 63 | 60 | 37 | 97 |
| E | 18 | 62 | 71 | 11 |
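
Like `drop()`, `set_index()` returns a new DataFrame unless `inplace=True` is passed. After that cell, `df` itself still has the default integer index. Reassigning the result is an alternative to `inplace`. A sketch with placeholder values:

```python
import pandas as pd

df = pd.DataFrame({'index': ['A', 'B', 'C'], 'W': [14, 44, 29]})

indexed = df.set_index('index')
print(indexed.index.tolist())  # ['A', 'B', 'C']
print(df.index.tolist())       # [0, 1, 2]: df is unchanged

# Reassign instead of using inplace=True
df = df.set_index('index')
print(df.loc['B', 'W'])        # 44
```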


### Multi-Index And Index Hierarchy

A DataFrame can have multiple levels of indices (a `MultiIndex`).

In [64]:
outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = list(zip(outside, inside))
hier_index

[('G1', 1), ('G1', 2), ('G1', 3), ('G2', 1), ('G2', 2), ('G2', 3)]

In [66]:
hier_index = pd.MultiIndex.from_tuples(hier_index)
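
When every outer label pairs with every inner label, `pd.MultiIndex.from_product()` builds the same index without zipping lists by hand. A sketch comparing the two approaches:

```python
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
from_tuples = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# Cartesian product of the outer and inner label lists
from_product = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]])

print(from_tuples.equals(from_product))  # True
```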

In [68]:
df = pd.DataFrame(np.random.randn(6,2), hier_index, ['A','B'])
df

|    |   | A | B |
|---|---|---|---|
| G1 | 1 | -0.082631 | -0.618233 |
| G1 | 2 | -0.024627 | 2.954915 |
| G1 | 3 | -0.027684 | 0.374461 |
| G2 | 1 | -2.013856 | -1.289841 |
| G2 | 2 | -1.159688 | -0.810128 |
| G2 | 3 | -0.601273 | -0.64731 |


In [69]:
df.loc['G1']

|   | A | B |
|---|---|---|
| 1 | -0.082631 | -0.618233 |
| 2 | -0.024627 | 2.954915 |
| 3 | -0.027684 | 0.374461 |


In [72]:
df.loc['G1'].loc[1]

A   -0.082631
B   -0.618233
Name: 1, dtype: float64
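
Chaining `.loc` works, but a tuple of labels selects the same row in a single lookup. This sketch replaces the random values with fixed ones for repeatability.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)])
df = pd.DataFrame({'A': [0.1, 0.2, 0.3, 0.4],
                   'B': [1.0, 2.0, 3.0, 4.0]}, index=idx)

chained = df.loc['G1'].loc[1]
direct = df.loc[('G1', 1)]  # one lookup with an (outer, inner) tuple
print(chained.tolist() == direct.tolist())  # True

# A tuple row key can be combined with a column label
print(df.loc[('G2', 2), 'B'])  # 4.0
```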

To name the index levels, assign a list of names to `df.index.names`.

In [73]:
df.index.names = ['Groups', 'Numbers']
df

| Groups | Numbers | A | B |
|---|---|---|---|
| G1 | 1 | -0.082631 | -0.618233 |
| G1 | 2 | -0.024627 | 2.954915 |
| G1 | 3 | -0.027684 | 0.374461 |
| G2 | 1 | -2.013856 | -1.289841 |
| G2 | 2 | -1.159688 | -0.810128 |
| G2 | 3 | -0.601273 | -0.64731 |


`xs()` grabs a cross-section: the rows that have a given label at one level of the index, taken from every group. Pass the label and the name of the level it belongs to. Here, `df.xs(1, level='Numbers')` returns row `1` from both `G1` and `G2`.

In [77]:
df.xs(1, level='Numbers')

| Groups | A | B |
|---|---|---|
| G1 | -0.082631 | -0.618233 |
| G2 | -2.013856 | -1.289841 |
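
`xs()` can also select on the outer level, where it returns the same rows as `df.loc['G1']`. This sketch compares the two using made-up fixed values.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['G1', 'G2'], [1, 2, 3]],
                                 names=['Groups', 'Numbers'])
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [10, 11, 12, 13, 14, 15]}, index=idx)

# Row 1 from every group (inner level)
print(df.xs(1, level='Numbers'))

# All rows of group G1 (outer level); same result as df.loc['G1']
print(df.xs('G1', level='Groups').equals(df.loc['G1']))  # True
```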
