
In [0]:
import numpy as np
import pandas as pd

In this example we'll create a 5 x 4 array of random integers from 0 up to (but not including) 500.  We'll also assign labels to the columns and rows.

In [0]:
columns = ['W', 'X', 'Y', 'Z']
rows = ['A', 'B', 'C', 'D', 'E']

In [0]:
np.random.seed(42)
data = np.random.randint(0, 500, (5,4))

In [0]:
df = pd.DataFrame(data=data, index=rows, columns=columns)

In [8]:
df

Unnamed: 0,W,X,Y,Z
A,102,435,348,270
B,106,71,188,20
C,102,121,466,214
D,330,458,87,372
E,99,359,151,130
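To confirm the frame was built as intended, we can inspect its shape and labels. This is a minimal sketch: the values are hard-coded from the output above (an assumption made so the example does not depend on the random number generator), and the variable names `shape`, `row_labels`, and `column_labels` are only illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the output above so the example does not depend on the RNG.
data = np.array([
    [102, 435, 348, 270],
    [106,  71, 188,  20],
    [102, 121, 466, 214],
    [330, 458,  87, 372],
    [ 99, 359, 151, 130],
])
df = pd.DataFrame(data=data, index=list("ABCDE"), columns=list("WXYZ"))

shape = df.shape                  # (number of rows, number of columns)
row_labels = list(df.index)       # the labels passed as index
column_labels = list(df.columns)  # the labels passed as columns
print(shape, row_labels, column_labels)
```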


We can grab a single column by specifying its name in square brackets.  The result is a Series.

In [9]:
df['W']

A    102
B    106
C    102
D    330
E     99
Name: W, dtype: int64

We can select multiple columns by passing a list of column names.  The result is a DataFrame.

In [13]:
df[['W', 'Z']]

Unnamed: 0,W,Z
A,102,270
B,106,20
C,102,214
D,330,372
E,99,130
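The difference between a bare column name and a list of names is worth seeing side by side. The sketch below uses a cut-down frame with only the W and Z values from above; the names `single` and `as_frame` are just for illustration.

```python
import pandas as pd

# A cut-down version of the frame above, with only columns W and Z.
df = pd.DataFrame(
    {"W": [102, 106, 102, 330, 99], "Z": [270, 20, 214, 372, 130]},
    index=list("ABCDE"),
)

single = df["W"]      # a bare name returns a Series
as_frame = df[["W"]]  # a one-element list returns a one-column DataFrame

print(type(single).__name__, type(as_frame).__name__)
```

Passing a list keeps the result two-dimensional even when only one column is requested.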


We can add a new column by assigning to the name we want it to have.  In this case we create New and set each of its values to the value in W plus the value in Z for the same row.

In [0]:
df['New'] = df['W'] + df['Z']

In [15]:
df

Unnamed: 0,W,X,Y,Z,New
A,102,435,348,270,372
B,106,71,188,20,126
C,102,121,466,214,316
D,330,458,87,372,702
E,99,359,151,130,229
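Assigning with square brackets changes df itself. If we would rather leave the original untouched, one option is the `assign` method, which returns a new DataFrame. This is a sketch on a cut-down frame (only W and Z); `with_new` is an illustrative name.

```python
import pandas as pd

# A cut-down version of the frame above, with only columns W and Z.
df = pd.DataFrame(
    {"W": [102, 106, 102, 330, 99], "Z": [270, 20, 214, 372, 130]},
    index=list("ABCDE"),
)

# assign returns a copy with the extra column; df keeps its original columns.
with_new = df.assign(New=df["W"] + df["Z"])

print(list(df.columns))
print(with_new)
```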


We can drop columns by using the drop method.  In this case we specify an axis of 1 because we want a column dropped (1 refers to the columns because a DataFrame is two-dimensional and is indexed by row, then column).  drop returns a new DataFrame rather than modifying the original, so we assign the result back to df.

In [0]:
df = df.drop('New', axis=1)

In [17]:
df

Unnamed: 0,W,X,Y,Z
A,102,435,348,270
B,106,71,188,20
C,102,121,466,214
D,330,458,87,372
E,99,359,151,130
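Instead of remembering which axis number means columns, `drop` also accepts a `columns=` keyword. The sketch below uses a small two-row frame made up for illustration and checks that both spellings give the same result.

```python
import pandas as pd

# A small illustrative frame that still contains the New column.
df = pd.DataFrame(
    {"W": [102, 106], "Z": [270, 20], "New": [372, 126]},
    index=["A", "B"],
)

dropped = df.drop(columns="New")     # keyword form
same_thing = df.drop("New", axis=1)  # axis form used above

# drop returns a new frame; df itself still has the New column.
print(list(dropped.columns), list(df.columns))
```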


We can select a row by its label using the loc indexer.  Multiple rows can be selected by passing a list of labels.

In [20]:
df.loc['A']

W    102
X    435
Y    348
Z    270
Name: A, dtype: int64

In [21]:
df.loc[['B', 'C']]

Unnamed: 0,W,X,Y,Z
B,106,71,188,20
C,102,121,466,214
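One thing to keep in mind is that loc works on labels, so asking for a label that does not exist raises a KeyError. The sketch below rebuilds the frame from the values shown above; the label 'Q' and the flag `missing_raised` are hypothetical and only there to demonstrate the error.

```python
import pandas as pd

df = pd.DataFrame(
    [[102, 435, 348, 270],
     [106, 71, 188, 20],
     [102, 121, 466, 214],
     [330, 458, 87, 372],
     [99, 359, 151, 130]],
    index=list("ABCDE"),
    columns=list("WXYZ"),
)

row = df.loc["A"]          # single label: a Series indexed by column name
rows = df.loc[["B", "C"]]  # list of labels: a DataFrame

# 'Q' is not in the index, so loc raises KeyError.
try:
    df.loc["Q"]
    missing_raised = False
except KeyError:
    missing_raised = True

print(row["X"], list(rows.index), missing_raised)
```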


Alternatively, we can access rows by integer position (including slice notation) by using the iloc indexer

In [23]:
df.iloc[0]

W    102
X    435
Y    348
Z    270
Name: A, dtype: int64

In [24]:
df.iloc[2:]

Unnamed: 0,W,X,Y,Z
C,102,121,466,214
D,330,458,87,372
E,99,359,151,130
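Slices behave differently in the two indexers: an iloc slice excludes its end position, like a normal Python slice, while a loc slice includes its end label. A short sketch on the same values; `by_position` and `by_label` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [[102, 435, 348, 270],
     [106, 71, 188, 20],
     [102, 121, 466, 214],
     [330, 458, 87, 372],
     [99, 359, 151, 130]],
    index=list("ABCDE"),
    columns=list("WXYZ"),
)

by_position = df.iloc[1:3]  # positions 1 and 2; position 3 is excluded
by_label = df.loc["B":"D"]  # labels B through D; D is included

print(list(by_position.index), list(by_label.index))
```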


We drop rows exactly the same way as we drop columns (only the axis specification is different).  We use 0 for the axis because a DataFrame is two-dimensional and rows are the first axis in its shape.

In [0]:
df = df.drop('C', axis=0)

In [26]:
df

Unnamed: 0,W,X,Y,Z
A,102,435,348,270
B,106,71,188,20
D,330,458,87,372
E,99,359,151,130
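Several rows can be dropped in one call by passing a list of labels, and the `index=` keyword can stand in for axis=0. This sketch drops two rows (a choice made only for illustration) and does not reassign df, so the original stays intact.

```python
import pandas as pd

df = pd.DataFrame(
    [[102, 435, 348, 270],
     [106, 71, 188, 20],
     [102, 121, 466, 214],
     [330, 458, 87, 372],
     [99, 359, 151, 130]],
    index=list("ABCDE"),
    columns=list("WXYZ"),
)

# Drop rows C and E by label; the result is a new DataFrame.
remaining = df.drop(index=["C", "E"])

print(list(remaining.index), len(df))
```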


We can use the loc indexer to home in on one or more rows and columns at once.  Rows are specified first, then columns.

In [27]:
df.loc['B', 'Y']

188

In [28]:
df.loc[['B', 'D'], 'Y']

B    188
D     87
Name: Y, dtype: int64

In [29]:
df.loc[['B', 'D'], ['Y', 'Z', 'W']]

Unnamed: 0,Y,Z,W
B,188,20,106
D,87,372,330
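The same row-then-column order applies to iloc, and loc also accepts label slices on both axes. The sketch below rebuilds the four-row frame (after C was dropped) and shows a positional selection that matches the last loc example; `at` is used as one way to read a single cell. The names `scalar`, `block`, and `by_position` are illustrative.

```python
import pandas as pd

# The frame as it stands after row C was dropped.
df = pd.DataFrame(
    [[102, 435, 348, 270],
     [106, 71, 188, 20],
     [330, 458, 87, 372],
     [99, 359, 151, 130]],
    index=["A", "B", "D", "E"],
    columns=list("WXYZ"),
)

scalar = df.at["B", "Y"]               # single cell by row and column label
block = df.loc["B":"D", "X":"Z"]       # label slices include both ends
by_position = df.iloc[1:3, [2, 3, 0]]  # rows B, D; columns Y, Z, W

print(scalar)
print(block)
print(by_position)
```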
