## Installation

conda install pandas

# DataFrames

In [1]:
import pandas as pd
import numpy as np

In [2]:
# 5x4 frame of standard-normal values with labeled rows and columns
df = pd.DataFrame(np.random.randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())

In [3]:
df

Unnamed: 0,W,X,Y,Z
A,1.174356,-0.529688,0.146518,-0.425011
B,-1.420156,-0.468972,0.323547,0.650965
C,0.745205,-0.289836,-0.763281,0.964985
D,0.213373,1.321038,-0.218712,-0.352165
E,-1.013214,-0.163885,1.447395,0.639646
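
Because `np.random.randn` draws fresh values on every run, the numbers shown here will differ from yours. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator is available; the name `seeded` and the seed value 42 are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Seeded generator so every run produces the same values (seed 42 is arbitrary).
rng = np.random.default_rng(42)

seeded = pd.DataFrame(
    rng.standard_normal((5, 4)),
    index="A B C D E".split(),
    columns="W X Y Z".split(),
)
print(seeded.shape)
```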


## Selection and Indexing

Let's look at the main ways to select data from a DataFrame.

In [4]:
# Rows at positions 0-2 and columns at positions 1-3 (end positions are excluded)
df.iloc[0:3, 1:4]

Unnamed: 0,X,Y,Z
A,-0.529688,0.146518,-0.425011
B,-0.468972,0.323547,0.650965
C,-0.289836,-0.763281,0.964985
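
The slice above works on integer positions, where the end of each range is excluded. The sketch below shows the same selection written with labels through `.loc`, where both endpoints are included. The frame `demo` and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same labels as above but fixed values.
demo = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=list("ABCDE"),
    columns=list("WXYZ"),
)

# Position-based: rows 0..2 and columns 1..3 (end positions are excluded).
by_position = demo.iloc[0:3, 1:4]

# Label-based: rows 'A'..'C' and columns 'X'..'Z' (end labels are included).
by_label = demo.loc["A":"C", "X":"Z"]

print(by_position.equals(by_label))
```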


In [5]:
# Pass a list of column names
df[['W','Z']]

Unnamed: 0,W,Z
A,1.174356,-0.425011
B,-1.420156,0.650965
C,0.745205,0.964985
D,0.213373,-0.352165
E,-1.013214,0.639646


In [6]:
# Attribute (dot) access (NOT RECOMMENDED: can clash with DataFrame methods)
df.W

A    1.174356
B   -1.420156
C    0.745205
D    0.213373
E   -1.013214
Name: W, dtype: float64
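
One reason dot access is discouraged is that it quietly returns something else when a column name matches an existing DataFrame attribute or method. A small sketch, using a made-up frame `sales` with a column named `count`:

```python
import pandas as pd

# Hypothetical frame: 'count' collides with the DataFrame.count method.
sales = pd.DataFrame({"count": [3, 5], "unit price": [1.5, 2.0]})

# Dot access returns the bound method, not the column.
print(callable(sales.count))

# Bracket access always returns the column as a Series.
print(sales["count"].tolist())

# Names containing spaces cannot be written with dot access at all.
print(sales["unit price"].tolist())
```

Bracket access behaves the same for every column name, which is why it is the safer default.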

DataFrame columns are just Series.

In [7]:
type(df['W'])

pandas.core.series.Series
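
The bracket form matters here: a single column name gives a Series, while a list of names gives a DataFrame, even if the list has only one entry. A short sketch on a made-up frame `demo`:

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
demo = pd.DataFrame({"W": [1.0, 2.0], "X": [3.0, 4.0]}, index=["A", "B"])

single = demo["W"]     # one column name -> Series
double = demo[["W"]]   # list with one column name -> DataFrame

print(type(single).__name__, type(double).__name__)
```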

**Creating a new column:**

In [8]:
df['new'] = df['W'] + df['Y']

In [9]:
df

Unnamed: 0,W,X,Y,Z,new
A,1.174356,-0.529688,0.146518,-0.425011,1.320873
B,-1.420156,-0.468972,0.323547,0.650965,-1.09661
C,0.745205,-0.289836,-0.763281,0.964985,-0.018077
D,0.213373,1.321038,-0.218712,-0.352165,-0.005339
E,-1.013214,-0.163885,1.447395,0.639646,0.434181
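
Assigning to `df['new']` modifies the frame in place. If the original should stay untouched, `DataFrame.assign` is one alternative, since it returns a new frame. A minimal sketch with made-up values (`demo` and `with_sum` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
demo = pd.DataFrame({"W": [1.0, 2.0], "Y": [0.5, -0.5]}, index=["A", "B"])

# assign returns a new DataFrame; demo itself is left unchanged.
with_sum = demo.assign(new=demo["W"] + demo["Y"])

print(list(demo.columns))
print(list(with_sum.columns))
```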


**Removing columns:**

In [10]:
df.drop('new',axis=1)

Unnamed: 0,W,X,Y,Z
A,1.174356,-0.529688,0.146518,-0.425011
B,-1.420156,-0.468972,0.323547,0.650965
C,0.745205,-0.289836,-0.763281,0.964985
D,0.213373,1.321038,-0.218712,-0.352165
E,-1.013214,-0.163885,1.447395,0.639646


In [11]:
df1 = df.drop('A',axis=0)

In [12]:
# Not inplace unless specified!
df

Unnamed: 0,W,X,Y,Z,new
A,1.174356,-0.529688,0.146518,-0.425011,1.320873
B,-1.420156,-0.468972,0.323547,0.650965,-1.09661
C,0.745205,-0.289836,-0.763281,0.964985,-0.018077
D,0.213373,1.321038,-0.218712,-0.352165,-0.005339
E,-1.013214,-0.163885,1.447395,0.639646,0.434181


In [13]:
df1

Unnamed: 0,W,X,Y,Z,new
B,-1.420156,-0.468972,0.323547,0.650965,-1.09661
C,0.745205,-0.289836,-0.763281,0.964985,-0.018077
D,0.213373,1.321038,-0.218712,-0.352165,-0.005339
E,-1.013214,-0.163885,1.447395,0.639646,0.434181
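
As the outputs above suggest, `drop` returns a new object and leaves the original alone unless `inplace=True` is passed. The sketch below also uses the `columns=` and `index=` keywords, which may read more clearly than `axis=1` and `axis=0`. The frame `demo` is made up for illustration:

```python
import pandas as pd

# Hypothetical frame for illustration.
demo = pd.DataFrame({"W": [1, 2], "X": [3, 4], "new": [5, 6]}, index=["A", "B"])

no_new = demo.drop(columns="new")   # same result as demo.drop('new', axis=1)
no_a = demo.drop(index="A")         # same result as demo.drop('A', axis=0)

# demo still has all of its rows and columns.
print(list(demo.columns), list(demo.index))
```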


In [14]:
# Pass axis by keyword; the positional form is deprecated in newer pandas versions
df.drop('W', axis=1, inplace=True)


In [15]:
# IPython/Jupyter only: show the docstring for drop
df.drop?

Rows can be dropped the same way:

In [16]:
df.drop('E',axis=0)

Unnamed: 0,X,Y,Z,new
A,-0.529688,0.146518,-0.425011,1.320873
B,-0.468972,0.323547,0.650965,-1.09661
C,-0.289836,-0.763281,0.964985,-0.018077
D,1.321038,-0.218712,-0.352165,-0.005339


**Selecting rows:**

In [None]:
df.loc['A']

Or select by integer position instead of label:

In [None]:
df.iloc[2]

**Selecting a subset of rows and columns:**

In [None]:
df.loc['B','Y']
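
The cells above were not executed, so here is a small sketch of what these selections return. It uses a made-up frame `demo` with fixed values rather than the random `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: values 0..19 laid out row by row.
demo = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=list("ABCDE"),
    columns=list("WXYZ"),
)

single_value = demo.loc["B", "Y"]          # one row label, one column label -> scalar
block = demo.loc[["A", "B"], ["W", "Y"]]   # lists of labels -> 2x2 DataFrame
third_row = demo.iloc[2]                   # integer position 2 -> Series for row 'C'

print(single_value)
print(block.values.tolist())
print(third_row.name, third_row.tolist())
```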

# Getting Data from Other Sources

Reading data from a .csv file

In [17]:
df = pd.read_csv('Data_csv.csv')

In [18]:
df

Unnamed: 0.1,Unnamed: 0,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15
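
An `Unnamed: 0` column like the one above typically appears when the CSV was written together with its index. Passing `index_col=0` to `read_csv` is one way to read that column back as the index. The sketch below uses an in-memory string as a stand-in for the file, so its contents are an assumption rather than the real `Data_csv.csv`:

```python
import io

import pandas as pd

# In-memory stand-in for a CSV that was saved with its index (contents assumed).
csv_text = ",a,b,c,d\n0,0,1,2,3\n1,4,5,6,7\n"

plain = pd.read_csv(io.StringIO(csv_text))                   # index column kept as data
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)    # first column used as index

print(list(plain.columns))
print(list(indexed.columns))
```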


Reading data from an Excel (.xlsx) file

In [19]:
df = pd.read_excel('Data_xls.xlsx')

In [20]:
df

Unnamed: 0.1,Unnamed: 0,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15
