

# DataFrames


- A DataFrame is an analog of a two-dimensional array with both flexible row indices and flexible column names.

- It is a two-dimensional object similar to a spreadsheet or an SQL table, and it is the most commonly used pandas object.
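
To make the spreadsheet analogy concrete, here is a minimal sketch that builds a small DataFrame from a dictionary of columns. The names `inventory`, `item`, and `qty` are made up for illustration and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column label
inventory = pd.DataFrame(
    {"item": ["pen", "book", "lamp"], "qty": [10, 3, 7]},
    index=["r1", "r2", "r3"],  # explicit row labels
)

n_rows, n_cols = inventory.shape           # (number of rows, number of columns)
single_cell = inventory.loc["r2", "qty"]   # look up a value by row label and column name
```

Like a spreadsheet cell, every value can be addressed by a row label and a column name.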

In [1]:
import pandas as pd
import numpy as np

In [2]:
from numpy.random import randn
np.random.seed(101)

In [3]:
df = pd.DataFrame(randn(5,4),index=['A','B','C','D','E'],columns=['W','X','Y','Z'])

In [4]:
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 2.70685 | 0.628133 | 0.907969 | 0.503826 |
| B | 0.651118 | -0.319318 | -0.848077 | 0.605965 |
| C | -2.018168 | 0.740122 | 0.528813 | -0.589001 |
| D | 0.188695 | -0.758872 | -0.933237 | 0.955057 |
| E | 0.190794 | 1.978757 | 2.605967 | 0.683509 |


The DataFrame has an `index` attribute that gives access to the row labels:

In [None]:
df.index

Additionally, the DataFrame has a `columns` attribute, which is an Index object holding the column labels:

In [None]:
df.columns
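
As a small self-contained sketch of these two attributes, the snippet below builds a stand-in frame (called `demo` here, a hypothetical name, filled with zeros rather than random values) that uses the same labels as `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with the same row and column labels as the example above
demo = pd.DataFrame(np.zeros((5, 4)),
                    index=list("ABCDE"), columns=list("WXYZ"))

row_labels = demo.index.tolist()      # row labels as a plain Python list
col_labels = demo.columns.tolist()    # column labels as a plain Python list

# Both attributes are pandas Index objects, so membership tests work directly
has_w = "W" in demo.columns
```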

**Creating a new column:**

In [None]:
df['new'] = df['W'] + df['Y']

In [None]:
df['Mul'] = df['W']*df['Z']

In [None]:
df
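
Bracket assignment modifies the DataFrame it is applied to. As a sketch of a non-mutating alternative, `DataFrame.assign` returns a new frame with the extra column. The frame `demo` and the column name `ratio` below are illustrative assumptions.

```python
import pandas as pd

demo = pd.DataFrame({"W": [1.0, 2.0], "Y": [10.0, 20.0]}, index=["A", "B"])

# Bracket assignment adds the column to `demo` itself
demo["new"] = demo["W"] + demo["Y"]

# assign() returns a new DataFrame and leaves `demo` unchanged
with_ratio = demo.assign(ratio=demo["W"] / demo["Y"])
```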

**Removing columns:**

In [None]:
df.drop('new',axis=1)

In [None]:
df.drop('Mul',axis = 1)

In [None]:
# Not inplace unless specified!
df

In [None]:
df.drop('new',axis=1,inplace=True)

In [None]:
df

In [None]:
df.drop('Mul',axis = 1,inplace=True)

In [None]:
df

Rows can be dropped the same way by passing `axis=0`:

In [None]:
df.drop('E',axis=0)
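
The `drop` calls above can be summarised in a short sketch. It uses the `columns=` and `index=` keywords, which pandas accepts as an alternative to `axis=1` and `axis=0`. The frame `demo` is a made-up example.

```python
import pandas as pd

demo = pd.DataFrame({"W": [1, 2, 3], "X": [4, 5, 6]}, index=["A", "B", "C"])

# drop() returns a new DataFrame by default; `demo` keeps all its data
no_x = demo.drop(columns="X")    # same as demo.drop("X", axis=1)
no_c = demo.drop(index="C")      # same as demo.drop("C", axis=0)

# Reassigning the result is an alternative to inplace=True
demo = demo.drop(index="A")
```

Reassigning the result keeps each step explicit, which can make a notebook easier to re-run from the top.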

### Conditional Selection

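An important feature of pandas is conditional selection using bracket notation, very similar to NumPy boolean indexing. A comparison on a column produces a boolean Series, and passing that Series in brackets keeps only the rows where it is True: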

In [5]:
import pandas as pd
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011','14/2/2011','15/2/2011'], 
                   'Product':['Umbrella', 'Matress', 'Badminton', 'Shuttle','Umbrella','Shuttle'], 
                   'Last Price':[1200, 1500, 1600, 352,3256,2547], 
                   'Updated Price':[1250, 1450, 1550, 400,2650,4587], 
                   'Discount':[0.5, 10, 0.2, 0.7,0.8,0.4]}) 

In [6]:
df

|   | Date | Product | Last Price | Updated Price | Discount |
|---|---|---|---|---|---|
| 0 | 10/2/2011 | Umbrella | 1200 | 1250 | 0.5 |
| 1 | 11/2/2011 | Matress | 1500 | 1450 | 10.0 |
| 2 | 12/2/2011 | Badminton | 1600 | 1550 | 0.2 |
| 3 | 13/2/2011 | Shuttle | 352 | 400 | 0.7 |
| 4 | 14/2/2011 | Umbrella | 3256 | 2650 | 0.8 |
| 5 | 15/2/2011 | Shuttle | 2547 | 4587 | 0.4 |


In [7]:
df[df['Last Price']>2000]

|   | Date | Product | Last Price | Updated Price | Discount |
|---|---|---|---|---|---|
| 4 | 14/2/2011 | Umbrella | 3256 | 2650 | 0.8 |
| 5 | 15/2/2011 | Shuttle | 2547 | 4587 | 0.4 |


In [8]:
df[df['Last Price']>2000]['Product']

4    Umbrella
5     Shuttle
Name: Product, dtype: object

In [None]:
df[df['Updated Price']>2000][['Product','Discount']]
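
The selections above chain two bracket operations: one to filter rows, one to pick columns. As a hedged sketch, the same result can be obtained in a single step with `.loc`, which takes the boolean mask and the column labels together. The frame `prices` below is a small made-up subset of the data.

```python
import pandas as pd

prices = pd.DataFrame({
    "Product": ["Umbrella", "Shuttle", "Badminton"],
    "Last Price": [3256, 2547, 1600],
})

# The comparison yields a boolean Series aligned with the rows
mask = prices["Last Price"] > 2000

# .loc filters rows and picks a column in a single step
expensive = prices.loc[mask, "Product"]
```

Inspecting `mask` on its own is a quick way to check which rows a condition will keep.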

**For two conditions, combine them with `|` (or) and `&` (and), wrapping each condition in parentheses:**

In [None]:
df[ (df['Last Price']>1500) & (df['Product']=='Umbrella')]

In [None]:
df[ (df['Updated Price'] >1200) | (df['Discount']<0.5)  ]
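
The parentheses are needed because `&` and `|` bind more tightly than comparisons such as `>` in Python. The sketch below sidesteps the issue by naming each mask first, and also shows that the plain `and` keyword does not work on a Series. The frame `sales` and the mask names are illustrative assumptions.

```python
import pandas as pd

sales = pd.DataFrame({
    "Product": ["Umbrella", "Shuttle", "Umbrella"],
    "Last Price": [1200, 2547, 3256],
})

is_umbrella = sales["Product"] == "Umbrella"
is_pricey = sales["Last Price"] > 1500

both = sales[is_umbrella & is_pricey]      # element-wise AND
either = sales[is_umbrella | is_pricey]    # element-wise OR

# The plain `and` keyword asks for a single truth value and fails on a Series
try:
    is_umbrella and is_pricey
    and_failed = False
except ValueError:
    and_failed = True
```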