pandas is a Python library for working with tabular data. It can import data, read data, and display data in an object called a DataFrame. A DataFrame consists of rows and columns. One way to get a feel for DataFrames is to create one.

In the IT industry, pandas is widely used for data manipulation. It is also used for stock prediction, statistics, analytics, big data, and, of course, data science.

In [1]:
import pandas as pd

In [2]:
# Create a dictionary for the DataFrame: one key per student, one list of quiz scores per key
test_dict = {'Corey': [63, 75, 88], 'Kevin': [58, 49, 76], 'Avnish': [46, 89, 67]}


In [3]:
# Create a DataFrame from the dictionary (each key becomes a column)
df = pd.DataFrame(test_dict)


In [4]:
df

   Corey  Kevin  Avnish
0     63     58      46
1     75     49      89
2     88     76      67


In [5]:
df = df.T  # Transpose: swap rows and columns, so each student becomes a row

In [6]:
df

         0   1   2
Corey   63  75  88
Kevin   58  49  76
Avnish  46  89  67


In [7]:
df.columns = ['Quiz1', 'Quiz2', 'Quiz3']  # Rename the columns

In [8]:
df

        Quiz1  Quiz2  Quiz3
Corey      63     75     88
Kevin      58     49     76
Avnish     46     89     67


# Pandas Computations

In [9]:
df.iloc[0]  # Access the first row by integer position

Quiz1    63
Quiz2    75
Quiz3    88
Name: Corey, dtype: int64

In [10]:
df['Quiz1'] # Access first column by name

Corey     63
Kevin     58
Avnish    46
Name: Quiz1, dtype: int64

In [11]:
df.Quiz1  # Access first column using dot notation

Corey     63
Kevin     58
Avnish    46
Name: Quiz1, dtype: int64
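
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below illustrates this with a small DataFrame whose column names ('Quiz 1' and 'mean') are hypothetical and chosen only to show the limitation; bracket notation works in both cases.

```python
import pandas as pd

# Hypothetical column names chosen to show where dot notation stops working.
quizzes = pd.DataFrame({'Quiz 1': [63, 58], 'mean': [69.0, 53.5]})

# 'mean' is also a DataFrame method, so the attribute is the method, not the column.
attribute_is_method = callable(quizzes.mean)

# Brackets always return the column, whatever it is called.
mean_column = quizzes['mean']
spaced_column = quizzes['Quiz 1']  # quizzes.Quiz 1 would be a syntax error

print(attribute_is_method)
print(mean_column.tolist())
print(spaced_column.tolist())
```

For that reason, bracket notation is usually the safer default, with dot notation kept for quick interactive work.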

In [12]:
df[0:2]  # Select the first two rows by slicing

       Quiz1  Quiz2  Quiz3
Corey     63     75     88
Kevin     58     49     76


In [13]:
# Define a new DataFrame from the first 2 rows and last 2 columns, selected by label
rows = ['Corey', 'Kevin']
columns = ['Quiz2', 'Quiz3']
df_spring = df.loc[rows, columns]
df_spring

       Quiz2  Quiz3
Corey     75     88
Kevin     49     76


In [14]:
# Or select the same rows and columns by integer position
df.iloc[[0,1],[1,2]]

       Quiz2  Quiz3
Corey     75     88
Kevin     49     76
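
Both `loc` and `iloc` also accept slices, with one difference worth keeping in mind: a label slice in `loc` includes its end label, while a position slice in `iloc` excludes its end position. The following is a minimal sketch of that difference; it rebuilds the quiz table under the name `scores` (an assumed name, used so the example is self-contained).

```python
import pandas as pd

# Rebuild the quiz table so this example runs on its own.
scores = pd.DataFrame(
    {'Quiz1': [63, 58, 46], 'Quiz2': [75, 49, 89], 'Quiz3': [88, 76, 67]},
    index=['Corey', 'Kevin', 'Avnish'],
)

# Label slices include both endpoints: 'Kevin' and 'Quiz3' are part of the result.
by_label = scores.loc['Corey':'Kevin', 'Quiz2':'Quiz3']

# Position slices exclude the end: rows 0 and 1, columns 1 and 2.
by_position = scores.iloc[0:2, 1:3]

print(by_label.equals(by_position))
```

Both selections pick out the same two students and the same two quizzes as the list-based versions above.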


In [15]:
# Define a new column as the mean of the other columns
df['Quiz_avg'] = df.mean(axis=1)
# axis=1 computes across the columns, giving one mean per row;
# axis=0 computes down the rows, giving one mean per column

In [16]:
df

        Quiz1  Quiz2  Quiz3   Quiz_avg
Corey      63     75     88  75.333333
Kevin      58     49     76  61.000000
Avnish     46     89     67  67.333333
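
The `axis` argument is easy to mix up, so here is a small sketch comparing the two directions on the same data. The name `scores` is an assumption used to keep the example self-contained.

```python
import pandas as pd

# Rebuild the quiz table so this example runs on its own.
scores = pd.DataFrame(
    {'Quiz1': [63, 58, 46], 'Quiz2': [75, 49, 89], 'Quiz3': [88, 76, 67]},
    index=['Corey', 'Kevin', 'Avnish'],
)

# axis=1: average across the columns, one result per row (per student).
row_means = scores.mean(axis=1)

# axis=0: average down the rows, one result per column (per quiz).
col_means = scores.mean(axis=0)

print(row_means)
print(col_means)
```

The result of `axis=1` is indexed by student name, which is why it can be assigned directly as a new column, while the result of `axis=0` is indexed by quiz name.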


In [17]:
df['Quiz4'] = [92,87,90]

In [18]:
df

        Quiz1  Quiz2  Quiz3   Quiz_avg  Quiz4
Corey      63     75     88  75.333333     92
Kevin      58     49     76  61.000000     87
Avnish     46     89     67  67.333333     90


In [19]:
del df['Quiz_avg']
df

        Quiz1  Quiz2  Quiz3  Quiz4
Corey      63     75     88     92
Kevin      58     49     76     87
Avnish     46     89     67     90


What if you want to add a new row to the DataFrame?
One way is to build a one-row DataFrame and concatenate it with the original.
Let's see!

In [20]:
import numpy as np

In [21]:
# Build a one-row DataFrame for Andrew, who only took Quiz4 (np.nan marks the missing scores)
df_new = pd.DataFrame({'Quiz1': [np.nan], 'Quiz2': [np.nan], 'Quiz3': [np.nan], 'Quiz4': [71]}, index=['Andrew'])
df = pd.concat([df,df_new])
df

        Quiz1  Quiz2  Quiz3  Quiz4
Corey    63.0   75.0   88.0     92
Kevin    58.0   49.0   76.0     87
Avnish   46.0   89.0   67.0     90
Andrew    NaN    NaN    NaN     71
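
Concatenation is not the only option. As a hedged alternative, a single row can also be added by assigning to a new label with `loc` (pandas calls this setting with enlargement). The sketch below assumes the same four quiz columns and uses the name `scores` to stay self-contained; it is not the approach used in the cell above.

```python
import numpy as np
import pandas as pd

# Rebuild the four-quiz table so this example runs on its own.
scores = pd.DataFrame(
    {'Quiz1': [63, 58, 46], 'Quiz2': [75, 49, 89],
     'Quiz3': [88, 76, 67], 'Quiz4': [92, 87, 90]},
    index=['Corey', 'Kevin', 'Avnish'],
)

# Assigning to a label that does not exist yet appends a new row.
# The values must be listed in the same order as the columns.
scores.loc['Andrew'] = [np.nan, np.nan, np.nan, 71]

print(scores)
```

This is handy for one-off additions; `pd.concat` scales better when many rows are added at once.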


In [23]:
df['Quiz_avg'] = df.mean(axis=1, skipna=True)
# skipna=True (the default) ignores NaN values when computing the mean
df

        Quiz1  Quiz2  Quiz3  Quiz4  Quiz_avg
Corey    63.0   75.0   88.0     92      79.5
Kevin    58.0   49.0   76.0     87      67.5
Avnish   46.0   89.0   67.0     90      73.0
Andrew    NaN    NaN    NaN     71      71.0


In [24]:
df.Quiz4.astype(float)  # Returns Quiz4 converted to float; df itself is unchanged

Corey     92.0
Kevin     87.0
Avnish    90.0
Andrew    71.0
Name: Quiz4, dtype: float64
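
Note that `astype` returns a new Series rather than modifying the DataFrame. A minimal sketch of making the conversion stick, using an assumed one-column table named `scores`:

```python
import pandas as pd

# A one-column table with integer scores, rebuilt so this example runs on its own.
scores = pd.DataFrame(
    {'Quiz4': [92, 87, 90, 71]},
    index=['Corey', 'Kevin', 'Avnish', 'Andrew'],
)

# astype returns a converted copy; the original column keeps its integer dtype.
converted = scores['Quiz4'].astype(float)
original_dtype = scores['Quiz4'].dtype

# Assign the result back to store the float version in the DataFrame.
scores['Quiz4'] = converted

print(original_dtype, scores['Quiz4'].dtype)
```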