In [1]:
# import libraries
import pandas as pd

In [2]:
cars = {
    'Brand':['Honda','Toyota','Ford','Audi'],
    'Price':[22000,25000,27000,35000]
}

In [3]:
df = pd.DataFrame(cars)
df

    Brand  Price
0   Honda  22000
1  Toyota  25000
2    Ford  27000
3    Audi  35000


## 4.1 Creating a single column

In [4]:
lst = ['Arun','Varun','Ram','Mohan']


In [5]:
df = pd.DataFrame(lst)
df

       0
0   Arun
1  Varun
2    Ram
3  Mohan


## 4.2 Creating multiple columns

In [6]:
names = ['Arun','Varun','Ram','Mohan']
age = [29,21,32,51]

df = pd.DataFrame(list(zip(names,age)))
df

## 4.3 Adding custom column names

We can also set the column names ourselves, using the parameter ‘columns’.
It can be passed whenever a DataFrame is created, to give its columns
custom names.


In [7]:
df = pd.DataFrame(list(zip(names,age)),columns=['Name','val1'])
df

    Name  val1
0   Arun    29
1  Varun    21
2    Ram    32
3  Mohan    51


In the same way, we can give column names to any DataFrame we create. The
general syntax is:
df = pd.DataFrame(<data>, columns=['<col1>', '<col2>'])
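As a minimal sketch (reusing the names, ages and cars data from above; the list-of-lists layout is just for illustration), ‘columns’ names the columns when the data itself carries no labels. When the data is a dict, however, pandas treats ‘columns’ as a selection and ordering of the existing keys rather than a renaming:

```python
import pandas as pd

# Unlabeled rows (list of lists): 'columns' supplies the names.
rows = [["Arun", 29], ["Varun", 21], ["Ram", 32], ["Mohan", 51]]
people = pd.DataFrame(rows, columns=["Name", "Age"])
print(people)

# Labeled data (dict): 'columns' picks and orders the existing keys.
cars = {
    "Brand": ["Honda", "Toyota", "Ford", "Audi"],
    "Price": [22000, 25000, 27000, 35000],
}
by_price_first = pd.DataFrame(cars, columns=["Price", "Brand"])
print(by_price_first.columns.tolist())  # ['Price', 'Brand']
```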

## 4.4 Creating a DataFrame from a list of dictionaries


We can also create a DataFrame from a list of dictionaries. Each element
(dict) of the list becomes a row: the dictionary keys become the column
names and their values become that row’s entries.

In [8]:
lst = [
    {"name":"Arun","age":29,"gender":"M"},
    {"name":"Varun","age":21,"gender":"M"},
    {"name":"Ram","age":32,"gender":"M"}
]

In [9]:
df = pd.DataFrame(lst)
df

    name  age  gender
0   Arun   29       M
1  Varun   21       M
2    Ram   32       M
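The dictionaries do not all need the same keys. In this small sketch the second record deliberately omits "gender"; pandas uses the union of all keys as the columns and fills the missing entry with NaN:

```python
import pandas as pd

records = [
    {"name": "Arun", "age": 29, "gender": "M"},
    {"name": "Varun", "age": 21},  # no "gender" key in this record
    {"name": "Ram", "age": 32, "gender": "M"},
]
df = pd.DataFrame(records)
print(df)
print(df["gender"].isna().tolist())  # [False, True, False]
```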


## 4.5 Creating a DataFrame from files

A DataFrame can also be created from files such as CSV or MS Excel files.

We can even create a DataFrame from a simple .txt file, as long as its
fields are separated by some delimiter.
Here we’ll use a CSV file as the example:
df = pd.read_csv(<csvName>)

In [10]:
df = pd.read_csv('name_age.csv')
df

    Name  Age
0   Arun   21
1   Brun   42
2    Ram   32
3  Mohan   25


This loads the contents of the CSV file into a DataFrame.
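Since name_age.csv is not available here, the sketch below uses io.StringIO as an in-memory stand-in for the file, with the same rows as the output above. The second part shows a hypothetical pipe-delimited .txt layout, read through the sep parameter of pd.read_csv. For Excel files, pandas provides pd.read_excel, which needs an engine such as openpyxl installed.

```python
import io

import pandas as pd

# In-memory stand-in for name_age.csv (same rows as shown above).
csv_text = "Name,Age\nArun,21\nBrun,42\nRam,32\nMohan,25\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# Hypothetical .txt file whose fields are separated by '|'.
txt_text = "Name|Age\nArun|21\nBrun|42\n"
df_txt = pd.read_csv(io.StringIO(txt_text), sep="|")
print(df_txt)
```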

## 4.6 Creating a blank DataFrame

Sometimes we need to create a blank DataFrame first and fill it later
inside a loop. If we created the DataFrame inside the loop instead, it
would be re-initialized on every iteration.

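In [11]:
df = pd.DataFrame()
df

Empty DataFrame
Columns: []
Index: []

Note the parentheses: without them, df = pd.DataFrame only stores the
DataFrame class itself (displayed as pandas.core.frame.DataFrame), not an
empty DataFrame.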

More often, we want the blank DataFrame to have its columns defined from
the start:

In [12]:
df = pd.DataFrame(columns=['name','age'])
df

Empty DataFrame
Columns: [name, age]
Index: []


This is useful when we want to add data to a DataFrame row by row. In that
case it’s better to have the columns defined up front.
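A minimal sketch of both approaches, using a few names from the earlier examples. Note that DataFrame.append() was removed in pandas 2.0, so rows are added here with df.loc[len(df)], which assumes the default 0, 1, 2, … index. For large inputs, growing a DataFrame one row at a time tends to be slow, so collecting the rows in a plain list and building the DataFrame once is usually preferable:

```python
import pandas as pd

people = [("Arun", 29), ("Varun", 21), ("Ram", 32)]

# Option 1: predefined columns, one row added per loop iteration.
df = pd.DataFrame(columns=["name", "age"])
for name, age in people:
    # len(df) is the next unused label on the default 0, 1, 2, ... index
    df.loc[len(df)] = [name, age]

# Option 2: collect plain dicts first, then build the DataFrame once.
records = []
for name, age in people:
    records.append({"name": name, "age": age})
df_once = pd.DataFrame(records, columns=["name", "age"])

print(df)
print(df_once)
```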


Now that we have learned about DataFrames and various ways to create them,
in the next chapter we’ll look at some basic DataFrame operations for
working with our data.

## 5 Basics of DataFrames

DataFrames represent data in tabular form, as rows and columns.

Let’s do some basic operations using data from a CSV file.


In [13]:
df = pd.read_csv('names_ages.csv')
df

    Name  Age       dob  gender
0   Arun   21  20/02/97       m
1   Brun   42  20/02/92       m
2    Ram   32  20/02/94       m
3  Mohan   25  20/02/99       m
4   Sita   21  20/02/97       f
5   Rita   42  20/02/92       f
6   Gita   32  20/02/94       f
7   Arti   25  20/02/99       f


## 5.2 Shape of the DataFrame

A DataFrame is two-dimensional, so its shape is given as (rows, columns)
by:
df.shape

In [14]:
df.shape

(8, 4)

The shape is a tuple, so if we need to store the number of rows and columns
in separate variables, we can unpack it:

In [15]:
rows,columns = df.shape

In [16]:
rows

8

In [17]:
columns

4
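Unpacking the tuple is one option; len() and the size attribute give related counts. A minimal sketch, rebuilding the names_ages.csv rows shown above inline (an assumption, so it runs without the file):

```python
import pandas as pd

# Same rows as names_ages.csv above, built inline.
df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

rows, columns = df.shape
print(rows, columns)    # 8 4
print(len(df))          # 8: the number of rows
print(len(df.columns))  # 4: the number of columns
print(df.size)          # 32: rows * columns
```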

## 5.3 Top ‘n’ rows

df.head() gives us the top 5 entries by default.

In [18]:
df.head()

    Name  Age       dob  gender
0   Arun   21  20/02/97       m
1   Brun   42  20/02/92       m
2    Ram   32  20/02/94       m
3  Mohan   25  20/02/99       m
4   Sita   21  20/02/97       f


We can also get any desired number of top entries with the same method.
If we need the top n entries, we just pass n to head(). For example, the
top 2 entries:

df.head(2)

## 5.4 Last ‘n’ rows

Like the head() method, there is also a tail() method.
So, we can get the last 5 entries with:
df.tail()

In [19]:
df.tail()

    Name  Age       dob  gender
3  Mohan   25  20/02/99       m
4   Sita   21  20/02/97       f
5   Rita   42  20/02/92       f
6   Gita   32  20/02/94       f
7   Arti   25  20/02/99       f


And the last n entries with:
df.tail(n)

For example, the last 2 entries:
df.tail(2)
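A quick check of head(2) and tail(2), again rebuilding the names_ages.csv rows inline as an assumption so no file is needed. Note that the returned rows keep their original index labels:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

top_two = df.head(2)
last_two = df.tail(2)
print(top_two["Name"].tolist())   # ['Arun', 'Brun']
print(last_two["Name"].tolist())  # ['Gita', 'Arti']
print(last_two.index.tolist())    # [6, 7]: original labels are kept
```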

## 5.5 Range of entries

We can also extract a range of rows from anywhere in the DataFrame by
slicing:
df[5:8]


The important point here is that this includes row #5 and excludes row #8.

In [20]:
df[5:8]

   Name  Age       dob  gender
5  Rita   42  20/02/92       f
6  Gita   32  20/02/94       f
7  Arti   25  20/02/99       f


To access all the rows, df[:] or just df will work.
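Slicing with df[5:8] works on row positions. The sketch below (same inline data as before, an assumption standing in for the CSV) compares it with the explicitly positional df.iloc[5:8] and the label-based df.loc[5:7]. With the default integer index all three return the same rows, but note that a loc slice includes its end label:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

by_slice = df[5:8]      # positions 5, 6, 7; the stop position 8 is excluded
by_iloc = df.iloc[5:8]  # the same rows, made explicitly positional
by_loc = df.loc[5:7]    # label-based slicing: the end label 7 IS included

print(by_slice["Name"].tolist())  # ['Rita', 'Gita', 'Arti']
print(by_slice.equals(by_iloc), by_slice.equals(by_loc))  # True True
```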

## 5.6 Accessing the columns

Sometimes, when we get data from an unfamiliar source, we first need to
understand it. Before manipulating the data further, we may need to know
which columns it contains. This can be done with:


df.columns

In [21]:
df.columns

Index(['Name', 'Age', 'dob', 'gender'], dtype='object')

In [22]:
df.Name # df['Name']

0     Arun
1     Brun
2      Ram
3    Mohan
4     Sita
5     Rita
6     Gita
7     Arti
Name: Name, dtype: object
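Attribute access such as df.Name only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. The column names in this sketch are hypothetical, chosen to show both limits:

```python
import pandas as pd

# Hypothetical column names chosen to show the limits of attribute access.
df = pd.DataFrame({"first name": ["Arun", "Varun"], "count": [3, 5]})

# A space in the name rules out attribute access; brackets always work.
print(df["first name"].tolist())  # ['Arun', 'Varun']

# "count" is also the name of a DataFrame method.
print(df["count"].tolist())       # [3, 5]: the column
print(callable(df.count))         # True: df.count is the method, not the column
```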

## 5.7 Accessing ‘n’ columns

To access n columns, we pass a list of column names inside the square
brackets:

df[['col1', 'col2']]

df[['Name', 'dob']]


In [23]:
df[['Name','dob']].head()

    Name       dob
0   Arun  20/02/97
1   Brun  20/02/92
2    Ram  20/02/94
3  Mohan  20/02/99
4   Sita  20/02/97


## 5.8 Type of column

We can also check the type of the DataFrame, or of a single column, with
type():
type(df[<columnName>])
type(df['Name'])

In [24]:
type(df)

pandas.core.frame.DataFrame

In [25]:
type(df['Name'])

pandas.core.series.Series
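Whether a selection gives a Series or a DataFrame depends on the brackets: a single label gives a Series, while a list of labels (even a list with one label) gives a DataFrame. A short sketch with the same data rebuilt inline as an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

one = df["Name"]           # single label -> Series
as_frame = df[["Name"]]    # list with one label -> one-column DataFrame
two = df[["Name", "dob"]]  # list of labels -> DataFrame

print(type(one).__name__, type(as_frame).__name__, two.shape)
# Series DataFrame (8, 2)
```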

## 5.9 Basic operations on columns

Let’s try some basic operations on DataFrame columns.

## 5.9.1 maximum


In [27]:
df['Age'].max()

42

## 5.9.2 minimum

In [28]:
df['Age'].min()

21

## 5.9.3 mean

In [29]:
df['Age'].mean()

30.0

## 5.9.4 standard deviation

In [30]:
df['Age'].std()

8.518886580500327
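The four statistics above can also be computed in one call with Series.agg(), which takes a list of function names and returns the results as a Series. A sketch using the same ages (rebuilt inline as an assumption):

```python
import pandas as pd

ages = pd.Series([21, 42, 32, 25, 21, 42, 32, 25], name="Age")

stats = ages.agg(["max", "min", "mean", "std"])
print(stats)
```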

## 5.10 Describe the DataFrame

We can get summary statistics for the DataFrame, such as its max, min and
mean, with just one command. By default, describe() only summarizes numeric
columns, which is why only Age appears in the output.

In [32]:
df.describe()

             Age
count   8.000000
mean   30.000000
std     8.518887
min    21.000000
25%    24.000000
50%    28.500000
75%    34.500000
max    42.000000
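Passing include="all" to describe() also summarizes the non-numeric columns, adding count, unique, top and freq rows for them (NaN appears where a statistic does not apply to a column). A sketch on the same data, rebuilt inline as an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

numeric_only = df.describe()          # default: numeric columns only
summary = df.describe(include="all")  # every column
print(summary)
```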


## 5.11 Conditional operation on columns

We can have conditional operations on columns too.

For example, if we want the rows where the age is greater than 35:

In [35]:
df[df['Age']>35]

   Name  Age       dob  gender
1  Brun   42  20/02/92       m
5  Rita   42  20/02/92       f


If we want the rows with the minimum age:

In [36]:
df[df['Age'] == df['Age'].min()]

   Name  Age       dob  gender
0  Arun   21  20/02/97       m
4  Sita   21  20/02/97       f


If we want only the names of the people whose age is less than 25:

In [39]:
df['Name'][df['Age']<25]

0    Arun
4    Sita
Name: Name, dtype: object

Or, if we need two columns such as “Name” and “dob” for people younger than 30:

In [40]:
df[['Name','dob']][df['Age']<30]

    Name       dob
0   Arun  20/02/97
3  Mohan  20/02/99
4   Sita  20/02/97
7   Arti  20/02/99
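Conditions can be combined with & (and) and | (or); each condition needs its own parentheses because these operators bind more tightly than > and ==. The same filters can also be written with .loc[row_condition, columns], which selects rows and columns in one step. A sketch on the same data, rebuilt inline as an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

# Both conditions must hold: women older than 30.
women_over_30 = df[(df["Age"] > 30) & (df["gender"] == "f")]

# Either condition may hold: the youngest or the oldest ages.
extremes = df[(df["Age"] < 22) | (df["Age"] > 40)]

# .loc with a boolean mask plus column label(s).
young_names = df.loc[df["Age"] < 25, "Name"]
young_two_cols = df.loc[df["Age"] < 30, ["Name", "dob"]]

print(women_over_30["Name"].tolist())  # ['Rita', 'Gita']
print(young_names.tolist())            # ['Arun', 'Sita']
```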


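## 5.12 Accessing rows with loc and iloc

Row data can be accessed in two more ways: df.loc[] selects rows by index
label, while df.iloc[] selects them by integer position. With the default
integer index, label 0 is also position 0, so both return Arun’s row: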

In [41]:
df.loc[0]

Name          Arun
Age             21
dob       20/02/97
gender           m
Name: 0, dtype: object

df.iloc[0]
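The difference between loc and iloc shows up once the index labels no longer match the positions, for example after filtering. A sketch with the same data rebuilt inline (an assumption), where filtering on Age > 35 leaves only the labels 1 and 5:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

older = df[df["Age"] > 35]      # keeps the original labels 1 and 5
print(older.index.tolist())     # [1, 5]
print(older.loc[5, "Name"])     # Rita: the row labelled 5
print(older.iloc[1]["Name"])    # Rita: the second row by position
# older.iloc[5] would raise IndexError, since older has only 2 rows.
```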

## 5.13 Set index

By default, the index is an incrementing integer (0, 1, …, n-1).


In [43]:
df.index

RangeIndex(start=0, stop=8, step=1)

We can replace it with one of the data columns, for example Name:

In [47]:
df.set_index('Name')

       Age       dob  gender
Name
Arun    21  20/02/97       m
Brun    42  20/02/92       m
Ram     32  20/02/94       m
Mohan   25  20/02/99       m
Sita    21  20/02/97       f
Rita    42  20/02/92       f
Gita    32  20/02/94       f
Arti    25  20/02/99       f


This works, but there is a catch: set_index() returns a new DataFrame and
does not change the existing one, as df.head() shows:

In [48]:
df.head()

    Name  Age       dob  gender
0   Arun   21  20/02/97       m
1   Brun   42  20/02/92       m
2    Ram   32  20/02/94       m
3  Mohan   25  20/02/99       m
4   Sita   21  20/02/97       f


To apply the change to the existing DataFrame, there are two ways:

1. Store the new DataFrame back into the existing variable:
   df = df.set_index('Name')
2. Set the parameter ‘inplace’ to True, which changes the existing
   DataFrame directly:


In [49]:
df.set_index('Name',inplace=True)

In [50]:
df.head()

       Age       dob  gender
Name
Arun    21  20/02/97       m
Brun    42  20/02/92       m
Ram     32  20/02/94       m
Mohan   25  20/02/99       m
Sita    21  20/02/97       f
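Both ways of making set_index() stick can be sketched as follows, with the same data rebuilt inline as an assumption. With inplace=True the method returns None, so its result must not be assigned back to df. As a related note, reset_index() reverses the operation by moving the index back into a regular column:

```python
import pandas as pd

base = pd.DataFrame({
    "Name": ["Arun", "Brun", "Ram", "Mohan", "Sita", "Rita", "Gita", "Arti"],
    "Age": [21, 42, 32, 25, 21, 42, 32, 25],
    "dob": ["20/02/97", "20/02/92", "20/02/94", "20/02/99",
            "20/02/97", "20/02/92", "20/02/94", "20/02/99"],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
})

# Way 1: assign the returned DataFrame to a variable.
df1 = base.set_index("Name")

# Way 2: modify a DataFrame in place; the call itself returns None.
df2 = base.copy()
result = df2.set_index("Name", inplace=True)

# reset_index() turns the Name index back into an ordinary column.
restored = df1.reset_index()
print(df1.head())
```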


## 