# Pandas DataFrame Tutorial

A Pandas DataFrame is a two-dimensional, labeled data structure. One can say that <b>multiple Pandas Series sharing an index make up a Pandas DataFrame</b>.

DataFrames are visually represented in the form of a table. They are one of the most integral data structures in Pandas, and one cannot proceed far in learning Pandas without learning DataFrames first.

### Parameters of DataFrames in Pandas

1. <b>data</b> – The data from which the DataFrame will be made (for example a dict, a list, a NumPy array, or another DataFrame)
2. <b>index</b> – The row labels of the DataFrame
3. <b>columns</b> – The column labels
4. <b>dtype</b> – The data type to force on the DataFrame; if omitted, it is inferred from the data
5. <b>copy</b> – Whether to copy the data from the inputs
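
As a minimal sketch of how these parameters fit together, the snippet below passes each of them explicitly. The values and the name `scores` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample values, used only to show each constructor parameter.
scores = pd.DataFrame(
    data=[[1, 9.8], [2, 6.7]],     # row-wise values
    index=['jack', 'mike'],        # row labels
    columns=['year', 'marks'],     # column labels
    dtype=float,                   # force every column to float64
    copy=True,                     # copy the input data
)

print(scores)
print(scores.dtypes)
```

Because `dtype=float` is given, the integer `year` values are stored as floats as well.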


![Pandas-dataframes.jpg](image/Pandas-dataframes.jpg)

## 1. How to Create a Pandas DataFrame from a Dictionary?

In [1]:
import pandas as pd
import numpy as np

To create a DataFrame in Pandas from a dict, we first need to make a dict. For that, we will use the following command:

In [2]:
data={'student':['Jack','Mike','Rohan','Zubair'],'year':[1,2,3,1],'marks' : [9.8,6.7,8,9.9]}

After this is done, all we have to do to make a DataFrame is to use the following commands:

In [3]:
dataflair_df=pd.DataFrame(data)

In [4]:
dataflair_df

Unnamed: 0,student,year,marks
0,Jack,1,9.8
1,Mike,2,6.7
2,Rohan,3,8.0
3,Zubair,1,9.9


## 2. How to Access Last and First Rows of DataFrame in Pandas?

Using <b>.head()</b> and <b>.tail()</b>, we can access the first few rows and the last few rows. In both cases, without a parameter, we get 5 rows; passing a number n returns n rows instead. Let’s continue with the help of examples:

In [5]:
dataflair_df.head(2)

Unnamed: 0,student,year,marks
0,Jack,1,9.8
1,Mike,2,6.7


In [6]:
dataflair_df.tail(2)

Unnamed: 0,student,year,marks
2,Rohan,3,8.0
3,Zubair,1,9.9
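
The default row count is easier to see on a frame with more than five rows. The sketch below uses a hypothetical eight-row frame named `numbers`, which is not part of the tutorial data.

```python
import pandas as pd

# Hypothetical frame with 8 rows so that the default of 5 is visible.
numbers = pd.DataFrame({'n': range(8)})

first_rows = numbers.head()    # no argument: first 5 rows
last_rows = numbers.tail()     # no argument: last 5 rows
first_two = numbers.head(2)    # explicit count: first 2 rows

print(len(first_rows), len(last_rows), len(first_two))
```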


## 3. How to Change the Column Order in Pandas DataFrame?

By default, the columns follow the order of the keys in the dictionary (older pandas releases sorted them alphabetically instead). To get a different column order, we pass the desired order through the columns parameter:

In [7]:
dataflair_d=pd.DataFrame(data, columns=['student','marks','year'])

In [8]:
dataflair_d

Unnamed: 0,student,marks,year
0,Jack,9.8,1
1,Mike,6.7,2
2,Rohan,8.0,3
3,Zubair,9.9,1


## 4. How to Access the Columns in Pandas DataFrame?

Columns can be accessed in two ways:

In [9]:
dataflair_d['year']

0    1
1    2
2    3
3    1
Name: year, dtype: int64

Or we can also access columns as an attribute:

In [10]:
dataflair_d.student

0      Jack
1      Mike
2     Rohan
3    Zubair
Name: student, dtype: object
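
Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below illustrates this with a hypothetical frame whose column names were picked to trigger both cases.

```python
import pandas as pd

# Hypothetical columns: one name contains a space, one clashes with a method.
frame = pd.DataFrame({
    'student': ['Jack', 'Mike'],
    'final marks': [9.8, 6.7],
    'count': [1, 2],
})

same_column = frame.student.equals(frame['student'])  # both forms agree
marks = frame['final marks']        # a name with a space needs brackets
counts = frame['count']             # brackets return the column
method = frame.count                # the attribute is the DataFrame.count method

print(same_column, callable(method))
```

Square brackets work for every column name, so they are the safer choice in reusable code.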

## 5. How to Access the Rows in DataFrames?

We use the loc and iloc indexers to access rows: loc selects by index label, while iloc selects by integer position. Here is an example of how that works:

In [11]:
dataflair_d.loc[2]

student    Rohan
marks          8
year           3
Name: 2, dtype: object

In [12]:
dataflair_d.iloc[2]

student    Rohan
marks          8
year           3
Name: 2, dtype: object

Here, both calls return the same row because the default index labels (0, 1, 2, ...) coincide with the row positions. Each call returns the row as a Series, with the column names as its index.
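
The difference between the two indexers shows up once the index labels no longer match the positions. The following sketch assumes a hypothetical frame with labels 10, 20 and 30.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
frame = pd.DataFrame(
    {'student': ['Jack', 'Mike', 'Rohan']},
    index=[10, 20, 30],
)

by_label = frame.loc[20]       # row whose label is 20 -> Mike
by_position = frame.iloc[2]    # third row by position -> Rohan

try:
    frame.loc[2]               # there is no label 2
    label_found = True
except KeyError:
    label_found = False

print(by_label['student'], by_position['student'], label_found)
```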

## 6. Various Assignments and Operations on Pandas DataFrame

Let’s create a second DataFrame and this time, in the columns parameter, let’s add a column that was not present in our dictionary.

In [13]:
dataflair_df2= pd.DataFrame(data, columns=['student','marks','year','subjects'])

In [14]:
dataflair_df2

Unnamed: 0,student,marks,year,subjects
0,Jack,9.8,1,
1,Mike,6.7,2,
2,Rohan,8.0,3,
3,Zubair,9.9,1,


The column ‘subjects’ was never a part of our original dictionary. Pandas therefore treats all the values of the column ‘subjects’ as missing and represents them as ‘NaN’.

A cool feature of Pandas is that you can assign a single constant value to a whole column. For example:

In [15]:
dataflair_df2['subjects']=4

In [16]:
dataflair_df2

Unnamed: 0,student,marks,year,subjects
0,Jack,9.8,1,4
1,Mike,6.7,2,4
2,Rohan,8.0,3,4
3,Zubair,9.9,1,4


This gives us a DataFrame with the ‘subjects’ column containing the value 4 in every row.

We can also assign a Series to a column of a DataFrame. To see how that works, let us first create a Series.

In [18]:
ser=pd.Series([2,3,],index=[1,3])

In [19]:
print(ser)

1    2
3    3
dtype: int64


Then we will assign it to our ‘subjects’ column:

In [20]:
dataflair_df2['subjects']=ser

In [21]:
dataflair_df2

Unnamed: 0,student,marks,year,subjects
0,Jack,9.8,1,
1,Mike,6.7,2,2.0
2,Rohan,8.0,3,
3,Zubair,9.9,1,3.0


In the above output, 1 and 3 are the index labels of the values in the Series. When a Series is assigned to a DataFrame column, its values are placed only at the matching index labels. The rows whose labels are not present in the Series get a missing value. Because NaN is a floating-point value, the column is stored as floats, which is why 2 and 3 appear as 2.0 and 3.0.

We can also assign the result of a boolean comparison to a column. Let’s create a new column called ‘grade’:
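
The alignment is by label, not by order. The sketch below uses a hypothetical Series whose labels are out of order and include one label (7) that the frame does not have.

```python
import pandas as pd

frame = pd.DataFrame({'student': ['Jack', 'Mike', 'Rohan', 'Zubair']})

# Hypothetical Series: labels out of order, and label 7 is not in the frame.
partial = pd.Series([30, 10, 70], index=[3, 1, 7])

frame['subjects'] = partial    # values are aligned on index labels

print(frame)
```

Row 1 receives 10 and row 3 receives 30 regardless of their order in the Series, rows 0 and 2 become NaN, and the value for label 7 is dropped because no row carries that label.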

In [22]:
dataflair_df2['grade']=dataflair_df2.marks>8

In [23]:
dataflair_df2

Unnamed: 0,student,marks,year,subjects,grade
0,Jack,9.8,1,,True
1,Mike,6.7,2,2.0,False
2,Rohan,8.0,3,,False
3,Zubair,9.9,1,3.0,True


This creates a new column grade and fills each row with the boolean result of evaluating dataflair_df2.marks>8 for that row. Each value is either True or False; note that Rohan’s mark of exactly 8.0 gives False because the comparison is strict.
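
The same boolean Series can also be used as a mask to keep only the rows where it is True. This is a small sketch on a fresh copy of the student data; the names `mask` and `top` are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({
    'student': ['Jack', 'Mike', 'Rohan', 'Zubair'],
    'marks': [9.8, 6.7, 8, 9.9],
})

mask = frame['marks'] > 8    # strict comparison: 8.0 itself gives False
top = frame[mask]            # keep only the rows where the mask is True

print(top)
```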

## 7. How to Delete Columns in Pandas DataFrame?

To delete a column in a Pandas DataFrame, all we need to do is use the ***del*** statement:

In [24]:
del dataflair_df2['grade']

In [25]:
dataflair_df2 

Unnamed: 0,student,marks,year,subjects
0,Jack,9.8,1,
1,Mike,6.7,2,2.0
2,Rohan,8.0,3,
3,Zubair,9.9,1,3.0
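
The `del` statement changes the DataFrame in place. As an alternative sketch, `drop(columns=...)` returns a new DataFrame and leaves the original untouched. The frame below is a hypothetical two-row example.

```python
import pandas as pd

frame = pd.DataFrame({
    'student': ['Jack', 'Mike'],
    'marks': [9.8, 6.7],
    'grade': [True, False],
})

trimmed = frame.drop(columns=['grade'])   # new frame without 'grade'
del frame['marks']                        # removes 'marks' from frame itself

print(list(trimmed.columns))
print(list(frame.columns))
```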


## 8. How to Delete Rows in Pandas DataFrame?

Pandas uses the ***.drop()*** method to remove rows and columns. It returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

To remove rows according to the index, we do the following:

In [28]:
dataflair_df

Unnamed: 0,student,year,marks
0,Jack,1,9.8
1,Mike,2,6.7
2,Rohan,3,8.0
3,Zubair,1,9.9


In [29]:
dataflair_df.drop([1])

Unnamed: 0,student,year,marks
0,Jack,1,9.8
2,Rohan,3,8.0
3,Zubair,1,9.9
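
Note that the remaining rows keep their original labels (0, 2, 3). A minimal sketch of renumbering them afterwards with `reset_index`, using a hypothetical three-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'student': ['Jack', 'Mike', 'Rohan']})

dropped = frame.drop([1])                    # new frame without the row labeled 1
renumbered = dropped.reset_index(drop=True)  # relabel the rows 0, 1, ...

print(list(dropped.index))
print(list(renumbered.index))
print(len(frame))   # the original frame still has all its rows
```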


## 9. Pandas DataFrame with Nested Dictionaries

In [31]:
nested_dict={'fruits':{'apple':40,'orange':20,'bananas':25,'grapes':30}, 'vegetables':{'carrot':20,'beans':16,'peas':30,'onion':25}}

In [32]:
nested_dict

{'fruits': {'apple': 40, 'orange': 20, 'bananas': 25, 'grapes': 30},
 'vegetables': {'carrot': 20, 'beans': 16, 'peas': 30, 'onion': 25}}

In [33]:
dataflair_df3 = pd.DataFrame(nested_dict)

In [34]:
dataflair_df3

Unnamed: 0,fruits,vegetables
apple,40.0,
orange,20.0,
bananas,25.0,
grapes,30.0,
carrot,,20.0
beans,,16.0
peas,,30.0
onion,,25.0


Therefore, we get a Pandas DataFrame whose index is made up of all the keys of the nested dictionaries, with one column per outer key. A key that is present in one inner dictionary but not in the other is represented as a missing value in the column where it is absent.
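
When the inner dictionaries share a key, that key produces a single row with a value in both columns. The sketch below uses hypothetical shop data in which only `apple` appears in both inner dictionaries.

```python
import pandas as pd

# Hypothetical data: 'apple' is the only key present in both inner dicts.
stock = {
    'shop_a': {'apple': 40, 'orange': 20},
    'shop_b': {'apple': 35, 'peas': 30},
}

frame = pd.DataFrame(stock)

print(frame)
```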

## 10. How to Transpose Pandas DataFrames?

In [35]:
dataflair_df3.T

Unnamed: 0,apple,orange,bananas,grapes,carrot,beans,peas,onion
fruits,40.0,20.0,25.0,30.0,,,,
vegetables,,,,,20.0,16.0,30.0,25.0
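
The `.T` attribute swaps rows and columns: the column labels become the index and the index labels become the columns. A small sketch on a hypothetical two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'fruit': ['Guava', 'Apple'], 'price': [40, 120]})

flipped = frame.T    # column labels become the index, and vice versa

print(flipped)
print(flipped.shape)
```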


## 11. Iterating over the Rows and Columns of Dataframe


In [36]:
dataflair_new= { 'fruit': ["Guava", "Apple", "Oranges"], 'price':[40, 120, 60]}

In [37]:
dataflair_df= pd.DataFrame(dataflair_new)

In [38]:
dataflair_df

Unnamed: 0,fruit,price
0,Guava,40
1,Apple,120
2,Oranges,60


Then we iterate over the rows using the iterrows() method, which yields each index label together with the row as a Series.

In [39]:
for i,j in dataflair_df.iterrows():
    print(i,j)
    print()

0 fruit    Guava
price       40
Name: 0, dtype: object

1 fruit    Apple
price      120
Name: 1, dtype: object

2 fruit    Oranges
price         60
Name: 2, dtype: object



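To iterate over the columns, we use the items() method (it was called iteritems() in older releases; that name was removed in pandas 2.0):

In [40]:
for i,j in dataflair_df.items():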
    print(i,j)
    print()

fruit 0      Guava
1      Apple
2    Oranges
Name: fruit, dtype: object

price 0     40
1    120
2     60
Name: price, dtype: int64
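
As a compact sketch of column-wise and row-wise iteration on the same fruit data, the snippet below uses `items()` for columns and `itertuples()` for rows; `itertuples()` is an alternative to `iterrows()` that yields lightweight named tuples. The variable names are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame({
    'fruit': ['Guava', 'Apple', 'Oranges'],
    'price': [40, 120, 60],
})

# Column-wise: each step yields the column label and the column as a Series.
labels = [label for label, column in frame.items()]

# Row-wise: each step yields a named tuple with one field per column.
total = 0
for row in frame.itertuples(index=False):
    total += row.price

print(labels, total)
```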



## 12. How to Rename a Column in Pandas DataFrames?

We can rename columns using the ***.rename()*** method. Here, index=str additionally converts the row labels to strings.

In [46]:
dataflair_df.rename(index=str, columns={"fruit": "a", "price": "c"})


Unnamed: 0,a,c
0,Guava,40
1,Apple,120
2,Oranges,60
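
Like `.drop()`, `.rename()` returns a new DataFrame rather than changing the original. A minimal sketch on a hypothetical one-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'fruit': ['Guava'], 'price': [40]})

renamed = frame.rename(columns={'fruit': 'name'})   # new frame, new label
as_text = frame.rename(index=str)                   # row labels become strings

print(list(renamed.columns))
print(list(frame.columns))      # the original labels are unchanged
print(list(as_text.index))
```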


## 13. Stacking and Unstacking of DataFrames

Using the .stack() method, we can get a long version of a wide DataFrame: the column labels become an inner level of the row index.

In [51]:
dataflair_df

Unnamed: 0,fruit,price
0,Guava,40
1,Apple,120
2,Oranges,60


In [47]:
dataflair_st=dataflair_df.stack()
dataflair_st

0  fruit      Guava
   price         40
1  fruit      Apple
   price        120
2  fruit    Oranges
   price         60
dtype: object

We can unstack this stacked data using the .unstack() method, which turns the inner index level back into columns and recovers the original table.

In [49]:
dataflair_ust=dataflair_st.unstack()

In [50]:
dataflair_ust

Unnamed: 0,fruit,price
0,Guava,40
1,Apple,120
2,Oranges,60
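
The round trip can be sketched on a small hypothetical frame: `.stack()` produces a Series indexed by (row label, column label), and calling `.unstack()` on that Series brings the column labels back.

```python
import pandas as pd

frame = pd.DataFrame({'fruit': ['Guava', 'Apple'], 'price': [40, 120]})

long_form = frame.stack()          # Series indexed by (row label, column label)
wide_again = long_form.unstack()   # inner index level becomes columns again

print(long_form)
print(wide_again)
```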

## 14. Setting a List as an Index in Pandas DataFrames

We can set a Python list to be the index of the DataFrame. But we need to make sure that the list contains the same number of elements as there are rows in the DataFrame.

In [52]:
index_labels=['one','two', 'three']

In [53]:
dataflair_df.index=index_labels

In [54]:
dataflair_df

Unnamed: 0,fruit,price
one,Guava,40
two,Apple,120
three,Oranges,60
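
If the list length does not match the number of rows, pandas raises a ValueError. The sketch below shows both the valid and the invalid case on the fruit data; the two-element list is a hypothetical wrong input.

```python
import pandas as pd

frame = pd.DataFrame({
    'fruit': ['Guava', 'Apple', 'Oranges'],
    'price': [40, 120, 60],
})

frame.index = ['one', 'two', 'three']   # 3 labels for 3 rows: accepted

try:
    frame.index = ['a', 'b']            # 2 labels for 3 rows: rejected
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print(err)
```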


## 15. Selecting Values from a DataFrame According to Index

In [56]:
dataflair_df.loc['one']

fruit    Guava
price       40
Name: one, dtype: object

## 16. Working with Missing Values

Missing values in Pandas DataFrames are represented using NaN. Pandas provides methods to detect, replace, or drop such missing data.

Create a dataset like the following:

In [57]:
dataflair_dict={'Data':[1, np.nan, 8, 9, np.nan], 'name':["Ron","Harry","Hermione","Neville","Dobby"]}

In [58]:
dataflair_dict

{'Data': [1, nan, 8, 9, nan],
 'name': ['Ron', 'Harry', 'Hermione', 'Neville', 'Dobby']}

In [59]:
dataflair_pdx= pd.DataFrame(dataflair_dict)

In [60]:
dataflair_pdx

Unnamed: 0,Data,name
0,1.0,Ron
1,,Harry
2,8.0,Hermione
3,9.0,Neville
4,,Dobby


We can generate a boolean table which gives the value True for every entry that is missing.

In [61]:
dataflair_pdx.isnull()


Unnamed: 0,Data,name
0,False,False
1,True,False
2,False,False
3,False,False
4,True,False


To replace the missing data with a constant value of our choice, we use .fillna()

In [62]:
dataflair_pdx.fillna('Not available')

Unnamed: 0,Data,name
0,1,Ron
1,Not available,Harry
2,8,Hermione
3,9,Neville
4,Not available,Dobby


We can drop all rows that contain missing data using the .dropna() method:

In [63]:
dataflair_pdx.dropna()

Unnamed: 0,Data,name
0,1.0,Ron
2,8.0,Hermione
3,9.0,Neville
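
These methods also accept per-column options. The sketch below counts the missing values per column, fills only the `Data` column (here with its mean, an illustrative choice rather than a recommendation), and drops rows based on a single column. The frame is a shortened, hypothetical version of the one above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'Data': [1, np.nan, 8],
    'name': ['Ron', 'Harry', 'Hermione'],
})

missing_per_column = frame.isnull().sum()               # count of NaN per column
filled = frame.fillna({'Data': frame['Data'].mean()})   # fill only 'Data'
complete = frame.dropna(subset=['Data'])                # drop rows lacking 'Data'

print(missing_per_column)
print(filled)
print(complete)
```

Both `fillna()` and `dropna()` return new DataFrames, so `frame` itself still contains the NaN afterwards.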
