## 2. Dealing with Rows and Columns
                                                       
--> A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

--> We can perform basic operations on rows/columns like selecting, deleting, adding, and renaming.

$Column$ $Selection$ $:-$

--> To select a column in a Pandas DataFrame, we access it by its column name.

--> In order to deal with columns, we perform basic operations on columns like selecting, deleting, adding and renaming.

In [1]:
import pandas as pd  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
print(df) 
# select two columns
print(df[['Name', 'Qualification']])

     Name  Age    Address Qualification
0     Jai   27      Delhi           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannauj           Phd
     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd
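
One detail worth knowing when selecting by name: a single name in single brackets gives back a Series, while a list of names gives back a DataFrame. A minimal sketch, using a cut-down copy of the employee data (the variable names `names` and `names_frame` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})

# A single column name returns a Series
names = df['Name']

# A list of column names returns a DataFrame, even with one name in it
names_frame = df[['Name']]

print(type(names).__name__)        # Series
print(type(names_frame).__name__)  # DataFrame
```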


    Select the second to fourth columns.

In [2]:
df[df.columns[1:4]]

   Age    Address Qualification
0   27      Delhi           Msc
1   24     Kanpur            MA
2   22  Allahabad           MCA
3   32    Kannauj           Phd


    Method #2:-
    
$Using$ $loc[]$

Example 1: Select two columns

In [3]:
# select three rows and two columns
df.loc[1:3, ['Name', 'Qualification']]

     Name Qualification
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd


    Example 2:-

Select a range of columns from one label to another. Here we select the columns from “Name” to “Address”.

In [4]:
# select two rows and 
# column "name" to "Address"
# Means total three columns
df.loc[0:1, 'Name':'Address']

     Name  Age Address
0     Jai   27   Delhi
1  Princi   24  Kanpur


    Example 3:-
    
Select a single row by its label, keeping all columns.

In [5]:
# .loc selects rows and columns by label
# format: df.loc[rows, columns]
# first row (label 0), all columns
df.loc[0, :]

Name               Jai
Age                 27
Address          Delhi
Qualification      Msc
Name: 0, dtype: object
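
Besides labels and slices, the row position of loc[] also accepts a boolean condition, which is the usual way to filter rows. A minimal sketch on the same employee data; the threshold of 25 and the name `older` are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32],
                   'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
                   'Qualification': ['Msc', 'MA', 'MCA', 'Phd']})

# Rows where Age is greater than 25, keeping only two columns
older = df.loc[df['Age'] > 25, ['Name', 'Age']]
print(older)
```

The original row labels (0 and 3 here) are kept in the result.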

     Method #3:- 
     
$Using$ $iloc[]$

Example 1: Select the first two columns.

In [6]:
# Remember that Python slices exclude
# the ending index.
# select all rows and
# the first two columns
df.iloc[:, 0:2]

     Name  Age
0     Jai   27
1  Princi   24
2  Gaurav   22
3    Anuj   32


    Example 2:-
    
Select a range of rows and a range of columns by position using $.iloc$.

In [7]:
# iloc[row slicing, column slicing]
df.iloc[0:2, 1:3]

   Age Address
0   27   Delhi
1   24  Kanpur
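
The two indexers treat the end of a slice differently: loc[] slices by label and includes the end label, while iloc[] slices by position and excludes the end position. A small sketch of the difference, on a cut-down copy of the employee data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})

# loc slices by label and includes the end label: rows 0, 1 and 2
by_label = df.loc[0:2]

# iloc slices by position and excludes the end position: rows 0 and 1
by_position = df.iloc[0:2]

print(len(by_label), len(by_position))  # 3 2
```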


$Column$ $Addition$ $:-$

    Adding new column to existing DataFrame in Pandas
    
    Method #1:-

By declaring a new list as a column.

In [8]:
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
  
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
  
# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
  
# Using 'Address' as the column name
# and equating it to the list
df['Address'] = address
  
# Observe the result
df

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna


    NOTE:-
    
The length of the list must match the number of rows in the DataFrame (the length of its index); otherwise pandas raises a ValueError.
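
The length check can be seen with a short sketch. The three-item list below is deliberately one value short, and the `raised` flag exists only to record what happened:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']})

try:
    # Only 3 values for 4 rows, so the assignment is rejected
    df['Address'] = ['Delhi', 'Bangalore', 'Chennai']
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# The DataFrame is left without the new column
print(list(df.columns))
```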



    Method #2:-
    
By using $DataFrame.insert()$

--> It gives the freedom to add a column at any position we like and not just at the end. 

--> It also provides different options for inserting the column values.

      Syntax:- DataFrameName.insert(loc, column, value, allow_duplicates = False)
     
     Parameters: 
 

$loc:-$ An integer giving the position at which to insert the new column. The column currently at that position, and every column after it, shifts to the right. 

$column:-$ A string giving the name of the column to be inserted.

$value:-$ The value to insert. It can be a scalar (int, float, string) or a Series / list of values. A single scalar is repeated for every row. 

$allow$$_$$duplicates :-$ A boolean. When False (the default), insert() raises an error if a column with the same name already exists; when True, the duplicate name is allowed. 

In [9]:
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
  
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print('before\n',df)  

#inserting a new column in the position 2
df.insert(2,'Age',20)
print('after1\n',df)

#inserting  different value in one column
df.insert(3,'weight',[76.9,56.12,49.00,64])
print('after2\n',df)

#let allow a duplicate column
df.insert(2,'Age',20,True)
print('after\n',df)

before
      Name  Height Qualification
0     Jai     5.1           Msc
1  Princi     6.2            MA
2  Gaurav     5.1           Msc
3    Anuj     5.2           Msc
after1
      Name  Height  Age Qualification
0     Jai     5.1   20           Msc
1  Princi     6.2   20            MA
2  Gaurav     5.1   20           Msc
3    Anuj     5.2   20           Msc
after2
      Name  Height  Age  weight Qualification
0     Jai     5.1   20   76.90           Msc
1  Princi     6.2   20   56.12            MA
2  Gaurav     5.1   20   49.00           Msc
3    Anuj     5.2   20   64.00           Msc
after
      Name  Height  Age  Age  weight Qualification
0     Jai     5.1   20   20   76.90           Msc
1  Princi     6.2   20   20   56.12            MA
2  Gaurav     5.1   20   20   49.00           Msc
3    Anuj     5.2   20   20   64.00           Msc
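
To see what allow_duplicates guards against, the sketch below tries the same insert twice on a small two-row frame made up for illustration: first with the default, then with the duplicate explicitly allowed.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [20, 20]})

try:
    # 'Age' already exists and allow_duplicates defaults to False
    df.insert(1, 'Age', 30)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Opting in explicitly makes the same call succeed
df.insert(1, 'Age', 30, allow_duplicates=True)
print(list(df.columns))  # ['Name', 'Age', 'Age']
```

Duplicate column names make later selections such as df['Age'] return several columns at once, so the default is usually the safer choice.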


     Passing series with different value for each row:-
     
In this example, a Series is created and filled with values in a for loop. The Series is then passed to insert() to add it to the DataFrame as a new column.
 

In [10]:
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
#creating the data frame
df=pd.DataFrame(data)
print('before\n',df)
# creating an empty Series that shares the DataFrame's index
# (object dtype, because the ages are stored as strings)
age=pd.Series(index=df.index, dtype=object)

# running a for loop and assigning some values to series
for i in range(len(df)):
    # the comparison is case-sensitive, so 'Jai' must match exactly
    if df['Name'][i]=='Jai':
        age[i]='36'

    elif df['Name'][i]=='Princi':
        age[i]='43'

    elif df['Name'][i]=='Gaurav':
        age[i]='38'

    else:
        age[i]='29'
df.insert(2,'Age',age)
print('After\n',df)

before
      Name  Height Qualification
0     Jai     5.1           Msc
1  Princi     6.2            MA
2  Gaurav     5.1           Msc
3    Anuj     5.2           Msc
After
      Name  Height Age Qualification
0     Jai     5.1  36           Msc
1  Princi     6.2  43            MA
2  Gaurav     5.1  38           Msc
3    Anuj     5.2  29           Msc




    Method #3:-
    
$Using$  $Dataframe.assign()$  $method$ $:-$

This method returns a new DataFrame with the new column added; the original DataFrame is left unchanged.

               Syntax: DataFrame.assign(**kwargs)

              Parameters:-
              
$kwargs$ $:-$

Keywords are the column names. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though pandas doesn’t check it). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned.

$Returns$ $:-$

A new DataFrame with the new columns in addition to all the existing columns.



In [11]:
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df=pd.DataFrame(data)
# add 0.2 to Height; the lambda receives the DataFrame as x
df1=df.assign(New_Height= lambda x: x['Height']+0.2)
df1

     Name  Height Qualification  New_Height
0     Jai     5.1           Msc         5.3
1  Princi     6.2            MA         6.4
2  Gaurav     5.1           Msc         5.3
3    Anuj     5.2           Msc         5.4


     Example #2:-

Assigning more than one column at a time

In [12]:
df2=df.assign(Age = [45,25,31,29], 
            New_Height= lambda x: x['Height']+0.2)
df2

     Name  Height Qualification  Age  New_Height
0     Jai     5.1           Msc   45         5.3
1  Princi     6.2            MA   25         6.4
2  Gaurav     5.1           Msc   31         5.3
3    Anuj     5.2           Msc   29         5.4
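
Because assign() returns a new DataFrame, the original keeps its columns until the result is assigned back. A minimal sketch on a two-row frame (the names `taller` and `frame` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# The lambda receives the DataFrame that assign() was called on
taller = df.assign(New_Height=lambda frame: frame['Height'] + 0.2)

print(list(df.columns))      # ['Name', 'Height']
print(list(taller.columns))  # ['Name', 'Height', 'New_Height']
```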


       Method #4:-

By using a dictionary

We can use a Python dictionary to add a new column to a pandas DataFrame. The keys are the values of an existing column, and the value stored under each key becomes the new column's entry for the matching row.

In [13]:

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
  
# Define a dictionary whose keys are the values of
# an existing column ('Name') and whose values are
# the entries for our new column.
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Patna', 'Anuj': 'Chennai'}
  
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
  
# Look up each name in the dictionary to build 'Address'
df['Address'] = df['Name'].map(address)

# Observe the output
df

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc      Patna
3    Anuj     5.2           Msc    Chennai


$Column$ $Deletion$ $:-$

--> To delete a column from a Pandas DataFrame, we can use the drop() method. Columns are deleted by passing their column names.

    Syntax:- DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

      Parameters:

$labels$ $:-$

String or list of strings naming the rows or columns to drop.

$axis$ $:-$ 

int or string value: 0 or 'index' for rows, 1 or 'columns' for columns.

$index$ $or$ $columns$ $:-$

Single label or list. These are an alternative to labels plus axis and cannot be combined with labels.

$level$ $:-$

Specifies the level to drop from when the DataFrame has a multi-level index.

$inplace$ $:-$

If True, modifies the original DataFrame instead of returning a new one.

$errors$ $:-$

With errors = 'ignore', labels that do not exist are skipped and the remaining labels are dropped.

$Return$ $type$ $:-$

DataFrame with the specified labels dropped



In [15]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv", index_col ="Name" )
print(data)  
# dropping passed values
data.drop(["Avinas", "Manish", "Sagar",
                            "Ansuman"], inplace = True)
  
# display
data

                Regd.No  Roll No Branch Section  Group  Mark
Name                                                        
Ankit        1801289043       17    CSE       A      1  8.36
Chiranjibi   1801289039       38   MECH       D      4  6.75
Ansuman      1801289071       28    EEE       B      1  8.97
Saurav       1801289053       45    ETC       C      3  5.87
Sagar        1801289091       34     IT       A      2  6.93
Ashis        1801289042       29    CSE       B      3  6.97
Dipanshu     1801289077       57  CIVIL       B      4  8.76
Pravas       1801289030       39   MECH       A      1  8.34
Lenka        1801289022       31    CSE       D      2  6.45
Avinas       1801289036       55    ETC       C      4  7.65
Arkasarathi  1801289047       21    CSE       B      3  8.63
Manish       1801289054       22     IT       D      1  7.12
Lal Krishna  1801289063       42    CSE       C      1  7.65
Ganesh       1801289049       54    CSE       C      4  6.56
Abhijit      1801289078 

                Regd.No  Roll No Branch Section  Group  Mark
Name
Ankit        1801289043       17    CSE       A      1  8.36
Chiranjibi   1801289039       38   MECH       D      4  6.75
Saurav       1801289053       45    ETC       C      3  5.87
Ashis        1801289042       29    CSE       B      3  6.97
Dipanshu     1801289077       57  CIVIL       B      4  8.76
Pravas       1801289030       39   MECH       A      1  8.34
Lenka        1801289022       31    CSE       D      2  6.45
Arkasarathi  1801289047       21    CSE       B      3  8.63
Lal Krishna  1801289063       42    CSE       C      1  7.65
Ganesh       1801289049       54    CSE       C      4  6.56


     Example #2 :-Dropping columns with column name

In this code, the passed columns are dropped by column name. The axis parameter is set to 1 because 1 refers to columns.

In [16]:
# dropping passed columns
data.drop(["Branch","Group"], axis = 1, inplace = True)
  
# display
data

                Regd.No  Roll No Section  Mark
Name
Ankit        1801289043       17       A  8.36
Chiranjibi   1801289039       38       D  6.75
Saurav       1801289053       45       C  5.87
Ashis        1801289042       29       B  6.97
Dipanshu     1801289077       57       B  8.76
Pravas       1801289030       39       A  8.34
Lenka        1801289022       31       D  6.45
Arkasarathi  1801289047       21       B  8.63
Lal Krishna  1801289063       42       C  7.65
Ganesh       1801289049       54       C  6.56
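
The cells above depend on a local CSV file. As a self-contained sketch of the same drop() options, the frame below is built inline from three of the rows shown; the label 'Nobody' is a made-up name that is deliberately absent.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ankit', 'Saurav', 'Lenka'],
                   'Branch': ['CSE', 'ETC', 'CSE'],
                   'Mark': [8.36, 5.87, 6.45]}).set_index('Name')

# columns= is an alternative to passing labels together with axis=1
without_branch = df.drop(columns=['Branch'])

# errors='ignore' skips labels that do not exist instead of raising
fewer_rows = df.drop(index=['Saurav', 'Nobody'], errors='ignore')

print(list(without_branch.columns))  # ['Mark']
print(list(fewer_rows.index))        # ['Ankit', 'Lenka']
```

Neither call uses inplace=True, so df itself still has all three rows and both columns.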


     Dealing with Rows:-
     
In order to deal with rows, we can perform basic operations on rows like selecting, deleting, adding and renaming.

$Row$ $Selection$ $:-$
    
--> The DataFrame.loc[] indexer retrieves rows from a Pandas DataFrame by index label.

--> Rows can also be selected by integer position with the iloc[] indexer.

     Syntax: pandas.DataFrame.loc[]

    Parameters:
    
$Index$ $label$ $:-$

--> A string, or a list of strings, giving the index label(s) of the rows

$Return$ $type$ $:-$

Data frame or Series depending on parameters

     Example #1: Extracting single Row

In this example, the Name column is the index, and two single rows are extracted one at a time by index label. Each row is returned as a Series.


In [17]:
# retrieving row by loc method
first = data.loc["Ganesh"]
second = data.loc["Pravas"]
  
  
print(first, "\n\n\n", second)

Regd.No    1801289049
Roll No            54
Section             C
Mark             6.56
Name: Ganesh, dtype: object 


 Regd.No    1801289030
Roll No            39
Section             A
Mark             8.34
Name: Pravas, dtype: object


     Example #2: Multiple parameters

In this example, the Name column is the index, and two rows are extracted at the same time by passing a list of labels. The result is a DataFrame.

In [18]:
# retrieving rows by loc method
rows = data.loc[["Lenka", "Ashis"]]
  
# checking data type of rows
print(type(rows))
  
# display
rows

<class 'pandas.core.frame.DataFrame'>


          Regd.No  Roll No Section  Mark
Name
Lenka  1801289022       31       D  6.45
Ashis  1801289042       29       B  6.97


     Example #3: Extracting multiple rows with same index

In this example, the $Branch$ column is made the index, and one $Branch$ name is passed to $.loc$ to check that all rows with that branch name are returned.

In [19]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv", index_col ="Branch" )
print(data)  

# retrieving rows by loc method
rows = data.loc["CSE"]
  
# checking data type of rows
print(type(rows))
  
# display
rows

               Name     Regd.No  Roll No Section  Group  Mark
Branch                                                       
CSE           Ankit  1801289043       17       A      1  8.36
MECH     Chiranjibi  1801289039       38       D      4  6.75
EEE         Ansuman  1801289071       28       B      1  8.97
ETC          Saurav  1801289053       45       C      3  5.87
IT            Sagar  1801289091       34       A      2  6.93
CSE           Ashis  1801289042       29       B      3  6.97
CIVIL      Dipanshu  1801289077       57       B      4  8.76
MECH         Pravas  1801289030       39       A      1  8.34
CSE           Lenka  1801289022       31       D      2  6.45
ETC          Avinas  1801289036       55       C      4  7.65
CSE     Arkasarathi  1801289047       21       B      3  8.63
IT           Manish  1801289054       22       D      1  7.12
CSE     Lal Krishna  1801289063       42       C      1  7.65
CSE          Ganesh  1801289049       54       C      4  6.56
CIVIL   

               Name     Regd.No  Roll No Section  Group  Mark
Branch
CSE           Ankit  1801289043       17       A      1  8.36
CSE           Ashis  1801289042       29       B      3  6.97
CSE           Lenka  1801289022       31       D      2  6.45
CSE     Arkasarathi  1801289047       21       B      3  8.63
CSE     Lal Krishna  1801289063       42       C      1  7.65
CSE          Ganesh  1801289049       54       C      4  6.56
CSE          Amitab  1801289017       11       A      3  6.87
CSE        Ayaskant  1801289056       30       B      2  7.59


      Example #4: Extracting rows between two index labels

In this example, two index labels are passed as a slice, and all the rows between them are returned (both labels inclusive).

In [21]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv", index_col ="Name" )
print(data)  

# retrieving rows by loc method
rows = data.loc["Saurav":"Amitab"]
  
# checking data type of rows
print(type(rows))
  
# display
rows

                Regd.No  Roll No Branch Section  Group  Mark
Name                                                        
Ankit        1801289043       17    CSE       A      1  8.36
Chiranjibi   1801289039       38   MECH       D      4  6.75
Ansuman      1801289071       28    EEE       B      1  8.97
Saurav       1801289053       45    ETC       C      3  5.87
Sagar        1801289091       34     IT       A      2  6.93
Ashis        1801289042       29    CSE       B      3  6.97
Dipanshu     1801289077       57  CIVIL       B      4  8.76
Pravas       1801289030       39   MECH       A      1  8.34
Lenka        1801289022       31    CSE       D      2  6.45
Avinas       1801289036       55    ETC       C      4  7.65
Arkasarathi  1801289047       21    CSE       B      3  8.63
Manish       1801289054       22     IT       D      1  7.12
Lal Krishna  1801289063       42    CSE       C      1  7.65
Ganesh       1801289049       54    CSE       C      4  6.56
Abhijit      1801289078 

                Regd.No  Roll No Branch Section  Group  Mark
Name
Saurav       1801289053       45    ETC       C      3  5.87
Sagar        1801289091       34     IT       A      2  6.93
Ashis        1801289042       29    CSE       B      3  6.97
Dipanshu     1801289077       57  CIVIL       B      4  8.76
Pravas       1801289030       39   MECH       A      1  8.34
Lenka        1801289022       31    CSE       D      2  6.45
Avinas       1801289036       55    ETC       C      4  7.65
Arkasarathi  1801289047       21    CSE       B      3  8.63
Manish       1801289054       22     IT       D      1  7.12
Lal Krishna  1801289063       42    CSE       C      1  7.65
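
Row selection by integer position with iloc[] works regardless of what the index labels are. A minimal sketch on a small inline frame built from four of the names above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ankit', 'Saurav', 'Lenka', 'Ganesh'],
                   'Mark': [8.36, 5.87, 6.45, 6.56]}).set_index('Name')

first_row = df.iloc[0]      # a Series for the row at position 0
middle_rows = df.iloc[1:3]  # a DataFrame with positions 1 and 2
last_row = df.iloc[-1]      # negative positions count from the end

print(first_row.name, list(middle_rows.index), last_row.name)
```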


$Row$ $Addition$ $:-$
To add a row to a Pandas DataFrame, we can concatenate the old DataFrame with a new one.

    Code #1:
Adding a row at the top of the given DataFrame by concatenating the new row with the old DataFrame.

In [22]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv" )
print(data)  

new_row = pd.DataFrame({'Name':'Amiya', 'Regd.No':1801289035, 'Roll No':12,
                        'Branch':'CSE', 'Section':'A', 'Group':1,
                        'Mark':8.5},index =[0])
# simply concatenate both dataframes
data= pd.concat([new_row, data]).reset_index(drop = True)
data

           Name     Regd.No  Roll No Branch Section  Group  Mark
0         Ankit  1801289043       17    CSE       A      1  8.36
1    Chiranjibi  1801289039       38   MECH       D      4  6.75
2       Ansuman  1801289071       28    EEE       B      1  8.97
3        Saurav  1801289053       45    ETC       C      3  5.87
4         Sagar  1801289091       34     IT       A      2  6.93
5         Ashis  1801289042       29    CSE       B      3  6.97
6      Dipanshu  1801289077       57  CIVIL       B      4  8.76
7        Pravas  1801289030       39   MECH       A      1  8.34
8         Lenka  1801289022       31    CSE       D      2  6.45
9        Avinas  1801289036       55    ETC       C      4  7.65
10  Arkasarathi  1801289047       21    CSE       B      3  8.63
11       Manish  1801289054       22     IT       D      1  7.12
12  Lal Krishna  1801289063       42    CSE       C      1  7.65
13       Ganesh  1801289049       54    CSE       C      4  6.56
14      Abhijit  18012890

         Name     Regd.No  Roll No Branch Section  Group  Mark
0       Amiya  1801289035       12    CSE       A      1  8.50
1       Ankit  1801289043       17    CSE       A      1  8.36
2  Chiranjibi  1801289039       38   MECH       D      4  6.75
3     Ansuman  1801289071       28    EEE       B      1  8.97
4      Saurav  1801289053       45    ETC       C      3  5.87
5       Sagar  1801289091       34     IT       A      2  6.93
6       Ashis  1801289042       29    CSE       B      3  6.97
7    Dipanshu  1801289077       57  CIVIL       B      4  8.76
8      Pravas  1801289030       39   MECH       A      1  8.34
9       Lenka  1801289022       31    CSE       D      2  6.45
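
The same concat() call adds a row at the bottom if the old DataFrame is listed first. A small sketch on an inline two-row frame; ignore_index=True is used here as an alternative to reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ankit', 'Saurav'], 'Mark': [8.36, 5.87]})
new_row = pd.DataFrame({'Name': ['Amiya'], 'Mark': [8.5]})

# Old frame first, new row second; ignore_index renumbers the rows 0..n-1
appended = pd.concat([df, new_row], ignore_index=True)
print(appended)
```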


$Row$ $Deletion$ $:-$

To delete a row from a Pandas DataFrame, we can use the drop() method. Rows are deleted by dropping them by index label.

$Same$ $as$ $column$ $deletion$

$How$ $to$ $get$ $column$ $names$ $in$ $Pandas$ $dataframe$ $:-$
     
     Method #1: Simply iterating over columns

In [23]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv" )

# iterating the columns
for col in data.columns:
    print(col)

Name
Regd.No
Roll No
Branch
Section
Group
Mark


    Method #2: Using columns with dataframe object

In [24]:
# list(data) or
list(data.columns)

['Name', 'Regd.No', 'Roll No', 'Branch', 'Section', 'Group', 'Mark']

     Method #3: The columns.values attribute returns an array of the column names.

In [25]:
list(data.columns.values)

['Name', 'Regd.No', 'Roll No', 'Branch', 'Section', 'Group', 'Mark']

     Method #4: Using the tolist() method on values to get a list of the column names.

In [26]:
data.columns.values.tolist()

['Name', 'Regd.No', 'Roll No', 'Branch', 'Section', 'Group', 'Mark']

     Method #5: Using sorted() method

--> The built-in sorted() function returns the list of column names sorted in alphabetical order.

In [27]:
# using sorted() method
sorted(data)

['Branch', 'Group', 'Mark', 'Name', 'Regd.No', 'Roll No', 'Section']

$How$ $to$ $rename$ $columns$ $in$ $Pandas$ $DataFrame$ $:-$

     Method #1: Using rename() function.

--> One way of renaming the columns in a Pandas dataframe is by using the rename() function.

--> This method is quite useful when we need to rename some selected columns because we need to specify information only for the columns which are to be renamed.

In [28]:
# Define a dictionary containing ICC rankings
rankings = {'test': ['India', 'South Africa', 'England',
                            'New Zealand', 'Australia'],
              'odi': ['England', 'India', 'New Zealand',
                            'South Africa', 'Pakistan'],
               't20': ['Pakistan', 'India', 'Australia',
                              'England', 'New Zealand']}
   
# Convert the dictionary into DataFrame
rankings_pd = pd.DataFrame(rankings)
   
# Before renaming the columns
print(rankings_pd)
   
rankings_pd.rename(columns = {'test':'TEST'}, inplace = True)
   
# After renaming the columns
print("\nAfter modifying first column:\n", rankings_pd.columns)

#rename multiple column
rankings_pd.rename(columns = {'test':'TEST', 'odi':'ODI',
                              't20':'T20'}, inplace = True)
   
# After renaming the columns
print(rankings_pd.columns)

           test           odi          t20
0         India       England     Pakistan
1  South Africa         India        India
2       England   New Zealand    Australia
3   New Zealand  South Africa      England
4     Australia      Pakistan  New Zealand

After modifying first column:
 Index(['TEST', 'odi', 't20'], dtype='object')
Index(['TEST', 'ODI', 'T20'], dtype='object')


     Method #2: By assigning a list of new column names

--> The columns can also be renamed by directly assigning a list containing the new names to the columns attribute of the dataframe object for which we want to rename the columns.

--> The disadvantage of this method is that we need to provide new names for all the columns even if we want to rename only some of them.

In [29]:
# Define a dictionary containing ICC rankings
rankings = {'test': ['India', 'South Africa', 'England',
                            'New Zealand', 'Australia'],
              'odi': ['England', 'India', 'New Zealand',
                            'South Africa', 'Pakistan'],
               't20': ['Pakistan', 'India', 'Australia',
                              'England', 'New Zealand']}
  
# Convert the dictionary into DataFrame
rankings_pd = pd.DataFrame(rankings)
  
# Before renaming the columns
print(rankings_pd.columns)
  
rankings_pd.columns = ['TEST', 'ODI', 'T-20']
  
# After renaming the columns
print(rankings_pd.columns)

Index(['test', 'odi', 't20'], dtype='object')
Index(['TEST', 'ODI', 'T-20'], dtype='object')
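
rename() also accepts a function in place of a dictionary, which is handy when every column follows the same rule. A minimal sketch using a one-row version of the rankings data and the built-in str.upper:

```python
import pandas as pd

rankings_pd = pd.DataFrame({'test': ['India'],
                            'odi': ['England'],
                            't20': ['Pakistan']})

# Any callable that maps an old name to a new name works
upper = rankings_pd.rename(columns=str.upper)

print(list(upper.columns))        # ['TEST', 'ODI', 'T20']
print(list(rankings_pd.columns))  # ['test', 'odi', 't20']
```

Without inplace=True the original DataFrame keeps its old names, as the second print shows.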


$Get$ $unique$ $values$ $from$ $a$ $column$ $in$ $Pandas$ $DataFrame$ $:-$

       Example #1: Get the unique values of column

In [30]:
# create a dictionary with five fields each
data = {
    'A':['A1', 'A2', 'A3', 'A4', 'A5'], 
    'B':['B1', 'B2', 'B3', 'B4', 'B4'], 
    'C':['C1', 'C2', 'C3', 'C3', 'C3'], 
    'D':['D1', 'D2', 'D2', 'D2', 'D2'], 
    'E':['E1', 'E1', 'E1', 'E1', 'E1'] }
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
  
# Get the unique values of 'B' column
print(df.B.unique())
# Get the unique values of 'E' column
print(df.E.unique())

['B1' 'B2' 'B3' 'B4']
['E1']


     Example #2: Get number of unique values in a column

In [31]:
# Get number of unique values in column 'C'
df.C.nunique(dropna = True)

3
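
The dropna argument only matters when the column contains missing values. A short sketch with one NaN added to a made-up Series shows the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series(['C1', 'C2', np.nan, 'C2'])

without_nan = s.nunique()           # NaN is ignored by default
with_nan = s.nunique(dropna=False)  # NaN counts as its own value

print(without_nan, with_nan)  # 2 3
```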

$How$ $to$ $lowercase$ $the$ $values$ $of$ $a$ $column$ $in$ $Pandas$ $dataframe$ $:-$

In [32]:
# creating a simple dataframe
df = pd.DataFrame({'A': ['John', 'bODAY', 'MinA', 'Peter', 'nicky'],
                  'B': ['masters', 'graduate', 'graduate',
                                   'Masters', 'Graduate'],
                  'C': [27, 23, 21, 23, 24]})
print(df)
df['A'] = df['A'].str.lower()
df

       A         B   C
0   John   masters  27
1  bODAY  graduate  23
2   MinA  graduate  21
3  Peter   Masters  23
4  nicky  Graduate  24


       A         B   C
0   john   masters  27
1  boday  graduate  23
2   mina  graduate  21
3  peter   Masters  23
4  nicky  Graduate  24


      Method #2: Using apply() with a lambda

In [33]:
# creating a dataframe
df = pd.DataFrame({'A': ['John', 'bODAY', 'MinA', 'Peter', 'nicky'],
                  'B': ['masters', 'graduate', 'graduate',
                                   'Masters', 'Graduate'],
                  'C': [27, 23, 21, 23, 24]})
   
df['A'].apply(lambda x: x.lower())

0     john
1    boday
2     mina
3    peter
4    nicky
Name: A, dtype: object
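
The cells above lowercase the values stored in a column. To lowercase the column names themselves, the same .str accessor can be used on df.columns. A minimal sketch, with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John'], 'Degree': ['masters'], 'AGE': [27]})

# The .str accessor is also available on the column Index
df.columns = df.columns.str.lower()

print(list(df.columns))  # ['name', 'degree', 'age']
```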

$Upper$ $case$ $works$ $the$ $same$ $way$ $as$ $lower$ $case$ $:-$

In [34]:
  
# Applying upper() method on 'B' column
df['B'].apply(lambda x: x.upper())

0     MASTERS
1    GRADUATE
2    GRADUATE
3     MASTERS
4    GRADUATE
Name: B, dtype: object

$Capitalize$ $first$ $letter$ $of$ $a$ $column$ $in$ $Pandas$ $dataframe$ $:-$

In [35]:
# creating a dataframe
df = pd.DataFrame({'A': ['john', 'bODAY', 'minA', 'Peter', 'nicky'],
                  'B': ['masters', 'graduate', 'graduate',
                                   'Masters', 'Graduate'],
                  'C': [27, 23, 21, 23, 24]})
   
df['A'] = df['A'].str.capitalize()
  
df    

       A         B   C
0   John   masters  27
1  Boday  graduate  23
2   Mina  graduate  21
3  Peter   Masters  23
4  Nicky  Graduate  24


     Method #2: Using lambda with capitalize() method

In [36]:
df['A'].apply(lambda x: x.capitalize())

0     John
1    Boday
2     Mina
3    Peter
4    Nicky
Name: A, dtype: object

$Get$ $n-largest$ $values$ $from$ $a$ $particular$ $column$ $in$ $Pandas$ $DataFrame$ $:-$

In [38]:
# making data frame from csv file
data = pd.read_csv(r"C:\Users\ASUS\Documents\All CSV fILe For Pandas\pd1.csv" )

# five largest values in column 'Roll No'
data.nlargest(5, ['Roll No'])

        Name     Regd.No  Roll No Branch Section  Group  Mark
17     Rahul  1801289086       75    ETC       A      4  9.46
19     Jiban  1801289023       67   MECH       C      3  8.59
18   Swastik  1801289075       59     IT       D      1  9.54
6   Dipanshu  1801289077       57  CIVIL       B      4  8.76
9     Avinas  1801289036       55    ETC       C      4  7.65


$n-smallest$ $works$ $the$ $same$ $way$ $as$ $n-largest$ $:-$

In [39]:
data.nsmallest(5, ['Roll No'])

           Name     Regd.No  Roll No Branch Section  Group  Mark
16       Amitab  1801289017       11    CSE       A      3  6.87
15      Ayusman  1801289084       13    EEE       B      1  9.33
14      Abhijit  1801289078       15  CIVIL       A      2  8.65
0         Ankit  1801289043       17    CSE       A      1  8.36
10  Arkasarathi  1801289047       21    CSE       B      3  8.63
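
For a column without ties, nlargest() gives the same rows as sorting in descending order and taking the head. The sketch below checks this on a small inline frame, so it does not need the CSV file:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ankit', 'Saurav', 'Lenka', 'Ganesh'],
                   'Roll No': [17, 45, 31, 54]})

top2 = df.nlargest(2, 'Roll No')
top2_sorted = df.sort_values('Roll No', ascending=False).head(2)

print(top2)
print(top2.equals(top2_sorted))  # True
```

nsmallest() corresponds to the ascending sort in the same way.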


$Convert$ $a$ $column$ $to$ $row$ $name/index$ $in$ $Pandas$ $:-$

     Method #1: Using set_index() method.

In [40]:
# Creating a dict of lists 
data = {'Name':["Akash", "Geeku", "Pankaj", "Sumitra","Ramlal"],
       'Branch':["B.Tech", "MBA", "BCA", "B.Tech", "BCA"],
       'Score':["80","90","60", "30", "50"],
       'Result': ["Pass","Pass","Pass","Fail","Fail"]}
  
# creating a dataframe 
df = pd.DataFrame(data)

# Using set_index() method on 'Name' column
df = df.set_index('Name')
  
df

         Branch Score Result
Name
Akash    B.Tech    80   Pass
Geeku       MBA    90   Pass
Pankaj      BCA    60   Pass
Sumitra  B.Tech    30   Fail
Ramlal      BCA    50   Fail
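
set_index() can be undone with reset_index(), which turns the index back into an ordinary column. A minimal sketch on a two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Akash', 'Geeku'], 'Score': [80, 90]})

indexed = df.set_index('Name')    # 'Name' becomes the row index
restored = indexed.reset_index()  # 'Name' is a regular column again

print(list(indexed.columns))   # ['Score']
print(list(restored.columns))  # ['Name', 'Score']
```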


    Method #2: Using pivot() method.

To convert a column to the row index, pandas also provides the pivot() method.

Say we want Result to be the row index and the values of name to be the columns. pivot() does this as follows:

In [41]:
# Creating a dict of lists
data = {'name':["Akash", "Geeku", "Pankaj", "Sumitra", "Ramlal"],
       'Branch':["B.Tech", "MBA", "BCA", "B.Tech", "BCA"],
       'Score':["80", "90", "60", "30", "50"],
       'Result': ["Pass", "Pass", "Pass", "Fail", "Fail"]}
  
df = pd.DataFrame(data)
  
# pivoting the dataframe
df.pivot(index ='Result', columns ='name')
  


        Branch                             Score
name     Akash Geeku Pankaj Ramlal Sumitra Akash Geeku Pankaj Ramlal Sumitra
Result
Fail       NaN   NaN    NaN    BCA  B.Tech   NaN   NaN    NaN     50      30
Pass    B.Tech   MBA    BCA    NaN     NaN    80    90     60    NaN     NaN
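
When only one of the remaining columns is of interest, passing values= to pivot() keeps the result to a single level of column headers. A minimal sketch with a three-row version of the data, using numeric scores for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Akash', 'Geeku', 'Sumitra'],
                   'Score': [80, 90, 30],
                   'Result': ['Pass', 'Pass', 'Fail']})

# values= restricts the result to one column, avoiding a two-level header
scores = df.pivot(index='Result', columns='name', values='Score')
print(scores)
```

Combinations that do not occur in the data, such as a failing score for Akash, appear as NaN.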
