Once your data is clean, the next step is to reshape, reformat, and reorder it as
needed for analysis. Pandas gives you plenty of flexible tools to do this.

**Data Loading**

In [3]:
import pandas as pd

In [5]:
df = pd.read_csv("B'Wood Movies Data.csv")
df

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9


**1. Sorting**- Reordering the rows of a DataFrame by the values in one or more columns, or by its index.

**i. Sort by Values.**

In [6]:
# Sorting in Ascending Order
df.sort_values('Year')

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
7,Hrithik Roshan,War,2019,Action,475,6.5
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [7]:
# Sorting in Descending Order
df.sort_values("Year", ascending=False)

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
9,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0


In [8]:
# Sorting by multiple columns. When rows tie on the first column, the next column
# breaks the tie. Here, rows with the same "Year" are ordered by "IMDb", and because
# the sort is ascending, the lowest IMDb comes first within each year.
df.sort_values(["Year", "IMDb"])

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
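
The tie-breaking rule described above can also use a different direction for each column. The following is a minimal sketch on a small made-up frame (the `toy` values are hypothetical, not taken from the movies file), assuming you want years in ascending order but the highest IMDb first within each year:

```python
import pandas as pd

# Hypothetical toy data, not the movies dataset
toy = pd.DataFrame({
    "Year": [2018, 2019, 2018, 2019],
    "IMDb": [7.0, 6.5, 8.3, 8.2],
})

# One ascending flag per sort column:
# Year ascending, then IMDb descending to break ties within a year
mixed = toy.sort_values(["Year", "IMDb"], ascending=[True, False])
print(mixed)
```

The list passed to `ascending` must have the same length as the list of column names.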


**Let's create a new DataFrame from this result so that we can sort by index.**

In [10]:
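# Some pandas operations (e.g. slicing) can return a view of the original data, so .copy()
# makes df2 independent of df. sort_values() already returns a new object; the copy is a precaution.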
df2 = df.sort_values(["Year", "IMDb"]).copy() 
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [11]:
# Reset the index of df2; the old index is kept as a new "index" column
df2.reset_index()

Unnamed: 0,index,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,7,Hrithik Roshan,War,2019,Action,475,6.5
7,8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [13]:
# reset_index() returns a new DataFrame; df2 itself still has its original index
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [14]:
# reset_index() renumbered the rows, but it also kept the old index as a separate
# "index" column. Passing drop=True discards that column instead.
# This still returns a new DataFrame and does not change df2.
df2.reset_index(drop=True)

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [15]:
# The original df2 is still unchanged.
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
2,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
10,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
6,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
7,Hrithik Roshan,War,2019,Action,475,6.5
8,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
3,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


In [23]:
# To change the index of df2 itself, pass inplace=True (or reassign: df2 = df2.reset_index(drop=True)).
df2.reset_index(drop=True, inplace=True)

In [24]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
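
The cells above show that `reset_index()` returns a new DataFrame unless `inplace=True` is passed. As a minimal sketch on hypothetical toy data, the same behaviour can be checked directly. The sketch also shows `ignore_index=True`, which, assuming a pandas version of 1.0 or later, lets `sort_values()` renumber the rows in one step:

```python
import pandas as pd

# Hypothetical toy data, not the movies dataset
toy = pd.DataFrame({"Film": ["A", "B", "C"], "IMDb": [7.5, 5.6, 8.4]})

by_rating = toy.sort_values("IMDb")            # index order is now 1, 0, 2
renumbered = by_rating.reset_index(drop=True)  # new frame with index 0, 1, 2

# by_rating keeps its old index because reset_index returned a new object
print(by_rating.index.tolist())
print(renumbered.index.tolist())

# Sorting and renumbering in a single call (assumes pandas >= 1.0)
one_step = toy.sort_values("IMDb", ignore_index=True)
print(one_step.index.tolist())
```

Reassigning the result, as in `renumbered = ...`, is a common alternative to `inplace=True`.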


**ii. Sort By Index**- The df.sort_index() function sorts the DataFrame by its index
values. If the index is no longer in order (e.g., after sorting by values or
concatenating DataFrames), you can use sort_index() to
restore it to a sorted order.
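
A minimal sketch of this idea, using hypothetical toy data rather than the movies file: sorting by values scrambles the index, and `sort_index()` puts the rows back in index order.

```python
import pandas as pd

# Hypothetical toy data, not the movies dataset
toy = pd.DataFrame({"Film": ["A", "B", "C", "D"], "IMDb": [7.5, 5.6, 8.4, 6.0]})

shuffled = toy.sort_values("IMDb")                    # index order: 1, 3, 0, 2
restored = shuffled.sort_index()                      # index order: 0, 1, 2, 3
reversed_rows = shuffled.sort_index(ascending=False)  # index order: 3, 2, 1, 0

print(shuffled.index.tolist())
print(restored.index.tolist())
print(reversed_rows.index.tolist())
```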

In [25]:
# Sort by IMDb first; note that the index is no longer in order afterwards.
df2.sort_values("IMDb")

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6
10,Kartik Aaryan,Bhool Bhulaiyaa 2,2022,Horror Comedy,266,5.9
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
6,Hrithik Roshan,War,2019,Action,475,6.5
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
11,Shah Rukh Khan,Pathaan,2023,Action,1050,7.2
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2


In [26]:
df2.sort_values("IMDb").sort_index()   # sorting by index restores the original row order

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb
0,Aamir Khan,Dangal,2016,Biography,2024,8.4
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3
6,Hrithik Roshan,War,2019,Action,475,6.5
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6


**2. Ranking**- rank() assigns a rank to each value in a numeric column, such as scores or points.
By default, tied values receive the average of their ranks, which can produce decimal ranks.
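
To make the handling of ties concrete, here is a minimal sketch comparing several `method` options of `Series.rank()` side by side. The `scores` values are hypothetical and chosen so that two of them tie:

```python
import pandas as pd

# Hypothetical ratings with one tie (7.0 appears twice)
scores = pd.Series([8.4, 7.0, 7.0, 5.6], name="IMDb")

ranks = pd.DataFrame({
    "IMDb": scores,
    "average": scores.rank(ascending=False),                  # ties share the mean rank
    "min": scores.rank(ascending=False, method="min"),        # ties share the lowest rank
    "dense": scores.rank(ascending=False, method="dense"),    # like min, but no gaps after ties
    "first": scores.rank(ascending=False, method="first"),    # ties broken by order of appearance
})
print(ranks)

# rank() returns floats; cast if whole-number ranks are wanted
dense_int = scores.rank(ascending=False, method="dense").astype(int)
print(dense_int.tolist())
```

The cast to `int` only makes sense for methods that never yield fractional ranks and for columns without missing values.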

In [29]:
df2["Rank"] = df2["IMDb"].rank()
df2
# By default ranking is ascending: the lowest IMDb gets rank 1, so a higher IMDb gets a larger rank number.

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,12.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,3.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,4.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,9.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,11.0
6,Hrithik Roshan,War,2019,Action,475,6.5,5.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,10.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,1.0


In [31]:
# To give rank 1 to the highest "IMDb", rank in descending order
df2["Rank"] = df2["IMDb"].rank(ascending=False)
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,10.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,9.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.5
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,8.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.5
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,12.0


In [32]:
# In the output above, the tied "IMDb" values received a decimal rank (6.5).
# method="dense" gives ties the same whole-number rank and leaves no gap after them.
df2["Rank"] = df2["IMDb"].rank(ascending=False, method="dense")
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


**3. Renaming Columns and Indexes**

In [35]:
df2.rename(columns={"IMDb" : "IMDb Rating"})   # returns a new DataFrame; pass inplace=True or reassign to change df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb Rating,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


In [41]:
df2.rename(index={0 : "row 1", 1 : "row 2", 2 : "row 3", 3 : "row 4"}) # and so on; pass inplace=True or reassign to change df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
row 1,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
row 2,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
row 3,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
row 4,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0


In [43]:
df2

Unnamed: 0,Actor,Film,Year,Genre,BoxOffice(INR Crore),IMDb,Rank
0,Aamir Khan,Dangal,2016,Biography,2024,8.4,1.0
1,Salman Khan,Tiger Zinda Hai,2017,Action,565,6.0,9.0
2,Varun Dhawan,Badrinath Ki Dulhania,2017,Romantic Comedy,201,6.1,8.0
3,Ranveer Singh,Padmaavat,2018,Historical,585,7.0,6.0
4,Rajkummar Rao,Stree,2018,Horror Comedy,180,7.5,4.0
5,Ayushmann Khurrana,Andhadhun,2018,Thriller,111,8.3,2.0
6,Hrithik Roshan,War,2019,Action,475,6.5,7.0
7,Akshay Kumar,Good Newwz,2019,Comedy,318,7.0,6.0
8,Vicky Kaushal,Uri: The Surgical Strike,2019,Action,342,8.2,3.0
9,Ranbir Kapoor,Brahmastra,2022,Fantasy,431,5.6,11.0
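
As the last output shows, `rename()` left `df2` unchanged because its result was not stored. A minimal sketch on hypothetical toy data: the renamed frame is kept by assigning it to a new name, and `rename()` is also given a function instead of a dictionary.

```python
import pandas as pd

# Hypothetical toy data, not the movies dataset
toy = pd.DataFrame({"Film": ["A", "B"], "IMDb": [7.5, 5.6]})

# Rename a column and the row labels in one call; the result is a new frame
renamed = toy.rename(
    columns={"IMDb": "IMDb Rating"},
    index={0: "row 1", 1: "row 2"},
)

# A function is applied to every column label
upper = toy.rename(columns=str.upper)

print(toy.columns.tolist())      # original labels are untouched
print(renamed.columns.tolist())
print(upper.columns.tolist())
```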


**4. Changing Column Order**

In [46]:
df2 = df2[["Film", "Actor", "Genre", "Year", "IMDb", "Rank", "BoxOffice(INR Crore)"]]
df2

Unnamed: 0,Film,Actor,Genre,Year,IMDb,Rank,BoxOffice(INR Crore)
0,Dangal,Aamir Khan,Biography,2016,8.4,1.0,2024
1,Tiger Zinda Hai,Salman Khan,Action,2017,6.0,9.0,565
2,Badrinath Ki Dulhania,Varun Dhawan,Romantic Comedy,2017,6.1,8.0,201
3,Padmaavat,Ranveer Singh,Historical,2018,7.0,6.0,585
4,Stree,Rajkummar Rao,Horror Comedy,2018,7.5,4.0,180
5,Andhadhun,Ayushmann Khurrana,Thriller,2018,8.3,2.0,111
6,War,Hrithik Roshan,Action,2019,6.5,7.0,475
7,Good Newwz,Akshay Kumar,Comedy,2019,7.0,6.0,318
8,Uri: The Surgical Strike,Vicky Kaushal,Action,2019,8.2,3.0,342
9,Brahmastra,Ranbir Kapoor,Fantasy,2022,5.6,11.0,431


**We can also move a particular column to the front**

In [48]:
cols = ["Actor"] + [col for col in df2.columns if col != "Actor"]
df2 = df2[cols]

In [49]:
df2

Unnamed: 0,Actor,Film,Genre,Year,IMDb,Rank,BoxOffice(INR Crore)
0,Aamir Khan,Dangal,Biography,2016,8.4,1.0,2024
1,Salman Khan,Tiger Zinda Hai,Action,2017,6.0,9.0,565
2,Varun Dhawan,Badrinath Ki Dulhania,Romantic Comedy,2017,6.1,8.0,201
3,Ranveer Singh,Padmaavat,Historical,2018,7.0,6.0,585
4,Rajkummar Rao,Stree,Horror Comedy,2018,7.5,4.0,180
5,Ayushmann Khurrana,Andhadhun,Thriller,2018,8.3,2.0,111
6,Hrithik Roshan,War,Action,2019,6.5,7.0,475
7,Akshay Kumar,Good Newwz,Comedy,2019,7.0,6.0,318
8,Vicky Kaushal,Uri: The Surgical Strike,Action,2019,8.2,3.0,342
9,Ranbir Kapoor,Brahmastra,Fantasy,2022,5.6,11.0,431
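
If columns are moved to the front often, the list comprehension above can be wrapped in a small helper. This is only a sketch: `move_to_front` is a hypothetical function name, not part of pandas, and the `toy` frame is made-up data.

```python
import pandas as pd

def move_to_front(frame, column):
    """Return a new DataFrame with `column` first; other columns keep their order."""
    rest = [col for col in frame.columns if col != column]
    return frame[[column] + rest]

# Hypothetical toy data, not the movies dataset
toy = pd.DataFrame({"Film": ["A"], "Actor": ["X"], "Year": [2018]})

reordered = move_to_front(toy, "Year")
print(reordered.columns.tolist())
```

Selecting with a list of column names returns a new DataFrame, so `toy` keeps its original column order.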
