
# EXPERIMENT 3 – Pandas DataFrame Operations

## Aim:
To create DataFrames and perform:
- Column & row access
- Adding, modifying, and deleting data
- Index and column renaming
- Data cleaning operations
- Sorting DataFrames

**Tools Used:**
- Python
- Pandas
- NumPy

---


*PART A: Creating a College DataFrame*

**Cell 1 – Create DataFrame**

In [None]:
import pandas as pd
import numpy as np

rng = np.random.RandomState(42)

df = pd.DataFrame({
    'name': ['venkate', 'Nagendra', 'Manu', 'Chanti', 'Naveen',
             'Kishan', 'Siva', 'Shanuu', 'Ravi', 'Madhu'],
    'Dept': ['CSE', 'ECE', 'EEE', 'MECH', 'CIVIL',
             'BIOMEDICAL', 'CSE', 'ECE', 'EEE', 'MECH'],
    'Year': [2, 4, 2, 3, 2, 1, 3, 2, 4, 1],
    'Clgfee': rng.randint(10000, 100000, size=10)
})

print("DataFrame of College")
df


DataFrame of College


,name,Dept,Year,Clgfee
0,venkate,CSE,2,25795
1,Nagendra,ECE,4,10860
2,Manu,EEE,2,86820
3,Chanti,MECH,3,64886
4,Naveen,CIVIL,2,16265
5,Kishan,BIOMEDICAL,1,92386
6,Siva,CSE,3,47194
7,Shanuu,ECE,2,97498
8,Ravi,EEE,4,54131
9,Madhu,MECH,1,70263
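
After creating a DataFrame it is often useful to check its shape, column labels, and dtypes before working with it. The sketch below does this on three rows copied from the college table; the variable names (`college`, `n_rows`, `n_cols`, `column_names`) are illustrative and not part of the experiment.

```python
import pandas as pd

# Three rows copied from the college table; variable names are illustrative.
college = pd.DataFrame({
    'name': ['Manu', 'Chanti', 'Naveen'],
    'Dept': ['EEE', 'MECH', 'CIVIL'],
    'Year': [2, 3, 2],
    'Clgfee': [86820, 64886, 16265],
})

n_rows, n_cols = college.shape          # (rows, columns)
column_names = list(college.columns)    # labels in creation order

print(n_rows, n_cols)
print(college.dtypes)                   # one dtype per column
print(college.head(2))                  # first two rows
```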


*PART B: Accessing Columns in a DataFrame*

**Bracket Notation**

In [None]:
print("Accessing Dept and Year columns")
df[['Dept', 'Year']]


Accessing Dept and Year columns


,Dept,Year
0,CSE,2
1,ECE,4
2,EEE,2
3,MECH,3
4,CIVIL,2
5,BIOMEDICAL,1
6,CSE,3
7,ECE,2
8,EEE,4
9,MECH,1
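
Bracket notation returns different types depending on what is passed in: a single label gives a Series, while a list of labels gives a DataFrame, even if the list has one entry. A small sketch on a two-row stand-in frame (the names `as_series` and `as_frame` are illustrative):

```python
import pandas as pd

college = pd.DataFrame({'Dept': ['CSE', 'ECE'], 'Year': [2, 4]})

as_series = college['Dept']      # single label -> Series
as_frame = college[['Dept']]     # list of labels -> DataFrame with one column

print(type(as_series).__name__, type(as_frame).__name__)
```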


**Dot Notation**

In [None]:
print("Accessing College Fee column")
df.Clgfee


Accessing College Fee column


,Clgfee
0,25795
1,10860
2,86820
3,64886
4,16265
5,92386
6,47194
7,97498
8,54131
9,70263
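
Dot notation is a shorthand that works only when the column label is a valid Python identifier and does not collide with an existing DataFrame attribute or method. The sketch below uses a made-up two-row frame with a label containing a space and a label named `count` to show where bracket notation is needed.

```python
import pandas as pd

# Illustrative frame: one label has a space, the other clashes with a method name.
players = pd.DataFrame({'Total Runs': [13000, 5400], 'count': [1, 2]})

runs = players['Total Runs']     # brackets work for any label
# players.Total Runs             # would be a SyntaxError

clash = players.count            # this is the DataFrame.count method, not the column
is_method = callable(clash)
counts = players['count']        # brackets reach the actual column

print(is_method)
print(counts.tolist())
```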


**Multiple Columns**

In [None]:
df[['name', 'Dept', 'Year']]


,name,Dept,Year
0,venkate,CSE,2
1,Nagendra,ECE,4
2,Manu,EEE,2
3,Chanti,MECH,3
4,Naveen,CIVIL,2
5,Kishan,BIOMEDICAL,1
6,Siva,CSE,3
7,Shanuu,ECE,2
8,Ravi,EEE,4
9,Madhu,MECH,1


**Using .iloc and .loc**

In [None]:
df.iloc[0:7, [1, 3]]


,Dept,Clgfee
0,CSE,25795
1,ECE,10860
2,EEE,86820
3,MECH,64886
4,CIVIL,16265
5,BIOMEDICAL,92386
6,CSE,47194


In [None]:
df.loc[3:9, ['name', 'Year']]


,name,Year
3,Chanti,3
4,Naveen,2
5,Kishan,1
6,Siva,3
7,Shanuu,2
8,Ravi,4
9,Madhu,1
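
The two outputs above differ in how the slice end is treated: `.iloc[0:7]` stops before position 7, whereas `.loc[3:9]` includes label 9. The following sketch, on the first five names only, makes the difference explicit and shows that labels and positions stop coinciding once a row is dropped. Variable names are illustrative.

```python
import pandas as pd

college = pd.DataFrame({'name': ['venkate', 'Nagendra', 'Manu', 'Chanti', 'Naveen']})

by_position = college.iloc[1:3]   # positions 1 and 2; the stop is excluded
by_label = college.loc[1:3]       # labels 1, 2 and 3; the stop is included

# After dropping a row, labels and positions no longer coincide.
trimmed = college.drop([0])
first_by_position = trimmed.iloc[0]['name']   # first remaining row
row_labelled_1 = trimmed.loc[1, 'name']       # row whose label is 1

print(len(by_position), len(by_label))
print(first_by_position, row_labelled_1)
```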


**Accessing with Conditions**

In [None]:
df[df['Clgfee'] > 90000]


,name,Dept,Year,Clgfee
5,Kishan,BIOMEDICAL,1,92386
7,Shanuu,ECE,2,97498
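
Several conditions can be combined in one filter. A minimal sketch, using five rows copied from the college table: each comparison is wrapped in parentheses and joined with `&` (AND) or `|` (OR), and `isin()` checks membership in a list. The result names are illustrative.

```python
import pandas as pd

college = pd.DataFrame({
    'name': ['venkate', 'Nagendra', 'Manu', 'Chanti', 'Shanuu'],
    'Dept': ['CSE', 'ECE', 'EEE', 'MECH', 'ECE'],
    'Year': [2, 4, 2, 3, 2],
    'Clgfee': [25795, 10860, 86820, 64886, 97498],
})

# Each comparison needs its own parentheses; use & for AND, | for OR, ~ for NOT.
second_year_high_fee = college[(college['Clgfee'] > 50000) & (college['Year'] == 2)]

# isin() tests membership in a list of allowed values.
cse_or_ece = college[college['Dept'].isin(['CSE', 'ECE'])]

print(second_year_high_fee)
print(cse_or_ece)
```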


*PART C: Adding & Modifying Columns*

In [None]:
# Express the fee in thousands: floor division drops the remainder, then store as float
df['Clgfee'] = (df['Clgfee'] // 1000).astype(float)
df


,name,Dept,Year,Clgfee
0,venkate,CSE,2,25.0
1,Nagendra,ECE,4,10.0
2,Manu,EEE,2,86.0
3,Chanti,MECH,3,64.0
4,Naveen,CIVIL,2,16.0
5,Kishan,BIOMEDICAL,1,92.0
6,Siva,CSE,3,47.0
7,Shanuu,ECE,2,97.0
8,Ravi,EEE,4,54.0
9,Madhu,MECH,1,70.0
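
The cell above modifies an existing column. Adding a column follows the same pattern with a new label. The sketch below is one possible illustration on two rows from the college table; the column names `Fee_thousands` and `Fee_exact` and the replacement fee of 12000 are assumptions made for the example.

```python
import pandas as pd

college = pd.DataFrame({'name': ['venkate', 'Nagendra'], 'Clgfee': [25795, 10860]})

# Add a column by assigning to a new label; // is floor division, so 25795 -> 25.
college['Fee_thousands'] = college['Clgfee'] // 1000

# assign() returns a new DataFrame; true division keeps the fraction (25.795).
college = college.assign(Fee_exact=college['Clgfee'] / 1000)

# Modify a single cell selected by a condition and a column label.
college.loc[college['name'] == 'Nagendra', 'Clgfee'] = 12000

print(college)
```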


*PART D: Removing Rows and Columns*

**Removing Columns**

In [None]:
df.drop(['name', 'Clgfee'], axis=1, inplace=True)
df


,Dept,Year
0,CSE,2
1,ECE,4
2,EEE,2
3,MECH,3
4,CIVIL,2
5,BIOMEDICAL,1
6,CSE,3
7,ECE,2
8,EEE,4
9,MECH,1
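
`inplace=True` changes `df` permanently, so the dropped columns cannot be recovered without rebuilding the DataFrame. A hedged alternative is to let `drop()` return a new object and keep the original. The sketch uses a one-row stand-in frame; `trimmed` and `safe` are illustrative names.

```python
import pandas as pd

college = pd.DataFrame({'name': ['Manu'], 'Dept': ['EEE'], 'Year': [2]})

# columns= is equivalent to passing the labels with axis=1.
trimmed = college.drop(columns=['name'])

# errors='ignore' skips labels that do not exist instead of raising KeyError.
safe = college.drop(columns=['Clgfee'], errors='ignore')

print(list(trimmed.columns))
print(list(college.columns))   # the original is unchanged
```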


**Removing Rows**

In [None]:
df.drop([0, 1, 2], axis=0, inplace=True)
df


,Dept,Year
3,MECH,3
4,CIVIL,2
5,BIOMEDICAL,1
6,CSE,3
7,ECE,2
8,EEE,4
9,MECH,1


**Removing Rows Based on Condition**

In [None]:
df.drop(df[df['Dept'] == 'CSE'].index, inplace=True)
df


,Dept,Year
3,MECH,3
4,CIVIL,2
5,BIOMEDICAL,1
7,ECE,2
8,EEE,4
9,MECH,1
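
Dropping rows by condition can also be written as a filter that keeps the rows which do not match. The sketch below compares both forms on four rows with the same labels as the table above, and shows `reset_index(drop=True)` for renumbering the gaps left in the index. Variable names are illustrative.

```python
import pandas as pd

college = pd.DataFrame(
    {'Dept': ['MECH', 'CIVIL', 'CSE', 'ECE'], 'Year': [3, 2, 3, 2]},
    index=[3, 4, 6, 7],
)

# Keep every row whose Dept is not CSE.
without_cse = college[college['Dept'] != 'CSE']

# Same result through drop(), without inplace=True.
same_result = college.drop(college[college['Dept'] == 'CSE'].index)

# Renumber the remaining rows from 0 and discard the old labels.
renumbered = without_cse.reset_index(drop=True)

print(without_cse)
print(renumbered)
```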


*PART E: DataFrame of Cricketers (Custom Index)*

In [None]:
data = {
    "Name": ["Virat Kohli", "Babar Azam", "Kane Williamson", "Steve Smith",
             "Rohit Sharma", "David Warner", "Ben Stokes",
             "Shakib Al Hasan", "Rashid Khan", "Jasprit Bumrah"],
    "Country": ["India", "Pakistan", "New Zealand", "Australia", "India",
                "Australia", "England", "Bangladesh", "Afghanistan", "India"],
    "Role": ["Batsman", "Batsman", "Batsman", "Batsman", "Batsman",
             "Batsman", "All-rounder", "All-rounder", "Bowler", "Bowler"],
    "Matches": [275, 117, 161, 162, 262, 155, 105, 240, 100, 78],
    "Runs": [13000, 5400, 6550, 8700, 10700, 6500, 3120, 7000, 1120, 120],
    "Wickets": [4, 0, 0, 0, 8, 0, 90, 310, 190, 145]
}

df = pd.DataFrame(data, index=list("abcdefghij"))
df


,Name,Country,Role,Matches,Runs,Wickets
a,Virat Kohli,India,Batsman,275,13000,4
b,Babar Azam,Pakistan,Batsman,117,5400,0
c,Kane Williamson,New Zealand,Batsman,161,6550,0
d,Steve Smith,Australia,Batsman,162,8700,0
e,Rohit Sharma,India,Batsman,262,10700,8
f,David Warner,Australia,Batsman,155,6500,0
g,Ben Stokes,England,All-rounder,105,3120,90
h,Shakib Al Hasan,Bangladesh,All-rounder,240,7000,310
i,Rashid Khan,Afghanistan,Bowler,100,1120,190
j,Jasprit Bumrah,India,Bowler,78,120,145
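
With a custom index, `.loc` selects by the letter labels while `.iloc` still selects by position. A minimal sketch on the first three cricketers (variable names are illustrative):

```python
import pandas as pd

players = pd.DataFrame(
    {'Name': ['Virat Kohli', 'Babar Azam', 'Kane Williamson'],
     'Runs': [13000, 5400, 6550]},
    index=list('abc'),
)

kohli_runs = players.loc['a', 'Runs']     # single cell by row and column label
first_two = players.loc['a':'b']          # label slice, 'b' is included
last_name = players.iloc[-1]['Name']      # last row by position

print(kohli_runs, last_name)
print(first_two)
```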


*PART F: Renaming Rows & Columns*

In [None]:
df.rename(columns={'Country': 'Birth Place'}, inplace=True)


In [None]:
df.columns = ["Cricketers", "Birth Place", "Cricket Role",
              "Matches Played", "Total Runs", "Total Wickets"]


In [None]:
# Labels are the strings '1'..'10'; strings sort lexicographically ('10' before '2')
df.index = [str(i) for i in range(1, 11)]
df


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,India,Batsman,275,13000,4
2,Babar Azam,Pakistan,Batsman,117,5400,0
3,Kane Williamson,New Zealand,Batsman,161,6550,0
4,Steve Smith,Australia,Batsman,162,8700,0
5,Rohit Sharma,India,Batsman,262,10700,8
6,David Warner,Australia,Batsman,155,6500,0
7,Ben Stokes,England,All-rounder,105,3120,90
8,Shakib Al Hasan,Bangladesh,All-rounder,240,7000,310
9,Rashid Khan,Afghanistan,Bowler,100,1120,190
10,Jasprit Bumrah,India,Bowler,78,120,145
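
`rename()` accepts mappings for both columns and index labels and, without `inplace=True`, returns a new DataFrame. Assigning to `df.columns` instead replaces every label at once and requires a list of matching length. The sketch below shows the mapping form on a two-row stand-in; `renamed` is an illustrative name.

```python
import pandas as pd

players = pd.DataFrame(
    {'Name': ['Virat Kohli', 'Babar Azam'], 'Country': ['India', 'Pakistan']},
    index=['a', 'b'],
)

# Only the labels named in each mapping change; everything else is kept.
renamed = players.rename(
    columns={'Country': 'Birth Place'},
    index={'a': '1', 'b': '2'},
)

print(renamed)
print(list(players.columns))   # the original keeps its labels
```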


*PART G: Data Cleaning*

**Handling Missing Values**

In [None]:
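# dropna() and fillna() return new DataFrames; assign the result to keep it.
df = df.dropna()             # drop rows containing any missing value
df = df.fillna("Unknown")    # no-op here: dropna() already removed such rows
df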


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,India,Batsman,275,13000,4
2,Babar Azam,Pakistan,Batsman,117,5400,0
3,Kane Williamson,New Zealand,Batsman,161,6550,0
4,Steve Smith,Australia,Batsman,162,8700,0
5,Rohit Sharma,India,Batsman,262,10700,8
6,David Warner,Australia,Batsman,155,6500,0
7,Ben Stokes,England,All-rounder,105,3120,90
8,Shakib Al Hasan,Bangladesh,All-rounder,240,7000,310
9,Rashid Khan,Afghanistan,Bowler,100,1120,190
10,Jasprit Bumrah,India,Bowler,78,120,145
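
The cricketer table contains no missing values, so the output above is unchanged. To see the effect, the sketch below uses a three-row frame in which the missing entries are introduced deliberately for illustration; they are not part of the experiment's data.

```python
import numpy as np
import pandas as pd

# Missing entries (np.nan) are inserted on purpose for this example.
players = pd.DataFrame({
    'Name': ['Ben Stokes', 'Rashid Khan', 'Jasprit Bumrah'],
    'Country': ['England', np.nan, 'India'],
    'Runs': [3120, np.nan, 120],
})

missing_per_column = players.isna().sum()     # count of missing values per column
dropped = players.dropna()                    # removes the row with any NaN
filled = players.fillna({'Country': 'Unknown', 'Runs': 0})   # per-column fill values

print(missing_per_column)
print(dropped)
print(filled)
```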


**Removing Duplicates**

In [None]:
df = df.drop_duplicates()   # returns a new DataFrame, so assign it back
df


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,India,Batsman,275,13000,4
2,Babar Azam,Pakistan,Batsman,117,5400,0
3,Kane Williamson,New Zealand,Batsman,161,6550,0
4,Steve Smith,Australia,Batsman,162,8700,0
5,Rohit Sharma,India,Batsman,262,10700,8
6,David Warner,Australia,Batsman,155,6500,0
7,Ben Stokes,England,All-rounder,105,3120,90
8,Shakib Al Hasan,Bangladesh,All-rounder,240,7000,310
9,Rashid Khan,Afghanistan,Bowler,100,1120,190
10,Jasprit Bumrah,India,Bowler,78,120,145
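
Every row in the cricketer table is unique, so nothing is removed above. The following sketch adds a repeated row on purpose to show `duplicated()` and the `subset` and `keep` options of `drop_duplicates()`; the data is illustrative.

```python
import pandas as pd

# The second row repeats the first on purpose.
players = pd.DataFrame({
    'Name': ['Virat Kohli', 'Virat Kohli', 'Rohit Sharma'],
    'Country': ['India', 'India', 'India'],
})

dup_mask = players.duplicated()               # True for repeats of an earlier row
unique_rows = players.drop_duplicates()       # compares all columns
one_per_country = players.drop_duplicates(subset='Country', keep='first')

print(dup_mask.tolist())
print(unique_rows)
print(one_per_country)
```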


**Standardizing Case**

In [None]:
df['Birth Place'] = df['Birth Place'].str.lower()
df


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,india,Batsman,275,13000,4
2,Babar Azam,pakistan,Batsman,117,5400,0
3,Kane Williamson,new zealand,Batsman,161,6550,0
4,Steve Smith,australia,Batsman,162,8700,0
5,Rohit Sharma,india,Batsman,262,10700,8
6,David Warner,australia,Batsman,155,6500,0
7,Ben Stokes,england,All-rounder,105,3120,90
8,Shakib Al Hasan,bangladesh,All-rounder,240,7000,310
9,Rashid Khan,afghanistan,Bowler,100,1120,190
10,Jasprit Bumrah,india,Bowler,78,120,145
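
Lowercasing is one way to standardize text. Stray whitespace and mixed capitalization are other common causes of labels that should match but do not. The sketch below chains `.str` methods on a made-up Series with such inconsistencies.

```python
import pandas as pd

# Inconsistent spacing and case, introduced for illustration.
places = pd.Series(['  India', 'PAKISTAN ', 'new zealand'])

# strip() removes surrounding whitespace; title() capitalizes each word.
cleaned = places.str.strip().str.title()

print(cleaned.tolist())
```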


**Replacing Values**

In [None]:
# Demonstration only: relabel every 'Batsman' entry as 'All-rounder'
df['Cricket Role'] = df['Cricket Role'].replace('Batsman', 'All-rounder')
df


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,india,All-rounder,275,13000,4
2,Babar Azam,pakistan,All-rounder,117,5400,0
3,Kane Williamson,new zealand,All-rounder,161,6550,0
4,Steve Smith,australia,All-rounder,162,8700,0
5,Rohit Sharma,india,All-rounder,262,10700,8
6,David Warner,australia,All-rounder,155,6500,0
7,Ben Stokes,england,All-rounder,105,3120,90
8,Shakib Al Hasan,bangladesh,All-rounder,240,7000,310
9,Rashid Khan,afghanistan,Bowler,100,1120,190
10,Jasprit Bumrah,india,Bowler,78,120,145
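
`Series.replace()` with plain strings matches whole cell values, while `Series.str.replace()` substitutes inside each string. A small sketch of the difference; the replacement label `'Batter'` is an assumption chosen for the example.

```python
import pandas as pd

roles = pd.Series(['Batsman', 'All-rounder', 'Bowler'])

# Dict form of replace(): whole values listed as keys are swapped, others are kept.
mapped = roles.replace({'Batsman': 'Batter'})

# str.replace() edits substrings; regex=False treats '-' as a literal character.
no_hyphen = roles.str.replace('-', ' ', regex=False)

print(mapped.tolist())
print(no_hyphen.tolist())
```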


*PART H: Sorting DataFrames*

In [None]:
df.sort_values(by='Total Wickets')


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
2,Babar Azam,pakistan,All-rounder,117,5400,0
3,Kane Williamson,new zealand,All-rounder,161,6550,0
4,Steve Smith,australia,All-rounder,162,8700,0
6,David Warner,australia,All-rounder,155,6500,0
1,Virat Kohli,india,All-rounder,275,13000,4
5,Rohit Sharma,india,All-rounder,262,10700,8
7,Ben Stokes,england,All-rounder,105,3120,90
10,Jasprit Bumrah,india,Bowler,78,120,145
9,Rashid Khan,afghanistan,Bowler,100,1120,190
8,Shakib Al Hasan,bangladesh,All-rounder,240,7000,310


In [None]:
df.sort_values(by=['Birth Place', 'Cricketers'])


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
9,Rashid Khan,afghanistan,Bowler,100,1120,190
6,David Warner,australia,All-rounder,155,6500,0
4,Steve Smith,australia,All-rounder,162,8700,0
8,Shakib Al Hasan,bangladesh,All-rounder,240,7000,310
7,Ben Stokes,england,All-rounder,105,3120,90
10,Jasprit Bumrah,india,Bowler,78,120,145
5,Rohit Sharma,india,All-rounder,262,10700,8
1,Virat Kohli,india,All-rounder,275,13000,4
3,Kane Williamson,new zealand,All-rounder,161,6550,0
2,Babar Azam,pakistan,All-rounder,117,5400,0
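
`sort_values()` sorts in ascending order by default and returns a new DataFrame. The sketch below, on four rows taken from the cricketer table with shortened column names, shows a descending sort and a two-key sort with a different direction per key.

```python
import pandas as pd

players = pd.DataFrame({
    'Name': ['Ben Stokes', 'Shakib Al Hasan', 'Rashid Khan', 'Jasprit Bumrah'],
    'Role': ['All-rounder', 'All-rounder', 'Bowler', 'Bowler'],
    'Wickets': [90, 310, 190, 145],
})

# Highest wicket count first.
top = players.sort_values(by='Wickets', ascending=False)

# Role A-Z, then wickets high-to-low within each role.
by_role = players.sort_values(by=['Role', 'Wickets'], ascending=[True, False])

print(top)
print(by_role)
```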


In [None]:
df.sort_index()


,Cricketers,Birth Place,Cricket Role,Matches Played,Total Runs,Total Wickets
1,Virat Kohli,india,All-rounder,275,13000,4
10,Jasprit Bumrah,india,Bowler,78,120,145
2,Babar Azam,pakistan,All-rounder,117,5400,0
3,Kane Williamson,new zealand,All-rounder,161,6550,0
4,Steve Smith,australia,All-rounder,162,8700,0
5,Rohit Sharma,india,All-rounder,262,10700,8
6,David Warner,australia,All-rounder,155,6500,0
7,Ben Stokes,england,All-rounder,105,3120,90
8,Shakib Al Hasan,bangladesh,All-rounder,240,7000,310
9,Rashid Khan,afghanistan,Bowler,100,1120,190
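
In the output above, row `10` appears directly after row `1` because the index labels are strings and strings are compared character by character. One possible way to get numeric order, assuming pandas 1.1 or later where `sort_index()` accepts a `key` argument, is to convert the labels to integers for the comparison only. The three-row frame is illustrative.

```python
import pandas as pd

players = pd.DataFrame({'Name': ['Virat Kohli', 'Jasprit Bumrah', 'Babar Azam']},
                       index=['1', '10', '2'])

# Default: string labels are ordered lexicographically.
lexicographic = players.sort_index()

# key= converts the labels for comparison; the stored labels stay as strings.
numeric = players.sort_index(key=lambda labels: labels.astype(int))

print(list(lexicographic.index))
print(list(numeric.index))
```

Using integer labels from the start, for example `range(1, 11)`, would avoid the need for a key function.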



### Conclusion:
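This experiment demonstrated how to create Pandas DataFrames, access rows and columns,
modify and delete data, rename index and column labels, clean data, and sort by
values or by index. These operations are core steps in data analysis.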
