# 📘 Mastering Pandas: 100 Coding Questions

This Jupyter Notebook contains **100 Pandas coding questions** in markdown format, followed by blank code cells where you can write your solutions.

---

### 📌 Instructions:
- Read each question carefully.
- Write your solution in the blank code cell below each question.
- Test your code with different inputs to gain mastery.
- Feel free to modify and explore different approaches! 🚀

Let's get started! 🎯


### 1. Import Pandas and create a simple DataFrame from a dictionary.

In [77]:
import pandas as pd
pd.DataFrame({"Majd" : [1,2,3,4],
              "GiGi":[5,6,7,8]})

,Majd,GiGi
0,1,5
1,2,6
2,3,7
3,4,8


### 2. Create a Pandas Series with a list of 10 numbers.

In [78]:
import pandas as pd
import numpy as np

pd.Series([1,2,3,4,5,6,7,8,9,10])

0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64

### 3. Load a CSV file into a DataFrame and display the first 5 rows.

In [79]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ivan', 'Jack', 'Ivan', 'Jack', 'Grace', 'Hannah'],  
    'Age': [45, 58, 60, 28, 49, 67, None, 60, 37, 31, 37, 31, 29, 60],  
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Berlin', 'Paris', 'Berlin', 'New York', 'Tokyo', 'London', 'Tokyo', 'London', 'Berlin', 'New York'],  
    'Salary': [95445, 86235, 62330, 46022, 65999, 88320, 57141, 65413, 87016, 58725, 87016, 58725, 57141, None],  
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-09', '2023-01-10', '2023-01-07', '2023-01-08'],  
    'Category': ['A', 'B', 'C', 'A', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', None, 'B']  
}
df = pd.DataFrame(data)
df.to_csv("CSV.csv",index = False)
csv = pd.read_csv("CSV.csv")  # same relative path that to_csv wrote above
csv.head(5)

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B


### 4. Display the column names and data types of a DataFrame.

In [80]:
csv.columns,csv.dtypes

(Index(['Name', 'Age', 'City', 'Salary', 'Date', 'Category'], dtype='object'),
 Name         object
 Age         float64
 City         object
 Salary      float64
 Date         object
 Category     object
 dtype: object)
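`Date` is listed as `object` because `read_csv` leaves date strings unparsed unless asked. A minimal sketch with hypothetical inline CSV text, using the `parse_dates` parameter so the column arrives as datetimes (Q32 does the same conversion afterwards with `pd.to_datetime`):

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
text = "Name,Date\nAlice,2023-01-01\nBob,2023-01-02\n"

plain = pd.read_csv(io.StringIO(text))                          # Date stays as text
parsed = pd.read_csv(io.StringIO(text), parse_dates=["Date"])   # Date becomes datetime64

print(plain["Date"].dtype)
print(parsed["Date"].dtype)
```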

### 5. Get the shape, size, and number of dimensions of a DataFrame.

In [81]:
csv.shape,csv.size,csv.ndim

((14, 6), 84, 2)

### 6. Check for missing values in a DataFrame.

In [82]:
csv.isnull().sum()

Name        0
Age         1
City        0
Salary      1
Date        0
Category    1
dtype: int64

### 7. Convert a NumPy array into a Pandas DataFrame.

In [83]:
pd.DataFrame(np.arange(1,10))

,0
0,1
1,2
2,3
3,4
4,5
5,6
6,7
7,8
8,9


### 8. Save a DataFrame to a CSV file without the index.

In [84]:
df.to_csv("CSV.csv",index = False)
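
Why `index=False` matters: with the default `index=True`, the row index is written as an extra unnamed first column, and `read_csv` brings it back as a column called `Unnamed: 0`. A small sketch using an in-memory buffer (`io.StringIO`) and a hypothetical two-row frame instead of a file on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [45, 58]})

# Default index=True: the index is written as a leading column with an empty header...
with_index = df.to_csv()
# ...which read_csv then exposes as "Unnamed: 0".
reread_with_index = pd.read_csv(io.StringIO(with_index))

# index=False writes only the data columns, so the round trip is clean.
without_index = df.to_csv(index=False)
reread_clean = pd.read_csv(io.StringIO(without_index))

print(list(reread_with_index.columns))  # ['Unnamed: 0', 'Name', 'Age']
print(list(reread_clean.columns))       # ['Name', 'Age']
```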

### 9. Convert a dictionary into a DataFrame.

In [85]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Hannah', 'Ivan', 'Jack', 'Ivan', 'Jack', 'Grace', 'Hannah'],  
    'Age': [45, 58, 60, 28, 49, 67, None, 60, 37, 31, 37, 31, 29, 60],  
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Berlin', 'Paris', 'Berlin', 'New York', 'Tokyo', 'London', 'Tokyo', 'London', 'Berlin', 'New York'],  
    'Salary': [95445, 86235, 62330, 46022, 65999, 88320, 57141, 65413, 87016, 58725, 87016, 58725, 57141, None],  
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-09', '2023-01-10', '2023-01-07', '2023-01-08'],  
    'Category': ['A', 'B', 'C', 'A', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', None, 'B']  
}
df = pd.DataFrame(data)

### 10. Generate a DataFrame with random numbers and set custom column names.

In [86]:
# Set seed for reproducibility
np.random.seed(42)

# Generate a 5×3 DataFrame with random numbers
df = pd.DataFrame(np.random.rand(5, 3), columns=['A', 'B', 'C'])

print(df)

          A         B         C
0  0.374540  0.950714  0.731994
1  0.598658  0.156019  0.155995
2  0.058084  0.866176  0.601115
3  0.708073  0.020584  0.969910
4  0.832443  0.212339  0.181825


### 11. Select a single column from a DataFrame.

In [87]:
csv["Age"]

0     45.0
1     58.0
2     60.0
3     28.0
4     49.0
5     67.0
6      NaN
7     60.0
8     37.0
9     31.0
10    37.0
11    31.0
12    29.0
13    60.0
Name: Age, dtype: float64

### 12. Select multiple columns from a DataFrame.

In [88]:
csv[["Name","Age"]]

,Name,Age
0,Alice,45.0
1,Bob,58.0
2,Charlie,60.0
3,David,28.0
4,Eva,49.0
5,Frank,67.0
6,Grace,
7,Hannah,60.0
8,Ivan,37.0
9,Jack,31.0


### 13. Select the first 10 rows of a DataFrame.

In [89]:
csv.head(10)

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


### 14. Select the last 5 rows of a DataFrame.

In [90]:
csv.tail()

,Name,Age,City,Salary,Date,Category
9,Jack,31.0,London,58725.0,2023-01-10,B
10,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
11,Jack,31.0,London,58725.0,2023-01-10,B
12,Grace,29.0,Berlin,57141.0,2023-01-07,
13,Hannah,60.0,New York,,2023-01-08,B


### 15. Use `.loc[]` to select a row based on an index label.

In [91]:
csv.loc[1]

Name               Bob
Age               58.0
City            London
Salary         86235.0
Date        2023-01-02
Category             B
Name: 1, dtype: object

### 16. Use `.iloc[]` to select a row by position.

In [92]:
csv.iloc[8]

Name              Ivan
Age               37.0
City             Tokyo
Salary         87016.0
Date        2023-01-09
Category             A
Name: 8, dtype: object

### 17. Use `.loc[]` to select specific columns and rows.

In [93]:
csv.loc[0:2,"Name":"Salary"]

,Name,Age,City,Salary
0,Alice,45.0,New York,95445.0
1,Bob,58.0,London,86235.0
2,Charlie,60.0,Tokyo,62330.0


### 18. Select rows where a column value is greater than a given number.

In [94]:
csv[csv["Salary"]>90000]

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A


### 19. Select rows where a column value is between two numbers.

In [95]:
csv[(csv["Salary"]<90000) & (csv["Salary"]>80000)]

,Name,Age,City,Salary,Date,Category
1,Bob,58.0,London,86235.0,2023-01-02,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
10,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
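
`Series.between` expresses the same range test more compactly. In recent pandas versions its `inclusive` argument takes `"both"` (the default), `"neither"`, `"left"` or `"right"`. A sketch on hypothetical salaries that sit exactly on the bounds:

```python
import pandas as pd

salaries = pd.Series([46022, 80000, 86235, 90000, 95445])

# between() includes both endpoints by default (inclusive="both")...
inclusive = salaries[salaries.between(80000, 90000)]
# ...while inclusive="neither" reproduces the strict (> 80000) & (< 90000) mask above.
strict = salaries[salaries.between(80000, 90000, inclusive="neither")]

print(inclusive.tolist())  # [80000, 86235, 90000]
print(strict.tolist())     # [86235]
```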


### 20. Use `.query()` to filter data based on a condition.

In [96]:
csv.query("Age > 55")

,Name,Age,City,Salary,Date,Category
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
5,Frank,67.0,Paris,88320.0,2023-01-06,A
7,Hannah,60.0,New York,65413.0,2023-01-08,B
13,Hannah,60.0,New York,,2023-01-08,B


### 21. Find the total number of missing values in a DataFrame.

In [97]:
csv.isnull().sum().sum()

np.int64(3)

### 22. Fill missing values with the column mean.

In [98]:
csv[["Age","Salary"]] = csv[["Age","Salary"]].apply(lambda col: col.fillna(col.mean()))

### 23. Fill missing values with forward-fill and backward-fill methods.

In [99]:
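# fillna(method=...) is deprecated; call ffill()/bfill() directly.
# Age has no NaNs left after Q22, and only the last expression (bfill) is displayed.
csv["Age"].ffill()
csv["Age"].bfill()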


0     45.000000
1     58.000000
2     60.000000
3     28.000000
4     49.000000
5     67.000000
6     45.538462
7     60.000000
8     37.000000
9     31.000000
10    37.000000
11    31.000000
12    29.000000
13    60.000000
Name: Age, dtype: float64
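
Because Q22 already replaced the missing ages with the column mean, the cell above has nothing left to fill (index 6 shows the mean, 45.538462). A sketch on a hypothetical Series with gaps shows what each method does, including a trailing gap that `bfill` cannot fill:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with gaps, including one at the very end.
ages = pd.Series([45.0, np.nan, np.nan, 28.0, np.nan])

forward = ages.ffill()   # carry the last valid value forward
backward = ages.bfill()  # pull the next valid value backward

print(forward.tolist())   # [45.0, 45.0, 45.0, 28.0, 28.0]
print(backward.tolist())  # [45.0, 28.0, 28.0, 28.0, nan]
```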

### 24. Drop all rows with missing values.

In [100]:
csv.dropna()

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


### 25. Drop all columns with missing values.

In [101]:
csv.dropna(axis=1)

,Name,Age,City,Salary,Date
0,Alice,45.0,New York,95445.0,2023-01-01
1,Bob,58.0,London,86235.0,2023-01-02
2,Charlie,60.0,Tokyo,62330.0,2023-01-03
3,David,28.0,Paris,46022.0,2023-01-04
4,Eva,49.0,Berlin,65999.0,2023-01-05
5,Frank,67.0,Paris,88320.0,2023-01-06
6,Grace,45.538462,Berlin,57141.0,2023-01-07
7,Hannah,60.0,New York,65413.0,2023-01-08
8,Ivan,37.0,Tokyo,87016.0,2023-01-09
9,Jack,31.0,London,58725.0,2023-01-10


### 26. Replace all NaN values with a specific value.

In [102]:
csv.replace(np.nan,'Unknown')

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


### 27. Convert `NaN` values to 0 in a DataFrame.

In [103]:
csv.fillna(0)

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


### 28. Identify duplicate rows in a DataFrame.

In [104]:
csv[csv.duplicated()]

,Name,Age,City,Salary,Date,Category
10,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
11,Jack,31.0,London,58725.0,2023-01-10,B


### 29. Remove duplicate rows from a DataFrame.

In [105]:
csv.drop_duplicates()

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B
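
Both `duplicated()` and `drop_duplicates()` compare entire rows by default, which is why only the exact copies at index 10 and 11 were flagged. The `subset` and `keep` parameters narrow that down; a sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical rows: the two Ivans share a Name but differ in Age.
df = pd.DataFrame({"Name": ["Ivan", "Jack", "Ivan"], "Age": [37, 31, 38]})

full_row = df.duplicated()                                   # compares every column
by_name = df.duplicated(subset=["Name"], keep=False)         # flags all rows whose Name repeats
deduped = df.drop_duplicates(subset=["Name"], keep="last")   # keeps the last Ivan

print(full_row.tolist())        # [False, False, False]
print(by_name.tolist())         # [True, False, True]
print(deduped["Age"].tolist())  # [31, 38]
```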


### 30. Replace specific values in a column with another value.

In [106]:
csv.replace({"Category": {"C": "Z"}})  # {column: {old: new}} limits the replacement to Category

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,Z
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,Z
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


### 31. Change the data type of a column from `object` to `int`.

In [107]:
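# Age is float64 here; an object-typed copy stands in for the question's starting point.
age_obj = csv["Age"].astype("object")
# Round before casting: Age holds a mean-filled 45.538..., which a plain int cast would truncate.
age_int = pd.to_numeric(age_obj).round().astype("int64")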

### 32. Convert a column to datetime format.

In [108]:
csv["Date"]=pd.to_datetime(csv["Date"])

### 33. Extract the year, month, and day from a datetime column.

In [109]:
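year, month, day = csv["Date"].dt.year, csv["Date"].dt.month, csv["Date"].dt.day
year  # displayed below; month and day are extracted the same way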

0     2023
1     2023
2     2023
3     2023
4     2023
5     2023
6     2023
7     2023
8     2023
9     2023
10    2023
11    2023
12    2023
13    2023
Name: Date, dtype: int32

### 34. Create a new column by performing operations on two columns.

In [110]:
csv

,Name,Age,City,Salary,Date,Category
0,Alice,45.0,New York,95445.0,2023-01-01,A
1,Bob,58.0,London,86235.0,2023-01-02,B
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C
3,David,28.0,Paris,46022.0,2023-01-04,A
4,Eva,49.0,Berlin,65999.0,2023-01-05,B
5,Frank,67.0,Paris,88320.0,2023-01-06,A
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C
7,Hannah,60.0,New York,65413.0,2023-01-08,B
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A
9,Jack,31.0,London,58725.0,2023-01-10,B


In [111]:
csv["Sum"] = csv["Age"] + csv["Salary"]  # vectorized column arithmetic, no row-wise apply needed

### 35. Rename multiple columns in a DataFrame.

In [112]:
csv.rename(columns={"Age" : "AGE","Sum" : "SUM"})

,Name,AGE,City,Salary,Date,Category,SUM
0,Alice,45.0,New York,95445.0,2023-01-01,A,95490.0
1,Bob,58.0,London,86235.0,2023-01-02,B,86293.0
2,Charlie,60.0,Tokyo,62330.0,2023-01-03,C,62390.0
3,David,28.0,Paris,46022.0,2023-01-04,A,46050.0
4,Eva,49.0,Berlin,65999.0,2023-01-05,B,66048.0
5,Frank,67.0,Paris,88320.0,2023-01-06,A,88387.0
6,Grace,45.538462,Berlin,57141.0,2023-01-07,C,57186.538462
7,Hannah,60.0,New York,65413.0,2023-01-08,B,65473.0
8,Ivan,37.0,Tokyo,87016.0,2023-01-09,A,87053.0
9,Jack,31.0,London,58725.0,2023-01-10,B,58756.0


### 36. Convert categorical columns into numerical values.

In [113]:
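csv["City_code"] = pd.factorize(csv["City"])[0]  # one integer code per distinct city
print(csv[["City","City_code"]])
# to_numeric cannot parse letters, so errors="coerce" turns every Category value into NaN (output below);
# pd.factorize(csv["Category"])[0] or csv["Category"].astype("category").cat.codes encodes it instead.
pd.to_numeric(csv["Category"],errors="coerce")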

        City  City_code
0   New York          0
1     London          1
2      Tokyo          2
3      Paris          3
4     Berlin          4
5      Paris          3
6     Berlin          4
7   New York          0
8      Tokyo          2
9     London          1
10     Tokyo          2
11    London          1
12    Berlin          4
13  New York          0


0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
11   NaN
12   NaN
13   NaN
Name: Category, dtype: float64
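
A sketch of the two usual ways to turn a text category into numbers: label encoding with `.cat.codes` and one-hot encoding with `pd.get_dummies`. The data are hypothetical, with one missing value to show how each method treats it:

```python
import pandas as pd

# Hypothetical categorical column with one missing value.
category = pd.Series(["A", "B", "C", "A", None, "B"])

# Label encoding: sorted categories A, B, C become 0, 1, 2; the missing value becomes -1.
codes = category.astype("category").cat.codes

# One-hot encoding: one indicator column per category; the missing row has no indicator set.
dummies = pd.get_dummies(category, prefix="Category")

print(codes.tolist())         # [0, 1, 2, 0, -1, 1]
print(list(dummies.columns))  # ['Category_A', 'Category_B', 'Category_C']
```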

### 37. Create a new column based on conditions using `apply()`.

In [114]:
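# Row-wise apply() with a condition; assign() returns a copy with the new column, leaving csv unchanged.
csv.assign(Salary_band=csv.apply(lambda row: "high" if row["Salary"] >= 80000 else "standard", axis=1))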

### 38. Normalize all numeric columns in a DataFrame.

In [115]:
csv["Salary"] =(csv["Salary"]-csv["Salary"].min()) / (csv["Salary"].max()-csv["Salary"].min())
csv

,Name,Age,City,Salary,Date,Category,Sum,City_code
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4
7,Hannah,60.0,New York,0.392348,2023-01-08,B,65473.0,0
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1
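
The cell above scales only `Salary`. A sketch that applies the same min-max formula to every numeric column at once via `select_dtypes(include="number")`, on a hypothetical three-row frame (a constant column would divide by zero and produce NaN, which this sketch does not guard against):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [45, 58, 28],
    "Salary": [95445, 86235, 46022],
})

# Pick every numeric column, then scale each one to [0, 1] with its own min and max.
numeric_cols = df.select_dtypes(include="number").columns
col_min = df[numeric_cols].min()
col_max = df[numeric_cols].max()
df[numeric_cols] = (df[numeric_cols] - col_min) / (col_max - col_min)

print(df)
```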


### 39. Binning a numerical column into categories using `pd.cut()`.

In [116]:
bins=[0,45,70]
labels=["juniour","senior"]
csv["Expertise"] = pd.cut(csv["Age"],bins=bins,labels=labels)
csv


,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior
7,Hannah,60.0,New York,0.392348,2023-01-08,B,65473.0,0,senior
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
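
`pd.cut` bins are closed on the right by default, so with `bins=[0,45,70]` an age of exactly 45 falls in `(0, 45]` and Alice is labelled junior. A sketch on hypothetical ages showing the effect of `right=False`:

```python
import pandas as pd

ages = pd.Series([28, 45, 46, 67])

# Default right-closed bins: (0, 45] and (45, 70], so 45 lands in the first bin.
default_bins = pd.cut(ages, bins=[0, 45, 70], labels=["junior", "senior"])
# right=False switches to left-closed bins: [0, 45) and [45, 70).
left_closed = pd.cut(ages, bins=[0, 45, 70], labels=["junior", "senior"], right=False)

print(default_bins.tolist())  # ['junior', 'junior', 'senior', 'senior']
print(left_closed.tolist())   # ['junior', 'senior', 'senior', 'senior']
```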


### 40. Convert a DataFrame to a NumPy array.

In [117]:
csv.to_numpy()

array([['Alice', 45.0, 'New York', 1.0, Timestamp('2023-01-01 00:00:00'),
        'A', 95490.0, 0, 'juniour'],
       ['Bob', 58.0, 'London', 0.8136495154078061,
        Timestamp('2023-01-02 00:00:00'), 'B', 86293.0, 1, 'senior'],
       ['Charlie', 60.0, 'Tokyo', 0.32996782874370234,
        Timestamp('2023-01-03 00:00:00'), 'C', 62390.0, 2, 'senior'],
       ['David', 28.0, 'Paris', 0.0, Timestamp('2023-01-04 00:00:00'),
        'A', 46050.0, 3, 'juniour'],
       ['Eva', 49.0, 'Berlin', 0.40420452016267727,
        Timestamp('2023-01-05 00:00:00'), 'B', 66048.0, 4, 'senior'],
       ['Frank', 67.0, 'Paris', 0.8558363514962669,
        Timestamp('2023-01-06 00:00:00'), 'A', 88387.0, 3, 'senior'],
       ['Grace', 45.53846153846154, 'Berlin', 0.22497622564393097,
        Timestamp('2023-01-07 00:00:00'), 'C', 57186.53846153846, 4,
        'senior'],
       ['Hannah', 60.0, 'New York', 0.3923476923699492,
        Timestamp('2023-01-08 00:00:00'), 'B', 65473.0, 0, 'senior'],
       ['I

### 41. Group a DataFrame by a column and find the mean.

In [118]:
csv.groupby(["City"])["Salary"].mean()

City
Berlin      0.284719
London      0.442567
New York    0.628703
Paris       0.427918
Tokyo       0.662957
Name: Salary, dtype: float64

### 42. Group by multiple columns and compute the sum.

In [119]:
csv.groupby(["City","Expertise"],observed=False)["Salary"].sum()  # Expertise is categorical; state observed explicitly


City      Expertise
Berlin    juniour      0.224976
          senior       0.629181
London    juniour      0.514052
          senior       0.813650
New York  juniour      1.000000
          senior       0.886110
Paris     juniour      0.000000
          senior       0.855836
Tokyo     juniour      1.658904
          senior       0.329968
Name: Salary, dtype: float64

### 43. Find the maximum value of each group.

In [120]:
csv.groupby(["Expertise"],observed=False)["Salary"].max()


Expertise
juniour    1.000000
senior     0.855836
Name: Salary, dtype: float64

### 44. Find the total count of each group.

In [121]:
(csv.groupby("City")["Name"].count())

City
Berlin      3
London      3
New York    3
Paris       2
Tokyo       3
Name: Name, dtype: int64

### 45. Get the top 3 rows from each group.

In [122]:
csv.groupby(["City","Expertise"],observed=False).head(3)


,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior
7,Hannah,60.0,New York,0.392348,2023-01-08,B,65473.0,0,senior
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
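
`head(3)` returns the first three rows of each group in their current order, and no (City, Expertise) group here has more than two rows, so the output above is simply the whole frame. To get the top three rows by a value, sort first; a sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Paris", "Paris", "Paris", "Paris", "Tokyo", "Tokyo"],
    "Name": ["a", "b", "c", "d", "e", "f"],
    "Salary": [50, 80, 70, 90, 60, 40],
})

# Sorting descending first turns "first 3 per group" into "top 3 by Salary per group".
top3 = df.sort_values("Salary", ascending=False).groupby("City").head(3)
print(top3)
```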


### 46. Use `agg()` to apply multiple aggregation functions.

In [123]:
csv.groupby(["City"])["Salary"].agg(["mean","max","sum"])

,mean,max,sum
City,,,
Berlin,0.284719,0.404205,0.854157
London,0.442567,0.81365,1.327702
New York,0.628703,1.0,1.88611
Paris,0.427918,0.855836,0.855836
Tokyo,0.662957,0.829452,1.988872
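
Passing a list to `agg()` names the result columns after the functions (`mean`, `max`, `sum`). Named aggregation lets you choose the output names and mix source columns; a sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"City": ["Berlin", "Berlin", "Paris"], "Salary": [57141, 65999, 46022]})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby("City").agg(
    mean_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
    n=("Salary", "size"),
)
print(summary)
```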


### 47. Get unique values in a column.

In [124]:
csv["City"].unique()

array(['New York', 'London', 'Tokyo', 'Paris', 'Berlin'], dtype=object)

### 48. Count the occurrences of each unique value in a column.

In [125]:
csv["City"].value_counts()  # occurrences per city; csv["City"].nunique() gives the number of distinct cities (5)

### 49. Compute the cumulative sum of a column.

In [126]:
csv["Age"].cumsum()

0      45.000000
1     103.000000
2     163.000000
3     191.000000
4     240.000000
5     307.000000
6     352.538462
7     412.538462
8     449.538462
9     480.538462
10    517.538462
11    548.538462
12    577.538462
13    637.538462
Name: Age, dtype: float64

### 50. Compute rolling averages with a window size of 3.

In [127]:
csv["Age"].rolling(window=3).mean()

0           NaN
1           NaN
2     54.333333
3     48.666667
4     45.666667
5     48.000000
6     53.846154
7     57.512821
8     47.512821
9     42.666667
10    35.000000
11    33.000000
12    32.333333
13    40.000000
Name: Age, dtype: float64
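
The first two results are NaN because a window of 3 needs three observations. `min_periods` relaxes that; a sketch using the first four ages from the dataset:

```python
import pandas as pd

ages = pd.Series([45.0, 58.0, 60.0, 28.0])

full_window = ages.rolling(window=3).mean()             # NaN until 3 values are available
partial = ages.rolling(window=3, min_periods=1).mean()  # average whatever is available so far

print(full_window.tolist())
print(partial.tolist())
```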

### 51. Merge two DataFrames based on a common column.

In [139]:
left_data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 28, 24],
}

left_df = pd.DataFrame(left_data)

# Sample data for right DataFrame
right_data = {
    'ID': [1, 2, 4, 5],
    'City': ['New York', 'London', 'Paris', 'Berlin'],
    'Salary': [60000, 75000, 62000, 57000],
}

right_df = pd.DataFrame(right_data)

In [129]:
pd.merge(left_df,right_df,on="ID")

,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000
1,2,Bob,30,London,75000
2,4,David,28,Paris,62000
3,5,Eva,24,Berlin,57000


### 52. Perform an inner join on two DataFrames.

In [130]:
pd.merge(left_df,right_df,on="ID",how="inner")

,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000
1,2,Bob,30,London,75000
2,4,David,28,Paris,62000
3,5,Eva,24,Berlin,57000


### 53. Perform a left join on two DataFrames.

In [131]:
pd.merge(left_df,right_df,on="ID",how="left")

,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000.0
1,2,Bob,30,London,75000.0
2,3,Charlie,35,,
3,4,David,28,Paris,62000.0
4,5,Eva,24,Berlin,57000.0


### 54. Perform a right join on two DataFrames.

In [132]:
pd.merge(left_df,right_df,on="ID",how="right")

,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000
1,2,Bob,30,London,75000
2,4,David,28,Paris,62000
3,5,Eva,24,Berlin,57000


### 55. Perform an outer join on two DataFrames.

In [133]:
pd.merge(left_df,right_df,on="ID",how="outer")

,ID,Name,Age,City,Salary
0,1,Alice,25,New York,60000.0
1,2,Bob,30,London,75000.0
2,3,Charlie,35,,
3,4,David,28,Paris,62000.0
4,5,Eva,24,Berlin,57000.0
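
In the outer join above, Charlie's row has no match and shows empty `City` and `Salary`. `indicator=True` makes that explicit with a `_merge` column; a sketch with hypothetical frames where each side has an unmatched key:

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [2, 3, 4], "City": ["London", "Tokyo", "Paris"]})

# indicator=True adds a "_merge" column: left_only, right_only or both.
merged = pd.merge(left, right, on="ID", how="outer", indicator=True)
print(merged)
```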


### 56. Concatenate two DataFrames vertically.

In [134]:
pd.concat([left_df,right_df],axis=0)

,ID,Name,Age,City,Salary
0,1,Alice,25.0,,
1,2,Bob,30.0,,
2,3,Charlie,35.0,,
3,4,David,28.0,,
4,5,Eva,24.0,,
0,1,,,New York,60000.0
1,2,,,London,75000.0
2,4,,,Paris,62000.0
3,5,,,Berlin,57000.0
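
Vertical concatenation keeps each input's index, which is why labels 0 to 3 repeat above, and columns present on only one side are filled with NaN. `ignore_index=True` renumbers the rows; a sketch with hypothetical frames:

```python
import pandas as pd

a = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})
b = pd.DataFrame({"ID": [4, 5], "City": ["Paris", "Berlin"]})

kept = pd.concat([a, b], axis=0)                           # index 0, 1, 0, 1
renumbered = pd.concat([a, b], axis=0, ignore_index=True)  # index 0, 1, 2, 3

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```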


### 57. Concatenate two DataFrames horizontally.

In [135]:
pd.concat([left_df,right_df],axis=1)

,ID,Name,Age,ID,City,Salary
0,1,Alice,25,1.0,New York,60000.0
1,2,Bob,30,2.0,London,75000.0
2,3,Charlie,35,4.0,Paris,62000.0
3,4,David,28,5.0,Berlin,57000.0
4,5,Eva,24,,,


In [136]:
left_data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 28, 24],
}

left_df = pd.DataFrame(left_data)

# Sample data for right DataFrame
right_data = {
    'ID1': [1, 2, 4, 5],
    'City': ['New York', 'London', 'Paris', 'Berlin'],
    'Salary': [60000, 75000, 62000, 57000],
}

right_df = pd.DataFrame(right_data)

### 58. Merge two DataFrames with different column names.

In [137]:
pd.merge(left_df,right_df,left_on="ID",right_on="ID1",how="outer")

,ID,Name,Age,ID1,City,Salary
0,1,Alice,25,1.0,New York,60000.0
1,2,Bob,30,2.0,London,75000.0
2,3,Charlie,35,,,
3,4,David,28,4.0,Paris,62000.0
4,5,Eva,24,5.0,Berlin,57000.0


### 59. Join two DataFrames based on the index.

In [141]:
pd.merge(left_df,right_df,left_index=True,right_index=True,how="outer")

,ID_x,Name,Age,ID_y,City,Salary
0,1,Alice,25,1.0,New York,60000.0
1,2,Bob,30,2.0,London,75000.0
2,3,Charlie,35,4.0,Paris,62000.0
3,4,David,28,5.0,Berlin,57000.0
4,5,Eva,24,,,
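
`DataFrame.join` is a shorthand for the same index-on-index merge. It defaults to `how="left"` and, unlike `merge`, has no default suffixes, so overlapping column names must be given `lsuffix`/`rsuffix`. A sketch with hypothetical frames:

```python
import pandas as pd

left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
right = pd.DataFrame({"ID": [1, 2], "City": ["New York", "London"]})

# join() aligns on the index (left join by default);
# without suffixes the shared "ID" column would raise a ValueError.
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(joined)
```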


### 60. Stack and unstack a DataFrame.

In [147]:
import pandas as pd

# Create a DataFrame with MultiIndex columns
df = pd.DataFrame({
    ('A', 'Math'): [85, 90],
    ('A', 'Science'): [78, 88],
    ('B', 'Math'): [92, 81],
    ('B', 'Science'): [80, 79]
}, index=['Alice', 'Bob'])

# Stack columns into rows
stacked_df = df.stack()

print(stacked_df)

df.unstack()  # flat row index: returns a Series keyed by (column levels..., row label)

                A   B
Alice Math     85  92
      Science  78  80
Bob   Math     90  81
      Science  88  79



A  Math     Alice    85
            Bob      90
   Science  Alice    78
            Bob      88
B  Math     Alice    92
            Bob      81
   Science  Alice    80
            Bob      79
dtype: int64
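
The cell above calls `unstack()` on the original frame; because its row index has a single level, that produces a Series rather than undoing the stack. A sketch of the actual round trip, rebuilding the same scores table and unstacking the stacked result:

```python
import pandas as pd

scores = pd.DataFrame(
    {("A", "Math"): [85, 90], ("A", "Science"): [78, 88],
     ("B", "Math"): [92, 81], ("B", "Science"): [80, 79]},
    index=["Alice", "Bob"],
)

long_form = scores.stack()      # innermost column level (Math/Science) moves into the rows
restored = long_form.unstack()  # innermost row level moves back into the columns

print(long_form)
print(restored)
```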

### 61. Sort a DataFrame based on a single column in ascending order.

In [153]:
csv.sort_values(by="Age",ascending=True)

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour
12,Grace,29.0,Berlin,0.224976,2023-01-07,,57170.0,4,juniour
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
11,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
10,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior


### 62. Sort a DataFrame by multiple columns.

In [154]:
csv.sort_values(by=["Age","Salary"],ascending=[True,False])

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour
12,Grace,29.0,Berlin,0.224976,2023-01-07,,57170.0,4,juniour
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
11,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
10,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior


### 63. Sort a DataFrame in descending order.

In [160]:
csv.sort_values(by="City_code",ascending=False)

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior
12,Grace,29.0,Berlin,0.224976,2023-01-07,,57170.0,4,juniour
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
10,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour


### 64. Rank values in a column.

In [162]:
csv["Rank"]=csv["Age"].rank(ascending=False,method="average")
csv["Rank"]

0      8.0
1      5.0
2      3.0
3     14.0
4      6.0
5      1.0
6      7.0
7      3.0
8      9.5
9     11.5
10     9.5
11    11.5
12    13.0
13     3.0
Name: Rank, dtype: float64
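
With `method="average"`, the three 60-year-olds (ranks 2, 3 and 4) all get 3.0, and the two 37s and two 31s get 9.5 and 11.5. Other tie-breaking methods behave differently; a sketch on a small hypothetical Series:

```python
import pandas as pd

ages = pd.Series([60, 58, 60, 67])

average = ages.rank(ascending=False)                # tied 60s share ranks 2 and 3 -> 2.5
minimum = ages.rank(ascending=False, method="min")  # ties take the lowest rank -> 2
dense = ages.rank(ascending=False, method="dense")  # like "min", but no gap after the tie

print(average.tolist())  # [2.5, 4.0, 2.5, 1.0]
print(minimum.tolist())  # [2.0, 4.0, 2.0, 1.0]
print(dense.tolist())    # [2.0, 3.0, 2.0, 1.0]
```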

### 65. Get the row with the highest value in a specific column.

In [171]:
csv.loc[csv["Age"].idxmax()]


Name                       Frank
Age                         67.0
City                       Paris
Salary                  0.855836
Date         2023-01-06 00:00:00
Category                       A
Sum                      88387.0
City_code                      3
Expertise                 senior
Rank                         1.0
Name: 5, dtype: object

In [None]:
csv.nlargest(1,"Age")

In [167]:
csv.sort_values(by="Age",ascending=False).iloc[0]

Name                       Frank
Age                         67.0
City                       Paris
Salary                  0.855836
Date         2023-01-06 00:00:00
Category                       A
Sum                      88387.0
City_code                      3
Expertise                 senior
Rank                         1.0
Name: 5, dtype: object

### 66. Get the row with the lowest value in a specific column.

In [172]:
csv.sort_values(by="Age",ascending=True).iloc[0]

Name                       David
Age                         28.0
City                       Paris
Salary                       0.0
Date         2023-01-04 00:00:00
Category                       A
Sum                      46050.0
City_code                      3
Expertise                juniour
Rank                        14.0
Name: 3, dtype: object

### 67. Shuffle the rows of a DataFrame randomly.

In [173]:
csv.sample(frac=1,random_state=42).reset_index(drop=True)

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise,Rank
0,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour,11.5
1,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour,11.5
2,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour,8.0
3,Grace,29.0,Berlin,0.224976,2023-01-07,,57170.0,4,juniour,13.0
4,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior,1.0
5,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour,9.5
6,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior,3.0
7,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior,5.0
8,Hannah,60.0,New York,0.493763,2023-01-08,B,70485.230769,0,senior,3.0
9,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior,6.0


### 68. Reset the index of a DataFrame.

In [174]:
csv.reset_index(drop=True)

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise,Rank
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour,8.0
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior,5.0
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior,3.0
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour,14.0
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior,6.0
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior,1.0
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior,7.0
7,Hannah,60.0,New York,0.392348,2023-01-08,B,65473.0,0,senior,3.0
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour,9.5
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour,11.5


### 69. Set a column as the index of a DataFrame.

In [176]:
csv.set_index("Age")

,Name,City,Salary,Date,Category,Sum,City_code,Expertise,Rank
Age,,,,,,,,,
45.0,Alice,New York,1.0,2023-01-01,A,95490.0,0,juniour,8.0
58.0,Bob,London,0.81365,2023-01-02,B,86293.0,1,senior,5.0
60.0,Charlie,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior,3.0
28.0,David,Paris,0.0,2023-01-04,A,46050.0,3,juniour,14.0
49.0,Eva,Berlin,0.404205,2023-01-05,B,66048.0,4,senior,6.0
67.0,Frank,Paris,0.855836,2023-01-06,A,88387.0,3,senior,1.0
45.538462,Grace,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior,7.0
60.0,Hannah,New York,0.392348,2023-01-08,B,65473.0,0,senior,3.0
37.0,Ivan,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour,9.5
31.0,Jack,London,0.257026,2023-01-10,B,58756.0,1,juniour,11.5


### 70. Sort index values in a DataFrame.

In [179]:
csv.sort_index()

,Name,Age,City,Salary,Date,Category,Sum,City_code,Expertise,Rank
0,Alice,45.0,New York,1.0,2023-01-01,A,95490.0,0,juniour,8.0
1,Bob,58.0,London,0.81365,2023-01-02,B,86293.0,1,senior,5.0
2,Charlie,60.0,Tokyo,0.329968,2023-01-03,C,62390.0,2,senior,3.0
3,David,28.0,Paris,0.0,2023-01-04,A,46050.0,3,juniour,14.0
4,Eva,49.0,Berlin,0.404205,2023-01-05,B,66048.0,4,senior,6.0
5,Frank,67.0,Paris,0.855836,2023-01-06,A,88387.0,3,senior,1.0
6,Grace,45.538462,Berlin,0.224976,2023-01-07,C,57186.538462,4,senior,7.0
7,Hannah,60.0,New York,0.392348,2023-01-08,B,65473.0,0,senior,3.0
8,Ivan,37.0,Tokyo,0.829452,2023-01-09,A,87053.0,2,juniour,9.5
9,Jack,31.0,London,0.257026,2023-01-10,B,58756.0,1,juniour,11.5


### 71. Plot a simple line chart from a DataFrame.

### 72. Create a bar plot for a categorical column.

### 73. Generate a histogram for a numerical column.

### 74. Create a scatter plot between two numerical columns.

### 75. Generate a box plot for a column.

### 76. Create a pie chart using Pandas.

### 77. Visualize the correlation matrix of a DataFrame.
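
A hedged sketch of the computation behind this question only: `DataFrame.corr(numeric_only=True)` builds the correlation matrix and skips text columns; drawing it as a heatmap (for example with matplotlib) is not shown. The values are a few rows borrowed from the dataset above:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [28, 45, 58, 67],
    "Salary": [46022, 95445, 86235, 88320],
    "City": ["Paris", "New York", "London", "Paris"],
})

# numeric_only=True leaves out the text column instead of raising an error.
corr = df.corr(numeric_only=True)
print(corr)
```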

### 78. Plot multiple line charts in a single figure.

### 79. Change the style and color of a plot.
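A sketch with hypothetical data: `plt.style.context()` applies a built-in matplotlib style sheet only inside the `with` block (`plt.style.available` lists the others), while `color`, `linestyle`, `linewidth` and `marker` customise the individual lines.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily sales for two products
df = pd.DataFrame(
    {"Product A": [100, 120, 90, 150], "Product B": [80, 95, 110, 105]},
    index=pd.date_range("2024-03-01", periods=4, freq="D"),
)

# The "ggplot" style sheet only affects figures created inside this block
with plt.style.context("ggplot"):
    # color takes one colour per column; the other keywords go to matplotlib
    ax = df.plot(
        color=["tab:red", "tab:green"],
        linestyle="--",
        linewidth=2,
        marker="o",
        title="Styled line chart",
    )
```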

### 80. Save a plot as an image file.
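A sketch that saves a pandas plot through its underlying matplotlib figure. The file name `sales_plot.png` is hypothetical; `savefig` infers the format from the extension, and `bbox_inches="tight"` trims surrounding whitespace.

```python
import os
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Sales": [100, 120, 90, 150, 180]},
    index=pd.date_range("2024-03-01", periods=5, freq="D"),
)

ax = df.plot(title="Daily sales")
fig = ax.get_figure()

# Hypothetical output file in the working directory; .pdf or .svg work the same way
out_path = "sales_plot.png"
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure once it is saved
print(os.path.abspath(out_path))
```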

### 81. Use a MultiIndex in a DataFrame.

In [180]:
dat = pd.DataFrame(
    [
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ],
    index=["A", "B", "C", "D"],
    # Two column levels: college (outer) x subject (inner)
    columns=pd.MultiIndex.from_product([["BMSCE", "VIT"], ["ML", "DSA"]]),
)

dat

,BMSCE,BMSCE,VIT,VIT
,ML,DSA,ML,DSA
A,1,1,0,0
B,0,0,1,1
C,1,1,0,0
D,0,0,1,1
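Selecting data from two-level columns, sketched on a rebuilt copy of `dat` so the cell runs on its own: an outer label returns a sub-frame, a tuple picks one exact column, and `xs()` slices an inner level across every outer label.

```python
import pandas as pd

dat = pd.DataFrame(
    [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]],
    index=["A", "B", "C", "D"],
    columns=pd.MultiIndex.from_product([["BMSCE", "VIT"], ["ML", "DSA"]]),
)

bmsce = dat["BMSCE"]                    # outer label -> sub-frame with columns ML, DSA
vit_ml = dat[("VIT", "ML")]             # full tuple key -> a single column (Series)
ml_all = dat.xs("ML", axis=1, level=1)  # every "ML" column, one per college

print(ml_all)
```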


### 82. Work with time-series data and resample it.

In [181]:
import pandas as pd

# Sample time-series data
data = {
    'Date': pd.date_range(start='2024-03-01', periods=10, freq='D'),
    'Sales': [100, 120, 90, 150, 180, 130, 170, 200, 190, 160]
}

df = pd.DataFrame(data)
df.set_index('Date', inplace=True)  # Set 'Date' as the index

print(df)


# "W" (alias of "W-SUN") groups rows into weeks ending on Sunday; each label is the week-end date
df.resample("W").mean()


            Sales
Date             
2024-03-01    100
2024-03-02    120
2024-03-03     90
2024-03-04    150
2024-03-05    180
2024-03-06    130
2024-03-07    170
2024-03-08    200
2024-03-09    190
2024-03-10    160


,Sales
Date,
2024-03-03,103.333333
2024-03-10,168.571429


### 83. Calculate the rolling mean of a column.

In [182]:
csv["City_code"].rolling(window=2).mean()

0     NaN
1     0.5
2     1.5
3     2.5
4     3.5
5     3.5
6     3.5
7     2.0
8     1.0
9     1.5
10    1.5
11    1.5
12    2.5
13    2.0
Name: City_code, dtype: float64

### 84. Perform cross-tabulation of two columns.

In [183]:
pd.crosstab(csv["Expertise"],csv["City"])

City,Berlin,London,New York,Paris,Tokyo
Expertise,,,,,
juniour,1,2,1,1,2
senior,2,1,2,1,1


### 85. Create a pivot table from a DataFrame.

In [184]:
# Duplicate Name/Expertise pairs are averaged (aggfunc="mean" is the default)
csv.pivot_table(index="Name", columns="Expertise", values="Age", aggfunc="mean")



Expertise,juniour,senior
Name,,
Alice,45.0,
Bob,,58.0
Charlie,,60.0
David,28.0,
Eva,,49.0
Frank,,67.0
Grace,29.0,45.538462
Hannah,,60.0
Ivan,37.0,
Jack,31.0,


### 86. Apply a lambda function to a DataFrame.

In [185]:
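# With axis=1 the lambda receives one row (a Series) at a time
csv["Sum"] = csv.apply(lambda row: row["Age"] + row["Salary"], axis=1)
# Vectorized equivalent, usually much faster: csv["Age"] + csv["Salary"]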

### 87. Find and remove outliers using IQR method.

In [None]:
import pandas as pd

def remove_outliers(df, columns):
    """Removes outliers in specified columns based on the IQR method."""
    cleaned_df = df.copy()

    for col in columns:
        if col not in df.columns:
            print(f"Warning: Column '{col}' not found in DataFrame!")
            continue  # Skip missing columns
        
        Q1 = cleaned_df[col].quantile(0.25)
        Q3 = cleaned_df[col].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        
        # Keep only the rows within the bounds
        cleaned_df = cleaned_df[(cleaned_df[col] >= lower_bound) & (cleaned_df[col] <= upper_bound)]
    
    return cleaned_df

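# Example DataFrame: one clear outlier per column.
# (With only 7 rows and 2 extreme values each, Q3 itself gets pulled up and nothing is flagged.)
data = pd.DataFrame({
    'A': [10, 12, 14, 100, 15, 18, 13, 11],  # 100 lies far above Q3 + 1.5 * IQR
    'B': [5, 8, 7, 9, 6, 200, 7, 8]          # 200 lies far above Q3 + 1.5 * IQR
})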

# Remove Outliers
cleaned_data = remove_outliers(data, ['A', 'B'])
print(cleaned_data)

### 88. Detect and replace outliers using the Z-score method.

In [None]:
import pandas as pd
import numpy as np

def replace_outliers_zscore(df, columns, threshold=3):
    """Replaces values whose |Z-score| exceeds `threshold` with the column median."""
    cleaned_df = df.copy()

    for col in columns:
        if col not in df.columns:
            print(f"Warning: Column '{col}' not found in DataFrame!")
            continue  # Skip missing columns

        # Compute Z-scores (sample standard deviation, ddof=1)
        mean = cleaned_df[col].mean()
        std = cleaned_df[col].std()
        z_scores = (cleaned_df[col] - mean) / std

        # Flag outliers, then overwrite them with the median (robust to the outliers themselves)
        is_outlier = np.abs(z_scores) > threshold
        cleaned_df[col] = cleaned_df[col].mask(is_outlier, cleaned_df[col].median())

    return cleaned_df

# Example DataFrame. With n rows, |z| can never exceed (n - 1) / sqrt(n),
# so a threshold of 3 needs at least 11 rows before anything can be flagged.
data = pd.DataFrame({
    'A': [10, 12, 14, 11, 13, 15, 12, 14, 13, 11, 12, 100],  # 100 is an outlier
    'B': [5, 8, 7, 9, 6, 200, 8, 6, 7, 9, 8, 7]              # 200 is an outlier
})

# Replace Outliers Using Z-score Method
cleaned_data = replace_outliers_zscore(data, ['A', 'B'])
print(cleaned_data)

### 89. Compute weighted averages using Pandas.

In [188]:
import pandas as pd
import numpy as np

# Example DataFrame
data = pd.DataFrame({
    'Value': [10, 20, 30, 40, 50],
    'Weight': [1, 2, 3, 4, 5]  # Higher weight means more influence
})

# Compute Weighted Average
weighted_avg = np.average(data['Value'], weights=data['Weight'])
print("Weighted Average:", weighted_avg)

Weighted Average: 36.666666666666664


In [189]:
weighted_avg = (data['Value'] * data['Weight']).sum() / data['Weight'].sum()
print("Weighted Average:", weighted_avg)

Weighted Average: 36.666666666666664


### 90. Convert a DataFrame from wide to long format.
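A sketch with a hypothetical score table: `melt()` keeps the identifier columns and stacks every other column into `(variable, value)` pairs, and `pivot()` reverses the operation.

```python
import pandas as pd

# Hypothetical wide table: one column per subject
wide = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Math": [90, 75],
    "Physics": [85, 80],
})

# Wide -> long: one row per (Name, Subject) pair
long = wide.melt(id_vars="Name", var_name="Subject", value_name="Score")
print(long)

# Long -> wide again
back = long.pivot(index="Name", columns="Subject", values="Score")
```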

### 91. Read a large CSV file in chunks using `chunksize`.
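With `chunksize`, `read_csv` returns an iterator of DataFrames instead of one large frame, so only one chunk is held in memory at a time. In this sketch a small in-memory CSV (via `io.StringIO`) stands in for a large file; with real data you would pass a file path. The running total shows how to aggregate across chunks.

```python
import io
import pandas as pd

# Small in-memory CSV standing in for a large file on disk
csv_text = """Name,City,Salary
Alice,New York,95445
Bob,London,86235
Charlie,Tokyo,62330
David,Paris,46022
Eva,Berlin,65999
"""

total_salary = 0
n_rows = 0
n_chunks = 0

# The reader is a context manager, which closes the underlying file when done
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:  # each chunk is a DataFrame of at most 2 rows
        total_salary += chunk["Salary"].sum()
        n_rows += len(chunk)
        n_chunks += 1

print(f"{n_chunks} chunks, {n_rows} rows, mean salary {total_salary / n_rows:.1f}")
```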

### 92. Use `astype('category')` to optimize memory usage.
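The `category` dtype stores each distinct label once and replaces the values with small integer codes. The savings are large only when a column has few distinct values relative to its length. A sketch with a hypothetical repetitive `City` column:

```python
import pandas as pd

# Hypothetical repetitive column: 5 distinct cities repeated 2,000 times
cities = ["New York", "London", "Tokyo", "Paris", "Berlin"]
df = pd.DataFrame({"City": cities * 2000})

before = df["City"].memory_usage(deep=True)  # deep=True counts the string payloads too
df["City"] = df["City"].astype("category")   # integer codes + one copy of each label
after = df["City"].memory_usage(deep=True)

print(f"before: {before:,} bytes -> category: {after:,} bytes")
```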

### 93. Convert a DataFrame to a dictionary.

In [190]:
dict(csv)

{'Name': 0       Alice
 1         Bob
 2     Charlie
 3       David
 4         Eva
 5       Frank
 6       Grace
 7      Hannah
 8        Ivan
 9        Jack
 10       Ivan
 11       Jack
 12      Grace
 13     Hannah
 Name: Name, dtype: object,
 'Age': 0     45.000000
 1     58.000000
 2     60.000000
 3     28.000000
 4     49.000000
 5     67.000000
 6     45.538462
 7     60.000000
 8     37.000000
 9     31.000000
 10    37.000000
 11    31.000000
 12    29.000000
 13    60.000000
 Name: Age, dtype: float64,
 'City': 0     New York
 1       London
 2        Tokyo
 3        Paris
 4       Berlin
 5        Paris
 6       Berlin
 7     New York
 8        Tokyo
 9       London
 10       Tokyo
 11      London
 12      Berlin
 13    New York
 Name: City, dtype: object,
 'Salary': 0     1.000000
 1     0.813650
 2     0.329968
 3     0.000000
 4     0.404205
 5     0.855836
 6     0.224976
 7     0.392348
 8     0.829452
 9     0.257026
 10    0.829452
 11    0.257026
 12    0.224976
 13
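As the output shows, `dict(csv)` maps each column name to a pandas Series. To get plain Python containers instead, use `DataFrame.to_dict()`, whose `orient` argument controls the layout. A sketch on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [45, 58]})

by_column = df.to_dict()                # column -> {index -> value}
records = df.to_dict(orient="records")  # list with one {column -> value} dict per row
as_lists = df.to_dict(orient="list")    # column -> list of values

print(records)
```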

### 94. Use `to_parquet()` for faster storage than CSV.
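Parquet is a compressed, columnar binary format that preserves dtypes, so on large data it is typically faster to read and write than CSV. On a toy frame like this one, the Parquet file can even be larger because of its metadata. This sketch assumes a Parquet engine (`pyarrow` or `fastparquet`) may be installed and prints a message if it is not. The file names are hypothetical.

```python
import os
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [45, 58, 60],
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
})

df.to_csv("sample.csv", index=False)

try:
    df.to_parquet("sample.parquet", index=False)  # needs pyarrow or fastparquet
    roundtrip = pd.read_parquet("sample.parquet")
    print(roundtrip.dtypes)  # Date comes back as datetime64, unlike a plain read_csv
    print("CSV:", os.path.getsize("sample.csv"), "bytes | Parquet:",
          os.path.getsize("sample.parquet"), "bytes")
except ImportError:
    roundtrip = None
    print("to_parquet() needs pyarrow or fastparquet: pip install pyarrow")
```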

### 95. Use `query()` for fast filtering instead of Boolean indexing.
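The sketch below applies the same filter both ways to hypothetical data. The main benefit of `query()` is readability. It can also be faster on large frames when the optional `numexpr` package is installed, but on small frames it is usually no faster than Boolean indexing. Inside the query string, `@name` refers to a local Python variable.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [45, 58, 60, 28, 49],
    "City": ["New York", "London", "Tokyo", "Paris", "Berlin"],
})
min_age = 40

# Boolean indexing: each condition in parentheses, combined with & / |
mask_result = df[(df["Age"] > min_age) & (df["City"] != "Tokyo")]

# query(): the same filter written as one expression string
query_result = df.query("Age > @min_age and City != 'Tokyo'")

print(query_result)
```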

### 96. Use `explode()` to split lists in a column into separate rows.
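A sketch with a hypothetical comma-separated `Skills` column: first split the strings into lists, then `explode()` emits one row per list element and repeats the other columns. An empty list becomes a single row with NaN.

```python
import pandas as pd

# Hypothetical column holding comma-separated skills
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Skills": ["Python,SQL", "Excel"]})

df["Skills"] = df["Skills"].str.split(",")          # strings -> lists
exploded = df.explode("Skills", ignore_index=True)  # one row per list element, fresh 0..n-1 index

print(exploded)
```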

### 97. Convert timestamps to Unix format for efficiency.
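A sketch that converts datetimes to integer seconds since the Unix epoch (1970-01-01 UTC), assuming the timestamps are naive and already in UTC. Localize or convert time zones first if they are not. Subtracting the epoch and floor-dividing by one second works regardless of the column's internal resolution (nanoseconds or microseconds).

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2023-01-01 00:00:00", "2023-01-02 12:00:00"])})

# Seconds since 1970-01-01 00:00:00 (naive timestamps treated as UTC)
epoch = pd.Timestamp("1970-01-01")
df["Unix"] = (df["Date"] - epoch) // pd.Timedelta(seconds=1)

# And back to datetimes
df["Restored"] = pd.to_datetime(df["Unix"], unit="s")
print(df)
```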

### 98. Parallelize Pandas operations using `modin.pandas`.
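Modin is a separate package that mirrors much of the pandas API and spreads work across CPU cores using an engine such as Ray or Dask. The usual pattern is to swap the import. This sketch assumes Modin may or may not be installed (for example via `pip install "modin[ray]"` or `"modin[dask]"`) and falls back to plain pandas so it still runs. Speedups appear only on large data; on small frames, scheduling overhead usually makes Modin slower.

```python
try:
    import modin.pandas as mpd  # drop-in replacement for most of the pandas API
    backend = "modin"
except ImportError:
    import pandas as mpd  # fall back to plain pandas so the sketch still runs
    backend = "pandas"

# The rest of the code is ordinary pandas syntax
df = mpd.DataFrame({"Group": ["a", "b", "a", "b"], "Value": [1, 2, 3, 4]})
result = df.groupby("Group")["Value"].sum()

print(backend, result.to_dict())
```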

### 99. Use `memory_usage()` to check DataFrame memory consumption.
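A sketch with hypothetical data. `memory_usage()` returns bytes per column plus an `Index` entry. By default, object (string) columns count only their pointers, so pass `deep=True` to measure the strings themselves.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [45, 58, 60],
    "Salary": [95445.0, 86235.0, 62330.0],
})

shallow = df.memory_usage()        # bytes per column, plus "Index"
deep = df.memory_usage(deep=True)  # also measures the string objects in Name
print(deep)
print(f"Total: {deep.sum() / 1024:.2f} KiB")

df.info(memory_usage="deep")       # the same deep total in the info() summary
```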

### 100. Use `df.style.format()` to display numbers with specific formatting.