# Pandas Practice Questions

Run the cell below to import pandas and NumPy before starting.

In [None]:
import pandas as pd
import numpy as np

## TOPIC 1: BASICS & INSPECTION

**1. How do you load a CSV file named 'data.csv' into a Pandas DataFrame?**

In [None]:
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())  # optional, to check first few rows


**2. How can you display the first 5 rows and the last 5 rows of a DataFrame?**

In [None]:
# First 5 rows
print(df.head())

# Last 5 rows
print(df.tail())


**3. What is the difference between `df.info()` and `df.describe()`?**

| Method | Purpose |
| --- | --- |
| `df.info()` | Structural summary of the DataFrame: number of rows and columns, non-null counts, data types, and memory usage. |
| `df.describe()` | Statistical summary (count, mean, std, min, 25%, 50%, 75%, max); covers only numeric columns by default. |
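
To make the contrast concrete, here is a minimal sketch on a small made-up DataFrame. The name `sample` and its columns and values are illustrative assumptions, not part of any real dataset.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, np.nan],
    'Salary': [50000, 60000, 70000],
})

# info() prints its report directly and returns None
sample.info()

# describe() returns a new DataFrame of summary statistics;
# the non-numeric 'Name' column is left out by default
summary = sample.describe()
print(summary)
```

Note that the `count` row of `describe()` counts only non-null values, so it can differ between columns.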


**4. How do you check the data types of all columns in a DataFrame?**

In [None]:
print(df.dtypes)


**5. How do you find the shape (number of rows and columns) of a DataFrame?**

In [None]:
print(df.shape)


## TOPIC 2: SELECTION & INDEXING

**6. Explain the difference between `.loc[]` and `.iloc[]`.**

| Feature | `.loc[]` | `.iloc[]` |
| --- | --- | --- |
| Indexing type | Label-based (uses row/column labels) | Integer position-based (uses row/column positions) |
| Example usage | `df.loc[2, 'Age']` → row with label 2, column 'Age' | `df.iloc[2, 1]` → 3rd row, 2nd column |
| Slicing | End label is included | End position is excluded (like standard Python) |

In [None]:
# Using loc (label)
df.loc[0, 'Name']

# Using iloc (position)
df.iloc[0, 1]
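
The slicing difference is easiest to see when row labels are not integers. The sketch below uses a hypothetical `people` DataFrame with string labels; the names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with string labels, so labels and positions differ
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 35, 40]},
    index=['a', 'b', 'c', 'd'],
)

# Label slice: the end label 'c' is included (3 rows)
by_label = people.loc['a':'c']

# Position slice: the end position 2 is excluded (2 rows)
by_position = people.iloc[0:2]

print(by_label)
print(by_position)
```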


**7. How do you select a specific column named 'Age' from a DataFrame?**

In [None]:
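# Option 1: attribute access (only works when the column name is a valid
# Python identifier and does not clash with a DataFrame attribute or method)
ages = df.Age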

# Option 2: using column name as key
ages = df['Age']

print(ages)


**8. How would you filter a DataFrame to show only rows where the 'Salary' column is greater than 50,000?**

In [None]:
filtered_df = df[df['Salary'] > 50000]
print(filtered_df)


**9. How do you select multiple columns (e.g., 'Name' and 'Age') at once?**

In [None]:
selected = df[['Name', 'Age']]
print(selected)


**10. How do you filter rows based on multiple conditions? (e.g., 'Age' > 30 AND 'Department' == 'Sales')**

In [None]:
filtered = df[(df['Age'] > 30) & (df['Department'] == 'Sales')]
print(filtered)
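
Each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators such as `>` and `==`. A minimal sketch on a hypothetical `staff` table (names and values are made up) shows AND next to OR:

```python
import pandas as pd

# Hypothetical staff table, for illustration only
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [28, 35, 42, 31],
    'Department': ['Sales', 'Sales', 'HR', 'IT'],
})

# AND: both conditions must hold
both = staff[(staff['Age'] > 30) & (staff['Department'] == 'Sales')]

# OR: at least one condition must hold
either = staff[(staff['Age'] > 30) | (staff['Department'] == 'Sales')]

print(both)
print(either)
```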


## TOPIC 3: DATA CLEANING

**11. How do you check for null (missing) values in each column?**

In [None]:
# Returns True for missing values
print(df.isnull())

# Count of missing values per column
print(df.isnull().sum())


**12. How do you fill missing values in the 'Age' column with the mean age?**

In [None]:
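# Assign the result back: calling fillna(..., inplace=True) on a selected
# column is deprecated in recent pandas and may not update df
df['Age'] = df['Age'].fillna(df['Age'].mean())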
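
A minimal sketch on made-up data, showing that `mean()` skips missing values and that assigning the result of `fillna()` back to the column fills the gap. The `ages` DataFrame is a hypothetical example.

```python
import pandas as pd
import numpy as np

# Hypothetical column with one missing value
ages = pd.DataFrame({'Age': [20.0, np.nan, 40.0]})

# mean() ignores NaN by default, so this is (20 + 40) / 2
mean_age = ages['Age'].mean()

# Replace the missing value with the mean and store the result
ages['Age'] = ages['Age'].fillna(mean_age)
print(ages)
```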


**13. How do you drop rows that contain any missing values?**

In [None]:
df_clean = df.dropna()



**14. How do you check for and remove duplicate rows?**

In [None]:
# Check duplicates (returns True/False for each row)
print(df.duplicated())

# Remove duplicate rows
df = df.drop_duplicates()
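
As a small illustration, the sketch below counts duplicates before removing them. The `rows` DataFrame is a hypothetical example; by default `duplicated()` marks every repeat after the first occurrence.

```python
import pandas as pd

# Hypothetical data in which the 'Bob' row appears twice
rows = pd.DataFrame({
    'ID': [1, 2, 2, 3],
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
})

# Boolean mask: True only for the second 'Bob' row
dup_mask = rows.duplicated()

# Summing the mask gives the number of duplicate rows
n_dups = int(dup_mask.sum())

# Keep the first occurrence of each row and renumber the index
deduped = rows.drop_duplicates().reset_index(drop=True)

print(n_dups)
print(deduped)
```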



**15. How do you rename a column from 'emp_id' to 'EmployeeID'?**

In [None]:
df.rename(columns={'emp_id':'EmployeeID'}, inplace=True)
print(df.head())


## TOPIC 4: MANIPULATION

**16. How do you add a new column 'Bonus' that is 10% of the 'Salary' column?**

In [None]:
df['Bonus'] = df['Salary'] * 0.10
print(df.head())


**17. How do you drop a column named 'Temporary' from the DataFrame?**

In [None]:
df.drop(columns=['Temporary'], inplace=True)
print(df.head())


**18. How do you sort a DataFrame by 'Date' in descending order?**

In [None]:
df_sorted = df.sort_values(by='Date', ascending=False)
print(df_sorted.head())


**19. How do you apply a custom function to every element in a column using `.apply()`?**

In [None]:
def add_tax(salary):
    return salary * 1.10

df['Salary_with_Tax'] = df['Salary'].apply(add_tax)
print(df.head())

# Equivalent, using a lambda function
df['Salary_with_Tax'] = df['Salary'].apply(lambda x: x * 1.10)


**20. How do you convert a column 'Date' (currently string) into a datetime object?**

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)


## TOPIC 5: AGGREGATION

**21. How do you calculate the average 'Salary' for each 'Department' using `groupby`?**

In [None]:
avg_salary = df.groupby('Department')['Salary'].mean()
print(avg_salary)


**22. How can you get multiple statistics (mean, min, max) for a group at once?**

In [None]:
stats = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max'])
print(stats)


**23. Create a pivot table showing mean Salary for each Department broken down by JobTitle.**

In [None]:
pivot = df.pivot_table(values='Salary', index='Department', columns='JobTitle', aggfunc='mean')
print(pivot)
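
To show what the result looks like, here is a minimal sketch on a hypothetical `emp` table (all values are made up). Combinations that never occur in the data, such as an HR Analyst here, appear as NaN.

```python
import pandas as pd

# Hypothetical employee records, for illustration only
emp = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'Sales', 'HR'],
    'JobTitle': ['Manager', 'Analyst', 'Analyst', 'Manager'],
    'Salary': [90000, 50000, 60000, 80000],
})

# One row per Department, one column per JobTitle, cells hold mean Salary
table = emp.pivot_table(values='Salary', index='Department',
                        columns='JobTitle', aggfunc='mean')
print(table)
```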


**24. How do you count the number of unique values in the 'City' column?**

In [None]:
unique_count = df['City'].nunique()
print(unique_count)


## TOPIC 6: MERGING

**25. How do you concatenate two DataFrames vertically (stacking them)?**

In [None]:
import pandas as pd

# Example DataFrames
df1 = pd.DataFrame({'ID':[1,2], 'Name':['Alice','Bob']})
df2 = pd.DataFrame({'ID':[3,4], 'Name':['Charlie','David']})

# Vertical concatenation
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)


**26. How do you merge two DataFrames (`df1` and `df2`) on a common key 'ID'?**

In [None]:
df1 = pd.DataFrame({'ID':[1,2,3], 'Name':['Alice','Bob','Charlie']})
df2 = pd.DataFrame({'ID':[2,3,4], 'Salary':[50000,60000,70000]})

merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)


**27. What is the difference between an inner join and a left join?**

| Join type | Description | Result |
| --- | --- | --- |
| Inner | Keeps only rows whose key appears in both DataFrames | Intersection of keys |
| Left | Keeps every row from the left DataFrame; columns from the right DataFrame are NaN where no key matches | Left DataFrame preserved |
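
The difference can be checked on the same sample data as question 26, renamed here to `left` and `right` for clarity. This is a minimal sketch, not the only way to compare joins.

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})

# Inner join: only IDs present in both frames (2 and 3)
inner = pd.merge(left, right, on='ID', how='inner')

# Left join: every ID from `left` (1, 2, 3); Salary is NaN for ID 1
left_join = pd.merge(left, right, on='ID', how='left')

print(inner)
print(left_join)
```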

## TOPIC 7: TIME SERIES

**28. How do you set the 'Date' column as the index of the DataFrame?**

In [None]:
df.set_index('Date', inplace=True)
print(df.head())


**29. If you have a datetime index, how do you resample the data to find the monthly average?**

In [None]:
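# Assumes 'Date' is already a datetime index (see questions 20 and 28);
# setting it again here would fail because the column no longer exists.
# 'ME' (month end) is the alias in pandas 2.2+; use 'M' on older versions.
# numeric_only=True skips text columns, which cannot be averaged.
monthly_avg = df.resample('ME').mean(numeric_only=True)
print(monthly_avg)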
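
A self-contained sketch of monthly resampling on a hypothetical `sales` DataFrame (dates and values are made up). It uses the `'MS'` alias, which labels each monthly bin by its first day, as an assumption-free choice across pandas versions; the month-end alias is `'ME'` in pandas 2.2+ and `'M'` before that.

```python
import pandas as pd

# Hypothetical sales figures spanning two months
sales = pd.DataFrame(
    {'Sales': [100, 200, 300, 500]},
    index=pd.to_datetime(
        ['2024-01-05', '2024-01-20', '2024-02-03', '2024-02-25']
    ),
)

# Group rows into calendar months and average each group;
# 'MS' labels each bin with the first day of the month
monthly = sales.resample('MS').mean()
print(monthly)
```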


**30. How do you optimize memory by converting a string column to 'category' type?**

In [None]:
df['City'] = df['City'].astype('category')
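# info() prints its own report and returns None, so no print() is needed;
# memory_usage='deep' measures the actual memory held by string values
df.info(memory_usage='deep')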
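
The saving is largest when a column has few distinct values repeated many times. A minimal sketch on a hypothetical `cities` Series (three made-up values repeated 10,000 times) compares memory before and after the conversion:

```python
import pandas as pd

# Hypothetical column: 30,000 rows but only 3 distinct values
cities = pd.Series(['Paris', 'London', 'Tokyo'] * 10000)

# Category dtype stores each distinct value once plus small integer codes
as_category = cities.astype('category')

# deep=True counts the memory used by the string contents themselves
before = cities.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)

print(before, after)
```

For columns where nearly every value is unique, the conversion may save little or nothing, so it is worth measuring before and after.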
