# Assignment 8

## 1. What is Pandas and how does it differ from NumPy? What are its main data structures?
**Pandas** is a Python library for manipulating and analyzing labeled, tabular data, built on top of NumPy. It provides two primary data structures:
- `Series`: a one-dimensional labeled array.
- `DataFrame`: a two-dimensional labeled table whose columns can hold different data types.

**Difference from NumPy:**
- NumPy focuses on fast numerical operations on homogeneous arrays that are addressed by position.
- Pandas adds row and column labels, columns of mixed data types, and built-in handling of missing data.
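
The differences listed above can be sketched as follows. This is a minimal illustration, not a full comparison; the names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype and is addressed by position.
arr = np.array([25, 30, 35])

# A Series wraps the same values with labels (illustrative names).
ages = pd.Series(arr, index=['Aman', 'Binod', 'Chetan'], name='Age')
print(ages['Binod'])  # label-based lookup instead of a position

# A DataFrame can mix dtypes column by column and mark missing values.
df = pd.DataFrame({
    'Age': [25, 30, None],
    'City': ['New Delhi', 'Jaipur', 'Chandigarh'],
})
print(df.dtypes)
print(df['Age'].isna().sum())  # number of missing ages
```

The `None` in the `Age` column is stored as `NaN`, which is why that column becomes a floating-point column.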


In [None]:
import pandas as pd

# Coding Challenge
data = {
    'Name': ['Aman', 'Binod', 'Chetan'],
    'Age': [25, 30, 35],
    'City': ['New Delhi', 'Jaipur', 'Chandigarh']
}
df = pd.DataFrame(data)
print(df.head())


## 2. How can you read and write data to various file formats (e.g., CSV, Excel) using Pandas?
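Pandas pairs a `read_*` function with a `to_*` method for each format: `read_csv()` and `to_csv()` for CSV files, `read_excel()` and `to_excel()` for Excel workbooks. Excel support depends on an optional engine such as `openpyxl` for `.xlsx` files.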
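
A CSV round trip can be tried without touching the file system. The sketch below assumes an in-memory text buffer stands in for a real file, and the sample rows are invented for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Aman', 'Binod'], 'Age': [25, 30]})

# With no path given, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts any file-like object, not only a file path.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Passing `index=False` keeps the row index out of the output, so the restored table has the same two columns as the original.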


In [None]:
import pandas as pd

# Coding Challenge
df = pd.read_csv('data.csv')  # Read CSV (assumes data.csv exists in the working directory)
df = df.dropna()  # Remove rows with missing values
df.to_excel('cleaned_data.xlsx', index=False)  # Write to Excel without the index column


## 3. What are some common methods for indexing and selecting data in a Pandas DataFrame?
- `.loc[]`: label-based selection of rows and columns
- `.iloc[]`: integer-position-based selection of rows and columns
- Boolean indexing: keeping only the rows where a condition is true
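
One detail worth seeing in code is how slices behave under each method. The sketch below assumes a small DataFrame with string row labels; the data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['Delhi', 'Jaipur', 'Pune']},
    index=['a', 'b', 'c'],
)

# A label slice with .loc includes both endpoints.
by_label = df.loc['a':'b', 'Age']

# A position slice with .iloc excludes the stop position.
by_position = df.iloc[0:2, 0]

# Combine conditions with & or |, wrapping each one in parentheses.
mask = (df['Age'] > 25) & (df['City'] != 'Pune')
filtered = df[mask]

print(by_label)
print(by_position)
print(filtered)
```

Both slices select the same two rows here, even though one is written `'a':'b'` and the other `0:2`.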


In [None]:
import pandas as pd

data = {'Name': ['Ajay', 'Vinod', 'Chotu'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Coding Challenge
print(df.loc[0, 'Name'])     # loc: row label 0, column 'Name'
print(df.iloc[1, 1])         # iloc: second row, second column
print(df[df['Age'] > 25])    # Boolean indexing: rows where Age exceeds 25


## 4. How can you handle missing data in a Pandas DataFrame?
Methods:
- `fillna()`: replace missing values with a constant or a computed value
- `dropna()`: remove rows or columns that contain missing values
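
The two methods take arguments that control what is filled or dropped. The following sketch assumes filling with each column's mean is acceptable for the data at hand, which is a modeling choice rather than a rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, np.nan], 'B': [4.0, np.nan, 6.0]})

# Count missing values per column before deciding what to do.
missing_per_column = df.isna().sum()

# Fill each column's gaps with that column's mean.
filled = df.fillna(df.mean())

# Drop a row only when column 'A' is missing.
only_a = df.dropna(subset=['A'])

print(missing_per_column)
print(filled)
print(only_a)
```

Neither call modifies `df`; each returns a new DataFrame.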


In [None]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [4, np.nan, 6]
})

# Coding Challenge
print(df.fillna(0))   # replace every NaN with 0
print(df.dropna())    # keep only rows with no NaN (here, the first row)


## 5. Explain how to perform data aggregation and grouping in Pandas.
Use `.groupby()` to split the data into groups by one or more keys, then apply an aggregate function such as `mean()` or `sum()` to each group.
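
Several aggregates can be computed in one pass with `.agg()`. This is a minimal sketch; the output column names `total` and `average` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40],
})

# Several aggregates for one column at once.
summary = df.groupby('Category')['Values'].agg(['mean', 'sum', 'count'])

# Named aggregation: output_name=(input_column, function).
named = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
)

print(summary)
print(named)
```

In both results the group keys (`A` and `B`) become the row index.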


In [None]:
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40]
}
df = pd.DataFrame(data)

# Coding Challenge
print(df.groupby('Category').mean())
print(df.groupby('Category').sum())


## 6. How can you merge, join, or concatenate multiple DataFrames in Pandas?
- `merge()`: SQL-style join on common columns or indices
- `join()`: combine DataFrames on their index (or on a key column of the calling DataFrame)
- `concat()`: stack DataFrames along rows (`axis=0`) or columns (`axis=1`)
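
The join type matters once the keys do not fully overlap. The sketch below assumes two small tables with partly different `ID` values, invented for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [30, 35, 40]})

# Inner join (the default): only IDs present in both tables.
inner = pd.merge(df1, df2, on='ID')

# Outer join: every ID from either table, with NaN where data is missing.
outer = pd.merge(df1, df2, on='ID', how='outer')

# Row-wise concatenation; ignore_index renumbers the rows from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```

`how` also accepts `'left'` and `'right'` to keep all keys from only one side.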


In [None]:
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Age': [25, 30]})

# Coding Challenge
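print(pd.merge(df1, df2, on='ID'))               # SQL-style inner join on the ID column
print(df1.join(df2.set_index('ID'), on='ID'))    # match df1's ID column against df2's index
print(pd.concat([df1, df2], axis=1))             # side by side; the ID column appears twice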


## 7. What are Pandas' `pivot_table` and `crosstab` functions?
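- `pivot_table()`: creates a spreadsheet-style pivot table, aggregating values for each combination of row and column keys.
- `crosstab()`: computes a frequency table showing how often each combination of two (or more) categorical variables occurs.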
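
The difference between aggregating values and counting occurrences shows up when a combination appears more than once or not at all. This sketch uses invented data in which (`A`, `X`) occurs twice and (`B`, `Y`) never occurs.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B'],
    'Type': ['X', 'X', 'Y', 'X'],
    'Value': [10, 20, 30, 40],
})

# pivot_table aggregates Value per (Category, Type) cell.
pivot = pd.pivot_table(
    df, values='Value', index='Category', columns='Type', aggfunc='mean'
)

# crosstab counts how many rows fall in each cell.
counts = pd.crosstab(df['Category'], df['Type'])

print(pivot)
print(counts)
```

The empty (`B`, `Y`) cell is `NaN` in the pivot table and `0` in the crosstab.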


In [None]:
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Type': ['X', 'Y', 'X', 'Y'],
    'Value': [10, 20, 30, 40]
})

# Coding Challenge
print(pd.pivot_table(df, values='Value', index='Category', columns='Type', aggfunc='sum'))
print(pd.crosstab(df['Category'], df['Type']))


## 8. How can you apply custom functions to columns or rows in a Pandas DataFrame?
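Use `.apply()` to run a custom function: on a `Series` it is called once per element, and on a `DataFrame` it is called once per column (`axis=0`, the default) or once per row (`axis=1`).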
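
The row-wise and column-wise forms can be sketched as below. The `area` helper and the column names are hypothetical, chosen only to illustrate the `axis` argument.

```python
import pandas as pd

df = pd.DataFrame({'Width': [1, 2, 3], 'Height': [4, 5, 6]})

def area(row):
    # With axis=1, each call receives one row as a Series.
    return row['Width'] * row['Height']

df['Area'] = df.apply(area, axis=1)

# With the default axis=0, the function receives one column at a time.
column_ranges = df[['Width', 'Height']].apply(lambda col: col.max() - col.min())

print(df)
print(column_ranges)
```

For simple arithmetic like this, the vectorized form `df['Width'] * df['Height']` gives the same result and is usually faster than a row-wise `apply`.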


In [None]:
import pandas as pd

df = pd.DataFrame({'Numbers': [1, 2, 3, 4]})

# Coding Challenge
def square(x):
    return x * x

df['Squared'] = df['Numbers'].apply(square)
print(df)


## 9. Explain how to sort and filter data in a Pandas DataFrame.
Use `sort_values()` to sort by one or more columns, and boolean conditions to keep only the rows that match.
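
Sorting by more than one column only has a visible effect when the first column contains ties. The sketch below uses invented data with tied ages, plus two helper methods that are assumptions beyond the text above: `between()` for range filters and `nlargest()` for top-N selection.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dina'],
    'Age': [30, 25, 30, 25],
    'Salary': [50000, 60000, 70000, 40000],
})

# Age ascending; within the same age, Salary descending.
ordered = df.sort_values(by=['Age', 'Salary'], ascending=[True, False])

# Range filter; both bounds are inclusive by default.
mid_band = df[df['Salary'].between(45000, 65000)]

# The two highest salaries, highest first.
top_two = df.nlargest(2, 'Salary')

print(ordered)
print(mid_band)
print(top_two)
```

None of these calls change `df`; assign the result back if the new order should be kept.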


In [None]:
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# Coding Challenge
print(df.sort_values(by=['Age', 'Salary'], ascending=[True, False]))
print(df[df['Salary'] > 55000])


## 10. How can you handle and process categorical data in Pandas?
Use `.astype('category')` to store a column with the categorical dtype, and `pd.get_dummies()` to one-hot encode it into one indicator column per category.
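
When the categories have a natural order, that order can be declared explicitly. This is a sketch under the assumption that the sizes below are ordered small < medium < large; the data and the `size` prefix are illustrative.

```python
import pandas as pd

sizes = pd.Series(['small', 'large', 'medium', 'small'])

# Declare the allowed categories and their order.
ordered = pd.Series(
    pd.Categorical(sizes, categories=['small', 'medium', 'large'], ordered=True)
)

# Each value is stored as an integer code pointing into the category list.
codes = ordered.cat.codes

# One indicator column per category, with a readable prefix.
dummies = pd.get_dummies(sizes, prefix='size')

print(ordered)
print(codes)
print(dummies)
```

With `ordered=True`, methods such as `min()`, `max()`, and `sort_values()` follow the declared order rather than alphabetical order.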


In [None]:
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C']
})

# Coding Challenge
dummies = pd.get_dummies(df['Category'])
print(dummies)
df['Category'] = df['Category'].astype('category')
print(df['Category'])