# **Unit 2 Assignment**
### **ADITYA PAL  CU24250198**
##### **BTECH CSE 14-02-2025**

## Section A: Basic Concepts

---

1.**What is Pandas, and what are its key features?**  
   Pandas is a powerful open-source data manipulation and analysis library for Python. Its key features include:
   - Two core data structures, Series (1D) and DataFrame (2D), for handling labeled data.
   - Handling of missing data.
   - Merging and joining data from different sources.
   - Label-based indexing and selection of data.
   - Grouping, aggregation, and reshaping of data.
   - Reading/writing from various file formats (e.g., CSV, Excel, SQL).

2.**How do you install and import Pandas in Python?**  
   To install Pandas, run the following in a terminal (inside a notebook cell, `%pip install pandas` does the same):

In [None]:
pip install pandas

To import:

In [None]:
import pandas as pd

3.**Define a Pandas Series and explain its significance.**  
   A Pandas Series is a one-dimensional labeled array that can hold any data type. It resembles a Python list, but every element carries an index label. It is significant because operations align on those labels, and values can be accessed by either position or label.
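
As a minimal sketch of access by label versus access by position, the snippet below builds a small Series with string labels. The values and the names `s`, `by_label`, and `by_position` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: three values with string labels as the index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s["b"]        # label-based access
by_position = s.iloc[1]  # position-based access (second element)

print(by_label, by_position)
```

Both lookups reach the same element; `.iloc` makes the positional intent explicit.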

4.**How do you create a Series from a Python list? Provide an example.**  

In [None]:
import pandas as pd
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

5.**Explain how a Series can be created from a dictionary.**  

In [None]:
import pandas as pd
data={'a':1,'b':2,'c':3}
series=pd.Series(data)
print(series)

a    1
b    2
c    3
dtype: int64


6.**What are the key differences between a Series and a DataFrame?**  
   - **Series**: 1D data structure, indexed by labels. It's essentially a single column of data.
   - **DataFrame**: 2D data structure (table-like), consisting of rows and columns, where each column is a Series and all columns share the same row index.
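
The relationship between the two structures can be checked directly. This is a small sketch using the same sample values as the next question; the names `df_example` and `age_column` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical two-column table.
df_example = pd.DataFrame({"Name": ["Sam", "Ben"], "Age": [25, 30]})

# Selecting a single column gives back a Series.
age_column = df_example["Age"]

print(type(age_column).__name__)
print(df_example.ndim, age_column.ndim)

# The column shares the row index of its parent DataFrame.
print(age_column.index.equals(df_example.index))
```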

7.**How do you create a DataFrame using a dictionary? Provide an example.**

In [None]:
import pandas as pd
data = {'Name': ['Sam', 'Ben'], 'Age': [25, 30]}  # keys become column names
df = pd.DataFrame(data)
print(df)

  Name  Age
0  Sam   25
1  Ben   30


8.**What is the function of `read_csv()` in Pandas?**  
   The `read_csv()` function reads a CSV file into a Pandas DataFrame. Example:

In [None]:
df = pd.read_csv('file.csv')
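
Since `'file.csv'` above is only a placeholder, here is a self-contained sketch that reads CSV text from memory instead of from disk. The CSV content and the names `csv_text` and `df_csv` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real file path could be passed instead.
csv_text = "Name,Age\nSam,25\nBen,30\n"

# read_csv accepts file paths as well as file-like objects such as StringIO.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```

The first line of the text is used as the header row by default.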

9.**How can you access the first five rows of a DataFrame?**  
   Using the `.head()` method:

In [None]:
print(df.head())

10.**Explain the significance of indexing in Pandas.**  
   Indexing allows for efficient data access, selection, and alignment. It enables fast filtering and manipulation of data. With labeled indexing, you can reference data by labels instead of integer positions, which makes operations more intuitive.
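
Alignment is perhaps the least obvious of these benefits, so here is a minimal sketch. The two Series below hold hypothetical data and deliberately list their labels in a different order.

```python
import pandas as pd

# Hypothetical data; note that the labels are ordered differently.
prices = pd.Series([1.5, 2.0, 0.5], index=["apple", "banana", "cherry"])
quantities = pd.Series([10, 4, 6], index=["cherry", "apple", "banana"])

# Arithmetic matches elements by label, not by position.
totals = prices * quantities

print(totals)
```

Each price is multiplied by the quantity carrying the same label, regardless of where it sits in the Series.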

## Section B: DataFrame Operations

---

1.**What are the different ways to select a column from a DataFrame?**  
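   - `df['column_name']`: works for any column name and returns a Series.
   - `df.column_name`: attribute access; works only when the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.
   - `df[['column_name']]`: a list of names returns a DataFrame rather than a Series.
   - `df.loc[:, 'column_name']` or `df.iloc[:, 0]`: selection by label or by integer position.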

2.**Explain the difference between `.loc[]` and `.iloc[]`.**  
   - `.loc[]`: Label-based indexing. You select data by row and column labels, and label slices include the end label.
   - `.iloc[]`: Position-based indexing. You select data by zero-based integer position, and slices exclude the end position, as in standard Python slicing.
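
The difference is easiest to see when the row labels are not integers. The following sketch assumes a hypothetical DataFrame `df_people` with string row labels.

```python
import pandas as pd

# Hypothetical data with string row labels.
df_people = pd.DataFrame(
    {"Name": ["Sam", "Ben", "Ria"], "Age": [25, 30, 35]},
    index=["x", "y", "z"],
)

# Label slice: both "x" and "y" are included.
by_label = df_people.loc["x":"y", "Age"]

# Position slice: position 1 is excluded, so only the first row is returned.
by_position = df_people.iloc[0:1, 1]

print(by_label)
print(by_position)
```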

3.**How can you filter rows based on a condition? Provide an example.**

In [None]:
df[df['Age'] > 25]

  Name  Age
1  Ben   30


4.**What is Boolean indexing in Pandas?**  
   Boolean indexing filters data with a Boolean mask: a comparison such as `df['Age'] > 25` produces a Series of `True`/`False` values, one per row, and passing that mask to `df[...]` keeps only the rows where it is `True`. Example:

In [None]:
df[df['Age'] > 25]
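
To make the mask itself visible, here is a short sketch on hypothetical data. It also shows how two conditions might be combined: each condition needs its own parentheses, and `&` is used rather than the Python keyword `and`.

```python
import pandas as pd

# Hypothetical data.
df_people = pd.DataFrame({"Name": ["Sam", "Ben", "Ria"], "Age": [25, 30, 35]})

# The comparison yields one True/False value per row.
mask = df_people["Age"] > 25
older = df_people[mask]

# Combine conditions element-wise with & (and), | (or), ~ (not).
in_range = df_people[(df_people["Age"] > 25) & (df_people["Age"] < 35)]

print(mask.tolist())
print(older)
print(in_range)
```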

5.**How do you access subsets of data in a DataFrame?**  
   You can access subsets using `.loc[]` (label-based) or `.iloc[]` (position-based):

In [None]:
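df.loc[0:2, 'Name']   # label-based: rows labeled 0 through 2 (end label included)
df.iloc[0:2, 0]       # position-based: first two rows of the first column (end excluded)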

  Name
0  Sam
1  Ben


6.**Explain the role of `dropna()` in handling missing values.**  
   The `dropna()` function removes any rows or columns that contain missing (NaN) values:

In [None]:
df.dropna(axis=0)  # drop rows that contain any NaN (the default)
df.dropna(axis=1)  # drop columns that contain any NaN
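
Because the sample `df` has no missing values, the effect of `dropna()` is not visible on it. The sketch below uses a hypothetical DataFrame `df_missing` with gaps, and also shows the `subset` parameter, which restricts the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in "Name" and "Age".
df_missing = pd.DataFrame(
    {
        "Name": ["Sam", "Ben", None],
        "Age": [25, np.nan, 35],
        "City": ["Pune", "Delhi", "Agra"],
    }
)

rows_kept = df_missing.dropna()                 # rows with no missing values
cols_kept = df_missing.dropna(axis=1)           # columns with no missing values
age_known = df_missing.dropna(subset=["Age"])   # only require "Age" to be present

print(rows_kept)
print(cols_kept)
print(age_known)
```

`dropna()` returns a new DataFrame and leaves the original unchanged.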

7.**How do you replace missing values in a DataFrame?**

Use the `fillna()` method:

In [None]:
df.fillna(value=0)

8.**What is the purpose of the `fillna()` function?**  
   The `fillna()` function replaces missing (NaN) values in a DataFrame or Series with a specified value: a scalar, a per-column dictionary, or a computed statistic such as the mean or median.
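
As an illustration of those options, the sketch below fills one column with its mean and then fills several columns at once with a dictionary. The data and the names `df_gaps`, `age_filled`, and `all_filled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column.
df_gaps = pd.DataFrame({"Name": ["Sam", None, "Ria"], "Age": [25.0, np.nan, 35.0]})

# Fill a numeric column with its mean (mean() skips NaN by default).
age_filled = df_gaps["Age"].fillna(df_gaps["Age"].mean())

# Fill each column with its own replacement value.
all_filled = df_gaps.fillna({"Name": "Unknown", "Age": 0})

print(age_filled)
print(all_filled)
```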

9.**How do you check for missing values in a DataFrame?**

In [None]:
df.isna()

    Name    Age
0  False  False
1  False  False


10.**Describe the different ways to handle missing data in Pandas.**  
   - Drop rows or columns with NaN:

In [None]:
df.dropna()

  Name  Age
0  Sam   25
1  Ben   30


- Fill NaN values:

In [None]:
df.fillna(0)  # a fill value is required; calling fillna() with no arguments raises an error

- Forward fill (`fillna(method="ffill")` is deprecated in recent pandas versions; `ffill()` is the direct replacement):

In [None]:
df.ffill()

  Name  Age
0  Sam   25
1  Ben   30


- Backward fill (`fillna(method="bfill")` is deprecated in recent pandas versions; `bfill()` is the direct replacement):

In [None]:
df.bfill()

  Name  Age
0  Sam   25
1  Ben   30


## Section C: Advanced Operations

---

1.**What is the purpose of the `merge()` function in Pandas?**  
   The `merge()` function is used to combine two DataFrames based on one or more common columns or indices. It’s used to merge data from different sources.

2.**Explain different types of joins in Pandas (inner, outer, left, right).**  
   - **Inner Join**: Returns only the rows whose key values appear in both DataFrames.
   - **Outer Join**: Returns all rows from both DataFrames, filling missing values with NaN where there’s no match.
   - **Left Join**: Returns all rows from the left DataFrame plus the matching rows from the right DataFrame; right-side columns are NaN where there’s no match.
   - **Right Join**: Returns all rows from the right DataFrame plus the matching rows from the left DataFrame; left-side columns are NaN where there’s no match.
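
The four join types can be compared side by side on a small example. This is a sketch on hypothetical data: `left`, `right`, and the `id` key column are assumptions, with ids 2 and 3 present in both tables.

```python
import pandas as pd

# Hypothetical tables sharing an "id" key; ids 2 and 3 appear in both.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Sam", "Ben", "Ria"]})
right = pd.DataFrame({"id": [2, 3, 4], "city": ["Pune", "Delhi", "Agra"]})

# Collect the ids that survive each join type.
ids_by_join = {}
for how in ["inner", "outer", "left", "right"]:
    result = pd.merge(left, right, on="id", how=how)
    ids_by_join[how] = sorted(result["id"].tolist())

for how, ids in ids_by_join.items():
    print(how, ids)
```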

3.**How do you merge DataFrames on specific columns?**  

In [None]:
merged_df = pd.merge(df1, df2, on='column_name')

4.**What is the difference between `merge()` and `join()`?**  
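   - **merge()**: The general-purpose function. It joins on columns or indices (`on`, `left_on`, `right_on`, `left_index`, `right_index`) and performs an inner join by default.
   - **join()**: A convenience method that joins on the index by default and performs a left join by default, which makes it simpler for index-based joining.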
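
A short sketch of the two calls on the same hypothetical data shows how their defaults differ. The tables `employees` and `departments` and the `dept_id` key are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tables; department 30 has no entry in `departments`.
employees = pd.DataFrame({"dept_id": [10, 20, 30], "name": ["Sam", "Ben", "Ria"]})
departments = pd.DataFrame({"dept_id": [10, 20], "dept": ["HR", "IT"]})

# merge(): joins on a column; the default join type is inner.
merged = employees.merge(departments, on="dept_id")

# join(): joins on the index; the default join type is left.
joined = employees.set_index("dept_id").join(departments.set_index("dept_id"))

print(merged)
print(joined)
```

With default settings, `merge()` drops the unmatched employee while `join()` keeps that row and leaves `dept` missing.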

5.**How do you concatenate DataFrames vertically and horizontally?**  
   - Vertical concatenation:

In [None]:
pd.concat([df1, df2], axis=0)

- Horizontal concatenation:

In [None]:
pd.concat([df1, df2], axis=1)
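
Since `df1` and `df2` above are placeholders, here is a self-contained sketch with two hypothetical one-row DataFrames. It uses `ignore_index=True` to renumber rows after vertical stacking, and renames the second table's columns so the horizontal result has no duplicate column names.

```python
import pandas as pd

# Hypothetical one-row tables with the same columns.
df_a = pd.DataFrame({"Name": ["Sam"], "Age": [25]})
df_b = pd.DataFrame({"Name": ["Ben"], "Age": [30]})

# Vertical: rows are stacked; ignore_index=True renumbers them 0, 1, ...
stacked = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# Horizontal: columns are placed side by side, aligned on the row index.
side_by_side = pd.concat([df_a, df_b.add_suffix("_b")], axis=1)

print(stacked)
print(side_by_side)
```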

6.**Explain the purpose of `groupby()` in Pandas.**  
   The `groupby()` function is used to split the data into groups based on some criteria (such as a column value) and then apply an operation to each group.

7.**How do you perform aggregation operations in Pandas?**  
   You can use `groupby()` followed by aggregation functions like `.sum()`, `.mean()`, `.count()`:

In [None]:
df.groupby('column_name').sum()
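
As a concrete sketch of split, apply, and combine, the snippet below groups hypothetical sales rows by city. The table `df_sales` and its column names are assumptions; `agg()` is used to compute several statistics in one call.

```python
import pandas as pd

# Hypothetical sales records.
df_sales = pd.DataFrame(
    {"city": ["Pune", "Pune", "Delhi"], "sales": [100, 150, 200]}
)

# One aggregate per group.
totals = df_sales.groupby("city")["sales"].sum()

# Several aggregates per group in a single call.
summary = df_sales.groupby("city")["sales"].agg(["sum", "mean", "count"])

print(totals)
print(summary)
```

Selecting the `sales` column before aggregating keeps the operation restricted to numeric data.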

8.**What is a pivot table, and how is it created in Pandas?**  
   A pivot table is a data summarization tool that aggregates data based on specified rows and columns. It can be created using `pivot_table()`:

In [None]:
df.pivot_table(values='value_column', index='row_column', columns='column_column', aggfunc='sum')

9.**Explain how `crosstab()` differs from `pivot_table()`.**  
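   - `pivot_table()`: A DataFrame method that summarizes the columns of an existing DataFrame. It aggregates values with `aggfunc`, which defaults to the mean.
   - `crosstab()`: A top-level function that takes array-like inputs (often categorical columns) and builds a contingency table. By default it counts how often each combination occurs; it aggregates other values only when both `values` and `aggfunc` are supplied.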
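
The two functions can be run on the same hypothetical data to see what each produces. In this sketch, `pd.crosstab()` is called with its defaults, which count how often each combination occurs, while `pivot_table()` sums a value column. The table `df_orders` and its columns are assumptions.

```python
import pandas as pd

# Hypothetical order records.
df_orders = pd.DataFrame(
    {
        "city": ["Pune", "Pune", "Delhi", "Delhi"],
        "product": ["pen", "book", "pen", "pen"],
        "amount": [10, 50, 12, 8],
    }
)

# crosstab: frequency of each (city, product) combination.
counts = pd.crosstab(df_orders["city"], df_orders["product"])

# pivot_table: sum of "amount" for each (city, product) combination.
sums = df_orders.pivot_table(
    values="amount", index="city", columns="product", aggfunc="sum"
)

print(counts)
print(sums)
```

A combination that never occurs shows up as 0 in the count table and as NaN in the pivot table.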

10.**How do you iterate over groups in a `groupby()` operation?**

In [None]:
for name, group in df.groupby('column_name'):
    print(name)
    print(group)