
# Pandas Advanced-2 Assignment

## Q1. Write a code to print the data present in the second row of the dataframe `df`.

```python
import pandas as pd

course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2, 3, 6, 4]

df = pd.DataFrame(data={'course_name': course_name, 'duration': duration})
# Printing the second row using iloc
print(df.iloc[1])
```

## Q2. What is the difference between the functions `loc` and `iloc` in pandas.DataFrame?

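- `loc`: Accesses rows and columns by **labels** (index labels and column names). Label slices include the end label.
- `iloc`: Accesses rows and columns by **integer positions** (0-based). Position slices exclude the end position, as in ordinary Python slicing.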

### Example:
```python
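# Using loc: selects the row whose index label is 1
print(df.loc[1])

# Using iloc: selects the row at position 1 (the second row)
# With df's default RangeIndex, label 1 and position 1 are the same row,
# so both calls print identical output here.
print(df.iloc[1])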
```
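
With a default integer index the two accessors are hard to tell apart. The sketch below uses a hypothetical frame named `demo` with string labels (not part of the assignment) so that label-based and position-based access cannot be confused.

```python
import pandas as pd

# Hypothetical frame with string labels; names and values are illustrative only
demo = pd.DataFrame({'duration': [2, 3, 6, 4]}, index=['a', 'b', 'c', 'd'])

by_label = demo.loc['b', 'duration']   # row labelled 'b'
by_position = demo.iloc[1, 0]          # second row, first column

label_slice = demo.loc['a':'c']        # end label is included: rows a, b, c
position_slice = demo.iloc[0:2]        # end position is excluded: rows a, b

print(by_label, by_position)
print(list(label_slice.index), list(position_slice.index))
```

Calling `demo.loc[1]` on this frame would raise a `KeyError`, because `1` is not one of its labels.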

## Q3. Reindex the given dataframe using a variable `reindex = [3, 0, 1, 2]` and store it in the variable `new_df`, then find the output for both `new_df.loc[2]` and `new_df.iloc[2]`.

```python
reindex = [3, 0, 1, 2]
new_df = df.reindex(reindex)

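# loc[2] selects the row labelled 2: 'Big Data', 6
print(new_df.loc[2])
# iloc[2] selects the third row by position, which now carries label 1: 'Machine Learning', 3
print(new_df.iloc[2])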
```

## Q4. Write a code to find the following statistical measurements for the dataframe `df1`:

### (i) Mean of each and every column present in the dataframe.

```python
# numeric_only=True skips any non-numeric columns instead of raising an error
mean_values = df1.mean(numeric_only=True)
print(mean_values)
```

### (ii) Standard deviation of column `column_2`.

```python
std_dev_column2 = df1['column_2'].std()
print(std_dev_column2)
```
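
The answers above do not show how `df1` is built. As a self-contained sketch, the block below creates a hypothetical stand-in (the column names and values are assumptions, not the assignment's data) and runs both measurements on it.

```python
import pandas as pd

# Hypothetical stand-in for df1; the real assignment data may differ
df1 = pd.DataFrame({
    'column_1': [1.0, 2.0, 3.0, 4.0],
    'column_2': [10.0, 20.0, 30.0, 40.0],
    'column_3': [0.5, 1.5, 2.5, 3.5],
})

mean_values = df1.mean()                # one mean per column
std_dev_column2 = df1['column_2'].std() # sample standard deviation (ddof=1)

print(mean_values)
print(std_dev_column2)
```

Note that `std()` in pandas divides by `n - 1` by default; pass `ddof=0` for the population standard deviation.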

## Q5. Replace the data present in the second row of column `column_2` by a string variable then find the mean of column `column_2`.

If you get errors, explain why.

```python
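# Cast to object first: pandas 2.1+ warns (and later versions may raise) when a string is written into a numeric column
df1['column_2'] = df1['column_2'].astype(object)
# iloc targets the second row by position, whatever the index labels are
df1.iloc[1, df1.columns.get_loc('column_2')] = 'string_value'
# The column now mixes numbers and a string, so mean() raises a TypeError
mean_column2 = df1['column_2'].mean()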
```

### Explanation:
The assignment turns `column_2` into an `object` column that holds both numbers and a string. `mean()` has to sum the values, and a number cannot be added to a string, so pandas raises a `TypeError`.
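
A minimal sketch of the failure and one possible recovery, using a hypothetical three-value Series in place of `column_2`. `pd.to_numeric` with `errors='coerce'` is one option among several; it turns non-numeric entries into `NaN`, which `mean()` skips by default.

```python
import pandas as pd

# Hypothetical data; object dtype lets the Series hold numbers and a string together
col = pd.Series([10.0, 'string_value', 30.0], dtype=object)

try:
    col.mean()
except TypeError as exc:
    print(f"mean() failed: {exc}")

# Coerce non-numeric entries to NaN, then average the remaining numbers
numeric = pd.to_numeric(col, errors='coerce')
recovered_mean = numeric.mean()
print(recovered_mean)
```

Whether silently dropping the string is acceptable depends on the data; in some cases the bad value should be corrected at the source instead.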

## Q6. What do you understand about the window function in pandas and list the types of window functions?

Window functions in pandas apply a calculation to a window of rows at each position and return one result per row, rather than collapsing the data to a single value. They are useful for statistics such as moving averages.

### Types of window functions:
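- **Rolling Window**: `rolling()` computes over a fixed-size window that slides along the data.
- **Expanding Window**: `expanding()` computes over every row from the start up to the current one.
- **Exponentially Weighted Window**: `ewm()` computes over all earlier rows, giving recent rows more weight.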
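
The three window types can be compared on a small hypothetical Series. The window size of 3 and `alpha=0.5` are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Mean of the current row and the two before it; the first two results are NaN
rolling_mean = s.rolling(window=3).mean()

# Mean of all rows from the start up to the current row
expanding_mean = s.expanding().mean()

# Recursive weighted mean: y[t] = 0.5 * x[t] + 0.5 * y[t-1]
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({'rolling': rolling_mean,
                    'expanding': expanding_mean,
                    'ewm': ewm_mean}))
```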

## Q7. Write a code to print only the current month and year at the time of answering this question.

```python
import pandas as pd
current_date = pd.Timestamp.now()
print(current_date.strftime('%B, %Y'))
```

## Q8. Write a Python program that takes in two dates as input (in the format YYYY-MM-DD) and calculates the difference between them in days, hours, and minutes using Pandas time delta.

```python
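import pandas as pd

date_1 = pd.Timestamp(input("Enter the first date (YYYY-MM-DD): "))
date_2 = pd.Timestamp(input("Enter the second date (YYYY-MM-DD): "))

# abs() keeps the result non-negative whichever date is entered first
time_difference = abs(date_2 - date_1)

# .seconds holds only the part of the difference left over after whole days
days = time_difference.days
hours, remainder = divmod(time_difference.seconds, 3600)
minutes = remainder // 60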

print(f"Difference: {days} days, {hours} hours, {minutes} minutes")
```
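
As an alternative to the manual `divmod` arithmetic, a `Timedelta` exposes the same breakdown through its `components` attribute. The sketch below uses hard-coded hypothetical timestamps (one with a time of day, so that the hours and minutes are non-zero) instead of user input.

```python
import pandas as pd

# Hypothetical inputs; 2024 is a leap year, so January and February span 60 days
start = pd.Timestamp('2024-01-01')
end = pd.Timestamp('2024-03-01 12:30')
delta = end - start

parts = delta.components  # named fields: days, hours, minutes, seconds, ...
print(f"Difference: {parts.days} days, {parts.hours} hours, {parts.minutes} minutes")

# Dividing by a unit Timedelta expresses the whole difference in that unit
total_hours = delta / pd.Timedelta(hours=1)
print(total_hours)
```

With date-only input, as in the program above, the hours and minutes are always zero.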

## Q9. Write a Python program that reads a CSV file containing categorical data and converts a specified column to a categorical data type.

```python
import pandas as pd

file_path = input("Enter the file path: ")
column_name = input("Enter the column name: ")
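# Strip whitespace so input such as "low, medium, high" yields clean labels
category_order = [c.strip() for c in input("Enter the category order as a comma-separated list: ").split(',')]

df = pd.read_csv(file_path)
# Values missing from category_order become NaN in the categorical column
df[column_name] = pd.Categorical(df[column_name], categories=category_order, ordered=True)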

print(df.sort_values(by=column_name))
```
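
The effect of an ordered categorical is easiest to see without a file. This sketch uses a hypothetical in-memory frame, `df_demo`, whose `size` column sorts by the given category order rather than alphabetically.

```python
import pandas as pd

# Hypothetical data standing in for the CSV contents
df_demo = pd.DataFrame({'size': ['medium', 'small', 'large', 'small']})
order = ['small', 'medium', 'large']

df_demo['size'] = pd.Categorical(df_demo['size'], categories=order, ordered=True)

# Sorting follows the category order, not the alphabet
sorted_sizes = df_demo.sort_values(by='size')['size'].tolist()
print(sorted_sizes)
```

An alphabetical sort would have placed `'large'` first; the ordered categorical places it last.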

## Q10. Write a Python program that reads a CSV file containing sales data for different products and visualizes the data using a stacked bar chart.

```python
import pandas as pd
import matplotlib.pyplot as plt

file_path = input("Enter the file path: ")
df_sales = pd.read_csv(file_path)

# Assumes the CSV has 'Product', 'Category', and 'Sales' columns.
# Summing per (Category, Product) and unstacking gives one bar per category, stacked by product.
df_sales.groupby(['Category', 'Product'])['Sales'].sum().unstack().plot(kind='bar', stacked=True)

plt.title('Sales by Category and Product')
plt.xlabel('Category')
plt.ylabel('Sales')
plt.show()
```
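
The reshaping step that feeds the chart can be inspected on its own. The sketch below builds a hypothetical sales table in memory and prints the pivoted frame that `plot(kind='bar', stacked=True)` would draw; `fill_value=0` is an added choice so that missing category/product pairs show as zero instead of `NaN`.

```python
import pandas as pd

# Hypothetical sales rows; column names match the assumption in the program above
df_demo = pd.DataFrame({
    'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Electronics'],
    'Product': ['Phone', 'Laptop', 'Shirt', 'Jeans', 'Phone'],
    'Sales': [100, 200, 50, 80, 150],
})

# Rows become categories (one bar each); columns become products (one stack segment each)
pivot = df_demo.groupby(['Category', 'Product'])['Sales'].sum().unstack(fill_value=0)
print(pivot)
```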

## Q11. Write a Python program that reads a CSV file containing student data, calculates the mean, median, and mode of the test scores, and displays the results in a table.

```python
import pandas as pd

file_path = input("Enter the file path of the CSV file containing the student data: ")
df_students = pd.read_csv(file_path)

# Assumes the CSV has a numeric 'Test Score' column
mean_score = df_students['Test Score'].mean()
median_score = df_students['Test Score'].median()
# Series.mode() returns every mode, so tied values are all reported
mode_score = df_students['Test Score'].mode()
modes = ', '.join(map(str, mode_score))

# Display the results in a table
print("+-----------+--------+")
print("| Statistic | Value  |")
print("+-----------+--------+")
print(f"| Mean      | {mean_score:<6.1f} |")
print(f"| Median    | {median_score:<6.1f} |")
print(f"| Mode      | {modes:<6} |")
print("+-----------+--------+")
```
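
Another way to present the results is to let pandas lay out the table. This sketch uses a hypothetical list of scores in place of the CSV and collects the three statistics in a small DataFrame.

```python
import pandas as pd

# Hypothetical scores standing in for the 'Test Score' column
scores = pd.Series([85, 90, 85, 70, 90], name='Test Score')

summary = pd.DataFrame({
    'Statistic': ['Mean', 'Median', 'Mode'],
    'Value': [
        f"{scores.mean():.1f}",
        f"{scores.median():.1f}",
        ', '.join(map(str, scores.mode())),  # both tied modes are listed
    ],
})

print(summary.to_string(index=False))
```

The values are stored as strings here so that a multi-valued mode fits in the same column as the mean and median.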
