# Pandas Advance 2 Assignment

## Q1. Write a code to print the data present in the second row of the dataframe, df.

In [None]:
import pandas as pd
course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2,3,6,4]
df = pd.DataFrame(data = {'course_name' : course_name, 'duration' : duration})

# Print the data in the second row (index 1)
print(df.iloc[1])

## Q2. What is the difference between the functions loc and iloc in pandas.DataFrame?

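- `loc` is label-based indexing. It selects rows and columns by their index labels (names), and a label slice such as `'a':'c'` includes both endpoints.
- `iloc` is integer position-based indexing. It selects rows and columns by their integer positions (starting from 0), and a position slice such as `0:2` excludes the end position, like ordinary Python slicing.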
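
The difference is easiest to see when the index labels are not integers. The sketch below uses a small hypothetical frame `demo` (not part of the assignment data) with string labels, so a label lookup and a position lookup cannot be confused.

```python
import pandas as pd

# Hypothetical frame with string labels, used only for illustration.
demo = pd.DataFrame({'duration': [2, 3, 6, 4]}, index=['a', 'b', 'c', 'd'])

by_label = demo.loc['b', 'duration']   # row labelled 'b'
by_position = demo.iloc[1, 0]          # second row, first column

label_slice = demo.loc['a':'c']        # label slice: end label is included
position_slice = demo.iloc[0:2]        # position slice: end position is excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Both scalar lookups reach the same cell, but the two slices return a different number of rows because only the label slice keeps its end point.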

## Q3. Reindex the given dataframe using a variable, reindex = [3,0,1,2] and store it in the variable, new_df then find the output for both new_df.loc[2] and new_df.iloc[2].

In [None]:
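reindex = [3,0,1,2]
new_df = df.reindex(reindex)
# loc[2] looks up the row whose index label is 2: 'Big Data', 6
print('new_df.loc[2]:')
print(new_df.loc[2])
# iloc[2] takes the third row by position, which now carries label 1: 'Machine Learning', 3
print('\nnew_df.iloc[2]:')
print(new_df.iloc[2])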

## Q4. Write a code to find the following statistical measurements for the dataframe df1 defined below:
(i) mean of each and every column present in the dataframe.
(ii) standard deviation of column, ‘column_2’

In [None]:
import numpy as np
columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1,2,3,4,5,6]
df1 = pd.DataFrame(np.random.rand(6,6), columns = columns, index = indices)

# (i) Mean of each column
print(df1.mean())
# (ii) Standard deviation of column_2
print(df1['column_2'].std())
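
One detail worth knowing when reading the output: `std()` in pandas returns the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population form (`ddof=0`). The sketch below uses a small hand-picked Series (an assumption for illustration, since `df1` holds random values) where both results can be checked by hand.

```python
import pandas as pd

# Hand-picked values: mean is 5 and the squared deviations sum to 32.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()             # default ddof=1: sqrt(32 / 7)
population_std = values.std(ddof=0)   # ddof=0: sqrt(32 / 8) = 2.0

print('sample std:', sample_std)
print('population std:', population_std)
```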

## Q5. Replace the data present in the second row of column, ‘column_2’ by a string variable then find the mean of column, column_2.
If you are getting errors in executing it then explain why.

In [None]:
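# Cast to object first: assigning a string into a float64 column is deprecated
# in pandas 2.1+ and is expected to raise in later versions.
df1['column_2'] = df1['column_2'].astype(object)
df1.loc[2, 'column_2'] = 'string_value'   # index label 2 is the second row
try:
    print(df1['column_2'].mean())
except Exception as e:
    print('Error:', e)

# Explanation: the column now has dtype object and mixes floats with a string.
# mean() has to sum the values, and adding a float to a str raises a TypeError.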
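
If the goal were to still get a mean from such a column, one possible approach (not required by the question) is to coerce non-numeric entries to `NaN` with `pd.to_numeric(..., errors='coerce')`; `mean()` skips `NaN` by default. The Series `mixed` below is a hypothetical stand-in for the modified `column_2`.

```python
import pandas as pd

# Hypothetical stand-in for a float column that received one string value.
mixed = pd.Series([0.1, 'string_value', 0.3], dtype=object)

try:
    mixed.mean()
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print('mean() failed:', exc)

# Coerce anything non-numeric to NaN, then average the remaining values.
numeric = pd.to_numeric(mixed, errors='coerce')
recovered_mean = numeric.mean()
print('mean after coercion:', recovered_mean)
```

Note that coercion silently drops the bad value from the calculation, so it is only appropriate when ignoring that row is acceptable.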

## Q6. What do you understand about window functions in pandas? List the types of window functions.

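**Window functions** in pandas compute a statistic over a window of rows related to the current row (for example the last three rows, or all rows seen so far) instead of over the whole column. They are commonly used for moving averages, rolling statistics, and cumulative (expanding) statistics.

**Types of window functions:**
- Rolling window (e.g., `rolling()`): a fixed-size or time-based window that slides along the data
- Weighted window (e.g., `rolling(win_type=...)`): a rolling window with non-uniform weights; requires SciPy
- Expanding window (e.g., `expanding()`): a window that grows to include every row up to the current one
- Exponentially weighted window (e.g., `ewm()`): all earlier rows contribute, with weights that decay exponentially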
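
A minimal sketch of the rolling, expanding, and exponentially weighted windows on a tiny hypothetical Series, chosen so the results are easy to verify by hand. The window size of 3 and `alpha=0.5` are arbitrary choices for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling: mean of the current row and the two before it (NaN until 3 rows exist).
rolling_mean = s.rolling(window=3).mean()

# Expanding: mean of every row from the start up to the current one.
expanding_mean = s.expanding().mean()

# Exponentially weighted, recursive form: y[t] = 0.5 * x[t] + 0.5 * y[t-1].
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({'rolling': rolling_mean,
                    'expanding': expanding_mean,
                    'ewm': ewm_mean}))
```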

## Q7. Write a code to print only the current month and year at the time of answering this question.

In [None]:
import pandas as pd
date = pd.Timestamp.now()
print('Current Month:', date.month)
print('Current Year:', date.year)
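
If a month name or a single formatted string is preferred over the month number, `Timestamp.month_name()` and `strftime` are two options. The sketch below uses a fixed, hypothetical date instead of `pd.Timestamp.now()` so that the result is reproducible.

```python
import pandas as pd

# Fixed date chosen only so the output does not depend on when this runs.
ts = pd.Timestamp('2024-03-15')

name_label = f'{ts.month_name()} {ts.year}'   # month as an English name
numeric_label = ts.strftime('%m/%Y')          # zero-padded month number

print(name_label)
print(numeric_label)
```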

## Q8. Write a Python program that takes in two dates as input (in the format YYYY-MM-DD) and calculates the difference between them in days, hours, and minutes using Pandas time delta. The program should prompt the user to enter the dates and display the result.

In [None]:
import pandas as pd

def date_difference():
    date1 = pd.to_datetime(input('Enter first date (YYYY-MM-DD): '))
    date2 = pd.to_datetime(input('Enter second date (YYYY-MM-DD): '))
    delta = abs(date2 - date1)
    days = delta.days
    hours = delta.seconds // 3600
    minutes = (delta.seconds % 3600) // 60
    print(f'Difference: {days} days, {hours} hours, {minutes} minutes')

# date_difference()
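# Note: with date-only input (YYYY-MM-DD) both timestamps fall at midnight, so hours and minutes are always 0.
# delta.seconds holds only the remainder within the last partial day; use delta.total_seconds() for overall totals.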
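
To show how a `Timedelta` splits into parts, here is a non-interactive sketch with two hypothetical timestamps that include a time of day. It contrasts the per-unit breakdown from `Timedelta.components` with overall totals derived from `total_seconds()`.

```python
import pandas as pd

# Hypothetical inputs with a time component, so hours and minutes are non-zero.
start = pd.to_datetime('2024-01-01 08:30')
end = pd.to_datetime('2024-01-03 12:45')

delta = abs(end - start)

# Breakdown: whole days, then leftover hours, then leftover minutes.
parts = delta.components
print(f'{parts.days} days, {parts.hours} hours, {parts.minutes} minutes')

# Totals: the same duration expressed entirely in one unit.
total_hours = delta.total_seconds() / 3600
total_minutes = delta.total_seconds() / 60
print(f'{total_hours} hours in total, {total_minutes} minutes in total')
```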

## Q9. Write a Python program that reads a CSV file containing categorical data and converts a specified column to a categorical data type. The program should prompt the user to enter the file path, column name, and category order, and then display the sorted data.

In [None]:
import pandas as pd

def convert_to_categorical():
    file_path = input('Enter CSV file path: ')
    col_name = input('Enter column name to convert: ')
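    # Strip whitespace so input like 'low, medium, high' matches the values in the file
    cat_order = [c.strip() for c in input('Enter category order (comma separated): ').split(',')]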
    df = pd.read_csv(file_path)
    df[col_name] = pd.Categorical(df[col_name], categories=cat_order, ordered=True)
    print(df.sort_values(col_name))

# convert_to_categorical()
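
The same steps can be tried without a file on disk. The sketch below reads a small hypothetical CSV from an in-memory string (the `item` and `size` columns are invented for illustration) and sorts by a user-style category order rather than alphabetically.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "item,size\nshirt,medium\nhat,small\ncoat,large\nsock,small\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

# Category order as a user might type it, with spaces after the commas.
order = [c.strip() for c in 'small, medium, large'.split(',')]

# Values not listed in `order` would become NaN in the categorical column.
df_demo['size'] = pd.Categorical(df_demo['size'], categories=order, ordered=True)
sorted_demo = df_demo.sort_values('size')

print(sorted_demo)
```

An alphabetical sort would put `large` first; the ordered categorical sorts by the declared order instead.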

## Q10. Write a Python program that reads a CSV file containing sales data for different products and visualizes the data using a stacked bar chart to show the sales of each product category over time. The program should prompt the user to enter the file path and display the chart.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

def plot_stacked_bar():
    file_path = input('Enter CSV file path: ')
    df = pd.read_csv(file_path)
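    # Assumes the CSV has columns named 'Date', 'Product' and 'Sales'.
    # Summing only 'Sales' keeps non-numeric columns out of the chart.
    df_grouped = df.groupby(['Date', 'Product'])['Sales'].sum().unstack('Product').fillna(0)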
    df_grouped.plot(kind='bar', stacked=True)
    plt.ylabel('Sales')
    plt.title('Sales by Product Category Over Time')
    plt.show()

# plot_stacked_bar()
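
The reshaping step is what makes the stacked chart possible: each date becomes a row and each product becomes a column. The sketch below shows that step alone on a small hypothetical dataset, assuming columns named `Date`, `Product`, and `Sales`.

```python
import pandas as pd

# Hypothetical sales records; note product A appears twice in 2024-02.
sales = pd.DataFrame({
    'Date': ['2024-01', '2024-01', '2024-02', '2024-02', '2024-02'],
    'Product': ['A', 'B', 'A', 'A', 'C'],
    'Sales': [10, 5, 7, 3, 4],
})

# One row per date, one column per product, 0 where a product had no sales.
wide = (sales.groupby(['Date', 'Product'])['Sales']
             .sum()
             .unstack('Product')
             .fillna(0))

print(wide)
```

Calling `wide.plot(kind='bar', stacked=True)` on this table would draw one bar per date with one coloured segment per product.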

## Q11. You are given a CSV file containing student data that includes the student ID and their test score. Write a Python program that reads the CSV file, calculates the mean, median, and mode of the test scores, and displays the results in a table.

In [None]:
import pandas as pd

def student_stats():
    file_path = input('Enter the file path of the CSV file containing the student data: ')
    df = pd.read_csv(file_path)
    mean = df['Test Score'].mean()
    median = df['Test Score'].median()
    mode = ', '.join(map(str, df['Test Score'].mode().tolist()))
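    # Size the value column to the longest entry so the borders stay aligned
    rows = [('Mean', f'{mean:.1f}'), ('Median', str(median)), ('Mode', mode)]
    width = max(len('Value'), *(len(value) for _, value in rows))
    border = '+-----------+' + '-' * (width + 2) + '+'
    print(border)
    print(f"| Statistic | {'Value':<{width}} |")
    print(border)
    for name, value in rows:
        print(f'| {name:<9} | {value:<{width}} |')
    print(border)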

# student_stats()
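
As an alternative to drawing the table by hand, the three statistics can be collected into a small DataFrame and printed with `to_string`. This is a sketch on hypothetical scores, assuming the same `Test Score` column name used above.

```python
import pandas as pd

# Hypothetical student data standing in for the CSV file.
scores = pd.DataFrame({
    'Student ID': [1, 2, 3, 4, 5],
    'Test Score': [70, 80, 80, 90, 100],
})
col = scores['Test Score']

# mode() returns a Series because several values can tie for most frequent.
summary = pd.DataFrame({
    'Statistic': ['Mean', 'Median', 'Mode'],
    'Value': [f'{col.mean():.1f}',
              f'{col.median():.1f}',
              ', '.join(map(str, col.mode()))],
})

print(summary.to_string(index=False))
```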