### Question1

In [1]:
import pandas as pd

course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2, 3, 6, 4]

df = pd.DataFrame(data={'course_name': course_name, 'duration': duration})

# Print the data in the second row
second_row = df.iloc[1]
print(second_row)


course_name    Machine Learning
duration                      3
Name: 1, dtype: object
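`df.iloc[1]` returns the second row as a Series. The minimal sketch below reuses the same course data to contrast it with `df.iloc[[1]]`, which takes a list of positions and therefore keeps the row as a one-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'course_name': ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer'],
                   'duration': [2, 3, 6, 4]})

# A scalar position returns the row as a Series indexed by the column names
row_as_series = df.iloc[1]

# A list of positions returns a DataFrame, even when it holds a single row
row_as_frame = df.iloc[[1]]

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_series['course_name'])  # Machine Learning
```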


### Question2

In [None]:
# loc and iloc are indexers (used with square brackets, not called like functions) that pandas provides to access data in a DataFrame,
# but they reference and retrieve data in different ways:

#    loc:
#    The loc indexer is label-based. It selects data by row labels (index values) and column names, or by a boolean condition that is
#    aligned on those labels.

# Syntax: df.loc[row_label, column_label]

# Example: df.loc[2, 'column_name']

#    iloc:
#    The iloc indexer is position-based. It selects data by integer positions (0 for the first row or column, 1 for the second, and so
#    on), regardless of what the labels are.

# Syntax: df.iloc[row_position, column_position]

# Example: df.iloc[2, 1]

# Key differences between loc and iloc:

#    loc works with labels, while iloc works with integer positions.
#    loc slicing includes the end label (with the default 0-based index, df.loc[0:2] returns the rows labelled 0, 1 and 2), whereas iloc
#    slicing excludes the end position, as in standard Python slicing (df.iloc[0:2] returns positions 0 and 1).
#    loc accepts a boolean Series for conditional filtering (e.g. df.loc[df['duration'] > 2]), while iloc accepts integer positions,
#    slices, lists of integers, or plain boolean arrays (a boolean Series must first be converted with .to_numpy()).
#    loc is more readable when rows and columns have meaningful labels, while iloc is useful for positional access, especially when the
#    labels are not a simple 0 to n-1 range.

# In summary, loc is used for label-based indexing with explicit row and column labels, while iloc is used for position-based indexing
# with integer row and column positions.
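The differences listed above can be checked on a small DataFrame. This is a sketch that assumes a hypothetical non-default index (10, 20, 30, 40) so that labels and positions no longer coincide.

```python
import pandas as pd

# Hypothetical example data: a non-default index makes labels and positions differ
df = pd.DataFrame({'course_name': ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer'],
                   'duration': [2, 3, 6, 4]},
                  index=[10, 20, 30, 40])

# loc slices by label and includes the end label: rows 10, 20 and 30
by_label = df.loc[10:30]

# iloc slices by position and excludes the end position: positions 0 and 1
by_position = df.iloc[0:2]

# loc accepts a boolean Series for conditional filtering
long_courses = df.loc[df['duration'] > 2, 'course_name']

# iloc takes plain integers for both axes: first row, second column
first_duration = df.iloc[0, 1]

print(by_label.index.tolist())     # [10, 20, 30]
print(by_position.index.tolist())  # [10, 20]
print(long_courses.tolist())       # ['Machine Learning', 'Big Data', 'Data Engineer']
print(first_duration)              # 2
```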

### Question3

In [4]:
import pandas as pd

course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2, 3, 6, 4]

df = pd.DataFrame(data={'course_name': course_name, 'duration': duration})

reindex = [3, 0, 1, 2]

new_df = df.reindex(reindex)
print("Output of new df is")
print(new_df)
print("\nOutput for new_df.loc[2]:")
print(new_df.loc[2])

print("\nOutput for new_df.iloc[2]:")
print(new_df.iloc[2])

# new_df.loc[2] uses label-based indexing: it returns the row whose index label is 2 ('Big Data'), wherever that row sits in the
# reindexed DataFrame new_df.

# new_df.iloc[2] uses position-based indexing: it returns the third row of new_df (position 2), which after reindexing is the row
# labelled 1 ('Machine Learning').

Output of new df is
        course_name  duration
3     Data Engineer         4
0      Data Science         2
1  Machine Learning         3
2          Big Data         6

Output for new_df.loc[2]:
course_name    Big Data
duration              6
Name: 2, dtype: object

Output for new_df.iloc[2]:
course_name    Machine Learning
duration                      3
Name: 1, dtype: object
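After reindexing, a label and a position only point to the same row by coincidence. A short sketch, using the same data and order as above, that translates a label into its position with `Index.get_loc`:

```python
import pandas as pd

df = pd.DataFrame({'course_name': ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer'],
                   'duration': [2, 3, 6, 4]})
new_df = df.reindex([3, 0, 1, 2])

# Index.get_loc translates a label into its integer position
position_of_label_2 = new_df.index.get_loc(2)
print(position_of_label_2)  # 3

# Label 2 and position 3 now refer to the same row ('Big Data')
assert new_df.loc[2].equals(new_df.iloc[position_of_label_2])

# Position 2 holds the row whose label is 1 ('Machine Learning')
print(new_df.iloc[2].name)  # 1
```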


### Question4

In [5]:
import pandas as pd
import numpy as np

columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]

# Creating a DataFrame
df1 = pd.DataFrame(np.random.rand(6, 6), columns=columns, index=indices)
print(df1)
# (i) Mean of each column
column_means = df1.mean()
print("\nMean of each column:")
print(column_means)

# (ii) Standard deviation of 'column_2'
column_2_std = df1['column_2'].std()
print("\nStandard deviation of 'column_2':")
print(column_2_std)


   column_1  column_2  column_3  column_4  column_5  column_6
1  0.963995  0.859593  0.169459  0.509102  0.136301  0.106516
2  0.411833  0.365872  0.766930  0.259529  0.441675  0.928998
3  0.761074  0.009692  0.143478  0.830063  0.747629  0.777935
4  0.625033  0.052361  0.500196  0.968670  0.158500  0.595375
5  0.779292  0.040192  0.763402  0.317095  0.167989  0.927612
6  0.741753  0.273055  0.154804  0.768245  0.175050  0.206323

Mean of each column:
column_1    0.713830
column_2    0.266794
column_3    0.416378
column_4    0.608784
column_5    0.304524
column_6    0.590460
dtype: float64

Standard deviation of 'column_2':
0.323943789345209
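Note that `Series.std()` computes the sample standard deviation (`ddof=1`, dividing by n - 1), while `numpy.std` defaults to the population version (`ddof=0`). A small sketch with a hypothetical series whose population standard deviation is exactly 2:

```python
import math
import numpy as np
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = s.std()               # pandas default: ddof=1 (divides by n - 1)
population_std = s.std(ddof=0)     # divides by n
numpy_std = np.std(s.to_numpy())   # NumPy default: ddof=0

print(population_std)  # 2.0
print(numpy_std)       # 2.0
print(math.isclose(sample_std, math.sqrt(32 / 7)))  # True
```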


### Question5

In [None]:
# To replace the value in the second row of 'column_2' in DataFrame df1 with a string, use the df1.loc[] accessor; to find the mean of
# the updated column, use the mean() method. However, putting a string into a numeric column causes a problem: 'column_2' ends up holding
# a mix of floats and a string (object dtype), and mean() cannot add a string to numbers, so it raises a TypeError. Mathematical
# operations such as the mean require every value in the column to be numeric.

# Here's the code with an explanation of the issue:

import pandas as pd
import numpy as np

columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]

# Creating a DataFrame
df1 = pd.DataFrame(np.random.rand(6, 6), columns=columns, index=indices)

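# Replacing the data in the second row (label 2, since the index starts at 1) of 'column_2' with a string variable.
# The column is converted to object dtype first, because recent pandas versions warn about (and may later reject)
# writing a string into a float column.
df1['column_2'] = df1['column_2'].astype(object)
df1.loc[2, 'column_2'] = 'string_value'

# Finding the mean of 'column_2' fails because the column mixes floats and a string
try:
    column_2_mean = df1['column_2'].mean()
    print("Mean of 'column_2':", column_2_mean)
except TypeError as err:
    print("Could not compute the mean of 'column_2':", err)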

# In this code, we first import the necessary libraries and define the columns and indices for the DataFrame df1 as specified.

# To replace the data in the second row of 'column_2' with a string variable, we use df1.loc[2, 'column_2'] and assign it the desired
# string value. Because 'column_2' originally holds floats, the column ends up with object dtype and a mix of floats and one string.

# Calling df1['column_2'].mean() on this mixed column raises a TypeError, since pandas cannot add a string to the numeric values.

# To avoid this issue, keep a single numeric data type within each column used for statistical computations, or convert the column
# back to numbers (for example with pd.to_numeric) before aggregating.
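One way to recover a usable mean is to coerce the column back to numbers. The sketch below uses a hypothetical, fixed column instead of random data. `pd.to_numeric(..., errors='coerce')` turns the unparseable string into NaN, and `mean()` skips NaN by default.

```python
import pandas as pd

# Hypothetical column mirroring the situation above: floats plus one string
column_2 = pd.Series([1.0, 'string_value', 3.0, 5.0], index=[1, 2, 3, 4], dtype=object)

# errors='coerce' turns anything that cannot be parsed as a number into NaN
numeric_column_2 = pd.to_numeric(column_2, errors='coerce')

# mean() skips NaN by default (skipna=True), so only 1.0, 3.0 and 5.0 are averaged
print(numeric_column_2.dtype)   # float64
print(numeric_column_2.mean())  # 3.0
```

Whether dropping the string value is acceptable depends on the data; this sketch simply treats it as missing.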


### Question6

In [None]:
# In pandas, window functions perform calculations over a sliding (rolling) or growing (expanding) window of values. They are applied to
# a specified window of values and are useful for analytical tasks such as moving averages, cumulative sums, or other aggregations over
# a defined period.

# Pandas provides these window functions through the .rolling() and .expanding() methods of a DataFrame or Series object (exponentially
# weighted windows are available through .ewm()). Here are some common window functions:

#    Rolling Window Functions (the window size is a required argument, e.g. .rolling(3)):
#        .rolling().mean(): Calculates the mean of values in a rolling window.
#        .rolling().sum(): Calculates the sum of values in a rolling window.
#        .rolling().min(): Finds the minimum value in a rolling window.
#        .rolling().max(): Finds the maximum value in a rolling window.
#        .rolling().std(): Calculates the standard deviation of values in a rolling window.
#        .rolling().var(): Calculates the variance of values in a rolling window.
#        .rolling().apply(): Applies a custom function to values in a rolling window.
#    Expanding Window Functions:
#        .expanding().mean(): Calculates the mean of values in an expanding window.
#        .expanding().sum(): Calculates the sum of values in an expanding window.
#        .expanding().min(): Finds the minimum value in an expanding window.
#        .expanding().max(): Finds the maximum value in an expanding window.
#        .expanding().std(): Calculates the standard deviation of values in an expanding window.
#        .expanding().var(): Calculates the variance of values in an expanding window.
#        .expanding().apply(): Applies a custom function to values in an expanding window.

# These window functions can be combined with other pandas operations to perform complex calculations and analysis on time series or other sequential
# data.
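A small sketch contrasting a rolling window, an expanding window, and a custom function applied with `.rolling().apply()`, on a hypothetical series of five values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling window of size 3: the first two results are NaN because the window is not yet full
rolling_mean = s.rolling(window=3).mean()

# Expanding window: every value from the start up to the current row (a running total)
running_total = s.expanding().sum()

# Custom function on each rolling window: the range (max - min) of the 3 values
rolling_range = s.rolling(window=3).apply(lambda w: w.max() - w.min(), raw=True)

print(rolling_mean.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
print(running_total.tolist())  # [1.0, 3.0, 6.0, 10.0, 15.0]
print(rolling_range.tolist())  # [nan, nan, 2.0, 2.0, 2.0]
```

`raw=True` passes each window to the function as a NumPy array, which is usually faster than building a Series for every window.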


### Question7

In [1]:
import pandas as pd
from datetime import datetime

current_date = datetime.now()
current_month_year = pd.Timestamp(current_date).strftime("%B %Y")

print("Current month and year:", current_month_year)


Current month and year: July 2023
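A possible alternative sketch: `pd.Timestamp.now()` returns the current time directly, and `month_name()` with `.year` builds the same string. `strftime("%B")` follows the process locale, whereas `month_name()` returns English names by default. The helper `month_year` is hypothetical.

```python
import pandas as pd

def month_year(ts: pd.Timestamp) -> str:
    # month_name() returns the full English month name; .year is an integer
    return f"{ts.month_name()} {ts.year}"

# pd.Timestamp.now() gives the current date and time without going through datetime.now()
print(month_year(pd.Timestamp.now()))

# A fixed date makes the result predictable
print(month_year(pd.Timestamp('2023-07-14')))  # July 2023
```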


### Question8

In [2]:
import pandas as pd

# Prompt the user to enter the first date
date1 = input("Enter the first date (YYYY-MM-DD): ")

# Prompt the user to enter the second date
date2 = input("Enter the second date (YYYY-MM-DD): ")

# Convert the input dates to Pandas Timestamp objects
date1 = pd.to_datetime(date1)
date2 = pd.to_datetime(date2)

# Calculate the difference between the dates
time_diff = date2 - date1

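# Extract the days, hours, and minutes: .days holds whole days and .seconds holds the remainder below one day (0 to 86399).
# With date-only input (YYYY-MM-DD), hours and minutes are therefore always 0.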
days = time_diff.days
hours = time_diff.seconds // 3600
minutes = (time_diff.seconds // 60) % 60

# Display the result
print("Difference between the dates:")
print(f"Days: {days}")
print(f"Hours: {hours}")
print(f"Minutes: {minutes}")


Enter the first date (YYYY-MM-DD):  2023-01-01
Enter the second date (YYYY-MM-DD):  2023-07-14


Difference between the dates:
Days: 194
Hours: 0
Minutes: 0
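The same calculation can be wrapped in a function and tested without `input()`. This sketch uses `Timedelta.components`, which already splits a difference into days, hours, and minutes, and `abs()` so that dates entered in reverse order still give a positive result. The function name `date_difference` is hypothetical.

```python
import pandas as pd

def date_difference(start: str, end: str) -> tuple[int, int, int]:
    # abs() keeps the result positive if the dates are entered in reverse order
    delta = abs(pd.to_datetime(end) - pd.to_datetime(start))
    # Timedelta.components splits the difference into days, hours, minutes, ...
    parts = delta.components
    return int(parts.days), int(parts.hours), int(parts.minutes)

print(date_difference('2023-01-01', '2023-07-14'))              # (194, 0, 0)
print(date_difference('2023-01-01 08:00', '2023-07-14 13:45'))  # (194, 5, 45)
print(date_difference('2023-07-14', '2023-01-01'))              # (194, 0, 0)
```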


### Question9

In [None]:
import pandas as pd

# Prompt the user to enter the file path
file_path = input("Enter the file path: ")

# Prompt the user to enter the column name
column_name = input("Enter the column name: ")

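# Prompt the user to enter the category order; strip spaces so that input such as "Low, Medium, High" matches the data values
category_order = [c.strip() for c in input("Enter the category order (comma-separated): ").split(",")]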

# Read the CSV file into a DataFrame
data = pd.read_csv(file_path)

# Convert the specified column to categorical data type with the specified category order
data[column_name] = pd.Categorical(data[column_name], categories=category_order, ordered=True)

# Sort the data based on the specified column
sorted_data = data.sort_values(column_name)

# Display the sorted data
print(sorted_data)

# In this program, the user is prompted to enter the file path, column name, and category order.

# The CSV file is then read into a DataFrame using pd.read_csv().

# The specified column is converted to a categorical data type using pd.Categorical(). The categories parameter is set to the
# category_order entered by the user (with surrounding spaces stripped), and ordered=True makes the column sort in that order rather than
# alphabetically. Any value that is not listed in category_order becomes NaN.

# The data is sorted on the specified column using data.sort_values() and stored in the sorted_data variable; NaN values are placed last.

# Finally, the sorted data is displayed using print().

# When you run this program, it will prompt you to enter the file path, column name, and category order. After entering the inputs,
# it will display the sorted data based on the specified column.
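To check the sorting logic without a file or prompts, the steps can be moved into a function and fed an in-memory CSV. The function `sort_by_category` and the sample data are hypothetical, and `io.StringIO` stands in for the user's file.

```python
import io
import pandas as pd

def sort_by_category(data: pd.DataFrame, column: str, order: list[str]) -> pd.DataFrame:
    # Strip stray spaces such as those left by splitting "Small, Medium, Large" on ","
    categories = [c.strip() for c in order]
    result = data.copy()
    # Values not listed in `categories` would become NaN and be sorted last
    result[column] = pd.Categorical(result[column], categories=categories, ordered=True)
    return result.sort_values(column)

# Hypothetical in-memory CSV standing in for the user's file
csv_text = "name,size\nA,Medium\nB,Small\nC,Large\nD,Small\n"
data = pd.read_csv(io.StringIO(csv_text))

sorted_data = sort_by_category(data, 'size', "Small, Medium, Large".split(","))
print(sorted_data)
```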

### Question10

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Prompt the user to enter the file path
file_path = input("Enter the file path: ")

# Read the CSV file into a DataFrame
data = pd.read_csv(file_path)

# Convert the 'Date' column to datetime
data['Date'] = pd.to_datetime(data['Date'])

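# Group the data by 'Date' and 'Category', sum only the 'Sales' column, and pivot the categories into columns
# (fill_value=0 covers dates on which a category had no sales)
grouped_data = data.groupby(['Date', 'Category'])['Sales'].sum().unstack(fill_value=0)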

# Plot the stacked bar chart
fig, ax = plt.subplots()
grouped_data.plot(kind='bar', stacked=True, ax=ax)

# Set labels and title
ax.set_xlabel('Date')
ax.set_ylabel('Sales')
ax.set_title('Sales of Each Product Category Over Time')

# Display the chart
plt.show()

# The user is prompted to enter the file path with input(), and the result is stored in the file_path variable.
# The program assumes the CSV file has 'Date', 'Category', and 'Sales' columns. It reads the file into a DataFrame, converts 'Date' to
# datetime, groups the rows by 'Date' and 'Category', totals the sales of each category per date, and plots the stacked bar chart.

# When you run this program, it will prompt you to enter the file path of the CSV file. After entering the file path, it will process the 
# data and display a stacked bar chart showing the sales of each product category over time.
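The grouping step can also be written with `pivot_table`, which sums, pivots, and fills missing combinations in one call. This sketch assumes the same 'Date', 'Category', and 'Sales' columns and uses a hypothetical in-memory CSV so that the table can be checked without plotting.

```python
import io
import pandas as pd

# Hypothetical in-memory sales data with the columns the program expects
csv_text = (
    "Date,Category,Sales\n"
    "2023-01-01,Books,10\n"
    "2023-01-01,Toys,5\n"
    "2023-01-01,Books,3\n"
    "2023-01-02,Toys,7\n"
)
data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# One row per date, one column per category, summed sales; missing combinations become 0
table = data.pivot_table(index='Date', columns='Category', values='Sales',
                         aggfunc='sum', fill_value=0)
print(table)

# table.plot(kind='bar', stacked=True) would then draw the same kind of stacked chart
```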

### Question11

In [None]:
import pandas as pd

# Prompt the user to enter the file path
file_path = input("Enter the file path of the CSV file containing the student data: ")

# Read the CSV file into a DataFrame
data = pd.read_csv(file_path)

# Calculate the mean, median, and mode of the test scores
mean_score = data['Test Score'].mean()
median_score = data['Test Score'].median()
mode_scores = data['Test Score'].mode().tolist()

# Create a table to display the results
results = pd.DataFrame({'Statistic': ['Mean', 'Median', 'Mode'],
                        'Value': [mean_score, median_score, ', '.join(map(str, mode_scores))]})

# Display the results table
print(results.to_string(index=False))

# When you run this program, it prompts for the path of the CSV file containing the student data, which is assumed to have a
# 'Test Score' column. The program then reads the file into a Pandas DataFrame using pd.read_csv().

# Next, the program calculates the mean, median, and mode of the test scores with the Pandas methods mean(), median(), and mode().

# A results table is built from a dictionary of statistic names and their values. Because mode() returns every value that ties for the
# highest frequency, the modes are joined into one comma-separated string with ', '.join(map(str, mode_scores)).

# Finally, print(results.to_string(index=False)) displays the results table without the index column.
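The statistics step can be isolated in a function and checked on known scores. In this sketch the function `summarize_scores` and the scores are hypothetical. 85 and 90 each appear twice, so `mode()` returns both.

```python
import pandas as pd

def summarize_scores(scores: pd.Series) -> pd.DataFrame:
    # mode() can return several values when there is a tie, so join them into one string
    modes = ', '.join(map(str, scores.mode().tolist()))
    return pd.DataFrame({'Statistic': ['Mean', 'Median', 'Mode'],
                         'Value': [scores.mean(), scores.median(), modes]})

# Hypothetical scores: 85 and 90 both appear twice, so there are two modes
scores = pd.Series([85, 90, 90, 70, 85], name='Test Score')
results = summarize_scores(scores)
print(results.to_string(index=False))
```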