In [1]:
# Consider the following code to answer further questions:
# import pandas as pd
# course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
# duration = [2,3,6,4]
# df = pd.DataFrame(data = {'course_name' : course_name, 'duration' : duration})
# Q1. Write a code to print the data present in the second row of the dataframe, df.

import pandas as pd

course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2, 3, 6, 4]
df = pd.DataFrame(data={'course_name': course_name, 'duration': duration})

second_row = df.iloc[1]
print(second_row)


course_name    Machine Learning
duration                      3
Name: 1, dtype: object


In [None]:
# Q2. What is the difference between the functions loc and iloc in pandas.DataFrame?

The functions `loc` and `iloc` are both used for indexing and selecting data in a Pandas DataFrame, but they have some differences in their usage and behavior:

1. loc: 
   - `loc` is primarily label-based. It is used to select data based on the row and column labels.
   - With `loc`, you specify the row label(s) and column label(s) explicitly.
   - It accepts label-based slicing and indexing, where the start and end points are inclusive.
   - The syntax for using `loc` is `df.loc[row_label(s), column_label(s)]`.

2. iloc:
   - `iloc` is primarily integer-based. It is used to select data based on integer positions of rows and columns.
   - With `iloc`, you specify the integer position(s) of the rows and columns.
   - It accepts integer-based slicing and indexing, where the start point is inclusive and the end point is exclusive (similar to regular Python indexing).
   - The syntax for using `iloc` is `df.iloc[row_position(s), column_position(s)]`.



In summary, `loc` is used for label-based selection (using row and column labels), while `iloc` is used for integer-based selection (using integer positions of rows and columns).
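
The inclusive versus exclusive slicing rule is easy to check on a small frame. The sketch below rebuilds the Q1 data but gives it string index labels ('a' to 'd'); those labels are an assumption added purely for illustration, so that labels and positions cannot be confused.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'course_name': ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer'],
        'duration': [2, 3, 6, 4],
    },
    index=['a', 'b', 'c', 'd'],  # hypothetical labels, for illustration only
)

# Label-based slice: both end points are included ('a', 'b' and 'c').
by_label = df.loc['a':'c', 'course_name']

# Position-based slice: the end point is excluded (rows 0 and 1 only).
by_position = df.iloc[0:2, 0]

print(by_label.tolist())
print(by_position.tolist())
```

The label slice returns three rows while the position slice returns two, even though both slices look like they span a similar range.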

In [None]:
# Q3. Reindex the given dataframe using a variable, reindex = [3,0,1,2] and store it in the variable, new_df
# then find the output for both new_df.loc[2] and new_df.iloc[2].
# Did you observe any difference in both the outputs? If so then explain it.


The given DataFrame at this point is `df` from Q1, whose index labels are 0 to 3. To reindex it using `reindex = [3, 0, 1, 2]` and store the result in `new_df`, we can use the `reindex()` method. Here's the code:


reindex = [3, 0, 1, 2]
new_df = df.reindex(reindex)

print(new_df.loc[2])
print(new_df.iloc[2])


`reindex()` rearranges the rows to follow the new label order, but every row keeps its original label. The rows of `new_df` therefore appear in the order 3, 0, 1, 2.

`new_df.loc[2]` selects by label, so it returns the row labelled 2 ('Big Data', duration 6), which now sits in the last position of `new_df`.

`new_df.iloc[2]` selects by integer position, so it returns the third row of `new_df`, which carries the label 1 ('Machine Learning', duration 3).

Therefore, the two outputs differ: `loc` follows the index labels, which did not change, while `iloc` follows the row order, which did.
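
To make the difference concrete, here is a self-contained sketch that rebuilds `df` from Q1 (assumed to be the "given dataframe" for this question) and prints both selections next to the new row order.

```python
import pandas as pd

course_name = ['Data Science', 'Machine Learning', 'Big Data', 'Data Engineer']
duration = [2, 3, 6, 4]
df = pd.DataFrame(data={'course_name': course_name, 'duration': duration})

reindex = [3, 0, 1, 2]
new_df = df.reindex(reindex)

# Rows are reordered, but each row keeps its original label.
print(new_df.index.tolist())

row_by_label = new_df.loc[2]      # the row whose label is 2
row_by_position = new_df.iloc[2]  # the third row in the new order

print(row_by_label['course_name'], row_by_label['duration'])
print(row_by_position['course_name'], row_by_position['duration'])
```

The label lookup returns the 'Big Data' row, while the positional lookup returns the 'Machine Learning' row.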


In [None]:
# Consider the below code to answer further questions:

# import pandas as pd
# import numpy as np
# columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
# indices = [1,2,3,4,5,6]
# #Creating a dataframe:
# df1 = pd.DataFrame(np.random.rand(6,6), columns = columns, index = indices)


# Q4. Write a code to find the following statistical measurements for the above dataframe df1:
# (i) mean of each and every column present in the dataframe.
# (ii) standard deviation of column, ‘column_2’


`DataFrame.mean()` computes the mean of every column by default, and `Series.std()` computes the standard deviation of a single column. 
Here's the code to calculate the mean of each column and the standard deviation of 'column_2':

import pandas as pd
import numpy as np

columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]
df1 = pd.DataFrame(np.random.rand(6, 6), columns=columns, index=indices)

# (i) Mean of each column
column_means = df1.mean()
print("Mean of each column:")
print(column_means)

# (ii) Standard deviation of 'column_2'
column2_std = df1['column_2'].std()
print("Standard deviation of column_2:")
print(column2_std)


This code calculates the mean of each column using the `mean()` function on the DataFrame `df1`. The result is stored in 
the `column_means` variable and then printed.

To calculate the standard deviation of 'column_2', we call the `std()` function on that column alone. By default pandas returns the 
sample standard deviation (`ddof=1`). The result is stored in the `column2_std` variable and then printed.
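
One detail worth knowing about `std()` is which divisor it uses. The sketch below uses a small hand-picked series (an assumption for illustration, not the random data in `df1`) to compare the pandas default with the population standard deviation and with NumPy's default.

```python
import numpy as np
import pandas as pd

# Hand-picked values (illustrative only): mean 5, population variance 4.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = values.std()              # pandas default, ddof=1 (divides by n - 1)
population_std = values.std(ddof=0)    # divides by n
numpy_std = np.std(values.to_numpy())  # NumPy default, ddof=0

print("Sample std (pandas default):", sample_std)
print("Population std (ddof=0):", population_std)
print("NumPy std (default):", numpy_std)
```

If a result from `df1['column_2'].std()` does not match a value computed with NumPy, the differing `ddof` defaults are a likely cause.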

In [None]:
# Q5. Replace the data present in the second row of column, ‘column_2’ by a string variable then find the
# mean of column, column_2.
# If you are getting errors in executing it then explain why.
# [Hint: To replace the data use df1.loc[] and equate this to string data of your choice.]

To replace the data present in the second row of 'column_2' with a string, you can use the `df1.loc[]` syntax with the row label 
and the column name. `df1` is indexed from 1 to 6, so the second row has the label 2. The assignment itself is not where the error 
in this exercise comes from: 'column_2' holds floats, and older pandas versions silently upcast the column to `object` dtype to fit 
the string, while newer versions warn about setting a value of an incompatible dtype (later releases are expected to turn this into an error).


Here's the assignment:


df1.loc[2, 'column_2'] = 'example_string'


To make the assignment independent of the pandas version, you can cast the column to `object` dtype first. Casting to `object` keeps 
the remaining values as floats, whereas `astype(str)` would turn every value in the column into a string. Here's an example:


df1['column_2'] = df1['column_2'].astype(object)
df1.loc[2, 'column_2'] = 'example_string'


After making the replacement, calling `mean()` on 'column_2' fails:


column2_mean = df1['column_2'].mean()
print("Mean of column_2:", column2_mean)


The call typically raises a `TypeError`, because computing the mean requires summing the values and a float cannot be added to a string. 
Once the column contains a string it is no longer numeric, so statistics such as the mean are undefined until the non-numeric entry 
is removed or converted.
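
The behaviour can be checked with a short, self-contained sketch. It rebuilds `df1` as in Q4, performs the replacement, and catches the exception from `mean()`. The last step, coercing non-numeric entries to `NaN` with `pd.to_numeric(..., errors='coerce')`, is one possible workaround added here for illustration; it is not part of the original exercise.

```python
import numpy as np
import pandas as pd

columns = ['column_1', 'column_2', 'column_3', 'column_4', 'column_5', 'column_6']
indices = [1, 2, 3, 4, 5, 6]
df1 = pd.DataFrame(np.random.rand(6, 6), columns=columns, index=indices)

# Cast to object first so the string assignment does not rely on
# version-specific upcasting behaviour.
df1['column_2'] = df1['column_2'].astype(object)
df1.loc[2, 'column_2'] = 'example_string'

try:
    df1['column_2'].mean()
    mean_failed = False
except (TypeError, ValueError) as exc:  # typically a TypeError
    mean_failed = True
    print("mean() failed:", exc)

# Possible workaround (an assumption, not from the exercise):
# turn non-numeric entries into NaN, which mean() skips by default.
numeric_col2 = pd.to_numeric(df1['column_2'], errors='coerce')
coerced_mean = numeric_col2.mean()
print("Mean of the remaining numeric values:", coerced_mean)
```

Note that the coerced mean is computed over five values instead of six, so it answers a slightly different question than the mean of the original column.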

In [None]:
# Q6. What do you understand about the windows function in pandas and list the types of windows
# functions?

In pandas, the window functions are a powerful tool for performing various operations on a rolling or expanding window of data within a DataFrame or Series. These functions allow you to calculate aggregated values, apply transformations, and perform other operations on a subset of data defined by the window.

The types of window functions in pandas can be broadly categorized as follows:

1. Rolling Window Functions:
   - These functions operate on a fixed-size sliding window of data.
   - The window "rolls" along the data, moving one observation at a time by default.
   - Common rolling window functions include mean, sum, min, max, standard deviation, etc.
   - Example: `rolling().mean()`, `rolling().sum()`, `rolling().std()`

2. Expanding Window Functions:
   - These functions operate on an expanding window that grows with each data point.
   - The window includes all preceding data points.
   - Expanding window functions calculate cumulative statistics or aggregated values.
   - Common expanding window functions include the cumulative sum, mean, minimum, maximum, etc.
   - Example: `expanding().sum()`, `expanding().mean()`

3. Exponentially Weighted Window Functions:
   - These functions apply weights to data points in a window, with more recent data receiving higher weights.
   - The weights are typically defined exponentially, with decay rates controlling the weight given to older data.
   - Exponentially weighted window functions are useful for calculating moving averages with different levels of emphasis on recent data.
   - Example: `ewm().mean()`, `ewm().std()`

These are the common types of window functions available in pandas. Rolling windows can also be weighted by passing a `win_type` to `rolling()`, which relies on SciPy. Each window type has its own parameters, such as `window` and `min_periods` for `rolling()`, or `span`, `com`, `halflife`, and `alpha` for `ewm()`, which control the window size and how the statistic is computed.
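
The three window types can be compared side by side on a tiny series. The values, the window size of 3, and `alpha=0.5` below are arbitrary choices made for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Rolling: mean over the current value and the two before it.
# The first two results are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# Expanding: sum over all values seen so far.
expanding_sum = s.expanding().sum()

# Exponentially weighted: recursive form y[t] = 0.5 * y[t-1] + 0.5 * x[t].
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({
    'value': s,
    'rolling_mean': rolling_mean,
    'expanding_sum': expanding_sum,
    'ewm_mean': ewm_mean,
}))
```

Setting `adjust=False` selects the simple recursive formula shown in the comment, which makes the exponentially weighted column easy to verify by hand.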

In [None]:
# Q7. Write a code to print only the current month and year at the time of answering this question.
# [Hint: Use pandas.datetime function]


The hinted `pandas.datetime` was only an alias for the `datetime` class in Python's standard library; it was deprecated and later 
removed from pandas. The code below therefore imports `datetime` directly to retrieve the current month and year:


from datetime import datetime

current_date = datetime.now()
current_month = current_date.month
current_year = current_date.year

print("Current month:", current_month)
print("Current year:", current_year)


In this code, we import the `datetime` class and retrieve the current date and time using `datetime.now()`. Then, we extract the 
month and year from the `current_date` object using the `month` and `year` attributes, respectively. Finally, we print the current 
month and year.

Please note that the code provides the current month and year based on the system time when the code is executed.
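
To stay within pandas, as the question's hint intends, `pd.Timestamp.now()` can be used instead. This is a minimal sketch; the output format (month name followed by the year) is an arbitrary choice.

```python
import pandas as pd

# Current local date and time as a pandas Timestamp.
now = pd.Timestamp.now()

# Month name followed by the four-digit year; the month name follows the system locale.
month_year = now.strftime("%B %Y")

print("Current month and year:", month_year)
```

`now.month` and `now.year` are also available as integers if a numeric result is preferred.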

In [None]:
# Q8. Write a Python program that takes in two dates as input (in the format YYYY-MM-DD) and
# calculates the difference between them in days, hours, and minutes using Pandas time delta. The
# program should prompt the user to enter the dates and display the result.

# Prompt the user to enter the dates
date1 = input("Enter the first date (YYYY-MM-DD): ")
date2 = input("Enter the second date (YYYY-MM-DD): ")
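
The code above stops after reading the two dates. One way to finish the program is sketched below. The helper name `date_difference` is hypothetical, and reporting total hours and total minutes (instead of the leftover hours and minutes, which are always zero for date-only input) is an assumption about what the question intends. The sketch uses hard-coded sample dates in place of the `input()` calls so that it runs unattended.

```python
import pandas as pd


def date_difference(date1: str, date2: str) -> dict:
    """Return the absolute gap between two YYYY-MM-DD dates (hypothetical helper)."""
    start = pd.to_datetime(date1, format="%Y-%m-%d")
    end = pd.to_datetime(date2, format="%Y-%m-%d")
    delta = abs(end - start)  # subtracting two Timestamps gives a pandas Timedelta
    total_minutes = int(delta.total_seconds() // 60)
    return {
        "days": delta.days,
        "hours": total_minutes // 60,
        "minutes": total_minutes,
    }


# Sample dates for illustration; in the full program these come from input().
result = date_difference("2023-01-01", "2023-01-03")
print(
    f"Difference: {result['days']} days, "
    f"or {result['hours']} hours, or {result['minutes']} minutes"
)
```

Passing an explicit `format` makes `pd.to_datetime` raise a `ValueError` for input that does not match YYYY-MM-DD, which the full program could catch in order to prompt the user again.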