Q1. Here are five common functions of the Pandas library with examples of how to execute them:
read_csv(): Used to read data from a CSV file into a Pandas DataFrame.
head(): Used to display the first rows of a Pandas DataFrame (five by default).
groupby(): Used to group data in a Pandas DataFrame by one or more columns, and then apply a function to the groups.
merge(): Used to combine two Pandas DataFrames based on a common column or index.
to_csv(): Used to write a Pandas DataFrame to a CSV file.

In [None]:
import pandas as pd
df = pd.read_csv('my_data.csv')

print(df.head())

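group = df.groupby('Category')
# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
print(group.mean(numeric_only=True))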

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
merged = pd.merge(df1, df2, on='key')

df.to_csv('my_output.csv', index=False)
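
The cell above depends on a file named 'my_data.csv' being present. As a minimal sketch that runs without any file, the same five calls can be tried on in-memory data. The column names 'Category' and 'Price' and the sample rows are assumptions made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for 'my_data.csv'
csv_text = "Category,Price\nfruit,1.0\nfruit,3.0\nveg,2.0\n"

# read_csv() accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

# head(2) returns the first two rows
first_rows = df.head(2)

# Mean of 'Price' within each 'Category'
category_means = df.groupby('Category')['Price'].mean()

# An inner merge keeps only keys present in both frames; overlapping
# column names receive the default '_x' and '_y' suffixes
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
merged = pd.merge(df1, df2, on='key')

# to_csv() called without a path returns the CSV text instead of writing a file
csv_out = df.to_csv(index=False)

print(first_rows)
print(category_means)
print(merged)
```

Because pd.merge() defaults to an inner join, only the keys 'B' and 'C' survive in the merged result.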


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [None]:
import pandas as pd

def reindex_df(df):
    # Create a new index starting from 1 and incrementing by 2 for each row
    new_index = pd.RangeIndex(start=1, stop=2*len(df), step=2)
    
    # Set the new index for the DataFrame
    df = df.set_index(new_index)
    
    return df
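
As a quick check of the idea, here is a small sketch of an alternative that assigns the index directly and works on a copy. The name reindex_odd and the sample data are hypothetical.

```python
import pandas as pd

def reindex_odd(df):
    # Hypothetical variant: build the labels 1, 3, 5, ... from the row count
    # and work on a copy so the caller's DataFrame is left untouched
    out = df.copy()
    out.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return out

# Sample data made up for illustration
df = pd.DataFrame({'A': [10, 20, 30], 'B': [1, 2, 3], 'C': ['x', 'y', 'z']})
result = reindex_odd(df)

print(result.index.tolist())
print(df.index.tolist())
```

The stop value only needs to exceed the last label, so 2 * len(df) and 2 * len(df) + 1 produce the same index.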

Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

In [1]:
import pandas as pd

def sum_first_three(df):
    # Get the first three values of the 'Values' column
    values = df['Values'].iloc[:3]
    
    # Calculate the sum of the first three values
    total = sum(values)
    
    # Print the sum to the console
    print(f"The sum of the first three values is {total}")
    
# Create the DataFrame with the 'Values' column
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the function to calculate the sum of the first three values
sum_first_three(df)



The sum of the first three values is 60
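
Since the question asks for iteration, an explicit loop may be closer to what is wanted. The sketch below is one possible loop-based version, shown next to the vectorized equivalent. The function name sum_first_three_loop is hypothetical.

```python
import pandas as pd

def sum_first_three_loop(df):
    # Hypothetical loop-based variant: walk the first three values explicitly
    total = 0
    for value in df['Values'].head(3):
        total += value
    print(f"The sum of the first three values is {total}")
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
loop_total = sum_first_three_loop(df)

# Vectorized equivalent: positional slice, then the Series sum() method
vectorized_total = df['Values'].iloc[:3].sum()
```

Both approaches also work when the DataFrame has fewer than three rows, in which case they sum whatever rows exist.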


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [None]:
import pandas as pd

def add_word_count(df):
    # Split the 'Text' column into words and count the number of words in each row
    word_counts = df['Text'].str.split().str.len()
    
    # Add the 'Word_Count' column to the DataFrame
    df['Word_Count'] = word_counts
    
    # Return the updated DataFrame
    return df
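
One detail worth checking is how missing text behaves: .str.split().str.len() yields a missing value for a missing 'Text' entry, which turns the column into floats. The sketch below is one way to handle that, assuming a missing entry should count as zero words. The name add_word_count_safe and the sample rows are hypothetical.

```python
import pandas as pd

def add_word_count_safe(df):
    # Hypothetical variant that works on a copy and treats missing text as 0 words
    out = df.copy()
    counts = out['Text'].str.split().str.len()
    out['Word_Count'] = counts.fillna(0).astype(int)
    return out

# Sample rows made up for illustration, including repeated spaces,
# a missing value, and an empty string
df = pd.DataFrame({'Text': ['hello world', 'pandas  is   fun', None, '']})
result = add_word_count_safe(df)
print(result)
```

Calling split() with no arguments splits on any run of whitespace, so repeated spaces do not inflate the count.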


Q5. How are DataFrame.size() and DataFrame.shape() different?
DataFrame.size and DataFrame.shape are attributes, not methods, so they are accessed without parentheses; writing df.size() or df.shape() raises a TypeError. Both describe the dimensions of a DataFrame, but they return different information.
DataFrame.size returns the total number of elements as an integer, equal to the number of rows multiplied by the number of columns.
DataFrame.shape returns a tuple (rows, columns) giving the dimensions of the DataFrame.
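
A short sketch on a made-up 3-row, 2-column DataFrame shows the difference. Note that in pandas both size and shape are attributes, so they are read without parentheses.

```python
import pandas as pd

# Sample data made up for illustration: 3 rows, 2 columns
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(df.shape)  # (3, 2)
print(df.size)   # 6

# The two are related: size equals rows * columns
rows, cols = df.shape
print(rows * cols == df.size)  # True
```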

Q6. Which function of pandas do we use to read an excel file?
To read an Excel file in Pandas, use the read_excel() function, which loads a worksheet into a DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

In [None]:
import pandas as pd

# Read an Excel file into a DataFrame
df = pd.read_excel('filename.xlsx')

# Print the DataFrame
print(df)


Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

In [2]:
import pandas as pd

# Create a DataFrame with an 'Email' column
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com', 'bob.johnson@example.com']})

# Extract the username from the 'Email' column using the 'str' accessor and 'split()' method
df['Username'] = df['Email'].str.split('@').str[0]

# Print the DataFrame
print(df)


                     Email     Username
0     john.doe@example.com     john.doe
1   jane.smith@example.com   jane.smith
2  bob.johnson@example.com  bob.johnson
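
The question asks for a function, so the same logic can be wrapped in one. This is a minimal sketch. The name add_username is hypothetical, and returning a copy rather than modifying df in place is a design assumption.

```python
import pandas as pd

def add_username(df):
    # Work on a copy so the caller's DataFrame is not modified
    out = df.copy()
    # Split once at the first '@' and keep the part before it
    out['Username'] = out['Email'].str.split('@', n=1).str[0]
    return out

df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})
result = add_username(df)
print(result)
```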


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
A B C
0 3 5 1
1 8 2 7
2 6 9 4
3 2 3 5
4 9 1 2

In [6]:
import pandas as pd

def select_rows(df):
    # Select rows where column 'A' is greater than 5 and column 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    # Return the new DataFrame with only the selected rows
    return selected_rows

# create the DataFrame
df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# call the function to select rows
selected_df = select_rows(df)

# print the selected rows
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
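
The same filter can also be written with DataFrame.query(), which some readers find easier to scan. The sketch below reuses the example values and checks that both forms select the same rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# Boolean-mask form: each condition needs its own parentheses because
# & binds more tightly than the comparison operators
by_mask = df[(df['A'] > 5) & (df['B'] < 10)]

# query() form: the condition is written as a string expression
by_query = df.query('A > 5 and B < 10')

print(by_query)
print(by_mask.equals(by_query))
```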


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [None]:
import pandas as pd

def calculate_statistics(df):
    # Calculate mean, median, and standard deviation of 'Values' column
    mean = df['Values'].mean()
    median = df['Values'].median()
    std = df['Values'].std()
    # Return a dictionary with the calculated statistics
    return {'mean': mean, 'median': median, 'std': std}

# Sample data so the cell runs on its own
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
stats = calculate_statistics(df)
print("Mean: ", stats['mean'])
print("Median: ", stats['median'])
print("Standard Deviation: ", stats['std'])
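
One point worth knowing is that the pandas std() method computes the sample standard deviation (ddof=1) by default. The sketch below, on made-up values, computes the three statistics in one call with agg() and shows the population version for comparison.

```python
import pandas as pd

# Sample data made up for illustration
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# agg() with a list of names returns a Series indexed by statistic name
summary = df['Values'].agg(['mean', 'median', 'std'])

# Population standard deviation divides by n instead of n - 1
population_std = df['Values'].std(ddof=0)

print(summary)
print(population_std)
```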


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [8]:
import pandas as pd

def add_moving_average(df):
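    # Rolling mean over the current row and up to six rows before it.
    # This is row-based, so it matches "past 7 days" only when there is one row per day, sorted by date.
    # min_periods=1 averages whatever rows are available at the start instead of returning NaN.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()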
    # Return the modified DataFrame
    return df
df = pd.DataFrame({'Date': pd.date_range('2022-01-01', '2022-01-10'),
                   'Sales': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]})
df_with_ma = add_moving_average(df)
print(df_with_ma['MovingAverage'])


0    10.0
1    12.5
2    15.0
3    17.5
4    20.0
5    22.5
6    25.0
7    30.0
8    35.0
9    40.0
Name: MovingAverage, dtype: float64
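
A window given as a row count matches "the past 7 days" only when the data has exactly one row per day. If dates may be missing, a time-based window is one alternative. The sketch below uses the '7D' offset on a date index. The function name add_moving_average_by_date and the gappy sample data are hypothetical.

```python
import pandas as pd

def add_moving_average_by_date(df):
    # Hypothetical variant using a calendar window instead of a row count
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    # Time-based rolling windows need the dates in increasing order
    out = out.sort_values('Date')
    sales_by_date = out.set_index('Date')['Sales']
    # '7D' covers the current day and the six calendar days before it
    out['MovingAverage'] = sales_by_date.rolling('7D').mean().to_numpy()
    return out

# Sample data with gaps between dates, made up for illustration
df = pd.DataFrame({'Date': ['2022-01-01', '2022-01-02', '2022-01-05', '2022-01-10'],
                   'Sales': [10, 20, 30, 40]})
result = add_moving_average_by_date(df)
print(result)
```

For the last row, the window reaches back to 2022-01-04, so only the sales from 2022-01-05 and 2022-01-10 are averaged.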


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
Your function should create the following DataFrame:

Date Weekday
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-03 Tuesday
3 2023-01-04 Wednesday
4 2023-01-05 Thursday
The function should return the modified DataFrame.

In [9]:
import pandas as pd

def add_weekday(df):
    # Convert the 'Date' column to a datetime datatype if it is not already
    df['Date'] = pd.to_datetime(df['Date'])
    # Create a new column 'Weekday' that contains the weekday name
    df['Weekday'] = df['Date'].dt.day_name()
    # Return the modified DataFrame
    return df
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
df_with_weekday = add_weekday(df)
print(df_with_weekday['Weekday'])


0       Sunday
1       Monday
2      Tuesday
3    Wednesday
4     Thursday
Name: Weekday, dtype: object


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [10]:
import pandas as pd

def select_date_range(df):
    # Convert the 'Date' column to a datetime datatype if it is not already
    df['Date'] = pd.to_datetime(df['Date'])
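    # Select all rows dated from '2023-01-01' through '2023-01-31'.
    # The upper bound is written as "< '2023-02-01'" so that timestamps later in the day on January 31 are included.
    mask = (df['Date'] >= '2023-01-01') & (df['Date'] < '2023-02-01')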
    selected_rows = df.loc[mask]
    # Return the selected rows DataFrame
    return selected_rows
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
selected_rows = select_date_range(df)
print(selected_rows)


        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
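
When the 'Date' column carries times of day, a bound such as '2023-01-31' is interpreted as midnight at the start of that day. A half-open interval is one way to make sure the whole last day is covered. The sketch below parameterizes the bounds. The function name select_between and the sample timestamps are hypothetical.

```python
import pandas as pd

def select_between(df, start, end_exclusive):
    # Hypothetical helper: keep rows with start <= Date < end_exclusive
    dates = pd.to_datetime(df['Date'])
    mask = (dates >= pd.Timestamp(start)) & (dates < pd.Timestamp(end_exclusive))
    return df.loc[mask]

# Sample timestamps made up for illustration, including one late on January 31
df = pd.DataFrame({'Date': ['2022-12-31 23:59', '2023-01-01 00:00',
                            '2023-01-31 18:30', '2023-02-01 00:00']})
january = select_between(df, '2023-01-01', '2023-02-01')
print(january)
```

Only the two January timestamps are kept; the row at 2023-02-01 00:00 falls outside the half-open interval.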


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

The first library that must be imported to use the basic functions of Pandas is pandas itself. It is conventionally imported under the alias pd:

In [None]:
import pandas as pd