Q1. List any five functions of the pandas library with execution.
Here are five functions of the pandas library along with their execution:

read_csv(): This function is used to read a CSV file and create a DataFrame from it. It takes the file path as input and returns a DataFrame. For example:

In [None]:
import pandas as pd

df = pd.read_csv('example.csv')
print(df.head())
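The example above needs an `example.csv` file on disk. As a self-contained sketch with made-up sample data, `read_csv()` can read an in-memory `io.StringIO` buffer instead, since it accepts any file-like object as well as a path or URL. `parse_dates` converts the listed columns to datetime while reading, and empty fields become `NaN`:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for example.csv
csv_text = """Date,Category,Sales
2023-01-01,A,100
2023-01-02,B,
2023-01-03,A,250
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(df.dtypes)
print(df)
```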


groupby(): This function groups the rows of a DataFrame by the values in one or more columns. It returns a DataFrameGroupBy object, on which aggregations such as mean, sum, and count can then be computed for each group. For example:

In [None]:
import pandas as pd

df = pd.read_csv('example.csv')
grouped = df.groupby('Category')
print(grouped['Sales'].sum())
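A minimal sketch with hypothetical in-memory data instead of `example.csv`: the same group object can compute a single aggregation or, through `agg()`, several at once:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C'],
    'Sales': [100, 200, 150, 50, 300],
})

grouped = df.groupby('Category')

# One aggregation: total sales per category
totals = grouped['Sales'].sum()

# Several aggregations at once; the result has one column per function
summary = grouped['Sales'].agg(['sum', 'mean', 'count'])
print(summary)
```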


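dropna(): This function removes rows (axis=0, the default) or columns (axis=1) that contain missing values such as NaN or None. Its main parameters are axis, how ('any' or 'all'), thresh (the minimum number of non-missing values a row or column needs to be kept), and subset (the columns to check). The example below drops only the rows whose 'Sales' value is missing: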

In [None]:
import pandas as pd

df = pd.read_csv('example.csv')
df = df.dropna(axis=0, subset=['Sales'])
print(df.head())
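To show how the parameters differ, here is a sketch on a small hypothetical DataFrame where each row has a different number of missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 complete, row 1 has 2 values, row 2 has 1, row 3 has 0
df = pd.DataFrame({
    'Sales': [100, np.nan, 250, np.nan],
    'Region': ['N', 'S', None, None],
    'Units': [5, 3, np.nan, np.nan],
})

# Default how='any': drop rows with at least one missing value
any_dropped = df.dropna()

# how='all': drop rows only if every value is missing
all_dropped = df.dropna(how='all')

# thresh=2: keep rows with at least two non-missing values
thresh_kept = df.dropna(thresh=2)

# subset: only the 'Sales' column decides whether a row is dropped
sales_dropped = df.dropna(subset=['Sales'])
```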


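apply(): Series.apply() calls a function on each element of a Series, while DataFrame.apply() calls it on each column (axis=0, the default) or on each row (axis=1). The result is a Series or a DataFrame, depending on what the function returns. The example below uses Series.apply() on the 'Sales' column, so it returns a new Series, which is assigned back to that column: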

In [None]:
import pandas as pd

df = pd.read_csv('example.csv')
df['Sales'] = df['Sales'].apply(lambda x: x * 2)
print(df.head())
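A sketch of DataFrame.apply() on hypothetical data: with `axis=1` each row arrives as a Series, and with the default `axis=0` each column does. For plain arithmetic like this, the vectorized expression gives the same values and is usually much faster:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Price': [10.0, 20.0, 5.0], 'Quantity': [3, 1, 4]})

# axis=1: the lambda receives one row at a time
df['Revenue'] = df.apply(lambda row: row['Price'] * row['Quantity'], axis=1)

# axis=0 (default): the lambda receives one column at a time
col_max = df[['Price', 'Quantity']].apply(lambda col: col.max())

# Vectorized equivalent of the row-wise apply above
revenue_vectorized = df['Price'] * df['Quantity']
print(df)
```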


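merge(): This function combines two DataFrames based on a common column or columns. It takes the two DataFrames and the join key(s) as input and returns a new DataFrame. By default it performs an inner join, keeping only rows whose key appears in both DataFrames; the how parameter selects other joins such as 'left', 'right', or 'outer'. For example: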

In [None]:
import pandas as pd

df1 = pd.read_csv('example1.csv')
df2 = pd.read_csv('example2.csv')
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df.head())
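A sketch with two hypothetical tables, comparing the default inner join with a left join. `indicator=True` adds a `_merge` column that records whether each row came from both tables or only one:

```python
import pandas as pd

# Hypothetical tables sharing an 'ID' key
customers = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Ann', 'Bob', 'Cy']})
orders = pd.DataFrame({'ID': [1, 1, 3, 4], 'Amount': [50, 20, 70, 10]})

# Default how='inner': only IDs present in both tables (1 and 3)
inner = pd.merge(customers, orders, on='ID')

# how='left': every customer is kept; customer 2 has no orders, so Amount is NaN
left = pd.merge(customers, orders, on='ID', how='left', indicator=True)
print(left)
```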


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.


In [None]:
import pandas as pd

def reindex_dataframe(df):
    # Work on a copy so the caller's DataFrame is left unchanged
    new_df = df.copy()

    # range(1, 2 * n, 2) yields exactly n labels: 1, 3, 5, ..., 2n - 1
    new_df.index = pd.RangeIndex(start=1, stop=2 * len(new_df), step=2)

    # Return the re-indexed DataFrame
    return new_df


In [None]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(reindex_dataframe(df))


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [None]:
import pandas as pd

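def sum_first_three_values(df):
    # Running total of the values seen so far
    total_sum = 0

    # iloc[:3] selects the first three rows by position, whatever the index labels are
    for value in df['Values'].iloc[:3]:
        # Add each value to the running total
        total_sum += value

    # Print the result to the console
    print("Total sum of first three values:", total_sum)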


In [None]:
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_first_three_values(df)
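The question asks for an explicit loop, which the function above provides. Outside an exercise, the same result is usually computed in one vectorized step. A sketch, using a hypothetical function name, that also returns the total so it can be reused:

```python
import pandas as pd

def sum_first_three_values_vectorized(df):
    # Hypothetical helper: position-based slice, then a single sum() call
    total = df['Values'].iloc[:3].sum()
    print("Total sum of first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_values_vectorized(df)
```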


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.


In [None]:
import pandas as pd

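def add_word_count_column(df):
    # split() with no argument splits on any run of whitespace, so len() counts the words
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))

    # The column is added to df in place; df is also returned for convenience
    return df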


In [None]:
df = pd.DataFrame({'Text': ['This is a sample sentence', 'Here is another sentence']})
df = add_word_count_column(df)
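The `apply` version raises an `AttributeError` if a row of 'Text' is missing (NaN has no `split` method). A sketch of an alternative, with a hypothetical function name, uses the `.str` accessor, which leaves missing values as NaN instead of failing:

```python
import numpy as np
import pandas as pd

def add_word_count_column_vectorized(df):
    # str.split() splits on whitespace; str.len() counts the pieces in each list.
    # Missing values propagate as NaN rather than raising an error.
    df['Word_Count'] = df['Text'].str.split().str.len()
    return df

df = pd.DataFrame({'Text': ['This is a sample sentence', 'Here is another sentence', np.nan]})
df = add_word_count_column_vectorized(df)
print(df)
```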


Q5. How are DataFrame.size() and DataFrame.shape() different?
DataFrame.size and DataFrame.shape are both attributes of a Pandas DataFrame, not methods, so they are accessed without parentheses (df.size, df.shape). Both describe the size of the DataFrame, but they have different meanings:

DataFrame.size returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and the number of columns. In other words, DataFrame.size is equal to DataFrame.shape[0] * DataFrame.shape[1].

DataFrame.shape returns a tuple containing the number of rows and columns in the DataFrame. The first element of the tuple is the number of rows, and the second element is the number of columns.
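A small sketch on hypothetical data. Note that `size` counts every cell, including missing values, and that writing `df.shape()` raises a `TypeError` because a tuple is not callable:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(df.shape)  # (3, 2): 3 rows, 2 columns
print(df.size)   # 6: 3 * 2 elements

# Unpack the shape tuple into its two parts
rows, cols = df.shape
```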

Q6. Which function of pandas do we use to read an excel file?
The function we use to read an Excel file in Pandas is read_excel(). It is part of the Pandas IO tools and reads data from an Excel file into a DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [None]:
import pandas as pd

# Sample data
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.doe@example.com', 'bob.smith@example.com']})

# Function to extract username from email address
def extract_username(email):
    return email.split('@')[0]

# Apply the function to the 'Email' column and create a new 'Username' column
df['Username'] = df['Email'].apply(extract_username)

# Print the result
print(df)
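The question asks for a function that creates the column itself. A sketch that wraps the step in such a function (the name `add_username_column` is hypothetical) and uses the vectorized `.str` accessor instead of a per-element helper:

```python
import pandas as pd

def add_username_column(df):
    # Split each address at '@' and keep the first piece (the username)
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.doe@example.com', 'bob.smith@example.com']})
df = add_username_column(df)
print(df)
```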


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows:

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

The function should return a new DataFrame that contains only the selected rows.

In [None]:
import pandas as pd

def select_rows(df):
    return df[(df['A'] > 5) & (df['B'] < 10)]

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

selected_rows = select_rows(df)


print(selected_rows)
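The same filter can be written as a string expression with `DataFrame.query()`, which some readers find easier to scan than chained boolean masks. A sketch with a hypothetical function name; for the sample data, rows 1, 2, and 4 satisfy both conditions (row 2 has A = 6 and B = 9):

```python
import pandas as pd

def select_rows_query(df):
    # query() evaluates the expression against the DataFrame's column names
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

selected_rows = select_rows_query(df)
print(selected_rows)
```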


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [None]:
def calculate_stats(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std_dev = df['Values'].std()
    print("Mean:", mean)
    print("Median:", median)
    print("Standard deviation:", std_dev)


In [None]:
import pandas as pd

data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

calculate_stats(df)
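A sketch of a variant (hypothetical name) that returns the statistics as a Series instead of printing them, using `agg()` with a list of function names. `std()` computes the sample standard deviation (`ddof=1`) by default; `ddof=0` gives the population standard deviation:

```python
import pandas as pd

def calculate_stats_series(df):
    # Returns a Series indexed by 'mean', 'median', 'std'
    return df['Values'].agg(['mean', 'median', 'std'])

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
stats = calculate_stats_series(df)
print(stats)

# Population standard deviation: divide by n instead of n - 1
population_std = df['Values'].std(ddof=0)
```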


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [None]:
import pandas as pd

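def add_moving_average(df):
    window_size = 7
    # Row-based window: assumes exactly one row per day, sorted by 'Date'.
    # min_periods=1 lets the first six rows average over the days available so far.
    df['MovingAverage'] = df['Sales'].rolling(window_size, min_periods=1).mean()
    return df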


In [None]:

data = {'Sales': [10, 12, 15, 18, 20, 17, 14, 13, 11, 9],
        'Date': pd.date_range('2022-01-01', periods=10, freq='D')}
df = pd.DataFrame(data)


df = add_moving_average(df)

# print the result
print(df)
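A window of 7 rows equals 7 days only when every day has exactly one row. If days can be missing, a time-based window keyed on the dates is closer to "the past 7 days". A sketch under that assumption, with a hypothetical function name, using an offset string `'7D'` on a Series with a sorted DatetimeIndex:

```python
import pandas as pd

def add_moving_average_by_date(df):
    # Hypothetical variant: the window spans 7 calendar days, not 7 rows
    out = df.sort_values('Date').copy()
    sales = out.set_index('Date')['Sales']
    # '7D' covers (date - 7 days, date]: the current day and the 6 before it.
    # Offset windows default to min_periods=1, so early rows use what is available.
    out['MovingAverage'] = sales.rolling('7D').mean().to_numpy()
    return out

# Hypothetical data with a gap: 2022-01-03 through 2022-01-09 are missing
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-10']),
    'Sales': [10, 20, 30],
})
result = add_moving_average_by_date(df)
print(result)
```

Here the row for 2022-01-10 averages only its own sales (30.0), whereas a 7-row window would also include the two earlier rows and give 20.0.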


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

Your function should create the following DataFrame:

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday

The function should return the modified DataFrame.

In [None]:
import pandas as pd

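def add_weekday_column(df):
    # pd.to_datetime leaves datetime64 columns as-is and parses strings,
    # so .dt.day_name() works for either kind of input
    df['Weekday'] = pd.to_datetime(df['Date']).dt.day_name()
    return df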


In [None]:
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
df['Date'] = pd.to_datetime(df['Date']) # convert the 'Date' column to datetime
df = add_weekday_column(df)
print(df)


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [None]:
import pandas as pd

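def select_rows_between_dates(df):
    # Parse into a separate Series so the caller's 'Date' column is not modified
    dates = pd.to_datetime(df['Date'])

    # '< 2023-02-01' keeps timestamps later on Jan 31 (e.g. 15:30),
    # which '<= 2023-01-31' (midnight) would exclude
    selected_rows = df[(dates >= '2023-01-01') & (dates < '2023-02-01')]
    return selected_rows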
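Because the column holds timestamps, a value such as 2023-01-31 15:30 is later than '2023-01-31' (which means midnight), so an inclusive upper bound on the date string alone would drop it. One sketch, with a hypothetical function name, strips the time of day with `normalize()` and then uses the inclusive `between()`:

```python
import pandas as pd

def select_january_rows(df):
    # normalize() sets every timestamp to midnight of its day,
    # so all times on 2023-01-31 match the inclusive upper bound
    days = pd.to_datetime(df['Date']).dt.normalize()
    mask = days.between(pd.Timestamp('2023-01-01'), pd.Timestamp('2023-01-31'))
    return df[mask]

# Hypothetical timestamps around the month boundaries
df = pd.DataFrame({'Date': ['2022-12-31 23:59', '2023-01-01 00:00',
                            '2023-01-31 15:30', '2023-02-01 00:00']})
print(select_january_rows(df))
```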


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?
The first and foremost necessary library that needs to be imported to use the basic functions of pandas is pandas itself. It can be imported with the following code:

In [None]:
import pandas as pd
