Q1. List any five functions of the pandas library with execution.
Five commonly used functions of the pandas library are:

pd.read_csv(): Reads a CSV file into a DataFrame.
df.head(): Returns the first n rows of the DataFrame.
df.describe(): Generates descriptive statistics of the DataFrame.
df.groupby(): Groups the DataFrame using a mapper or by a Series of columns.
df.merge(): Merges DataFrame objects by performing a database-style join operation.
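
Since the question asks for execution, the five functions above can be run together on a small dataset. This is a minimal sketch: the CSV text, the column names (Name, Dept, Salary, Location) and the values are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv parse it like a file.
csv_text = (
    "Name,Dept,Salary\n"
    "Asha,HR,50000\n"
    "Ben,IT,65000\n"
    "Chen,IT,70000\n"
    "Dina,HR,52000\n"
)

# 1. pd.read_csv(): load the CSV data into a DataFrame
employees = pd.read_csv(io.StringIO(csv_text))

# 2. df.head(): first n rows (n defaults to 5)
first_two = employees.head(2)
print(first_two)

# 3. df.describe(): summary statistics for the numeric columns
summary = employees.describe()
print(summary)

# 4. df.groupby(): mean salary per department
mean_by_dept = employees.groupby('Dept')['Salary'].mean()
print(mean_by_dept)

# 5. df.merge(): database-style left join on the shared 'Dept' column
locations = pd.DataFrame({'Dept': ['HR', 'IT'], 'Location': ['Pune', 'Delhi']})
merged = employees.merge(locations, on='Dept', how='left')
print(merged)
```

A left join is used here so that every employee row is kept even if a department had no matching location.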

Q2. Re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [2]:
import pandas as pd
def reindex_dataframe(df):
    df.index = range(1, 2 * len(df) + 1, 2)
    return df

# Example DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df = reindex_dataframe(df)
print(df)


   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9


Q3. Calculate the sum of the first three values in the 'Values' column.


In [3]:
def sum_first_three(df):
    # .iloc makes the positional slice explicit, whatever the index labels are
    result = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", result)

# Example DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_first_three(df)


Sum of the first three values: 60


Q4. Create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [4]:
def add_word_count_column(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

# Example DataFrame
df = pd.DataFrame({'Text': ['Hello world', 'This is a test', 'Pandas is great']})
df = add_word_count_column(df)
print(df)


              Text  Word_Count
0      Hello world           2
1   This is a test           4
2  Pandas is great           3


Q5. Difference between DataFrame.size and DataFrame.shape.

DataFrame.size: An attribute (accessed without parentheses) that returns the total number of elements in the DataFrame (rows * columns).

DataFrame.shape: An attribute (accessed without parentheses) that returns a tuple giving the dimensions of the DataFrame (rows, columns).
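
The difference can be checked on a small example. This is a minimal sketch with a made-up 3-row, 2-column DataFrame; both names are accessed as attributes, so no parentheses are used.

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(df.shape)  # (3, 2) -> (rows, columns)
print(df.size)   # 6      -> rows * columns
```

Calling df.size() or df.shape() raises a TypeError, because the returned int and tuple are not callable.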

Q6. Which function of pandas is used to read an Excel file?

The function used to read an Excel file in pandas is pd.read_excel().

Q7. Create a new column 'Username' from the 'Email' column.

In [5]:
def extract_username(df):
    # The vectorized .str accessor keeps missing emails as NaN instead of raising an error
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Example DataFrame
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.doe@example.com']})
df = extract_username(df)
print(df)


                  Email  Username
0  john.doe@example.com  john.doe
1  jane.doe@example.com  jane.doe


Q8. Select rows where 'A' > 5 and 'B' < 10.

In [6]:
def filter_dataframe(df):
    return df[(df['A'] > 5) & (df['B'] < 10)]

# Example DataFrame
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
filtered_df = filter_dataframe(df)
print(filtered_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


Q9. Calculate the mean, median, and standard deviation of the 'Values' column.

In [7]:
def calculate_statistics(df):
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()
    return mean_value, median_value, std_deviation

# Example DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
mean, median, std = calculate_statistics(df)
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896


Q10. Create a 'MovingAverage' column with a window of size 7.

In [8]:
def add_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7).mean()
    return df

# Example DataFrame
df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=10),
    'Sales': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
})
df = add_moving_average(df)
print(df)


        Date  Sales  MovingAverage
0 2023-01-01      1            NaN
1 2023-01-02      2            NaN
2 2023-01-03      3            NaN
3 2023-01-04      4            NaN
4 2023-01-05      5            NaN
5 2023-01-06      6            NaN
6 2023-01-07      7            4.0
7 2023-01-08      8            5.0
8 2023-01-09      9            6.0
9 2023-01-10     10            7.0


Q11. Create a 'Weekday' column from the 'Date' column.

In [10]:
def add_weekday_column(df):
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Example DataFrame
df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=5)
})
df = add_weekday_column(df)
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


Q12. Select rows where the date is between '2023-01-01' and '2023-01-31'.

In [11]:
def select_date_range(df):
    start_date = '2023-01-01'
    end_date = '2023-01-31'
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    return df.loc[mask]

# Example DataFrame
df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=40)
})
filtered_df = select_date_range(df)
print(filtered_df)


         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31


Q13. Which library must be imported first in order to use pandas?