Q1. List any five functions of the pandas library and demonstrate each with an example.
Ans:
1. read_csv(): Reads a CSV file into a DataFrame (called as pd.read_csv()).

2. head(): Returns the first n rows of a DataFrame (n defaults to 5).

3. info(): Prints a concise summary of the DataFrame, including column data types and non-null counts.

4. describe(): Generates descriptive statistics (count, mean, std, min, quartiles, max) for the numeric columns of the DataFrame.

5. groupby(): Groups the rows of the DataFrame by the values in one or more columns so that an operation can be applied to each group.

In [7]:
import pandas as pd

# Reading a CSV file into a DataFrame
data = pd.read_csv('Pandas_Example.csv', index_col=0)

# Displaying the first 5 rows of the DataFrame
print("Example of head() function:")
print(data.head())
print()

# Getting information about the DataFrame
print("Example of info() function:")
data.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
print()

# Generating descriptive statistics of the DataFrame
print("Example of describe() function:")
print(data.describe())
print()

# Grouping the DataFrame by a column and calculating mean for each group
print("Example of groupby() function:")
grouped_data = data.groupby('Apples').mean()
print(grouped_data)
print()

Example of head() function:
        Apples  Oranges
June         3        0
Robert       2        3
Lily         0        7
David        1        2

Example of info() function:
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, June to David
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Apples   4 non-null      int64
 1   Oranges  4 non-null      int64
dtypes: int64(2)
memory usage: 96.0+ bytes

Example of describe() function:
         Apples  Oranges
count  4.000000  4.00000
mean   1.500000  3.00000
std    1.290994  2.94392
min    0.000000  0.00000
25%    0.750000  1.50000
50%    1.500000  2.50000
75%    2.250000  4.00000
max    3.000000  7.00000

Example of groupby() function:
        Oranges
Apples         
0           7.0
1           2.0
2           3.0
3           0.0
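
The cell above depends on a file named Pandas_Example.csv that is not included here. As a minimal sketch, the same five functions can be tried without any file by reading CSV text from memory. The CSV content below is an assumption reconstructed from the printed output, so the real file may differ.

```python
import io
import pandas as pd

# Hypothetical CSV text reconstructed from the output above
csv_text = """,Apples,Oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

# read_csv accepts a file-like object, so StringIO stands in for a file path
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(data.head(2))                    # first two rows only
data.info()                            # prints its summary directly
print(data.describe())                 # statistics for the numeric columns
print(data.groupby('Apples').mean())   # mean of 'Oranges' per 'Apples' value
```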



Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [8]:
# Creating a DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}

df = pd.DataFrame(data)
print(df)

    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90


In [9]:
def reindex_with_increment(df):

    # Creating a new index starting from 1 and incrementing by 2 for each row
    new_index = pd.RangeIndex(start=1, stop=len(df) * 2, step=2)

    # Assigning the new index to the DataFrame
    df_reindexed = df.set_index(new_index)

    return df_reindexed

df_reindexed = reindex_with_increment(df)
print(df_reindexed)

    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

In [14]:
# Example DataFrame
data = {
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)
print(df)

   Values
0      10
1      20
2      30
3      40
4      50


In [15]:
def calculate_sum_first_three(df):

    # Accessing the 'Values' column and calculating the sum of the first three values
    sum_first_three = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", sum_first_three)

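# Calling the function with the DataFrame (it prints the sum and returns nothing,
# so the result is not assigned; this also avoids shadowing the built-in sum)
calculate_sum_first_three(df)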

Sum of the first three values: 60
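
The question asks for a function that iterates over the DataFrame, while the solution above uses a positional slice. A possible loop-based variant is sketched below. The function name sum_first_three_by_iteration is made up for this illustration, and the slice-based version is usually the better choice in practice.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    """Sum the first three entries of 'Values' with an explicit loop."""
    total = 0
    # itertuples() yields one namedtuple per row; stop once three rows are consumed
    for position, row in enumerate(df.itertuples(index=False)):
        if position == 3:
            break
        total += row.Values
    print("Sum of the first three values:", total)
    return total

example_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
loop_total = sum_first_three_by_iteration(example_df)
```

If the DataFrame has fewer than three rows, the loop simply sums whatever rows exist.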


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [16]:
# Example DataFrame
data = {
    'Text': [
        'This is a sample sentence.',
        'Another sentence here.',
        'A short text.'
    ]
}

df = pd.DataFrame(data)
print(df)

                         Text
0  This is a sample sentence.
1      Another sentence here.
2               A short text.


In [17]:
def count_words(df):

    # Counting the number of words in each row of the 'Text' column
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

# Calling the function with the DataFrame
df = count_words(df)
print(df)

                         Text  Word_Count
0  This is a sample sentence.           5
1      Another sentence here.           3
2               A short text.           3
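
One caveat with the apply-based version: str(x) turns a missing value into the text 'None' or 'nan', which is then counted as one word. The sketch below uses the vectorized .str accessor instead and treats missing text as zero words. That last choice is an assumption made for this example, not something the question specifies.

```python
import pandas as pd

text_df = pd.DataFrame({'Text': ['This is a sample sentence.', None, 'A short text.']})

# str.split() with no arguments splits on runs of whitespace;
# str.len() then counts the pieces. Missing entries stay missing.
word_counts = text_df['Text'].str.split().str.len()

# Treat missing text as zero words (a choice made for this sketch)
text_df['Word_Count'] = word_counts.fillna(0).astype(int)
print(text_df)
```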


Q5. How are DataFrame.size and DataFrame.shape different?

Ans:
DataFrame.size: An attribute that returns the total number of elements (cells) in the DataFrame as a single integer, that is, the number of rows multiplied by the number of columns. Missing values are counted as well.

DataFrame.shape: An attribute that returns the dimensions of the DataFrame as a tuple of the form (rows, columns).

Both are attributes rather than methods, so they are accessed without parentheses.

In [20]:
## Example
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)
print()

# Getting the size of the DataFrame
size = df.size
print("Size of DataFrame:", size)
print()

# Getting the shape of the DataFrame
shape = df.shape
print("Shape of DataFrame:", shape)

   A  B
0  1  4
1  2  5
2  3  6

Size of DataFrame: 6

Shape of DataFrame: (3, 2)
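
To make the relationship concrete, the short sketch below checks that size equals the product of the two shape entries and shows how both attributes behave for a single column (a Series). The variable names are chosen only for this illustration.

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
rows, cols = frame.shape

# For a DataFrame, size is rows * columns
print(frame.size == rows * cols)

# For a Series, shape is a one-element tuple and size equals its length
column = frame['A']
print(column.shape, column.size)

# len() on a DataFrame counts rows only
print(len(frame))
```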


Q6. Which function of pandas do we use to read an excel file?

In [23]:
## In Pandas, the function used to read an Excel file is pd.read_excel().
## Reading .xlsx files relies on an optional engine such as openpyxl being installed.

# Reading an Excel file into a DataFrame
df = pd.read_excel('Pandas_Example.xlsx', index_col=0)
print(df)

        Apples  Oranges
June         3        0
Robert       2        3
Lily         0        7
David        1        2


Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [25]:
data = {
    'Email': [
        'abc.123@example.com',
        'xyz.456@example.com',
        'pqr.789@example.com'
    ]
}

df = pd.DataFrame(data)
print(df)

                 Email
0  abc.123@example.com
1  xyz.456@example.com
2  pqr.789@example.com


In [26]:
def extract_username(email):

    # Splitting the email address by '@' and extracting the username
    return email.split('@')[0]

def add_username_column(df):

    # Creating a new 'Username' column by applying the extract_username function to 'Email' column
    df['Username'] = df['Email'].apply(extract_username)
    return df

# Calling the function with the DataFrame
df = add_username_column(df)
print(df)

                 Email Username
0  abc.123@example.com  abc.123
1  xyz.456@example.com  xyz.456
2  pqr.789@example.com  pqr.789
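
The apply-based version calls email.split() on every entry, which raises an AttributeError if an entry is missing (None or NaN). A possible alternative, sketched here with made-up sample data, uses the vectorized .str accessor so that missing emails simply stay missing.

```python
import pandas as pd

emails = pd.DataFrame({'Email': ['abc.123@example.com', 'xyz.456@example.com', None]})

# Split once on '@' and keep the first piece; missing emails remain missing
emails['Username'] = emails['Email'].str.split('@', n=1).str[0]
print(emails)
```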


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows:

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

Row 2 is included because it satisfies both conditions (6 > 5 and 9 < 10).

In [27]:
data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}

df = pd.DataFrame(data)
print(df)

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2


In [28]:
def select_rows(df):

    # Selecting rows where 'A' > 5 and 'B' < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Calling the function with the DataFrame
selected_df = select_rows(df)
print(selected_df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
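
The same filter can also be written with DataFrame.query(), which some readers find easier to read than chained boolean masks. The sketch below is one way to do it. The function name and the a_min and b_max parameters are hypothetical additions that make the thresholds configurable.

```python
import pandas as pd

def select_rows_with_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string
    return df.query('A > @a_min and B < @b_max')

sample = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
})
result = select_rows_with_query(sample)
print(result)
```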


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [29]:
data = {
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)
print(df)
print()

def calculate_statistics(df):

    # Calculating mean, median, and standard deviation of 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()

    return mean_value, median_value, std_deviation

# Calling the function with the DataFrame
mean, median, std = calculate_statistics(df)
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std}")

   Values
0      10
1      20
2      30
3      40
4      50

Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
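
Note that Series.std() computes the sample standard deviation by default (ddof=1, dividing by n - 1). If the population standard deviation is wanted instead, ddof=0 can be passed, as in this short sketch.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # ddof=1 by default: divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
```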


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [30]:
data = {
    'Date': pd.date_range('2023-01-01', periods=20),  # Sample date range
    'Sales': [30, 40, 25, 35, 45, 55, 60, 70, 65, 80, 75, 90, 85, 95, 100, 110, 105, 120, 125, 115]  # Sample sales data
}

df = pd.DataFrame(data)
print(df)

         Date  Sales
0  2023-01-01     30
1  2023-01-02     40
2  2023-01-03     25
3  2023-01-04     35
4  2023-01-05     45
5  2023-01-06     55
6  2023-01-07     60
7  2023-01-08     70
8  2023-01-09     65
9  2023-01-10     80
10 2023-01-11     75
11 2023-01-12     90
12 2023-01-13     85
13 2023-01-14     95
14 2023-01-15    100
15 2023-01-16    110
16 2023-01-17    105
17 2023-01-18    120
18 2023-01-19    125
19 2023-01-20    115


In [31]:
def calculate_moving_average(df):

    # Sorting DataFrame by 'Date' column
    df = df.sort_values(by='Date')

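    # Calculating the moving average over the current row and the six rows before it.
    # This matches a 7-day window only when there is exactly one row per day.
    # min_periods=1 averages whatever rows are available for the first six rows.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()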

    return df

# Calling the function with the DataFrame
df = calculate_moving_average(df)
print(df)

         Date  Sales  MovingAverage
0  2023-01-01     30      30.000000
1  2023-01-02     40      35.000000
2  2023-01-03     25      31.666667
3  2023-01-04     35      32.500000
4  2023-01-05     45      35.000000
5  2023-01-06     55      38.333333
6  2023-01-07     60      41.428571
7  2023-01-08     70      47.142857
8  2023-01-09     65      50.714286
9  2023-01-10     80      58.571429
10 2023-01-11     75      64.285714
11 2023-01-12     90      70.714286
12 2023-01-13     85      75.000000
13 2023-01-14     95      80.000000
14 2023-01-15    100      84.285714
15 2023-01-16    110      90.714286
16 2023-01-17    105      94.285714
17 2023-01-18    120     100.714286
18 2023-01-19    125     105.714286
19 2023-01-20    115     110.000000
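
rolling(window=7) counts rows, not calendar days, so it covers the past 7 days only if every day appears exactly once. When dates can be missing, a time-based window is one possible alternative. The sketch below uses made-up data with gaps and assumes the dates are unique.

```python
import pandas as pd

sales = pd.DataFrame({
    # 2023-01-03 and several later days are deliberately missing
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-04', '2023-01-08']),
    'Sales': [30, 40, 35, 70],
})

sales = sales.sort_values('Date')

# A '7D' window on a datetime index covers the current day and the six
# calendar days before it, however many rows fall inside that span
by_date = sales.set_index('Date')['Sales'].rolling('7D').mean()
sales['MovingAverage'] = by_date.to_numpy()
print(sales)
```

For the last row (8 January), the window runs from 2 January to 8 January, so the 1 January sale is no longer included.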


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

For example, if df contains the following values:

        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05

Your function should create the following DataFrame:

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday

The function should return the modified DataFrame.

In [32]:
data = {
    'Date': pd.date_range('2023-01-01', periods=5)  # Sample date range
}

df = pd.DataFrame(data)
print(df)

        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05


In [33]:
def add_weekday_column(df):

    # Adding a new 'Weekday' column containing the weekday names
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Calling the function with the DataFrame
df = add_weekday_column(df)
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
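
The .dt accessor only works on datetime columns. If 'Date' is stored as text, which is common after reading a CSV file, it needs to be converted first. A minimal sketch, assuming the strings are in a format pd.to_datetime can parse (the function name is hypothetical):

```python
import pandas as pd

def add_weekday_column_safe(df):
    out = df.copy()  # leave the caller's DataFrame untouched
    # Convert text dates to datetime; already-converted columns pass through unchanged
    out['Date'] = pd.to_datetime(out['Date'])
    out['Weekday'] = out['Date'].dt.day_name()
    return out

raw = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
with_weekday = add_weekday_column_safe(raw)
print(with_weekday)
```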


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [34]:
data = {
    'Date': pd.date_range('2023-01-01', periods=50)  # Sample date range
}

df = pd.DataFrame(data)
print(df)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31
31 2023-02-01
32 2023-02-02
33 2023-02-03
34 2023-02-04
35 2023-02-05
36 2023-02-06
37 2023-02-07
38 2023-02-08
39 2023-02-09
40 2023-02-10
41 2023-02-11
42 2023-02-12
43 2023-02-13
44 2023-02-14
45 2023-02-15
46 2023-02-16
47 2023-02-17
48 2023-02-18
49 2023-02-19


In [35]:
def select_date_range(df):

    # Converting string dates to datetime objects for comparison
    start_date = pd.to_datetime('2023-01-01')
    end_date = pd.to_datetime('2023-01-31')

    # Selecting rows where 'Date' is between '2023-01-01' and '2023-01-31'
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    return selected_rows

# Calling the function with the DataFrame
selected_df = select_date_range(df)
print(selected_df)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

In [36]:
## The primary library required to use the basic functions of Pandas is pandas itself.
import pandas as pd