Q1. List any five functions of the pandas library with execution.


pd.DataFrame() - This constructor creates a DataFrame from data such as lists, dictionaries, or NumPy arrays.

In [7]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Claire'], 'Age': [25, 30, 27]}
df = pd.DataFrame(data)
print(df)


     Name  Age
0   Alice   25
1     Bob   30
2  Claire   27
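
The cell above builds the DataFrame from a dictionary. As a minimal sketch of the other inputs mentioned (the variable names and sample values here are illustrative and not part of the original notebook), a list of rows and a NumPy array can be passed to the same constructor, with column names supplied explicitly:

```python
import numpy as np
import pandas as pd

# From a list of rows; column names are supplied separately.
rows = [['Alice', 25], ['Bob', 30], ['Claire', 27]]
df_from_list = pd.DataFrame(rows, columns=['Name', 'Age'])

# From a 2-D NumPy array; each array column becomes a DataFrame column.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['X', 'Y'])

print(df_from_list)
print(df_from_array)
```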


df.head() - This function returns the first n rows of the DataFrame. By default, it returns the first 5 rows.

In [8]:
import pandas as pd

# 'data.csv' is a placeholder path; read_csv raises FileNotFoundError if the file is missing.
df = pd.read_csv('data.csv')
print(df.head())


FileNotFoundError: ignored
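
Because the cell above depends on a file that was not available, here is a self-contained sketch of head() using hypothetical in-memory data (the 'Value' column is an assumption made for illustration):

```python
import pandas as pd

# Hypothetical in-memory data standing in for 'data.csv'.
df_demo = pd.DataFrame({'Value': range(10)})

first_five = df_demo.head()    # default: n=5
first_three = df_demo.head(3)  # explicit n

print(first_five)
print(first_three)
```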

df.info() - This function provides a summary of the DataFrame, including the number of non-null values and the data types of each column.

In [None]:
import pandas as pd

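# 'data.csv' is a placeholder path; the file must exist for this cell to run.
df = pd.read_csv('data.csv')
df.info()  # info() prints the summary itself and returns None, so print() is not needed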
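
As a runnable sketch that does not need a file, the example below uses a small hypothetical DataFrame with one missing value. It also passes the buf parameter so the summary text can be captured as a string, which shows that info() itself returns None:

```python
import io
import pandas as pd

# Hypothetical data; the None in 'Name' counts as a missing value.
df_demo = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, 30, 27]})

# info() writes to stdout by default; buf= redirects the text to a buffer.
buffer = io.StringIO()
result = df_demo.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
print(result)  # None
```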


df.describe() - This function generates descriptive statistics of the DataFrame, such as count, mean, min, max, and quartiles, for numerical columns.

In [None]:
import pandas as pd

# 'data.csv' is a placeholder path; the file must exist for this cell to run.
df = pd.read_csv('data.csv')
print(df.describe())
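
A self-contained sketch of describe() on hypothetical data follows. By default only numeric columns are summarized; passing include='all' is one way to bring non-numeric columns into the result as well:

```python
import pandas as pd

# Hypothetical data with one numeric and one text column.
df_demo = pd.DataFrame({'Age': [25, 30, 27], 'Name': ['Alice', 'Bob', 'Claire']})

stats = df_demo.describe()                   # numeric columns only
all_stats = df_demo.describe(include='all')  # also summarizes 'Name'

print(stats)
print(all_stats)
```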


df.groupby() - This function is used to group the DataFrame by one or more columns and perform aggregate operations on them.

In [None]:
import pandas as pd

# 'data.csv' is a placeholder path; the file is assumed to contain 'Category' and 'Sales' columns.
df = pd.read_csv('data.csv')
grouped = df.groupby('Category')['Sales'].sum()
print(grouped)
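
To show the grouping without a file, the sketch below uses hypothetical 'Category' and 'Sales' values. It repeats the sum from the cell above and then uses agg() to compute several aggregates at once:

```python
import pandas as pd

# Hypothetical sales data; the values are made up for illustration.
df_sales = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Sales': [100, 200, 150, 50, 25],
})

# Total sales per category.
totals = df_sales.groupby('Category')['Sales'].sum()
print(totals)

# Several aggregates per category in one call.
summary = df_sales.groupby('Category')['Sales'].agg(['sum', 'mean', 'count'])
print(summary)
```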


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [6]:
import pandas as pd

def reindex_dataframe(df):
    df.index = pd.RangeIndex(start=1, stop=len(df)*2, step=2)
    return df

# Usage example
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_reindexed = reindex_dataframe(df)
print(df_reindexed)


   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
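
Note that reindex_dataframe assigns to df.index, so the DataFrame passed in is modified as well. If that side effect is unwanted, one possible variant works on a copy. The function name reindex_copy and its start and step parameters are assumptions added for illustration:

```python
import pandas as pd

def reindex_copy(df, start=1, step=2):
    """Return a copy of df indexed start, start + step, start + 2*step, ..."""
    # stop is exclusive, so this range yields exactly len(df) labels
    # (assuming step is positive).
    new_index = pd.RangeIndex(start=start, stop=start + step * len(df), step=step)
    result = df.copy()
    result.index = new_index
    return result

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindexed = reindex_copy(original)
print(reindexed)
print(original.index.tolist())  # the input keeps its default index
```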


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.


In [5]:
import pandas as pd

def calculate_sum(df):
    sum_values = df['Values'].head(3).sum()
    print("Sum of the first three values:", sum_values)

# Usage example
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_sum(df)


Sum of the first three values: 60
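
The solution above uses head(3).sum() rather than an explicit loop. Since the question asks for iteration, here is a sketch of a loop-based variant. The function name, the n parameter, and the returned value are assumptions added so the result can be checked:

```python
import pandas as pd

def calculate_sum_by_iteration(df, n=3):
    """Iterate over 'Values' and add up the first n entries."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position >= n:
            break  # stop once n values have been added
        total += value
    print("Sum of the first", n, "values:", total)
    return total

df_values = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = calculate_sum_by_iteration(df_values)
```

For large DataFrames the vectorized head(3).sum() form is generally preferable; the loop is shown only to match the wording of the question.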



Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [4]:
import pandas as pd

def count_words(df):
    df['Word_Count'] = df['Text'].str.split().apply(len)
    return df

# Usage example
df = pd.DataFrame({'Text': ['Hello world', 'Python is awesome', 'Data science']})
df_with_word_count = count_words(df)
print(df_with_word_count)


                Text  Word_Count
0        Hello world           2
1  Python is awesome           3
2       Data science           2
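
One caveat: if 'Text' contains missing values, apply(len) fails because a missing value has no length. A possible variant, sketched here with a hypothetical function name and sample data, uses .str.len() and treats missing text as zero words:

```python
import pandas as pd

def count_words_safe(df):
    """Return a copy of df with 'Word_Count'; missing text counts as 0 words."""
    result = df.copy()
    result['Word_Count'] = (
        result['Text']
        .str.split()   # split on any run of whitespace
        .str.len()     # number of words; NaN where the text is missing
        .fillna(0)
        .astype(int)
    )
    return result

df_text = pd.DataFrame({'Text': ['Hello world', None, '  Data   science  ', '']})
counted = count_words_safe(df_text)
print(counted)
```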


Q5. How are DataFrame.size() and DataFrame.shape() different?

Both are attributes, not methods, so they are written without parentheses; df.size() or df.shape() raises a TypeError.
DataFrame.size returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns.
DataFrame.shape returns a tuple with the dimensions of the DataFrame: the first element is the number of rows and the second is the number of columns.

In [3]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

print(df.size)        # Output: 9 (3 rows * 3 columns = 9 elements)
print(df.shape)       # Output: (3, 3) (3 rows, 3 columns)


9
(3, 3)
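
As a further sketch on a non-square, made-up DataFrame, the relationship between the two can be checked directly, along with len() and the corresponding attributes of a single column (a Series):

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

n_rows, n_cols = df_demo.shape   # shape unpacks into rows and columns
print(df_demo.shape)             # (4, 2)
print(df_demo.size)              # 8, which is n_rows * n_cols
print(len(df_demo))              # 4, the number of rows only

column = df_demo['A']            # a Series is one-dimensional
print(column.shape)              # (4,)
print(column.size)               # 4
```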


Q6. Which function of pandas do we use to read an excel file?

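pd.read_excel() reads an Excel file into a DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

In [2]:
import pandas as pd

# 'data.xlsx' is a placeholder path; read_excel raises FileNotFoundError if the file is missing.
df = pd.read_excel('data.xlsx')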
print(df)


FileNotFoundError: ignored

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [1]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Usage example
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})
df_with_username = extract_username(df)
print(df_with_username)


                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
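
The split-based approach returns the whole string when a value contains no '@'. If such values should be treated as invalid instead, one possible variant uses str.extract with a regular expression. The function name, the pattern, and the sample data are assumptions made for illustration:

```python
import pandas as pd

def extract_username_regex(df):
    """Return a copy of df with 'Username'; values without '@' become missing."""
    result = df.copy()
    # Capture one or more non-'@' characters at the start, followed by '@'.
    result['Username'] = result['Email'].str.extract(r'^([^@]+)@', expand=False)
    return result

df_emails = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email', None]})
extracted = extract_username_regex(df_emails)
print(extracted)
```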


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

In [1]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Usage example
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
selected_df = select_rows(df)
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
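
The same filter can also be written with DataFrame.query, which some readers find easier to scan. In this sketch the function name and the a_min and b_max parameters are assumptions; the '@' prefix refers to local Python variables inside the query string:

```python
import pandas as pd

def select_rows_query(df, a_min=5, b_max=10):
    """Return the rows where A > a_min and B < b_max as a new DataFrame."""
    return df.query('A > @a_min and B < @b_max')

df_abc = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
selected = select_rows_query(df_abc)
print(selected)
```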
