Q1. List any five functions of the pandas library with execution.

ANS- pandas is a Python library for data manipulation and analysis. Five commonly used functions are listed below, each with a short example:

**read_csv()**: This function reads data from a CSV file and returns a DataFrame, the primary data structure in pandas.

import pandas as pd

# Example CSV file: data.csv
# Name,Age,Gender
# John,25,Male
# Sarah,30,Female

# Execution of read_csv()
df = pd.read_csv('data.csv')
print(df)
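
The example above needs a data.csv file on disk. As a minimal sketch that runs without one, the same rows can be fed to read_csv() from an in-memory buffer; the csv_text variable is only an illustrative stand-in for the file.

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (same rows as in the example above)
csv_text = "Name,Age,Gender\nJohn,25,Male\nSarah,30,Female\n"

# read_csv() accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Column types are inferred while parsing: Age becomes an integer column
print(df.dtypes)
```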


**head()**: This function returns the first few rows of the DataFrame. By default, it returns the first 5 rows.

# Assuming df is the DataFrame created in the previous example

# Execution of head()
print(df.head())


**info()**: This function prints a concise summary of the DataFrame, including the column data types, non-null counts, and memory usage.

# Execution of info()
# info() prints the summary itself and returns None, so it is not wrapped in print()
df.info()


**groupby()**: This function groups rows by one or more columns so that an aggregation function can be applied to each group.

# Assuming df is the DataFrame created in the previous example

# Grouping by the 'Gender' column and calculating the average age for each gender
grouped_df = df.groupby('Gender')['Age'].mean()
print(grouped_df)
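
As a hedged sketch of applying more than one aggregation per group, the block below uses named aggregation on a small self-contained DataFrame. The two extra rows (Mike and Anna) and the output column names mean_age and members are made up for illustration.

```python
import pandas as pd

# Small illustrative dataset; Mike and Anna are hypothetical extra rows
people = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Anna'],
    'Age': [25, 30, 35, 40],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = people.groupby('Gender').agg(
    mean_age=('Age', 'mean'),
    members=('Age', 'count'),
)
print(summary)
```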


**fillna()**: This function fills missing (NaN) values in a DataFrame or Series with a specified value.

# Assuming df is the DataFrame created in the first example, but with some missing values in the 'Age' column

# Filling missing 'Age' values with the mean age of the entire column
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
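
The snippet above assumes that df already has gaps in 'Age'. A minimal self-contained sketch, with a made-up three-row DataFrame that contains one missing age, shows the effect:

```python
import pandas as pd

# Hypothetical data with one missing age (None is stored as NaN)
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, None, 35],
})

missing_before = int(df['Age'].isna().sum())

# mean() skips NaN by default, so the mean is taken over 25 and 35
df['Age'] = df['Age'].fillna(df['Age'].mean())

missing_after = int(df['Age'].isna().sum())
print(df)
print("Missing before:", missing_before, "after:", missing_after)
```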

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.


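ANS- To re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row, build a pd.RangeIndex with one entry per row and assign it to df.index. Calling reset_index(drop=True) first discards the old index and returns a new DataFrame, so the caller's DataFrame is left unchanged. Here's a Python function that accomplishes this: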


import pandas as pd

def reindex_with_increment(df):
    # Resetting the index with a new range starting from 1 and incrementing by 2
    new_index = pd.RangeIndex(start=1, stop=len(df)*2, step=2)
    df = df.reset_index(drop=True)
    df.index = new_index
    return df

# Example DataFrame with columns 'A', 'B', and 'C'
data = {
    'A': [10, 20, 30, 40],
    'B': [100, 200, 300, 400],
    'C': [1000, 2000, 3000, 4000]
}

df = pd.DataFrame(data)

# Reindexing the DataFrame using the function
df_reindexed = reindex_with_increment(df)
print(df_reindexed)

OUTPUT

    A    B     C
1  10  100  1000
3  20  200  2000
5  30  300  3000
7  40  400  4000
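
The same idea can be generalized to any start and step. The sketch below is one possible variant, not the only way to do it; the function name reindex_from and its start and step parameters are hypothetical.

```python
import pandas as pd

def reindex_from(df, start=1, step=2):
    """Return a copy of df whose index is start, start + step, ..."""
    out = df.copy()
    # The stop value is exclusive, so this yields exactly len(out) labels
    out.index = pd.RangeIndex(start=start, stop=start + step * len(out), step=step)
    return out

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [100, 200, 300, 400],
    'C': [1000, 2000, 3000, 4000],
})

result = reindex_from(df)
print(result)
```

Because the function works on a copy, the original df keeps its default 0-based index.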

Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

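ANS- The function below iterates over the first three entries of the 'Values' column, accumulates their sum, and prints it:

import pandas as pd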

def calculate_sum_of_first_three(df):
    # Initialize a variable to store the sum
    total_sum = 0
    
    # Iterate over the first three rows of the 'Values' column
    for value in df['Values'].iloc[:3]:
        total_sum += value

    # Print the sum to the console
    print("Sum of the first three values:", total_sum)

# Example DataFrame with 'Values' column
data = {
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pd.DataFrame(data)

# Call the function to calculate the sum of the first three values in the 'Values' column
calculate_sum_of_first_three(df)

OUTPUT

Sum of the first three values: 60
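
The question asks for explicit iteration, which the function above does. For comparison only, a vectorized sketch can get the same number without a Python loop; the name sum_first_three is hypothetical.

```python
import pandas as pd

def sum_first_three(df):
    # head(3) returns at most three rows, so shorter frames also work
    total = df['Values'].head(3).sum()
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 60]})
total = sum_first_three(df)
```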



Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

ANS- To add a 'Word_Count' column, split each entry of the 'Text' column on whitespace and count the resulting pieces. The function below uses the apply() method to run a small helper on every row. Note that it adds the column to the DataFrame passed in and returns that same object:


import pandas as pd

def add_word_count_column(df):
    # Helper that counts the whitespace-separated words in a given text
    def count_words(text):
        words = text.split()
        return len(words)

    # Apply count_words to each row of the 'Text' column
    df['Word_Count'] = df['Text'].apply(count_words)
    return df

# Example DataFrame with 'Text' column
data = {
    'Text': [
        'This is a sample sentence.',
        'Hello, how are you?',
        'Python pandas is great!'
    ]
}

df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
df_with_word_count = add_word_count_column(df)
print(df_with_word_count)


Output:

                         Text  Word_Count
0  This is a sample sentence.           5
1         Hello, how are you?           4
2     Python pandas is great!           4
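
If the 'Text' column may contain missing values, calling split() on them fails. One possible alternative, sketched here under that assumption, uses the vectorized .str accessor and counts a missing entry as 0 words. The function name and the None row are illustrative.

```python
import pandas as pd

def add_word_count_vectorized(df):
    # Treat missing text as an empty string, split on whitespace,
    # then take the length of each resulting list
    counts = df['Text'].fillna('').str.split().str.len()
    # assign() returns a new DataFrame and leaves the input untouched
    return df.assign(Word_Count=counts)

df = pd.DataFrame({
    'Text': ['This is a sample sentence.', 'Hello, how are you?', None]
})

result = add_word_count_vectorized(df)
print(result)
```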

Q5. How are DataFrame.size() and DataFrame.shape() different?

ANS- DataFrame.size and DataFrame.shape both describe the dimensions of a DataFrame, but they return different things. Both are attributes (properties), not methods, so they are accessed without parentheses; writing df.size() or df.shape() raises a TypeError.

DataFrame.size: returns the total number of elements in the DataFrame as a single integer (rows x columns).
The count includes every cell, whether or not it holds NaN (missing).

Example:

import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}

df = pd.DataFrame(data)

print(df.size)  # Output: 6

In this example, the DataFrame df has 3 rows and 2 columns, so the total number of elements is 3 * 2 = 6.


DataFrame.shape: returns the dimensions of the DataFrame as a tuple (rows, columns).
Like size, it is an attribute, so no parentheses are used.
It describes only the structure and carries no information about the content of the cells.

Example:

import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}

df = pd.DataFrame(data)

print(df.shape)  # Output: (3, 2)
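
A short sketch ties the two together and shows that neither is affected by missing values. The DataFrame with a NaN cell is made up for illustration, and count() is included only as a contrast because it does skip NaN.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with one missing cell
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})

rows, cols = df.shape          # shape is a (rows, columns) tuple
print(df.shape)                # (3, 2)
print(df.size)                 # 6
print(df.size == rows * cols)  # True: size counts every cell, NaN included

# count() reports non-null cells per column, so its total is one lower
non_null_total = int(df.count().sum())
print(non_null_total)          # 5
```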

Q6. Which function of pandas do we use to read an excel file?

ANS- To read an Excel file into a pandas DataFrame, use the pd.read_excel() function. It reads a sheet from the Excel file (the first sheet by default) and returns it as a DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

Here's the basic syntax of the pd.read_excel() function:


import pandas as pd

# Read an Excel file and create a DataFrame
df = pd.read_excel('filename.xlsx')  

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.


ANS- To create a new column 'Username' in the DataFrame df that contains only the username part of each email address in the 'Email' column, you can use the str.split() method followed by element selection with .str[0]. Splitting on '@' breaks each email address into a list of two parts, the username and the domain, and .str[0] picks the first part of each list.

Here's a Python function to accomplish this:


import pandas as pd

def extract_username(df):
    # Extracting the username from the 'Email' column
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Example DataFrame with 'Email' column
data = {
    'Email': [
        'john.doe@example.com',
        'jane.smith@example.com',
        'user123@gmail.com'
    ]
}

df = pd.DataFrame(data)

# Call the function to create the 'Username' column
df_with_username = extract_username(df)
print(df_with_username)

Output:

                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
2       user123@gmail.com     user123
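
The split-based approach returns the whole string when an entry has no '@'. If malformed entries should instead be marked as missing, one possible variant (a sketch with a hypothetical function name and a made-up bad row) uses str.extract() with a regular expression:

```python
import pandas as pd

def extract_username_strict(df):
    out = df.copy()
    # Capture everything before the first '@'; rows without '@' become NaN
    out['Username'] = out['Email'].str.extract(r'^([^@]+)@', expand=False)
    return out

df = pd.DataFrame({
    'Email': ['john.doe@example.com', 'user123@gmail.com', 'not-an-email']
})

result = extract_username_strict(df)
print(result)
```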

Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

ANS- You can create a Python function that selects all rows from the DataFrame df where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function can use boolean indexing to filter the rows that meet the specified conditions and return a new DataFrame containing only the selected rows.

Here's a Python function to achieve this:


import pandas as pd

def select_rows_by_conditions(df):
    # Boolean indexing to select rows where 'A' > 5 and 'B' < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example DataFrame with columns 'A', 'B', and 'C'
data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}

df = pd.DataFrame(data)

# Call the function to select rows that meet the conditions
selected_df = select_rows_by_conditions(df)
print(selected_df)

Output:

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
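
The same filter can also be written with DataFrame.query(), which some readers find easier to scan. The sketch below makes the two thresholds parameters; the function name and the a_min and b_max parameters are hypothetical.

```python
import pandas as pd

def select_rows_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string
    return df.query('A > @a_min and B < @b_max').copy()

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2],
})

selected = select_rows_query(df)
print(selected)
```

Rows 0 and 3 are dropped because their 'A' values (3 and 2) are not greater than 5; every 'B' value in this sample is already below 10.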