Answer 1:
    
Five functions from the Pandas library along with their execution:

1. read_csv(): This function is used to read data from a CSV file into a DataFrame.

import pandas as pd

# Read data from a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df.head())  # Print the first few rows of the DataFrame
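
Since 'data.csv' may not exist on your machine, here is a minimal sketch that passes read_csv() an in-memory buffer instead of a file path. The column names and values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,score\nAlice,90\nBob,85\n"

# read_csv() accepts a file-like object as well as a path
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```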

2. head(): This function returns the first n rows of a DataFrame. By default, it returns the first 5 rows.

import pandas as pd

# Creating a DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# Print the first 3 rows of the DataFrame
print(df.head(3))

3. groupby(): This function is used to split the data into groups based on some criteria and apply a function to each group independently.

import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Salary': [50000, 60000, 55000, 62000, 53000]}
df = pd.DataFrame(data)

# Grouping the data by 'City' and calculating the mean salary for each city
grouped_data = df.groupby('City')['Salary'].mean()
print(grouped_data)
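
A grouped column can also be summarized with several functions at once. The sketch below reuses the same city and salary figures; the variable names df_sal and summary are only illustrative.

```python
import pandas as pd

df_sal = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
    'Salary': [50000, 60000, 55000, 62000, 53000],
})

# agg() applies each listed aggregation to every group
summary = df_sal.groupby('City')['Salary'].agg(['mean', 'max', 'count'])
print(summary)
```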

4. dropna(): This function is used to remove rows or columns from a DataFrame that contain missing values (NaN).

import pandas as pd
import numpy as np

# Creating a DataFrame with missing values
data = {'A': [1, 2, np.nan, 4],
        'B': [5, np.nan, np.nan, 8],
        'C': [np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)

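# Column 'C' is entirely NaN, so a plain df.dropna() would remove every row.
# Drop the all-NaN column first, then drop the rows that still contain NaN.
cleaned_df = df.dropna(axis=1, how='all').dropna()
print(cleaned_df)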
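
dropna() also takes parameters that control how strict the removal is. The following sketch, on a small made-up DataFrame, shows two of them: how='all' and subset.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'A': [1, 2, np.nan, np.nan],
                       'B': [5, np.nan, np.nan, 8]})

# how='all' drops a row only when every value in it is missing
all_missing_removed = df_nan.dropna(how='all')

# subset restricts the check to the listed columns
a_present = df_nan.dropna(subset=['A'])

print(all_missing_removed)
print(a_present)
```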

5. concat(): This function is used to concatenate two or more DataFrames along rows or columns.

import pandas as pd

# Creating two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [7, 8, 9],
                    'B': [10, 11, 12]})

# Concatenating the DataFrames along rows
result = pd.concat([df1, df2])
print(result)
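
By default, concat() keeps each input's original row labels, so the result can contain repeated labels. A short sketch with made-up frames shows how ignore_index=True renumbers the rows instead:

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Without ignore_index, the row labels 0 and 1 each appear twice
stacked = pd.concat([top, bottom])

# ignore_index=True assigns a fresh 0..n-1 index
renumbered = pd.concat([top, bottom], ignore_index=True)

print(list(stacked.index), list(renumbered.index))
```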



Answer 2:


We can achieve this by setting the index of the DataFrame to a custom index that starts from 1 and increments by 2 for each row. Here's a Python function that does that:

import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, len(df) * 2, 2)

    # Set the new index to the DataFrame
    df.index = new_index

    return df

# Example usage:
# Assuming df is your DataFrame with columns 'A', 'B', and 'C'
data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}
df = pd.DataFrame(data)

# Call the function to re-index the DataFrame
new_df = reindex_dataframe(df)

print(new_df)
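
Note that assigning to df.index changes the DataFrame that was passed in. If the original should stay untouched, one possible variant (reindex_copy is a hypothetical name, not part of the answer above) works on a copy:

```python
import pandas as pd

def reindex_copy(df):
    # Copy first so the caller's DataFrame keeps its original index
    out = df.copy()
    # Odd labels 1, 3, 5, ... with one label per row
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

original = pd.DataFrame({'A': [10, 20, 30]})
reindexed = reindex_copy(original)
print(list(original.index), list(reindexed.index))
```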



Answer 3:
    

Here's a Python function that takes the first three values in the 'Values' column, sums them, and prints the result to the console. It uses positional slicing with iloc rather than an explicit loop:

import pandas as pd

def calculate_sum_first_three(df):
    # Check if the DataFrame is empty or if the 'Values' column is absent
    if df.empty or 'Values' not in df.columns:
        print("DataFrame is empty or 'Values' column is absent.")
        return

    # Get the 'Values' column from the DataFrame
    values_column = df['Values']

    # Calculate the sum of the first three values
    sum_first_three = values_column.iloc[:3].sum()

    # Print the sum to the console
    print("Sum of the first three values:", sum_first_three)

# Example usage:
# Assuming df is your DataFrame with the 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate the sum of the first three values
calculate_sum_first_three(df)
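
If the task specifically calls for iterating over the values, a loop-based sketch could look like this. The function name sum_first_three_loop is hypothetical, and the slicing version is usually preferable in practice.

```python
import pandas as pd

def sum_first_three_loop(df):
    # Walk over at most the first three entries of 'Values' and add them up
    total = 0
    for value in df['Values'].head(3):
        total += value
    return total

loop_total = sum_first_three_loop(pd.DataFrame({'Values': [10, 20, 30, 40, 50]}))
print("Sum of the first three values:", loop_total)
```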




Answer 4:
    

You can accomplish this by defining a Python function that calculates the word count for each row in the 'Text' column and then adds a new column 'Word_Count' to the DataFrame. Here's how you can do it:

import pandas as pd

def add_word_count_column(df):
    # Split each row in the 'Text' column into words and count the number of words
    word_counts = df['Text'].apply(lambda x: len(x.split()))

    # Add a new column 'Word_Count' to the DataFrame with the word counts
    df['Word_Count'] = word_counts

    return df

# Example usage:
# Assuming df is your DataFrame with the 'Text' column
data = {'Text': ['This is a sample text', 'Python is awesome', 'Data Science is interesting']}
df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
df = add_word_count_column(df)

# Print the updated DataFrame
print(df)
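
The lambda above raises an error if a row in 'Text' is missing, because NaN has no split() method. A sketch of a vectorized alternative that counts missing text as 0 words (that choice is an assumption, not part of the original task) is:

```python
import pandas as pd

df_text = pd.DataFrame({'Text': ['This is a sample text', None, 'Python is awesome']})

# str.split() leaves missing entries as NaN instead of raising an error;
# str.len() counts the words, and fillna(0) treats missing text as 0 words
df_text['Word_Count'] = df_text['Text'].str.split().str.len().fillna(0).astype(int)
print(df_text)
```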



Answer 5:

DataFrame.size and DataFrame.shape are both attributes of a Pandas DataFrame, but they serve different purposes:

1. DataFrame.size:

DataFrame.size returns the total number of elements in the DataFrame, that is, the number of rows multiplied by the number of columns.
It returns a single integer, and missing values (NaN) are included in the count.
It is an attribute, not a method, so it is accessed without parentheses.
Example: df.size

2. DataFrame.shape:

DataFrame.shape returns a tuple with two elements: the number of rows and the number of columns, in that order.
It provides a quick way to check the dimensions of the DataFrame.
It is also an attribute, so it is accessed without parentheses; the value it returns is a tuple.
Example: df.shape

In summary, size gives the total number of elements (rows * columns) as an integer, while shape gives the number of rows and columns as a tuple.
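
A quick check on a small, made-up DataFrame illustrates the relationship between the two attributes:

```python
import pandas as pd

df_dim = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape is a (rows, columns) tuple; size is a single integer
rows, cols = df_dim.shape
print(df_dim.shape)
print(df_dim.size)
print(rows * cols == df_dim.size)
```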



Answer 6:
    

In Pandas, the read_excel() function reads data from an Excel file into a DataFrame. It supports several Excel formats, including .xls and .xlsx, provided the matching engine package is installed (openpyxl for .xlsx, xlrd for .xls).

Here's how you can use read_excel():

import pandas as pd

# Read data from an Excel file into a DataFrame
df = pd.read_excel('example.xlsx')

# Now you can work with the DataFrame 'df'



Answer 7:
    
We can achieve this by using the str.split() method along with string manipulation to extract the username part from each email address and then create a new column 'Username' in the DataFrame. Here's how you can do it:

import pandas as pd

def extract_username(df):
    # Check if the DataFrame is empty or if the 'Email' column is absent
    if df.empty or 'Email' not in df.columns:
        print("DataFrame is empty or 'Email' column is absent.")
        return df  # Hand back the input unchanged so the caller never receives None

    # Extract the username part from each email address
    usernames = df['Email'].str.split('@').str[0]

    # Add a new column 'Username' to the DataFrame with the extracted usernames
    df['Username'] = usernames

    return df

# Example usage:
# Assuming df is your DataFrame with the 'Email' column
data = {'Email': ['john.doe@example.com', 'jane.doe@example.com']}
df = pd.DataFrame(data)

# Call the function to extract usernames and add the 'Username' column
df = extract_username(df)

# Print the updated DataFrame
print(df)
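
With str.split('@').str[0], a string that contains no '@' is returned whole as its own "username". If such entries should be flagged instead, one option is a regular expression with str.extract(). The sample values below are made up for illustration.

```python
import pandas as pd

emails = pd.Series(['john.doe@example.com', 'not-an-email', None])

# Capture everything before the first '@';
# entries without an '@' and missing entries both become NaN
usernames = emails.str.extract(r'^([^@]+)@', expand=False)
print(usernames)
```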



Answer 8:
    
We can achieve this by using boolean indexing in Pandas. Here's how you can write the Python function to select rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10:

import pandas as pd

def select_rows(df):
    # Check if the DataFrame is empty or if the columns 'A' and 'B' are absent
    if df.empty or 'A' not in df.columns or 'B' not in df.columns:
        print("DataFrame is empty or columns 'A' and 'B' are absent.")
        return pd.DataFrame()  # Return an empty DataFrame

    # Select rows where 'A' > 5 and 'B' < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

    return selected_rows

# Example usage:
# Assuming df is your DataFrame with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function to select rows based on criteria
selected_df = select_rows(df)

# Print the selected DataFrame
print(selected_df)
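
The same filter can be written with DataFrame.query(), which some readers find easier to scan than chained boolean masks. This is a sketch of an equivalent form, using the same 'A' and 'B' values as the example:

```python
import pandas as pd

df_q = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                     'B': [5, 2, 9, 3, 1]})

# The expression string refers to columns by name
selected_q = df_q.query('A > 5 and B < 10')
print(selected_q)
```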



Answer 9:
    

We can calculate the mean, median, and standard deviation of the values in the 'Values' column of a Pandas DataFrame using built-in Pandas functions. Here's how you can write the Python function to achieve this:

import pandas as pd

def calculate_stats(df):
    # Check if the DataFrame is empty or if the 'Values' column is absent
    if df.empty or 'Values' not in df.columns:
        print("DataFrame is empty or 'Values' column is absent.")
        return

    # Calculate mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()

    # Print the calculated statistics
    print("Mean:", mean_value)
    print("Median:", median_value)
    print("Standard Deviation:", std_deviation)

# Example usage:
# Assuming df is your DataFrame with the 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate statistics
calculate_stats(df)
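
One detail worth knowing: Series.std() computes the sample standard deviation (ddof=1) by default. If the population standard deviation is wanted, pass ddof=0, as in this short sketch:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()            # divides by n - 1 (default, ddof=1)
population_std = values.std(ddof=0)  # divides by n

print(sample_std, population_std)
```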



Answer 10:


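To create a new column 'MovingAverage' in the DataFrame that contains the moving average of the sales for the past 7 days for each row, you can use the rolling() function along with the mean() function in Pandas. Note that rolling(window=7) averages the current row and the six rows before it, so it matches "the past 7 days" only when the data has exactly one row per day. Setting min_periods=1 lets the first six rows use whatever history is available instead of producing NaN. Here's how you can write the Python function to achieve this: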

import pandas as pd

def calculate_moving_average(df):
    # Check if the DataFrame is empty or if the columns 'Sales' and 'Date' are absent
    if df.empty or 'Sales' not in df.columns or 'Date' not in df.columns:
        print("DataFrame is empty or columns 'Sales' and 'Date' are absent.")
        return df  # Hand back the input unchanged so the caller never receives None

    # Convert the 'Date' column to datetime type if it's not already
    df['Date'] = pd.to_datetime(df['Date'])

    # Sort the DataFrame by date
    df.sort_values(by='Date', inplace=True)

    # Calculate the moving average of sales for the past 7 days
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example usage:
# Assuming df is your DataFrame with the 'Sales' and 'Date' columns
data = {'Date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05', '2024-01-06', '2024-01-07',
                 '2024-01-08', '2024-01-09', '2024-01-10'],
        'Sales': [100, 200, 150, 250, 300, 200, 350, 400, 450, 500]}
df = pd.DataFrame(data)

# Call the function to calculate moving average
df = calculate_moving_average(df)

# Print the updated DataFrame
print(df)
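
When some dates are missing, a row-count window no longer corresponds to 7 calendar days. A sketch of a time-based window, assuming the dates can be used as a sorted DatetimeIndex (the sample data here is made up), is:

```python
import pandas as pd

df_gap = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-10']),
    'Sales': [100, 200, 300],
})

# A '7D' window requires a sorted DatetimeIndex
by_date = df_gap.sort_values('Date').set_index('Date')

# Each row averages only the sales that fall within the preceding 7 days
by_date['MovingAverage'] = by_date['Sales'].rolling('7D').mean()
print(by_date)
```

Here the last row is averaged on its own, because the two earlier dates are more than 7 days before it.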



Answer 11:
    
To create a new column 'Weekday' in the DataFrame that contains the weekday name corresponding to each date in the 'Date' column, you can use the dt.day_name() method of a datetime column in Pandas. (The older dt.weekday_name attribute was removed in pandas 1.0.) Here's how you can write the Python function to achieve this:

import pandas as pd

def add_weekday_column(df):
    # Check if the DataFrame is empty or if the 'Date' column is absent
    if df.empty or 'Date' not in df.columns:
        print("DataFrame is empty or 'Date' column is absent.")
        return df  # Hand back the input unchanged so the caller never receives None

    # Convert the 'Date' column to datetime type if it's not already
    df['Date'] = pd.to_datetime(df['Date'])

    # Add a new column 'Weekday' to the DataFrame with the weekday names
    df['Weekday'] = df['Date'].dt.day_name()

    return df

# Example usage:
# Assuming df is your DataFrame with the 'Date' column
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df = add_weekday_column(df)

# Print the updated DataFrame
print(df)
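
If a numeric weekday is more convenient (for sorting or filtering, say), dt.dayofweek returns it alongside what dt.day_name() provides. A brief sketch:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2023-01-01', '2023-01-02']))

day_names = dates.dt.day_name()    # e.g. 'Sunday'
day_numbers = dates.dt.dayofweek   # Monday=0 ... Sunday=6

print(day_names.tolist(), day_numbers.tolist())
```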



Answer 12:
    

We can select rows based on a date range using boolean indexing in Pandas. Here's how you can write the Python function to select all rows where the date is between '2023-01-01' and '2023-01-31':

import pandas as pd

def select_rows_in_date_range(df):
    # Check if the DataFrame is empty or if the 'Date' column is absent
    if df.empty or 'Date' not in df.columns:
        print("DataFrame is empty or 'Date' column is absent.")
        return pd.DataFrame()  # Return an empty DataFrame

    # Convert the 'Date' column to datetime type if it's not already
    df['Date'] = pd.to_datetime(df['Date'])

    # Select rows where the date is between '2023-01-01' and '2023-01-31'
    selected_rows = df[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

    return selected_rows

# Example usage:
# Assuming df is your DataFrame with the 'Date' column
data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-05']}
df = pd.DataFrame(data)

# Call the function to select rows in the date range
selected_df = select_rows_in_date_range(df)

# Print the selected DataFrame
print(selected_df)
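
The two comparisons can also be expressed with Series.between(), which includes both endpoints by default. In the sketch below the bounds are written as pd.Timestamp objects. Keep in mind that an upper bound of '2023-01-31' means midnight at the start of that day, so rows with a later time on the 31st would be excluded under either approach.

```python
import pandas as pd

df_dates = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-05'])
})

start, end = pd.Timestamp('2023-01-01'), pd.Timestamp('2023-01-31')

# between() is inclusive on both ends by default
in_range = df_dates[df_dates['Date'].between(start, end)]
print(in_range)
```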



Answer 13:

To use the basic functions of Pandas, you need to import the Pandas library itself. The conventional import statement for Pandas is:

import pandas as pd

Here, pd is an alias commonly used to refer to the Pandas library. By importing Pandas with this alias, you can access its functions and classes using the prefix pd.

Once you've imported Pandas, you can start using its functions and classes to work with data in Python, such as creating DataFrames, reading data from various file formats, performing data manipulation, and much more.