ASSIGNMENT :- PANDAS_ADVANCE

Q1. List any five functions of the pandas library with execution.

(1):- read_csv(): Loads data from a CSV file into a pandas DataFrame.
(2):- fillna(): Replaces missing values (NaN) in a DataFrame or Series with specified values.
(3):- mean(): Calculates the mean (average) of a Series, or of each column of a DataFrame.
(4):- std(): Calculates the standard deviation of a Series or DataFrame (sample standard deviation, ddof=1, by default).
(5):- describe(): Returns descriptive statistics about the data (count, mean, std, min, quartiles, max).
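
The question asks for the functions "with execution", so here is a minimal sketch that runs all five on a small in-memory CSV. The column names and values (name, score) are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; Ben's score is missing on purpose.
csv_text = "name,score\nAsha,80\nBen,\nChen,100\n"

# (1) read_csv(): parse the CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# (2) fillna(): replace the missing score with 0 (returns a new DataFrame)
filled = df.fillna({'score': 0})

# (3) mean() and (4) std(): NaN values are skipped by default
mean_score = df['score'].mean()
std_score = df['score'].std()

# (5) describe(): summary statistics for the numeric columns
summary = df.describe()

print(filled)
print(mean_score, std_score)
print(summary)
```

Note that mean() and std() are computed on the original df here, so the missing score is ignored rather than counted as 0.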

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [1]:
import pandas as pd

def reindex_dataframe(df):
    """
    Re-indexes the DataFrame with a new index starting from 1 and incrementing by 2.
    
    Args:
        df (pd.DataFrame): Input DataFrame with columns 'A', 'B', and 'C'.
    
    Returns:
        pd.DataFrame: DataFrame with the new index.
    """
    new_index = pd.RangeIndex(start=1, stop=len(df) * 2, step=2)
    df_reindexed = df.reset_index(drop=True)
    df_reindexed.index = new_index
    return df_reindexed

# Example usage:
data = {'A': [10, 20, 30], 'B': [100, 200, 300], 'C': [1000, 2000, 3000]}
df = pd.DataFrame(data)
reindexed_df = reindex_dataframe(df)
print(reindexed_df)


    A    B     C
1  10  100  1000
3  20  200  2000
5  30  300  3000


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [2]:
import pandas as pd

def calculate_sum_of_first_three(df):
    """
    Calculates the sum of the first three values in the 'Values' column of the DataFrame.

    Args:
        df (pd.DataFrame): Input DataFrame with a 'Values' column.

    Returns:
        None (Prints the sum to the console).
    """
    try:
        first_three_values = df['Values'][:3]
        total_sum = first_three_values.sum()
        print(f"Sum of the first three values: {total_sum}")
    except KeyError:
        print("Error: 'Values' column not found in the DataFrame.")

# Example usage:
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
calculate_sum_of_first_three(df)


Sum of the first three values: 60
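
The question asks for a function that iterates over the DataFrame, while slicing and calling sum() avoids an explicit loop. A minimal sketch of a loop-based variant is given here; the function name sum_first_three_by_iteration is hypothetical, and it returns the total in addition to printing it so the result can be checked.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    """Sum the first three entries of 'Values' with an explicit loop."""
    total = 0
    # head(3) limits the loop to at most three rows; itertuples yields one row at a time
    for row in df.head(3).itertuples(index=False):
        total += row.Values
    print(f"Sum of the first three values: {total}")
    return total

example = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
total = sum_first_three_by_iteration(example)
```

If the DataFrame has fewer than three rows, the loop simply sums whatever rows exist.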


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [3]:
import pandas as pd

def add_word_count_column(df):
    """
    Adds a new column 'Word_Count' to the DataFrame, containing the word count for each row in the 'Text' column.

    Args:
        df (pd.DataFrame): Input DataFrame with a 'Text' column.

    Returns:
        pd.DataFrame: DataFrame with the new 'Word_Count' column.
    """
    df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    return df

# Example usage:
data = {'Text': ["This is a sample text.", "Count the words in this sentence."]}
df = pd.DataFrame(data)
df_with_word_count = add_word_count_column(df)
print(df_with_word_count)


                                Text  Word_Count
0             This is a sample text.           5
1  Count the words in this sentence.           6
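
The apply/lambda approach raises an AttributeError if a 'Text' entry is missing (NaN or None), because those values have no split() method. A possible alternative, sketched here under the assumption that missing text should yield a missing word count, uses the vectorized .str accessor and works on a copy so the input is left unchanged. The function name add_word_count_vectorized is hypothetical.

```python
import pandas as pd

def add_word_count_vectorized(df):
    """Return a copy of df with a 'Word_Count' column; missing text stays missing."""
    out = df.copy()
    # .str.split() splits on whitespace; .str.len() counts the resulting words
    out['Word_Count'] = out['Text'].str.split().str.len()
    return out

sample = pd.DataFrame({'Text': ["This is a sample text.", None, ""]})
result = add_word_count_vectorized(sample)
print(result)
```

Because one entry is missing, the new column holds a missing value for that row instead of an integer.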


Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size:
size is an attribute (a property), not a method, so it is written df.size without parentheses; calling df.size() raises a TypeError.
It returns the total number of elements (cells) in the DataFrame as an integer.
This equals the number of rows multiplied by the number of columns, and missing values are counted too.
DataFrame.shape:
shape is also an attribute, written df.shape without parentheses.
It returns a tuple representing the dimensionality of the DataFrame.
The result is in the format (rows, columns).
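
A short sketch of the difference on a small DataFrame; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_elements = df.size    # total number of cells: 3 rows * 2 columns
dimensions = df.shape   # tuple of (rows, columns)

print(n_elements)   # 6
print(dimensions)   # (3, 2)
```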


Q6. Which function of pandas do we use to read an excel file?

In [6]:
import pandas as pd

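# pd.read_excel() reads an Excel file into a DataFrame.
# "file name" is a placeholder; replace it with the path to an existing
# Excel file, otherwise a FileNotFoundError is raised.
df = pd.read_excel("file name")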


FileNotFoundError: [Errno 2] No such file or directory: 'file name'

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [7]:
import pandas as pd

def extract_username(df):
    """
    Extracts the username from each email address in the 'Email' column and creates a new 'Username' column.

    Args:
        df (pd.DataFrame): Input DataFrame with an 'Email' column.

    Returns:
        pd.DataFrame: DataFrame with the new 'Username' column.
    """
    # Split the email addresses at the '@' symbol and take the first part
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

# Example usage:
data = {'Email': ['john.doe@example.com', 'alice.smith@example.com']}
df = pd.DataFrame(data)
df_with_username = extract_username(df)
print(df_with_username)


                     Email     Username
0     john.doe@example.com     john.doe
1  alice.smith@example.com  alice.smith


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

In [9]:
import pandas as pd

def filter_dataframe(df):
    """
    Filters the DataFrame to select rows where 'A' > 5 and 'B' < 10.

    Args:
        df (pd.DataFrame): Input DataFrame with columns 'A', 'B', and 'C'.

    Returns:
        pd.DataFrame: New DataFrame containing the selected rows.
    """
    filtered_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return filtered_df

# Example usage:
data = {'A': [3, 1, 7, 6, 9], 'B': [5, 8, 2, 9, 1], 'C': [2, 3, 2, 5, 4]}
df = pd.DataFrame(data)
filtered_result = filter_dataframe(df)
print(filtered_result)


   A  B  C
2  7  2  2
3  6  9  5
4  9  1  4
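
For comparison, here is a sketch of the same boolean-mask filter applied to the example table given in the question itself. The variable names question_df, mask and selected are illustrative.

```python
import pandas as pd

# The example table from the question
question_df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                            'B': [5, 2, 9, 3, 1],
                            'C': [1, 7, 4, 5, 2]})

# Keep rows where A > 5 and B < 10; each condition needs its own parentheses
mask = (question_df['A'] > 5) & (question_df['B'] < 10)
selected = question_df[mask]
print(selected)
```

With this data the rows at index 1, 2 and 4 are selected, and the original index labels are kept.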


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [10]:
import pandas as pd

def calculate_statistics(df):
    """
    Calculates the mean, median, and standard deviation of the 'Values' column.

    Args:
        df (pd.DataFrame): Input DataFrame with a 'Values' column.

    Returns:
        dict: A dictionary containing the calculated statistics.
    """
    statistics = {
        'Mean': df['Values'].mean(),
        'Median': df['Values'].median(),
        'Standard Deviation': df['Values'].std()  # sample standard deviation (ddof=1 by default)
    }
    return statistics

# Example usage:
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)
result = calculate_statistics(df)
print(result)


{'Mean': 30.0, 'Median': 30.0, 'Standard Deviation': 15.811388300841896}


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [11]:
import pandas as pd

def calculate_moving_average(df):
    """
    Calculates the moving average of the 'Sales' column using a window of size 7.

    Args:
        df (pd.DataFrame): Input DataFrame with columns 'Sales' and 'Date'.

    Returns:
        pd.DataFrame: DataFrame with the new 'MovingAverage' column.
    """
    # Sort by date without modifying the caller's DataFrame in place
    df = df.sort_values(by='Date')

    # Moving average over a window of 7 rows, including the current row.
    # min_periods=1 averages over the rows available so far when fewer than 7 exist.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example usage:
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'],
        'Sales': [100, 120, 110, 130, 140]}
df = pd.DataFrame(data)
df_with_moving_average = calculate_moving_average(df)
print(df_with_moving_average)


         Date  Sales  MovingAverage
0  2022-01-01    100          100.0
1  2022-01-02    120          110.0
2  2022-01-03    110          110.0
3  2022-01-04    130          115.0
4  2022-01-05    140          120.0
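
rolling(window=7) counts rows, so it matches "the past 7 days" only when there is exactly one row per day with no gaps. If dates can be missing, one possible alternative is a time-based window, sketched here. The function name calendar_moving_average and the sample data are hypothetical, and the sketch assumes 'Date' can be parsed by pd.to_datetime.

```python
import pandas as pd

def calendar_moving_average(df):
    """Return a copy of df with a 7-calendar-day moving average of 'Sales'."""
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date').reset_index(drop=True)
    # '7D' covers the current day and the six days before it,
    # however many rows fall inside that span
    rolled = out.set_index('Date')['Sales'].rolling('7D').mean()
    out['MovingAverage'] = rolled.to_numpy()
    return out

# Hypothetical data with an eight-day gap before the last row
gappy = pd.DataFrame({'Date': ['2022-01-01', '2022-01-02', '2022-01-10'],
                      'Sales': [100, 200, 400]})
result = calendar_moving_average(gappy)
print(result)
```

For the last row, only the sale on 2022-01-10 falls inside the 7-day span, so its average is 400.0, whereas a row-based window would have averaged all three rows.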


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
Your function should create the following DataFrame:

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
The function should return the modified DataFrame.

In [12]:
import pandas as pd

def add_weekday_column(df):
    """
    Adds a new column 'Weekday' to the DataFrame, containing the weekday name for each date in the 'Date' column.

    Args:
        df (pd.DataFrame): Input DataFrame with a 'Date' column.

    Returns:
        pd.DataFrame: DataFrame with the new 'Weekday' column.
    """
    # Convert the 'Date' column to datetime format
    df['Date'] = pd.to_datetime(df['Date'])

    # Extract the weekday name; day_name() returns English names by default,
    # whereas strftime('%A') depends on the system locale
    df['Weekday'] = df['Date'].dt.day_name()

    return df

# Example usage:
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)
df_with_weekday = add_weekday_column(df)
print(df_with_weekday)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
