**Q1. List any five functions of the pandas library with execution**

Five commonly used pandas functions and methods:

1. `pd.read_csv()`
2. `DataFrame.head()`
3. `DataFrame.describe()`
4. `DataFrame.groupby()`
5. `DataFrame.to_csv()`
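
Since the question asks for execution, here is a minimal sketch that runs all five on a small in-memory CSV. The sample data and the use of `io.StringIO` in place of a file on disk are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data; StringIO stands in for a CSV file on disk
csv_text = "name,team,score\nAsha,red,10\nBen,blue,20\nCara,red,30\nDev,blue,40\n"

# 1. pd.read_csv(): parse CSV text into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# 2. DataFrame.head(): return the first n rows (5 by default)
first_two = df.head(2)
print(first_two)

# 3. DataFrame.describe(): summary statistics for the numeric columns
summary = df.describe()
print(summary)

# 4. DataFrame.groupby(): split rows by a key, then aggregate each group
team_totals = df.groupby("team")["score"].sum()
print(team_totals)

# 5. DataFrame.to_csv(): with no path argument, return the CSV as a string
csv_out = df.to_csv(index=False)
print(csv_out)
```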

**Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.**


In [1]:
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)

    # Set the new index to the DataFrame
    df = df.set_index(pd.Index(new_index))

    return df

# Sample DataFrame with columns 'A', 'B', and 'C'
data = {'A': [10, 20, 30, 40],
        'B': [15, 25, 35, 45],
        'C': [12, 22, 32, 42]}
df = pd.DataFrame(data)

# Re-index the DataFrame
df = reindex_dataframe(df)
print(df)


    A   B   C
1  10  15  12
3  20  25  22
5  30  35  32
7  40  45  42


**Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.**

**For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.**

In [3]:
import pandas as pd

def calculate_sum_of_first_three_values(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # Extract the 'Values' column
        values_column = df['Values']

        # Calculate the sum of the first three values
        sum_of_first_three_values = values_column.head(3).sum()

        # Print the sum to the console
        print("Sum of the first three values:", sum_of_first_three_values)
    else:
        print("The 'Values' column does not exist in the DataFrame.")

# Sample DataFrame with a 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate and print the sum of the first three values
calculate_sum_of_first_three_values(df)


Sum of the first three values: 60
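
The function above relies on `head(3).sum()` rather than an explicit loop. If the question's wording ("iterates over the DataFrame") is taken literally, a loop-based variant could look like the sketch below. The helper name `sum_first_three_by_iteration` is hypothetical, and the sketch assumes the 'Values' column exists.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    # Hypothetical helper: walk the column explicitly instead of using head(3).sum()
    total = 0
    for position, value in enumerate(df["Values"]):
        if position == 3:
            break  # stop once three values have been added
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({"Values": [10, 20, 30, 40, 50]})
total = sum_first_three_by_iteration(df)
```

The vectorized `head(3).sum()` is usually preferable in practice; the loop is shown only to match the wording of the question.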


**Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.**

In [4]:
import pandas as pd

def add_word_count_column(df):
    # Check if the 'Text' column exists in the DataFrame
    if 'Text' in df.columns:
        # Split each row in the 'Text' column into words and count them
        df['Word_Count'] = df['Text'].apply(lambda x: len(x.split()))
    else:
        print("The 'Text' column does not exist in the DataFrame.")

# Sample DataFrame with a 'Text' column
data = {'Text': ["This is a sample sentence.",
                 "Count the words in this text.",
                 "Another example of word count."]}
df = pd.DataFrame(data)

# Add the 'Word_Count' column to the DataFrame
add_word_count_column(df)

# Display the updated DataFrame
print(df)


                             Text  Word_Count
0      This is a sample sentence.           5
1   Count the words in this text.           6
2  Another example of word count.           5
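
The lambda above calls `x.split()` on every value, so it raises an `AttributeError` if the column contains a missing value. One possible alternative, sketched here with hypothetical sample data, is the vectorized `.str` accessor, which passes missing values through as missing.

```python
import pandas as pd

# Hypothetical data with one missing entry
df = pd.DataFrame({"Text": ["This is a sample sentence.", None, "Two words"]})

# .str.split() splits on whitespace; .str.len() counts the resulting pieces.
# The missing entry stays missing instead of raising an error.
df["Word_Count"] = df["Text"].str.split().str.len()
print(df)
```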


**Q5. How are DataFrame.size() and DataFrame.shape() different?**

Despite the parentheses in the question, `size` and `shape` are attributes of a DataFrame, not methods, so they are read without being called. They report different things:

1. `DataFrame.size`:
   - Returns the total number of elements (cells) in the DataFrame, equal to the number of rows multiplied by the number of columns.
   - The result is a single integer.

2. `DataFrame.shape`:
   - Returns the dimensions of the DataFrame as a tuple `(rows, columns)`.
   - The result is a tuple of two integers.
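
A short sketch of the difference, using a hypothetical 3-by-2 DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Both are attributes, so no parentheses are used
print(df.shape)  # (3, 2): 3 rows, 2 columns
print(df.size)   # 6: 3 rows * 2 columns

# size is always the product of the two entries in shape
n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)  # True
```

Writing `df.size()` or `df.shape()` raises a `TypeError`, because an integer and a tuple are not callable.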


**Q6. Which function of pandas do we use to read an excel file?**

Use `pd.read_excel()` to read an Excel file into a DataFrame. It takes the file path (or a file-like object) and, optionally, a `sheet_name` argument to choose which sheet to load. Reading `.xlsx` files requires an engine such as `openpyxl` to be installed.

**Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.**

**The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.**

In [7]:
import pandas as pd

def extract_username_from_email(df):
    # Check if the 'Email' column exists in the DataFrame
    if 'Email' in df.columns:
        # Extract the username part using a lambda function and the str.split method
        df['Username'] = df['Email'].apply(lambda email: email.split('@')[0])
    else:
        print("The 'Email' column does not exist in the DataFrame.")

# Sample DataFrame with an 'Email' column
data = {'Email': ['john.doe@example.com', 'jane.smith@example.com', 'alice.wonderland@example.com']}
df = pd.DataFrame(data)

# Extract and add the 'Username' column to the DataFrame
extract_username_from_email(df)

# Display the updated DataFrame
print(df)


                          Email          Username
0          john.doe@example.com          john.doe
1        jane.smith@example.com        jane.smith
2  alice.wonderland@example.com  alice.wonderland


**Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.**

**For example, if df contains the following values:**

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

**Your function should select the following rows:**

   A  B  C
1  8  2  7
4  9  1  2

**The function should return a new DataFrame that contains only the selected rows.**

In [8]:
import pandas as pd

def filter_dataframe(df):
    # Check if the columns 'A' and 'B' exist in the DataFrame
    if 'A' in df.columns and 'B' in df.columns:
        # Use boolean indexing to filter rows where 'A' > 5 and 'B' < 10
        selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
        return selected_rows
    else:
        print("The 'A' and/or 'B' columns do not exist in the DataFrame.")
        return None

# Sample DataFrame with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Filter the DataFrame and get the selected rows
selected_rows_df = filter_dataframe(df)

# Display the selected rows
print(selected_rows_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

Row 2 (A = 6, B = 9) also satisfies both conditions, so it appears in the result even though the example in the question lists only rows 1 and 4.


**Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.**

In [9]:
import pandas as pd

def calculate_statistics(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        values_column = df['Values']
        # Calculate mean, median, and sample standard deviation (std() uses ddof=1 by default)
        mean_value = values_column.mean()
        median_value = values_column.median()
        std_deviation = values_column.std()
        return mean_value, median_value, std_deviation
    else:
        print("The 'Values' column does not exist in the DataFrame.")
        return None

# Sample DataFrame with a 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate mean, median, and standard deviation
mean, median, std = calculate_statistics(df)

# Print the results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std}")


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
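
The value 15.81 is the sample standard deviation, because `Series.std()` divides by n - 1 by default (`ddof=1`). The sketch below, using the same five values, shows how the population standard deviation can be obtained by passing `ddof=0`.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# Default ddof=1: sum of squared deviations (1000) divided by n - 1 = 4
sample_std = values.std()

# ddof=0: the same sum divided by n = 5
population_std = values.std(ddof=0)

print(f"Sample standard deviation: {sample_std}")
print(f"Population standard deviation: {population_std}")
```

Which one is appropriate depends on whether the values are a sample or the whole population; the question does not say.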


**Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.**

In [10]:
import pandas as pd

def calculate_moving_average(df):
    # Check if the 'Sales' and 'Date' columns exist in the DataFrame
    if 'Sales' in df.columns and 'Date' in df.columns:
        # Convert the 'Date' column to a datetime data type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Sort the DataFrame by the 'Date' column in ascending order
        df = df.sort_values(by='Date')

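        # Calculate the moving average over a window of 7 rows (7 days when there is
        # exactly one row per day), including the current row; min_periods=1 averages
        # whatever rows are available for the first six entries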
        df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

        return df
    else:
        print("The 'Sales' and/or 'Date' columns do not exist in the DataFrame.")
        return None

# Sample DataFrame with 'Sales' and 'Date' columns
data = {'Date': ['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2023-10-05', '2023-10-06', '2023-10-07'],
        'Sales': [10, 20, 30, 40, 50, 60, 70]}
df = pd.DataFrame(data)

# Calculate the moving average and add it to the DataFrame
df = calculate_moving_average(df)

# Display the updated DataFrame
print(df)


        Date  Sales  MovingAverage
0 2023-10-01     10           10.0
1 2023-10-02     20           15.0
2 2023-10-03     30           20.0
3 2023-10-04     40           25.0
4 2023-10-05     50           30.0
5 2023-10-06     60           35.0
6 2023-10-07     70           40.0
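
`rolling(window=7)` counts rows, not days, so it matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window is one possible alternative. The sketch below uses hypothetical data with a gap and the column names `RowWindow` and `TimeWindow`, which are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with a gap: nothing recorded from 2023-10-03 to 2023-10-08
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-10-01", "2023-10-02", "2023-10-09", "2023-10-10"]),
    "Sales": [10, 20, 30, 40],
})
df = df.sort_values("Date")  # time-based windows need dates in ascending order

# Row-based: averages up to the last 7 rows, however far apart their dates are
df["RowWindow"] = df["Sales"].rolling(window=7, min_periods=1).mean()

# Time-based: averages only the rows in the 7-day span ending at the current date
df["TimeWindow"] = df.rolling("7D", on="Date")["Sales"].mean()

print(df)
```

For the row dated 2023-10-09, the row-based window still includes the sales from early October, while the time-based window contains only that day's value.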


**Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.**

**For example, if df contains the following values:**

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05

**Your function should create the following DataFrame:**

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday

**The function should return the modified DataFrame.**

In [11]:
import pandas as pd

def add_weekday_column(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' in df.columns:
        # Convert the 'Date' column to a datetime data type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Extract and add the 'Weekday' column with weekday names
        df['Weekday'] = df['Date'].dt.strftime('%A')
        return df
    else:
        print("The 'Date' column does not exist in the DataFrame.")
        return None

# Sample DataFrame with a 'Date' column
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Add the 'Weekday' column to the DataFrame
df = add_weekday_column(df)

# Display the modified DataFrame
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


**Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.**

In [12]:
import pandas as pd

def select_rows_between_dates(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' in df.columns:
        # Convert the 'Date' column to a datetime data type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Define the date range
        start_date = '2023-01-01'
        end_date = '2023-01-31'

        # Use boolean indexing to select rows within the date range
        selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

        return selected_rows
    else:
        print("The 'Date' column does not exist in the DataFrame.")
        return None

# Sample DataFrame with a 'Date' column
data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-20', '2023-02-05', '2023-03-10']}
df = pd.DataFrame(data)

# Select rows between '2023-01-01' and '2023-01-31'
selected_df = select_rows_between_dates(df)

# Display the selected rows
print(selected_df)


        Date
0 2023-01-01
1 2023-01-15
2 2023-01-20


**Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?**


To use the basic functions of pandas, the library you need to import first is pandas itself, conventionally with the following statement:

```python
import pandas as pd
```

The alias `pd` is a convention, not a requirement; it lets you refer to pandas functions and objects with a short prefix, as in `pd.DataFrame` or `pd.read_csv()`. Once imported, pandas provides the data structures and functions used to load, manipulate, and analyze tabular data.