#### Q1. List any five functions of the pandas library with execution.

Here are five commonly used pandas functions and methods, each with a sample execution:

1. **`read_csv`**: This function is used to read data from a CSV file into a DataFrame.

```python
import pandas as pd

# Assuming you have a CSV file named 'example.csv'
df = pd.read_csv('example.csv')
print(df.head())
```

2. **`head`**: This method returns the first `n` rows of a DataFrame (5 by default).

```python
# Assuming 'df' is a DataFrame
print(df.head(3))
```

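3. **`info`**: This method prints a concise summary of the DataFrame: the index range, each column's data type and non-null count, and memory usage. You can spot missing values by comparing each column's non-null count with the number of rows.

```python
# Assuming 'df' is a DataFrame
df.info()  # info() prints the summary itself and returns None, so it is not wrapped in print()
```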

4. **`describe`**: This method generates descriptive statistics. By default it covers the numeric columns and reports count, mean, std (standard deviation), min, the 25%, 50% and 75% percentiles, and max.

```python
# Assuming 'df' is a DataFrame
print(df.describe())
```

5. **`groupby`**: This method is used to group data based on one or more columns and perform operations on each group.

```python
# Assuming 'df' is a DataFrame with columns 'category' and 'value'
grouped_df = df.groupby('category').sum()
print(grouped_df)
```
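The snippets above assume an existing CSV file and DataFrame. The sketch below is self-contained and runs all five functions on a small CSV held in memory. The data and the `io.StringIO` wrapper are assumptions made for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content, read from memory so the example needs no file on disk
csv_text = """category,value
fruit,3
vegetable,5
fruit,7
vegetable,1
grain,4
"""

df = pd.read_csv(io.StringIO(csv_text))          # 1. read_csv: parse the CSV into a DataFrame
print(df.head(3))                                 # 2. head: first 3 rows
df.info()                                         # 3. info: dtypes and non-null counts (prints directly)
summary = df.describe()                           # 4. describe: statistics for the numeric 'value' column
print(summary)
totals = df.groupby('category')['value'].sum()    # 5. groupby: total 'value' per category
print(totals)
```

The groupby line selects `'value'` before summing, so only the numeric column is aggregated.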

#### Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

```python
import pandas as pd

def reindex_dataframe(df):
    # Reset the index and create a new index starting from 1 and incrementing by 2
    df_reindexed = df.reset_index(drop=True)
    df_reindexed.index = df_reindexed.index * 2 + 1

    return df_reindexed

# Example usage:
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Call the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Display the re-indexed DataFrame
print(df_reindexed)
```

```
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
```


```python
def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)

    # Use the set_index function to set the new index
    df_reindexed = df.set_index(pd.Index(new_index))

    return df_reindexed

# Example usage:
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# Call the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Display the re-indexed DataFrame
print(df_reindexed)
```

```
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
```
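A third option uses `DataFrame.set_axis`. It returns a new DataFrame with the given row labels and leaves the original untouched. This is a minimal sketch, and the function name `reindex_odd` is hypothetical:

```python
import pandas as pd

def reindex_odd(df):
    # Labels 1, 3, 5, ...: one label per row, starting at 1 and stepping by 2
    new_labels = range(1, 2 * len(df) + 1, 2)
    # set_axis returns a new DataFrame; the caller's df keeps its original index
    return df.set_axis(new_labels, axis=0)

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
result = reindex_odd(df)
print(result)
```

`DataFrame.reindex` is not a fit for this task. It looks up existing labels, so on the default index 0, 1, 2 the new labels 3 and 5 would produce rows full of NaN.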


#### Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.


#### For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

```python
def sum_of_first_three1(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # iloc selects by position, so this works for any index;
        # loc[:2] selects by label and only matches the first three rows of a default 0-based index
        total_sum = df['Values'].iloc[:3].sum()
        print(f"The sum of the first three values is: {total_sum}")
    else:
        print("The 'Values' column is not found in the DataFrame.")


def sum_of_first_three2(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # head(3) returns the first three values of the column
        total_sum = df['Values'].head(3).sum()
        print(f"The sum of the first three values is: {total_sum}")
    else:
        print("The 'Values' column is not found in the DataFrame.")


df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the functions to calculate and print the sum of the first three values
sum_of_first_three1(df)
sum_of_first_three2(df)
```

```
The sum of the first three values is: 60
The sum of the first three values is: 60
```
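The question asks the function to iterate over the DataFrame, and neither version above does so explicitly. The minimal sketch below walks the column with a loop and stops after three values. The function name is hypothetical:

```python
import pandas as pd

def sum_of_first_three_loop(df):
    total = 0
    # Iterate over the 'Values' column, stopping once three values have been added
    for count, value in enumerate(df['Values']):
        if count == 3:
            break
        total += value
    print(f"The sum of the first three values is: {total}")
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_of_first_three_loop(df)
```

The vectorized `head(3).sum()` is usually preferable. The loop only makes the iteration explicit, and it also works when the column has fewer than three values.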


#### Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

```python
def add_word_count(df):
    # Check if the 'Text' column exists in the DataFrame
    if 'Text' in df.columns:
        # Method 1: apply a lambda that splits each value on whitespace and counts the pieces
        df['Word_Count1'] = df['Text'].apply(lambda x: len(str(x).split()))
        # Method 2: vectorized string methods; missing values stay NaN instead of being counted
        df['Word_Count2'] = df['Text'].str.split().str.len()
    else:
        print("The 'Text' column is not found in the DataFrame.")


# Example usage:
# Assuming 'df' is your DataFrame with a 'Text' column
df = pd.DataFrame({'Text': ["This is a sample sentence.", "Count the words in this one.", "A third sentence."]})

# Call the function; it adds the word-count columns to df in place
add_word_count(df)

# Display the DataFrame with the new word-count columns
df
```

|   | Text | Word_Count1 | Word_Count2 |
|---|------|-------------|-------------|
| 0 | This is a sample sentence. | 5 | 5 |
| 1 | Count the words in this one. | 6 | 6 |
| 2 | A third sentence. | 3 | 3 |
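To match the requested column name exactly, the sketch below creates a single `'Word_Count'` column. It returns a new DataFrame via `assign` and treats missing text as zero words. That handling of missing text is an assumption, since the question does not specify it, and the function name is hypothetical:

```python
import pandas as pd

def with_word_count(df):
    # Missing text becomes '' so it counts as 0 words; split() with no argument splits on any whitespace
    counts = df['Text'].fillna('').str.split().str.len()
    # assign returns a new DataFrame; the input is not modified
    return df.assign(Word_Count=counts)

df = pd.DataFrame({'Text': ["This is a sample sentence.", "Count the words in this one.", None]})
print(with_word_count(df))
```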


#### Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` both describe how large a DataFrame is, but they return different things:

1. **`DataFrame.size`**:
   - The `DataFrame.size` attribute returns the total number of elements in the DataFrame, which is equivalent to the total number of cells.
   - It calculates the size by multiplying the number of rows by the number of columns.

   Example:
   ```python
   import pandas as pd

   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   size_result = df.size
   print(size_result)  # Output: 6 (3 rows * 2 columns)
   ```

2. **`DataFrame.shape`**:
   - The `DataFrame.shape` attribute returns a tuple representing the dimensions of the DataFrame.
   - The tuple contains two values: the number of rows and the number of columns.

   Example:
   ```python
   import pandas as pd

   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   shape_result = df.shape
   print(shape_result)  # Output: (3, 2) (3 rows, 2 columns)
   ```

In summary:

- `DataFrame.size` provides the total number of elements in the DataFrame.
- `DataFrame.shape` provides a tuple representing the dimensions of the DataFrame (number of rows, number of columns).

Keep in mind that both `DataFrame.size` and `DataFrame.shape` are attributes, not methods, so you should access them without parentheses.
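The short check below uses nothing beyond pandas. It confirms that `size` is the product of the two numbers in `shape`, and that calling `size` like a method fails because it is a plain integer:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape          # shape is a (rows, columns) tuple
print(df.size == rows * cols)  # size is the total number of cells

try:
    df.size()                  # size is an int, so calling it raises TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")
```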

#### Q6. Which function of pandas do we use to read an excel file?

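To read an Excel file, we use the `pd.read_excel()` function. By default it reads the first sheet and returns a DataFrame. The `sheet_name` parameter selects another sheet by name or position, and `sheet_name=None` returns a dictionary of DataFrames, one per sheet. Reading `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.

Here's an example of how to use `pd.read_excel()`:

```python
import pandas as pd

# Replace 'your_file.xlsx' with the actual path to your Excel file
df = pd.read_excel('your_file.xlsx')

# Now 'df' is a DataFrame containing the data from the first sheet of the Excel file
# To read a different sheet, pass its name or position, e.g. pd.read_excel('your_file.xlsx', sheet_name='Sheet2')
```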

#### Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.


#### The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

```python
import pandas as pd

def extract_username(df):
    # Method 1: split on '@' and take the first piece with str.get(0)
    df['Username_1'] = df['Email'].str.split('@').str.get(0)

    # Method 2: apply a lambda that uses Python's str.split on each address
    df['Username_2'] = df['Email'].apply(lambda x: x.split('@')[0])

    # Method 3: split on '@' and take the first piece with str[0] indexing
    df['Username_3'] = df['Email'].str.split('@').str[0]
    return df

# Example usage:
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com', 'bob.jones@example.com']})
result_df = extract_username(df)
result_df
```

|   | Email | Username_1 | Username_2 | Username_3 |
|---|-------|------------|------------|------------|
| 0 | john.doe@example.com | john.doe | john.doe | john.doe |
| 1 | jane.smith@example.com | jane.smith | jane.smith | jane.smith |
| 2 | bob.jones@example.com | bob.jones | bob.jones | bob.jones |
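Splitting on `'@'` returns the whole string when an address contains no `'@'`. If invalid addresses should yield a missing value instead, a regular-expression sketch with `Series.str.extract` handles that case. That behavior is an assumption about what is wanted, and the function name is hypothetical:

```python
import pandas as pd

def add_username(df):
    # Capture everything before the first '@'; rows without a match (or missing emails) become NaN
    usernames = df['Email'].str.extract(r'^([^@]+)@', expand=False)
    # assign returns a new DataFrame with a single 'Username' column added
    return df.assign(Username=usernames)

df = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email', None]})
print(add_username(df))
```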


#### Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

#### For example, if df contains the following values:

```
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
```

#### Your function should select the following rows (row 2, with A = 6 and B = 9, also meets both conditions):

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```

#### The function should return a new DataFrame that contains only the selected rows.

```python
def select_rows_1(df):
    # Select rows where 'A' > 5 and 'B' < 10
    selected_rows_1 = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows_1

# Example usage:
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Call the function to select rows based on conditions
result_df_1 = select_rows_1(df)

# Display the resulting DataFrame
print(result_df_1)
```

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```


```python
def select_rows_2(df):
    # Use the query method to select rows based on conditions
    selected_rows_2 = df.query('A > 5 and B < 10')
    return selected_rows_2

# Example usage:
# A different sample DataFrame: only row 1 satisfies both conditions
df = pd.DataFrame({'A': [4, 7, 2, 9], 'B': [8, 5, 3, 12], 'C': [10, 15, 20, 25]})

# Call the function to select rows based on conditions
result_df_2 = select_rows_2(df)

# Display the resulting DataFrame
print(result_df_2)
```

```
   A  B   C
1  7  5  15
```


```python
def select_rows_3(df):
    # Use .loc with a boolean mask to select rows based on conditions
    selected_rows_3 = df.loc[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows_3

# Example usage:
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Call the function to select rows based on conditions
result_df_3 = select_rows_3(df)

# Display the resulting DataFrame
print(result_df_3)
```

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```


```python
def select_rows_4(df):
    # Use the comparison methods gt (>) and lt (<) to build the boolean mask
    selected_rows_4 = df[df['A'].gt(5) & df['B'].lt(10)]
    return selected_rows_4

# Example usage:
# Assuming 'df' is your DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Call the function to select rows based on conditions
result_df_4 = select_rows_4(df)

# Display the resulting DataFrame
print(result_df_4)
```

```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```
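Boolean indexing already produces a new DataFrame. In pandas versions without copy-on-write, however, assigning a column to that result can raise a `SettingWithCopyWarning`. Calling `.copy()` makes the result explicitly independent. This is a sketch with the thresholds as parameters, and the function and parameter names are hypothetical:

```python
import pandas as pd

def select_rows(df, a_min=5, b_max=10):
    # Same filter as above, with the thresholds as parameters (strict inequalities)
    mask = (df['A'] > a_min) & (df['B'] < b_max)
    # .copy() gives an independent DataFrame that can be modified without affecting df
    return df.loc[mask].copy()

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
result = select_rows(df)
result['D'] = result['A'] + result['B']  # result does not share data with df
print(result)
```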


#### Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

```python
def calculate_statistics(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # Calculate mean, median, and standard deviation
        mean_value = df['Values'].mean()
        median_value = df['Values'].median()
        # std() returns the sample standard deviation (ddof=1, divides by n - 1)
        std_deviation = df['Values'].std()

        # Display the results
        print(f"Mean: {mean_value}")
        print(f"Median: {median_value}")
        print(f"Standard Deviation: {std_deviation}")
    else:
        print("The 'Values' column is not found in the DataFrame.")

# Example usage:
# Assuming 'df' is your DataFrame with a 'Values' column
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Call the function to calculate statistics
calculate_statistics(df)
```

```
Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
```
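The three statistics can also be computed in one call with `Series.agg`. pandas' `std()` is the sample standard deviation: the squared deviations sum to 1000 and are divided by n - 1 = 4, giving the square root of 250 reported above. Passing `ddof=0` divides by n = 5 instead, giving the population value, the square root of 200 (about 14.14):

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# agg returns a Series indexed by the statistic names
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# Population standard deviation divides by n instead of n - 1
population_std = df['Values'].std(ddof=0)
print(population_std)
```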


#### Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

```python
def calculate_moving_average(df):
    # Check if the 'Sales' and 'Date' columns exist in the DataFrame
    if 'Sales' in df.columns and 'Date' in df.columns:
        # Work on a copy so the caller's DataFrame is not modified
        df = df.copy()

        # Convert the 'Date' column to datetime type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Sort the DataFrame by 'Date' to ensure correct rolling calculation
        df = df.sort_values(by='Date')

        # Average of the current row and the 6 rows before it (a 7-row window).
        # min_periods=1 lets the first six rows average over the rows available so far.
        # This equals a 7-day average only when there is exactly one row per day.
        df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

        return df
    else:
        print("The 'Sales' or 'Date' column is not found in the DataFrame.")
        return None

# Example usage:
# Assuming 'df' is your DataFrame with 'Sales' and 'Date' columns
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07'],
    'Sales': [10, 15, 20, 25, 30, 35, 40]
})

# Call the function to calculate the moving average
result_df = calculate_moving_average(df)

# Display the resulting DataFrame
print(result_df)
```

```
        Date  Sales  MovingAverage
0 2023-01-01     10           10.0
1 2023-01-02     15           12.5
2 2023-01-03     20           15.0
3 2023-01-04     25           17.5
4 2023-01-05     30           20.0
5 2023-01-06     35           22.5
6 2023-01-07     40           25.0
```
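A row-based window matches "the past 7 days" only when every day has exactly one row. A time-based window follows the calendar instead. The sketch below assumes that `'Date'` can be parsed by `pd.to_datetime`. The function and parameter names are hypothetical:

```python
import pandas as pd

def add_time_based_moving_average(df, date_col='Date', value_col='Sales'):
    # Work on a sorted copy so the caller's DataFrame is left untouched
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.sort_values(date_col).reset_index(drop=True)

    # A '7D' window covers (current date - 7 days, current date]:
    # the current day plus the six calendar days before it, however many rows that is
    rolling_mean = out.set_index(date_col)[value_col].rolling('7D').mean()

    # Positions line up because both objects share the same sorted row order
    out['MovingAverage'] = rolling_mean.to_numpy()
    return out

# Sample data with a gap: nothing is recorded between 2023-01-02 and 2023-01-10
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-10'],
    'Sales': [10, 20, 30],
})
print(add_time_based_moving_average(df))
```

On this data, a 7-row window with `min_periods=1` would average 2023-01-10 with the two earlier rows and report 20.0. The time-based window reports 30.0, because both earlier dates fall outside the seven days.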


#### Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

#### For example, if df contains the following values:

```
         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
```

#### Your function should create the following DataFrame:

```
         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday
```

#### The function should return the modified DataFrame.

```python
def add_weekday_column(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' in df.columns:
        # Convert the 'Date' column to datetime type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Add a new 'Weekday' column containing weekday names
        # Equivalent alternatives:
        # df['Weekday'] = df['Date'].dt.strftime('%A')
        # df['Weekday'] = df['Date'].apply(lambda x: x.strftime('%A'))
        df['Weekday'] = df['Date'].dt.day_name()

        return df
    else:
        print("The 'Date' column is not found in the DataFrame.")
        return None

# Example usage:
# Assuming 'df' is a DataFrame with a 'Date' column
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})

# Call the function to add the 'Weekday' column
result_df = add_weekday_column(df)

# Display the resulting DataFrame
print(result_df)
```

```
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```
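Weekday names sort alphabetically, not in calendar order. Adding a sortable weekday number goes beyond the question, but it can be useful. `dt.dayofweek` gives Monday = 0 through Sunday = 6. The sketch below does not modify its input, and its function and `WeekdayNumber` column names are hypothetical:

```python
import pandas as pd

def with_weekday(df):
    dates = pd.to_datetime(df['Date'])
    # assign returns a new DataFrame with the parsed dates, weekday names, and weekday numbers
    return df.assign(
        Date=dates,
        Weekday=dates.dt.day_name(),       # e.g. 'Sunday'
        WeekdayNumber=dates.dt.dayofweek,  # Monday=0 ... Sunday=6
    )

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
print(with_weekday(df))
```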


#### Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

```python
def select_rows_in_date_range(df):
    # Check if the 'Date' column exists in the DataFrame
    if 'Date' in df.columns:
        # Work on a copy so the caller's DataFrame is not modified
        df = df.copy()

        # Convert the 'Date' column to datetime type if it's not already
        df['Date'] = pd.to_datetime(df['Date'])

        # Define the date range. The end bound is exclusive (the start of February), so
        # timestamps later in the day on 2023-01-31 are still included; '<= 2023-01-31'
        # would only match up to midnight at the start of that day.
        start_date = pd.Timestamp('2023-01-01')
        end_date = pd.Timestamp('2023-02-01')

        # Select rows within the date range
        selected_rows = df[(df['Date'] >= start_date) & (df['Date'] < end_date)]

        return selected_rows
    else:
        print("The 'Date' column is not found in the DataFrame.")
        return None

# Example usage:
# Assuming 'df' is a DataFrame with a 'Date' column
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-05']
})

# Call the function to select rows within the date range
result_df = select_rows_in_date_range(df)

# Display the resulting DataFrame
print(result_df)
```

```
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-25
```
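If the dates can be moved into the index, partial-string indexing on a `DatetimeIndex` selects a whole month, including every time of day. A `<= '2023-01-31'` comparison, by contrast, stops at midnight at the start of January 31. The sketch below works under that assumption. The function name and the `'Sales'` sample column are hypothetical:

```python
import pandas as pd

def select_january_2023(df):
    # Parse dates, move them into a sorted DatetimeIndex, then select every row in January 2023
    indexed = df.assign(Date=pd.to_datetime(df['Date'])).set_index('Date').sort_index()
    return indexed.loc['2023-01']

df = pd.DataFrame({
    'Date': ['2023-01-01 09:00', '2023-01-31 18:30', '2023-02-05 12:00'],
    'Sales': [10, 20, 30],
})
print(select_january_2023(df))
```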


#### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The library to import first is pandas itself. By convention, it is imported under the alias `pd`:

```python
import pandas as pd
```

Here, `pd` is the conventional alias for pandas. It keeps references to pandas functions and objects short, for example `pd.read_csv`. Once pandas is imported, we can use its functions and classes, chiefly DataFrame and Series, to work with tabular data. pandas is built on NumPy, which is installed as a required dependency and loaded automatically. NumPy only needs to be imported explicitly when we call its functions directly.