Q1. List any five functions of the pandas library with execution.

Here are five commonly used functions of the Pandas library along with examples of their execution:

1. **`read_csv()`**: This function is used to read a CSV file into a DataFrame.

2. **`head()`**: This function is used to view the first few rows of a DataFrame.

3. **`describe()`**: This function provides descriptive statistics of a DataFrame.

4. **`groupby()`**: This function is used to group data by one or more columns.

5. **`merge()`**: This function is used to merge two DataFrames based on a common column or index.

Let's see these functions in action:

### Step 1: Import Pandas and Create Sample Data

```python
import pandas as pd

# Creating sample data for demonstration
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'James'],
    'Age': [28, 22, 35, 32, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
}

# Creating a DataFrame
df = pd.DataFrame(data)
```

### 1. `read_csv()`

This function reads a CSV file into a DataFrame.

```python
# Save the DataFrame to a CSV file for demonstration
df.to_csv('sample.csv', index=False)

# Read the CSV file into a new DataFrame
df_csv = pd.read_csv('sample.csv')
print(df_csv)
```

### 2. `head()`

This function returns the first `n` rows of the DataFrame. By default, it returns the first 5 rows.

```python
print(df.head())
```

### 3. `describe()`

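This function provides descriptive statistics of the DataFrame. By default it summarises only the numeric columns (here, `Age`), reporting the count, mean, standard deviation, minimum, quartiles, and maximum.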

```python
print(df.describe())
```

### 4. `groupby()`

This function groups the data by a specified column and allows aggregation.

```python
# Sample data for groupby demonstration
data_group = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'James', 'Anna'],
    'Age': [28, 22, 35, 32, 30, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo', 'Paris']
}

# Creating a DataFrame
df_group = pd.DataFrame(data_group)

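# Grouping by 'City' and calculating the average age.
# 'Age' is selected explicitly: 'Name' is non-numeric, and calling
# mean() on it raises a TypeError in pandas 2.x.
grouped = df_group.groupby('City')['Age'].mean()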
print(grouped)
```
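
When more than one statistic per group is needed, named aggregation is one option. The following is a minimal sketch that reuses the sample data above; the output column names `mean_age` and `people` are chosen here for illustration only.

```python
import pandas as pd

df_group = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'James', 'Anna'],
    'Age': [28, 22, 35, 32, 30, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo', 'Paris'],
})

# Named aggregation: each keyword becomes an output column, defined as
# (source column, aggregation function).
# 'mean_age' and 'people' are illustrative names, not part of pandas.
summary = df_group.groupby('City').agg(
    mean_age=('Age', 'mean'),
    people=('Name', 'count'),
)
print(summary)
```

Each aggregation names its source column explicitly, so the non-numeric `Name` column is only ever counted, never averaged.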

### 5. `merge()`

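This function merges two DataFrames based on a common column or index. By default it performs an inner join, so only keys present in both DataFrames are kept (here, John and Anna).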

```python
# Sample data for merge demonstration
data1 = {
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35]
}

data2 = {
    'Name': ['John', 'Anna', 'Linda'],
    'City': ['New York', 'Paris', 'London']
}

# Creating DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Merging DataFrames on 'Name'
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
```
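
The join type is controlled by the `how` parameter. As a small sketch on the same two DataFrames, an outer join keeps the unmatched rows as well, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]})
df2 = pd.DataFrame({'Name': ['John', 'Anna', 'Linda'],
                    'City': ['New York', 'Paris', 'London']})

# Default (inner) join: only names found in both DataFrames.
inner = pd.merge(df1, df2, on='Name')

# Outer join: every name from either side; missing values become NaN.
# indicator=True adds a '_merge' column: both / left_only / right_only.
outer = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

print(inner)
print(outer)
```

Peter has no city and Linda has no age, so those cells are `NaN` in the outer result.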

### Full Example

Here's a complete script demonstrating all the functions mentioned above:

```python
import pandas as pd

# Sample data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'James'],
    'Age': [28, 22, 35, 32, 30],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo']
}

# Creating a DataFrame
df = pd.DataFrame(data)

# 1. read_csv()
df.to_csv('sample.csv', index=False)
df_csv = pd.read_csv('sample.csv')
print("DataFrame from CSV:")
print(df_csv)
print("\n")

# 2. head()
print("First few rows of the DataFrame:")
print(df.head())
print("\n")

# 3. describe()
print("Descriptive statistics of the DataFrame:")
print(df.describe())
print("\n")

# Sample data for groupby demonstration
data_group = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'James', 'Anna'],
    'Age': [28, 22, 35, 32, 30, 29],
    'City': ['New York', 'Paris', 'Berlin', 'London', 'Tokyo', 'Paris']
}

# Creating a DataFrame
df_group = pd.DataFrame(data_group)

# 4. groupby()
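# 'Age' is selected explicitly: mean() raises a TypeError on the non-numeric 'Name' column in pandas 2.x
grouped = df_group.groupby('City')['Age'].mean()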
print("Grouped data by City with average age:")
print(grouped)
print("\n")

# Sample data for merge demonstration
data1 = {
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 22, 35]
}

data2 = {
    'Name': ['John', 'Anna', 'Linda'],
    'City': ['New York', 'Paris', 'London']
}

# Creating DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# 5. merge()
merged_df = pd.merge(df1, df2, on='Name')
print("Merged DataFrame on 'Name':")
print(merged_df)
```



Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

To re-index a Pandas DataFrame `df` with a new index that starts from 1 and increments by 2 for each row, you can follow these steps:

1. Create a new index starting from 1 and incrementing by 2.
2. Assign this new index to the DataFrame.

Here's a function to achieve this:

```python
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)
    
    # Assign the new index to the DataFrame
    df.index = new_index
    return df

# Example usage
data = {
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Re-index the DataFrame
df_reindexed = reindex_dataframe(df)
print("\nRe-indexed DataFrame:")
print(df_reindexed)
```

### Explanation

- **Creating the new index**: The `range(1, 2 * len(df) + 1, 2)` function creates a range of numbers starting from 1, ending before `2 * len(df) + 1`, and incrementing by 2. This ensures that the index starts at 1 and increments by 2 for each row.
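- **Assigning the new index**: `df.index = new_index` assigns the newly created index to the DataFrame. This modifies the DataFrame that was passed in, and the function returns that same object rather than a copy.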

When you run this code, the output will be:

```
Original DataFrame:
    A   B    C
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120

Re-indexed DataFrame:
    A   B    C
1  10  50   90
3  20  60  100
5  30  70  110
7  40  80  120
```
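
`reindex_dataframe` changes the caller's DataFrame in place. If the original should stay untouched, one possible variant works on a copy. This is a sketch: the name `reindex_dataframe_copy` is hypothetical, and `pd.RangeIndex` stands in for the plain `range`.

```python
import pandas as pd

def reindex_dataframe_copy(df):
    # Hypothetical non-mutating variant: work on a copy so the caller's
    # DataFrame keeps its original index.
    result = df.copy()
    # Same sequence as range(1, 2 * len(df) + 1, 2): 1, 3, 5, ...
    result.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return result

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120],
})

df_reindexed = reindex_dataframe_copy(df)
print(df.index.tolist())            # index of the original DataFrame
print(df_reindexed.index.tolist())  # index of the re-indexed copy
```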



Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

To calculate the sum of the first three values in the 'Values' column of a Pandas DataFrame and print the result, you can write a Python function that:

1. Selects the first three values from the 'Values' column.
2. Calculates their sum.
3. Prints the sum to the console.

Here's the function:

```python
import pandas as pd

def sum_first_three_values(df):
    # Ensure the DataFrame has at least three values
    if len(df) < 3:
        print("The DataFrame does not have at least three values.")
        return

    # Calculate the sum of the first three values in the 'Values' column
    sum_values = df['Values'].iloc[:3].sum()
    
    # Print the sum
    print("Sum of the first three values:", sum_values)

# Example usage
data = {
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)
sum_first_three_values(df)
```

### Explanation

- **Check DataFrame Length**: The function first checks if the DataFrame has at least three rows. If not, it prints a message and returns.
- **Select First Three Values**: `df['Values'].iloc[:3]` selects the first three values from the 'Values' column.
- **Calculate Sum**: `.sum()` calculates the sum of these values.
- **Print the Sum**: The function prints the calculated sum.

When you run this code, the output will be:

```
Sum of the first three values: 60
```
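
The question asks for a function that iterates, whereas the solution above slices with `iloc`. For comparison, here is a sketch of an explicit loop. The name `sum_first_three_by_iteration` is hypothetical, and the function also returns the total (an addition made here so the result can be checked). Unlike the version above, it sums whatever is available if there are fewer than three rows.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # enumerate() supplies a positional counter, so the loop does not
    # depend on the DataFrame's index labels.
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break  # stop after the first three values
        total += value
    print("Sum of the first three values:", total)
    return total  # returned in addition to printing (assumption for testing)

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)
```

For large DataFrames the `iloc[:3].sum()` form is the more idiomatic choice; the loop is shown only to match the wording of the question.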



Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

To create a new column `Word_Count` that contains the number of words in each row of the `Text` column in a Pandas DataFrame, you can follow these steps:

1. Define a function to count the number of words in a string.
2. Apply this function to the `Text` column to create the new `Word_Count` column.

Here's how you can do this:

```python
import pandas as pd

def add_word_count_column(df):
    # Define a function to count the number of words in a string
    def count_words(text):
        if pd.isna(text):
            return 0
        return len(text.split())

    # Apply the count_words function to the 'Text' column
    df['Word_Count'] = df['Text'].apply(count_words)
    return df

# Example usage
data = {
    'Text': [
        "This is the first sentence.",
        "Here is another sentence.",
        "And a third one.",
        "Last but not least."
    ]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Add the 'Word_Count' column
df = add_word_count_column(df)
print("\nDataFrame with 'Word_Count' column:")
print(df)
```

### Explanation

- **Define `count_words` Function**: This function takes a text string, splits it into words using `split()`, and returns the number of words. It handles missing values by returning 0 if the text is `NaN`.
- **Apply Function to `Text` Column**: The `apply()` method is used to apply the `count_words` function to each element in the `Text` column.
- **Assign to `Word_Count`**: The result is assigned to the new `Word_Count` column in the DataFrame.

When you run this code, the output will be:

```
Original DataFrame:
                          Text
0  This is the first sentence.
1    Here is another sentence.
2             And a third one.
3          Last but not least.

DataFrame with 'Word_Count' column:
                          Text  Word_Count
0  This is the first sentence.           5
1    Here is another sentence.           4
2             And a third one.           4
3          Last but not least.           4
```
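
The same count can be computed without `apply()`, using the vectorized `.str` accessor. This is a sketch of that alternative, with a missing value added to the sample data to show how it is handled.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ["This is the first sentence.", None, "And a third one."]
})

# .str.split() splits each string on whitespace; .str.len() counts the pieces.
# Missing values propagate as NaN, so fill them with 0 before casting to int.
df['Word_Count'] = df['Text'].str.split().str.len().fillna(0).astype(int)
print(df)
```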



Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are attributes, not methods, so they are accessed without parentheses. Both describe the dimensions of the DataFrame, but they report different things:

### `DataFrame.size`

- **Description**: Returns the number of elements in the DataFrame.
- **Calculation**: It is equivalent to the product of the number of rows and the number of columns.
- **Type**: It returns a single integer value.
- **Usage**: It's useful when you want to know the total number of elements in the DataFrame, regardless of its shape.

### Example

```python
import pandas as pd

data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)

# Size of the DataFrame
print("\nSize of the DataFrame:", df.size)
```

### Output

```
DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Size of the DataFrame: 9
```

### `DataFrame.shape`

- **Description**: Returns a tuple representing the dimensionality of the DataFrame.
- **Calculation**: The tuple contains two values: the number of rows and the number of columns.
- **Type**: It returns a tuple of integers.
- **Usage**: It's useful when you need to know the structure of the DataFrame, such as the number of rows and columns.

### Example

```python
import pandas as pd

data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)

# Shape of the DataFrame
print("\nShape of the DataFrame:", df.shape)
```

### Output

```
DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Shape of the DataFrame: (3, 3)
```

### Summary

- **`DataFrame.size`**: Provides the total number of elements in the DataFrame (product of rows and columns). Returns an integer.
- **`DataFrame.shape`**: Provides the number of rows and columns as a tuple. Returns a tuple (rows, columns).

### Combined Example

```python
import pandas as pd

data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)

# Size of the DataFrame
print("\nSize of the DataFrame:", df.size)

# Shape of the DataFrame
print("\nShape of the DataFrame:", df.shape)
```

### Output

```
DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Size of the DataFrame: 9

Shape of the DataFrame: (3, 3)
```
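
The relationship between the two attributes can be checked directly. The short sketch below also shows what happens if either one is called like a method, as the wording of the question might suggest.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90],
})

rows, cols = df.shape
print(df.size == rows * cols)  # size is the product of the shape entries

# For a single column (a Series), shape is a one-element tuple.
print(df['A'].shape, df['A'].size)

# size and shape are attributes; calling them raises a TypeError.
try:
    df.shape()
except TypeError as exc:
    print("TypeError:", exc)
```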



Q6. Which function of pandas do we use to read an excel file?

To read an Excel file into a Pandas DataFrame, use the `pandas.read_excel()` function. It accepts options for choosing the sheet, selecting specific columns, skipping rows, and more.

### Basic Usage

Here is a basic example of how to use `pandas.read_excel()`:

```python
import pandas as pd

# Reading an Excel file
df = pd.read_excel('file_path.xlsx')

# Display the DataFrame
print(df)
```

### Advanced Usage

The `read_excel()` function has several parameters that you can use to customize how the Excel file is read. Here are some of the most commonly used parameters:

- `io`: The file path or file-like object of the Excel file.
- `sheet_name`: The sheet to read. You can use the sheet name as a string or the sheet index (0-based). You can also use `None` to read all sheets.
- `header`: The row number(s) to use as the column names.
- `names`: The list of column names to use.
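- `usecols`: The columns to read. A string is interpreted as Excel column letters and ranges (for example `"A,C,E"` or `"A:E"`), a list of integers as 0-based column positions, and a list of strings as column names from the header row.
- `skiprows`: The number of rows to skip at the beginning, or a list of 0-based row numbers to skip.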
- `nrows`: The number of rows to read.

### Example with Parameters

Here's an example that demonstrates some of these parameters:

```python
import pandas as pd

# Reading specific sheet and columns from an Excel file
df = pd.read_excel(
    'file_path.xlsx',
    sheet_name='Sheet1',
    usecols='A,C,E',  # Excel column letters; a list of strings would be matched against header names instead
    skiprows=1,
    nrows=10
)

# Display the DataFrame
print(df)
```

### Reading Multiple Sheets

You can also read multiple sheets by passing a list of sheet names or `None` to read all sheets. The result will be a dictionary with sheet names as keys and DataFrames as values.

```python
import pandas as pd

# Reading all sheets
all_sheets = pd.read_excel('file_path.xlsx', sheet_name=None)

# Display the DataFrame of a specific sheet
print(all_sheets['Sheet1'])
```

### Summary

- Use `pandas.read_excel()` to read an Excel file into a Pandas DataFrame.
- Customize the reading process with various parameters like `sheet_name`, `usecols`, `skiprows`, etc.
- You can read specific sheets, multiple sheets, or the entire Excel file based on your needs.

This function is part of the Pandas library and requires the `openpyxl` or `xlrd` package to be installed for reading `.xlsx` and `.xls` files, respectively. Make sure you have the appropriate package installed:

```bash
pip install openpyxl xlrd
```

Here is a complete example:

```python
import pandas as pd

# Reading an Excel file
df = pd.read_excel('example.xlsx', sheet_name='Sheet1')

# Display the DataFrame
print(df)
```

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

To create a new column `Username` in a Pandas DataFrame that contains only the username part of each email address, you can follow these steps:

1. Define a function to extract the username from an email address.
2. Apply this function to the `Email` column to create the new `Username` column.

Here is the function to achieve this:

```python
import pandas as pd

def add_username_column(df):
    # Define a function to extract the username from an email address
    def extract_username(email):
        return email.split('@')[0]

    # Apply the extract_username function to the 'Email' column
    df['Username'] = df['Email'].apply(extract_username)
    return df

# Example usage
data = {
    'Email': [
        'john.doe@example.com',
        'jane.smith@domain.org',
        'user123@service.net',
        'info@company.com'
    ]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Add the 'Username' column
df = add_username_column(df)
print("\nDataFrame with 'Username' column:")
print(df)
```

### Explanation

- **Define `extract_username` Function**: This function takes an email address as input, splits the string at the `@` symbol, and returns the part before the `@`. It assumes every value is a string; a missing value would raise an `AttributeError`.
- **Apply Function to `Email` Column**: The `apply()` method is used to apply the `extract_username` function to each element in the `Email` column.
- **Assign to `Username`**: The result is assigned to the new `Username` column in the DataFrame.

### Output

```
Original DataFrame:
                   Email
0   john.doe@example.com
1  jane.smith@domain.org
2    user123@service.net
3       info@company.com

DataFrame with 'Username' column:
                   Email    Username
0   john.doe@example.com    john.doe
1  jane.smith@domain.org  jane.smith
2    user123@service.net     user123
3       info@company.com        info
```
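
A vectorized alternative uses the `.str` accessor, which also tolerates missing values. This is a sketch with a `None` entry added to the sample data for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Email': ['john.doe@example.com', None, 'info@company.com']
})

# .str.split('@') splits each address at '@'; .str[0] takes the first piece.
# A missing email stays missing instead of raising an error.
df['Username'] = df['Email'].str.split('@').str[0]
print(df)
```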


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:

       A  B  C
    0  3  5  1
    1  8  2  7
    2  6  9  4
    3  2  3  5
    4  9  1  2

To select rows from a Pandas DataFrame `df` where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10, you can use boolean indexing. This involves creating a boolean condition and applying it to the DataFrame to filter the rows.

Here’s a Python function that performs this operation:

```python
import pandas as pd

def filter_rows(df):
    # Create the condition for selecting rows
    condition = (df['A'] > 5) & (df['B'] < 10)
    
    # Apply the condition to the DataFrame to get the filtered rows
    filtered_df = df[condition]
    
    return filtered_df

# Example usage
data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Get the filtered DataFrame
filtered_df = filter_rows(df)
print("\nFiltered DataFrame:")
print(filtered_df)
```

### Explanation

- **Create the Condition**: `condition = (df['A'] > 5) & (df['B'] < 10)` creates a boolean Series where each row is `True` if the condition is met and `False` otherwise.
  - `df['A'] > 5` checks if values in column 'A' are greater than 5.
  - `df['B'] < 10` checks if values in column 'B' are less than 10.
  - `&` is the logical "AND" operator used to combine these two conditions.
  
- **Apply the Condition**: `df[condition]` uses the boolean Series to filter the rows of the DataFrame, returning only those that satisfy both conditions.

### Output

```
Original DataFrame:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Filtered DataFrame:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```
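
The same filter can be written with `DataFrame.query()`, which some readers find easier to scan. In this sketch the function name `filter_rows_query` and the parameters `a_min` and `b_max` are hypothetical additions that make the thresholds configurable.

```python
import pandas as pd

def filter_rows_query(df, a_min=5, b_max=10):
    # '@name' refers to a local Python variable inside the query string.
    # a_min and b_max are illustrative parameters, defaulting to the
    # thresholds given in the question.
    return df.query('A > @a_min and B < @b_max')

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2],
})

filtered = filter_rows_query(df)
print(filtered)
```

As with boolean indexing, the result is a new DataFrame that keeps the original index labels of the selected rows.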

