**Q1.** List any five functions of the pandas library with execution.

**Answer:**

Here are five commonly used pandas functions, each with a short example of its use:

1. **read_csv()**: This function is used to read data from a CSV file and create a DataFrame.

```python
import pandas as pd

# Read a CSV file and create a DataFrame
df = pd.read_csv('data.csv')

print(df.head())
```

2. **info()**: This function provides information about the DataFrame, including the data types, non-null values, and memory usage.

```python
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Emily', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'UK']}
df = pd.DataFrame(data)

# Print information about the DataFrame
df.info()
```

3. **head()**: This function returns the first few rows of the DataFrame. By default, it returns the first five rows.

```python
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Emily', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'UK']}
df = pd.DataFrame(data)

# Print the first two rows of the DataFrame
print(df.head(2))
```

4. **describe()**: This function generates descriptive statistics for numeric columns in the DataFrame, such as count, mean, standard deviation, minimum, maximum, and quartiles.

```python
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Emily', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'UK']}
df = pd.DataFrame(data)

# Generate descriptive statistics for numeric columns
print(df.describe())
```

5. **groupby()**: This function is used for grouping data based on one or more columns. It allows you to perform aggregate operations on specific groups of data.

```python
import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Emily', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'USA']}
df = pd.DataFrame(data)

# Group the data by 'Country' and calculate the average age
grouped_data = df.groupby('Country')['Age'].mean()

print(grouped_data)
```

These are just a few examples of the numerous functions available in the pandas library for data manipulation, analysis, and exploration.

**Q2.** Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

**Answer:**

The `reset_index()` and `set_index()` functions in pandas can be used to re-index a DataFrame with a new index that starts from 1 and increments by 2 for each row. Here's a Python function that accomplishes this:

```python
import pandas as pd

def reindex_dataframe(df):
    new_index = pd.RangeIndex(start=1, step=2, stop=len(df)*2)
    df_reindexed = df.reset_index(drop=True).set_index(new_index)
    return df_reindexed

# Example usage
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
print("Original DataFrame:")
print(df)

reindexed_df = reindex_dataframe(df)
print("\nReindexed DataFrame:")
print(reindexed_df)
```

Output:
```
Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Reindexed DataFrame:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
```


In this example, the reindex_dataframe() function takes a DataFrame df as an input. It first creates a new index using the pd.RangeIndex() function, starting from 1, with a step of 2, and stopping at len(df)*2 (to ensure the new index has the same length as the original DataFrame).

The DataFrame is then reset with the reset_index() function to assign a default integer index, and the new index is set using the set_index() function.

Finally, the function returns the re-indexed DataFrame.
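
Because only the index labels change, the `reset_index()` step is not strictly needed. A minimal sketch, assuming a helper named `reindex_odd` (a hypothetical name, not part of the original answer) that assigns the new index to a copy:

```python
import pandas as pd

def reindex_odd(df):
    # Hypothetical helper: copy first so the caller's DataFrame is untouched
    out = df.copy()
    # Labels 1, 3, 5, ... with exactly len(df) entries
    out.index = pd.RangeIndex(start=1, stop=1 + 2 * len(df), step=2)
    return out

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
print(reindex_odd(df))
```

Assigning to `out.index` replaces the labels in one step, whatever index the input DataFrame had.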

**Q3.** You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

**Answer:**

You can use the `iterrows()` function in pandas to iterate over the rows of the DataFrame and access the values in the 'Values' column. Here's a Python function that accomplishes this:

```python
import pandas as pd

def calculate_sum_of_first_three(df):
    # Initialize a variable to store the sum
    sum_values = 0
    
    # Iterate over the DataFrame
    for index, row in df.iterrows():
        # Access the value in the 'Values' column for each row
        value = row['Values']
        
        # Add the value to the sum
        sum_values += value
        
        # Break the loop after summing the first three values
        if index == 2:
            break
    
    # Print the sum
    print("Sum of the first three values:", sum_values)
```

Here's an example usage of the function:

```python
# Example DataFrame
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# Calculate the sum of the first three values
calculate_sum_of_first_three(df)
```

Output:
```
Sum of the first three values: 60
```

In this example, the function `calculate_sum_of_first_three()` takes a DataFrame `df` as input. It initializes a variable `sum_values` to store the sum of the values.

Then, it uses a for loop and the `iterrows()` function to iterate over the rows of the DataFrame. For each row, it accesses the value in the 'Values' column using `row['Values']` and adds it to the `sum_values` variable.

The loop breaks after summing the first three values, as indicated by the condition `if index == 2`, where `index` is the row's index label. This check therefore assumes the default integer index (0, 1, 2, ...). Finally, the function prints the calculated sum.
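
For a DataFrame whose index is not the default 0, 1, 2, ..., the rows can be counted by position instead. A minimal sketch, using an illustrative function name `sum_first_three_by_position` and `enumerate()` to track the position:

```python
import pandas as pd

def sum_first_three_by_position(df):
    total = 0
    # enumerate() counts rows by position, independent of the index labels
    for position, (_, row) in enumerate(df.iterrows()):
        if position >= 3:
            break
        total += row['Values']
    print("Sum of the first three values:", total)
    return total

# String labels on purpose: a check such as `index == 2` would never match here
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]},
                  index=['a', 'b', 'c', 'd', 'e'])
sum_first_three_by_position(df)
```

When explicit iteration is not required, `df['Values'].iloc[:3].sum()` gives the same result without a loop.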

**Q4.** Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

**Answer:**

The `apply()` function in pandas, together with a lambda function, can be used to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column. Here's Python code that accomplishes this:

```python
import pandas as pd

# Example DataFrame
df = pd.DataFrame({'Text': ['Hello, how are you?', 'I am doing great!', 'Python is awesome']})

# Calculate the word count for each row
df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))

print(df)
```

Output:
```
                  Text  Word_Count
0  Hello, how are you?           4
1    I am doing great!           4
2    Python is awesome           3
```
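
The same logic can be wrapped in a function, as the question asks. A minimal sketch, assuming a function name `add_word_count` (hypothetical) and assuming that missing text should count as zero words:

```python
import pandas as pd

def add_word_count(df):
    out = df.copy()
    # str.split() splits on whitespace; str.len() counts the resulting words
    counts = out['Text'].str.split().str.len()
    # Missing text yields NaN; treating it as zero words is an assumption
    out['Word_Count'] = counts.fillna(0).astype(int)
    return out

df = pd.DataFrame({'Text': ['Hello, how are you?', 'I am doing great!', None]})
print(add_word_count(df))
```

This variant uses the vectorized `.str` accessor rather than `apply()` and returns a copy instead of modifying the input.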


**Q5.** How are DataFrame.size() and DataFrame.shape() different?

**Answer:**

`DataFrame.size` and `DataFrame.shape` are both attributes (properties) of a pandas DataFrame, not methods, so they are accessed without parentheses. They serve different purposes:

1. `DataFrame.size`: This attribute returns the total number of elements in the DataFrame, that is, the number of rows (`DataFrame.shape[0]`) multiplied by the number of columns (`DataFrame.shape[1]`).

   For example:
   ```python
   import pandas as pd
   
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   
   print(df.size)  # Output: 6
   ```
   In this case, the DataFrame has 3 rows and 2 columns, so the size is 3 * 2 = 6.

2. `DataFrame.shape`: This attribute returns a tuple `(rows, columns)` giving the dimensions of the DataFrame.

   For example:
   ```python
   import pandas as pd
   
   df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
   
   print(df.shape)  # Output: (3, 2)
   ```
   In this case, the DataFrame has 3 rows and 2 columns, so the shape is (3, 2), where the first value represents the number of rows and the second value represents the number of columns.

To summarize, `DataFrame.size` gives the total number of elements (rows * columns) in the DataFrame, while `DataFrame.shape` provides a tuple with the number of rows and columns in the DataFrame.
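
A short check of this relationship, which also shows what happens when `size` is called with parentheses, as in the question's spelling:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape
# size equals the product of the two shape entries
print(df.size == rows * cols)

try:
    df.size()  # size is an integer attribute, so it cannot be called
except TypeError as exc:
    print("TypeError:", exc)
```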

**Q6.** Which function of pandas do we use to read an excel file?

**Answer:**

To read an Excel file in pandas, you can use the `pd.read_excel()` function. This function allows you to read data from an Excel file and create a pandas DataFrame.

Here's an example of how to use `pd.read_excel()` to read an Excel file:

```python
import pandas as pd

# Read an Excel file and create a DataFrame
df = pd.read_excel('data.xlsx')

print(df.head())
```

In this example, the `pd.read_excel()` function is used to read the Excel file named `'data.xlsx'`. It reads the data from the file and creates a DataFrame `df`. The `head()` method is then used to display the first few rows of the DataFrame.

You can provide additional arguments to `pd.read_excel()` to specify the sheet name, specify the columns to read, skip rows, and more, depending on your specific requirements.

**Q7.** You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

**Answer:**

To extract the username part from email addresses in the 'Email' column of a Pandas DataFrame and create a new column 'Username', you can use string manipulation methods along with the `apply()` function. Here's a Python function that accomplishes this:

```python
import pandas as pd

def extract_username(df):
    # Extract the username from the 'Email' column
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    
    return df
```

Here's an example usage of the function:

```python
# Example DataFrame
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})

# Extract the username from the email addresses
df_with_username = extract_username(df)

print(df_with_username)
```

Output:
```
                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
```

In this example, the function `extract_username()` takes a DataFrame `df` as input. It creates a new column 'Username' using the `apply()` function.

Inside the `apply()` function, a lambda function is used to split each email address in the 'Email' column using the `split()` method, with '@' as the separator. The resulting list is accessed at index 0 (`[0]`), which represents the username part of the email address.

The lambda function is applied to each row of the 'Email' column using the `apply()` function. The extracted usernames are assigned to the 'Username' column in the DataFrame.

Finally, the function returns the DataFrame with the new 'Username' column.
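
An alternative sketch uses the vectorized `.str` accessor instead of `apply()`. The function name `extract_username_vectorized` is illustrative. Unlike the version above, it works on a copy and passes missing values through as missing rather than raising an error:

```python
import pandas as pd

def extract_username_vectorized(df):
    out = df.copy()
    # Split once on '@' and keep the part before it
    out['Username'] = out['Email'].str.split('@', n=1).str[0]
    return out

df = pd.DataFrame({'Email': ['john.doe@example.com',
                             'jane.smith@example.com',
                             None]})
print(extract_username_vectorized(df))
```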

**Q8.** You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

**Answer:**

To select rows from a Pandas DataFrame based on specific conditions on columns 'A' and 'B' and return a new DataFrame with only the selected rows, you can use boolean indexing. Here's a Python function that accomplishes this:

```python
import pandas as pd

def select_rows(df):
    # Select rows where 'A' > 5 and 'B' < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return selected_rows
```

Here's an example usage of the function:

```python
# Example DataFrame
df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# Select rows based on conditions
selected_df = select_rows(df)

print(selected_df)
```

Output:
```
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```

In this example, the function `select_rows()` takes a DataFrame `df` as input. It uses boolean indexing to select rows where the value in column 'A' is greater than 5 (`df['A'] > 5`) and the value in column 'B' is less than 10 (`df['B'] < 10`).

The condition `(df['A'] > 5) & (df['B'] < 10)` is applied to the DataFrame `df`, which returns a boolean mask indicating the rows that satisfy the conditions. This boolean mask is used to select the corresponding rows from the DataFrame, creating a new DataFrame `selected_rows`.

Finally, the function returns the new DataFrame `selected_rows` that contains only the selected rows.
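
The same filter can also be written with `DataFrame.query()`. A minimal sketch; the function name `select_rows_query` and the `a_min` and `b_max` parameters are illustrative additions, not part of the original question:

```python
import pandas as pd

def select_rows_query(df, a_min=5, b_max=10):
    # Inside the query string, @name refers to a local Python variable
    return df.query('A > @a_min and B < @b_max')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
print(select_rows_query(df))
```

Passing the thresholds as parameters keeps the function reusable for other cut-offs while the defaults reproduce the conditions in the question.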

**Q9.** Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

**Answer:**

We can use the `mean()`, `median()`, and `std()` functions in pandas to calculate the mean, median, and standard deviation of the values in a specific column of a DataFrame. Here's Python code that accomplishes this:

```python
import pandas as pd

df = pd.DataFrame({'Values':[10, 20, 30, 40, 50]})
mean = df['Values'].mean()
median = df['Values'].median()
std = df['Values'].std()

print('Mean: ', mean)
print('Median: ', median)
print('Std Deviation: ', std)
```

Output:
```
Mean:  30.0
Median:  30.0
Std Deviation:  15.811388300841896
```
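
Wrapped as a function, as the question asks. A minimal sketch, assuming a function name `column_stats` (hypothetical). Note that `std()` in pandas computes the sample standard deviation (`ddof=1`) by default:

```python
import pandas as pd

def column_stats(df, column='Values'):
    values = df[column]
    return {
        'mean': values.mean(),
        'median': values.median(),
        'std': values.std(),  # sample standard deviation (ddof=1)
    }

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
stats = column_stats(df)
print(stats)
```

Returning a dictionary rather than printing lets the caller reuse the three numbers; passing `ddof=0` to `std()` would give the population standard deviation instead.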


**Q10.** Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

**Answer:**

To calculate the moving average of the 'Sales' column for the past 7 days, including the current day, you can use the `rolling()` function in pandas along with the `mean()` function. Here's a Python function that accomplishes this:

```python
import pandas as pd

def calculate_moving_average(df):
    # Convert 'Date' column to datetime
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Sort the DataFrame by 'Date' in ascending order
    df = df.sort_values('Date')
    
    # Calculate the moving average using a window of size 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df
```

Here's an example usage of the function:

```python
# Example DataFrame
df = pd.DataFrame({'Sales': [10, 15, 12, 20, 18, 25, 22, 30, 28, 35],
                   'Date': ['2023-05-01', '2023-05-02', '2023-05-03', '2023-05-04', '2023-05-05', 
                            '2023-05-06', '2023-05-07', '2023-05-08', '2023-05-09', '2023-05-10']})

# Calculate moving average for 'Sales' column
df_with_moving_average = calculate_moving_average(df)

print(df_with_moving_average)
```

Output:
```
   Sales       Date  MovingAverage
0     10 2023-05-01      10.000000
1     15 2023-05-02      12.500000
2     12 2023-05-03      12.333333
3     20 2023-05-04      14.250000
4     18 2023-05-05      15.000000
5     25 2023-05-06      16.666667
6     22 2023-05-07      17.428571
7     30 2023-05-08      20.285714
8     28 2023-05-09      22.142857
9     35 2023-05-10      25.428571
```

In this example, the function `calculate_moving_average()` takes a DataFrame `df` as input. It first converts the 'Date' column to datetime format using `pd.to_datetime()`.

The DataFrame is then sorted by 'Date' in ascending order to ensure the moving average calculation is performed correctly.

The moving average is calculated using the `rolling()` function on the 'Sales' column with a window size of 7 (`window=7`). The `mean()` function is then applied to calculate the average within the window. The `min_periods=1` argument ensures that the moving average is calculated even if there are fewer than 7 days of data.

The calculated moving averages are stored in a new column 'MovingAverage' in the DataFrame.

Finally, the function returns the DataFrame with the added 'MovingAverage' column.
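
`rolling(window=7)` counts rows, so it matches "the past 7 days" only when there is exactly one row per day. If some dates can be missing, a time-based window is one option. A hedged sketch, assuming a function name `calculate_moving_average_by_time` (hypothetical) and assuming that `'7D'`, the current day plus the six days before it, is the intended window:

```python
import pandas as pd

def calculate_moving_average_by_time(df):
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    # Time-based windows require the 'on' column to be sorted
    out = out.sort_values('Date')
    # '7D' covers the interval (t - 7 days, t]: the current day and the 6 before it
    out['MovingAverage'] = out.rolling('7D', on='Date')['Sales'].mean()
    return out

# 2023-05-03 through 2023-05-08 are missing on purpose
df = pd.DataFrame({'Sales': [10, 20, 30],
                   'Date': ['2023-05-01', '2023-05-02', '2023-05-09']})
print(calculate_moving_average_by_time(df))
```

In this example the last row averages only itself, because no other row falls inside its seven-day window, whereas a row-count window of 7 would have averaged all three rows.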

**Q11.** You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

**Answer:**

To create a new column 'Weekday' in a Pandas DataFrame based on the 'Date' column, you can use the `dt.day_name()` method of the datetime values (the older `dt.weekday_name` attribute was removed in pandas 1.0). Here's a Python function that accomplishes this:

```python
import pandas as pd

def add_weekday_column(df):
    # Convert 'Date' column to datetime
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Create 'Weekday' column with the full weekday name
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df
```

Here's an example usage of the function:

```python
# Example DataFrame
df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})

# Add 'Weekday' column
df_with_weekday = add_weekday_column(df)

print(df_with_weekday)
```

Output:
```
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```

In this example, the function `add_weekday_column()` takes a DataFrame `df` as input. It first converts the 'Date' column to datetime format using `pd.to_datetime()`.

The 'Weekday' column is then created by calling the `dt.day_name()` method on the 'Date' column, which returns the weekday name corresponding to each date.

The calculated weekday names are stored in the new column 'Weekday' in the DataFrame.

Finally, the function returns the modified DataFrame with the added 'Weekday' column.

**Q12.** Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

**Answer:**

To select all rows from a Pandas DataFrame where the date is between '2023-01-01' and '2023-01-31', you can use the `between()` function, after converting the 'Date' column to datetime values with `pd.to_datetime()`. Here's Python code that accomplishes this:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-01']
})

df['Date'] = pd.to_datetime(df['Date'])

selected_df = df[df['Date'].between('2023-01-01', '2023-01-31')]

print(selected_df)
```

Output:
```
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-25
```
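
Wrapped as a function, as the question asks. A minimal sketch; the function name `select_date_range` and its `start` and `end` parameters are illustrative. `between()` is inclusive on both ends by default, and a bare end date such as `'2023-01-31'` means midnight at the start of that day, so this version compares against the start of the following day in order to keep timestamps that fall later on the end date:

```python
import pandas as pd

def select_date_range(df, start='2023-01-01', end='2023-01-31'):
    dates = pd.to_datetime(df['Date'])
    start_ts = pd.Timestamp(start)
    # Exclusive upper bound: midnight of the day after `end`
    end_ts = pd.Timestamp(end) + pd.Timedelta(days=1)
    return df[(dates >= start_ts) & (dates < end_ts)]

df = pd.DataFrame({
    'Date': ['2023-01-01 00:00', '2023-01-15 09:00',
             '2023-01-31 18:30', '2023-02-01 00:00']
})
print(select_date_range(df))
```

With date-only values, as in the example above, both approaches select the same rows; the difference only appears when the column carries times of day.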


**Q13.** To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

**Answer:**

The first and foremost library that needs to be imported to use the basic functions of pandas is the `pandas` library itself. The `pandas` library provides data structures and functions for data manipulation and analysis. To import the `pandas` library, you can use the following import statement:

```python
import pandas as pd
```

By convention, the `pandas` library is typically imported with the alias `pd`. This allows you to use the functions and data structures provided by `pandas` using the `pd` prefix, making it easier and more concise to write pandas code.