# Q1. List any five functions of the pandas library with execution.

Here are five commonly used functions from the `pandas` library, each with an example.

### 1. **`read_csv()`**: 
   This function is used to read a CSV file into a DataFrame.

```python
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv('sample_data.csv')
print(df.head())  # Display the first 5 rows of the DataFrame
```
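Because `sample_data.csv` is only a placeholder, here is a minimal self-contained sketch that passes `read_csv()` an in-memory string through `io.StringIO` instead of a file path. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text kept in memory, so the example runs without a file on disk
csv_text = """name,city,score
Asha,Pune,88
Ben,Delhi,
Chen,Pune,75
"""

# read_csv accepts any file-like object, not just a path
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # 'score' is read as float64 because its empty field becomes NaN
```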

### 2. **`head()`**: 
   This function returns the first `n` rows of a DataFrame. The default is 5 rows.

```python
# Display the first 5 rows of the DataFrame
df_head = df.head()
print(df_head)
```

### 3. **`describe()`**: 
   This function generates descriptive statistics for numeric columns in the DataFrame.

```python
# Get summary statistics of the DataFrame
df_description = df.describe()
print(df_description)
```
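By default `describe()` summarizes only numeric columns; passing `include='all'` adds `count`, `unique`, `top`, and `freq` for text columns as well. A short sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({'city': ['Pune', 'Delhi', 'Pune'], 'score': [88, 70, 75]})

# Default: numeric columns only (count, mean, std, min, 25%, 50%, 75%, max)
print(df.describe())

# include='all' also summarizes the text column with count, unique, top and freq
print(df.describe(include='all'))
```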

### 4. **`groupby()`**:
   This function is used to group data by one or more columns and perform aggregate functions.

```python
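# Group by a column and calculate the mean of each group
# ('column_name' is a placeholder; numeric_only=True skips non-numeric columns,
# which would otherwise make mean() raise a TypeError in pandas 2.0 and later)
grouped_df = df.groupby('column_name').mean(numeric_only=True)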
print(grouped_df)
```
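Since `'column_name'` is a placeholder, here is a self-contained sketch with made-up data. It shows `numeric_only=True` skipping a text column, and `agg()` with named aggregations computing a different statistic for each column.

```python
import pandas as pd

# Hypothetical sample data with one text column and two numeric columns
df = pd.DataFrame({
    'team': ['red', 'blue', 'red', 'blue'],
    'player': ['a', 'b', 'c', 'd'],
    'points': [10, 20, 30, 40],
    'assists': [1, 2, 3, 4],
})

# Mean of every numeric column per team; the text column 'player' is skipped
means = df.groupby('team').mean(numeric_only=True)
print(means)

# Named aggregation: output column name = (input column, aggregation function)
summary = df.groupby('team').agg(total_points=('points', 'sum'),
                                 max_assists=('assists', 'max'))
print(summary)
```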

### 5. **`dropna()`**:
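   This function is used to remove missing values (NaN) from a DataFrame. By default it drops every row that contains at least one missing value; pass `axis=1` to drop columns instead.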

```python
# Drop rows with missing values
df_cleaned = df.dropna()
print(df_cleaned)
```
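A small made-up DataFrame makes the effect of the main `dropna()` options visible: `how='all'` drops a row only when every value in it is missing, `subset` limits the check to the listed columns, and `axis=1` drops columns instead of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, rows 1 and 2 have one NaN, row 3 is all NaN
df = pd.DataFrame({
    'A': [1.0, np.nan, 3.0, np.nan],
    'B': [4.0, 5.0, np.nan, np.nan],
})

print(df.dropna())              # keeps only row 0
print(df.dropna(how='all'))     # drops only row 3
print(df.dropna(subset=['A']))  # drops rows 1 and 3 (missing in 'A')
print(df.dropna(axis=1))        # drops both columns, since each contains a NaN
```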


# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

Below is a Python function that re-indexes a given DataFrame `df` with a new index that starts from 1 and increments by 2 for each row.

```python
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2*len(df) + 1, 2)
    
    # Re-index the DataFrame with the new index
    df_reindexed = df.copy()
    df_reindexed.index = new_index
    
    return df_reindexed

# Example usage
data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Re-index the DataFrame
df_reindexed = reindex_dataframe(df)
print("\nRe-indexed DataFrame:")
print(df_reindexed)
```

### Explanation:
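- `range(1, 2*len(df) + 1, 2)` generates numbers starting at 1 and increasing by 2, stopping before `2*len(df) + 1`, so it yields exactly one label per row (1, 3, 5, ...).
- These labels are assigned to the `index` of a copy of `df`, which relabels the rows without changing their order or values; the original `df` is left untouched.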

### Example Output:
```plaintext
Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Re-indexed DataFrame:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
```

This function will re-index any DataFrame to have its index start at 1 and increment by 2 for each subsequent row.
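Despite the wording of the task, this is not what `DataFrame.reindex()` does. The sketch below contrasts `set_axis()`, which returns a new DataFrame with the row labels replaced (the same result as the function above), with `reindex()`, which looks the new labels up in the old index and fills labels that don't exist there with `NaN`.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_index = range(1, 2 * len(df) + 1, 2)

# set_axis() returns a new DataFrame whose row labels are replaced; df is unchanged
relabelled = df.set_axis(new_index, axis=0)
print(relabelled)

# reindex() looks the new labels up in the OLD index (0, 1, 2): label 1 matches the
# row holding A=20, while labels 3 and 5 do not exist and come back as NaN rows
aligned = df.reindex(new_index)
print(aligned)
```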

# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.
For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

Here’s a Python function that calculates the sum of the first three values in the 'Values' column of a DataFrame `df` by accessing them by position, and then prints the sum to the console.

```python
import pandas as pd

def sum_first_three_values(df):
    # Check if 'Values' column has at least 3 elements
    if len(df['Values']) >= 3:
        # Calculate the sum of the first three values in the 'Values' column
        sum_of_values = df['Values'].iloc[0] + df['Values'].iloc[1] + df['Values'].iloc[2]
        print(f"The sum of the first three values is: {sum_of_values}")
    else:
        print("The 'Values' column has fewer than 3 elements.")

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

sum_first_three_values(df)
```

### Explanation:
- The function checks if the 'Values' column contains at least three elements.
- It uses the `.iloc[]` method to access the first three elements by their positions (`iloc[0]`, `iloc[1]`, and `iloc[2]`).
- The sum of these three values is calculated and printed to the console.
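The question asks for a function that *iterates* over the DataFrame, which the version above does not do explicitly. Here is a minimal sketch of a loop-based variant (the name `sum_first_three_by_iteration` is hypothetical). Unlike the version above, it sums whatever values exist when there are fewer than three, and it also returns the total so the result can be reused.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # Walk the 'Values' column one value at a time and stop after three
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print(f"The sum of the first three values is: {total}")
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)

# Vectorized equivalent without an explicit loop
print(df['Values'].head(3).sum())
```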

### Example Output:
```plaintext
The sum of the first three values is: 60
```

This function will work as expected for any DataFrame that has at least three values in the 'Values' column.

# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

Here’s a Python function that adds a new column `'Word_Count'` to a DataFrame `df`. This new column will contain the number of words in each row of the `'Text'` column.

```python
import pandas as pd

def add_word_count_column(df):
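    # Create a new column 'Word_Count' with the number of words in each row of 'Text';
    # missing values (NaN/None) count as 0 words instead of being turned into the string 'nan'
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()) if pd.notna(x) else 0)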
    return df

# Example usage
data = {'Text': ['Hello world', 'Pandas is great for data analysis', 'Python programming']}
df = pd.DataFrame(data)

# Add the 'Word_Count' column
df_with_word_count = add_word_count_column(df)
print(df_with_word_count)
```

### Explanation:
- The function uses the `.apply()` method to apply a lambda function to each row in the `'Text'` column.
- The lambda function converts each entry to a string (in case it's not already) and then splits it into words using `.split()`.
- The `len()` function counts the number of words in the resulting list and assigns this count to the `'Word_Count'` column.
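A vectorized alternative avoids `apply()` by using pandas' `.str` string methods. In this sketch the `None` entry is a hypothetical missing value; `fillna('')` makes it count as zero words, since without it `.str.len()` would return `NaN` for that row.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Hello world', 'Pandas is great for data analysis', None]})

# fillna('') turns missing entries into empty strings, which split into zero words;
# .str.split() splits on whitespace and .str.len() counts the resulting words
df['Word_Count'] = df['Text'].fillna('').str.split().str.len()
print(df)
```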

### Example Output:
```plaintext
                                Text  Word_Count
0                        Hello world           2
1  Pandas is great for data analysis           6
2                 Python programming           2
```

This function will correctly compute the number of words in each row of the `'Text'` column and store the result in the `'Word_Count'` column.

# Q5. How are DataFrame.size() and DataFrame.shape() different?

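`DataFrame.size` and `DataFrame.shape` are both attributes (properties) of a Pandas DataFrame, not methods, so they are accessed without parentheses; calling `df.size()` raises a `TypeError` because an integer is not callable. They serve different purposes: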

### 1. **`DataFrame.size`**
- **Purpose**: Returns the total number of elements in the DataFrame.
- **Calculation**: It is calculated as the number of rows multiplied by the number of columns.
- **Type**: It returns a single integer value.
  
  **Example**:
  ```python
  import pandas as pd

  df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
  print(df.size)
  ```
  **Output**:
  ```plaintext
  6
  ```
  In this case, `df.size` is 6 because the DataFrame has 3 rows and 2 columns (3 * 2 = 6).

### 2. **`DataFrame.shape`**
- **Purpose**: Returns the dimensions of the DataFrame as a tuple.
- **Calculation**: It provides the number of rows and columns in the form `(number_of_rows, number_of_columns)`.
- **Type**: It returns a tuple with two elements.
  
  **Example**:
  ```python
  print(df.shape)
  ```
  **Output**:
  ```plaintext
  (3, 2)
  ```
  Here, `df.shape` returns `(3, 2)`, indicating the DataFrame has 3 rows and 2 columns.

### Summary:
- **`DataFrame.size`** gives the total number of elements (rows * columns).
- **`DataFrame.shape`** gives the structure of the DataFrame, indicating the number of rows and columns as a tuple `(rows, columns)`.
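The relationship between the two attributes can be checked directly. This short sketch reuses the 3-row, 2-column example DataFrame above and also shows `len(df)` and a single column (a Series) for comparison.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
rows, cols = df.shape
print(rows, cols)              # 3 2

# size is always the product of the two shape entries
print(df.size == rows * cols)  # True

# len() counts rows only, matching shape[0]
print(len(df) == df.shape[0])  # True

# A Series is one-dimensional, so its shape is a 1-tuple
s = df['A']
print(s.shape, s.size)         # (3,) 3
```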

# Q6. Which function of pandas do we use to read an excel file?

To read an Excel file into a Pandas DataFrame, you use the `pandas` function **`read_excel()`**.

### Example Usage:

```python
import pandas as pd

# Read an Excel file into a DataFrame
df = pd.read_excel('file_name.xlsx')

# Display the DataFrame
print(df)
```

### Key Points:
- **File Path**: The `'file_name.xlsx'` is the path to the Excel file you want to read.
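- **Sheet Name**: By default (`sheet_name=0`), `read_excel()` reads the first sheet in the Excel file. You can select a different sheet by name or by zero-based position using the `sheet_name` parameter; `sheet_name=None` reads every sheet and returns a dict of DataFrames keyed by sheet name.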
- **Additional Options**: `read_excel()` has several optional parameters, such as `header`, `names`, `index_col`, etc., to customize the reading process.

### Example with Sheet Name:

```python
# Read a specific sheet by name
df = pd.read_excel('file_name.xlsx', sheet_name='Sheet2')
```

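This function supports `.xls`, `.xlsx`, `.xlsm`, `.xlsb`, `.odf`, `.ods`, and `.odt` files, but each format relies on an optional engine package being installed, for example `openpyxl` for `.xlsx`/`.xlsm`, `xlrd` for legacy `.xls`, `pyxlsb` for `.xlsb`, and `odfpy` for OpenDocument files.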

# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address. The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

Here’s a Python function that creates a new column `'Username'` in the DataFrame `df`. This column will contain only the username part of each email address from the `'Email'` column.

```python
import pandas as pd

def extract_username(df):
    # Create a new column 'Username' by extracting the part of the email before '@'
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

# Example usage
data = {'Email': ['john.doe@example.com', 'jane.smith@domain.com', 'user123@website.org']}
df = pd.DataFrame(data)

# Extract usernames and add them to the 'Username' column
df_with_usernames = extract_username(df)
print(df_with_usernames)
```

### Explanation:
- The `.apply()` function is used to apply a lambda function to each element in the `'Email'` column.
- The `x.split('@')[0]` expression splits the email string at the `'@'` symbol and takes the first part (the username).
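The `apply()` version calls `x.split('@')` on every entry, so a missing email (`None` or `NaN`) would raise an `AttributeError`. A vectorized alternative, sketched below with a hypothetical missing entry, uses the `.str` accessor, which passes missing values through as `NaN`. Here `n=1` stops after the first split, since only the part before the first `'@'` is needed.

```python
import pandas as pd

df = pd.DataFrame({'Email': ['john.doe@example.com', None, 'user123@website.org']})

# .str.split() returns a list per row (or NaN for a missing value);
# .str[0] then takes the first element of each list, i.e. the username
df['Username'] = df['Email'].str.split('@', n=1).str[0]
print(df)
```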

### Example Output:
```plaintext
                   Email    Username
0   john.doe@example.com    john.doe
1  jane.smith@domain.com  jane.smith
2    user123@website.org     user123
```

This function will correctly extract the username from each email address in the `'Email'` column and store it in the new `'Username'` column.

# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows. For example, if df contains the following values: 
```plaintext
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2
```

Your function should select the following rows:

```plaintext
   A  B  C
1  8  2  7
4  9  1  2
```

The function should return a new DataFrame that contains only the selected rows.

Here’s a Python function that selects all rows in a DataFrame `df` where the value in column `'A'` is greater than 5 and the value in column `'B'` is less than 10. The function returns a new DataFrame containing only the selected rows.

```python
import pandas as pd

def select_rows(df):
    # Select rows where 'A' > 5 and 'B' < 10
    selected_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_df

# Example usage
data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Apply the function to select specific rows
selected_df = select_rows(df)
print(selected_df)
```

### Explanation:
- The function uses a boolean condition to filter the DataFrame:
  - `df['A'] > 5`: Selects rows where the value in column `'A'` is greater than 5.
  - `df['B'] < 10`: Selects rows where the value in column `'B'` is less than 10.
- The `&` operator is used to combine both conditions.
- The filtered rows are returned as a new DataFrame `selected_df`.
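The same filter can also be written with `DataFrame.query()`, which takes the conditions as a string expression where `and` plays the role of `&`. This is an alternative sketch rather than a change to the function above; for the sample data it returns the same rows as boolean indexing.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Column names are referenced directly inside the expression string
selected = df.query('A > 5 and B < 10')
print(selected)
```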

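### Example Output:
```plaintext
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
```

Row 2 (`A` = 6, `B` = 9) also satisfies both conditions, so the function returns it as well, even though the expected output in the question lists only rows 1 and 4.

This function selects and returns exactly the rows where both conditions are met.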

# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

Here’s a Python function that calculates the mean, median, and standard deviation of the values in the `'Values'` column of a DataFrame `df`.

```python
import pandas as pd

def calculate_statistics(df):
    # Calculate the mean of the 'Values' column
    mean_value = df['Values'].mean()
    
    # Calculate the median of the 'Values' column
    median_value = df['Values'].median()
    
    # Calculate the standard deviation of the 'Values' column
    std_dev = df['Values'].std()
    
    return mean_value, median_value, std_dev

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

mean_value, median_value, std_dev = calculate_statistics(df)

print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Standard Deviation: {std_dev}")
```

### Explanation:
- **Mean**: The average of the values in the `'Values'` column is calculated using `.mean()`.
- **Median**: The middle value of the sorted values in the `'Values'` column is calculated using `.median()`.
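- **Standard Deviation**: The amount of variation or dispersion of the values in the `'Values'` column is calculated using `.std()`. By default pandas returns the *sample* standard deviation (`ddof=1`, dividing by n - 1), which is why the result for this data is sqrt(1000 / 4) = sqrt(250).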

### Example Output:
```plaintext
Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
```

This function will return the mean, median, and standard deviation of the values in the `'Values'` column, and you can print them out or use them in further analysis.
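If you want all three statistics in one object, `Series.agg()` accepts a list of function names. The sketch below also shows the `ddof` parameter: passing `ddof=0` gives the population standard deviation instead of pandas' default sample version.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# agg() computes several statistics in one call and returns them as a Series
stats = values.agg(['mean', 'median', 'std'])
print(stats)

# ddof=1 (the default) divides by n - 1; ddof=0 divides by n
print(values.std(ddof=1))  # about 15.811
print(values.std(ddof=0))  # about 14.142
```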

# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

Here’s a Python function that adds a new column `'MovingAverage'` to a DataFrame `df`. This column will contain the 7-day moving average of the `'Sales'` column, calculated for each row and including the current day.

```python
import pandas as pd

def add_moving_average(df):
    # Ensure the DataFrame is sorted by 'Date' if not already
    df = df.sort_values(by='Date')
    
    # Calculate the 7-day moving average of the 'Sales' column
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df

# Example usage
data = {
    'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
    'Sales': [100, 120, 130, 110, 150, 160, 170, 180, 140, 200]
}
df = pd.DataFrame(data)

# Add the 'MovingAverage' column
df_with_moving_avg = add_moving_average(df)
print(df_with_moving_avg)
```

### Explanation:
- **Sorting by Date**: The DataFrame is first sorted by the `'Date'` column to ensure the moving average is calculated in the correct chronological order.
- **`rolling(window=7, min_periods=1).mean()`**:
  - `window=7`: Specifies a window of 7 rows for the moving average. This equals 7 days only when the data has exactly one row per day with no missing dates.
  - `min_periods=1`: Ensures that the moving average can be calculated even if there are fewer than 7 days of data (e.g., at the start of the dataset).
  - `.mean()`: Calculates the mean of the values within the rolling window.
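Because `rolling(window=7)` counts rows rather than calendar days, gaps in the dates stretch the window over more than a week. If the intent is strictly the last 7 calendar days, one option is a time-based window string such as `'7D'` applied to a `DatetimeIndex`. The function name `add_calendar_moving_average` and the sample data (with a deliberate gap between January 2 and January 10) are hypothetical.

```python
import pandas as pd

def add_calendar_moving_average(df):
    """Hypothetical variant: average over the last 7 calendar days, not 7 rows."""
    df = df.copy()
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.sort_values(by='Date')

    # '7D' covers (current date - 7 days, current date]: the current day plus the
    # six previous calendar days. Time-based windows default to min_periods=1.
    rolled = df.set_index('Date')['Sales'].rolling('7D').mean()

    # Assign by position, because rolled is indexed by Date, not by df's row labels
    df['MovingAverage'] = rolled.to_numpy()
    return df

# Sample data with a gap between January 2 and January 10
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-10'], 'Sales': [10, 20, 30]}
result = add_calendar_moving_average(pd.DataFrame(data))
print(result)
```

With a 7-row window, the January 10 row would average all three sales (20.0); the calendar window sees only that day's sale (30.0).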

### Example Output:
```plaintext
        Date  Sales  MovingAverage
0 2023-01-01    100     100.000000
1 2023-01-02    120     110.000000
2 2023-01-03    130     116.666667
3 2023-01-04    110     115.000000
4 2023-01-05    150     122.000000
5 2023-01-06    160     128.333333
6 2023-01-07    170     134.285714
7 2023-01-08    180     145.714286
8 2023-01-09    140     148.571429
9 2023-01-10    200     158.571429
```

This function will compute the 7-day moving average of sales and store the results in the `'MovingAverage'` column, considering the past 7 days, including the current day for each row.

# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column. For example, if df contains the following values: 
```plaintext
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
```

Your function should create the following DataFrame:

```plaintext
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```

The function should return the modified DataFrame.

Here’s a Python function that adds a new column `'Weekday'` to a DataFrame `df`. This column will contain the name of the weekday corresponding to each date in the `'Date'` column.

```python
import pandas as pd

def add_weekday_column(df):
    # Convert 'Date' column to datetime format if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Create the 'Weekday' column with the weekday names
    df['Weekday'] = df['Date'].dt.day_name()
    
    return df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Add the 'Weekday' column
df_with_weekday = add_weekday_column(df)
print(df_with_weekday)
```

### Explanation:
- **`pd.to_datetime(df['Date'])`**: Converts the `'Date'` column to a datetime format, which is necessary for extracting the weekday names.
- **`df['Date'].dt.day_name()`**: Extracts the weekday name from each date in the `'Date'` column.
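If you need the weekday as a number (for sorting or arithmetic) rather than a name, the `.dt` accessor also provides `dayofweek`, where Monday is 0 and Sunday is 6. A minimal sketch with two of the example dates:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2023-01-01', '2023-01-02']))

print(dates.dt.day_name().tolist())  # ['Sunday', 'Monday']
print(dates.dt.dayofweek.tolist())   # [6, 0]
```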

### Example Output:
```plaintext
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```

This function will add the `'Weekday'` column to the DataFrame, providing the name of the weekday for each date.

# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

Here's a Python function that selects all rows in a DataFrame `df` where the date in the `'Date'` column falls between `'2023-01-01'` and `'2023-01-31'`.

```python
import pandas as pd

def select_date_range(df):
    # Convert 'Date' column to datetime format if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the range as [2023-01-01, 2023-02-01): the end is exclusive at
    # midnight on February 1, so timestamps later in the day on January 31
    # (e.g. 2023-01-31 15:30) are still included
    start_date = pd.Timestamp('2023-01-01')
    end_date = pd.Timestamp('2023-02-01')
    
    # Filter the DataFrame to select rows within the date range
    filtered_df = df[(df['Date'] >= start_date) & (df['Date'] < end_date)]
    
    return filtered_df

# Example usage
data = {'Date': ['2023-01-01', '2023-01-15', '2023-02-01', '2023-01-25', '2023-03-01']}
df = pd.DataFrame(data)

# Select rows within the date range
filtered_df = select_date_range(df)
print(filtered_df)
```

### Explanation:
- **`pd.to_datetime(df['Date'])`**: Converts the `'Date'` column to datetime format to ensure proper comparison.
- **`start_date` and `end_date`**: Define a half-open range from `2023-01-01` (inclusive) up to `2023-02-01` (exclusive). A bound of `<= '2023-01-31'` would only match up to midnight at the start of January 31 and silently drop timestamps later that day.
- **Filtering**: The condition `(df['Date'] >= start_date) & (df['Date'] < end_date)` selects rows whose timestamp falls anywhere in January 2023.

### Example Output:
```plaintext
        Date
0 2023-01-01
1 2023-01-15
3 2023-01-25
```

This function filters the DataFrame to include only rows where the `'Date'` falls between `'2023-01-01'` and `'2023-01-31'`.
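If you prefer to keep an inclusive end date, one alternative is to strip the time of day with `.dt.normalize()` before comparing. The sketch below uses made-up timestamps to contrast that with a plain `<= '2023-01-31'` comparison, which treats the bound as midnight at the start of January 31.

```python
import pandas as pd

# Hypothetical timestamps that include a time of day
df = pd.DataFrame({'Date': pd.to_datetime([
    '2023-01-01 00:00', '2023-01-31 00:00', '2023-01-31 15:30', '2023-02-01 08:00',
])})

# The bare date string '2023-01-31' means midnight, so 15:30 that day is excluded
naive = df[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

# normalize() sets every timestamp to midnight, so all of Jan 31 matches the bound;
# between() is inclusive on both ends by default
normalized = df[df['Date'].dt.normalize().between('2023-01-01', '2023-01-31')]

print(len(naive), len(normalized))  # 2 3
```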

# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use the basic functions of pandas, the first and foremost necessary library that needs to be imported is **`pandas`** itself. 

You can import it using:

```python
import pandas as pd
```

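This import statement makes the `pandas` library available in your script, and `pd` is a commonly used alias for pandas, allowing you to access its functions and data structures more conveniently. For example, you can then use `pd.DataFrame()`, `pd.Series()`, and `pd.read_csv()`. Pandas is built on NumPy, which is installed automatically as a dependency, so you only need to import NumPy yourself (`import numpy as np`) when you call NumPy functions directly.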