**Q1. List any five functions of the pandas library with execution.**

Five commonly used pandas functions are listed below, each with a short example. Install pandas first if it is not already available:

```bash
pip install pandas
```

Now, let's look at five functions:

1. **`read_csv()`**: This function is used to read data from a CSV file into a DataFrame.

```python
import pandas as pd

# Example CSV file: data.csv (no spaces after the commas, otherwise
# the column names would be read as ' Age' and ' City')
# Name,Age,City
# Alice,25,New York
# Bob,30,San Francisco

# Reading CSV into a DataFrame
df = pd.read_csv('data.csv')
print(df)
```

2. **`head()`**: This function is used to display the first n rows of a DataFrame. By default, it shows the first 5 rows.

```python
# Displaying the first 3 rows of the DataFrame
print(df.head(3))
```

3. **`info()`**: This function provides a concise summary of a DataFrame, including the data types, non-null values, and memory usage.

```python
# Displaying information about the DataFrame
# (info() prints the summary itself and returns None, so no print() is needed)
df.info()
```

4. **`describe()`**: This function generates descriptive statistics for the numeric columns by default: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.

```python
# Displaying descriptive statistics of the DataFrame
print(df.describe())
```

5. **`groupby()`**: This function is used to split the data into groups based on some criteria and then apply a function to each group independently.

```python
# Grouping the DataFrame by the 'City' column and calculating the mean age in each city
grouped_data = df.groupby('City')['Age'].mean()
print(grouped_data)
```

These are just a few examples, and pandas has a wide range of functions for data manipulation, analysis, and visualization.
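
The snippets above depend on a `data.csv` file on disk. As a minimal sketch that runs without one, the same five calls can be tried on CSV text held in memory. The `csv_text` sample (including the extra "Carol" row) and the variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data held in memory, standing in for data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Carol,35,New York
"""

# read_csv accepts file-like objects as well as paths
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))      # first two rows
df.info()              # prints its summary directly and returns None
print(df.describe())   # statistics for the numeric column 'Age'

# Mean age per city
mean_age_by_city = df.groupby('City')['Age'].mean()
print(mean_age_by_city)
```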

**Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.**

One way to do this is to pass a new `pd.Index` to `set_index`. The function below takes a DataFrame and returns a copy whose index starts at 1 and increases by 2 for each row:

```python
import pandas as pd

def reindex_dataframe(df):
    # Building the new index labels 1, 3, 5, ... (one label per row)
    new_index = range(1, 2 * len(df) + 1, 2)
    
    # Setting the new index to the DataFrame
    df_reindexed = df.set_index(pd.Index(new_index))
    
    return df_reindexed

# Example usage:
# Assuming df is your original DataFrame with columns 'A', 'B', and 'C'
data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

df = pd.DataFrame(data)

# Calling the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Displaying the re-indexed DataFrame
print(df_reindexed)
```

In this example, the new index starts from 1 and increments by 2 for each row. Adjust the function according to your specific DataFrame structure and requirements.
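
An alternative sketch, assuming it is acceptable to assign to the `index` attribute of a copy, uses `pd.RangeIndex`, which stores only the start, stop, and step. The function name `reindex_with_range_index` is hypothetical.

```python
import pandas as pd

def reindex_with_range_index(df):
    # Work on a copy so the caller's DataFrame keeps its original index
    result = df.copy()
    # RangeIndex(1, 2n + 1, 2) yields the labels 1, 3, 5, ... for n rows
    result.index = pd.RangeIndex(start=1, stop=2 * len(result) + 1, step=2)
    return result

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
df_reindexed = reindex_with_range_index(df)
print(df_reindexed)
```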

**Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.**

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

The `head()` method selects the first three values in the 'Values' column and `sum()` adds them up, so no explicit loop is needed. Here's a Python function that does this:

```python
import pandas as pd

def calculate_sum_of_first_three(df):
    # Selecting the first three values in the 'Values' column
    first_three_values = df['Values'].head(3)
    
    # Calculating the sum
    sum_of_first_three = first_three_values.sum()
    
    # Printing the sum to the console
    print("Sum of the first three values:", sum_of_first_three)

# Example usage:
# Assuming df is your DataFrame with the 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calling the function to calculate and print the sum of the first three values
calculate_sum_of_first_three(df)
```

This function first extracts the first three values in the 'Values' column using `head(3)` and then calculates their sum using the `sum()` function. Finally, it prints the result to the console. Adjust the function according to your specific DataFrame structure if needed.
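
Since the question asks for a function that iterates, here is a minimal sketch with an explicit loop. The function name is hypothetical, and returning the total in addition to printing it is an added convenience, not something the question requires.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # Walk over the values one at a time and stop after the third
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(df)
```

For large DataFrames the vectorized `head(3).sum()` version is usually preferable; the loop is shown only to match the wording of the question.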

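**Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.**

You can achieve this by using the `apply` function along with a custom function that counts the number of words in each row of the 'Text' column. Here's a Python function that adds a new column 'Word_Count' to the DataFrame with the word count for each row: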

```python
import pandas as pd

def add_word_count_column(df):
    # Custom function to count words in a given text
    def count_words(text):
        words = text.split()
        return len(words)
    
    # Applying the custom function to each row of the 'Text' column and creating a new 'Word_Count' column
    df['Word_Count'] = df['Text'].apply(count_words)
    
    return df

# Example usage:
# Assuming df is your DataFrame with the 'Text' column
data = {'Text': ['This is an example.', 'Count the words in this sentence.', 'Pandas is powerful.']}
df = pd.DataFrame(data)

# Calling the function to add the 'Word_Count' column
df_with_word_count = add_word_count_column(df)

# Displaying the DataFrame with the new 'Word_Count' column
print(df_with_word_count)
```

This function defines a custom `count_words` function, which splits a given text into words and returns the count. Then, it uses the `apply` function to apply this custom function to each row in the 'Text' column, creating a new 'Word_Count' column in the DataFrame. Adjust the function according to your specific DataFrame structure if needed.
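
The same count can also be sketched with the vectorized string accessor instead of `apply`. This version assumes missing text should stay missing rather than raise an error. The function name is hypothetical.

```python
import pandas as pd

def add_word_count_vectorized(df):
    # Copy so the caller's DataFrame is left unchanged
    result = df.copy()
    # str.split() splits on whitespace; str.len() counts the resulting pieces.
    # Missing values propagate as NaN instead of raising an AttributeError.
    result['Word_Count'] = result['Text'].str.split().str.len()
    return result

df = pd.DataFrame({'Text': ['This is an example.', 'Pandas is powerful.', None]})
df_counts = add_word_count_vectorized(df)
print(df_counts)
```

Because of the missing value, the new column is stored as floating point here; with no missing text it would hold integers.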

**Q5. How are DataFrame.size() and DataFrame.shape() different?**

`DataFrame.size` and `DataFrame.shape` are both attributes in pandas that provide information about the dimensions of a DataFrame, but they serve different purposes:

1. **`DataFrame.size`**:
   - **Type**: Attribute
   - **Description**: Returns the total number of elements in the DataFrame, which is equal to the product of the number of rows and columns.
   - **Usage**: `df.size`

   Example:
   ```python
   import pandas as pd

   data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
   df = pd.DataFrame(data)

   size_of_df = df.size
   print("Size of DataFrame:", size_of_df)
   ```

2. **`DataFrame.shape`**:
   - **Type**: Attribute
   - **Description**: Returns a tuple representing the dimensions of the DataFrame in the form `(number of rows, number of columns)`.
   - **Usage**: `df.shape`

   Example:
   ```python
   import pandas as pd

   data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
   df = pd.DataFrame(data)

   shape_of_df = df.shape
   print("Shape of DataFrame:", shape_of_df)
   ```

In summary:
- `DataFrame.size` provides the total number of elements (cells) in the DataFrame.
- `DataFrame.shape` provides the dimensions of the DataFrame as a tuple, with the number of rows and columns.

Remember, both of these are attributes, not methods, so you access them without using parentheses.
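
A short sketch of how the two relate, and of what happens if `size` is called like a method:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, columns = df.shape   # shape unpacks into (rows, columns)
print(df.shape)
print(df.size)
print(rows * columns == df.size)

# size is an attribute holding an integer, so calling it raises TypeError
try:
    df.size()
except TypeError as error:
    print("Calling df.size() fails:", error)
```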

**Q6. Which function of pandas do we use to read an excel file?**

To read an Excel file in pandas, you can use the `read_excel()` function. This function is part of the pandas library and is specifically designed to read data from Excel files into a DataFrame.

Here's a basic example:

```python
import pandas as pd

# Replace 'your_excel_file.xlsx' with the actual path or URL to your Excel file
excel_file_path = 'your_excel_file.xlsx'

# Reading Excel file into a DataFrame
df = pd.read_excel(excel_file_path)

# Displaying the DataFrame
print(df)
```

For `.xlsx` files, pandas relies on the `openpyxl` library, so install it if you haven't already:

```bash
pip install openpyxl
```

If your Excel file is in an older format (`.xls`), you might need to use the `xlrd` engine, which you can install using:

```bash
pip install xlrd
```

In the `read_excel()` function, you provide the path or URL to your Excel file, and it returns a DataFrame containing the data from the specified Excel sheet.

**Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.**


The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

You can use the `str.split()` method to split the email addresses on the '@' symbol and then extract the username part. Here's a Python function that does this:

```python
import pandas as pd

def extract_username(df):
    # Extracting the username from the 'Email' column and creating a new 'Username' column
    df['Username'] = df['Email'].str.split('@').str[0]
    
    return df

# Example usage:
# Assuming df is your DataFrame with the 'Email' column
data = {'Email': ['john.doe@example.com', 'alice.smith@example.com', 'bob.jones@example.com']}
df = pd.DataFrame(data)

# Calling the function to add the 'Username' column
df_with_username = extract_username(df)

# Displaying the DataFrame with the new 'Username' column
print(df_with_username)
```

This function uses the `str.split('@')` to split each email address into two parts based on the '@' symbol, and then `str[0]` is used to select the first part, which is the username. The result is stored in a new 'Username' column in the DataFrame.

Adjust the function according to your specific DataFrame structure if needed.

**Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:**

A B C<br>
0 3 5 1<br>
1 8 2 7<br>
2 6 9 4<br>
3 2 3 5<br>
4 9 1 2<br>

Your function should select the following rows: 

A B C<br>
1 8 2 7<br>
4 9 1 2<br>

The function should return a new DataFrame that contains only the selected rows.

You can use boolean indexing to filter the rows based on the given conditions. Here's a Python function that does this:

```python
import pandas as pd

def filter_rows(df):
    # Selecting rows where 'A' is greater than 5 and 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return selected_rows

# Example usage:
# Assuming df is your DataFrame with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Calling the function to select rows based on conditions
selected_rows_df = filter_rows(df)

# Displaying the new DataFrame with selected rows
print(selected_rows_df)
```

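This function uses boolean indexing to select rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The resulting DataFrame, `selected_rows_df`, contains only the rows that meet these conditions. Note that in the sample data, row 2 (A = 6, B = 9) also satisfies both conditions, so the function returns rows 1, 2, and 4 rather than only the two rows listed in the question. Adjust the function according to your specific DataFrame structure if needed.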
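
As an alternative sketch, the same filter can be written with `DataFrame.query`, which takes the condition as a string. The function name is hypothetical.

```python
import pandas as pd

def filter_rows_with_query(df):
    # The expression refers to columns by name; 'and' combines the conditions
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
selected = filter_rows_with_query(df)

# With this sample data, rows 1, 2, and 4 satisfy both conditions
print(selected)
```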

**Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.**

You can use the `mean()`, `median()`, and `std()` methods in pandas to calculate the mean, median, and standard deviation of a specific column. Here's a Python function that does this:

```python
import pandas as pd

def calculate_statistics(df):
    # Calculating mean, median, and standard deviation of the 'Values' column
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()
    
    # Creating a dictionary to store the results
    statistics_dict = {
        'Mean': mean_value,
        'Median': median_value,
        'Standard Deviation': std_deviation
    }
    
    return statistics_dict

# Example usage:
# Assuming df is your DataFrame with the 'Values' column
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calling the function to calculate statistics
statistics_result = calculate_statistics(df)

# Displaying the calculated statistics
print(statistics_result)
```

This function calculates the mean, median, and standard deviation of the values in the 'Values' column and stores the results in a dictionary. You can adjust the function according to your specific DataFrame structure if needed.
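
One detail worth knowing: `std()` in pandas computes the sample standard deviation by default (`ddof=1`, dividing by n - 1). The sketch below compares it with the population version (`ddof=0`). Which one is appropriate depends on the data, so treat this as an illustration only.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

sample_std = values.std()             # default ddof=1 divides by n - 1
population_std = values.std(ddof=0)   # ddof=0 divides by n

print("Sample standard deviation:", sample_std)
print("Population standard deviation:", population_std)
```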

**Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.**

```python
import pandas as pd

def calculate_moving_average(df):
    # Sorting the DataFrame by the 'Date' column (assuming it's not already sorted)
    df = df.sort_values(by='Date')

    # Calculating the moving average with a window size of 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example usage:
# Assuming df is your DataFrame with the 'Sales' and 'Date' columns
data = {'Sales': [10, 15, 20, 25, 30, 35, 40, 45],
        'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08']}
df = pd.DataFrame(data)

# Converting the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Calling the function to calculate moving average
df_with_moving_average = calculate_moving_average(df)

# Displaying the DataFrame with the new 'MovingAverage' column
print(df_with_moving_average)
```

Output:
```
   Sales       Date  MovingAverage
0     10 2023-01-01           10.0
1     15 2023-01-02           12.5
2     20 2023-01-03           15.0
3     25 2023-01-04           17.5
4     30 2023-01-05           20.0
5     35 2023-01-06           22.5
6     40 2023-01-07           25.0
7     45 2023-01-08           30.0
```
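
The row-based window above counts the last 7 rows, which matches "the past 7 days" only when every day is present. As a hedged sketch for data with missing days, a time-based window (`'7D'`) on a datetime index can be used instead. The function name and the sample data with a gap are made up for illustration.

```python
import pandas as pd

def add_calendar_moving_average(df):
    # Sort by date and renumber the rows; the caller's DataFrame is not modified
    result = df.sort_values('Date').reset_index(drop=True)
    # A '7D' window covers the current day and the six calendar days before it
    rolling_mean = result.set_index('Date')['Sales'].rolling('7D').mean()
    result['MovingAverage'] = rolling_mean.to_numpy()
    return result

# Hypothetical data with a gap between 2023-01-02 and 2023-01-10
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10']),
    'Sales': [10, 20, 30],
})
df_calendar = add_calendar_moving_average(df)
print(df_calendar)
```

Here the last row averages only its own value, because no other sale falls inside its 7-day window. A 7-row window would have averaged all three rows.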


**Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.**<br>

For example, if df contains the following values:<br>
Date<br>
0 2023-01-01<br>
1 2023-01-02<br>
2 2023-01-03<br>
3 2023-01-04<br>
4 2023-01-05<br>

Your function should create the following DataFrame:<br>

Date Weekday<br>
0 2023-01-01 Sunday<br>
1 2023-01-02 Monday<br>
2 2023-01-03 Tuesday<br>
3 2023-01-04 Wednesday<br>
4 2023-01-05 Thursday<br>
The function should return the modified DataFrame.<br>

You can use the `dt` accessor in pandas to extract the weekday name from the 'Date' column. Here's a Python function that does this:

```python
import pandas as pd

def add_weekday_column(df):
    # Converting the 'Date' column to datetime type
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Extracting the weekday name and creating a new 'Weekday' column
    df['Weekday'] = df['Date'].dt.strftime('%A')
    
    return df

# Example usage:
# Assuming df is your DataFrame with the 'Date' column
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Calling the function to add the 'Weekday' column
df_with_weekday = add_weekday_column(df)

# Displaying the DataFrame with the new 'Weekday' column
print(df_with_weekday)
```

Output:
```
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
```

In this example, the `pd.to_datetime()` function is used to convert the 'Date' column to datetime type. Then, the `dt.strftime('%A')` extracts the weekday name and creates a new 'Weekday' column in the DataFrame. Adjust the function according to your specific DataFrame structure if needed.
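
The `dt` accessor also offers `day_name()`, which returns the weekday name directly. A minimal sketch, with a hypothetical function name and working on a copy so the input is left unchanged:

```python
import pandas as pd

def add_weekday_with_day_name(df):
    result = df.copy()
    result['Date'] = pd.to_datetime(result['Date'])
    # day_name() returns names such as 'Sunday' and 'Monday'
    result['Weekday'] = result['Date'].dt.day_name()
    return result

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})
df_weekdays = add_weekday_with_day_name(df)
print(df_weekdays)
```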

**Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.**

You can use boolean indexing to select rows based on a date range in the 'Date' column. Here's a Python function that does this:

```python
import pandas as pd

def select_rows_in_date_range(df):
    # Converting the 'Date' column to datetime type
    df['Date'] = pd.to_datetime(df['Date'])

    # Defining the date range
    start_date = '2023-01-01'
    end_date = '2023-01-31'

    # Selecting rows where the date is between start_date and end_date
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    
    return selected_rows

# Example usage:
# Assuming df is your DataFrame with the 'Date' column
data = {'Date': ['2023-01-01', '2023-01-15', '2023-02-05', '2023-01-25', '2023-01-10']}
df = pd.DataFrame(data)

# Calling the function to select rows in the date range
selected_rows_df = select_rows_in_date_range(df)

# Displaying the DataFrame with selected rows
print(selected_rows_df)
```

Output:
```
        Date
0 2023-01-01
1 2023-01-15
3 2023-01-25
4 2023-01-10
```

In this example, the 'Date' column is first converted to datetime format using `pd.to_datetime()`. Then, the function selects rows where the date is between '2023-01-01' and '2023-01-31' using boolean indexing. Adjust the function according to your specific DataFrame structure if needed.
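
One caveat when the column holds timestamps with a time of day: the bound `'2023-01-31'` is interpreted as midnight, so a timestamp later on 31 January falls outside `<= end_date`. A sketch that assumes the whole of 31 January should be included uses an exclusive upper bound of 1 February instead. The function name and sample timestamps are hypothetical.

```python
import pandas as pd

def select_january_2023(df):
    # Parse without modifying the caller's DataFrame
    dates = pd.to_datetime(df['Date'])
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')
    # Half-open interval: start <= date < end_exclusive
    return df[(dates >= start) & (dates < end_exclusive)]

df = pd.DataFrame({'Date': ['2023-01-01 00:00',
                            '2023-01-31 18:30',
                            '2023-02-01 00:00']})
january_rows = select_january_2023(df)
print(january_rows)
```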

**Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?**

The first and foremost necessary library that needs to be imported to use the basic functions of pandas is, unsurprisingly, the `pandas` library itself. You typically import it using the following import statement:

```python
import pandas as pd
```

The `pd` is a commonly used alias for the `pandas` library. This alias is not required, but it is a widely adopted convention to make the code more concise and readable.

Once you've imported pandas, you can use its various functions and features to work with data in a tabular format, such as DataFrames and Series.