In [1]:
# Q1. List any five functions of the pandas library with execution.

Here are five commonly used pandas functions with example usage. The examples assume a file `example.csv` that contains a 'Category' column and a numeric 'Value' column:

1. **`read_csv()` - Read a CSV file into a DataFrame:**
   ```python
   import pandas as pd

   # Reading a CSV file into a DataFrame
   df = pd.read_csv('example.csv')

   # Displaying the DataFrame
   print(df)
   ```

2. **`head()` - Display the first n rows of a DataFrame:**
   ```python
   # Displaying the first 3 rows of the DataFrame
   df.head(3)
   ```

3. **`describe()` - Generate descriptive statistics of a DataFrame:**
   ```python
   # Displaying summary statistics of the DataFrame
   df.describe()
   ```

4. **`groupby()` - Group a DataFrame by one or more columns:**
   ```python
   # Grouping the DataFrame by the 'Category' column and calculating the mean of 'Value' for each group
   grouped_df = df.groupby('Category')['Value'].mean()

   # Displaying the result
   print(grouped_df)
   ```

5. **`plot()` - Create a plot of a DataFrame:**
   ```python
   import matplotlib.pyplot as plt

   # Plotting the 'Value' column of the DataFrame
   df['Value'].plot(kind='bar')

   # Displaying the plot
   plt.show()
   ```

These are only a few examples; pandas offers many more functions for data manipulation, analysis, and visualization, and the right choice depends on the task at hand.
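
Because `example.csv` is not included here, the snippets above cannot run as-is. The following is a minimal self-contained sketch that uses a small in-memory CSV instead; the CSV content and the names `csv_text`, `first_rows`, `summary`, and `group_means` are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'example.csv'
csv_text = """Category,Value
A,10
B,20
A,30
B,40
"""

# read_csv accepts file-like objects as well as paths
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(3)                                # first three rows
summary = df.describe()                                # statistics for numeric columns
group_means = df.groupby('Category')['Value'].mean()   # mean of 'Value' per category

print(first_rows)
print(summary)
print(group_means)
```

Plotting is left out of this sketch so that it runs without matplotlib installed.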

In [None]:
# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

`DataFrame.set_index` accepts an array of labels as well as column names, so passing a `pd.Index` built from `range(1, 2 * len(df) + 1, 2)` replaces the row labels with 1, 3, 5, and so on. Here's a Python function that achieves this:

```python
import pandas as pd

def reindex_dataframe(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, 2 * len(df) + 1, 2)

    # Set the new index for the DataFrame
    df_reindexed = df.set_index(pd.Index(new_index))

    return df_reindexed

# Example usage:
# Assuming df is your original DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

# Call the function to re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Display the re-indexed DataFrame
print(df_reindexed)
```

This will output a DataFrame with a new index starting from 1 and incrementing by 2 for each row:

```
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
```

Because `set_index` returns a new DataFrame by default, the original `df` keeps its 0-based index.
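
An alternative sketch, assuming the same three-row example data, assigns a `pd.RangeIndex` directly to a copy of the DataFrame. The name `df_odd` is hypothetical; the result should match the `set_index` approach.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

# Work on a copy so the original DataFrame keeps its default index
df_odd = df.copy()

# RangeIndex stores only start/stop/step instead of materializing every label
df_odd.index = pd.RangeIndex(start=1, stop=2 * len(df_odd) + 1, step=2)

print(df_odd)
```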

In [2]:
# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

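The function below calculates the sum of the first three values in the 'Values' column. It uses `head(3)` and `sum()` rather than an explicit loop, which is the idiomatic vectorized approach in pandas; if the column has fewer than three rows, `head(3)` simply returns the rows that exist. Here's an example: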

```python
import pandas as pd

def calculate_sum_of_first_three(df):
    # Check if 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # Extract the first three values in the 'Values' column
        first_three_values = df['Values'].head(3)

        # Calculate the sum of the first three values
        sum_of_first_three = first_three_values.sum()

        # Print the result to the console
        print("Sum of the first three values:", sum_of_first_three)
    else:
        print("DataFrame does not have a 'Values' column.")

# Example usage:
# Assuming df is your original DataFrame with a 'Values' column
df = pd.DataFrame({
    'Values': [10, 20, 30, 40, 50]
})

# Call the function to calculate the sum of the first three values
calculate_sum_of_first_three(df)
```

This will output:

```
Sum of the first three values: 60
```
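
Since the question asks for a function that iterates over the DataFrame, here is a sketch of a literal row-by-row version using `DataFrame.iterrows()`. The function name `sum_first_three_by_iteration` and its return value are assumptions added for illustration; the vectorized version above is usually preferable on large data.

```python
import pandas as pd

def sum_first_three_by_iteration(df, column='Values'):
    """Sum the first three values of `column` with an explicit loop."""
    total = 0
    for position, (_, row) in enumerate(df.iterrows()):
        if position == 3:
            break  # stop after the first three rows
        total += row[column]
    print("Sum of the first three values:", total)
    return total  # returned in addition to printing, for easier reuse

example_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_by_iteration(example_df)
```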


In [3]:
# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

You can use `Series.apply` with a lambda function to count the whitespace-separated words in each row and store the result in a new 'Word_Count' column. Here's an example Python function:

```python
import pandas as pd

def add_word_count_column(df):
    # Check if 'Text' column exists in the DataFrame
    if 'Text' in df.columns:
        # Create a new 'Word_Count' column by counting the number of words in each row of 'Text'
        df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))

        # Note: split() with no arguments already ignores leading and trailing
        # whitespace, so no strip() call is needed. A missing value (NaN) becomes
        # the string 'nan' via str(x) and is therefore counted as one word.

        return df
    else:
        print("DataFrame does not have a 'Text' column.")
        return df

# Example usage:
# Assuming df is your original DataFrame with a 'Text' column
df = pd.DataFrame({
    'Text': ['This is an example.', 'Pandas is powerful.', 'Word count in DataFrame.']
})

# Call the function to add the 'Word_Count' column
df = add_word_count_column(df)

# Display the DataFrame with the new 'Word_Count' column
print(df)
```

This will output:

```
                       Text  Word_Count
0       This is an example.           4
1       Pandas is powerful.           3
2  Word count in DataFrame.           4
```
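
A possible variant, sketched here with made-up sample data, uses the vectorized `.str` accessor and treats missing text as zero words instead of one. Whether zero is the right count for missing values is an assumption that depends on your data.

```python
import pandas as pd

# Hypothetical sample with extra whitespace and a missing value
df = pd.DataFrame({'Text': ['This is an example.', '  padded   text  ', None]})

# str.split() with no pattern splits on runs of whitespace;
# str.len() then gives the length of each resulting list (NaN for missing text)
word_counts = df['Text'].str.split().str.len()

# Treat missing text as zero words and store whole numbers
df['Word_Count'] = word_counts.fillna(0).astype(int)

print(df)
```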

# Q5. How are DataFrame.size() and DataFrame.shape() different?
Despite the parentheses in the question, `DataFrame.size` and `DataFrame.shape` are attributes, not methods, so they are accessed without calling them. Both describe the dimensions of a DataFrame, but they report different things.

1. **DataFrame.size:**
   - `DataFrame.size` returns the total number of elements in the DataFrame.
   - It represents the product of the number of rows and the number of columns in the DataFrame.
   - The result is a single integer value.

   Example:
   ```python
   import pandas as pd

   # Creating a DataFrame
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6],
       'C': [7, 8, 9]
   })

   # Displaying the size of the DataFrame
   print("DataFrame.size:", df.size)
   ```

   Output:
   ```
   DataFrame.size: 9
   ```

2. **DataFrame.shape:**
   - `DataFrame.shape` returns a tuple representing the dimensions of the DataFrame.
   - The tuple contains two elements: the number of rows and the number of columns.
   - It provides a clearer breakdown of the structure of the DataFrame.

   Example:
   ```python
   import pandas as pd

   # Creating a DataFrame
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6],
       'C': [7, 8, 9]
   })

   # Displaying the shape of the DataFrame
   print("DataFrame.shape:", df.shape)
   ```

   Output:
   ```
   DataFrame.shape: (3, 3)
   ```

In summary, `DataFrame.size` gives the total number of elements as a single integer, while `DataFrame.shape` gives a `(rows, columns)` tuple. Use `size` for the element count and `shape` when you need the individual dimensions.
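
The relationship between the two attributes is easier to see on a DataFrame that is not square. The sketch below uses a hypothetical 2-row, 3-column frame and also shows what happens if either attribute is called like a method.

```python
import pandas as pd

# Hypothetical DataFrame with 2 rows and 3 columns
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

n_rows, n_cols = df.shape
print("shape:", df.shape)
print("size:", df.size)
print("rows * columns:", n_rows * n_cols)

# Both are attributes, so adding parentheses raises a TypeError
try:
    df.shape()
except TypeError as exc:
    error_message = str(exc)
    print("Calling df.shape() fails:", error_message)
```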

# Q6. Which function of pandas do we use to read an excel file?
To read an Excel file in Pandas, you can use the `pd.read_excel()` function. This function allows you to read the data from an Excel file into a Pandas DataFrame.

Here's an example of how you can use it:

```python
import pandas as pd

# Specify the Excel file path
excel_file_path = 'example.xlsx'

# Read the Excel file into a DataFrame
df = pd.read_excel(excel_file_path)

# Display the DataFrame
print(df)
```

In this example, `example.xlsx` is the name of the Excel file you want to read; replace it with the actual path or URL of your file. Reading `.xlsx` files requires an Excel engine such as `openpyxl` to be installed. The resulting DataFrame (`df`) contains the data from the first sheet by default, and you can then manipulate and analyze it with the usual pandas functions.
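
Since no `example.xlsx` is provided, the following sketch writes a small workbook to an in-memory buffer and reads it back with the `sheet_name` parameter. The sheet name `Data`, the column names, and the variable names are hypothetical, and the sketch assumes an Excel engine such as `openpyxl` is installed; if it is not, `loaded` is left as `None`.

```python
import io

import pandas as pd

# Hypothetical data written to an in-memory workbook instead of a file on disk
source = pd.DataFrame({'Product': ['pen', 'book'], 'Units': [3, 5]})

try:
    buffer = io.BytesIO()
    source.to_excel(buffer, index=False, sheet_name='Data')
    buffer.seek(0)

    # sheet_name selects which worksheet to load
    loaded = pd.read_excel(buffer, sheet_name='Data')
    print(loaded)
except ImportError:
    # Raised when no Excel engine (for example openpyxl) is available
    loaded = None
    print("An Excel engine such as openpyxl is required.")
```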

In [4]:
# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
# addresses in the format 'username@domain.com'. Write a Python function that creates a new column
# 'Username' in df that contains only the username part of each email address.
# The username is the part of the email address that appears before the '@' symbol. For example, if the
# email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
# function should extract the username from each email address and store it in the new 'Username'
# column.

You can use the `str.split()` method to split each email address at the '@' symbol and then take the first part as the username. Here's a Python function that achieves this:

```python
import pandas as pd

def extract_username(df, email_column='Email', new_column='Username'):
    # Check if the specified email column exists in the DataFrame
    if email_column in df.columns:
        # Extract the username from each email address
        df[new_column] = df[email_column].str.split('@').str[0]
        return df
    else:
        print(f"DataFrame does not have a '{email_column}' column.")
        return df

# Example usage:
# Assuming df is your original DataFrame with an 'Email' column
df = pd.DataFrame({
    'Email': ['john.doe@example.com', 'jane.smith@example.com', 'bob@example.com']
})

# Call the function to extract usernames and create the 'Username' column
df = extract_username(df)

# Display the DataFrame with the new 'Username' column
print(df)
```

This will output:

```
                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
2         bob@example.com         bob
```

Adjust the function based on your DataFrame structure and column names if needed.
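
One thing to be aware of: with `str.split('@').str[0]`, a value that contains no '@' is returned unchanged as the "username". If that is not wanted, a sketch using `str.extract` with a regular expression leaves such rows as missing instead. The sample data and the name `emails` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data including a malformed address and a missing value
emails = pd.DataFrame({'Email': ['john.doe@example.com', 'not-an-email', None]})

# Capture everything before the first '@'; rows without a match become NaN
emails['Username'] = emails['Email'].str.extract(r'^([^@]+)@', expand=False)

print(emails)
```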

In [5]:
# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
# all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
# function should return a new DataFrame that contains only the selected rows.
# For example, if df contains the following values:
#    A  B  C
# 0  3  5  1
# 1  8  2  7
# 2  6  9  4
# 3  2  3  5
# 4  9  1  2

In [6]:
import pandas as pd

def select_rows(df):
    # Select rows where 'A' is greater than 5 and 'B' is less than 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    
    return selected_rows

# Example usage:
# Assuming df is your original DataFrame with columns 'A', 'B', and 'C'
df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
})

# Call the function to select rows based on the conditions
selected_df = select_rows(df)

# Display the DataFrame with selected rows
print(selected_df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
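
The same filter can also be written with `DataFrame.query`, which some readers find easier to scan than chained boolean masks. This is a sketch on the same example data, not a replacement for the function above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
})

# query() evaluates the expression against the column names
selected_df = df.query('A > 5 and B < 10')

print(selected_df)
```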


In [7]:
# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [8]:
import pandas as pd

def calculate_statistics(df, column='Values'):
    # Check if the specified column exists in the DataFrame
    if column in df.columns:
        # Calculate mean, median, and standard deviation
        mean_value = df[column].mean()
        median_value = df[column].median()
        std_deviation = df[column].std()

        # Print the results
        print(f"Mean of '{column}': {mean_value}")
        print(f"Median of '{column}': {median_value}")
        print(f"Standard Deviation of '{column}': {std_deviation}")
    else:
        print(f"DataFrame does not have a '{column}' column.")

# Example usage:
# Assuming df is your original DataFrame with a 'Values' column
df = pd.DataFrame({
    'Values': [10, 20, 30, 40, 50]
})

# Call the function to calculate statistics for the 'Values' column
calculate_statistics(df)

Mean of 'Values': 30.0
Median of 'Values': 30.0
Standard Deviation of 'Values': 15.811388300841896
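
Note that `Series.std()` computes the sample standard deviation (`ddof=1`) by default, which is why the result above is about 15.81 rather than about 14.14. The sketch below, using the same example values, collects the three statistics with `agg` and shows the population variant with `ddof=0`; the names `stats` and `population_std` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# agg() computes several statistics in one call and returns a Series
stats = df['Values'].agg(['mean', 'median', 'std'])

# ddof=0 divides by N instead of N - 1 (population standard deviation)
population_std = df['Values'].std(ddof=0)

print(stats)
print("Population standard deviation:", population_std)
```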


In [9]:
# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
# create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
# for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
# should include the current day.

In [10]:
import pandas as pd

def calculate_moving_average(df, sales_column='Sales', date_column='Date', window_size=7):
    # Check if the specified columns exist in the DataFrame
    if sales_column in df.columns and date_column in df.columns:
        # Sort the DataFrame by the 'Date' column
        df = df.sort_values(by=date_column)

        # Calculate the moving average and create the 'MovingAverage' column
        df['MovingAverage'] = df[sales_column].rolling(window=window_size, min_periods=1).mean()

        return df
    else:
        print(f"DataFrame is missing either '{sales_column}' or '{date_column}' column.")
        return df

# Example usage:
# Assuming df is your original DataFrame with 'Sales' and 'Date' columns
df = pd.DataFrame({
    'Date': pd.date_range(start='2023-01-01', periods=10),
    'Sales': [100, 150, 200, 120, 180, 250, 300, 220, 200, 180]
})

# Call the function to calculate the moving average for 'Sales' over the past 7 days
df = calculate_moving_average(df)

# Display the DataFrame with the new 'MovingAverage' column
print(df)

        Date  Sales  MovingAverage
0 2023-01-01    100     100.000000
1 2023-01-02    150     125.000000
2 2023-01-03    200     150.000000
3 2023-01-04    120     142.500000
4 2023-01-05    180     150.000000
5 2023-01-06    250     166.666667
6 2023-01-07    300     185.714286
7 2023-01-08    220     202.857143
8 2023-01-09    200     210.000000
9 2023-01-10    180     207.142857
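
The function above uses a window of 7 rows, which equals 7 days only when there is exactly one row per day. If dates can be missing, a time-based window is one option. The sketch below assumes a hypothetical dataset with a gap and compares both approaches; `'7D'` covers the current day and the six days before it.

```python
import pandas as pd

# Hypothetical sales data with a gap between 2023-01-02 and 2023-01-10
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10']),
    'Sales': [100, 200, 400]
}).sort_values('Date')

# Row-based window: averages the last 7 rows regardless of their dates
row_based = sales['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based window: averages only rows within the last 7 calendar days
time_based = sales.set_index('Date')['Sales'].rolling('7D').mean()
sales['MovingAverage'] = time_based.to_numpy()

print(row_based)
print(sales)
```

On the last row the row-based average still includes the early January sales, while the time-based average only sees the sale on 2023-01-10.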


In [11]:
# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
# column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
# Monday, Tuesday) corresponding to each date in the 'Date' column.
# For example, if df contains the following values:
# Date
# 0 2023-01-01
# 1 2023-01-02
# 2 2023-01-03
# 3 2023-01-04
# 4 2023-01-05

In [12]:
import pandas as pd

def add_weekday_column(df, date_column='Date', new_column='Weekday'):
    # Check if the specified date column exists in the DataFrame
    if date_column in df.columns:
        # Convert the 'Date' column to datetime if it's not already
        df[date_column] = pd.to_datetime(df[date_column])

        # Create the 'Weekday' column containing weekday names
        df[new_column] = df[date_column].dt.strftime('%A')

        return df
    else:
        print(f"DataFrame does not have a '{date_column}' column.")
        return df

# Example usage:
# Assuming df is your original DataFrame with a 'Date' column
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
})

# Call the function to add the 'Weekday' column
df = add_weekday_column(df)

# Display the DataFrame with the new 'Weekday' column
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
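
As an alternative to `strftime('%A')`, pandas provides `Series.dt.day_name()`, which returns the weekday names directly. A short sketch on part of the example data:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03']})

# Parse the strings once, then ask for the weekday names
dates = pd.to_datetime(df['Date'])
df['Weekday'] = dates.dt.day_name()

print(df)
```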


In [13]:
# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
# function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [None]:
import pandas as pd

def select_rows_in_date_range(df, date_column='Date', start_date='2023-01-01', end_date='2023-01-31'):
    # Check if the specified date column exists in the DataFrame
    if date_column in df.columns:
        # Convert the 'Date' column to datetime if it's not already
        df[date_column] = pd.to_datetime(df[date_column])

        # Select rows within the specified date range
        selected_rows = df[(df[date_column] >= start_date) & (df[date_column] <= end_date)]

        return selected_rows
    else:
        print(f"DataFrame does not have a '{date_column}' column.")
        return df

# Example usage:
# Assuming df is your original DataFrame with a 'Date' column
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-15', '2023-01-20', '2023-02-05', '2023-02-10']
})

# Call the function to select rows within the date range
selected_df = select_rows_in_date_range(df)

# Display the DataFrame with selected rows
print(selected_df)

        Date
0 2023-01-01
1 2023-01-15
2 2023-01-20
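
Because the question mentions timestamps, note that the end date `'2023-01-31'` is interpreted as midnight at the start of that day, so a row stamped later on 31 January falls outside `<= end_date`. The sketch below uses hypothetical timestamps to show this and one way to include the whole final day; the names `inclusive_midnight` and `whole_day` are made up for illustration.

```python
import pandas as pd

# Hypothetical timestamps, one of them during the day on 31 January
df = pd.DataFrame({
    'Date': pd.to_datetime(['2023-01-01 00:00', '2023-01-31 15:30', '2023-02-01 00:00'])
})

start = pd.Timestamp('2023-01-01')
end = pd.Timestamp('2023-01-31')

# between() is inclusive on both ends, but 'end' means 2023-01-31 00:00
inclusive_midnight = df[df['Date'].between(start, end)]

# Include the whole last day by comparing against the start of the next day
whole_day = df[(df['Date'] >= start) & (df['Date'] < end + pd.Timedelta(days=1))]

print(inclusive_midnight)
print(whole_day)
```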


In [15]:
# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
# be imported?

The first and foremost library that needs to be imported to use the basic functions of Pandas is the Pandas library itself. You can import Pandas using the `import` statement in Python. The common convention is to import it as `pd`:

```python
import pandas as pd
```

Most of the pandas documentation and examples use this `pd` alias. Once pandas is imported, its functions and classes are available through `pd` for working with tabular data, manipulating DataFrames, and performing data analysis.