Q1. List any five functions of the pandas library with execution.

Here are five commonly used pandas functions and methods, each with a short example:

1. **`pandas.read_csv()`**: This function is used to read data from a CSV file into a pandas DataFrame.

```python
import pandas as pd

# Reading a CSV file and creating a DataFrame
data = pd.read_csv('data.csv')
print(data.head())
```

2. **`pandas.DataFrame()`**: This constructor creates a DataFrame from various data sources, such as lists, dictionaries, or NumPy arrays.

```python
import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print(df)
```

3. **`pandas.DataFrame.groupby()`**: This function is used to group rows of a DataFrame based on one or more columns, enabling aggregation and analysis.

```python
import pandas as pd

# Grouping by the 'Category' column and calculating the mean of 'Value'
data = {'Category': ['A', 'B', 'A', 'B', 'A'],
        'Value': [10, 20, 15, 25, 12]}
df = pd.DataFrame(data)
grouped = df.groupby('Category')['Value'].mean()
print(grouped)
```

4. **`pandas.DataFrame.merge()`**: This method joins two DataFrames on common columns or indices. By default it performs an inner join, so only keys present in both DataFrames (here, IDs 2 and 3) appear in the result.

```python
import pandas as pd

# Merging two DataFrames based on a common column 'ID'
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4],
                    'Age': [25, 30, 22]})
merged = df1.merge(df2, on='ID')
print(merged)
```

5. **`pandas.Series.value_counts()`**: This method counts the occurrences of each unique value in a Series and returns the counts sorted from most to least frequent.

```python
import pandas as pd

# Counting the occurrences of unique values in a Series
data = pd.Series([2, 3, 2, 4, 3, 2, 4, 2])
value_counts = data.value_counts()
print(value_counts)
```
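As a minimal sketch of how several of these functions combine, the block below uses made-up data and an `io.StringIO` buffer in place of `data.csv` (so it runs without a file on disk). It also shows three options not used in the examples above: `agg()` with several statistics, `how='left'` in `merge()`, and `normalize=True` in `value_counts()`. The column names and values are illustrative only.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file such as data.csv
csv_text = """ID,Category,Value
1,A,10
2,B,20
3,A,15
4,B,25
5,A,12
"""

# read_csv() accepts any file-like object, not only a path
sales = pd.read_csv(io.StringIO(csv_text))

# DataFrame() built from a dictionary, as in example 2
names = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})

# groupby() with several aggregations at once: one column per statistic
summary = sales.groupby('Category')['Value'].agg(['mean', 'sum', 'count'])

# merge() with how='left' keeps every row of sales; IDs 4 and 5 get NaN for 'Name'
joined = sales.merge(names, on='ID', how='left')

# value_counts(normalize=True) returns proportions instead of raw counts
shares = sales['Category'].value_counts(normalize=True)

print(summary)
print(joined)
print(shares)
```

With `how='left'` every row of the left DataFrame survives, which is often what you want when enriching a main table with optional lookup data; the default inner join would silently drop IDs 4 and 5.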

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

import pandas as pd

def reindex_with_increment(df):
    new_index = range(1, 2 * len(df), 2)  # 1, 3, 5, ...: one label per row
    df_reindexed = df.copy()
    df_reindexed.index = new_index
    return df_reindexed

# Example DataFrame
data = {'A': [10, 20, 30],
        'B': [5, 15, 25],
        'C': [100, 200, 300]}
df = pd.DataFrame(data)

# Re-index the DataFrame
df_new_index = reindex_with_increment(df)
print(df_new_index)


    A   B    C
1  10   5  100
3  20  15  200
5  30  25  300
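The answer above relabels the rows by assigning to `.index`. `DataFrame.reindex()` does something different: it looks up the requested labels among the existing ones and fills labels that do not exist with NaN, instead of renaming rows. A minimal sketch contrasting the two, with a non-mutating `set_axis()` variant (the function name `reindex_with_set_axis` is illustrative):

```python
import pandas as pd

def reindex_with_set_axis(df):
    # RangeIndex(start=1, stop=1 + 2*n, step=2) yields 1, 3, 5, ... (n labels)
    new_index = pd.RangeIndex(start=1, stop=1 + 2 * len(df), step=2)
    # set_axis returns a new DataFrame and leaves df unchanged
    return df.set_axis(new_index, axis=0)

df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 15, 25], 'C': [100, 200, 300]})
relabeled = reindex_with_set_axis(df)

# reindex() keeps the row whose old label matches (only label 1 here)
# and inserts NaN rows for labels 3 and 5, which did not exist before
looked_up = df.reindex([1, 3, 5])

print(relabeled)
print(looked_up)
```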


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

import pandas as pd

def calculate_sum_first_three(df):
    values_column = df['Values']  # Extract the 'Values' column
    sum_first_three = values_column.head(3).sum()  # Calculate sum of the first three values
    print("Sum of the first three values:", sum_first_three)

# Example DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate and print the sum of the first three values
calculate_sum_first_three(df)

Sum of the first three values: 60
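The question asks for a function that iterates over the DataFrame, while `head(3).sum()` is vectorized. A minimal sketch of an explicitly iterative version, assuming the same `'Values'` column (the name `calculate_sum_first_three_loop` is illustrative, and it also returns the total so the result can be reused):

```python
import pandas as pd

def calculate_sum_first_three_loop(df):
    total = 0
    # Walk the 'Values' column one value at a time, stopping after three values
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_sum_first_three_loop(df)
```

Like `head(3)`, the loop simply sums whatever is available when the column has fewer than three values.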


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

import pandas as pd
import re  # Import the regular expression module

def add_word_count_column(df):
    # Function to count words in a text
    def count_words(text):
        words = re.findall(r'\w+', text)  # Use regex to find all words
        return len(words)

    df['Word_Count'] = df['Text'].apply(count_words)  # Apply the count_words function to each row

# Example DataFrame
data = {'Text': ["Hello, how are you?", "Python programming is fun", "Data analysis"]}
df = pd.DataFrame(data)

# Add the 'Word_Count' column to the DataFrame
add_word_count_column(df)
print(df)


                        Text  Word_Count
0        Hello, how are you?           4
1  Python programming is fun           4
2              Data analysis           2
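The same counts can be computed without a Python-level helper by using pandas' vectorized string methods. A short sketch on the same example data; `Split_Count` is an illustrative extra column:

```python
import pandas as pd

df = pd.DataFrame({'Text': ["Hello, how are you?", "Python programming is fun", "Data analysis"]})

# Vectorized: count the regex matches of \w+ in every row
df['Word_Count'] = df['Text'].str.count(r'\w+')

# Whitespace-based alternative: split on whitespace, then take each list's length
df['Split_Count'] = df['Text'].str.split().str.len()
print(df)
```

The two definitions agree here but can differ: `\w+` counts "don't" as two words ("don" and "t"), while splitting on whitespace counts it as one. The `.str` methods also return NaN for missing values, whereas `re.findall` raises a `TypeError` on NaN.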


Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape are attributes (properties), not methods, so they are accessed without parentheses: df.size() raises TypeError: 'int' object is not callable. Both describe the dimensions of a DataFrame, but they report different things:

- DataFrame.size: the total number of elements (cells) in the DataFrame, returned as a single integer equal to the number of rows multiplied by the number of columns.

- DataFrame.shape: a tuple of two values, (number of rows, number of columns).

import pandas as pd

# Creating an example DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Using DataFrame.size
size = df.size
print("DataFrame size:", size)  # Output: DataFrame size: 6 (3 rows * 2 columns)

# Using DataFrame.shape
shape = df.shape
print("DataFrame shape:", shape)  # Output: DataFrame shape: (3, 2) (3 rows, 2 columns)


DataFrame size: 6
DataFrame shape: (3, 2)
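The relationship between the two attributes can be checked directly. A short sketch using the same example DataFrame, which also shows how they behave on a single column (a Series):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape          # tuple unpacking: 3 rows, 2 columns
assert df.size == n_rows * n_cols  # size is always rows * columns
assert len(df) == n_rows           # len() gives only the row count

# On a Series, shape is a 1-tuple and size is the number of elements
print(df['A'].shape, df['A'].size)
```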


Q6. Which function of pandas do we use to read an excel file?

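In pandas, an Excel file is read with pandas.read_excel(). It loads a worksheet from an Excel file (both .xls and .xlsx formats) into a DataFrame; the sheet_name parameter selects which sheet to read (the first sheet by default). Reading .xlsx files requires the optional openpyxl package, and legacy .xls files require xlrd.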

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.
The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

import pandas as pd

def extract_username(df):
    # Function to extract username from an email
    def get_username(email):
        return email.split('@')[0]

    df['Username'] = df['Email'].apply(get_username)  # Apply the get_username function to each row

# Example DataFrame
data = {'Email': ["john.doe@example.com", "jane.smith@example.com", "bob.jones@example.com"]}
df = pd.DataFrame(data)

# Extract usernames and add them to the 'Username' column
extract_username(df)
print(df)


                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
2   bob.jones@example.com   bob.jones
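The username can also be extracted with vectorized string methods instead of `apply()`. A short sketch on the same example data; `Username_re` is an illustrative second column showing a regex variant:

```python
import pandas as pd

df = pd.DataFrame({'Email': ["john.doe@example.com", "jane.smith@example.com", "bob.jones@example.com"]})

# .str.split('@') gives a list per row; .str[0] takes the part before the '@'
df['Username'] = df['Email'].str.split('@').str[0]

# Regex variant: capture everything before the first '@'
df['Username_re'] = df['Email'].str.extract(r'^([^@]+)@', expand=False)
print(df)
```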


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.

import pandas as pd

def filter_dataframe(df):
    filtered_df = df[(df['A'] > 5) & (df['B'] < 10)]
    return filtered_df

# Example DataFrame
data = {'A': [3, 8, 2, 10],
        'B': [5, 4, 7, 2],
        'C': [100, 200, 300, 400]}
df = pd.DataFrame(data)

# Filter the DataFrame based on conditions and get the new filtered DataFrame
filtered_df = filter_dataframe(df)
print(filtered_df)

    A  B    C
1   8  4  200
3  10  2  400
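In the boolean-mask answer, each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`. `DataFrame.query()` expresses the same filter as a string; a minimal sketch (`filter_dataframe_query` is an illustrative name):

```python
import pandas as pd

def filter_dataframe_query(df):
    # Same conditions as the mask version, written as a query expression
    return df.query('A > 5 and B < 10')

df = pd.DataFrame({'A': [3, 8, 2, 10], 'B': [5, 4, 7, 2], 'C': [100, 200, 300, 400]})
result = filter_dataframe_query(df)
print(result)
```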


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

import pandas as pd

def calculate_stats(df):
    values_column = df['Values']  # Extract the 'Values' column
    mean_value = values_column.mean()  # Calculate mean
    median_value = values_column.median()  # Calculate median
    std_dev = values_column.std()  # Sample standard deviation (ddof=1 by default)
    
    return mean_value, median_value, std_dev

# Example DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate mean, median, and standard deviation
mean, median, std = calculate_stats(df)
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
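`Series.agg()` can compute all three statistics in one call. Note that `std()` returns the sample standard deviation (ddof=1, dividing by n - 1), which is why the result above is about 15.81 rather than the population value of about 14.14. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# One call returning all three statistics as a Series indexed by name
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# ddof=0 gives the population standard deviation (divide by n instead of n - 1)
print("Population std:", df['Values'].std(ddof=0))
```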


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

import pandas as pd

def calculate_moving_average(df):
    # Convert the 'Date' column to a datetime type
    df['Date'] = pd.to_datetime(df['Date'])

    # Sort the DataFrame by 'Date'
    df = df.sort_values(by='Date')

    # Rolling window of 7 rows (assumes one row per day); min_periods=1 gives partial averages for the first six rows
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

# Example DataFrame
data = {'Date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04', '2023-08-05', '2023-08-06', '2023-08-07'],
        'Sales': [100, 150, 200, 180, 220, 250, 300]}
df = pd.DataFrame(data)

# Calculate the moving average and add it to the DataFrame
df_with_moving_avg = calculate_moving_average(df)
print(df_with_moving_avg)


        Date  Sales  MovingAverage
0 2023-08-01    100     100.000000
1 2023-08-02    150     125.000000
2 2023-08-03    200     150.000000
3 2023-08-04    180     157.500000
4 2023-08-05    220     170.000000
5 2023-08-06    250     183.333333
6 2023-08-07    300     200.000000
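`rolling(window=7)` counts seven rows, which equals seven calendar days only when there is exactly one row per day. If some dates may be missing, a time-based window such as `'7D'` covers the current day and the six days before it regardless of how many rows fall in that span. A minimal sketch, assuming hypothetical data with a gap (the function name is illustrative):

```python
import pandas as pd

def calculate_moving_average_by_time(df):
    out = df.copy()  # work on a copy so the caller's DataFrame is untouched
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')
    # A '7D' window covers (current date - 7 days, current date]: today plus the 6 prior days
    daily = out.set_index('Date')['Sales']
    out['MovingAverage'] = daily.rolling('7D').mean().to_numpy()
    return out

# Hypothetical data with a gap: Aug 3 to Aug 9 are missing
df = pd.DataFrame({'Date': ['2023-08-01', '2023-08-02', '2023-08-10', '2023-08-11'],
                   'Sales': [100, 200, 300, 500]})
result = calculate_moving_average_by_time(df)
print(result)
```

For comparison, the row-based `rolling(window=7, min_periods=1)` on this data would give 100, 150, 200 and 275, because it treats the four rows as if they were consecutive days.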


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

import pandas as pd

def add_weekday_column(df):
    # Convert the 'Date' column to a datetime type
    df['Date'] = pd.to_datetime(df['Date'])
    
    # '%A' formats each date as its full weekday name
    df['Weekday'] = df['Date'].dt.strftime('%A')

# Example DataFrame
data = {'Date': ['2023-08-01', '2023-08-02', '2023-08-03']}
df = pd.DataFrame(data)

# Add the 'Weekday' column to the DataFrame
add_weekday_column(df)
print(df)


        Date    Weekday
0 2023-08-01    Tuesday
1 2023-08-02  Wednesday
2 2023-08-03   Thursday
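pandas also provides `dt.day_name()`, which returns English weekday names by default (while `strftime('%A')` follows the system locale), and `dt.dayofweek`, which returns the weekday as an integer (Monday = 0, Sunday = 6). A short sketch using the same example dates; `WeekdayNum` is an illustrative extra column:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2023-08-01', '2023-08-02', '2023-08-03']})
df['Date'] = pd.to_datetime(df['Date'])

# Full English weekday name for each date
df['Weekday'] = df['Date'].dt.day_name()
# Numeric weekday: Monday=0 ... Sunday=6
df['WeekdayNum'] = df['Date'].dt.dayofweek
print(df)
```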


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

import pandas as pd

def select_january_data(df):
    # Convert the 'Date' column to a datetime type
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Select January 2023; the exclusive upper bound '2023-02-01' also keeps
    # timestamps later in the day on January 31
    selected_rows = df[(df['Date'] >= '2023-01-01') & (df['Date'] < '2023-02-01')]
    
    return selected_rows

# Example DataFrame
data = {'Date': ['2023-01-15', '2023-01-20', '2023-02-05', '2023-01-10']}
df = pd.DataFrame(data)

# Select rows with dates between '2023-01-01' and '2023-01-31'
selected_january_data = select_january_data(df)
print(selected_january_data)


        Date
0 2023-01-15
1 2023-01-20
3 2023-01-10
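Because the column holds timestamps, an upper bound of '2023-01-31' means midnight at the start of January 31, so any later time on that day falls outside an inclusive `<= '2023-01-31'` filter. A minimal sketch with hypothetical timestamps showing the effect with `Series.between()`, and how `dt.normalize()` (which drops the time of day) avoids it:

```python
import pandas as pd

# Hypothetical timestamps, including one late on January 31
df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-15 08:00', '2023-01-31 18:30',
                                           '2023-02-01 00:00', '2022-12-31 23:59'])})

start = pd.Timestamp('2023-01-01')
end = pd.Timestamp('2023-01-31')

# between() is inclusive on both ends, but end is midnight at the start of Jan 31,
# so the 18:30 timestamp on Jan 31 is left out
naive = df[df['Date'].between(start, end)]

# normalize() sets every time to midnight, so all timestamps dated Jan 1-31 are kept
by_day = df[df['Date'].dt.normalize().between(start, end)]

print(naive)
print(by_day)
```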


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

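The pandas library itself. It is conventionally imported with the alias pd (import pandas as pd), after which its functions are called as pd.read_csv(), pd.DataFrame(), and so on. NumPy, on which pandas is built, is installed automatically as a dependency but does not need to be imported separately for basic pandas work.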