**Q1. List any five functions of the pandas library with execution.**

In [1]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd  

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Mike', 'Scott'],
        'Age': [25, 30, 22, 29, 31],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Chicago', 'New York']}

# Create a DataFrame using the provided data
df = pd.DataFrame(data)  
# Show the original dataframe
print("The Original DataFrame:")
print(df) 

# Function 1: head()
# Display the first n rows of the DataFrame (default n=5)
print("\nTop 2 rows of the DataFrame:")
print(df.head(2))  # Display the top 2 rows of the DataFrame

# Function 2: info()
# Display concise summary of the DataFrame, including data types and non-null counts
print("\nSummary of DataFrame:")
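# Note: info() prints the summary itself and returns None, so wrapping it in
# print() also emits a trailing "None"; calling df.info() on its own avoids that
print(df.info())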

# Function 3: describe()
# Generate descriptive statistics of the DataFrame's numerical columns
print("\nDescriptive statistics of numerical columns:")
print(df.describe())  # Display descriptive statistics of numerical columns

# Function 4: unique()
# Get unique values in a specific column
unique_cities = df['City'].unique()  # Get unique cities from the 'City' column
print("\nUnique cities:", unique_cities)

# Function 5: groupby()
# Group DataFrame rows based on a categorical column and perform aggregation
city_group = df.groupby('City')['Name'].count()  # Count the occurrences of names in each city
print("\nGrouping of names by city:")
print(city_group)


The Original DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3     Mike   29      Chicago
4    Scott   31     New York

Top 2 rows of the DataFrame:
    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles

Summary of DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 252.0+ bytes
None

Descriptive statistics of numerical columns:
             Age
count   5.000000
mean   27.400000
std     3.781534
min    22.000000
25%    25.000000
50%    29.000000
75%    30.000000
max    31.000000

Unique cities: ['New York' 'Los Angeles' 'Chicago']

Grouping of names by city:
City
Chicago        2
Los Angeles    1
New York       2
Name: Name, dtype: int64

**Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.**

In [2]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd  

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Mike', 'Scott'],
        'Age': [25, 30, 22, 29, 31],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Chicago', 'New York']}

# Create a DataFrame using the provided data
df = pd.DataFrame(data)  
# Show the original dataframe
print("The Original DataFrame:")
print(df) 

# Re-index the DataFrame with a new index that starts from 1 and increments by 2
new_index = range(1, len(df)*2 , 2)  # Generate a new index using the desired pattern
df_reindexed = df.copy()  # Create a copy of the original DataFrame
df_reindexed.index = new_index  # Assign the new index to the copied DataFrame

# Display the re-indexed DataFrame
print("\nRe-indexed DataFrame:")
print(df_reindexed)

The Original DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3     Mike   29      Chicago
4    Scott   31     New York

Re-indexed DataFrame:
      Name  Age         City
1    Alice   25     New York
3      Bob   30  Los Angeles
5  Charlie   22      Chicago
7     Mike   29      Chicago
9    Scott   31     New York
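
The question asks for a function applied to a DataFrame with columns 'A', 'B', and 'C', whereas the cell above re-indexes the Name/Age/City sample inline. A minimal sketch of the function form follows; the name `reindex_odd` and the sample values are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def reindex_odd(df):
    """Return a copy of df whose index is 1, 3, 5, ... (start 1, step 2)."""
    result = df.copy()
    # RangeIndex stop is exclusive, so 2 * len(df) + 1 yields exactly len(df) labels
    result.index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    return result

# Sample data is illustrative only
df_abc = pd.DataFrame({'A': [10, 20, 30], 'B': [1, 2, 3], 'C': ['x', 'y', 'z']})
df_odd = reindex_odd(df_abc)
print(df_odd)
```

Returning a copy leaves the caller's DataFrame and its original index untouched.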


**Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.**

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [3]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd

# Create a sample DataFrame
data = {'Values': [10, 20, 30, 40, 50]}  # Sample data for the 'Values' column
df = pd.DataFrame(data)  # Create a DataFrame using the provided data
# Show the dataframe
print("Given DataFrame:")
print(df)

# Define a function to calculate and print the sum of the first three values in the 'Values' column
def calculate_sum_of_first_three_values(data_frame):
    # Extract the first three values from the 'Values' column
    first_three_values = data_frame['Values'].head(3)
    
    # Calculate the sum of the first three values
    sum_of_values = first_three_values.sum()
    
    # Print the calculated sum to the console
    print("\nSum of the first three values:", sum_of_values)

# Call the defined function with the DataFrame 'df'
calculate_sum_of_first_three_values(df)


Given DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50

Sum of the first three values: 60
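
The question asks for a function that iterates over the DataFrame, while the answer above uses `head(3).sum()`. For comparison, here is a sketch with an explicit loop; the function name is hypothetical, and it also returns the sum so the result can be reused.

```python
import pandas as pd

def sum_first_three_by_iteration(data_frame):
    """Add up at most the first three entries of 'Values' with an explicit loop."""
    total = 0
    # enumerate() supplies a position counter that is independent of the index labels
    for position, value in enumerate(data_frame['Values']):
        if position >= 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

result = sum_first_three_by_iteration(pd.DataFrame({'Values': [10, 20, 30, 40, 50]}))
```

If the column holds fewer than three values, the loop simply sums what is there.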


**Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.**

In [4]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd

# Create a sample DataFrame with a 'Text' column
data = {'Text': ["This is a sample sentence.", "Count the words here.", "Another example for word count."]}
df = pd.DataFrame(data)  # Create a DataFrame using the provided data
# Show the dataframe
print("Original DataFrame:")
print(df)

# Define a function to count the number of words in a given text
def count_words(text):
    # Split the text into words using spaces as separators
    words = text.split()
    
    # Return the count of words
    return len(words)

# Apply the 'count_words' function to each row in the 'Text' column and create a new 'Word_Count' column
df['Word_Count'] = df['Text'].apply(count_words)

# Display the updated DataFrame
print("\nDataFrame with Word_Count:")
print(df)


Original DataFrame:
                              Text
0       This is a sample sentence.
1            Count the words here.
2  Another example for word count.

DataFrame with Word_Count:
                              Text  Word_Count
0       This is a sample sentence.           5
1            Count the words here.           4
2  Another example for word count.           5
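
As an alternative to `apply` with a Python function, the same count can be sketched with the vectorized `.str` accessor. The sample frame below, including its missing entry, is an assumption added to show how missing text behaves.

```python
import pandas as pd

df_text = pd.DataFrame({'Text': ["This is a sample sentence.", "Count the words here.", None]})

# .str.split() splits on whitespace; .str.len() counts the resulting pieces.
# Missing entries propagate as NaN instead of raising an AttributeError.
df_text['Word_Count'] = df_text['Text'].str.split().str.len()
print(df_text)
```

With `apply(count_words)`, a missing value would fail on `text.split()`, so this form may be more convenient for messy data.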


**Q5. How are DataFrame.size() and DataFrame.shape() different?**

In [5]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd

# Create a sample DataFrame with a 'Text' column
data = {'Text': ["This is a sample sentence.", "Count the words here.", "Another example for word count."]}
df = pd.DataFrame(data)  # Create a DataFrame using the provided data
# Show the dataframe
print("Original DataFrame:")
print(df)

# Define a function to count the number of words in a given text
def count_words(text):
    # Split the text into words using spaces as separators
    words = text.split()
    
    # Return the count of words
    return len(words)

# Apply the 'count_words' function to each row in the 'Text' column and create a new 'Word_Count' column
df['Word_Count'] = df['Text'].apply(count_words)

# Display the updated DataFrame
print("\nDataFrame with Word_Count:")
print(df)

# Get the total number of elements (rows x columns) using the 'size' attribute
dataframe_size = df.size
print("\nDataFrame Size:", dataframe_size)

# Get the dimensions as a (rows, columns) tuple using the 'shape' attribute
dataframe_shape = df.shape
print("\nDataFrame Shape:", dataframe_shape)


Original DataFrame:
                              Text
0       This is a sample sentence.
1            Count the words here.
2  Another example for word count.

DataFrame with Word_Count:
                              Text  Word_Count
0       This is a sample sentence.           5
1            Count the words here.           4
2  Another example for word count.           5

DataFrame Size: 6

DataFrame Shape: (3, 2)
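
To state the difference directly: `size` and `shape` are attributes rather than methods, so they are read without parentheses (calling `df.size()` or `df.shape()` raises a `TypeError`). `shape` is a `(rows, columns)` tuple and `size` is the product of the two. A small sketch, using a made-up frame:

```python
import pandas as pd

df_demo = pd.DataFrame({'Text': ['a b', 'c d e', 'f'], 'Word_Count': [2, 3, 1]})

rows, cols = df_demo.shape   # shape is a (rows, columns) tuple
total = df_demo.size         # size is a single int: rows * columns

print("shape:", df_demo.shape)
print("size:", total)
```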


**Q6. Which function of pandas do we use to read an excel file?**

In [6]:
!pip install openpyxl



In [7]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd

# Read an Excel file into a DataFrame using the 'read_excel' function
# Replace the path below with the actual path to your Excel file
excel_file_path = './Data/LUSID Excel - Setting up your market data.xlsx'
df = pd.read_excel(excel_file_path)

# Display the loaded DataFrame
print("DataFrame from Excel:")
print(df)


DataFrame from Excel:
    Unnamed: 0  Unnamed: 1  Unnamed: 2  \
0          NaN         NaN         NaN   
1          NaN         NaN         NaN   
2          NaN         NaN         NaN   
3          NaN         NaN         NaN   
4          NaN         NaN         NaN   
5          NaN         NaN         NaN   
6          NaN         NaN         NaN   
7          NaN         NaN         NaN   
8          NaN         NaN         NaN   
9          NaN         NaN         NaN   
10         NaN         NaN         NaN   
11         NaN         NaN         NaN   
12         NaN         NaN         NaN   
13         NaN         NaN         NaN   
14         NaN         NaN         NaN   
15         NaN         NaN         NaN   
16         NaN         NaN         NaN   
17         NaN         NaN         NaN   
18         NaN         NaN         NaN   
19         NaN         NaN         NaN   
20         NaN         NaN         NaN   
21         NaN         NaN         NaN   
22         N

**Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.**

The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.


In [8]:
# Import the pandas library and use the alias 'pd' for convenience
import pandas as pd

# Create a sample DataFrame with 'Email', 'Age', 'Phone', and 'Name' columns
data = {'Email': ["john.doe@example.com", "jane.smith@example.com", "bob.johnson@example.com"],
        'Age': [25, 30, 28],
        'Phone': ["123-456-7890", "987-654-3210", "555-123-4567"],
        'Name': ["John Doe", "Jane Smith", "Bob Johnson"]}

# Create a DataFrame using the provided data
df = pd.DataFrame(data)  
# Print the original data frame
print("The Original DataFrame:")
print(df)


# Function to extract username from email
def extract_username(email):
    # Split the email using '@' symbol and get the first part
    username = email.split('@')[0]
    return username

# Apply the 'extract_username' function to each row in the 'Email' column and create a new 'Username' column
df['Username'] = df['Email'].apply(extract_username)

# Display the updated DataFrame
print("\nDataFrame with Username:")
print(df)


The Original DataFrame:
                     Email  Age         Phone         Name
0     john.doe@example.com   25  123-456-7890     John Doe
1   jane.smith@example.com   30  987-654-3210   Jane Smith
2  bob.johnson@example.com   28  555-123-4567  Bob Johnson

DataFrame with Username:
                     Email  Age         Phone         Name     Username
0     john.doe@example.com   25  123-456-7890     John Doe     john.doe
1   jane.smith@example.com   30  987-654-3210   Jane Smith   jane.smith
2  bob.johnson@example.com   28  555-123-4567  Bob Johnson  bob.johnson
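
The same extraction can also be sketched without a helper function by using the `.str` accessor. The frame name `df_mail` is an assumption for illustration.

```python
import pandas as pd

df_mail = pd.DataFrame({'Email': ["john.doe@example.com", "jane.smith@example.com"]})

# Split each address once at the first '@' and keep the part before it
df_mail['Username'] = df_mail['Email'].str.split('@', n=1).str[0]
print(df_mail)
```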


**Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.**

For example, if df contains the following values:

       A  B  C
    0  3  5  1
    1  8  2  7
    2  6  9  4
    3  2  3  5
    4  9  1  2

Your function should select the following rows:

       A  B  C
    1  8  2  7
    2  6  9  4
    4  9  1  2

The function should return a new DataFrame that contains only the selected rows.


In [9]:
import pandas as pd

# Create a sample DataFrame with columns 'A', 'B', and 'C'
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}

# Create a DataFrame using the provided data
df = pd.DataFrame(data)

# Print the original data frame
print("The Original DataFrame:")
print(df)

# Select rows where 'A' > 5 and 'B' < 10
selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

# Display the new DataFrame containing the selected rows
print("\nSelected Rows:")
print(selected_rows)


The Original DataFrame:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Selected Rows:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
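
Since the question asks for a function that returns a new DataFrame, the boolean filter above could be wrapped as follows. This is a sketch; the name `select_rows` is an assumption.

```python
import pandas as pd

def select_rows(df):
    """Return a new DataFrame with rows where A > 5 and B < 10."""
    mask = (df['A'] > 5) & (df['B'] < 10)
    # .copy() makes the result independent of the original DataFrame
    return df[mask].copy()

df_abc = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
selected = select_rows(df_abc)
print(selected)
```

The parentheses around each comparison are required because `&` binds more tightly than `>` and `<` in Python.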


**Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.**

In [10]:
import pandas as pd

# Create a sample DataFrame with 'Values' column
data = {'Values': [12, 24, 18, 30, 22, 33, 67, 56, 49, 31, 87]}
df = pd.DataFrame(data)

# Print the original data frame
print("The Original DataFrame:")
print(df)

# Calculate the mean of the 'Values' column
mean_value = df['Values'].mean()

# Calculate the median of the 'Values' column
median_value = df['Values'].median()

# Calculate the standard deviation of the 'Values' column
std_deviation_value = df['Values'].std()

# Print the calculated statistics
print("\nStatistics for 'Values' column:")
print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Standard Deviation: {std_deviation_value}")

The Original DataFrame:
    Values
0       12
1       24
2       18
3       30
4       22
5       33
6       67
7       56
8       49
9       31
10      87

Statistics for 'Values' column:
Mean: 39.0
Median: 31.0
Standard Deviation: 23.112767034693185


**Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.**

In [11]:
import pandas as pd

# Create a sample DataFrame with 'Sales' and 'Date' columns
data = {'Sales': [120, 150, 180, 130, 200, 170, 190, 220, 250, 210, 180, 160],
        'Date': pd.date_range(start='2023-08-01', periods=12)}

# Show the original data frame
df = pd.DataFrame(data)
print("The Original DataFrame:")
print(df)

# Calculate the moving average using a rolling window of size 7
# min_periods=1 lets the calculation proceed for the first rows, where fewer than
# 7 days of data are available; each of those rows averages the days seen so far.
df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

# Print the DataFrame with the moving averages
print("\nDataFrame with Moving Averages:")
print(df)

The Original DataFrame:
    Sales       Date
0     120 2023-08-01
1     150 2023-08-02
2     180 2023-08-03
3     130 2023-08-04
4     200 2023-08-05
5     170 2023-08-06
6     190 2023-08-07
7     220 2023-08-08
8     250 2023-08-09
9     210 2023-08-10
10    180 2023-08-11
11    160 2023-08-12

DataFrame with Moving Averages:
    Sales       Date  MovingAverage
0     120 2023-08-01     120.000000
1     150 2023-08-02     135.000000
2     180 2023-08-03     150.000000
3     130 2023-08-04     145.000000
4     200 2023-08-05     156.000000
5     170 2023-08-06     158.333333
6     190 2023-08-07     162.857143
7     220 2023-08-08     177.142857
8     250 2023-08-09     191.428571
9     210 2023-08-10     195.714286
10    180 2023-08-11     202.857143
11    160 2023-08-12     197.142857
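
The row-based window above matches "the past 7 days" only when there is exactly one row per day with no gaps. A hedged alternative is a time-based window keyed on the 'Date' column; the sketch below assumes the dates are sorted in ascending order and reuses the same sample values.

```python
import pandas as pd

sales = pd.DataFrame({
    'Sales': [120, 150, 180, 130, 200, 170, 190, 220, 250, 210, 180, 160],
    'Date': pd.date_range(start='2023-08-01', periods=12),
})

# A '7D' window covers the current day plus the six calendar days before it,
# regardless of how many rows fall inside that span
by_date = sales.set_index('Date')['Sales'].rolling('7D').mean()
sales['MovingAverage'] = by_date.to_numpy()
print(sales)
```

For this gap-free daily data the result equals the row-based version; the two would differ only if some days were missing or duplicated.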


**Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.**

For example, if df contains the following values:

             Date
    0  2023-01-01
    1  2023-01-02
    2  2023-01-03
    3  2023-01-04
    4  2023-01-05

the function should create the following DataFrame:

             Date    Weekday
    0  2023-01-01     Sunday
    1  2023-01-02     Monday
    2  2023-01-03    Tuesday
    3  2023-01-04  Wednesday
    4  2023-01-05   Thursday

The function should return the modified DataFrame.

In [12]:
import pandas as pd

def add_weekday_column(df):
    """
    Adds a 'Weekday' column to the DataFrame containing the weekday name corresponding to each date.
    
    Args:
        df (pd.DataFrame): The input DataFrame with a 'Date' column.
        
    Returns:
        pd.DataFrame: The modified DataFrame with the 'Weekday' column added.
    """
    # Convert the 'Date' column to datetime if not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Map weekday numbers to weekday names
    weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    
    # df['Date'] accesses the 'Date' column of the DataFrame 'df' 
    # Then uses .dt.weekday attribute to extract the weekday numbers (0 to 6), 
    # where Monday is 0 and Sunday is 6 corresponding to each date.
    # .map(lambda day: weekday_names[day]): This part maps each weekday number to the corresponding weekday name.
    # The lambda function takes the weekday number (day) as input and 
    # retrieves the corresponding weekday name from the weekday_names list.
    # Create the 'Weekday' column by adding the weekday names to the corresponding date
    df['Weekday'] = df['Date'].dt.weekday.map(lambda day: weekday_names[day])
    
    return df

# Create a sample DataFrame with 'Date' column
data = {'Date': pd.date_range(start='2023-01-01', periods=5)}
df = pd.DataFrame(data)
# Show the original dataframe
print("The Original DataFrame:")
print(df)

# Add the 'Weekday' column using the function
df_with_weekday = add_weekday_column(df)

# Print the modified DataFrame
print("\nThe DataFrame With Weekday:")
print(df_with_weekday)


The Original DataFrame:
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05

The DataFrame With Weekday:
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
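
pandas also provides `Series.dt.day_name()`, which returns the weekday names directly and removes the need for a lookup list. A sketch of that variant follows; the function name is an assumption, and unlike the version above it works on a copy so the caller's DataFrame is not modified.

```python
import pandas as pd

def add_weekday_column_day_name(df):
    """Return a copy of df with a 'Weekday' column derived from 'Date'."""
    result = df.copy()
    result['Date'] = pd.to_datetime(result['Date'])
    # day_name() returns English names by default, e.g. 'Sunday'
    result['Weekday'] = result['Date'].dt.day_name()
    return result

dates = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', periods=5)})
print(add_weekday_column_day_name(dates))
```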


**Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.**

In [13]:
import pandas as pd

def select_dates(df):
    """
    Selects rows from the DataFrame where the 'Date' column falls within the range '2023-01-01' to '2023-01-31'.
    
    Args:
        df (pd.DataFrame): The input DataFrame with a 'Date' column.
        
    Returns:
        pd.DataFrame: A new DataFrame containing rows with dates within the specified range.
    """
    # Convert the 'Date' column to datetime if not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the start and end dates for January
    start_date = pd.Timestamp('2023-01-01')
    end_date = pd.Timestamp('2023-01-31')
    
    # Use boolean indexing to select rows with dates within the range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    
    return selected_rows

# Create a sample DataFrame with 'Date' column
data = {'Date': pd.date_range(start='2022-12-25', periods=15)}
df = pd.DataFrame(data)

# Show the original dataframe
print("The Original DataFrame:")
print(df)

# Select rows within the date range using the function
selected_df = select_dates(df)

# Print the DataFrame with selected rows
print("\nSelected Rows:")
print(selected_df)


The Original DataFrame:
         Date
0  2022-12-25
1  2022-12-26
2  2022-12-27
3  2022-12-28
4  2022-12-29
5  2022-12-30
6  2022-12-31
7  2023-01-01
8  2023-01-02
9  2023-01-03
10 2023-01-04
11 2023-01-05
12 2023-01-06
13 2023-01-07
14 2023-01-08

Selected Rows:
         Date
7  2023-01-01
8  2023-01-02
9  2023-01-03
10 2023-01-04
11 2023-01-05
12 2023-01-06
13 2023-01-07
14 2023-01-08
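
One caveat when the 'Date' column holds timestamps with a time of day: `pd.Timestamp('2023-01-31')` is midnight at the start of January 31, so a `<=` comparison against it excludes later times on that day. A sketch of a half-open range that keeps the whole day follows; the function name and sample timestamps are assumptions for illustration.

```python
import pandas as pd

def select_january_2023(df):
    """Return rows whose 'Date' falls anywhere in January 2023, times included."""
    dates = pd.to_datetime(df['Date'])
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')  # first instant after the range
    return df[(dates >= start) & (dates < end_exclusive)]

stamps = pd.DataFrame({'Date': ['2022-12-31 23:59', '2023-01-01 00:00',
                                '2023-01-31 18:30', '2023-02-01 00:00']})
january = select_january_2023(stamps)
print(january)
```

Converting into a local variable rather than assigning back to `df['Date']` also avoids changing the input DataFrame.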


**Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?**

The library that must be imported before any pandas function can be used is pandas itself, conventionally with the line `import pandas as pd`. It provides the data structures (such as Series and DataFrame) and the functions for data manipulation and analysis that every answer above relies on.