Q1. List any five functions of the pandas library with execution.

read_csv(): Used to read data from a CSV file and create a DataFrame.
head(): Returns the first n rows of a DataFrame (default n=5).
info(): Provides information about a DataFrame, including data types, non-null values, and memory usage.
groupby(): Used for grouping data based on one or more columns and applying aggregate functions.
describe(): Provides summary statistics of numerical columns in a DataFrame.

In [1]:
import pandas as pd

# Read data from a CSV file
df = pd.read_csv('data.csv')

print(df.head())


   name     email      phone no
0  yash  ynagpale  469282974092
1  kush  kush34@j       6489364
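
The cell above depends on a local file named data.csv. As a self-contained sketch, the same call can be tried on an in-memory buffer; the CSV text and the name df_csv below are made up for illustration and are not from the original data.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,score\nasha,82\nravi,91\nmeera,77\n"

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# Show only the first two rows
print(df_csv.head(2))
```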


In [2]:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

print(df.head(3))


   A  B
0  1  a
1  2  b
2  3  c


In [3]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
df = pd.DataFrame(data)

df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       3 non-null      int64 
 1   B       3 non-null      object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes


In [4]:
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'A'], 'Value': [10, 20, 15, 25, 30]}
df = pd.DataFrame(data)

grouped = df.groupby('Category')['Value'].sum()
print(grouped)


Category
A    55
B    45
Name: Value, dtype: int64


In [5]:
import pandas as pd

data = {'A': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

print(df['A'].describe())


count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
Name: A, dtype: float64


Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [6]:
import pandas as pd

def reindex_with_incrementing_values(df):
    # Create a new index starting from 1 and incrementing by 2
    new_index = range(1, len(df) * 2, 2)
    
    # Assign the new index to the DataFrame
    df.index = new_index
    
    return df

# Example usage:
data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)

# Call the function to re-index the DataFrame
df = reindex_with_incrementing_values(df)

# Print the DataFrame with the new index
print(df)


    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
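
The function above changes the index of the DataFrame that is passed in. If the caller's DataFrame should stay untouched, one possible variant (the name reindex_copy is hypothetical) builds the same 1, 3, 5, ... labels with pd.RangeIndex and returns a new object through set_axis.

```python
import pandas as pd

def reindex_copy(df):
    # Labels 1, 3, 5, ... with exactly len(df) entries
    new_index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    # set_axis returns a new DataFrame; the input keeps its original index
    return df.set_axis(new_index, axis=0)

original = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
reindexed = reindex_copy(original)
print(reindexed)
```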


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

In [7]:
import pandas as pd

def calculate_sum_of_first_three_values(df):
    # Check if the 'Values' column exists in the DataFrame
    if 'Values' in df.columns:
        # Get the first three values from the 'Values' column
        first_three_values = df['Values'].head(3)
        
        # Calculate the sum of the first three values
        sum_of_first_three = first_three_values.sum()
        
        # Print the sum to the console
        print("Sum of the first three values:", sum_of_first_three)
    else:
        print("The 'Values' column does not exist in the DataFrame.")

# Example usage:
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate and print the sum
calculate_sum_of_first_three_values(df)


Sum of the first three values: 60
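
The question asks for a function that iterates, while the solution above uses head(3).sum() without an explicit loop. A minimal sketch of a loop-based version follows; the function name is hypothetical, and returning the total in addition to printing it is an added convenience.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    total = 0
    # Walk through the column values and stop once three have been added
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print("Sum of the first three values:", total)
    return total

sum_first_three_by_iteration(pd.DataFrame({'Values': [10, 20, 30, 40, 50]}))
```

Both versions give the same result; the vectorized head(3).sum() is usually preferred in pandas code.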


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [8]:
import pandas as pd

def count_words(text):
    # Split the text into words using whitespace as a delimiter and count the words
    words = text.split()
    return len(words)

def add_word_count_column(df):
    # Apply the count_words function to each row in the 'Text' column
    df['Word_Count'] = df['Text'].apply(count_words)

    return df

# Example usage:
data = {'Text': ['This is a sample sentence.', 'Count the words here.', 'How many words are in this?']}
df = pd.DataFrame(data)

# Call the function to add the 'Word_Count' column
df = add_word_count_column(df)

# Print the DataFrame with the new 'Word_Count' column
print(df)


                          Text  Word_Count
0   This is a sample sentence.           5
1        Count the words here.           4
2  How many words are in this?           6
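
count_words calls text.split(), which fails if a row holds a missing value instead of a string. A possible alternative, sketched here with a hypothetical function name and sample data, uses the vectorized .str accessor and treats missing text as zero words (that choice is an assumption, not part of the question).

```python
import pandas as pd

def add_word_count_vectorized(df):
    result = df.copy()
    # str.split() with no arguments splits on runs of whitespace and
    # str.len() counts the pieces; missing text gives NaN, replaced by 0 here
    counts = result['Text'].str.split().str.len()
    result['Word_Count'] = counts.fillna(0).astype(int)
    return result

sample = pd.DataFrame({'Text': ['This is a sample sentence.', None, '']})
counted = add_word_count_vectorized(sample)
print(counted)
```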


Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape are both attributes, not methods, so they are accessed without parentheses (df.size, df.shape); calling df.size() or df.shape() raises a TypeError. Both describe the dimensions of a DataFrame, but they return different information:

DataFrame.size:

DataFrame.size returns the total number of elements (cells) in the DataFrame as a single integer.
It is the product of the number of rows and the number of columns.
For a DataFrame with dimensions (m, n), DataFrame.size returns m * n.

DataFrame.shape:

DataFrame.shape returns a tuple representing the dimensions of the DataFrame.
The tuple consists of two values: the number of rows (m) and the number of columns (n).
Unlike DataFrame.size, it keeps the row count and the column count separate.

In summary, DataFrame.size gives the total number of elements in the DataFrame, while DataFrame.shape gives the number of rows and the number of columns separately.
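
A short illustration of the difference, using a made-up 3 x 2 DataFrame named df_demo:

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape is a (rows, columns) tuple
print(df_demo.shape)

# size is the total number of cells, rows * columns
print(df_demo.size)

# The tuple can be unpacked to get each dimension on its own
rows, cols = df_demo.shape
print(rows * cols == df_demo.size)
```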

Q6. Which function of pandas do we use to read an excel file?

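pd.read_excel(), which reads a sheet of an Excel file into a DataFrame. Reading .xlsx files requires an engine package such as openpyxl to be installed.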

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

In [11]:
import pandas as pd

def extract_username(df):
    # Extract the username using a lambda function
    df['Username'] = df['Email'].apply(lambda email: email.split('@')[0])

    return df

# Example usage:
data = {'Email': ['user1@example.com', 'user2@test.com', 'user3@example.org']}
df = pd.DataFrame(data)

# Call the function to add the 'Username' column
df = extract_username(df)

# Print the DataFrame with the new 'Username' column
print(df)


               Email Username
0  user1@example.com    user1
1     user2@test.com    user2
2  user3@example.org    user3
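
The lambda above raises an error if an 'Email' cell is missing, because a missing value has no split method. One possible alternative is the vectorized .str accessor, which leaves missing values as missing. The function name and sample rows below are hypothetical.

```python
import pandas as pd

def extract_username_vectorized(df):
    result = df.copy()
    # Split each address on '@' and keep the first piece;
    # rows with a missing email stay missing in 'Username'
    result['Username'] = result['Email'].str.split('@').str[0]
    return result

emails = pd.DataFrame({'Email': ['user1@example.com', 'user2@test.com', None]})
with_usernames = extract_username_vectorized(emails)
print(with_usernames)
```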


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.

In [12]:
import pandas as pd

def select_rows_by_condition(df):
    # Use boolean indexing to select rows where 'A' > 5 and 'B' < 10
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]

    return selected_rows

# Example usage:
data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the function to select rows based on the condition
selected_df = select_rows_by_condition(df)

# Print the new DataFrame containing the selected rows
print(selected_df)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
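
The same filter can also be written with DataFrame.query, which takes the condition as a string. This is only an alternative spelling of the boolean-indexing approach above; the variable names are made up for the sketch.

```python
import pandas as pd

df_q = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Column names are referenced directly inside the query string
selected_q = df_q.query('A > 5 and B < 10')
print(selected_q)
```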


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [13]:
import pandas as pd

def calculate_statistics(df):
    # Calculate mean, median, and standard deviation
    mean_value = df['Values'].mean()
    median_value = df['Values'].median()
    std_deviation = df['Values'].std()

    return mean_value, median_value, std_deviation

# Example usage:
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Call the function to calculate statistics
mean, median, std = calculate_statistics(df)

# Print the calculated statistics
print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)


Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
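
Note that Series.std() computes the sample standard deviation (ddof=1) by default, which is why the result above is about 15.81. If the population standard deviation is wanted instead, ddof=0 can be passed, as in this small sketch:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50])

# Default: sample standard deviation, dividing by n - 1
sample_std = values.std()

# Population standard deviation, dividing by n
population_std = values.std(ddof=0)

print("Sample std:", sample_std)
print("Population std:", population_std)
```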


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [14]:
import pandas as pd

def calculate_moving_average(df):
    # Convert the 'Date' column to datetime if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Sort the DataFrame by date
    df = df.sort_values(by='Date')
    
    # Calculate the moving average over the current row and up to six preceding rows.
    # This equals the past 7 days (current day included) when there is one row per day.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    
    return df

# Example usage:
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07'],
        'Sales': [10, 20, 15, 30, 25, 40, 35]}
df = pd.DataFrame(data)

# Call the function to calculate the moving average
df = calculate_moving_average(df)

# Print the DataFrame with the new 'MovingAverage' column
print(df)


        Date  Sales  MovingAverage
0 2022-01-01     10      10.000000
1 2022-01-02     20      15.000000
2 2022-01-03     15      15.000000
3 2022-01-04     30      18.750000
4 2022-01-05     25      20.000000
5 2022-01-06     40      23.333333
6 2022-01-07     35      25.000000
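
rolling(window=7) counts rows, not calendar days, so it only corresponds to "the past 7 days" when every day has exactly one row. If dates can be missing, a time-based window is one option. The sketch below assumes that reading of the question; the function name and the sample data with a gap are hypothetical.

```python
import pandas as pd

def moving_average_by_calendar_days(df):
    result = df.copy()
    result['Date'] = pd.to_datetime(result['Date'])
    result = result.sort_values('Date')
    # A '7D' window on a datetime index covers the current day
    # and the six calendar days before it
    by_date = result.set_index('Date')['Sales']
    result['MovingAverage'] = by_date.rolling('7D').mean().to_numpy()
    return result

gappy = pd.DataFrame({'Date': ['2022-01-01', '2022-01-02', '2022-01-10'],
                      'Sales': [10, 20, 60]})
averaged = moving_average_by_calendar_days(gappy)
print(averaged)
```

With this data, the last row averages only its own value, because the earlier sales fall outside the 7-day window; a row-based window of 7 would have averaged all three rows.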


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

In [15]:
import pandas as pd

def add_weekday_column(df):
    # Convert the 'Date' column to datetime if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Use dt.strftime to get the weekday name and add it as 'Weekday' column
    df['Weekday'] = df['Date'].dt.strftime('%A')
    
    return df

# Example usage:
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the function to add the 'Weekday' column
df = add_weekday_column(df)

# Print the DataFrame with the new 'Weekday' column
print(df)


        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
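
pandas also offers dt.day_name(), which returns the weekday names directly without a format string. A brief sketch on two of the dates used above:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2023-01-01', '2023-01-02']))

# day_name() returns English weekday names by default
weekday_names = dates.dt.day_name()
print(weekday_names)
```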


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [16]:
import pandas as pd

def select_rows_in_date_range(df):
    # Convert the 'Date' column to datetime if it's not already
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the date range; the end bound is exclusive so that timestamps
    # later in the day on 2023-01-31 are still selected
    start_date = pd.to_datetime('2023-01-01')
    end_date = pd.to_datetime('2023-02-01')
    
    # Use boolean indexing to select rows within the date range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] < end_date)]
    
    return selected_rows

# Example usage:
data = {'Date': ['2023-01-01', '2023-01-15', '2023-02-05', '2023-01-10', '2023-03-20']}
df = pd.DataFrame(data)

# Call the function to select rows within the date range
selected_df = select_rows_in_date_range(df)

# Print the new DataFrame containing selected rows
print(selected_df)


        Date
0 2023-01-01
1 2023-01-15
3 2023-01-10
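
Because the question says the column contains timestamps, values may carry a time of day. A timestamp such as 2023-01-31 18:30 is later than midnight on 2023-01-31, so one way to keep the whole last day is a half-open range that ends just before 2023-02-01. The sample data and names below are made up for this sketch.

```python
import pandas as pd

stamps = pd.DataFrame({'Date': pd.to_datetime(['2023-01-31 18:30',
                                               '2023-02-01 00:00',
                                               '2023-01-01 00:00'])})

start = pd.Timestamp('2023-01-01')
end_exclusive = pd.Timestamp('2023-02-01')

# Keep rows with start <= Date < end_exclusive, covering all of January
in_january = stamps[(stamps['Date'] >= start) & (stamps['Date'] < end_exclusive)]
print(in_january)
```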


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

In [17]:
import pandas as pd