# **Q1**

**List any five functions of the pandas library with execution.**

**Answer:**

 * read_csv():


In [1]:
import pandas as pd

In [None]:
# Read a CSV file and create a DataFrame
df = pd.read_csv('data.csv')

# Print the DataFrame
print(df)
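The cell above needs a `data.csv` file on disk. A self-contained variant can pass `read_csv` an in-memory buffer instead. The CSV text below is made-up sample data, and `io.StringIO` stands in for the file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the contents of data.csv
csv_text = """Name,Age,City
John,25,New York
Emma,28,London
"""

# read_csv accepts any file-like object, so an in-memory buffer works like a path
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)
```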

 * dropna():

In [2]:
data = {'Name': ['John', 'Emma', 'Peter', None],
        'Age': [25, 28, None, 30],
        'City': ['New York', 'London', 'Paris', '']}
df = pd.DataFrame(data)

# Drop rows with missing values
df_dropped = df.dropna()

# Print the DataFrame after dropping missing values
print(df_dropped)

   Name   Age      City
0  John  25.0  New York
1  Emma  28.0    London
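The empty string in 'City' is not treated as missing, so only the rows containing `None`/`NaN` were dropped. The sketch below uses the same data to show two common variations. The first restricts the check to specific columns with `subset`. The second converts empty strings to `NaN` first so that they also count as missing.

```python
import numpy as np
import pandas as pd

data = {'Name': ['John', 'Emma', 'Peter', None],
        'Age': [25, 28, None, 30],
        'City': ['New York', 'London', 'Paris', '']}
df_missing = pd.DataFrame(data)

# Only drop rows where 'Age' is missing; a missing name is tolerated
age_only = df_missing.dropna(subset=['Age'])
print(age_only)

# Treat empty strings as missing, then drop any row with a missing value
cleaned = df_missing.replace('', np.nan).dropna()
print(cleaned)
```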


 * groupby():

In [3]:
# Group the DataFrame by 'City' and calculate the average age in each city
grouped_df = df.groupby('City')['Age'].mean()

# Print the grouped result (a Series of mean ages indexed by City)
print(grouped_df)

City
            30.0
London      28.0
New York    25.0
Paris        NaN
Name: Age, dtype: float64
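`groupby` can also compute several aggregations at once with `agg`. The following sketch runs on the same data. Here `count` reports how many non-missing ages each city has, which explains the `NaN` mean for Paris: that group has a count of 0.

```python
import pandas as pd

data = {'Name': ['John', 'Emma', 'Peter', None],
        'Age': [25, 28, None, 30],
        'City': ['New York', 'London', 'Paris', '']}
df_people = pd.DataFrame(data)

# Mean age and number of non-missing ages per city
summary = df_people.groupby('City')['Age'].agg(['mean', 'count'])
print(summary)
```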


 * merge():

In [4]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Emma', 'Peter']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['London', 'Paris', 'Sydney']})

# Merge the DataFrames based on the 'ID' column
merged_df = pd.merge(df1, df2, on='ID')

# Print the merged DataFrame
print(merged_df)

   ID   Name    City
0   2   Emma  London
1   3  Peter   Paris
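By default `pd.merge` performs an inner join, so IDs 1 and 4 are dropped because each appears in only one frame. Passing `how='outer'` keeps every ID. Adding `indicator=True` creates a `_merge` column that records which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Emma', 'Peter']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['London', 'Paris', 'Sydney']})

# Keep all IDs from both frames; unmatched cells become NaN
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer_df)
```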


 * to_csv():

In [5]:
# Write the DataFrame to a CSV file
df.to_csv('output.csv', index=False)
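If no path is given, `to_csv` returns the CSV text as a string instead of writing a file. This makes it easy to inspect what would be written. `index=False` omits the row labels, as in the cell above.

```python
import pandas as pd

df_small = pd.DataFrame({'ID': [1, 2], 'Name': ['John', 'Emma']})

# With no path argument, to_csv returns the CSV content as a str
csv_output = df_small.to_csv(index=False)
print(csv_output)
```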

# **Q2**

**Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.**


**Answer:**



In [6]:
def reindex_dataframe(df):
    new_index = pd.Index(range(1, len(df) * 2, 2))  # Create a new index starting from 1 and incrementing by 2
    df = df.set_index(new_index)  # Set the new index on the DataFrame
    return df

# Example usage
data = {'A': [10, 20, 30],
        'B': [40, 50, 60],
        'C': [70, 80, 90]}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Re-index the DataFrame
df_reindexed = reindex_dataframe(df)

# Print the re-indexed DataFrame
print("\nRe-indexed DataFrame:")
print(df_reindexed)

Original DataFrame:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90

Re-indexed DataFrame:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
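`DataFrame.reindex` does something different from the relabeling above. It conforms the DataFrame to the given labels: rows whose existing labels match are kept, and the remaining labels are filled with `NaN`. The comparison below uses the same example data. `set_axis` is used here as an alternative way to relabel the rows.

```python
import pandas as pd

df_q2 = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_index = range(1, len(df_q2) * 2, 2)  # 1, 3, 5

# Relabel: same rows in the same order, new labels
relabeled = df_q2.set_axis(new_index, axis=0)

# Conform: look up labels 1, 3, 5 in the old index (only label 1 exists)
conformed = df_q2.reindex(new_index)
print(relabeled)
print(conformed)
```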


# **Q3**

**You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.**


**Answer:**



In [7]:
def calculate_sum(df):
    sum_values = 0
    count = 0

    for index, row in df.iterrows():
        if count < 3:
            sum_values += row['Values']
            count += 1
        else:
            break

    print("Sum of the first three values:", sum_values)

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Calculate the sum of the first three values
calculate_sum(df)


Original DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50
Sum of the first three values: 60
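The loop meets the requirement to iterate, but `iterrows` is slow on large frames. Outside of this exercise, the same sum is usually computed without an explicit loop, by slicing the first three positions.

```python
import pandas as pd

df_values = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})

# iloc[:3] selects the first three rows by position, regardless of index labels
first_three_sum = df_values['Values'].iloc[:3].sum()
print("Sum of the first three values:", first_three_sum)
```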


# **Q4**

**Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.**

**Answer:**



In [8]:
def add_word_count_column(df):
    # Split each text on whitespace and count the pieces; missing text counts as 0 words
    df['Word_Count'] = df['Text'].fillna('').apply(lambda x: len(str(x).split()))
    return df

# Example usage
data = {'Text': ['Hello, how are you?', 'I am doing well.', 'Python programming is fun!']}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Add the 'Word_Count' column
df_with_word_count = add_word_count_column(df)

# Print the DataFrame with the 'Word_Count' column
print("\nDataFrame with Word_Count column:")
print(df_with_word_count)

Original DataFrame:
                         Text
0         Hello, how are you?
1            I am doing well.
2  Python programming is fun!

DataFrame with Word_Count column:
                         Text  Word_Count
0         Hello, how are you?           4
1            I am doing well.           4
2  Python programming is fun!           4
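A vectorized alternative uses the `.str` accessor. Called with no argument, `str.split()` splits on runs of whitespace, and `str.len()` counts the resulting list items. The `fillna('')` step is an addition in this sketch: it makes missing text count as 0 words instead of producing `NaN`. As with the `apply` version, punctuation stays attached to words.

```python
import pandas as pd

df_text = pd.DataFrame({'Text': ['Hello, how are you?', 'I am doing well.', None]})

# Split on whitespace and count the pieces; missing text becomes '' -> 0 words
df_text['Word_Count'] = df_text['Text'].fillna('').str.split().str.len()
print(df_text)
```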



# **Q5**

**How are DataFrame.size() and DataFrame.shape() different?**

**Answer:**

DataFrame.size: This attribute returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns (the product of the two values in `df.shape`). It counts every cell regardless of data type, including cells that hold missing values (NaN).

DataFrame.shape: This attribute returns a tuple (nrows, ncols) giving the dimensions of the DataFrame, where nrows is the number of rows and ncols is the number of columns.

Both are attributes (properties), not methods, so they are accessed without parentheses: `df.size` and `df.shape`. Calling `df.size()` raises a `TypeError` because an integer is not callable.

In [9]:
data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Size of the DataFrame
print("Size of the DataFrame:", df.size)

# Shape of the DataFrame
print("Shape of the DataFrame:", df.shape)

Size of the DataFrame: 9
Shape of the DataFrame: (3, 3)
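The relationship between the two can be checked directly. `size` equals the product of the two `shape` values, and `len(df)` equals the row count. `size` also counts missing cells, whereas `count()` counts only non-missing ones.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

nrows, ncols = df_nan.shape          # shape is a (rows, columns) tuple
print(df_nan.size, nrows * ncols)    # size counts every cell, NaN included
print(len(df_nan))                   # number of rows
print(df_nan.count().sum())          # count() skips NaN cells
```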


# **Q6**

**Which function of pandas do we use to read an excel file?**

**Answer:**

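To read an Excel file in pandas, use the pandas.read_excel() function, which reads data from an Excel workbook and returns a DataFrame. It reads the first sheet by default; the sheet_name parameter selects a different sheet by name or position. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.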


In [None]:
import pandas as pd

# Read the Excel file
df = pd.read_excel('data.xlsx')

# Print the DataFrame
print(df)

# **Q7**

**You have a Pandas DataFrame df with a column named 'Email' that holds email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.**

**Answer:**



In [12]:
def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

data = {'Email': ['john.doe@example.com', 'jane.smith@example.com', 'james.brown@example.com']}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Extract the username and create the 'Username' column
df_with_username = extract_username(df)

# Print the DataFrame with the 'Username' column
print("\nDataFrame with Username column:")
print(df_with_username)

Original DataFrame:
                     Email
0     john.doe@example.com
1   jane.smith@example.com
2  james.brown@example.com

DataFrame with Username column:
                     Email     Username
0     john.doe@example.com     john.doe
1   jane.smith@example.com   jane.smith
2  james.brown@example.com  james.brown
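If the domain is also needed, `str.split` with `n=1` and `expand=True` returns both parts as separate columns in one step. `n=1` splits only at the first '@'. The sketch below uses a subset of the sample addresses above.

```python
import pandas as pd

df_email = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com']})

# expand=True returns a DataFrame with one column per split part (0 and 1)
parts = df_email['Email'].str.split('@', n=1, expand=True)
df_email['Username'] = parts[0]
df_email['Domain'] = parts[1]
print(df_email)
```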


# **Q8**

**You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.**

**Answer:**



In [13]:
def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

# Example usage
data = {'A': [3, 8, 6, 2, 9],
        'B': [5, 2, 9, 3, 1],
        'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Select rows based on conditions and create a new DataFrame
selected_df = select_rows(df)

# Print the selected DataFrame
print("\nSelected DataFrame:")
print(selected_df)

Original DataFrame:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Selected DataFrame:
   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
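The same filter can be written with `DataFrame.query`, which takes the condition as a string expression and returns the same rows as the boolean mask. In the mask version, the parentheses around each comparison are required because `&` binds more tightly than `>` and `<`.

```python
import pandas as pd

df_abc = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                       'B': [5, 2, 9, 3, 1],
                       'C': [1, 7, 4, 5, 2]})

# query uses 'and' in place of the element-wise & operator
selected_q = df_abc.query('A > 5 and B < 10')
selected_mask = df_abc[(df_abc['A'] > 5) & (df_abc['B'] < 10)]
print(selected_q)
```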


# **Q9**

**Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.**

**Answer:**



In [14]:
def calculate_stats(df, column):
    mean_value = df[column].mean()
    median_value = df[column].median()
    std_value = df[column].std()
    return mean_value, median_value, std_value

# Example usage
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Calculate mean, median, and standard deviation
mean, median, std = calculate_stats(df, 'Values')

# Print the calculated statistics
print("\nMean:", mean)
print("Median:", median)
print("Standard Deviation:", std)

Original DataFrame:
   Values
0      10
1      20
2      30
3      40
4      50

Mean: 30.0
Median: 30.0
Standard Deviation: 15.811388300841896
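`Series.std()` returns the sample standard deviation by default (`ddof=1`, dividing by n - 1), which is where the 15.81 above comes from. Pass `ddof=0` to get the population standard deviation instead. `agg` can also return all three statistics in a single call.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], name='Values')

# Sample (ddof=1, the default) vs population (ddof=0) standard deviation
sample_std = values.std()
population_std = values.std(ddof=0)

# All three statistics at once, returned as a Series indexed by statistic name
stats = values.agg(['mean', 'median', 'std'])
print(stats)
print(population_std)
```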


# **Q10**

**Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.**

**Answer:**



In [15]:
def calculate_moving_average(df):
    window_size = 7
    df['MovingAverage'] = df['Sales'].rolling(window_size, min_periods=1).mean()
    return df

# Example usage
data = {'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        'Date': pd.date_range('2023-01-01', periods=10)}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Calculate the moving average and create the 'MovingAverage' column
df_with_ma = calculate_moving_average(df)

# Print the DataFrame with the 'MovingAverage' column
print("\nDataFrame with MovingAverage column:")
print(df_with_ma)


Original DataFrame:
   Sales       Date
0     10 2023-01-01
1     20 2023-01-02
2     30 2023-01-03
3     40 2023-01-04
4     50 2023-01-05
5     60 2023-01-06
6     70 2023-01-07
7     80 2023-01-08
8     90 2023-01-09
9    100 2023-01-10

DataFrame with MovingAverage column:
   Sales       Date  MovingAverage
0     10 2023-01-01           10.0
1     20 2023-01-02           15.0
2     30 2023-01-03           20.0
3     40 2023-01-04           25.0
4     50 2023-01-05           30.0
5     60 2023-01-06           35.0
6     70 2023-01-07           40.0
7     80 2023-01-08           50.0
8     90 2023-01-09           60.0
9    100 2023-01-10           70.0
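`rolling(7)` counts rows, so it equals "the past 7 days" only when the frame has exactly one row per day, with no gaps, sorted by date. When dates can be missing or irregular, a time-based window such as `'7D'` on a datetime index covers the current day and the six days before it, regardless of how many rows that is. The function name and the gap in the sample dates below are hypothetical, chosen to show the difference.

```python
import pandas as pd

def calculate_time_based_moving_average(df):
    # A time-based rolling window requires the dates to be sorted
    df = df.sort_values('Date')
    sales = df.set_index('Date')['Sales']
    # '7D' covers (current date - 7 days, current date]: today plus the previous 6 days
    df['MovingAverage'] = sales.rolling('7D').mean().to_numpy()
    return df

sample = pd.DataFrame({'Sales': [10, 20, 30],
                       'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10'])})
result = calculate_time_based_moving_average(sample)
print(result)
```

With a row-based window of 7, the last row would average all three sales (20.0). The time-based window gives 30.0 because the two January entries fall outside the 7-day span ending on 2023-01-10.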


# **Q11**

**You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.**

**For example, if df contains the following values:**

                Date
        0 2023-01-01
        1 2023-01-02
        2 2023-01-03
        3 2023-01-04
        4 2023-01-05

**Your function should create the following DataFrame:**

                Date    Weekday
        0 2023-01-01     Sunday
        1 2023-01-02     Monday
        2 2023-01-03    Tuesday
        3 2023-01-04  Wednesday
        4 2023-01-05   Thursday

**The function should return the modified DataFrame.**

**Answer:**



In [23]:
def add_weekday_column(df):
    # dt.day_name() returns the weekday name (e.g. 'Sunday') for each date
    df['Weekday'] = df['Date'].dt.day_name()
    return df

data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])}
df = add_weekday_column(pd.DataFrame(data))
print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
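The `.dt` accessor exposes several related date parts:

- `dt.day` is the day of the month.
- `dt.dayofweek` is the weekday as an integer (Monday=0, Sunday=6).
- `dt.day_name()` is the weekday name.

A small comparison:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05']))

day_of_month = dates.dt.day          # 1, 2, 5
weekday_num = dates.dt.dayofweek     # Monday=0 ... Sunday=6
weekday_name = dates.dt.day_name()   # 'Sunday', 'Monday', 'Thursday'
print(pd.DataFrame({'Date': dates, 'day': day_of_month,
                    'dayofweek': weekday_num, 'day_name': weekday_name}))
```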


# **Q12**

**Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.**

**Answer:**



In [27]:
def select_rows_by_date(df, start_date, end_date):
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    selected_rows = df[mask]
    return selected_rows

# Example usage
data = {'Date': pd.to_datetime(['2023-01-01', '2023-01-15', '2023-01-30', '2023-02-05'])}
df = pd.DataFrame(data)

# Print the original DataFrame
print("Original DataFrame:")
print(df)

# Select rows between '2023-01-01' and '2023-01-31'
start_date = '2023-01-01'
end_date = '2023-01-31'
selected_df = select_rows_by_date(df, start_date, end_date)

# Print the selected DataFrame
print("\nSelected DataFrame:")
print(selected_df)

Original DataFrame:
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-30
3 2023-02-05

Selected DataFrame:
        Date
0 2023-01-01
1 2023-01-15
2 2023-01-30


# **Q13**

**To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?**

**Answer:**

To use the basic functions of pandas, the first and foremost necessary library that needs to be imported is the pandas library itself.


In [None]:
import pandas as pd

By importing pandas with the alias 'pd', you can access the various functions and classes provided by the pandas library. This includes functions for data manipulation, analysis, and data structure creation, such as creating DataFrames, performing operations on columns, applying filters, and much more.