## Ans 1

Here are five functions of the Pandas library, each with an example of how to use it:
* 1: read_csv(): This function is used to read data from a CSV file and create a DataFrame object in Pandas.

In [None]:
import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Display the first five rows of the DataFrame
print(df.head())

* 2: fillna(): This function is used to fill missing values in a DataFrame with a specified value or method.

In [48]:
import pandas as pd
import numpy as np

# Create a DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, 7, 8]})

# Fill missing values with 0
df.fillna(0, inplace=True)

# Display the DataFrame
print(df)


     A    B
0  1.0  5.0
1  2.0  0.0
2  0.0  7.0
3  4.0  8.0
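
Besides a single constant, `fillna()` also accepts a dictionary of per-column fill values, and `ffill()` carries the last valid value forward. The following is a minimal sketch on the same sample data; filling 'A' with 0 and 'B' with its column mean is only an illustrative assumption, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4],
                   'B': [5, np.nan, 7, 8]})

# Fill each column with its own value: 0 for 'A', the column mean for 'B'
per_column = df.fillna({'A': 0, 'B': df['B'].mean()})

# Forward fill: propagate the last valid value down each column
forward = df.ffill()

print(per_column)
print(forward)
```

Both calls return new DataFrames, so the original `df` keeps its missing values.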


* 3: groupby(): This function is used to group rows of a DataFrame based on one or more columns, and then perform some computation on each group.

In [49]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
                   'Score': [85, 90, 75, 80, 95, 70]})

# Group the DataFrame by 'Name' column and calculate the mean of each group's 'Score'
grouped = df.groupby('Name').mean()

# Display the grouped DataFrame
print(grouped)


         Score
Name          
Alice     82.5
Bob       92.5
Charlie   72.5
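
The "computation on each group" does not have to be a single statistic. As a small sketch on the same data, selecting the 'Score' column first and passing a list to `agg()` computes several statistics per group at once; the variable name `summary` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
                   'Score': [85, 90, 75, 80, 95, 70]})

# Select the column first, then compute several statistics per group
summary = df.groupby('Name')['Score'].agg(['mean', 'min', 'max'])

print(summary)
```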


* 4: concat(): This function is used to concatenate two or more DataFrames along a specified axis.

In [50]:
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# Concatenate the two DataFrames along the row axis (axis=0)
concatenated = pd.concat([df1, df2], axis=0)

# Display the concatenated DataFrame
print(concatenated)


   A  B
0  1  4
1  2  5
2  3  6
0  4  7
1  5  8
2  6  9
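
Note that the result above keeps each input's own row labels, so 0, 1, and 2 appear twice. If unique labels are wanted, one option is `ignore_index=True`, sketched here on the same two DataFrames (the name `renumbered` is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [7, 8, 9]})

# Discard the original row labels and number the result 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

print(renumbered)
```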


* 5: pivot_table(): This function is used to create a pivot table from a DataFrame.

In [51]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob', 'Charlie'],
                   'Subject': ['Math', 'Math', 'Math', 'Science', 'Science', 'Science'],
                   'Score': [85, 90, 75, 80, 95, 70]})

# Create a pivot table from the DataFrame, with 'Name' as rows and 'Subject' as columns
# The string 'mean' avoids depending on NumPy, which this cell does not import
pivot_table = pd.pivot_table(df, values='Score', index='Name', columns='Subject', aggfunc='mean')

# Display the pivot table
print(pivot_table)


Subject  Math  Science
Name                  
Alice      85       80
Bob        90       95
Charlie    75       70


## Ans 2

Here's an example Python function that re-indexes a Pandas DataFrame with a new index that starts from 1 and increments by 2 for each row:

In [52]:
import pandas as pd

def reindex_dataframe(df):
    # Create a new index that starts from 1 and increments by 2 for each row
    new_index = pd.Index(range(1, 2*len(df)+1, 2))
    
    # Assign the new labels to a copy; df.reindex() would instead look up
    # these labels in the old index and fill the unmatched rows with NaN
    df = df.copy()
    df.index = new_index
    
    # Return the reindexed DataFrame
    return df


In [53]:
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Reindex the DataFrame using the reindex_dataframe() function
df = reindex_dataframe(df)

# Display the reindexed DataFrame
print(df)


   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9


This outputs a DataFrame with the same data as the original, but with a new index that starts from 1 and increments by 2 for each row.

## Ans 3

Here's an example Python function that iterates over a Pandas DataFrame and calculates the sum of the first three values in the 'Values' column:

In [54]:
import pandas as pd

def calculate_sum(df):
    # Initialize a variable to store the sum
    total = 0
    
    # Iterate over the first three rows in the 'Values' column and calculate the sum
    for val in df['Values'].iloc[:3]:
        total += val
    
    # Print the sum to the console
    print('The sum of the first three values in the "Values" column is:', total)


You can call this function by passing your DataFrame as an argument:

In [55]:
# Create a sample DataFrame
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})

# Calculate the sum using the calculate_sum() function
calculate_sum(df)


The sum of the first three values in the "Values" column is: 6
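
The explicit loop follows the wording of the question. For comparison, here is a loop-free sketch of the same calculation; the function name `sum_first_three` is hypothetical, and it returns the total instead of printing it.

```python
import pandas as pd

def sum_first_three(df):
    # head(3) takes at most the first three rows; sum() adds them up
    return df['Values'].head(3).sum()

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
total = sum_first_three(df)

print('The sum of the first three values in the "Values" column is:', total)
```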


## Ans 4
Here is a Python function that takes a Pandas DataFrame df with a column ‘Text’ as input and returns a new DataFrame with an additional column ‘Word_Count’ that contains the number of words in each row of the ‘Text’ column:

In [56]:
import pandas as pd

def add_word_count(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

# Create a sample DataFrame
data = {'Text': ['This is a sentence', 'This is another sentence', 'This is yet another sentence']}
df = pd.DataFrame(data)

# Call the add_word_count function
df = add_word_count(df)

# Display the resulting DataFrame
print(df)

                           Text  Word_Count
0            This is a sentence           4
1      This is another sentence           4
2  This is yet another sentence           5
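
One detail of the `apply` version: `str(x)` turns a missing value into the string 'nan', which is then counted as one word. As an alternative sketch, the `.str` accessor leaves missing values as NaN. The function name `add_word_count_vectorized` and the sample row with a missing value are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

def add_word_count_vectorized(df):
    # Work on a copy so the input DataFrame is left unchanged
    out = df.copy()
    # Split on whitespace, then take the length of each resulting list
    out['Word_Count'] = out['Text'].str.split().str.len()
    return out

df = pd.DataFrame({'Text': ['This is a sentence', np.nan,
                            'This is yet another sentence']})
result = add_word_count_vectorized(df)

print(result)
```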


## Ans 5
DataFrame.size and DataFrame.shape are two attributes of a Pandas DataFrame that provide information about its size.

DataFrame.size returns the total number of elements in the DataFrame. This is equal to the product of the number of rows and the number of columns.

DataFrame.shape returns a tuple representing the dimensions of the DataFrame. The first element of the tuple is the number of rows and the second element is the number of columns.

Here’s an example to illustrate the difference between DataFrame.size and DataFrame.shape:


In [57]:
import pandas as pd

# Create a sample DataFrame
data = {'A': [1, 2], 'B': [3, 4]}
df = pd.DataFrame(data)

# Get the size and shape of the DataFrame
size = df.size
shape = df.shape

print(f"Size: {size}")
print(f"Shape: {shape}")

Size: 4
Shape: (2, 2)
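
With a square DataFrame the relationship between the two attributes is easy to miss, so here is a small sketch with three rows and two columns (sample values chosen only for illustration) that checks that the size is the product of the shape's entries:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (number of rows, number of columns)
rows, cols = df.shape

print(f"Shape: {df.shape}")
print(f"Size: {df.size}")
print(f"rows * cols == size: {rows * cols == df.size}")
```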


## Ans 6
You can use the pandas.read_excel function to read the contents of an Excel file into a Pandas DataFrame. This function takes the path to the Excel file as its first argument and returns a DataFrame containing the data from the file.
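
A minimal sketch of how this might look follows. The helper name `load_excel` and the file name `example.xlsx` are hypothetical. Because no Excel file accompanies this answer, the sketch first writes a small one to a temporary folder and then reads it back. Reading and writing `.xlsx` files relies on an optional engine such as openpyxl, so the round trip is skipped if none is installed.

```python
import os
import tempfile

import pandas as pd

def load_excel(path, sheet_name=0):
    # sheet_name=0 selects the first sheet; a sheet name string also works
    return pd.read_excel(path, sheet_name=sheet_name)

original = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.xlsx')  # hypothetical file name
    try:
        original.to_excel(path, index=False)  # create a file to read
        loaded = load_excel(path)
    except ImportError:
        # No Excel engine (for example openpyxl) is installed
        loaded = None

print(loaded)
```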

## Ans 7
Here is a Python function that takes a Pandas DataFrame df with a column ‘Email’ as input and returns a new DataFrame with an additional column ‘Username’ that contains only the username part of each email address:

In [58]:
import pandas as pd

def add_username(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['Username'] = df['Email'].apply(lambda x: str(x).split('@')[0])
    return df

# Create a sample DataFrame
data = {'Email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']}
df = pd.DataFrame(data)

# Call the add_username function
df = add_username(df)

# Display the resulting DataFrame
print(df)

                 Email Username
0    alice@example.com    alice
1      bob@example.com      bob
2  charlie@example.com  charlie
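
As with any `str(x)` conversion, a missing email would become the username 'nan' in the version above. A possible alternative, sketched here under the assumption that missing values should stay missing, uses the `.str` accessor. The function name `add_username_vectorized` and the sample row with a missing email are illustrative.

```python
import numpy as np
import pandas as pd

def add_username_vectorized(df):
    # Work on a copy so the input DataFrame is left unchanged
    out = df.copy()
    # Split once at the first '@' and keep the part before it
    out['Username'] = out['Email'].str.split('@', n=1).str[0]
    return out

df = pd.DataFrame({'Email': ['alice@example.com', np.nan,
                             'charlie@example.com']})
result = add_username_vectorized(df)

print(result)
```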


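## Ans 8
Here is a Python function that takes a Pandas DataFrame df with columns ‘A’, ‘B’, and ‘C’ as input and returns only the rows where the value in column ‘A’ is greater than 5 and the value in column ‘B’ is less than 10:
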

In [59]:
import pandas as pd

def select_rows(df):
    df = df[(df['A'] > 5) & (df['B'] < 10)]
    return df

# Create a sample DataFrame
data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)

# Call the select_rows function
df = select_rows(df)

# Display the resulting DataFrame
print(df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
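
The same filter can also be written with `DataFrame.query()`, which takes the condition as a string. This is only an alternative sketch on the same sample data; the variable name `selected` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})

# Column names can be used directly inside the query string
selected = df.query('A > 5 and B < 10')

print(selected)
```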


## Ans 9
Here is a Python function that takes a Pandas DataFrame df with a column ‘Values’ as input and returns the mean, median, and standard deviation of the values in the ‘Values’ column:

In [60]:
import pandas as pd

def calculate_statistics(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std = df['Values'].std()
    return mean, median, std

In [61]:
# Create a sample DataFrame
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Call the calculate_statistics function
mean, median, std = calculate_statistics(df)

# Display the resulting statistics
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Standard Deviation: {std}")

Mean: 3.0
Median: 3.0
Standard Deviation: 1.5811388300841898
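
The value 1.58 above is the sample standard deviation: Pandas' `std()` divides by n - 1 by default (`ddof=1`), whereas NumPy's `np.std` defaults to dividing by n. A short sketch of the difference on the same values:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Default: sample standard deviation (divides by n - 1)
sample_std = values.std()

# ddof=0: population standard deviation (divides by n)
population_std = values.std(ddof=0)

print(f"Sample std: {sample_std}")
print(f"Population std: {population_std}")
```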


## Ans 10

Here is a Python function that takes a Pandas DataFrame df with columns ‘Sales’ and ‘Date’ as input and returns a new DataFrame with an additional column ‘MovingAverage’ that contains the moving average of the sales over the current row and the six rows before it. This corresponds to the past 7 days only when the DataFrame has exactly one row per day and is sorted by date:

In [62]:
import pandas as pd

def add_moving_average(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

In [63]:
# Create a sample DataFrame
data = {'Date': ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08'], 'Sales': [1, 2, 3, 4, 5, 6, 7, 8]}
df = pd.DataFrame(data)

# Call the add_moving_average function
df = add_moving_average(df)

# Display the resulting DataFrame
print(df)

         Date  Sales  MovingAverage
0  2022-01-01      1            1.0
1  2022-01-02      2            1.5
2  2022-01-03      3            2.0
3  2022-01-04      4            2.5
4  2022-01-05      5            3.0
5  2022-01-06      6            3.5
6  2022-01-07      7            4.0
7  2022-01-08      8            5.0
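
`rolling(window=7)` counts rows, not calendar days. If some dates are missing, a time-based window may be closer to "the past 7 days". The following sketch assumes the ‘Date’ values can be parsed by `pd.to_datetime`; the function name `add_moving_average_by_date` and the sample data with a gap are hypothetical.

```python
import pandas as pd

def add_moving_average_by_date(df):
    # Work on a copy, parse the dates, and sort chronologically
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values('Date')

    # A '7D' window covers the 7 days ending at (and including) each date
    sales_by_date = out.set_index('Date')['Sales']
    out['MovingAverage'] = sales_by_date.rolling('7D').mean().to_numpy()
    return out

# Sample data with a gap between 2022-01-02 and 2022-01-10
df = pd.DataFrame({'Date': ['2022-01-01', '2022-01-02', '2022-01-10'],
                   'Sales': [1, 3, 5]})
result = add_moving_average_by_date(df)

print(result)
```

In this sample, the last row's average uses only its own value, because the two earlier dates fall outside its 7-day window. A row-based window of 7 would have averaged all three rows.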


## Ans 11
Here is a Python function that takes a Pandas DataFrame df with a column ‘Date’ as input and returns a new DataFrame with an additional column ‘Weekday’ that contains the weekday name corresponding to each date in the ‘Date’ column:

In [64]:
import pandas as pd

def add_weekday(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['Weekday'] = pd.to_datetime(df['Date']).dt.day_name()
    return df

You can use this function by calling add_weekday(df), where df is your input DataFrame.

This function uses the pandas.to_datetime function to convert the values in the ‘Date’ column to a datetime Series. The day_name method is then called through the Series’ .dt accessor to get the weekday name corresponding to each date.

Here’s an example that shows how to use the add_weekday function:

In [65]:
# Create a sample DataFrame
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)

# Call the add_weekday function
df = add_weekday(df)

# Display the resulting DataFrame
print(df)

         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday


## Ans 12
Here is a Python function that takes a Pandas DataFrame df with a column ‘Date’ as input and returns a new DataFrame that contains only the rows where the date is between ‘2023-01-01’ and ‘2023-01-31’:

In [66]:
import pandas as pd

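def select_rows(df):
    start_date = pd.Timestamp('2023-01-01')
    end_date = pd.Timestamp('2023-01-31')
    # Parse the column so the comparison is on real dates, not on strings
    dates = pd.to_datetime(df['Date'])
    mask = (dates >= start_date) & (dates <= end_date)
    df = df.loc[mask]
    return df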

# Create a sample DataFrame
data = {'Date': ['2022-12-31', '2023-01-01', '2023-01-15', '2023-01-31', '2023-02-01']}
df = pd.DataFrame(data)

# Call the select_rows function
df = select_rows(df)

# Display the resulting DataFrame
print(df)

         Date
1  2023-01-01
2  2023-01-15
3  2023-01-31


This code creates a sample DataFrame with a ‘Date’ column containing some example dates. The select_rows function is then called on this DataFrame to select only the rows where the date is between ‘2023-01-01’ and ‘2023-01-31’. The resulting DataFrame is then printed to the console.

## Ans 13
To use the basic functions of the Pandas library, you need to import the library itself with Python's import statement. It is common practice to import Pandas under the alias pd, which makes it easier to reference the library in your code. Here’s an example of how to import the Pandas library:

In [67]:
import pandas as pd