## Q1. List any five functions of the pandas library with execution.

1. read_csv(): Reads a CSV file into a DataFrame.

2. head(): Returns the first n rows of a DataFrame (5 by default).

3. groupby(): Groups the rows of a DataFrame by one or more columns so that an aggregation can be applied to each group.

4. fillna(): Replaces missing values (NaN) in a DataFrame with a specified value.

5. plot(): Creates plots from the data in a DataFrame (uses Matplotlib by default).

6. merge(): Joins two DataFrames on one or more common columns or on their indexes.
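Since the question asks for execution, here is a minimal sketch that runs most of the functions listed above. The sample data, the column names, and the variable names (`csv_text`, `sales`, `prices`) are assumptions made up for illustration, and an in-memory buffer stands in for a CSV file on disk. plot() is left out because it needs Matplotlib and a display.

```python
import io
import pandas as pd

# Hypothetical sample data; the second row has a missing 'units' value
csv_text = """city,product,units
Pune,pen,10
Pune,book,
Delhi,pen,7
Delhi,book,3
"""

# read_csv(): parse CSV text into a DataFrame
sales = pd.read_csv(io.StringIO(csv_text))

# head(): look at the first two rows
print(sales.head(2))

# fillna(): replace the missing 'units' value with 0
sales["units"] = sales["units"].fillna(0)

# groupby(): total units per city
units_per_city = sales.groupby("city")["units"].sum()
print(units_per_city)

# merge(): attach a price to each row through the shared 'product' column
prices = pd.DataFrame({"product": ["pen", "book"], "price": [5, 50]})
merged = sales.merge(prices, on="product")
print(merged)
```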

## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [19]:
import pandas as pd

def reindex_df(df):
    new_index = range(1, len(df)*2, 2)
    df.index = new_index
    return df



df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
print(df)

df = reindex_df(df)
df


   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12


   A  B   C
1  1  5   9
3  2  6  10
5  3  7  11
7  4  8  12
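Assigning to df.index changes the DataFrame that was passed in. If the caller's DataFrame should stay untouched, one possible variant is sketched below; the function name `reindex_copy` and the variable names are assumptions for illustration.

```python
import pandas as pd

def reindex_copy(df):
    # Odd labels 1, 3, 5, ... with exactly one label per row
    new_index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)
    # set_axis returns a new DataFrame and leaves the input as it was
    return df.set_axis(new_index, axis=0)

original = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
result = reindex_copy(original)
print(list(result.index))    # [1, 3, 5, 7]
print(list(original.index))  # [0, 1, 2, 3]
```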


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [17]:
import pandas as pd

def sum_first_three(df):
    values_sum = 0
    # iterate over the first three rows by position (fewer if df is shorter)
    for value in df['Values'].iloc[:3]:
        values_sum += value
    print("Sum of the first three values in 'Values' column: ", values_sum)

df = pd.DataFrame()
df['Values']= [10, 20, 30, 40, 50]
sum_first_three(df)

Sum of the first three values in 'Values' column:  60
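The question asks for iteration, but the same sum can also be taken without an explicit loop. The sketch below selects by position, so it does not rely on the index labels being 0, 1, 2. The function name `sum_first_three_vectorized` and the sample frame `shuffled` are assumptions for illustration.

```python
import pandas as pd

def sum_first_three_vectorized(df):
    # iloc[:3] selects by position, so index labels do not matter;
    # it simply takes fewer rows if the DataFrame has fewer than three
    return df['Values'].iloc[:3].sum()

# Hypothetical frame whose index does not start at 0
shuffled = pd.DataFrame({'Values': [10, 20, 30, 40, 50]}, index=[5, 6, 7, 8, 9])
total = sum_first_three_vectorized(shuffled)
print(total)  # 60
```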


## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [22]:
import pandas as pd

df = pd.DataFrame({'Text': ['My Name Is Pradosh', 'I am Learning Data Science Masters','Learing at Pwskills']})
def add_word_count_column(df, column_name):
    df['Word_Count'] = df[column_name].apply(lambda x: len(str(x).split()))
    return df
df = add_word_count_column(df, 'Text')
df

                                 Text  Word_Count
0                  My Name Is Pradosh           4
1  I am Learning Data Science Masters           6
2                 Learing at Pwskills           3
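One caveat with converting each value through str(x): a missing value becomes the text "nan" or "None" and is counted as one word. A possible alternative, sketched under the assumption that missing text should count as zero words, uses the vectorized string methods. The function name `add_word_count_vectorized` and the sample frame `texts` are made up for illustration.

```python
import pandas as pd

def add_word_count_vectorized(df, column_name):
    out = df.copy()
    # str.split() yields a list of words per row (missing values stay missing),
    # and str.len() then counts the items in each list
    counts = out[column_name].str.split().str.len()
    # Assumption: a missing text counts as zero words
    out['Word_Count'] = counts.fillna(0).astype(int)
    return out

texts = pd.DataFrame({'Text': ['My Name Is Pradosh', None, 'two words']})
counted = add_word_count_vectorized(texts, 'Text')
print(counted['Word_Count'].tolist())  # [4, 0, 2]
```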


## Q5. How are DataFrame.size() and DataFrame.shape() different?

DataFrame.size and DataFrame.shape are both attributes (properties) in pandas, not methods, so they are accessed without parentheses. Both describe the dimensions of a DataFrame, but they return different information.

DataFrame.size returns the number of elements in the DataFrame as an integer, which is equal to the number of rows multiplied by the number of columns. It is equivalent to numpy.prod(df.shape).

DataFrame.shape returns a tuple containing the number of rows and columns in the DataFrame. The tuple has the format (nrows, ncols).

In [26]:
#Example
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.size

6

In [27]:
df.shape

(3, 2)
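The short check below illustrates the relationship between the two and shows what happens if size is called like a function. It is only a sketch; the names `n_rows` and `n_cols` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (rows, columns); size is their product
n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)  # True

# size is a plain integer, so calling it with parentheses fails
try:
    df.size()
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```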

## Q6. Which function of pandas do we use to read an excel file?

To read an Excel file in Pandas, we use the read_excel() function.

## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address. The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

In [28]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].str.split('@').str[0]
    return df

df = pd.DataFrame()
df['Email'] = ['username@domain.com', 'john.doe@example.com']
extract_username(df)

                  Email  Username
0   username@domain.com  username
1  john.doe@example.com  john.doe


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.
    For example, if df contains the following values:
      A B C
    0 3 5 1
    1 8 2 7
    2 6 9 4
    3 2 3 5
    4 9 1 2
    Your function should select the following rows:
      A B C
    1 8 2 7
    2 6 9 4
    4 9 1 2
    The function should return a new DataFrame that contains only the selected rows.

In [1]:
import pandas as pd

def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

def main():
    # create a sample DataFrame
    data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
    df = pd.DataFrame(data)

    # select rows where A > 5 and B < 10
    selected_df = select_rows(df)

    # print the selected rows
    print(selected_df)

if __name__ == '__main__':
    main()

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
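The same filter can also be written with DataFrame.query(), which some readers find easier to scan when several conditions are combined. This is an alternative sketch, not a replacement for the boolean mask; the names `select_rows_query`, `sample`, and `picked` are assumptions for illustration.

```python
import pandas as pd

def select_rows_query(df):
    # query() evaluates the expression against the column names
    return df.query('A > 5 and B < 10')

sample = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
picked = select_rows_query(sample)
print(picked.index.tolist())  # [1, 2, 4]
```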


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [5]:
import pandas as pd

def calculate_stats(df):
    # calculate the mean, median, and standard deviation of the 'Values' column
    mean = df['Values'].mean()
    median = df['Values'].median()
    std_dev = df['Values'].std()

    return mean, median, std_dev
df = pd.DataFrame({'Values': [100, 180, 300, 450, 607]})
df

   Values
0     100
1     180
2     300
3     450
4     607


In [6]:
mean, median , stdev = calculate_stats(df)
print(f'Mean of Values is : {mean}')
print(f'Median of Values is : {median}')
print(f'Standard Deviation of Values is : {round(stdev,4)}')

Mean of Values is : 327.4
Median of Values is : 300.0
Standard Deviation of Values is : 204.5698
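Note that Series.std() returns the sample standard deviation by default (ddof=1, dividing by n - 1). If the population standard deviation is wanted instead, ddof=0 can be passed. A small sketch with the same values; the variable names are chosen here for illustration.

```python
import math
import pandas as pd

values = pd.Series([100, 180, 300, 450, 607])

sample_std = values.std()            # ddof=1 by default: divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print(round(sample_std, 4))  # 204.5698
# With n = 5 the two differ by a factor of sqrt((n - 1) / n)
print(math.isclose(population_std, sample_std * math.sqrt(4 / 5)))  # True
```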


## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [7]:
import pandas as pd

def calculate_moving_average(df):
    # sort the DataFrame by date
    df = df.sort_values(by='Date')

    # calculate the moving average using a rolling window of size 7
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df

In [8]:
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'],
    'Sales': [100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650]
})

df = calculate_moving_average(df)

In [9]:
df_new = calculate_moving_average(df)
df_new

          Date  Sales  MovingAverage
0   2023-01-01    100          100.0
1   2023-01-02    150          125.0
2   2023-01-03    200          150.0
3   2023-01-04    250          175.0
4   2023-01-05    300          200.0
5   2023-01-06    350          225.0
6   2023-01-07    400          250.0
7   2023-01-08    450          300.0
8   2023-01-09    500          350.0
9   2023-01-10    550          400.0
10  2023-01-11    600          450.0
11  2023-01-12    650          500.0
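rolling(window=7) counts seven rows, which equals seven days only when there is exactly one row per day. If dates can be missing, a time-based window is one possible alternative. The sketch below assumes the 'Date' values can be parsed by pd.to_datetime; the function name `calculate_moving_average_by_days` and the sample frame `gappy` are made up for illustration.

```python
import pandas as pd

def calculate_moving_average_by_days(df):
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out = out.sort_values(by='Date')
    # '7D' covers the current day and the six calendar days before it
    by_date = out.set_index('Date')['Sales']
    out['MovingAverage'] = by_date.rolling('7D').mean().to_numpy()
    return out

# Hypothetical data with a gap between 2 January and 10 January
gappy = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-10'],
    'Sales': [100, 200, 300],
})
averaged = calculate_moving_average_by_days(gappy)
print(averaged['MovingAverage'].tolist())  # [100.0, 150.0, 300.0]
```

With a row-based window of 7, the last value would instead average all three rows and give 200.0.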


## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

    For example, if df contains the following values:
      Date
    0 2023-01-01
    1 2023-01-02
    2 2023-01-03
    3 2023-01-04
    4 2023-01-05
    Your function should create the following DataFrame:
      Date Weekday
    0 2023-01-01 Sunday
    1 2023-01-02 Monday
    2 2023-01-03 Tuesday
    3 2023-01-04 Wednesday
    4 2023-01-05 Thursday
    The function should return the modified DataFrame.

In [10]:
import pandas as pd

def return_weekdays(df):
    # convert the 'Date' strings to datetime values
    df['Date'] = pd.to_datetime(df['Date'])

    # day_name() returns the weekday name (e.g. 'Monday') for each date
    df['Weekday'] = df['Date'].dt.day_name()
    return df

In [11]:
df = pd.DataFrame({'Date':['2023-01-01','2023-01-02','2023-01-03','2023-01-04','2023-01-05']})
df

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05


In [12]:
df_new = return_weekdays(df)
df_new

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [13]:
def select_january_rows(df):
    # Convert 'Date' column to pandas datetime format
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Select rows with dates between '2023-01-01' and '2023-01-31'
    january_rows = df[df['Date'].between('2023-01-01', '2023-01-31')]
    
    return january_rows

In [17]:
df = pd.DataFrame({'Date':pd.date_range(start='1/1/2023',end='10/1/2023',freq='D')})
df

          Date
0   2023-01-01
1   2023-01-02
2   2023-01-03
3   2023-01-04
4   2023-01-05
..         ...
269 2023-09-27
270 2023-09-28
271 2023-09-29
272 2023-09-30
273 2023-10-01


In [18]:
select_january_rows(df)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
..        ...
30 2023-01-31
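between('2023-01-01', '2023-01-31') includes both end points, but the upper bound is midnight at the start of 31 January, so a timestamp later on that day would be left out. If the column can carry times of day, a half-open interval is one way to cover the whole month. This is a sketch under that assumption; the function name `select_january_rows_half_open` and the sample frame `stamps` are made up for illustration.

```python
import pandas as pd

def select_january_rows_half_open(df):
    # Parse into a separate Series so the caller's column is not modified
    dates = pd.to_datetime(df['Date'])
    # Keep everything from 1 January up to, but not including, 1 February
    mask = (dates >= '2023-01-01') & (dates < '2023-02-01')
    return df[mask]

stamps = pd.DataFrame({'Date': ['2022-12-31 23:59', '2023-01-31 18:30', '2023-02-01 00:00']})
selected = select_january_rows_half_open(stamps)
print(selected['Date'].tolist())  # ['2023-01-31 18:30']
```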


## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use the basic functions of pandas, the first library that needs to be imported is pandas itself. You can import it using the following statement:

In [19]:
import pandas as pd

This statement imports the pandas library under the alias pd, which is the common convention in the Python community.