Q1. List any five functions of the pandas library with execution.

1. read_csv(): reads a CSV file and returns a pandas DataFrame object.
2. to_csv(): writes a pandas DataFrame to a CSV file.
3. groupby(): groups a pandas DataFrame by one or more columns.
4. merge(): merges two pandas DataFrames on a specified column or index.
5. pivot_table(): creates a spreadsheet-style pivot table as a pandas DataFrame.
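
The question asks for execution, so here is a minimal sketch that runs each of the five functions once. The sample data (cities, years, sales figures, regions) and all variable names are made up for illustration; the CSV is read from an in-memory string so the example needs no file on disk.

```python
import io
import pandas as pd

# Made-up sample data, used only for this illustration
csv_text = "city,year,sales\nPune,2022,10\nPune,2023,20\nDelhi,2022,30\nDelhi,2023,40\n"

# 1. read_csv(): parse CSV text into a DataFrame (a file path works the same way)
sales = pd.read_csv(io.StringIO(csv_text))

# 2. to_csv(): with no path given, the CSV is returned as a string
csv_out = sales.to_csv(index=False)

# 3. groupby(): total sales per city
totals = sales.groupby('city')['sales'].sum()

# 4. merge(): attach a region to every row by joining on 'city'
regions = pd.DataFrame({'city': ['Pune', 'Delhi'], 'region': ['West', 'North']})
merged = pd.merge(sales, regions, on='city')

# 5. pivot_table(): cities as rows, years as columns, summed sales as values
pivot = sales.pivot_table(index='city', columns='year', values='sales', aggfunc='sum')

print(totals)
print(merged)
print(pivot)
```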

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [2]:
import pandas as pd

def reindex_df(df):
    new_index = range(1, 2*len(df)+1, 2) # create a new index with odd numbers
    df.index = new_index # reindex the dataframe
    return df

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}) # create a dataframe
reindexed_df = reindex_df(df)
print(reindexed_df)

   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

In [3]:
import pandas as pd

def calculate_sum(df):
    total_sum = 0
    # .iloc selects by position, so this works even if the index labels are not 0, 1, 2
    for value in df['Values'].iloc[:3]:
        total_sum += value
    print(total_sum)

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
calculate_sum(df)

60


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [4]:
import pandas as pd

def add_word_count(df):
    # split() with no argument splits on any run of whitespace, so repeated spaces are not counted as words
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    return df

df = pd.DataFrame({'Text': ['This is a sentence', 'Another sentence', 'A third sentence']})
add_word_count(df)
print(df)

                 Text  Word_Count
0  This is a sentence           4
1    Another sentence           2
2    A third sentence           3


Q5. How are DataFrame.size() and DataFrame.shape() different?

Despite the parentheses in the question, DataFrame.size and DataFrame.shape are attributes, not methods. Writing df.size() or df.shape() raises a TypeError, because the returned int or tuple is not callable.

* DataFrame.size returns the number of elements in a DataFrame, which is equal to the number of rows multiplied by the number of columns. It is a single integer.

* DataFrame.shape returns a tuple of two integers: the number of rows and the number of columns, in that order.

For example, if we have a pandas DataFrame df with 5 rows and 3 columns, df.size is 15 and df.shape is (5, 3).
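
A quick check of the 5-row, 3-column case described above. The column names and values are placeholders; note that both size and shape are accessed as attributes, without parentheses.

```python
import pandas as pd

# Placeholder DataFrame with 5 rows and 3 columns
df = pd.DataFrame({'A': range(5), 'B': range(5), 'C': range(5)})

print(df.size)   # total number of elements: rows * columns
print(df.shape)  # (rows, columns) tuple

# The tuple can be unpacked directly
n_rows, n_cols = df.shape
```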

Q6. Which function of pandas do we use to read an excel file?

To read an Excel file in pandas, we can use the read_excel() function. This function can read Excel files in various formats, including .xls and .xlsx, and allows us to specify the sheet name or sheet index to read.

syntax: pd.read_excel('example.xlsx')

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

In [7]:
import pandas as pd

def add_username_column(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    return df

df = pd.DataFrame({'Email': ['vaibhav@example.com', 'jack@example.com', 'pwskills@example.com', 'john.doe@example.com']})
add_username_column(df)
print(df)


                  Email  Username
0   vaibhav@example.com   vaibhav
1      jack@example.com      jack
2  pwskills@example.com  pwskills
3  john.doe@example.com  john.doe


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

In [8]:
def select_rows(df):
    return df[(df['A'] > 5) & (df['B'] < 10)]  # new DataFrame with rows where A > 5 and B < 10

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})
select_rows(df)

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.

In [9]:
import pandas as pd

def calculate_statistics(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std = df['Values'].std()
    
    return mean, median, std

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
mean, median, std = calculate_statistics(df)
print('Mean:', mean)
print('Median:', median)
print('Standard deviation:', std)

Mean: 3.0
Median: 3.0
Standard deviation: 1.5811388300841898
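
The standard deviation printed above is the sample standard deviation: Series.std() defaults to ddof=1, which divides by n - 1. If the population standard deviation (dividing by n) is wanted instead, pass ddof=0. A small sketch on the same five values:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

sample_std = values.std()            # ddof=1 is the pandas default (divide by n - 1)
population_std = values.std(ddof=0)  # divide by n instead

print('Sample std:', sample_std)
print('Population std:', population_std)
```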


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

In [10]:
import pandas as pd

def add_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()    
    return df

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
                   'Date': pd.date_range('2023-01-01', periods=10)})
df_with_ma = add_moving_average(df)
print(df_with_ma)

   Sales       Date  MovingAverage
0     10 2023-01-01           10.0
1     20 2023-01-02           15.0
2     30 2023-01-03           20.0
3     40 2023-01-04           25.0
4     50 2023-01-05           30.0
5     60 2023-01-06           35.0
6     70 2023-01-07           40.0
7     80 2023-01-08           50.0
8     90 2023-01-09           60.0
9    100 2023-01-10           70.0
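
The solution above uses a window of 7 rows, which equals 7 days only when the DataFrame has exactly one row per calendar day, sorted by date. If some days may be missing, a time-based window is one alternative. The sketch below is an assumption about how that could look, not part of the original answer; the function name add_moving_average_by_date and the sample data with a gap are hypothetical.

```python
import pandas as pd

def add_moving_average_by_date(df):
    # Hypothetical variant: sort by date, then average over a 7-calendar-day
    # window that ends at (and includes) the current row's date
    out = df.sort_values('Date').copy()
    out['MovingAverage'] = (
        out.set_index('Date')['Sales'].rolling('7D').mean().to_numpy()
    )
    return out

# Made-up data with a gap between 2023-01-02 and 2023-01-10
gappy = pd.DataFrame({
    'Sales': [10, 20, 30],
    'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10']),
})
result = add_moving_average_by_date(gappy)
print(result)
```

With a row-based window, the last row would be averaged together with sales from more than a week earlier; with the '7D' window, only rows dated within the preceding 7 days contribute.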


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05
Your function should create the following DataFrame:

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
The function should return the modified DataFrame.

In [11]:
import pandas as pd

def add_weekday(df):
    df['Weekday'] = df['Date'].dt.day_name()  # English weekday names by default, independent of the system locale
    return df

df = pd.DataFrame({'Date': pd.date_range('2023-01-01', periods=5)})
print(add_weekday(df))

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [12]:
import pandas as pd

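def select_date_range(df, start='2023-01-01', end='2023-01-31'):
    start_date = pd.Timestamp(start)
    # Exclusive upper bound one day after `end`, so timestamps later in the day on the end date are kept
    end_exclusive = pd.Timestamp(end) + pd.Timedelta(days=1)
    mask = (df['Date'] >= start_date) & (df['Date'] < end_exclusive)
    return df.loc[mask]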

df = pd.DataFrame({'Date': pd.date_range('2023-01-01', periods=360)})
selected_df = select_date_range(df)
print(selected_df)

         Date
0  2023-01-01
1  2023-01-02
2  2023-01-03
3  2023-01-04
4  2023-01-05
5  2023-01-06
6  2023-01-07
7  2023-01-08
8  2023-01-09
9  2023-01-10
10 2023-01-11
11 2023-01-12
12 2023-01-13
13 2023-01-14
14 2023-01-15
15 2023-01-16
16 2023-01-17
17 2023-01-18
18 2023-01-19
19 2023-01-20
20 2023-01-21
21 2023-01-22
22 2023-01-23
23 2023-01-24
24 2023-01-25
25 2023-01-26
26 2023-01-27
27 2023-01-28
28 2023-01-29
29 2023-01-30
30 2023-01-31


In [18]:
# Save the printed form of the selection; the with-block closes the file even if the write fails
with open("test.txt", "w") as f:
    f.write(str(selected_df))

[Link to Text File](test.txt)

Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

The first and foremost necessary library that needs to be imported to use the basic functions of Pandas is pandas itself. The standard convention is to import Pandas with the alias pd as follows:

import pandas as pd