**Q1.** List any five functions of the pandas library with execution.

**Ans:**

1. read_csv(): This function is used to read a CSV file and create a Pandas DataFrame.

2. head(): Returns the first n rows of a DataFrame. This is useful for quickly checking the structure and content of the data.

In [4]:
import pandas as pd

df = pd.read_csv('Salary_Data.csv')
print(df.head())

   YearsExperience   Salary
0              1.1  39343.0
1              1.3  46205.0
2              1.5  37731.0
3              2.0  43525.0
4              2.2  39891.0
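
The cell above depends on a local file, Salary_Data.csv. As a minimal self-contained sketch, the same two calls can be tried on in-memory text; `csv_text` below is a hypothetical stand-in for the file, not the original data set.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk (assumption, not the real file)
csv_text = """YearsExperience,Salary
1.1,39343.0
1.3,46205.0
1.5,37731.0
"""

# read_csv accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; n defaults to 5 when omitted
first_two = df_csv.head(2)
print(first_two)
```

Passing an explicit n to head() is handy when five rows are more (or fewer) than needed for a quick check.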


3. groupby(): Groups the DataFrame by one or more columns and applies a function to each group. This is useful for analyzing data by categories.

In [5]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'James', 'Jasmine', 'Helen'],'Age': [25, None, 27, 29, 28],
        'Gender': ['Female', 'Male', None, 'Female', 'Female'], 
        'Salary': [15000, 25000, 100000, 38000, 50000]}
df = pd.DataFrame(data)
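# numeric_only=True skips the text columns ('Name', 'Gender'), which cannot be averaged
g = df.groupby('Age').mean(numeric_only=True)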
print(g)

      Salary
Age         
25.0   15000
27.0  100000
28.0   50000
29.0   38000
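
Grouping is usually most informative on a categorical column. The sketch below, which uses a small hypothetical `people` DataFrame, groups by 'Gender' and averages a single selected column.

```python
import pandas as pd

# Hypothetical sample data for illustration
people = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Female', 'Female'],
    'Salary': [15000, 25000, 38000, 50000],
})

# Select the column to aggregate, then apply the aggregation per group
avg_salary = people.groupby('Gender')['Salary'].mean()
print(avg_salary)
```

Selecting the column before aggregating avoids errors from non-numeric columns and returns a Series indexed by group.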


4. fillna(): Fills missing values (NaN or None) in a DataFrame or Series. The fill value can be a single scalar applied everywhere or a mapping that gives a different value for each column.

In [6]:
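# Assign the result back: fillna(..., inplace=True) on a selected column
# may not update df itself in newer pandas versions
df['Age'] = df['Age'].fillna(29)
df['Gender'] = df['Gender'].fillna('Male')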
df

      Name   Age  Gender  Salary
0    Alice  25.0  Female   15000
1      Bob  29.0    Male   25000
2    James  27.0    Male  100000
3  Jasmine  29.0  Female   38000
4    Helen  28.0  Female   50000
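
fillna() also accepts a dictionary, so several columns can be filled in one call. This is a minimal sketch on a hypothetical `df_missing` DataFrame; the fill values (the column mean and 'Unknown') are illustrative choices only.

```python
import pandas as pd

# Hypothetical data with one missing value per column
df_missing = pd.DataFrame({
    'Age': [25, None, 27],
    'Gender': ['Female', 'Male', None],
})

# Map each column name to its own fill value; a new DataFrame is returned
filled = df_missing.fillna({
    'Age': df_missing['Age'].mean(),
    'Gender': 'Unknown',
})
print(filled)
```

Because the result is a new DataFrame, the original `df_missing` keeps its missing values.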


5. drop(): Drops specified labels from rows or columns of a DataFrame. This is useful for removing unwanted data.

In [7]:
df = df.drop(["Salary"], axis=1)
print(df.head())

      Name   Age  Gender
0    Alice  25.0  Female
1      Bob  29.0    Male
2    James  27.0    Male
3  Jasmine  29.0  Female
4    Helen  28.0  Female
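
drop() can remove rows as well as columns. The sketch below uses the `columns=` and `index=` keywords, which some readers find clearer than `axis`; `df_people` is a hypothetical example frame.

```python
import pandas as pd

# Hypothetical sample data for illustration
df_people = pd.DataFrame({'Name': ['Alice', 'Bob', 'James'], 'Age': [25, 29, 27]})

# Remove a column by name
without_age = df_people.drop(columns=['Age'])

# Remove a row by its index label (here the label 1, i.e. Bob)
without_bob = df_people.drop(index=[1])

print(without_age)
print(without_bob)
```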


**Q2.** Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.

**Ans:**

In [2]:
import pandas as pd

def reindex(df):
    new_index = pd.RangeIndex(start=1, step=2, stop=len(df)*2)
    df.index = new_index
    print(df)
        
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
reindex(df)

   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
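
The function above changes the DataFrame that is passed in. If the caller's DataFrame should stay untouched, one possible variant (the name `reindexed_copy` is hypothetical) works on a copy and returns it:

```python
import pandas as pd

def reindexed_copy(df):
    """Return a copy of df whose index is 1, 3, 5, ... (one label per row)."""
    out = df.copy()
    # stop is exclusive, so 2 * len(out) yields exactly len(out) labels
    out.index = pd.RangeIndex(start=1, stop=2 * len(out), step=2)
    return out

original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
result = reindexed_copy(original)
print(result)
```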


**Q3.** You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

**Ans:**

In [3]:
import pandas as pd

def sum_of_values(df):
    # Take the first three entries by position, then add them up
    first_three = df['Values'].iloc[:3]
    total = sum(first_three)
    print("Sum of first three values:", total)

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_of_values(df)

Sum of first three values: 60
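
Since the question asks for iteration, here is a sketch with an explicit loop, shown next to the vectorized equivalent for comparison. The function name `sum_first_three` is hypothetical.

```python
import pandas as pd

def sum_first_three(df):
    """Add up the first three entries of the 'Values' column with a loop."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break  # stop after three values
        total += value
    return total

df_values = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
loop_total = sum_first_three(df_values)

# Vectorized equivalent: slice by position, then sum
vectorized_total = df_values['Values'].iloc[:3].sum()
print(loop_total, vectorized_total)
```

Both approaches also work when the column has fewer than three rows; they simply sum whatever is available.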


**Q4.** Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

**Ans:**

In [4]:
import pandas as pd

def word_count(df):
    df['Word_Count'] = df['Text'].str.split().str.len()
    print(df)

df = pd.DataFrame({'Text': ['I like roses.', 'Pass me that book.', 'Good Movie']})
word_count(df)

                 Text  Word_Count
0       I like roses.           3
1  Pass me that book.           4
2          Good Movie           2


**Q5.** How are DataFrame.size and DataFrame.shape different?

**Ans:**

* DataFrame.size is an attribute (accessed without parentheses) that returns the total number of elements in the DataFrame, which equals the number of rows multiplied by the number of columns.

In [13]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.size)

6


* DataFrame.shape, on the other hand, is an attribute that returns a tuple of the form (number of rows, number of columns).

In [15]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.shape)
print(df)

(3, 2)
   A  B
0  1  4
1  2  5
2  3  6
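
The relationship between the two attributes can be checked directly. This is a small sketch on a hypothetical `df_small` frame; it also shows that a Series has a one-element shape tuple.

```python
import pandas as pd

df_small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shape unpacks into (rows, columns)
n_rows, n_cols = df_small.shape

# size is always rows * columns for a DataFrame
print(df_small.size == n_rows * n_cols)

# A single column is a Series, whose shape has only one entry
series_shape = df_small['A'].shape
print(series_shape)
```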


**Q6.** Which pandas function do we use to read an Excel file?

**Ans:**

The pandas.read_excel() function reads data from an Excel file into a pandas DataFrame. Reading .xlsx files requires an Excel engine such as openpyxl to be installed.

In [None]:
import pandas as pd

df = pd.read_excel('data.xlsx')
print(df)

**Q7.** You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

**Ans:**

In [5]:
import pandas as pd

def username(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    print(df)

data = {'Email': ['Kevin.Paul@example.com', 'James.Austin@example.com', 'Angel.Smith@example.com']}
df = pd.DataFrame(data)
username(df)

                      Email      Username
0    Kevin.Paul@example.com    Kevin.Paul
1  James.Austin@example.com  James.Austin
2   Angel.Smith@example.com   Angel.Smith
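
A lambda that calls split() would raise an error on a missing email address. As a hedged alternative, the vectorized `.str` accessor propagates missing values instead; the `emails` frame below is a hypothetical example.

```python
import pandas as pd

# Hypothetical data that includes one missing address
emails = pd.DataFrame({
    'Email': ['john.doe@example.com', None, 'kevin.paul@example.com'],
})

# .str.split() splits each string; .str[0] takes the first piece.
# Missing entries stay missing rather than raising an exception.
emails['Username'] = emails['Email'].str.split('@').str[0]
print(emails)
```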


**Q8.** You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:

       A  B  C
    0  3  5  1
    1  8  2  7
    2  6  9  4
    3  2  3  5
    4  9  1  2

Your function should select the following rows: 

       A  B  C
    1  8  2  7
    2  6  9  4
    4  9  1  2

The function should return a new DataFrame that contains only the selected rows.

**Ans:**

In [6]:
import pandas as pd

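def new_dataframe(df):
    # Combine both conditions element-wise; each comparison needs its own parentheses
    rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return rows

data = {'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]}
df = pd.DataFrame(data)
selected = new_dataframe(df)
print(selected)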

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
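
The same selection can also be written with DataFrame.query(), which takes the condition as a string. This is only an alternative sketch; `df_abc` is a hypothetical name for the example data.

```python
import pandas as pd

df_abc = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Column names are referenced directly inside the query string
selected_rows = df_abc.query('A > 5 and B < 10')
print(selected_rows)
```

query() returns a new DataFrame and keeps the original index labels, just like boolean indexing.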


**Q9.** Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

**Ans:**

In [7]:
import pandas as pd

def calculate(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std_dev = df['Values'].std()
    print("Mean:", mean)
    print("Median:", median)
    print("Standard Deviation:", std_dev)

data = {'Values': [8, 2, 3, 6, 5]}
df = pd.DataFrame(data)
calculate(df)

Mean: 4.8
Median: 5.0
Standard Deviation: 2.3874672772626644
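
Note that Series.std() computes the sample standard deviation by default (ddof=1, dividing by n - 1). A short sketch of the difference from the population standard deviation (ddof=0), using the same values:

```python
import pandas as pd

values = pd.Series([8, 2, 3, 6, 5])

# Default: sample standard deviation (divides by n - 1)
sample_std = values.std()

# Population standard deviation (divides by n)
population_std = values.std(ddof=0)

print(sample_std, population_std)
```

The population value is always the smaller of the two for the same data, because it divides by n rather than n - 1.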


**Q10.** Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

**Ans:**

In [9]:
import pandas as pd

def add_moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08'],
        'Sales': [50, 40, 30, 20, 50, 60, 70, 10]}
df = pd.DataFrame(data)
df = add_moving_average(df)
print(df)


         Date  Sales  MovingAverage
0  2023-01-01     50      50.000000
1  2023-01-02     40      45.000000
2  2023-01-03     30      40.000000
3  2023-01-04     20      35.000000
4  2023-01-05     50      38.000000
5  2023-01-06     60      41.666667
6  2023-01-07     70      45.714286
7  2023-01-08     10      40.000000
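
A window of 7 counts rows, so it matches "the past 7 days" only when there is exactly one row per day in date order. If dates can be missing, a time-based window is one possible alternative. The sketch below assumes a hypothetical `sales` frame with gaps between dates.

```python
import pandas as pd

# Hypothetical data with gaps between dates
sales = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-05', '2023-01-09'],
    'Sales': [50, 40, 30, 20],
})
sales['Date'] = pd.to_datetime(sales['Date'])
sales = sales.sort_values('Date')

# '7D' covers the 7 days ending at (and including) each row's date
rolling_mean = sales.set_index('Date')['Sales'].rolling('7D').mean()
sales['MovingAverage'] = rolling_mean.to_numpy()
print(sales)
```

For the last row, only 2023-01-05 and 2023-01-09 fall inside the 7-day window, so the average is taken over two values rather than over the previous seven rows.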


**Q11.** You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

For example, if df contains the following values:

             Date
    0  2023-01-01
    1  2023-01-02
    2  2023-01-03
    3  2023-01-04
    4  2023-01-05

Your function should create the following DataFrame:

             Date    Weekday
    0  2023-01-01     Sunday
    1  2023-01-02     Monday
    2  2023-01-03    Tuesday
    3  2023-01-04  Wednesday
    4  2023-01-05   Thursday

The function should return the modified DataFrame.

**Ans:**

In [14]:
import pandas as pd

def add_weekday(df):
    df['Weekday'] = pd.to_datetime(df['Date']).dt.day_name()
    return df

data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)
df = add_weekday(df)
print(df)


         Date    Weekday
0  2023-01-01     Sunday
1  2023-01-02     Monday
2  2023-01-03    Tuesday
3  2023-01-04  Wednesday
4  2023-01-05   Thursday


**Q12.** Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

**Ans:**

In [16]:
import pandas as pd

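def timestamps(df):
    df = df.copy()  # avoid modifying the caller's DataFrame
    df['Date'] = pd.to_datetime(df['Date'])
    start_date = pd.Timestamp('2023-01-01')
    # Exclusive upper bound, so times later on 2023-01-31 are still selected
    end_date = pd.Timestamp('2023-02-01')
    mask = (df['Date'] >= start_date) & (df['Date'] < end_date)
    return df.loc[mask]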

data = {'Date': ['2023-01-01', '2023-02-15', '2023-03-31', '2023-02-01', '2023-01-25', '2023-01-27', '2023-03-02']}
df = pd.DataFrame(data)
selected_rows = timestamps(df)
print(selected_rows)

        Date
0 2023-01-01
4 2023-01-25
5 2023-01-27


**Q13.** To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

**Ans:**

To use the basic functions of pandas, you need to import the pandas library.

<b>import pandas as pd</b>

The pd alias is commonly used for the pandas library, but you can use any other valid alias that you prefer. This import statement is usually placed at the beginning of the Python script or notebook, before any pandas functions or objects are used. Once pandas is imported, you can use its basic functions to create and manipulate DataFrames and Series objects.