**Q1.** List any five functions of the pandas library with execution.   

**Ans.** Five commonly used pandas functions are:
- **read_csv:** This function is used to read data from a CSV file and create a DataFrame.   
- **head:** This function is used to display the first n rows of a DataFrame. By default, it shows the first 5 rows.   
- **describe:** This function generates descriptive statistics of a DataFrame, including count, mean, standard deviation, minimum, and maximum values.   
- **groupby:** This function is used for grouping data based on one or more columns and performing operations on them.   
- **plot:** This function is used to create various types of plots, such as line plots, bar plots, scatter plots, etc.
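
Since the question asks for execution, the following is a minimal sketch that runs the functions listed above. The CSV text, the column names (`city`, `product`, `sales`) and the variable names are made up for illustration, and the data is read from an in-memory string so that no file is needed. The `plot` call is left commented out because it needs matplotlib to be installed.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory instead of a file on disk
csv_text = """city,product,sales
Delhi,apple,10
Delhi,banana,20
Mumbai,apple,30
Mumbai,banana,40
Pune,apple,50
Pune,banana,60
"""

# read_csv: accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

# head: first n rows (default is 5)
first_rows = df.head(3)
print(first_rows)

# describe: summary statistics of the numeric columns
summary = df.describe()
print(summary)

# groupby: total sales per city
sales_by_city = df.groupby('city')['sales'].sum()
print(sales_by_city)

# plot: bar chart of the grouped totals (requires matplotlib)
# sales_by_city.plot(kind='bar')
```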

---
**Q2.** Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the
DataFrame with a new index that starts from 1 and increments by 2 for each row.   

**Ans:** 

In [1]:
import pandas as pd

df = pd.DataFrame({'A':['A','B','C'],
                  'B':['A','B','C'],
                  'C':['A','B','C']})
print(df)


   A  B  C
0  A  A  A
1  B  B  B
2  C  C  C


In [2]:
df['index']=[x for x in range(1,len(df)*2,2)]
df.set_index('index',inplace=True)
print(df)

       A  B  C
index         
1      A  A  A
3      B  B  B
5      C  C  C
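
The question asks for a function, so the same idea can be wrapped in one. This is a sketch only; the function name `reindex_from_one_step_two` is hypothetical, and it returns a copy instead of modifying `df` in place.

```python
import pandas as pd

def reindex_from_one_step_two(df):
    """Return a copy of df whose index is 1, 3, 5, ... (one label per row)."""
    out = df.copy()
    # stop is exclusive, so 2 * len(out) + 1 yields exactly len(out) labels
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

df = pd.DataFrame({'A': ['A', 'B', 'C'],
                   'B': ['A', 'B', 'C'],
                   'C': ['A', 'B', 'C']})
reindexed = reindex_from_one_step_two(df)
print(reindexed)
```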


---
**Q3.** You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.   


**Ans.**

In [3]:
import pandas as pd

# Create a DataFrame
data = {'Values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Define the function
def calculate_sum_of_first_three(df):
    if len(df) < 3:
        print("DataFrame doesn't have enough rows.")
        return
    sum_first_three = df['Values'].iloc[:3].sum()
    print("Sum of the first three values:", sum_first_three)

# Call the function
calculate_sum_of_first_three(df)


Sum of the first three values: 60
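
The question asks for a function that iterates over the DataFrame, while the answer above uses a vectorized slice. A sketch of an explicit loop that gives the same result is shown here; the function name `sum_first_three_by_iteration` is hypothetical.

```python
import pandas as pd

def sum_first_three_by_iteration(df):
    """Add up the first three entries of 'Values' with an explicit loop."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position >= 3:
            break  # stop once three values have been added
        total += value
    print("Sum of the first three values:", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
total = sum_first_three_by_iteration(df)
```

For large DataFrames the `iloc[:3].sum()` version is usually preferable, since it avoids a Python-level loop.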


---
**Q4.** Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.   

**Ans.**   

In [4]:
import pandas as pd
def myfun(df):
    df['Word_Count']=[len(df['Text'][x].split()) for x in range(len(df))]
    return df

df =pd.DataFrame({'Text':['apple is red','banana is not red','papaya may be red','mango may be red','grapes is black']})

df = myfun(df)
print(df)

                Text  Word_Count
0       apple is red           3
1  banana is not red           4
2  papaya may be red           4
3   mango may be red           4
4    grapes is black           3
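
The list comprehension above looks rows up by label (`df['Text'][x]`), so it can raise a `KeyError` when the index is not the default 0..n-1. A sketch of a vectorized alternative using the `.str` accessor, which does not depend on the index labels, is given here; the function name `add_word_count` is hypothetical.

```python
import pandas as pd

def add_word_count(df):
    """Return a copy of df with a 'Word_Count' column (whitespace-separated words)."""
    out = df.copy()
    # str.split() splits on whitespace; str.len() counts the resulting pieces
    out['Word_Count'] = out['Text'].str.split().str.len()
    return out

# Non-default index on purpose, to show that labels do not matter here
df = pd.DataFrame({'Text': ['apple is red', 'banana is not red']}, index=[10, 20])
result = add_word_count(df)
print(result)
```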


---
**Q5.** How are DataFrame.size() and DataFrame.shape() different?   

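**Ans.** `DataFrame.size` and `DataFrame.shape` are attributes, not methods, so they are accessed without parentheses. `size` returns the total number of elements (rows × columns) as an integer, while `shape` returns a tuple of the form `(rows, columns)`.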

In [5]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

print("DataFrame size:", df.size)     # Total number of elements
print("DataFrame shape:", df.shape)   # (number of rows, number of columns)


DataFrame size: 6
DataFrame shape: (3, 2)
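
As a small check of how the two relate, `size` equals the product of the two entries of `shape`. The sketch below also shows the same attributes on a single column (a Series), where `shape` is a one-element tuple.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

rows, cols = df.shape          # shape unpacks into (rows, columns)
print(rows, cols, df.size)     # size is rows * cols

column = df['A']               # a single column is a Series
print(column.shape, column.size)
```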


---
**Q6.** Which function of pandas do we use to read an excel file?   

**Ans.** In pandas, an Excel file is read with ***pd.read_excel***, for example `pd.read_excel('file.xlsx', sheet_name='Sheet1')`. Reading `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

```python
import pandas as pd

# Read an Excel file and create a DataFrame
data = pd.read_excel('file.xlsx')

# You can also specify a specific sheet within the Excel file
# data = pd.read_excel('file.xlsx', sheet_name='Sheet1')

# Now you can work with the 'data' DataFrame
print(data)
```

---
**Q7.** You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.   

**Ans.**

In [6]:
def split_user(email):
    return email.split('@')[0]
    

data = {'Email': ['user1@example.com', 'user2@example.com', 'user3@example.com']}
df = pd.DataFrame(data)

df['Username']=df['Email'].apply(split_user)

print(df)

               Email Username
0  user1@example.com    user1
1  user2@example.com    user2
2  user3@example.com    user3
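
The `apply` version above calls `email.split` on every value, so it would fail with an `AttributeError` on a missing email. A sketch of a vectorized alternative with the `.str` accessor, which passes missing values through, is shown here; the function name `add_username` and the sample data are hypothetical.

```python
import pandas as pd

def add_username(df):
    """Return a copy of df with a 'Username' column taken from the part before '@'."""
    out = df.copy()
    # str.split('@') gives a list per row; str[0] picks its first element
    out['Username'] = out['Email'].str.split('@').str[0]
    return out

df = pd.DataFrame({'Email': ['user1@example.com', 'user2@example.com', None]})
result = add_username(df)
print(result)
```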


---
**Q8.** You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:   
A B C   
0 3 5 1   
1 8 2 7   
2 6 9 4   
3 2 3 5   
4 9 1 2   

**Ans**

In [7]:
df=pd.DataFrame({'A':[3,8,6,2,9],
    'B':[5,2,9,3,1],
    'C':[1,7,4,5,2]})

p=df[(df['A'] > 5) & (df['B'] < 10)]
print(p)


   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
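
The question asks for a function that returns a new DataFrame, with the conditions `A > 5` and `B < 10`. A sketch of such a function follows; the name `select_rows` and the parameters `a_min` and `b_max` are hypothetical and only make the thresholds explicit.

```python
import pandas as pd

def select_rows(df, a_min=5, b_max=10):
    """Return a new DataFrame with rows where A > a_min and B < b_max."""
    mask = (df['A'] > a_min) & (df['B'] < b_max)
    # copy() so that later changes to the result do not touch df
    return df[mask].copy()

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 9, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
selected = select_rows(df)
print(selected)
```

With the example data from the question, rows 1, 2 and 4 satisfy both conditions.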


---
**Q9.** Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean,
median, and standard deviation of the values in the 'Values' column.   

**Ans.**

In [8]:
def values(df):
    mean=df['Values'].mean()
    median=df['Values'].median()
    std=df['Values'].std()
    return mean, median, std

    
df = pd.DataFrame({'Values':[10,20,30,40,50,60,70,80,90,100]})
mean, median, std = values(df)

print('mean is ',mean)
print('median is ',median)
print('Standard Deviation is ',std)

mean is  55.0
median is  55.0
Standard Deviation is  30.276503540974915
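
Note that `Series.std()` computes the sample standard deviation by default (`ddof=1`), which is why the value above is larger than what a population formula would give. A short sketch comparing the two, using the same ten values:

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

sample_std = values.std()            # default ddof=1, divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print('sample std     :', sample_std)
print('population std :', population_std)
```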


---
**Q10.** Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to
create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days
for each row in the DataFrame. The moving average should be calculated using a window of size 7 and
should include the current day.

**Ans.**

In [175]:
import pandas as pd

def myfun(df, window_size=7):
    df['MovingAverage'] = df['Sales'].rolling(window=window_size, min_periods=1).mean()
    return df

data = {'Date': pd.date_range('2023-08-01', periods=10, freq='D'),
        'Sales': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]}
df = pd.DataFrame(data)

df = myfun(df)

print(df)

        Date  Sales  MovingAverage
0 2023-08-01     10           10.0
1 2023-08-02     15           12.5
2 2023-08-03     20           15.0
3 2023-08-04     25           17.5
4 2023-08-05     30           20.0
5 2023-08-06     35           22.5
6 2023-08-07     40           25.0
7 2023-08-08     45           30.0
8 2023-08-09     50           35.0
9 2023-08-10     55           40.0


In [176]:
df['Sales'].rolling(window=7, min_periods=1).mean()

0    10.0
1    12.5
2    15.0
3    17.5
4    20.0
5    22.5
6    25.0
7    30.0
8    35.0
9    40.0
Name: Sales, dtype: float64
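
The rolling window above counts the last 7 rows, which matches "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window is one option. The sketch below assumes the 'Date' column is already a datetime column; the function name `add_time_based_moving_average` and the sample data with a gap are hypothetical.

```python
import pandas as pd

def add_time_based_moving_average(df):
    """Return a copy of df with a 7-calendar-day moving average of 'Sales'."""
    out = df.sort_values('Date').copy()  # time-based windows need sorted dates
    # '7D' covers the current date and the six days before it
    out['MovingAverage'] = out.rolling('7D', on='Date')['Sales'].mean()
    return out

# Two consecutive days, then a gap of more than a week
df = pd.DataFrame({'Date': pd.to_datetime(['2023-08-01', '2023-08-02', '2023-08-10']),
                   'Sales': [10, 20, 30]})
result = add_time_based_moving_average(df)
print(result)
```

In this example the last row averages only its own value, because no other sale falls inside its 7-day window.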

---
**Q11.** You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.

**Ans.**

In [185]:
import pandas as pd

def myfun(df):
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Example DataFrame
data = {'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  
# Add weekday column
df = myfun(df)

print(df)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


---
**Q12.** Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.   

**Ans.**

In [196]:
def myfun(df):
    start_date='2023-01-01'
    end_date ='2023-01-31'
    selected = df[(df['Date']>=start_date) & (df['Date']<=end_date)]
    return selected

data = {'Date': ['2023-01-01', '2023-01-15', '2023-01-20', '2023-02-01']}
df = pd.DataFrame(data)
df['Date']=pd.to_datetime(df['Date'])

df = myfun(df)
print(df)

        Date
0 2023-01-01
1 2023-01-15
2 2023-01-20
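
One caveat when the column holds timestamps with a time of day: `'2023-01-31'` is interpreted as midnight at the start of that day, so a comparison `<= '2023-01-31'` would drop a row such as `2023-01-31 15:30`. A sketch that uses an exclusive upper bound instead is shown here; the function name `select_january_2023` and the sample timestamps are hypothetical.

```python
import pandas as pd

def select_january_2023(df):
    """Return rows whose 'Date' falls anywhere within January 2023."""
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')  # first instant after January
    mask = (df['Date'] >= start) & (df['Date'] < end_exclusive)
    return df[mask]

df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01 00:00',
                                           '2023-01-31 15:30',
                                           '2023-02-01 00:00'])})
january = select_january_2023(df)
print(january)
```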


---
**Q13.** To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

**Ans.** To use the basic functions of pandas, the first library that needs to be imported is pandas itself, using the following import statement:   


```python
import pandas as pd
```