## Q1. List any five functions of the pandas library with execution.

1. drop(): This function removes the specified rows or columns from a DataFrame.
2. head(): This function returns the first n rows of a DataFrame. By default, it returns the first five rows.
3. describe(): This function generates descriptive statistics of a DataFrame, providing information such as count, mean, standard deviation, and quartiles for each column.
4. groupby(): This function is used to group data in a DataFrame based on one or more columns. It allows performing aggregate operations on grouped data.
5. sort_values(): This function is used to sort a DataFrame by one or more columns.

In [15]:
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
                     'B': ['one', 'one', 'two', 'two', 'one'],
                     'C': [1, 2, 3, 6, 4]})

In [16]:
df.head(3)

     A    B  C
0  foo  one  1
1  bar  one  2
2  foo  two  3


In [17]:
df.describe()

              C
count  5.000000
mean   3.200000
std    1.923538
min    1.000000
25%    2.000000
50%    3.000000
75%    4.000000
max    6.000000


In [18]:
df.groupby('A')['C'].sum()

A
bar    8
foo    8
Name: C, dtype: int64

In [19]:
df.sort_values('C')

     A    B  C
0  foo  one  1
1  bar  one  2
2  foo  two  3
4  foo  one  4
3  bar  two  6


In [20]:
df.drop('A',axis=1)

     B  C
0  one  1
1  one  2
2  two  3
3  two  6
4  one  4


## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [22]:
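def reindex_odd(df):
    # New index 1, 3, 5, ...: start at 1, step 2, one label per row
    df.index = pd.Index(range(1, len(df) * 2, 2))
    return df

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
reindex_odd(df)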

   A  B   C
1  1  5   9
3  2  6  10
5  3  7  11
7  4  8  12


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

In [32]:
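def sum_three(df):
    total = 0  # avoid shadowing the built-in sum()
    # Iterate row by row and stop once three values have been added
    for count, (_, row) in enumerate(df.iterrows(), start=1):
        total += row['Values']
        if count == 3:
            break
    print(total)

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_three(df)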

60

In [31]:
df['Values'].head(3).sum()  # alternative without an explicit loop

60

## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [34]:
def word_count(df):
    df['Word_Count']=df['Text'].apply(lambda x:len(x.split()))

df = pd.DataFrame({'Text': ['Hello world', 'This is a sentence', 'Python programming is fun']})
word_count(df)
df

                        Text  Word_Count
0                Hello world           2
1         This is a sentence           4
2  Python programming is fun           4


## Q5. How are DataFrame.size() and DataFrame.shape() different?

`DataFrame.size` and `DataFrame.shape` are attributes, not methods, so they are accessed without parentheses. They describe the dimensions of a DataFrame in different ways.

1. `DataFrame.size`: Returns the total number of elements (cells) in the DataFrame as a single integer, equal to the number of rows multiplied by the number of columns. On its own it does not tell you how those elements are split between rows and columns.

2. `DataFrame.shape`: Returns a tuple `(rows, columns)`. The number of rows is the length of the DataFrame along axis 0, and the number of columns is the length along axis 1.
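
The difference is easy to check on a small frame. The sketch below uses a made-up 3x2 DataFrame (not one from the cells above) and only the two attributes discussed:

```python
import pandas as pd

# Small illustrative frame: 3 rows, 2 columns
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(df.shape)  # tuple of (rows, columns)
print(df.size)   # total number of cells

# size is always the product of the two shape entries
rows, cols = df.shape
print(df.size == rows * cols)
```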


## Q6. Which function of pandas do we use to read an excel file?

We use the `read_excel()` function to read an Excel file into a DataFrame. It takes the file's absolute or relative path as a parameter.
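
A minimal sketch of how this might look in practice. The helper name `load_sheet` and the file name in the usage comment are hypothetical, and reading `.xlsx` files assumes an Excel engine such as `openpyxl` is installed:

```python
import pandas as pd

def load_sheet(path, sheet_name=0):
    """Read one worksheet of an Excel file into a DataFrame (hypothetical helper)."""
    # sheet_name=0 selects the first sheet; a sheet name string also works
    return pd.read_excel(path, sheet_name=sheet_name)

# Usage (hypothetical file name):
# df = load_sheet('sales.xlsx')
```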

## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address. The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

In [36]:
def user_name(df):
    df['Username']=df['Email'].apply(lambda x: x.split('@')[0])

df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.smith@example.com', 'alex@example.com']})
user_name(df)
df

                    Email    Username
0    john.doe@example.com    john.doe
1  jane.smith@example.com  jane.smith
2        alex@example.com        alex
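
As an alternative sketch, the same extraction can be done with the vectorized `.str` accessor, which leaves missing addresses as missing values instead of raising an error inside a lambda. The frame name `emails` and the data, including the missing entry, are made up for illustration:

```python
import pandas as pd

# Illustrative data (not from the original), including a missing address
emails = pd.DataFrame({'Email': ['john.doe@example.com', None, 'alex@example.com']})

# .str methods propagate missing values, unlike calling x.split() on None
emails['Username'] = emails['Email'].str.split('@').str[0]
print(emails)
```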


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

In [38]:
def find_col(df):
    return df[(df['A']>5) & (df['B']<10)]

df=pd.DataFrame({'A':[3,8,6,2,9],'B':[5,2,9,3,1],'C':[1,7,4,5,2]})
new_df=find_col(df)
new_df

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [40]:
def cal(df):
    print('Mean: ',df['Values'].mean())
    print('Median: ',df['Values'].median())
    print('Standard deviation: ',df['Values'].std())

df=pd.DataFrame({'Values':[10, 20, 30, 40, 50]})
cal(df)

Mean:  30.0
Median:  30.0
Standard deviation:  15.811388300841896


## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [45]:
def moving_avg(df):
    # 7-row window that includes the current row; min_periods=1 averages
    # whatever is available for the first six rows instead of returning NaN
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05',
                            '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10'],
                   'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

df['Date'] = pd.to_datetime(df['Date'])

moving_avg(df)
df

        Date  Sales  MovingAverage
0 2023-01-01     10           10.0
1 2023-01-02     20           15.0
2 2023-01-03     30           20.0
3 2023-01-04     40           25.0
4 2023-01-05     50           30.0
5 2023-01-06     60           35.0
6 2023-01-07     70           40.0
7 2023-01-08     80           50.0
8 2023-01-09     90           60.0
9 2023-01-10    100           70.0
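
The solution above counts rows, which equals "the past 7 days" only when there is exactly one row per day. If dates can be missing, a time-based window may be closer to the intent. The following is a sketch under that assumption, using made-up data with 2023-01-05 absent and the hypothetical frame name `sales`:

```python
import pandas as pd

# Illustrative data (not from the original): 2023-01-05 is missing
dates = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
                        '2023-01-06', '2023-01-07', '2023-01-08'])
sales = pd.DataFrame({'Date': dates, 'Sales': [10, 20, 30, 40, 60, 70, 80]})

# A '7D' window on a DatetimeIndex spans the current day and the six days
# before it, regardless of how many rows fall inside that span
by_date = sales.set_index('Date')['Sales']
sales['MovingAverage'] = by_date.rolling('7D').mean().to_numpy()
print(sales)
```

For the last row (2023-01-08) the time-based window covers 2023-01-02 through 2023-01-08 and averages six values, whereas a 7-row window would also reach back to 2023-01-01.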


## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

In [47]:
def add_weekday_column(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.strftime('%A')

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
add_weekday_column(df)
df

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [48]:
def select(df):
    df['Date']=pd.to_datetime(df['Date'])
    return df[(df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')]

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-15', '2023-01-31', '2023-02-05']})

select(df)

        Date
0 2023-01-01
1 2023-01-15
2 2023-01-31
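
One caveat: when the column holds timestamps with a time of day, `<= '2023-01-31'` compares against midnight at the start of 31 January, so later times on that day are excluded. A possible variant, sketched here with made-up data and the hypothetical name `select_january`, uses a half-open interval instead:

```python
import pandas as pd

def select_january(df):
    dates = pd.to_datetime(df['Date'])
    # Half-open interval: from the start of 1 Jan up to, but excluding, 1 Feb
    start = pd.Timestamp('2023-01-01')
    end = pd.Timestamp('2023-02-01')
    return df[(dates >= start) & (dates < end)]

# Illustrative timestamps (not from the original)
stamps = pd.DataFrame({'Date': ['2023-01-01 00:00', '2023-01-31 18:30', '2023-02-01 00:00']})
result = select_january(stamps)
print(result)
```

This version also leaves the input DataFrame unmodified, since the converted dates are kept in a local variable.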


## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use pandas, the first thing to import is the pandas library itself, conventionally under the alias `pd`. It provides the primary data structures (`Series` and `DataFrame`) and the functions for data manipulation and analysis.
```python
import pandas as pd
```