# Q1. List any five functions of the pandas library with execution.

Five commonly used functions in the pandas library:

1. **`read_csv()`**: Used to read data from a CSV file into a DataFrame.


2. **`head()`**: Displays the first n rows of a DataFrame (default n = 5).


3. **`info()`**: Provides a concise summary of the DataFrame, including the column data types and non-null values.


4. **`describe()`**: Generates descriptive statistics that summarize the central tendency, dispersion, and shape of a DataFrame's distribution.

5. **`groupby()`**: Groups data in a DataFrame based on a given criterion.
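
The five functions can be run together on a small in-memory CSV. This is a minimal sketch: the column names and values are invented for illustration, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv() treat a string like a file
csv_text = """city,product,units
Pune,pen,10
Pune,book,4
Delhi,pen,7
Delhi,book,9
"""

# 1. read_csv(): load CSV data into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# 2. head(): first n rows (here n = 2)
first_two = df.head(2)
print(first_two)

# 3. info(): prints column dtypes and non-null counts; it returns None
df.info()

# 4. describe(): count, mean, std, min, quartiles, max for numeric columns
summary = df.describe()
print(summary)

# 5. groupby(): split rows by 'city', then sum 'units' within each group
units_per_city = df.groupby('city')['units'].sum()
print(units_per_city)
```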


# Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [2]:
import pandas as pd

data = {
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
}
df = pd.DataFrame(data)

def reindex(df):
    # Odd labels 1, 3, 5, ...: stop is exclusive, so 2*len(df) yields exactly len(df) labels
    df.index = pd.RangeIndex(start=1, stop=2*len(df), step=2)
    return df

new = reindex(df)
print(new)


    A   B    C
1  10  50   90
3  20  60  100
5  30  70  110
7  40  80  120


# Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

In [13]:
import pandas as pd

data = {
    'Values': [10, 20, 30, 40, 50, 6]
}
df = pd.DataFrame(data)

def cal(df):
    # Sum the first three entries of 'Values'; 'total' avoids shadowing the built-in sum()
    total = df['Values'].iloc[:3].sum()
    print("Sum = ", total)

cal(df)

Sum =  60
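
The question asks for a function that iterates over the DataFrame, while the slice above gets the same result without a loop. For comparison, here is a sketch of an explicit loop; the function name `sum_first_three` is hypothetical.

```python
import pandas as pd

def sum_first_three(df):
    # Walk the 'Values' column one element at a time and stop after three
    total = 0
    for position, value in enumerate(df['Values']):
        if position == 3:
            break
        total += value
    print("Sum =", total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 6]})
result = sum_first_three(df)
```

The slice-based version is usually preferable in pandas; the loop is shown only because the question calls for iteration.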


# Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [12]:
import pandas as pd

data = {

    'Text': [ 'asdassfs', 'tyrw', 'agdew', 'wndkaqvfd', 'faaaaaaaaaaaaaaaa']  
}

df = pd.DataFrame(data)

# Split each text on whitespace and count the pieces; len() alone would count characters
df['Word_Count'] = df['Text'].str.split().str.len()

print(df)

                Text  Word_Count
0           asdassfs           1
1               tyrw           1
2              agdew           1
3          wndkaqvfd           1
4  faaaaaaaaaaaaaaaa           1
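
Every sample string above is a single token, so multi-word text shows the counting more clearly. This sketch wraps the logic in a function, as the question asks; the name `add_word_count` and the sample sentences are hypothetical.

```python
import pandas as pd

def add_word_count(df):
    # str.split() with no argument splits on any run of whitespace;
    # str.len() then counts the resulting pieces in each row
    df['Word_Count'] = df['Text'].str.split().str.len()
    return df

# Hypothetical sample sentences, one with irregular spacing
df = pd.DataFrame({'Text': ['pandas is fun', 'hello world', 'single', '  extra   spaces here ']})
df = add_word_count(df)
print(df)
```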


# Q5. How are DataFrame.size() and DataFrame.shape() different?

`.size` and `.shape` are attributes of a pandas DataFrame, not methods, so they are accessed without parentheses. They serve different purposes:

- `DataFrame.size` is an attribute that returns the total number of elements in the DataFrame. It's the product of the number of rows times the number of columns.

- `DataFrame.shape` is an attribute that returns a tuple representing the dimensionality of the DataFrame. The tuple contains two elements: the number of rows and the number of columns (rows, columns).

Here's an example to illustrate:

Suppose we have a DataFrame `df` with 3 rows and 4 columns.

- `df.size` would return `12` (which is 3*4), indicating there are 12 elements in total in the DataFrame.
- `df.shape` would return `(3, 4)`, indicating the DataFrame has 3 rows and 4 columns. 

These attributes are used to understand the size and structure of the DataFrame.
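
The 3-by-4 case described above can be checked directly. This is a small sketch with made-up column names and values. Because `size` is a plain integer, writing `df.size()` with parentheses would raise a `TypeError`.

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 4 columns
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12],
})

print(df.shape)   # tuple: (rows, columns)
print(df.size)    # int: rows * columns

# shape unpacks into its two components
rows, cols = df.shape
```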

# Q6. Which function of pandas do we use to read an excel file?

In [None]:
import pandas as pd

df = pd.read_excel('path_to_file.xlsx')

To read an Excel file in pandas, use the `read_excel()` function. It reads `.xls` files using the xlrd library and `.xlsx` files using openpyxl.

# Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your function should extract the username from each email address and store it in the new 'Username' column.

In [19]:
import pandas as pd

df = pd.DataFrame({
    'Email': ['zxcvd.qwrse@example.com', 'sawne.mddwdith@example.com', 'nwdwadfo@company.com']
})

def username(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    df['Domain'] = df['Email'].apply(lambda x: x.split('@')[1])
    return df

df = username(df)
df

                        Email        Username       Domain
0     zxcvd.qwrse@example.com     zxcvd.qwrse  example.com
1  sawne.mddwdith@example.com  sawne.mddwdith  example.com
2        nwdwadfo@company.com        nwdwadfo  company.com


# Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

For example, if df contains the following values:

   A  B  C
0  3  5  1
1  8  2  7
2  6  9  4
3  2  3  5
4  9  1  2

Your function should select the following rows:

   A  B  C
1  8  2  7
4  9  1  2

In [21]:
import pandas as pd

data = {
    'A': [3, 8, 6, 2, 9],
    'B': [5, 2, 9, 3, 1],
    'C': [1, 7, 4, 5, 2]
}
df = pd.DataFrame(data)

def rows(df):
    return df[(df['A'] > 5) & (df['B'] < 10)]

df1 = rows(df)
df1


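   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2

Row 2 (A = 6, B = 9) satisfies both conditions as stated, so it is part of the result even though the example in the question lists only rows 1 and 4.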


# Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [2]:
import pandas as pd

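data = {
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pd.DataFrame(data)

def stats(df):
    # Return mean, median, and sample standard deviation (pandas uses ddof=1 by default)
    values = df['Values']
    return values.mean(), values.median(), values.std()

mean, median, std = stats(df)
print(mean)
print(median)
print(std)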

35.0
35.0
18.708286933869708
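
The same three statistics can also be collected in one call with `Series.agg()`. This is a sketch; the function name `summarize` is hypothetical. Note that `std` here is the sample standard deviation, since pandas defaults to `ddof=1`.

```python
import pandas as pd

def summarize(df):
    # agg() applies each named reduction to the column and returns a labelled Series
    return df['Values'].agg(['mean', 'median', 'std'])

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50, 60]})
summary = summarize(df)
print(summary)
```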


# Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day

In [6]:
import pandas as pd

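def average(df):
    # Sort chronologically; the dates are strings, assumed here to be day-first (DD-MM-YYYY)
    df = df.sort_values('Date', key=lambda s: pd.to_datetime(s, format='%d-%m-%Y'))
    # 7-row window that includes the current row; min_periods=1 averages whatever exists at the start
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df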

data = {
    'Date': ['01-01-2022', '02-01-2022', '03-01-2022', '04-01-2022', '05-01-2022', '06-01-2022', '07-01-2022',
             '08-01-2022', '09-01-2022', '10-01-2022'],
    'Sales': [100, 120, 150, 80, 70, 200, 130, 160, 190, 110]
}

df_temp = pd.DataFrame(data)
df_average = average(df_temp)
print(df_average)

         Date  Sales  MovingAverage
0  01-01-2022    100     100.000000
1  02-01-2022    120     110.000000
2  03-01-2022    150     123.333333
3  04-01-2022     80     112.500000
4  05-01-2022     70     104.000000
5  06-01-2022    200     120.000000
6  07-01-2022    130     121.428571
7  08-01-2022    160     130.000000
8  09-01-2022    190     140.000000
9  10-01-2022    110     134.285714


# Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

In [7]:
import pandas as pd

def weekday(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.day_name()
    return df

# Example usage:
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']
}

df = pd.DataFrame(data)
df_weekday = weekday(df)
print(df_weekday)

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday


# Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [None]:
import pandas as pd

def dates(df):
    df['Date'] = pd.to_datetime(df['Date'])
    mask = (df['Date'] >= '2023-01-01') & (df['Date'] <= '2023-01-31')
    selected_rows = df.loc[mask]
    return selected_rows

data = {
    'Date': ['2023-01-01', '2023-01-15', '2023-01-25', '2023-02-05', '2023-02-15']
}

df = pd.DataFrame(data)
selected_rows = dates(df)
print(selected_rows)
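
The comparison above reads `'2023-01-31'` as midnight at the start of that day. That is fine for date-only values like the sample data, but it would drop rows later on 31 January if the column carries a time of day. A half-open upper bound avoids this. This is a sketch; the function name `select_january_2023` and the sample timestamps are hypothetical.

```python
import pandas as pd

def select_january_2023(df):
    # Work on a copy so the caller's DataFrame is left untouched
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    # Half-open range: from 2023-01-01 00:00 up to, but not including, 2023-02-01 00:00
    mask = (out['Date'] >= '2023-01-01') & (out['Date'] < '2023-02-01')
    return out.loc[mask]

# Hypothetical timestamps, including one late on 31 January
df = pd.DataFrame({
    'Date': ['2022-12-31 23:59:59', '2023-01-01 00:00:00',
             '2023-01-31 18:30:00', '2023-02-01 00:00:00'],
})
selected = select_january_2023(df)
print(selected)
```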

# Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

To use the basic functions of pandas, the first library to import is pandas itself. By convention it is imported under the alias `pd`:

```python
import pandas as pd
```
