Q1. List any five functions of the pandas library with execution.

Here are five functions of the pandas library with execution:

head() and tail(): Return the first or last n rows of a DataFrame, respectively (n defaults 
                   to 5). These functions are often used to quickly inspect the structure of 
                   a DataFrame or to get a glimpse of the data.
            
describe(): Provides descriptive statistics for all numerical columns in a DataFrame, such 
            as count, mean, standard deviation, minimum, and maximum. This function is 
            useful for getting a general idea of the data distribution and identifying 
            potential outliers.
            
sort_values(): Sorts a DataFrame by one or more columns in ascending or descending order. 
               This function is useful for ranking, ordering, or finding the top/bottom 
               values in a DataFrame.
            
groupby(): Groups a DataFrame by one or more columns so that an aggregation (such as sum(), 
           mean(), count(), etc.) can be applied to each group. This function is useful for 
           aggregating data by categories or levels.

In [None]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily', 'Frank', 'Gina', 'Hannah', 'Ivan'],
        'Age': [21, 25, 30, 35, 40, 45, 50, 55, 60],
        'Gender': ['F', 'M', 'M', 'M', 'F', 'M', 'F', 'F', 'M'],
        'Salary': [45000, 55000, 70000, 85000, 100000, 120000, 140000, 160000, 180000]}

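df = pd.DataFrame(data)
head = df.head()                        # first 5 rows
tail = df.tail()                        # last 5 rows
describe = df.describe()                # summary statistics for 'Age' and 'Salary'
sort_values = df.sort_values('Salary')  # ascending by salary
group_by = df.groupby('Gender')
# 'Name' is text, so restrict the mean to the numeric columns.
g_by_mean = group_by[['Age', 'Salary']].mean()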
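
To make the results of these calls visible, here is a minimal sketch on a smaller table. The `demo` DataFrame and its values are assumptions chosen for illustration only, so that each output stays short.

```python
import pandas as pd

# Hypothetical sample data, chosen only to keep the outputs small.
demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Gender': ['F', 'M', 'M', 'M'],
    'Salary': [45000, 55000, 70000, 85000],
})

first_two = demo.head(2)    # first 2 rows
last_two = demo.tail(2)     # last 2 rows
stats = demo.describe()     # summary of the numeric column only
by_salary_desc = demo.sort_values('Salary', ascending=False)
mean_salary = demo.groupby('Gender')['Salary'].mean()

print(first_two)
print(last_two)
print(stats.loc[['count', 'mean', 'min', 'max']])
print(by_salary_desc)
print(mean_salary)
```

Selecting the 'Salary' column before calling mean() keeps the text column 'Name' out of the aggregation.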

Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to 
re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [1]:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

In [2]:
def reindex_df(df):
    new_index = range(1, len(df)*2+1, 2)
    df = df.set_index(pd.Index(new_index))
    return df

In [3]:
new_df = reindex_df(df)

In [4]:
new_df

   A  B  C
1  1  4  7
3  2  5  8
5  3  6  9
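
A point worth illustrating: despite the word "re-index" in the question, DataFrame.reindex() does something different from relabelling. The sketch below contrasts the two, using set_axis() as one possible way to relabel. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Odd labels 1, 3, 5, ...: one label per row.
new_index = pd.RangeIndex(start=1, stop=2 * len(df) + 1, step=2)

# Relabelling keeps every row and only swaps the index.
relabelled = df.set_axis(new_index, axis=0)

# reindex() instead looks the new labels up in the old index (0, 1, 2),
# so labels 3 and 5 find no match and come back as NaN rows.
aligned = df.reindex(new_index)

print(relabelled)
print(aligned)
```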


Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that
iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The
function should print the sum to the console.

For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should
calculate and print the sum of the first three values, which is 60.

In [5]:
def sum_three(df):
    values = df['Values'].iloc[:3]  # first three rows by position
    total = values.sum()
    print(total)
    
df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
sum_three(df)

60
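
Since the question asks for a function that iterates, here is a sketch of the same calculation with an explicit loop. The function name `sum_first_three_loop` is a hypothetical one, and returning the total in addition to printing it is an added convenience, not something the question requires.

```python
import pandas as pd

def sum_first_three_loop(df):
    """Add up the first three entries of 'Values' with an explicit loop."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position == 3:   # stop once three values have been added
            break
        total += value
    print(total)
    return total

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
result = sum_first_three_loop(df)
```

If the column holds fewer than three values, the loop simply adds whatever is there.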


Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column
'Word_Count' that contains the number of words in each row of the 'Text' column.

In [6]:
df = pd.DataFrame({'Text': ["This is Data Science Course.", "The course is going very well"]})

def count_words(df):
    df['Word_Count'] = df['Text'].apply(lambda x: len(str(x).split()))
    
count_words(df)
df

                            Text  Word_Count
0   This is Data Science Course.           5
1  The course is going very well           6
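
One edge case is missing text: str(x) turns a missing value into a literal string, which then counts as one word. A possible alternative, sketched below on hypothetical data containing a missing and an empty entry, uses the vectorized .str accessor so that missing text stays missing.

```python
import pandas as pd

df = pd.DataFrame({'Text': ["This is Data Science Course.", None, ""]})

# .str.split() splits on whitespace; .str.len() counts the pieces.
# Missing text stays missing (NaN) rather than being counted as a word.
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```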


Q5. How are DataFrame.size() and DataFrame.shape() different?

In [None]:
'''
DataFrame.size and DataFrame.shape are both attributes (not methods, so they are accessed 
without parentheses) that provide information about the dimensions of a DataFrame, but they 
return different types of values.

DataFrame.size returns the total number of elements in the DataFrame, which is equal to the
product of the number of rows and the number of columns. For example:
'''

In [7]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
size = df.size
size

6

In [None]:
'''
In this example, the DataFrame has 2 columns and 3 rows, so df.size returns 6.

On the other hand, DataFrame.shape returns a tuple containing the number of rows and the 
number of columns in the DataFrame, respectively. For example:
'''

In [8]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
shape = df.shape
shape

(3, 2)

In [None]:
'''
In this example, the DataFrame has 2 columns and 3 rows, so df.shape returns (3, 2).

In summary, DataFrame.size returns the total number of elements in the DataFrame, while 
DataFrame.shape returns a tuple containing the number of rows and columns of the DataFrame.
'''
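
The relationship between the two attributes can be checked directly. This is a small sketch; the variable names `n_rows` and `n_cols` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

n_rows, n_cols = df.shape       # shape unpacks into (rows, columns)
assert df.size == n_rows * n_cols
assert len(df) == n_rows        # len() counts rows only

# Both are attributes, so calling them raises TypeError.
try:
    df.shape()
except TypeError as exc:
    print(f"df.shape() failed: {exc}")

# For a Series, shape is a 1-tuple and size equals the length.
print(df['A'].shape, df['A'].size)
```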

Q6. Which function of pandas do we use to read an excel file?

In [None]:
To read an Excel file in pandas, use the read_excel() function. It returns a DataFrame; 
reading .xlsx files typically requires the openpyxl package to be installed.

df = pd.read_excel('file.xlsx')

Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email
addresses in the format 'username@domain.com'. Write a Python function that creates a new column
'Username' in df that contains only the username part of each email address.

The username is the part of the email address that appears before the '@' symbol. For example, if the
email address is 'john.doe@example.com', the 'Username' column should contain 'john.doe'. Your
function should extract the username from each email address and store it in the new 'Username'
column.

In [11]:
import pandas as pd

def extract_username(df):
    df['Username'] = df['Email'].apply(lambda x: x.split('@')[0])
    
df = pd.DataFrame({'Email': ['john.doe@example.com', 'jane.doe@example.com']})
extract_username(df)
print(df)

                  Email  Username
0  john.doe@example.com  john.doe
1  jane.doe@example.com  jane.doe
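
The lambda above raises an AttributeError if an address is missing. As a sketch of one way around that, the vectorized .str accessor can do the same split. The missing entry in the sample data is an assumption added to show the difference.

```python
import pandas as pd

df = pd.DataFrame({'Email': ['john.doe@example.com', None, 'jane.doe@example.com']})

# Split on the first '@' only and keep the left-hand piece.
# A missing address produces a missing username instead of an error.
df['Username'] = df['Email'].str.split('@', n=1).str[0]
print(df)
```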


Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects
all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The
function should return a new DataFrame that contains only the selected rows.
For example, if df contains the following values:
    
A B C
0 3 5 1
1 8 2 7
2 6 9 4
3 2 3 5
4 9 1 2


Your function should select the following rows:

A B C
1 8 2 7
2 6 9 4
4 9 1 2
The function should return a new DataFrame that contains only the selected rows.

In [13]:
def select_rows(df):
    selected_rows = df[(df['A'] > 5) & (df['B'] < 10)]
    return selected_rows

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

selected_df = select_rows(df)
selected_df

   A  B  C
1  8  2  7
2  6  9  4
4  9  1  2
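
The same filter can also be written as a query string. The sketch below shows both forms side by side; which one to use is largely a matter of readability.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 8, 6, 2, 9], 'B': [5, 2, 9, 3, 1], 'C': [1, 7, 4, 5, 2]})

# Boolean mask: each comparison needs its own parentheses
# because & binds more tightly than > and <.
mask_result = df[(df['A'] > 5) & (df['B'] < 10)]

# The same filter written as a query string.
query_result = df.query('A > 5 and B < 10')

print(query_result)
```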


Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate
the mean, median, and standard deviation of the values in the 'Values' column.

In [15]:
import pandas as pd

def calc(df):
    mean = df['Values'].mean()
    median = df['Values'].median()
    std_dev = df['Values'].std()
    return mean, median, std_dev

df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
mean, median, std_dev = calc(df)

print(f"Mean: {mean}, Median: {median}, Standard Deviation: {std_dev}")

Mean: 30.0, Median: 30.0, Standard Deviation: 15.811388300841896
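
The standard deviation printed above is the sample standard deviation, because Series.std() divides by n - 1 by default (ddof=1). The sketch below compares it with the population version and shows agg() as one way to collect several statistics in a single call.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40, 50], name='Values')

sample_std = values.std()            # pandas default: ddof=1 (divide by n - 1)
population_std = values.std(ddof=0)  # divide by n instead

# agg() collects several statistics in one call.
summary = values.agg(['mean', 'median', 'std'])

print(sample_std, population_std)
print(summary)
```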


Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python
function to create a new column 'MovingAverage' that contains the moving average of the sales for
the past 7 days for each row in the DataFrame. The moving average should be calculated using a
window of size 7 and should include the current day.

In [17]:
import pandas as pd

def moving_average(df):
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()
    return df


df = pd.DataFrame({
    'Date': pd.date_range(start='2022-01-01', periods=14, freq='D'),
    'Sales': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
})

df_with_ma = moving_average(df)
df_with_ma

         Date  Sales  MovingAverage
0  2022-01-01     10           10.0
1  2022-01-02     20           15.0
2  2022-01-03     30           20.0
3  2022-01-04     40           25.0
4  2022-01-05     50           30.0
5  2022-01-06     60           35.0
6  2022-01-07     70           40.0
7  2022-01-08     80           50.0
8  2022-01-09     90           60.0
9  2022-01-10    100           70.0
10 2022-01-11    110           80.0
11 2022-01-12    120           90.0
12 2022-01-13    130          100.0
13 2022-01-14    140          110.0
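
Note that rolling(window=7) counts rows, not days, so it matches "the past 7 days" only when there is exactly one row per day in date order. As a sketch of a date-aware alternative, a time-based window of '7D' can be used instead. The gappy sample data and the column names 'RowWindow' and 'DayWindow' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with gaps: no rows for 2022-01-03 through 2022-01-07.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-08', '2022-01-09']),
    'Sales': [10, 20, 30, 40],
})

# Time-based windows need the dates in increasing order.
df = df.sort_values('Date')

# Row-based window: the last 7 rows, whatever their dates are.
df['RowWindow'] = df['Sales'].rolling(window=7, min_periods=1).mean()

# Time-based window: rows whose Date falls in the 7 days ending on the current row.
by_day = df.set_index('Date')['Sales'].rolling('7D').mean()
df['DayWindow'] = by_day.to_numpy()

print(df)
```

With the gaps, the two columns diverge from the third row on, because the time-based window drops sales older than 7 days.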


Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new
column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g.
Monday, Tuesday) corresponding to each date in the 'Date' column.
For example, if df contains the following values:
Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05

Your function should create the following DataFrame:

Date Weekday
0 2023-01-01 Sunday
1 2023-01-02 Monday
2 2023-01-03 Tuesday
3 2023-01-04 Wednesday
4 2023-01-05 Thursday
The function should return the modified DataFrame.

In [20]:
import pandas as pd

def weekday(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df['Weekday'] = df['Date'].dt.day_name()
    return df

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']})
df = weekday(df)
print(df)    

        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
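
Weekday names sort alphabetically, which is rarely what is wanted. A minimal sketch, assuming a numeric helper column named 'WeekdayNumber' (a hypothetical name), shows how dt.dayofweek can supply calendar order.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])})

df['Weekday'] = df['Date'].dt.day_name()       # 'Sunday', 'Monday', ...
df['WeekdayNumber'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6

# Sorting by the number gives calendar order; sorting by name would be alphabetical.
print(df.sort_values('WeekdayNumber'))
```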


Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python
function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [24]:
import pandas as pd

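def select_dates(df):
    df = df.copy()  # leave the caller's DataFrame untouched
    df['Date'] = pd.to_datetime(df['Date'])

    # Compare calendar dates so any time on 2023-01-31 still counts as in range.
    mask = (df['Date'].dt.date >= pd.Timestamp('2023-01-01').date()) & \
           (df['Date'].dt.date <= pd.Timestamp('2023-01-31').date())
    result = df.loc[mask]
    
    return result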

df = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-02-01', '2023-02-02']})
result = select_dates(df)
result

        Date
0 2023-01-01
1 2023-01-02
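
Comparing calendar dates matters when the timestamps carry a time of day. The sketch below, on hypothetical timestamps, shows that a direct comparison against '2023-01-31' treats the end as midnight, and offers a half-open range as one alternative that stays in datetime arithmetic.

```python
import pandas as pd

# Hypothetical timestamps, including one late on the last day of the range.
df = pd.DataFrame({'Date': pd.to_datetime([
    '2023-01-01 00:00', '2023-01-31 15:30', '2023-02-01 00:00',
])})

start = pd.Timestamp('2023-01-01')
end = pd.Timestamp('2023-01-31')

# Comparing against end directly treats it as midnight and drops 15:30 on the 31st.
closed = df[df['Date'].between(start, end)]

# Half-open range: from start up to, but not including, the day after end.
half_open = df[(df['Date'] >= start) & (df['Date'] < end + pd.Timedelta(days=1))]

print(len(closed), len(half_open))
```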


Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to
be imported?

The first and foremost library that needs to be imported to use the basic functions of pandas is
'pandas' itself, conventionally with: import pandas as pd