## Q1. List any five functions of the pandas library with execution.

## Ans:

Pandas is a popular Python library for data manipulation and analysis. Here are commonly used Pandas functions, grouped by task, along with code examples for each:

1. Reading Data:\
Pandas can read data from various file formats such as CSV and Excel files, SQL databases, and more.

In [None]:
import pandas as pd

# Read data from a CSV file
data = pd.read_csv('data.csv')
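
The cell above needs a file named `data.csv` on disk. As a minimal sketch that runs without one, the same call can read from an in-memory buffer. The CSV text and the column names (`Name`, `Age`, `Category`, `Value`) are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = (
    "Name,Age,Category,Value\n"
    "Asha,34,A,10.5\n"
    "Ravi,28,B,7.0\n"
    "Meena,41,A,12.5\n"
)

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```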

2. Data Exploration:\
Pandas provides functions to quickly explore and understand your dataset, such as head(), info(), and describe().

In [None]:
# Display the first few rows of the DataFrame
print(data.head())

# Display summary information about the DataFrame
# (info() prints directly and returns None, so it is not wrapped in print())
data.info()

# Generate summary statistics of numeric columns
print(data.describe())

3. Data Selection and Filtering:\
You can select specific rows and columns of your DataFrame and apply filters based on conditions.

In [None]:
# Select a single column
column_data = data['Column_Name']

# Filter data based on a condition
filtered_data = data[data['Age'] > 30]
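
Row filtering and column selection can also be combined in one step with `.loc`. The sketch below uses a small made-up DataFrame (`people`, with hypothetical `Name`, `Age`, and `City` columns) so that it runs on its own.

```python
import pandas as pd

# Illustrative data; the names and values are assumptions for this sketch
people = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena', 'Tom'],
    'Age': [34, 28, 41, 30],
    'City': ['Pune', 'Delhi', 'Pune', 'Goa'],
})

# Rows where Age > 30, keeping only the Name and City columns
over_30 = people.loc[people['Age'] > 30, ['Name', 'City']]

# Conditions combine with & (and) or | (or); each one needs its own parentheses
over_30_in_pune = people[(people['Age'] > 30) & (people['City'] == 'Pune')]

print(over_30)
print(over_30_in_pune)
```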

4. Grouping and Aggregation:\
You can group your data based on one or more columns and perform aggregate functions on the groups.

In [None]:
# Group data by a column and calculate the mean of another column
grouped_data = data.groupby('Category')['Value'].mean()

# Aggregating data using multiple functions
agg_data = data.groupby('Category')['Value'].agg(['mean', 'median', 'sum'])
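
The question asks for five functions. One more commonly used group, offered here as an illustration rather than as part of the original answer, is missing-data handling with `isna()`, `fillna()`, and `dropna()`. The DataFrame `scores` and its values are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing entry in 'Value'
scores = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10.0, np.nan, 30.0, 40.0],
})

# Count missing values in each column
missing_per_column = scores.isna().sum()

# Replace missing 'Value' entries with the column mean (NaN is skipped by mean())
filled = scores.fillna({'Value': scores['Value'].mean()})

# Alternatively, drop every row that contains a missing value
dropped = scores.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```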

## Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

## Ans:

In [1]:
import pandas as pd

# Sample DataFrame
data = {'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]}
df = pd.DataFrame(data)
print('DataFrame with original index:')
print(df)
# Assign a new index starting from 1 and incrementing by 2
# (assigning to df.index replaces the old index, so no reset_index() is needed)
df.index = range(1, 2 * len(df) + 1, 2)
print('DataFrame with new index:')
print(df)

DataFrame with original index:
    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90
DataFrame with new index:
    A   B   C
1  10  40  70
3  20  50  80
5  30  60  90
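
Since the question asks for a function, the same steps can be wrapped as below. This is a minimal sketch; the name `reindex_odd` is hypothetical, and it returns a copy so the caller's DataFrame keeps its original index.

```python
import pandas as pd

def reindex_odd(df):
    """Return a copy of df whose index is 1, 3, 5, ... (one label per row)."""
    out = df.copy()
    out.index = pd.RangeIndex(start=1, stop=2 * len(out) + 1, step=2)
    return out

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
result = reindex_odd(df)
print(result)
```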


## Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console. For example, if the 'Values' column of df contains the values [10, 20, 30, 40, 50], your function should calculate and print the sum of the first three values, which is 60.

## Ans:

In [3]:
data = {'Values':[10,20,30,40,50]}
df1  = pd.DataFrame(data)
print('Sum of first three values:',df1['Values'].head(3).sum())

Sum of first three values: 60
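
The question asks for a function that iterates over the DataFrame, whereas the cell above uses the vectorized `head(3).sum()`. A minimal sketch of the explicit-loop version follows; the function name `sum_first_three` is an assumption.

```python
import pandas as pd

def sum_first_three(df):
    """Iterate over 'Values', add up the first three entries, and print the sum."""
    total = 0
    for position, value in enumerate(df['Values']):
        if position == 3:
            # Stop once three values have been added
            break
        total += value
    print('Sum of first three values:', total)
    return total

values_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
total = sum_first_three(values_df)
```

Both approaches give the same result here; on large DataFrames the vectorized form is usually faster.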


## Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

## Ans:

In [None]:
# Sample DataFrame (illustrative values) with a 'Text' column
df2 = pd.DataFrame({'Text': ['pandas is fun', 'hello world', 'one']})
# x is the text in each row.
# The lambda returns the number of whitespace-separated words in that text.
# apply() calls the lambda on every value in the 'Text' column.
df2['Word_Count'] = df2['Text'].apply(lambda x: len(x.split()))
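
The lambda above raises an error if a row holds a missing value instead of a string. As a hedged alternative, the `.str` accessor skips missing values, which can then be counted as zero words. The function name `add_word_count` and the sample texts are assumptions for this sketch.

```python
import pandas as pd

def add_word_count(df):
    """Return a copy of df with a 'Word_Count' column; missing text counts as 0."""
    out = df.copy()
    # str.split() with no arguments splits on runs of whitespace;
    # str.len() then counts the resulting pieces.
    counts = out['Text'].str.split().str.len()
    out['Word_Count'] = counts.fillna(0).astype(int)
    return out

texts = pd.DataFrame({'Text': ['pandas is fun', 'hello', None, '  two   words ']})
counted = add_word_count(texts)
print(counted)
```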

## Q5. How are DataFrame.size() and DataFrame.shape() different?

## Ans:

DataFrame.size:

    DataFrame.size is an attribute that returns the total number of elements in the DataFrame. It calculates the product of the number of rows and the number of columns in the DataFrame.
    It returns an integer value representing the total number of elements in the DataFrame, including both data and potential missing (NaN) values.

In [5]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

size = df.size
print(df)
print('Size of the Data Frame:',size)  # Output: 6 (3 rows * 2 columns = 6 elements)

   A  B
0  1  4
1  2  5
2  3  6
Size of the Data Frame: 6


DataFrame.shape:

    DataFrame.shape is an attribute that returns a tuple representing the dimensions of the DataFrame. The tuple contains two values: the number of rows and the number of columns.
    Unlike size, it keeps the two counts separate, so df.shape[0] gives the number of rows and df.shape[1] gives the number of columns.

In [7]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

shape = df.shape
print(df)
print('Shape of the Data Frame:',shape)  # Output: (3, 2) (3 rows, 2 columns)

   A  B
0  1  4
1  2  5
2  3  6
Shape of the Data Frame: (3, 2)
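
The relationship between the two attributes can be checked directly. The sketch below uses a made-up DataFrame `frame` with one missing value to show that `size` counts NaN entries, while `count()` does not.

```python
import numpy as np
import pandas as pd

# Illustrative data with one NaN in column 'A'
frame = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0]})

rows, cols = frame.shape

# size is always rows * columns, whether or not cells are missing
print('shape:', frame.shape)
print('size:', frame.size, '=', rows, '*', cols)

# count() reports non-missing cells per column, so its total can be smaller than size
non_missing = int(frame.count().sum())
print('non-missing cells:', non_missing)
```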


## Q6. Which function of pandas do we use to read an excel file?

## Ans:

In [None]:
import pandas as pd

# pd.read_excel() reads an Excel file into a Pandas DataFrame
# (.xlsx files need an Excel engine such as openpyxl to be installed)
df = pd.read_excel('your_file.xlsx')

## Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

## Ans:

In [16]:
data2 = {'Email':['username@domain.com','khanajan@gmail.com','hari@gmail.com']}
df2 = pd.DataFrame(data2)
print('Original data frame:')
print(df2)
df2['Username'] = df2['Email'].apply(lambda x:x.split('@')[0])
print('\n')
print('Final data frame:')
print(df2)

Original data frame:
                 Email
0  username@domain.com
1   khanajan@gmail.com
2       hari@gmail.com


Final data frame:
                 Email  Username
0  username@domain.com  username
1   khanajan@gmail.com  khanajan
2       hari@gmail.com      hari
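
Wrapped as a function, as the question requests, the same idea might look like the sketch below. It uses the vectorized `.str` accessor instead of `apply`; the function name `add_username` is an assumption.

```python
import pandas as pd

def add_username(df):
    """Return a copy of df with a 'Username' column holding the text before '@'."""
    out = df.copy()
    # Split each address at '@' and keep the first piece
    out['Username'] = out['Email'].str.split('@').str[0]
    return out

emails = pd.DataFrame({'Email': ['username@domain.com', 'khanajan@gmail.com', 'hari@gmail.com']})
with_usernames = add_username(emails)
print(with_usernames)
```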


## Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

In [22]:
data = {'A':[3,8,6,2,9],
        'B':[5,2,19,3,1],
        'C':[1,7,4,5,2]}
df = pd.DataFrame(data)
print(df)

   A   B  C
0  3   5  1
1  8   2  7
2  6  19  4
3  2   3  5
4  9   1  2


## Ans:

In [23]:
df1 = df[(df['A']>5) & (df['B']<10)] 
df1

   A  B  C
1  8  2  7
4  9  1  2
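
As a function that returns a new DataFrame, the filter could be sketched as follows. The name `select_rows` is hypothetical, and `.copy()` is used so that later changes to the result do not touch the original.

```python
import pandas as pd

def select_rows(df):
    """Return a new DataFrame with rows where A > 5 and B < 10."""
    mask = (df['A'] > 5) & (df['B'] < 10)
    return df[mask].copy()

df = pd.DataFrame({'A': [3, 8, 6, 2, 9],
                   'B': [5, 2, 19, 3, 1],
                   'C': [1, 7, 4, 5, 2]})
selected = select_rows(df)
print(selected)
```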


## Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

## Ans:

In [None]:
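# In a notebook cell only the last expression is displayed, so print each result
# Mean
print('Mean:', df['Values'].mean())
# Median
print('Median:', df['Values'].median())
# Mode (not asked for in the question; mode() returns a Series because ties are possible)
print('Mode:', df['Values'].mode().tolist())
# Standard deviation (sample standard deviation, ddof=1 by default)
print('Standard deviation:', df['Values'].std())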
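
A function version that returns the three requested statistics might look like this sketch. The name `summarize_values` and the sample data are assumptions; note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

def summarize_values(df):
    """Return the mean, median, and sample standard deviation of 'Values'."""
    values = df['Values']
    return {
        'mean': values.mean(),
        'median': values.median(),
        'std': values.std(),  # ddof=1 by default
    }

values_df = pd.DataFrame({'Values': [10, 20, 30, 40, 50]})
stats = summarize_values(values_df)
print(stats)
```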

## Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

## Ans:

In [None]:
import pandas as pd

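def calculate_moving_average(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert 'Date' to datetime and sort chronologically
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.sort_values(by='Date')

    # Moving average over the current row and the 6 rows before it.
    # min_periods=1 averages whatever is available for the first 6 rows.
    # This assumes the data has exactly one row per day.
    df['MovingAverage'] = df['Sales'].rolling(window=7, min_periods=1).mean()

    return df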

# Example usage:
data = {'Date': ['2023-09-01', '2023-09-02', '2023-09-03', '2023-09-04', '2023-09-05', '2023-09-06', '2023-09-07'],
        'Sales': [100, 120, 130, 110, 140, 150, 160]}
df = pd.DataFrame(data)

df = calculate_moving_average(df)
print(df)
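
`rolling(window=7)` counts rows, not days, so it matches "the past 7 days" only when there is one row per day. If some days are missing, a time-based window is one option. The sketch below assumes made-up sales data with gaps and uses the offset string `'7D'` on a datetime index.

```python
import pandas as pd

# Illustrative data with missing days (no rows for 3, 4, 6, 7, 8 September)
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2023-09-01', '2023-09-02', '2023-09-05', '2023-09-09']),
    'Sales': [100, 120, 140, 160],
}).sort_values('Date')

# A time-based window needs a datetime index; '7D' covers the 7 days
# ending at (and including) each row's date
by_date = sales.set_index('Date')['Sales']
sales['MovingAverage'] = by_date.rolling('7D').mean().to_numpy()

print(sales)
```

For the last row, only the sales on 5 and 9 September fall inside the 7-day window, so the average is taken over those two values.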

## Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column. For example, if df contains the following values:
    Date
0 2023-01-01\
1 2023-01-02\
2 2023-01-03\
3 2023-01-04\
4 2023-01-05\
Your function should create the following DataFrame:

     Date   Weekday
0 2023-01-01   Sunday\
1 2023-01-02   Monday\
2 2023-01-03   Tuesday\
3 2023-01-04   Wednesday\
4 2023-01-05   Thursday\
The function should return the modified DataFrame.

## Ans:

In [44]:
data = pd.date_range(start = '2023-01-01' ,end = '2023-01-05')
data

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05'],
              dtype='datetime64[ns]', freq='D')

In [45]:
df_data = pd.DataFrame({'Date':data})
df_data

        Date
0 2023-01-01
1 2023-01-02
2 2023-01-03
3 2023-01-04
4 2023-01-05


In [47]:
df_data['Date'] = pd.to_datetime(df_data['Date'])
# dt.day_name() derives the weekday name from each date itself,
# so it works for any date, not only the five in this example
df_data['Weekday'] = df_data['Date'].dt.day_name()
print('New data frame:')
print(df_data)

New data frame:
        Date    Weekday
0 2023-01-01     Sunday
1 2023-01-02     Monday
2 2023-01-03    Tuesday
3 2023-01-04  Wednesday
4 2023-01-05   Thursday
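
Because the question asks for a function that returns the modified DataFrame, here is a minimal sketch built on `Series.dt.day_name()`. The function name `add_weekday` is an assumption, and the extra date 2023-01-08 is included only to show that dates outside the example range are handled.

```python
import pandas as pd

def add_weekday(df):
    """Return a copy of df with a 'Weekday' column holding each date's weekday name."""
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'])
    out['Weekday'] = out['Date'].dt.day_name()
    return out

dates = pd.DataFrame({'Date': ['2023-01-01', '2023-01-02', '2023-01-03',
                               '2023-01-04', '2023-01-05', '2023-01-08']})
with_weekday = add_weekday(dates)
print(with_weekday)
```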


## Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

## Ans:

In [None]:
import pandas as pd

def select_rows_in_date_range(df):
    # Work on a copy and convert the 'Date' column to datetime format if it's not already
    df = df.copy()
    df['Date'] = pd.to_datetime(df['Date'])
    
    # Define the start and end dates for the date range
    start_date = pd.Timestamp('2023-01-01')
    end_date = pd.Timestamp('2023-01-31')
    
    # Use boolean indexing to select rows within the specified date range
    selected_rows = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
    
    return selected_rows

# Example usage:
data = {'Date': ['2023-01-05', '2023-01-15', '2023-02-10', '2023-01-25'],
        'Value': [100, 150, 200, 120]}
df = pd.DataFrame(data)

selected_df = select_rows_in_date_range(df)
print(selected_df)
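
Since the question mentions timestamps, note that `pd.Timestamp('2023-01-31')` means midnight at the start of 31 January, so a `<=` comparison against it drops rows later on that day. One hedged way around this is a half-open range ending at 1 February. The function name `select_january_2023` and the sample data are assumptions.

```python
import pandas as pd

def select_january_2023(df):
    """Return rows whose 'Date' falls anywhere in January 2023, including all of 31 January."""
    dates = pd.to_datetime(df['Date'])
    start = pd.Timestamp('2023-01-01')
    end_exclusive = pd.Timestamp('2023-02-01')
    return df[(dates >= start) & (dates < end_exclusive)]

events = pd.DataFrame({
    'Date': ['2023-01-05 09:00:00', '2023-01-31 18:30:00',
             '2023-02-10 12:00:00', '2022-12-31 23:59:59'],
    'Value': [100, 150, 200, 120],
})
january = select_january_2023(events)
print(january)
```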

## Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

## Ans:

In [48]:
import pandas as pd  # 'pd' is the conventional alias used in the cells above