##### Q1. List any five functions of the pandas library with execution.

Here are five functions from the pandas library along with their brief descriptions and example executions:

**1. read_csv():** This function is used to read data from a CSV file and create a DataFrame.

In [5]:
import pandas as pd

# Read data from a CSV file and create a DataFrame
data = pd.read_csv('data.csv')

data

|   | Year | Industry_aggregation_NZSIOC | Industry_code_NZSIOC | Industry_name_NZSIOC | Units | Variable_code | Variable_name | Variable_category | Value | Industry_code_ANZSIC06 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021.0 | Level 1 | 99999.0 | All industries | Dollars (millions) | H01 | Total income | Financial performance | 757504 | ANZSIC06 divisions A-S (excluding classes K633... |
| 1 | 2021.0 | Level 1 | 99999.0 | All industries | Dollars (millions) | H04 | Sales, government funding, grants and subsidies | Financial performance | 674890 | ANZSIC06 divisions A-S (excluding classes K633... |
| 2 | 2021.0 | Level 1 | 99999.0 | All industries | Dollars (millions) | H05 | Interest, dividends and donations | Financial performance | 49593 | ANZSIC06 divisions A-S (excluding classes K633... |
| 3 | 2021.0 | Level 1 | 99999.0 | All industries | Dollars (millions) | H07 | Non-operating income | Financial performance | 33020 | ANZSIC06 divisions A-S (excluding classes K633... |
| 4 | 2021.0 | Level 1 | 99999.0 | All industries | Dollars (millions) | H08 | Total expenditure | Financial performance | 654404 | ANZSIC06 divisions A-S (excluding classes K633... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41704 | | | | | | | | | | |
| 41705 | | | | | | | | | | |
| 41706 | | | | | | | | | | |
| 41707 | | | | | | | | | | |

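The last rows of the output above appear to be empty, and Year is shown as a float (2021.0), which typically happens when a CSV file ends with lines that contain only separators. The sketch below reproduces that with a small hypothetical CSV held in memory through io.StringIO (the contents are made up for illustration, not the real data.csv) and drops rows in which every value is missing with dropna(how='all').

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for 'data.csv'; the last line holds only separators
csv_text = """Year,Units,Value
2021,Dollars (millions),757504
2021,Dollars (millions),674890
,,
"""

data = pd.read_csv(io.StringIO(csv_text))
# The separator-only line is read as a row of NaN, which also makes Year float64
print(data)

# Keep only rows that have at least one non-missing value
clean = data.dropna(how='all')
print(clean)
```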

**2. df.head():** This function returns the first n rows of a DataFrame (5 by default), which makes it a quick way to inspect the data.

In [7]:
import pandas as pd

data = {'Name': ['Henil', 'Hansraj', 'Abhi', 'Yash', 'Jay'],
        'Age': [24, 23, 25, 22, 23]}

df = pd.DataFrame(data)

df.head()

|   | Name | Age |
|---|---|---|
| 0 | Henil | 24 |
| 1 | Hansraj | 23 |
| 2 | Abhi | 25 |
| 3 | Yash | 22 |
| 4 | Jay | 23 |

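head() takes an optional argument n (default 5), which is why the call above shows all five rows of this small DataFrame. The companion method tail(n) returns the last n rows. A short sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Henil', 'Hansraj', 'Abhi', 'Yash', 'Jay'],
                   'Age': [24, 23, 25, 22, 23]})

first_two = df.head(2)   # rows with index 0 and 1
last_two = df.tail(2)    # rows with index 3 and 4

print(first_two)
print(last_two)
```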

**3. df.info():** This function prints a concise summary of the DataFrame: the index range, each column's name, non-null count and data type, and the approximate memory usage.

In [9]:
import pandas as pd

data = {'Name': ['Henil', 'Yash', 'Abhi'],
        'Age': [24, 22, 25]}

df = pd.DataFrame(data)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
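
The non-null counts become informative once a column has missing values. In the sketch below (made-up data with one missing age), info() reports fewer non-null entries for Age, and the column is stored as float64 because of the missing value. The buf parameter redirects the summary from the console into a text buffer so it can be inspected as a string.

```python
import io

import pandas as pd

# Age has one missing value (None), so pandas stores the column as float64
df = pd.DataFrame({'Name': ['Henil', 'Yash', 'Abhi'],
                   'Age': [24, None, 25]})

# info() writes to stdout by default; buf= sends the summary to a writable text buffer
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```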


**4. df.describe():** This function generates descriptive statistics for the numerical columns of the DataFrame: count, mean, standard deviation, minimum, the 25th/50th/75th percentiles, and maximum.

In [2]:
import pandas as pd

data = {'Age': [24, 23, 25, 22, 23]}

df = pd.DataFrame(data)

df.describe()

|   | Age |
|---|---|
| count | 5.0 |
| mean | 23.4 |
| std | 1.140175 |
| min | 22.0 |
| 25% | 23.0 |
| 50% | 23.0 |
| 75% | 24.0 |
| max | 25.0 |

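describe() only summarizes numeric columns by default. For a text (object) column it reports a different set of statistics: count, number of unique values, the most frequent value (top) and its frequency (freq). A sketch with made-up names, one of which repeats; include='all' combines both kinds of summary in one table, with NaN where a statistic does not apply.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Henil', 'Yash', 'Henil'],
                   'Age': [24, 22, 25]})

# Summary of the text column only
name_stats = df[['Name']].describe()
print(name_stats)

# Numeric and non-numeric statistics side by side
print(df.describe(include='all'))
```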

**5. df.groupby():** This function splits the rows of a DataFrame into groups based on the values in one or more columns, so that an aggregation such as mean() or sum() can be applied to each group separately.

In [4]:
import pandas as pd

data = {'Name': ['Henil', 'Hansraj', 'Abhi', 'Yash', 'Jay'],
        'Score': [24, 23, 25, 22, 23]}

df = pd.DataFrame(data)

grouped = df.groupby('Name')['Score'].mean()

grouped

Name
Abhi       25.0
Hansraj    23.0
Henil      24.0
Jay        23.0
Yash       22.0
Name: Score, dtype: float64
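
In the example above every name appears exactly once, so each group holds a single score and its "mean" is just that score. Grouping is more useful when keys repeat. The sketch below uses a hypothetical Team column (not part of the original data) to form groups of several rows, and agg() to compute multiple statistics per group in one call.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A'],
                   'Score': [24, 23, 25, 22, 23]})

# One row per team: A averages 24, 25, 23; B averages 23, 22
team_mean = df.groupby('Team')['Score'].mean()
print(team_mean)

# Several statistics per group at once
team_stats = df.groupby('Team')['Score'].agg(['mean', 'count', 'max'])
print(team_stats)
```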

##### Q2. Given a Pandas DataFrame df with columns 'A', 'B', and 'C', write a Python function to re-index the DataFrame with a new index that starts from 1 and increments by 2 for each row.

In [37]:
import pandas as pd

data = {'A': [10,20,30], 'B':[40,50,60], 'C':[70,80,90]}

df = pd.DataFrame(data)

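def reindex(dataframe):
    # For n rows, range(1, 2n, 2) yields exactly n labels: 1, 3, 5, ...
    new_index = pd.RangeIndex(start=1, stop=len(dataframe) * 2, step=2)
    # Assigning to .index relabels the rows in place; the row data keeps its order
    dataframe.index = new_index
    return dataframe

reindex(df)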

df

|   | A | B | C |
|---|---|---|---|
| 1 | 10 | 40 | 70 |
| 3 | 20 | 50 | 80 |
| 5 | 30 | 60 | 90 |

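The function above relabels the rows by assigning to dataframe.index; it does not use pandas' own DataFrame.reindex(). That method does something different: it conforms the DataFrame to the new labels, keeping rows whose old label matches and filling the rest with NaN. The sketch below contrasts the two and also shows set_axis(), which returns a relabelled copy instead of modifying df in place.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
new_labels = list(range(1, 2 * len(df), 2))   # [1, 3, 5]

# Relabel: same rows, new index values; df itself is left unchanged
relabelled = df.set_axis(new_labels)

# Conform: only label 1 exists in the old index (0, 1, 2), so rows 3 and 5 are all NaN
conformed = df.reindex(new_labels)

print(relabelled)
print(conformed)
```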

##### Q3. You have a Pandas DataFrame df with a column named 'Values'. Write a Python function that iterates over the DataFrame and calculates the sum of the first three values in the 'Values' column. The function should print the sum to the console.

In [5]:
import pandas as pd

data = {'Values': [15,23,45,89,48]}

df = pd.DataFrame(data)

def sum_first_three(dataframe):
    # Select the first three entries of the 'Values' column by position
    first_three_values = dataframe['Values'].iloc[:3]

    total = first_three_values.sum()

    # The task asks for the sum to be printed to the console
    print(f'Sum of the first three values: {total}')

sum_first_three(df)

Sum of the first three values: 83

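The question mentions iterating over the DataFrame. Slicing as above is the idiomatic approach, but an explicit loop gives the same result. In the sketch below (the function name is hypothetical), head(3) limits the loop to the first three values and the total is accumulated one value at a time.

```python
import pandas as pd

def sum_first_three_loop(dataframe):
    total = 0
    # Iterating over a Series yields its values in row order
    for value in dataframe['Values'].head(3):
        total += value
    print(f'Sum of the first three values: {total}')
    return total

df = pd.DataFrame({'Values': [15, 23, 45, 89, 48]})
sum_first_three_loop(df)
```
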
##### Q4. Given a Pandas DataFrame df with a column 'Text', write a Python function to create a new column 'Word_Count' that contains the number of words in each row of the 'Text' column.

In [29]:
import pandas as pd

data = {'Text': ['My name is Henil Rupawala', 'Data Science', 'What a busy day']}

df = pd.DataFrame(data)

def add_word_count(dataframe, text_column, new_column):
    # str.split() with no argument splits on runs of whitespace; len counts the resulting words
    dataframe[new_column] = dataframe[text_column].str.split().apply(len)

add_word_count(df, 'Text', 'Word_Count')

df

|   | Text | Word_Count |
|---|---|---|
| 0 | My name is Henil Rupawala | 5 |
| 1 | Data Science | 2 |
| 2 | What a busy day | 4 |

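Using apply(len) raises a TypeError if the 'Text' column contains a missing value, because a missing entry has no length. A variant that calls the .str accessor's len() on the split lists propagates the missing value instead. A sketch, assuming a missing text should simply produce a missing count (the None entry is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Text': ['My name is Henil Rupawala', None, 'What a busy day']})

# .str.len() on the lists produced by split() gives NaN for the missing row,
# whereas .apply(len) would raise a TypeError on it
df['Word_Count'] = df['Text'].str.split().str.len()
print(df)
```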

##### Q5. How are DataFrame.size() and DataFrame.shape() different?

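**DataFrame.size** and **DataFrame.shape** are both attributes of pandas DataFrames, not methods, so they are accessed without parentheses (calling df.size() or df.shape() raises a TypeError). They provide different types of information: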

**1. DataFrame.size:**
- The size attribute returns the total number of elements (cells) in the DataFrame.
- It equals the number of rows multiplied by the number of columns.
- The value returned is a single integer, which measures how many data points the DataFrame holds.

Example:

In [43]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

df.size  # Output: 6 (3 rows * 2 columns)

6

**2. DataFrame.shape:**

- The shape attribute returns a tuple representing the dimensions of the DataFrame.
- It consists of two values: the number of rows and the number of columns, in that order.
- It provides a clear overview of the structure of the DataFrame.

Example:

In [44]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

df.shape  # Output: (3, 2) (3 rows, 2 columns)

(3, 2)
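
Because size is derived from the shape, it counts every cell, including missing values; count() counts only the non-missing ones, and len(df) gives the number of rows. A small sketch with made-up data containing NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [4, np.nan, 6]})

rows, cols = df.shape
print(df.shape)           # (3, 2)
print(df.size)            # 6: every cell, missing or not
print(df.count().sum())   # 4: only the non-missing cells
print(len(df))            # 3: number of rows, same as df.shape[0]
```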

##### Q6. Which function of pandas do we use to read an excel file?

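To read an Excel file in pandas, use the **pd.read_excel()** function, which loads a sheet of an Excel workbook into a pandas DataFrame. By default it reads the first sheet; the sheet_name parameter selects another sheet by name or position, or loads every sheet into a dictionary of DataFrames when set to None. Reading .xlsx files also requires an Excel engine such as openpyxl to be installed.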

##### Q7. You have a Pandas DataFrame df that contains a column named 'Email' that contains email addresses in the format 'username@domain.com'. Write a Python function that creates a new column 'Username' in df that contains only the username part of each email address.

In [48]:
import pandas as pd

data = {'Email':['henilrupawala@gmail.com', 'hansraj123@gmail.com', 'abhi54@gmail.com']}

df = pd.DataFrame(data)

def username_from_email(dataframe, email_column, username_column):
    # Split each address at the first '@' and keep the part before it
    dataframe[username_column] = dataframe[email_column].str.split('@', n=1).str[0]

username_from_email(df, 'Email', 'Username')

df

|   | Email | Username |
|---|---|---|
| 0 | henilrupawala@gmail.com | henilrupawala |
| 1 | hansraj123@gmail.com | hansraj123 |
| 2 | abhi54@gmail.com | abhi54 |

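Splitting with str.split('@').str[0] returns the whole string when an entry contains no '@', which can hide malformed addresses. A regex-based alternative with str.extract() yields NaN for such entries instead. The sketch below uses a made-up invalid entry to show the difference.

```python
import pandas as pd

emails = pd.Series(['henilrupawala@gmail.com', 'not-an-email'])

# split: an entry without '@' comes back unchanged
via_split = emails.str.split('@', n=1).str[0]

# extract: capture everything before the first '@'; no match gives NaN
via_extract = emails.str.extract(r'^([^@]+)@', expand=False)

print(via_split.tolist())
print(via_extract.tolist())
```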

##### Q8. You have a Pandas DataFrame df with columns 'A', 'B', and 'C'. Write a Python function that selects all rows where the value in column 'A' is greater than 5 and the value in column 'B' is less than 10. The function should return a new DataFrame that contains only the selected rows.

In [9]:
import pandas as pd

data = {'A': [2,6,7,9,8,1,4,5,9,4], 'B': [5,6,9,8,13,16,1,6,14,6], 'C': [1,2,3,8,9,12,14,12,11,6]}

df = pd.DataFrame(data)

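def new_dataframe(dataframe):
    # Boolean mask: combine conditions with & (not `and`) and parenthesize each comparison
    mask = (dataframe['A'] > 5) & (dataframe['B'] < 10)
    # Boolean indexing selects the matching rows; .copy() makes the result independent of df
    return dataframe[mask].copy()

df1 = new_dataframe(df)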

df1

|   | A | B | C |
|---|---|---|---|
| 1 | 6 | 6 | 2 |
| 2 | 7 | 9 | 3 |
| 3 | 9 | 8 | 8 |

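The same selection can be written with DataFrame.query(), which takes the condition as a string and accepts the plain keyword and. A sketch comparing it with the boolean-mask version on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 6, 7, 9, 8, 1, 4, 5, 9, 4],
                   'B': [5, 6, 9, 8, 13, 16, 1, 6, 14, 6],
                   'C': [1, 2, 3, 8, 9, 12, 14, 12, 11, 6]})

mask_result = df[(df['A'] > 5) & (df['B'] < 10)]
# Inside a query string, 'and' combines the conditions
query_result = df.query('A > 5 and B < 10')

print(query_result)
```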

##### Q9. Given a Pandas DataFrame df with a column 'Values', write a Python function to calculate the mean, median, and standard deviation of the values in the 'Values' column.

In [36]:
import pandas as pd

data = {'Values': [12,45,10,45,85,108,25,50]}

df = pd.DataFrame(data)

def mean_median_stddeviation(dataframe, column_name):
    column = dataframe[column_name]
    mean_value = column.mean()
    median_value = column.median()
    # Series.std() returns the sample standard deviation (ddof=1) by default
    std_value = column.std()
    return mean_value, median_value, std_value

mean, median, std = mean_median_stddeviation(df, 'Values')

print("Mean:", mean)
print("Median:", median)
print("Standard Deviation:", std)

Mean: 47.5
Median: 45.0
Standard Deviation: 34.34696909065319
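
The three statistics can also be computed in a single call with agg(), which returns one Series indexed by statistic name. pandas' std() defaults to the sample standard deviation (ddof=1, dividing by n - 1); passing ddof=0 gives the population standard deviation. A sketch on the same values:

```python
import pandas as pd

df = pd.DataFrame({'Values': [12, 45, 10, 45, 85, 108, 25, 50]})

# One Series holding all three statistics
stats = df['Values'].agg(['mean', 'median', 'std'])
print(stats)

# Population standard deviation divides by n instead of n - 1
population_std = df['Values'].std(ddof=0)
print(population_std)
```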


##### Q10. Given a Pandas DataFrame df with a column 'Sales' and a column 'Date', write a Python function to create a new column 'MovingAverage' that contains the moving average of the sales for the past 7 days for each row in the DataFrame. The moving average should be calculated using a window of size 7 and should include the current day.

In [14]:
import pandas as pd

data = {'Sales': [12,23,45,50,15,25,70], 'Date': ['2023-08-01','2023-08-02','2023-08-03','2023-08-04','2023-08-05','2023-08-06','2023-08-07']}

df = pd.DataFrame(data)

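def moving_average(dataframe, value_column_name, date_column_name, window_size):
    # Parse the date strings so that sorting is chronological
    dataframe[date_column_name] = pd.to_datetime(dataframe[date_column_name])

    dataframe.sort_values(by=date_column_name, inplace=True)

    # Row-based window: each value averages the current row and up to window_size - 1
    # preceding rows; min_periods=1 lets the first rows average over however many
    # rows are available instead of returning NaN
    dataframe['MovingAverage'] = dataframe[value_column_name].rolling(window=window_size, min_periods=1).mean()

moving_average(df, 'Sales', 'Date', window_size=7)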

df

|   | Sales | Date | MovingAverage |
|---|---|---|---|
| 0 | 12 | 2023-08-01 | 12.0 |
| 1 | 23 | 2023-08-02 | 17.5 |
| 2 | 45 | 2023-08-03 | 26.666667 |
| 3 | 50 | 2023-08-04 | 32.5 |
| 4 | 15 | 2023-08-05 | 29.0 |
| 5 | 25 | 2023-08-06 | 28.333333 |
| 6 | 70 | 2023-08-07 | 34.285714 |

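A row-based window of 7 matches "the past 7 days" only when there is exactly one row per day with no gaps. If dates can be missing, a time-based window counts calendar days instead. The sketch below is one way to do this, assuming the same column names; the function name and the gap in the sample data are hypothetical. With the string window '7D', each row averages the values dated within (date - 7 days, date], and such offset-based windows use min_periods=1 by default.

```python
import pandas as pd

def moving_average_by_date(dataframe, value_column, date_column, window='7D'):
    # Work on a copy so the caller's DataFrame is not modified
    out = dataframe.copy()
    out[date_column] = pd.to_datetime(out[date_column])
    # Time-based rolling windows require a monotonically increasing date index
    out = out.sort_values(date_column)
    # Each row averages the values whose date lies in (date - 7 days, date]
    rolled = out.set_index(date_column)[value_column].rolling(window).mean()
    out['MovingAverage'] = rolled.to_numpy()
    return out

# Hypothetical data with a gap between 2 and 10 August
sales = pd.DataFrame({'Sales': [10, 20, 30],
                      'Date': ['2023-08-01', '2023-08-02', '2023-08-10']})
print(moving_average_by_date(sales, 'Sales', 'Date'))
```

On this data the row-based version would give 20.0 for 10 August (the mean of all three rows), whereas the time-based window gives 30.0 because the two earlier dates fall outside the last seven days.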

##### Q11. You have a Pandas DataFrame df with a column 'Date'. Write a Python function that creates a new column 'Weekday' in the DataFrame. The 'Weekday' column should contain the weekday name (e.g. Monday, Tuesday) corresponding to each date in the 'Date' column.

In [30]:
import pandas as pd

data = {'Date': ['2023-08-01','2022-08-10','2002-12-01','1999-08-10','2030-11-30','2024-02-26']}

df = pd.DataFrame(data)

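def day_from_date(dataframe, date_column_name, weekday_column_name):
    # Convert the strings to datetimes so the .dt accessor is available
    dataframe[date_column_name] = pd.to_datetime(dataframe[date_column_name])
    # day_name() returns the full weekday name, e.g. 'Monday'
    dataframe[weekday_column_name] = dataframe[date_column_name].dt.day_name()

day_from_date(df, 'Date', 'Weekday')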

df

|   | Date | Weekday |
|---|---|---|
| 0 | 2023-08-01 | Tuesday |
| 1 | 2022-08-10 | Wednesday |
| 2 | 2002-12-01 | Sunday |
| 3 | 1999-08-10 | Tuesday |
| 4 | 2030-11-30 | Saturday |
| 5 | 2024-02-26 | Monday |

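day_name() returns English names unless its locale argument is set. When the weekday is needed for sorting or filtering, the numeric dt.dayofweek (Monday=0 through Sunday=6) is often more convenient. A sketch using three of the dates above, with a hypothetical weekend flag:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2024-02-26', '2030-11-30', '2002-12-01']))

# Monday=0 ... Sunday=6
weekday_number = dates.dt.dayofweek
is_weekend = weekday_number >= 5

print(pd.DataFrame({'Date': dates,
                    'Weekday': dates.dt.day_name(),
                    'Number': weekday_number,
                    'Weekend': is_weekend}))
```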

##### Q12. Given a Pandas DataFrame df with a column 'Date' that contains timestamps, write a Python function to select all rows where the date is between '2023-01-01' and '2023-01-31'.

In [39]:
import pandas as pd

data = {'Date': ['2023-08-01','2023-01-10','2023-12-01','2023-01-26','2023-11-30','2023-01-31']}

df = pd.DataFrame(data)

df['Date'] = pd.to_datetime(df['Date'])

def select_row_between_dates(dataframe, start_date, end_date):
    return dataframe[(dataframe['Date'] >= start_date) & (dataframe['Date'] <= end_date)]

start_date = '2023-01-01'
end_date = '2023-01-31'

select_row_between_dates(df, start_date, end_date)

|   | Date |
|---|---|
| 1 | 2023-01-10 |
| 3 | 2023-01-26 |
| 5 | 2023-01-31 |


##### Q13. To use the basic functions of pandas, what is the first and foremost necessary library that needs to be imported?

The first and foremost necessary library that needs to be imported to use the basic functions of pandas is, of course, the **pandas** library itself. The pandas library provides the fundamental data structures (such as DataFrame and Series) and functions for data manipulation and analysis.

Here's how you would import the pandas library:

In [40]:
import pandas as pd

By convention, the alias pd is commonly used for pandas to make it easier to refer to the library's functions and objects. Once you've imported pandas, you can start using its functions to work with DataFrames, Series, and various data manipulation and analysis operations.