In [None]:
# Q 1 Answer:
"""
Ordinal encoding and label encoding are two common techniques used for encoding categorical variables in machine learning.

Label encoding assigns a unique integer to each category in a categorical variable. For example, 
if we have a categorical variable 'color' with categories 'red', 'green', and 'blue', we could assign the values 0, 1, and 2 to the categories, 
respectively. The integers are arbitrary identifiers (scikit-learn's LabelEncoder, for instance, assigns them in sorted order of the labels), 
so label encoding is not meant to imply any order or hierarchy among the categories.

Ordinal encoding, on the other hand, is a similar process but assigns values based on the order of the categories. 
For example, if we have a categorical variable 'size' with categories 'small', 'medium', and 'large', we could assign the values 0, 1, and 2 to
the categories, respectively. Here, the order of the values assigned reflects the order of the categories, with 'small' being assigned the lowest 
value, and 'large' being assigned the highest.

In some cases, ordinal encoding is more appropriate than label encoding. For example, if we have a categorical variable 'temperature' with
categories 'low', 'medium', and 'high', ordinal encoding preserves the natural ordering of the categories. However, if we have a categorical
variable 'color' with categories 'red', 'green', and 'blue', the categories have no natural ordering, so an arbitrary label encoding is
sufficient. Note that many models still treat the integers as ordered, which is why one-hot encoding is often preferred for such features.

In general, the choice between ordinal encoding and label encoding depends on the specific context and the nature of the categorical variable being
encoded.

"""
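
The difference described in Q 1 can be seen in a small sketch. The 'size' values below are made up for illustration. It relies on scikit-learn's LabelEncoder assigning integers in sorted (alphabetical) order and on OrdinalEncoder accepting an explicit category order through its `categories` parameter.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
import numpy as np

# Illustrative values only (not taken from a real dataset)
sizes = np.array(['small', 'large', 'medium', 'small'])

# LabelEncoder sorts the classes alphabetically: large=0, medium=1, small=2
label_codes = LabelEncoder().fit_transform(sizes)

# OrdinalEncoder with an explicit order: small=0, medium=1, large=2
ordinal = OrdinalEncoder(categories=[['small', 'medium', 'large']])
ordinal_codes = ordinal.fit_transform(sizes.reshape(-1, 1)).ravel()

print(label_codes)    # [2 0 1 2]
print(ordinal_codes)  # [0. 2. 1. 0.]
```

Only the second encoding respects small < medium < large; the first is just a set of identifiers.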

In [None]:
# Q 2 Answer:
"""
Target Guided Ordinal Encoding is a technique used for encoding categorical variables based on the relationship between the categorical variable and 
the target variable in a supervised learning problem. This encoding method involves replacing the categories of a categorical variable with ordinal 
numbers that reflect the relationship between the category and the target variable.

The process of Target Guided Ordinal Encoding involves the following steps:

1. Calculate the mean or median of the target variable for each category of the categorical variable.
2. Sort the categories in ascending or descending order based on the mean or median of the target variable.
3. Assign ordinal values to each category based on their position in the sorted list.

For example, let's say we have a dataset that contains a categorical variable "city" with values 'New York', 'Boston', 'San Francisco', and 'Chicago'.
We want to predict the income of people living in each city, and we notice that there is a strong relationship between the city and income. 
We can use Target Guided Ordinal Encoding to encode the "city" variable as follows:

1. Calculate the mean income for each city. Let's say the mean incomes are: 'New York' - $80,000, 'Boston' - $75,000, 'San Francisco' - $90,000, and
'Chicago' - $70,000.
2. Sort the cities in descending order based on the mean income: 'San Francisco', 'New York', 'Boston', 'Chicago'.
3. Assign ordinal values so that the city with the highest mean income gets the largest value: 'San Francisco' - 4, 'New York' - 3, 'Boston' - 2, 'Chicago' - 1.

Now we have encoded the categorical variable "city" as an ordinal variable that reflects the relationship between the city and income.

We might choose to use Target Guided Ordinal Encoding in a machine learning project when we have a categorical variable that we believe has a strong 
relationship with the target variable, and we want to capture this relationship in our model. This encoding method can help improve the performance 
of our model by providing a better representation of the categorical variable in the training data.

"""
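
The three steps from Q 2 can be sketched with pandas. The individual incomes below are hypothetical, chosen only so that the per-city means match the figures in the example ($80,000, $75,000, $90,000, $70,000). The names `city_means`, `ordered` and `mapping` are illustrative.

```python
import pandas as pd

# Hypothetical incomes, picked so the city means match the example above
df = pd.DataFrame({
    'city': ['New York', 'New York', 'Boston', 'Boston',
             'San Francisco', 'San Francisco', 'Chicago', 'Chicago'],
    'income': [78000, 82000, 73000, 77000, 88000, 92000, 68000, 72000],
})

# Step 1: mean of the target for each category
city_means = df.groupby('city')['income'].mean()

# Step 2: sort categories by mean income (ascending, so the highest mean comes last)
ordered = city_means.sort_values().index

# Step 3: assign ordinal values 1..k by position in the sorted list
mapping = {city: rank for rank, city in enumerate(ordered, start=1)}
df['city_encoded'] = df['city'].map(mapping)

print(mapping)
# {'Chicago': 1, 'Boston': 2, 'New York': 3, 'San Francisco': 4}
```

In a real project the mapping would normally be computed on the training split only and then applied to validation and test data, since it is derived from the target.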

In [None]:
# Q 3 Answer
"""
Covariance is a measure of the relationship between two variables. It measures how much two variables change together, meaning it reflects the 
degree to which they are positively or negatively associated with each other. A positive covariance indicates that the two variables tend to 
increase or decrease together, while a negative covariance indicates that they tend to move in opposite directions.

In statistical analysis, covariance is important because it provides a way to quantify the degree of association between two variables. 
It is used to understand the relationships between variables, to identify patterns and trends in data, 
and to help make predictions based on observed data.

Covariance is calculated using the following formula (sample covariance):

cov(X,Y) = Σ[(xi - x̄) * (yi - ȳ)] / (n - 1)

Where:

cov(X,Y) is the sample covariance between variables X and Y.
xi and yi are the individual values of variables X and Y, respectively.
x̄ and ȳ are the sample means of variables X and Y, respectively.
n is the total number of observations in the data set.

The covariance value can range from negative infinity to positive infinity, with a value of 0 indicating that there is no linear
relationship between the two variables.

One limitation of covariance is that it is not standardized: its magnitude depends on the units of the two variables. 
Therefore, it can be difficult to interpret the magnitude of the covariance value. To address this limitation, researchers often use the correlation 
coefficient, which divides the covariance by the product of the two standard deviations and therefore always lies between -1 and 1.

"""
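
As a quick check of the formula in Q 3, the sketch below computes the sample covariance by hand and compares it with NumPy. The arrays `x` and `y` are made-up values for illustration. `np.cov` divides by n - 1 by default, which matches the formula above.

```python
import numpy as np

# Illustrative values (assumed, not from a real dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

n = len(x)

# Sample covariance: sum of products of deviations, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

# Pearson correlation: covariance scaled by both sample standard deviations
corr = cov_manual / (x.std(ddof=1) * y.std(ddof=1))

print(cov_manual)  # 1.5
print(cov_numpy)
print(corr)
```

The correlation is unit-free, so it can be compared across variable pairs in a way the raw covariance cannot.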

In [1]:
# Q 4 Answer
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Create sample dataset
data = {'Color': ['red', 'green', 'blue', 'red', 'green'],
        'Size': ['small', 'medium', 'medium', 'large', 'small'],
        'Material': ['wood', 'metal', 'plastic', 'wood', 'metal']}

df = pd.DataFrame(data)

# Create LabelEncoder object
le = LabelEncoder()

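# Apply LabelEncoder to each categorical column.
# Note: refitting the same `le` overwrites its mapping, so only the 'Material' classes remain in le.classes_ afterwards.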
df['Color'] = le.fit_transform(df['Color'])
df['Size'] = le.fit_transform(df['Size'])
df['Material'] = le.fit_transform(df['Material'])

print(df)


   Color  Size  Material
0      2     2         2
1      1     1         0
2      0     1         1
3      2     0         2
4      1     2         0
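
If the original labels need to be recovered later, one option is to keep a separate encoder per column. This is a minimal sketch of that variant, not a replacement for the answer above. The names `raw`, `encoders`, `encoded` and `restored` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

raw = pd.DataFrame({'Color': ['red', 'green', 'blue', 'red', 'green'],
                    'Size': ['small', 'medium', 'medium', 'large', 'small']})

# Keep one fitted encoder per column so each mapping stays available
encoders = {}
encoded = raw.copy()
for col in raw.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(raw[col])

# Recover the original labels from the integer codes
restored = encoders['Color'].inverse_transform(encoded['Color'])

print(encoders['Color'].classes_.tolist())  # ['blue', 'green', 'red']
```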


In [2]:
# Q 5 Answer
import pandas as pd

# Create sample dataset
data = {'Age': [25, 32, 47, 18, 62],
        'Income': [50000, 80000, 120000, 30000, 150000],
        'Education': [12, 16, 18, 10, 20]}

df = pd.DataFrame(data)

# Calculate covariance matrix
cov_matrix = df.cov()

print(cov_matrix)


                Age        Income  Education
Age           313.7  8.665000e+05       70.3
Income     866500.0  2.430000e+09   201000.0
Education      70.3  2.010000e+05       17.2


In [None]:
# Q 6 Answer
"""
For the categorical variables in the dataset, here is my recommendation for the encoding method to use:

1. Gender (Male/Female): Use Label Encoding since there are only two categories (Male and Female). 
We can encode Male as 0 and Female as 1.

2. Education Level (High School/Bachelor's/Master's/PhD): Use Ordinal Encoding since there is an inherent order to the categories 
(High School < Bachelor's < Master's < PhD). We can assign an integer value to each category based on its position in the order.

3. Employment Status (Unemployed/Part-Time/Full-Time): Use One-Hot Encoding since there is no inherent order to the categories and we want to
avoid implying any ordinal relationship between them. We can create three binary columns, one for each category, 
where a value of 1 indicates that the individual has that employment status and a value of 0 indicates otherwise.

By using these encoding methods, we can convert the categorical variables into numerical features that can be used as input for machine learning 
models. Ordinal Encoding preserves the order of the education levels, Label Encoding gives a compact 0/1 representation of the binary Gender
variable, and One-Hot Encoding avoids making any assumptions about the ordering of the employment categories.

"""
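
The three recommendations in Q 6 can be sketched together with pandas. The rows below are hypothetical, and the column names simply mirror the variable names used in the answer.

```python
import pandas as pd

# Hypothetical rows for illustration
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Education Level': ['High School', "Master's", "Bachelor's", 'PhD'],
    'Employment Status': ['Full-Time', 'Unemployed', 'Part-Time', 'Full-Time'],
})

# 1. Binary variable: map the two categories to 0 and 1
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# 2. Ordered variable: integer codes that follow the stated order
education_order = ['High School', "Bachelor's", "Master's", 'PhD']
education_codes = {level: i for i, level in enumerate(education_order)}
df['Education Level'] = df['Education Level'].map(education_codes)

# 3. Unordered variable: one binary column per category
df = pd.get_dummies(df, columns=['Employment Status'], dtype=int)

print(df)
```

For linear models, one of the three one-hot columns is often dropped (for example with `drop_first=True`) to avoid perfectly collinear features.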

In [3]:
# Q 7 Answer

import pandas as pd

# Create sample dataset
data = {'Temperature': [22, 25, 20, 18, 23],
        'Humidity': [50, 60, 70, 45, 55],
        'Weather_Condition_Sunny': [1, 0, 1, 0, 0],
        'Weather_Condition_Cloudy': [0, 1, 0, 0, 1],
        'Weather_Condition_Rainy': [0, 0, 0, 1, 0],
        'Wind_Direction_North': [1, 0, 0, 1, 0],
        'Wind_Direction_South': [0, 1, 0, 0, 1],
        'Wind_Direction_East': [0, 0, 1, 0, 0],
        'Wind_Direction_West': [0, 0, 0, 0, 0]}

df = pd.DataFrame(data)

# Calculate covariance matrix
cov_matrix = df.cov()

print(cov_matrix)


                          Temperature  Humidity  Weather_Condition_Sunny  \
Temperature                      7.30      6.75                    -0.30   
Humidity                         6.75     92.50                     2.00   
Weather_Condition_Sunny         -0.30      2.00                     0.30   
Weather_Condition_Cloudy         1.20      0.75                    -0.20   
Weather_Condition_Rainy         -0.90     -2.75                    -0.10   
Wind_Direction_North            -0.80     -4.25                     0.05   
Wind_Direction_South             1.20      0.75                    -0.20   
Wind_Direction_East             -0.40      3.50                     0.15   
Wind_Direction_West              0.00      0.00                     0.00   

                          Weather_Condition_Cloudy  Weather_Condition_Rainy  \
Temperature                                   1.20                    -0.90   
Humidity                                      0.75                    -2.75   
We