## Question 1

Ordinal encoding and label encoding are related concepts, but they have distinct characteristics and applications. 

Ordinal Encoding:

Ordinal encoding is used for categorical variables with a meaningful order or hierarchy among the categories. It assigns numerical labels to categories in a way that reflects their ordinal relationship.
Consider an "Education Level" variable with categories High School, Bachelor's, Master's, and Ph.D. Ordinal encoding might assign labels like 1, 2, 3, and 4, respectively, reflecting the increasing level of education.
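
The education example above can be sketched with scikit-learn's `OrdinalEncoder`. This is a minimal illustration, not a prescribed method: the sample rows and the names `df_edu` and `ordinal_encoder` are assumptions made for the example. The category order is passed explicitly because the encoder would otherwise sort the categories alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data for illustration only
df_edu = pd.DataFrame(
    {"Education Level": ["Master's", "High School", "Ph.D.", "Bachelor's"]}
)

# State the order explicitly, from lowest to highest level
order = [["High School", "Bachelor's", "Master's", "Ph.D."]]
ordinal_encoder = OrdinalEncoder(categories=order)

# OrdinalEncoder codes start at 0, so add 1 to match the 1-4 labels in the text
codes = ordinal_encoder.fit_transform(df_edu[["Education Level"]]).ravel()
df_edu["Education_encoded"] = codes.astype(int) + 1

print(df_edu)
```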

Label Encoding:

Label encoding is a more general technique that assigns a unique integer to each category, regardless of any inherent order. It can be applied to both nominal and ordinal categorical variables, but because the integers are assigned arbitrarily (for example, alphabetically), they may not reflect a meaningful order among the categories when one exists.

For example, if the task is predicting job satisfaction based on education level and you do not assume a specific order among the levels, you might choose label encoding.
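
To see why label encoding may not preserve a meaningful order, the sketch below applies scikit-learn's `LabelEncoder` to the same education levels. The variable names are assumptions made for illustration. `LabelEncoder` sorts the categories before assigning codes, so the codes follow alphabetical order rather than the level of education.

```python
from sklearn.preprocessing import LabelEncoder

# Education levels listed from lowest to highest
levels = ["High School", "Bachelor's", "Master's", "Ph.D."]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(levels)

# classes_ holds the categories in sorted order; the position is the assigned code
mapping = {str(c): i for i, c in enumerate(label_encoder.classes_)}
print(mapping)
print(list(codes))
```

Here "Bachelor's" receives a lower code than "High School", which is the situation where ordinal encoding with an explicit order is the safer choice.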

## Question 2

Target Guided Ordinal Encoding is a technique for encoding categorical variables based on the relationship between the categories and the target variable in a supervised learning setting. The categories of a feature are ranked by a summary statistic of the target within each category (typically the mean), and integer labels are assigned according to that rank. The resulting codes capture an ordinal relationship between the categories and the target, which makes the technique particularly useful when the target variable is ordinal or numeric in nature.

For example, consider predicting customer satisfaction levels (Low, Medium, High) from various categorical features. Target guided ordinal encoding orders the categories of each feature by their typical satisfaction level, which might help the model understand the impact of each feature on customer satisfaction.
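
A minimal sketch of this idea with pandas follows. The data, the `City` feature, and the numeric coding of satisfaction (0 = Low, 1 = Medium, 2 = High) are all assumptions made for illustration. Categories are ranked by their mean target value, and the rank becomes the encoded value.

```python
import pandas as pd

# Hypothetical data: a nominal feature and a numerically coded satisfaction target
df_tg = pd.DataFrame({
    "City": ["A", "B", "C", "A", "B", "C", "A", "C"],
    "Satisfaction": [2, 0, 1, 2, 1, 1, 1, 0],  # 0 = Low, 1 = Medium, 2 = High
})

# Mean target per category, sorted from lowest to highest
means = df_tg.groupby("City")["Satisfaction"].mean().sort_values()

# Assign rank 1 to the category with the lowest mean target, and so on
ranking = {category: rank for rank, category in enumerate(means.index, start=1)}
df_tg["City_encoded"] = df_tg["City"].map(ranking)

print(means)
print(df_tg)
```

In practice the ranking would normally be computed on the training split only and then applied to validation and test data, so that target information does not leak into evaluation.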

## Question 3

Covariance is a statistical measure that describes the extent to which two variables change together. In other words, it quantifies the degree to which two variables tend to move in relation to each other. Covariance can be positive, indicating that as one variable increases, the other variable also tends to increase, or negative, indicating an inverse relationship. A covariance value of zero suggests no linear relationship between the variables.

Covariance provides information about the direction of the relationship between two variables: a positive covariance indicates a positive relationship, while a negative covariance indicates a negative relationship. Its magnitude, however, depends on the units of the variables, so a larger absolute covariance does not by itself mean a stronger linear relationship. To compare strength across pairs of variables, covariance is usually normalized into the correlation coefficient.

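Let X and Y be two features. Their sample covariance is

$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

where $X_i$ and $Y_i$ are the individual data points, $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y, and $n$ is the number of data points.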
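
The formula can be checked numerically. The sketch below uses two small made-up arrays (`x` and `y` are assumptions for illustration), computes the covariance directly from the definition, and compares it with `np.cov`, which also divides by n - 1 by default.

```python
import numpy as np

# Hypothetical sample values for illustration
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
n = len(x)

# Sum of products of deviations from the means, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(x, y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```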


## Question 4

In [1]:
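from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Sample dataset with three categorical features
data = {
    'Color': ['red', 'green', 'blue'],
    'Size': ['small', 'medium', 'large'],
    'Material': ['wood', 'metal', 'plastic']
}
df = pd.DataFrame(data)

# LabelEncoder assigns integer codes in sorted (alphabetical) order of the categories.
# The encoder is refit for every column, so each column gets its own mapping.
encoder = LabelEncoder()

df['Color_encoded'] = encoder.fit_transform(df['Color'])
df['Size_encoded'] = encoder.fit_transform(df['Size'])
df['Material_encoded'] = encoder.fit_transform(df['Material'])

print("Encoded DataFrame\n")
df.head()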

Encoded DataFrame



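|   | Color | Size | Material | Color_encoded | Size_encoded | Material_encoded |
|---|-------|------|----------|---------------|--------------|------------------|
| 0 | red | small | wood | 2 | 2 | 2 |
| 1 | green | medium | metal | 1 | 1 | 0 |
| 2 | blue | large | plastic | 0 | 0 | 1 |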


## Question 5

Let's assume you have a DataFrame df with columns 'Age', 'Income', and 'Education_Level'. 

In [2]:
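import pandas as pd
import numpy as np

# Sample dataset with three numeric features
data = {
    'Age': [25, 30, 35, 40, 45],
    'Income': [50000, 60000, 75000, 90000, 80000],
    'Education_Level': [12, 16, 14, 18, 15]
}

df = pd.DataFrame(data)

# rowvar=False treats each column as a variable and each row as an observation
cov_matrix = np.cov(df, rowvar=False)
print(cov_matrix)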

[[6.250e+01 1.125e+05 1.000e+01]
 [1.125e+05 2.550e+08 2.625e+04]
 [1.000e+01 2.625e+04 5.000e+00]]
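
The rows and columns of the matrix above follow the column order Age, Income, Education_Level. As a hedged aid to interpretation, the sketch below recomputes the same matrix with pandas so that the entries are labeled, and adds the correlation matrix, which is unit-free. The names `df_q5`, `cov_labeled`, and `corr` are assumptions introduced for this example.

```python
import pandas as pd

# Same data as in the cell above
df_q5 = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Income": [50000, 60000, 75000, 90000, 80000],
    "Education_Level": [12, 16, 14, 18, 15],
})

# Labeled covariance matrix (same values as np.cov with rowvar=False)
cov_labeled = df_q5.cov()

# Pearson correlation rescales each covariance to the range [-1, 1]
corr = df_q5.corr()

print(cov_labeled)
print(corr.round(3))
```

All covariances are positive, so the three variables tend to increase together. The very large Income entries mainly reflect the scale of Income, which is why the correlation matrix is easier to compare across pairs.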


## Question 6

We have a dataset containing several categorical variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD), and "Employment Status" (Unemployed/Part-Time/Full-Time).

1. Gender: This feature has only two unique categories, so we can use binary encoding, for example assigning 1 to "Female" and 0 to "Male".

2. Education Level: The categories High School, Bachelor's, Master's, and PhD have a natural order, so they can be ordinally encoded.

3. Employment Status: This feature has three categories (Unemployed, Part-Time, Full-Time). Treating them as nominal, we can use one-hot encoding for this feature.
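
The three choices above can be sketched with pandas as follows. The sample rows and the names `df_q6`, `education_order`, and `df_q6_encoded` are assumptions made for illustration, and the integer codes chosen for each category are one possible convention among several.

```python
import pandas as pd

# Hypothetical sample rows for illustration
df_q6 = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Education Level": ["High School", "PhD", "Bachelor's", "Master's"],
    "Employment Status": ["Unemployed", "Full-Time", "Part-Time", "Full-Time"],
})

# 1. Binary encoding for the two-category feature
df_q6["Gender_encoded"] = df_q6["Gender"].map({"Male": 0, "Female": 1})

# 2. Ordinal encoding with an explicit order from lowest to highest level
education_order = {"High School": 0, "Bachelor's": 1, "Master's": 2, "PhD": 3}
df_q6["Education_encoded"] = df_q6["Education Level"].map(education_order)

# 3. One-hot encoding for the feature treated as nominal
employment_dummies = pd.get_dummies(
    df_q6["Employment Status"], prefix="Employment", dtype=int
)

df_q6_encoded = pd.concat(
    [df_q6[["Gender_encoded", "Education_encoded"]], employment_dummies], axis=1
)
print(df_q6_encoded)
```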

## Question 7

In [3]:
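import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = {
    'Temperature': [25, 30, 22, 28, 26],
    'Humidity': [60, 70, 55, 75, 65],
    'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Cloudy'],
    'Wind Direction': ['North', 'South', 'East', 'West', 'North']
}

df = pd.DataFrame(data)

# One-hot encode the categorical columns, dropping the first level of each.
# scikit-learn >= 1.2 uses sparse_output; older versions use sparse=False instead.
encoder = OneHotEncoder(sparse_output=False, drop='first')
categorical_cols = ['Weather Condition', 'Wind Direction']
encoded_categories = encoder.fit_transform(df[categorical_cols])
encoded_df = pd.DataFrame(encoded_categories, columns=encoder.get_feature_names_out(categorical_cols))

# Combine the numeric and encoded columns, then compute the covariance matrix
df_encoded = pd.concat([df[['Temperature', 'Humidity']], encoded_df], axis=1)
df_encoded.cov()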



|   | Temperature | Humidity | Weather Condition_Rainy | Weather Condition_Sunny | Wind Direction_North | Wind Direction_South | Wind Direction_West |
|---|---|---|---|---|---|---|---|
| Temperature | 9.2 | 21.25 | -1.05 | 0.15 | -0.35 | 0.95 | 0.45 |
| Humidity | 21.25 | 62.5 | -2.5 | 1.25 | -1.25 | 1.25 | 2.5 |
| Weather Condition_Rainy | -1.05 | -2.5 | 0.2 | -0.1 | -0.1 | -0.05 | -0.05 |
| Weather Condition_Sunny | 0.15 | 1.25 | -0.1 | 0.3 | 0.05 | -0.1 | 0.15 |
| Wind Direction_North | -0.35 | -1.25 | -0.1 | 0.05 | 0.3 | -0.1 | -0.1 |
| Wind Direction_South | 0.95 | 1.25 | -0.05 | -0.1 | -0.1 | 0.2 | -0.05 |
| Wind Direction_West | 0.45 | 2.5 | -0.05 | 0.15 | -0.1 | -0.05 | 0.2 |
