#### Answer_1

Both Ordinal Encoding and Label Encoding are methods used to convert categorical data into numerical format to make it suitable for machine learning algorithms. However, there are some differences between the two:

* Label Encoding assigns a unique integer to each category of a categorical variable. For example, if you have a variable "Color" with categories ["Red", "Green", "Blue"], scikit-learn's `LabelEncoder` numbers the categories in alphabetical order (Blue = 0, Green = 1, Red = 2), so the column becomes [2, 1, 0]. The numbers are arbitrary identifiers and are not meant to express any order or ranking among the categories.

* Ordinal Encoding, on the other hand, also assigns a numerical value to each category, but this value is based on the order or ranking of the categories. For example, if you have a variable "Size" with categories ["Small", "Medium", "Large"], Ordinal Encoding will transform it into [0, 1, 2], where "Small" is assigned the value of 0, "Medium" is assigned the value of 1, and "Large" is assigned the value of 2.
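A minimal sketch of both encoders with scikit-learn (assuming scikit-learn is installed). `LabelEncoder` numbers the classes in sorted order. `OrdinalEncoder` follows whatever order is passed in its `categories` parameter.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Label encoding: classes are sorted alphabetically (Blue, Green, Red) and numbered 0..2.
colors = ["Red", "Green", "Blue"]
label_codes = LabelEncoder().fit_transform(colors)
print(label_codes)  # [2 1 0]

# Ordinal encoding: the order is supplied explicitly, so Small < Medium < Large.
sizes = np.array([["Small"], ["Medium"], ["Large"]])  # 2-D input with one column
ordinal = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
size_codes = ordinal.fit_transform(sizes).ravel()
print(size_codes)  # [0. 1. 2.]
```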

In general, Ordinal Encoding should be used when there is a clear order or ranking among the categories, and this order is relevant to the problem you are trying to solve. For example, if you are working with a dataset of clothing sizes and you know that "Large" is bigger than "Medium" and "Small", then using Ordinal Encoding would make sense.

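Label Encoding, on the other hand, can be used when there is no clear order or ranking among the categories, or when the order is not relevant to the problem you are trying to solve. For example, if you are working with a dataset of car brands and just need a numerical identifier for each brand, Label Encoding is sufficient. Keep in mind that models such as linear regression can still read the integer codes as an order. For nominal input features, One-Hot Encoding is often the safer choice, and scikit-learn intends `LabelEncoder` mainly for encoding target labels.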

#### Answer_2

Target Guided Ordinal Encoding is a type of ordinal encoding that uses the target variable to assign numerical values to the categories of a categorical variable. It works by calculating the mean of the target variable for each category and then assigning a numerical value based on the order of these means.

Here are the steps to perform Target Guided Ordinal Encoding:

1. Calculate the mean of the target variable for each category of the categorical variable.
2. Sort the categories by their mean values in ascending or descending order.
3. Assign a numerical value to each category based on its position in the sorted list.

For example, suppose you have a categorical variable called "City" with four categories: "New York", "Los Angeles", "Chicago", and "Houston". You want to encode this variable using Target Guided Ordinal Encoding, where the target variable is the income of each person in the dataset.

Applying the steps to this example:

Calculate the mean income for each city:

* New York: $100,000
* Los Angeles: $90,000
* Chicago: $80,000
* Houston: $70,000

Sort the cities based on their mean income in descending order:

* New York
* Los Angeles
* Chicago
* Houston

Assign numerical values by rank, so that the city with the highest mean income receives the highest value:

* New York: 3
* Los Angeles: 2
* Chicago: 1
* Houston: 0

In this example, Target Guided Ordinal Encoding assigns a higher value to cities with higher average incomes.
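The three steps can be sketched with pandas as follows. The individual incomes are hypothetical values chosen so that the per-city means match the example ($100,000, $90,000, $80,000, $70,000). Sorting in ascending order and numbering from 0 gives the same mapping as the descending list above.

```python
import pandas as pd

# Hypothetical data: individual incomes (the target) for people in four cities.
df = pd.DataFrame({
    "City": ["New York", "New York", "Los Angeles", "Los Angeles",
             "Chicago", "Chicago", "Houston", "Houston"],
    "Income": [110_000, 90_000, 95_000, 85_000, 85_000, 75_000, 72_000, 68_000],
})

# Step 1: mean of the target for each category.
city_means = df.groupby("City")["Income"].mean()

# Step 2: sort the categories by their mean target value (ascending here).
ordered_cities = city_means.sort_values().index

# Step 3: assign each category its rank in that order (lowest mean -> 0).
rank_map = {city: rank for rank, city in enumerate(ordered_cities)}
df["City_encoded"] = df["City"].map(rank_map)

print(rank_map)  # {'Houston': 0, 'Chicago': 1, 'Los Angeles': 2, 'New York': 3}
```

In practice the category means should be computed on the training data only and then applied to validation and test data. Otherwise, information about the target leaks into the features.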

Target Guided Ordinal Encoding can be useful when there is a strong relationship between the target variable and the categorical variable, and the order of the categories is not clear or meaningful. For example, in a marketing campaign dataset, you might use Target Guided Ordinal Encoding to encode the "Occupation" variable based on the response rate of each occupation to the previous marketing campaign. This would help the model to capture the relationship between occupation and response rate in a more meaningful way than traditional ordinal encoding.

#### Answer_3

Covariance is a statistical measure of the relationship between two variables. It measures how much the two variables change together (co-vary), either in the same direction (positive) or in opposite directions (negative).

Covariance is important in statistical analysis because it helps describe the relationship between two variables. A positive covariance indicates that the two variables tend to increase or decrease together, while a negative covariance indicates that one variable tends to increase while the other decreases. A covariance close to zero indicates little or no *linear* relationship. It does not by itself mean that the variables are independent, because they can still be related non-linearly.

Covariance can be calculated using the following formula:

$$\operatorname{cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

where $X$ and $Y$ are the two variables, $X_i$ and $Y_i$ are the individual observations of each variable, $\bar{X}$ and $\bar{Y}$ are the sample means of $X$ and $Y$ respectively, and $n$ is the sample size.

The resulting value of covariance can be positive, negative, or zero, and its sign gives the direction of the linear relationship. Its magnitude depends on the units of the two variables, so covariances between differently scaled variables (for example, income in dollars versus age in years) are not directly comparable.
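The formula translates directly into a few lines of NumPy. The sketch below uses a hypothetical helper `sample_cov` and reuses the Age and Education values from Answer_5. It cross-checks the result against `np.cov`, which also divides by n - 1 by default.

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

age = [25, 30, 35, 40, 45, 50]
education = [12, 14, 16, 18, 20, 22]

print(sample_cov(age, education))    # 35.0
print(np.cov(age, education)[0, 1])  # off-diagonal entry of the 2x2 matrix, should agree
```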

#### Answer_4

In [2]:
import pandas as pd

In [4]:
df = pd.DataFrame({
    "Color" : ['red','green','blue'],
    "Size" : ['small','medium','large'],
    "Material" : ['wood','metal','plastic']
})

In [32]:
df

|   | Color | Size | Material |
|---|---|---|---|
| 0 | red | small | wood |
| 1 | green | medium | metal |
| 2 | blue | large | plastic |


In [6]:
from sklearn.preprocessing import LabelEncoder

In [40]:
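# Instantiate a separate LabelEncoder for each column; calling fit_transform on the class itself raises a TypeError.
color_encoded = LabelEncoder().fit_transform(df['Color'])
size_encoded = LabelEncoder().fit_transform(df['Size'])
material_encoded = LabelEncoder().fit_transform(df['Material'])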

In [17]:
color_encoded

array([2, 1, 0])

In [41]:
size_encoded

array([2, 1, 0])

In [42]:
material_encoded

array([2, 0, 1])

In [43]:
df['color_label_encoded'] = color_encoded
df['size_label_encoded'] = size_encoded
df['material_label_encoded'] = material_encoded

In [44]:
df

|   | Color | Size | Material | color_label_encoded | size_label_encoded | material_label_encoded |
|---|---|---|---|---|---|---|
| 0 | red | small | wood | 2 | 2 | 2 |
| 1 | green | medium | metal | 1 | 1 | 0 |
| 2 | blue | large | plastic | 0 | 0 | 1 |
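Because `LabelEncoder` orders categories alphabetically, `size_label_encoded` maps small to 2 and large to 0, which reverses the natural order of sizes. If Size should be treated as ordinal, scikit-learn's `OrdinalEncoder` can take an explicit category order. The order small < medium < large is an assumption about this data in the sketch below.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Color": ["red", "green", "blue"],
    "Size": ["small", "medium", "large"],
    "Material": ["wood", "metal", "plastic"],
})

# Explicit category order: small -> 0, medium -> 1, large -> 2.
size_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
# fit_transform expects 2-D input (hence df[["Size"]]) and returns floats, so cast to int.
df["size_ordinal_encoded"] = size_encoder.fit_transform(df[["Size"]]).ravel().astype(int)

print(df[["Size", "size_ordinal_encoded"]])
```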


#### Answer_5

In [45]:
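import numpy as np
import pandas as pd

data = {'Age': [25, 30, 35, 40, 45, 50],
        'Income': [50000, 60000, 70000, 80000, 90000, 100000],
        'Education': [12, 14, 16, 18, 20, 22]}

df = pd.DataFrame(data)

# np.cov treats each row as a variable, so transpose the DataFrame to put
# Age, Income and Education on the rows. The result is the 3x3 sample
# covariance matrix (divided by n - 1) in the order Age, Income, Education.
cov_matrix = np.cov(df.T)

print(cov_matrix)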


[[8.75e+01 1.75e+05 3.50e+01]
 [1.75e+05 3.50e+08 7.00e+04]
 [3.50e+01 7.00e+04 1.40e+01]]
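The same matrix can be obtained with row and column labels via pandas' `DataFrame.cov`, which also uses the n - 1 denominator by default. The sketch below checks that it agrees with `np.cov` when the variables are kept in columns (`rowvar=False`) instead of transposing.

```python
import numpy as np
import pandas as pd

data = {'Age': [25, 30, 35, 40, 45, 50],
        'Income': [50000, 60000, 70000, 80000, 90000, 100000],
        'Education': [12, 14, 16, 18, 20, 22]}
df = pd.DataFrame(data)

# Pairwise sample covariances, labelled by column name.
labeled_cov = df.cov()
print(labeled_cov)

# Same numbers as np.cov with variables in columns.
assert np.allclose(labeled_cov.to_numpy(), np.cov(df.to_numpy(), rowvar=False))
```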


#### Answer_6

In [50]:
Gender = ['Male', 'Female']
Education_Level = ['High School', 'Bachelors', 'Masters', 'PhD']
Employment_Status = ['Unemployed', 'Part-time', 'Full-time']

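* For Gender we can use One-Hot Encoding (OHE), since there are only two categories and no order between them; a single binary column is enough.
* For Education Level we can use Ordinal Encoding, since there is an inherent ranking among the categories (High School < Bachelors < Masters < PhD).
* For Employment Status we can use Ordinal Encoding if the order Unemployed < Part-time < Full-time matters for the problem; otherwise, treat it as nominal and use One-Hot Encoding.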
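A sketch of these choices with pandas and scikit-learn on a few hypothetical rows. It takes the ordinal option for Employment Status, which is an assumption about the problem. For the two-valued Gender column, `drop_first=True` keeps a single 0/1 column.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows using the categories listed above.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Education_Level": ["High School", "PhD", "Bachelors", "Masters"],
    "Employment_Status": ["Full-time", "Unemployed", "Part-time", "Full-time"],
})

# Gender: nominal with two values -> one-hot, dropping the first column (Female) as redundant.
gender_dummies = pd.get_dummies(df["Gender"], prefix="Gender", drop_first=True, dtype=int)

# Education level and employment status: ordinal, with the order given explicitly per column.
ordinal = OrdinalEncoder(categories=[
    ["High School", "Bachelors", "Masters", "PhD"],
    ["Unemployed", "Part-time", "Full-time"],
])
ordinal_codes = ordinal.fit_transform(df[["Education_Level", "Employment_Status"]]).astype(int)

encoded = pd.concat(
    [gender_dummies,
     pd.DataFrame(ordinal_codes,
                  columns=["Education_Level_enc", "Employment_Status_enc"],
                  index=df.index)],
    axis=1,
)
print(encoded)
```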

#### Answer_7

In [51]:
import pandas as pd


data = {'Temperature': [25, 30, 35, 40, 45, 50],
        'Humidity': [50, 60, 70, 80, 90, 100],
        'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy', 'Cloudy'],
        'Wind Direction': ['North', 'South', 'East', 'West', 'North', 'South']}

df = pd.DataFrame(data)


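# numeric_only=True limits the calculation to the numeric columns. Without it, pandas 2.0+
# raises an error on the text columns (pandas 1.5 only warns and drops them).
cov_matrix = df.cov(numeric_only=True)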


print(cov_matrix)

             Temperature  Humidity
Temperature         87.5     175.0
Humidity           175.0     350.0


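The covariance above ignores Weather Condition and Wind Direction. If they should take part, one option is to one-hot encode them first, since neither has a natural order. Treating both as nominal is an assumption. Whether covariances with 0/1 indicator columns are useful depends on the question being asked.

```python
import pandas as pd

data = {'Temperature': [25, 30, 35, 40, 45, 50],
        'Humidity': [50, 60, 70, 80, 90, 100],
        'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy', 'Cloudy'],
        'Wind Direction': ['North', 'South', 'East', 'West', 'North', 'South']}
df = pd.DataFrame(data)

# One-hot encode the nominal columns into 0/1 indicators (e.g. "Weather Condition_Sunny").
encoded = pd.get_dummies(df, columns=["Weather Condition", "Wind Direction"], dtype=int)

# Every column is now numeric, so DataFrame.cov can use all of them.
cov_matrix = encoded.cov()
print(cov_matrix.round(2))
```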
