### Q1

**Ordinal Encoding vs. Label Encoding**:
- **Ordinal Encoding**: Assigns integers to categories in a way that preserves their natural order. For example, "Size" (small, medium, large) is encoded as 1, 2, 3.
- **Label Encoding**: Assigns an arbitrary integer to each category, with no order implied. For example, "Color" (red, green, blue) is encoded as 1, 2, 3, but the numbers carry no ranking.
- **Example**: Choose ordinal encoding for education levels (High School, Bachelor’s, Master’s) because the levels have an inherent order. Use label encoding for nominal categories such as color.
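
The difference can be sketched with pandas alone. This is a minimal illustration, not the only way to do it: the `sizes` series and the `size_order` list are made-up examples, and the `+ 1` only shifts the zero-based pandas codes to match the 1, 2, 3 convention used above.

```python
import pandas as pd

sizes = pd.Series(["small", "large", "medium", "small"])

# Ordinal encoding: state the order explicitly so the codes respect it.
size_order = ["small", "medium", "large"]  # assumed ranking, lowest to highest
ordinal_codes = pd.Categorical(sizes, categories=size_order, ordered=True).codes + 1

# Label encoding: codes only identify categories; pandas assigns them
# in sorted (alphabetical) order, so they carry no ranking.
label_codes = sizes.astype("category").cat.codes

print(ordinal_codes.tolist())  # [1, 3, 2, 1]
print(label_codes.tolist())    # [2, 0, 1, 2]
```

Note that the label codes put "large" (0) below "small" (2), which is why label encoding is a poor fit for ordered categories.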

### Q2

**Target Guided Ordinal Encoding**: Ranks the categories by the mean of the target variable within each category and assigns ordinal values in that order. For example, in a churn prediction model, if the categories of "Contract Type" have different churn rates, each category can be encoded by its rank in churn rate.
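
A minimal sketch of this idea with pandas follows. The toy data, the column names `contract` and `churn`, and the choice to give the lowest churn rate the code 1 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: contract type and whether the customer churned (1) or not (0).
df = pd.DataFrame({
    "contract": ["monthly", "monthly", "monthly", "yearly", "yearly",
                 "two_year", "two_year", "two_year"],
    "churn": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Mean of the target per category, i.e. the churn rate of each contract type.
churn_rate = df.groupby("contract")["churn"].mean()

# Rank categories by churn rate; the lowest rate gets code 1.
ordered = churn_rate.sort_values().index
mapping = {category: rank for rank, category in enumerate(ordered, start=1)}

df["contract_encoded"] = df["contract"].map(mapping)
print(mapping)  # {'two_year': 1, 'yearly': 2, 'monthly': 3}
```

In a real model the mapping would normally be computed on the training split only and then applied to validation and test data, so that target information does not leak into the features.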

### Q3

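**Covariance**: Measures the degree to which two variables change together. A positive value means the variables tend to increase together, a negative value means one tends to decrease as the other increases, and a value near zero indicates no linear relationship. The sample covariance is calculated as:
\[ \text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \]
where \(X_i\) and \(Y_i\) are individual data points, \(\bar{X}\) and \(\bar{Y}\) are the means of \(X\) and \(Y\), and \(n\) is the number of observations.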
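
As a quick check of the formula, the sketch below computes the sample covariance by hand and compares it with `np.cov`, which also divides by \(n-1\) by default. The arrays `x` and `y` are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Illustrative data: y is exactly 2 * x, so the two move together.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

n = len(x)
# Sum of products of deviations from the means, divided by n - 1.
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is Cov(x, y).
numpy_cov = np.cov(x, y)[0, 1]

print(manual_cov, numpy_cov)  # 2.0 2.0
```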



In [5]:
### Q4
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = {
    'Color': ['red', 'green', 'blue'],
    'Size': ['small', 'medium', 'large'],
    'Material': ['wood', 'metal', 'plastic'],
}
df = pd.DataFrame(data)

# Fit a fresh encoder per column; LabelEncoder assigns codes in alphabetical order
for column in df.columns:
    df[column] = LabelEncoder().fit_transform(df[column])

print(df)


   Color  Size  Material
0      2     2         2
1      1     1         0
2      0     0         1


In [7]:
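### Q5
import numpy as np

# Rows are observations, columns are variables
data = np.array([[25, 50000, 16], [30, 60000, 18], [35, 70000, 20], [40, 80000, 22]])
# rowvar=False treats each column as a variable (equivalent to np.cov(data.T))
cov_matrix = np.cov(data, rowvar=False)

print(cov_matrix)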


[[4.16666667e+01 8.33333333e+04 1.66666667e+01]
 [8.33333333e+04 1.66666667e+08 3.33333333e+04]
 [1.66666667e+01 3.33333333e+04 6.66666667e+00]]


### Q6
- **Gender**: One-hot encoding, creating binary columns for Male and Female.
- **Education Level**: Ordinal encoding, representing the inherent order (1: High School, 2: Bachelor’s, 3: Master’s, 4: PhD).
- **Employment Status**: One-hot encoding, creating binary columns for each status (Unemployed, Part-Time, Full-Time).
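
The choices above could be applied along these lines. This is a sketch under stated assumptions: the three sample rows are hypothetical, and `pd.get_dummies` is used for the one-hot step as one convenient option.

```python
import pandas as pd

# Hypothetical sample rows for the three categorical features.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Education Level": ["Bachelor's", "PhD", "High School"],
    "Employment Status": ["Full-Time", "Unemployed", "Part-Time"],
})

# Ordinal encoding: map each education level to its rank.
education_order = {"High School": 1, "Bachelor's": 2, "Master's": 3, "PhD": 4}
df["Education Level"] = df["Education Level"].map(education_order)

# One-hot encoding: one binary column per category of the nominal features.
encoded = pd.get_dummies(df, columns=["Gender", "Employment Status"], dtype=int)

print(encoded)
```

The result keeps `Education Level` as a single integer column and adds columns such as `Gender_Male` and `Employment Status_Part-Time`.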

In [10]:
### Q7
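import numpy as np

temperature = [30, 22, 25, 28, 31]
humidity = [70, 60, 75, 65, 80]
# np.cov returns the 2x2 covariance matrix; [0, 1] is the temperature-humidity covariance
cov_temp_humidity = np.cov(temperature, humidity)[0, 1]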

print(cov_temp_humidity)


18.75

The covariance is positive (18.75), so temperature and humidity tend to rise and fall together in this sample. Its magnitude depends on the units of both variables, so it does not by itself indicate how strong the relationship is.
