### QUESTIONS

Q1. What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you
might choose one over the other.

Q2. Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in
a machine learning project.

Q3. Define covariance and explain why it is important in statistical analysis. How is covariance calculated?

Q4. For a dataset with the following categorical variables: Color (red, green, blue), Size (small, medium,
large), and Material (wood, metal, plastic), perform label encoding using Python's scikit-learn library.
Show your code and explain the output.

Q5. Calculate the covariance matrix for the following variables in a dataset: Age, Income, and Education
level. Interpret the results.

Q6. You are working on a machine learning project with a dataset containing several categorical
variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD),
and "Employment Status" (Unemployed/Part-Time/Full-Time). Which encoding method would you use for
each variable, and why?

Q7. You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two
categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/
East/West). Calculate the covariance between each pair of variables and interpret the results.

### ANSWERS


**Q1. Ordinal Encoding vs. Label Encoding:**

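*Description:* Ordinal Encoding assigns integers to categories according to an explicit, meaningful order (for example High School < Bachelor's < Master's < PhD). Label Encoding assigns a unique integer to each category without regard to any order; scikit-learn's `LabelEncoder` numbers the classes in sorted (alphabetical) order and is intended for target labels rather than input features. Choose ordinal encoding when the ranking itself carries information, and label encoding for target labels or for nominal categories where the integer values are arbitrary.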

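```python
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

education_levels = ['High School', 'Bachelor\'s', 'Master\'s', 'PhD']

# OrdinalEncoder expects a 2-D array of shape (n_samples, n_features),
# so each education level is passed as its own single-feature row.
ordinal_encoder = OrdinalEncoder(categories=[education_levels])
ordinal_encoded = ordinal_encoder.fit_transform([[level] for level in education_levels]).ravel()

# LabelEncoder ignores the intended rank and numbers the classes in sorted order.
label_encoder = LabelEncoder()
label_encoded = label_encoder.fit_transform(education_levels)

print("Ordinal Encoding:", ordinal_encoded)
print("Label Encoding:", label_encoded)
```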

**Q2. Target Guided Ordinal Encoding:**

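*Description:* Target Guided Ordinal Encoding orders the categories by a statistic of the target variable, typically the mean target value per category, and then assigns integer ranks in that order. It is useful when a nominal feature has no natural order but its categories relate monotonically to the target, for example ranking cities by average house price. The ordering should be learned from the training data only, to avoid leaking the target into validation or test data.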
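As a minimal sketch of the idea with several rows per category, assume a hypothetical `city` feature and a binary `purchased` target (both names and all values are invented for illustration). The categories are ranked by their mean target value:

```python
import pandas as pd

# Hypothetical training data: 'city' is the categorical feature,
# 'purchased' is a binary target. Values are invented for illustration.
train = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'purchased': [1, 1, 0, 1, 0, 0, 0],
})

# Mean target per category, sorted from lowest to highest.
target_means = train.groupby('city')['purchased'].mean().sort_values()

# Rank categories by that mean; rank 1 is the lowest mean target.
city_rank = {city: rank for rank, city in enumerate(target_means.index, start=1)}

train['city_encoded'] = train['city'].map(city_rank)
print(target_means)
print(city_rank)
```

In practice the mapping would be fitted on the training split and then applied unchanged to new data; categories not seen during fitting need a separate policy, such as a default rank.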

```python
import pandas as pd

# Example data: one row per category, so each group mean is that row's value
data = {'Education Level': ['High School', 'Bachelor\'s', 'Master\'s', 'PhD'],
        'Satisfaction': [3, 2, 4, 5]}

df = pd.DataFrame(data)

# Order the categories by their mean target value (ascending)
education_level_order = df.groupby('Education Level')['Satisfaction'].mean().sort_values().index

# Assign ranks 1..n following that order
education_level_mapping = {level: i for i, level in enumerate(education_level_order, 1)}

df['Education Level Ordinal'] = df['Education Level'].map(education_level_mapping)

print(df[['Education Level', 'Education Level Ordinal']])
```

**Q3. Covariance in Statistical Analysis:**

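*Description:* Covariance measures how two variables vary together: it is positive when they tend to move in the same direction, negative when they tend to move in opposite directions, and near zero when there is no linear relationship. It is important because it underlies correlation, linear regression, and principal component analysis. Its magnitude depends on the units of the variables, so the sign is the main thing to interpret; correlation is the unit-free version used to judge strength. The sample covariance is calculated as `cov(X, Y) = sum((x_i - mean(X)) * (y_i - mean(Y))) / (n - 1)`.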
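```python
import numpy as np

# Illustrative values only; any two equal-length numeric sequences work
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# np.cov returns a 2x2 matrix: variances on the diagonal, cov(x, y) off it
covariance_matrix = np.cov(x, y)
print("Covariance Matrix:")
print(covariance_matrix)
```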
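
To make the calculation concrete, the sketch below implements the sample covariance, the sum of products of deviations from the means divided by n - 1, and compares it with `np.cov`. The helper name `sample_covariance` and the `hours`/`score` values are illustrative assumptions, not part of the original question.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1))

# Invented values for illustration.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

manual = sample_covariance(hours, score)
library = np.cov(hours, score)[0, 1]  # off-diagonal entry is cov(hours, score)
print(manual, library)
```

Both values agree because `np.cov` also divides by n - 1 by default.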

**Q4. Label Encoding with scikit-learn:**

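*Description:* Label Encoding maps each category of a variable to an integer. `LabelEncoder` works on one column at a time and numbers the classes in sorted order starting from 0, so every column receives its own independent set of codes, and the codes do not imply any ranking.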

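```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small illustrative dataset with the three categorical variables
df = pd.DataFrame({'Color': ['red', 'green', 'blue'],
                   'Size': ['small', 'medium', 'large'],
                   'Material': ['wood', 'metal', 'plastic']})

# Fit a fresh LabelEncoder per column; classes are numbered in sorted order
df_encoded = df.apply(lambda col: LabelEncoder().fit_transform(col))

print(df_encoded)
```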
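
To explain the output, it helps to keep one fitted encoder per column and inspect its `classes_` attribute, which lists the categories in code order. The following is a sketch under the assumption of a small invented table named `items`; the `encoders` and `mappings` names are likewise illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative rows built from the categories named in Q4.
items = pd.DataFrame({
    'Color': ['red', 'green', 'blue', 'green'],
    'Size': ['small', 'medium', 'large', 'small'],
    'Material': ['wood', 'metal', 'plastic', 'wood'],
})

encoders = {}
encoded = pd.DataFrame(index=items.index)
for column in items.columns:
    encoder = LabelEncoder()
    encoded[column] = encoder.fit_transform(items[column])
    encoders[column] = encoder  # keep it so the codes can be decoded later

# classes_ lists the categories in code order: index 0 is code 0, and so on.
mappings = {
    column: {str(label): code for code, label in enumerate(encoder.classes_)}
    for column, encoder in encoders.items()
}
print(encoded)
print(mappings)

# Round trip: decoding the codes recovers the original labels.
decoded_colors = encoders['Color'].inverse_transform(encoded['Color'])
print(list(decoded_colors))
```

Because the codes follow alphabetical order, `Size` ends up as large=0, medium=1, small=2, which does not match the natural small < medium < large ranking. That is the reason an ordinal encoder with an explicit category order is usually preferred for such a column.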

**Q5. Covariance Matrix Calculation:**

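*Description:* The covariance matrix is a symmetric matrix whose diagonal holds each variable's variance and whose off-diagonal entry (i, j) holds the covariance between variables i and j. A positive entry means the two variables tend to increase together, and a negative entry means one tends to fall as the other rises. Education level must be expressed numerically (for example as years of schooling or an ordinal code) before it can be included.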


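```python
import numpy as np

# Assumes df has numeric columns 'Age', 'Income', and 'Education';
# rowvar=False treats each column as a variable and each row as an observation.
covariance_matrix = np.cov(df[['Age', 'Income', 'Education']], rowvar=False)
print("Covariance Matrix:")
print(covariance_matrix)
```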
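
Since the question supplies no data, the sketch below uses a small invented table (`people`, with `Education` assumed to be years of schooling) purely to show how the matrix is produced and read. `DataFrame.cov` gives the same sample covariances as `np.cov(..., rowvar=False)` but keeps the column labels.

```python
import numpy as np
import pandas as pd

# Invented values for illustration only; 'Education' is years of schooling.
people = pd.DataFrame({
    'Age': [25, 32, 41, 50, 58],
    'Income': [30000, 42000, 58000, 71000, 69000],
    'Education': [12, 16, 16, 18, 14],
})

cov = people.cov()    # sample covariance (divides by n - 1), labelled by column
corr = people.corr()  # unit-free version, easier to compare across pairs
print(cov.round(2))
print(corr.round(2))
```

The diagonal of `cov` holds the variances, and the sign of each off-diagonal entry gives the direction of the relationship. The magnitudes are dominated by the scale of `Income`, so the correlation matrix is the better guide to how strong each relationship is.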

**Q6. Encoding for Categorical Variables:**

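*Description:* "Gender" (Male/Female) is nominal with two categories, so label encoding into a single 0/1 column is sufficient. "Education Level" has a natural order (High School < Bachelor's < Master's < PhD), so ordinal encoding preserves the ranking. "Employment Status" can be treated as nominal and one-hot encoded so that no order is imposed; if Unemployed < Part-Time < Full-Time is regarded as a meaningful ranking, ordinal encoding is a reasonable alternative.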


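```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Assumes df has 'Gender', 'Education Level', and 'Employment Status' columns

# "Gender": binary nominal, so label encoding yields a single 0/1 column
df['Gender'] = LabelEncoder().fit_transform(df['Gender'])

# "Education Level": ordinal encoding with the order stated explicitly
education_levels = ['High School', 'Bachelor\'s', 'Master\'s', 'PhD']
df['Education Level'] = OrdinalEncoder(categories=[education_levels]).fit_transform(df[['Education Level']]).ravel()

# "Employment Status": one-hot encoding, which imposes no order
df = pd.get_dummies(df, columns=['Employment Status'], prefix='Employment')

print(df)
```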

**Q7. Covariance Calculation for Continuous and Categorical Variables:**

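*Description:* Covariance is defined only for numeric variables, so it can be computed directly for "Temperature" and "Humidity". "Weather Condition" and "Wind Direction" are nominal: any integer codes assigned to them are arbitrary, so a covariance computed on those codes has no meaningful interpretation. For pairs that involve a categorical variable, compare group statistics instead (for example the mean temperature per weather condition), and use a contingency table for two categorical variables.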


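```python
# Assumes df holds numeric 'Temperature' and 'Humidity' columns plus the
# categorical 'Weather Condition' and 'Wind Direction' columns.
# numeric_only=True (pandas 1.5+) restricts the result to the numeric columns.
covariance_matrix = df.cov(numeric_only=True)
print("Covariance Matrix:")
print(covariance_matrix)
```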
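
One possible way to handle each kind of pair is sketched below, using a small invented table named `weather` (all values are assumptions for illustration). Covariance is used only for the continuous pair; the other pairs are summarised with group means and a contingency table.

```python
import pandas as pd

# Invented observations for illustration only.
weather = pd.DataFrame({
    'Temperature': [30.0, 28.0, 22.0, 19.0, 25.0, 18.0],
    'Humidity': [40.0, 45.0, 70.0, 85.0, 60.0, 90.0],
    'Weather Condition': ['Sunny', 'Sunny', 'Cloudy', 'Rainy', 'Cloudy', 'Rainy'],
    'Wind Direction': ['North', 'East', 'South', 'West', 'East', 'South'],
})

# Continuous vs continuous: covariance is well defined.
temp_humidity_cov = weather['Temperature'].cov(weather['Humidity'])

# Continuous vs categorical: compare the mean within each category.
mean_temp_by_condition = weather.groupby('Weather Condition')['Temperature'].mean()

# Categorical vs categorical: count how often each combination occurs.
condition_by_wind = pd.crosstab(weather['Weather Condition'], weather['Wind Direction'])

print(temp_humidity_cov)
print(mean_temp_by_condition)
print(condition_by_wind)
```

In this invented sample the temperature-humidity covariance is negative, meaning hotter observations tend to be drier. The group means and the contingency table describe the categorical relationships without pretending that the category labels are numbers.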

