```python
"""
Q1. What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you might choose one over the other.
- **Ordinal Encoding**: Used when the categories have a meaningful order or ranking. Each category is replaced by an integer that reflects its rank, so it suits ordinal data with a clear hierarchy (e.g., 'Low', 'Medium', 'High', from lowest to highest).
- **Label Encoding**: Assigns a unique integer to each category without regard to rank; the integers are arbitrary identifiers (scikit-learn's `LabelEncoder` assigns them in sorted order of the category names). It suits nominal data where the order doesn't matter (e.g., 'Red', 'Green', 'Blue'), although a model that treats the integers as magnitudes may read an order into them that does not exist.

  Example:
  - **Ordinal Encoding** would be suitable for a feature like 'Education Level' (e.g., 'High School' = 1, 'Bachelor's' = 2, 'Master's' = 3, 'PhD' = 4), since there is a clear rank.
  - **Label Encoding** would be used for a feature like 'Color' (e.g., 'Red' = 0, 'Green' = 1, 'Blue' = 2), as the categories don't have a natural order.
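
The contrast can be sketched in plain Python. The sample values and the names `education_order`, `ordinal_map`, and `label_map` are assumptions made for illustration; the second mapping only imitates the sorted-order assignment that scikit-learn's `LabelEncoder` performs.

```python
# Illustrative data (assumed for this sketch, not from a real dataset)
education = ["Master's", "High School", "PhD", "Bachelor's", "High School"]

# Ordinal encoding: the analyst states the rank order explicitly
education_order = ["High School", "Bachelor's", "Master's", "PhD"]
ordinal_map = {level: rank for rank, level in enumerate(education_order, start=1)}
ordinal_codes = [ordinal_map[level] for level in education]

# Label encoding: integers follow the sorted category names, not the rank
label_map = {level: code for code, level in enumerate(sorted(set(education)))}
label_codes = [label_map[level] for level in education]

print("Ordinal codes:", ordinal_codes)
print("Label codes:  ", label_codes)
```

In the label-encoded version "Bachelor's" receives a smaller code than "High School" simply because it sorts first alphabetically, which is why an explicit order is needed for ordinal features. In scikit-learn, `OrdinalEncoder(categories=[education_order])` provides the explicit-order behaviour, with codes starting at 0.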

Q2. Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in a machine learning project.
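- **Target Guided Ordinal Encoding**: This method replaces each category with an ordinal rank derived from the target (dependent) variable. For each category you compute a summary of the target, typically the mean and sometimes the median, sort the categories by that summary, and assign integers in that order. The encoding therefore carries the monotonic relationship between the category and the target. The mapping should be learned from the training data only; otherwise target information leaks into the features.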
  
  Example:
  In a project where you predict customer churn, you might use Target Guided Ordinal Encoding for a feature like 'Contract Type'. If "Month-to-month" contracts have a churn rate of 80%, "One year" contracts 30%, and "Two years" contracts 10%, you would rank the categories by churn rate and assign, for example, "Two years" = 1, "One year" = 2, and "Month-to-month" = 3.
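
The three steps (per-category mean, sort, assign ranks) can be sketched in plain Python. The `rows` data are hypothetical and much smaller than a real dataset; their churn rates are not the 80/30/10 figures above, only the resulting order matches.

```python
from collections import defaultdict

# Hypothetical training rows: (contract type, churned? 1 = yes, 0 = no)
rows = [
    ("Month-to-month", 1), ("Month-to-month", 1), ("Month-to-month", 0),
    ("One year", 1), ("One year", 0), ("One year", 0),
    ("Two years", 0), ("Two years", 0), ("Two years", 0), ("Two years", 1),
]

# Step 1: mean of the target per category
totals = defaultdict(lambda: [0, 0])  # category -> [sum of target, count]
for contract, churned in rows:
    totals[contract][0] += churned
    totals[contract][1] += 1
churn_rate = {c: s / n for c, (s, n) in totals.items()}

# Step 2: sort categories by that mean and assign ranks 1, 2, 3, ...
ranked = sorted(churn_rate, key=churn_rate.get)
target_guided_map = {c: rank for rank, c in enumerate(ranked, start=1)}

# Step 3: replace each category with its rank
encoded = [target_guided_map[c] for c, _ in rows]
print(target_guided_map)
```

In a real project the mapping would be fitted on the training split and then applied unchanged to validation and test data.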

Q3. Define covariance and explain why it is important in statistical analysis. How is covariance calculated?
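- **Covariance** measures how two random variables vary together. A positive covariance means the variables tend to be above (or below) their means at the same time, so they increase or decrease together; a negative covariance means that when one is above its mean, the other tends to be below. It describes association, not causation, and its magnitude depends on the units of both variables, so it is not directly comparable across pairs. It matters in statistical analysis because it is the building block for the correlation coefficient, covariance matrices, and techniques built on them such as principal component analysis.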
  
  **Covariance Calculation**:
  Cov(X, Y) = Σ [(Xᵢ - X̄) * (Yᵢ - Ȳ)] / (n - 1)
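  Where Xᵢ and Yᵢ are the paired observations, X̄ and Ȳ are the sample means of X and Y, and n is the number of observations. Dividing by n - 1 gives the sample covariance, which is also the default in `np.cov`; dividing by n instead gives the population covariance.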
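
As a check on the formula, here is a minimal plain-Python sketch of it. The function name `sample_covariance` and the `hours` and `score` values are assumptions chosen for illustration.

```python
def sample_covariance(x, y):
    # Sample covariance: sum of products of deviations, divided by n - 1
    if len(x) != len(y) or len(x) in (0, 1):
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)

# Illustrative values (assumed): hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

# Positive: the two lists rise together
print(sample_covariance(hours, score))
# Negative: reversing one list makes it fall as the other rises
print(sample_covariance(hours, score[::-1]))
```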

Q4. For a dataset with the following categorical variables: Color (red, green, blue), Size (small, medium, large), and Material (wood, metal, plastic), perform label encoding using Python's scikit-learn library. Show your code and explain the output.

```python
from sklearn.preprocessing import LabelEncoder

# Define data
color = ['red', 'green', 'blue', 'blue', 'green']
size = ['small', 'medium', 'large', 'small', 'medium']
material = ['wood', 'metal', 'plastic', 'wood', 'metal']

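# Fit a separate LabelEncoder per feature so each fitted mapping
# (encoder.classes_) remains available for inverse_transform later
color_encoder = LabelEncoder()
size_encoder = LabelEncoder()
material_encoder = LabelEncoder()

# fit_transform learns the sorted category list and returns integer codes
color_encoded = color_encoder.fit_transform(color)
size_encoded = size_encoder.fit_transform(size)
material_encoded = material_encoder.fit_transform(material)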

print("Color Encoded:", color_encoded)
print("Size Encoded:", size_encoded)
print("Material Encoded:", material_encoded)
```

Output:
```python
Color Encoded: [2 1 0 0 1]
Size Encoded: [2 1 0 2 1]
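Material Encoded: [2 0 1 2 0]
```
Explanation:
- `LabelEncoder` sorts the categories alphabetically and assigns integers in that order. For `color`: 'blue' → 0, 'green' → 1, 'red' → 2. For `size`: 'large' → 0, 'medium' → 1, 'small' → 2. For `material`: 'metal' → 0, 'plastic' → 1, 'wood' → 2. Note that the `size` codes follow the alphabet, not the natural order small, medium, large; if that order matters, use an ordinal encoding with the order stated explicitly.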
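
The mapping each encoder learned can be inspected through its `classes_` attribute. The sketch below refits one encoder per feature; the `features` and `encoders` dictionaries are names introduced here for illustration.

```python
from sklearn.preprocessing import LabelEncoder

features = {
    "color": ["red", "green", "blue", "blue", "green"],
    "size": ["small", "medium", "large", "small", "medium"],
    "material": ["wood", "metal", "plastic", "wood", "metal"],
}

# One fitted encoder per feature, kept in a dict for later reuse
encoders = {name: LabelEncoder().fit(values) for name, values in features.items()}

for name, encoder in encoders.items():
    # classes_[i] is the category that was assigned integer i
    mapping = {str(category): code for code, category in enumerate(encoder.classes_)}
    print(name, mapping)

# Codes can be turned back into the original category names
decoded = encoders["material"].inverse_transform([2, 0, 1])
print([str(d) for d in decoded])
```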

Q5. Calculate the covariance matrix for the following variables in a dataset: Age, Income, and Education level. Interpret the results.

```python
import numpy as np

# Sample data: Age, Income, Education Level
age = [25, 30, 35, 40, 45]
income = [50000, 55000, 60000, 65000, 70000]
education = [1, 2, 3, 4, 5]  # 1: High School, 5: PhD

# Stack into a 2D array: each row is one variable (np.cov's default layout)
data = np.array([age, income, education])

# Calculate the covariance matrix
cov_matrix = np.cov(data)
print(cov_matrix)
```

Output:
```python
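[[6.25e+01 6.25e+04 1.25e+01]
 [6.25e+04 6.25e+07 1.25e+04]
 [1.25e+01 1.25e+04 2.50e+00]]
```
Explanation:
- The diagonal holds each variable's variance: 62.5 for `Age`, 6.25e7 for `Income`, and 2.5 for `Education`. The off-diagonal entries are the pairwise covariances: 62,500 for Age and Income, 12.5 for Age and Education, and 12,500 for Income and Education. All are positive, so in this sample the three variables increase together. The magnitudes mostly reflect units (Income is measured in much larger numbers), so they do not show which relationship is strongest. (The exact print formatting can vary with the NumPy version.)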
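
Covariances are expressed in the product of the two variables' units, so their sizes are hard to compare across pairs. One common way to aid interpretation, sketched here as a suggestion and not as part of the original answer, is to divide each covariance by the product of the two standard deviations, which yields the correlation matrix.

```python
import numpy as np

age = [25, 30, 35, 40, 45]
income = [50000, 55000, 60000, 65000, 70000]
education = [1, 2, 3, 4, 5]

# Each row is one variable, as np.cov expects by default
data = np.array([age, income, education], dtype=float)
cov_matrix = np.cov(data)

# Divide each covariance by the product of the two standard deviations
std = np.sqrt(np.diag(cov_matrix))
corr_matrix = cov_matrix / np.outer(std, std)

print(np.round(corr_matrix, 6))
# np.corrcoef computes the same normalisation directly
print(np.allclose(corr_matrix, np.corrcoef(data)))
```

With this sample every correlation is 1, because each variable is an exact linear function of the others; real data would give values between -1 and 1.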

Q6. You are working on a machine learning project with a dataset containing several categorical variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD), and "Employment Status" (Unemployed/Part-Time/Full-Time). Which encoding method would you use for each variable, and why?
- **Gender (Male/Female)**: With only two categories in this dataset, a single binary (0/1) column is enough. **Label encoding** produces exactly that, and it is equivalent to one-hot encoding with one column dropped.
- **Education Level (High School/Bachelor's/Master's/PhD)**: The categories have a natural rank, so **ordinal encoding** is the best choice, with integers assigned in rank order.
- **Employment Status (Unemployed/Part-Time/Full-Time)**: Treated here as a nominal variable, so **one-hot encoding** is most appropriate because it avoids implying an order. If the analysis regards the categories as ordered by amount of work, ordinal encoding is a reasonable alternative.
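
The three choices can be sketched together in plain Python. The `records` and all mapping names are hypothetical; libraries such as scikit-learn or pandas offer equivalent encoders.

```python
# Hypothetical records for illustration
records = [
    {"gender": "Female", "education": "Master's", "employment": "Full-Time"},
    {"gender": "Male", "education": "High School", "employment": "Unemployed"},
    {"gender": "Female", "education": "PhD", "employment": "Part-Time"},
]

gender_map = {"Male": 0, "Female": 1}  # two categories: one 0/1 column
education_order = ["High School", "Bachelor's", "Master's", "PhD"]
education_map = {level: rank for rank, level in enumerate(education_order)}
employment_levels = ["Unemployed", "Part-Time", "Full-Time"]

def encode(record):
    # One indicator column per employment category (one-hot)
    one_hot = [int(record["employment"] == level) for level in employment_levels]
    return [gender_map[record["gender"]], education_map[record["education"]]] + one_hot

# Columns: gender, education rank, then the three employment indicators
encoded_rows = [encode(r) for r in records]
for row in encoded_rows:
    print(row)
```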

Q7. You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/East/West). Calculate the covariance between each pair of variables and interpret the results.

```python
import numpy as np

# Continuous variables
temperature = [30, 32, 33, 35, 36]
humidity = [60, 65, 70, 75, 80]

# Categorical variables (encoded as integers for simplicity)
weather_condition = [0, 1, 2, 0, 1]  # 0: Sunny, 1: Cloudy, 2: Rainy
wind_direction = [0, 1, 2, 3, 0]  # 0: North, 1: South, 2: East, 3: West

# Calculate covariance matrix for continuous variables
cov_matrix_continuous = np.cov(temperature, humidity)
print(cov_matrix_continuous)
```

Output:
```python
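[[ 5.7  18.75]
 [18.75 62.5 ]]
```
Explanation:
- The off-diagonal entry, 18.75, is the covariance between **Temperature** and **Humidity**. It is positive, so in this sample higher temperatures go with higher humidity. The diagonal holds the variances: 5.7 for Temperature and 62.5 for Humidity.
  
- For the categorical variables, the integer codes of nominal categories are arbitrary, so a covariance computed directly on `weather_condition` or `wind_direction` would change with the labelling and is not interpretable. One-hot encode them first; the covariance of each 0/1 indicator with a continuous variable then shows whether that variable tends to be higher or lower than average for that category.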
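
For a categorical variable, one option is to one-hot encode it and take the covariance of each indicator column with a continuous variable. The sketch below does this for Weather Condition and Temperature; the names `labels`, `indicators`, and `cov_with_temperature` are introduced here for illustration.

```python
import numpy as np

temperature = np.array([30, 32, 33, 35, 36], dtype=float)
weather_condition = np.array([0, 1, 2, 0, 1])  # 0: Sunny, 1: Cloudy, 2: Rainy
labels = ["Sunny", "Cloudy", "Rainy"]

# One-hot encode: one 0/1 indicator column per weather condition
indicators = (weather_condition[:, None] == np.arange(len(labels))).astype(float)

cov_with_temperature = {}
for j, label in enumerate(labels):
    # np.cov of two 1-D arrays returns a 2x2 matrix; [0, 1] is their covariance
    cov_with_temperature[label] = np.cov(temperature, indicators[:, j])[0, 1]
    print(label, round(float(cov_with_temperature[label]), 3))
```

A positive value means temperature tends to be above its mean when that condition occurs. Because the indicators always sum to 1, the three covariances sum to zero.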
"""