## **Q1:-**  
### **What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you might choose one over the other.**

### **Ans:-**

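### **Ordinal Encoding maps each category to an integer that follows an order you specify, so it suits input features whose categories have an inherent ranking (for example, small < medium < large). Label Encoding (scikit-learn's `LabelEncoder`) numbers the classes in sorted, alphabetical order and is intended for the target variable. Because that order is arbitrary, it should not be used on features where a model would read the numbers as a ranking. Example: encode a "Size" feature with Ordinal Encoding so that large > medium > small. Encode a classification target such as cat/dog/bird with Label Encoding, where the integers only act as class identifiers.**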
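A minimal sketch of the difference, using scikit-learn's `OrdinalEncoder` (with an explicit `categories` order) and `LabelEncoder` (alphabetical order). The size values and the cat/dog/bird labels are made-up examples:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Feature with a natural order: pass the order explicitly (lowest first)
sizes = [["small"], ["large"], ["medium"], ["small"]]
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(sizes).ravel()
print(size_codes)  # small=0, medium=1, large=2

# Target labels with no order: LabelEncoder numbers the sorted classes
labels = ["cat", "dog", "bird", "dog"]
label_enc = LabelEncoder()
label_codes = label_enc.fit_transform(labels)
print(label_enc.classes_, label_codes)  # bird=0, cat=1, dog=2
```

The ordinal codes preserve small < medium < large. The label codes only reflect alphabetical order, which is harmless for a target but misleading for an ordered feature.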

## **Q2:-**   
### **Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in a machine learning project.**

### **Ans:-**

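### **Target Guided Ordinal Encoding first computes the mean of the target variable for each category. It then sorts the categories by that mean and replaces each category with its rank (0 for the lowest mean, 1 for the next, and so on), so the codes carry an order related to the target. It is useful when a categorical feature has many categories and a roughly monotonic relationship with the target is expected. For example, in a house-price regression project, each "City" can be encoded by the rank of its average price. The technique can be used in regression, classification, and ranking problems. The category means should be computed on the training data only, to avoid leaking the target into the features.**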
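The three steps can be sketched with pandas. The `city`/`price` data below is hypothetical, and the mapping is built with a plain `groupby` plus `map` rather than any dedicated library function:

```python
import pandas as pd

# Hypothetical toy data: a categorical feature and a numeric target
df = pd.DataFrame({
    "city": ["A", "B", "C", "A", "B", "C", "A", "C"],
    "price": [300, 200, 500, 320, 180, 520, 310, 480],
})

# 1. Mean of the target for each category
city_means = df.groupby("city")["price"].mean()

# 2. Sort categories by that mean and give each its rank (lowest mean -> 0)
ranking = city_means.sort_values().index
city_to_rank = {city: rank for rank, city in enumerate(ranking)}

# 3. Replace every category with its rank
df["city_encoded"] = df["city"].map(city_to_rank)
print(city_to_rank)
print(df)
```

City B has the lowest mean price (190), then A (310), then C (500). The mapping is therefore B=0, A=1, C=2. In a real project, `city_means` would be fitted on the training split and reused for validation and test data.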

## **Q3:-** 
### **Define covariance and explain why it is important in statistical analysis. How is covariance calculated?**

### **Ans:-**

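### **Covariance measures how two random variables vary together. When they tend to move in the same direction (for example, two stocks that rise and fall together), the covariance is positive. When one tends to rise as the other falls, it is negative. A value near zero means little linear co-movement. Covariance is important in statistical analysis for three reasons: it shows the direction of a linear relationship, it is the building block of the correlation coefficient, and covariance matrices underlie methods such as principal component analysis (PCA) and portfolio risk analysis. Its magnitude depends on the units of the variables, so the strength of a relationship is usually judged with correlation instead.**
### **The sample covariance of n paired observations is:**

$$\operatorname{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

### **where $\bar{X}$ and $\bar{Y}$ are the sample means. Dividing by n instead of n - 1 gives the population covariance.**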
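A small worked check of the formula with NumPy, on made-up numbers. It computes the sum of products of deviations divided by n - 1 by hand and compares the result with `np.cov`, which also divides by n - 1 by default:

```python
import numpy as np

# Made-up paired observations
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Sample covariance straight from the definition
n = len(x)
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is cov(x, y)
numpy_cov = np.cov(x, y)[0, 1]
print(manual_cov, numpy_cov)
```

For these arrays, the products of deviations sum to 11, so both values are 11/3 ≈ 3.67. The covariance is positive because x and y tend to increase together.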

## **Q4:-** 
### **For a dataset with the following categorical variables: Color (red, green, blue), Size (small, medium, large), and Material (wood, metal, plastic), perform label encoding using Python's scikit-learn library. Show your code and explain the output.**

### **Ans:-**

In [1]:
import pandas as pd
import numpy as np

In [2]:
df=pd.DataFrame({"Color" :["red","green","blue"],
    "Size":["small","medium","large"],
    "Material":["wood","metal","plastic"]})
df

|   | Color | Size | Material |
|---|---|---|---|
| 0 | red | small | wood |
| 1 | green | medium | metal |
| 2 | blue | large | plastic |


In [3]:
from sklearn.preprocessing import LabelEncoder
Lencoder=LabelEncoder()
df["color_encoder_value"]=Lencoder.fit_transform(df["Color"])
df["Size_encoder_value"]=Lencoder.fit_transform(df["Size"])
df["Material_encoder_value"]=Lencoder.fit_transform(df["Material"])
df

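|   | Color | Size | Material | color_encoder_value | Size_encoder_value | Material_encoder_value |
|---|---|---|---|---|---|---|
| 0 | red | small | wood | 2 | 2 | 2 |
| 1 | green | medium | metal | 1 | 1 | 0 |
| 2 | blue | large | plastic | 0 | 0 | 1 |

#### **`LabelEncoder` sorts the unique values of a column alphabetically and numbers them from 0. Color becomes blue=0, green=1, red=2; Size becomes large=0, medium=1, small=2; and Material becomes metal=0, plastic=1, wood=2. Because `fit_transform` refits the encoder on every call, one encoder object can be reused, but afterwards it only remembers the last mapping (Material). The alphabetical codes also reverse the natural order of Size (small gets the highest code), so an ordinal encoder with an explicit order is usually the better choice for ordered features.**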
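A variant sketch that keeps one fitted `LabelEncoder` per column, so each mapping can be printed from `classes_` and reversed with `inverse_transform`. The `*_encoded` column names and the `encoders`/`mappings` dictionaries are illustrative choices, not part of the notebook above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Color": ["red", "green", "blue"],
                   "Size": ["small", "medium", "large"],
                   "Material": ["wood", "metal", "plastic"]})

# Fit a separate encoder per column so every fitted mapping stays available
encoders = {}
mappings = {}
for col in ["Color", "Size", "Material"]:
    enc = LabelEncoder()
    df[col + "_encoded"] = enc.fit_transform(df[col])
    encoders[col] = enc
    # classes_ holds the categories in sorted order; the position is the assigned code
    mappings[col] = {str(c): code for code, c in enumerate(enc.classes_)}
    print(col, mappings[col])

# A stored encoder can turn codes back into the original labels
print(encoders["Size"].inverse_transform([0, 2]))
```

Keeping the encoders around matters when new data, or model predictions, must be encoded or decoded with exactly the same mapping.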


## **Q5:-** 
### **Calculate the covariance matrix for the following variables in a dataset: Age, Income, and Education level. Interpret the results.**

### **Ans:-**

In [4]:
import pandas as pd
import numpy as np

### **Step 1: Create dataset:-**

In [5]:
import pandas as pd
data = pd.DataFrame({
    "Education Level": ["High School", "Bachelor's", "Master's", "PhD", "Bachelor's", "Master's", "High School", "PhD"],
    "Income": ["Low", "Medium", "High", "High", "Medium", "High", "Low", "High"]
})
data

|   | Education Level | Income |
|---|---|---|
| 0 | High School | Low |
| 1 | Bachelor's | Medium |
| 2 | Master's | High |
| 3 | PhD | High |
| 4 | Bachelor's | Medium |
| 5 | Master's | High |
| 6 | High School | Low |
| 7 | PhD | High |


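### **Step 2: Ordinal Encoding:-**

#### **Both columns are ordered, so each one is encoded with its own explicit category order: High School < Bachelor's < Master's < PhD, and Low < Medium < High. `LabelEncoder` would number the categories alphabetically (for example, Bachelor's=0 and High School=1), which does not follow that order.**

In [6]:
from sklearn.preprocessing import OrdinalEncoder
encoder=OrdinalEncoder(categories=[["High School", "Bachelor's", "Master's", "PhD"],
                                   ["Low", "Medium", "High"]])

In [7]:
# Encode both columns at once; each column uses its own category order
encoded = encoder.fit_transform(data[["Education Level", "Income"]])
data["Education Level"] = encoded[:, 0].astype(int)
data["Income"] = encoded[:, 1].astype(int)
print(data)

   Education Level  Income
0                0       0
1                1       1
2                2       2
3                3       2
4                1       1
5                2       2
6                0       0
7                3       2


### **Step 3: Covariance matrix:-**

In [8]:
data.cov()

|   | Education Level | Income |
|---|---|---|
| Education Level | 1.428571 | 1.000000 |
| Income | 1.000000 | 0.785714 |

### **Interpretation:-**
#### **The diagonal entries are the variances of the encoded variables (about 1.43 for Education Level and 0.79 for Income). The off-diagonal entry, 1.0, is positive, so in this sample higher education levels go together with higher income bands. Because the values are ordinal codes, the magnitude of the covariance depends on the chosen coding; its sign is the reliable part. This sample dataset has no Age column. With a numeric Age column added, `data.cov()` would return a 3 × 3 matrix read the same way.**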
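Dividing a covariance matrix by the standard deviations turns it into a scale-free correlation matrix. A minimal pandas/NumPy sketch, assuming the order High School < Bachelor's < Master's < PhD and Low < Medium < High for the eight Step 1 rows. The codes are typed in directly so the block runs on its own:

```python
import numpy as np
import pandas as pd

# The eight Step 1 rows as ordinal codes (assumed order, lowest = 0)
codes = pd.DataFrame({
    "Education Level": [0, 1, 2, 3, 1, 2, 0, 3],
    "Income": [0, 1, 2, 2, 1, 2, 0, 2],
})

cov = codes.cov()  # sample covariance matrix (divides by n - 1)

# Correlation = covariance / (product of the two standard deviations)
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(cov)
print(corr_from_cov)
print(codes.corr())  # pandas' Pearson correlation gives the same matrix directly
```

The off-diagonal correlation is 7/√55 ≈ 0.94, a strong positive linear association in this small sample.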


## **Q6:-**
### **You are working on a machine learning project with a dataset containing several categorical variables, including "Gender" (Male/Female), "Education Level" (High School/Bachelor's/Master's/PhD), and "Employment Status" (Unemployed/Part-Time/Full-Time). Which encoding method would you use for each variable, and why?**

### **Ans:-**

In [9]:
df=pd.DataFrame({"Gender":["Male","Female","Female","Female","Male","Female"], 
    "Education level":["High School","Bachelor","Master","PhD","Bachelor","PhD"],
    "Employment Status":["Unemployed","Part-Time","Part-Time","Full-Time","Unemployed","Full-Time"]})
df

|   | Gender | Education level | Employment Status |
|---|---|---|---|
| 0 | Male | High School | Unemployed |
| 1 | Female | Bachelor | Part-Time |
| 2 | Female | Master | Part-Time |
| 3 | Female | PhD | Full-Time |
| 4 | Male | Bachelor | Unemployed |
| 5 | Female | PhD | Full-Time |


### **1. Gender (Binary Categorical Variable - Male/Female):**
#### **Binary (0/1) Encoding: For a two-category variable like "Gender," a single column of 0s and 1s is enough: one category is represented by 0 and the other by 1. `LabelBinarizer` assigns the codes in alphabetical order, so "Female" becomes 0 and "Male" becomes 1.**

In [10]:
from sklearn.preprocessing import LabelBinarizer
encoder=LabelBinarizer()

In [11]:
# For two classes LabelBinarizer returns an (n, 1) array; ravel() flattens it into one column
encoder_gender=encoder.fit_transform(df["Gender"])
df["Encoded Gender Value"]=encoder_gender.ravel()
df

|   | Gender | Education level | Employment Status | Encoded Gender Value |
|---|---|---|---|---|
| 0 | Male | High School | Unemployed | 1 |
| 1 | Female | Bachelor | Part-Time | 0 |
| 2 | Female | Master | Part-Time | 0 |
| 3 | Female | PhD | Full-Time | 0 |
| 4 | Male | Bachelor | Unemployed | 1 |
| 5 | Female | PhD | Full-Time | 0 |


### **Why Binary Encoding?**
#### **A single 0/1 column holds all the information in a two-category variable, so the representation stays compact and adds no redundant columns.**

### **2. Education Level (Ordinal Categorical Variable - High School/Bachelor's/Master's/PhD):**
#### **Ordinal Encoding: Ordinal encoding suits a variable like "Education Level" because its categories have a clear order: High School < Bachelor's < Master's < PhD. `LabelEncoder` is not a good fit here, since it numbers the categories alphabetically (Bachelor=0, High School=1, Master=2, PhD=3) and loses that order. `OrdinalEncoder` with an explicit `categories` list keeps it.**

In [12]:
from sklearn.preprocessing import OrdinalEncoder
# List the categories from lowest to highest so the codes follow that order
encoder=OrdinalEncoder(categories=[["High School","Bachelor","Master","PhD"]])
df["Education_level_encoder_value"]=encoder.fit_transform(df[["Education level"]]).ravel().astype(int)
df

|   | Gender | Education level | Employment Status | Encoded Gender Value | Education_level_encoder_value |
|---|---|---|---|---|---|
| 0 | Male | High School | Unemployed | 1 | 0 |
| 1 | Female | Bachelor | Part-Time | 0 | 1 |
| 2 | Female | Master | Part-Time | 0 | 2 |
| 3 | Female | PhD | Full-Time | 0 | 3 |
| 4 | Male | Bachelor | Unemployed | 1 | 1 |
| 5 | Female | PhD | Full-Time | 0 | 3 |


### **3. Employment Status (Nominal Categorical Variable - Unemployed/Part-Time/Full-Time):**
#### **One-Hot Encoding: For nominal categorical variables like "Employment Status," where the categories are treated as having no inherent order, one-hot encoding is a suitable choice. It creates one binary column per category, with 1 or 0 marking whether a row belongs to that category.**

In [13]:
from sklearn.preprocessing import OneHotEncoder
# sparse_output=False returns a dense array (this parameter was named sparse before scikit-learn 1.2)
encoder = OneHotEncoder(sparse_output=False)
encoded_employment = encoder.fit_transform(df[["Employment Status"]])

# Create a DataFrame with the encoded columns
encoded_df = pd.DataFrame(encoded_employment, columns=encoder.get_feature_names_out(["Employment Status"]))

# Concatenate the encoded DataFrame with the original DataFrame
df = pd.concat([df,encoded_df], axis=1)
df

|   | Gender | Education level | Employment Status | Encoded Gender Value | Education_level_encoder_value | Employment Status_Full-Time | Employment Status_Part-Time | Employment Status_Unemployed |
|---|---|---|---|---|---|---|---|---|
| 0 | Male | High School | Unemployed | 1 | 0 | 0.0 | 0.0 | 1.0 |
| 1 | Female | Bachelor | Part-Time | 0 | 1 | 0.0 | 1.0 | 0.0 |
| 2 | Female | Master | Part-Time | 0 | 2 | 0.0 | 1.0 | 0.0 |
| 3 | Female | PhD | Full-Time | 0 | 3 | 1.0 | 0.0 | 0.0 |
| 4 | Male | Bachelor | Unemployed | 1 | 1 | 0.0 | 0.0 | 1.0 |
| 5 | Female | PhD | Full-Time | 0 | 3 | 1.0 | 0.0 | 0.0 |


## **Q7:-**  
### **You are analyzing a dataset with two continuous variables, "Temperature" and "Humidity", and two categorical variables, "Weather Condition" (Sunny/Cloudy/Rainy) and "Wind Direction" (North/South/East/West). Calculate the covariance between each pair of variables and interpret the results.**

### **Ans:-**

In [14]:
import pandas as pd
import numpy as np
num_samples = 100
np.random.seed(0)
temperature = np.random.uniform(50, 100, num_samples)
humidity = np.random.uniform(20, 80, num_samples)
weather_conditions = np.random.choice(["Sunny", "Cloudy", "Rainy"], num_samples)
wind_direction = np.random.choice(["North", "South", "East", "West"], num_samples)
data = pd.DataFrame({
    "Temperature": temperature,
    "Humidity": humidity,
    "Weather Condition": weather_conditions,
    "Wind Direction": wind_direction
})
data

|   | Temperature | Humidity | Weather Condition | Wind Direction |
|---|---|---|---|---|
| 0 | 77.440675 | 60.668992 | Rainy | North |
| 1 | 85.759468 | 36.200478 | Rainy | West |
| 2 | 80.138169 | 64.111641 | Sunny | South |
| 3 | 77.244159 | 77.731313 | Cloudy | North |
| 4 | 71.182740 | 34.925189 | Cloudy | West |
| ... | ... | ... | ... | ... |
| 95 | 59.159568 | 49.427529 | Cloudy | East |
| 96 | 79.325647 | 33.644878 | Sunny | North |
| 97 | 51.005377 | 35.261389 | Sunny | West |
| 98 | 91.447001 | 23.481750 | Sunny | South |


In [15]:
covariance_temp_humidity = data['Temperature'].cov(data['Humidity'])

# Print the covariance result
print(f"Covariance between Temperature and Humidity: {covariance_temp_humidity:.2f}")

# Calculate covariance between Temperature and categorical variables
# (Note: Covariance with categorical variables is not meaningful but can be computed)
covariance_temp_weather = data['Temperature'].cov(data['Weather Condition'].astype('category').cat.codes)
covariance_temp_wind = data['Temperature'].cov(data['Wind Direction'].astype('category').cat.codes)

# Print the covariance results
print(f"Covariance between Temperature and Weather Condition: {covariance_temp_weather:.2f}")
print(f"Covariance between Temperature and Wind Direction: {covariance_temp_wind:.2f}")

# Calculate covariance between Humidity and categorical variables
# (Note: Covariance with categorical variables is not meaningful but can be computed)
covariance_humidity_weather = data['Humidity'].cov(data['Weather Condition'].astype('category').cat.codes)
covariance_humidity_wind = data['Humidity'].cov(data['Wind Direction'].astype('category').cat.codes)

# Print the covariance results
print(f"Covariance between Humidity and Weather Condition: {covariance_humidity_weather:.2f}")
print(f"Covariance between Humidity and Wind Direction: {covariance_humidity_wind:.2f}")

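Covariance between Temperature and Humidity: -15.98
Covariance between Temperature and Weather Condition: 0.72
Covariance between Temperature and Wind Direction: -1.57
Covariance between Humidity and Weather Condition: 0.08
Covariance between Humidity and Wind Direction: 1.72

### **Interpretation:-**
#### **Temperature and Humidity: the covariance is negative, so in this sample higher temperatures go slightly with lower humidity. Its magnitude is in the combined units of the two columns, so on its own it says little about strength. Because both columns are drawn independently with `np.random.uniform`, the true covariance is zero and this value reflects sampling noise.**
#### **Categorical pairs: these covariances are computed on the integer codes that `.cat.codes` assigns in alphabetical order (Cloudy=0, Rainy=1, Sunny=2; East=0, North=1, South=2, West=3). Numbering the categories differently would change the values and even their signs, so they do not describe a real relationship. Comparing the mean of the numeric variable across categories, or one-hot encoding the categorical variable first, is more meaningful.**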
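A sketch of more interpretable alternatives, rebuilding the same random dataset (same seed and order of draws). It uses correlation for the two continuous columns, and per-category means plus covariance with one-hot indicator columns for a categorical variable. These are swapped-in techniques, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Rebuild the same random dataset (same seed, same order of draws)
np.random.seed(0)
num_samples = 100
data = pd.DataFrame({
    "Temperature": np.random.uniform(50, 100, num_samples),
    "Humidity": np.random.uniform(20, 80, num_samples),
    "Weather Condition": np.random.choice(["Sunny", "Cloudy", "Rainy"], num_samples),
    "Wind Direction": np.random.choice(["North", "South", "East", "West"], num_samples),
})

# Correlation divides the covariance by both standard deviations: unit-free, in [-1, 1]
corr_temp_humidity = data["Temperature"].corr(data["Humidity"])
print(f"Correlation between Temperature and Humidity: {corr_temp_humidity:.3f}")

# For a categorical variable, compare the mean of the numeric variable per category
print(data.groupby("Weather Condition")["Temperature"].mean())

# Or one-hot encode the categories and take the covariance with each 0/1 indicator;
# unlike covariance with integer codes, this does not depend on how categories are numbered
weather_dummies = pd.get_dummies(data["Weather Condition"], dtype=float)
cov_temp_dummies = weather_dummies.apply(lambda col: data["Temperature"].cov(col))
print(cov_temp_dummies)
```

The indicator covariances always sum to zero, because the indicators for each row sum to the constant 1. Each one shows whether Temperature tends to be above or below average in that category.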
