Q1: Difference Between Ordinal Encoding and Label Encoding

Ordinal Encoding:
    Definition: Ordinal encoding assigns unique integer values to categories in a way that preserves the ordinal nature of the data.
    Use Case: Suitable for ordinal data, where the categories have a meaningful order or rank (e.g., "Low," "Medium," "High").

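Label Encoding:
    Definition: Label encoding assigns a unique integer to each category without regard to any meaningful order; scikit-learn's LabelEncoder simply numbers the categories in sorted (alphabetical) order.
    Use Case: Applied to nominal data, where there is no inherent order among the categories (e.g., "Cat," "Dog," "Bird"). The resulting integers still look ordered to a model, so for nominal features used in linear or distance-based models, one-hot encoding is often the safer choice.
    Example: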

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Ordinal Encoding Example
data_ordinal = {'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']}
df_ordinal = pd.DataFrame(data_ordinal)
size_mapping = {'Small': 1, 'Medium': 2, 'Large': 3}
df_ordinal['Size_Encoded'] = df_ordinal['Size'].map(size_mapping)

# Label Encoding Example
data_nominal = {'Animal': ['Cat', 'Dog', 'Bird', 'Dog', 'Cat']}
df_nominal = pd.DataFrame(data_nominal)
label_encoder = LabelEncoder()
df_nominal['Animal_Encoded'] = label_encoder.fit_transform(df_nominal['Animal'])

print(df_ordinal)
print(df_nominal)
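The ordinal example above uses a hand-written dictionary. As a sketch of the same idea in scikit-learn, OrdinalEncoder can preserve the order when it is passed explicitly through its `categories` parameter, whereas LabelEncoder applied to the same column numbers the sizes alphabetically. The variable names are illustrative; note that OrdinalEncoder counts from 0 rather than 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium', 'Small']})

# OrdinalEncoder with the order given explicitly: Small < Medium < Large -> 0, 1, 2
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
sizes['Ordinal'] = ordinal_encoder.fit_transform(sizes[['Size']]).ravel()

# LabelEncoder on the same column: codes follow alphabetical order instead
label_encoder = LabelEncoder()
sizes['Label'] = label_encoder.fit_transform(sizes['Size'])

print(sizes)
print(list(label_encoder.classes_))  # Large, Medium, Small (alphabetical)
```

With LabelEncoder, "Large" receives the smallest code, which is why it should not be relied on when the order carries meaning.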


Q2: Target Guided Ordinal Encoding

Target Guided Ordinal Encoding:
    Definition: This method orders the categories based on the mean of the target variable and assigns ordinal values accordingly.
    Use Case: Useful in supervised learning when there is a clear relationship between the categorical feature and the target variable.
    
    Example:
import pandas as pd

# Example Data
data = {'City': ['A', 'B', 'C', 'A', 'B', 'C'], 'Target': [10, 20, 30, 10, 20, 40]}
df = pd.DataFrame(data)

# Calculate mean target value for each category
mean_target = df.groupby('City')['Target'].mean().sort_values()
ordinal_mapping = {k: i for i, k in enumerate(mean_target.index, 1)}

# Apply mapping
df['City_Encoded'] = df['City'].map(ordinal_mapping)
print(df)
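In practice the category ranking should be learned from the training rows only; computing it on all the data lets target information from evaluation rows leak into the feature. The sketch below assumes a hypothetical train/test split and a hypothetical city 'D' that appears only in the test set. Unseen categories map to NaN, and filling them with 0 is an arbitrary choice made here for illustration.

```python
import pandas as pd

# Hypothetical split: training rows reuse the example data above
train_df = pd.DataFrame({'City': ['A', 'B', 'C', 'A', 'B', 'C'],
                         'Target': [10, 20, 30, 10, 20, 40]})
test_df = pd.DataFrame({'City': ['B', 'C', 'D']})  # 'D' never appears in training

# Rank cities by their mean target, using the training rows only
mean_target = train_df.groupby('City')['Target'].mean().sort_values()
ordinal_mapping = {city: rank for rank, city in enumerate(mean_target.index, 1)}

# Apply the learned mapping to both sets; unseen cities become NaN, filled with 0 here
train_df['City_Encoded'] = train_df['City'].map(ordinal_mapping)
test_df['City_Encoded'] = test_df['City'].map(ordinal_mapping).fillna(0).astype(int)

print(ordinal_mapping)
print(test_df)
```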


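Q3: Covariance

Definition:
    Covariance measures the direction of the linear relationship between two random variables.
Importance:
    It helps in understanding how two variables change together. A positive covariance indicates that the variables tend to increase together, while a negative covariance indicates that one variable tends to increase when the other decreases.
    Its magnitude depends on the units of the variables, so covariances are not directly comparable across variable pairs; correlation rescales covariance to the range -1 to 1.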
    
Calculation:

The sample covariance between variables X and Y, computed from n paired observations, is:

$$\mathrm{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})$$
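To make the formula concrete, the sketch below evaluates it directly on two small hypothetical samples and checks the result against NumPy's `np.cov`, which also divides by n - 1 by default (passing `bias=True` divides by n instead, giving the population covariance).

```python
import numpy as np

# Hypothetical paired observations, chosen only for illustration
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 6.0])

n = len(x)
# Sum of products of deviations from the means, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is Cov(X, Y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```

Here the deviation products are 6, 0, -1 and 9, so the covariance is 14 / 3 (about 4.67). It is positive because x and y tend to rise together.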

Q4: Label Encoding using scikit-learn

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Example Data
data = {'Color': ['red', 'green', 'blue', 'green', 'red'],
        'Size': ['small', 'medium', 'large', 'medium', 'small'],
        'Material': ['wood', 'metal', 'plastic', 'metal', 'wood']}
df = pd.DataFrame(data)

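# Apply Label Encoding, keeping a separate fitted encoder per column so each
# column's mapping stays available later (encoders[col].classes_, inverse_transform).
# Codes follow alphabetical order, e.g. Size: large=0, medium=1, small=2.
encoders = {}
for column in ['Color', 'Size', 'Material']:
    encoders[column] = LabelEncoder()
    df[column + '_Encoded'] = encoders[column].fit_transform(df[column])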

print(df)
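scikit-learn documents LabelEncoder as intended for target labels (y), not input features. For feature columns, a minimal alternative is OrdinalEncoder, which accepts all three columns at once. With its default `categories='auto'`, it sorts the categories it finds, so it produces the same codes as fitting one LabelEncoder per column, as the comparison below checks.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

df = pd.DataFrame({'Color': ['red', 'green', 'blue', 'green', 'red'],
                   'Size': ['small', 'medium', 'large', 'medium', 'small'],
                   'Material': ['wood', 'metal', 'plastic', 'metal', 'wood']})

# One encoder for all feature columns (expects 2-D input)
ordinal_encoder = OrdinalEncoder()
codes = ordinal_encoder.fit_transform(df).astype(int)

# Per-column LabelEncoder results, for comparison
label_codes = {col: LabelEncoder().fit_transform(df[col]) for col in df.columns}

print(ordinal_encoder.categories_)
print(codes)
```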

Q5: Covariance Matrix

import numpy as np
import pandas as pd

# Example Data
data = {'Age': [23, 45, 31, 35, 62],
        'Income': [50000, 64000, 58000, 60000, 72000],
        'Education Level': [12, 16, 14, 15, 18]}
df = pd.DataFrame(data)

# Calculate Covariance Matrix
cov_matrix = df.cov()
print(cov_matrix)
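Two properties help when reading this matrix. The diagonal entries are each variable's variance, and because covariance carries the variables' units, Income's entries are on a much larger scale than those of Age or Education Level. The sketch below confirms the diagonal and rescales each entry by the two standard deviations, which reproduces pandas' correlation matrix and makes the pairs comparable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [23, 45, 31, 35, 62],
                   'Income': [50000, 64000, 58000, 60000, 72000],
                   'Education Level': [12, 16, 14, 15, 18]})

cov_matrix = df.cov()

# Diagonal of the covariance matrix = per-column variance
variances = np.diag(cov_matrix)

# Corr(X, Y) = Cov(X, Y) / (std(X) * std(Y)), a unit-free value in [-1, 1]
std = np.sqrt(variances)
corr_from_cov = cov_matrix / np.outer(std, std)

print(corr_from_cov.round(3))
print(df.corr().round(3))
```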


Q6: Encoding Methods for Categorical Variables

Encoding Methods:
    Gender (Male/Female): One-Hot Encoding, since the categories have no order.
    Education Level (High School/Bachelor's/Master's/PhD): Ordinal Encoding, since the levels have a natural order.
    Employment Status (Unemployed/Part-Time/Full-Time): One-Hot Encoding, treating the statuses as unordered categories.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Example Data
data = {'Gender': ['Male', 'Female', 'Female', 'Male'],
        'Education Level': ['Bachelor', 'Master', 'PhD', 'High School'],
        'Employment Status': ['Full-Time', 'Part-Time', 'Unemployed', 'Full-Time']}
df = pd.DataFrame(data)

# Apply One-Hot Encoding for Gender and Employment Status.
# Each column gets its own encoder: refitting a single encoder would overwrite the
# Gender categories and make get_feature_names_out fail for the Gender columns.
gender_encoder = OneHotEncoder()
employment_encoder = OneHotEncoder()
gender_encoded = gender_encoder.fit_transform(df[['Gender']]).toarray()
employment_encoded = employment_encoder.fit_transform(df[['Employment Status']]).toarray()

# Apply Ordinal Encoding for Education Level
education_mapping = {'High School': 1, 'Bachelor': 2, 'Master': 3, 'PhD': 4}
df['Education_Level_Encoded'] = df['Education Level'].map(education_mapping)

# Combine Encoded Data
encoded_df = pd.concat([df,
                        pd.DataFrame(gender_encoded, columns=gender_encoder.get_feature_names_out()),
                        pd.DataFrame(employment_encoded, columns=employment_encoder.get_feature_names_out())],
                       axis=1)

print(encoded_df)
Q7: Covariance Calculation and Interpretation

Covariance is defined only for numeric variables, so here it is meaningful only between the continuous pair (Temperature and Humidity). To measure association involving a categorical variable, use a method suited to that combination: ANOVA for a continuous and a categorical variable, or a chi-square test of independence for two categorical variables.

import numpy as np
import pandas as pd

# Example Data
data = {'Temperature': [20, 22, 21, 19, 18],
        'Humidity': [30, 45, 35, 40, 50],
        'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy'],
        'Wind Direction': ['North', 'South', 'East', 'West', 'North']}
df = pd.DataFrame(data)

# Calculate Covariance Matrix for Continuous Variables
cov_matrix = df[['Temperature', 'Humidity']].cov()
print(cov_matrix)
For categorical variables, use appropriate statistical tests (Chi-square, ANOVA) instead of covariance.
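A minimal sketch of both tests with SciPy on the same five rows: a one-way ANOVA (`scipy.stats.f_oneway`) of Temperature across Weather Condition groups, and a chi-square test of independence (`scipy.stats.chi2_contingency`) on the Weather Condition by Wind Direction contingency table. With only five observations the p-values carry no real evidential weight; the example only shows how the calls fit together.

```python
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway

df = pd.DataFrame({'Temperature': [20, 22, 21, 19, 18],
                   'Humidity': [30, 45, 35, 40, 50],
                   'Weather Condition': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Rainy'],
                   'Wind Direction': ['North', 'South', 'East', 'West', 'North']})

# Continuous vs. categorical: compare Temperature means across weather groups
groups = [grp['Temperature'].to_numpy() for _, grp in df.groupby('Weather Condition')]
f_stat, p_anova = f_oneway(*groups)

# Categorical vs. categorical: test independence using the observed counts
table = pd.crosstab(df['Weather Condition'], df['Wind Direction'])
chi2, p_chi2, dof, expected = chi2_contingency(table)

print(f_stat, p_anova)
print(chi2, p_chi2, dof)
```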