### Binary Encoding
Binary encoding is a technique for encoding categorical data as binary digits. It is particularly useful for high-cardinality categorical variables (variables with many unique categories) because it produces far fewer columns than one-hot encoding: roughly log2(n) columns for n categories instead of n.
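To make the size difference concrete, here is a small sketch that compares the number of columns each scheme needs. The helper names `one_hot_width` and `binary_width` are made up for this illustration, and the sketch assumes integer codes start at 1 (so that 0 stays free for unknown values), which is consistent with the encoder outputs shown later in this notebook.

```python
def one_hot_width(n_categories: int) -> int:
    """One-hot encoding needs one column per category."""
    return n_categories


def binary_width(n_categories: int) -> int:
    """Number of bits needed to write the codes 1..n_categories in base 2.

    Assumption: codes start at 1, leaving 0 free for unknown values.
    """
    return max(1, n_categories.bit_length())


for n in (5, 100, 1_000, 50_000):
    print(f"{n:>6} categories: one-hot = {one_hot_width(n):>6} columns, "
          f"binary = {binary_width(n):>2} columns")
```

Under this assumption, 5 categories need 3 binary columns and 100 categories need 7, compared with 5 and 100 one-hot columns.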

### How Binary Encoding Works:
1. Convert category to integer: each category is assigned a unique integer value.

2. Convert integer to binary: the integer is then converted into its binary equivalent.

3. Split binary digits into columns: each binary digit becomes a separate column in the encoded dataset.
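The three steps above can be sketched in plain Python, without any encoding library. This is only an illustration under two assumptions: codes are assigned in order of first appearance, and they start at 1. The variable names are invented for the example.

```python
cities = ["New York", "San Francisco", "Los Angeles", "Chicago", "Houston"]

# Step 1: assign each category an integer code, in order of first appearance.
# Assumption: codes start at 1.
codes = {}
for city in cities:
    codes.setdefault(city, len(codes) + 1)

# Step 2: write each code in binary, zero-padded to a common width.
width = max(codes.values()).bit_length()
binary_strings = {city: format(code, f"0{width}b") for city, code in codes.items()}

# Step 3: split each binary string into one integer column per digit.
encoded_rows = [[int(bit) for bit in binary_strings[city]] for city in cities]

for city, row in zip(cities, encoded_rows):
    print(f"{city:<15} {row}")
```

With these assumptions, New York becomes `[0, 0, 1]` and Houston becomes `[1, 0, 1]`, which are the same bit patterns the `BinaryEncoder` output in this notebook shows for these five cities.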

In [10]:
# First install category_encoders by running: !pip install category_encoders

In [8]:
import pandas as pd
import numpy as np
from category_encoders import BinaryEncoder

In [12]:
data = {'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Houston']}
df=pd.DataFrame(data)
df

            City
0       New York
1  San Francisco
2    Los Angeles
3        Chicago
4        Houston


In [14]:
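# Name the columns to encode via `cols`; the first positional argument of BinaryEncoder is `verbose`, not the data
encoder = BinaryEncoder(cols=['City'])
encoded_df = encoder.fit_transform(df)
# Keep the original column next to its binary columns for comparison
df_encoded = pd.concat([df, encoded_df], axis=1)
df_encoded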

            City  City_0  City_1  City_2
0       New York       0       0       1
1  San Francisco       0       1       0
2    Los Angeles       0       1       1
3        Chicago       1       0       0
4        Houston       1       0       1


### When to Use Binary Encoding:
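1. High-cardinality categorical variables (e.g., zip codes, city names).
2. When you need to reduce memory usage and avoid high-dimensional datasets.
3. When you want a compromise between ordinal (integer) encoding, which is compact but implies an order, and one-hot encoding, which implies no order but is wide. Note that the bit patterns come from arbitrary integer codes, so binary encoding does not itself preserve any ordinal relationship.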

In [18]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'A', 'B', 'C', 'D']
})

print("Original Data:")
print(data)

# Step 1: Use LabelEncoder to convert categories into integers
label_encoder = LabelEncoder()
data['Category_Encoded'] = label_encoder.fit_transform(data['Category'])

# Step 2: Convert integers into binary format and pad to ensure equal length
binary_encoded = data['Category_Encoded'].apply(lambda x: format(x, '03b'))  # 3 bits cover up to 8 categories (codes 0-7)

# Step 3: Split binary values into separate columns
binary_columns = binary_encoded.apply(lambda x: pd.Series(list(x))).astype(int)
binary_columns.columns = [f'Binary_{i+1}' for i in range(binary_columns.shape[1])]

# Concatenate the binary columns with the original DataFrame
data = pd.concat([data, binary_columns], axis=1)

print("\nBinary Encoded Data:")
print(data)


Original Data:
  Category
0        A
1        B
2        C
3        A
4        B
5        C
6        D

Binary Encoded Data:
  Category  Category_Encoded  Binary_1  Binary_2  Binary_3
0        A                 0         0         0         0
1        B                 1         0         0         1
2        C                 2         0         1         0
3        A                 0         0         0         0
4        B                 1         0         0         1
5        C                 2         0         1         0
6        D                 3         0         1         1
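
In the output above, `Binary_1` is zero in every row, because four categories only need two bits while the width was hard-coded to three. A possible way to avoid this is to derive the width from the number of categories. The sketch below uses only the standard library; `binary_encode` is a hypothetical helper, and it assumes 0-based codes in sorted category order, mirroring what `LabelEncoder` does in the cell above.

```python
def binary_encode(values, prefix="Binary"):
    """Return (column_names, rows) for a binary encoding of `values`.

    Hypothetical helper: codes are 0-based and follow sorted category order.
    """
    categories = sorted(set(values))
    code_of = {category: code for code, category in enumerate(categories)}
    # Width needed to write the largest code; always at least one column.
    width = max(1, (len(categories) - 1).bit_length())
    names = [f"{prefix}_{i + 1}" for i in range(width)]
    rows = [[int(bit) for bit in format(code_of[value], f"0{width}b")]
            for value in values]
    return names, rows


names, rows = binary_encode(["A", "B", "C", "A", "B", "C", "D"])
print(names)
for row in rows:
    print(row)
```

For the four categories used here, this yields two columns instead of three, with no constant column.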


### Example 3: Binary Encoding in a Classification Pipeline

In [35]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from category_encoders import BinaryEncoder


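# Synthetic data: every column, including the target, is random,
# so the classifier trained below has no real signal to learn from
data = pd.DataFrame({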
    'Product_ID': [f'P{i}' for i in range(1, 101)],  # 100 unique products
    'Price': np.random.randint(10, 1000, 100),
    'Quantity': np.random.randint(1, 20, 100),
    'Purchase_Status': np.random.randint(0, 2, 100)  # Target variable (0 or 1)
})

print("Sample Data:")
print(data.head())


Sample Data:
  Product_ID  Price  Quantity  Purchase_Status
0         P1    853        11                1
1         P2    603        16                0
2         P3    490         7                1
3         P4    566        17                1
4         P5    898         9                1


In [41]:
X = data[['Product_ID', 'Price', 'Quantity']]
y = data['Purchase_Status']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)




In [43]:
# Fit the encoder on the training set only, then reuse it on the test set to avoid leakage
encoder = BinaryEncoder(cols=['Product_ID'])
X_train_encoded = encoder.fit_transform(X_train)
X_test_encoded = encoder.transform(X_test)
print("\nBinary Encoded Feature Example (Train Set):")
print(X_train_encoded.head())


Binary Encoded Feature Example (Train Set):
    Product_ID_0  Product_ID_1  Product_ID_2  Product_ID_3  Product_ID_4  \
55             0             0             0             0             0   
88             0             0             0             0             0   
26             0             0             0             0             0   
42             0             0             0             0             1   
69             0             0             0             0             1   

    Product_ID_5  Product_ID_6  Price  Quantity  
55             0             1    188        19  
88             1             0    667         1  
26             1             1    393         9  
42             0             0    330        19  
69             0             1    844        17  
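
One thing to keep in mind with this dataset: every `Product_ID` occurs exactly once, so none of the IDs in the test set were present when the encoder was fitted. The sketch below illustrates that with the standard library only. It uses a plain shuffle as a stand-in for `train_test_split`, so it will not pick the same rows, only the same 80/20 proportions.

```python
import random

product_ids = [f"P{i}" for i in range(1, 101)]  # 100 unique IDs, as in the example

# Stand-in for train_test_split: shuffle, then hold out the last 20%.
rng = random.Random(42)
shuffled = product_ids[:]
rng.shuffle(shuffled)
train_ids, test_ids = shuffled[:80], shuffled[80:]

seen_in_training = set(train_ids)
unseen_test_ids = [pid for pid in test_ids if pid not in seen_in_training]

print(f"distinct IDs in train: {len(seen_in_training)}")
print(f"test IDs never seen during fit: {len(unseen_test_ids)} of {len(test_ids)}")
```

How unseen categories are encoded depends on the encoder's `handle_unknown` setting. To my understanding the `category_encoders` default (`'value'`) maps them to an all-zero bit pattern, but this is worth checking against the documentation of the installed version. Either way, a unique identifier like this carries no information that can transfer from train to test.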


In [31]:
model = RandomForestClassifier(random_state=42)
model.fit(X_train_encoded, y_train)
y_pred = model.predict(X_test_encoded)


In [33]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred))



Classification Report:
              precision    recall  f1-score   support

           0       0.22      0.25      0.24         8
           1       0.45      0.42      0.43        12

    accuracy                           0.35        20
   macro avg       0.34      0.33      0.34        20
weighted avg       0.36      0.35      0.35        20
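
Since the target in this example was generated at random, a score around or below chance is expected and says nothing about the quality of binary encoding. As a point of comparison, the sketch below computes the accuracy of always predicting the majority class, using only the support counts and the accuracy printed in the report above.

```python
# Support counts and accuracy copied from the classification report above.
support = {0: 8, 1: 12}
reported_accuracy = 0.35

total = sum(support.values())
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(f"majority class:    {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.2f}")
print(f"model accuracy:    {reported_accuracy:.2f}")
```

On this test split the majority-class baseline reaches 0.60, so the model's 0.35 is consistent with it having learned nothing useful from random labels.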

