# Hands-On Example with IEncoder

This notebook demonstrates how to use the `IEncoder` library to encode categorical data for machine learning tasks.

## Installation and Setup
Before using `IEncoder`, make sure it is installed in your environment. From a terminal, run the command below (in a Jupyter or Colab cell, use `%pip install iencoder` instead):

```bash
pip install iencoder
```



After installation, import `IEncoder` together with pandas and NumPy:

```python
from iencoder import IEncoder
import pandas as pd
import numpy as np
```

## Loading and Preprocessing a Sample Dataset
For this example, we use the UCI Adult dataset cited below, which mixes numerical and categorical columns. You can replace it with your own data later.

### Citation
Becker, B. & Kohavi, R. (1996). *Adult [Dataset]*. UCI Machine Learning Repository. [https://doi.org/10.24432/C5XW20](https://doi.org/10.24432/C5XW20).


If the data file is stored in Google Drive and you are working in Google Colab, mount the drive first:

```python
# Colab only: makes files in Google Drive available under /content/drive
from google.colab import drive
drive.mount('/content/drive')
```


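Then load the data. The Adult data file has no header row, so `header=None` makes pandas label the columns 0 to 14:

```python
# Replace 'path_to_the_dataset' with the location of the data file
data = pd.read_csv('path_to_the_dataset', header=None)
```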

```python
data.head()
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


```python
categorical_columns = data.select_dtypes(include=['object'])
print("Total number of categorical columns: ", categorical_columns.shape[1])
```

```
Total number of categorical columns:  9
```


```python
numerical_columns = data.select_dtypes(include=['int64'])
print("Total number of numerical columns: ", numerical_columns.shape[1])
```

```
Total number of numerical columns:  6
```


```python
categories = categorical_columns.nunique().sum()
print("Total number of categories: ", categories)
```

```
Total number of categories:  104
```
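The three counting cells above can be combined into one helper. This is a sketch, not part of `IEncoder`, and the name `summarize_columns` is hypothetical. It selects numerical columns with `include="number"` rather than `'int64'`, so float or int32 columns are counted too, and it treats every other column as categorical.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> dict:
    """Count numerical columns, categorical columns, and distinct categories."""
    # "number" matches every numeric dtype (int32, int64, float64, ...),
    # not just int64, so float columns are not silently skipped.
    numerical = df.select_dtypes(include="number")
    # Treat every non-numeric column as categorical.
    categorical = df.select_dtypes(exclude="number")
    return {
        "numerical_columns": numerical.shape[1],
        "categorical_columns": categorical.shape[1],
        # nunique() ignores NaN by default, like the count in the notebook.
        "categories": int(categorical.nunique().sum()),
    }


# Toy frame with made-up values, used only to exercise the helper.
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "hours": [40.0, 13.0, 40.0],
    "sex": ["Male", "Male", "Female"],
    "race": ["White", "White", "Black"],
})
print(summarize_columns(sample))
```

On the Adult frame, `summarize_columns(data)` should report the same 6, 9, and 104 as the cells above, since all of its numerical columns are `int64`.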


```python
data.dropna(inplace=True)
```
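The Adult files mark missing values with `?` rather than leaving a field empty, so `dropna()` on the frame as loaded has nothing to drop: the shape checked below is still 32,561 rows, and `?` shows up as a category in the decoded output at the end. A sketch of how `read_csv` can turn `?` into real missing values, assuming the raw layout of the Adult data file (a comma followed by a space between fields); the three rows are toy data, not taken from the dataset.

```python
import io

import pandas as pd

# Toy rows in the raw Adult layout: comma plus space between fields,
# with "?" standing in for a missing value.
raw = io.StringIO(
    "39, State-gov, Bachelors, <=50K\n"
    "54, ?, HS-grad, >50K\n"
    "38, Private, HS-grad, <=50K\n"
)

# skipinitialspace drops the space after each comma, so the field is
# exactly "?" and na_values can recognise it as missing.
data = pd.read_csv(raw, header=None, skipinitialspace=True, na_values="?")
print(data.isna().sum().sum())  # 1 missing value

data = data.dropna()
print(data.shape)  # (2, 4)
```

Whether to drop such rows or keep `?` as its own category is a modelling choice; this notebook keeps it as a category.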

## One-Hot Encoding for Comparison

```python
encoded_data = pd.get_dummies(data)
encoded_data.shape
```

```
(32561, 110)
```

```python
data.shape
```

```
(32561, 15)
```
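The 110 one-hot columns follow from the earlier counts: `get_dummies` leaves the 6 numerical columns as they are and replaces the 9 categorical columns with one indicator column per category, so 6 + 104 = 110. The toy frame below (made-up values, not the Adult data) checks the same relationship:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["State-gov", "Private", "Private", "Private"],
    "sex": ["Male", "Male", "Female", "Male"],
})

numerical = df.select_dtypes(include="number").shape[1]           # 1
categories = int(df.select_dtypes(exclude="number").nunique().sum())  # 2 + 2

encoded = pd.get_dummies(df)
# Numerical columns pass through; each category becomes one indicator column.
assert encoded.shape[1] == numerical + categories
print(encoded.shape)  # (4, 5)
```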

## Encoding Categorical Features

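We use `IEncoder` to convert the categorical columns into numerical values, since most machine learning models accept only numerical input. `fit_transform` is applied to the whole frame, so the numerical columns also appear in the output, unchanged apart from being shown as floats.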

```python
i_encoder = IEncoder(handle_unknown='ignore', num_of_decimal_places=3, target_column='<=50')
```

```python
transformed = i_encoder.fit_transform(data)
```

```python
transformed  # the IEncoder-encoded data
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 4.887 | 77516.0 | 3.534 | 13.0 | 3.590 | 0.419 | 1.047 | 5.027 | 3.142 | 2174.0 | 0.0 | 40.0 | 5.834 | 0.000 |
| 1 | 50.0 | 4.189 | 83311.0 | 3.534 | 13.0 | 1.795 | 1.676 | 0.000 | 5.027 | 3.142 | 0.0 | 0.0 | 13.0 | 5.834 | 0.000 |
| 2 | 38.0 | 2.793 | 215646.0 | 4.320 | 9.0 | 0.000 | 2.513 | 1.047 | 5.027 | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 0.000 |
| 3 | 53.0 | 2.793 | 234721.0 | 0.393 | 7.0 | 1.795 | 2.513 | 0.000 | 2.513 | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 0.000 |
| 4 | 28.0 | 2.793 | 338409.0 | 3.534 | 13.0 | 1.795 | 4.189 | 5.236 | 2.513 | 0.000 | 0.0 | 0.0 | 40.0 | 0.748 | 0.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27.0 | 2.793 | 257302.0 | 2.749 | 12.0 | 1.795 | 5.445 | 5.236 | 5.027 | 0.000 | 0.0 | 0.0 | 38.0 | 5.834 | 0.000 |
| 32557 | 40.0 | 2.793 | 154374.0 | 4.320 | 9.0 | 1.795 | 2.932 | 0.000 | 5.027 | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 3.142 |
| 32558 | 58.0 | 2.793 | 151910.0 | 4.320 | 9.0 | 5.386 | 0.419 | 4.189 | 5.027 | 0.000 | 0.0 | 0.0 | 40.0 | 5.834 | 0.000 |
| 32559 | 22.0 | 2.793 | 201490.0 | 4.320 | 9.0 | 3.590 | 0.419 | 3.142 | 5.027 | 3.142 | 0.0 | 0.0 | 20.0 | 5.834 | 0.000 |
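Reading the encoded values, each categorical column appears to be mapped onto evenly spaced angles in [0, 2π): column 9 (sex, 2 categories) takes 0.000 and 3.142 (≈ π), column 7 (relationship, 6 categories) lands on multiples of 2π/6 ≈ 1.047, and so on, with categories apparently taken in alphabetical order and rounded to `num_of_decimal_places=3`. The sketch below reproduces that pattern for a single column, together with the matching decoder. It is inferred from the output above, not taken from `IEncoder`'s source, and the helper names `angle_encode` and `angle_decode` are hypothetical.

```python
import math

import pandas as pd


def angle_encode(column: pd.Series, decimals: int = 3):
    """Map each category to the angle 2*pi*k/n, rounded to `decimals` places.

    k is the category's position in sorted order and n is the number of
    distinct categories. Inferred from the notebook output; an assumption,
    not IEncoder's actual implementation.
    """
    categories = sorted(column.dropna().unique())
    n = len(categories)
    angles = {cat: round(2 * math.pi * k / n, decimals) for k, cat in enumerate(categories)}
    return column.map(angles), categories


def angle_decode(encoded: pd.Series, categories: list) -> pd.Series:
    """Invert angle_encode by recovering k from each (rounded) angle."""
    n = len(categories)
    # angle * n / (2*pi) is k up to rounding error; round it back to an int.
    positions = (encoded * n / (2 * math.pi)).round().astype(int) % n
    return positions.map(dict(enumerate(categories)))


sex = pd.Series(["Male", "Male", "Female", "Male"])
encoded_sex, sex_categories = angle_encode(sex)
print(encoded_sex.tolist())  # [3.142, 3.142, 0.0, 3.142]
print(angle_decode(encoded_sex, sex_categories).tolist())  # ['Male', 'Male', 'Female', 'Male']
```

Under this reading, each column keeps a single numerical feature (15 columns in, 15 out) instead of the 110 produced by one-hot encoding, and decoding a column requires its own ordered category list.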


## Decoding Data Back to Original Form


If needed, you can decode the encoded data back to its original categorical format.

```python
inversed = i_encoder.inverse_transform(transformed)
```

```python
pd.DataFrame(inversed)
```

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Without-pay | Masters | Widowed | Other-service | Wife | Other | Female | Ecuador | >50K | 3.142 | 2174.0 | 0.0 | 40.0 | 5.834 | 0.0 |
| 1 | Without-pay | HS-grad | Widowed | Other-service | Wife | Asian-Pac-Islander | Male | ? | >50K | 3.142 | 0.0 | 0.0 | 13.0 | 5.834 | 0.0 |
| 2 | Without-pay | Assoc-acdm | Widowed | Prof-specialty | Wife | Amer-Indian-Eskimo | Male | Ecuador | >50K | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 0.0 |
| 3 | Without-pay | Assoc-acdm | Widowed | Adm-clerical | Wife | Asian-Pac-Islander | Male | ? | >50K | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 0.0 |
| 4 | Without-pay | Assoc-acdm | Widowed | Other-service | Wife | Asian-Pac-Islander | Male | South | >50K | 0.0 | 0.0 | 0.0 | 40.0 | 0.748 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | Without-pay | Assoc-acdm | Widowed | Machine-op-inspct | Wife | Asian-Pac-Islander | Male | South | >50K | 0.0 | 0.0 | 0.0 | 38.0 | 5.834 | 0.0 |
| 32557 | Without-pay | Assoc-acdm | Widowed | Prof-specialty | Wife | Asian-Pac-Islander | Male | ? | >50K | 3.142 | 0.0 | 0.0 | 40.0 | 5.834 | 3.142 |
| 32558 | Without-pay | Assoc-acdm | Widowed | Prof-specialty | Wife | White | Female | Outlying-US(Guam-USVI-etc) | >50K | 0.0 | 0.0 | 0.0 | 40.0 | 5.834 | 0.0 |
| 32559 | Without-pay | Assoc-acdm | Widowed | Prof-specialty | Wife | Other | Female | Ireland | >50K | 3.142 | 0.0 | 0.0 | 20.0 | 5.834 | 0.0 |
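In this run the decoded frame does not line up with the original data: row 0 comes back as `Without-pay, Masters, Widowed, ...` although the original row starts `39, State-gov, 77516, Bachelors`, and columns 9 to 14 still hold encoded numbers such as 3.142 and 5.834. Before relying on `inverse_transform`, it is worth checking the round trip explicitly. The helper below is a hypothetical, pandas-only sketch (it does not call `IEncoder`) that counts mismatching cells per column. It compares by position, because the decoded result may come back without the original index and column labels.

```python
import pandas as pd


def round_trip_mismatches(original: pd.DataFrame, decoded) -> pd.Series:
    """Return, per column, how many cells differ between original and decoded."""
    decoded = pd.DataFrame(decoded)
    if decoded.shape != original.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {decoded.shape}")
    # Compare by position: give the decoded frame the original's labels.
    decoded = decoded.set_axis(original.index, axis=0).set_axis(original.columns, axis=1)
    # Compare as Python objects so 39 == 39.0 counts as equal and
    # mixed-type columns compare element by element.
    left, right = original.astype(object), decoded.astype(object)
    same = left.eq(right) | (left.isna() & right.isna())
    return (~same).sum()


original = pd.DataFrame({"age": [39, 50], "sex": ["Male", "Female"]})
decoded_ok = pd.DataFrame({"age": [39.0, 50.0], "sex": ["Male", "Female"]})
decoded_bad = [[39.0, "Female"], ["Male", 3.142]]

print(round_trip_mismatches(original, decoded_ok).sum())      # 0
print(round_trip_mismatches(original, decoded_bad).to_dict())  # {'age': 1, 'sex': 2}
```

Applied to this notebook, `round_trip_mismatches(data, inversed)` should return all zeros if decoding is exact.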


## Conclusion



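In this notebook, we demonstrated how to:
1. Load and inspect a dataset with categorical columns.
2. Compare against one-hot encoding, which expands the 15 original columns to 110.
3. Use `IEncoder` to encode the data, keeping one numerical column per feature, and decode it back.
4. Prepare the data for machine learning tasks.

You can now experiment with your own datasets and compare `IEncoder` with other encoding methods such as one-hot encoding.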

## Additional Examples in `simple_tests.py` on GitHub
For additional examples and simple test cases using `IEncoder`, see the `simple_tests.py` script in the [GitHub repository](https://github.com/anezovic1/i-encoding). Clone the repository to run it locally.
