[scikit-learn](https://scikit-learn.org/)

In [1]:
from IPython.display import Markdown, display

In [2]:
display(Markdown("# One-Hot Encoding with Scikit-Learn"))
display(Markdown("In this notebook, we demonstrate how to use `OneHotEncoder` from Scikit-Learn to convert a list of words into one-hot encoded vectors."))

# One-Hot Encoding with Scikit-Learn

In this notebook, we demonstrate how to use `OneHotEncoder` from Scikit-Learn to convert a list of words into one-hot encoded vectors.

In [3]:
display(Markdown("## Step 1: Import Required Libraries"))
from sklearn.preprocessing import OneHotEncoder
import numpy as np

## Step 1: Import Required Libraries

In [4]:
display(Markdown("## Step 2: Define Vocabulary"))
vocabulary = ['Natural', 'Language', 'Processing', 'for', 'Text', 'and', 'Speech']

## Step 2: Define Vocabulary

In [5]:
display(Markdown("## Step 3: Reshape Vocabulary"))
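# OneHotEncoder expects 2-D input (samples x features), so turn the word list into a single column.
vocabulary_reshaped = np.array(vocabulary).reshape(-1, 1)
vocabulary_reshaped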

## Step 3: Reshape Vocabulary

array([['Natural'],
       ['Language'],
       ['Processing'],
       ['for'],
       ['Text'],
       ['and'],
       ['Speech']], dtype='<U10')
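
The reshape is needed because `OneHotEncoder` works on 2-D input, where each row is a sample and each column is a feature. A minimal sketch of the shape change, assuming the same vocabulary as above (the names `flat` and `column` are illustrative and not part of the original notebook):

```python
import numpy as np

vocabulary = ['Natural', 'Language', 'Processing', 'for', 'Text', 'and', 'Speech']

flat = np.array(vocabulary)      # 1-D: seven words, no feature axis
column = flat.reshape(-1, 1)     # 2-D: seven samples, one feature each

print(flat.shape)
print(column.shape)
```

The `-1` lets NumPy infer the number of rows from the length of the list, so the same line works for a vocabulary of any size.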

In [6]:
display(Markdown("## Step 4: Initialize and Fit OneHotEncoder"))
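# fit_transform learns the categories and encodes the input in one call.
# By default the result is a SciPy sparse matrix, not a dense NumPy array.
one_hot_encoder = OneHotEncoder()
one_hot_encoded = one_hot_encoder.fit_transform(vocabulary_reshaped)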

## Step 4: Initialize and Fit OneHotEncoder

In [8]:
display(Markdown("## Step 5: View One-Hot Encoded Vectors"))
one_hot_encoded

## Step 5: View One-Hot Encoded Vectors

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 7 stored elements and shape (7, 7)>
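
The sparse representation stores only the seven non-zero entries, which is why the vectors are not visible in the output above. One way to inspect them, sketched here as an optional extra step rather than part of the original notebook, is to expand the matrix with `.toarray()` (the name `dense` is illustrative):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

vocabulary = ['Natural', 'Language', 'Processing', 'for', 'Text', 'and', 'Speech']
vocabulary_reshaped = np.array(vocabulary).reshape(-1, 1)

one_hot_encoded = OneHotEncoder().fit_transform(vocabulary_reshaped)

# .toarray() expands the sparse matrix into a regular dense NumPy array.
dense = one_hot_encoded.toarray()

print(dense.shape)
print(dense)
```

Each row contains exactly one `1.0`. For a seven-word vocabulary the dense form is harmless, but for large vocabularies the sparse form is usually the better choice because almost every entry is zero.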

In [9]:
display(Markdown("## Step 6: Display Word-Vector Pairs"))
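# Iterating over the sparse matrix yields one 1x7 sparse row per word,
# so each printout shows the coordinates of that word's single non-zero entry.
for word, vector in zip(vocabulary, one_hot_encoded):
    print(word, vector)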

## Step 6: Display Word-Vector Pairs

Natural <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 1)	1.0
Language <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 0)	1.0
Processing <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 2)	1.0
for <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 6)	1.0
Text <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 4)	1.0
and <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 5)	1.0
Speech <Compressed Sparse Row sparse matrix of dtype 'float64'
	with 1 stored elements and shape (1, 7)>
  Coords	Values
  (0, 3)	1.0
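
The column indices above do not follow the order of the vocabulary list: `Natural` comes first in the list but lands in column 1. This is because the encoder sorts the categories it learns, and in that ordering capitalized words come before lowercase ones. The sketch below, which is an illustrative addition and not part of the original notebook, reads the learned order from the encoder's `categories_` attribute, prints each word next to its dense vector, and uses `inverse_transform` to map the vectors back to words (the names `categories` and `recovered` are illustrative):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

vocabulary = ['Natural', 'Language', 'Processing', 'for', 'Text', 'and', 'Speech']
vocabulary_reshaped = np.array(vocabulary).reshape(-1, 1)

one_hot_encoder = OneHotEncoder()
one_hot_encoded = one_hot_encoder.fit_transform(vocabulary_reshaped)

# categories_ holds one array per input feature; there is a single feature here.
# Position i in this array is the word assigned to column i.
categories = one_hot_encoder.categories_[0]
print(categories.tolist())

# Pair each word with the column holding its 1 and with its dense vector.
for word, row in zip(vocabulary, one_hot_encoded.toarray()):
    print(f"{word:<10} column {int(row.argmax())}  {row.astype(int)}")

# inverse_transform maps the one-hot rows back to the original words.
recovered = one_hot_encoder.inverse_transform(one_hot_encoded).ravel().tolist()
print(recovered)
```

The column reported for each word should match the second coordinate in the sparse printout above, for example column 1 for `Natural` and column 6 for `for`.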
