# BWT - Deep Learning Track
## Task #31: Applying One-Hot Encoding and Word Embeddings
### Adil Mubashir Chaudhry

#### One-Hot Encoding:
One-hot encoding is a simple method for representing categorical data, including the words in a text. Each word is represented as a binary vector whose length equals the vocabulary size. The vector holds a 1 at the index of that word in the vocabulary and 0 everywhere else. For example, with a 10-word vocabulary in which "apple" is the ninth word (index 8), "apple" is represented as [0, 0, 0, 0, 0, 0, 0, 0, 1, 0].
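A minimal NumPy sketch of the idea follows. It assumes a hypothetical 10-word vocabulary in which "apple" happens to sit at index 8, so the output matches the example above. The word list and the `one_hot` helper are illustrative and not part of any library.

```python
import numpy as np

# Hypothetical 10-word vocabulary; the order is arbitrary but fixes each word's index.
vocab = ["the", "cat", "dog", "sat", "on", "mat", "ran", "fast", "apple", "tree"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, word_to_index):
    """Return a binary vector with a single 1 at the word's vocabulary index."""
    vec = np.zeros(len(word_to_index), dtype=np.int8)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot("apple", word_to_index))  # [0 0 0 0 0 0 0 0 1 0]
```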

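One-hot encoding has two main limitations. First, the vectors are high-dimensional and sparse. Each one has a dimension for every vocabulary word but only a single non-zero entry, so memory and computation grow with the vocabulary, which becomes inefficient for large vocabularies. Second, it captures no semantic relationships between words. Every pair of distinct one-hot vectors is orthogonal, so "cat" is no closer to "dog" than it is to "car".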
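Both limitations can be checked directly. The sketch below assumes a 10,000-word vocabulary, the same size as `max_features` in the Reuters example, and picks hypothetical indices for "cat", "dog", and "car". `one_hot` and `cosine_similarity` are illustrative helpers.

```python
import numpy as np

vocab_size = 10_000  # assumed vocabulary size

def one_hot(index, size):
    """Binary vector of length `size` with a single 1.0 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vocabulary indices for three words.
cat, dog, car = one_hot(12, vocab_size), one_hot(57, vocab_size), one_hot(903, vocab_size)

# Distinct one-hot vectors are orthogonal: "cat" is exactly as similar to "dog" as to "car".
print(cosine_similarity(cat, dog), cosine_similarity(cat, car))  # 0.0 0.0

# Only 1 of the 10,000 entries is non-zero.
print(np.count_nonzero(cat) / cat.size)  # 0.0001
```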

#### Word Embeddings:
Word embeddings are dense vector representations that capture the meaning of words and the relationships between them. Unlike one-hot vectors, embeddings are learned from data. Techniques such as Word2Vec and GloVe analyze large text corpora and assign each word a fixed vector in a continuous space where words used in similar contexts lie close together. Models such as BERT go further and produce contextual embeddings, where a word's vector changes with the surrounding sentence.
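At its core, an embedding is a lookup table with one dense row per word. The sketch below uses a randomly initialised table as a stand-in for learned values. This is an assumption for illustration: real embeddings come from training. It shows that multiplying a one-hot vector by the table selects the same row as a direct index lookup, which is why embedding layers can skip the one-hot step entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 10, 4

# Embedding table: one dense row per word. Random here; learned in practice
# (e.g. by Word2Vec, GloVe, or a Keras Embedding layer during training).
E = rng.normal(size=(vocab_size, embedding_dim))

word_index = 8  # hypothetical index of "apple"
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# one-hot @ table picks out a single row, i.e. the same result as indexing the table.
via_matmul = one_hot @ E
via_lookup = E[word_index]
print(np.allclose(via_matmul, via_lookup))  # True
```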

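Word embeddings have several advantages. They capture semantic relationships, which allows meaningful vector arithmetic: in well-trained embeddings, the vector closest to "king" - "man" + "woman" is often "queen". They are also much lower-dimensional than one-hot vectors, typically tens to a few hundred dimensions rather than one per vocabulary word, which makes them more computationally efficient. However, static embeddings such as Word2Vec and GloVe have no vector for words outside their training vocabulary. Subword-based methods such as fastText, and the subword tokenizers used by models like BERT, are the usual way to handle out-of-vocabulary words.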
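The analogy is usually solved by computing the target vector and returning its nearest neighbour by cosine similarity, excluding the three query words. The 3-dimensional vectors below are hand-made for illustration, not taken from Word2Vec or GloVe. The clean result therefore says nothing about how well real embeddings perform.

```python
import numpy as np

# Hand-made toy vectors, NOT learned embeddings. The axes loosely stand for
# (royalty, maleness, femaleness) purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8]),
    "apple": np.array([0.0, 0.2, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Solve 'a - b + c ≈ ?' by nearest cosine neighbour, excluding the query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", vectors))  # queen
```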

Word embeddings have become a standard approach in natural language processing tasks such as text classification, named entity recognition, machine translation, and sentiment analysis, as they enable models to effectively understand and process textual data by leveraging semantic information encoded in the vectors.

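Below is a simple example that uses both techniques in a deep learning model for topic classification of Reuters newswires. A trainable `Embedding` layer maps word indices to dense vectors, and one-hot encoding (via `to_categorical`) is applied to the topic labels.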

```python
import numpy as np
from tensorflow.keras.datasets import reuters
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import pad_sequences, to_categorical
```

```python
max_features = 10000  # vocabulary size: keep only the 10,000 most frequent words

# Each newswire is a list of word indices; each label is an integer topic class.
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=max_features)

# Pad (or truncate) every sequence to exactly max_len word indices.
max_len = 100
X_train = pad_sequences(X_train, maxlen=max_len)
X_test = pad_sequences(X_test, maxlen=max_len)

# One-hot encode the integer topic labels to match categorical_crossentropy.
num_classes = int(np.max(y_train)) + 1
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# The Embedding layer learns a dense 100-d vector for each of the max_features word indices.
embedding_size = 100
model = Sequential()
model.add(Embedding(max_features, embedding_size))
model.add(LSTM(64))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Hold out 10% of the training data for validation; keep the test set for the final evaluation.
model.fit(X_train, y_train, validation_split=0.1, epochs=3, batch_size=64)
model.evaluate(X_test, y_test)
```

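To make the two preprocessing calls in the example less opaque, here is a plain NumPy approximation of what they do with their default arguments. `pad_sequences` left-pads with zeros and, when a sequence is too long, drops tokens from the front (`padding='pre'`, `truncating='pre'`). `to_categorical` one-hot encodes integer labels. The helpers `pad_pre` and `one_hot_labels` are hypothetical re-implementations for illustration, not the Keras source, and assume a positive `maxlen`.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Approximate Keras pad_sequences defaults (padding='pre', truncating='pre')."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        kept = seq[-maxlen:]              # truncate from the front: keep the last maxlen tokens
        if len(kept):
            out[i, -len(kept):] = kept    # place tokens at the end, leaving `value` on the left
    return out

def one_hot_labels(labels, num_classes):
    """One-hot encode integer class labels, like to_categorical."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Hypothetical word-index sequences of different lengths.
X = pad_pre([[5, 8, 2], [7, 1, 9, 4, 3, 6]], maxlen=4)
print(X.tolist())  # [[0, 5, 8, 2], [9, 4, 3, 6]]

y = one_hot_labels(np.array([2, 0]), num_classes=3)
print(y.tolist())  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```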