# **CBOW (Continuous Bag of Words)**

Continuous Bag of Words (CBOW) is a neural network model used in Natural Language Processing (NLP) to learn word embeddings: vector representations of words that capture their semantic and syntactic relationships. It learns these vectors by predicting a target word from its surrounding context words within a sentence.
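
To make the prediction task concrete, here is a minimal sketch of how (context, target) training pairs could be built from one tokenized sentence. The helper name `cbow_pairs` and the window size of 2 are assumptions chosen for illustration; they are not part of any library.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for one tokenized sentence.

    Hypothetical helper for illustration: each word becomes a target and the
    words up to `window` positions to its left and right form the context.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip one-word sentences, which have no context
            pairs.append((context, target))
    return pairs


tokens = "word embeddings capture semantic meaning of words".split()
pairs = cbow_pairs(tokens, window=2)

# Show the first few training examples.
for context, target in pairs[:3]:
    print(context, "->", target)
```

Words near the sentence boundary simply get a shorter context, which is why the first pair has only two context words.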

**Imports**

In [None]:
!pip install numpy gensim spacy scikit-learn
!python -m spacy download en_core_web_sm

In [2]:
import numpy as np
import gensim
from gensim.models import Word2Vec
import spacy
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer



**Corpus**

In [3]:
documents = [
    "Text processing is an essential part of NLP.",
    "Word embeddings capture semantic meaning of words.",
    "Spacy and Gensim are popular NLP libraries."
]

**Bag of Words Representation**


In [4]:
vectorizer = CountVectorizer()
X_bow = vectorizer.fit_transform(documents)
print("Bag of Words Representation:\n", X_bow.toarray())

Bag of Words Representation:
 [[1 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 0 1 0 0]
 [0 0 0 1 1 0 0 0 0 1 0 1 0 0 0 1 0 0 1 1]
 [0 1 1 0 0 0 1 0 1 0 1 0 0 1 0 0 1 0 0 0]]
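
The columns of the matrix above are not labeled, so it can be hard to tell which word each one counts. The sketch below rebuilds the same matrix in plain Python, assuming `CountVectorizer`'s default behavior (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary). The variable names are illustrative.

```python
import re
from collections import Counter

documents = [
    "Text processing is an essential part of NLP.",
    "Word embeddings capture semantic meaning of words.",
    "Spacy and Gensim are popular NLP libraries.",
]

# Assumption: mirrors CountVectorizer's default token pattern and lowercasing.
token_pattern = re.compile(r"\b\w\w+\b")
tokenized = [token_pattern.findall(doc.lower()) for doc in documents]

# Column i of the matrix corresponds to vocabulary[i].
vocabulary = sorted({token for doc in tokenized for token in doc})

# One row of term counts per document.
bow = []
for doc in tokenized:
    counts = Counter(doc)
    bow.append([counts[term] for term in vocabulary])

for index, term in enumerate(vocabulary):
    print(index, term)
print(bow[0])
```

Reading the first row against the printed vocabulary shows, for example, that column 0 is "an" and column 17 is "text".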


**TF-IDF Representation**


In [5]:
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(documents)
print("TF-IDF Representation:\n", X_tfidf.toarray())

TF-IDF Representation:
 [[0.37380112 0.         0.         0.         0.         0.37380112
  0.         0.37380112 0.         0.         0.28428538 0.28428538
  0.37380112 0.         0.37380112 0.         0.         0.37380112
  0.         0.        ]
 [0.         0.         0.         0.38988801 0.38988801 0.
  0.         0.         0.         0.38988801 0.         0.29651988
  0.         0.         0.         0.38988801 0.         0.
  0.38988801 0.38988801]
 [0.         0.38988801 0.38988801 0.         0.         0.
  0.38988801 0.         0.38988801 0.         0.29651988 0.
  0.         0.38988801 0.         0.         0.38988801 0.
  0.         0.        ]]
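
The values above can be reproduced by hand, which helps explain why words shared across documents ("nlp", "of") score lower than words unique to one document. This is a sketch assuming `TfidfVectorizer`'s default settings: smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row. The helper name `tfidf_row` is illustrative.

```python
import math
import re
from collections import Counter

documents = [
    "Text processing is an essential part of NLP.",
    "Word embeddings capture semantic meaning of words.",
    "Spacy and Gensim are popular NLP libraries.",
]

tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
vocabulary = sorted({token for doc in tokenized for token in doc})
n_docs = len(documents)

# Document frequency: in how many documents each term appears.
df = {term: sum(term in doc for doc in tokenized) for term in vocabulary}

# Smoothed inverse document frequency (assumed TfidfVectorizer default).
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocabulary}


def tfidf_row(tokens):
    """Weight raw counts by IDF, then scale the row to unit L2 length."""
    counts = Counter(tokens)
    raw = [counts[term] * idf[term] for term in vocabulary]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm for value in raw]


tfidf = [tfidf_row(doc) for doc in tokenized]
print([round(value, 8) for value in tfidf[0]])
```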


**Word2Vec Embeddings**


In [6]:
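# Lowercase and split on whitespace; punctuation stays attached (e.g. "nlp.").
sentences = [doc.lower().split() for doc in documents]
# sg=0 selects the CBOW architecture (gensim's default); sg=1 would select skip-gram.
word2vec_model = Word2Vec(sentences, vector_size=10, window=5, min_count=1, sg=0, workers=2)
print("Word2Vec Embedding for 'text':\n", word2vec_model.wv['text'])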

Word2Vec Embedding for 'text':
 [ 0.00094564  0.0307732  -0.06812645 -0.01375465  0.07668581  0.0734641
 -0.03673297  0.02642702 -0.08317129  0.06205486]
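
To show what a CBOW model computes for a single training example, here is a simplified NumPy sketch of one forward pass: average the context word vectors, score every vocabulary word, and apply a softmax. The weights are random and the names `W_in`, `W_out`, and `cbow_forward` are assumptions for illustration. It uses a full softmax for clarity, whereas gensim's `Word2Vec` trains with negative sampling by default, so this is not a reproduction of gensim's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["text", "processing", "is", "an", "essential", "part", "of", "nlp"]
word_to_id = {word: index for index, word in enumerate(vocab)}
embedding_dim = 10  # matches vector_size=10 in the Word2Vec call

# Randomly initialised weights (illustrative; a real model learns these).
W_in = rng.normal(scale=0.1, size=(len(vocab), embedding_dim))   # input embeddings
W_out = rng.normal(scale=0.1, size=(embedding_dim, len(vocab)))  # output weights


def cbow_forward(context_words):
    """Return a probability distribution over the vocabulary for the target."""
    ids = [word_to_id[word] for word in context_words]
    hidden = W_in[ids].mean(axis=0)   # average of the context embeddings
    scores = hidden @ W_out           # one score per vocabulary word
    scores = scores - scores.max()    # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


# Predict the middle word "is" from two words on each side.
probs = cbow_forward(["text", "processing", "an", "essential"])
loss = -np.log(probs[word_to_id["is"]])  # cross-entropy for the true target
print(probs.shape, float(probs.sum()), float(loss))
```

Training would adjust `W_in` and `W_out` to reduce this loss; the rows of `W_in` are what end up being used as the word embeddings.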


**spaCy Word Embeddings**


In [7]:
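# Note: en_core_web_sm ships without static word vectors, so token.vector here
# is a context-sensitive tensor rather than a pretrained word embedding.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Text processing is an essential part of NLP.")
print("Spacy Word Embedding for 'text':\n", doc[0].vector)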

Spacy Word Embedding for 'text':
 [-0.3143282  -1.4313254   0.97116375  1.1657453  -0.78642035 -0.46428394
  1.2019652   1.3568258  -0.2798018  -1.2564946   0.4840206  -0.5896061
 -0.06185064 -0.4479941  -0.5028136   0.49722886 -0.63204694 -0.22040193
  0.02757305  0.4600879   0.28234607  0.46573967 -0.38013345 -0.14335965
  0.8882024  -0.05255298 -0.17257833  0.57981324 -0.14689963  0.439454
 -0.19292976 -0.11136664  0.00469905 -0.86727947 -0.04117572 -0.6702051
 -0.7805647  -0.14242868 -0.11098792  1.1332091  -0.54676765  0.81636554
 -0.5482671   0.22478546  0.32625276  0.06537245 -0.3599271   0.68783486
  0.5495992  -0.7473009   0.47752792  1.0430102   1.3414547  -0.71042055
  0.54571635 -0.7700973  -0.12274662 -0.12977678  0.02895647 -0.37396997
 -0.93757653  0.5367556  -0.43098864 -1.2304637  -0.05791636 -1.0462031
  1.0053499   0.53844917  0.7979429  -1.0634456   0.00697577  1.0017843
 -0.0815018  -0.98491645  0.39395735 -1.1745528  -0.41442788 -0.83574855
  0.98226875 -0.6173926