# **Neural Embedding**
These are neural networks used to generate word embeddings. They emerged because simpler models, such as count-based bag-of-words representations, had trouble capturing semantics and the relationships between words. Later models were proposed to fix this, but their dimensionality became impractical. Neural models instead learn distributed representations of words whose dimensionality is chosen in advance, so each word gets a smaller, dense vector instead of a large, sparse one. Word2Vec is one example. These embeddings capture semantic relationships much better than sparse count-based representations.
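Word2Vec's skip-gram variant learns such vectors by training each word to predict the words around it. The following is a minimal, pure-Python sketch of skip-gram with negative sampling, written for illustration only and not the implementation used by Word2Vec or gensim: negatives are drawn uniformly (Word2Vec samples from the unigram distribution raised to the 3/4 power), and there is no subsampling of frequent words or learning-rate decay. The names `train_skipgram` and `cosine` and all hyperparameter values are assumptions.

```python
import math
import random


def train_skipgram(sentences, dim=8, window=2, negatives=3, epochs=200, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling (hypothetical helper): returns (vocab, word vectors)."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Two dense tables of size len(vocab) x dim: word ("input") and context ("output") vectors.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for sentence in sentences:
            ids = [index[w] for w in sentence.split()]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed (center, context) pair labeled 1, plus random pairs labeled 0.
                    pairs = [(ids[ctx_pos], 1.0)]
                    pairs += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for target, label in pairs:
                        u = w_out[target]
                        score = sigmoid(sum(a * b for a, b in zip(v, u)))
                        g = lr * (label - score)  # SGD step on the logistic log-likelihood
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before it is updated
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return vocab, w_in


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


corpus = ["the who is the band", "who is the band", "the band who plays the who"]
vocab, vectors = train_skipgram(corpus)
emb = dict(zip(vocab, vectors))
print(f"{len(vocab)} words, each mapped to a dense vector of {len(vectors[0])} numbers")
print("cos(who, band) =", round(cosine(emb["who"], emb["band"]), 3))
```

With `dim=8`, each word costs 8 numbers no matter how large the vocabulary grows, which is the controlled dimensionality described above. On a three-sentence corpus the learned similarities themselves carry no real meaning; the point is the training loop and the shape of the result.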

### **Example**

```python
C = ['The who is the band!', 'who is the band?', 'The band who plays the who.']

print('C has %d texts:' % len(C))
for i, text in enumerate(C, start=1):
    print(f"t{i} = {text}")
```

```
C has 3 texts:
t1 = The who is the band!
t2 = who is the band?
t3 = The band who plays the who.
```
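For contrast with neural embeddings, here is roughly what the simple count-based models mentioned in the introduction produce for `C`. This pure-Python sketch imitates scikit-learn's `CountVectorizer` defaults (lowercasing, the token pattern `(?u)\b\w\w+\b`, columns in alphabetical order); `count_vectors` is a hypothetical helper, not a library function.

```python
import re
from collections import Counter


def count_vectors(corpus):
    """Hypothetical bag-of-words helper: one count column per vocabulary word."""
    # Same tokenization idea as CountVectorizer's default: words of 2+ word characters.
    token_re = re.compile(r"(?u)\b\w\w+\b")
    docs = [token_re.findall(doc.lower()) for doc in corpus]
    # Columns in alphabetical order, as CountVectorizer assigns them.
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


C = ['The who is the band!', 'who is the band?', 'The band who plays the who.']
vocab, rows = count_vectors(C)
print(vocab)
for row in rows:
    print(row)
```

Each row has one column per vocabulary word, so its length grows with the vocabulary and, on a realistic corpus, most entries are zero. "who" as the band's name and "who" as a pronoun also share a single column, which is exactly the kind of meaning these counts cannot express. With scikit-learn 1.0 or later, `CountVectorizer().fit_transform(C).toarray()` together with `get_feature_names_out()` should give the same table.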


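```python
import re

def pre_process_corpus(corpus):
    """Lowercase each document and strip punctuation that is not next to a digit."""
    new_corpus = [doc.lower() for doc in corpus]
    # Remove ! ? . , ; : - unless a digit sits right before or after it (keeps "3.14" intact).
    regex = r"(?<!\d)[\!\?.,;:-](?!\d)"
    return [re.sub(regex, "", doc) for doc in new_corpus]
```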
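The lookarounds `(?<!\d)` and `(?!\d)` mean a punctuation mark survives whenever a digit sits immediately before or after it. That protects decimals such as `3.14`, but it also keeps a comma that directly follows a number, and hyphens are deleted outright, gluing compound words together. The sketch below uses a self-contained copy of the function on two made-up example strings to show both effects.

```python
import re


def pre_process_corpus(corpus):
    # Same logic as the function above: lowercase, then drop punctuation not adjacent to a digit.
    regex = r"(?<!\d)[\!\?.,;:-](?!\d)"
    return [re.sub(regex, "", doc.lower()) for doc in corpus]


examples = ["Pi is 3.14, right?", "A state-of-the-art model!"]
print(pre_process_corpus(examples))  # ['pi is 3.14, right', 'a stateoftheart model']
```

If hyphenated words should stay separate tokens, replacing `-` with a space instead of deleting it would be one option.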

```python
import sklearn
import pandas as pd
import nltk
import re
from sklearn.feature_extraction.text import CountVectorizer

corpus = pre_process_corpus(C)
print(corpus)
```

```
['the who is the band', 'who is the band', 'the band who plays the who']
```
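The spaCy cell that follows turns each sentence into a single vector. By default, spaCy's `Doc.vector` is the average of the document's token vectors, and `Doc.similarity` compares two documents with cosine similarity. The toy sketch below reproduces that mean pooling with invented 3-dimensional vectors (the values in `toy` are made up, not spaCy's) so the arithmetic is visible; `mean_vector` and `cosine` are hypothetical helpers.

```python
import math


def mean_vector(token_vectors):
    # Component-wise average of the token vectors (mean pooling).
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 3-dimensional token vectors, invented for illustration only.
toy = {
    "the": [0.1, 0.0, 0.2],
    "who": [0.9, 0.3, 0.0],
    "is": [0.0, 0.1, 0.1],
    "band": [0.7, 0.8, 0.1],
    "plays": [0.2, 0.9, 0.4],
}

sentences = ['the who is the band', 'who is the band', 'the band who plays the who']
doc_vecs = [mean_vector([toy[w] for w in s.split()]) for s in sentences]
print([round(x, 3) for x in doc_vecs[1]])
print(round(cosine(doc_vecs[0], doc_vecs[1]), 3))
```

Because averaging discards word order, any reordering of the same tokens yields the same sentence vector (up to floating-point rounding).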


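```python
import spacy

# Small pretrained English pipeline (not a large language model). It ships without
# static word vectors, so .vector falls back to context-sensitive tensors;
# en_core_web_md and en_core_web_lg include real word vectors.
nlp = spacy.load('en_core_web_sm')

# Doc.vector: by default the average of the sentence's token vectors.
bag_vector = [nlp(sentence).vector for sentence in corpus]

print(bag_vector)
```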

[array([ 0.17612581,  0.07790999, -0.17024486,  0.13084903, -0.25323072,
       -0.10850911,  0.4864773 ,  0.09325148,  0.18489957,  0.07242895,
        0.62086475,  0.46458608, -0.14795896,  0.3091182 , -0.5860945 ,
       -0.47349486, -0.37025276,  0.31795073, -0.11550923,  0.07351366,
       -0.05179209,  0.08320063,  0.3893171 , -0.71300715,  0.40280765,
       -0.36641473,  0.22252432,  0.156894  ,  0.2139632 ,  0.34614712,
       -0.42252207,  0.13657291,  0.3946168 , -0.6982438 , -0.2997318 ,
       -0.27903527,  0.14616129,  0.41428286,  0.02704452,  0.44372758,
        0.08663211,  0.22411232, -0.12779517,  0.2942869 , -0.27620453,
        0.22624454, -0.18871322,  0.99233377, -0.206545  , -0.07681147,
       -0.3358131 , -0.03941146,  0.13218947,  0.60052663, -0.1867938 ,
       -0.15506901, -0.207357  ,  1.0792536 , -0.33497724,  0.18722704,
        0.99995005,  0.03483916, -0.17905235, -0.3916667 , -0.07298835,
       -0.12924477,  0.03062646, -0.52140725,  0.7322411 , -0.3