# Playing with NLTK and Spacy

Using the following notebooks, the manual, and the help of your favourite LLM, play with the text-processing capabilities of NLTK and spaCy. Reflect on the concepts we have seen in class and on any other concepts that come up as you explore. Write two code blocks: one for NLTK and another for spaCy.

Recommended notebook for a demo of NLTK: https://github.com/hb20007/hands-on-nltk-tutorial
Recommended notebook for a demo of Spacy: https://github.com/explosion/spacy-notebooks/blob/master/notebooks/lightning_tour.ipynb

In [17]:
import spacy 
import nltk 

In [18]:
# NLTK Playground

# Python 2 leftover: string literals are already Unicode in Python 3, so this import has no effect
from __future__ import unicode_literals
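The NLTK cell above does not process any text yet. Below is a minimal sketch of a few NLTK steps on the same sentence used in the spaCy cells. It assumes only that `nltk` is installed. `TreebankWordTokenizer` (regex rules), `PorterStemmer`, `FreqDist` and `nltk.util.ngrams` need no downloaded resources, whereas helpers such as `nltk.word_tokenize` or `nltk.pos_tag` require data fetched with `nltk.download(...)` first.

```python
# Minimal NLTK sketch; assumes only that nltk is installed (no nltk.download() needed)
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer
from nltk.util import ngrams

text = "Apple is looking at buying U.K. startup for $1 billion"

# Rule-based tokenizer following Penn Treebank conventions (e.g. "$" is split from "1")
tokens = TreebankWordTokenizer().tokenize(text)

# Porter stemming strips suffixes heuristically; compare with spaCy's lemmas (look, buy, be)
stemmer = PorterStemmer()
stems = [stemmer.stem(tok) for tok in tokens]

# Frequency distribution over lowercased tokens, plus word bigrams
freq = FreqDist(tok.lower() for tok in tokens)
bigrams = list(ngrams(tokens, 2))

for tok, stem in zip(tokens, stems):
    print(f"Token: {tok:10} | Stem: {stem}")
print("Most common:", freq.most_common(3))
print("First bigrams:", bigrams[:3])
```

Stemming and lemmatization both reduce inflected forms, but a stemmer only applies suffix rules, so its output is not always a dictionary word. spaCy's lemmatizer, by contrast, returns base forms such as `be` for `is`.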

In [19]:
# Spacy Playground

nlp = spacy.load("en_core_web_sm")
print("Model 'en_core_web_sm' loaded successfully.")


Model 'en_core_web_sm' loaded successfully.
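`spacy.load` raises an `OSError` when `en_core_web_sm` is not installed, because it is a separate package, usually installed with `python -m spacy download en_core_web_sm`. The defensive variant sketched below checks for the package with `spacy.util.is_package` and otherwise falls back to `spacy.blank("en")`. The fallback is an assumption for experimenting offline: a blank pipeline has a tokenizer and lexical attributes, but no tagger, parser or NER, so `pos_`, `dep_` and `doc.ents` stay empty.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"

if is_package(MODEL_NAME):
    nlp = spacy.load(MODEL_NAME)
else:
    # Assumed fallback: tokenizer + lexical attributes only (no POS, dependencies or entities)
    nlp = spacy.blank("en")

print("Language:", nlp.lang, "| Pipeline components:", nlp.pipe_names)
```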


In [20]:
# Cell 2: Process a text
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)

# For each token, show its text, lemma, part of speech (POS), dependency label and two lexical flags
for token in doc:
    print(f"Token: {token.text:12} | Lemma: {token.lemma_:12} | POS: {token.pos_:6} | Dep: {token.dep_:10} | Is_alpha: {token.is_alpha} | Is_stop: {token.is_stop}")


Token: Apple        | Lemma: Apple        | POS: PROPN  | Dep: nsubj      | Is_alpha: True | Is_stop: False
Token: is           | Lemma: be           | POS: AUX    | Dep: aux        | Is_alpha: True | Is_stop: True
Token: looking      | Lemma: look         | POS: VERB   | Dep: ROOT       | Is_alpha: True | Is_stop: False
Token: at           | Lemma: at           | POS: ADP    | Dep: prep       | Is_alpha: True | Is_stop: True
Token: buying       | Lemma: buy          | POS: VERB   | Dep: pcomp      | Is_alpha: True | Is_stop: False
Token: U.K.         | Lemma: U.K.         | POS: PROPN  | Dep: nsubj      | Is_alpha: False | Is_stop: False
Token: startup      | Lemma: startup      | POS: VERB   | Dep: ccomp      | Is_alpha: True | Is_stop: False
Token: for          | Lemma: for          | POS: ADP    | Dep: prep       | Is_alpha: True | Is_stop: True
Token: $            | Lemma: $            | POS: SYM    | Dep: quantmod   | Is_alpha: False | Is_stop: False
Token: 1            | Lemma: 1         
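The table above mixes two kinds of information. POS tags and dependency labels are statistical predictions, and the small model makes mistakes here (`startup` is tagged `VERB`/`ccomp`, `U.K.` is labelled `nsubj`). `is_alpha` and `is_stop`, on the other hand, are lexical attributes computed from the token string alone. The sketch below assumes a blank English pipeline to show that these attributes, together with `like_num` and `is_currency`, work without any trained component. It uses them to keep only content tokens, a common preprocessing step.

```python
import spacy

# Blank pipeline: tokenizer and lexical attributes only, no trained model required
nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("Apple is looking at buying U.K. startup for $1 billion")

# Drop stop words and punctuation to keep content-bearing tokens
content_tokens = [t.text for t in doc_blank if not t.is_stop and not t.is_punct]
# Number-like tokens (e.g. "1") and currency symbols
numbers = [t.text for t in doc_blank if t.like_num]
currencies = [t.text for t in doc_blank if t.is_currency]

print("Tokens:  ", [t.text for t in doc_blank])
print("Content: ", content_tokens)
print("Numbers: ", numbers)
print("Currency:", currencies)
```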

In [21]:
# Cell 3: Show named entities
print("Named entities found:")
for ent in doc.ents:
    print(f"Text: {ent.text:12} | Start: {ent.start_char:3} | End: {ent.end_char:3} | Label: {ent.label_}")


Named entities found:
Text: Apple        | Start:   0 | End:   5 | Label: ORG
Text: U.K.         | Start:  27 | End:  31 | Label: GPE
Text: $1 billion   | Start:  44 | End:  54 | Label: MONEY
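These entities come from the statistical NER component of `en_core_web_sm`. spaCy also supports rule-based entity recognition through the `entity_ruler` pipe. The minimal sketch below runs it on a blank pipeline. The patterns are hypothetical and written by hand for this one sentence. `spacy.explain` prints a short description of each label.

```python
import spacy

nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")

# Hypothetical hand-written patterns: two exact phrases and one token-level pattern
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "U.K."},
    {"label": "MONEY", "pattern": [{"ORTH": "$"}, {"LIKE_NUM": True}, {"LOWER": {"IN": ["million", "billion"]}}]},
])

doc_rules = nlp_rules("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc_rules.ents:
    print(f"Text: {ent.text:12} | Start: {ent.start_char:3} | End: {ent.end_char:3} | Label: {ent.label_:6} | {spacy.explain(ent.label_)}")
```

Rules are precise and predictable but only cover what they spell out. The statistical model generalizes to unseen names at the cost of occasional errors. The two approaches can be combined in a single pipeline.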


In [22]:
# Cell 4: Visualize the dependency parse with displaCy
from spacy import displacy

# Render the visualization inline in the notebook
displacy.render(doc, style="dep", jupyter=True)
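`jupyter=True` only displays output inside a notebook. Outside Jupyter, `displacy.render(..., jupyter=False)` returns the markup as a string that can be saved to a file, and `displacy.serve` starts a local web server instead. The sketch below uses the `ent` style on a blank pipeline so it runs without a trained model. It assigns the entity spans by hand; those token indices are specific to this sentence, and the output filename is arbitrary.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Span

nlp_viz = spacy.blank("en")
doc_viz = nlp_viz("Apple is looking at buying U.K. startup for $1 billion")

# Hand-assigned entity spans by token index: Apple=0, U.K.=5, "$ 1 billion"=8..10
doc_viz.ents = [
    Span(doc_viz, 0, 1, label="ORG"),
    Span(doc_viz, 5, 6, label="GPE"),
    Span(doc_viz, 8, 11, label="MONEY"),
]

# jupyter=False makes render() return the HTML string instead of displaying it
html = displacy.render(doc_viz, style="ent", jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")  # arbitrary output path
print("Saved", len(html), "characters of HTML to entities.html")
```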


In [23]:
# Cell 5: Rule-based matching with Matcher
from spacy.matcher import Matcher

# Create a Matcher over the shared vocabulary and define a pattern: one token whose lowercase form is "apple"
matcher = Matcher(nlp.vocab)
pattern = [{"LOWER": "apple"}]
matcher.add("APPLE_PATTERN", [pattern])

# Find matches in the document; each match is (match_id, start, end), with token indices
matches = matcher(doc)
for match_id, start, end in matches:
    span = doc[start:end]
    print("Match found:", span.text)


Match found: Apple
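The pattern above matches a single token by its lowercase form. Matcher patterns can also span several tokens, test lexical attributes such as `LIKE_NUM`, and use quantifiers like `"OP": "?"`. `PhraseMatcher` efficiently matches whole lists of phrases. The sketch below uses a blank pipeline. The `MONEY_AMOUNT` and `COMPANY` names and the extra phrase `"google"` are hypothetical. `greedy="LONGEST"` keeps only the longest of any overlapping matches.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp_match = spacy.blank("en")
doc_match = nlp_match("Apple is looking at buying U.K. startup for $1 billion")

# Token pattern: "$", a number-like token, then an optional magnitude word
money_matcher = Matcher(nlp_match.vocab)
money_pattern = [
    {"ORTH": "$"},
    {"LIKE_NUM": True},
    {"LOWER": {"IN": ["million", "billion"]}, "OP": "?"},
]
# greedy="LONGEST" discards "$1" in favour of the overlapping, longer "$1 billion"
money_matcher.add("MONEY_AMOUNT", [money_pattern], greedy="LONGEST")

# Phrase matching on the LOWER attribute, so the pattern "apple" also matches "Apple"
phrase_matcher = PhraseMatcher(nlp_match.vocab, attr="LOWER")
phrase_matcher.add("COMPANY", [nlp_match.make_doc("apple"), nlp_match.make_doc("google")])

for active_matcher in (money_matcher, phrase_matcher):
    for match_id, start, end in active_matcher(doc_match):
        label = nlp_match.vocab.strings[match_id]
        print(f"Match found: {label:12} | {doc_match[start:end].text}")
```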
