# NLP Basics Assessment

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ohtar10/icesi-nlp/blob/main/Sesion1/6-practice.ipynb)

In this notebook we put into practice some of the concepts covered in the previous notebooks, applied to a specific corpus:
[_An Occurrence at Owl Creek Bridge_](https://en.wikipedia.org/wiki/An_Occurrence_at_Owl_Creek_Bridge) by Ambrose Bierce (1890). This story is in the public domain, and the corpus was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/375.txt.utf-8).

## References
* [NLP - Natural Language Processing With Python](https://www.udemy.com/course/nlp-natural-language-processing-with-python)

In [32]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

**1. Create the document from the file `owlcreek.txt`**<br>
> Hint: Use `with open('./owlcreek.txt') as f:`

In [33]:
!test '{IN_COLAB}' = 'True' && wget  https://github.com/Ohtar10/icesi-nlp/raw/refs/heads/main/Sesion1/owlcreek.txt

In [34]:
with open('./minecraft.txt') as file:
    doc = nlp(file.read())

In [35]:
doc[:36]

Minecraft is a sandbox video game that gives players an unparalleled level of freedom to shape, explore, and survive in a world built entirely out of blocks. Originally created by Markus "Notch

The document was loaded successfully!

**2. How many tokens are in the file?**

In [36]:
len(doc)

758

**3. How many sentences are in the file?**
<br>Hint: You will need a list first

In [37]:
sentences = list(doc.sents)
len(sentences)

24
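The hint about needing a list reflects that `doc.sents` is a generator, so it has no length and cannot be indexed until it is materialized. The following is a minimal sketch of that behavior. It assumes a blank English pipeline with the rule-based `sentencizer` standing in for `en_core_web_sm` (so it runs without a model download) and a made-up sample text; the names `nlp_demo`, `demo_doc` and `demo_sentences` are illustrative only.

```python
import spacy

# Assumption: a blank pipeline plus the rule-based sentencizer stands in for
# en_core_web_sm so the sketch runs without downloading a model.
nlp_demo = spacy.blank("en")
nlp_demo.add_pipe("sentencizer")

demo_doc = nlp_demo("This is one sentence. This is another. And a third one.")

# doc.sents is a generator: calling len() on it raises a TypeError.
try:
    len(demo_doc.sents)
    sents_has_len = True
except TypeError:
    sents_has_len = False

# Materializing the generator gives a list that supports len() and indexing.
demo_sentences = list(demo_doc.sents)
print(sents_has_len, len(demo_sentences))
print(demo_sentences[1].text)
```

With a statistical model such as `en_core_web_sm`, sentence boundaries come from the parser, so the splits on real text may differ from what the rule-based sentencizer produces.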

**4. Print the second sentence of the document**
<br> Hint: Indices start at 0, and the title counts as the first sentence.

In [38]:
sentences[1]

Originally created by Markus "Notch" Persson and released officially by Mojang Studios in 2011, Minecraft has evolved into one of the best-selling and most influential games in history.

**5. For each token in the previous sentence, print its `text`, `POS` tag, `dep` tag and `lemma`**
<br>

In [39]:
print("{:20}{:20}{:20}{:20}".format("Text", "POS", "dep", "lemma"))
for token in sentences[1]:
    print(f"{token.text:{20}}{token.pos_:{20}}{token.dep_:{20}}{token.lemma_:{20}}")

Text                POS                 dep                 lemma               
Originally          ADV                 advmod              originally          
created             VERB                advcl               create              
by                  ADP                 agent               by                  
Markus              PROPN               nmod                Markus              
"                   PUNCT               punct               "                   
Notch               PROPN               nmod                Notch               
"                   PUNCT               punct               "                   
Persson             PROPN               pobj                Persson             
and                 CCONJ               cc                  and                 
released            VERB                conj                release             
officially          ADV                 advmod              officially          
by                  ADP     
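
The POS and dependency labels in the table above are terse. As a small aid, `spacy.explain` returns a short description for a label, or `None` when the label is not in its glossary. The sketch below looks up a few of the labels that appear in the output; the list of labels and the `descriptions` name are chosen here for illustration.

```python
import spacy

# A few labels taken from the table above (POS tags and dependency labels).
labels = ["ADV", "PROPN", "ADP", "advmod", "pobj"]

# spacy.explain needs no loaded model; it only consults spaCy's glossary.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:10}{description}")
```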

**6. Implement a matcher called *Swimming* that finds the occurrences of the phrase *swimming vigorously***
<br>
Hint: You should include an `'IS_SPACE': True` pattern between the two words.
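
To illustrate the hint: when two words are separated by a line break, spaCy emits the line break as its own whitespace token, so a two-token pattern would miss the phrase. The sketch below is one possible pattern, not the reference solution. It assumes a blank English pipeline (the pattern only uses lexical attributes) and a hypothetical sample text, and it adds `'OP': '*'` so the whitespace token is optional, which goes beyond what the hint asks for.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because the pattern
# only uses lexical attributes (LOWER, IS_SPACE) set by the tokenizer.
nlp_demo = spacy.blank("en")

# Hypothetical sample text: the second occurrence is split by a line break,
# which is tokenized as a separate whitespace token.
demo_doc = nlp_demo(
    "He was swimming vigorously. Still swimming\nvigorously, he reached the bank."
)

swimming_matcher = Matcher(nlp_demo.vocab)
# 'OP': '*' makes the whitespace token optional (an assumption beyond the hint),
# so the same pattern covers both the plain and the line-broken phrase.
pattern = [
    {"LOWER": "swimming"},
    {"IS_SPACE": True, "OP": "*"},
    {"LOWER": "vigorously"},
]
swimming_matcher.add("Swimming", [pattern])

swimming_spans = [demo_doc[start:end] for _, start, end in swimming_matcher(demo_doc)]
for span in swimming_spans:
    print(repr(span.text))
```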

In [44]:
from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)
pattern = [{'LOWER': 'video'}, {'LOWER': 'game'}]
matcher.add("game", [pattern])


In [45]:
found_matches = matcher(doc)
found_matches




[(6371478730979445275, 4, 6), (6371478730979445275, 661, 663)]

**7. Print the text surrounding each match found**

In [52]:
start, end = found_matches[0][1:]
doc[start-2:end+13]

a sandbox video game that gives players an unparalleled level of freedom to shape, explore,

In [53]:
start, end = found_matches[1][1:]
doc[start-7:end+5]

recognizable worldwide. More than just a video game, Minecraft has become a
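
The slices above use hand-picked offsets for each match. When a match sits near the start of the document, `start - n` can become negative and the slice may no longer cover the intended tokens. One way to guard against that is to clamp the bounds, as in the sketch below. `context_window` is a hypothetical helper introduced here for illustration, and the sample sentence is adapted from the output above.

```python
import spacy


def context_window(doc, start, end, before=5, after=5):
    """Return the span around doc[start:end], clamped to the document bounds."""
    left = max(0, start - before)
    right = min(len(doc), end + after)
    return doc[left:right]


# Assumption: a blank English pipeline is used, since only tokenization is needed.
nlp_demo = spacy.blank("en")
demo_doc = nlp_demo("Minecraft is a sandbox video game that gives players freedom.")

# Token indices 4 and 6 delimit "video game" in this sample sentence.
# Asking for 10 tokens of left context would go below 0 without the clamp.
window = context_window(demo_doc, 4, 6, before=10, after=2)
print(window.text)
```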

**8. Print the sentence that contains each match found**

In [54]:
for sentence in sentences:
    for _, start, end in found_matches:
        if sentence.start <= start and sentence.end >= end:
            print(sentence.text, '\n')

Minecraft is a sandbox video game that gives players an unparalleled level of freedom to shape, explore, and survive in a world built entirely out of blocks. 

More than just a video game, Minecraft has become a platform for expression, creativity, and community building. 
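
As an alternative to the nested loop above, a match can be turned into a `Span` and its `sent` attribute used to get the enclosing sentence directly. This is a sketch under the same assumptions as before: a blank English pipeline with the rule-based `sentencizer` instead of `en_core_web_sm`, and a made-up sample text; `demo_matcher` and `matched_sentences` are illustrative names.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: the sentencizer provides the sentence boundaries that Span.sent needs.
nlp_demo = spacy.blank("en")
nlp_demo.add_pipe("sentencizer")
demo_doc = nlp_demo(
    "Minecraft is a video game. It is built out of blocks. "
    "More than a video game, it is a platform."
)

demo_matcher = Matcher(nlp_demo.vocab)
demo_matcher.add("game", [[{"LOWER": "video"}, {"LOWER": "game"}]])

# Span.sent returns the sentence span that contains the matched tokens.
matched_sentences = [
    demo_doc[start:end].sent.text for _, start, end in demo_matcher(demo_doc)
]
for sentence_text in matched_sentences:
    print(sentence_text, "\n")
```

This avoids comparing every sentence against every match, at the cost of printing a sentence twice if it contains two matches.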

