## __Text Mining and Natural Language Processing (NLP)__

__Professor__: Anthony D. Cho

__Topic__: Syntactic analysis

__Method__: Full parsing

***

__Dependencies__

```shell
python3 -m pip install nltk spacy
python3 -m spacy download en_core_web_sm
python3 -m spacy download es_core_news_sm

python3 -m pip install svglib
```

[List of full-parsing dependency labels (English) - SpaCy](https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md)

[General glossary of terms - SpaCy](https://github.com/explosion/spaCy/blob/master/spacy/glossary.py)

[Universal Dependencies glossary of dependency relations](https://universaldependencies.org/u/dep/)
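The glossaries above describe the labels placed on the arcs of a dependency tree. Whatever the label set, a full parse has the same shape: every token has exactly one head, there is a single root, and following the heads from any token reaches the root without cycles. The sketch below checks those properties on a list of head indices. The function name `is_dependency_tree` is hypothetical, and it assumes spaCy's convention that the root token is its own head.

```python
def is_dependency_tree(heads):
    """Return True if `heads` describes a well-formed dependency tree.

    heads[i] is the index of the head of token i; following spaCy's
    convention, the root token is its own head (heads[root] == root).
    """
    n = len(heads)
    # Every head must point at an existing token
    if any(not 0 <= h < n for h in heads):
        return False
    # Exactly one token may be its own head (the root)
    roots = [i for i, h in enumerate(heads) if h == i]
    if len(roots) != 1:
        return False
    root = roots[0]
    # Every token must reach the root by following heads, without cycles
    for i in range(n):
        seen = set()
        node = i
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True

print(is_dependency_tree([1, 2, 2, 4, 2]))
print(is_dependency_tree([1, 0, 2]))
```

For example, `[1, 2, 2, 4, 2]` (a hand-annotated parse of *El vuelo llegó sin problema*) is accepted, while `[1, 0, 2]` is rejected because tokens 0 and 1 point at each other and never reach the root.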

### Libraries

```python
from os import mkdir
from os.path import exists
from pathlib import Path

from svglib.svglib import svg2rlg
from reportlab.graphics import renderPDF

from spacy import load, displacy

## Load the Spanish language model (installed with: python3 -m spacy download es_core_news_sm)
nlp = load('es_core_news_sm')
```

### Function definitions

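```python
def readText(filename, sep='.'):
    """
        DESCRIPTION:
            Read the content of a plain-text document and split it into fragments.
            
        INPUT:
            @param filename: plain-text filename.
            @type filename: str
            
            @param sep: pattern used to split the content (default: '.').
            @type sep: str
            
        OUTPUT:
            @param content: non-empty fragments split by "sep", surrounding whitespace stripped.
            @type content: list of str
    """
    
    ## Open the plain-text file (it is closed automatically when the block ends)
    with open(file=filename, mode='r', encoding='utf-8') as file:
        ## Read the file content and split it by the "sep" pattern
        content = file.read().split(sep)
    
    ## Strip whitespace and drop empty fragments (e.g. the one after the final '.')
    return [fragment.strip() for fragment in content if fragment.strip()]
```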
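`readText` splits on every occurrence of `sep`, so any period that does not end a sentence (decimals such as `3.5`, abbreviations) also produces a break, and each fragment is then parsed as if it were a full sentence. The sketch below contrasts that naive split with a slightly stricter regular-expression heuristic, a hypothetical `split_sentences` helper that only splits after `.`, `!` or `?` when whitespace and an uppercase letter (or an opening Spanish `¿`/`¡`) follow. It is an illustration under those assumptions, not a real sentence segmenter: it still breaks after abbreviations such as `Sr. Pérez`.

```python
import re

def split_naive(text, sep='.'):
    # Same idea as readText: cut at every occurrence of sep
    return [part.strip() for part in text.split(sep) if part.strip()]

# Split point: after . ! or ?, followed by whitespace, then an uppercase
# letter or an opening Spanish question/exclamation mark
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[¿¡A-ZÁÉÍÓÚÑ])')

def split_sentences(text):
    # Hypothetical heuristic splitter; keeps each sentence's final punctuation
    parts = SENTENCE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]

sample = "El precio subió 3.5 por ciento. ¿Llegó el vuelo? Sí, llegó sin problema."
print(split_naive(sample))      # the decimal 3.5 is cut in two
print(split_sentences(sample))  # three sentences, decimal kept intact
```

If the language model is already loaded, spaCy's own segmentation through `Doc.sents` on the full text is another option worth comparing against both.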


```python
def createOutputFile(sentence_id, image, output_folder=''):
    """
        DESCRIPTION:
            Save the full-parsing (dependency) graph of a sentence as SVG and PDF files.
        
        INPUT:
            @param sentence_id: the id of a sentence (used in the file name).
            @type sentence_id: int
            
            @param image: SVG markup returned by displacy.render(..., jupyter=False).
            @type image: str
            
            @param output_folder: folder path for image storage ('' = current folder).
            @type output_folder: str
        
        OUTPUT:
            SVG and PDF files named FP_File_<id>.svg and FP_File_<id>.pdf
    """
    
    ## Output folder (created if missing, including parent folders)
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    
    ## Output file path, e.g. outputs/FP_File_007.svg
    output_path = folder / f'FP_File_{sentence_id:03}.svg'
    
    ## Export the image to an SVG file (closed before it is read back below)
    output_path.write_text(image, encoding='utf-8')
    
    ## Convert the SVG file to PDF
    drawing = svg2rlg(str(output_path))
    renderPDF.drawToFile(drawing, str(output_path.with_suffix('.pdf')))
```
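`createOutputFile` names each file after the sentence id padded with zeros to at least three digits (`:03` is a minimum width, so ids above 999 keep all their digits) and derives the PDF name from the SVG name. The hypothetical helper `output_paths` below reproduces only that naming rule with `pathlib`, without writing anything, which can help check in advance where the files for a given id will land.

```python
from pathlib import Path

def output_paths(sentence_id, output_folder=''):
    # Hypothetical helper: same naming rule as createOutputFile
    svg_path = Path(output_folder) / f'FP_File_{sentence_id:03}.svg'
    # The PDF sits next to the SVG, with the same stem
    return svg_path, svg_path.with_suffix('.pdf')

print(output_paths(7, 'outputs'))
print(output_paths(100))
```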

## Execution

#### Initial test

```python
## Text content
texto = "El vuelo llegó sin problema"

## Apply the language model to the text
documento = nlp(text=texto)

## Extract the dependency label of each token (one label per token, in order)
dependencias = [word.dep_ for word in documento]
dependencias
```
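`dependencias` holds one label per token, in token order, but a label only makes sense together with the token's head. The sketch below pairs the two for the same sentence; the parse is a hand-annotated, hypothetical Universal Dependencies analysis (spaCy's Spanish models use UD-style labels), so the model's actual output may differ. With a real `Doc`, the same data would come from the token attributes `text`, `dep_` and `head`, as noted in the comment. The helpers `arcs` and `children` are illustrative names, not spaCy functions.

```python
# Hand-annotated (hypothetical) UD-style parse of "El vuelo llegó sin problema".
# With spaCy, the equivalent data would be:
#     [(token.text, token.dep_, token.head.i) for token in documento]
tokens = ["El", "vuelo", "llegó", "sin", "problema"]
deps = ["det", "nsubj", "ROOT", "case", "obl"]
heads = [1, 2, 2, 4, 2]  # index of each token's head; the root points to itself

def arcs(tokens, deps, heads):
    # (dependent, label, head) triples, skipping the root's self-loop
    return [(tokens[i], dep, tokens[h])
            for i, (dep, h) in enumerate(zip(deps, heads)) if h != i]

def children(heads, index):
    # Indices of the tokens whose head is `index` (similar to Token.children)
    return [i for i, h in enumerate(heads) if h == index and i != index]

for dependent, label, head in arcs(tokens, deps, heads):
    print(f"{dependent} --{label}--> {head}")
print([tokens[i] for i in children(heads, 2)])
```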

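```python
## Display inside Jupyter (set to False to export SVG/PDF files instead)
onJupyter = True
image = displacy.render(documento, style='dep', jupyter=onJupyter)

## Export the image to SVG and PDF (displacy.render returns the SVG markup as a string only when jupyter=False)
if not onJupyter:
    createOutputFile(sentence_id=100, image=image, output_folder='outputs')
```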

#### Loading the data

```python
filename = 'Data/texto_02.txt'
texto = readText(filename=filename)

print('Number of sentences: {}'.format(len(texto)))
texto
```

```python
## Display inside Jupyter (set to False to export SVG/PDF files instead)
onJupyter = True

for i, sentence in enumerate(texto):
    
    ## Skip empty fragments (e.g. the text after the final '.')
    if not sentence.strip():
        continue
    
    ## Apply the language model to the sentence
    document = nlp(sentence)
    
    ## Visualize the dependencies (relations)
    image = displacy.render(document, style='dep', jupyter=onJupyter)
    
    if not onJupyter:
        ## Export the dependency-relation diagram
        createOutputFile(i, image, 'outputs')
```
