# spaCy 3.x Tutorial: Transformers Spanish

**(C) 2021 by [Damir Cavar](http://damir.cavar.me/) <<dcavar@iu.edu>>**

**Version:** 1.1, Sept. 2021

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

This is a tutorial about developing simple [spaCy](https://spacy.io/)-based NLP components for Spanish.

This tutorial was developed as part of my course material for L715, Research Seminar on NLU, Knowledge Graphs, and GNNs, at [Indiana University at Bloomington](https://www.indiana.edu/).

## NLP Pipeline for Spanish

In [1]:
import spacy

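To run the following code on a GPU, use the cell below. Calling `set_gpu_allocator("pytorch")` routes GPU memory allocations through PyTorch, which prevents the out-of-memory errors that can otherwise occur when PyTorch and CuPy keep competing memory pools (see the [spaCy documentation](https://spacy.io/usage/embeddings-transformers)). The same cell also loads the Spanish transformer pipeline `es_dep_news_trf`; on a CPU-only machine, drop the `set_gpu_allocator` and `require_gpu` calls, because `require_gpu` raises an error when no GPU is available.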

In [2]:
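from thinc.api import set_gpu_allocator, require_gpu

# Route GPU memory allocations through PyTorch's allocator
set_gpu_allocator("pytorch")
# Activate GPU 0; raises an error if no GPU is available
require_gpu(0)
# Load the transformer-based Spanish pipeline (install it first with:
# python -m spacy download es_dep_news_trf)
nlp = spacy.load("es_dep_news_trf")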
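
On a machine without a GPU, `require_gpu(0)` stops the notebook with an error. A minimal sketch of a CPU fallback uses `spacy.prefer_gpu()`, which returns `False` instead of raising when no GPU can be used, and sets the PyTorch allocator only if a GPU was actually activated. This assumes that setting the allocator right after activating the GPU, before any pipeline is loaded, works the same as the order used in the cell above. The helper name `activate_gpu_if_available` is hypothetical, not part of spaCy or Thinc.

```python
import spacy
from thinc.api import set_gpu_allocator


def activate_gpu_if_available(gpu_id: int = 0) -> bool:
    """Hypothetical helper: use the GPU if possible, otherwise stay on the CPU.

    spacy.prefer_gpu() returns False instead of raising when no GPU is usable,
    unlike thinc's require_gpu().
    """
    using_gpu = spacy.prefer_gpu(gpu_id)
    if using_gpu:
        # Only route allocations through PyTorch when a GPU is active
        # (assumes PyTorch is installed, as spacy-transformers requires).
        set_gpu_allocator("pytorch")
    return using_gpu


print("GPU active:", activate_gpu_if_available(0))
```

Call it right after importing spaCy and before `spacy.load("es_dep_news_trf")`, since the device should be chosen before any pipeline is loaded.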

In [3]:
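for doc in nlp.pipe(["Este es un texto corto.", "Aquí hay algún otro texto."]):
    # Last tensor output by the transformer; for BERT/RoBERTa-style models this
    # is usually the pooled output (one vector per span), not per-token vectors
    tokvecs = doc._.trf_data.tensors[-1]
    print(tokvecs)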

[[-3.65571797e-01 -1.07624516e-01 -3.75636220e-01  1.49296507e-01
   2.07939029e-01  4.88051981e-01  3.35931361e-01 -9.16096687e-01
  -8.11563373e-01  3.77223343e-01 -9.96331930e-01 -2.59966522e-01
  -3.50860029e-01 -5.12584150e-01  5.53630054e-01  1.66127264e-01
   2.06725914e-02  9.80642617e-01  6.24353513e-02 -1.99831963e-01
   6.89916313e-03 -4.97184284e-02  8.94966483e-01  6.42851710e-01
   8.40666667e-02  9.79871869e-01 -9.38545287e-01 -3.20128888e-01
   6.22806311e-01 -8.75478745e-01  1.03287362e-01  9.96363044e-01
   1.96195722e-01  3.82119298e-01  4.98581856e-01  8.98100257e-01
  -9.44938138e-02  3.71071965e-01  9.48450565e-01 -3.04787070e-01
  -9.88952339e-01 -1.29752010e-01 -6.88679457e-01  6.00430369e-01
  -6.77243173e-01 -3.21298018e-02  9.15745497e-02  4.13944393e-01
   6.53509423e-02 -9.33000967e-02 -4.53986555e-01 -5.53809702e-01
  -1.58625498e-01  6.77313004e-03 -7.58789340e-03  9.88084733e-01
   7.22095370e-01 -8.49259198e-01  1.04436286e-01  9.61510599e-01
  -8.06320
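
The transformer output printed above is not aligned to spaCy's tokens. spacy-transformers also stores an alignment in `doc._.trf_data.align`, a ragged array in which `align.lengths[i]` is the number of wordpieces aligned to token `i`, with the wordpiece indices in `align.dataXd`. Below is a minimal NumPy sketch of mean-pooling wordpiece vectors into one vector per spaCy token. It assumes that `tensors[0]` holds the last hidden state of shape `(n_spans, n_wordpieces, width)` and that the alignment indices point into that tensor flattened to `(n_spans * n_wordpieces, width)`. The function name `pool_wordpieces_to_tokens` is hypothetical, not a spaCy API.

```python
import numpy as np


def pool_wordpieces_to_tokens(hidden, indices, lengths):
    """Hypothetical helper: mean-pool wordpiece vectors into token vectors.

    hidden:  (n_spans, n_wordpieces, width) array of wordpiece activations
    indices: flat array of wordpiece positions, grouped token by token
    lengths: lengths[i] = number of wordpieces aligned to token i
    """
    hidden = np.asarray(hidden)
    width = hidden.shape[-1]
    # Assumed: alignment indices refer to the flattened wordpiece axis
    flat = hidden.reshape(-1, width)
    indices = np.asarray(indices).ravel()
    lengths = np.asarray(lengths).ravel()
    out = np.zeros((len(lengths), width), dtype=flat.dtype)
    start = 0
    for i, n in enumerate(lengths):
        if n > 0:
            # Average all wordpieces that belong to token i
            out[i] = flat[indices[start:start + n]].mean(axis=0)
        # Tokens with no wordpieces keep a zero vector
        start += n
    return out
```

With the pipeline loaded, a call could look like `pool_wordpieces_to_tokens(doc._.trf_data.tensors[0], doc._.trf_data.align.dataXd, doc._.trf_data.align.lengths)`. On the GPU these may be CuPy arrays; copy them to host memory first (CuPy arrays provide `.get()`). Giving tokens without any aligned wordpiece a zero vector is a choice of this sketch.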

In [6]:
# `doc` is still the last Doc from the loop above ("Aquí hay algún otro texto.")
for x in doc:
    print(x, x.lemma_, x.pos_, x.tag_, x.dep_, x.shape_, x.is_alpha, x.is_stop)

Aquí aquí ADV ADV advmod Xxxx True True
hay haber AUX AUX ROOT xxx True True
algún alguno DET DET det xxxx True True
otro otro DET DET det xxxx True True
texto texto NOUN NOUN obj xxxx True False
. . PUNCT PUNCT punct . False False
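
The token attributes printed above can also be inspected without loading the transformer pipeline. As a sketch, the `Doc` below is built by hand on a blank Spanish pipeline (`spacy.blank("es")`), with lemmas, POS tags, fine-grained tags, and dependency labels copied from the output above rather than predicted. The heads are not shown in that output, so the ones used here are an assumption consistent with the dependency labels. `shape_`, `is_alpha`, and `is_stop` are lexical attributes computed from the token text (the stop list comes with spaCy's Spanish language data), so they need no trained model. The names `nlp_blank` and `doc_manual` are illustrative.

```python
import spacy
from spacy.tokens import Doc

# Blank Spanish pipeline: tokenizer and lexical attributes only, no model download
nlp_blank = spacy.blank("es")

doc_manual = Doc(
    nlp_blank.vocab,
    words=["Aquí", "hay", "algún", "otro", "texto", "."],
    spaces=[True, True, True, True, False, False],
    # Annotations copied from the pipeline output above
    lemmas=["aquí", "haber", "alguno", "otro", "texto", "."],
    pos=["ADV", "AUX", "DET", "DET", "NOUN", "PUNCT"],
    tags=["ADV", "AUX", "DET", "DET", "NOUN", "PUNCT"],
    deps=["advmod", "ROOT", "det", "det", "obj", "punct"],
    # Assumed heads, as absolute token indices; the root points to itself
    heads=[1, 1, 4, 4, 1, 1],
)

for x in doc_manual:
    print(x, x.lemma_, x.pos_, x.tag_, x.dep_, x.shape_, x.is_alpha, x.is_stop)
```

The shape strings follow spaCy's rule of mapping uppercase letters to `X`, lowercase to `x`, and digits to `d`, truncating runs of the same character after four, which is why `algún` and `texto` both become `xxxx`.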

