# Converting SpaCy Doc to CluDocument

To convert a SpaCy ```Doc``` object to a py-processors ```Document``` object (sometimes referred to as a CluDocument), use the ```.to_clu_doc()``` method of the ```ConverterUtils``` class. The method accepts one parameter, a ```Doc``` object annotated with at least the minimal pipeline described in the Pipeline Requirements section, and returns a ```Document``` (CluDocument) object.

### Working Example

First, you need a SpaCy ```Doc``` object annotated with a full pipeline. We use ```en_core_web_sm``` here, but feel free to use any other pipeline (in any language).

In [None]:
import spacy

PIPELINE = 'en_core_web_sm'
nlp = spacy.load(PIPELINE)

text = "The huskies howled all night."
doc = nlp(text)

Now we have a SpaCy ```Doc``` object, annotated by the ```en_core_web_sm``` pipeline and stored as ```doc```, containing the text "The huskies howled all night." This is the same text used to create the example CluDoc in the ```docs/examples``` directory of the ```processors-spacy``` repository; see that example for the expected output.
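
Before converting, it can help to look at what the pipeline actually produced. The sketch below collects a few standard SpaCy token attributes into rows. The helper name ```annotation_rows``` is hypothetical and not part of ```processors-spacy```, and the attributes shown are chosen for illustration only; which annotations the converter needs is defined in the Pipeline Requirements section, not by this list.

```python
def annotation_rows(doc):
    """Collect (index, text, lemma, POS, tag, dependency, head index) per token.

    `doc` is expected to be a SpaCy Doc, but any iterable of objects that
    expose the same token attributes will work.
    """
    rows = []
    for token in doc:
        rows.append(
            (
                token.i,        # position of the token in the document
                token.text,     # verbatim token text
                token.lemma_,   # lemma assigned by the pipeline
                token.pos_,     # coarse-grained part of speech
                token.tag_,     # fine-grained part-of-speech tag
                token.dep_,     # dependency relation to the head
                token.head.i,   # position of the syntactic head
            )
        )
    return rows
```

With the ```doc``` created above, ```for row in annotation_rows(doc): print(row)``` prints one tuple per token. Empty strings in a column suggest that the corresponding pipeline component did not run.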

Next, we must convert the ```Doc``` object to a ```Document``` object.

In [None]:
# Import the converter!
from processorspacy.utils import ConverterUtils as converter

cludoc = converter.to_clu_doc(doc)

Now we have our py-processors ```Document``` object stored as ```cludoc```. We can then serialize this object to JSON and write it to a .json file with the following code.

In [None]:
file_name = "cludoc.json"  # Replace with the desired path/filename

with open(file_name, 'w') as f:
    f.write(cludoc.to_JSON())

We now have a file ```cludoc.json``` containing the JSON-serialized ```Document``` object that was converted from the original ```Doc``` instance. For more information on serializing ```Document``` objects, see [the py-processors documentation](https://py-processors.readthedocs.io/en/latest/example.html#serializing-to-from-json).
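
As a quick sanity check, the written file can be parsed back with Python's standard ```json``` module. This is a minimal sketch: the helper name ```top_level_keys``` is hypothetical, and it assumes the serialized ```Document``` is a JSON object at the top level. It makes no assumption about which keys are present.

```python
import json
from pathlib import Path


def top_level_keys(json_path):
    """Parse a JSON file and return its top-level keys, sorted.

    Raises json.JSONDecodeError if the file is not valid JSON, and
    ValueError if the top-level value is not a JSON object (an assumption
    about the serialized Document, not something this tutorial states).
    """
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)


if __name__ == "__main__":
    path = Path("cludoc.json")  # the file written in the previous cell
    if path.exists():
        print(top_level_keys(path))
```

If the call completes without raising, the file is at least well-formed JSON; comparing the printed keys against the example in ```docs/examples``` is a reasonable next step.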

# Converting CluDocument to SpaCy Doc

To convert a py-processors ```Document``` object to a SpaCy ```Doc``` object, use the ```.to_spacy_doc()``` method of the ```ConverterUtils``` class. The method accepts two parameters: the path to a .json file containing a JSON-serialized ```Document``` object, and the name of a SpaCy pipeline (this pipeline must be preinstalled/trained). It returns a SpaCy ```Doc``` object. Future versions of ```processors-spacy``` may support passing ```Document``` objects directly.

### Working Example

For this example we will use the example CluDoc found at ```docs/examples/CluDoc1.json``` and the pipeline ```en_core_web_sm```.

In [None]:
# Import the converter!
from processorspacy.utils import ConverterUtils as converter

# Adjust this path to wherever the example file lives in your environment
file_path = "app/docs/examples/cluDoc1.json"
PIPELINE = "en_core_web_sm"

spacydoc = converter.to_spacy_doc(file_path, PIPELINE)

If you wish to use an annotated text other than the example provided, you can follow the directions in [the py-processors documentation](https://py-processors.readthedocs.io/en/latest/example.html#running-the-nlp-server) to create an annotated ```Document``` object. Once it is created, follow the instructions in the previous section to serialize it to JSON.
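
Because ```.to_spacy_doc()``` takes a file path rather than an in-memory object, a ```Document``` that only exists in memory has to be written to disk first. The sketch below shows one way to bridge that gap with a temporary file. The helper name ```write_json_to_temp_file``` is hypothetical and not part of ```processors-spacy```; only the standard library is used.

```python
import os
import tempfile


def write_json_to_temp_file(json_text, suffix=".json"):
    """Write a JSON string to a new temporary file and return its path.

    The caller is responsible for deleting the file when finished.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(json_text)
    return path


# Hypothetical usage, assuming `cludoc` is a py-processors Document and
# `converter` is the ConverterUtils import shown above:
#
#   path = write_json_to_temp_file(cludoc.to_JSON())
#   try:
#       spacydoc = converter.to_spacy_doc(path, "en_core_web_sm")
#   finally:
#       os.remove(path)
```

Removing the file in a ```finally``` block keeps temporary files from accumulating if the conversion raises.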