![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/SparkNLP_Reader2Doc_Demo.ipynb)

# Introducing Reader2Doc in Spark NLP
This notebook showcases the newly added `Reader2Doc` annotator in Spark NLP,
which provides a streamlined, user-friendly interface for reading files. It is useful for preprocessing data for NLP pipelines.

In [6]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [7]:
import sparknlp
# let's start Spark with Spark NLP
spark = sparknlp.start()

print("Apache Spark version: {}".format(spark.version))

Apache Spark version: 3.5.1


## Setup and Initialization
Let's keep in mind a few things before we start 😊

Support for **Reader2Doc** was introduced in Spark NLP 6.1.0. Please make sure you have upgraded to the latest Spark NLP release.
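One way to guard against running the notebook on an older release is to compare the installed version with 6.1.0. The sketch below is a minimal, illustrative check: `parse_version` and `meets_minimum` are hypothetical helpers (not part of Spark NLP), and they assume plain dotted version strings such as the one returned by `sparknlp.version()`.

```python
import re

MINIMUM_VERSION = "6.1.0"  # release that introduced Reader2Doc, per the note above


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '6.1.0' into a tuple of three or more ints."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so '0rc1' counts as 0.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '6.1' so that they compare as '6.1.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("6.1.0"))
print(meets_minimum("5.5.3"))
```

In the notebook itself, the check would be called as `meets_minimum(sparknlp.version())` after `import sparknlp`.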

- Let's install and set up Spark NLP in Google Colab. The setup script below takes care of this.

In [8]:
!wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

For the local-file examples, we will download sample files from the Spark NLP GitHub repository.

The output of Reader2Doc uses the same Annotation schema as other Spark NLP annotators. This means you can seamlessly integrate it into any Spark NLP pipeline or process that expects annotated data.
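To make the shared schema more concrete, here is a plain-Python sketch of the fields each annotation carries. The six field names follow the Spark NLP annotation schema (`annotatorType`, `begin`, `end`, `result`, `metadata`, `embeddings`); the class name `AnnotationSketch`, the helper `results`, and the sample values are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotationSketch:
    """Illustrative stand-in for one annotation in the output column."""
    annotatorType: str  # for example "document"
    begin: int          # index of the first character
    end: int            # index of the last character (inclusive)
    result: str         # the extracted text
    metadata: Dict[str, str] = field(default_factory=dict)
    embeddings: List[float] = field(default_factory=list)


def results(annotations: List[AnnotationSketch]) -> List[str]:
    """Collect only the text of each annotation."""
    return [annotation.result for annotation in annotations]


# Hand-written sample, not real Reader2Doc output.
sample = [
    AnnotationSketch("document", 0, 10, "Hello world"),
    AnnotationSketch("document", 0, 8, "Spark NLP"),
]
print(results(sample))
```

Because downstream annotators only rely on these fields, any stage that accepts a `document` column can consume the output of `Reader2Doc`.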

In [9]:
from sparknlp.reader.reader2doc import Reader2Doc
from pyspark.ml import Pipeline

empty_df = spark.createDataFrame([], "string").toDF("text")

In [10]:
base_url = "https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader"

In [11]:
!mkdir all-files
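Plain `mkdir` reports an error when the directory already exists, which happens whenever these cells are re-run. A minimal alternative in Python is sketched below, assuming the directory names that the download cells in this notebook write to; `make_dirs` is a hypothetical helper, and the demo runs in a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

# Sub-directories used by the download cells in this notebook.
SUBDIRS = [
    "pdf-files", "html-files", "word-files", "ppt-files", "xls-files",
    "txt-files", "xml-files", "md-files", "email-files",
]


def make_dirs(root, subdirs=SUBDIRS):
    """Create root/<subdir> for every name; safe to call more than once."""
    created = []
    for name in subdirs:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as scratch:
    created = make_dirs(Path(scratch) / "all-files")
    print(len(created), "directories ready")
```

In the notebook, `make_dirs("all-files")` would stand in for the individual `!mkdir` cells.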

## Reading PDF Documents

In [12]:
!mkdir all-files/pdf-files

**Downloading PDF files**

In [13]:
!wget "{base_url}/pdf/image_3_pages.pdf" -P all-files/pdf-files
!wget "{base_url}/pdf/pdf-title.pdf" -P all-files/pdf-files
!wget "{base_url}/pdf/text_3_pages.pdf" -P all-files/pdf-files

--2025-08-20 12:17:10--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/pdf/image_3_pages.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15629 (15K) [application/octet-stream]
Saving to: ‘all-files/pdf-files/image_3_pages.pdf’


2025-08-20 12:17:10 (10.3 MB/s) - ‘all-files/pdf-files/image_3_pages.pdf’ saved [15629/15629]

--2025-08-20 12:17:10--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/pdf/pdf-title.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response..
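If `wget` is not available, the same downloads can be scripted with the Python standard library. The sketch below is one possible approach, not part of the original notebook: `build_url` and `download` are hypothetical helpers, and the base URL is the one assigned to `base_url` earlier. Only `build_url` is exercised here, so running the block does not touch the network.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = (
    "https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/"
    "master/src/test/resources/reader"
)


def build_url(relative_path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and a path such as 'pdf/pdf-title.pdf'."""
    return f"{base_url.rstrip('/')}/{relative_path.lstrip('/')}"


def download(relative_path: str, target_dir: str) -> Path:
    """Fetch one sample file into target_dir and return the local path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    destination = target / Path(relative_path).name
    urlretrieve(build_url(relative_path), str(destination))
    return destination


print(build_url("pdf/pdf-title.pdf"))
```

A call such as `download("pdf/pdf-title.pdf", "all-files/pdf-files")` would then mirror the corresponding `wget ... -P` line.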

In [14]:
reader2doc = Reader2Doc() \
    .setContentType("application/pdf") \
    .setContentPath("./all-files/pdf-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName         |document                                                                                                                                                                                                                                                                                     |
+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|image_3_pages.pdf|[]                                                             

## Reading HTML Documents

In [15]:
!mkdir all-files/html-files

**Downloading HTML files**

In [16]:
!wget "{base_url}/html/example-10k.html" -P all-files/html-files
!wget "{base_url}/html/fake-html.html" -P all-files/html-files

--2025-08-20 12:17:18--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/html/example-10k.html
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2456707 (2.3M) [text/plain]
Saving to: ‘all-files/html-files/example-10k.html’


2025-08-20 12:17:18 (31.6 MB/s) - ‘all-files/html-files/example-10k.html’ saved [2456707/2456707]

--2025-08-20 12:17:18--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/html/fake-html.html
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200

In [17]:
reader2doc = Reader2Doc() \
    .setContentType("text/html") \
    .setContentPath("./all-files/html-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## Reading MS Office Documents

### Reading Word Files

In [18]:
!mkdir all-files/word-files

**Downloading Word files**

In [19]:
!wget "{base_url}/doc/contains-pictures.docx" -P all-files/word-files
!wget "{base_url}/doc/fake_table.docx" -P all-files/word-files
!wget "{base_url}/doc/page-breaks.docx" -P all-files/word-files

--2025-08-20 12:17:20--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/doc/contains-pictures.docx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 95087 (93K) [application/octet-stream]
Saving to: ‘all-files/word-files/contains-pictures.docx’


2025-08-20 12:17:20 (3.96 MB/s) - ‘all-files/word-files/contains-pictures.docx’ saved [95087/95087]

--2025-08-20 12:17:21--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/doc/fake_table.docx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, 

In [20]:
reader2doc = Reader2Doc() \
    .setContentType("application/msword") \
    .setContentPath("./all-files/word-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName              |

### Reading PowerPoint Files

In [21]:
!mkdir all-files/ppt-files

**Downloading PowerPoint files**

In [22]:
!wget "{base_url}/ppt/fake-power-point.pptx" -P all-files/ppt-files
!wget "{base_url}/ppt/fake-power-point-table.pptx" -P all-files/ppt-files
!wget "{base_url}/ppt/speaker-notes.pptx" -P all-files/ppt-files

--2025-08-20 12:17:23--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/ppt/fake-power-point.pptx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38412 (38K) [application/octet-stream]
Saving to: ‘all-files/ppt-files/fake-power-point.pptx’


2025-08-20 12:17:23 (3.16 MB/s) - ‘all-files/ppt-files/fake-power-point.pptx’ saved [38412/38412]

--2025-08-20 12:17:23--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/ppt/fake-power-point-table.pptx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request

In [23]:
reader2doc = Reader2Doc() \
    .setContentType("application/vnd.ms-powerpoint") \
    .setContentPath("./all-files/ppt-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

### Reading Excel Files

In [24]:
!mkdir all-files/xls-files

**Downloading Excel files**

In [25]:
!wget "{base_url}/xls/vodafone.xlsx" -P all-files/xls-files
!wget "{base_url}/xls/2023-half-year-analyses-by-segment.xlsx" -P all-files/xls-files
!wget "{base_url}/xls/page-break-example.xlsx" -P all-files/xls-files

--2025-08-20 12:17:25--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/xls/vodafone.xlsx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12541 (12K) [application/octet-stream]
Saving to: ‘all-files/xls-files/vodafone.xlsx’


2025-08-20 12:17:25 (18.8 MB/s) - ‘all-files/xls-files/vodafone.xlsx’ saved [12541/12541]

--2025-08-20 12:17:25--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/xls/2023-half-year-analyses-by-segment.xlsx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, await

In [26]:
reader2doc = Reader2Doc() \
    .setContentType("application/vnd.ms-excel") \
    .setContentPath("./all-files/xls-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## Reading Text Documents

In [27]:
!mkdir all-files/txt-files

**Downloading Text files**

In [28]:
!wget "{base_url}/txt/simple-text.txt" -P all-files/txt-files

--2025-08-20 12:17:27--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/txt/simple-text.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 300 [text/plain]
Saving to: ‘all-files/txt-files/simple-text.txt’


2025-08-20 12:17:28 (5.81 MB/s) - ‘all-files/txt-files/simple-text.txt’ saved [300/300]



In [29]:
reader2doc = Reader2Doc() \
    .setContentType("text/plain") \
    .setContentPath("./all-files/txt-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName       |document                                                                                                                                                                                                                                                                                                                             

## Reading XML Documents

In [30]:
!mkdir all-files/xml-files

**Downloading XML files**

In [31]:
!wget "{base_url}/xml/multi-level.xml" -P all-files/xml-files

--2025-08-20 12:17:28--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/xml/multi-level.xml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 538 [text/plain]
Saving to: ‘all-files/xml-files/multi-level.xml’


2025-08-20 12:17:28 (31.1 MB/s) - ‘all-files/xml-files/multi-level.xml’ saved [538/538]



In [32]:
reader2doc = Reader2Doc() \
    .setContentType("application/xml") \
    .setContentPath("./all-files/xml-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName       |document    

## Reading Markdown Documents

In [33]:
!mkdir all-files/md-files

**Downloading Markdown files**

In [34]:
!wget "{base_url}/md/simple.md" -P all-files/md-files

--2025-08-20 12:17:29--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/md/simple.md
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/plain]
Saving to: ‘all-files/md-files/simple.md’


2025-08-20 12:17:29 (3.89 MB/s) - ‘all-files/md-files/simple.md’ saved [184/184]



In [35]:
reader2doc = Reader2Doc() \
    .setContentType("text/markdown") \
    .setContentPath("./all-files/md-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName |document                                                                                                                                                                                                                

## Reading Email Documents

In [36]:
!mkdir all-files/email-files

**Downloading Email files**

In [37]:
!wget "{base_url}/email/email-text-attachments.eml" -P all-files/email-files
!wget "{base_url}/email/test-several-attachments.eml" -P all-files/email-files

--2025-08-20 12:17:30--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/email/email-text-attachments.eml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3175 (3.1K) [text/plain]
Saving to: ‘all-files/email-files/email-text-attachments.eml’


2025-08-20 12:17:30 (43.8 MB/s) - ‘all-files/email-files/email-text-attachments.eml’ saved [3175/3175]

--2025-08-20 12:17:30--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/email/test-several-attachments.eml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP

In [38]:
reader2doc = Reader2Doc() \
    .setContentType("message/rfc822") \
    .setContentPath("./all-files/email-files") \
    .setOutputCol("document")

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## Reading Mixed Documents

We can also point `Reader2Doc` at a directory that contains several different file types. Note that the cell below does not call `setContentType`.

In [39]:
reader2doc = Reader2Doc() \
    .setContentPath("./all-files") \
    .setOutputCol("document") \
    .setExplodeDocs(False)

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+---------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
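As a reference for what the mixed directory contains, the sketch below groups file names by the content type that was passed to `setContentType` for each format earlier in this notebook. It only summarizes those pairings; it is not how `Reader2Doc` detects file types internally, and `group_by_content_type` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path

# Extension -> content type, as used in the per-format cells above.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".docx": "application/msword",
    ".pptx": "application/vnd.ms-powerpoint",
    ".xlsx": "application/vnd.ms-excel",
    ".txt": "text/plain",
    ".xml": "application/xml",
    ".md": "text/markdown",
    ".eml": "message/rfc822",
}


def group_by_content_type(file_names):
    """Group file names under the content type matching their extension."""
    groups = defaultdict(list)
    for name in file_names:
        extension = Path(name).suffix.lower()
        groups[CONTENT_TYPES.get(extension, "unknown")].append(name)
    return dict(groups)


files = ["pdf-title.pdf", "fake-html.html", "simple.md", "vodafone.xlsx"]
for content_type, names in group_by_content_type(files).items():
    print(content_type, names)
```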

## Parameters

We can output one document per row by setting `explodeDocs` to `False`.

In [40]:
reader2doc = Reader2Doc() \
    .setContentType("message/rfc822") \
    .setContentPath("./all-files/email-files") \
    .setOutputCol("document") \
    .setExplodeDocs(False)

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We can output plain text with minimal metadata by setting `flattenOutput` to `True`.

In [41]:
reader2doc = Reader2Doc() \
    .setContentType("text/html") \
    .setContentPath("./all-files/html-files") \
    .setOutputCol("document") \
    .setFlattenOutput(True)

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

We can output the data as a single document by setting `outputAsDocument` to `True`.

In [42]:
reader2doc = Reader2Doc() \
    .setContentType("text/html") \
    .setContentPath("./all-files/html-files") \
    .setOutputCol("document") \
    .setOutputAsDocument(True)

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we want to exclude non-text data, we can set the `excludeNonText` parameter to `True`. This removes data that comes from tables and images.

In [43]:
reader2doc = Reader2Doc() \
    .setContentType("text/html") \
    .setContentPath("./all-files/html-files/fake-html.html") \
    .setOutputCol("document") \
    .setExcludeNonText(True)

pipeline = Pipeline(stages=[reader2doc])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)
result_df.show(truncate=False)

+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|fileName      |document                                       
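Conceptually, excluding non-text data amounts to dropping the elements that come from tables and images and keeping everything else. The sketch below illustrates that idea in plain Python on hand-written data. It is not Spark NLP's implementation: the `elementType` key, its values, and the helper `keep_text_only` are all assumptions made for illustration.

```python
# Assumed labels for non-text elements; hypothetical, for illustration only.
NON_TEXT_TYPES = {"Table", "Image"}


def keep_text_only(elements, non_text_types=NON_TEXT_TYPES):
    """Return only the elements whose type is not a table or an image."""
    return [
        element for element in elements
        if element.get("elementType") not in non_text_types
    ]


# Hand-written sample elements, not real Reader2Doc output.
elements = [
    {"elementType": "Title", "content": "Quarterly report"},
    {"elementType": "Table", "content": "Q1 10 Q2 12"},
    {"elementType": "NarrativeText", "content": "Revenue grew in Q2."},
    {"elementType": "Image", "content": "<binary>"},
]

for element in keep_text_only(elements):
    print(element["elementType"], "->", element["content"])
```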

## Pipeline Integration

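We can integrate `Reader2Doc` with other annotators in a pipeline. For example, with a simple `RegexTokenizer`, reusing the `reader2doc` stage defined in the previous cell: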

In [44]:
# Import only the annotator this cell needs instead of using wildcard imports
from sparknlp.annotator import RegexTokenizer

empty_df = spark.createDataFrame([], "string").toDF("text")

regex_tok = RegexTokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("regex_token")

pipeline = Pipeline(stages=[reader2doc, regex_tok])
model = pipeline.fit(empty_df)

result_df = model.transform(empty_df)

In [45]:
result_df.show(truncate=False)

+--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------
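To make the `regex_token` column more concrete, the sketch below mimics in plain Python what splitting a document on a whitespace pattern produces. It is an illustration rather than Spark NLP's implementation: `\s+` is assumed as the split pattern, the helper `whitespace_tokens` and the dictionary layout are hypothetical, and the `end` offsets are inclusive.

```python
import re


def whitespace_tokens(text, pattern=r"\s+"):
    """Split text on the pattern and record each token's character offsets."""
    tokens = []
    position = 0
    for separator in re.finditer(pattern, text):
        if separator.start() > position:
            tokens.append({
                "result": text[position:separator.start()],
                "begin": position,
                "end": separator.start() - 1,
            })
        position = separator.end()
    # Whatever follows the last separator is the final token.
    if position < len(text):
        tokens.append({
            "result": text[position:],
            "begin": position,
            "end": len(text) - 1,
        })
    return tokens


for token in whitespace_tokens("Hello  Spark NLP"):
    print(token)
```

Each dictionary plays the role of one token annotation: the text in `result` plus the span it covers in the source document.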