# DateMatcher multi-language

#### This annotator allows you to specify a source language that will be used to identify temporal keywords and extract dates.

In [1]:
# This is only needed to set up PySpark and Spark NLP on Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

--2022-12-23 12:30:40--  http://setup.johnsnowlabs.com/colab.sh
Resolving setup.johnsnowlabs.com (setup.johnsnowlabs.com)... 51.158.130.125
Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://setup.johnsnowlabs.com/colab.sh [following]
--2022-12-23 12:30:40--  https://setup.johnsnowlabs.com/colab.sh
Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/scripts/colab_setup.sh [following]
--2022-12-23 12:30:41--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/scripts/colab_setup.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:44

In [2]:
# Import Spark NLP
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp

# Start Spark Session with Spark NLP
# The start() function accepts optional parameters such as gpu
# sparknlp.start(gpu=True) starts the session with GPU support
# sparknlp.start(spark23=True) targets Apache Spark 2.3.x installs and may not be available in recent releases
spark = sparknlp.start()

In [3]:
spark

In [4]:
sparknlp.version()

'4.2.6'

# Spanish examples

### Let's import some sentences from news articles that contain absolute and relative dates.

In [5]:
es_articles = [
  ("Italia este domingo 11 de julio de 2021 es, por tanto, bicampeona de Europa.",),
  ("Italia sucede a Portugal, ganador del torneo hace 5 años, como campeón europeo de fútbol tras vencer a Inglaterra en Wembley.",),
]

### Let's create a DataFrame with a single text column

In [6]:
articles_cols = ["text"]

df = spark.createDataFrame(data=es_articles, schema=articles_cols)

df.printSchema()
df.show()

root
 |-- text: string (nullable = true)

+--------------------+
|                text|
+--------------------+
|Italia este domin...|
|Italia sucede a P...|
+--------------------+



### Now, let's create a simple pipeline to apply the DateMatcher, specifying the source language

In [8]:
document_assembler = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("document")

date_matcher = DateMatcher() \
            .setInputCols(['document']) \
            .setOutputCol("date") \
            .setOutputFormat("MM/dd/yyyy") \
            .setSourceLanguage("es")
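
The `setOutputFormat("MM/dd/yyyy")` call above takes a Java-style date pattern. As a rough illustration of what that pattern means, here is a minimal plain-Python sketch that formats 11 July 2021 with two patterns. The helper `format_with_java_pattern` and its token mapping are hypothetical, cover only the three pattern tokens shown, and are not part of Spark NLP.

```python
from datetime import date

# Assumed mapping from Java-style pattern tokens to Python strftime directives.
# Only the three tokens used in this notebook are covered.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def format_with_java_pattern(d: date, pattern: str) -> str:
    """Format a date using a Java-style pattern (hypothetical helper)."""
    for java_token, py_token in JAVA_TO_STRFTIME.items():
        pattern = pattern.replace(java_token, py_token)
    return d.strftime(pattern)


sample = date(2021, 7, 11)
print(format_with_java_pattern(sample, "MM/dd/yyyy"))  # 07/11/2021
print(format_with_java_pattern(sample, "yyyy/MM/dd"))  # 2021/07/11
```

Changing the pattern only changes how the extracted date is rendered, not which date is matched.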

In [9]:
### Let's transform the data

In [10]:
assembled = document_assembler.transform(df)
date_matcher.transform(assembled).select('date').show(10, False)

+-------------------------------------------------+
|date                                             |
+-------------------------------------------------+
|[{date, 19, 36, 07/11/2021, {sentence -> 0}, []}]|
|[{date, 45, 55, 12/23/2017, {sentence -> 0}, []}]|
+-------------------------------------------------+
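
The second row contains no explicit date: "hace 5 años" (5 years ago) appears to be resolved relative to the day the notebook was run. The setup log timestamps show 2022-12-23, which is consistent with the `12/23/2017` result. The following plain-Python sketch reproduces that arithmetic under this assumption. The helper `years_ago` is hypothetical and is not how DateMatcher is implemented.

```python
from datetime import date


def years_ago(reference: date, years: int) -> date:
    """Return the date `years` years before `reference` (hypothetical helper)."""
    try:
        return reference.replace(year=reference.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return reference.replace(year=reference.year - years, day=28)


run_date = date(2022, 12, 23)  # assumed run date, taken from the setup log timestamps
resolved = years_ago(run_date, 5)
print(resolved.strftime("%m/%d/%Y"))  # 12/23/2017
```

Because the reference point is the run date, re-running the notebook on another day would be expected to yield a different value for this row, while the first row (an absolute date) stays the same.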

