![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/german/date_matcher_multi_language_de.ipynb)

# DateMatcher multi-language (German)
This annotator allows you to specify a source language that will be used to identify temporal keywords and extract dates.
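To make the idea of language-specific temporal keywords concrete, here is a toy sketch in plain Python. It is not how Spark NLP implements `DateMatcher`: the `GERMAN_MONTHS` table, the `DATE_PATTERN` regex, and the `find_german_date` helper are all hypothetical names introduced for illustration. The sketch only recognizes the pattern "optional day, German month name, year".

```python
import re
from datetime import date

# Hypothetical lookup table for illustration only; not Spark NLP's internal dictionary.
GERMAN_MONTHS = {
    "Januar": 1, "Februar": 2, "März": 3, "April": 4, "Mai": 5, "Juni": 6,
    "Juli": 7, "August": 8, "September": 9, "Oktober": 10, "November": 11, "Dezember": 12,
}

# Optional day such as "11.", then a German month name, then a four-digit year.
DATE_PATTERN = re.compile(
    r"(?:(\d{1,2})\.\s+)?(" + "|".join(GERMAN_MONTHS) + r")\s+(\d{4})"
)

def find_german_date(text):
    """Return the first date found in `text`, or None. A missing day defaults to 1."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    return date(int(year), GERMAN_MONTHS[month_name], int(day) if day else 1)

print(find_german_date("Am Sonntag, 11. Juli 2021, benutzte Chiellini das Wort Kiricocho."))
print(find_german_date("Die nächste WM findet im November 2022 statt."))
```

The month names are the language-dependent part: an English-only table would find nothing in these sentences, which is why the annotator needs to be told the source language.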

In [None]:
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [None]:
# Import Spark NLP
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp

# Start Spark Session with Spark NLP
# The start() function has two parameters: gpu and spark23
# sparknlp.start(gpu=True) will start the session with GPU support
# sparknlp.start(spark23=True) is when you have Apache Spark 2.3.x installed
spark = sparknlp.start()

In [None]:
spark

In [None]:
sparknlp.version()

'4.3.1'

# German examples

### Let's import some sentences from news articles that contain dates.

In [None]:
de_articles = [
  ("Am Sonntag, 11. Juli 2021, benutzte Chiellini das Wort Kiricocho, als Saka sich dem Ball zum Elfmeter näherte.",),
  ("Die nächste WM findet im November 2022 statt.",),
]

### Let's fill a DataFrame with the text column

In [None]:
articles_cols = ["text"]

df = spark.createDataFrame(data=de_articles, schema=articles_cols)

df.printSchema()
df.show()

root
 |-- text: string (nullable = true)

+--------------------+
|                text|
+--------------------+
|Am Sonntag, 11. J...|
|Die nächste WM fi...|
+--------------------+



### Now, let's create a simple pipeline to apply the DateMatcher, specifying the source language

In [None]:
document_assembler = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("document")

date_matcher = DateMatcher() \
            .setInputCols(['document']) \
            .setOutputCol("date") \
            .setOutputFormat("MM/dd/yyyy") \
            .setSourceLanguage("de")

### Let's transform the data

In [None]:
assembled = document_assembler.transform(df)
date_matcher.transform(assembled).select('date').show(10, False)

+-------------------------------------------------+
|date                                             |
+-------------------------------------------------+
|[{date, 10, 21, 07/11/2021, {sentence -> 0}, []}]|
|[{date, 25, 37, 11/01/2022, {sentence -> 0}, []}]|
+-------------------------------------------------+
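The matched dates come back as strings in the `MM/dd/yyyy` format set on the annotator, so the month comes first. Note also that the second sentence names only a month and a year, and the result shown above carries `01` as the day. As a small sketch of how these strings could be turned into Python `date` objects once collected from the DataFrame, assuming the two values printed above:

```python
from datetime import datetime

# Values copied from the `date` column in the output above.
matched = ["07/11/2021", "11/01/2022"]

# "%m/%d/%Y" is the strptime counterpart of the "MM/dd/yyyy" output format.
parsed = [datetime.strptime(value, "%m/%d/%Y").date() for value in matched]

for d in parsed:
    print(d.isoformat())
```

Parsing with an explicit pattern avoids the day/month ambiguity of a value such as `07/11/2021`, which a reader used to German conventions might take for 7 November.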

