![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/annotation/text/portuguese/MultiDateMatcherMultiLanguage_pt.ipynb)

# MultiDateMatcher in Portuguese

```python
# Only run this cell when you are using Spark NLP on Google Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
```

```python
from pyspark.sql.types import StringType

import sparknlp
from sparknlp.annotator import *
from sparknlp.base import *

# Start a Spark session with Spark NLP and print the library versions
spark = sparknlp.start()
print(sparknlp.version())
print(spark.version)
```

```
4.3.1
3.3.0
```


## Matching formatted dates in Portuguese

```python
# "We met on 13/05/2018 and then on 18/05/2020." (dates written as dd/MM/yyyy)
df = spark.createDataFrame(
    ["Encontramo-nos no dia 13/05/2018 e depois no dia 18/05/2020."],
    StringType()).toDF("text")
df.show()
```

```
+--------------------+
|                text|
+--------------------+
|Encontramo-nos no...|
+--------------------+
```



```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Find every date in the Portuguese text and normalize it to MM/dd/yyyy
date_matcher = MultiDateMatcher() \
    .setInputCols(["document"]) \
    .setOutputCol("date") \
    .setOutputFormat("MM/dd/yyyy") \
    .setSourceLanguage("pt")

assembled = document_assembler.transform(df)
date_matcher.transform(assembled).select("date").show(10, False)
```

```
+--------------------------------------------------------------------------------------------------+
|date                                                                                              |
+--------------------------------------------------------------------------------------------------+
|[{date, 23, 32, 05/13/2018, {sentence -> 0}, []}, {date, 51, 60, 05/18/2020, {sentence -> 0}, []}]|
+--------------------------------------------------------------------------------------------------+
```
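To make the annotation fields more concrete, here is a plain-Python sketch of what happens to formatted dates: find `dd/MM/yyyy` strings and rewrite them as `MM/dd/yyyy`. It assumes dates always use a two-digit day, a two-digit month and a four-digit year. `match_formatted_dates` is a hypothetical helper, not part of Spark NLP, and MultiDateMatcher handles many more formats than this. The `(begin, end, result)` tuples loosely mirror the second, third and fourth fields of each annotation above. The offsets, however, are plain 0-based Python indices and need not equal the begin/end values MultiDateMatcher reports.

```python
import re
from datetime import datetime

# Hypothetical pattern: two-digit day, two-digit month, four-digit year (dd/MM/yyyy).
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def match_formatted_dates(text, output_format="%m/%d/%Y"):
    """Return (begin, end, normalized_date) for each dd/MM/yyyy date in text.

    begin/end are 0-based character indices with an inclusive end, and
    output_format "%m/%d/%Y" plays the role of setOutputFormat("MM/dd/yyyy").
    """
    matches = []
    for m in DATE_PATTERN.finditer(text):
        try:
            parsed = datetime.strptime(m.group(0), "%d/%m/%Y")
        except ValueError:
            continue  # skip impossible dates such as 31/02/2020
        matches.append((m.start(), m.end() - 1, parsed.strftime(output_format)))
    return matches


text = "Encontramo-nos no dia 13/05/2018 e depois no dia 18/05/2020."
print(match_formatted_dates(text))
# → [(22, 31, '05/13/2018'), (49, 58, '05/18/2020')]
```

Parsing with `strptime` before reformatting, rather than just swapping the two digit groups, means strings that look like dates but are not valid are dropped instead of being passed through.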



## Matching relative (unformatted) dates in Portuguese

```python
# "We met 5 days ago and he told me he would visit us next week."
df = spark.createDataFrame(
    ["Nós nos conhecemos há 5 dias e ele me disse que nos visitaria na próxima semana."],
    StringType()).toDF("text")
df.show()
```

```
+--------------------+
|                text|
+--------------------+
|Nós nos conhecemo...|
+--------------------+
```



```python
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Relative expressions are resolved against the date the cell is run,
# so the dates in the output change from run to run
date_matcher = MultiDateMatcher() \
    .setInputCols(["document"]) \
    .setOutputCol("date") \
    .setOutputFormat("MM/dd/yyyy") \
    .setSourceLanguage("pt")

assembled = document_assembler.transform(df)
date_matcher.transform(assembled).select("date").show(10, False)
```

```
+--------------------------------------------------------------------------------------------------+
|date                                                                                              |
+--------------------------------------------------------------------------------------------------+
|[{date, 19, 28, 02/15/2023, {sentence -> 0}, []}, {date, 66, 74, 02/27/2023, {sentence -> 0}, []}]|
+--------------------------------------------------------------------------------------------------+
```
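Relative expressions such as "há 5 dias" ("5 days ago") and "na próxima semana" ("next week") carry no date of their own, so they have to be resolved against a reference date. The results above are consistent with a reference date of 2023-02-20: five days earlier is 02/15/2023 and seven days later is 02/27/2023. That reference date is inferred from the output and is not stated in the notebook. The sketch below shows the same arithmetic in plain Python for just these two expressions. `resolve_relative_dates` and its rules are hypothetical and much narrower than Spark NLP's own rule set.

```python
import re
from datetime import date, timedelta

# Hypothetical rules covering only the two expressions in the example sentence;
# each pattern maps a match to an offset in days from the anchor date.
RELATIVE_RULES = [
    (re.compile(r"há (\d+) dias"), lambda m: -int(m.group(1))),  # "N days ago"
    (re.compile(r"próxima semana"), lambda m: 7),                # "next week"
]


def resolve_relative_dates(text, anchor):
    """Resolve the relative expressions in text against the anchor date (MM/dd/yyyy)."""
    found = []
    for pattern, offset_in_days in RELATIVE_RULES:
        for m in pattern.finditer(text):
            resolved = anchor + timedelta(days=offset_in_days(m))
            found.append((m.start(), resolved.strftime("%m/%d/%Y")))
    # Sort by position so the dates come out in the order they appear in the text
    return [resolved for _, resolved in sorted(found)]


text = "Nós nos conhecemos há 5 dias e ele me disse que nos visitaria na próxima semana."
print(resolve_relative_dates(text, anchor=date(2023, 2, 20)))
# → ['02/15/2023', '02/27/2023']
```

Passing the anchor explicitly, instead of reading today's date inside the function, keeps the result reproducible. This is also why rerunning the Spark NLP cell on another day gives different dates.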



# A short guide to extending language support

## Adding support for a new language

To extend the date matchers to a new language, follow these steps:

1. Add the new language's dictionary to the `src/main/resources/date-matcher/translation-dictionaries/dynamic` folder of the spark-nlp project.
2. Make sure the dictionary defines the same base entries as the dictionaries of the other languages.
   * Add tests for the dictionary.
3. Add any other language-specific expressions to the base.
   * Add tests for these expressions to avoid syntactic conflicts during parsing.
4. Add a notebook like this one showing how to use the new language.
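To illustrate the testing steps above, here is a minimal sketch of checks a dictionary test might perform. It verifies that every base entry has at least one translation and that no expression maps to two different entries, since such an expression would make parsing ambiguous. The dictionary layout, the keys in `BASE_ENTRIES` and `PT_DICTIONARY`, and both helper functions are hypothetical. The real files under `translation-dictionaries/dynamic` and the project's actual tests may be structured quite differently.

```python
# Hypothetical layout: the real dictionary files may use a different format and keys.
BASE_ENTRIES = {"yesterday", "today", "tomorrow", "next week", "days ago"}

PT_DICTIONARY = {
    "yesterday": ["ontem"],
    "today": ["hoje"],
    "tomorrow": ["amanhã"],
    "next week": ["próxima semana", "semana que vem"],
    "days ago": ["há {n} dias", "{n} dias atrás"],
}


def missing_entries(base_entries, dictionary):
    """Return base entries that the dictionary leaves undefined or empty."""
    return sorted(key for key in base_entries if not dictionary.get(key))


def ambiguous_expressions(dictionary):
    """Return expressions listed under more than one base entry (a likely parsing conflict)."""
    owners = {}
    for key, expressions in dictionary.items():
        for expression in expressions:
            owners.setdefault(expression, set()).add(key)
    return sorted(expr for expr, keys in owners.items() if len(keys) > 1)


print(missing_entries(BASE_ENTRIES, PT_DICTIONARY))  # → []
print(ambiguous_expressions(PT_DICTIONARY))          # → []
```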

Thank you for contributing! :)