

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/DATE_MATCHER.ipynb)




# **Spark NLP Date Matcher**

### Spark NLP documentation and instructions:
https://nlp.johnsnowlabs.com/docs/en/quickstart

### You can find details about Spark NLP annotators here:
https://nlp.johnsnowlabs.com/docs/en/annotators

### You can find details about Spark NLP models here:
https://nlp.johnsnowlabs.com/models


## 1. Colab Setup

In [None]:
# Install java
!apt-get update -qq
!apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
!java -version

# Install pyspark
!pip install --ignore-installed -q pyspark==2.4.4

# Install Spark NLP
!pip install --ignore-installed spark-nlp

openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)
  Building wheel for pyspark (setup.py) ... done
Collecting spark-nlp
  Downloading https://files.pythonhosted.org/packages/f1/de/6db7be666e7c8b70d39bcb0d956d3d983bcb06ea3895308c84d7a131dfb1/spark_nlp-2.6.2-py2.py3-none-any.whl (128kB)
Installing collected packages: spark-nlp
Successfully installed spark-nlp-2.6.2


## 2. Start the Spark session

Import the dependencies and start the Spark session.

In [None]:
import os
import json
os.environ['JAVA_HOME'] = "/usr/lib/jvm/java-8-openjdk-amd64"

import pandas as pd
import numpy as np

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

## 3. Build Pipeline

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector()\
    .setInputCols("document")\
    .setOutputCol("sentence")

date_matcher = DateMatcher() \
    .setInputCols('sentence')\
    .setOutputCol("date") \
    .setDateFormat("yyyy/MM/dd")

pipeline1 = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    date_matcher,
])

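# None of these stages needs training data, so fitting on an empty DataFrame is enough
empty_df = spark.createDataFrame([['']]).toDF("text")

date_pp = pipeline1.fit(empty_df)
# LightPipeline runs the fitted pipeline directly on Python strings
date_model = LightPipeline(date_pp)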
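
The `setDateFormat("yyyy/MM/dd")` call above takes a Java-style date pattern. As a rough illustration only, written in plain Python rather than through Spark NLP, the same layout would correspond to the `strftime` pattern `%Y/%m/%d`. The names `PY_FORMAT` and `format_like_pipeline` are hypothetical and exist only for this sketch.

```python
from datetime import date

# Assumption: the Java-style pattern "yyyy/MM/dd" corresponds to "%Y/%m/%d"
# in Python's strftime (4-digit year, zero-padded month and day).
PY_FORMAT = "%Y/%m/%d"


def format_like_pipeline(d: date) -> str:
    """Hypothetical helper: render a date in the layout configured above."""
    return d.strftime(PY_FORMAT)


print(format_like_pipeline(date(2020, 10, 23)))  # 2020/10/23
```

Changing the pattern passed to `setDateFormat` should change the layout of the `result` field of each date annotation in the same way.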

## 4. Run & Visualize

In [None]:
input_list = [
    """David visited the restaurant yesterday with his family. 
He also visited and the day before, but at that time he was alone.
David again visited today with his colleagues.
He and his friends really liked the food and hoped to visit again tomorrow.""",]

In [None]:

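# fullAnnotate returns one result per input text; take the first (and only) one
tres = date_model.fullAnnotate(input_list)[0]
for dte in tres['date']:
    # Look up the sentence that this date annotation belongs to
    sent = tres['sentence'][int(dte.metadata['sentence'])]
    # Annotation end offsets are inclusive, hence the +1 in the slice
    print(f'text/chunk {sent.result[dte.begin:dte.end+1]} | mapped_date: {dte.result}')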

text/chunk yesterday | mapped_date: 2020/10/22
text/chunk day before | mapped_date: 2020/10/22
text/chunk today | mapped_date: 2020/10/23
text/chunk tomorrow | mapped_date: 2020/10/24
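
The mapped dates above are relative to the day the notebook was run (2020/10/23 in this output). The arithmetic can be sketched in plain Python as below. This is not how `DateMatcher` is implemented; the `OFFSETS` table and the `resolve` function are hypothetical, and "day before" is given an offset of minus one day only because that is what the output above shows.

```python
from datetime import date, timedelta

# Hypothetical offset table (in days), chosen to mirror the output printed above.
OFFSETS = {"yesterday": -1, "day before": -1, "today": 0, "tomorrow": 1}


def resolve(phrase: str, reference: date) -> str:
    """Shift the reference date by the phrase's offset and format it as yyyy/MM/dd."""
    shifted = reference + timedelta(days=OFFSETS[phrase])
    return shifted.strftime("%Y/%m/%d")


# Assumption: the notebook was run on 2020-10-23, as the "today" row suggests.
reference = date(2020, 10, 23)
for phrase in OFFSETS:
    print(f"text/chunk {phrase} | mapped_date: {resolve(phrase, reference)}")
```

Rerunning the notebook on a different day would shift every mapped date accordingly, since the relative expressions are resolved against the current date.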
