Legal NLP 1.12.0 (#180)
* 2023-04-16-legner_nda_remedies_en (#123)

* Add model 2023-04-16-legner_nda_remedies_en

* Update 2023-04-16-legner_nda_remedies_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-19-legner_nda_return_of_conf_info_en (#132)

* Add model 2023-04-19-legner_nda_return_of_conf_info_en

* Update 2023-04-19-legner_nda_return_of_conf_info_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* Add model 2023-04-20-legmulticlf_covid19_exceptions_italian_it (#135)

Co-authored-by: Mary-Sci <meryemyildiz366@gmail.com>

* 2023-04-21-leggen_flant5_base_en (#143)

* Add model 2023-04-21-leggen_flant5_base_en

* Update 2023-04-21-leggen_flant5_base_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

* 2023-04-24-legner_nda_req_discl_en (#146)

* Add model 2023-04-24-legner_nda_req_discl_en

* Update 2023-04-24-legner_nda_req_discl_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-25-legner_greek_legislation_el (#148)

* Add model 2023-04-25-legner_greek_legislation_el

* Update 2023-04-25-legner_greek_legislation_el.md

* Update 2023-04-25-legner_greek_legislation_el.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* Add model 2023-04-26-legmulticlf_online_terms_of_service_english_en (#153)

Co-authored-by: Mary-Sci <meryemyildiz366@gmail.com>

* 2023-04-26-legner_mapa_bg (#155)

* Add model 2023-04-26-legner_mapa_bg

* Update 2023-04-26-legner_mapa_bg.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-26-legner_mapa_da (#156)

* Add model 2023-04-26-legner_mapa_da

* Update 2023-04-26-legner_mapa_da.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_de (#159)

* Add model 2023-04-27-legner_mapa_de

* Update 2023-04-27-legner_mapa_de.md

* Add model 2023-04-27-legner_mapa_el

* Update 2023-04-27-legner_mapa_el.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_en (#160)

* Add model 2023-04-27-legner_mapa_en

* Update 2023-04-27-legner_mapa_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_es (#162)

* Add model 2023-04-27-legner_mapa_es

* Update 2023-04-27-legner_mapa_es.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_fr (#163)

* Add model 2023-04-27-legner_mapa_fr

* Update 2023-04-27-legner_mapa_fr.md

* Add model 2023-04-27-legner_mapa_it

* Update 2023-04-27-legner_mapa_it.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_lt (#166)

* Add model 2023-04-27-legner_mapa_lt

* Update 2023-04-27-legner_mapa_lt.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_nl (#167)

* Add model 2023-04-27-legner_mapa_nl

* Update 2023-04-27-legner_mapa_nl.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-27-legner_mapa_pt (#169)

* Add model 2023-04-27-legner_mapa_pt

* Update 2023-04-27-legner_mapa_pt.md

* Add model 2023-04-27-legner_mapa_ro

* Update 2023-04-27-legner_mapa_ro.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-28-legner_mapa_cs (#172)

* Add model 2023-04-28-legner_mapa_cs

* Update 2023-04-28-legner_mapa_cs.md

* Add model 2023-04-28-legner_mapa_ga

* Update 2023-04-28-legner_mapa_ga.md

* Update 2023-04-28-legner_mapa_ga.md

* Add model 2023-04-28-legner_mapa_fi

* Update 2023-04-28-legner_mapa_fi.md

* Add model 2023-04-28-legner_mapa_sk

* Update 2023-04-28-legner_mapa_sk.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-29-legpipe_alias_en (#176)

* Add model 2023-04-29-legpipe_alias_en

* Update 2023-04-29-legpipe_alias_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>

* 2023-04-29-leggen_flant5_finetuned_en (#177)

* Add model 2023-04-29-leggen_flant5_finetuned_en

* Update 2023-04-29-leggen_flant5_finetuned_en.md

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>

* Delete 2023-04-29-legpipe_alias_en.md

* 2023-04-30-legpipe_alias_en (#178)

* Add model 2023-04-30-legpipe_alias_en

* Update 2023-04-30-legpipe_alias_en.md

* Update 2023-04-30-legpipe_alias_en.md

* Update 2023-04-30-legpipe_alias_en.md

* Update 2023-04-30-legpipe_alias_en.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>
Co-authored-by: Juan Martinez <36634572+josejuanmartinez@users.noreply.github.com>

---------

Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>
Co-authored-by: Mary-Sci <meryemyildiz366@gmail.com>
Co-authored-by: gadde5300 <gadde5300@gmail.com>
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com>
7 people committed May 1, 2023
1 parent c7c06e6 commit 0a84240
Showing 25 changed files with 3,130 additions and 0 deletions.
@@ -0,0 +1,126 @@
---
layout: model
title: Legal Multilabel Classifier on Covid-19 Exceptions (Italian)
author: John Snow Labs
name: legmulticlf_covid19_exceptions_italian
date: 2023-04-20
tags: [it, licensed, legal, multilabel, classification, tensorflow]
task: Text Classification
language: it
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: MultiClassifierDLModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This multi-label text classification model assigns up to five classes to Italian legal texts on COVID-19 exception measures, supporting their analysis, discovery, and comparison. The classes are as follows:

- Closures/lockdown
- Government_oversight
- Restrictions_of_daily_liberties
- Restrictions_of_fundamental_rights_and_civil_liberties
- State_of_Emergency
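
Because the model is multi-label, a single passage may receive several of these classes at once. A minimal Python sketch of that selection step, assuming the classifier yields one independent score per label and keeps those at or above a cut-off. The scores, the `select_labels` helper, and the 0.5 cut-off are all hypothetical and are not taken from this model card:

```python
# Hypothetical illustration of multi-label selection; not the model's actual code.
LABELS = [
    "Closures/lockdown",
    "Government_oversight",
    "Restrictions_of_daily_liberties",
    "Restrictions_of_fundamental_rights_and_civil_liberties",
    "State_of_Emergency",
]


def select_labels(scores, threshold=0.5):
    """Return every label whose score reaches the threshold, in LABELS order."""
    return [label for label in LABELS if scores.get(label, 0.0) >= threshold]


# Made-up scores for a single passage (assumption, for illustration only).
example_scores = {
    "Closures/lockdown": 0.81,
    "Government_oversight": 0.07,
    "Restrictions_of_daily_liberties": 0.64,
    "Restrictions_of_fundamental_rights_and_civil_liberties": 0.22,
    "State_of_Emergency": 0.03,
}

selected = select_labels(example_scores)
print(selected)
```

Unlike a single-label classifier, nothing forces exactly one winner: raising the cut-off can leave the list empty, and lowering it can return all five labels.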

## Predicted Entities

`Closures/lockdown`, `Government_oversight`, `Restrictions_of_daily_liberties`, `Restrictions_of_fundamental_rights_and_civil_liberties`, `State_of_Emergency`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_covid19_exceptions_italian_it_1.0.0_3.0_1681985472330.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_covid19_exceptions_italian_it_1.0.0_3.0_1681985472330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp

# Assumes an active, licensed Spark session named `spark` (for example, spark = nlp.start()).
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Italian BERT token embeddings
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_bert_base_italian_xxl_cased", "it") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Average the token embeddings into one vector per document
embeddingsSentence = nlp.SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

multilabelClfModel = nlp.MultiClassifierDLModel.pretrained("legmulticlf_covid19_exceptions_italian", "it", "legal/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    embeddingsSentence,
    multilabelClfModel])

df = spark.createDataFrame([["Al di fuori di tale ultima ipotesi, secondo le raccomandazioni impartite dal Ministero della salute, occorre provvedere ad assicurare la corretta applicazione di misure preventive quali lavare frequentemente le mani con acqua e detergenti comuni."]]).toDF("text")

model = clf_pipeline.fit(df)
result = model.transform(df)

result.select("text", "class.result").show(truncate=False)
```

</div>
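
The `SentenceEmbeddings` stage in the pipeline uses the `AVERAGE` pooling strategy, which collapses the per-token vectors into a single vector for the whole document. A toy sketch of that idea in plain Python, assuming made-up three-dimensional vectors (real BERT vectors are much wider) and a hypothetical `average_pool` helper that is not part of the library:

```python
def average_pool(token_vectors):
    """Average equally sized token vectors element-wise into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same length")
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Two made-up token vectors standing in for BERT outputs.
toy_token_vectors = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

pooled = average_pool(toy_token_vectors)
print(pooled)
```

The classifier then sees one fixed-size vector per document, regardless of how many tokens the document contains.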

## Results

```bash
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+
|text |result |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+
|Al di fuori di tale ultima ipotesi, secondo le raccomandazioni impartite dal Ministero della salute, occorre provvedere ad assicurare la corretta applicazione di misure preventive quali lavare frequentemente le mani con acqua e detergenti comuni.|[Restrictions_of_daily_liberties]|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legmulticlf_covid19_exceptions_italian|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[class]|
|Language:|it|
|Size:|13.9 MB|

## References

The training dataset is available [here](https://huggingface.co/datasets/joelito/covid19_emergency_event).

## Benchmarking

```bash
label                                                   precision  recall  f1-score  support
Closures/lockdown                                       0.88       0.94    0.91      47
Government_oversight                                    1.00       0.50    0.67      4
Restrictions_of_daily_liberties                         0.88       0.79    0.83      28
Restrictions_of_fundamental_rights_and_civil_liberties  0.62       0.62    0.62      16
State_of_Emergency                                      0.67       1.00    0.80      6
micro-avg                                               0.82       0.83    0.83      101
macro-avg                                               0.81       0.77    0.77      101
weighted-avg                                            0.83       0.83    0.83      101
samples-avg                                             0.81       0.84    0.81      101
```
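
The `macro-avg` and `weighted-avg` rows can be sanity-checked from the per-label rows: the macro average is the plain mean over labels, and the weighted average weights each label by its support. A short sketch, assuming only the rounded values printed in the table, so the results agree to about two decimals rather than exactly (the helper names are illustrative):

```python
# (label, precision, recall, f1-score, support), copied from the benchmark table.
ROWS = [
    ("Closures/lockdown", 0.88, 0.94, 0.91, 47),
    ("Government_oversight", 1.00, 0.50, 0.67, 4),
    ("Restrictions_of_daily_liberties", 0.88, 0.79, 0.83, 28),
    ("Restrictions_of_fundamental_rights_and_civil_liberties", 0.62, 0.62, 0.62, 16),
    ("State_of_Emergency", 0.67, 1.00, 0.80, 6),
]
SUPPORT = 4  # index of the support column in each row


def macro_average(rows, column):
    """Unweighted mean of one metric column over all labels."""
    return sum(row[column] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by each label's support."""
    total = sum(row[SUPPORT] for row in rows)
    return sum(row[column] * row[SUPPORT] for row in rows) / total


for name, column in (("precision", 1), ("recall", 2), ("f1-score", 3)):
    macro = macro_average(ROWS, column)
    weighted = weighted_average(ROWS, column)
    print(f"{name}: macro={macro:.2f} weighted={weighted:.2f}")
```

The gap between the two averages comes from the small classes: `Government_oversight` has only 4 test examples, so its low recall pulls the macro average down much more than the weighted one.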
@@ -0,0 +1,131 @@
---
layout: model
title: Legal Multilabel Classifier on Online Terms of Service
author: John Snow Labs
name: legmulticlf_online_terms_of_service_english
date: 2023-04-26
tags: [en, licensed, multilabel, classification, legal, tensorflow]
task: Text Classification
language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: MultiClassifierDLModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This multi-label text classification model identifies potentially unfair clauses in online Terms of Service. The classes are as follows:

- Arbitration
- Choice_of_law
- Content_removal
- Jurisdiction
- Limitation_of_liability
- Other
- Unilateral_change
- Unilateral_termination

## Predicted Entities

`Arbitration`, `Choice_of_law`, `Content_removal`, `Jurisdiction`, `Limitation_of_liability`, `Other`, `Unilateral_change`, `Unilateral_termination`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_online_terms_of_service_english_en_1.0.0_3.0_1682519205970.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_online_terms_of_service_english_en_1.0.0_3.0_1682519205970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp

# Assumes an active, licensed Spark session named `spark` (for example, spark = nlp.start()).
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")

# Average the token embeddings into one vector per document
embeddingsSentence = nlp.SentenceEmbeddings() \
    .setInputCols(["document", "embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# The line continuation after pretrained(...) was missing in the original snippet
classifierdl = nlp.MultiClassifierDLModel.pretrained("legmulticlf_online_terms_of_service_english", "en", "legal/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    embeddings,
    embeddingsSentence,
    classifierdl])

df = spark.createDataFrame([["We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number."]]).toDF("text")

model = clf_pipeline.fit(df)
result = model.transform(df)

result.select("text", "class.result").show(truncate=False)
```

</div>

## Results

```bash
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
|text |result |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
|We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number.|[Limitation_of_liability]|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+
```

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legmulticlf_online_terms_of_service_english|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence_embeddings]|
|Output Labels:|[class]|
|Language:|en|
|Size:|13.9 MB|

## References

The training dataset is available [here](https://huggingface.co/datasets/joelito/online_terms_of_service).

## Benchmarking

```bash
label                    precision  recall  f1-score  support
Arbitration              1.00       0.50    0.67      4
Choice_of_law            0.67       0.67    0.67      3
Content_removal          1.00       0.67    0.80      3
Jurisdiction             0.80       1.00    0.89      4
Limitation_of_liability  0.73       0.73    0.73      15
Other                    0.86       0.89    0.88      28
Unilateral_change        0.86       1.00    0.92      6
Unilateral_termination   1.00       0.80    0.89      5
micro-avg                0.84       0.82    0.83      68
macro-avg                0.86       0.78    0.81      68
weighted-avg             0.85       0.82    0.83      68
samples-avg              0.80       0.82    0.81      68
```
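
Each per-label `f1-score` in the benchmark is the harmonic mean of that row's precision and recall, F1 = 2PR / (P + R). A quick check against the per-label rows, assuming the rounded table values (hence the small tolerance when comparing):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported f1-score), copied from the benchmark table.
ROWS = [
    ("Arbitration", 1.00, 0.50, 0.67),
    ("Choice_of_law", 0.67, 0.67, 0.67),
    ("Content_removal", 1.00, 0.67, 0.80),
    ("Jurisdiction", 0.80, 1.00, 0.89),
    ("Limitation_of_liability", 0.73, 0.73, 0.73),
    ("Other", 0.86, 0.89, 0.88),
    ("Unilateral_change", 0.86, 1.00, 0.92),
    ("Unilateral_termination", 1.00, 0.80, 0.89),
]

for label, precision, recall, reported in ROWS:
    computed = f1_score(precision, recall)
    print(f"{label}: computed={computed:.3f} reported={reported:.2f}")
```

With supports as small as 3 to 6 examples for several labels, a single misclassified clause moves these scores by many points, so the per-label figures are best read as rough indications.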
126 changes: 126 additions & 0 deletions docs/_posts/bunyamin-polat/2023-04-16-legner_nda_remedies_en.md
@@ -0,0 +1,126 @@
---
layout: model
title: Legal NER for NDA (Remedies Clauses)
author: John Snow Labs
name: legner_nda_remedies
date: 2023-04-16
tags: [en, licensed, ner, legal, nda, remedies]
task: Named Entity Recognition
language: en
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
annotator: LegalNerModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is an NER model intended to be run **only** after the `REMEDIES` clause has been detected with a suitable classifier (use `legmulticlf_mnda_sections_paragraph_other` for that purpose). It extracts the following entities: `CURRENCY`, `NUMERIC_REMEDY`, and `REMEDY_TYPE`.
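
The two-step usage (classify first, extract second) could be wired up roughly as follows. This is a sketch only: `classify_paragraph` and `extract_entities` are hypothetical stand-ins for the classifier and NER pipelines, and the stubs are keyword-based placeholders rather than real model calls.

```python
def extract_from_remedies(paragraphs, classify_paragraph, extract_entities):
    """Run entity extraction only on paragraphs labelled REMEDIES."""
    results = []
    for paragraph in paragraphs:
        if "REMEDIES" in classify_paragraph(paragraph):
            results.append((paragraph, extract_entities(paragraph)))
    return results


# Hypothetical stubs so the sketch runs without any model.
def stub_classifier(paragraph):
    return ["REMEDIES"] if "damages" in paragraph.lower() else ["OTHER"]


def stub_extractor(paragraph):
    if "liquidated damages" in paragraph:
        return [("liquidated damages", "REMEDY_TYPE")]
    return []


paragraphs = [
    "This Agreement shall be governed by the laws of the State.",
    "The breaching party shall pay liquidated damages of $ 1,000 per day.",
]

gated = extract_from_remedies(paragraphs, stub_classifier, stub_extractor)
print(gated)
```

Gating this way keeps the NER model on the kind of text it was trained for; paragraphs from other clause types are skipped rather than tagged.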

## Predicted Entities

`CURRENCY`, `NUMERIC_REMEDY`, `REMEDY_TYPE`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_nda_remedies_en_1.0.0_3.0_1681687124993.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_nda_remedies_en_1.0.0_3.0_1681687124993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from johnsnowlabs import nlp, legal

# Assumes an active, licensed Spark session named `spark` (for example, spark = nlp.start()).
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings") \
    .setMaxSentenceLength(512) \
    .setCaseSensitive(True)

ner_model = legal.NerModel.pretrained("legner_nda_remedies", "en", "legal/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained or rule-based, so fitting on an empty DataFrame is enough
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""The breaching party shall pay the non-breaching party liquidated damages of $ 1,000 per day for each day that the breach of this NDA continues."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
```

</div>

## Results

```bash
+------------------+--------------+
|chunk             |ner_label     |
+------------------+--------------+
|liquidated damages|REMEDY_TYPE   |
|$                 |CURRENCY      |
|1,000             |NUMERIC_REMEDY|
+------------------+--------------+
```
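
Chunks like these are typically obtained by merging token-level tags into spans. The sketch below shows that merging step in plain Python, assuming IOB-style tags (`B-`, `I-`, `O`). The token and tag sequence is an illustrative assumption covering part of the example sentence, and `merge_iob` is a hypothetical helper, not the `NerConverter` implementation.

```python
def merge_iob(tokens, tags):
    """Merge IOB-tagged tokens into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Close the open chunk, if any, and reset the buffer.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            current_tokens.append(token)  # continue the open chunk
        else:
            flush()  # a B- tag, or an I- tag with no matching open chunk
            current_tokens, current_label = [token], label
    flush()
    return chunks


# Assumed tokenization and tags for a fragment of the example sentence.
tokens = ["pay", "the", "non-breaching", "party", "liquidated", "damages",
          "of", "$", "1,000", "per", "day"]
tags = ["O", "O", "O", "O", "B-REMEDY_TYPE", "I-REMEDY_TYPE",
        "O", "B-CURRENCY", "B-NUMERIC_REMEDY", "O", "O"]

chunks = merge_iob(tokens, tags)
print(chunks)
```

Note that `$` and `1,000` stay separate chunks because each starts with its own `B-` tag, which matches the two distinct rows in the results table.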

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_nda_remedies|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|16.3 MB|

## References

In-house annotations on non-disclosure agreements.

## Benchmarking

```bash
label           precision  recall  f1-score  support
CURRENCY        1.00       1.00    1.00      11
NUMERIC_REMEDY  1.00       1.00    1.00      11
REMEDY_TYPE     0.86       0.94    0.90      32
micro-avg       0.91       0.96    0.94      54
macro-avg       0.95       0.98    0.97      54
weighted-avg    0.92       0.96    0.94      54
```