* Add model 2023-04-27-legner_mapa_de
* Update 2023-04-27-legner_mapa_de.md
* Add model 2023-04-27-legner_mapa_el
* Update 2023-04-27-legner_mapa_el.md

---------

Co-authored-by: bunyamin-polat <muhendisbp@gmail.com>
Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com>
1 parent ece0d26 commit e082cf5
Showing 2 changed files with 264 additions and 0 deletions.
132 changes: 132 additions & 0 deletions
docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_de.md
@@ -0,0 +1,132 @@
---
layout: model
title: Legal NER for MAPA (Multilingual Anonymisation for Public Administrations)
author: John Snow Labs
name: legner_mapa
date: 2023-04-27
tags: [de, ner, legal, licensed, mapa]
task: Named Entity Recognition
language: de
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
annotator: LegalNerModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union.

This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `German` documents.

## Predicted Entities

`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_de_1.0.0_3.0_1682589773968.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_de_1.0.0_3.0_1682589773968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from johnsnowlabs import nlp, legal

# Start a Spark session (assumes the johnsnowlabs package and a valid license are set up)
spark = nlp.start()

# Turn the raw "text" column into document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual ("xx") detector
sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# German BERT embeddings feeding the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_de_cased", "de")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

# Token-level entity tagger
ner_model = legal.NerModel.pretrained("legner_mapa", "de", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""Herr Liberato und Frau Grigorescu heirateten am 22 Oktober 2005 in Rom (Italien) und lebten in diesem Mitgliedstaat bis zur Geburt ihres Kindes am 20 Februar 2006 zusammen."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
```

</div>

## Results

```bash
+----------------+---------+
|chunk           |ner_label|
+----------------+---------+
|Herr Liberato   |PERSON   |
|Frau Grigorescu |PERSON   |
|22 Oktober 2005 |DATE     |
|Rom (Italien)   |ADDRESS  |
|20 Februar 2006 |DATE     |
+----------------+---------+
```

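The results table pairs each detected chunk with its label. A minimal sketch of how such pairs could be read from collected `ner_chunk` annotations follows. It assumes each annotation exposes the chunk text as `result` and the label under the `entity` key of `metadata`; the `Annotation` tuple is a hypothetical stand-in for Spark NLP's annotation rows, so the example runs without a Spark session.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical stand-in for a Spark NLP annotation row (assumed fields)."""
    result: str
    metadata: Dict[str, str]


def chunk_label_pairs(annotations: Iterable[Annotation]) -> List[Tuple[str, str]]:
    """Return (chunk, ner_label) pairs in the order the chunks appear."""
    return [(a.result, a.metadata.get("entity", "")) for a in annotations]


# Sample values copied from the results table
sample = [
    Annotation("Herr Liberato", {"entity": "PERSON"}),
    Annotation("22 Oktober 2005", {"entity": "DATE"}),
    Annotation("Rom (Italien)", {"entity": "ADDRESS"}),
]

pairs = chunk_label_pairs(sample)
for chunk, label in pairs:
    print(f"{chunk:<16}{label}")
```

With a live pipeline, the same function could presumably be applied to the list returned by `result.select("ner_chunk").first()["ner_chunk"]`, provided its rows carry these two fields.
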
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_mapa|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|de|
|Size:|1.4 MB|

## References

The dataset is available [here](https://huggingface.co/datasets/joelito/mapa).

## Benchmarking

```bash
label         precision  recall  f1-score  support
ADDRESS       0.69       0.85    0.76      13
AMOUNT        1.00       0.75    0.86      4
DATE          0.92       0.93    0.93      61
ORGANISATION  0.64       0.77    0.70      30
PERSON        0.85       0.87    0.86      46
micro-avg     0.82       0.87    0.84      154
macro-avg     0.82       0.83    0.82      154
weighted-avg  0.83       0.87    0.85      154
```
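
The last two average rows of the report (0.82 / 0.83 / 0.82 and 0.83 / 0.87 / 0.85) can be recomputed from the per-label rows. The sketch below assumes the usual definitions: an unweighted mean over labels for the macro average and a support-weighted mean for the weighted average. The helper names are illustrative and not part of any library.

```python
# (label, precision, recall, f1, support) rows copied from the report
rows = [
    ("ADDRESS", 0.69, 0.85, 0.76, 13),
    ("AMOUNT", 1.00, 0.75, 0.86, 4),
    ("DATE", 0.92, 0.93, 0.93, 61),
    ("ORGANISATION", 0.64, 0.77, 0.70, 30),
    ("PERSON", 0.85, 0.87, 0.86, 46),
]


def macro_avg(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [r[4] for r in rows]
# Columns 1..3 hold precision, recall and f1
macro = [macro_avg([r[i] for r in rows]) for i in (1, 2, 3)]
weighted = [weighted_avg([r[i] for r in rows], supports) for i in (1, 2, 3)]

print("macro-avg   ", [round(v, 2) for v in macro])
print("weighted-avg", [round(v, 2) for v in weighted])
```

Because the per-label inputs are already rounded to two decimals, recomputed averages can differ from a report's printed values in the last digit.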
132 changes: 132 additions & 0 deletions
docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_el.md
@@ -0,0 +1,132 @@
---
layout: model
title: Legal NER for MAPA (Multilingual Anonymisation for Public Administrations)
author: John Snow Labs
name: legner_mapa
date: 2023-04-27
tags: [el, ner, legal, mapa, licensed]
task: Named Entity Recognition
language: el
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
annotator: LegalNerModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union.

This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Greek` documents.

## Predicted Entities

`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_el_1.0.0_3.0_1682590655353.zip){:.button.button-orange}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_el_1.0.0_3.0_1682590655353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from johnsnowlabs import nlp, legal

# Start a Spark session (assumes the johnsnowlabs package and a valid license are set up)
spark = nlp.start()

# Turn the raw "text" column into document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual ("xx") detector
sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Greek BERT embeddings feeding the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_el_cased", "el")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

# Token-level entity tagger
ner_model = legal.NerModel.pretrained("legner_mapa", "el", "legal/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ["""86 Στην υπόθεση της κύριας δίκης, προκύπτει ότι ορισμένοι εργαζόμενοι της Martin‑Meat αποσπάσθηκαν στην Αυστρία κατά την περίοδο μεταξύ του έτους 2007 και του έτους 2012, για την εκτέλεση εργασιών τεμαχισμού κρέατος σε εγκαταστάσεις της Alpenrind."""]

result = model.transform(spark.createDataFrame([text]).toDF("text"))
```

</div>

## Results

```bash
+-----------+------------+
|chunk      |ner_label   |
+-----------+------------+
|Martin‑Meat|ORGANISATION|
|Αυστρία    |ADDRESS     |
|2007       |DATE        |
|2012       |DATE        |
|Alpenrind  |ORGANISATION|
+-----------+------------+
```

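Since MAPA targets anonymisation, detected chunks like these would typically be masked in the source text. The sketch below shows one way to do that on an excerpt of the example sentence. The span layout `(begin, end, label)` with an inclusive `end` is an assumption, and `locate` is a hypothetical helper that stands in for the offsets a pipeline would report.

```python
from typing import List, Tuple

# (begin, end inclusive, label); this layout is an assumption for the sketch
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each span with a <LABEL> placeholder, working right to left
    so earlier offsets stay valid while the string changes length."""
    for begin, end, label in sorted(spans, reverse=True):
        text = text[:begin] + f"<{label}>" + text[end + 1:]
    return text


def locate(text: str, chunk: str, label: str) -> Span:
    """Hypothetical helper: span of the first occurrence of a chunk."""
    begin = text.index(chunk)
    return (begin, begin + len(chunk) - 1, label)


excerpt = "στην Αυστρία κατά την περίοδο μεταξύ του έτους 2007 και του έτους 2012"
spans = [
    locate(excerpt, "Αυστρία", "ADDRESS"),
    locate(excerpt, "2007", "DATE"),
    locate(excerpt, "2012", "DATE"),
]

masked = mask_entities(excerpt, spans)
print(masked)
```

Processing spans from the end of the string backwards avoids recomputing offsets after each replacement. Overlapping spans are not handled in this sketch.
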
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_mapa|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token, embeddings]|
|Output Labels:|[ner]|
|Language:|el|
|Size:|16.4 MB|

## References

The dataset is available [here](https://huggingface.co/datasets/joelito/mapa).

## Benchmarking

```bash
label         precision  recall  f1-score  support
ADDRESS       0.89       1.00    0.94      16
AMOUNT        0.82       0.75    0.78      12
DATE          0.98       0.98    0.98      65
ORGANISATION  0.85       0.85    0.85      40
PERSON        0.90       0.95    0.92      38
micro-avg     0.91       0.93    0.92      171
macro-avg     0.89       0.91    0.90      171
weighted-avg  0.91       0.93    0.92      171
```
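
Each per-label f1-score in the report should be the harmonic mean of that label's precision and recall. A minimal sketch of this consistency check, assuming the standard F1 definition and using values copied from the per-label rows:

```python
# (label, precision, recall, reported f1) copied from the per-label rows
rows = [
    ("ADDRESS", 0.89, 1.00, 0.94),
    ("AMOUNT", 0.82, 0.75, 0.78),
    ("DATE", 0.98, 0.98, 0.98),
    ("ORGANISATION", 0.85, 0.85, 0.85),
    ("PERSON", 0.90, 0.95, 0.92),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Pair each label with (recomputed f1, reported f1)
recomputed = {label: (f1_score(p, r), reported) for label, p, r, reported in rows}

for label, (value, reported) in recomputed.items():
    print(f"{label:<13}recomputed={value:.3f}  reported={reported:.2f}")
```

Because precision and recall are already rounded in the report, a recomputed value can drift slightly from the reported one, so a small tolerance is more appropriate than exact equality.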