2024-05-15-legner_lener_base_pt (#1202)
* Add model 2024-05-15-legner_lener_base_pt

* Add model 2024-05-15-legner_lener_large_pt

---------

Co-authored-by: gadde5300 <gadde5300@gmail.com>
jsl-models and gadde5300 committed May 15, 2024
1 parent 6f856a1 commit 256b93c
Showing 2 changed files with 458 additions and 0 deletions.
229 changes: 229 additions & 0 deletions docs/_posts/gadde5300/2024-05-15-legner_lener_base_pt.md
@@ -0,0 +1,229 @@
---
layout: model
title: Brazilian Portuguese NER for Laws (Bert, Base)
author: John Snow Labs
name: legner_lener_base
date: 2024-05-15
tags: [lener, laws, legal, licensed, ner, pt, tensorflow]
task: Named Entity Recognition
language: pt
edition: Legal NLP 1.0.0
spark_version: 3.0
supported: true
engine: tensorflow
annotator: LegalBertForTokenClassification
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a deep learning Named Entity Recognition (NER) model for Portuguese legal text, trained with Base BERT embeddings. It predicts the following entities:

- ORGANIZACAO (Organizations)
- JURISPRUDENCIA (Jurisprudence)
- PESSOA (Person)
- TEMPO (Time)
- LOCAL (Location)
- LEGISLACAO (Laws)
- O (Other)

Other versions of this model are available in the Models Hub:
- Deep Learning (non-transformer) architecture with Base embeddings
- Deep Learning (non-transformer) architecture with Large embeddings
- Transformers architecture with Base embeddings
- Transformers architecture with Large embeddings

## Predicted Entities

`PESSOA`, `ORGANIZACAO`, `LEGISLACAO`, `JURISPRUDENCIA`, `TEMPO`, `LOCAL`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_lener_base_pt_1.0.0_3.0_1715772909273.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_lener_base_pt_1.0.0_3.0_1715772909273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
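
The download link above points to an archive named `legner_lener_base_pt_1.0.0_3.0_1715772909273.zip`. The sketch below splits that name into its apparent parts. The field layout (name, language, library version, Spark version, epoch timestamp in milliseconds) is an assumption inferred from this one file name and the front matter, not a documented convention, and `parse_archive_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Assumed pattern, inferred from the archive name on this page (not an official spec):
# <name>_<lang>_<library version>_<spark version>_<epoch millis>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2})_(?P<lib_version>\d+\.\d+\.\d+)"
    r"_(?P<spark_version>\d+\.\d+)_(?P<timestamp_ms>\d+)\.zip$"
)


def parse_archive_name(file_name):
    """Split a model archive name into its assumed fields."""
    match = ARCHIVE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected archive name: {file_name}")
    fields = match.groupdict()
    # Interpret the trailing number as milliseconds since the Unix epoch (assumption)
    fields["uploaded_utc"] = datetime.fromtimestamp(
        int(fields["timestamp_ms"]) / 1000, tz=timezone.utc
    )
    return fields


info = parse_archive_name("legner_lener_base_pt_1.0.0_3.0_1715772909273.zip")
print(info["name"], info["lang"], info["lib_version"], info["spark_version"])
print(info["uploaded_utc"].date())
```

Under these assumptions, the decoded date matches the `date` field in the front matter.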

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp, legal
import pandas as pd

# Assumes an active Spark session named `spark` (for example, spark = nlp.start())

# Convert raw text into Spark NLP documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

# pretrained() downloads the model by (name, language, remote location);
# load() would expect a single local path instead
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_lener_base", "pt", "legal/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merge token-level labels into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "label"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        tokenClassifier,
        ner_converter
    ]
)

example = spark.createDataFrame(pd.DataFrame({'text': ["""Mediante do exposto , com fundamento nos artigos 32 , i , e 33 , da lei 8.443/1992 , submetem-se os autos à consideração superior , com posterior encaminhamento ao ministério público junto ao tcu e ao gabinete do relator , propondo : a ) conhecer do recurso e , no mérito , negar-lhe provimento ; b ) comunicar ao recorrente , ao superior tribunal militar e ao tribunal regional federal da 2ª região , a fim de fornecer subsídios para os processos judiciais 2001.34.00.024796-9 e 2003.34.00.044227-3 ; e aos demais interessados a deliberação que vier a ser proferida por esta corte ” ."""]}))

result = pipeline.fit(example).transform(example)
```

</div>
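
The `ner_converter` stage in the pipeline turns token-level labels such as `B-LEGISLACAO` and `I-LEGISLACAO` into entity chunks. As a rough illustration of that idea, here is a minimal plain-Python sketch of IOB merging. It is a simplified stand-in, not the library's implementation; `merge_iob` is a hypothetical helper, and the tokens and tags in the example are hand-written, not model output.

```python
def merge_iob(tokens, tags):
    """Group (token, IOB tag) pairs into (entity_type, text) chunks."""
    chunks = []
    current_type = None
    current_tokens = []

    def flush():
        # Close the chunk being built, if any
        nonlocal current_type, current_tokens
        if current_tokens:
            chunks.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A B- tag, or a change of entity type, starts a new chunk
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return chunks


# Hand-written example tags for illustration only
tokens = ["da", "lei", "8.443/1992", ",", "ao", "tcu"]
tags = ["O", "B-LEGISLACAO", "I-LEGISLACAO", "O", "O", "B-ORGANIZACAO"]
chunks = merge_iob(tokens, tags)
print(chunks)
```

An `I-` tag that follows `O` or a different entity type is treated here as the start of a new chunk, which is one common way to tolerate inconsistent tag sequences.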

## Results

```bash
+--------------+---------+----------+
|         token|ner_label|confidence|
+--------------+---------+----------+
|      Mediante|        O|0.99998605|
|            do|        O| 0.9999868|
|       exposto|        O|0.99998623|
|             ,|        O|  0.999987|
|           com|        O|0.99998677|
|    fundamento|        O| 0.9999863|
|           nos|        O|0.99998486|
|       artigos|  I-TEMPO| 0.9995784|
|            32|  B-LOCAL| 0.9998317|
|             ,|  B-LOCAL|0.99983853|
|             i|  B-LOCAL| 0.9998391|
|             ,|  B-LOCAL|  0.999842|
|             e|  B-LOCAL| 0.9998447|
|            33|  B-LOCAL| 0.9998419|
|             ,|  B-LOCAL| 0.9998423|
|            da|  B-LOCAL| 0.9998431|
|           lei|  B-LOCAL| 0.9998434|
|    8.443/1992|  B-LOCAL|0.99982893|
|             ,|        O| 0.9999863|
|   submetem-se|        O|0.99998677|
|            os|        O| 0.9999873|
|         autos|        O|0.99998647|
|             à|        O|0.99998707|
|  consideração|        O| 0.9999871|
|      superior|        O| 0.9999868|
|             ,|        O|0.99998736|
|           com|        O| 0.9999876|
|     posterior|        O|0.99998707|
|encaminhamento|        O|0.99998724|
|            ao|        O|0.99998707|
|    ministério|        O| 0.9999853|
|       público|        O| 0.9999854|
|         junto|        O|0.99998665|
|            ao|        O|0.99998516|
|           tcu|        O| 0.9993648|
|             e|        O|0.99998665|
|            ao|        O|0.99998677|
|      gabinete|        O| 0.9999856|
|            do|        O| 0.9999865|
|       relator|        O|0.99998575|
|             ,|        O| 0.9999872|
|      propondo|        O|0.99998724|
|             :|        O|0.99998707|
|             a|        O| 0.9999873|
|             )|        O| 0.9999873|
|      conhecer|        O|0.99998724|
|            do|        O| 0.9999872|
|       recurso|        O| 0.9999867|
|             e|        O| 0.9999872|
|             ,|        O| 0.9999869|
|            no|        O|0.99998695|
|        mérito|        O| 0.9999872|
|             ,|        O| 0.9999873|
|     negar-lhe|        O| 0.9999875|
|    provimento|        O|0.99998724|
|             ;|        O| 0.9999865|
|             b|        O|0.99998635|
|             )|        O| 0.9999871|
|     comunicar|        O| 0.9999869|
|            ao|        O| 0.9999872|
|    recorrente|        O| 0.9999854|
|             ,|        O|  0.999987|
|            ao|        O|  0.999987|
|      superior|        O| 0.9999805|
|      tribunal|        O|0.99998057|
|       militar|        O| 0.9999655|
|             e|        O|0.99998677|
|            ao|        O|0.99998665|
|      tribunal|        O|0.99996954|
|      regional|        O| 0.9999731|
|       federal|        O| 0.9999361|
|            da|        O| 0.9999758|
|            2ª|        O| 0.9999704|
|        região|        O|0.99994576|
|             ,|        O|  0.999987|
|             a|        O| 0.9999872|
|           fim|        O|0.99998724|
|            de|        O|  0.999987|
|      fornecer|        O|0.99998724|
|     subsídios|        O| 0.9999871|
|          para|        O| 0.9999867|
|            os|        O| 0.9999863|
|     processos|        O| 0.9999849|
|     judiciais|        O| 0.9999815|
|          2001|        O|0.99994475|
|             .|        O|0.99998444|
|34.00.024796-9|        O| 0.9999273|
|             e|        O| 0.9999757|
|          2003|        O| 0.9908976|
|             .|        O|0.99998164|
|34.00.044227-3|        O| 0.9999851|
|             ;|        O| 0.9999866|
|             e|        O|0.99998695|
|           aos|        O| 0.9999869|
|        demais|        O|0.99998677|
|  interessados|        O| 0.9999867|
|             a|        O|0.99998707|
|   deliberação|        O|0.99998724|
|           que|        O| 0.9999871|
|          vier|        O| 0.9999868|
|             a|        O| 0.9999867|
|           ser|        O| 0.9999872|
|     proferida|        O| 0.9999871|
|           por|        O|0.99998695|
|          esta|        O|0.99998677|
|         corte|        O|0.99998224|
|             ”|        O| 0.9999714|
|             .|        O|0.99998647|
+--------------+---------+----------+
```
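
To inspect output like the table above outside Spark, the printed text can be parsed back into rows. The following is a minimal sketch assuming the usual `show()`-style layout (border lines starting with `+`, cells separated by `|`) and tokens that do not themselves contain `|`. `parse_show_table` is a hypothetical helper, and the sample holds only a few rows copied from the results.

```python
from collections import Counter


def parse_show_table(text):
    """Parse a Spark show()-style text table into a list of dicts."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        # Drop the outer pipes, then split the remaining cells
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


sample = """
+--------------+---------+----------+
|         token|ner_label|confidence|
+--------------+---------+----------+
|           nos|        O|0.99998486|
|       artigos|  I-TEMPO| 0.9995784|
|            32|  B-LOCAL| 0.9998317|
|             ,|  B-LOCAL|0.99983853|
+--------------+---------+----------+
"""

rows = parse_show_table(sample)
label_counts = Counter(row["ner_label"] for row in rows)
entity_tokens = [row["token"] for row in rows if row["ner_label"] != "O"]
min_confidence = min(float(row["confidence"]) for row in rows)

print(label_counts)
print(entity_tokens)
print(min_confidence)
```

Counting labels this way makes it easy to see which entity types the model assigned and how confident it was, without rerunning the pipeline.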

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_lener_base|
|Compatibility:|Legal NLP 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[ner]|
|Language:|pt|
|Size:|403.3 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

Original texts are available at https://paperswithcode.com/sota?task=Token+Classification&dataset=lener_br, supplemented by in-house data augmentation with weak labelling.