From 0a84240517316c3e86d315ab274e927dc73be3d6 Mon Sep 17 00:00:00 2001 From: Juan Martinez <36634572+josejuanmartinez@users.noreply.github.com> Date: Mon, 1 May 2023 11:57:21 +0100 Subject: [PATCH] Legal NLP 1.12.0 (#180) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * 2023-04-16-legner_nda_remedies_en (#123) * Add model 2023-04-16-legner_nda_remedies_en * Update 2023-04-16-legner_nda_remedies_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-19-legner_nda_return_of_conf_info_en (#132) * Add model 2023-04-19-legner_nda_return_of_conf_info_en * Update 2023-04-19-legner_nda_return_of_conf_info_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * Add model 2023-04-20-legmulticlf_covid19_exceptions_italian_it (#135) Co-authored-by: Mary-Sci * 2023-04-21-leggen_flant5_base_en (#143) * Add model 2023-04-21-leggen_flant5_base_en * Update 2023-04-21-leggen_flant5_base_en.md --------- Co-authored-by: gadde5300 Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com> * 2023-04-24-legner_nda_req_discl_en (#146) * Add model 2023-04-24-legner_nda_req_discl_en * Update 2023-04-24-legner_nda_req_discl_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-25-legner_greek_legislation_el (#148) * Add model 2023-04-25-legner_greek_legislation_el * Update 2023-04-25-legner_greek_legislation_el.md * Update 2023-04-25-legner_greek_legislation_el.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * Add model 2023-04-26-legmulticlf_online_terms_of_service_english_en (#153) Co-authored-by: Mary-Sci * 2023-04-26-legner_mapa_bg (#155) * Add model 2023-04-26-legner_mapa_bg * Update 
2023-04-26-legner_mapa_bg.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-26-legner_mapa_da (#156) * Add model 2023-04-26-legner_mapa_da * Update 2023-04-26-legner_mapa_da.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_de (#159) * Add model 2023-04-27-legner_mapa_de * Update 2023-04-27-legner_mapa_de.md * Add model 2023-04-27-legner_mapa_el * Update 2023-04-27-legner_mapa_el.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_en (#160) * Add model 2023-04-27-legner_mapa_en * Update 2023-04-27-legner_mapa_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_es (#162) * Add model 2023-04-27-legner_mapa_es * Update 2023-04-27-legner_mapa_es.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_fr (#163) * Add model 2023-04-27-legner_mapa_fr * Update 2023-04-27-legner_mapa_fr.md * Add model 2023-04-27-legner_mapa_it * Update 2023-04-27-legner_mapa_it.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_lt (#166) * Add model 2023-04-27-legner_mapa_lt * Update 2023-04-27-legner_mapa_lt.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_nl (#167) * Add model 2023-04-27-legner_mapa_nl * Update 2023-04-27-legner_mapa_nl.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-27-legner_mapa_pt (#169) * Add model 
2023-04-27-legner_mapa_pt * Update 2023-04-27-legner_mapa_pt.md * Add model 2023-04-27-legner_mapa_ro * Update 2023-04-27-legner_mapa_ro.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-28-legner_mapa_cs (#172) * Add model 2023-04-28-legner_mapa_cs * Update 2023-04-28-legner_mapa_cs.md * Add model 2023-04-28-legner_mapa_ga * Update 2023-04-28-legner_mapa_ga.md * Update 2023-04-28-legner_mapa_ga.md * Add model 2023-04-28-legner_mapa_fi * Update 2023-04-28-legner_mapa_fi.md * Add model 2023-04-28-legner_mapa_sk * Update 2023-04-28-legner_mapa_sk.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-29-legpipe_alias_en (#176) * Add model 2023-04-29-legpipe_alias_en * Update 2023-04-29-legpipe_alias_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> * 2023-04-29-leggen_flant5_finetuned_en (#177) * Add model 2023-04-29-leggen_flant5_finetuned_en * Update 2023-04-29-leggen_flant5_finetuned_en.md --------- Co-authored-by: gadde5300 Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com> * Delete 2023-04-29-legpipe_alias_en.md * 2023-04-30-legpipe_alias_en (#178) * Add model 2023-04-30-legpipe_alias_en * Update 2023-04-30-legpipe_alias_en.md * Update 2023-04-30-legpipe_alias_en.md * Update 2023-04-30-legpipe_alias_en.md * Update 2023-04-30-legpipe_alias_en.md --------- Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> Co-authored-by: Juan Martinez <36634572+josejuanmartinez@users.noreply.github.com> --------- Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com> Co-authored-by: bunyamin-polat Co-authored-by: Bünyamin Polat <78386903+bunyamin-polat@users.noreply.github.com> Co-authored-by: Mary-Sci Co-authored-by: gadde5300 
Co-authored-by: GADDE SAI SHAILESH <69344247+gadde5300@users.noreply.github.com> --- ...gmulticlf_covid19_exceptions_italian_it.md | 126 ++++++++++++++++ ...iclf_online_terms_of_service_english_en.md | 131 +++++++++++++++++ .../2023-04-16-legner_nda_remedies_en.md | 126 ++++++++++++++++ ...04-19-legner_nda_return_of_conf_info_en.md | 130 +++++++++++++++++ .../2023-04-24-legner_nda_req_discl_en.md | 138 ++++++++++++++++++ .../2023-04-25-legner_greek_legislation_el.md | 135 +++++++++++++++++ .../2023-04-26-legner_mapa_bg.md | 136 +++++++++++++++++ .../2023-04-26-legner_mapa_da.md | 132 +++++++++++++++++ .../2023-04-27-legner_mapa_de.md | 132 +++++++++++++++++ .../2023-04-27-legner_mapa_el.md | 132 +++++++++++++++++ .../2023-04-27-legner_mapa_en.md | 130 +++++++++++++++++ .../2023-04-27-legner_mapa_es.md | 129 ++++++++++++++++ .../2023-04-27-legner_mapa_fr.md | 130 +++++++++++++++++ .../2023-04-27-legner_mapa_it.md | 131 +++++++++++++++++ .../2023-04-27-legner_mapa_lt.md | 132 +++++++++++++++++ .../2023-04-27-legner_mapa_nl.md | 131 +++++++++++++++++ .../2023-04-27-legner_mapa_pt.md | 131 +++++++++++++++++ .../2023-04-27-legner_mapa_ro.md | 130 +++++++++++++++++ .../2023-04-28-legner_mapa_cs.md | 132 +++++++++++++++++ .../2023-04-28-legner_mapa_fi.md | 131 +++++++++++++++++ .../2023-04-28-legner_mapa_ga.md | 131 +++++++++++++++++ .../2023-04-28-legner_mapa_sk.md | 131 +++++++++++++++++ .../2023-04-30-legpipe_alias_en.md | 75 ++++++++++ .../2023-04-21-leggen_flant5_base_en.md | 81 ++++++++++ .../2023-04-29-leggen_flant5_finetuned_en.md | 87 +++++++++++ 25 files changed, 3130 insertions(+) create mode 100644 docs/_posts/Mary-Sci/2023-04-20-legmulticlf_covid19_exceptions_italian_it.md create mode 100644 docs/_posts/Mary-Sci/2023-04-26-legmulticlf_online_terms_of_service_english_en.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-16-legner_nda_remedies_en.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-19-legner_nda_return_of_conf_info_en.md create mode 
100644 docs/_posts/bunyamin-polat/2023-04-24-legner_nda_req_discl_en.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-25-legner_greek_legislation_el.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_bg.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_da.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_de.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_el.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_en.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_es.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_fr.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_it.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_lt.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_nl.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_pt.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_ro.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_cs.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_fi.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_ga.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_sk.md create mode 100644 docs/_posts/bunyamin-polat/2023-04-30-legpipe_alias_en.md create mode 100644 docs/_posts/gadde5300/2023-04-21-leggen_flant5_base_en.md create mode 100644 docs/_posts/gadde5300/2023-04-29-leggen_flant5_finetuned_en.md diff --git a/docs/_posts/Mary-Sci/2023-04-20-legmulticlf_covid19_exceptions_italian_it.md b/docs/_posts/Mary-Sci/2023-04-20-legmulticlf_covid19_exceptions_italian_it.md new file mode 100644 index 0000000000..1db9053349 --- /dev/null +++ b/docs/_posts/Mary-Sci/2023-04-20-legmulticlf_covid19_exceptions_italian_it.md @@ -0,0 +1,126 @@ +--- +layout: model +title: Legal Multilabel Classifier on Covid-19 
Exceptions (Italian) +author: John Snow Labs +name: legmulticlf_covid19_exceptions_italian +date: 2023-04-20 +tags: [it, licensed, legal, multilabel, classification, tensorflow] +task: Text Classification +language: it +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MultiClassifierDLModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is the Multi-Label Text Classification model that can be used to identify up to 5 classes to facilitate analysis, discovery, and comparison of legal texts in Italian related to COVID-19 exception measures. The classes are as follows: + + - Closures/lockdown + - Government_oversight + - Restrictions_of_daily_liberties + - Restrictions_of_fundamental_rights_and_civil_liberties + - State_of_Emergency + +## Predicted Entities + +`Closures/lockdown`, `Government_oversight`, `Restrictions_of_daily_liberties`, `Restrictions_of_fundamental_rights_and_civil_liberties`, `State_of_Emergency` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_covid19_exceptions_italian_it_1.0.0_3.0_1681985472330.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_covid19_exceptions_italian_it_1.0.0_3.0_1681985472330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = nlp.DocumentAssembler() \ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["document"]) \ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_bert_base_italian_xxl_cased", "it") \ + .setInputCols(["document", "token"])\ + .setOutputCol("embeddings") + +embeddingsSentence = nlp.SentenceEmbeddings() \ + .setInputCols(["document", "embeddings"])\ + .setOutputCol("sentence_embeddings")\ + .setPoolingStrategy("AVERAGE") + +multilabelClfModel = nlp.MultiClassifierDLModel.pretrained('legmulticlf_covid19_exceptions_italian', 'it', "legal/models") \ + .setInputCols(["sentence_embeddings"])\ + .setOutputCol("class") + +clf_pipeline = nlp.Pipeline( + stages=[document_assembler, + tokenizer, + embeddings, + embeddingsSentence, + multilabelClfModel]) + +df = spark.createDataFrame([["Al di fuori di tale ultima ipotesi, secondo le raccomandazioni impartite dal Ministero della salute, occorre provvedere ad assicurare la corretta applicazione di misure preventive quali lavare frequentemente le mani con acqua e detergenti comuni."]]).toDF("text") + +model = clf_pipeline.fit(df) +result = model.transform(df) + +result.select("text", "class.result").show(truncate=False) +``` + +
+ +## Results + +```bash ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ +|text |result | ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ +|Al di fuori di tale ultima ipotesi, secondo le raccomandazioni impartite dal Ministero della salute, occorre provvedere ad assicurare la corretta applicazione di misure preventive quali lavare frequentemente le mani con acqua e detergenti comuni.|[Restrictions_of_daily_liberties]| ++------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legmulticlf_covid19_exceptions_italian| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[class]| +|Language:|it| +|Size:|13.9 MB| + +## References + +Train dataset available [here](https://huggingface.co/datasets/joelito/covid19_emergency_event) + +## Benchmarking + +```bash +label precision recall f1-score support +Closures/lockdown 0.88 0.94 0.91 47 +Government_oversight 1.00 0.50 0.67 4 +Restrictions_of_daily_liberties 0.88 0.79 0.83 28 +Restrictions_of_fundamental_rights_and_civil_liberties 0.62 0.62 0.62 16 +State_of_Emergency 0.67 1.00 0.80 6 +micro-avg 0.82 0.83 0.83 101 +macro-avg 0.81 0.77 0.77 101 +weighted-avg 0.83 0.83 0.83 
101 +samples-avg 0.81 0.84 0.81 101 +``` \ No newline at end of file diff --git a/docs/_posts/Mary-Sci/2023-04-26-legmulticlf_online_terms_of_service_english_en.md b/docs/_posts/Mary-Sci/2023-04-26-legmulticlf_online_terms_of_service_english_en.md new file mode 100644 index 0000000000..e41992818a --- /dev/null +++ b/docs/_posts/Mary-Sci/2023-04-26-legmulticlf_online_terms_of_service_english_en.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal Multilabel Classifier on Online Terms of Service +author: John Snow Labs +name: legmulticlf_online_terms_of_service_english +date: 2023-04-26 +tags: [en, licensed, multilabel, classification, legal, tensorflow] +task: Text Classification +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MultiClassifierDLModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is the Multi-Label Text Classification model that can be used to identify potentially unfair clauses in online Terms of Service. The classes are as follows: + + - Arbitration + - Choice_of_law + - Content_removal + - Jurisdiction + - Limitation_of_liability + - Other + - Unilateral_change + - Unilateral_termination + +## Predicted Entities + +`Arbitration`, `Choice_of_law`, `Content_removal`, `Jurisdiction`, `Limitation_of_liability`, `Other`, `Unilateral_change`, `Unilateral_termination` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_online_terms_of_service_english_en_1.0.0_3.0_1682519205970.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_online_terms_of_service_english_en_1.0.0_3.0_1682519205970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = nlp.DocumentAssembler() \ + .setInputCol('text')\ + .setOutputCol('document') + +tokenizer = nlp.Tokenizer() \ + .setInputCols(['document'])\ + .setOutputCol('token') + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \ + .setInputCols(['document', 'token'])\ + .setOutputCol("embeddings") + +embeddingsSentence = nlp.SentenceEmbeddings() \ + .setInputCols(['document', 'embeddings'])\ + .setOutputCol('sentence_embeddings')\ + .setPoolingStrategy('AVERAGE') + +classifierdl = nlp.MultiClassifierDLModel.pretrained('legmulticlf_online_terms_of_service_english', 'en', 'legal/models')\ + .setInputCols(["sentence_embeddings"])\ + .setOutputCol("class") + +clf_pipeline = nlp.Pipeline(stages=[document_assembler, + tokenizer, + embeddings, + embeddingsSentence, + classifierdl]) + +df = spark.createDataFrame([["We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number."]]).toDF("text") + +model = clf_pipeline.fit(df) +result = model.transform(df) + +result.select("text", "class.result").show(truncate=False) +``` + +
+ +## Results + +```bash ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ +|sentence |result | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ +|We are not responsible or liable for (and have no obligation to verify) any wrong or misspelled email address or inaccurate or wrong (mobile) phone number or credit card number.|[Limitation_of_liability]| ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legmulticlf_online_terms_of_service_english| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[class]| +|Language:|en| +|Size:|13.9 MB| + +## References + +Train dataset available [here](https://huggingface.co/datasets/joelito/online_terms_of_service) + +## Benchmarking + +```bash +label precision recall f1-score support +Arbitration 1.00 0.50 0.67 4 +Choice_of_law 0.67 0.67 0.67 3 +Content_removal 1.00 0.67 0.80 3 +Jurisdiction 0.80 1.00 0.89 4 +Limitation_of_liability 0.73 0.73 0.73 15 +Other 0.86 0.89 0.88 28 +Unilateral_change 0.86 1.00 0.92 6 +Unilateral_termination 1.00 0.80 0.89 5 +micro-avg 0.84 0.82 0.83 68 +macro-avg 0.86 0.78 0.81 68 +weighted-avg 0.85 0.82 0.83 68 +samples-avg 0.80 0.82 0.81 68 +``` \ No newline at end of file diff --git a/docs/_posts/bunyamin-polat/2023-04-16-legner_nda_remedies_en.md b/docs/_posts/bunyamin-polat/2023-04-16-legner_nda_remedies_en.md new file mode 100644 index 0000000000..6760245b72 --- 
/dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-16-legner_nda_remedies_en.md @@ -0,0 +1,126 @@ +--- +layout: model +title: Legal NER for NDA (Remedies Clauses) +author: John Snow Labs +name: legner_nda_remedies +date: 2023-04-16 +tags: [en, licensed, ner, legal, nda, remedies] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a NER model, aimed to be run **only** after detecting the `REMEDIES` clause with a proper classifier (use `legmulticlf_mnda_sections_paragraph_other` for that purpose). It will extract the following entities: `CURRENCY`, `NUMERIC_REMEDY`, and `REMEDY_TYPE`. + +## Predicted Entities + +`CURRENCY`, `NUMERIC_REMEDY`, `REMEDY_TYPE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_nda_remedies_en_1.0.0_3.0_1681687124993.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_nda_remedies_en_1.0.0_3.0_1681687124993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetector()\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_nda_remedies", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""The breaching party shall pay the non-breaching party liquidated damages of $ 1,000 per day for each day that the breach of this NDA continues."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++------------------+--------------+ +|chunk |ner_label | ++------------------+--------------+ +|liquidated damages|REMEDY_TYPE | +|$ |CURRENCY | +|1,000 |NUMERIC_REMEDY| ++------------------+--------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_nda_remedies| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.3 MB| + +## References + +In-house annotations on the Non-disclosure Agreements + +## Benchmarking + +```bash +label precision recall f1-score support +CURRENCY 1.00 1.00 1.00 11 +NUMERIC_REMEDY 1.00 1.00 1.00 11 +REMEDY_TYPE 0.86 0.94 0.90 32 +micro-avg 0.91 0.96 0.94 54 +macro-avg 0.95 0.98 0.97 54 +weighted-avg 0.92 0.96 0.94 54 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-19-legner_nda_return_of_conf_info_en.md b/docs/_posts/bunyamin-polat/2023-04-19-legner_nda_return_of_conf_info_en.md new file mode 100644 index 0000000000..68bcf67c99 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-19-legner_nda_return_of_conf_info_en.md @@ -0,0 +1,130 @@ +--- +layout: model +title: Legal NER for NDA (Return of Confidential Information Clauses) +author: John Snow Labs +name: legner_nda_return_of_conf_info +date: 2023-04-19 +tags: [en, legal, licensed, ner, nda] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a NER model, aimed to be run **only** after detecting the `RETURN_OF_CONF_INFO` clause with a proper classifier (use `legmulticlf_mnda_sections_paragraph_other` model for that purpose). It will extract the following entities: `ARCHIVAL_PURPOSE` and `LEGAL_PURPOSE`. 
+ +## Predicted Entities + +`ARCHIVAL_PURPOSE`, `LEGAL_PURPOSE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_nda_return_of_conf_info_en_1.0.0_3.0_1681936414470.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_nda_return_of_conf_info_en_1.0.0_3.0_1681936414470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetector()\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_nda_return_of_conf_info", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter +]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Notwithstanding the foregoing, the Recipient and its Representatives may retain copies of the Confidential Information to the extent that such retention is required to demonstrate compliance with applicable law or governmental rule or regulation, to the extent included in any board or executive documents relating to the proposed business relationship, and in its archives for backup purposes subject to the confidentiality provisions of this Agreement."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) + + +``` + +
+ +## Results + +```bash ++--------------+----------------+ +|chunk |ner_label | ++--------------+----------------+ +|applicable law|LEGAL_PURPOSE | +|governmental |LEGAL_PURPOSE | +|regulation |LEGAL_PURPOSE | +|archives |ARCHIVAL_PURPOSE| +|backup |ARCHIVAL_PURPOSE| ++--------------+----------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_nda_return_of_conf_info| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.3 MB| + +## References + +In-house annotations on the Non-disclosure Agreements + +## Benchmarking + +```bash +label precision recall f1-score support +ARCHIVAL_PURPOSE 0.94 1.00 0.97 16 +LEGAL_PURPOSE 0.78 0.85 0.81 33 +micro-avg 0.83 0.90 0.86 49 +macro-avg 0.86 0.92 0.89 49 +weighted-avg 0.83 0.90 0.86 49 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-24-legner_nda_req_discl_en.md b/docs/_posts/bunyamin-polat/2023-04-24-legner_nda_req_discl_en.md new file mode 100644 index 0000000000..c91961336c --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-24-legner_nda_req_discl_en.md @@ -0,0 +1,138 @@ +--- +layout: model +title: Legal NER for NDA (Required Disclosure Clauses) +author: John Snow Labs +name: legner_nda_req_discl +date: 2023-04-24 +tags: [en, legal, licensed, ner, nda, disclosure] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a NER model, aimed to be run **only** after detecting the `REQ_DISCL` clause with a proper classifier (use `legmulticlf_mnda_sections_paragraph_other` model for that purpose). 
It will extract the following entities: `DISCLOSURE_BASIS`, `REQ_DISCLOSURE_CONFID`, `REQ_DISCLOSURE_COOPERATION`, `REQ_DISCLOSURE_LEGAL`, `REQ_DISCLOSURE_NOTICE`, `REQ_DISCLOSURE_PARTY`, `REQ_DISCLOSURE_REMEDY`, and `REQ_OBLIGATION_ACTION`. + +## Predicted Entities + +`DISCLOSURE_BASIS`, `REQ_DISCLOSURE_CONFID`, `REQ_DISCLOSURE_COOPERATION`, `REQ_DISCLOSURE_LEGAL`, `REQ_DISCLOSURE_NOTICE`, `REQ_DISCLOSURE_PARTY`, `REQ_DISCLOSURE_REMEDY`, `REQ_OBLIGATION_ACTION` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_nda_req_discl_en_1.0.0_3.0_1682327765264.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_nda_req_discl_en_1.0.0_3.0_1682327765264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetector()\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_nda_req_discl", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter +]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""If the Discloser waives the Recipient’s compliance with the agreement or fails to obtain a protective order or other appropriate remedies, the Recipient will furnish only that portion of the Confidential Information that is legally required to be disclosed and will use its best efforts to obtain confidential treatment for such Confidential Information."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++----------------------+--------------------------+ +|chunk |ner_label | ++----------------------+--------------------------+ +|Discloser |REQ_DISCLOSURE_PARTY | +|obtain |REQ_OBLIGATION_ACTION | +|protective order |REQ_DISCLOSURE_REMEDY | +|appropriate remedies |REQ_DISCLOSURE_REMEDY | +|furnish |REQ_OBLIGATION_ACTION | +|legally required |REQ_DISCLOSURE_LEGAL | +|best efforts |REQ_DISCLOSURE_COOPERATION| +|obtain |REQ_OBLIGATION_ACTION | +|confidential treatment|REQ_DISCLOSURE_CONFID | ++----------------------+--------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_nda_req_discl| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|16.3 MB| + +## References + +In-house annotations on the Non-disclosure Agreements + +## Benchmarking + +```bash +label precision recall f1-score support +DISCLOSURE_BASIS 0.77 0.70 0.73 57 +REQ_DISCLOSURE_CONFID 0.96 0.93 0.95 29 +REQ_DISCLOSURE_COOPERATION 1.00 0.94 0.97 17 +REQ_DISCLOSURE_LEGAL 0.93 0.77 0.84 35 +REQ_DISCLOSURE_NOTICE 0.89 0.89 0.89 19 +REQ_DISCLOSURE_PARTY 1.00 0.89 0.94 38 +REQ_DISCLOSURE_REMEDY 1.00 1.00 1.00 52 +REQ_OBLIGATION_ACTION 0.95 0.86 0.90 121 +micro-avg 0.94 0.86 0.90 368 +macro-avg 0.94 0.87 0.90 368 +weighted-avg 0.93 0.86 0.90 368 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-25-legner_greek_legislation_el.md b/docs/_posts/bunyamin-polat/2023-04-25-legner_greek_legislation_el.md new file mode 100644 index 0000000000..f5c8df4b70 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-25-legner_greek_legislation_el.md @@ -0,0 +1,135 @@ +--- +layout: model +title: Legal NER in Greek Legislations +author: John Snow Labs +name: legner_greek_legislation +date: 2023-04-25 +tags: [el, legal, ner, licensed, legislation] +task: Named Entity Recognition +language: el +edition: Legal NLP 1.0.0 
+spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This Legal NER model extracts the following entities from the Greek legislations: + +- `FACILITY`: Facilities, such as police stations, departments, etc. +- `GPE`: Geopolitical Entity; any reference to a geopolitical entity (e.g., country, city, Greek administrative unit, etc.) +- `LEG_REF`: Legislation Reference; any reference to Greek or European legislation +- `ORG`: Organization; any reference to a public or private organization +- `PER`: Any formal name of a person mentioned in the text +- `PUBLIC_DOC`: Public Document Reference + +## Predicted Entities + +`FACILITY`, `GPE`, `LEG_REF`, `PUBLIC_DOC`, `PER`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_greek_legislation_el_1.0.0_3.0_1682420832367.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_greek_legislation_el_1.0.0_3.0_1682420832367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +import pandas as pd + +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["document"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_el_cased","el")\ + .setInputCols(["document", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_greek_legislation", "el", "legal/models")\ + .setInputCols(["document", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text_list = ["""3 του άρθρου 5 του ν. 3148/2003, όπως ισχύει, αντικαθίσταται ως εξής""", + """1 του άρθρου 1 ασκούνται πλέον από την ΕΥΔΕ/ΕΣΕΑ μέσα σε δύο μήνες από την έναρξη ισχύος του παρόντος Διατάγματος.""", + """Ο Πρόεδρος της Επιτροπής και τα τέσσερα μέλη με ισάριθμα αναπληρωματικά εκλέγονται μεταξύ των δημοτών του Δήμου Κυθήρων.""", + """Τη με αριθ. 117/Σ.10η/25 Ιουλ 2016 γνωμοδότηση του Ανωτάτου Στρατιωτικού Συμβουλίου."""] + +result = model.transform(spark.createDataFrame(pd.DataFrame({"text" : text_list}))) +``` + +
+ +## Results + +```bash ++----------------------------------------+----------+ +|chunk |ner_label | ++----------------------------------------+----------+ +|ν. 3148/2003 |LEG_REF | +|ΕΥΔΕ/ΕΣΕΑ |ORG | +|Δήμου Κυθήρων |GPE | +|αριθ. 117/Σ.10η/25 Ιουλ 2016 γνωμοδότηση|PUBLIC_DOC| ++----------------------------------------+----------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_greek_legislation| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|el| +|Size:|16.4 MB| + +## References + +In-house annotations + +## Benchmarking + +```bash +label precision recall f1-score support +FACILITY 0.94 0.80 0.86 64 +GPE 0.77 0.83 0.80 136 +LEG_REF 0.94 0.90 0.92 93 +ORG 0.85 0.74 0.79 173 +PER 0.72 0.71 0.71 58 +PUBLIC_DOC 0.76 0.82 0.79 39 +micro-avg 0.83 0.80 0.81 563 +macro-avg 0.83 0.80 0.81 563 +weighted-avg 0.84 0.80 0.82 563 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_bg.md b/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_bg.md new file mode 100644 index 0000000000..1588040393 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_bg.md @@ -0,0 +1,136 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-26 +tags: [bg, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: bg +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Bulgarian` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_bg_1.0.0_3.0_1682548782666.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_bg_1.0.0_3.0_1682548782666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_bg_cased", "bg")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "bg", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""7 В окончателно решение № 1072 на Curtea de Apel București ( Апелативен съд Букурещ, Румъния ), 3-то гражданско отделение за малолетни и непълнолетни лица и семейноправни въпроси, от 12 юни 2013г., което е приложено към акта за преюдициално запитване и представено от г‑н Liberato, се уточнява, че„ [с] ъдът приема, че страните са сключили брак в Италия през октомври 2005 г. и до октомври 2006 г. са живели ту в Румъния, ту в Италия."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) + +``` + +
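MAPA targets anonymisation, so a common next step after extraction is to mask the detected chunks in the source text. The sketch below is illustrative only and is not taken from the model card: `mask_entities` and `span_of` are hypothetical helpers, not Spark NLP or Legal NLP functions, and the code assumes that each span carries inclusive `begin` and `end` character offsets into the same text. The sample sentence is the one used in the English `legner_mapa` example.

```python
def mask_entities(text, spans):
    """Replace each (begin, end, label) span in text with <LABEL>.

    Assumption: begin and end are inclusive character offsets into text,
    and the spans do not overlap.
    """
    masked = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for begin, end, label in sorted(spans, reverse=True):
        masked = masked[:begin] + "<" + label + ">" + masked[end + 1:]
    return masked


def span_of(text, chunk, label):
    """Hypothetical helper: locate the first occurrence of chunk in text."""
    begin = text.index(chunk)
    return (begin, begin + len(chunk) - 1, label)


text = "Martimpex's workers were posted to Austria to perform the same work."
spans = [
    span_of(text, "Martimpex's", "ORGANISATION"),
    span_of(text, "Austria", "ADDRESS"),
]
print(mask_entities(text, spans))
```

Replacing from the last span backwards avoids recomputing offsets after each substitution. Overlapping spans are not handled and would need to be merged first.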
+ +## Results + +```bash ++----------------+---------+ +|chunk |ner_label| ++----------------+---------+ +|Букурещ, Румъния|ADDRESS | +|12 юни 2013г., |DATE | +|г‑н Liberato |PERSON | +|Италия |ADDRESS | +|октомври 2005 г.|DATE | +|октомври 2006 г.|DATE | +|Румъния |ADDRESS | +|Италия |ADDRESS | ++----------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|bg| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.86 0.75 0.80 8 +AMOUNT 1.00 0.64 0.78 11 +DATE 0.97 0.97 0.97 65 +ORGANISATION 0.81 0.86 0.83 35 +PERSON 0.87 0.84 0.85 56 +micro-avg 0.90 0.87 0.89 175 +macro-avg 0.90 0.81 0.85 175 +weighted-avg 0.90 0.87 0.89 175 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_da.md b/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_da.md new file mode 100644 index 0000000000..f580ab5627 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-26-legner_mapa_da.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-26 +tags: [da, legal, ner, licensed, mapa] +task: Named Entity Recognition +language: da +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. 
+ +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Danish` documents. + +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_da_1.0.0_3.0_1682551046131.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_da_1.0.0_3.0_1682551046131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_da_cased", "da")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "da", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Fra den 1. februar 2012 til den 31. januar 2014, og således også under den omtvistede periode, blev arbejdstagere hos Martimpex udsendt til Østrig for at udføre det samme arbejde."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) + +``` + +
+ +## Results + +```bash ++---------------+------------+ +|chunk |ner_label | ++---------------+------------+ +|1. februar 2012|DATE | +|31. januar 2014|DATE | +|Martimpex |ORGANISATION| +|Østrig |ADDRESS | ++---------------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|da| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.95 0.90 0.93 21 +AMOUNT 1.00 1.00 1.00 4 +DATE 0.98 0.98 0.98 54 +ORGANISATION 0.74 0.74 0.74 31 +PERSON 0.79 0.86 0.82 43 +micro-avg 0.87 0.89 0.88 153 +macro-avg 0.89 0.90 0.89 153 +weighted-avg 0.87 0.89 0.88 153 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_de.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_de.md new file mode 100644 index 0000000000..07f89c1c1e --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_de.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [de, ner, legal, licensed, mapa] +task: Named Entity Recognition +language: de +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `German` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_de_1.0.0_3.0_1682589773968.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_de_1.0.0_3.0_1682589773968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_de_cased", "de")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "de", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Herr Liberato und Frau Grigorescu heirateten am 22 Oktober 2005 in Rom (Italien) und lebten in diesem Mitgliedstaat bis zur Geburt ihres Kindes am 20 Februar 2006 zusammen."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++----------------+---------+ +|chunk |ner_label| ++----------------+---------+ +|Herr Liberato |PERSON | +|Frau Grigorescu |PERSON | +|22 Oktober 2005 |DATE | +|Rom (Italien) |ADDRESS | +|20 Februar 2006 |DATE | ++----------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.69 0.85 0.76 13 +AMOUNT 1.00 0.75 0.86 4 +DATE 0.92 0.93 0.93 61 +ORGANISATION 0.64 0.77 0.70 30 +PERSON 0.85 0.87 0.86 46 +micro-avg 0.82 0.87 0.84 154 +macro-avg 0.82 0.83 0.82 154 +weighted-avg 0.83 0.87 0.85 154 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_el.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_el.md new file mode 100644 index 0000000000..6f10765afe --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_el.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [el, ner, legal, mapa, licensed] +task: Named Entity Recognition +language: el +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Greek` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_el_1.0.0_3.0_1682590655353.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_el_1.0.0_3.0_1682590655353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_el_cased", "el")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "el", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""86 Στην υπόθεση της κύριας δίκης, προκύπτει ότι ορισμένοι εργαζόμενοι της Martin‑Meat αποσπάσθηκαν στην Αυστρία κατά την περίοδο μεταξύ του έτους 2007 και του έτους 2012, για την εκτέλεση εργασιών τεμαχισμού κρέατος σε εγκαταστάσεις της Alpenrind."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------+------------+ +|chunk |ner_label | ++-----------+------------+ +|Martin‑Meat|ORGANISATION| +|Αυστρία |ADDRESS | +|2007 |DATE | +|2012 |DATE | +|Alpenrind |ORGANISATION| ++-----------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|el| +|Size:|16.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.89 1.00 0.94 16 +AMOUNT 0.82 0.75 0.78 12 +DATE 0.98 0.98 0.98 65 +ORGANISATION 0.85 0.85 0.85 40 +PERSON 0.90 0.95 0.92 38 +micro-avg 0.91 0.93 0.92 171 +macro-avg 0.89 0.91 0.90 171 +weighted-avg 0.91 0.93 0.92 171 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_en.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_en.md new file mode 100644 index 0000000000..329909789d --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_en.md @@ -0,0 +1,130 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [en, legal, ner, mapa, licensed] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `DATE`, `ORGANISATION`, and `PERSON` entities from `English` documents. 
+ +## Predicted Entities + +`ADDRESS`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_en_1.0.0_3.0_1682592120053.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_en_1.0.0_3.0_1682592120053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_en_cased", "en")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""From 1 February 2012 until 31 January 2014, thus including the period concerned, Martimpex's workers were posted to Austria to perform the same work."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++---------------+------------+ +|chunk |ner_label | ++---------------+------------+ +|1 February 2012|DATE | +|31 January 2014|DATE | +|Martimpex's |ORGANISATION| +|Austria |ADDRESS | ++---------------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 1.00 1.00 1.00 5 +DATE 0.98 1.00 0.99 40 +ORGANISATION 0.83 0.71 0.77 14 +PERSON 0.98 0.85 0.91 48 +micro-avg 0.96 0.90 0.93 107 +macro-avg 0.95 0.89 0.92 107 +weighted-avg 0.96 0.90 0.93 107 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_es.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_es.md new file mode 100644 index 0000000000..46b2606155 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_es.md @@ -0,0 +1,129 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [es, licensed, legal, ner, mapa] +task: Named Entity Recognition +language: es +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Spanish` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_es_1.0.0_3.0_1682593085140.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_es_1.0.0_3.0_1682593085140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_es_cased", "es")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "es", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Heiko Jonny Maniero , de nacionalidad italiana , nació y reside en Alemania."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-------------------+---------+ +|chunk |ner_label| ++-------------------+---------+ +|Heiko Jonny Maniero|PERSON | +|Alemania |ADDRESS | ++-------------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 1.00 0.86 0.92 7 +AMOUNT 1.00 1.00 1.00 1 +DATE 1.00 0.92 0.96 24 +ORGANISATION 0.83 0.71 0.77 7 +PERSON 0.75 0.71 0.73 17 +micro-avg 0.90 0.82 0.86 56 +macro-avg 0.92 0.84 0.88 56 +weighted-avg 0.90 0.82 0.86 56 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_fr.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_fr.md new file mode 100644 index 0000000000..7e85c42c72 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_fr.md @@ -0,0 +1,130 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [fr, ner, licensed, legal, mapa] +task: Named Entity Recognition +language: fr +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `French` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_fr_1.0.0_3.0_1682596162755.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_fr_1.0.0_3.0_1682596162755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_fr_cased", "fr")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "fr", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Heeren, administrateur, vu la phase écrite de la procédure et à la suite de l’audience du 28 novembre 2017, rend le présent Arrêt Antécédents du litige 1 La requérante, Foshan Lihua Ceramic Co."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------------------+------------+ +|chunk |ner_label | ++-----------------------+------------+ +|Heeren |PERSON | +|28 novembre 2017 |DATE | +|Foshan Lihua Ceramic Co|ORGANISATION| ++-----------------------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|fr| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 1.00 1.00 1.00 11 +AMOUNT 1.00 1.00 1.00 4 +DATE 1.00 0.96 0.98 28 +ORGANISATION 1.00 0.95 0.98 22 +PERSON 0.94 0.94 0.94 31 +micro-avg 0.98 0.96 0.97 96 +macro-avg 0.99 0.97 0.98 96 +weighted-avg 0.98 0.96 0.97 96 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_it.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_it.md new file mode 100644 index 0000000000..0f8b16a63c --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_it.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [it, ner, legal, mapa, licensed] +task: Named Entity Recognition +language: it +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Italian` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_it_1.0.0_3.0_1682597548726.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_it_1.0.0_3.0_1682597548726.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_it_cased", "it")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "it", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""In pendenza del giudizio relativo alla responsabilità genitoriale instaurato in Italia, la sig.ra Grigorescu, il 30 settembre 2009, ha adito la Judecătoria București ( Tribunale di primo grado di Bucarest ) chiedendo il divorzio, l’affidamento esclusivo del figlio e un contributo al mantenimento del figlio a carico del padre a titolo di mantenimento della prole."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------------+---------+ +|chunk |ner_label| ++-----------------+---------+ +|Italia |ADDRESS | +|sig.ra Grigorescu|PERSON | +|30 settembre 2009|DATE | +|Bucarest |ADDRESS | ++-----------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|it| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 1.00 1.00 1.00 14 +AMOUNT 1.00 1.00 1.00 3 +DATE 1.00 1.00 1.00 45 +ORGANISATION 0.89 0.89 0.89 9 +PERSON 0.92 1.00 0.96 12 +micro-avg 0.98 0.99 0.98 83 +macro-avg 0.96 0.98 0.97 83 +weighted-avg 0.98 0.99 0.98 83 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_lt.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_lt.md new file mode 100644 index 0000000000..243cf069fb --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_lt.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [lt, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: lt +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Lithuanian` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_lt_1.0.0_3.0_1682599671257.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_lt_1.0.0_3.0_1682599671257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_lt_cased", "lt")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "lt", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Iš pagrindinės bylos matyti, kad Martin-Meat darbuotojai buvo komandiruoti į Austriją laikotarpiu nuo 2007 m iki 2012 m mėsos išpjaustymo darbams Alpenrind patalpose atlikti."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------+------------+ +|chunk |ner_label | ++-----------+------------+ +|Martin-Meat|ORGANISATION| +|Austriją |ADDRESS | +|2007 m |DATE | +|2012 m |DATE | +|Alpenrind |ORGANISATION| ++-----------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|lt| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.86 0.75 0.80 8 +AMOUNT 1.00 0.64 0.78 11 +DATE 0.97 0.97 0.97 65 +ORGANISATION 0.81 0.86 0.83 35 +PERSON 0.87 0.84 0.85 56 +micro-avg 0.90 0.87 0.89 175 +macro-avg 0.90 0.81 0.85 175 +weighted-avg 0.90 0.87 0.89 175 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_nl.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_nl.md new file mode 100644 index 0000000000..630b4c6e42 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_nl.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [nl, ner, licensed, legal, mapa] +task: Named Entity Recognition +language: nl +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Dutch` documents. 
+ +## Predicted Entities + +`ADDRESS`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_nl_1.0.0_3.0_1682600676432.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_nl_1.0.0_3.0_1682600676432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_nl_cased", "nl")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "nl", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Liberato en Grigorescu zijn op 22 oktober 2005 in Rome ( Italië ) in het huwelijk getreden en hebben tot de geboorte van hun kind op 20 februari 2006 in die lidstaat samengewoond."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++----------------+---------+ +|chunk |ner_label| ++----------------+---------+ +|Liberato |PERSON | +|Grigorescu |PERSON | +|22 oktober 2005 |DATE | +|Rome ( Italië ) |ADDRESS | +|20 februari 2006|DATE | ++----------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|nl| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.87 0.81 0.84 16 +DATE 0.98 0.98 0.98 54 +ORGANISATION 0.83 0.97 0.90 31 +PERSON 0.90 0.92 0.91 39 +micro-avg 0.91 0.94 0.93 140 +macro-avg 0.90 0.92 0.91 140 +weighted-avg 0.91 0.94 0.93 140 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_pt.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_pt.md new file mode 100644 index 0000000000..42d8a5d720 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_pt.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [pt, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: pt +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Portuguese` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_pt_1.0.0_3.0_1682608680085.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_pt_1.0.0_3.0_1682608680085.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_pt_cased", "pt")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "pt", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Nos termos dos Decretos da Garda Síochána (6), só pode ser admitido como estagiário para integrar a força policial nacional quem tiver pelo menos 18 anos, mas menos de 35 anos de idade, no primeiro dia do mês em que tenha sido publicado pela primeira vez, num jornal nacional, o anúncio da vaga a que o recrutamento respeita."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------------------+------------+ +|chunk |ner_label | ++-----------------------+------------+ +|Garda Síochána |ORGANISATION| +|força policial nacional|ORGANISATION| +|18 anos |AMOUNT | +|35 anos |AMOUNT | ++-----------------------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.91 0.91 0.91 23 +AMOUNT 1.00 0.83 0.91 6 +DATE 1.00 0.95 0.97 61 +ORGANISATION 0.85 0.77 0.81 30 +PERSON 0.88 0.91 0.89 65 +micro-avg 0.92 0.90 0.91 185 +macro-avg 0.93 0.87 0.90 185 +weighted-avg 0.92 0.90 0.91 185 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_ro.md b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_ro.md new file mode 100644 index 0000000000..e01bca7dc3 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-27-legner_mapa_ro.md @@ -0,0 +1,130 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-27 +tags: [ro, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: ro +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Romanian` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_ro_1.0.0_3.0_1682609352989.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_ro_1.0.0_3.0_1682609352989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_base_ro_cased", "ro")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "ro", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Or, rezultă din hotărârea Curții de Apel București din 12 iunie 2013 că instanța română a aplicat greșit dreptul Uniunii (32) atunci când a respins excepția de litispendență invocată de domnul Liberato, întemeiată pe cererile referitoare la legătura matrimonială."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++---------------+---------+ +|chunk |ner_label| ++---------------+---------+ +|București |ADDRESS | +|12 iunie 2013 |DATE | +|domnul Liberato|PERSON | ++---------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|ro| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.88 0.96 0.92 23 +AMOUNT 1.00 0.67 0.80 3 +DATE 0.97 0.97 0.97 31 +ORGANISATION 0.67 0.71 0.69 28 +PERSON 0.91 0.83 0.87 48 +micro-avg 0.86 0.86 0.86 133 +macro-avg 0.88 0.83 0.85 133 +weighted-avg 0.87 0.86 0.86 133 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_cs.md b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_cs.md new file mode 100644 index 0000000000..b72f89712c --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_cs.md @@ -0,0 +1,132 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-28 +tags: [cs, licensed, legal, ner, mapa] +task: Named Entity Recognition +language: cs +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Czech` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_cs_1.0.0_3.0_1682668776380.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_cs_1.0.0_3.0_1682668776380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_base_czech_legal","cs")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "cs", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""V roce 2007 uzavřela společnost Alpenrind, dříve S GmbH, se společností Martin-Meat usazenou v Maďarsku smlouvu, podle níž se posledně uvedená společnost zavázala k porcování masa a jeho balení v rozsahu 25 půlek jatečně upravených těl skotu týdně."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-----------+------------+ +|chunk |ner_label | ++-----------+------------+ +|2007 |DATE | +|Alpenrind |ORGANISATION| +|Martin-Meat|ORGANISATION| +|Maďarsku |ADDRESS | +|25 půlek |AMOUNT | ++-----------+------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|cs| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.80 0.67 0.73 36 +AMOUNT 1.00 1.00 1.00 5 +DATE 0.98 0.98 0.98 56 +ORGANISATION 0.64 0.66 0.65 32 +PERSON 0.75 0.82 0.78 66 +micro-avg 0.81 0.82 0.81 195 +macro-avg 0.83 0.82 0.83 195 +weighted-avg 0.81 0.82 0.81 195 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_fi.md b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_fi.md new file mode 100644 index 0000000000..70d929731b --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_fi.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-28 +tags: [fi, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: fi +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Finnish` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_fi_1.0.0_3.0_1682671773751.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_fi_1.0.0_3.0_1682671773751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_base_finnish_legal","fi")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "fi", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Liberato vaati 22.5.2007 päivätyllä kanteellaan Tribunale di Teramossa ( Teramon alioikeus, Italia ) asumuseroa Grigorescusta ja lapsen huoltajuutta."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-------------+---------+ +|chunk |ner_label| ++-------------+---------+ +|Liberato |PERSON | +|22.5.2007 |DATE | +|Italia |ADDRESS | +|Grigorescusta|PERSON | ++-------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|fi| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.81 0.93 0.86 27 +AMOUNT 1.00 1.00 1.00 2 +DATE 0.92 0.95 0.94 61 +ORGANISATION 0.88 0.81 0.85 27 +PERSON 0.93 0.95 0.94 40 +micro-avg 0.90 0.92 0.91 157 +macro-avg 0.91 0.93 0.92 157 +weighted-avg 0.90 0.92 0.91 157 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_ga.md b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_ga.md new file mode 100644 index 0000000000..4a30fa8c47 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_ga.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-28 +tags: [ga, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: ga +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Irish` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_ga_1.0.0_3.0_1682670223837.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_ga_1.0.0_3.0_1682670223837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_base_irish_legal","gle")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "ga", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Dhiúltaigh Tribunale di Teramo ( An Chúirt Dúiche, Teramo ) an t-iarratas a rinne Bn.Grigorescu, ar bhonn teagmhasach, chun aitheantas a thabhairt san Iodáil do bhreithiúnas colscartha Tribunalul București ( An Chúirt Réigiúnach, Búcairist ) an 3 Nollaig 2012, de bhun Rialachán Uimh."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++--------------+---------+ +|chunk |ner_label| ++--------------+---------+ +|Teramo |ADDRESS | +|Bn.Grigorescu |PERSON | +|Búcairist |ADDRESS | +|3 Nollaig 2012|DATE | ++--------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|ga| +|Size:|16.3 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.82 0.74 0.78 19 +AMOUNT 1.00 1.00 1.00 7 +DATE 0.91 0.92 0.91 75 +ORGANISATION 0.65 0.67 0.66 48 +PERSON 0.71 0.82 0.76 56 +micro-avg 0.79 0.82 0.80 205 +macro-avg 0.82 0.83 0.82 205 +weighted-avg 0.79 0.82 0.80 205 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_sk.md b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_sk.md new file mode 100644 index 0000000000..3614c38aca --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-28-legner_mapa_sk.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER for MAPA(Multilingual Anonymisation for Public Administrations) +author: John Snow Labs +name: legner_mapa +date: 2023-04-28 +tags: [sk, licensed, ner, legal, mapa] +task: Named Entity Recognition +language: sk +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The dataset consists of 12 documents taken from EUR-Lex, a multilingual corpus of court decisions and legal dispositions in the 24 official languages of the European Union. + +This model extracts `ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, and `PERSON` entities from `Slovak` documents. 
+ +## Predicted Entities + +`ADDRESS`, `AMOUNT`, `DATE`, `ORGANISATION`, `PERSON` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_mapa_sk_1.0.0_3.0_1682674803309.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_mapa_sk_1.0.0_3.0_1682674803309.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.RoBertaEmbeddings.pretrained("roberta_base_slovak_legal","sk")\ + .setInputCols(["sentence", "token"])\ + .setOutputCol("embeddings")\ + .setMaxSentenceLength(512)\ + .setCaseSensitive(True) + +ner_model = legal.NerModel.pretrained("legner_mapa", "sk", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence", "token", "ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + document_assembler, + sentence_detector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""Návrhom podaným 22. mája 2007 na Tribunale di Teramo ( súd v Terame, Taliansko ) požiadal pán Liberato o rozluku a o zverenie syna do svojej starostlivosti."""] + +result = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
+ +## Results + +```bash ++-------------+---------+ +|chunk |ner_label| ++-------------+---------+ +|22. mája 2007|DATE | +|Terame |ADDRESS | +|Taliansko |ADDRESS | +|pán Liberato |PERSON | ++-------------+---------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_mapa| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|sk| +|Size:|1.4 MB| + +## References + +The dataset is available [here](https://huggingface.co/datasets/joelito/mapa). + +## Benchmarking + +```bash +label precision recall f1-score support +ADDRESS 0.88 0.85 0.86 26 +AMOUNT 1.00 1.00 1.00 4 +DATE 0.92 0.88 0.90 50 +ORGANISATION 0.79 0.61 0.69 31 +PERSON 0.66 0.86 0.75 44 +micro-avg 0.80 0.82 0.81 155 +macro-avg 0.85 0.84 0.84 155 +weighted-avg 0.81 0.82 0.81 155 +``` diff --git a/docs/_posts/bunyamin-polat/2023-04-30-legpipe_alias_en.md b/docs/_posts/bunyamin-polat/2023-04-30-legpipe_alias_en.md new file mode 100644 index 0000000000..64d8714200 --- /dev/null +++ b/docs/_posts/bunyamin-polat/2023-04-30-legpipe_alias_en.md @@ -0,0 +1,75 @@ +--- +layout: model +title: Legal Alias Pipeline +author: John Snow Labs +name: legpipe_alias +date: 2023-04-30 +tags: [en, legal, ner, pipeline, alias, licensed] +task: Pipeline Legal +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This pipeline detects names in quotes and brackets, such as ("Supplier"), ("Recipient"), and ("Disclosing Parties"), which are very common in legal agreements as a way to refer to the parties. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legpipe_alias_en_1.0.0_3.0_1682861474127.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legpipe_alias_en_1.0.0_3.0_1682861474127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} + +```python +legal_pipeline = nlp.PretrainedPipeline("legpipe_alias", "en", "legal/models") + +text = ["""MUTUAL NON-DISCLOSURE AGREEMENT +This Mutual Non-Disclosure Agreement (the “Agreement”) is made on _________ by and between: +John Snow Labs, a Delaware corporation, registered at 16192 Coastal Highway, Lewes, Delaware 19958 (“John Snow Labs”), and +Acentos, S.L, a Spanish corporation, registered at Gran Via 71, 2º floor (“Company”), (each a “party” and together the “parties”). +Recitals: +John Snow Labs and Company intend to explore the possibility of a business relationship between each other, whereby each party (“Discloser”) may disclose sensitive information to the other party (“Recipient”). +The parties agree as follows:"""] + +result = legal_pipeline.annotate(text) +``` + +
+ +## Results + +```bash +['(“John Snow Labs”)', '(“Company”)', '( “ Discloser ” )', '(“Recipient”)'] +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legpipe_alias| +|Type:|pipeline| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|13.1 KB| + +## Included Models + +- DocumentAssembler +- TokenizerModel +- ContextualParserModel diff --git a/docs/_posts/gadde5300/2023-04-21-leggen_flant5_base_en.md b/docs/_posts/gadde5300/2023-04-21-leggen_flant5_base_en.md new file mode 100644 index 0000000000..d7d92ba377 --- /dev/null +++ b/docs/_posts/gadde5300/2023-04-21-leggen_flant5_base_en.md @@ -0,0 +1,81 @@ +--- +layout: model +title: Legal FLAN-T5 Text Generation (Base) +author: John Snow Labs +name: leggen_flant5_base +date: 2023-04-21 +tags: [en, licensed, legal, flan_t5, generation, tensorflow] +task: Text Generation +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: LegalTextGenerator +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +FLAN-T5 is an instruction-finetuned version of the original T5 model, designed to generate higher-quality, more coherent text. It is trained on a large, diverse collection of texts and can summarize articles, documents, and other text inputs. This model can also be used to generate legal texts. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/leggen_flant5_base_en_1.0.0_3.0_1682073962277.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/leggen_flant5_base_en_1.0.0_3.0_1682073962277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("question") + +flant5 = legal.TextGenerator.pretrained('leggen_flant5_base','en','legal/models')\ + .setInputCols(["question"])\ + .setOutputCol("summary")\ + .setMaxNewTokens(150)\ + .setStopAtEos(True) + +pipeline = nlp.Pipeline(stages=[document_assembler, flant5]) +data = spark.createDataFrame([ + [1, "Explain loan Clauses"] +]).toDF('id', 'text') +results = pipeline.fit(data).transform(data) +results.select("summary.result").show(truncate=False) +``` + +
+ +## Results + +```bash ++--------------------------------------------------------------------------------------------+ +|result | ++--------------------------------------------------------------------------------------------+ +|[Loan clauses are clauses in the U.S. Constitution that provide for the repayment of loans.]| ++--------------------------------------------------------------------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leggen_flant5_base| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|920.9 MB| diff --git a/docs/_posts/gadde5300/2023-04-29-leggen_flant5_finetuned_en.md b/docs/_posts/gadde5300/2023-04-29-leggen_flant5_finetuned_en.md new file mode 100644 index 0000000000..2494b54fec --- /dev/null +++ b/docs/_posts/gadde5300/2023-04-29-leggen_flant5_finetuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: Legal Finetuned FLAN-T5 Text Generation +author: John Snow Labs +name: leggen_flant5_finetuned +date: 2023-04-29 +tags: [en, legal, text_generation, licensed, tensorflow] +task: Text Generation +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: LegalTextGenerator +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This text generation model was fine-tuned from FLAN-T5 on legal texts. FLAN-T5 is an instruction-finetuned language model developed by Google that uses the T5 architecture for text generation tasks. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/leggen_flant5_finetuned_en_1.0.0_3.0_1682797013244.zip){:.button.button-orange} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/leggen_flant5_finetuned_en_1.0.0_3.0_1682797013244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +document_assembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("question") + +flant5 = legal.TextGenerator.pretrained('leggen_flant5_finetuned','en','legal/models')\ + .setInputCols(["question"])\ + .setOutputCol("generated_text")\ + .setMaxNewTokens(150)\ + .setStopAtEos(True) + +pipeline = nlp.Pipeline(stages=[document_assembler, flant5]) + +data = spark.createDataFrame([ + [1,'''This exhibit has been redacted and is the subject of a confidential treatment request. redacted material is marked with [* * *] and has been filed separately with the securities and exchange commission. this agreement (this "agreement"), dated december 30, 2016 (the "effective date"), is'''] +]).toDF('id', 'text') +results = pipeline.fit(data).transform(data) +results.select("generated_text.result").show(truncate=False) +``` + +
+ +## Results + +```bash ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|result | ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|[The parties agree that this Agreement shall be binding upon and inure to the benefit of the parties, their successors and assigns. The parties further agree that any disputes arising out of or related to this Agreement shall be resolved through binding arbitration. The parties agree to submit to binding arbitration in accordance with the rules of the American Arbitration Association]| ++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|leggen_flant5_finetuned| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +In house annotated data