diff --git a/docs/_posts/gadde5300/2024-05-17-legner_sec_edgar_le_en.md b/docs/_posts/gadde5300/2024-05-17-legner_sec_edgar_le_en.md new file mode 100644 index 0000000000..fd94e4c4a9 --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-17-legner_sec_edgar_le_en.md @@ -0,0 +1,131 @@ +--- +layout: model +title: Legal NER on EDGAR Documents +author: John Snow Labs +name: legner_sec_edgar_le +date: 2024-05-17 +tags: [en, ner, legal, sec, edgar, licensed] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This Legal NER model extracts `ORG`, `INST`, `LAW`, `COURT`, `PER`, `LOC`, `MISC`, `ALIAS`, and `TICKER` entities from US SEC EDGAR documents. It was trained using custom legal word embeddings. + +## Predicted Entities + +`ORG`, `INST`, `LAW`, `COURT`, `PER`, `LOC`, `MISC`, `ALIAS`, `TICKER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_sec_edgar_le_en_1.0.0_3.0_1715941721099.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_sec_edgar_le_en_1.0.0_3.0_1715941721099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("legal_word_embeddings", "en", "legal/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model = legal.NerModel.pretrained("legner_sec_edgar_le", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = ["""In our opinion, the accompanying consolidated balance sheets and the related consolidated statements of operations, of changes in stockholders' equity, and of cash flows present fairly, in all material respects, the financial position of SunGard Capital Corp. II and its subsidiaries ( SCC II ) at December 31, 2010, and 2009, and the results of their operations and their cash flows for each of the three years in the period ended December 31, 2010, in conformity with accounting principles generally accepted in the United States of America."""] + + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
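The `ner` column produced by the pipeline above holds one tag per token, and `NerConverter` merges consecutive tags into the chunks reported in the results. The sketch below imitates that grouping in plain Python, assuming IOB-style tags (`B-`, `I-`, `O`) and a hand-written tokenization of the example sentence; `group_iob` is a hypothetical helper written for illustration, not part of Spark NLP.

```python
def group_iob(tokens, tags):
    """Merge IOB-tagged tokens into (chunk, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts: flush whatever was open.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Assumed tokenization and tags for a fragment of the example text.
tokens = ["SunGard", "Capital", "Corp.", "II", "and", "its",
          "subsidiaries", "(", "SCC", "II", ")"]
tags = ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O",
        "O", "O", "B-ALIAS", "I-ALIAS", "O"]

chunks = group_iob(tokens, tags)
for chunk, label in chunks:
    print(f"{chunk} -> {label}")
```

Joining tokens with a single space is a simplification; the real converter works from character offsets, so chunk text there follows the original spacing of the document.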
+ +## Results + +```bash ++----------------------------------------+-----+ +|chunk |label| ++----------------------------------------+-----+ +|SunGard Capital Corp. II |ORG | +|SCC II |ALIAS| +|accounting principles generally accepted|LAW | +|United States of America |LOC | ++----------------------------------------+-----+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_sec_edgar_le| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.6 MB| + +## References + +In-house annotations + +## Benchmarking + +```bash + precision recall f1-score support +ALIAS 0.88 0.87 0.87 84 +COURT 1.00 1.00 1.00 6 +INST 0.94 0.83 0.88 76 +LAW 0.92 0.91 0.91 166 +LOC 0.93 0.91 0.92 140 +MISC 0.88 0.84 0.86 226 +ORG 0.91 0.95 0.93 430 +PER 0.97 0.94 0.95 66 +TICKER 1.00 0.86 0.92 7 +micro-avg 0.91 0.90 0.91 1201 +macro-avg 0.94 0.90 0.92 1201 +weighted-avg 0.91 0.90 0.91 1201 +``` diff --git a/docs/_posts/gadde5300/2024-05-21-legal_word_embeddings_en.md b/docs/_posts/gadde5300/2024-05-21-legal_word_embeddings_en.md new file mode 100644 index 0000000000..06bf2e1f44 --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-21-legal_word_embeddings_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: Legal Word Embeddings +author: John Snow Labs +name: legal_word_embeddings +date: 2024-05-21 +tags: [legal, word_embeddings, en, licensed] +task: Embeddings +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: WordEmbeddingsModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The word embedding models were based on Word2Vec, trained on a mix of different datasets. We used public data and in-house annotated documents. 
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legal_word_embeddings_en_1.0.0_3.0_1716300540404.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legal_word_embeddings_en_1.0.0_3.0_1716300540404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +model = nlp.WordEmbeddingsModel.pretrained("legal_word_embeddings","en","legal/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_word_embeddings| +|Type:|embeddings| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[word_embeddings]| +|Language:|en| +|Size:|84.9 MB| +|Case sensitive:|false| +|Dimension:|200| + +## References + +Public data and in-house annotated documents \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-05-21-legner_deid_le_en.md b/docs/_posts/gadde5300/2024-05-21-legner_deid_le_en.md new file mode 100644 index 0000000000..c6dfa31dcd --- /dev/null +++ b/docs/_posts/gadde5300/2024-05-21-legner_deid_le_en.md @@ -0,0 +1,137 @@ +--- +layout: model +title: Generic Deidentification NER (Legal) +author: John Snow Labs +name: legner_deid_le +date: 2024-05-21 +tags: [en, legal, ner, deid, deidentification, licensed] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a Legal NER model trained using custom legal embeddings. It detects generic entities that may need to be masked or obfuscated to comply with regulations such as GDPR and CCPA. This is only an NER model; make sure you also try the full De-identification pipelines available in Models Hub. 
+ +## Predicted Entities + +`AGE`, `CITY`, `COUNTRY`, `DATE`, `EMAIL`, `FAX`, `LOCATION-OTHER`, `ORG`, `PERSON`, `PHONE`, `PROFESSION`, `STATE`, `STREET`, `URL`, `ZIP` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_deid_le_en_1.0.0_3.0_1716291298762.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_deid_le_en_1.0.0_3.0_1716291298762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("legal_word_embeddings", "en", "legal/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model =legal.NerModel.pretrained("legner_deid_le", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = [""" This LICENSE AND DEVELOPMENT AGREEMENT (this Agreement) is entered into effective as of Nov. 02, 2019 (the Effective Date) by and between Bioeq IP AG, having its principal place of business at 333 Twin Dolphin Drive, Suite 600, Redwood City, CA, 94065, USA (Licensee). """] + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
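The Description notes that the detected entities may need to be masked. As a rough illustration of that step, the sketch below replaces each detected span with its label in plain Python. It assumes each chunk comes with a begin offset and an inclusive end offset; `mask_entities` is a hypothetical helper, the sentence is shortened from the example text, and the offsets are derived with `str.find` purely for the demo. It is not a substitute for the full De-identification pipelines.

```python
def mask_entities(text, spans):
    """Replace each (begin, end, label) span with <LABEL>; `end` is inclusive."""
    pieces = []
    cursor = 0
    for begin, end, label in sorted(spans):
        pieces.append(text[cursor:begin])   # untouched text before the span
        pieces.append(f"<{label}>")         # the mask itself
        cursor = end + 1
    pieces.append(text[cursor:])            # tail after the last span
    return "".join(pieces)


# Shortened from the example sentence; chunk/label pairs as an NER step might report them.
text = "effective as of Nov. 02, 2019, at 333 Twin Dolphin Drive, Redwood City, CA"
found = [
    ("Nov. 02, 2019", "DATE"),
    ("333 Twin Dolphin Drive", "STREET"),
    ("Redwood City", "CITY"),
    ("CA", "STATE"),
]

# Derive offsets for the demo; a real pipeline reports them with each chunk.
spans = []
for chunk, label in found:
    begin = text.find(chunk)
    spans.append((begin, begin + len(chunk) - 1, label))

masked = mask_entities(text, spans)
print(masked)
```

Sorting the spans and walking the text once keeps the offsets valid even though the masks have different lengths than the text they replace.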
+ +## Results + +```bash ++----------------------+------+ +|chunk |label | ++----------------------+------+ +|Nov. 02, 2019 |DATE | +|333 Twin Dolphin Drive|STREET| +|Redwood City |CITY | +|CA |STATE | ++----------------------+------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_deid_le| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.8 MB| + +## References + +In-house annotated documents with protected information + +## Benchmarking + +```bash + precision recall f1-score support +AGE 0.97 0.97 0.97 266 +CITY 0.85 0.76 0.80 120 +COUNTRY 0.89 0.63 0.74 38 +DATE 0.98 0.98 0.98 2206 +EMAIL 1.00 1.00 1.00 1 +FAX 0.00 0.00 0.00 2 +LOCATION-OTHER 1.00 0.50 0.67 6 +ORG 0.69 0.48 0.56 42 +PERSON 0.96 0.96 0.96 1295 +PHONE 0.84 0.85 0.85 62 +PROFESSION 0.80 0.54 0.65 76 +STATE 0.94 0.93 0.94 90 +STREET 0.95 0.90 0.92 81 +URL 0.00 0.00 0.00 1 +ZIP 0.97 0.96 0.96 67 +micro-avg 0.96 0.95 0.95 4353 +macro-avg 0.79 0.70 0.73 4353 +weighted-avg 0.96 0.95 0.95 4353 + +``` \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-07-legner_contract_doc_parties_le_en.md b/docs/_posts/gadde5300/2024-06-07-legner_contract_doc_parties_le_en.md new file mode 100644 index 0000000000..db0852624d --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-07-legner_contract_doc_parties_le_en.md @@ -0,0 +1,159 @@ +--- +layout: model +title: Legal NER (Parties, Dates, Alias, Former names, Document Type) +author: John Snow Labs +name: legner_contract_doc_parties_le +date: 2024-06-07 +tags: [document, agreement, contract, type, parties, aliases, former, names, effective, dates, licensed, en] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## 
Description + +IMPORTANT: Don't run this model on the whole legal agreement. Instead: +- Split by paragraphs. You can use [notebook 1](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings) in Finance or Legal as inspiration; +- Use the `legclf_introduction_clause` Text Classifier to select only the introductory paragraphs; + +This is a Legal NER model designed to process the first page of an agreement, where information can be found about: +- Parties of the contract/agreement; +- Their former names; +- Aliases of those parties, or how those parties will be called further on in the document; +- Document Type; +- Effective Date of the agreement; +- Other organizations; + +This model can be used together with its Relation Extraction model, `legre_contract_doc_parties`, to retrieve the relations between these entities. + +## Predicted Entities + +`EFFDATE`, `PARTY`, `DOC`, `FORMER_NAME`, `ALIAS`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_contract_doc_parties_le_en_1.0.0_3.0_1717749001756.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_contract_doc_parties_le_en_1.0.0_3.0_1717749001756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("legal_word_embeddings", "en", "legal/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model = legal.NerModel.pretrained("legner_contract_doc_parties_le", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) + +text = [""" +INTELLECTUAL PROPERTY AGREEMENT + +This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties"). +"""] + +res = model.transform(spark.createDataFrame([text]).toDF("text")) +``` + +
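The Description recommends splitting the agreement into paragraphs before running the model. A minimal way to do that in plain Python, assuming paragraphs are separated by blank lines, is sketched below; `split_paragraphs` is a hypothetical helper and the sample text is abbreviated from the example above.

```python
import re


def split_paragraphs(document):
    """Split raw text on blank lines and normalize whitespace inside each piece."""
    pieces = re.split(r"\n\s*\n", document)
    # Drop empty pieces and collapse internal line breaks to single spaces.
    return [" ".join(piece.split()) for piece in pieces if piece.strip()]


agreement = (
    "INTELLECTUAL PROPERTY AGREEMENT\n"
    "\n"
    "This INTELLECTUAL PROPERTY AGREEMENT is entered into\n"
    "by and between the parties.\n"
    "\n"
    "\n"
    "1. Definitions."
)

paragraphs = split_paragraphs(agreement)
for number, paragraph in enumerate(paragraphs, start=1):
    print(number, paragraph)
```

Each resulting paragraph can then be passed to the introduction-clause classifier, and only the selected ones to the NER pipeline. Documents that mark paragraphs differently (for example with indentation only) would need another splitting rule.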
+ +## Results + +```bash ++-----------------------------------+-----------+ +|chunk |label | ++-----------------------------------+-----------+ +|INTELLECTUAL PROPERTY AGREEMENT |DOC | +|INTELLECTUAL PROPERTY AGREEMENT |DOC | +|December 31, 2018 |EFFDATE | +|Armstrong Flooring, Inc |PARTY | +|Seller |ALIAS | +|AFI Licensing LLC |PARTY | +|Licensing |ALIAS | +|Seller |PARTY | +|Arizona |ALIAS | +|AHF Holding, Inc |PARTY | +|Tarzan HoldCo, Inc |FORMER_NAME| +|Buyer |ALIAS | +|Armstrong Hardwood Flooring Company|PARTY | +|Company |ALIAS | +|Buyer |PARTY | +|Buyer Entities |ALIAS | +|Arizona |PARTY | +|Buyer Entities |PARTY | +|Party |ALIAS | +|Parties |ALIAS | ++-----------------------------------+-----------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_contract_doc_parties_le| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.7 MB| + +## References + +Manual annotations on CUAD dataset + +## Benchmarking + +```bash + precision recall f1-score support +ALIAS 0.86 0.94 0.90 118 +DOC 0.82 0.81 0.82 79 +EFFDATE 0.87 0.93 0.90 56 +FORMER_NAME 0.80 0.80 0.80 5 +ORG 0.76 0.75 0.76 122 +PARTY 0.84 0.81 0.82 209 +micro-avg 0.83 0.83 0.83 589 +macro-avg 0.83 0.84 0.83 589 +weighted-avg 0.83 0.83 0.83 589 +``` \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-10-legal_bge_base_embeddings_en.md b/docs/_posts/gadde5300/2024-06-10-legal_bge_base_embeddings_en.md new file mode 100644 index 0000000000..29234098fa --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-10-legal_bge_base_embeddings_en.md @@ -0,0 +1,64 @@ +--- +layout: model +title: Legal BGE Embeddings +author: John Snow Labs +name: legal_bge_base_embeddings +date: 2024-06-10 +tags: [legal, licensed, embeddings, bge, en, onnx] +task: Embeddings +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.2 +supported: true +engine: 
onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +The BGE embedding model was trained on a mix of different datasets. We used public data and in-house annotated documents. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legal_bge_base_embeddings_en_1.0.0_3.2_1718032892975.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legal_bge_base_embeddings_en_1.0.0_3.2_1718032892975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = nlp.BGEEmbeddings.pretrained("legal_bge_base_embeddings","en","legal/models")\ + .setInputCols("document")\ + .setOutputCol("embeddings") +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bge_base_embeddings| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[sentence_embeddings]| +|Language:|en| +|Size:|394.4 MB| + +## References + +Public data and in-house annotated documents \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-06-28-legner_subpoenas_sm_en.md b/docs/_posts/gadde5300/2024-06-28-legner_subpoenas_sm_en.md new file mode 100644 index 0000000000..181ddff40c --- /dev/null +++ b/docs/_posts/gadde5300/2024-06-28-legner_subpoenas_sm_en.md @@ -0,0 +1,134 @@ +--- +layout: model +title: Legal NER on Subpoenas (Small) +author: John Snow Labs +name: legner_subpoenas_sm +date: 2024-06-28 +tags: [legal, subpoenas, licensed, en] +task: Named Entity Recognition +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +annotator: LegalNerModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a Legal NER model trained using custom legal embeddings that extracts 19 entities from subpoenas. It is called a small version because it has been trained on more generic labels. Larger versions of this model will be available on Models Hub. 
+ +## Predicted Entities + +`COURT`, `APPOINTMENT_DATE`, `DEADLINE_DATE`, `DOCUMENT_DATE_FROM`, `ADDRESS`, `APPOINTMENT_HOUR`, `DOCUMENT_DATE_TO`, `DOCUMENT_PERSON`, `DOCUMENT_DATE_YEAR`, `STATE`, `MATTER_VS`, `CASE`, `COUNTY`, `DOCUMENT_TOPIC`, `MATTER`, `SUBPOENA_DATE`, `SIGNER`, `RECEIVER`, `DOCUMENT_TYPE` + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_subpoenas_sm_en_1.0.0_3.0_1719594943623.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legner_subpoenas_sm_en_1.0.0_3.0_1719594943623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = nlp.Tokenizer()\ + .setInputCols(["sentence"])\ + .setOutputCol("token") + +embeddings = nlp.WordEmbeddingsModel.pretrained("legal_word_embeddings", "en", "legal/models")\ + .setInputCols(["sentence","token"])\ + .setOutputCol("embeddings") + +ner_model = legal.NerModel.pretrained("legner_subpoenas_sm", "en", "legal/models")\ + .setInputCols(["sentence", "token", "embeddings"])\ + .setOutputCol("ner") + +ner_converter = nlp.NerConverter()\ + .setInputCols(["sentence","token","ner"])\ + .setOutputCol("ner_chunk") + +nlpPipeline = nlp.Pipeline(stages=[ + documentAssembler, + sentenceDetector, + tokenizer, + embeddings, + ner_model, + ner_converter]) + +empty_data = spark.createDataFrame([[""]]).toDF("text") + +model = nlpPipeline.fit(empty_data) +``` + +
+ +## Results + +```bash ++-------------------+-------------+ +|chunk |label | ++-------------------+-------------+ +|summary disposition|DOCUMENT_TYPE| ++-------------------+-------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legner_subpoenas_sm| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence, token, embeddings]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|14.7 MB| + +## References + +In House annotated dataset + +## Benchmarking + +```bash + precision recall f1-score support + ADDRESS 0.82 0.88 0.85 42 + APPOINTMENT_DATE 0.50 1.00 0.67 3 + APPOINTMENT_HOUR 1.00 1.00 1.00 2 + CASE 0.74 0.89 0.81 19 + COUNTY 0.33 0.50 0.40 2 + COURT 0.44 0.40 0.42 10 + DEADLINE_DATE 0.50 1.00 0.67 2 +DOCUMENT_DATE_FROM 0.67 0.86 0.75 7 + DOCUMENT_DATE_TO 0.71 0.83 0.77 6 +DOCUMENT_DATE_YEAR 0.50 0.50 0.50 4 + DOCUMENT_PERSON 0.82 0.79 0.81 1307 + DOCUMENT_TOPIC 0.63 0.62 0.62 94 + DOCUMENT_TYPE 0.87 0.89 0.88 783 + MATTER 0.92 0.86 0.89 94 + MATTER_VS 0.93 0.78 0.85 54 + RECEIVER 0.50 0.30 0.37 20 + SIGNER 0.62 0.65 0.63 20 + STATE 0.60 0.86 0.71 14 + SUBPOENA_DATE 0.24 0.57 0.33 7 + micro-avg 0.82 0.81 0.82 2490 + macro-avg 0.65 0.75 0.68 2490 + weighted-avg 0.82 0.81 0.82 2490 +``` diff --git a/docs/_posts/gadde5300/2024-07-04-legmulticlf_edgar_le_en.md b/docs/_posts/gadde5300/2024-07-04-legmulticlf_edgar_le_en.md new file mode 100644 index 0000000000..f3bb23855f --- /dev/null +++ b/docs/_posts/gadde5300/2024-07-04-legmulticlf_edgar_le_en.md @@ -0,0 +1,135 @@ +--- +layout: model +title: Legal Clauses Multilabel Classifier +author: John Snow Labs +name: legmulticlf_edgar_le +date: 2024-07-04 +tags: [clauses, edgar, ledgar, en, licensed, tensorflow] +task: Text Classification +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MultiClassifierDLModel +article_header: + type: cover +use_language_switcher: 
"Python-Scala-Java" +--- + +## Description + +This is a Multilabel Document Classification model, which can be used to identify up to 15 classes in texts. The classes are the following: + +- terminations +- assigns +- notices +- amendments +- waivers +- survival +- successors +- governing laws +- severability +- expenses +- assignments +- warranties +- representations +- entire agreements +- counterparts + +## Predicted Entities + +`terminations`, `assigns`, `notices`, `amendments`, `waivers`, `survival`, `successors`, `governing laws`, `severability`, `expenses`, `assignments`, `warranties`, `representations`, `entire agreements`, `counterparts` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_edgar_le_en_1.0.0_3.0_1720072778562.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_edgar_le_en_1.0.0_3.0_1720072778562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +embeddings = nlp.E5Embeddings.pretrained("legembedding_e5_base", "en", "legal/models")\ + .setInputCols(["document"])\ + .setOutputCol("sentence_embeddings") + + +multiClassifier = nlp.MultiClassifierDLModel.pretrained("legmulticlf_edgar_le", "en", "legal/models") \ + .setInputCols(["document", "sentence_embeddings"]) \ + .setOutputCol("class") + +ledgar_pipeline = nlp.Pipeline( + stages=[document, + embeddings, + multiClassifier]) + + +light_pipeline = nlp.LightPipeline(ledgar_pipeline.fit(spark.createDataFrame([['']]).toDF("text"))) + +result = light_pipeline.annotate("""(a) No failure or delay by the Administrative Agent or any Lender in exercising any right or power hereunder shall operate as a waiver thereof, nor shall any single or partial exercise of any such right or power, or any abandonment or discontinuance of steps to enforce such a right or power, preclude any other or further exercise thereof or the exercise of any other right or power. The rights and remedies of the Administrative Agent and the Lenders hereunder are cumulative and are not exclusive of any rights or remedies that they would otherwise have. No waiver of any provision of this Agreement or consent to any departure by the Borrower therefrom shall in any event be effective unless the same shall be permitted by paragraph (b) of this Section, and then such waiver or consent shall be effective only in the specific instance and for the purpose for which given. Without limiting the generality of the foregoing, the making of a Loan shall not be construed as a waiver of any Default, regardless of whether the Administrative Agent or any Lender may have had notice or knowledge of such Default at the time.""") + +result["class"] +``` + +
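A multilabel classifier scores every class independently and keeps each class whose score clears a threshold, so one clause can receive several labels or none at all. The sketch below shows that selection step in plain Python, assuming the usual per-class thresholding scheme; the scores are made up for illustration and `select_labels` is a hypothetical helper, not the annotator's API.

```python
def select_labels(scores, threshold=0.5):
    """Return every label whose score is at least `threshold`, best first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Hypothetical per-class scores for a single clause.
scores = {"waivers": 0.97, "amendments": 0.81, "notices": 0.12, "survival": 0.03}

labels = select_labels(scores)
print(labels)

# A stricter threshold can leave the clause without any label.
print(select_labels(scores, threshold=0.99))
```

Raising the threshold trades recall for precision, which is worth keeping in mind when reading per-class benchmark figures.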
+ +## Results + +```bash +['waivers', 'amendments'] +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legmulticlf_edgar_le| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[class]| +|Language:|en| +|Size:|14.0 MB| + +## References + +Ledgar dataset, available at https://metatext.io/datasets/ledgar, with in-house data + +## Benchmarking + +```bash +Classification report: + precision recall f1-score support + + 0 0.89 0.89 0.89 1066 + 1 0.83 0.65 0.73 333 + 2 0.80 0.81 0.80 537 + 3 0.99 0.99 0.99 918 + 4 0.98 0.98 0.98 1049 + 5 0.99 0.97 0.98 339 + 6 1.00 0.99 0.99 1274 + 7 0.98 0.98 0.98 926 + 8 0.91 0.92 0.91 437 + 9 0.98 0.97 0.98 922 + 10 0.89 0.88 0.88 674 + 11 0.95 0.96 0.95 566 + 12 0.92 0.79 0.85 354 + 13 0.89 0.87 0.88 725 + 14 0.88 0.78 0.83 365 + + micro avg 0.94 0.92 0.93 10485 + macro avg 0.93 0.89 0.91 10485 +weighted avg 0.94 0.92 0.93 10485 + samples avg 0.93 0.94 0.93 10485 +``` \ No newline at end of file diff --git a/docs/_posts/gadde5300/2024-07-04-legmulticlf_mnda_sections_paragraph_other_le_en.md b/docs/_posts/gadde5300/2024-07-04-legmulticlf_mnda_sections_paragraph_other_le_en.md new file mode 100644 index 0000000000..f602d0f7f5 --- /dev/null +++ b/docs/_posts/gadde5300/2024-07-04-legmulticlf_mnda_sections_paragraph_other_le_en.md @@ -0,0 +1,146 @@ +--- +layout: model +title: Multilabel Classification of NDA Clauses (paragraph, medium) +author: John Snow Labs +name: legmulticlf_mnda_sections_paragraph_other_le +date: 2024-07-04 +tags: [en, legal, licensed, mnda, tensorflow] +task: Text Classification +language: en +edition: Legal NLP 1.0.0 +spark_version: 3.0 +supported: true +engine: tensorflow +annotator: MultiClassifierDLModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is a version of `legmulticlf_mnda_sections_other` (sentence, medium) but expecting 
a bigger-than-sentence context, ideally 2 to 5 sentences or a small paragraph, so that the model has more context to work with. + +It should be run on sentences of the NDA clauses, and will return 1..N labels for each of them. The possible clause types detected by this model in NDA / MNDA agreements are: + +1. Parties to the Agreement - Names of the Parties Clause +2. Identification of What Information Is Confidential - Definition of Confidential Information Clause +3. Use of Confidential Information: Permitted Use Clause and Obligations of the Recipient +4. Time Frame of the Agreement - Termination Clause +5. Return of Confidential Information Clause +6. Remedies for Breaches of Agreement - Remedies Clause +7. Non-Solicitation Clause +8. Dispute Resolution Clause +9. Exceptions Clause +10. Non-competition clause +11. Other: None of the above (synonym to `[]`) + +## Predicted Entities + +`APPLIC_LAW`, `ASSIGNMENT`, `DEF_OF_CONF_INFO`, `DISPUTE_RESOL`, `EXCEPTIONS`, `NAMES_OF_PARTIES`, `NON_COMP`, `NON_SOLIC`, `PREAMBLE`, `REMEDIES`, `REQ_DISCL`, `RETURN_OF_CONF_INFO`, `TERMINATION`, `USE_OF_CONF_INFO`, `OTHER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legmulticlf_mnda_sections_paragraph_other_le_en_1.0.0_3.0_1720071478051.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/legal/models/legmulticlf_mnda_sections_paragraph_other_le_en_1.0.0_3.0_1720071478051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document = nlp.DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document")\ + .setCleanupMode("shrink") + +embeddings = ( + nlp.E5Embeddings.pretrained( + "legembedding_e5_base", "en", "legal/models") + .setInputCols(["document"]) + .setOutputCol("sentence_embeddings") +) + +paragraph_classifier = ( + nlp.MultiClassifierDLModel.pretrained("legmulticlf_mnda_sections_paragraph_other_le", "en", "legal/models") + .setInputCols(["sentence_embeddings"]) + .setOutputCol("class") +) + + +sentence_pipeline = nlp.Pipeline( + stages=[document, + embeddings, + paragraph_classifier]) + + + + +df = spark.createDataFrame([["'Destruction of Confidential Information. \xa0 Promptly (and in any event within five days) after the earlier of"]]).toDF("text") + +model = sentence_pipeline.fit(df) + +result = model.transform(df) + +result.select("text", "class.result").show(truncate=False) +``` + +
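The `setCleanupMode("shrink")` call asks `DocumentAssembler` to normalize whitespace before the text is embedded. As described in the Spark NLP documentation, `shrink` removes new lines and tabs and merges repeated spaces into one. The plain-Python function below only approximates that behavior for illustration; `shrink` here is a hypothetical helper and the sample text is adapted from the example above, so treat it as a sketch rather than the library's implementation.

```python
import re


def shrink(text):
    """Approximate cleanup: newlines and tabs become spaces, runs of spaces collapse."""
    return re.sub(r"[ \t\r\n]+", " ", text)


raw = (
    "Destruction of Confidential Information.\n\n"
    "\tPromptly (and in   any event within five days)"
)

cleaned = shrink(raw)
print(cleaned)
```

Paragraphs copied from contracts often carry hard line breaks and indentation, so this kind of normalization keeps layout noise out of the embedding step.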
+ +## Results + +```bash ++-------------------------------------------------------------------------------------------------------------+---------------------+ +|text |result | ++-------------------------------------------------------------------------------------------------------------+---------------------+ +|'Destruction of Confidential Information. Promptly (and in any event within five days) after the earlier of|[RETURN_OF_CONF_INFO]| ++-------------------------------------------------------------------------------------------------------------+---------------------+ +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legmulticlf_mnda_sections_paragraph_other_le| +|Compatibility:|Legal NLP 1.0.0+| +|License:|Licensed| +|Edition:|Official| +|Input Labels:|[sentence_embeddings]| +|Output Labels:|[class]| +|Language:|en| +|Size:|14.0 MB| + +## References + +In-house MNDA + +## Benchmarking + +```bash +| | precision | recall | f1-score | support | +|---------------|-----------|--------|----------|---------| +| APPLIC_LAW | 0.91 | 0.91 | 0.91 | 58 | +| ASSIGNMENT | 0.96 | 0.87 | 0.91 | 52 | +| DEF_OF_CONF_INFO | 0.91 | 0.83 | 0.87 | 89 | +| DISPUTE_RESOL | 0.90 | 0.72 | 0.80 | 64 | +| EXCEPTIONS | 0.97 | 0.92 | 0.95 | 144 | +| NAMES_OF_PARTIES | 0.95 | 0.85 | 0.89 | 84 | +| NON_COMP | 0.80 | 0.80 | 0.80 | 25 | +| NON_SOLIC | 0.92 | 0.80 | 0.86 | 60 | +| PREAMBLE | 0.79 | 0.82 | 0.80 | 186 | +| REMEDIES | 0.91 | 0.79 | 0.85 | 76 | +| REQ_DISCL | 0.88 | 0.86 | 0.87 | 73 | +| RETURN_OF_CONF_INFO | 0.91 | 0.89 | 0.90 | 83 | +| TERMINATION | 0.98 | 0.86 | 0.92 | 96 | +| USE_OF_CONF_INFO | 0.85 | 0.85 | 0.85 | 47 | +| OTHER | 0.86 | 0.77 | 0.81 | 87 | +| **micro avg** | **0.90** | **0.84** | **0.87** | **1224**| +| **macro avg** | **0.90** | **0.84** | **0.87** | **1224**| +| **weighted avg** | **0.90** | **0.84** | **0.87** | **1224**| +| **samples avg** | **0.85** | **0.85** | **0.85** | **1224**| + +```
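The averaged rows of a classification report can be re-derived from the per-class rows, which is a quick sanity check for a table like the one above. The sketch below recomputes the macro-averaged precision (the unweighted mean over classes) and the total support from the per-class figures copied from the report; the variable names are illustrative.

```python
# (precision, support) per class, copied from the benchmarking table above.
rows = {
    "APPLIC_LAW": (0.91, 58),
    "ASSIGNMENT": (0.96, 52),
    "DEF_OF_CONF_INFO": (0.91, 89),
    "DISPUTE_RESOL": (0.90, 64),
    "EXCEPTIONS": (0.97, 144),
    "NAMES_OF_PARTIES": (0.95, 84),
    "NON_COMP": (0.80, 25),
    "NON_SOLIC": (0.92, 60),
    "PREAMBLE": (0.79, 186),
    "REMEDIES": (0.91, 76),
    "REQ_DISCL": (0.88, 73),
    "RETURN_OF_CONF_INFO": (0.91, 83),
    "TERMINATION": (0.98, 96),
    "USE_OF_CONF_INFO": (0.85, 47),
    "OTHER": (0.86, 87),
}

# Macro average: every class counts equally, regardless of its support.
macro_precision = sum(precision for precision, _ in rows.values()) / len(rows)

# Total support: number of gold labels across all classes.
total_support = sum(support for _, support in rows.values())

print(f"macro precision: {macro_precision:.2f}")  # 0.90
print(f"total support:   {total_support}")        # 1224
```

Both values agree with the `macro avg` row of the report. Because the macro average ignores support, small classes such as `NON_COMP` weigh as much as large ones such as `PREAMBLE`.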