# **Summarization and Knowledge Extraction**

This file contains code for all the methods to generte summaries and Knowledge graph using ChatGroq, into a json file for each text input

The following image is a sample json output after the summarization and Knowledge extraction

<img src="Images\Jsonoutput.jpg" width=600px>

###### **NOTE** : Use langchain virtual env created!

##### Import necessary modules and library

In [22]:
from typing import List, Dict, Any, Set
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from difflib import SequenceMatcher
from tqdm import tqdm

import os
import re
import time
import json

##### Create schema for the output from the LLM

In [23]:
# create a class for getting summary and relation entities from the chat groq
class Relation(BaseModel):
    source: str = Field(description="Source entity of the relation")
    target: str = Field(description="Target entity of the relation")
    relation: str = Field(description="Relation between the source and target entities")

class Entity(BaseModel):
    name: str = Field(description="Name of the entity", alias="entity")
    type: str = Field(description="Type of the entity")

class BankruptcyLevel(BaseModel):
    level: str = Field(description="Bankruptcy level of the company", alias="lvl")

class Summary(BaseModel):
    summary: str = Field(description="Summary of the input text")
    bankruptcy_level: BankruptcyLevel = Field(description="Bankruptcy level of the company")
    entities: List[Entity] = Field(description="Entities extracted from the input text")
    relations: List[Relation] = Field(description="Relations extracted from the input text")

In [3]:
class EntityNormalizer:
    def __init__(self):
        self.company_suffixes = {
            'limited', 'ltd', 'llc', 'inc', 'incorporated', 'corporation', 'corp',
            'enterprise', 'enterprises', 'company', 'co', 'group', 'holdings',
            'plc', 'ag', 'sa', 'nv', 'private', 'pvt'
        }
        self.known_entities = {}  # Maps normalized names to canonical names

    def normalize_name(self, name: str) -> str:
        name = name.lower()
        name = ' '.join(name.split())
        name = re.sub(r'[^\w\s&0-]', '', name)
        words = name.split()
        cleaned_words = [w for w in words if w not in self.company_suffixes]

        if cleaned_words:
            return ' '.join(cleaned_words)
        return name
    
    def are_similar_entities(self, name1, name2, threshold = 0.95):
        norm1 = self.normalize_name(name1)
        norm2 = self.normalize_name(name2)

        if norm1 == norm2:
            return True
        
        similarity = SequenceMatcher(None, norm1, norm2).ratio()
        return similarity >= threshold
    
    def get_canonical_name(self, name: str) -> str:
        normalized = self.normalize_name(name)

        for known_norm, canonical in self.known_entities.items():
            if self.are_similar_entities(normalized, known_norm):
                return canonical
            
        self.known_entities[normalized] = name
        return name

##### Summary and Knowledge Extractor template

In [24]:
# change the prompt to build kg based on the relation other than person to company !
# and also do chunking or just trim off the text to 6000 tokens and the daily token limit is 200000

class SnKExtractor:
    def __init__(self, api_key: str, model: str = "llama-3.1-70b-versatile"):
        os.environ["GROQ_API_KEY"] = api_key
        self.llm = ChatGroq(temperature=0.5, model_name = model)
        self.parser = PydanticOutputParser(pydantic_object=Summary)
        self.prompt = self._create_prompt()
        self.entity_normalizer = EntityNormalizer()
        # print(self.parser.get_format_instructions())

    def _create_prompt(self):
        template = """You are a financial Summarization and Knowledge Extraction System. Your task is to summarize and extract entities and relation from the given text and format them exactly according to the specified JSON structure. Only output the JSON structure, nothing else.

Extract the following information from the given financial text of a company:

Entities should be one of these types:
1. COMPANY
2. EVENT
3. PRODUCT

Relations should be one of these types:
1. PARTICIPATES_IN (COMPANY -> EVENT)
- Properties: Role (Organizer, Participant, Sponsor), Effect (-1 to 1)
2. PRODUCES (COMPANY -> PRODUCT)
- Properties: Production Volume, Production Start Date
3. MENTIONS (EVENTS -> COMPANY/PRODUCT)
- Properties: Sentiment (-1 to 1), Mention Count
4. OWNS (COMPANY -> COMPANY)
- Properties: Ownership Percentage, Acquisition Date
5. COMPETES_WITH (COMPANY -> COMPANY)
- Properties: Market Overlap Percentage
6. HAD_NEGATIVE_IMPACT_ON (EVENT -> COMPANY)
- Properties: Impact Level (0 to 1), Impact Type (Financial, Reputation, Legal), Reason
7. HAD_POSITIVE_IMPACT_ON (EVENT -> COMPANY)
- Properties: Impact Level (0 to 1), Impact Type (Financial, Reputation, Legal), Reason

Summary should be the main content of the whole text provided.
- Include the company's name and the bankruptcy level of the company.
- Include the reason for the bankruptcy and the impact of the bankruptcy on the company.
- Include the company's financial status and the company's future prospects.

Bankruptcy Level should be between (-1 to 1), where -1 is the lowest level of bankruptcy and 1 is the highest level of bankruptcy.
- Conclude corresponding to the sentiment of the company's financial status and future prospects.
- If the company would not be bankrupt, the bankruptcy level should be -1. (Healthy Company)
- If the comapny would be bankrupt, the bankruptcy level should be between 0.4 to 1. (Bankrupt Company)
- If the company is in a critical situation, the bankruptcy level should be between 0 to 0.4. (Critical Company)

Rules:
1. Use full Company names consistently 
2. Do not repeat the contents
3. Normalize company names (e.g., if "Apple Inc." and "Apple Corporation" refer to the same company, use one consistent name)
4. Output must be valid JSON format
5. Use only the predefined entities and relations type
6. The source and target have to be added as entities before forming a relation
7. Focus mainly on the impacts of events on the company in relation extractions
8. Impact level should be between 0 to 1 for both positive and negative impacts, 0 means no impact and 1 means the highest impact
9. Bankruptcy level should be between -1 to 1
10. Bankruptcy level should be concluded based on the sentiment of the company's financial status and future prospects
11. Bankruptcy level should be between 0.4 to 1 if the sentiment of the text is negative and -1 to 0 if the sentiment of the text is positive
12. Must create properties related to relations based on the given information in the text

Input text: {text}

{format_instructions}
"""
        return ChatPromptTemplate.from_template(template)
    
    def _clean_llm_output(self, output: str) -> str:
        try:
            start = output.find('{')
            end = output.rfind('}')
            if start == -1 or end == -1:
                raise ValueError("No JSON object found in output")
            json_str = output[start:end+1]
            return json.dumps(json.load(json_str))
        except Exception as e:
            raise ValueError(f"Failed to parse JSON from output: {e}")
        
    def _disambiguate_entities(self, result: Summary) -> Summary:
        name_mapping = {}
        unique_entities = {}

        for entity in result.entities:
            canonical_name = self.entity_normalizer.get_canonical_name(entity.name)
            name_mapping[entity.name] = canonical_name

            if canonical_name not in unique_entities:
                unique_entities[canonical_name] = entity.type
        
        new_entities = [
            Entity(name = name, type = etype) for name, etype in unique_entities.items()
        ]

        new_relations = []
        for _relation in result.relations:
            new_relation = Relation(
                source=name_mapping.get(_relation.source, _relation.source),
                target=name_mapping.get(_relation.target, _relation.target),
                relation=_relation.relation,
                properties=_relation.properties
            )
            new_relations.append(new_relation)
        
        return Summary(summary=result.summary, entities=new_entities, relations=new_relations, bankruptcy_level=result.bankruptcy_level)
    
    def extract_summary_and_knowledge(self, financial_snippet: str, output_dir: str) -> Summary:
        # Implement timer for each api call, can wait for a min after each request
        # time consuming but can work! Max reties can be 5
        retries = 0
        wait_time = 0
        while retries < 5:
            try:
                message = self.prompt.format_messages(
                    text=financial_snippet,
                    format_instructions=self.parser.get_format_instructions()
                )
                output = self.llm.invoke(message)

                #-----------------------------------
                # cleaned_output = self._clean_llm_output(output.model_dump()['content'])
                # print(cleaned_output)
                # result = self.parser.parse(cleaned_output)
                # print(result)
                # final_result = self._disambiguate_entities(result)
                # print(final_result)
                #-----------------------------------
                # os.makedirs(output_dir, exist_ok=True)
                # output_file = os.path.join(output_dir, "extraction_result.json")
                
                # with open(output_file, "w") as f:
                    # json.dump(final_result.model_dump(), f, indent=4)

                # return final_result
                retries = 0
                wait_time = 0
                return output
            except Exception as e:
                retries += 1
                wait_time += 60
                time.sleep(wait_time)
                print(F"Error during Summarization and Knowledge Extraction II: {str(e)}")
                return Summary(summary="Could not generate summary !", entities=[], relations=[], bankruptcy_level="0")

##### Testing on one file

In [None]:
# testing it on one file.
api_key = "your-api-key-goes-here"
summary_extractor = SnKExtractor(api_key)

bankrupt_file = r'Dataset\Final Dataset\Bankrupt\ADHUNIK_2015_MDA.txt'

with open(bankrupt_file, 'r') as f:
    text = f.read()

try:
    result = summary_extractor.extract_summary_and_knowledge(text, r".\output\bankrupt")
    os.makedirs(r".\output\bankrupt", exist_ok=True)
    output_file = os.path.join(r".\output\bankrupt", "ADHUNIK_2015_MDA.txt.json")
    with open(output_file, "w") as f:
        json.dump(result.model_dump(), f, indent=4)

except Exception as e:
    print(f"Error during Summarization and Knowledge Extraction III: {str(e)}")

In [12]:
# parsing the json output and saving it back to the file

extracted_text_path = r'output\bankrupt\ADHUNIK_2015_MDA.txt.json'
with open(extracted_text_path, 'r') as f:
    extracted_text = json.load(f)
    try:
        start = extracted_text['content'].find('{')
        end = extracted_text['content'].rfind('}')
        if start == -1 or end == -1:
            raise ValueError("No JSON object found in output")
        json_str = extracted_text['content'][start:end+1]
        # print(json_str) # use it to see the json output
        json_str = json.loads(json_str)
        with open(r'output\bankrupt\ADHUNIK_2015_MDA.txt.json', 'w') as f:
            json.dump(json_str, f, indent=4)

    except Exception as e:
        raise ValueError(f"Failed to parse JSON from output: {e}")

##### Summarization and Knowledge Extraction for all the files

do chunking!
```
Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9469, please reduce your message size and try again.
```

In [None]:
# create summary and ner on all the files!
api_keys = [
    "api-key-1",
    "api-key-2",
    "api-key-3"]
api_key = "api-key-1"
summary_extractor = SnKExtractor(api_keys[2])

bankrupt_files_path = r'Dataset\Phase-II\Bankrupt'
healthy_files_path = r'Dataset\Phase-II\Healthy'

bankrupt_files_output = r".\output\bankrupt"
healthy_files_output = r".\output\healthy"

In [None]:
# trying tqdm
total_files = len(os.listdir(bankrupt_files_path))
import time

for file in tqdm(os.listdir(bankrupt_files_path), total=total_files, desc="Processing files"):
    time.sleep(1)

In [None]:
for file in tqdm(os.listdir(bankrupt_files_path), total=len(os.listdir(bankrupt_files_path)), desc="Processing JSON files"):
    with open(os.path.join(bankrupt_files_path, file), 'r', encoding='utf-8') as f:
        text = f.read()
        # change the above to readline() to read the first main chunk of the text to avoid 413 error
        # for 429 error code try to wait or do stuff in batches like first 20, then wait for 15 mins then 20 and so on...

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, bankrupt_files_output)
            os.makedirs(bankrupt_files_output, exist_ok=True)
            output_file = os.path.join(bankrupt_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction IV: {str(e)}")


Processing JSON files:   2%|▏         | 5/201 [02:59<2:29:09, 45.66s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7181, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:   3%|▎         | 6/201 [04:00<2:45:18, 50.86s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9152, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:   3%|▎         | 7/201 [05:01<2:54:58, 54.12s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9583, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:   4%|▍         | 8/201 [06:02<3:01:02, 56.28s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 12921, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:   6%|▋         | 13/201 [08:35<2:11:44, 42.05s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 13053, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  10%|█         | 21/201 [13:09<1:58:54, 39.64s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7236, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  11%|█▏        | 23/201 [14:17<1:55:43, 39.01s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 15299, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  12%|█▏        | 24/201 [15:17<2:14:18, 45.53s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 6681, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  13%|█▎        | 26/201 [16:26<2:02:46, 42.10s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 8718, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  14%|█▍        | 28/201 [17:32<1:55:26, 40.04s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 8422, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  14%|█▍        | 29/201 [18:33<2:12:37, 46.27s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 8712, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  15%|█▍        | 30/201 [19:34<2:24:25, 50.67s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9368, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  15%|█▌        | 31/201 [20:35<2:32:19, 53.76s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9813, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  16%|█▌        | 32/201 [21:36<2:37:41, 55.98s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9810, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  16%|█▋        | 33/201 [22:37<2:40:51, 57.45s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 6777, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  17%|█▋        | 34/201 [23:38<2:42:44, 58.47s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 6469, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  24%|██▍       | 48/201 [30:43<1:54:44, 44.99s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7232, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  24%|██▍       | 49/201 [31:44<2:06:11, 49.82s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 10181, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  25%|██▍       | 50/201 [32:44<2:13:39, 53.11s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9174, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  25%|██▌       | 51/201 [33:45<2:18:32, 55.42s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 8614, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  26%|██▋       | 53/201 [34:50<1:54:04, 46.25s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7656, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  27%|██▋       | 54/201 [35:51<2:03:59, 50.61s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7763, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  27%|██▋       | 55/201 [36:52<2:10:42, 53.72s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 12568, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  29%|██▉       | 58/201 [38:45<1:52:38, 47.26s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 7095, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  30%|███       | 61/201 [39:55<1:24:29, 36.21s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 9717, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  31%|███       | 62/201 [40:57<1:41:12, 43.69s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 11608, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  31%|███▏      | 63/201 [41:58<1:52:29, 48.91s/it]

Error during Summarization and Knowledge Extraction II: Error code: 413 - {'error': {'message': 'Request too large for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on tokens per minute (TPM): Limit 6000, Requested 10345, please reduce your message size and try again. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  38%|███▊      | 77/201 [51:05<1:49:38, 53.05s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 198492, Requested 6207. Please try again in 33m49.745s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  39%|███▉      | 78/201 [52:05<1:53:31, 55.37s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 198351, Requested 6796. Please try again in 37m3.276s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  39%|███▉      | 79/201 [53:06<1:55:54, 57.00s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 198210, Requested 7694. Please try again in 42m30.497s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  40%|███▉      | 80/201 [54:07<1:57:16, 58.16s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 198070, Requested 8513. Please try again in 47m23.472s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  40%|████      | 81/201 [55:08<1:57:46, 58.89s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197929, Requested 2528. Please try again in 3m17.238999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  41%|████      | 82/201 [56:08<1:57:47, 59.39s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197789, Requested 2901. Please try again in 4m57.819s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  41%|████▏     | 83/201 [57:09<1:57:32, 59.77s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197649, Requested 4137. Please try again in 12m51.141s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  42%|████▏     | 84/201 [58:10<1:57:07, 60.06s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197508, Requested 5365. Please try again in 20m40.88s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  42%|████▏     | 85/201 [59:10<1:56:32, 60.28s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197367, Requested 6113. Please try again in 25m3.207s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  43%|████▎     | 86/201 [1:00:11<1:55:43, 60.38s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197227, Requested 5275. Please try again in 18m0.569999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  43%|████▎     | 87/201 [1:01:12<1:54:50, 60.45s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 197087, Requested 6776. Please try again in 27m48.411s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  44%|████▍     | 88/201 [1:02:13<1:54:15, 60.66s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 196945, Requested 3907. Please try again in 6m7.827999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  45%|████▍     | 90/201 [1:03:25<1:33:09, 50.35s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 200171, Requested 4169. Please try again in 31m15.160999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  45%|████▌     | 91/201 [1:04:26<1:38:03, 53.49s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 200030, Requested 4228. Please try again in 30m39.761999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  46%|████▌     | 92/201 [1:05:27<1:41:08, 55.68s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199891, Requested 4145. Please try again in 29m3.336s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  46%|████▋     | 93/201 [1:06:28<1:43:09, 57.31s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199749, Requested 18111. Please try again in 2h8m35.277999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  47%|████▋     | 94/201 [1:07:30<1:44:34, 58.64s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199606, Requested 24214. Please try again in 2h51m30.101999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  47%|████▋     | 95/201 [1:08:32<1:45:25, 59.67s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199463, Requested 18554. Please try again in 2h9m42.963s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  48%|████▊     | 96/201 [1:09:48<1:53:22, 64.79s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199285, Requested 6099. Please try again in 38m45.693s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


Processing JSON files:  48%|████▊     | 97/201 [1:10:50<1:50:31, 63.77s/it]

Error during Summarization and Knowledge Extraction II: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama-3.1-70b-versatile` in organization `org_01jb6awgzkfy7bjrz3wgh93byw` on : Limit 200000, Used 199143, Requested 3382. Please try again in 18m10.522s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': '', 'code': 'rate_limit_exceeded'}}
Error during Summarization and Knowledge Extraction IV: 1 validation error for Summary
bankruptcy_level
  Input should be a valid dictionary or instance of BankruptcyLevel [type=model_type, input_value='0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type


In [6]:
# parsing the json output and saving it back to the file
result_files = os.listdir(bankrupt_files_output)

for file in tqdm(result_files, total=len(result_files), desc="Rewriting Valid JSON"):
    extracted_text_path = os.path.join(bankrupt_files_output, file)
    with open(extracted_text_path, 'r', encoding='utf-8') as f:
        extracted_text = json.load(f)
        try:
            start = extracted_text['content'].find('{')
            end = extracted_text['content'].rfind('}')
            if start==-1 or end == -1:
                raise ValueError("No JSON object found in output")
            json_str  = extracted_text['content'][start: end+1]
            json_str = json.loads(json_str)
            with open(extracted_text_path, 'w', encoding='utf-8') as f:
                json.dump(json_str, f, indent=4)
        except Exception as e:
            raise ValueError(f"Failed to parse JSON from output: {e}")

Rewriting Valid JSON: 100%|██████████| 50/50 [00:00<00:00, 167.60it/s]


###### Run these as chunks to avoid error code 429 (request limit exceeded)

In [6]:
for file in tqdm(os.listdir(healthy_files_path)[:30], total=len(os.listdir(healthy_files_path)), desc="Processing Healthy Files"):
    with open(os.path.join(healthy_files_path, file), 'r', encoding='utf-8') as f:
        text = f.readline()

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, healthy_files_output)
            os.makedirs(healthy_files_output, exist_ok=True)
            output_file = os.path.join(healthy_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction V: {str(e)}")

Processing Healthy Files:  11%|█         | 30/280 [30:41<4:15:48, 61.39s/it]


In [None]:
for file in tqdm(os.listdir(healthy_files_path)[31:60], total=len(os.listdir(healthy_files_path)[31:60]), desc="Processing Healthy Files"):
    with open(os.path.join(healthy_files_path, file), 'r', encoding='utf-8') as f:
        text = f.readline()

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, healthy_files_output)
            os.makedirs(healthy_files_output, exist_ok=True)
            output_file = os.path.join(healthy_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction V: {str(e)}")

In [None]:
for file in os.listdir(healthy_files_path)[61:90]:
    with open(os.path.join(healthy_files_path, file), 'r', encoding='utf-8') as f:
        text = f.readline()

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, healthy_files_output)
            os.makedirs(healthy_files_output, exist_ok=True)
            output_file = os.path.join(healthy_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction V: {str(e)}")

In [None]:
for file in os.listdir(healthy_files_path)[91:120]:
    with open(os.path.join(healthy_files_path, file), 'r', encoding='utf-8') as f:
        text = f.readline()

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, healthy_files_output)
            os.makedirs(healthy_files_output, exist_ok=True)
            output_file = os.path.join(healthy_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction V: {str(e)}")

In [None]:
for file in os.listdir(healthy_files_path)[121:150]:
    with open(os.path.join(healthy_files_path, file), 'r', encoding='utf-8') as f:
        text = f.readline()

        try:
            result = summary_extractor.extract_summary_and_knowledge(text, healthy_files_output)
            os.makedirs(healthy_files_output, exist_ok=True)
            output_file = os.path.join(healthy_files_output, f"{file}.json")
            with(open(output_file, 'w')) as f:
                json.dump(result.model_dump(), f, indent=4)
        except Exception as e:
            print(f"Error during Summarization and Knowledge Extraction V: {str(e)}")

In [8]:
# parsing the json output and saving it back to the file
result_files = os.listdir(healthy_files_output)

for file in tqdm(result_files, total=len(result_files), desc="Rewriting Valid JSON"):
    extracted_text_path = os.path.join(healthy_files_output, file)
    with open(extracted_text_path, 'r', encoding='utf-8') as f:
        extracted_text = json.load(f)
        try:
            start = extracted_text['content'].find('{')
            end = extracted_text['content'].rfind('}')
            if start==-1 or end == -1:
                raise ValueError("No JSON object found in output")
            json_str  = extracted_text['content'][start: end+1]
            json_str = json.loads(json_str)
            with open(extracted_text_path, 'w', encoding='utf-8') as f:
                json.dump(json_str, f, indent=4)
        except Exception as e:
            raise ValueError(f"Failed to parse JSON from output: {e}")

Rewriting Valid JSON: 100%|██████████| 50/50 [00:00<00:00, 140.37it/s]


Sample Intermediate json file example:

```json
{
    "content": "```\n{\n  \"summary\": \"ABG Shipyard Limited experienced a challenging financial year in 2012-2013, with the global economy and shipbuilding industry facing significant downturns. The company's financial status was affected by the decline in new shipbuilding orders and the subsequent discontinuation of the Shipbuilding Subsidy Scheme. However, the Indian government has been supportive of the industry and has taken various initiatives to improve the efficiency and productivity of domestic shipbuilding companies. The company's future prospects remain bright, with opportunities for growth in the maritime business and potential for development of the shipping sector.\",\n  \"bankruptcy_level\": {\"lvl\": \"0\"},\n  \"entities\": [\n    {\"entity\": \"ABG Shipyard Limited\", \"type\": \"COMPANY\"},\n    {\"entity\": \"Nisar & Kumar Chartered Accountants\", \"type\": \"COMPANY\"},\n    {\"entity\": \"M. N. Ahmed\", \"type\": \"PERSON\"},\n    {\"entity\": \"F. R. No. 107117W\", \"type\": \"PRODUCT\"},\n    {\"entity\": \"Shipyards Association of India (SAI)\", \"type\": \"COMPANY\"},\n    {\"entity\": \"Associate Chambers of Commerce (ASSOCHAM)\", \"type\": \"COMPANY\"},\n    {\"entity\": \"Reserve Bank of India\", \"type\": \"COMPANY\"},\n    {\"entity\": \"Indian Government\", \"type\": \"COMPANY\"},\n    {\"entity\": \"Union budget 2013-14\", \"type\": \"EVENT\"}\n  ],\n  \"relations\": [\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"M. N. Ahmed\", \"relation\": \"EMPLOYS\"},\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"Shipyards Association of India (SAI)\", \"relation\": \"PARTICIPATES_IN\"},\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"Associate Chambers of Commerce (ASSOCHAM)\", \"relation\": \"PARTICIPATES_IN\"},\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"Reserve Bank of India\", \"relation\": \"PARTICIPATES_IN\"},\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"Indian Government\", \"relation\": \"PARTICIPATES_IN\"},\n    {\"source\": \"ABG Shipyard Limited\", \"target\": \"Union budget 2013-14\", \"relation\": \"PARTICIPATES_IN\"},\n    {\"source\": \"Nisar & Kumar Chartered Accountants\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"M. N. Ahmed\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"Shipyards Association of India (SAI)\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"Associate Chambers of Commerce (ASSOCHAM)\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"Reserve Bank of India\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"Indian Government\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"},\n    {\"source\": \"Union budget 2013-14\", \"target\": \"ABG Shipyard Limited\", \"relation\": \"MENTIONS\"}\n  ]\n}\n```",
    "additional_kwargs": {},
    "response_metadata": {
        "token_usage": {
            "completion_tokens": 723,
            "prompt_tokens": 2500,
            "total_tokens": 3223,
            "completion_time": 2.892,
            "prompt_time": 0.490955819,
            "queue_time": 0.02852289899999999,
            "total_time": 3.382955819
        },
        "model_name": "llama-3.1-70b-versatile",
        "system_fingerprint": "fp_b3ae7e594e",
        "finish_reason": "stop",
        "logprobs": null
    },
    "type": "ai",
    "name": null,
    "id": "run-b98d84d2-7a6c-421f-8ea5-0a2ce32ef718-0",
    "example": false,
    "tool_calls": [],
    "invalid_tool_calls": [],
    "usage_metadata": {
        "input_tokens": 2500,
        "output_tokens": 723,
        "total_tokens": 3223
    }
}
```

Final sample json file result.

```json
{
    "summary": "ABG Shipyard Limited is a company that operates in the shipbuilding industry. The company's financial year 2012-13 was challenging due to the global economic downturn. The Indian economy experienced a low growth rate, and the shipbuilding industry was affected by the decline in new shipbuilding orders. However, the company remains optimistic about its future prospects, citing the favorable demographics and the directional commitment towards liberalization in the Indian economy. The company's bankruptcy level is -1, indicating that it is a healthy company.",
    "bankruptcy_level": {
        "lvl": "-1"
    },
    "entities": [
        {
            "entity": "ABG Shipyard Limited",
            "type": "COMPANY"
        },
        {
            "entity": "Nisar & Kumar Chartered Accountants",
            "type": "COMPANY"
        },
        {
            "entity": "M. N. Ahmed",
            "type": "PERSON"
        },
        {
            "entity": "Indian Government",
            "type": "COMPANY"
        },
        {
            "entity": "Reserve Bank of India",
            "type": "COMPANY"
        },
        {
            "entity": "Associate Chambers of Commerce",
            "type": "COMPANY"
        },
        {
            "entity": "Shipyards Association of India",
            "type": "COMPANY"
        }
    ],
    "relations": [
        {
            "source": "ABG Shipyard Limited",
            "target": "M. N. Ahmed",
            "relation": "EMPLOYS"
        },
        {
            "source": "ABG Shipyard Limited",
            "target": "Indian Government",
            "relation": "PARTICIPATES_IN"
        },
        {
            "source": "Reserve Bank of India",
            "target": "ABG Shipyard Limited",
            "relation": "PARTICIPATES_IN"
        },
        {
            "source": "Associate Chambers of Commerce",
            "target": "ABG Shipyard Limited",
            "relation": "MENTIONS"
        },
        {
            "source": "Shipyards Association of India",
            "target": "ABG Shipyard Limited",
            "relation": "MENTIONS"
        }
    ]
}
```