<a href="https://colab.research.google.com/github/SUPERREALCODER/buisness_report_rag/blob/main/buisness_report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip3 install llmware

Collecting llmware
  Downloading llmware-0.2.14-py3-none-any.whl (56.0 MB)
Collecting boto3==1.24.53 (from llmware)
  Downloading boto3-1.24.53-py3-none-any.whl (132 kB)
Collecting openai>=1.0.0 (from llmware)
  Downloading openai-1.30.1-py3-none-any.whl (320 kB)
Collecting pymongo>=4.7.0 (from llmware)
  Downloading pymongo-4.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (670 kB)
Collecting Wikipedia-API==0.6.0 (from llmware)
  Downloading Wikipedia_API-0.6.0-py3-none-any.whl
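
The log above shows pip resolving `llmware-0.2.14`. When re-running the notebook later, it can be useful to confirm which version actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of llmware.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the resolved version, or None if llmware is not installed
print("llmware version:", installed_version("llmware"))
```

If results need to be reproducible, pinning the version at install time (for example `!pip3 install llmware==0.2.14`, the version in the log above) is one option.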

In [None]:
import os
import re
from llmware.prompts import Prompt, HumanInTheLoop
from llmware.setup import Setup
from llmware.configs import LLMWareConfig
from llmware.retrieval import Query
from llmware.library import Library


In [None]:

"""     Fast Start Example #4 - RAG with Text Query

    This example shows a basic RAG recipe that combines a text query with an LLM prompt.

    We show two different ways to implement this recipe:

    -- Example 4A - integrates Library + Prompt, and is the most scalable general solution

    -- Example 4B - illustrates another capability of the Prompt class: adding sources "inline"
     without necessarily having a library in place.  It is a useful tool when you want to quickly
     pick up a document and start asking it questions.

     Note: both examples are designed to produce the same output.

"""



def example_4a_contract_analysis_from_library (model_name, verbose=False):

    """ Example #4a:  Main general case to run a RAG workflow from a Library """

    # Load the llmware sample files
    print (f"\n > Loading the llmware sample files...")

    contracts_path = "/root/document"

    contracts_lib = Library().create_new_library("example4_library")
    contracts_lib.add_files(contracts_path)

    # questions to ask of each document (here, annual reports rather than contracts)
    question_list = [
    {"topic": "Company Overview", "llm_query": "What is the mission statement of the company?"},
    {"topic": "Financial Highlights", "llm_query": "What are the key financial highlights for the year?"},
    {"topic": "Revenue", "llm_query": "What is the total revenue reported for the year?"},
    {"topic": "Net Income", "llm_query": "What is the net income for the year?"},
    {"topic": "Cash Flow", "llm_query": "What are the net cash flows from operating activities?"},
    {"topic": "Auditor's Report", "llm_query": "What is the auditor's opinion on the financial statements?"},
    {"topic": "Risk Factors", "llm_query": "What are the primary risk factors mentioned?"},
    {"topic": "Market Position", "llm_query": "How does the company describe its market position?"},
    {"topic": "Board of Directors", "llm_query": "Who are the members of the board of directors?"},
    {"topic": "Executive Compensation", "llm_query": "What is the total compensation for the CEO?"},
    {"topic": "Strategic Initiatives", "llm_query": "What are the key strategic initiatives outlined for the upcoming year?"},
    {"topic": "Dividends", "llm_query": "What is the company's dividend policy?"},
    {"topic": "Environmental Initiatives", "llm_query": "What environmental initiatives did the company undertake?"},
    {"topic": "Corporate Social Responsibility", "llm_query": "What CSR activities are highlighted in the report?"},
    {"topic": "Shareholder Information", "llm_query": "How did the company's stock perform over the past year?"},
    {"topic": "Business Segments", "llm_query": "What are the different business segments and their revenues?"},
    {"topic": "Governance", "llm_query": "What governance practices are described in the report?"},
    {"topic": "Future Outlook", "llm_query": "What is the management's outlook for the next fiscal year?"},
    {"topic": "Legal Proceedings", "llm_query": "Are there any significant legal proceedings mentioned?"},
    {"topic": "Research and Development", "llm_query": "How much did the company spend on research and development?"}
    ]


    print (f"\n > Loading model {model_name}...")

    q = Query(contracts_lib)

    # get a list of all of the unique documents in the library

    # doc id list
    doc_list = q.list_doc_id()
    print("update: document id list - ", doc_list)

    # filename list
    fn_list = q.list_doc_fn()
    print("update: filename list - ", fn_list)

    prompter = Prompt().load_model(model_name)

    for i, doc_id in enumerate(doc_list):

        print("\nAnalyzing contract: ", str(i+1), doc_id, fn_list[i])

        print("LLM Responses:")

        for question in question_list:

            query_topic = question["topic"]
            llm_question = question["llm_query"]

            doc_filter = {"doc_ID": [doc_id]}
            query_results = q.text_query_with_document_filter(query_topic,doc_filter,result_count=5,exact_mode=True)

            if verbose:
                # this will display the query results from the query above
                for j, qr in enumerate(query_results):
                    print("update: querying document - ", query_topic, j, doc_filter, qr)

            source = prompter.add_source_query_results(query_results)

            #   *** this is the call to the llm with the source packaged in the context automatically ***
            responses = prompter.prompt_with_source(llm_question, prompt_name="default_with_context", temperature=0.3)

            #   unpacking the results from the LLM
            for r, response in enumerate(responses):
                print("update: llm response -  ", llm_question, re.sub("[\n]"," ", response["llm_response"]).strip())

            # Done with this question: clear the source from the prompt before the next query
            prompter.clear_source_materials()

    #   Save the prompt state as a jsonl report in the /prompt_history folder
    print("\nPrompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(),prompter.prompt_id))
    prompter.save_state()

    #   Save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("\nCSV output saved at:  ", csv_output)

    return 0


def example_4b_contract_analysis_direct_from_prompt(model_name, verbose=False):

    """ Example #4b: Alternative implementation using prompt in-line capabilities without using a library """

    # Load the llmware sample files
    print(f"\n > Loading the llmware sample files...")

    contracts_path = "/root/document"

    # questions to ask of each document (here, annual reports rather than contracts)
    question_list = [
    {"topic": "Company Overview", "llm_query": "What is the mission statement of the company?"},
    {"topic": "Financial Highlights", "llm_query": "What are the key financial highlights for the year?"},
    {"topic": "Revenue", "llm_query": "What is the total revenue reported for the year?"},
    {"topic": "Net Income", "llm_query": "What is the net income for the year?"},
    {"topic": "Cash Flow", "llm_query": "What are the net cash flows from operating activities?"},
    {"topic": "Auditor's Report", "llm_query": "What is the auditor's opinion on the financial statements?"},
    {"topic": "Risk Factors", "llm_query": "What are the primary risk factors mentioned?"},
    {"topic": "Market Position", "llm_query": "How does the company describe its market position?"},
    {"topic": "Board of Directors", "llm_query": "Who are the members of the board of directors?"},
    {"topic": "Executive Compensation", "llm_query": "What is the total compensation for the CEO?"},
    {"topic": "Strategic Initiatives", "llm_query": "What are the key strategic initiatives outlined for the upcoming year?"},
    {"topic": "Dividends", "llm_query": "What is the company's dividend policy?"},
    {"topic": "Environmental Initiatives", "llm_query": "What environmental initiatives did the company undertake?"},
    {"topic": "Corporate Social Responsibility", "llm_query": "What CSR activities are highlighted in the report?"},
    {"topic": "Shareholder Information", "llm_query": "How did the company's stock perform over the past year?"},
    {"topic": "Business Segments", "llm_query": "What are the different business segments and their revenues?"},
    {"topic": "Governance", "llm_query": "What governance practices are described in the report?"},
    {"topic": "Future Outlook", "llm_query": "What is the management's outlook for the next fiscal year?"},
    {"topic": "Legal Proceedings", "llm_query": "Are there any significant legal proceedings mentioned?"},
    {"topic": "Research and Development", "llm_query": "How much did the company spend on research and development?"}
    ]
    print(f"\n > Loading model {model_name}...")

    prompter = Prompt().load_model(model_name)

    for i, contract in enumerate(os.listdir(contracts_path)):

        # skip the .DS_Store artifact that macOS may create in the folder
        if contract != ".DS_Store":

            print("\nAnalyzing contract: ", str(i + 1), contract)

            print("LLM Responses:")

            for question in question_list:

                query_topic = question["topic"]
                llm_question = question["llm_query"]

                #   introducing "add_source_document"
                #   this will perform 'inline' parsing, text chunking and query filter on a document
                #   input is a file folder path, file name, and an optional query filter
                #   the source is automatically packaged into the prompt object

                source = prompter.add_source_document(contracts_path,contract,query=query_topic)

                if verbose:
                    print("update: document created source - ", source)

                #   calling the LLM with 'source' information from the contract automatically packaged into the prompt
                responses = prompter.prompt_with_source(llm_question, prompt_name="default_with_context",
                                                        temperature=0.3)

                #   unpacking the LLM responses
                for r, response in enumerate(responses):
                    print("update: llm response: ", llm_question, re.sub("[\n]", " ",
                                                                         response["llm_response"]).strip())

                # Done with this question: clear the source from the prompt before the next query
                prompter.clear_source_materials()

    # Save the prompt state as a jsonl report in the /prompt_history folder
    print("\nupdate: Prompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(), prompter.prompt_id))
    prompter.save_state()

    # Save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("\nupdate: CSV output saved at - ", csv_output)

    return 0


if __name__ == "__main__":

    #   you can pick any model from the ModelCatalog
    #   we list a few representative good choices below

    LLMWareConfig().set_active_db("sqlite")

    example_models = ["llmware/bling-1b-0.1", "llmware/bling-tiny-llama-v0", "llmware/dragon-yi-6b-gguf"]

    #   to swap in a gpt-4 openai model - uncomment these two lines
    #   model_name = "gpt-4"
    #   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "<insert-your-openai-key>"

    # use local cpu model
    model_name = example_models[0]

    #   two good recipes to address the use case

    #   first, the main way of retrieving and analyzing from a library (uncomment to run)
    #example_4a_contract_analysis_from_library(model_name)

    #   second, the "in-line" prompt way, which is the recipe run in this notebook
    example_4b_contract_analysis_direct_from_prompt(model_name)


 > Loading the llmware sample files...

 > Loading model llmware/bling-1b-0.1...





Analyzing contract:  1 meta-facebook-ar-2023.pdf
LLM Responses:
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the mission statement of the company? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What are the key financial highlights for the year? •The company has a market capitalization of $1.2 billion. •The company has a market value of $1.2 billion. •The company has a market capitalization of $1.2 billion.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the total revenue reported for the year? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the net income for the year? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What are the net cash flows from operating activities? Net cash flows from operating activities are net of cash used in investing activities and cash used in financing activities.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the auditor's opinion on the financial statements? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What are the primary risk factors mentioned? 1.  The company is in the midst of a restructuring. 2.  The company is in the midst of a restructuring. 3.  The company is in the midst of a restructuring. 4.  The company is in the midst of a restructuring. 5.  The company is in the midst of a restructuring. 6.  The company is in the midst of a restructuring. 7.  The company is in the midst of a restructuring. 8.  The company is in the midst of a restructuring. 9.  The company is in the midst of a restructuring. 10.  The company is in the midst of a restructuring. 11.  The company is in the midst of a restructuring. 12.  The company is in the midst of a restructuring. 13.  The company is in the midst of a restructuring. 14.  The company
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  How does the company describe its market position? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  Who are the members of the board of directors? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the total compensation for the CEO? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What are the key strategic initiatives outlined for the upcoming year? 1.  The Company will continue to focus on the development of its core business, and will continue to expand its customer base and market share. 2.  The Company will continue to focus on the development of its core business, and will continue to expand its customer base and market share. 3.  The Company will continue to focus on the development of its core business, and will continue to expand its customer base and market share. 4.  The Company will continue to focus on the development of its core business, and will continue to expand its customer base and market share. 5.  The Company will continue to focus on the development of its core business, and will continue to expand its customer base and market share.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the company's dividend policy? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What environmental initiatives did the company undertake? Environmental initiatives, including the implementation of a new environmental policy, the implementation of a new environmental management system, and the implementation of a new environmental policy, were all undertaken.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What CSR activities are highlighted in the report? 1.  CSR activities are highlighted in the report.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  How did the company's stock perform over the past year? The stock price has increased by $1.00 per share.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What are the different business segments and their revenues? 1.  Finance and Administration. 2.  Customer Service. 3.  Sales and Marketing. 4.  Sales and Distribution. 5.  Sales and Distribution. 6.  Sales and Distribution. 7.  Sales and Distribution. 8.  Sales and Distribution. 9.  Sales and Distribution. 10.  Sales and Distribution. 11.  Sales and Distribution. 12.  Sales and Distribution. 13.  Sales and Distribution. 14.  Sales and Distribution. 15.  Sales and Distribution. 16.  Sales and Distribution. 17.  Sales and Distribution. 18.  Sales and Distribution. 19.  Sales and Distribution. 20.  Sales and Distribution. 21.  Sales and Distribution. 22.  Sales and Distribution. 23.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What governance practices are described in the report? 1.  The governance practices are described in the report.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  What is the management's outlook for the next fiscal year? Not Found.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'




update: llm response:  Are there any significant legal proceedings mentioned? Yes, there are two significant legal proceedings.
[Errno 2] No such file or directory: '/root/llmware_data/tmp/parser_tmp/pdf_internal_test0.txt'
update: llm response:  How much did the company spend on research and development? Not Found.

update: Prompt state saved at:  /root/llmware_data/prompt_history/2bb1709d-42f5-4d64-9af7-af67adf62114

update: CSV output saved at -  {'report_name': 'interaction_report_Mon May 20 16:06:51 2024.csv', 'report_fp': '/root/llmware_data/prompt_history/interaction_report_Mon May 20 16:06:51 2024.csv', 'results': 0}
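
The last line of output shows that the CSV export returns a dict with `report_name` and `report_fp` keys. To inspect the report without leaving the notebook, something like the sketch below could work. It uses only the standard `csv` module and makes no assumption about the column names; the helper names `read_report_rows` and `print_report` are hypothetical, and UTF-8 encoding is an assumption about the file.

```python
import csv


def read_report_rows(report_fp):
    """Load every row of a CSV report as a dict keyed by the header names."""
    # Assumption: the report is UTF-8 encoded with a header row
    with open(report_fp, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def print_report(csv_output):
    """Print each row of the report described by the export's return value."""
    # csv_output is expected to look like the dict printed above,
    # i.e. it carries 'report_name' and 'report_fp' keys
    rows = read_report_rows(csv_output["report_fp"])
    print(f"{csv_output['report_name']}: {len(rows)} rows")
    for row in rows:
        print(row)
```

Calling `print_report(csv_output)` right after the export in either example function would list whatever rows the report contains.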


In [None]:
Setup().load_sample_files()

'/root/llmware_data/sample_files'
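
The path printed above is where the sample files are placed, while the analysis functions earlier read from `/root/document`. Before pointing either function at a folder, it may help to check that the folder exists and see which files it holds. The sketch below is one way to do that with the standard library; `list_report_files` is a hypothetical helper, and skipping every dot-file (not only `.DS_Store`) is an assumption.

```python
import os


def list_report_files(folder):
    """Return the visible regular files in folder, sorted by name."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"folder not found: {folder}")
    return sorted(
        name
        for name in os.listdir(folder)
        # Assumption: hidden dot-files such as .DS_Store are never documents
        if not name.startswith(".") and os.path.isfile(os.path.join(folder, name))
    )
```

Iterating with `enumerate(list_report_files(folder), start=1)` keeps the printed document numbers consecutive, since hidden files are filtered out before counting.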

In [None]:
!mkdir new_folder
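
Re-running the `!mkdir` cell reports an error once the folder already exists. A Python equivalent that is safe to run repeatedly might look like the sketch below; `ensure_folder` is a hypothetical helper name.

```python
import os


def ensure_folder(path):
    """Create the folder if it is missing and return its absolute path."""
    # exist_ok=True makes the call a no-op when the folder is already there
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```

For this notebook, `ensure_folder("new_folder")` would stand in for the shell command.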