In [1]:
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain


callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])


llm = LlamaCpp(
    model_path="/Users/altaf/Projects/LLM/Llama/llama-models/llama-2-7b-chat.ggmlv3.q4_0.bin",
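    # Sampling settings are passed as LlamaCpp fields, not as an `input` dict.
    temperature=0.0,
    top_p=1,
    # max_tokens is left at its default: the model loads with n_ctx = 512,
    # so a 2000-token generation limit would not fit in the context window.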
    callback_manager=callback_manager,
    verbose=True,
)

text_splitter = RecursiveCharacterTextSplitter(
    # About 1000 characters per chunk (length_function=len counts characters),
    # with a 20-character overlap between neighbouring chunks.
    chunk_size=1000,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True,
)

llama.cpp: loading model from /Users/altaf/Projects/LLM/Llama/llama-models/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5185.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0
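
To make the chunk_size and chunk_overlap settings above concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a simplified stand-in, not the algorithm RecursiveCharacterTextSplitter uses (that one prefers to break on separators such as paragraphs and newlines). The function name chunk_with_overlap is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Split text into (start_index, chunk) pairs of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the previous
    one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 250  # 2500 characters of placeholder text
pieces = chunk_with_overlap(sample)
for start, piece in pieces:
    print(start, len(piece))
```

With the default n_ctx of 512 tokens shown in the load log, chunks of roughly 1000 characters leave room for the summarization prompt and the generated summary, which matches the prompt sizes of about 280 to 370 tokens reported in the timings.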

In [2]:
with open("./clean.txt") as f:
    script = f.read()
texts = text_splitter.split_text(script)

In [3]:
from langchain.docstore.document import Document

# Wrap the first five chunks as Documents, the input type the summarize chain expects.
docs = [Document(page_content=t) for t in texts[:5]]

In [4]:
from langchain.chains.summarize import load_summarize_chain
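
As a rough illustration of what the "map_reduce" chain type does, the sketch below summarizes each chunk on its own (map) and then summarizes the joined partial summaries (reduce). It does not call LangChain or a model: first_sentence is a hypothetical stand-in for an LLM call, and the returned keys mirror what I understand the LangChain chain to return, so treat them as an assumption.

```python
from typing import Callable, Dict, List, Union


def first_sentence(text: str) -> str:
    """Placeholder 'summarizer': keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def map_reduce_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],
) -> Dict[str, Union[str, List[str]]]:
    # Map step: one independent summary per chunk.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: summarize the concatenated partial summaries.
    final_summary = summarize("\n\n".join(partial_summaries))
    return {
        "intermediate_steps": partial_summaries,
        "output_text": final_summary,
    }


result = map_reduce_summarize(
    ["A one. B two.", "C three. D four."],
    summarize=first_sentence,
)
print(result)
```

Because every chunk needs its own model call plus one more for the combine step, five documents lead to six generations in total.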

In [5]:
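# Map-reduce summarization: summarize each chunk, then summarize the summaries.
chain = load_summarize_chain(llm, chain_type="map_reduce", return_intermediate_steps=True)

# Call the chain directly rather than chain.run(docs): with
# return_intermediate_steps=True the chain has more than one output key,
# which run() does not support.
chain({"input_documents": docs}, return_only_outputs=True)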

  Ariful Chowdhury starts a meeting by asking where they stand on the updated FSD shared with them, and if there are any comments or issues related to it. He also mentions that they need to review the FSD further before their next meeting with Moody's.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =    75.36 ms /    59 runs   (    1.28 ms per token,   782.94 tokens per second)
llama_print_timings: prompt eval time = 55979.79 ms /   311 tokens (  180.00 ms per token,     5.56 tokens per second)
llama_print_timings:        eval time = 15365.56 ms /    58 runs   (  264.92 ms per token,     3.77 tokens per second)
llama_print_timings:       total time = 71790.29 ms
Llama.generate: prefix-match hit


 Ariful Chowdhury and Tarun Malik are communicating via voice or video call about the revised FSD (Financial Statement Disclosure) that Moody's team has shared after a meeting on Tuesday. Ariful wants to know if Tarun has had the chance to go through the updated FSD and if he has any comments. He also mentions that technical aspects are being looked after by brother Tarun and brother I, Walker.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =   123.16 ms /    97 runs   (    1.27 ms per token,   787.59 tokens per second)
llama_print_timings: prompt eval time = 61660.52 ms /   325 tokens (  189.72 ms per token,     5.27 tokens per second)
llama_print_timings:        eval time = 25163.17 ms /    96 runs   (  262.12 ms per token,     3.82 tokens per second)
llama_print_timings:       total time = 87480.15 ms
Llama.generate: prefix-match hit


 The speaker is discussing a report that needs improvement in terms of structure. They mention three layers of the report and how the front end needs to be structured better.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =    45.55 ms /    35 runs   (    1.30 ms per token,   768.39 tokens per second)
llama_print_timings: prompt eval time = 58264.84 ms /   309 tokens (  188.56 ms per token,     5.30 tokens per second)
llama_print_timings:        eval time =  8936.72 ms /    34 runs   (  262.84 ms per token,     3.80 tokens per second)
llama_print_timings:       total time = 67502.12 ms
Llama.generate: prefix-match hit



The reports in the system are divided into three layers: category, reports, and data layer. There are common data layers across all portfolio reports, but each report has its own segregated section in the data layer. However, there is no segregation between the application layer and the data layer, which makes it difficult for the business to understand the end result and for the developer to work on it.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =   108.41 ms /    85 runs   (    1.28 ms per token,   784.08 tokens per second)
llama_print_timings: prompt eval time = 52339.88 ms /   278 tokens (  188.27 ms per token,     5.31 tokens per second)
llama_print_timings:        eval time = 22722.86 ms /    84 runs   (  270.51 ms per token,     3.70 tokens per second)
llama_print_timings:       total time = 75651.39 ms
Llama.generate: prefix-match hit




The meeting discusses the document related to the project with Brother Nizar and Brother Cisse. Tarun Malik (C) highlights that the document is not structured properly and needs more work. He also mentions that there are assumptions and constraints missing, which need to be addressed. Additionally, he raises data issues raised from Mega and suggests working on them.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =   103.86 ms /    80 runs   (    1.30 ms per token,   770.25 tokens per second)
llama_print_timings: prompt eval time = 53847.68 ms /   282 tokens (  190.95 ms per token,     5.24 tokens per second)
llama_print_timings:        eval time = 21656.57 ms /    79 runs   (  274.13 ms per token,     3.65 tokens per second)
llama_print_timings:       total time = 76055.95 ms
Llama.generate: prefix-match hit



Meeting discusses updated FSD shared with Moody's team. Ariful Chowdhury asks for comments or issues related to the updated FSD. Tarun Malik is also communicated via voice/video call about the revised FSD and raises concerns about the document's structure, assumptions, and constraints. The meeting also discusses a report that needs improvement in terms of structure.


llama_print_timings:        load time =  2127.69 ms
llama_print_timings:      sample time =   108.68 ms /    85 runs   (    1.28 ms per token,   782.13 tokens per second)
llama_print_timings: prompt eval time = 68200.10 ms /   368 tokens (  185.33 ms per token,     5.40 tokens per second)
llama_print_timings:        eval time = 23047.57 ms /    85 runs   (  271.15 ms per token,     3.69 tokens per second)
llama_print_timings:       total time = 91851.04 ms


{'intermediate_steps': ["  Ariful Chowdhury starts a meeting by asking where they stand on the updated FSD shared with them, and if there are any comments or issues related to it. He also mentions that they need to review the FSD further before their next meeting with Moody's.",
  " Ariful Chowdhury and Tarun Malik are communicating via voice or video call about the revised FSD (Financial Statement Disclosure) that Moody's team has shared after a meeting on Tuesday. Ariful wants to know if Tarun has had the chance to go through the updated FSD and if he has any comments. He also mentions that technical aspects are being looked after by brother Tarun and brother I, Walker.",
  ' The speaker is discussing a report that needs improvement in terms of structure. They mention three layers of the report and how the front end needs to be structured better.',
  '\nThe reports in the system are divided into three layers: category, reports, and data layer. There are common data layers across all 
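
To put the timing logs above in perspective, this small sketch adds up the "total time" lines that llama.cpp printed for the five map calls and the one combine call. The regular expression and the helper name total_generation_ms are my own; the six values are copied from the output above.

```python
import re

# Matches lines such as: "llama_print_timings:       total time = 71790.29 ms"
TOTAL_TIME = re.compile(r"llama_print_timings:\s+total time =\s*([\d.]+) ms")


def total_generation_ms(log_text):
    """Sum every 'total time' value (in milliseconds) found in the log text."""
    return sum(float(value) for value in TOTAL_TIME.findall(log_text))


log = """\
llama_print_timings:       total time = 71790.29 ms
llama_print_timings:       total time = 87480.15 ms
llama_print_timings:       total time = 67502.12 ms
llama_print_timings:       total time = 75651.39 ms
llama_print_timings:       total time = 76055.95 ms
llama_print_timings:       total time = 91851.04 ms
"""

total_ms = total_generation_ms(log)
calls = len(TOTAL_TIME.findall(log))
print(f"{total_ms / 1000:.1f} s across {calls} calls")
```

That is roughly 470 seconds, or a little under 8 minutes, to summarize five chunks on CPU, with most of each call spent on prompt evaluation.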

In [21]:
resp

" This article discusses the need for increased coordination of existing knowledge, the need to liaise with Moody's team, and the need to understand the structure of the Moody's team in order to be in sync with it. It also discusses the need to enforce certain regulations, the need to take action in the last two to three weeks, and the need to understand the risk associated with a financial transaction known as a swap. It also looks at the work of Alfa Issa Goumandakoye, a Malian artist, and evaluates the value of his work, as well as the risks associated with swap exposure and how to manage them. Ariful Chowdhury and Alfa Issa Goumandakoye are discussing a plan of action and need to discuss it further."

In [19]:
prompt_template = """Write a concise summary of the following:

{text}

"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
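# The map_reduce chain takes map_prompt and combine_prompt rather than a single prompt.
chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=PROMPT, combine_prompt=PROMPT)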
chain.run(docs)

InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 13971 tokens (13715 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.
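
The error above means the prompt plus the requested completion (13715 + 256 = 13971 tokens) exceeded the model's 4097-token window. A pre-flight check along these lines can catch that before the request is sent. This is only a sketch: the figure of about 4 characters per token is a rough rule of thumb for English text and an assumption here, and an exact count needs the model's own tokenizer. Both function names are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; assumes about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt, max_context=4097, completion_tokens=256):
    """Return True if the estimated prompt size plus the completion fits the window."""
    return estimate_tokens(prompt) + completion_tokens <= max_context


short_prompt = "a" * 4000            # estimated at 1000 tokens
long_prompt = "a" * (13715 * 4)      # estimated at 13715 tokens, as in the error

print(fits_context(short_prompt))
print(fits_context(long_prompt))
```

If the check fails, the usual options are a smaller chunk_size, fewer documents per call, or a model with a larger context window.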