# Step 4: Summarize
Uses the recommendations and related paths to focus an LLM summarization on the relevant information.

![step4](resources/Step4_Summarize.png)

The result of this step includes:
- Summary nodes, connected to Content nodes with a SUMMARIZES relationship and to Recommendation nodes with a FOCUS_ON relationship

## Setup

In [None]:
import os

from opentldr import KnowledgeGraph
kg=KnowledgeGraph()

import opentldr.Domain as domain

## Parameters
OpenTLDR workflows use the notebook block tagged as "parameters" to inject variables (for example, to change the LLM model).

> **Do Not Change Variable Names in the Parameters Block.** You are welcome to change the values of these parameter variables, but please do not change their names. They are used elsewhere in the notebook and in other workflow processes.

In [None]:
#Parameters

# When running an LLM locally, you need to download the model to your local machine
llm_model_path = "./models/mistral-7b-openorca.gguf2.Q4_0.gguf"

llm_prompt = '''
    Given these facts: {knowledge}
    Write a concise summary of this content: {content}
    Focus the summary to answer this request: {request}
    CONCISE SUMMARY: '''

llm_device = 'cpu'
#llm_device = 'gpu'
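
To show how the `llm_prompt` template above gets filled in, here is a minimal sketch using Python's `str.format`. The example strings are invented for illustration; in the notebook the values come from the knowledge graph.

```python
# Same template as in the parameters block above.
llm_prompt = '''
    Given these facts: {knowledge}
    Write a concise summary of this content: {content}
    Focus the summary to answer this request: {request}
    CONCISE SUMMARY: '''

# Hypothetical values, for illustration only.
example_prompt = llm_prompt.format(
    knowledge="Acme Corp is headquartered in Springfield.",
    content="Acme Corp announced a new product line today.",
    request="What is new at Acme Corp?",
).strip()

print(example_prompt)
```

Each placeholder name in the template must match a keyword passed to `format`, which is one reason to keep the placeholder names unchanged when editing the prompt text.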



In [None]:
# uncomment if you run this notebook repeatedly
#kg.delete_all_summaries()

## Verify that a GPT4All model is available at the specified path
When you run an LLM locally using GPT4All, you need to download a model file to your local machine.

- Model files are large and not part of the git repository.
- You can download them from here: https://gpt4all.io under "Model Explorer" and put them in a "models" folder.
- Be sure to check the license for the model before using.


In [None]:
if not os.path.exists(llm_model_path):
    raise ValueError ("No LLM Model File was found at {path}".format(path=llm_model_path))
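
If the check above fails, it can help to see which model files are actually present. The following is a small sketch, assuming models live in a `./models` folder and use the `.gguf` extension as in the parameters block; the helper name `find_model_files` is hypothetical and not part of OpenTLDR.

```python
from pathlib import Path

def find_model_files(folder="./models", suffix=".gguf"):
    """Return sorted paths of files in `folder` whose names end with `suffix`."""
    root = Path(folder)
    if not root.is_dir():
        # No models folder yet: nothing to list.
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and p.name.endswith(suffix))

# Print each candidate model file with its size in gigabytes.
for model_file in find_model_files():
    size_gb = model_file.stat().st_size / 1e9
    print("{name}\t{size:.2f} GB".format(name=model_file.name, size=size_gb))
```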

# Prepare the LLM for producing summarizations
- TODO: in my case I run the LLM locally, but connecting to a remote ChatGPT might be interesting too

In [None]:
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import GPT4All
from langchain import LLMChain
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document

## Load the GPT4All model and build the prompt with LangChain

In [None]:
llm=GPT4All(model=llm_model_path, backend='gptj', device=llm_device, verbose=False)
print("Using device: {device}".format(device=llm.device))

### This is the default LangChain summarization chain

In [None]:
sum_chain = load_summarize_chain(llm, chain_type="refine")

def summarize(content_text):
    # put the content into a format that works for prompts
    out=""
    text_splitter = CharacterTextSplitter()
    texts = text_splitter.split_text(content_text)
    docs = [Document(page_content=t) for t in texts[:3]]
    if len(docs) > 0:
        out= sum_chain.run(docs)
    return out.strip()
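
Note that `summarize` only passes the first three chunks (`texts[:3]`) to the chain, so long content is truncated before summarization. The sketch below illustrates that effect with a simplified stand-in splitter written in plain Python. It is not LangChain's implementation; the blank-line separator and the `max_chars` value are assumptions chosen for the example.

```python
def split_into_chunks(text, max_chars=4000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most max_chars.

    A single piece longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # The next piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# Five short paragraphs with a small limit, so each becomes its own chunk.
text = "\n\n".join("Paragraph {n}.".format(n=n) for n in range(1, 6))
chunks = split_into_chunks(text, max_chars=15)
kept = chunks[:3]  # mirrors texts[:3] in summarize()
print(len(chunks), "chunks in total,", len(kept), "passed to the chain")
```

With real content, anything beyond the third chunk never reaches the LLM, which keeps run time bounded at the cost of ignoring the tail of long articles.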

### This is a custom LangChain summarization chain with focus

In [None]:
focused_chain = load_summarize_chain(llm, chain_type="refine")

# an attempt at starting a customized prompt for summary
def summarizeWithFocus(content_body,focus_text,info_request):
    out=""

    # Assemble a prompt to summarize the article
    all_text = llm_prompt.format(knowledge=focus_text, content=content_body, request=info_request).strip()

    print("Prompt: ",all_text.strip())
    text_splitter = CharacterTextSplitter()
    texts = text_splitter.split_text(all_text)
    docs = [Document(page_content=t) for t in texts[:3]]

    # run the summarization chain on the combined prompt
    if len(docs) > 0:
        out= focused_chain.run(docs)
    return out.strip()

#### Display the shortest path between each request and content
This next step performs the same pairwise comparison of requests and content, but outputs the path between them.

In [None]:
# produces a text string from a node or edge, if supported
def explain(something):
    if hasattr(something, 'to_text') and callable(something.to_text):
        return something.to_text()
    
    return ""
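
As a quick check of how `explain` behaves, the sketch below calls it on an object that provides `to_text()` and on one that does not. The `ExampleEdge` class is hypothetical and stands in for the graph's node and edge types.

```python
def explain(something):
    # Same logic as the helper above: use to_text() when it exists and is callable.
    if hasattr(something, 'to_text') and callable(something.to_text):
        return something.to_text()
    return ""

class ExampleEdge:
    """Hypothetical stand-in for a graph edge that can describe itself."""
    def __init__(self, source, relation, target):
        self.source = source
        self.relation = relation
        self.target = target

    def to_text(self):
        return "{s} {r} {t}".format(s=self.source, r=self.relation, t=self.target)

print(explain(ExampleEdge("Acme Corp", "is located in", "Springfield")))
print(repr(explain(42)))  # an int has no to_text(), so the result is an empty string
```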

In [None]:
# Demonstrate the shortest paths that provide a context for the prompt
for request in kg.get_all_requests():
    for content in kg.get_all_content():
        path = kg.shortest_path(request,content)
        if path is not None:
            print("{score}\t{req} -?-> {art}".format(score=len(path),req=request.title,art=content.title))
            for h in path:
                print("\t",explain(h))
            print()
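
How `kg.shortest_path` is implemented is not shown in this notebook. To give an intuition for what a shortest path between a request and a content item means, here is a minimal breadth-first search over a small in-memory graph. The adjacency dictionary and node names are invented for illustration, and this sketch returns only nodes, whereas the loop above iterates over whatever hops the knowledge graph returns.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency dict; returns the node list or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # maps each visited node to the node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Hypothetical graph: a request and an article that share an entity.
graph = {
    "request": ["entity A"],
    "entity A": ["request", "article 1"],
    "article 1": ["entity A"],
    "article 2": [],
}
print(shortest_path(graph, "request", "article 1"))
print(shortest_path(graph, "request", "article 2"))
```

A result of `None` corresponds to the `path is not None` check above: content with no connection to the request is skipped.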

## Build the prompt and run the LLM
This includes the original content, the request it is tailored for, and the explanation of the shortest path through the knowledge graph used to connect them.

In [None]:
requests = kg.get_all_requests()

for request in requests:
    print("Request ({request}):".format(request=request.text))
    recommendations = kg.get_recommendations_by_request(request=request)
    
    for recommendation in recommendations:
        for content in recommendation.recommends:
            print ("Recommended {title} ({score}): {url}".format(title=content.title, score=round(recommendation.score,3),url=content.url))
            path_text=""
            path=kg.shortest_path(request,content)

            if path is not None:
                for hop in path:
                    path_text+=explain(hop)+" "
            
            original= content.text
            #print("\tOriginal Content:\t{text}".format(text=original))
            #print("\tPath Text:\t{text}".format(text=path_text))

            # Uncomment to see a summary without a path focus
            #summary= summarize(content.text)
            #print("summary ({reduction}):\t{text}".format(reduction=round(len(summary)/len(original),3),text=summary))

            focused= summarizeWithFocus(content.text,str(path_text),request.text)
            print("Summary reduced {reduction}% of content:\t{text}".format(reduction=round(((len(original)-len(focused))/len(original))*100,1),text=focused))

            kg.add_summary(text=focused,content=content,recommendation=recommendation)
            print("\n")
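
The reduction percentage printed above divides by `len(original)`, which would raise a `ZeroDivisionError` for content with empty text. A small helper such as the following could guard against that. The name `reduction_percent` is hypothetical, and returning `None` for empty content is one possible choice.

```python
def reduction_percent(original, summary):
    """Percentage by which the summary is shorter than the original, by character count.

    Returns None when the original is empty, where the ratio is undefined.
    A negative value means the summary is longer than the original.
    """
    if len(original) == 0:
        return None
    return round((len(original) - len(summary)) / len(original) * 100, 1)

print(reduction_percent("a" * 200, "a" * 50))  # prints 75.0
```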

In [None]:
kg.close()