# Summarize Multiple Documents
Uses the recommendations and related paths to focus an LLM summarization on the relevant information.

This notebook depends on the Content nodes already having summaries included in metadata.
This is performed by either the "Abstractive Summary" notebook or the 

The result of this step includes:
- Summary nodes, connected to Content nodes with a SUMMARIZES relationship and to Recommendation nodes with a FOCUS_ON relationship

## Setup

In [None]:
import os
import logging


## Parameters
OpenTLDR workflows use the notebook block tagged as "parameters" to inject variables (for example to change the LLM model).

> **Do Not Change Variable Names in the Parameters Block** you are welcome to change the values of these parameter variables, but please do not change their names. They are used elsewhere in the notebook and in other workflow processes.

In [None]:
#Parameters

# When running an LLM locally, you need to download the model to your local machine
llm_config = {'type': 'GPT4ALL', 'device':'gpu', 'model':'../LLM_Models/mistral-7b-openorca.gguf2.Q4_0.gguf'}
# The assignment below overrides the GPT4ALL config above; comment it out to use GPT4ALL instead
llm_config = {'type': 'Ollama', 'device':'local', 'model':'mistral:latest'}

llm_prompt = '''
        How do the following articles address: "{request}"?
        In your response, first identify what the articles have in common.
        Second, your response should include highlights of the main differences between the articles.
        The articles below are preceded by their title and separated from the other summaries using '========'. \n
        Summarize the following articles:\n {multicontent}
    '''

# Logging level ranges are (from least to most verbose): ERROR, WARN, INFO, DEBUG
logging_level = logging.INFO

# List of Request UIDs to summarize (None means all Request nodes)
list_of_uids = None

# Print extra progress output when True
verbose = True

In [None]:
logging.getLogger("OpenTLDR").setLevel(logging_level)

from opentldr import KnowledgeGraph
kg=KnowledgeGraph()

import opentldr.Domain as domain

### Load Content

In [None]:
if list_of_uids is None:
    list_of_uids = kg.get_all_node_uids_by_tag("Request")

if verbose:
    print ("Found {} Request nodes to attempt multi-document summarization.".format(len(list_of_uids)))

## Run an LLM Model
This cell sets up access to a (usually locally running) LLM based on the llm_config parameter.

Ollama: runs locally with the Ollama service
- You need to start the Ollama server (ollama serve)
- It will attempt to pull models based on config

GPT4ALL: runs locally with a .gguf formatted model.
- When you run an LLM locally using GPT4ALL, you need to download a model file to your local machine.
- Model files are large and not part of the git repository.
- You can download them from here: https://gpt4all.io under "Model Explorer" and put them in a "models" folder.

All:
- Be sure to check the license for the model before using.
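
The selection on `llm_config['type']` can be tried out without either backend installed. The sketch below is only an illustration: `FakeGPT4All` and `FakeOllama` are hypothetical stand-ins for the notebook's summarizer classes, and the dictionary lookup is one possible alternative to a `match` statement (which requires Python 3.10 or later).

```python
# Hypothetical stand-ins for the real summarizer classes (not part of OpenTLDR).
class FakeGPT4All:
    def __init__(self, model, device):
        self.model = model
        self.device = device


class FakeOllama:
    def __init__(self, model_name):
        self.model_name = model_name


def make_llm(llm_config):
    """Pick a backend from llm_config['type'], ignoring case."""
    builders = {
        "gpt4all": lambda cfg: FakeGPT4All(cfg["model"], device=cfg["device"]),
        "ollama": lambda cfg: FakeOllama(model_name=cfg["model"]),
    }
    llm_type = llm_config["type"].lower()
    if llm_type not in builders:
        # Same error message the notebook raises for unknown types
        raise ValueError("No LLM type support for {}.".format(llm_config["type"]))
    return builders[llm_type](llm_config)


llm = make_llm({"type": "Ollama", "device": "local", "model": "mistral:latest"})
print(type(llm).__name__)
```

An unknown type such as `{"type": "Other"}` raises a `ValueError`, so a misspelled configuration fails early rather than leaving `llm` set to `None`.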

In [None]:
from SummarizeWithGPT4All import SummarizeWithGPT4All
from SummarizeWithLocalOllama import SummarizeWithLocalOllama

llm = None

match (llm_config['type'].lower()):
    case "gpt4all": 
        llm = SummarizeWithGPT4All(llm_config['model'],device=llm_config['device'], logging_level=logging_level)

    case "ollama":
        # TODO config for local and remote ollama services
        llm = SummarizeWithLocalOllama(model_name=llm_config['model'], logging_level=logging_level)
    case _:
        raise ValueError("No LLM type support for {}.".format(llm_config['type']))

## Build the prompt and run the LLM
This includes the original content, the request it is tailored for, and the explanation of the shortest path through the knowledge graph used to connect them.
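
The assembly step can be sketched in isolation, without a knowledge graph or an LLM. The snippet below is a minimal illustration, assuming each article is a plain dict with hypothetical `title` and `summary` keys (the notebook reads these values from Content nodes) and using a shortened version of the prompt template.

```python
# Minimal sketch: join per-article summaries with a separator and fill the prompt.
# The dicts and the shortened template are illustrative stand-ins, not OpenTLDR objects.
SEPARATOR = "========"

prompt_template = (
    'How do the following articles address: "{request}"?\n'
    "Summarize the following articles:\n{multicontent}"
)


def build_multicontent(articles):
    """Concatenate non-empty summaries, each preceded by a separator and its title."""
    parts = []
    for article in articles:
        summary = article.get("summary") or ""
        if summary:
            parts.append(
                "\n\n{}\n\nArticle Title: {}:\n{}\n".format(
                    SEPARATOR, article["title"], summary
                )
            )
    return "".join(parts)


articles = [
    {"title": "First", "summary": "Covers topic A."},
    {"title": "Second", "summary": ""},  # skipped: empty summary
    {"title": "Third", "summary": "Covers topic B."},
]

multicontent = build_multicontent(articles)
prompt_text = prompt_template.format(
    request="What is new about A and B?", multicontent=multicontent
)
print(prompt_text)
```

Articles with an empty summary are left out, so the separator appears once per article that actually contributes text to the prompt.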

In [None]:
def run_group_summary(kg:KnowledgeGraph, request:domain.Request) -> str:
    len_original=0
    len_summaries=0
    multisum=""
    multidoc_summary="Sorry - No Group Summary Created."
    print("summarizing recommendations for request {}".format(request.title))

    recommendations = kg.get_recommendations_by_request(request)
    # If there is only one recommendation, return a placeholder instead of a group summary
    if len(recommendations) == 1:
        return "Only one recommendation."

    for recommendation in recommendations:
        for content in recommendation.recommends:
            print ("Recommended {title} ({score}): {url}".format(title=content.title, score=round(recommendation.score,3),url=content.url))
            len_original += len(content.text)

            if content.metadata is not None:
                summary=content.metadata["summary"]
                len_summaries += len(summary)

                if len(summary) > 0:
                    multisum+="\n\n========\n\nArticle Title: {}:\n".format(content.title)
                    multisum+="{}\n".format(summary)

            else:
                print("skipping - no metadata summary found")

    if len(multisum) > 0:
        prompt_text = llm_prompt.format(multicontent=multisum, request=request.text)
        multidoc_summary= llm.summarize(prompt_text)
        
        print("\nSummary reduced {reduction}% of content:\t{text}".format(reduction=round(((len_original-len(multidoc_summary))/len_original)*100,1),text=multidoc_summary))

    return multidoc_summary

In [None]:
for uid in list_of_uids:
    request = kg.get_request_by_uid(uid)

    summary = run_group_summary(kg, request)

    tldr = kg.get_tldr_by_request(request)

    if tldr is not None:
        tldr.metadata["summary"]= summary
        tldr.save()
    else:
        print(summary)
    

In [None]:
kg.close()