# Abstractive Summary
A "summary" is a shortened restatement of the information in the target content. Ideally, it preserves that information in fewer words.

An "abstractive" summary is a shorter restatement of the information in the content, with no requirement to reuse the same words or phrasing. It is often implemented using a generative approach (in this notebook, an LLM). It is usually contrasted with an "extractive" summary, which reduces the word count by selecting words and phrases from the original content and removing everything else.
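
To make the contrast concrete, here is a minimal sketch of an extractive summarizer. It is not part of OpenTLDR; the function name `extractive_summary` and the word-frequency scoring are illustrative assumptions. Every sentence it returns is copied verbatim from the input, whereas the LLM used later in this notebook is free to rephrase.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 1) -> str:
    """Pick the highest-scoring sentences verbatim (a naive frequency heuristic)."""
    # Split into sentences on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count how often each lowercase word appears in the whole text.
    word_counts = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favored automatically.
        return sum(word_counts[w] for w in words) / len(words) if words else 0.0

    # Keep the top-scoring sentences, then restore their original order.
    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)


example = "Graphs store nodes. Graphs store edges between nodes. Cats sleep."
print(extractive_summary(example))
```

An abstractive summary of the same text could instead read "Graphs hold nodes and the edges linking them", a sentence that appears nowhere in the original.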

Note: this notebook does NOT implement "tailoring" of the summarization.

The result of this step includes:
- Summary nodes, connected to Content nodes with a SUMMARIZES relationship and to Recommendation nodes with a FOCUS_ON relationship

## Setup

In [None]:
import os
import logging
import json
import dotenv

dotenv.load_dotenv()  # Load environment variables from .env file

## Parameters
OpenTLDR workflows use the notebook block tagged as "parameters" to inject variables (for example to change the LLM model).

> **Do Not Change Variable Names in the Parameters Block.** You are welcome to change the values of these parameter variables, but please do not change their names. They are used elsewhere in the notebook and in other workflow processes.
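
The same caution applies to the `{content}` placeholder inside `llm_prompt`, because the summarization loop fills it with `str.format`. The sketch below shows that step in isolation; `sample_text` and `broken_prompt` are made-up names used only for illustration.

```python
# Same template shape as the notebook's llm_prompt parameter.
llm_prompt = '''
    Summarize this content: {content}
    '''

# Hypothetical stand-in for the text of a Content node.
sample_text = "The quick brown fox jumps over the lazy dog."

# This mirrors how the notebook builds the prompt for each Content node.
prompt_text = llm_prompt.format(content=sample_text).strip()
print(prompt_text)

# A template without the {content} placeholder still formats without error,
# but the content silently never reaches the prompt.
broken_prompt = "Summarize this content:"
assert sample_text not in broken_prompt.format(content=sample_text)
```

If you customize the prompt wording, keeping `{content}` somewhere in the template ensures the text to summarize is actually sent to the model.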

In [None]:
#Parameters

# If you set the llm_config to None, it will use the environment variable LLM_CONFIG
# Otherwise, here are some options (to run an LLM locally, you will need to download the model to your local machine)
# llm_config = {'type': 'GPT4ALL', 'device':'gpu', 'model':'../LLM_Models/mistral-7b-openorca.gguf2.Q4_0.gguf'}
# llm_config = {'type': 'Ollama', 'model':'mistral:latest'}
# llm_config = {'type': 'ChatGPT', 'model':'gpt-4'}
llm_config = None

llm_prompt = '''
    Summarize this content: {content}
    '''

# Logging levels (from least to most verbose): ERROR, WARNING, INFO, DEBUG
logging_level = logging.INFO

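# List of the unique ids (uids) of Content nodes to summarize.
# If None, all Content nodes without a Summary are selected.
list_of_uids = None

# Print progress details (prompts, summaries) while running
verbose = True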


In [None]:
logging.getLogger("OpenTLDR").setLevel(logging_level)

import opentldr.Domain as domain
from opentldr import KnowledgeGraph

kg=KnowledgeGraph()

### Load Content

In [None]:
if list_of_uids is None:
    # default to getting all not-previously summarized Content nodes
    list_of_uids = kg.cypher_query("MATCH (n:Content) WHERE NOT (n)<-[:SUMMARIZES]-() RETURN n.uid")

if verbose:
    print ("Found {} Content nodes without Summaries".format(len(list_of_uids)))


## Run an LLM Model
This notebook uses the `Summarizer` class to run an LLM.
You can choose the model by changing the `llm_config` variable in the parameters block above, or by setting `LLM_CONFIG` in the .env file or as an environment variable.
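
The fallback order described above can be sketched as follows. This is not OpenTLDR's implementation: `resolve_llm_config` is a hypothetical helper, and the assumption that `LLM_CONFIG` holds a JSON object is mine. The actual parsing happens inside `Summarizer.getSummarizer` and may differ.

```python
import json
import os
from typing import Mapping, Optional


def resolve_llm_config(llm_config: Optional[dict],
                       env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Hypothetical helper: prefer the explicit parameter, else fall back to LLM_CONFIG."""
    if llm_config is not None:
        return llm_config
    raw = env.get("LLM_CONFIG")
    # ASSUMPTION: the variable holds a JSON object; the real format may differ.
    return json.loads(raw) if raw else None


# A plain dict stands in for the environment so the example has no side effects.
fake_env = {"LLM_CONFIG": '{"type": "Ollama", "model": "mistral:latest"}'}
print(resolve_llm_config(None, env=fake_env))
print(resolve_llm_config({"type": "ChatGPT", "model": "gpt-4"}, env=fake_env))
```

Under this sketch, a value set in the parameters block always wins, and the environment is consulted only when `llm_config` is `None`.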


In [None]:
import Summarizer
llm:Summarizer = Summarizer.getSummarizer(llm_config, logging_level=logging_level)

In [None]:
if verbose:
    print ("Summarizing {} Content Nodes.".format(len(list_of_uids)))

for uid in list_of_uids:
    content = kg.get_by_uid(uid)
    prompt_text = llm_prompt.format(content=content.text).strip()
 
    if verbose:
        print ("Summarizing {uid}: {title}".format(uid=uid,title=content.title))
        print ("Prompt is: {}".format(prompt_text))

    summary = llm.summarize(prompt_text)
    
    kg.add_summary(text=summary,content=content)
    
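    if verbose:
        # Percent reduction in character count; guard against empty content text
        original_length = len(content.text)
        reduction = round((original_length - len(summary)) / original_length * 100, 1) if original_length else 0.0
        print("Summary reduced content length by {reduction}%: {text}\n".format(reduction=reduction, text=summary))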



In [None]:
kg.close()