# LLM comprehensive summary

Anton Antonov   
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com)   
[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com)   
April-June 2025

In [None]:
from datetime import date
f"Generated on {date.today()}"

---

## Introduction

In this computational Markdown file we apply several LLM prompts to summarize large texts comprehensively and effectively.

**Remark:** This Markdown file is intended to serve as a template for the initial versions of comprehensive text analyses.

**Remark:** This Markdown template has corresponding notebook versions: 
(i) [Wolfram Language notebook](https://community.wolfram.com/groups/-/m/t/3448842), 
(ii) [Raku-Jupyter notebook](),
(iii) [Python-Jupyter notebook]().

**Remark:** All remarks (the paragraphs that start with "Remark:") are supposed to be removed from the final version.

---

## Setup

Load packages:

In [None]:
from LLMFunctionObjects import *
from LLMPrompts import *

from pytubefix import Playlist, YouTube
from youtube_transcript_api import YouTubeTranscriptApi

from IPython.display import HTML, Markdown, display
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import re
import os

Define LLM access configurations:

In [None]:
conf4o_mini = llm_configuration('ChatGPT', model = 'gpt-4o-mini', max_tokens = 8192,  temperature = 0.5)
conf41_mini = llm_configuration('ChatGPT', model = 'gpt-4.1-mini', max_tokens = 8192,  temperature = 0.5)

#conf_gemini_flash = llm_configuration('Gemini', model = 'gemini-2.0-flash', max_tokens = 8192, temperature = 0.5)

# Choose an LLM access configuration
conf = conf4o_mini

Choose an output language:

In [None]:
lang = 'English'

Get transcript:

In [None]:
def get_transcript(video_url, languages = ("en", )):
    try:
        # Check whether the input is a URL (has an "http://" or "https://" prefix)
        if "http://" in video_url or "https://" in video_url:
            # Extract the video ID from the URL, dropping any trailing query parameters
            video_id = video_url.split('v=')[1].split('&')[0]
        else:
            # Assume the input is already a video ID
            video_id = video_url

        # Retrieve the transcript for the video
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=languages)

        # Concatenate the text from each transcript segment
        transcript_text = ' '.join([segment['text'] for segment in transcript])

        return transcript_text

    except Exception as e:
        print(f'An error occurred: {str(e)}')
        return None
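
The `split('v=')` approach above covers standard watch URLs. As a minimal sketch of a more tolerant alternative, the hypothetical helper `extract_video_id` below (not part of the original notebook) uses only the standard library and also handles short `youtu.be` links:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(video_ref: str) -> str:
    """Return the YouTube video ID for a URL or a bare ID (hypothetical helper)."""
    if "://" not in video_ref:
        # Assume the input is already a video ID
        return video_ref
    parsed = urlparse(video_ref)
    if parsed.hostname and parsed.hostname.endswith("youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Standard watch links carry the ID in the "v" query parameter
    ids = parse_qs(parsed.query).get("v")
    if not ids:
        raise ValueError(f"No video ID found in: {video_ref}")
    return ids[0]

print(extract_video_id("https://www.youtube.com/watch?v=ewU83vHwN8Y&t=42s"))
```

The result can be passed to `get_transcript` as a bare video ID.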

Text stats function:

In [None]:
def text_stats(text):
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        "line_count": len(text.splitlines())
    }
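
A quick usage sketch on a short, made-up string shows what the statistics look like; the function is restated here only so that the block runs on its own:

```python
def text_stats(text):
    # Same definition as above, repeated to keep the example self-contained
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        "line_count": len(text.splitlines())
    }

sample_text = "one two\nthree four five"  # made-up sample, not the ingested text
print(text_stats(sample_text))
# {'char_count': 23, 'word_count': 5, 'line_count': 2}
```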

Graph from edges:

In [None]:
def create_graph(edges):
    """
    Generates a directed graph from a list of dictionaries representing edges.

    Args:
        edges: A list of dictionaries, where each dictionary has keys "from" and "to".

    Returns:
        A networkx DiGraph object.
    """
    graph = nx.DiGraph()
    for edge in edges:
        graph.add_edge(edge['from'], edge['to'])
    return graph

---

## Ingestion

**Remark:** Choose whether to analyze a text from a file or the transcript of a YouTube video.

Ingest text from a file:

In [None]:
# fileName = ""
# with open(fileName, 'r') as file:
#     txtFocus = file.read()
# text_stats(txtFocus)

Ingest the transcript of a YouTube video:

In [None]:
txtFocus = get_transcript("ewU83vHwN8Y")
text_stats(txtFocus)

**Remark:** The text ingested above is the transcript of the video ["Live CEOing Ep 886: Design Review of LLMGraph"](https://www.youtube.com/watch?v=ewU83vHwN8Y).

**Remark:** The transcript of a YouTube video can be obtained in several ways:

- Use the Python package [“pytube”](https://pypi.org/project/pytube/) (or [“pytubefix”](https://pypi.org/project/pytubefix/)) 
- On macOS, download the audio track and use the program [hear](https://sveinbjorn.org/hear) 
- Use the Raku package ["WWW::YouTube"](https://raku.land/zef:antononcube/WWW::YouTube)


---

## Summary

Summarize the text:

In [None]:
summary = llm_synthesize([llm_prompt("Summarize"), txtFocus, llm_prompt('Translated')(lang)], e = conf)
display(Markdown(summary))

---

## Tabulate topics

Extract and tabulate text topics:

In [None]:
tblThemes = llm_synthesize(llm_prompt("ThemeTableJSON")(txtFocus, "article", 30), e = conf, form = sub_parser('JSON', drop=True))
display(HTML(pd.DataFrame(tblThemes).to_html(index=True)))

---

## Mind-map

In [None]:
edges = llm_synthesize([
        "Make a JSON array with the graph edges of a concise mind-map for the following text.",
        "Each edge is a dictionary with keys 'from' and 'to'.",
        "Make sure the graph is connected and represents a mind-map.",
        "TEXT START",
        txtFocus,
        "TEXT END",
        llm_prompt("NothingElse")("JSON")
    ], 
    e=conf,
    form = sub_parser('JSON', drop=True)
)

graph = create_graph(edges)

nx.draw(graph, with_labels=True)
plt.show()
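
LLM-generated edge lists are not guaranteed to be well-formed. As a minimal sketch, the hypothetical helper `clean_edges` (not part of the original notebook) drops malformed and duplicate entries; it could be applied to `edges` before calling `create_graph`. The sample data is made up for illustration.

```python
def clean_edges(edges):
    """Keep only well-formed, unique edges: dicts with non-empty 'from' and 'to'."""
    if not isinstance(edges, list):
        # The JSON parsing may have failed and returned something else
        return []
    cleaned = []
    seen = set()
    for edge in edges:
        if not isinstance(edge, dict):
            continue
        src, dst = edge.get("from"), edge.get("to")
        if not src or not dst:
            continue
        key = (str(src), str(dst))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"from": key[0], "to": key[1]})
    return cleaned

# Made-up sample standing in for an LLM response
sample = [
    {"from": "LLMGraph", "to": "Design review"},
    {"from": "LLMGraph", "to": "Design review"},  # duplicate
    {"from": "LLMGraph"},                         # missing 'to'
    "not an edge",                                # wrong type
]
print(clean_edges(sample))
# [{'from': 'LLMGraph', 'to': 'Design review'}]
```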

---

## Sophisticated feedback

Give sophisticated feedback using different “idea hats”:

In [None]:
sophFeed = llm_synthesize(llm_prompt("SophisticatedFeedback")(txtFocus, 'HTML'), e = conf)

sophFeed = re.sub(r'^```html', '', sophFeed, flags=re.MULTILINE)
sophFeed = re.sub(r'^```', '', sophFeed, flags=re.MULTILINE)

display(Markdown(sophFeed))
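
The two substitutions above can be folded into one reusable function. A minimal sketch, assuming the LLM wraps its answer in Markdown code-fence lines with an optional language tag; `strip_code_fences` is a hypothetical name, and the fence marker is built programmatically only to keep this example easy to embed:

```python
import re

FENCE = "`" * 3  # the Markdown code-fence marker (three backticks)

# A fence line: the marker at line start, an optional language tag, optional trailing blanks
FENCE_LINE = re.compile(r'^' + FENCE + r'\w*[ \t]*\n?', flags=re.MULTILINE)

def strip_code_fences(text: str) -> str:
    """Remove code-fence lines from an LLM response and trim surrounding whitespace."""
    return FENCE_LINE.sub('', text).strip()

# Made-up sample standing in for an LLM response
wrapped = FENCE + "html\n<p>Hi</p>\n" + FENCE
print(strip_code_fences(wrapped))
# <p>Hi</p>
```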

---

## Specific questions

Get answers to specific questions (if any):

In [None]:
questions = """
What technology is discussed? What is it used for?
"""

In [None]:
ans = llm_synthesize([questions, txtFocus], e = conf)
display(Markdown(ans))

#### Structured

In [None]:
questions2 = ["Who is talking?", "Which technology is discussed?", "What product(s) are discussed?", "Which versions?"]

ans2 = llm_synthesize([
    "Give a question-answer dictionary for the questions:", 
    "\n".join(questions2),
    "Over the text:",
    txtFocus, 
    llm_prompt('JSON')
    ], 
    e = conf, form = sub_parser('JSON', drop=True)
)

display(HTML(pd.DataFrame(ans2).to_html(index=False)))
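
The shape of the parsed JSON depends on the LLM. If it turns out to be a flat dictionary that maps each question to a single answer string, pandas cannot build a data frame from it directly, because a dictionary of scalar values requires an index. A minimal sketch of a workaround, assuming that flat shape; `qa_rows` is a hypothetical helper and the sample answers are placeholders:

```python
def qa_rows(answers):
    """Convert a question-answer dictionary into a list of row dictionaries."""
    return [{"question": q, "answer": a} for q, a in answers.items()]

# Placeholder data standing in for the parsed LLM response
sample_answers = {
    "Who is talking?": "(placeholder answer 1)",
    "Which technology is discussed?": "(placeholder answer 2)",
}

for row in qa_rows(sample_answers):
    print(row["question"], "->", row["answer"])
```

The resulting list of rows can be passed to `pd.DataFrame` to get a two-column table.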

---

## Extracted wisdom or cynical insights

**Remark:** Choose one of the prompts 
[“ExtractArticleWisdom”](https://www.wolframcloud.com/obj/antononcube/DeployedResources/Prompt/ExtractArticleWisdom/) or 
[“FindPropagandaMessage”](https://www.wolframcloud.com/obj/antononcube/DeployedResources/Prompt/FindPropagandaMessage/).
(The latter tends to be more fun.)

In [None]:
prompt = llm_prompt("ExtractArticleWisdom")() if True else llm_prompt("FindPropagandaMessage")
text_stats(prompt)

In [None]:
sumIdea = llm_synthesize([
        prompt,
        'TEXT START',
        txtFocus,
        'TEXT END'
     ], e = conf)

sumIdea = re.sub(r'^#', '###', sumIdea, flags=re.MULTILINE)

display(Markdown(sumIdea))
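
The substitution above pushes every heading in the LLM output two levels down, so that it nests under this section's heading. A minimal sketch of the same idea as a function with a configurable depth; `demote_headings` is a hypothetical name. Note that it also affects lines starting with `#` inside code blocks, if the output contains any.

```python
import re

def demote_headings(markdown_text: str, levels: int = 2) -> str:
    """Prefix every line that starts with '#' with `levels` extra '#' characters."""
    return re.sub(r'^#', '#' * (levels + 1), markdown_text, flags=re.MULTILINE)

# Made-up sample standing in for an LLM response
sample_md = "# SUMMARY\nSome text.\n## IDEAS"
print(demote_headings(sample_md))
# ### SUMMARY
# Some text.
# #### IDEAS
```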