# Abstractive Summarization

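This notebook demonstrates abstractive summarization using chain-of-density-style prompting: an initial summary is generated from an analysis of the article, then revised over several rounds to add information that earlier rounds missed.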

In [1]:
!pip install python-dotenv google-generativeai



In [2]:
from dotenv import load_dotenv
import logging
import pandas as pd

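# Set up logging. At INFO level the logger.debug() calls used later are hidden;
# switch to logging.DEBUG to see the per-step word counts.
logging.basicConfig(level=logging.INFO)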

load_dotenv()

True

In [3]:
from biagen.llm import CohereProvider, GroqProvider, OpenAIProvider, AIStudioProvider

#llm = CohereProvider.from_env()
#llm = GroqProvider.from_env()
llm = AIStudioProvider.from_env()
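
For running the rest of the notebook without network access, a stand-in object with the same call shape as `llm.generate_one` can be useful. The sketch below rests on assumptions: `StubProvider` is a hypothetical class and not part of `biagen`, and the `generate_one` signature and `model` attribute are inferred only from how `llm` is used in this notebook.

```python
from typing import List, Optional


class StubProvider:
    """Hypothetical offline stand-in for a provider; not part of biagen."""

    # Assumed attribute: the notebook reads `llm.model` when saving results.
    model = "stub-model"

    def __init__(self) -> None:
        self.prompts: List[str] = []

    def generate_one(
        self,
        prompt: str,
        max_tokens: int = 1024,
        temperature: float = 0.7,
        stop_sequences: Optional[List[str]] = None,
    ) -> str:
        # Record the prompt so the sequence of calls can be inspected later.
        self.prompts.append(prompt)
        # Return a canned response that names the call number.
        return f"stub response {len(self.prompts)}"


stub = StubProvider()
reply = stub.generate_one("hello", max_tokens=16, temperature=0.0, stop_sequences=None)
print(reply)
```

Assigning `llm = StubProvider()` should let the summarization function run end to end; with the default `iterations=3` it makes 2 initial calls plus 3 per round, 11 in total.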

In [4]:
from typing import List

logger = logging.getLogger('abstractive_summary')

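def abstractive_summary(article: str, iterations: int = 3) -> List[str]:
    """Summarize `article`, then refine the summary `iterations` times.

    Returns every summary produced, starting with the initial one, so the
    list has `iterations + 1` entries and the last entry is the final summary.
    """
    summaries = []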
    
    header = f"""You are a copy assistant at an outdoorsman magazine, with a master's degree in journalism. Your task is to provide analysis of the following article with precision.
    
## Article

```article
{article}
```
"""
    
    # Analysis
    
    prompt_a = f"""{header}
We first must perform an analysis on the article, to identify the most important information.

## Analysis (the most significant topics, with one sentence commentary)

"""

    analysis = llm.generate_one(prompt_a, max_tokens=1024, temperature=0.7, stop_sequences=None)
    alen = len(analysis.split())

    logger.debug(f"analysis {alen} {analysis}")

    # Summarize

    prompt_s = f"""{header}
Given the article and the following analysis, provide a detailed, erudite, succinct, and accurate summary.

## Analysis

{analysis}

## Summary (here is a 5 paragraph summary)

"""

    summary = llm.generate_one(prompt_s, max_tokens=1024, temperature=0.5, stop_sequences=None)

    summaries.append(summary)
    slen = len(summary.split())
    logger.debug(f"summary {slen} {summary}")

    # Improvement loop

    for i in range(iterations):
        # Get our last summary

        summary = summaries[-1]

        # Check for missing information in the summary

        prompt_m = f"""{header}
## Current Summary

{summary}

Given the article and the current summary, identify missing information or ways to improve the summary.

## Missing Information (here are 12 entries of novel information not contained in the summary)

"""

        missing = llm.generate_one(prompt_m, max_tokens=1024, temperature=0.7, stop_sequences=None)
        mlen = len(missing.split())
        logger.debug(f"missing {mlen} {missing}")

        # Check for most important information in the summary

        prompt_i = f"""{header}
## Current Summary

{summary}

## Missing Information

{missing}

Using the current summary and the identified missing information, determine whether our summary is six (6) paragraphs, and identify which information is important and which is unimportant.

We should be sure to include important facts like size, phone number, unique features, wildlife, history, and contact information.

## Most Important and Interesting Information (here are 12 entries, with one sentence commentary)

"""

        important = llm.generate_one(prompt_i, max_tokens=2048, temperature=0.5, stop_sequences=None)
        ilen = len(important.split())
        logger.debug(f"important {ilen} {important}")

        # Resummarize

        prompt_r = f"""{header}
## Current Summary

{summary}

## Missing Information

{missing}

## Ideas To Consider

{important}

Do not mention the Wildlife Code.

Contact information should be included.

Cover overview, location, features, history, wildlife, game, camping, recreation, contact, and management.

Do not be brief.

Should be six (6) paragraphs. 

Using the current summary and the identified missing information, here is an improved summary of the article.

## Improved Summary (here is a reorganized and expanded version, including missing information and ideas; six (6) detailed paragraphs)

"""

        resummarized = llm.generate_one(prompt_r, max_tokens=2048, temperature=0.5, stop_sequences=None)
        rlen = len(resummarized.split())
        logger.debug(f"resummarized {rlen} {resummarized}")
        #print(prompt_r, resummarized, rlen)

        summaries.append(resummarized)

    return summaries
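
The function reports word counts only through `logger.debug`, which the `INFO` logging configuration above hides. The helper below is a small sketch, not part of the original notebook (`word_counts` is a hypothetical name), that applies the same whitespace-split count to each round so growth between rounds can be checked directly.

```python
from typing import List


def word_counts(summaries: List[str]) -> List[int]:
    """Return the whitespace-delimited word count of each summary."""
    return [len(s.split()) for s in summaries]


# Illustrative input only; real input would be the list returned by
# abstractive_summary().
example = ["one two three", "one two three four five"]
counts = word_counts(example)
print(counts)
```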

In [5]:
# load some data
results_file = 'assets/areas_results.ais.jsonl'
details_file = 'assets/areas_details.ais.jsonl'

scraped_lakes = pd.read_csv("assets/random_100_areas.tsv", sep='\t')

In [6]:
import os

from biagen.io import append_to_jsonl, read_jsonl

results = read_jsonl(results_file) if os.path.exists(results_file) else []
processed = set(r['area_id'] for r in results)
len(results)

0
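
The `biagen.io` helpers are not shown in this notebook. The sketch below illustrates what append and read helpers for JSON Lines might look like, assuming one JSON object per line; the `_sketch` names are hypothetical and the real `append_to_jsonl` and `read_jsonl` may behave differently.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def append_to_jsonl_sketch(record: Dict[str, Any], path: str) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl_sketch(path: str) -> List[Dict[str, Any]]:
    """Read all non-empty lines of a JSON Lines file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip two records through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    append_to_jsonl_sketch({"area_id": "a", "summary": "first"}, demo_path)
    append_to_jsonl_sketch({"area_id": "b", "summary": "second"}, demo_path)
    loaded = read_jsonl_sketch(demo_path)

print([r["area_id"] for r in loaded])
```

Appending one line per record means an interrupted run leaves every record written so far in a readable state.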

In [7]:
model_desc = f"{llm.model}"

for i, row in scraped_lakes.iterrows():
    category = row['category']
    subcategory = row['subcategory']
    area_id = row['area_id']
    area_name = row['area_name']
    article = row['area_info']

    if area_id in processed:
        continue

    summaries = abstractive_summary(article)

    append_to_jsonl({
        'area_id': area_id,
        'area_name': area_name,
        'category': category,
        'subcategory': subcategory,
        'summary': summaries[-1],
        'source': llm.model,
    }, results_file)

    append_to_jsonl({
        'area_id': area_id,
        'area_name': area_name,
        'category': category,
        'subcategory': subcategory,
        'summary': summaries[-1],
        'summaries': summaries,
        'source': llm.model,
    }, details_file)

    print('=' * 50)
    print(f"{area_name} ({area_id})")
    print('=' * 50)
    for round_idx, s in enumerate(summaries):  # avoid shadowing the outer loop's `i`
        print(f"Round {round_idx}: {s}")
        print('=' * 50)
    print()




INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.069653034210205s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 2.0461480617523193s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.3775460720062256s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.812037706375122s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 2.9770970344543457s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.450587034225464s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.83302903175354s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.4013640880584717s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.882368087768555s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.522252798080444s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 2.5720088481903076s


Bolivar Forestry Office (bolivar-forestry-office)
Round 0: ## Summary of the Bolivar Forestry Office Article

The Bolivar Forestry Office article serves as a comprehensive guide to regulations and activities permitted on the Bolivar Forestry Office area. It provides essential contact information, including the office's address, phone number, and operating hours, making it easily accessible for those seeking information or assistance. The article highlights a comprehensive set of general regulations, emphasizing the importance of consulting the Missouri Wildlife Code for detailed information on specific activities. 

A significant portion of the article focuses on outlining prohibited activities, such as destruction of property, guiding for pay, and the use of fireworks. This detailed list ensures visitors understand the boundaries of acceptable behavior and avoid potential violations. The article also emphasizes the need for special use permits for activities like trapping, field trial

INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 1.6372196674346924s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 1.7399928569793701s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 5.323991060256958s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.991055727005005s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.0955259799957275s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.299043893814087s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 5.017024040222168s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.8587708473205566s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 4.9460248947143555s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 5.23703408241272s
INFO:root:<function AIStudioProvider.generate at 0x1274bde40>: Exec: 3.6689679622650146s


Branson Forestry Office (branson-forestry-office)
Round 0: ## Summary of the Branson Forestry Office Article

This article provides a comprehensive overview of the Branson Forestry Office and its associated conservation area, serving as a valuable resource for anyone interested in visiting the area. The article clearly outlines the office's location, hours of operation, and contact information, making it easy for visitors to find and reach the office. 

The article's primary focus is on the detailed regulations governing activities within the conservation area. A comprehensive list of permitted activities is provided, including hiking, sightseeing, camping, hunting, fishing, and vehicle use. The article also outlines a specific list of prohibited activities, such as digging, guiding for pay, and placing game cameras. 

Furthermore, the article explains the need for special use permits for certain activities, including commercial use, field trials, and trapping. It directs readers to th

KeyboardInterrupt: 
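
Since results are appended per area and already-processed `area_id` values are skipped, rerunning the loop after an interruption picks up where it stopped. The sketch below isolates that skip logic in plain Python; `pending_rows` is a hypothetical helper, and it additionally guards against duplicate IDs within one run, which the original loop does not do.

```python
from typing import Any, Dict, Iterable, Iterator, Set


def pending_rows(rows: Iterable[Dict[str, Any]], processed: Set[str]) -> Iterator[Dict[str, Any]]:
    """Yield rows whose area_id has not been handled yet."""
    seen = set(processed)  # copy so the caller's set is left untouched
    for row in rows:
        if row["area_id"] in seen:
            continue
        seen.add(row["area_id"])
        yield row


# Illustrative rows only; "a" stands for an area saved before the interruption.
rows = [{"area_id": "a"}, {"area_id": "b"}, {"area_id": "b"}, {"area_id": "c"}]
todo = [r["area_id"] for r in pending_rows(rows, {"a"})]
print(todo)
```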