## Abstract summarization

This notebook demonstrates a practical use of an LLM for research assistance: summarizing research abstracts. The code loads text files containing research abstracts from a data directory, sends each one to the LLM, and saves the resulting summaries as a JSON file for further analysis.

You can customize the prompts and parameters to suit your specific research needs or modify the code to work with different document types and summarization tasks.

In this example, one of the chat-tuned [**OLMo**](https://allenai.org/blog/olmo-open-language-model-87ccfc95f580) models is used to summarize research abstracts. To use this model, you need to create a model-specific API key.

In [None]:
api_key = "<API-KEY>"

In [None]:
from aitta_client import Model, Client, StaticAccessTokenSource
import openai
import IPython

token_source = StaticAccessTokenSource(api_key)
aitta_client = Client("https://api-staging-aitta.2.rahtiapp.fi", token_source)

# load the "allenai/OLMo-7B-0724-Instruct" model
model = Model.load("allenai/OLMo-7B-0724-Instruct", aitta_client)
print(model.description)

# configure OpenAI client to use the Aitta OpenAI compatibility endpoints
client = openai.OpenAI(api_key=token_source.get_access_token(), base_url=model.openai_api_url)

In [None]:
def get_response(prompt, max_completion_tokens=100):
    """Send a single user prompt to the model and return the text of its reply."""
    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        model=model.id,
        n=1,
        max_completion_tokens=max_completion_tokens,
        stream=False  # response streaming is currently not supported by Aitta
    )
    return response.choices[0].message.content

In [None]:
import os

data_folder = "abstracts/"
text_files = {}

# Read all text files
for filename in os.listdir(data_folder):
    if filename.endswith(".txt"):
        with open(os.path.join(data_folder, filename), "r", encoding="utf-8") as file:
            text_files[filename] = file.read()

print(f"Loaded {len(text_files)} text files.")
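The loading step can also be wrapped in a reusable function. The following is a minimal sketch, not part of the original notebook: `load_abstracts` is a hypothetical helper that uses `pathlib`, returns the files in sorted order, and skips empty files on the assumption that you do not want to send them to the model.

```python
from pathlib import Path


def load_abstracts(folder):
    """Return {filename: text} for every non-empty .txt file in folder."""
    abstracts = {}
    # sorted() gives a stable, alphabetical processing order
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # assumption: empty files should be skipped
            abstracts[path.name] = text
    return abstracts
```

Calling `load_abstracts("abstracts/")` would give a dictionary with the same shape as `text_files` above.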

In [None]:
for filename in text_files.keys():
    print(filename)

In [None]:
def summarize(prompt, text, max_completion_tokens=100):
    """Append the text to the instruction prompt and return the model's summary."""
    full_prompt = prompt + text
    summary = get_response(full_prompt, max_completion_tokens)
    return summary
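Because `summarize` simply concatenates the prompt and the text, a very long input goes to the model unchanged. The sketch below shows one way to build the prompt more defensively. `build_prompt` and the `max_chars` limit of 6000 characters are illustrative assumptions, not values taken from the model documentation.

```python
def build_prompt(instruction, text, max_chars=6000):
    """Join instruction and text with a blank line, truncating long input.

    max_chars must be at least 1; the default is an arbitrary assumption.
    """
    text = text.strip()
    if len(text) > max_chars:
        # cut at the last whitespace before the limit to avoid splitting a word
        text = text[:max_chars].rsplit(None, 1)[0] + " [...]"
    return f"{instruction.strip()}\n\n{text}"
```

The result can be passed straight to `get_response` in place of `prompt + text`.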


In [None]:
from tqdm import tqdm  # Progress bar

prompt = """
You excel at summarizing research articles.
Provide your answer in a concise style, in 1 to 3 sentences.
Summarize this research article:
"""


results = {}
for filename, text in tqdm(text_files.items(), desc="Processing files"):
    results[filename] = summarize(prompt, text, 500)
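In the loop above, one failed request (for example a network error) stops the whole run. A hedged sketch of a more tolerant loop follows. `summarize_all` is a hypothetical helper, and `flaky_summarizer` is a stand-in used only so that the example runs without calling the model.

```python
def summarize_all(texts, summarize_fn):
    """Apply summarize_fn to each text; collect failures instead of stopping."""
    results, failures = {}, {}
    for name, text in texts.items():
        try:
            results[name] = summarize_fn(text)
        except Exception as exc:  # e.g. a network or API error
            failures[name] = str(exc)
    return results, failures


def flaky_summarizer(text):
    """Stand-in for the real model call; fails on empty input."""
    if not text:
        raise ValueError("empty abstract")
    return text[:20]


demo_results, demo_failures = summarize_all(
    {"a.txt": "Some abstract text.", "b.txt": ""}, flaky_summarizer
)
```

With the notebook's own functions, the call would look like `summarize_all(text_files, lambda t: summarize(prompt, t, 500))`.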

In [None]:
for filename, summary in results.items():
    print(f"### {filename} Summary ###\n")
    print(summary)
    print("\n" + "="*50 + "\n")

In [None]:
import json
# Save results to JSON file
results_file = "summaries.json"
with open(results_file, 'w', encoding='utf-8') as f:
    json.dump(results, f, indent=2, ensure_ascii=False)
print(f"Results saved to {results_file}")
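For further analysis, the saved JSON can be read back and converted to a table. The sketch below assumes the file has the `{filename: summary}` layout written above. `summaries_to_csv` and the column names are hypothetical choices for illustration.

```python
import csv
import json


def summaries_to_csv(json_path, csv_path):
    """Convert a {filename: summary} JSON file into a two-column CSV file.

    Returns the number of summaries written.
    """
    with open(json_path, encoding="utf-8") as f:
        summaries = json.load(f)
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "summary"])
        writer.writerows(summaries.items())
    return len(summaries)
```

For example, `summaries_to_csv("summaries.json", "summaries.csv")` would write one row per abstract, which can then be opened in a spreadsheet.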

## Download `summaries.json`

Noppe does not save your work. Download any files you want to keep before the Noppe instance times out.

You can download a single file from the **File** menu, or by right-clicking the file and selecting **Download** from the dropdown menu.

![summaries-download](./images/download_summaries.png)