### Ollama

In [1]:
!ollama list

NAME           ID              SIZE      MODIFIED   
llama3.1:8b    42182419e950    4.7 GB    6 days ago    
llama3:8b      365c0bd3c000    4.7 GB    8 days ago    
qwen2:7b       dd314f039b9d    4.4 GB    9 days ago    
gemma2:9b      ff02c3702f32    5.4 GB    9 days ago    


In [2]:
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

In [3]:
local_model = "llama3:8b"
# Ollama handles GPU offloading itself; `use_gpu` is not a ChatOllama field
# (the repr in the next cell shows it is dropped), so it is omitted here.
llm = ChatOllama(model=local_model)

In [4]:
llm

ChatOllama(model='llama3:8b')

In [5]:
import pandas as pd

df = pd.read_csv(
    r'F:\AI\Super AI SS4\Level 3 - INTERN\Jupyter Notebook\Latest-Dataset-Model-Generate\Random\Latest-20-random.csv')

df.shape

(20, 4)

In [6]:
df.head()

| | content | extractive | abstractive | index_original |
|---|---|---|---|---|
| 0 | มิติหุ้น SKE คอนเฟิร์มรายได้ปี 62 เต... | \t บมจ.สากล เอนเนอยี หรือ SKE โดย นายจักรพงส์... | \tนายจักรพงส์ สุเมธโชติเมธา กรรมการผู้จัดการให... | 3017 |
| 1 | ผลกระทบจากเชื้อไวรัสโคโรนา (โควิด-19... | ผลกระทบจากโควิด-19 ผลต่อกำไรของบริษั... | \tนายมงคล พ่วงเภตรา ผู้ช่วยกรรมการผู้จัดการ ฝ่... | 2844 |
| 2 | องค์การส่งเสริมกิจการโคนมแห่งประเทศไ... | ดร.ณรงค์ฤทธิ์ วงศ์สุวรรณ ผู้อำนวยการ... | ดร.ณรงค์ฤทธิ์ วงศ์สุวรรณ ผู้อำนวยการ... | 2055 |
| 3 | กระทรวงพลังงานใช้เวลาไป 1 ปี กับอีก ... | \tกระทรวงพลังงานใช้เวลา 1 ปี 7 เดือน แก้ปมร... | \tกระทรวงพลังงานใช้เวลา 1 ปี 7 เดือน แก้ปมร... | 199 |
| 4 | นับเป็นอีกหนึ่งโครงการดีๆ ที่เปิดโอก... | \t อีกหนึ่งโครงการดีๆ ที่เปิดโอกาสให้เด็ก ... | \tโครงการประกวดศิลปกรรม ปตท. จากการสนับสนุนขอ... | 216 |


### Summarize

#### Abstractive

In [7]:
def summarize_abstractive(text, llm):
    system_template = """You are an AI specialized in abstractive summarization for economic news articles."""

    prompt = """Your task is to create an abstractive summary of the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use neutral and clear language while maintaining a formal tone.
    3. Use \t at the beginning of each paragraph to create indentation.
    4. Each paragraph should present a different point.
    5. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    6. Explain in detail the essence and main points of the article.
    7. Ensure the summary is coherent and flows well as a standalone piece.
    8. Preserve all important proper nouns such as names of people, companies, or organizations.
    9. Organize the content logically, which may differ from the original article's structure if it improves clarity.
    10. Synthesize information from different parts of the article when appropriate.
    11. DO NOT include any examples or case studies in the summary.
    
    IMPORTANT:
    - Please verify the accuracy of the information and present it in a neutral manner, without personal opinions or bias.
    - Focus on creating a coherent, flowing summary that captures the main ideas without direct quoting.
    - Ensure the summary retains the original meaning and context.
    
    Article to summarize:
    {text}
    """

    system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
    human_message_prompt = HumanMessagePromptTemplate.from_template(prompt)

    chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
    messages = chat_prompt.format_messages(text=text)

    return llm.invoke(messages).content

#### Extractive

In [8]:
def summarize_extractive(text, llm):
    system_template = """You are an AI specialized in extractive summarization for economic news articles."""

    prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}
    """

    system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
    human_message_prompt = HumanMessagePromptTemplate.from_template(prompt)

    chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
    messages = chat_prompt.format_messages(text=text)

    return llm.invoke(messages).content
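
Both prompts ask the model to start each paragraph with a tab and to leave no blank lines, but a model may not follow formatting instructions reliably. The following is a minimal sketch of a post-processing step that enforces those two rules on the returned text. The helper name `normalize_paragraphs` is hypothetical and not part of the notebook, and the sketch assumes each output line is one paragraph.

```python
def normalize_paragraphs(summary: str) -> str:
    """Drop blank lines and make every paragraph start with exactly one tab.

    Assumption: each non-empty line of the model output is one paragraph.
    """
    paragraphs = []
    for line in summary.splitlines():
        stripped = line.strip()  # removes surrounding spaces and tabs
        if stripped:  # skip blank lines between paragraphs
            paragraphs.append("\t" + stripped)
    return "\n".join(paragraphs)


raw = "First point.\n\n  Second point.\n\tThird point.\n"
print(repr(normalize_paragraphs(raw)))
```

It could be applied to the return value of `summarize_abstractive` or `summarize_extractive` before the summary is stored or scored.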


### Return best-score, best-prompt

In [43]:
import os
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# os.environ["GROQ_API_KEY"] = "your_groq_api_key_here"

summary_model = "llama3:8b"  # Model for summarization (Ollama)
llm_summarizer = ChatOllama(model=summary_model)  # `use_gpu` is not a ChatOllama field; Ollama manages GPU use itself

# Set up Groq engine for textgrad
engine = get_engine("groq-llama3-70b-8192")
tg.set_backward_engine(engine, override=True)

def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_prompts(llm_summarizer, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        
        eval_system_prompt = "You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else."
        eval_human_prompt = f"Generated: {summary}\nReference: {row['label']}\n\nScore:"
        
        eval_messages = [
            {"role": "system", "content": eval_system_prompt},
            {"role": "user", "content": eval_human_prompt}
        ]
        
        eval_response = llm_summarizer.invoke(eval_messages)
        response_text = eval_response.content.strip()
        
        # Try to extract a float from the response
        match = re.search(r'\d+(\.\d+)?', response_text)
        if match:
            try:
                score = float(match.group())
                scores.append(score)
                print(f"Extracted score: {score:.4f}")
            except ValueError:
                print(f"Could not convert to float: {match.group()}")
                scores.append(0)
        else:
            print(f"Could not extract score from: {response_text}")
            scores.append(0)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt])
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        current_score = evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df)
        
        if current_score > best_score:
            best_score = current_score
            best_system_prompt = system_prompt.value
            best_human_prompt = human_prompt.value
        
        print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
        print(f"Current System Prompt: {system_prompt.value[:100]}...")
        print(f"Current Human Prompt: {human_prompt.value[:100]}...")
        print("-" * 50)
        
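        # Compute gradients and update prompts
        # NOTE: loss_var is created from a plain string rather than from an operation
        # on the prompts, so it is not linked to system_prompt or human_prompt;
        # backward() likely has no feedback to pass to them before optimizer.step().
        loss = 1 - current_score  # We want to minimize this loss
        loss_var = Variable(str(loss), role_description="Loss for optimization")
        loss_var.backward()
        optimizer.step()
        optimizer.zero_grad()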
    
    return best_system_prompt, best_human_prompt, best_score

def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

file_path = r'F:\AI\Super AI SS4\Level 3 - INTERN\Dataset\ThEconSum\AIFORTHAI-TextSummarizationCorpus\lstsumv1.test.jsonl'

def read_jsonl_file(file_path):
    return pd.read_json(file_path, lines=True)

test = read_jsonl_file(file_path)

# Drop unnecessary columns
test = test.drop(columns=['author', 'datePublish'])

# Randomly sample 10-20 examples from the test DataFrame
n_samples = 10  # Change to 20 if you want 20 samples
sampled_test = test.sample(n=n_samples, random_state=42)

if __name__ == "__main__":
    run_optimization(sampled_test)

Optimizing prompts:   0%|          | 0/10 [00:00<?, ?it/s]

Extracted score: 0.7


Optimizing prompts:  10%|█         | 1/10 [00:28<04:20, 28.90s/it]

Extracted score: 0.83
Epoch 1/10 | Current Score: 0.7650 | Best Score: 0.7650
Current System Prompt: You are an AI specialized in extractive summarization for economic news articles....
Current Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------
Extracted score: 0.8


Optimizing prompts:  20%|██        | 2/10 [00:59<03:59, 29.93s/it]

Extracted score: 0.85
Epoch 2/10 | Current Score: 0.8250 | Best Score: 0.8250
Current System Prompt: Summarize the main points of an economic news article, focusing on key events, trends, and market im...
Current Human Prompt: Summarize the main points from the following text: {text}...
--------------------------------------------------
Extracted score: 0.73


Optimizing prompts:  30%|███       | 3/10 [02:11<05:42, 48.91s/it]

Extracted score: 0.73
Epoch 3/10 | Current Score: 0.7300 | Best Score: 0.8250
Current System Prompt: Summarize the key events, trends, and market impacts discussed in an economic news article, highligh...
Current Human Prompt: Please summarize the key information and main ideas from the provided text: {text}...
--------------------------------------------------
Extracted score: 0.85


Optimizing prompts:  40%|████      | 4/10 [03:27<05:57, 59.60s/it]

Extracted score: 0.85
Epoch 4/10 | Current Score: 0.8500 | Best Score: 0.8500
Current System Prompt: Summarize the main points, key trends, and significant market implications discussed in the news art...
Current Human Prompt: Summarize the main points and key takeaways from the following text: {text}...
--------------------------------------------------
Extracted score: 0.92


Optimizing prompts:  50%|█████     | 5/10 [04:46<05:34, 66.82s/it]

Extracted score: 0.81
Epoch 5/10 | Current Score: 0.8650 | Best Score: 0.8650
Current System Prompt: Summarize the news article, focusing on the most critical economic points, key trends, and significa...
Current Human Prompt: What are the key points and main takeaways from the following text: {text}?...
--------------------------------------------------
Extracted score: 0.92


Optimizing prompts:  60%|██████    | 6/10 [06:28<05:14, 78.56s/it]

Extracted score: 0.85
Epoch 6/10 | Current Score: 0.8850 | Best Score: 0.8850
Current System Prompt: Summarize the news article, highlighting the most critical economic points, their relevance, and imp...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------
Extracted score: 0.83


Optimizing prompts:  70%|███████   | 7/10 [07:14<03:24, 68.16s/it]

Extracted score: 0.75
Epoch 7/10 | Current Score: 0.7900 | Best Score: 0.8850
Current System Prompt: Summarize the news article, focusing on the most critical economic points, their relevance to the br...
Current Human Prompt: What are the main ideas and key points from the following text: {text}?...
--------------------------------------------------
Extracted score: 0.85


Optimizing prompts:  80%|████████  | 8/10 [07:51<01:56, 58.10s/it]

Extracted score: 0.85
Epoch 8/10 | Current Score: 0.8500 | Best Score: 0.8850
Current System Prompt: Summarize the news article, extracting the most critical economic information and highlighting key p...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------
Extracted score: 0.82


Optimizing prompts:  90%|█████████ | 9/10 [08:36<00:53, 53.92s/it]

Extracted score: 0.92
Epoch 9/10 | Current Score: 0.8700 | Best Score: 0.8850
Current System Prompt: Extract the most critical economic information from the news article, focusing on key points and con...
Current Human Prompt: What are the main ideas and key points from the following text: {text}?...
--------------------------------------------------
Extracted score: 0.85


Optimizing prompts: 100%|██████████| 10/10 [09:09<00:00, 54.92s/it]

Extracted score: 0.81
Epoch 10/10 | Current Score: 0.8300 | Best Score: 0.8850
Current System Prompt: Extract the most critical economic information from the news article, focusing on key points and omi...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------

Best System Prompt: Summarize the news article, highlighting the most critical economic points, their relevance, and importance in the broader economic context, and prioritize the most impactful information.

Best Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?

Best Score: 0.8850
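
`evaluate_prompts` takes the first number in the judge's reply as the score and never checks its range, so a reply such as "8 out of 10" would be recorded as 8.0 and inflate the mean. The following is a minimal sketch of a stricter parser. The helper name `parse_score` is hypothetical, and rejecting out-of-range values is an assumption rather than the notebook's behavior.

```python
import re
from typing import Optional

# Same number pattern as the notebook, with a non-capturing group
_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_score(reply: str) -> Optional[float]:
    """Return the first number in the reply if it lies in [0, 1], else None."""
    match = _NUMBER_RE.search(reply)
    if match is None:
        return None
    value = float(match.group())
    return value if 0.0 <= value <= 1.0 else None


for reply in ["0.85", "Score: 0.7.", "8 out of 10", "no score given"]:
    print(repr(reply), "->", parse_score(reply))
```

A caller could map `None` to 0, as the notebook does for unparseable replies, or skip that row so a malformed reply does not count against the prompt.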





In [40]:
import os
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.engine import get_engine

# Set Groq API key
os.environ["GROQ_API_KEY"] = "your_groq_api_key_here"

# Initialize models
summary_model = "llama3:8b"  # Model for summarization (Ollama)
llm_summarizer = ChatOllama(model=summary_model)  # `use_gpu` is not a ChatOllama field; Ollama manages GPU use itself

# Set up Groq engine for textgrad
engine = get_engine("groq-llama3-70b-8192")
tg.set_backward_engine(engine, override=True)

# Function to generate summary with given prompts
def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_prompts(llm_summarizer, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        
        eval_system_prompt = "You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else."
        eval_human_prompt = f"Generated: {summary}\nReference: {row['label']}\n\nScore:"
        
        eval_messages = [
            {"role": "system", "content": eval_system_prompt},
            {"role": "user", "content": eval_human_prompt}
        ]
        
        eval_response = llm_summarizer.invoke(eval_messages)
        response_text = eval_response.content.strip()
        
        match = re.search(r'\d+(\.\d+)?', response_text)
        if match:
            try:
                score = float(match.group())
                scores.append(score)
            except ValueError:
                print(f"Could not convert to float: {match.group()}")
                scores.append(0)
        else:
            print(f"Could not extract score from: {response_text}")
            scores.append(0)
    
    return np.mean(scores)

def compute_gradient(prompt, loss):
    # Placeholder for gradient computation logic
    # You need to implement the logic for calculating how prompt affects the loss
    return np.random.normal(size=len(prompt))  # Dummy gradient

def update_prompt(prompt, gradient, learning_rate):
    # Placeholder: the gradient is ignored and a fixed marker is appended each epoch
    return prompt + f" (Adjusted by {learning_rate})"

def optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10, learning_rate=0.01):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        current_score = evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df)
        
        if current_score > best_score:
            best_score = current_score
            best_system_prompt = system_prompt.value
            best_human_prompt = human_prompt.value
        
        print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")

        # Compute loss
        loss = 1 - current_score  # We want to minimize this loss
        loss_var = Variable(str(loss), role_description="Loss for optimization")
        
        # Calculate gradients for system and human prompts
        system_gradient = compute_gradient(system_prompt.value, loss_var)
        human_gradient = compute_gradient(human_prompt.value, loss_var)
        
        # Update prompts using gradients
        system_prompt.value = update_prompt(system_prompt.value, system_gradient, learning_rate)
        human_prompt.value = update_prompt(human_prompt.value, human_gradient, learning_rate)
        
        # Print prompt changes
        print(f"Updated System Prompt: {system_prompt.value[:100]}...")
        print(f"Updated Human Prompt: {human_prompt.value[:100]}...")
        print("-" * 50)
    
    return best_system_prompt, best_human_prompt, best_score

def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10, learning_rate=0.01)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

# Sample dataframe for testing
def create_sample_dataframe():
    data = {
        'content': [
            "เศรษฐกิจไทยในไตรมาสที่ 2 ปี 2566 เติบโต 1.8% เทียบกับช่วงเดียวกันของปีก่อน โดยได้แรงหนุนจากการบริโภคภาคเอกชนและการส่งออกบริการที่ฟื้นตัว แม้การส่งออกสินค้าจะหดตัว",
            "ธนาคารแห่งประเทศไทยคงอัตราดอกเบี้ยนโยบายที่ 2.25% ในการประชุมเมื่อวันที่ 27 กันยายน 2566 โดยให้เหตุผลว่าเศรษฐกิจไทยมีแนวโน้มฟื้นตัวต่อเนื่อง แม้จะเผชิญความเสี่ยงจากเศรษฐกิจโลก"
        ],
        'label': [
            "เศรษฐกิจไทย Q2/2566 โต 1.8% จากการบริโภคเอกชนและการส่งออกบริการฟื้นตัว",
            "ธปท. คงดอกเบี้ย 2.25% เห็นเศรษฐกิจฟื้นต่อเนื่องแม้มีความเสี่ยงจากเศรษฐกิจโลก"
        ]
    }
    return pd.DataFrame(data)

if __name__ == "__main__":
    df = create_sample_dataframe()
    run_optimization(df)

Optimizing prompts:  10%|█         | 1/10 [00:23<03:35, 23.97s/it]

Epoch 1/10 | Current Score: 0.8417 | Best Score: 0.8417
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  20%|██        | 2/10 [00:52<03:31, 26.49s/it]

Epoch 2/10 | Current Score: 0.8500 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  30%|███       | 3/10 [01:25<03:28, 29.76s/it]

Epoch 3/10 | Current Score: 0.7650 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  40%|████      | 4/10 [01:57<03:03, 30.51s/it]

Epoch 4/10 | Current Score: 0.7900 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  50%|█████     | 5/10 [02:27<02:31, 30.32s/it]

Epoch 5/10 | Current Score: 0.7750 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  60%|██████    | 6/10 [03:01<02:06, 31.50s/it]

Epoch 6/10 | Current Score: 0.7750 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  70%|███████   | 7/10 [03:28<01:29, 29.94s/it]

Epoch 7/10 | Current Score: 0.8500 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  80%|████████  | 8/10 [04:03<01:03, 31.62s/it]

Epoch 8/10 | Current Score: 0.7750 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  90%|█████████ | 9/10 [04:31<00:30, 30.65s/it]

Epoch 9/10 | Current Score: 0.8000 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts: 100%|██████████| 10/10 [05:03<00:00, 30.39s/it]

Epoch 10/10 | Current Score: 0.8817 | Best Score: 0.8817
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01)...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------

Best System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01)

Best Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text} (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01) (Adjusted by 0.01)

Best Score: 0.8817
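
In the loop above the prompts are updated every epoch whether or not the score improved, so each new prompt is derived from the latest one rather than from the best one seen so far. The following is a minimal sketch of a greedy hill-climbing alternative that keeps a candidate only when it scores higher. It is not the notebook's method or TextGrad's. `hill_climb`, `propose`, and `score` are hypothetical names, and the toy scorer stands in for `evaluate_prompts`.

```python
from typing import Callable, Tuple


def hill_climb(
    initial: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    epochs: int = 10,
) -> Tuple[str, float]:
    """Greedy search: accept a proposed prompt only if it beats the best score."""
    best, best_score = initial, score(initial)
    for _ in range(epochs):
        candidate = propose(best)  # always branch from the best prompt so far
        candidate_score = score(candidate)
        if candidate_score > best_score:  # reject candidates that do not improve
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins (assumptions): prefer prompts of length 20, propose by appending "!"
toy_score = lambda prompt: -abs(len(prompt) - 20)
toy_propose = lambda prompt: prompt + "!"

best_prompt, best_value = hill_climb("Summarize: {text}", toy_propose, toy_score)
print(best_prompt, best_value)
```

With an LLM judge the score is noisy, so an accepted improvement may be due to chance. Scoring on more rows or repeating the evaluation would make the comparison more reliable.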





In [42]:
import os
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.engine import get_engine

# Set Groq API key
os.environ["GROQ_API_KEY"] = "your_groq_api_key_here"

# Initialize models
summary_model = "llama3:8b"
llm_summarizer = ChatOllama(model=summary_model)  # `use_gpu` is not a ChatOllama field; Ollama manages GPU use itself

# Set up Groq engine for textgrad
engine = get_engine("groq-llama3-70b-8192")
tg.set_backward_engine(engine, override=True)

def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_prompts(llm_summarizer, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        
        eval_system_prompt = "You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else."
        eval_human_prompt = f"Generated: {summary}\nReference: {row['label']}\n\nScore:"
        
        eval_messages = [
            {"role": "system", "content": eval_system_prompt},
            {"role": "user", "content": eval_human_prompt}
        ]
        
        eval_response = llm_summarizer.invoke(eval_messages)
        response_text = eval_response.content.strip()
        
        match = re.search(r'\d+(\.\d+)?', response_text)
        if match:
            try:
                score = float(match.group())
                scores.append(score)
            except ValueError:
                scores.append(0)
        else:
            scores.append(0)
    
    return np.mean(scores)

def compute_gradient(prompt, loss):
    # Example: compute gradient based on loss
    # In practice, implement logic to reflect how prompt changes affect loss
    # This is a placeholder for illustration purposes
    return np.random.normal(0, 0.1, size=len(prompt))

def update_prompt(prompt, gradient, learning_rate):
    # Placeholder: the gradient is not used yet; a fixed marker is appended instead
    adjustment = f" (Adjusted by {learning_rate:.4f})"
    new_prompt = prompt + adjustment
    return new_prompt

def optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10, learning_rate=0.01):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        current_score = evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df)
        
        if current_score > best_score:
            best_score = current_score
            best_system_prompt = system_prompt.value
            best_human_prompt = human_prompt.value
        
        print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")

        # Compute loss
        loss = 1 - current_score
        loss_var = Variable(str(loss), role_description="Loss for optimization")
        
        # Calculate gradients for system and human prompts
        system_gradient = compute_gradient(system_prompt.value, loss_var)
        human_gradient = compute_gradient(human_prompt.value, loss_var)
        
        # Update prompts using gradients
        system_prompt.value = update_prompt(system_prompt.value, system_gradient, learning_rate)
        human_prompt.value = update_prompt(human_prompt.value, human_gradient, learning_rate)
        
        # Print prompt changes
        print(f"Updated System Prompt: {system_prompt.value[:100]}...")
        print(f"Updated Human Prompt: {human_prompt.value[:100]}...")
        print("-" * 50)
    
    return best_system_prompt, best_human_prompt, best_score

def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10, learning_rate=0.01)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

# Sample dataframe for testing
def create_sample_dataframe():
    data = {
        'content': [
            "เศรษฐกิจไทยในไตรมาสที่ 2 ปี 2566 เติบโต 1.8% เทียบกับช่วงเดียวกันของปีก่อน โดยได้แรงหนุนจากการบริโภคภาคเอกชนและการส่งออกบริการที่ฟื้นตัว แม้การส่งออกสินค้าจะหดตัว",
            "ธนาคารแห่งประเทศไทยคงอัตราดอกเบี้ยนโยบายที่ 2.25% ในการประชุมเมื่อวันที่ 27 กันยายน 2566 โดยให้เหตุผลว่าเศรษฐกิจไทยมีแนวโน้มฟื้นตัวต่อเนื่อง แม้จะเผชิญความเสี่ยงจากเศรษฐกิจโลก"
        ],
        'label': [
            "เศรษฐกิจไทย Q2/2566 โต 1.8% จากการบริโภคเอกชนและการส่งออกบริการฟื้นตัว",
            "ธปท. คงดอกเบี้ย 2.25% เห็นเศรษฐกิจฟื้นต่อเนื่องแม้มีความเสี่ยงจากเศรษฐกิจโลก"
        ]
    }
    return pd.DataFrame(data)

if __name__ == "__main__":
    df = create_sample_dataframe()
    run_optimization(df)


Optimizing prompts:  10%|█         | 1/10 [00:33<05:01, 33.50s/it]

Epoch 1/10 | Current Score: 0.8500 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  20%|██        | 2/10 [01:03<04:12, 31.53s/it]

Epoch 2/10 | Current Score: 0.8350 | Best Score: 0.8500
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  30%|███       | 3/10 [01:42<04:03, 34.84s/it]

Epoch 3/10 | Current Score: 0.8625 | Best Score: 0.8625
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  40%|████      | 4/10 [02:15<03:25, 34.29s/it]

Epoch 4/10 | Current Score: 0.7650 | Best Score: 0.8625
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  50%|█████     | 5/10 [02:48<02:47, 33.51s/it]

Epoch 5/10 | Current Score: 0.7500 | Best Score: 0.8625
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  60%|██████    | 6/10 [03:23<02:16, 34.24s/it]

Epoch 6/10 | Current Score: 0.7650 | Best Score: 0.8625
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  70%|███████   | 7/10 [03:53<01:38, 32.70s/it]

Epoch 7/10 | Current Score: 0.8785 | Best Score: 0.8785
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  80%|████████  | 8/10 [04:27<01:06, 33.37s/it]

Epoch 8/10 | Current Score: 0.7750 | Best Score: 0.8785
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  90%|█████████ | 9/10 [05:07<00:35, 35.15s/it]

Epoch 9/10 | Current Score: 0.8535 | Best Score: 0.8785
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts: 100%|██████████| 10/10 [05:39<00:00, 33.91s/it]

Epoch 10/10 | Current Score: 0.7950 | Best Score: 0.8785
Updated System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.010...
Updated Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------

Best System Prompt: You are an AI specialized in extractive summarization for economic news articles. (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100)

Best Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text} (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100) (Adjusted by 0.0100)

Best Score: 0.8785
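
In the run above the prompts only ever gain a literal ` (Adjusted by 0.0100)` suffix, yet the score moves between 0.7500 and 0.8785. That suggests much of the variation comes from the LLM judge rather than from the prompt. One way to make comparisons less noisy is to average several evaluations and report the spread. The sketch below assumes `score_fn` is any zero-argument callable returning a float; both `score_fn` and `repeated_score` are hypothetical names, not part of the notebook.

```python
import statistics
from typing import Callable, Tuple


def repeated_score(score_fn: Callable[[], float], repeats: int = 3) -> Tuple[float, float]:
    """Call score_fn `repeats` times; return (mean, population std dev)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Each call is expected to run one full evaluation pass.
    samples = [float(score_fn()) for _ in range(repeats)]
    return statistics.fmean(samples), statistics.pstdev(samples)
```

In this notebook it could wrap the existing evaluation, for example `repeated_score(lambda: evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df))`, at the cost of proportionally more model calls per epoch.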





In [10]:
import os
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Set Groq API key (placeholder: supply your own key and never commit a real one)
os.environ["GROQ_API_KEY"] = "<YOUR_GROQ_API_KEY>"

# Initialize models
summary_model = "llama3:8b"  # Model for summarization (Ollama)
llm_summarizer = ChatOllama(model=summary_model, use_gpu=True)

# Set up Groq engine for textgrad
engine = get_engine("groq-llama3-70b-8192")
tg.set_backward_engine(engine, override=True)

# Function to generate summary with given prompts
def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_prompts(llm_summarizer, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        
        eval_system_prompt = "You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else."
        eval_human_prompt = f"Generated: {summary}\nReference: {row['label']}\n\nScore:"
        
        eval_messages = [
            {"role": "system", "content": eval_system_prompt},
            {"role": "user", "content": eval_human_prompt}
        ]
        
        eval_response = llm_summarizer.invoke(eval_messages)
        response_text = eval_response.content.strip()
        
        match = re.search(r'\d+(\.\d+)?', response_text)
        if match:
            try:
                score = float(match.group())
                scores.append(score)
                # Comment out or remove the line below to suppress score printing
                # print(f"Extracted score: {score:.4f}")
            except ValueError:
                print(f"Could not convert to float: {match.group()}")
                scores.append(0)
        else:
            print(f"Could not extract score from: {response_text}")
            scores.append(0)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt])
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        current_score = evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df)
        
        if current_score > best_score:
            best_score = current_score
            best_system_prompt = system_prompt.value
            best_human_prompt = human_prompt.value
        
        # Show only the information we need
        print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
        print(f"Current System Prompt: {system_prompt.value[:100]}...")
        print(f"Current Human Prompt: {human_prompt.value[:100]}...")
        print("-" * 50)
        
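        # Compute gradients and update prompts.
        # Note: loss_var is built from a plain string and has no predecessors,
        # so backward() likely has no path to system_prompt or human_prompt;
        # the optimizer step then rewrites the prompts without score-based feedback.
        loss = 1 - current_score  # We want to minimize this loss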
        loss_var = Variable(str(loss), role_description="Loss for optimization")
        loss_var.backward()
        optimizer.step()
        optimizer.zero_grad()
    
    return best_system_prompt, best_human_prompt, best_score

# run_optimization: runs the prompt optimization
def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

# Sample dataframe for testing
def create_sample_dataframe():
    data = {
        'content': [
            "เศรษฐกิจไทยในไตรมาสที่ 2 ปี 2566 เติบโต 1.8% เทียบกับช่วงเดียวกันของปีก่อน โดยได้แรงหนุนจากการบริโภคภาคเอกชนและการส่งออกบริการที่ฟื้นตัว แม้การส่งออกสินค้าจะหดตัว",
            "ธนาคารแห่งประเทศไทยคงอัตราดอกเบี้ยนโยบายที่ 2.25% ในการประชุมเมื่อวันที่ 27 กันยายน 2566 โดยให้เหตุผลว่าเศรษฐกิจไทยมีแนวโน้มฟื้นตัวต่อเนื่อง แม้จะเผชิญความเสี่ยงจากเศรษฐกิจโลก"
        ],
        'label': [
            "เศรษฐกิจไทย Q2/2566 โต 1.8% จากการบริโภคเอกชนและการส่งออกบริการฟื้นตัว",
            "ธปท. คงดอกเบี้ย 2.25% เห็นเศรษฐกิจฟื้นต่อเนื่องแม้มีความเสี่ยงจากเศรษฐกิจโลก"
        ]
    }
    return pd.DataFrame(data)

if __name__ == "__main__":
    df = create_sample_dataframe()
    run_optimization(df)

Optimizing prompts:  10%|█         | 1/10 [00:46<07:00, 46.71s/it]

Epoch 1/10 | Current Score: 0.8300 | Best Score: 0.8300
Current System Prompt: You are an AI specialized in extractive summarization for economic news articles....
Current Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  20%|██        | 2/10 [01:23<05:29, 41.14s/it]

Epoch 2/10 | Current Score: 0.8250 | Best Score: 0.8300
Current System Prompt: Summarize the main points of an economic news article, focusing on key events, trends, and market im...
Current Human Prompt: Summarize the main points from the following text: {text}...
--------------------------------------------------


Optimizing prompts:  30%|███       | 3/10 [02:46<07:00, 60.04s/it]

Epoch 3/10 | Current Score: 0.9250 | Best Score: 0.9250
Current System Prompt: Summarize the key events, trends, and market impacts discussed in an economic news article, highligh...
Current Human Prompt: Please summarize the key information and main ideas from the provided text: {text}...
--------------------------------------------------


Optimizing prompts:  40%|████      | 4/10 [04:23<07:28, 74.74s/it]

Epoch 4/10 | Current Score: 0.8700 | Best Score: 0.9250
Current System Prompt: Summarize the main points, key trends, and significant market implications discussed in the news art...
Current Human Prompt: Summarize the main points and key takeaways from the following text: {text}...
--------------------------------------------------


Optimizing prompts:  50%|█████     | 5/10 [05:54<06:41, 80.34s/it]

Epoch 5/10 | Current Score: 0.8400 | Best Score: 0.9250
Current System Prompt: Summarize the news article, focusing on the most critical economic points, key trends, and significa...
Current Human Prompt: What are the key points and main takeaways from the following text: {text}?...
--------------------------------------------------


Optimizing prompts:  60%|██████    | 6/10 [07:27<05:38, 84.66s/it]

Epoch 6/10 | Current Score: 0.7750 | Best Score: 0.9250
Current System Prompt: Summarize the news article, highlighting the most critical economic points, their relevance, and imp...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------


Optimizing prompts:  70%|███████   | 7/10 [08:36<03:58, 79.53s/it]

Epoch 7/10 | Current Score: 0.8300 | Best Score: 0.9250
Current System Prompt: Summarize the news article, focusing on the most critical economic points, their relevance to the br...
Current Human Prompt: What are the main ideas and key points from the following text: {text}?...
--------------------------------------------------


Optimizing prompts:  80%|████████  | 8/10 [09:24<02:18, 69.49s/it]

Epoch 8/10 | Current Score: 0.7750 | Best Score: 0.9250
Current System Prompt: Summarize the news article, extracting the most critical economic information and highlighting key p...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------


Optimizing prompts:  90%|█████████ | 9/10 [10:05<01:00, 60.76s/it]

Epoch 9/10 | Current Score: 0.7900 | Best Score: 0.9250
Current System Prompt: Extract the most critical economic information from the news article, focusing on key points and con...
Current Human Prompt: What are the main ideas and key points from the following text: {text}?...
--------------------------------------------------


Optimizing prompts: 100%|██████████| 10/10 [10:43<00:00, 64.33s/it]

Epoch 10/10 | Current Score: 0.8600 | Best Score: 0.9250
Current System Prompt: Extract the most critical economic information from the news article, focusing on key points and omi...
Current Human Prompt: Can you summarize the main ideas and key points from the following text: {text}?...
--------------------------------------------------

Best System Prompt: Summarize the key events, trends, and market impacts discussed in an economic news article, highlighting their significance and relevance to the economy.

Best Human Prompt: Please summarize the key information and main ideas from the provided text: {text}

Best Score: 0.9250
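
The evaluation code above takes the first number found in the judge's reply and uses it as the score. A reply such as `8/10` would therefore be recorded as 8.0 and inflate the mean. The sketch below is one possible stricter parser, under the assumption that out-of-range values should be rejected rather than kept. `parse_score` is a hypothetical helper, and returning `None` differs from the cells above, which append 0 when no number is found.

```python
import re
from typing import Optional

# Same pattern as the notebook, with a non-capturing group.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_score(reply: str) -> Optional[float]:
    """Return the first number in `reply` if it lies in [0, 1], else None."""
    match = _NUMBER.search(reply)
    if match is None:
        return None
    value = float(match.group())
    # Reject values outside the range the judge was asked to use.
    return value if 0.0 <= value <= 1.0 else None
```

A caller can then decide whether a `None` result should be skipped, retried, or counted as 0.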





### Ollama engine

In [9]:
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Initialize models
summary_model = "llama3:8b"  # Model for summarization (Ollama)
llm_summarizer = ChatOllama(model=summary_model, use_gpu=True)

# Set up Ollama engine for textgrad
ollama_engine = Ollama(model="llama3.1:8b")
tg.set_backward_engine(ollama_engine, override=True)

# Function to generate summary with given prompts
def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_prompts(llm_summarizer, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        
        eval_system_prompt = "You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else."
        eval_human_prompt = f"Generated: {summary}\nReference: {row['label']}\n\nScore:"
        
        eval_messages = [
            {"role": "system", "content": eval_system_prompt},
            {"role": "user", "content": eval_human_prompt}
        ]
        
        eval_response = llm_summarizer.invoke(eval_messages)
        response_text = eval_response.content.strip()
        
        match = re.search(r'\d+(\.\d+)?', response_text)
        if match:
            try:
                score = float(match.group())
                scores.append(score)
            except ValueError:
                print(f"Could not convert to float: {match.group()}")
                scores.append(0)
        else:
            print(f"Could not extract score from: {response_text}")
            scores.append(0)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt])
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        current_score = evaluate_prompts(llm_summarizer, system_prompt.value, human_prompt.value, df)
        
        if current_score > best_score:
            best_score = current_score
            best_system_prompt = system_prompt.value
            best_human_prompt = human_prompt.value
        
        print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
        print(f"Current System Prompt: {system_prompt.value[:100]}...")
        print(f"Current Human Prompt: {human_prompt.value[:100]}...")
        print("-" * 50)
        
        # Compute gradients and update prompts
        loss = 1 - current_score  # We want to minimize this loss
        loss_var = Variable(str(loss), role_description="Loss for optimization")
        loss_var.backward()
        optimizer.step()
        optimizer.zero_grad()
    
    return best_system_prompt, best_human_prompt, best_score

# Function to run optimization
def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, initial_system_prompt, initial_human_prompt, df, epochs=10)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

# Sample dataframe for testing
def create_sample_dataframe():
    data = {
        'content': [
            "เศรษฐกิจไทยในไตรมาสที่ 2 ปี 2566 เติบโต 1.8% เทียบกับช่วงเดียวกันของปีก่อน โดยได้แรงหนุนจากการบริโภคภาคเอกชนและการส่งออกบริการที่ฟื้นตัว แม้การส่งออกสินค้าจะหดตัว",
            "ธนาคารแห่งประเทศไทยคงอัตราดอกเบี้ยนโยบายที่ 2.25% ในการประชุมเมื่อวันที่ 27 กันยายน 2566 โดยให้เหตุผลว่าเศรษฐกิจไทยมีแนวโน้มฟื้นตัวต่อเนื่อง แม้จะเผชิญความเสี่ยงจากเศรษฐกิจโลก"
        ],
        'label': [
            "เศรษฐกิจไทย Q2/2566 โต 1.8% จากการบริโภคเอกชนและการส่งออกบริการฟื้นตัว",
            "ธปท. คงดอกเบี้ย 2.25% เห็นเศรษฐกิจฟื้นต่อเนื่องแม้มีความเสี่ยงจากเศรษฐกิจโลก"
        ]
    }
    return pd.DataFrame(data)

if __name__ == "__main__":
    df = create_sample_dataframe()
    run_optimization(df)

Optimizing prompts:   0%|          | 0/10 [00:00<?, ?it/s]

Epoch 1/10 | Current Score: 0.7500 | Best Score: 0.7500
Current System Prompt: You are an AI specialized in extractive summarization for economic news articles....
Current Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  10%|█         | 1/10 [00:50<07:30, 50.09s/it]

Epoch 2/10 | Current Score: 0.5600 | Best Score: 0.7500
Current System Prompt: You are an AI designed to analyze economic news articles and generate concise summaries of the key p...
Current Human Prompt: Summarize the provided text by identifying the main ideas and key points, focusing on brevity and cl...
--------------------------------------------------


Optimizing prompts:  20%|██        | 2/10 [01:30<05:55, 44.40s/it]

Epoch 3/10 | Current Score: 0.5600 | Best Score: 0.7500
Current System Prompt: You are an AI designed to provide concise and accurate summaries of economic news articles, extracti...
Current Human Prompt: Summarize the provided text by condensing its key points into a concise summary that retains its ess...
--------------------------------------------------


Optimizing prompts:  30%|███       | 3/10 [02:10<04:57, 42.43s/it]

Epoch 4/10 | Current Score: 0.7600 | Best Score: 0.7600
Current System Prompt: You are an AI designed to distill complex information into concise summaries, presenting key points ...
Current Human Prompt: Condense the provided text into a summary that captures its main points, key ideas, and essential in...
--------------------------------------------------


Optimizing prompts:  40%|████      | 4/10 [02:50<04:08, 41.47s/it]

Epoch 5/10 | Current Score: 0.6400 | Best Score: 0.7600
Current System Prompt: You are an AI designed to provide concise overviews of complex information, highlighting the most es...
Current Human Prompt: Condense the provided text into a concise summary that accurately captures its main points, while re...
--------------------------------------------------


Optimizing prompts:  50%|█████     | 5/10 [03:33<03:29, 41.89s/it]

Epoch 6/10 | Current Score: 0.5850 | Best Score: 0.7600
Current System Prompt: You are an AI designed to distill complex information into bite-sized summaries that capture the ess...
Current Human Prompt: Provide a concise summary that distills the main points from the original text into 2-3 key takeaway...
--------------------------------------------------


Optimizing prompts:  60%|██████    | 6/10 [04:17<02:50, 42.54s/it]

Epoch 7/10 | Current Score: 0.6550 | Best Score: 0.7600
Current System Prompt: You are an AI designed to quickly summarize complex information into key points, making it easier fo...
Current Human Prompt: Provide a concise summary that captures the essence of the original text by extracting 2-3 key point...
--------------------------------------------------


Optimizing prompts:  70%|███████   | 7/10 [04:57<02:05, 41.93s/it]

Epoch 8/10 | Current Score: 0.6900 | Best Score: 0.7600
Current System Prompt: You're an AI that simplifies complex info, helping users stay informed on diverse topics by providin...
Current Human Prompt: Provide a concise summary that distills the main points of the text, highlighting key information an...
--------------------------------------------------


Optimizing prompts:  80%|████████  | 8/10 [05:38<01:23, 41.61s/it]

Epoch 9/10 | Current Score: 0.4050 | Best Score: 0.7600
Current System Prompt: We help users quickly grasp complex information by providing clear, concise summaries across various...
Current Human Prompt: Provide a concise summary that distills the main points of the provided text, while preserving its o...
--------------------------------------------------


Optimizing prompts:  90%|█████████ | 9/10 [06:20<00:41, 41.57s/it]

Epoch 10/10 | Current Score: 0.4050 | Best Score: 0.7600
Current System Prompt: We help users quickly understand complex topics with personalized summaries....
Current Human Prompt: Condense the content into a clear and concise summary that captures its key ideas, emotional resonan...
--------------------------------------------------


Optimizing prompts: 100%|██████████| 10/10 [07:01<00:00, 42.18s/it]


Best System Prompt: You are an AI designed to distill complex information into concise summaries, presenting key points in a clear and easy-to-understand format for users.

Best Human Prompt: Condense the provided text into a summary that captures its main points, key ideas, and essential information, ensuring the summary is accurate and retains the original's significance.

Best Score: 0.7600
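
The best human prompt reported above no longer contains the `{text}` placeholder, so the article is presumably never substituted into the message sent to the summarizer. A guard that checks for the placeholder before accepting a rewritten prompt could catch this. The sketch below uses only the standard library; `has_placeholder` is a hypothetical helper, not part of TextGrad or LangChain.

```python
from string import Formatter


def has_placeholder(template: str, name: str = "text") -> bool:
    """Return True if `template` contains a `{name}` replacement field."""
    try:
        fields = [field for _, field, _, _ in Formatter().parse(template)]
    except ValueError:
        # Unbalanced braces make the template unparseable as a format string.
        return False
    return name in fields
```

Inside the optimization loop, a rewritten `human_prompt.value` that fails this check could be discarded in favour of the previous value.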





In [8]:
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Initialize models
summary_model = "llama3:8b"  # Model for summarization (Ollama)
llm_summarizer = ChatOllama(model=summary_model, use_gpu=True)

# Set up Ollama engine for textgrad (evaluation and prompt optimization)
ollama_engine = Ollama(model="llama3.1:8b")
tg.set_backward_engine(ollama_engine, override=True)

# Function to generate summary with given prompts (using llm_summarizer)
def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

# Function to evaluate summaries using Ollama engine
def evaluate_summary(ollama_engine, generated_summary, reference_summary):
    eval_prompt = f"""You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Only output the score as a number, nothing else.

Generated: {generated_summary}
Reference: {reference_summary}

Score:"""
    
    response = ollama_engine(eval_prompt)
    match = re.search(r'\d+(\.\d+)?', response)
    if match:
        try:
            return float(match.group())
        except ValueError:
            print(f"Could not convert to float: {match.group()}")
            return 0
    else:
        print(f"Could not extract score from: {response}")
        return 0

def evaluate_prompts(llm_summarizer, ollama_engine, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['content'])
        score = evaluate_summary(ollama_engine, summary, row['label'])
        scores.append(score)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs=10):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt])
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        try:
            current_score = evaluate_prompts(llm_summarizer, ollama_engine, system_prompt.value, human_prompt.value, df)
        
            if current_score > best_score:
                best_score = current_score
                best_system_prompt = system_prompt.value
                best_human_prompt = human_prompt.value
        
            print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
            print(f"Current System Prompt: {system_prompt.value[:100]}...")
            print(f"Current Human Prompt: {human_prompt.value[:100]}...")
            print("-" * 50)
        
            # Compute gradients and update prompts using Ollama engine
            loss = 1 - current_score  # We want to minimize this loss
            loss_var = Variable(str(loss), role_description="Loss for optimization")
            loss_var.backward()
            optimizer.step()
            optimizer.zero_grad()
        except IndexError as e:
            print(f"IndexError occurred: {e}")
            print(f"Current system prompt: {system_prompt.value}")
            print(f"Current human prompt: {human_prompt.value}")
    
    return best_system_prompt, best_human_prompt, best_score

# Function to run optimization
def run_optimization(df):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article in Thai, focusing on the main points. Summarize the following text: {text}"""
    
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs=10)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

# Sample dataframe for testing
def create_sample_dataframe():
    data = {
        'content': [
            "เศรษฐกิจไทยในไตรมาสที่ 2 ปี 2566 เติบโต 1.8% เทียบกับช่วงเดียวกันของปีก่อน โดยได้แรงหนุนจากการบริโภคภาคเอกชนและการส่งออกบริการที่ฟื้นตัว แม้การส่งออกสินค้าจะหดตัว",
            "ธนาคารแห่งประเทศไทยคงอัตราดอกเบี้ยนโยบายที่ 2.25% ในการประชุมเมื่อวันที่ 27 กันยายน 2566 โดยให้เหตุผลว่าเศรษฐกิจไทยมีแนวโน้มฟื้นตัวต่อเนื่อง แม้จะเผชิญความเสี่ยงจากเศรษฐกิจโลก"
        ],
        'label': [
            "เศรษฐกิจไทย Q2/2566 โต 1.8% จากการบริโภคเอกชนและการส่งออกบริการฟื้นตัว",
            "ธปท. คงดอกเบี้ย 2.25% เห็นเศรษฐกิจฟื้นต่อเนื่องแม้มีความเสี่ยงจากเศรษฐกิจโลก"
        ]
    }
    return pd.DataFrame(data)

if __name__ == "__main__":
    df = create_sample_dataframe()
    run_optimization(df)

Optimizing prompts:   0%|          | 0/10 [00:00<?, ?it/s]

Epoch 1/10 | Current Score: 0.9200 | Best Score: 0.9200
Current System Prompt: You are an AI specialized in extractive summarization for economic news articles....
Current Human Prompt: Your task is to summarize the key content from the given news article in Thai, focusing on the main ...
--------------------------------------------------


Optimizing prompts:  10%|█         | 1/10 [01:17<11:34, 77.14s/it]

Epoch 2/10 | Current Score: 0.2750 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to read and summarize economic news articles by extracting key points, ...
Current Human Prompt: Your task is to condense the main ideas from the provided text into a concise summary, highlighting ...
--------------------------------------------------


Optimizing prompts:  20%|██        | 2/10 [02:20<09:10, 68.85s/it]

Epoch 3/10 | Current Score: 0.0100 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to read and summarize key information from various sources, providing a...
Current Human Prompt: Your task is to distill the essential information from the provided summary, focusing on the key tak...
--------------------------------------------------


Optimizing prompts:  30%|███       | 3/10 [03:30<08:05, 69.36s/it]

Epoch 4/10 | Current Score: 0.2000 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to read and summarize a wide range of content, from news articles and r...
Current Human Prompt: Your task is to distill the essential information from a given text or summary, focusing on key poin...
--------------------------------------------------


Optimizing prompts:  40%|████      | 4/10 [04:40<06:58, 69.75s/it]

Epoch 5/10 | Current Score: 0.0850 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to provide concise and accurate summaries of complex information, stayi...
Current Human Prompt: Your task is to condense a lengthy text into a brief summary, focusing on the most important points ...
--------------------------------------------------


Optimizing prompts:  50%|█████     | 5/10 [05:42<05:35, 67.08s/it]

Epoch 6/10 | Current Score: 0.2650 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to provide concise and actionable information, distilling complex ideas...
Current Human Prompt: Your task is to reduce a lengthy text into a concise summary, highlighting the most crucial informat...
--------------------------------------------------


Optimizing prompts:  60%|██████    | 6/10 [06:46<04:23, 65.94s/it]

Epoch 7/10 | Current Score: 0.1550 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to distill complex information into concise summaries that pinpoint key...
Current Human Prompt: Your task is to condense a lengthy text into a concise summary that captures the main points and key...
--------------------------------------------------


Optimizing prompts:  70%|███████   | 7/10 [07:43<03:09, 63.06s/it]

Epoch 8/10 | Current Score: 0.2500 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to extract key points from lengthy texts, condensing complex informatio...
Current Human Prompt: Your task is to shorten a lengthy text into a brief summary that highlights the primary points and e...
--------------------------------------------------


Optimizing prompts:  80%|████████  | 8/10 [09:02<02:16, 68.18s/it]

Epoch 9/10 | Current Score: 0.6850 | Best Score: 0.9200
Current System Prompt: You are an AI model designed to condense complex information into concise summaries, distilling esse...
Current Human Prompt: Summarize a lengthy text into 1-2 concise paragraphs that capture its main points and key takeaways,...
--------------------------------------------------


Optimizing prompts:  90%|█████████ | 9/10 [10:33<01:15, 75.11s/it]

Epoch 10/10 | Current Score: 0.9650 | Best Score: 0.9650
Current System Prompt: You are an AI model designed to craft concise summaries of complex information, boiling down lengthy...
Current Human Prompt: Summarize a lengthy text into 1-2 concise paragraphs that capture the main idea, key points, and ess...
--------------------------------------------------


Optimizing prompts: 100%|██████████| 10/10 [12:38<00:00, 75.80s/it]


Best System Prompt: You are an AI model designed to craft concise summaries of complex information, boiling down lengthy content into easily digestible, essential points that highlight key takeaways.

Best Human Prompt: Summarize a lengthy text into 1-2 concise paragraphs that capture the main idea, key points, and essential information, while avoiding unnecessary details and focusing on clarity and readability. Be sure to highlight any statistics, quotes, or crucial data mentioned in the original text.

Best Score: 0.9650
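
The log above shows the score dropping from 0.92 to 0.01 and only recovering in the last epoch, because each epoch rewrites whatever prompt is current, not the best one found so far. A minimal sketch of a keep-best variant follows. It assumes two hypothetical callables, `propose_fn` (any rewrite step) and `score_fn` (any evaluator); neither is part of textgrad or LangChain, and toy stand-ins are used so the block runs without a model.

```python
from typing import Callable, List, Tuple


def keep_best_search(
    initial_prompt: str,
    propose_fn: Callable[[str], str],
    score_fn: Callable[[str], float],
    epochs: int,
) -> Tuple[str, float, List[float]]:
    """Greedy search: every candidate is proposed from the best prompt so far."""
    best_prompt = initial_prompt
    best_score = score_fn(initial_prompt)
    history = [best_score]
    for _ in range(epochs):
        candidate = propose_fn(best_prompt)  # hypothetical rewrite step
        candidate_score = score_fn(candidate)
        history.append(candidate_score)
        if candidate_score > best_score:  # accept only improvements
            best_prompt, best_score = candidate, candidate_score
        # A rejected candidate is discarded; the next proposal starts
        # again from best_prompt.
    return best_prompt, best_score, history


# Toy stand-ins (assumptions for illustration only):
# the "rewrite" appends a character, the "score" peaks at 12 characters.
def toy_propose(prompt: str) -> str:
    return prompt + "!"


def toy_score(prompt: str) -> float:
    return 1.0 / (1.0 + abs(len(prompt) - 12))


best_prompt, best_score, history = keep_best_search(
    "Summarize:", toy_propose, toy_score, epochs=5
)
print(best_prompt, best_score, history)
```

With an LLM judge the scores are noisy, so a real run would probably need several evaluations per candidate before accepting or rejecting it.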





In [12]:
!ollama list

NAME           ID              SIZE      MODIFIED   
llama3.1:8b    42182419e950    4.7 GB    6 days ago    
llama3:8b      365c0bd3c000    4.7 GB    8 days ago    
qwen2:7b       dd314f039b9d    4.4 GB    9 days ago    
gemma2:9b      ff02c3702f32    5.4 GB    9 days ago    


In [15]:
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Initialize models
summary_model = "gemma2:9b"
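# `use_gpu` is not a ChatOllama field and is silently dropped; GPU offload is
# decided by the Ollama server (the `num_gpu` option can override it).
llm_summarizer = ChatOllama(model=summary_model)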

ollama_engine = Ollama(model="llama3.1:8b")
tg.set_backward_engine(ollama_engine, override=True)

def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_summary(ollama_engine, generated_summary, reference_summary):
    eval_prompt = f"""You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Consider factors such as content coverage, conciseness, and clarity. Explain your reasoning briefly, then provide the score.

Generated: {generated_summary}
Reference: {reference_summary}

Explanation and Score:"""
    
    response = ollama_engine(eval_prompt)
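    # Take the number that follows the word "score"; the first number in the
    # reply can be unrelated (e.g. the "19" in "COVID-19" or a list index).
    match = re.search(r'score\D*(\d+(?:\.\d+)?)', response, re.IGNORECASE)
    if match:
        score = float(match.group(1))
        print(f"Evaluation explanation: {response}")
        # Clamp to the requested 0-1 range in case the judge uses another scale.
        return min(max(score, 0.0), 1.0)
    else:
        print(f"Could not extract score from: {response}")
        return 0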

def evaluate_prompts(llm_summarizer, ollama_engine, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['generated_summary'])
        score = evaluate_summary(ollama_engine, summary, row['reference_summary'])
        scores.append(score)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt]) 
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        try:
            current_score = evaluate_prompts(llm_summarizer, ollama_engine, system_prompt.value, human_prompt.value, df)
        
            if current_score > best_score:
                best_score = current_score
                best_system_prompt = system_prompt.value
                best_human_prompt = human_prompt.value
        
            print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
            print(f"Current System Prompt: {system_prompt.value[:100]}...")
            print(f"Current Human Prompt: {human_prompt.value[:100]}...")
            print("-" * 50)
        
            loss = 1 - current_score
            loss_var = Variable(str(loss), role_description="Loss for optimization")
            loss_var.backward()
            optimizer.step()
            optimizer.zero_grad()
            
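            # Perturb the prompts to avoid local optima. `epoch % 5 == 0` also fires
            # on the first epoch (epoch 0), and the raw LLM reply is appended as-is.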
            if epoch % 5 == 0:
                system_prompt.value += " " + ollama_engine("Generate a short, relevant phrase to add to a summarization system prompt.")
                human_prompt.value += " " + ollama_engine("Generate a short, relevant phrase to add to a summarization human prompt.")
        
        except Exception as e:
            print(f"Error occurred: {e}")
            print(f"Current system prompt: {system_prompt.value}")
            print(f"Current human prompt: {human_prompt.value}")
    
    return best_system_prompt, best_human_prompt, best_score

def read_csv_file(file_path):
    df = pd.read_csv(file_path)
    
    # Select only the required columns
    df = df[['sum_extractive', 'extractive']]
    
    # Rename the columns for clarity
    df.columns = ['generated_summary', 'reference_summary']
    
    return df

def run_optimization(file_path):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}"""
    
    df = read_csv_file(file_path)
    best_system_prompt, best_human_prompt, best_score = optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs=4)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")

if __name__ == "__main__":
    file_path = r'F:\AI\Super AI SS4\Level 3 - INTERN\Jupyter Notebook\Latest-Dataset-Model-Generate\Edit-Prompt\final\Gemma2-final-output.csv'
    run_optimization(file_path)

Optimizing prompts:   0%|          | 0/4 [00:00<?, ?it/s]

Evaluation explanation: **Score:** 0.85 (very good)

The generated summary is well-structured and covers the main points of the reference summary, including:

1. The total fund-raising in the Thai stock market for 2020 (approximately 500 billion THB).
2. The strength of listed companies in Thailand, which were able to overcome the COVID-19 pandemic.
3. The launch of Initial Public Offering (IPO) by several companies, including SCGP and KEX.
4. The performance of SCGP, which saw a significant increase in stock price.

However, there are some minor differences between the generated summary and the reference summary:

1. The number of IPOs mentioned is different (22 vs 25).
2. The market capitalization of SCGP at IPO is not explicitly stated in the generated summary.
3. Some details about other companies, such as PTT and KEX, are missing or differ slightly.

Overall, the generated summary provides a good overview of the key points, but could be improved with more specific details and accu

Optimizing prompts:  25%|██▌       | 1/4 [45:51<2:17:33, 2751.04s/it]

Evaluation explanation: The generated summary is a brief statement that responds to the prompt, but it fails to capture the essence of the reference summary. The content coverage is extremely low, as it only mentions "adding a phrase" without any context or relevance to the topic discussed in the reference summary.

The generated summary also lacks clarity and conciseness, making it difficult to understand what it's trying to convey. It appears to be unrelated to the topic of fundraising in the Thai stock market, which is the main focus of the reference summary.

In contrast, the reference summary provides a clear and concise overview of the topic, highlighting key statistics, quotes from an official, and specific examples of companies that have gone public or are planning to do so. It also touches on the impact of COVID-19 on the stock market and the growing trend of online trading.

Based on this analysis, I would give the generated summary a score of 0 out of 1, indicating that it i

Optimizing prompts:  25%|██▌       | 1/4 [57:50<2:53:31, 3470.41s/it]


KeyboardInterrupt: 
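
Judge replies like the second one logged above mention other numbers before the score: "COVID-19" comes before "a score of 0 out of 1", so taking the first number in the reply reads 19, not 0. The sketch below compares the two extraction strategies on a shortened copy of that reply. The helper names `first_number` and `score_after_keyword` are hypothetical, and the keyword heuristic is only one possible approach.

```python
import re
from typing import Optional

NUMBER = r"\d+(?:\.\d+)?"


def first_number(reply: str) -> Optional[float]:
    """Return the first number anywhere in the reply, or None."""
    match = re.search(NUMBER, reply)
    return float(match.group()) if match else None


def score_after_keyword(reply: str) -> Optional[float]:
    """Return the first number after the word 'score', clamped to [0, 1]."""
    match = re.search(r"score\D*(" + NUMBER + ")", reply, re.IGNORECASE)
    if match is None:
        return None
    value = float(match.group(1))
    return min(max(value, 0.0), 1.0)


# Shortened from the judge reply in the log above.
reply = (
    "It also touches on the impact of COVID-19 on the stock market. "
    "I would give the generated summary a score of 0 out of 1."
)
naive = first_number(reply)
keyed = score_after_keyword(reply)
print(naive, keyed)
```

A stricter option would be to ask the judge to end its reply with a fixed line such as `SCORE: <number>` and parse only that line. That requires changing the evaluation prompt as well.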

In [16]:
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Initialize models
summary_model = "gemma2:9b"
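# `use_gpu` is not a ChatOllama field and is silently dropped; GPU offload is
# decided by the Ollama server (the `num_gpu` option can override it).
llm_summarizer = ChatOllama(model=summary_model)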

ollama_engine = Ollama(model="llama3.1:8b")
tg.set_backward_engine(ollama_engine, override=True)

def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_summary(ollama_engine, generated_summary, reference_summary):
    eval_prompt = f"""You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Consider factors such as content coverage, conciseness, and clarity. Explain your reasoning briefly, then provide the score.

Generated: {generated_summary}
Reference: {reference_summary}

Explanation and Score:"""
    
    response = ollama_engine(eval_prompt)
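    # Take the number that follows the word "score"; the first number in the
    # reply can be unrelated (e.g. the "19" in "COVID-19" or a list index).
    match = re.search(r'score\D*(\d+(?:\.\d+)?)', response, re.IGNORECASE)
    if match:
        score = float(match.group(1))
        print(f"Evaluation explanation: {response}")
        # Clamp to the requested 0-1 range in case the judge uses another scale.
        return min(max(score, 0.0), 1.0)
    else:
        print(f"Could not extract score from: {response}")
        return 0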

def evaluate_prompts(llm_summarizer, ollama_engine, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['generated_summary'])
        score = evaluate_summary(ollama_engine, summary, row['reference_summary'])
        scores.append(score)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt]) 
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    # Keep track of all prompts
    prompt_history = []

    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        try:
            current_score = evaluate_prompts(llm_summarizer, ollama_engine, system_prompt.value, human_prompt.value, df)
        
            # Save the prompts of the current epoch
            prompt_history.append({
                "epoch": epoch+1,
                "system_prompt": system_prompt.value,
                "human_prompt": human_prompt.value,
                "score": current_score
            })
        
            if current_score > best_score:
                best_score = current_score
                best_system_prompt = system_prompt.value
                best_human_prompt = human_prompt.value
        
            print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
            print(f"Current System Prompt: {system_prompt.value[:100]}...")
            print(f"Current Human Prompt: {human_prompt.value[:100]}...")
            print("-" * 50)
        
            loss = 1 - current_score
            loss_var = Variable(str(loss), role_description="Loss for optimization")
            loss_var.backward()
            optimizer.step()
            optimizer.zero_grad()
            
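            # Perturb the prompts to avoid local optima. `epoch % 5 == 0` also fires
            # on the first epoch (epoch 0), and the raw LLM reply is appended as-is.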
            if epoch % 5 == 0:
                system_prompt.value += " " + ollama_engine("Generate a short, relevant phrase to add to a summarization system prompt.")
                human_prompt.value += " " + ollama_engine("Generate a short, relevant phrase to add to a summarization human prompt.")
        
        except Exception as e:
            print(f"Error occurred: {e}")
            print(f"Current system prompt: {system_prompt.value}")
            print(f"Current human prompt: {human_prompt.value}")
    
    return best_system_prompt, best_human_prompt, best_score, prompt_history

def read_csv_file(file_path):
    df = pd.read_csv(file_path)
    
    # Select only the required columns
    df = df[['sum_extractive', 'extractive']]
    
    # Rename the columns for clarity
    df.columns = ['generated_summary', 'reference_summary']
    
    return df

def run_optimization(file_path):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}"""
    
    df = read_csv_file(file_path)
    best_system_prompt, best_human_prompt, best_score, prompt_history = optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs=4)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")
    
    # Print full prompt history
    print("\nPrompt History:")
    for entry in prompt_history:
        print(f"Epoch {entry['epoch']}:")
        print(f"System Prompt: {entry['system_prompt']}")
        print(f"Human Prompt: {entry['human_prompt']}")
        print(f"Score: {entry['score']:.4f}")
        print("-" * 50)
    
    return prompt_history

if __name__ == "__main__":
    file_path = r'F:\AI\Super AI SS4\Level 3 - INTERN\Jupyter Notebook\Latest-Dataset-Model-Generate\Edit-Prompt\final\Gemma2-final-output.csv'
    optimize = run_optimization(file_path)
    print(optimize)

Optimizing prompts:   0%|          | 0/4 [00:00<?, ?it/s]

Evaluation explanation: To evaluate the quality of the generated summary, I will compare it with the reference summary. Here's my reasoning:

**Content Coverage:** The generated summary covers some key points from the original text, such as the total fundraising in the Thai stock market, the strength of Thai listed companies during the COVID-19 pandemic, and the popularity of certain IPOs. However, it lacks details and specific examples compared to the reference summary.

**Conciseness:** Both summaries are concise, but the generated summary is slightly more condensed and might miss important points. The reference summary provides a better balance between conciseness and content coverage.

**Clarity:** The clarity of the generated summary is good, but it could be improved by adding transitional phrases or words to connect ideas.

Considering these factors, I would give the generated summary a score of **0.65** out of 1. While it covers some essential points, it lacks depth and detail c

Optimizing prompts:  25%|██▌       | 1/4 [45:21<2:16:05, 2721.81s/it]

Evaluation explanation: **Explanation**

The generated summary is concise, but it does not accurately capture the essence of the reference article. The content coverage is limited to a few sentences about the market performance and new listings on the Thai stock exchange. However, there is no mention of the impact of COVID-19, the growth of online trading, or the specific companies mentioned in the reference article.

The clarity of the generated summary is also a concern, as it does not provide enough context for the reader to understand the significance of the market figures and company listings. The tone is somewhat passive, with no clear conclusion or takeaways from the data presented.

**Score: 0.2/1**

This score reflects the significant gap between the generated summary and the reference article in terms of content coverage, conciseness, and clarity. While the generated summary provides some basic information about market trends and company listings, it falls short of providing 

Optimizing prompts:  50%|█████     | 2/4 [58:29<52:48, 1584.14s/it]  

Evaluation explanation: Here's my evaluation of the generated summary:

The generated summary starts with a question asking for an article to be provided, which is not present in the reference summary. This indicates that the AI has not correctly understood its task or is missing crucial information.

However, if we ignore this initial part and only consider the second half of the generated summary, it appears to cover some key points from the reference summary, such as:

* The total funding raised by companies listed on the Thai stock exchange
* The impact of COVID-19 on companies and the subsequent increase in IPOs
* Examples of successful IPOs, including Central Retail Corporation, Srichita Golf, SCG Packaging, and PTT

However, there are some significant differences between the generated summary and the reference summary:

* The content coverage is not comprehensive, as it only mentions a few examples of successful IPOs without providing a broader context or more detailed informati

Optimizing prompts:  75%|███████▌  | 3/4 [1:11:50<20:26, 1226.45s/it]

Evaluation explanation: Here's a comparison of the generated summary with the reference summary:

**Generated Summary:** Not provided, as this is the starting point for summarization. 🤔

The evaluation will be based on the quality of the response to "Please provide me with the article you would like summarized!" which does not contain any information.

**Reference Summary:** A detailed report about market capitalization and stock offerings in Thailand during COVID-19, highlighting strong companies and impressive fundraising results.

Given that there is no generated summary provided, I will provide a score based on the quality of the response to the prompt:

**Score: 0** (No content coverage, conciseness, or clarity can be evaluated)

Please provide the actual generated summary for further evaluation.
Evaluation explanation: **Content Coverage:** 0.6 (The generated summary covers the main idea of PTT trying to shift towards a new business strategy, but it lacks specific details about t

Optimizing prompts: 100%|██████████| 4/4 [1:25:11<00:00, 1277.83s/it]


Best System Prompt: You are an AI model designed to distill key points from economic news articles and present them in a concise manner. Here is a short and relevant phrase that you could add to a summarization system prompt:

"Condense into 2-3 key points"

This phrase provides context for the summarization task by specifying the desired output (a brief summary of 2-3 key points) and can help guide the AI model in generating an accurate and concise summary.

Best Human Prompt: Please condense the main points from the article into a concise summary (approx. 100-150 words), highlighting key findings, arguments, or insights that capture the essence of the original text. Here is a short and relevant phrase that you can add to a summarization human prompt:

"Key points only."

This phrase tells the AI to focus on extracting the most important information from the text being summarized, and to leave out details or supporting evidence unless they are crucial to understanding the main idea.
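
The best human prompt printed above no longer contains the `{text}` placeholder that the initial prompt ended with, and one judge reply quotes the summarizer asking "Please provide me with the article you would like summarized!". Together these suggest the article stopped reaching the model once the prompt had been rewritten. Below is a minimal sketch of a guard that could be applied to `human_prompt.value` after each `optimizer.step()`. The helper `ensure_placeholder` is hypothetical, not part of LangChain or textgrad.

```python
def ensure_placeholder(prompt: str, placeholder: str = "{text}") -> str:
    """Return the prompt unchanged if it has the placeholder, else append it."""
    if placeholder in prompt:
        return prompt
    return prompt.rstrip() + "\n\nArticle to summarize:\n" + placeholder


# A rewritten prompt in the style of the output above, with no placeholder left.
rewritten = "Please condense the main points from the article into a concise summary."
fixed = ensure_placeholder(rewritten)

# Plain str.format stands in for the prompt template here.
filled = fixed.format(text="ARTICLE BODY")
print(filled)
```

The guard does not cover a rewritten prompt that introduces other curly braces, which a template would treat as extra variables; those would need escaping or rejecting separately.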





In [None]:
optimize