### Ollama

In [1]:
!ollama list

NAME           ID              SIZE      MODIFIED   
llama3.1:8b    42182419e950    4.7 GB    6 days ago    
llama3:8b      365c0bd3c000    4.7 GB    8 days ago    
qwen2:7b       dd314f039b9d    4.4 GB    9 days ago    
gemma2:9b      ff02c3702f32    5.4 GB    9 days ago    


In [2]:
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

In [3]:
local_model = "llama3:8b"
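# llm = ChatOllama(model=local_model)
# Note: use_gpu does not appear in the repr printed by the next cell, so it may be ignored;
# the GPU-related field that ChatOllama documents is num_gpu.
llm = ChatOllama(model=local_model, use_gpu=True)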

In [4]:
llm

ChatOllama(model='llama3:8b')

In [5]:
import pandas as pd

df = pd.read_csv(
    r'F:\AI\Super AI SS4\Level 3 - INTERN\Jupyter Notebook\Latest-Dataset-Model-Generate\Random\Latest-20-random.csv')

df.shape

(20, 4)

In [6]:
df.head()

| | content | extractive | abstractive | index_original |
|---|---|---|---|---|
| 0 | มิติหุ้น SKE คอนเฟิร์มรายได้ปี 62 เต... | \t บมจ.สากล เอนเนอยี หรือ SKE โดย นายจักรพงส์... | \tนายจักรพงส์ สุเมธโชติเมธา กรรมการผู้จัดการให... | 3017 |
| 1 | ผลกระทบจากเชื้อไวรัสโคโรนา (โควิด-19... | ผลกระทบจากโควิด-19 ผลต่อกำไรของบริษั... | \tนายมงคล พ่วงเภตรา ผู้ช่วยกรรมการผู้จัดการ ฝ่... | 2844 |
| 2 | องค์การส่งเสริมกิจการโคนมแห่งประเทศไ... | ดร.ณรงค์ฤทธิ์ วงศ์สุวรรณ ผู้อำนวยการ... | ดร.ณรงค์ฤทธิ์ วงศ์สุวรรณ ผู้อำนวยการ... | 2055 |
| 3 | กระทรวงพลังงานใช้เวลาไป 1 ปี กับอีก ... | \tกระทรวงพลังงานใช้เวลา 1 ปี 7 เดือน แก้ปมร... | \tกระทรวงพลังงานใช้เวลา 1 ปี 7 เดือน แก้ปมร... | 199 |
| 4 | นับเป็นอีกหนึ่งโครงการดีๆ ที่เปิดโอก... | \t อีกหนึ่งโครงการดีๆ ที่เปิดโอกาสให้เด็ก ... | \tโครงการประกวดศิลปกรรม ปตท. จากการสนับสนุนขอ... | 216 |


### Summarize

#### Abstractive

In [7]:
def summarize_abstractive(text, llm):
    """Return an abstractive Thai summary of `text` produced by the chat model `llm`."""
    system_template = """You are an AI specialized in abstractive summarization for economic news articles."""

    prompt = """Your task is to create an abstractive summary of the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use neutral, and clear language while maintaining a formal tone.
    3. Use \t at the beginning of each paragraph to create indentation.
    4. Each paragraph should present a different point.
    5. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    6. Explain in detail the essence and main points of the article.
    7. Ensure the summary is coherent and flows well as a standalone piece.
    8. Preserve all important proper nouns such as names of people, companies, or organizations.
    9. Organize the content logically, which may differ from the original article's structure if it improves clarity.
    10. Synthesize information from different parts of the article when appropriate.
    11. DO NOT include any examples or case studies in the summary.
    
    IMPORTANT:
    - Please verify the accuracy of the information and present it in a neutral manner, without personal opinions or bias.
    - Focus on creating a coherent, flowing summary that captures the main ideas without direct quoting.
    - The summary has retained its original meaning and context.
    
    Article to summarize:
    {text}
    """

    system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
    human_message_prompt = HumanMessagePromptTemplate.from_template(prompt)

    chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
    messages = chat_prompt.format_messages(text=text)

    return llm.invoke(messages).content
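
One detail in the prompt above may be worth checking. Inside a regular (non-raw) triple-quoted string, Python converts `\t` into an actual tab character, so the model receives a tab rather than the two characters `\t`. The sketch below uses a shortened stand-in for guideline 3 (not the full prompt) to show the difference. Whether the literal tab or the escaped form works better with a given model is not something this notebook establishes.

```python
# Shortened stand-in for guideline 3; not the full prompt from the notebook.
plain = """Use \t at the beginning of each paragraph."""
escaped = """Use \\t at the beginning of each paragraph."""

# repr() makes the difference visible: one tab character vs. backslash + "t".
print(repr(plain))
print(repr(escaped))

# The plain string holds a real tab; the escaped one holds no tab at all.
has_real_tab = "\t" in plain
has_backslash_t = "\\t" in escaped and "\t" not in escaped
print(has_real_tab, has_backslash_t)
```

If the two-character form is what the model should see, doubling the backslash (or using a raw string for that line) is one way to get it.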

#### Extractive

In [8]:
def summarize_extractive(text, llm):
    """Return an extractive Thai summary of `text` produced by the chat model `llm`."""
    system_template = """You are an AI specialized in extractive summarization for economic news articles."""

    prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}
    """

    system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)
    human_message_prompt = HumanMessagePromptTemplate.from_template(prompt)

    chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
    messages = chat_prompt.format_messages(text=text)

    return llm.invoke(messages).content

In [2]:
import re
import pandas as pd
import numpy as np
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from tqdm import tqdm
import textgrad as tg
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.loss import TextLoss
from textgrad.engine import get_engine

# Initialize models
summary_model = "gemma2:9b"
llm_summarizer = ChatOllama(model=summary_model, use_gpu=True)

ollama_engine = Ollama(model="llama3.1:8b")
tg.set_backward_engine(ollama_engine, override=True)

def summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, text):
    system_message = SystemMessagePromptTemplate.from_template(system_prompt)
    human_message = HumanMessagePromptTemplate.from_template(human_prompt)
    chat_prompt = ChatPromptTemplate.from_messages([system_message, human_message])
    messages = chat_prompt.format_messages(text=text)
    
    response = llm_summarizer.invoke(messages)
    return response.content

def evaluate_summary(ollama_engine, generated_summary, reference_summary):
    eval_prompt = f"""You are an AI that evaluates the quality of summaries. Compare the generated summary with the reference summary and provide a score between 0 and 1, where 1 is perfect. Consider factors such as content coverage, conciseness, and clarity. Explain your reasoning briefly, then provide the score.

Generated: {generated_summary}
Reference: {reference_summary}

Explanation and Score:"""
    
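    # invoke() is the non-deprecated way to call a LangChain LLM; for Ollama it returns a string.
    response = ollama_engine.invoke(eval_prompt)
    # Caution: this takes the first number anywhere in the reply, so a figure such as
    # the 50 in "50%" can be read as the score and push results above 1.
    match = re.search(r'\d+(\.\d+)?', response)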
    if match:
        try:
            score = float(match.group())
            print(f"Evaluation explanation: {response}")
            return score
        except ValueError:
            print(f"Could not convert to float: {match.group()}")
            return 0
    else:
        print(f"Could not extract score from: {response}")
        return 0

def evaluate_prompts(llm_summarizer, ollama_engine, system_prompt, human_prompt, df):
    scores = []
    for _, row in df.iterrows():
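        # Caution: the text summarized here is the 'generated_summary' column (a previously
        # generated summary, see read_csv_file), not the original article.
        summary = summarize_with_prompts(llm_summarizer, system_prompt, human_prompt, row['generated_summary'])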
        score = evaluate_summary(ollama_engine, summary, row['reference_summary'])
        scores.append(score)
    
    return np.mean(scores)

def optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs):
    system_prompt = Variable(initial_system_prompt, role_description="System prompt for summarization")
    human_prompt = Variable(initial_human_prompt, role_description="Human prompt for summarization")
    optimizer = TextualGradientDescent([system_prompt, human_prompt]) 
    
    best_score = 0
    best_system_prompt = initial_system_prompt
    best_human_prompt = initial_human_prompt
    
    # Keep track of all prompts
    prompt_history = []

    for epoch in tqdm(range(epochs), desc="Optimizing prompts"):
        try:
            current_score = evaluate_prompts(llm_summarizer, ollama_engine, system_prompt.value, human_prompt.value, df)
        
            # Save the prompts of the current epoch
            prompt_history.append({
                "epoch": epoch+1,
                "system_prompt": system_prompt.value,
                "human_prompt": human_prompt.value,
                "score": current_score
            })
        
            if current_score > best_score:
                best_score = current_score
                best_system_prompt = system_prompt.value
                best_human_prompt = human_prompt.value
        
            print(f"Epoch {epoch+1}/{epochs} | Current Score: {current_score:.4f} | Best Score: {best_score:.4f}")
            print(f"Current System Prompt: {system_prompt.value[:100]}...")
            print(f"Current Human Prompt: {human_prompt.value[:100]}...")
            print("-" * 50)
        
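            loss = 1 - current_score
            # Caution: this Variable is built from a plain string, so it has no graph link back
            # to system_prompt or human_prompt; backward() likely gives the optimizer little
            # prompt-specific feedback.
            loss_var = Variable(str(loss), role_description="Loss for optimization")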
            loss_var.backward()
            optimizer.step()
            optimizer.zero_grad()
            
            # Add some randomness to avoid local optima
            # (with epochs=4 this condition is only true on the first epoch, epoch 0)
            if epoch % 5 == 0:
                system_prompt.value += " " + ollama_engine.invoke("Generate a short, relevant phrase to add to a summarization system prompt.")
                human_prompt.value += " " + ollama_engine.invoke("Generate a short, relevant phrase to add to a summarization human prompt.")
        
        except Exception as e:
            print(f"Error occurred: {e}")
            print(f"Current system prompt: {system_prompt.value}")
            print(f"Current human prompt: {human_prompt.value}")
    
    return best_system_prompt, best_human_prompt, best_score, prompt_history

def read_csv_file(file_path):
    df = pd.read_csv(file_path)
    
    # Select only the required columns
    df = df[['sum_extractive', 'extractive']]
    
    # Rename the columns for clarity
    df.columns = ['generated_summary', 'reference_summary']
    
    return df

def run_optimization(file_path):
    initial_system_prompt = """You are an AI specialized in extractive summarization for economic news articles."""
    initial_human_prompt = """Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use \t at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key sentences from the original article for each point.
    8. Maintain the original meaning and context.
    9. Arrange the content in the same order as presented in the original article.
    10. Reduce redundancy by combining similar points or information.
    11. DO NOT include any examples in the summary.

    IMPORTANT: 
    - DO NOT include any examples or case studies in the summary. Focus only on the main points and key information.

    Article to summarize:
    {text}"""
    
    df = read_csv_file(file_path)
    best_system_prompt, best_human_prompt, best_score, prompt_history = optimize_prompts(llm_summarizer, ollama_engine, initial_system_prompt, initial_human_prompt, df, epochs=4)
    
    print("\nBest System Prompt:", best_system_prompt)
    print("\nBest Human Prompt:", best_human_prompt)
    print(f"\nBest Score: {best_score:.4f}")
    
    # Print full prompt history
    print("\nPrompt History:")
    for entry in prompt_history:
        print(f"Epoch {entry['epoch']}:")
        print(f"System Prompt: {entry['system_prompt']}")
        print(f"Human Prompt: {entry['human_prompt']}")
        print(f"Score: {entry['score']:.4f}")
        print("-" * 50)
    
    return prompt_history

if __name__ == "__main__":
    file_path = r'F:\AI\Super AI SS4\Level 3 - INTERN\Jupyter Notebook\Latest-Dataset-Model-Generate\Edit-Prompt\final\Gemma2-final-output.csv'
    optimize = run_optimization(file_path)
    print(optimize)

Evaluation explanation: The generated summary has some similarities with the reference summary, but it lacks crucial details. Here's a brief explanation of my reasoning:

* Content coverage: The generated summary covers only 50% of the content in the reference summary, missing key points such as the total market capitalization at the end of 2020 and the specific performance of SCGP.
* Conciseness: The generated summary is concise, but it's also incomplete. It fails to provide essential information, making it difficult to understand the overall context.
* Clarity: The language used in the generated summary is clear, but it's not as accurate as the reference summary.

Score: 0.6/1

The score indicates a moderate level of quality, but with significant room for improvement. The generated summary needs to provide more comprehensive information and accurately convey the details from the original text.
Evaluation explanation: The generated summary is quite close to the reference summary. Howe

Optimizing prompts:  25%|██▌       | 1/4 [48:10<2:24:32, 2890.79s/it]

Evaluation explanation: The generated summary is quite different from the reference summary. While it mentions a positive sentiment towards the "Key takeaways only" option, it does not provide any information about the Thai stock market or companies going public.

In contrast, the reference summary provides detailed information about the Thai stock market's fundraising efforts, the performance of various companies, and trends in the market. It also mentions specific companies that have gone public and their respective financial data.

Given this comparison, I would give the generated summary a score of 0.2 out of 1. The content coverage is minimal, conciseness is not an issue since it's too short to be concise, and clarity is lacking as it doesn't provide any meaningful information about the topic. Overall, the generated summary does not capture the essence of the reference summary or provide useful insights on its own.
Evaluation explanation: The generated summary is not relevant to t

Optimizing prompts:  50%|█████     | 2/4 [1:01:41<55:34, 1667.37s/it]

Evaluation explanation: To evaluate the generated summary, I will compare it with the reference summary.

The generated summary is a simple request for a news article, whereas the reference summary is a detailed text discussing the Thai stock market's fundraising efforts. The content coverage of the generated summary is essentially zero, as it doesn't provide any information about the topic.

Given this mismatch, I would score the generated summary a 0 out of 1 in terms of content coverage, conciseness, and clarity, since it fails to deliver even a basic summary of the reference text.
Evaluation explanation: To evaluate the quality of the generated summary, I will compare it with the reference summary.

The generated response is simply "Please provide me with the news article! I'm ready to analyze it and give you a concise summary of the key economic points. Just paste the text here, and I'll get to work. 😊"

This does not contain any content related to the news article or its analysis

Optimizing prompts:  75%|███████▌  | 3/4 [1:14:48<21:05, 1265.34s/it]

Evaluation explanation: **Score:** 0.8 (out of 1)

The generated summary effectively captures the key points from the reference summary, including:

* The amount of funds raised in the Thai stock market
* The impact of COVID-19 on listed companies and new IPOs
* The growth of market capitalization for some listed companies, such as Central Retail Corp. and SCG Packaging
* The increasing interest in online trading and the planned IPOs of Kerry Express and PTT Oil and Gas

However, the generated summary lacks some details from the reference, such as:

* The specific dates for some IPOs (e.g., "ตั้งแต่ต้นปี ถึง 15 ธันวาคม 2563")
* Some company names and financial data (e.g., "มูลค่าระดมทุน 55,902.00 ล้านบาท" for Central Retail Corp.)

The generated summary is concise and clear, making it easy to understand the main points. Overall, a good effort!
Evaluation explanation: **Content Coverage:** 0.8
The generated summary touches on the key points of PTT's strategy to become a leader in the oi

Optimizing prompts: 100%|██████████| 4/4 [1:25:44<00:00, 1286.13s/it]


Best System Prompt: Summarize this news article about economic topics for me.

Best Human Prompt: Take a lengthy article and condense it into a concise summary, highlighting the most critical information, key findings, or main takeaways in 1-2 paragraphs at most.

Best Score: 102.7000
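
A best score of 102.7 falls outside the 0 to 1 range the evaluator was asked for. A likely cause is that `evaluate_summary` takes the first number in the reply, which can be a figure such as the 50 in "covers only 50% of the content". The following is a minimal sketch of a stricter parser, assuming the reply states its verdict near the word "score". The helper name `parse_score` is hypothetical and not part of the notebook.

```python
import re

# One integer or decimal number, captured as a group.
NUMBER = r"(\d+(?:\.\d+)?)"

def parse_score(response: str) -> float:
    """Extract a score in [0, 1] from an evaluator reply (hypothetical helper)."""
    # 1) Numbers that follow the word "score" within a short distance,
    #    e.g. "Score: 0.6/1" or "a score of 0.2 out of 1".
    keyed = re.findall(r"score\D{0,40}?" + NUMBER, response, flags=re.IGNORECASE)
    # 2) Fallback: every number in the reply, in order of appearance.
    anywhere = re.findall(NUMBER, response)
    for text in keyed + anywhere:
        value = float(text)
        # Accept only values inside the range the evaluator was asked to use.
        if 0.0 <= value <= 1.0:
            return value
    # Nothing usable was found; mirror the notebook's fallback of 0.
    return 0.0

print(parse_score("Covers only 50% of the content.\n\nScore: 0.6/1"))  # 0.6
```

This remains a heuristic: a reply that reports a score on another scale (for example out of 100) is treated as unparseable rather than rescaled.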

Prompt History:
Epoch 1:
System Prompt: You are an AI specialized in extractive summarization for economic news articles.
Human Prompt: Your task is to summarize the key content from the given news article. Follow these guidelines:
    1. Summarize the content in Thai.
    2. Use 	 at the beginning of each paragraph to create indentation.
    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.
    4. Focus on main points and important secondary points.
    5. Provide explanations of the article’s key topics without going into too much detail.
    6. Preserve all proper nouns such as names of people, companies, or organizations.
    7. Use 2-3 key senten
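
The log for the later epochs shows the summarizer replying "Please provide me with the news article!", and the best human prompt reported above contains no `{text}` placeholder. One plausible reading is that the optimizer rewrote the prompt and dropped the placeholder, so the article never reached the model. Below is a minimal sketch of a guard that uses only the standard library. `has_placeholder` and `ensure_placeholder` are hypothetical helpers, not part of the notebook, LangChain, or TextGrad.

```python
from string import Formatter

def has_placeholder(template: str, name: str = "text") -> bool:
    """Return True if the template contains a {name} replacement field."""
    try:
        # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
        fields = [field for _, field, _, _ in Formatter().parse(template)]
    except ValueError:
        # Unbalanced braces: the string cannot be used as a format template.
        return False
    return name in fields

def ensure_placeholder(template: str, name: str = "text") -> str:
    """Re-append the article slot if it is missing (hypothetical helper).

    This does not repair stray braces elsewhere in the template.
    """
    if has_placeholder(template, name):
        return template
    return template.rstrip() + "\n\nArticle to summarize:\n{" + name + "}"

optimized = "Take a lengthy article and condense it into a concise summary."
print(has_placeholder(optimized))
print(ensure_placeholder(optimized))
```

Calling such a guard on `human_prompt.value` after each optimizer step would be one way to keep the article in the prompt; skipping the epoch when the placeholder is missing is another.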




In [3]:
optimize

[{'epoch': 1,
  'system_prompt': 'You are an AI specialized in extractive summarization for economic news articles.',
  'human_prompt': 'Your task is to summarize the key content from the given news article. Follow these guidelines:\n    1. Summarize the content in Thai.\n    2. Use \t at the beginning of each paragraph to create indentation.\n    3. DO NOT leave blank lines between paragraphs. All paragraphs must be continuous with no blank lines.\n    4. Focus on main points and important secondary points.\n    5. Provide explanations of the article’s key topics without going into too much detail.\n    6. Preserve all proper nouns such as names of people, companies, or organizations.\n    7. Use 2-3 key sentences from the original article for each point.\n    8. Maintain the original meaning and context.\n    9. Arrange the content in the same order as presented in the original article.\n    10. Reduce redundancy by combining similar points or information.\n    11. DO NOT include any