<a href="https://colab.research.google.com/github/Frasierzzz/Project/blob/main/Test_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preparing

In [None]:
!pip install transformers datasets evaluate



In [None]:
# Upgrade the packages; --upgrade is an option of `pip install`, so it must come after the subcommand
!pip install --upgrade transformers datasets evaluate

In [None]:
# Import libraries
from transformers import T5Tokenizer, T5ForConditionalGeneration
from datasets import load_dataset
import evaluate

In [None]:
# Load the XSum dataset
dataset = load_dataset("xsum")  # loads every split in full; subsampling happens in the next cell

In [None]:
import random
from datasets import DatasetDict
# Randomly sample 10% of a split (fixed seed for reproducibility)
random.seed(14)
def sample_10_percent(dataset):
    sample_size = int(len(dataset) * 0.1)
    indices = random.sample(range(len(dataset)), sample_size)
    return dataset.select(indices)
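
The sampler above draws from the module-level `random` state, so the indices picked for one split depend on how many draws were made before it. A minimal sketch of an alternative, assuming a private generator per call is acceptable; `sample_indices` is a hypothetical helper, not part of `datasets`:

```python
import random


def sample_indices(n_rows, fraction=0.1, seed=14):
    """Return a reproducible, sorted list of row indices covering `fraction` of `n_rows`."""
    rng = random.Random(seed)  # private generator: unaffected by other random calls
    sample_size = int(n_rows * fraction)
    return sorted(rng.sample(range(n_rows), sample_size))


# Illustrative size only; any split length works the same way.
indices = sample_indices(1000)
print(len(indices))                     # 100
print(indices == sample_indices(1000))  # True: same seed, same indices
```

The resulting list could be passed to `dataset.select(...)` in the same way `sample_10_percent` does.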

In [None]:
sampled_datasets = DatasetDict({
    split: sample_10_percent(split_dataset) for split, split_dataset in dataset.items()
})

In [None]:
sampled_datasets

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 20404
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 1133
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 1133
    })
})

# t5-small

In [None]:
# Load the model and tokenizer
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [None]:
def preprocess_texts(texts):
    """
    Check the items in `texts` and convert them to a usable form.
    Args:
        texts (list): input texts (list of str or list of list)
    Returns:
        list: the texts converted to a list of str
    """
    processed_texts = []
    for text in texts:
        if isinstance(text, list):
            processed_texts.append(" ".join(text))  # join the pieces when the item is a list
        elif not isinstance(text, str):
            processed_texts.append(str(text))  # convert to string
        else:
            processed_texts.append(text)
    return processed_texts

In [None]:
def summarize_single_text(text, max_input_length=512, max_output_length=128):
    """
    Summarize a single text.
    Args:
        text (str): the text to summarize
        max_input_length (int): maximum length of the input, in tokens
        max_output_length (int): maximum length of the output, in tokens
    Returns:
        str: the generated summary
    """
    # Tokenize the text, with the "summarize: " task prefix that T5 expects
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", truncation=True, max_length=max_input_length)
    # Generate the summary
    outputs = model.generate(inputs, max_length=max_output_length, num_beams=4, early_stopping=True)
    # Decode the generated tokens back to text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [None]:
# Usage
documents = sampled_datasets['validation']['document']  # assumed to be a list of strings
processed_documents = preprocess_texts(documents)  # prepare the data

In [None]:
processed_documents[0]

'Media playback is not supported on this device\nHis departure handed Jared Payne a chance to stake his claim to the Test full-back slot.\nUltimately he did not do that.\nThe New Zealand-born Ireland international had a fitful game. He was good in parts - notably putting in an excellent tackle on express train wing Waisake Naholo in the first half - but was indifferent in others, such as knocking on a routine high ball off a second-half kick-off.\nLeigh Halfpenny has not set the tour alight so far. The Welshman appeared in a laboured win over the Provincial Barbarians in the tour opener and the 22-16 defeat by the Blues.\nBut I think, even had Hogg stayed fit, Halfpenny might have been ahead in Warren Gatland\'s thinking.\nHe doesn\'t give you any of the X-factor that Hogg would have done. But, on the other hand, he is entirely reliable. He is probably the best goal-kicker in the world and, with Owen Farrell missing another penalty that he would usually slot today, that quality could b

In [None]:
text = "Cats are independent animals that require minimal care compared to dogs. They are known for their ability to clean themselves, hunt small pests, and adapt well to apartment living. While they can be affectionate, they also enjoy spending time alone and do not require constant attention from their owners."
summarize_single_text(text)

' Cats are independent animals that require minimal care compared to dogs . They are known for their ability to clean themselves, hunt small pests, and adapt well to apartment living . While they can be affectionate, they also enjoy spending time alone and do not require constant attention from their owners .'

In [None]:
# Summarize only the first 100 documents; running the whole split one document at a time is slow
summaries = []
for doc in processed_documents[:100]:
    summary = summarize_single_text(doc)  # one document per call
    summaries.append(summary)

In [None]:
len(summaries)

100
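
Generating summaries one document at a time is slow, and stopping the cell by hand is an easy way to lose track of what was finished. A minimal sketch of capping the work and keeping partial results on interrupt; `summarize_many` and the stand-in `first_words` summarizer are hypothetical names used only for illustration:

```python
def summarize_many(documents, summarize_fn, limit=None):
    """Apply `summarize_fn` to up to `limit` documents, keeping partial results on interrupt."""
    results = []
    selected = documents if limit is None else documents[:limit]
    try:
        for doc in selected:
            results.append(summarize_fn(doc))
    except KeyboardInterrupt:
        # Stopping the cell by hand keeps whatever was finished so far.
        print(f"Interrupted after {len(results)} documents")
    return results


# Stand-in summarizer for illustration: keeps the first five words.
def first_words(text):
    return " ".join(text.split()[:5])


docs = ["one two three four five six seven", "alpha beta", "x y z"]
print(summarize_many(docs, first_words, limit=2))
```

In the notebook, `summarize_fn` would be `summarize_single_text` and `documents` would be `processed_documents`.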

In [None]:
import evaluate

# Load the ROUGE metric
rouge = evaluate.load("rouge")

# Reference summaries: they must come from the same split and the same rows as the
# summarized documents (processed_documents is built from the validation split)
references = sampled_datasets['validation']['summary'][:len(summaries)]
# summaries: the outputs generated by the model

# Compute ROUGE
results = rouge.compute(predictions=summaries, references=references)

# Print the aggregate scores
print("ROUGE Scores (average):")
print(f"ROUGE-1: {results['rouge1']:.4f}")
print(f"ROUGE-2: {results['rouge2']:.4f}")
print(f"ROUGE-L: {results['rougeL']:.4f}")
print(f"ROUGE-Lsum: {results['rougeLsum']:.4f}")


ROUGE Scores (average):
ROUGE-1: 0.1850
ROUGE-2: 0.0252
ROUGE-L: 0.1331
ROUGE-Lsum: 0.1334


# Bart

In [None]:
from transformers import BartTokenizer, BartForConditionalGeneration

# Load the DistilBART model
tokenizer = BartTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = BartForConditionalGeneration.from_pretrained("sshleifer/distilbart-cnn-12-6")

In [None]:
# Summarization function
def summarize_with_distilbart(text, max_input_length=512, max_output_length=128):
  inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_input_length)
  outputs = model.generate(inputs["input_ids"], max_length=max_output_length, num_beams=4, early_stopping=True)
  return tokenizer.decode(outputs[0], skip_special_tokens=True)

In [None]:
summarize_with_distilbart(processed_documents[0])



' Leigh Halfpenny has not set the tour alight so far . Jared Payne was given a chance to stake his claim to the Test full-back slot . Anthony Watson has made the biggest impression if the Lions need someone to come on'

In [None]:
# Summarize only the first 100 documents so the run finishes and `bart_summaries` is actually defined
bart_summaries = [summarize_with_distilbart(doc) for doc in processed_documents[:100]]

In [None]:
# Compute ROUGE for DistilBART against the matching validation references
bart_results = rouge.compute(predictions=bart_summaries, references=sampled_datasets['validation']['summary'][:len(bart_summaries)])

In [None]:
# Show the results
print("DistilBART ROUGE Scores:")
for key, value in bart_results.items():
    print(f"{key}: {value:.4f}")

# llama

In [None]:
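# LLaMA
# Note: meta-llama/Llama-2-7b-hf is a gated repository; access must be granted on the Hugging Face Hub and an access token configured before this cell can download it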
from transformers import LlamaTokenizer, LlamaForCausalLM
llama_tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama_model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

In [None]:
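def summarize_with_llama(text, max_input_length=512, max_output_length=50):
    input_text = f"Summarize: {text}"  # LLaMA needs an explicit instruction as context
    inputs = llama_tokenizer(input_text, return_tensors="pt", truncation=True, max_length=max_input_length)
    # For a decoder-only model, max_length also counts the prompt tokens, so cap the newly generated tokens instead
    outputs = llama_model.generate(inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=max_output_length, num_beams=4, early_stopping=True)
    # The output sequence starts with the prompt; decode only the tokens generated after it
    prompt_length = inputs["input_ids"].shape[1]
    return llama_tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)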


In [None]:
llama_summaries = [summarize_with_llama(doc) for doc in processed_documents]

In [None]:
# Compute ROUGE for LLaMA against the matching validation references
llama_results = rouge.compute(predictions=llama_summaries, references=sampled_datasets['validation']['summary'][:len(llama_summaries)])


In [None]:
print("\nLLaMA ROUGE Scores:")
for key, value in llama_results.items():
    print(f"{key}: {value:.4f}")

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [None]:
def summarize_text(text, model_name="t5-small", max_input_length=512, max_output_length=150):
    # Load the tokenizer and model (note: they are reloaded on every call)
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # Prepare the input for T5 (add the "summarize: " prefix)
    input_text = f"summarize: {text}"
    inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=max_input_length, truncation=True)

    # Generate the summary
    summary_ids = model.generate(inputs, max_length=max_output_length, min_length=20, length_penalty=2.0, num_beams=4, early_stopping=True)
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary

In [None]:
# Example usage
if __name__ == "__main__":
    text = "Cats are independent animals that require minimal care compared to dogs. They are known for their ability to clean themselves, hunt small pests, and adapt well to apartment living. While they can be affectionate, they also enjoy spending time alone and do not require constant attention from their owners."
    summary = summarize_text(text)

    print("Original Text:")
    print(text)
    print("\nSummary:")
    print(summary)

Original Text:
Cats are independent animals that require minimal care compared to dogs. They are known for their ability to clean themselves, hunt small pests, and adapt well to apartment living. While they can be affectionate, they also enjoy spending time alone and do not require constant attention from their owners.

Summary:
cats are independent animals that require minimal care compared to dogs. they are known for their ability to clean themselves, hunt small pests and adapt well to apartment living.


In [None]:
# Example usage
if __name__ == "__main__":
    text = "Cats are independent animals that require minimal care compared to dogs. They are known for their ability to clean themselves, hunt small pests, and adapt well to apartment living. While they can be affectionate, they also enjoy spending time alone and do not require constant attention from their owners."
    summary = summarize_text(text)

    # Prediction and hand-written reference summary
    predictions = [summary]
    references = ["Cats are independent, low-maintenance animals that thrive in various living environments and require minimal attention."]

    # Compute ROUGE
    print("\nSummary:", summary)
    results = rouge.compute(predictions=predictions, references=references)

    # Show the results
    print(results)


Summary: cats are independent animals that require minimal care compared to dogs. they are known for their ability to clean themselves, hunt small pests and adapt well to apartment living.
{'rouge1': 0.4, 'rouge2': 0.18604651162790697, 'rougeL': 0.3111111111111111, 'rougeLsum': 0.3111111111111111}
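
To see where the `rouge1` and `rouge2` values above come from, here is a simplified sketch of ROUGE-N as an F1 score over clipped n-gram overlap. It assumes lowercase alphanumeric tokenization and no stemming; `tokenize` and `rouge_n_f1` are hypothetical helpers written for illustration, not the implementation behind `evaluate.load("rouge")`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_n_f1(prediction, reference, n=1):
    """F1 over clipped n-gram overlap between one prediction and one reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(tokenize(prediction)), ngrams(tokenize(reference))
    overlap = sum((pred & ref).values())  # each n-gram counts at most min(pred, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


prediction = ("cats are independent animals that require minimal care compared to dogs. "
              "they are known for their ability to clean themselves, hunt small pests "
              "and adapt well to apartment living.")
reference = ("Cats are independent, low-maintenance animals that thrive in various "
             "living environments and require minimal attention.")

print(round(rouge_n_f1(prediction, reference, n=1), 4))  # 0.4
print(round(rouge_n_f1(prediction, reference, n=2), 4))  # 0.186
```

Here 9 of the 29 predicted unigrams also occur among the 16 reference unigrams, so precision is 9/29, recall is 9/16, and F1 is 18/45 = 0.4.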


In [None]:
# Example usage
if __name__ == "__main__":
    text2 = "Education is a key driver of progress, benefiting both individuals and society. It empowers people with the skills and knowledge needed to pursue careers, innovate, and contribute to economic growth. Moreover, education helps reduce inequality by providing opportunities for all, regardless of their background. A well-educated population fosters social cohesion and promotes understanding among diverse communities. Despite its advantages, millions around the world lack access to quality education due to poverty, lack of infrastructure, and systemic inequalities. Bridging this gap requires investment, policy reforms, and community efforts to make education accessible for everyone."
    summary2 = summarize_text(text2)

    # Prediction and hand-written reference summary
    predictions2 = [summary2]
    references2 = ['Education drives progress, reduces inequality, and needs global efforts to improve accessibility.']

    # Compute ROUGE
    print("\nSummary:", summary2)
    results2 = rouge.compute(predictions=predictions2, references=references2)

    # Show the results
    print(results2)


Summary: education is a key driver of progress, benefiting both individuals and society. it empowers people with the skills and knowledge needed to pursue careers. education helps reduce inequality by providing opportunities for all.
{'rouge1': 0.2222222222222222, 'rouge2': 0.0, 'rougeL': 0.17777777777777776, 'rougeLsum': 0.17777777777777776}
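
The `rougeL` value above rewards words that appear in the same order, not only shared words. A simplified sketch, assuming the same lowercase alphanumeric tokenization with no stemming; `lcs_length` and `rouge_l_f1` are hypothetical helpers for illustration, not the `evaluate` implementation:

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_f1(prediction, reference):
    """F1 based on the longest common subsequence of tokens."""
    pred, ref = tokenize(prediction), tokenize(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


prediction = ("education is a key driver of progress, benefiting both individuals and society. "
              "it empowers people with the skills and knowledge needed to pursue careers. "
              "education helps reduce inequality by providing opportunities for all.")
reference = ("Education drives progress, reduces inequality, and needs global efforts "
             "to improve accessibility.")

print(round(rouge_l_f1(prediction, reference), 4))  # 0.1778
```

The longest in-order match here is "education ... progress ... and ... to" (4 tokens) against 33 predicted and 12 reference tokens, which gives 8/45.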


In [None]:
# Example usage
if __name__ == "__main__":
    text3 = "Climate change poses one of the greatest challenges of our time, with profound impacts on the environment and humanity. The rise in global temperatures is fueled by human-induced emissions of greenhouse gases from activities such as burning fossil fuels, deforestation, and industrial processes. These emissions trap heat in the atmosphere, causing glaciers to melt, sea levels to rise, and weather patterns to become more extreme. Coastal regions are particularly vulnerable to flooding, while agricultural systems face disruptions due to droughts and unpredictable rainfall. Beyond environmental concerns, the economic and social consequences are alarming. Food insecurity, displacement of populations, and health crises, particularly in low-income regions, are directly linked to climate change. Addressing this crisis requires global collaboration to reduce emissions, transition to renewable energy, and implement sustainable practices. Governments, businesses, and individuals must work together to mitigate the effects and adapt to the changing climate."
    summary3 = summarize_text(text3)

    # Prediction and hand-written reference summary
    predictions3 = [summary3]
    references3 = ['Climate change threatens the environment and society, demanding global action and sustainable practices.']

    # Compute ROUGE
    print("\nSummary:", summary3)
    results3 = rouge.compute(predictions=predictions3, references=references3)

    # Show the results
    print(results3)


Summary: rise in global temperatures is fueled by human-induced emissions of greenhouse gases from activities such as burning fossil fuels, deforestation, and industrial processes. these emissions trap heat in the atmosphere, causing glaciers to melt, sea levels to rise, and weather patterns to become more extreme. food insecurity, displacement of populations, and health crises are directly linked to climate change.
{'rouge1': 0.16438356164383564, 'rouge2': 0.028169014084507043, 'rougeL': 0.08219178082191782, 'rougeLsum': 0.08219178082191782}


In [None]:
if __name__ == "__main__":
  text5 = "The industrial revolution marked a pivotal shift in human history, transforming economies, societies, and the environment. Originating in the late 18th century in Britain, this era introduced innovations such as the steam engine, mechanized production, and the factory system. These advancements increased productivity and wealth, allowing goods to be produced at unprecedented scales and speeds. Cities grew rapidly as people moved from rural areas to urban centers in search of work, creating new opportunities but also significant challenges, such as overcrowding and poor living conditions. As industrialization spread globally, it brought remarkable improvements in living standards, access to goods, and technological progress. However, this progress came at a cost. The reliance on coal and other fossil fuels as primary energy sources led to widespread environmental degradation. Air and water pollution became rampant, and the exploitation of natural resources accelerated at unsustainable rates. Additionally, the working conditions for many laborers were harsh, with long hours, low wages, and unsafe environments being the norm. The social impact of industrialization was equally profound. While it created a new middle class and expanded access to education and healthcare, it also widened the gap between the wealthy and the working poor. Child labor was prevalent, and workers often lacked legal protections. Over time, these issues sparked movements for reform, leading to labor unions, worker rights legislation, and the eventual establishment of minimum wage laws and workplace safety standards. In the modern era, the legacy of the industrial revolution continues to shape our world. Technological advancements stemming from this period laid the groundwork for the digital revolution, which has transformed communication, transportation, and nearly every aspect of daily life. However, the environmental challenges initiated during this time remain unresolved, with climate change now presenting a global crisis. The shift toward renewable energy, sustainable practices, and green technologies reflects an ongoing effort to balance industrial progress with ecological responsibility."
  summary5 = summarize_text(text5)

  # Prediction and hand-written reference summary
  predictions5 = [summary5]
  references5 = ['The industrial revolution transformed society and technology, but its environmental and social challenges persist, requiring sustainable solutions today.']

  # Compute ROUGE
  print("\nSummary:", summary5)
  results5 = rouge.compute(predictions=predictions5, references=references5)

  # Show the results
  print(results5)


Summary: the industrial revolution began in the late 18th century in the uk. it brought remarkable improvements in living standards, access to goods, and technological progress. reliance on coal and other fossil fuels as primary energy sources led to widespread environmental degradation.
{'rouge1': 0.2033898305084746, 'rouge2': 0.07017543859649122, 'rougeL': 0.16949152542372883, 'rougeLsum': 0.16949152542372883}
