# Final Project of the NLP 2024 Course

Slides: https://docs.google.com/presentation/d/1NbH4E2HKVHQlaW_ivKCyjpWuEJFvmz3bSKsX8fs67tA/edit#slide=id.g2d17364e0e4_0_34


## Environment Setup

Create your own Hugging Face access token at
https://huggingface.co/settings/tokens

Then store it in Colab as a secret named `HF_TOKEN`.
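
A minimal sketch of reading the token, assuming the secret is named `HF_TOKEN` as described above. `get_hf_token` is a hypothetical helper and not part of the course template; outside Colab it falls back to an environment variable of the same name.

```python
import os


def get_hf_token(name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from Colab secrets or the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook.
        token = None
    token = token or os.environ.get(name)
    if not token:
        raise ValueError(
            f"{name} is not set as a Colab secret or environment variable."
        )
    return token
```

The returned string can then be passed wherever the notebook uses `userdata.get('HF_TOKEN')`.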

In [1]:
!pip install transformers accelerate


## Generate Test Dataset with Gemini
The abstracts come from the arXiv-10 dataset: https://paperswithcode.com/dataset/arxiv-10

In [None]:
import pandas as pd
import csv
from time import sleep

import google.generativeai as genai

In [None]:
# Use Gemini
GEMINI_API_KEY = "your_api_key" # Please change your api key here!
genai.configure(api_key=GEMINI_API_KEY)
model = genai.GenerativeModel('gemini-1.5-flash')

In [None]:
with open('output_new.csv', 'w', newline='') as f1:
  writer = csv.writer(f1)
  writer.writerow(['Abstract', 'Metholodgy_LLM'])

In [None]:
data = pd.read_csv("arxiv100.csv")
data = data.sample(frac=1, random_state=0)

abstract_example = """
The reliability of self-labeled data is an important issue when the data are regarded as ground-truth for training and testing learning-based models.
This paper addresses the issue of false-alarm hashtags in the self-labeled data for irony detection.
We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection.
Furthermore, we apply our model to prune the self-labeled training data.
Experimental results show that the irony detection model trained on the less but cleaner training instances outperforms the models trained on all data.
"""

method_example = """
We analyze the ambiguity of hashtag usages and propose a novel neural network-based model,
which incorporates linguistic information from different aspects,
to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection. Furthermore,
we apply our model to prune the self-labeled training data.
"""

In [None]:
num = 0  # index of the next abstract to process; raise it to resume a run
for i in range(num, 100):
  print("Question", num)
  # `data` was shuffled above, so select rows by position, not by label
  abstract = data['abstract'].iloc[i]
  prompt = f"""
  I would like you to extract the methodology from the abstract.
  Below is an example for your reference:
  \n\n\n
  Abstract Example: {abstract_example}
  \n\n\n
  Methodology Example: {method_example}
  \n\n\n
  Now, here is an abstract of an article:
  Abstract: {abstract}
  \n\n\n
  Please extract the methodology from the abstract and don't use markdown.
  You just need to extract the sentences related to the method like the example above, no need to change their meaning.
  """

  response = model.generate_content(prompt)
  sleep(30)  # pause between requests

  prompt_round2 = f"""
  Based on the following summary's methodology, please rephrase it and don't use markdown or list. \n\n\n
  Summary:
  {response.text}
  """

  response_final = model.generate_content(prompt_round2)

  # Append one row per abstract so partial results survive an interruption
  with open('output_new.csv', 'a+', newline='') as f2:
      writer = csv.writer(f2)
      writer.writerow([abstract, response_final.text])
      f2.flush()

  num += 1
  sleep(30)
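
The loop above spaces out requests with a fixed `sleep(30)`, but a single failed call would still stop the whole run. The sketch below shows one way to retry a call with growing delays. `call_with_retry` is a hypothetical helper, and it catches the generic `Exception` because no specific exception classes of the Gemini client are assumed here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be at least 1")
```

In the loop it could be used as `response = call_with_retry(lambda: model.generate_content(prompt))`.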

## Using the pre-trained model

In [2]:
"""Module to generate OpenELM output given a model and an input prompt."""
import os
import logging
import time
import argparse
from typing import Optional, Union
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments, BertForQuestionAnswering, BertTokenizer

from google.colab import userdata


# The following function is revised from https://huggingface.co/apple/OpenELM/blob/main/generate_openelm.py
def generate(
    prompt: str,
    model: Union[str, AutoModelForCausalLM],
    hf_access_token: Optional[str] = None,
    tokenizer: Union[str, AutoTokenizer] = 'meta-llama/Llama-2-7b-hf',
    device: Optional[str] = None,
    max_length: int = 1024,
    assistant_model: Optional[Union[str, AutoModelForCausalLM]] = None,
    generate_kwargs: Optional[dict] = None,
) -> tuple[str, float]:
    """ Generates output given a prompt.
    Args:
        prompt: The string prompt.
        model: The LLM Model. If a string is passed, it should be the path to
            the hf converted checkpoint.
        hf_access_token: Hugging face access token.
        tokenizer: Tokenizer instance. If a string is passed, the tokenizer
            is loaded from that Hugging Face name or path.
        device: String representation of device to run the model on. If None
            and cuda available it would be set to cuda:0 else cpu.
        max_length: Maximum length of tokens, input prompt + generated tokens.
        assistant_model: If set, this model will be used for
            speculative generation. If a string is passed, it should be the
            path to the hf converted checkpoint.
        generate_kwargs: Extra kwargs passed to the hf generate function.
    Returns:
        output_text: output generated as a string.
        generation_time: generation time in seconds.
    Raises:
        ValueError: If device is set to CUDA but no CUDA device is detected.
        ValueError: If tokenizer is not set.
        ValueError: If hf_access_token is not specified.
    """
    if not device:
        if torch.cuda.is_available() and torch.cuda.device_count():
            device = "cuda:0"
            logging.warning(
                'inference device is not set, using cuda:0, %s',
                torch.cuda.get_device_name(0)
            )
        else:
            device = 'cpu'
            logging.warning(
                (
                    'No CUDA device detected, using cpu, '
                    'expect slower speeds.'
                )
            )

    if 'cuda' in device and not torch.cuda.is_available():
        raise ValueError('CUDA device requested but no CUDA device detected.')

    if not tokenizer:
        raise ValueError('Tokenizer is not set in the generate function.')

    if not hf_access_token:
        raise ValueError((
            'Hugging face access token needs to be specified. '
            'Please refer to https://huggingface.co/docs/hub/security-tokens'
            ' to obtain one.'
            )
        )

    if isinstance(model, str):
        checkpoint_path = model
        model = AutoModelForCausalLM.from_pretrained(
            checkpoint_path,
            trust_remote_code=True
        )
    model.to(device).eval()
    if isinstance(tokenizer, str):
        tokenizer = AutoTokenizer.from_pretrained(
            tokenizer,
            token=hf_access_token,
        )

    # Speculative mode
    draft_model = None
    if assistant_model:
        draft_model = assistant_model
        if isinstance(assistant_model, str):
            draft_model = AutoModelForCausalLM.from_pretrained(
                assistant_model,
                trust_remote_code=True
            )
        draft_model.to(device).eval()

    # Prepare the prompt
    tokenized_prompt = tokenizer(prompt)
    tokenized_prompt = torch.tensor(
        tokenized_prompt['input_ids'],
        device=device
    )

    tokenized_prompt = tokenized_prompt.unsqueeze(0)


    # Generate
    stime = time.time()
    output_ids = model.generate(
        tokenized_prompt,
        max_length=max_length,
        pad_token_id=0,
        assistant_model=draft_model,
        **(generate_kwargs if generate_kwargs else {}),
    )
    generation_time = time.time() - stime

    output_text = tokenizer.decode(
        output_ids[0][tokenized_prompt.shape[1]:].tolist(),
        skip_special_tokens=True
    )

    return output_text, generation_time

## Implement your main function here
The input `abstract` is a `str` containing the abstract of a research paper.
Your function will be called to return the **sentence(s)** from the `abstract` that describe the **research methodology**.

In [4]:
def extract_sentence(abstract: str) -> list[str]:
    # Returns one raw model output per prompt; `after_process` merges them later.
    # # 0.4044943820224719
    # prompt = "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks and other information" % abstract

    # # 0.4122137404580153
    # prompt = "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks and other information" % abstract

    # # 0.5668449197860963
    # prompt = "From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```%s``` \nDon't predict line breaks and other information" % abstract

    promptList = ["From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` Don't predict line breaks" % abstract,
            "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks" % abstract,
            "From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```%s``` \nDon't predict line breaks and other information" % abstract]


    output_textList = []

    for index, prompt in enumerate(promptList):
      print(f"==============={index}====================")
      print(prompt)
      output_text, generation_time = generate(
          prompt=prompt,
          # model="apple/OpenELM-450M-Instruct",
          model="apple/OpenELM-1_1B-Instruct",
          hf_access_token=userdata.get('HF_TOKEN')
      )
      output_textList.append(output_text)
      print("================finish=================")

    return output_textList
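
`generate` reloads the checkpoint every time `model` is passed as a string, so the loop above loads OpenELM once per prompt. Its docstring states that a model instance is accepted as well, so loading once and reusing the object should avoid the repeated loads. The sketch below is one way to do that; `get_cached_model` and `_MODEL_CACHE` are hypothetical names, and the loader is passed in as a parameter so the sketch itself makes no assumption about how the checkpoint is loaded.

```python
from typing import Any, Callable, Dict

# Maps a checkpoint name to the already loaded model object.
_MODEL_CACHE: Dict[str, Any] = {}


def get_cached_model(checkpoint: str, loader: Callable[[str], Any]) -> Any:
    """Load `checkpoint` with `loader` on first use and reuse it afterwards."""
    if checkpoint not in _MODEL_CACHE:
        _MODEL_CACHE[checkpoint] = loader(checkpoint)
    return _MODEL_CACHE[checkpoint]


# Possible use inside the notebook (not executed here):
# model = get_cached_model(
#     "apple/OpenELM-1_1B-Instruct",
#     lambda path: AutoModelForCausalLM.from_pretrained(
#         path, trust_remote_code=True
#     ),
# )
# output_text, generation_time = generate(
#     prompt=prompt, model=model, hf_access_token=userdata.get('HF_TOKEN')
# )
```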

Your function is expected to be used as follows.

In [None]:
abstract = """The reliability of self-labeled data is an important issue when the data are regarded as ground-truth for training and testing learning-based models.
This paper addresses the issue of false-alarm hashtags in the self-labeled data for irony detection.
We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection.
Furthermore, we apply our model to prune the self-labeled training data.
Experimental results show that the irony detection model trained on the less but cleaner training instances outperforms the models trained on all data."""

predicted_list = extract_sentence(abstract)
print(predicted_list)


## Evaluation

We will evaluate your module on a closed test set.
The sentence(s) returned by your function will be compared with a gold reference.
The evaluation metric is `ROUGE-L`, which measures the overlap between a predicted output and a reference based on their longest common subsequence. The details will be introduced in class.
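
To make the metric concrete, here is a minimal sketch of the ROUGE-L F-measure computed from the longest common subsequence (LCS). It assumes lowercased whitespace tokens; the `rouge-score` package may tokenize differently, so its numbers can deviate from this sketch. `lcs_length` and `rouge_l_f1` are illustrative names.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, prediction: str) -> float:
    """ROUGE-L F-measure over lowercased whitespace tokens."""
    ref = reference.lower().split()
    pred = prediction.lower().split()
    if not ref or not pred:
        return 0.0
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the reference that is covered
    precision = lcs / len(pred)  # share of the prediction that is relevant
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat sat on the mat", "the cat on the mat"))
```

In this example the LCS has 5 tokens, so recall is 5/6, precision is 5/5, and the F-measure is 10/11.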

In [5]:
!pip install rouge-score


In [None]:
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rougeL'])

In [None]:
reference = """We analyze the ambiguity of hashtag usages and propose a novel neural network-based model, which incorporates linguistic information from different aspects, to disambiguate the usage of three hashtags that are widely used to collect the training data for irony detection. Furthermore, we apply our model to prune the self-labeled training data."""

In [6]:
import string

def longest_common_subsequence_words(X, Y):
    X_words = X.split()
    Y_words = Y.split()
    m = len(X_words)
    n = len(Y_words)
    L = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X_words[i - 1] == Y_words[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    lcs_index = L[m][n]
    lcs = [""] * lcs_index
    i, j = m, n
    while i > 0 and j > 0:
        if X_words[i - 1] == Y_words[j - 1]:
            lcs[lcs_index - 1] = X_words[i - 1]
            i -= 1
            j -= 1
            lcs_index -= 1
        elif L[i - 1][j] > L[i][j - 1]:
            i -= 1
        else:
            j -= 1

    return lcs

def build_reference_array(reference, predicted):
    """Return a 0/1 list marking the words of `reference` that belong to its LCS with `predicted`."""
    # Strip punctuation from both texts before comparing words
    translator = str.maketrans('', '', string.punctuation)
    reference = reference.translate(translator)
    predicted = predicted.translate(translator)

    lcs_words = longest_common_subsequence_words(reference, predicted)
    reference_words = reference.split()
    matched_indices = [0] * len(reference_words)

    lcs_word_iter = iter(lcs_words)
    current_lcs_word = next(lcs_word_iter, None)

    # Walk through the reference once and flag each LCS word in order
    for idx, word in enumerate(reference_words):
        if word == current_lcs_word:
            matched_indices[idx] = 1
            current_lcs_word = next(lcs_word_iter, None)

    return matched_indices

def segment_and_filter(reference, matched_indices):
    reference_words = reference.split()
    segments = reference.split('.')
    segment_indices = [0]  # word index at which each segment starts
    start_idx = 0

    # Compute the word index at which each segment starts
    for segment in segments:
        num_words = len(segment.split())
        start_idx += num_words
        segment_indices.append(start_idx)

    # Decide which segments to keep
    kept_segments = []
    for i in range(len(segment_indices) - 1):
        start = segment_indices[i]
        end = segment_indices[i + 1]
        if any(matched_indices[start:end]):
            kept_segments.append(segments[i].strip())

    return kept_segments

def segment_and_filter_with_average(reference, matched_indices):
    reference_words = reference.split()
    segments = reference.split('.')
    segment_indices = [0]
    start_idx = 0

    # Compute the word index at which each segment starts
    for segment in segments[:-1]:  # skip the trailing piece after the last period
        num_words = len(segment.split())
        start_idx += num_words
        segment_indices.append(start_idx)

    # Score each segment by its average number of matches per word
    sentence_count = 0
    each_sentence_score = []
    segment_length_list = []
    for i in range(len(segment_indices) - 1):
        start = segment_indices[i]
        end = segment_indices[i + 1]
        segment_match_count = sum(matched_indices[start:end])
        segment_length = end - start
        segment_length_list.append(segment_length)

        # An empty segment (e.g. from "..") scores 0 instead of dividing by zero
        score = segment_match_count / segment_length if segment_length else 0.0
        each_sentence_score.append(score)
        print(score, reference_words[start:end])
        sentence_count += 1

    # Without any period there is no segment to score
    if sentence_count == 0:
        return [], 0.0

    # Keep the segments whose score is above the mean score
    kept_segments = []
    limit = round(sum(each_sentence_score) / sentence_count, 2)
    for i in range(len(segment_indices) - 1):
        if segment_length_list[i] > 0 and each_sentence_score[i] > limit:
            kept_segments.append(segments[i].strip())

    return kept_segments, limit
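
The selection rule in `segment_and_filter_with_average` can be seen in isolation in the sketch below: every sentence gets a score, the limit is the mean score rounded to two decimals, and only sentences strictly above the limit are kept. `keep_above_mean` is a hypothetical helper, and the scores in the example are made up for illustration.

```python
from typing import List, Tuple


def keep_above_mean(
    sentences: List[str], scores: List[float]
) -> Tuple[List[str], float]:
    """Keep sentences whose score is strictly above the rounded mean score."""
    if not scores:
        return [], 0.0
    limit = round(sum(scores) / len(scores), 2)
    kept = [s for s, score in zip(sentences, scores) if score > limit]
    return kept, limit


sentences = ["Background.", "We propose X.", "We apply X.", "Results."]
scores = [0.0, 1.5, 2.0, 0.5]  # illustrative values only
print(keep_above_mean(sentences, scores))
```

Because the match flags are summed over the outputs of all three prompts, a sentence score in the notebook can range from 0 to 3 rather than from 0 to 1.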


In [16]:
# Split the abstract on periods, count how often the words of each segment are
# matched by the predictions, and keep a segment if its average exceeds the limit
def after_process(abstract, predicted_list):
  reference = abstract

  print(reference)

  reference_words = reference.split()
  matched_indices = [0] * len(reference_words)

  # Sum the per-word match flags over all predictions
  for predicted in predicted_list:
    reg = build_reference_array(reference, predicted)
    for i in range(len(reg)):
      matched_indices[i] += reg[i]

  kept_segments, limit = segment_and_filter_with_average(reference, matched_indices)
  print()
  print(f"Kept segments where average matches > {limit}:")

  # Re-join the kept segments, restoring the periods removed by split('.')
  last_predict = ""

  for segment in kept_segments:
      last_predict = last_predict + segment + ". "

  return last_predict

# print(last_predict)

In [17]:
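# Run the cells that define `scorer`, `reference`, `abstract` and `predicted_list` first;
# otherwise this cell fails with a NameError.
last_predict = after_process(abstract, predicted_list)
print(scorer.score(reference, last_predict)['rougeL'].fmeasure)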

## OpenELM-450M_finetune with TRL (Not Used)

In [None]:
# !pip install trl
# !pip install wandb==0.16.6
# !pip install bitsandbytes==0.43.1
# !pip install datasets



In [None]:
# from datasets import load_dataset, DatasetDict

# dataset = load_dataset('csv', data_files="output.csv", split="all")

# def create_conversation(sample):
#   return {
#     "messages": [
#       {"role": "user", "content": sample["Abstract"]},
#       {"role": "assistant", "content": sample["Metholodgy_LLM"]}
#     ]
#   }

# dataset = dataset.map(create_conversation, remove_columns=dataset.features)
# dataset = dataset.train_test_split(test_size=0.2)
# print(dataset)
# print(dataset['train'][0])

DatasetDict({
    train: Dataset({
        features: ['messages'],
        num_rows: 76
    })
    test: Dataset({
        features: ['messages'],
        num_rows: 19
    })
})
{'messages': [{'content': '  In September 2017, the IceCube Neutrino Observatory recorded a\nvery-high-energy neutrino in directional coincidence with a blazar in an\nunusually bright gamma-ray state, TXS0506+056. Blazars are prominent photon\nsources in the universe because they harbor a relativistic jet whose radiation\nis strongly collimated and amplified. High-energy atomic nuclei known as cosmic\nrays can produce neutrinos; thus the recent detection may help identifying the\nsources of the diffuse neutrino flux and the energetic cosmic rays. Here we\nreport on a self-consistent analysis of the physical relation between the\nobserved neutrino and the blazar, in particular the time evolution and spectral\nbehavior of neutrino and photon emission. We demonstrate that a moderate\nenhancement in the number of c

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# import torch
# from transformers import TrainingArguments, set_seed, get_constant_schedule
# from trl import SFTTrainer, setup_chat_format, DataCollatorForCompletionOnlyLM
# from datasets import load_dataset
# import uuid, wandb

# set_seed(0)
# lr = 5e-5
# run_id = f"OpenELM-1_1B-Instruct_LR-{lr}_OA_{str(uuid.uuid4())}"

# model = AutoModelForCausalLM.from_pretrained(
#     "apple/OpenELM-1_1B-Instruct",
#     trust_remote_code=True,
#     device_map = None,
#     torch_dtype = torch.bfloat16,
#     )

# tokenizer = AutoTokenizer.from_pretrained(
#     "meta-llama/Llama-2-7b-hf",
#     use_fast=False)

# model, tokenizer = setup_chat_format(model, tokenizer)
# if tokenizer.pad_token in [None, tokenizer.eos_token]:
#       tokenizer.pad_token = tokenizer.unk_token



# training_arguments = TrainingArguments(
#       output_dir = "./result",
#       evaluation_strategy = "steps",
#       label_names = ["labels"],
#       per_device_train_batch_size = 8,
#       gradient_accumulation_steps = 2,
#       save_steps = 250,
#       eval_steps = 250,
#       logging_steps = 1,
#       learning_rate = lr,
#       num_train_epochs = 1,
#       lr_scheduler_type = "constant",
#       optim = 'paged_adamw_8bit',
#       bf16 = True,
#       gradient_checkpointing = True,
#       group_by_length = True,
#   )

# trainer = SFTTrainer(
#       model = model,
#       tokenizer = tokenizer,
#       train_dataset = dataset["train"],
#       eval_dataset = dataset['test'],
#       data_collator = DataCollatorForCompletionOnlyLM(
#           instruction_template = "<|im_start|>user",
#           response_template = "<|im_start|>assistant",
#           tokenizer = tokenizer,
#           mlm = False),
#       max_seq_length = 2048,
#       dataset_kwargs = dict(add_special_tokens = False),
#       args = training_arguments,
#   )

# wandb.init(
#     project = "OpenELM",
#     name = run_id,
# ).log_code(include_fn=lambda path: path.endswith(".py") or path.endswith(".ipynb"))


# trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


wandb run summary:

total_flos: 899530418380800.0
train/epoch: 3.0
train/global_step: 15.0
train/grad_norm: 3.01562
train/learning_rate: 5e-05
train/loss: 0.8524
train_loss: 3.51228
train_runtime: 570.3702
train_samples_per_second: 0.4
train_steps_per_second: 0.026


TrainOutput(global_step=5, training_loss=8.169578742980956, metrics={'train_runtime': 191.0216, 'train_samples_per_second': 0.398, 'train_steps_per_second': 0.026, 'total_flos': 299364697018368.0, 'train_loss': 8.169578742980956, 'epoch': 1.0})

In [None]:
# model_path = "model_name"

# model = AutoModelForCausalLM.from_pretrained(
#     model_path,
#     trust_remote_code=True,
#     device_map=None
# )

OSError: /content/result/runs/Jun12_13-17-47_01fef0d31863 does not appear to have a file named config.json. Checkout 'https://huggingface.co//content/result/runs/Jun12_13-17-47_01fef0d31863/tree/None' for available files.

In [None]:
# def extract_sentence(abstract: str) -> str:
#     # # 0.4044943820224719
#     # prompt = "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks and other information" % abstract

#     # # 0.4122137404580153
#     # prompt = "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks and other information" % abstract

#     # # 0.5668449197860963
#     # prompt = "From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```%s``` \nDon't predict line breaks and other information" % abstract

#     promptList = ["From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` Don't predict line breaks" % abstract,
#             "From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.\n\n\n```%s``` \n Don't predict line breaks" % abstract,
#             "From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```%s``` \nDon't predict line breaks and other information" % abstract]


#     output_textList = []

#     for index, prompt in enumerate(promptList):
#       print(f"==============={index}====================")
#       print(prompt)
#       output_text, genertaion_time = generate(
#           prompt=prompt,
#           # model="apple/OpenELM-1_1B-Instruct",
#           model=trainer.model,
#           hf_access_token=userdata.get('HF_TOKEN')
#       )
#       output_textList.append(output_text)
#       print("================finish=================")

#     return output_textList

## Evaluate

In [None]:
def evaluate(foo):
    import urllib.request
    test_url = "https://www.cs.nccu.edu.tw/~hhhuang/courses/nlp2024/test2024.in"
    gold_url = "https://www.cs.nccu.edu.tw/~hhhuang/courses/nlp2024/test2024.gold"

    from rouge_score import rouge_scorer
    scorer = rouge_scorer.RougeScorer(['rougeL'])

    total = 0
    cnt = 0
    with urllib.request.urlopen(test_url) as testin, \
         urllib.request.urlopen(gold_url) as goldin:
        # Each line of the test file is one abstract; the same line of the gold file is its reference
        for line, ref in zip(testin, goldin):
            abstract = line.decode("utf-8")
            print(f"input = {abstract}")
            ref = ref.decode("utf-8")
            print(f"ref = {ref}")
            output = foo(abstract)
            print(output)
            # Merge the raw outputs of all prompts into the final prediction
            output = after_process(abstract, output)
            score = scorer.score(ref, output)['rougeL'].fmeasure
            cnt += 1
            total += score
            print("Test case %d: %g" % (cnt, score))
    print("Overall: %g" % (total / cnt))
    return total / cnt

# Since your working function is `extract_sentence`, run the evaluation with the following statement
evaluate(extract_sentence)

input = This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured by a unified framework via a bipartite graph model. By considering various technical challenges of the problem, including order insensitivity and partial matching, this approach is less rigid than existing approaches and highly robust. One major component is a phonetic matching model which exploits similarity at the phoneme level. Two learning algorithms for learning the similarity information of basic phonemic matching units based on training examples are investigated. By applying the proposed named entity matching model, a mining system is developed for discovering new named entity translations from daily Web news. The system is able to discover new name translations that cannot be found in the existing bilingual dictionary.

ref = The matching of semantic and phonetic information is captured by a unified fr



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured by a unified framework via a bipartite graph model. By considering various technical challenges of the problem, including order insensitivity and partial matching, this approach is less rigid than existing approaches and highly robust. One major component is a phonetic matching model which exploits similarity at the phoneme level. Two learning algorithms for learning the similarity information of basic phonemic matching units based on training examples are investigated. By applying the proposed named entity matching model, a mining system is developed for discovering new named entity translations from daily Web news. The system is able to discover new name translat



From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured by a unified framework via a bipartite graph model. By considering various technical challenges of the problem, including order insensitivity and partial matching, this approach is less rigid than existing approaches and highly robust. One major component is a phonetic matching model which exploits similarity at the phoneme level. Two learning algorithms for learning the similarity information of basic phonemic matching units based on training examples are investigated. By applying the proposed named entity matching model, a mining system is developed for discovering new named entity translations from daily Web news. The system is able to discover ne



['or word boundaries.\n\n1. Phonetic matching model:\n   a. Unified framework: Captures both semantic and phonetic information.\n   b. Technical challenges: Order insensitivity and partial matching.\n   c. Less rigid than existing approaches: Highly robust.\n   d. Phoneme-level similarity information: Based on training examples.\n   e. Two learning algorithms: Supported by two matching models:\n\n2. Named entity matching model:\n   a. Captures semantic information: Relationship between words and concepts.\n   b. Technical challenges: Order insensitivity, ambiguity, and partial matching.\n   c. Unified with phoneme matching: Less sensitive to order and partial matching.\n   d. Supported by a bipartite graph model: Captures semantic and phonetic information.\n   e. Two matching methods: Partial matching and semantic similarity.\n   f. Mining system: Discover new name translations from daily Web news.\n\n\n\n```\n\n\n\n', "; extract the sentences that shows the methods.\n\n```The phonetic



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```We present a method for creating a comparable text corpus from two document collections in different languages. The collections can be very different in origin. In this study, we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. The keys with best resolution power were extracted from the documents of one collection, the source collection, by using the relative average term frequency (RATF) value. The keys were translated into the language of the other collection, the target collection, with a dictionary-based query translation program. The translated queries were run against the target collection and an alignment pair was made if the retrieved documents matched given date and similarity score criteria. The resulting comparable collection was used as a similarity thesaurus to translate queries alo



From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```We present a method for creating a comparable text corpus from two document collections in different languages. The collections can be very different in origin. In this study, we build a comparable corpus from articles by a Swedish news agency and a U.S. newspaper. The keys with best resolution power were extracted from the documents of one collection, the source collection, by using the relative average term frequency (RATF) value. The keys were translated into the language of the other collection, the target collection, with a dictionary-based query translation program. The translated queries were run against the target collection and an alignment pair was made if the retrieved documents matched given date and similarity score criteria. The resulting comparable collection was used as a similarity thesaurus to transl



['\n\n```\n1: Key: date\n2: Method: Query-Translation-Alignment\n3: Abstract:\n\n2: Text:\n\n  2.1: Key: Translation-Aligning-Query-Program\n  2.2: Method: Dictionary-Based-Query-Translation\n  2.3: Abstract:\n\n  2.3.1: RATF: Relative Average Term Frequency\n  2.3.2: Target Collection: U.S. Newspaper\n  2.3.3: Source Collection: Swedish News Agency Article Database\n  2.3.4: Date Matching Criteria: Newspaper articles published within the last 12 months\n  2.3.5: Translation Program: TermFreq\n  2.3.6: Translation Algorithm: Maximum Aligned Terms (MAT)\n  2.3.7: Translation Queries: TermFreq-Matched Query Program\n  2.3.8: Alignment Pair: Target Collection and Source Collection\n  2.3.9: Alignment Score: Term Frequency Alignment Quality (TF-AQ)\n  2.3.10: Alignment Precision: Term Frequency Alignment Precision (TF-AP)\n\n  2.4: Findings:\n     2.4.1: Comparable Text Corpus: 95% similarity\n     2.4.2: Translation Queries: Translation-Aligned Queries Outperform Non-Aligned Queries\n    



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```Web search engines typically provide search results without considering user interests or context. We propose a personalized search approach that can easily extend a conventional search engine on the client side. Our mapping framework automatically maps a set of known user interests onto a group of categories in the Open Directory Project (ODP) and takes advantage of manually edited data available in ODP for training text classifiers that correspond to, and therefore categorize and personalize search results according to user interests. In two sets of controlled experiments, we compare our personalized categorization system (PCAT) with a list interface system (LIST) that mimics a typical search engine and with a nonpersonalized categorization system (CAT). In both experiments, we analyze system performances on the basis of the type 



From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```Web search engines typically provide search results without considering user interests or context. We propose a personalized search approach that can easily extend a conventional search engine on the client side. Our mapping framework automatically maps a set of known user interests onto a group of categories in the Open Directory Project (ODP) and takes advantage of manually edited data available in ODP for training text classifiers that correspond to, and therefore categorize and personalize search results according to user interests. In two sets of controlled experiments, we compare our personalized categorization system (PCAT) with a list interface system (LIST) that mimics a typical search engine and with a nonpersonalized categorization system (CAT). In both experiments, we analyze system performances on the bas



['\n<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<', ":\n\n```\n1: Automatically map user interests onto categories in the Open Directory Project (ODP)\n2: Use manually edited data available in ODP for training text classifiers that correspond to, and therefore categorize and personalize search results according to



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```The aim of this article is to produce an alternative view of the adaptive hypermedia (AH) domain from a contextually-aware open hypermedia (OH) perspective. We believe that a wide range of AH techniques can be supported with a small number of OH structures, which can be combined together to create more complex effects, possibly simplifying the development of new AH systems. In this work we reexamine Brusilovsky's taxonomy of AH techniques from a structural OH perspective. We also show that it is possible to identify and model common structures across the taxonomy of adaptive techniques. An agent-based adaptive hypermedia system called HA^3L is presented, which uses these OH structures to provide a straightforward implementation of a variety of adaptive hypermedia techniques. This enables us to reflect on the structural equivalence o



From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```The aim of this article is to produce an alternative view of the adaptive hypermedia (AH) domain from a contextually-aware open hypermedia (OH) perspective. We believe that a wide range of AH techniques can be supported with a small number of OH structures, which can be combined together to create more complex effects, possibly simplifying the development of new AH systems. In this work we reexamine Brusilovsky's taxonomy of AH techniques from a structural OH perspective. We also show that it is possible to identify and model common structures across the taxonomy of adaptive techniques. An agent-based adaptive hypermedia system called HA^3L is presented, which uses these OH structures to provide a straightforward implementation of a variety of adaptive hypermedia techniques. This enables us to reflect on the structura



["or new sentences.\n\n\nThe abstract mentions the following methods:\n\n1. Brusilovsky's adaptive hypermedia (AH) taxonomy:\n   - Adaptive techniques:\n     - Adaptive navigation\n     - Adaptive presentation\n     - Adaptive content\n     - Adaptive interaction\n     - Adaptive learning\n     - Adaptive presentation of information\n     - Adaptive hypermedia\n     - Adaptive hypermedia techniques\n   - Structural equivalences:\n     - Context-sensitive navigation\n     - Context-sensitive presentation\n     - Context-sensitive content\n     - Context-sensitive interaction\n     - Context-sensitive hypermedia\n     - Context-sensitive adaptation\n   - OH approach:\n     - Context-sensitive ontology (CSO)\n     - Context-sensitive object-relational mapping (OSORM)\n     - Context-sensitive object-oriented mapping (OSOML)\n     - Context-sensitive object-relational mapping (OSOM)\n     - Context-sensitive object-oriented object-relational mapping (OSOOM)\n     - Context-sensitive object



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```The rapid advancement of Internet technologies enables more and more educational institutes, companies, and government agencies to provide services, namely online services, through web portals. With hundreds of online services provided through a web portal, it is critical to design web portals, namely service portals, through which online services can be easily accessed by their consumers. This article addresses this critical issue from the perspective of service selection, that is, how to select a small number of service-links (i.e., hyperlinks pointing to online services) to be featured in the homepage of a service portal such that users can be directed to find the online services they seek most effectively. We propose a mathematically formulated metric to measure the effectiveness of the selected service-links in directing users 



From the abstract provided, extract all sentences that discuss the primary results or findings of the study. Ensure that only information from the abstract is included:```The rapid advancement of Internet technologies enables more and more educational institutes, companies, and government agencies to provide services, namely online services, through web portals. With hundreds of online services provided through a web portal, it is critical to design web portals, namely service portals, through which online services can be easily accessed by their consumers. This article addresses this critical issue from the perspective of service selection, that is, how to select a small number of service-links (i.e., hyperlinks pointing to online services) to be featured in the homepage of a service portal such that users can be directed to find the online services they seek most effectively. We propose a mathematically formulated metric to measure the effectiveness of the selected service-links in d



["\n\n```The purpose of this study is to develop a method to design adaptive websites that continuously optimize user experience by selecting the most effective service-links (i.e., hyperlinks pointing to online services) to feature in the homepage of a service portal.\n\n1.1 Background and motivation:\nThe rapid advancement of Internet technologies has led to the rapid growth of educational institutes, companies, and government agencies providing services through web portals. However, with hundreds of online services provided through a web portal, it is critical to design web portals, namely service portals, that are effective in guiding users to find the desired online services efficiently and effectively.\n\n1.2 Research problem statement:\nTo design adaptive websites that continuously optimize user experience, we consider the problem of selecting a small number of service-links (i.e., hyperlinks pointing to online services) to be featured in the homepage of a service portal such th



From the following abstract, extract the sentences that shows the methods of the research. Only the sentences from the abstract, no other information.


```Stemmers attempt to reduce a word to its stem or root form and are used widely in information retrieval tasks to increase the recall rate. Most popular stemmers encode a large number of language-specific rules built over a length of time. Such stemmers with comprehensive rules are available only for a few languages. In the absence of extensive linguistic resources for certain languages, statistical language processing tools have been successfully used to improve the performance of IR systems. In this article, we describe a clustering-based approach to discover equivalence classes of root words and their morphological variants. A set of string distance measures are defined, and the lexicon for a given text collection is clustered using the distance measures to identify these equivalence classes. The proposed approach is compared with

