In [1]:
import torch
import pandas as pd
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer



In [2]:
trained_model = "./llama-student-phase1"
tokenizer = AutoTokenizer.from_pretrained(trained_model)
model = AutoModelForCausalLM.from_pretrained(trained_model)
model.eval()

The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 2048)
    (layers): ModuleList(
      (0-15): 16 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=2048, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=2048, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): lora.Linear(
            (base_layer): Linear(in_features=2048, out_features=512, bi

In [3]:
# ====== Load dataset ======
def load_partition(path: str) -> Dataset:
    """Read a CSV file and wrap it in a Hugging Face `Dataset`."""
    df = pd.read_csv(path)
    return Dataset.from_pandas(df)

dataset = load_partition("./merged_dataset.csv")

# Keep only the first 10 rows, as a pandas DataFrame, for a quick inference check
subset_df = dataset.to_pandas().head(10)

In [7]:
# Generate a continuation for each of the first 10 inputs
def generate_output(text):
    """Generate up to 50 new tokens for `text` and return the decoded sequence.

    Note: the returned string contains the prompt followed by the continuation,
    because `output[0]` holds the prompt ids as well as the newly generated ids.
    """
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=50)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

predictions = []

for _, row in subset_df.iterrows():
    text_input = row["string"]
    predicted_label = generate_output(text_input)
    predictions.append(predicted_label)
    print("classification: ", predicted_label)

# Save the generations alongside the first 10 rows
subset_df["predicted_classification"] = predictions
subset_df.to_csv("first_10_classification_results.csv", index=False)
print("Inference for first 10 data points completed.")

# To save results for the full dataset instead, `predictions` must hold one entry per row:
# dataset = dataset.add_column("generated_reasoning", predictions)
# output_csv_path = "./llama-student-phase2.csv"
# dataset.to_pandas().to_csv(output_csv_path, index=False)
# print(f"Inference completed. Results saved to {output_csv_path}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  However, how frataxin interacts with the Fe-S cluster biosynthesis components remains unclear as direct one-to-one interactions with each component were reported (IscS [12,22], IscU/Isu1 [6,11,16] or ISD11/Isd11 [14,15]). The potential role of frataxin in the regulation of the Fe-S cluster biosynthesis pathway is also unclear, and it is essential to understand how frataxin interacts with the components of the pathway to fully appreciate its function in the regulation of iron metabolism


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  In the study by Hickey et al. (2012), spikes were sampled from the field at the point of physiological
robinson et al.: genomic regions influencing root traits in barley 11 of 13
maturity, dried, grain threshed by hand, and stored at −20C to preserve grain dormancy before germination testing. The grain was then dried to a moisture content of 20%, ground into a fine powder, and a sample of 100 grains was taken from each plot.
The grain was then dried to a moisture content of 20%, ground into a fine powder


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  The drug also reduces catecholamine secretion, thereby reducing stress and leading to a modest (10-20%) reduction in heart rate and blood pressure, which may be particularly beneficial in patients with cardiovascular disease.(7) Unlike midazolam, dexmedetomidine does not affect the ventilatory response to carbon dioxide. However, it may cause a mild increase in heart rate and blood pressure, and may cause sedation. (8) Dexmedetomidine is used in the treatment of severe pain, especially in patients with a high risk of postoperative complications,


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  By clustering with lowly aggressive close kin (King 1989a,b; Viblanc et al. 2010; Arnaud, Dobson & Murie 2012), breeding females may decrease the time/energy cost of maintaining territorial boundaries (Festa-Bianchet & Boag 1982; Murie & Harris 1988), which could ultimately lead to increases in net energy income (TA) or higher allocations in somatic or reproductive functions. However, the impact of this strategy on the reproductive success of the species (e.g., breeding success, reproductive success) is not well understood.

In this study, we examined the effect of clustering with lowly aggressive close kin on the reproductive success of


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  Ophthalmic symptoms are rare manifestations of the intracranial arachnoid cyst, and include unilateral exophthalmos, visual field abnormality, decreased visual acuity and isolated palsies of the third, fourth and sixth cranial nerves [1–5]. The most common symptoms are related to the third cranial nerve, which is responsible for the lateral rectus muscle, leading to the exophthalmos and the third cranial nerve palsy. The fourth and sixth cranial nerves are also affected,


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  Recent studies identified Wee1 as a potential molecular target in cancer cells and the selective small molecule Wee1-inhibitor MK-1775 demonstrated promising results in cancer cells with enhanced levels of Wee1 (96-98). However, the mechanism of Wee1 inhibition in cancer cells is not fully understood, and further research is required to understand the potential therapeutic applications of Wee1 inhibitors.

## Step 1: Identify the target of the study
The study focuses on


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  These problems combine to make early diagnosis essential and immediate treatment a necessity, even for the youngest patients [17, 18]. However, the diagnosis of early-stage lung cancer in children often relies on the combination of clinical and radiological features, which may not always be easy to distinguish.
Lung cancer in children is a rare disease, and its early detection is challenging. However


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  Also, results demonstrated that the molecular weight and G/M ratio were important factors in controlling the antioxidant properties of sodium alginate (Şen 2011).


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


classification:  Recently, the light-induced method, which is based on the changes of surface wettability of certain materials [12], has been developed and provides amore convenient approach for cell harvesting [13]. The light-induced method has been used for the extraction of various compounds from cell cultures, including proteins, lipids, and nucleic acids [14, 15].
However, the current methods for extracting compounds from cell cultures are often based on the extraction
classification:  Currently, with advances in radiotherapeutic, chemotherapeutic, and surgical techniques, limb-salvage surgery has become an accepted treatment [2–9]. However, limb-salvage surgery is not without complications, and the choice of the appropriate surgical technique depends on the specific clinical situation.
Limb-salvage surgery is often used for patients who have lost a limb due to trauma, tumor
Inference for first 10 data points completed.
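
Each generation above starts with the input sentence, because `tokenizer.decode(output[0], ...)` decodes the prompt ids together with the newly generated ones. A minimal sketch of keeping only the continuation, assuming a decoder-only model whose `generate` output begins with the prompt ids. `split_continuation` is a hypothetical helper, not part of any library, and it is shown on plain lists so it runs without the model:

```python
def split_continuation(output_ids, prompt_len):
    """Return only the ids that follow the first `prompt_len` positions.

    Works on any sliceable sequence, such as a list or a 1-D tensor.
    """
    if prompt_len < 0 or prompt_len > len(output_ids):
        raise ValueError("prompt_len must be between 0 and len(output_ids)")
    return output_ids[prompt_len:]


# Toy ids standing in for a prompt followed by generated tokens
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 2]
new_ids = split_continuation(output_ids, len(prompt_ids))
print(new_ids)
```

In `generate_output`, the equivalent call would likely be `split_continuation(output[0], inputs["input_ids"].shape[1])`, applied before `tokenizer.decode`.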


In [None]:
for i, prediction in enumerate(predictions):
    print(f"Index {i}: {prediction} \n")
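
The printed predictions are free text rather than class names, so some post-processing is needed before they can be stored as a classification. A minimal sketch, assuming a hypothetical label set (`background`, `method`, `result`; the notebook does not state the actual classes) and a simple rule that picks whichever label appears earliest in the text. `extract_label` is an illustrative helper, not an existing API:

```python
import re

# Hypothetical label set; replace with the classes the model was trained on.
LABELS = ("background", "method", "result")


def extract_label(text, labels=LABELS, default="unknown"):
    """Return the label that occurs earliest in `text` as a whole word.

    Matching is case-insensitive; `default` is returned when no label occurs.
    """
    best_label, best_pos = default, None
    for label in labels:
        match = re.search(rf"\b{re.escape(label)}\b", text, flags=re.IGNORECASE)
        if match and (best_pos is None or match.start() < best_pos):
            best_label, best_pos = label, match.start()
    return best_label


print(extract_label("Label: Method. This sentence cites a result."))
print(extract_label("No class name appears here."))
```

Applied to text that still contains the input sentence, this rule can match words from the input itself, so it is safer to run it on the generated continuation only.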