In [1]:
!pip install gradio transformers accelerate peft evaluate bert_score rouge_score nltk --quiet




In [2]:
!pip install --upgrade transformers --quiet




# Academic Writing LoRA Model
<br>
<p>This project builds an Academic Writing Assistant that generates academic-style text continuations for the user.</p>
<p>
The idea for this project comes from coding tools. Copilot has been a massive success that has helped millions of people write code, providing real-time assistance as well as an autocomplete function based on the user's previous work. This kind of assistance has significantly improved the experience of its users, but it has not yet been applied as widely to natural language. </p>
<p>
While autocomplete has been around for a while, one area where text continuation has not been taken advantage of is academic writing. Just as Copilot has improved quality of life for programmers, this project aims to do the same for academic writers.
</p>

# Why LoRA?
<p>
LoRA is a good fit for fine-tuning the model because the underlying task, text continuation, is something the base model can already do at a surface level.
</p>
<p>
The main adaptation in this project is to specialize text continuation for academic writing, that is, to train a model for the specific task of continuing academic and scholarly papers. LoRA suits this well because it lets us fine-tune an already proven, working base model (loaded through AutoModelForCausalLM) and build our academic auto-continuation without the very high computational cost of training an entire base model ourselves. </p>
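
To make the cost argument concrete, here is a rough arithmetic sketch of why training low-rank adapters is cheaper than updating a full weight matrix. The shapes are an assumption (one attention projection of a GPT-2-small-sized model, 768 to 2304), and the rank of 8 is a hypothetical value, not one taken from this project's training run.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_in, d_out)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)


# Assumed shapes: one attention projection in a GPT-2-small-sized model.
d_in, d_out = 768, 2304
rank = 8  # hypothetical LoRA rank, not taken from this project

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (r={rank}): {lora:,} trainable weights ({lora / full:.2%} of full)")
```

Under these assumptions the adapter trains 24,576 weights for that matrix instead of 1,769,472, while the base weights stay frozen.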

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from peft import PeftModel
import gradio as gr

# Load LoRA model using the approach from model_evaluation_and_finetuning.ipynb
model_name = "./lora_academic_model"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model = PeftModel.from_pretrained(base_model, model_name).to(device)
model.eval()

# Add a padding token if it doesn't exist
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token






### TESTING MODEL

In [4]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Assumes `model`, `tokenizer`, and `device` from the model-loading cell above are in scope.
# If they are not, run that cell first.

# Define a context for generation
# Using the same context as the stylistic control example
context = "Deep reinforcement learning has been widely adopted in robotic navigation."

# Tokenize the input context
inputs = tokenizer(context, return_tensors="pt").to(device)

# Generate continuation using the fine-tuned model
# You can adjust max_new_tokens, do_sample, temperature, top_p as needed
output_ids = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask, # Explicitly pass attention_mask
    max_new_tokens=100, # Generate up to 100 new tokens
    num_return_sequences=1,
    do_sample=True,      # Enable sampling for more diverse outputs
    temperature=0.7,     # Control creativity (lower for less, higher for more)
    top_p=0.9,           # Nucleus sampling
    pad_token_id=tokenizer.eos_token_id
)

# Decode the generated tokens, skipping the input context and special tokens
generated_text = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("Context:", context)
print("\nGenerated Continuation:", generated_text.strip())

Context: Deep reinforcement learning has been widely adopted in robotic navigation.

Generated Continuation: The goal of this paper is to provide a novel algorithm for learning a novel robotic navigation system. The algorithm is composed of a first-order gradient descent (GAD) algorithm with a learning rate, a second-order loss function, and a third-order loss function. The algorithm is implemented on an artificial neural network (ANN) based on a multi-layer perceptron (MLP) and a single-layer perceptron (SLP). The algorithm is evaluated by a simulation study and compared
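
The `temperature` and `top_p` arguments used above can be illustrated with a small, dependency-free sketch. This is a conceptual version of temperature scaling and nucleus (top-p) filtering, not the Hugging Face implementation; the toy logits and all function names here are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index from the filtered, renormalized distribution."""
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
probs = softmax_with_temperature(toy_logits, temperature=0.7)
print(probs)
print(top_p_filter(probs, top_p=0.9))
print(sample_token(toy_logits))
```

Lowering `temperature` concentrates probability on the top-scoring token, and lowering `top_p` shrinks the set of candidate tokens, which is why both settings make output more focused and less varied.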


### FUNCTION

In [5]:

def generate_continuation(context, max_new_tokens=100, temperature=0.7, top_p=0.9, print_result=True):
    """
    Generate academic text continuation using the loaded LoRA model.
    
    Args:
        context (str): Input prompt/context for generation
        max_new_tokens (int): Maximum number of new tokens to generate (default: 100)
        temperature (float): Sampling temperature - higher for more creativity (default: 0.7)
        top_p (float): Nucleus sampling parameter (default: 0.9)
        print_result (bool): Whether to print the input and output (default: True)
    
    Returns:
        str: Generated continuation text
    """
    # Tokenize the input context
    inputs = tokenizer(context, return_tensors="pt").to(device)
    
    # Generate continuation using the fine-tuned model
    output_ids = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_new_tokens,
        num_return_sequences=1,
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Decode the generated tokens, skipping the input context and special tokens
    generated_text = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[1]:], 
        skip_special_tokens=True
    )
    
    if print_result:
        print("Context:", context)
        print("\nGenerated Continuation:", generated_text.strip())
        print("-" * 80)
    
    return generated_text.strip()


# Test the function with sample prompts
test_prompts = [
    "Deep reinforcement learning has been widely adopted in robotic navigation.",
    "The key findings of this research are",
    "In recent years, machine learning has demonstrated"
]

print("Testing LoRA model with sample prompts:\n")
for prompt in test_prompts:
    generate_continuation(prompt, max_new_tokens=80)
    print("\n")

Testing LoRA model with sample prompts:

Context: Deep reinforcement learning has been widely adopted in robotic navigation.

Generated Continuation: However, the time required to train the algorithm is relatively high, especially for deep learning, and thus the performance of the algorithm is degraded. The reason is that the network is designed to learn to perform the task, while the training process is relatively complex and therefore it is difficult to train the algorithm.
To address this, a recent method of applying a network to the training process has been proposed.
--------------------------------------------------------------------------------


Context: The key findings of this research are

Generated Continuation: : (i) The incidence of chronic obstructive pulmonary disease in Iran is higher than in other developing countries, (ii) The prevalence of chronic obstructive pulmonary disease in Iran is higher than in other developing countries, (iii) The prevalence of chronic obst

# LoRA Model Eval
<p>
Now that we've built our model, we can evaluate it to test our hypothesis that this LoRA model is more successful. <br>
As in the last evaluation, we test the model against reference paragraphs to see whether the generated predictions are valid in the context of the references. <br>
The reference sentences were written arbitrarily to further test the system. <br>
</p>

<p> "The system collects raw input data and converts it into a standardized format for processing. A central module then applies predefined rules to analyze the data and generate intermediate results. Finally, the output component compiles these results into a structured report that can be used by downstream applications." <br>
"The application monitors incoming signals and filters out any values that fall outside the expected range. It then applies a series of transformation steps to extract relevant features from the remaining data. These features are stored in a shared buffer, allowing other system components to access them efficiently." <br><br>
In addition to these, we use a casual-tone paragraph to check whether there has been any deviation from the usual model: <br>
"I checked the system this morning, and everything seemed to be running smoothly. There were a few slow moments, but nothing that looked serious or out of the ordinary. If anything changes, I’ll take another look and make sure it’s all sorted." </p>



In [12]:
test_prompts = [
    "The system collects raw input data and converts it into a standardized format for processing.",
    "The application monitors incoming signals and filters out any values that fall outside the expected range.",
    "I checked the system this morning, and everything seemed to be running smoothly"
]

print("Testing LoRA model with sample prompts:\n")
for prompt in test_prompts:
    generate_continuation(prompt, max_new_tokens=80)
    print("\n")

Testing LoRA model with sample prompts:

Context: The system collects raw input data and converts it into a standardized format for processing.

Generated Continuation: In other words, the system is a computer that can run a variety of applications. These applications are typically applications for which the input data is processed in a variety of ways.

The system collects raw input data and converts it into a standardized format for processing.

For example, a system may collect raw input data from a user’s computer (e.g., an
--------------------------------------------------------------------------------


Context: The application monitors incoming signals and filters out any values that fall outside the expected range.

Generated Continuation: For example, if the signal is a random number between 0 and 1, then it is possible to check whether it is positive or negative. If it is positive, then it is possible to determine whether it is positive or negative.

As the signal is sent o

In [7]:
from evaluate import load as load_evaluate
import numpy as np
import nltk

# BLEU requires punkt tokenizer
nltk.download("punkt")
nltk.download('punkt_tab')

def evaluate_predictions(predictions, references):
    """
    Compare model predictions with ground truth using:
    - BERTScore
    - ROUGE (1,2,L)
    - BLEU
    """

    # Ensure matching lengths
    assert len(predictions) == len(references), "Prediction and reference length mismatch."

    # ============================
    # 1. BERTScore
    # ============================
    bertscore = load_evaluate("bertscore")
    bert_res = bertscore.compute(
        predictions=predictions,
        references=references,
        model_type="bert-base-uncased"
    )
    bert_precision = np.mean(bert_res["precision"])
    bert_recall = np.mean(bert_res["recall"])
    bert_f1 = np.mean(bert_res["f1"])

    # ============================
    # 2. ROUGE
    # ============================
    rouge = load_evaluate("rouge")
    rouge_res = rouge.compute(
        predictions=predictions,
        references=references,
    )

    # ============================
    # 3. BLEU (corpus)
    # ============================
    from nltk.translate.bleu_score import corpus_bleu

    # NLTK corpus BLEU expects tokenized inputs ➙ we will use word_tokenize
    tokenized_preds = [nltk.word_tokenize(p) for p in predictions]
    tokenized_refs = [[nltk.word_tokenize(r)] for r in references]

    bleu_score = corpus_bleu(tokenized_refs, tokenized_preds)

    # ============================
    # Final Output
    # ============================
    return {
        "bertscore_precision": bert_precision,
        "bertscore_recall": bert_recall,
        "bertscore_f1": bert_f1,
        "rouge1": rouge_res["rouge1"],
        "rouge2": rouge_res["rouge2"],
        "rougeL": rouge_res["rougeL"],
        "bleu": bleu_score,
    }

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Patrick\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Patrick\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.
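
To show what the overlap-based metrics measure, here is a simplified ROUGE-1 F1 written in plain Python. It is only a sketch: it assumes lowercase whitespace tokenization and no stemming, so its numbers will not match the `evaluate` implementation exactly, and `rouge1_f1` is a hypothetical helper name.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 using lowercase whitespace tokens (a simplification)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of predicted words found in the reference
    recall = overlap / len(ref_tokens)      # share of reference words found in the prediction
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the system stores the data", "the system filters the data"))
```

In this toy pair, four of the five words overlap on each side, so precision, recall, and F1 all come out to 0.8. ROUGE-2 applies the same idea to word pairs, and BLEU combines n-gram precisions with a brevity penalty.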


In [14]:
predictions = [
    "The system collects raw input data and converts it into a standardized format for processing. In other words, the system is a computer that can run a variety of applications. These applications are typically applications for which the input data is processed in a variety of ways.",
    "The application monitors incoming signals and filters out any values that fall outside the expected range. It then applies a series of transformation steps to extract relevant features from the remaining data. These features are stored in a shared buffer, allowing other system components to access them efficiently.",
    "I checked the system this morning, and everything seemed to be running smoothly. There were a few slow moments, but nothing that looked serious or out of the ordinary. If anything changes, I’ll take another look and make sure it’s all sorted."
]

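# References to compare against.
# Caution: for items 2 and 3 the model's continuation is stored here and the
# hand-written paragraph sits in `predictions`, so the roles are swapped relative
# to item 1. BLEU and BERTScore precision/recall depend on that direction.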
references = [
    "The system collects raw input data and converts it into a standardized format for processing. A central module then applies predefined rules to analyze the data and generate intermediate results. Finally, the output component compiles these results into a structured report that can be used by downstream applications.",
    "The application monitors incoming signals and filters out any values that fall outside the expected range. For example, if the signal is a random number between 0 and 1, then it is possible to check whether it is positive or negative. If it is positive, then it is possible to determine whether it is positive or negative. As the signal is sent over the Internet, it is possible to send a number of different data, such as a random number, to the processor",
    "I checked the system this morning, and everything seemed ot be running smoothly. It’s a good time to do a little more research on the topic of “how to do something that makes sense I’m a bit of a writer, so I want to know how to get some of the “stuff” I’ve been using in my head. So far I’ve been using the"
]

# Evaluate
results = evaluate_predictions(predictions, references)
print("Evaluation Results:")
print(f"BERTScore Precision: {results['bertscore_precision']:.4f}")
print(f"BERTScore Recall: {results['bertscore_recall']:.4f}")
print(f"BERTScore F1: {results['bertscore_f1']:.4f}")
print(f"ROUGE-1: {results['rouge1']:.4f}")
print(f"ROUGE-2: {results['rouge2']:.4f}")
print(f"ROUGE-L: {results['rougeL']:.4f}")
print(f"BLEU: {results['bleu']:.4f}")

Evaluation Results:
BERTScore Precision: 0.6731
BERTScore Recall: 0.6117
BERTScore F1: 0.6400
ROUGE-1: 0.4147
ROUGE-2: 0.2578
ROUGE-L: 0.3586
BLEU: 0.2055
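
The prediction strings above were pasted in by hand. As a hedged alternative, the sketch below builds each prediction as prompt plus continuation from a generation function, which keeps predictions and references from getting mixed up. `build_predictions` and `fake_generate` are hypothetical names introduced only for this example; the stand-in generator avoids loading model weights.

```python
from typing import Callable, List


def build_predictions(prompts: List[str], generate_fn: Callable[[str], str]) -> List[str]:
    """Return prompt + continuation for each prompt, ready to compare with references."""
    predictions = []
    for prompt in prompts:
        continuation = generate_fn(prompt).strip()
        predictions.append(f"{prompt} {continuation}".strip())
    return predictions


def fake_generate(prompt: str) -> str:
    """Stand-in for the real model so the sketch runs without loading weights."""
    return "This is a placeholder continuation."


demo_prompts = ["The system collects raw input data."]
print(build_predictions(demo_prompts, fake_generate))
```

In this notebook, the stand-in could be replaced with something like `lambda p: generate_continuation(p, max_new_tokens=80, print_result=False)`, with the hand-written paragraphs kept only in `references`.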


# GRADIO

In [6]:
# Gradio UI
def generate_response(prompt, max_tokens=100, temperature=0.7, top_p=0.9):
    """
    Wrapper function for Gradio interface.
    Calls generate_continuation without printing.
    """
    return generate_continuation(
        prompt, 
        max_new_tokens=int(max_tokens),  # sliders may pass a float; generate() needs an int
        temperature=temperature,
        top_p=top_p,
        print_result=False
    )

# Create Gradio interface
interface = gr.Interface(
    fn=generate_response,
    inputs=[
        gr.Textbox(
            lines=3, 
            placeholder="Enter your academic prompt here...",
            label="Input Prompt"
        ),
        gr.Slider(
            minimum=20,
            maximum=200,
            value=100,
            step=10,
            label="Max Tokens",
            info="Maximum number of tokens to generate"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.5,
            value=0.7,
            step=0.1,
            label="Temperature",
            info="Higher = more creative, Lower = more focused"
        ),
        gr.Slider(
            minimum=0.1,
            maximum=1.0,
            value=0.9,
            step=0.05,
            label="Top-p (Nucleus Sampling)",
            info="Controls diversity of output"
        )
    ],
    outputs=gr.Textbox(
        lines=5,
        label="Generated Continuation"
    ),
    title="üéì Academic Writing LoRA Model",
    description="Generate formal academic text continuations using a fine-tuned GPT-2 model with LoRA adapters.",
    examples=[
        ["Deep reinforcement learning has been widely adopted in robotic navigation.", 100, 0.7, 0.9],
        ["The key findings of this research are", 80, 0.7, 0.9],
        ["In recent years, machine learning has demonstrated", 120, 0.8, 0.9],
        ["The methodology employed in this study involves", 100, 0.6, 0.9],
        ["These results suggest that", 90, 0.7, 0.9]
    ],
    theme=gr.themes.Soft(),
    allow_flagging="never"
)

# Launch the interface
interface.launch(share=False, debug=True)



* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


Keyboard interruption in main thread... closing server.




In [None]:
# To run Gradio, just execute the cell above. No need for a separate command.