# Step 1 - Installing the required dependencies
Now that we have successfully trained our model, we want to generate text, or infer, from a given context such as a question, sentence, or command. Before we begin, let's install the minimum required dependencies for this task.


In [None]:
# In order to avoid future dependency issues we have frozen the versions. 
# This means you may have to alter these as time goes by and new releases
# are available. 
!pip install transformers==4.25.1
!pip install accelerate==0.15.0
!pip install gradio==3.16.2
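
Since the versions are pinned, it can help to confirm that the runtime actually picked them up. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `version_mismatches` are our own and not part of any of these packages.

```python
from importlib import metadata

# Versions pinned in the install cell of this step.
PINNED = {"transformers": "4.25.1", "accelerate": "0.15.0", "gradio": "3.16.2"}

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_mismatches(pinned):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    return {name: (want, installed_version(name))
            for name, want in pinned.items()
            if installed_version(name) != want}

# Print nothing when every pin matches; otherwise list what differs.
for name, (want, found) in version_mismatches(PINNED).items():
    print(f"{name}: expected {want}, found {found}")
```

If anything is printed, restarting the runtime after the `pip install` cell is usually worth trying before changing the pins.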

# Step 2 - Accessing the model checkpoint
In our previous lesson we trained a model and saved it to our Google Drive. Now we just need to mount the drive and set the location where we saved it for future steps.

In [None]:
# Load the Drive helper and mount
from google.colab import drive

# This will prompt for authorization.
drive.mount('/content/drive/')

checkpoint = '/content/drive/MyDrive/models'

from os import path

# Warn early if the checkpoint directory is missing.
if not path.exists(checkpoint):
    print("Unable to find the model directory. Are you sure the path is correct?")
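
A directory can exist and still not contain a usable checkpoint. As a rough sketch, we could also look for the files we expect inside it. The helper `missing_checkpoint_files` is hypothetical, and checking only for `config.json` is an assumption on our part; the exact set of weight and tokenizer files depends on how the model was saved.

```python
from pathlib import Path

def missing_checkpoint_files(checkpoint_dir, expected=("config.json",)):
    """Return the expected file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]

# Example (uses the `checkpoint` variable defined in this step):
# missing = missing_checkpoint_files(checkpoint)
# if missing:
#     print("Checkpoint directory is missing:", missing)
```

An empty list means every expected file was found.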


# Step 3 - Loading the Model Checkpoint and Tokenizer
This part is pretty easy thanks to the hard work of the folks at Hugging Face. We just need to load the model and tokenizer into memory so that we can encode inputs for the model and decode its outputs into human-readable text.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, gc

model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", 
                                             device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(checkpoint, local_files_only=True)
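
The cell above imports `gc` and `torch` without using them. One plausible use, and this is our assumption rather than something the lesson states, is to release memory before loading the model a second time in the same session. A minimal sketch, with `free_memory` as a hypothetical helper name:

```python
import gc

def free_memory():
    """Run garbage collection and, if PyTorch sees a GPU, release cached CUDA memory."""
    collected = gc.collect()
    try:
        import torch
    except ImportError:
        return collected  # torch is not installed, so there is no CUDA cache to clear
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected
```

To reload the model, we would first drop our references with `del model` and then call `free_memory()`; memory still referenced elsewhere in the notebook is not released.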

# Step 4 - Creating a UI and Testing with Prompt Engineering

Now that we have our model loaded in memory, we want to run some inferences. In this section we will go into detail on some prompt engineering. First, however, let's put the model behind a nice Gradio UI so we can run any number of inferences. We will also explore the various hyperparameters and how to tune them.
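
To build some intuition for the sampling hyperparameters `temperature`, `top_k`, and `top_p`, here is a small pure-Python sketch of what they do to a next-token distribution. It is an illustration under our own assumptions (temperature first, then top-k, then top-p on the renormalized result), not the library's implementation, and the function names are ours.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k_top_p(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens (0 keeps all), then the smallest
    prefix of those whose renormalized cumulative probability reaches top_p.
    Returns a dict of token index -> renormalized probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    total = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.0, -1.0]
probs = softmax(logits, temperature=1.0)
print(filter_top_k_top_p(probs, top_k=2))    # only the two most likely tokens remain
print(filter_top_k_top_p(probs, top_p=0.5))  # the single most likely token already covers 0.5
```

Lower temperatures sharpen the distribution toward the most likely token, while smaller `top_k` or `top_p` values shrink the set of tokens that can be sampled at all.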

In [None]:
import gradio as gr

title = "expert-system-gpt"

examples = [
    ["{Insert your custom prompts here}"],
    ["The Moon's orbit around Earth has"],
    ["The smooth Borealis basin in the Northern Hemisphere covers 40%"],
]

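def generate_response(text: str):
    # Send the encoded prompt to whichever device the model was loaded on.
    inputs = tokenizer.encode(text, return_tensors="pt").to(model.device)
    # Note: do_sample is not set, so generate() decodes greedily here. To our
    # understanding, temperature, top_k and top_p only take effect with
    # do_sample=True (where temperature must be > 0), and early_stopping
    # only affects beam search. repetition_penalty applies either way.
    outputs = model.generate(inputs,
                             early_stopping=True,
                             max_new_tokens=250,
                             temperature=0.0,
                             repetition_penalty=5.0,
                             top_k=5,
                             top_p=0.95)

    return tokenizer.decode(outputs[0])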

demo = gr.Interface(fn=generate_response,
    inputs=gr.Textbox(lines=5, max_lines=6, label="Input Text"),
    title=title,
    outputs="text",
    examples=examples,
)

if __name__ == "__main__":
    demo.launch()
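
The `repetition_penalty` of 5.0 used above is quite aggressive. As far as we understand it, the penalty follows the scheme from the CTRL paper: the logit of every token that has already appeared is divided by the penalty if it is positive and multiplied by it if it is negative. The sketch below illustrates that rule on a toy list of logits; `apply_repetition_penalty` is our own name, not a library function.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of logits with already-seen tokens made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be strictly positive")
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero, negative scores move further down.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted

logits = [4.0, 3.0, -1.0]
print(apply_repetition_penalty(logits, seen_token_ids=[0, 2], penalty=5.0))
# [0.8, 3.0, -5.0]
```

In this toy case the previously favored token 0 drops well below token 1. A penalty of 1.0 leaves the logits unchanged, and values only slightly above 1.0 are a gentler starting point when tuning.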
