Here is the full model for Google Colab. It classifies the dog breed in an image, then passes that breed to the language model, which outputs a naturally worded fact about it. The image classification model should run on any CPU, but the language model requires a CUDA GPU. The code below also records the softmax probabilities of the model's top 3 predictions, along with the breed the user was expecting, in a CSV file.
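
To make the logging step concrete, here is a minimal sketch of what "softmax values of the top 3" means and what one logged row looks like. It uses plain Python and the standard csv module instead of the PyTorch calls and manual file writes in the notebook, and the logits and breed names are made up for illustration only.

```python
import csv
import io
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k=3):
    """Return (index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits and labels, not taken from the real model
example_logits = [2.0, 0.5, 1.0, -1.0]
example_labels = ["Beagle", "Pug", "Boxer", "Akita"]

probs = softmax(example_logits)
top3 = top_k(probs, k=3)

# Build one row in the same column order as user_answers.csv:
# three probabilities, three labels, the user's label, and a Top-3 flag
user_choice = 1  # assume the user picked option 1
row = [p for _, p in top3] + [example_labels[i] for i, _ in top3]
row += [example_labels[top3[user_choice - 1][0]], "Y" if 1 <= user_choice <= 3 else "N"]

buffer = io.StringIO()  # in-memory stand-in for the CSV file
csv.writer(buffer).writerow(row)
print(buffer.getvalue())
```

A csv.writer also quotes fields that contain commas, which may matter if a user types a breed name with a comma in it.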

<font color='red'>First, install the dependencies for the language model by running the cell below. If you do not have a CUDA GPU, you can skip this cell. Since you are most likely doing this in Google Colab, you can get a GPU by clicking the downward-pointing triangle under "Comment" in the top right corner, then going to "Change runtime type" and choosing "T4 GPU".
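
If you are not sure whether your runtime has a CUDA GPU, a quick check along these lines can help you decide whether to run or skip the install cell. This is only a sketch: the helper name cuda_gpu_available is made up here, and the nvidia-smi lookup is a rough fallback for when PyTorch is not installed, not a guarantee that CUDA works.

```python
import importlib.util
import shutil

def cuda_gpu_available() -> bool:
    """Best-effort check for a CUDA GPU (hypothetical helper)."""
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
            return bool(torch.cuda.is_available())
        except (ImportError, OSError):
            pass  # torch is present but could not be loaded; use the fallback
    # Heuristic fallback: the NVIDIA driver tools are on the PATH
    return shutil.which("nvidia-smi") is not None

if cuda_gpu_available():
    print("CUDA GPU detected: run the install cell.")
else:
    print("No CUDA GPU detected: you can skip the install cell.")
```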

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# Keep an eye on the xformers version: the latest release often causes errors, so one version before it is usually safer
!pip install xformers trl peft accelerate triton bitsandbytes

You will need LoRA adapters and a PyTorch state dictionary file (.pth) for image classification. The cell below downloads them from Google Drive, along with the list of breed labels (Dog_List.txt). The last four lines also download a few sample images to test with.

<font color='red'>Run the cell to get the adapters and the PyTorch state dictionary. If you have your own dog images instead, feel free to delete the last four lines of the cell.

In [None]:
import gdown

#Download the folder containing the LoRA adapters
url = "https://drive.google.com/drive/folders/1cf-XTMMZb42k5bs3AR3xMRs9V5koXZnQ"
gdown.download_folder(url, quiet=True, use_cookies=False)

#Download the .pth image model
url = 'https://drive.google.com/file/d/1ABhX5RLxh_-OSubYhwV7RhgqeeaHYOpl/view?usp=sharing'
output_path = 'DogImageModel.pth'
gdown.download(url, output_path, quiet=False,fuzzy=True)

#Download the Dog_List.txt
url = 'https://drive.google.com/file/d/13Va0eqByu_CiR1O3xXO-OZLfJjY8NwwM/view?usp=sharing'
output_path = 'Dog_List.txt'
gdown.download(url, output_path, quiet=False,fuzzy=True)

#Download some sample images to try the model on
gdown.download('https://drive.google.com/file/d/1PLTSwWyPoMwgxBU9gTwDoYSzokqAJdVF/view?usp=sharing', 'Dogpic1.jpg', quiet=False,fuzzy=True)
gdown.download('https://drive.google.com/file/d/1LDp6ru_AhUQd4pHecIBMqyiw2tQdKpfe/view?usp=sharing', 'Dogpic2.jpg', quiet=False,fuzzy=True)
gdown.download('https://drive.google.com/file/d/1uA2HJknMZyeR8X3vVvJgTq8pJAmCzaCp/view?usp=sharing', 'Dogpic3.jpg', quiet=False,fuzzy=True)
gdown.download('https://drive.google.com/file/d/1kY5cxn7bTtNm2waDn9hPxg6E5rC7UZYS/view?usp=sharing', 'Dogpic4.jpg', quiet=False,fuzzy=True)
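
After the download cell has finished, a small check like the one below can confirm that everything arrived before you move on. It is a sketch under assumptions: the file names come from the download cell, while the folder name Dog-LoRA is inferred from the path used later when the language model is loaded, so adjust it if your Drive folder is named differently.

```python
from pathlib import Path

# File names from the download cell; "Dog-LoRA" is an assumed folder name
EXPECTED = ["Dog-LoRA", "DogImageModel.pth", "Dog_List.txt"]

def missing_paths(names, base="."):
    """Return the entries of names that do not exist inside base."""
    base_dir = Path(base)
    return [name for name in names if not (base_dir / name).exists()]

missing = missing_paths(EXPECTED)
if missing:
    print("Missing downloads:", ", ".join(missing))
else:
    print("All expected files are present.")
```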

Below is the script that loads and runs the dog classification model. If a CUDA GPU is available, it also loads the language model. The script is device agnostic, so it detects CUDA automatically.

<font color='red'>For the image_path variable, feel free to use one of the images provided by the download above (Dogpic1.jpg - Dogpic4.jpg). You can also use your own photo of a dog from the selected dog breed list. If you use your own photo, upload it and set image_path to wherever the photo is saved in the Colab files (Google Colab only allows temporary uploads, so the file will be erased at the end of your runtime). You might need to change the extension to .jpg before uploading. After you have uploaded the photo and adjusted the image_path variable, you can run the cell.
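
If you would rather not type the path by hand, a helper like this one can pick the first image in a folder. It is only a sketch: the name first_image_in and the list of accepted extensions are assumptions made for this example, and the helper returns None when the folder is missing or holds no images.

```python
from pathlib import Path

# Extensions this helper accepts (an assumption; extend as needed)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}

def first_image_in(folder):
    """Return the path of the first image in folder (sorted by name), or None."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return None
    for path in sorted(folder_path.iterdir()):
        # Compare the extension case-insensitively so .JPG also matches
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            return str(path)
    return None

# In Colab, uploaded files usually end up in /content
print(first_image_in("/content"))
```

You could then set image_path to the returned value, falling back to one of the sample images when it is None.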

In [None]:
#Specify the path to your image here in the Google Colab notebook. You can copy the path by right clicking on the file you want to use. Usually an uploaded image is found in the /content folder
image_path = '/content/Dogpic1.jpg'

import torch
from PIL import Image
from torchvision.transforms import v2
import torchvision.models as models
import os

# Create the same architecture that was used in the training script, in this case a ResNet18 model
model = models.resnet18()

# Modify the final fully connected layer to have 73 output classes, same as in the training script
num_ftrs = model.fc.in_features
model.fc = torch.nn.Linear(num_ftrs, 73)

# Path to the .pth file that was created by the training script
model.load_state_dict(torch.load('/content/DogImageModel.pth', map_location=torch.device('cpu')))

# Set the model to evaluation mode
model.eval()

# Automatically detect the available device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

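#Open the specified image and convert it to RGB, since the model expects 3 channels (PNG files may carry an alpha channel)
image = Image.open(image_path).convert('RGB')

#Alternative: open the first file in a folder that has one of the specified file types
"""image = Image.open(next((os.path.join('/content/', f) for f in os.listdir('/content/') if f.endswith(('.jpg', '.jpeg', '.png', '.webp', '.bmp'))), None))"""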

#Transform the image the same way as during testing so the model can read it for inference. Keep this exactly the same as the transformation for the test and validation sets. No randomizing here!
transforms_test = v2.Compose([
    v2.Resize((224, 224), antialias=True),
    v2.CenterCrop((224, 224)),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

#Apply the transformation to your image
transformed_img = transforms_test(image)

# Add a batch dimension (1, 3, 224, 224)
transformed_img = transformed_img.unsqueeze(0).to(device)  # Move data to device

# Run inference without tracking gradients, which are not needed here
with torch.no_grad():
    output = model(transformed_img)

with open('Dog_List.txt', 'r') as f:
    labels = [line.strip() for line in f.readlines()]

#The breed_nicknames dictionary adds a common name for breeds that are not widely known by their official name, such as the ones below.
breed_nicknames = {
    'Xoloitzcuintli': ' (Mexican Hairless)',
    'Staffordshire-Bull-Terrier': ' (Pitbull)',
    'Pembroke-Welsh-Corgi': ' (Corgi)',}


# Apply softmax to get probabilities
output_softmax = torch.nn.functional.softmax(output, dim=1)

# Get the top 3 predictions (values and indices) with torch.topk
topk_values, topk_indices = torch.topk(output_softmax, 3)
topk_indices = topk_indices.tolist()[0]  # Convert tensor to list of integers
topk_labels = [labels[index] for index in topk_indices] # Use the indices to get the labels. This turns the highest three values in the tensor into the labels of the dog breeds

# Print all probabilities. This is useful if you would like to record all of the labels for more detailed data.
"""for i, prob in enumerate(output_softmax.tolist()[0]):
    print(f"{labels[i]}: {prob:.4f}")"""

print("Choose a dog breed:")

early_stop = False #Set an early stop condition. Used to prevent the language model from running later on

#For data feedback, this function saves user data: the probabilities of the top 3, their respective labels, and the label specified by the user
def save_answer(answer, label=None):
    filename = "user_answers.csv"
    if not os.path.exists(filename):
        # Create the file with the column names as the first row if it does not exist yet
        with open(filename, 'w') as file:
            file.write("Probability 1, Probability 2, Probability 3, Label Rank 1, Label Rank 2, Label Rank 3, User Label, Top-3?\n")

    probabilities = ', '.join(str(p) for p in topk_values.tolist()[0])  # topk_values has shape (1, 3)
    top3_labels = ', '.join(topk_labels)  # Named so it does not shadow the global labels list
    correct_label = label if label else answer
    top3 = 'Y' if int(answer) in range(1, 4) else 'N'  # Check if answer is in top 3

    with open(filename, 'a') as file:  # Append the user results
        file.write(f"{probabilities}, {top3_labels}, {correct_label}, {top3}\n")

#While loop that repeatedly prompts user for dog breed choice until the user provides a valid response.
while True:
    for i, label in enumerate(topk_labels):
        # Check if the breed has a nickname, then print it with the nickname in parentheses
        if label in breed_nicknames:
            print(f"{i+1}. {label}{breed_nicknames[label]}")
        else:
            print(f"{i+1}. {label}")
    print("4. None of these.")
    try:
        choice = int(input("Enter the number of your chosen breed: ")) - 1
        if choice in [0, 1, 2]:
            dog_breed = topk_labels[choice]
            print(f"My dog is a {dog_breed.replace('-', ' ')} breed.")
            save_answer(choice + 1, dog_breed)
            break
        elif choice == 3:  # If the dog breed is not in the top 3, the following runs instead
            dog_breed = input("Please enter your dog's breed (50 characters or less): ")
            if len(dog_breed) <= 50: #Allows the user to only input 50 characters
                early_stop = True #This blocks the language model from operating when a manual input is inserted. Set to False if you want the language model to work on manual input
                print("Thank you for letting us know! We'll work on improving our model for", dog_breed, "breeds.")
                save_answer(4, dog_breed)
                break
            else:
                print("Sorry, that's too long. Please keep it under 50 characters.")
        else:
            print("Invalid choice. Please enter 1, 2, 3, or 4.")
    except ValueError:
        print("Invalid input. Please enter a number.")

#Major Note: If you would like the custom breed (i.e. when the user chooses option 4) to be used in the language model, set the early_stop variable that is INSIDE the loop to False.




#Load the LoRA adapters and set FastLanguageModel for inference (if a CUDA GPU is present)
#Won't load if the user answered number 4 and you DID NOT change early_stop to False in the while block
if not early_stop:
  if torch.cuda.is_available():
      from unsloth import FastLanguageModel


      max_seq_length = 2048
      dtype = None
      load_in_4bit = True

      model, tokenizer = FastLanguageModel.from_pretrained( #same parameters as it was trained on.
          model_name = "/content/Dog-LoRA", #Directory to the folder (not the file) where your model is saved. Any of the save methods from the Unsloth training should work
          max_seq_length = max_seq_length,
          dtype = dtype,
          load_in_4bit = load_in_4bit,
      )
      FastLanguageModel.for_inference(model)

      #Need to set the prompt again
      alpaca_prompt = """

      ### label:
      {}

      ### text:
      {}"""

      labels = tokenizer(
          [
              alpaca_prompt.format(
                  f"Please tell me something interesting about the {dog_breed} Dog",
                  "",
              )
          ], return_tensors = "pt").to("cuda")


  else:
      print("Language model output is only available for GPU hardware")

# Generate and print a fact only if the language model was loaded above
if not early_stop and torch.cuda.is_available():
    texts = model.generate(**labels, max_new_tokens = 128, use_cache = True)
    print(tokenizer.batch_decode(texts))



If you loaded the language model in the previous cell, the next cell runs the dog_breed prompt through the model again. It is in a separate cell so you can generate several outputs without reloading the model each time.

In [None]:
texts = model.generate(**labels, max_new_tokens = 128, use_cache = True)
tokenizer.batch_decode(texts)
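
The decoded output normally contains the prompt followed by the generated text, and it may include special tokens as well (passing skip_special_tokens=True to batch_decode should drop those). If you only want the fact itself, a small helper like the one below can cut everything up to the "### text:" marker. This is a sketch: the helper name extract_fact, the sample string, and the `<eos>` token are made up for illustration, so check what your tokenizer actually emits.

```python
def extract_fact(decoded, marker="### text:", special_tokens=()):
    """Return only the text that follows the last occurrence of marker."""
    # rpartition returns the whole string as the tail when the marker is absent
    text = decoded.rpartition(marker)[2]
    for token in special_tokens:
        text = text.replace(token, "")
    return text.strip()

# Hypothetical decoded output, not produced by the real model
sample = (
    "### label:\n"
    "Please tell me something interesting about the Beagle Dog\n\n"
    "### text:\n"
    "Beagles have an excellent sense of smell.<eos>"
)
print(extract_fact(sample, special_tokens=("<eos>",)))
```

With the real model you would call it on each item of the list returned by tokenizer.batch_decode(texts).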