## Loading the Fine-tuned Model  
  
We now load the fine-tuned model back for inference. This step matters in real-world applications, where you want to use a trained model without repeating the training process.
  
Loading a saved model is typically much faster than training, which makes it efficient for deployment scenarios. We load the base model, the fine-tuned adapter, and the tokenizer, so that every component needed for text generation is in place.


In [1]:
!pip install "transformers>=4.45.2" peft

Defaulting to user installation because normal site-packages is not writeable
Collecting peft
  Downloading peft-0.14.0-py3-none-any.whl (374 kB)
     |████████████████████████████████| 374 kB 2.2 MB/s            
Installing collected packages: peft
Successfully installed peft-0.14.0


### Loading the Model from Hugging Face Hub
  
Once a model is pushed to the Hugging Face Hub, loading it for inference or further fine-tuning becomes remarkably straightforward. This ease of use is one of the key advantages of the Hugging Face ecosystem.  
  
We'll load our fine-tuned model directly from the Hugging Face Hub in a few lines of code: first the base model, then the adapter on top of it. The same process works not only for our own uploaded models but for any public model on the Hub.
  
Loading from the Hub allows you to:  
1. Quickly experiment with different models  
2. Easily integrate state-of-the-art models into your projects  
3. Ensure you're using the latest version of a model  
4. Access models from various devices or environments without needing to manually transfer files  
  
This capability is particularly useful in production environments, where you might need to dynamically load or update models based on specific requirements or performance metrics.  
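When no revision is given, `from_pretrained` resolves to the latest commit on the repository's default branch, which is what item 3 above relies on. In production you may prefer to pin a specific commit or tag instead. The following is a minimal sketch of that idea, not part of the original notebook: `hub_kwargs` and `load_from_hub` are hypothetical helper names, and the sketch assumes the `revision` keyword is forwarded to the Hub download by both `transformers` and `peft`.

```python
def hub_kwargs(revision=None, **extra):
    """Collect keyword arguments for a from_pretrained call.

    `revision` may be a branch name, tag, or commit hash; it is only
    included when set, so the library default applies otherwise.
    """
    kwargs = dict(extra)
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs


def load_from_hub(base_model_name, adapter_model_name,
                  base_revision=None, adapter_revision=None):
    """Hypothetical helper: load base model, adapter, and tokenizer,
    optionally pinned to specific Hub revisions."""
    # Imported lazily so the helper above can be used without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_model_name, **hub_kwargs(base_revision, device_map="auto")
    )
    model = PeftModel.from_pretrained(
        base, adapter_model_name, **hub_kwargs(adapter_revision, device_map="auto")
    )
    tokenizer = AutoTokenizer.from_pretrained(
        base_model_name, **hub_kwargs(base_revision)
    )
    return model, tokenizer
```

Calling `load_from_hub(base_model_name, adapter_model_name)` without revisions behaves like loading the latest versions; passing a commit hash as `adapter_revision` keeps a deployment on a known adapter version until you choose to update it.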

In [2]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "ibm-granite/granite-3.0-2b-instruct"
adapter_model_name = "rawkintrevo/granite-3.0-2b-instruct-pirate-adapter"

# Load the base model, then attach the fine-tuned adapter weights on top of it.
model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_model_name, device_map="auto")

# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)


## Evaluation  
  
Just as we did in the fine-tuning example, we'll evaluate our fine-tuned model by presenting it with the 'inheritance' prompt we used in the sanity check. Comparing the two responses shows how the model's output is now 'more piratey'.
  
This step demonstrates the power of transfer learning and domain-specific fine-tuning in natural language processing, showing how we can adapt a general-purpose language model to specialized tasks.

In [4]:
input_text = "<|user>What does 'inheritance' mean?\n<|assistant|>\n"
# Inputs stay on the CPU here; if device_map="auto" placed the model on a GPU,
# move them with .to(model.device) instead.
inputs = tokenizer(input_text, return_tensors="pt").to("cpu")
# Stop generating as soon as the end-of-text token is produced.
stop_token = "<|endoftext|>"
stop_token_id = tokenizer.encode(stop_token)[0]
outputs = model.generate(**inputs, max_new_tokens=500, eos_token_id=stop_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

<|user>What does 'inheritance' mean?
<|assistant|>
Ahoy, matey! In the realm of programming, inheritance be a powerful tool, aye. It be a way for a new class to inherit the properties and methods of an existing class, like a ship inheritin' the sails and oars of its predecessor. This be allowin' the new class to build upon the existing class, addin' its own features and behaviors, like a captain addin' his own touch to a ship. Inheritance be a key concept in object-oriented programming, helpin' to create a hierarchical structure of classes, like a family tree of objects. It be a way to reuse code and reduce redundancy, like a ship usin' the same sails for different voyages. Inheritance be a mighty force in the world of programming, aye!<
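
The decoded output above repeats the prompt before the answer, and it only shows the fine-tuned response. As a minimal sketch (not part of the original notebook), the helpers below strip the echoed prompt and generate the same prompt twice, once with the adapter active and once inside PEFT's `disable_adapter()` context manager, so both answers can be read side by side. The names `strip_prompt`, `generate_reply`, and `compare_with_base` are hypothetical, and the sketch assumes `model` is the `PeftModel` loaded earlier and that `model.device` reports where its weights live.

```python
def strip_prompt(decoded, prompt):
    """Return only the newly generated text when `decoded` starts with `prompt`."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


def generate_reply(model, tokenizer, prompt, **gen_kwargs):
    """Generate a completion for `prompt` and drop the echoed prompt.

    Extra keyword arguments (for example max_new_tokens or eos_token_id)
    are passed straight through to model.generate.
    """
    # Assumption: model.device is where the inputs need to be placed.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **gen_kwargs)
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return strip_prompt(decoded, prompt)


def compare_with_base(model, tokenizer, prompt, **gen_kwargs):
    """Return the base and fine-tuned replies to the same prompt."""
    fine_tuned = generate_reply(model, tokenizer, prompt, **gen_kwargs)
    # Inside this context the adapter weights are bypassed, so the
    # underlying base model answers on its own.
    with model.disable_adapter():
        base = generate_reply(model, tokenizer, prompt, **gen_kwargs)
    return {"base": base, "fine_tuned": fine_tuned}
```

With the objects from the cells above, a call such as `compare_with_base(model, tokenizer, input_text, max_new_tokens=500, eos_token_id=stop_token_id)` returns a dictionary with both replies, which makes the stylistic difference easier to inspect than a single printed output.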
