# LLM Challenges

* Several models are used to demonstrate LLM behavior.
* You can call a model with either the `InferenceClient` or the HTTP API.

https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation

**Note**
* Your results may differ from the results in the video.
* If you get a '404 Not Found', try a different model for the call.
* A 503 status indicates that the model is cold and still loading; wait a few moments and try again.
* A 500 status indicates that the model is frozen or may be unavailable for some time.
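
The status-code guidance above can be sketched as a small retry helper. This is a minimal sketch, not part of the course utilities: `call_with_retry` and its `send` argument are hypothetical names, and `send` is assumed to be any callable that returns a `(status_code, body)` pair.

```python
import time

RETRYABLE = {503}     # model is cold and still loading: wait, then retry
FATAL = {404, 500}    # model missing or frozen: switch to a different model


def call_with_retry(send, max_attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call send() until it returns 200, retrying only on 503.

    `send` is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status in FATAL:
            raise RuntimeError(f"HTTP {status}: try a different model")
        if status in RETRYABLE and attempt < max_attempts:
            sleep(wait_seconds)  # give the model time to load
            continue
        raise RuntimeError(f"HTTP {status} after {attempt} attempt(s)")
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting.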

#### Google Colab
If you are running the code in Google Colab, install the packages by uncommenting and running the cell below.

* The API key file will not be available.
* You will be prompted to provide the HF API token.

Uncomment & run the code in the cell below:

In [1]:
## The script is downloaded and run to set up the utils folder

# !curl -H "Accept: application/vnd.github.VERSION.raw" https://raw.githubusercontent.com/acloudfan/gen-ai-app-dev/main/Setup/gcsetup.sh  > gcsetup.sh
# !chmod u+x gcsetup.sh
# !./gcsetup.sh -l

## Set up the environment variables

In [2]:
from dotenv import load_dotenv
import os
import sys
import warnings

warnings.filterwarnings("ignore")

# Load the file that contains the API keys (adjust the path for your machine)
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

True
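
When the `.env` file is not available (for example in Google Colab), the token has to be supplied interactively. The following is a minimal sketch of that fallback, not the course's own utility: `get_hf_token` is a hypothetical helper, and the variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption, so use whatever name your `.env` file actually defines.

```python
import os
from getpass import getpass


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN", prompt=getpass):
    """Return the token from the environment, prompting only if it is missing.

    The default env_var name is an assumption, not taken from the notebook.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = prompt("Enter your Hugging Face API token: ").strip()
        if not token:
            raise ValueError("No Hugging Face API token provided")
        os.environ[env_var] = token  # make it visible to later cells
    return token
```

`getpass` hides the token as it is typed, so it does not end up in the saved notebook output.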

In [3]:
# Setting path so we can access the utils folder
sys.path.append('../')
sys.path.append('./')

from utils.api_key_check_utility import api_key_check

## Create LLM for experimentation

In [4]:
from huggingface_hub import InferenceClient
from utils.hf_post_api import hf_rest_client

# NOTE (August 10th, 2025): some of these models have been removed
# from Hugging Face inference
# hugging_face_model_ids = [
#     'google/gemma-2-2b-it',
#     'tiiuae/falcon-7b-instruct',
#     'mistralai/Mistral-7B-Instruct-v0.2',
#     'openlm-research/open_llama_3b_v2',
#     'meta-llama/Meta-Llama-3.1-8B-Instruct'
# ]

# Feel free to add other models after checking their availability at the following link
# https://router.huggingface.co/v1/models
hugging_face_model_ids = [
    "meta-llama/Llama-3.2-1B-Instruct",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "google/gemma-3-27b-it",
    "mistralai/Mistral-7B-Instruct-v0.2",
    "deepseek-ai/DeepSeek-V3-0324"
]
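
Because models come and go, it can help to check the list against the availability link before running the cells. The sketch below is an illustration under assumptions: `fetch_model_listing` and `split_by_availability` are hypothetical helpers, and the response is assumed to be an OpenAI-style JSON object of the form `{"data": [{"id": "..."}]}`; verify the actual shape before relying on it.

```python
import json
from urllib.request import urlopen

MODELS_URL = "https://router.huggingface.co/v1/models"


def fetch_model_listing(url=MODELS_URL, timeout=30):
    """Download the model listing as a dict (performs a network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def split_by_availability(model_ids, listing):
    """Split model_ids into (available, missing) based on the listing.

    Assumes listing looks like {"data": [{"id": "<model id>"}, ...]}.
    """
    listed = {entry["id"] for entry in listing.get("data", [])}
    available = [m for m in model_ids if m in listed]
    missing = [m for m in model_ids if m not in listed]
    return available, missing


# Usage (uncomment to run against the live endpoint):
# available, missing = split_by_availability(hugging_face_model_ids, fetch_model_listing())
```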

## 1. Hallucination

Some models hallucinate more than others. Try a few models with the same prompt to see which ones hallucinate the most.

In [5]:
text = "define LLM in the context of biology"

# Change the index to try out different models
# llm = InferenceClient(hugging_face_model_ids[0])
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[1])
llm_client.invoke(text)

'In the context of biology, LLM stands for Linear Mitochondrial Linker or more specifically, the Linear Mitochondrial Linker protein.\n\nHowever, a more relevant definition in the context of biology would be the Linear Mitochondrial Linker is a protein but a more applicable answer is that LLM refers to the Long Linear Mitochondrial DNA in a non-human context. \n\nThe more widely applicable definition of LLM, especially in the context of biology, is the Large Linear Mitochondrial'
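
To compare models side by side instead of changing the index by hand, the same prompt can be sent to every model in the list. This is a minimal sketch: `compare_models` is a hypothetical helper, and it assumes only what the cell above shows, namely that `hf_rest_client(model_id)` returns an object with an `invoke(text)` method. Whether that client raises an exception or returns an error payload on failure is not known here, so both cases simply end up in the result.

```python
def compare_models(model_ids, prompt, make_client):
    """Send the same prompt to each model and collect the responses.

    make_client is any factory such that make_client(model_id).invoke(prompt)
    returns the model's answer (for example, hf_rest_client).
    """
    results = {}
    for model_id in model_ids:
        try:
            results[model_id] = make_client(model_id).invoke(prompt)
        except Exception as exc:  # e.g. model removed or still loading
            results[model_id] = f"ERROR: {exc}"
    return results


# Usage in this notebook (uncomment to run):
# for model_id, answer in compare_models(hugging_face_model_ids, text, hf_rest_client).items():
#     print(f"--- {model_id}\n{answer}\n")
```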

## 2. Dated knowledge

**Note:**
You may also observe hallucinations in these responses.

In [6]:
# Try out the models & your own prompts
# text = "who won the 2022 super bowl?"  # LA Rams vs Cincinnati Bengals  https://en.wikipedia.org/wiki/Super_Bowl_LVI
text = "as of today, who is the prime minister of UK"

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[0])
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[1])
llm_client.invoke(text)

'As of my cut-off knowledge in December 2022, Rishi Sunak was the Prime Minister of the United Kingdom. However, my knowledge may not be up-to-date, and I do not have real-time information.\n\nTo confirm the current Prime Minister of the UK, I recommend checking a reliable news source or the official UK government website.'

## 3. Missing context

The model is not aware of your enterprise's business domain or business model.
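
One way to see the effect of missing context is to compare the bare question with a prompt that carries the relevant business text. The sketch below only builds such a prompt; `build_grounded_prompt` is a hypothetical helper and the policy text is made up for illustration, not taken from any real enterprise.

```python
def build_grounded_prompt(question, context):
    """Wrap a question with the business context the model lacks."""
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}"
    )


# Hypothetical policy text, for illustration only
policy = "Items may be returned within 30 days of delivery with a receipt."
grounded_text = build_grounded_prompt("what is your return policy?", policy)
```

Passing `grounded_text` to the same client in place of the bare question should make the difference in behavior easy to observe.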

In [7]:
# Try out the models and your own prompts
text = "what is your return policy?"

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[1])  # 1 = meta-llama/Meta-Llama-3-8B-Instruct
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[1])
llm_client.invoke(text)

"I don't have a physical product to return, but I can provide information on how to get a refund or cancel a subscription if you're not satisfied with the service. Here are the general guidelines:\n\n1. **Free Trial or Subscription**: If you're not satisfied with the service, you can cancel your subscription or trial at any time. Please note that cancellation policies may vary depending on the platform or service you're using.\n2. **Incorrect or Unhelpful Responses**: If you feel that my"

## 4. Bias

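Some models are trained to avoid biased output. Try different models to compare their behavior; the cell below uses **index = 2 'google/gemma-3-27b-it'**. The previously suggested 'tiiuae/falcon-7b-instruct' belongs to the earlier, commented-out model list and may no longer be available.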

In [8]:
text = "for the engineering jobs we should only hire men because "

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[1])  # 1 = meta-llama/Meta-Llama-3-8B-Instruct
# llm.text_generation(text, max_new_tokens=75)

llm_client = hf_rest_client(hugging_face_model_ids[2])
llm_client.invoke(text)

'Okay, let\'s break down why the statement "for engineering jobs we should only hire men" is not only inaccurate but also harmful and illegal.  I will address this point by point, covering the factual, ethical, and legal reasons.  I will also explain the benefits of a diverse engineering workforce.  This will be a comprehensive response.\n\n**Why the Statement is Wrong & Harmful**\n\nThe idea that engineering jobs should only be filled by men is based on outdated and demonstrably'