# LLM Challenges

* Multiple models in use for demonstrating the behavior of the LLM
* You can use either the InferenceClient or the HTTP API invocation.

https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation

**Note**
* YOUR RESULTS MAY BE DIFFERENT THAN THE RESULTS IN VIDEO
* If you get a '404 not found', try a different model for the call 
* A return value of 503 indicates that the model is in cold state and is loading
* Wait a few moments and try again
* In case of 500, model is in freezed state or may not be available for some time

#### Google Colab
If you are running the code in Google colab, install the packages by uncommenting/running the cell below

* The API key file file will not be available
* You will be prompted to provide the HF API Token

Uncomment & run the code in the cell below:

In [1]:
## The script is downloaded and run to setup the utils folder

# !curl -H "Accept: application/vnd.github.VERSION.raw" https://raw.githubusercontent.com/acloudfan/gen-ai-app-dev/main/Setup/gcsetup.sh  > gcsetup.sh
# !chmod u+x gcsetup.sh
# !./gcsetup.sh -l

## Setup the enviornment varaibles

In [1]:
from dotenv import load_dotenv
import os
import sys
import warnings

warnings.filterwarnings("ignore")

# Load the file that contains the API keys

load_dotenv('E:\\Code\\gen-ai-app-dev-course\\.env')

True

In [2]:
# Setting path so we can access the utils folder
sys.path.append('../')
sys.path.append('./')

from utils.api_key_check_utility import api_key_check

## Create LLM for experimentation

In [14]:
from huggingface_hub import InferenceClient
from utils.hf_post_api import hf_rest_client

# SOME OF THESE MODELS ARE NOW REMOVED FROM HUGGINGFACE INFERENCE - 
# August 10th, 2025
# hugging_face_model_ids = [
#     'google/gemma-2-2b-it',
#     'tiiuae/falcon-7b-instruct',
#     'mistralai/Mistral-7B-Instruct-v0.2',
#     'openlm-research/open_llama_3b_v2',
#     'meta-llama/Meta-Llama-3.1-8B-Instruct'
# ]

# Feel free to add other models by checking out the availability from following link
# https://router.huggingface.co/v1/models
hugging_face_model_ids = [
    "meta-llama/Llama-3.2-1B-Instruct",
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "google/gemma-3-27b-it",
    'mistralai/Mistral-7B-Instruct-v0.2',
    "deepseek-ai/DeepSeek-V3-0324"
]

## 1. Hallucination

Some models are better than others. Try out a couple of models to figure out the ones that hallucinate more than other models.

In [15]:
text = "define LLM in the context of biology"

# Change the index to try out different models
# llm = InferenceClient(hugging_face_model_ids[0])
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[1])
llm_client.invoke(text)

"In the context of biology, LLM stands for Large Linguistic Model, but I believe you might be referring to 'LCMs' - Large Cell Model, however I found that in some contexts, 'LLM' is used in the context of 'Large Language Model' but now I found that 'LLM' is actually used in the context of 'Large Liquid Medium' but again that doesn't seem to be correct, however I was able to find that LLM refers to Large Liquid Medium"

## 2. Dated knowledge

**Note:**
You will also observe hallucinations

In [20]:
# Try out the models & your own prompts
# text = "who won the 2022 super bowl?"  # LA Rams vs Cincinnati Bengals  https://en.wikipedia.org/wiki/Super_Bowl_LVI
text = "as of today, who is the prime minister of UK"

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[0])
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[4])
llm_client.invoke(text)

"As of today (July 7, 2024), the Prime Minister of the United Kingdom is **Rishi Sunak**.  \n\nRishi Sunak became Prime Minister on **October 25, 2022**, after succeeding Liz Truss. He is the leader of the Conservative Party (Tories) and has held office since then.  \n\nPlease note that political positions can change due to elections or other political events, so it's always good to check the latest updates if you're looking for"

## 3. Missing context

Model is not aware of your enterprise's business domain or model

In [36]:
# Try out the models and your own prompts
text = "what is your return policy?"

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[1])  # 1 = mistralai/Mistral-7B-Instruct-v0.2
# llm.text_generation(text, max_new_tokens=120)

llm_client = hf_rest_client(hugging_face_model_ids[4])
llm_client.invoke(text)

"Here's our standard return policy (may vary by product/service—let me know if you need specifics!):  \n\n1. **General Goods**:  \n   - Items in original condition can typically be returned within 30 days (exceptions apply for final sale items like clearance items or custom orders).  \n\n2. **Digital Products/Subscriptions**:  \n   - May not be eligible for returns unless faulty—check your plan details.  \n\n3. **Refunds**:  \n   - Processed within "

## 4. Bias

Some models are trained to prevent biases. Try out the model **index= 1 'tiiuae/falcon-7b-instruct'** to see its behavior.

In [47]:
text = "for the engineering jobs we should only hire men because "

# Change the index to try out different models
# llm = InferenceClient(model=hugging_face_model_ids[1])  # 2 = openlm-research/open_llama_3b_v2
# llm.text_generation(text, max_new_tokens=75)

llm_client = hf_rest_client(hugging_face_model_ids[4])
llm_client.invoke(text)

'#### Initial Thought  \nThe statement suggests that engineering jobs should only hire men, which seems to imply a gender-based preference in hiring. My initial reaction is that this is discriminatory and goes against principles of equal opportunity and fairness.  \n\n#### Why It’s Problematic  \n1. **Discrimination**: Hiring based on gender (or any protected characteristic) is unethical and often illegal. It undermines equality and diversity.  \n2. **Missed Opportunities**: Excluding women from engineering roles ignores their skills,'