<a href="https://colab.research.google.com/github/bvm2129/LLM-Tuning/blob/main/Assignment2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Using HuggingFace APIs to integrate LLMs in applications

In [None]:
# install required modules
!pip install huggingface_hub



In [None]:
from huggingface_hub import InferenceClient
from getpass import getpass  # reads passwords or API keys without echoing them
import textwrap  # wraps the answer across multiple lines instead of one long line

# read your Hugging Face token without echoing it
HUGGINGFACEHUB_API_TOKEN = getpass("Enter your API Token: \n")


# tell the model this is a chat by assigning a role to each message
messages=[
    {"role": "system", "content": "You are a kind helpful assistant. We are just having a casual chat."},
    {"role": "user", "content": "Who are you?"}
]

try:
    # sending required information
    client=InferenceClient(
        model="HuggingFaceH4/zephyr-7b-beta",
        api_key=HUGGINGFACEHUB_API_TOKEN
    )

    # Use chat_completion to send the messages to the model
    response = client.chat_completion(messages=messages, max_tokens=200)

    # printing the output accordingly
    print(textwrap.fill(response.choices[0].message.content, width=100))

except Exception as e:
    print(f"An error occurred: {e}")

Enter your API Token: 
··········
I am not capable of having a physical existence or feelings, but I am a computer program designed to
assist you with various tasks and answer your questions to the best of my ability. My main goal is
to provide accurate, helpful, and informative responses to your queries in a conversational and
friendly manner. I am here to help you with your inquiries and provide useful information whenever
you need it. I am not human and do not have personal experiences or opinions, but I strive to be as
helpful and empathetic as possible. My responses are based on a vast database of knowledge and
trained to follow specific algorithms and patterns to provide the most appropriate response to your
inputs. I am not intelligent like a human, but I can help you find the information you need using
the data I have been trained on. I am not capable of feelings or consciousness, as I do not have a
physical body or consciousness itself, but I am here to make your interactions 
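
The messages list in the cell above is a sequence of role-tagged dictionaries. For a multi-turn chat, earlier turns are usually appended to the same list before each call. A minimal sketch, assuming a hypothetical helper named build_messages (it is not part of huggingface_hub) and a made-up earlier turn:

```python
def build_messages(system_prompt, history, user_input):
    """Return a chat-style message list.

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    build_messages is a hypothetical helper, not a huggingface_hub function.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last.
    messages.append({"role": "user", "content": user_input})
    return messages


# Made-up earlier turn, for illustration only.
history = [("Who are you?", "I am a computer program designed to assist you.")]
messages = build_messages(
    "You are a kind helpful assistant. We are just having a casual chat.",
    history,
    "What can you help me with?",
)
for m in messages:
    print(m["role"], "->", m["content"])
```

The resulting list has the same shape as the messages variable above, so it could be passed to client.chat_completion(messages=messages, max_tokens=200) in the same way.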

LLM and Temperature Tuning

In [None]:
# let us generate a story using an AI model;
# to work with the model, we need the Python library "transformers"
!pip install transformers



In [None]:
# importing required modules
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import textwrap  # splits the output into multiple lines rather than one line

token_conversion=GPT2Tokenizer.from_pretrained("gpt2")
model=GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=token_conversion.eos_token_id)
# GPT-2 has no padding token; reusing the end-of-sequence token as pad_token_id avoids a warning during generation

user_input=input("\nEnter your story-line: \n")
# converts the user input into token ids the model can process (encoding)
input_ids=token_conversion.encode(user_input, return_tensors="pt")


Enter your story-line: 
once upon a time, there was a red riding hood


In [None]:
try:
    generation=model.generate(
        input_ids,
        max_length=1000,
        temperature=0.9,
        num_beams=5,
        no_repeat_ngram_size=2,
        early_stopping=True,
        do_sample=True,
        repetition_penalty=1.2
    )
    # input_ids = the tokenized version of the user input
    # max_length = maximum total length in tokens (prompt plus generated text)
    # temperature = controls randomness: lower values sharpen the distribution, higher values flatten it
    # num_beams = number of beams kept by the beam search algorithm
    # no_repeat_ngram_size = forbids any sequence of this many tokens (here 2) from appearing twice
    # early_stopping = ends beam search as soon as num_beams finished candidates exist
    # do_sample = samples the next token from the distribution instead of always taking the most likely one
    # repetition_penalty = values above 1.0 penalize tokens that have already appeared


    # converts the generated answer into human-readable language (decoding)
    print(textwrap.fill(token_conversion.decode(generation[0], skip_special_tokens=True)))

except Exception as e:
    print(f"An error occurred: {e}")
    # if any error occurs in the process, it will be displayed

once upon a time, there was a red riding hood on the back of the car.
"I thought I was going to die," he said. "I didn't know what to do. I
just wanted to go home and see my family. And then it hit me: 'Oh my
God, I can't believe this is happening to me.' And that's when I knew
I had to get out of there."
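
The temperature argument used above rescales the model's scores (logits) before they are turned into probabilities. A minimal sketch of that idea in plain Python, assuming made-up logits for three candidate tokens; softmax_with_temperature is an illustrative name, not a transformers function:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 0.9, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it across the alternatives, which is why sampled text becomes more varied as the temperature rises.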


In [10]:
# Install required library
# !pip install huggingface_hub

from huggingface_hub import InferenceClient
import os, sys, textwrap, traceback
from getpass import getpass

# Token setup (for secure use)
hf_token = getpass("Enter your Huggingface API Token: \n")
os.environ["HF_TOKEN"] = hf_token

# Initialize client
client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    token=os.getenv("HF_TOKEN")
)

# Model notes:
# - deepset/roberta-base-squad2: extractive QA model (answers a question from a supplied context)
# - meta-llama/Meta-Llama-3-8B-Instruct, mistralai/Mistral-7B-Instruct-v0.2: instruction-tuned chat models;
#   any context has to be passed to them inside the prompt

def answer_question(question, context=None, temperature=0.7, top_p=0.9):
    """
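    If context is provided, the question-answering task (extractive QA, as in SQuAD) is called;
    this needs a model and provider that support that task.
    If context is omitted, the question is sent as a chat message.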
    """
    try:
        if context:
            # Use the model to extract answer from the context
            response = client.question_answering(question=question, context=context)
            return response.answer

        else:
            # Use chat-based completion
            messages = [{"role": "user", "content": question}]
            response = client.chat_completion(
                messages=messages,
                max_tokens=256,
                temperature=temperature,
                top_p=top_p
            )
            return response.choices[0].message.content.strip() if response and response.choices else "Sorry, I didn’t get that."

    except Exception as err:
        print(f"[Error] {err}")
        traceback.print_exc(file=sys.stderr)
        return "Something went wrong."

def main_loop():
    print("\n✨ Welcome to the Q&A Chat! ✨")
    print("• Type your question below.")
    print("• Leave 'context' empty if you just want to chat.")
    print("• Type 'exit' to leave anytime.\n")

    while True:
        question = input("\n🧠 Question: ").strip()
        if question.lower() in ["exit", "quit", "bye"]:
            print("👋 Goodbye, see you next time!")
            break

        context = input("📄 Context (optional): ").strip()
        print("\n🔍 Answer:\n")
        result = answer_question(question, context if context else None)
        print(textwrap.fill(result, width=80))

if __name__ == "__main__":
    main_loop()
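
The top_p argument of answer_question above refers to nucleus sampling: only the most probable tokens whose cumulative probability reaches top_p stay eligible for sampling. A minimal sketch in plain Python, assuming a made-up probability table; top_p_filter is an illustrative name, not a huggingface_hub function:

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches top_p, then renormalize the kept probabilities.

    probs maps each candidate token to its probability.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution, for illustration only.
probs = {"Paris": 0.6, "Lyon": 0.25, "Rome": 0.1, "banana": 0.05}
print(top_p_filter(probs, top_p=0.9))
```

With top_p=0.9 the least likely candidate in this table is dropped before sampling; a smaller top_p would prune more of the tail.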


Enter your Huggingface API Token: 
··········

✨ Welcome to the Q&A Chat! ✨
• Type your question below.
• Leave 'context' empty if you just want to chat.
• Type 'exit' to leave anytime.


🧠 Question: What is the capital of Inaid?
📄 Context (optional): New Delhi is the capital of India, a part of the National Capital Territory of Delhi, and the seat of the Indian government. It's located in northern India, on the west bank of the Yamuna River. The city is known for its historical sites like the Red Fort and Humayun's Tomb, as well as its role as a major political and cultural center. 


Traceback (most recent call last):
  File "/tmp/ipython-input-10-3207476715.py", line 29, in answer_question
    response = client.question_answering(question=question, context=context)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_client.py", line 1503, in question_answering
    provider_helper = get_provider_helper(self.provider, task="question-answering", model=model_id)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_providers/__init__.py", line 202, in get_provider_helper
    raise ValueError(
ValueError: Task 'question-answering' not supported for provider 'novita'. Available tasks: ['text-generation', 'conversational', 'text-to-video']



🔍 Answer:

[Error] Task 'question-answering' not supported for provider 'novita'. Available tasks: ['text-generation', 'conversational', 'text-to-video']
Something went wrong.

🧠 Question: What is the capital of France?
📄 Context (optional): 

🔍 Answer:



Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.11/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://router.huggingface.co/novita/v3/openai/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/ipython-input-10-3207476715.py", line 35, in answer_question
    response = client.chat_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_client.py", line 924, in chat_completion
    data = self._inner_post(request_parameters, stream=stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/

[Error] 502 Server Error: Bad Gateway for url: https://router.huggingface.co/novita/v3/openai/chat/completions
Something went wrong.

🧠 Question: Who are you?
📄 Context (optional): 

🔍 Answer:

I'm an artificial intelligence model known as a large language model (LLM) or a
conversational AI. I'm a computer program designed to simulate human-like
conversations, answer questions, provide information, and engage in discussions
on a wide range of topics. I'm often referred to as a "chatbot" or a "virtual
assistant." My primary function is to assist users like you by providing helpful
and accurate responses to your queries.  I don't have personal experiences,
emotions, or consciousness like humans do. I'm simply a collection of
algorithms, data, and software that process and generate text based on the input
I receive. My goal is to be informative, neutral, and respectful in my
responses, and to help users like you find the information or answers they're
looking for.  How can I assist you to
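
The ValueError in the output above shows that the provider serving this model does not offer the question-answering task. One possible workaround, sketched here as an assumption rather than as the notebook's method, is to fold the context into the chat prompt so that only chat_completion is needed. build_context_messages is a hypothetical helper, and the prompt wording is illustrative:

```python
def build_context_messages(question, context=None):
    """Hypothetical helper: fold an optional context into a single chat prompt."""
    if context:
        content = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}"
        )
    else:
        # Without context, the question is sent unchanged.
        content = question
    return [{"role": "user", "content": content}]


messages = build_context_messages(
    "What is the capital of India?",
    "New Delhi is the capital of India.",
)
print(messages[0]["content"])
```

The returned list has the same shape as the messages built in the else-branch of answer_question, so it could be passed to client.chat_completion(messages=messages, ...) in place of the question_answering call.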