**Question 1**

In [1]:
# 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

"""
Inference API is :
Send a request to the model’s API URL
Provide your input data (e.g., prompt, question, or context)
Receive the model’s response/output as JSON

Whatever you do manually in Postman — like choosing: post method, content type, authentication, body is written here!

Inference Client and Inference API are different.
Inference Client is a class where we communicate to LLM
"""

'\nInference API is :\nSend a request to the model’s API URL\nProvide your input data (e.g., prompt, question, or context)\nReceive the model’s response/output as JSON\n\nWhatever you do manually in Postman — like choosing: post method, content type, authentication, body is written here!\n\nInference Client and Inference API are different.\nInference Client is a class where we communicate to LLM\n'

In [2]:
import requests
# A Python library to send HTTP requests. It's used here to interact with the Hugging Face API.
# requests.post(), headers = {}, headers("Content-type"), json = {...}

import textwrap
# used to format the output text

from getpass import getpass
# used to securely input sensitive info (API Token)

HF_TOKEN = getpass("Enter your token :")

API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"
#  pretrained Question Answering model that extracts answers from a given context.

headers = {
    "Authorization": f"Bearer {HF_TOKEN}"
}
# HTTP headers used in the request.
# Authorization tells Hugging Face who you are using your Bearer Token.

def ask_question(question, context):
    # 2 prompts (question --> user's input, context --> passage given priorly)
    payload = {
        "inputs": {
            "question": question,
            "context": context
        }
    }
    # body of HTTP POST request
    # follows hf expected format, dictionary with "inputs" containing qstns and context
    # This is a Python dictionary used to define the input format required by the Hugging Face Inference API
    # This payload will be sent to the model so it can read the context and find the best answer to the question.

    try:
      response = requests.post(API_URL, headers=headers, json=payload)
      response.raise_for_status()
      result = response.json()
      # requests.post(...): Sends the payload to the API.
      # headers: Includes your authentication token. A dictionary containing metadata
      # json=payload: This sends your input data (question + context) in JSON format to the API.  usually is like {"answer": "Sam"}
      # raise_for_status(): Raises an error if the request failed (like 404, 403, etc).
      # result ==> Converts the raw HTTP response into a Python dictionary using .json().
      # When you call an API it sends back a JSON string. To work with that response in Python, we must parse it into a Python data structure (dict)

      if "answer" in result:
          return result["answer"]
          # If the response contains an "answer", it must return it.

      elif isinstance(result, dict) and "error" in result:
          return f"Model error: {result['error']}"
          # isinstance(result, dict) ==> Ensures result is a dictionary
          # This line checks if something went wrong in the API response and returns the error message.

      else:
          return "No answer found."

    except requests.exceptions.RequestException as e:
        return f"Request failed: {e}"
        # exceptions.RequestException ==> exception class, submodule inside requests


default_context = """
Generative AI refers to a category of artificial intelligence models that can create new content such as
text, images, audio, and video. These models learn patterns from existing data and use that knowledge to generate original outputs.
One of the most popular types of generative AI models is the Large Language Model (LLM), such as GPT, which can produce human-like text.
Generative AI has revolutionized industries including education, healthcare, marketing, and software development.
Tools like ChatGPT, GitHub Copilot, and DALL·E demonstrate its power in real-world applications. It is based on deep learning,
particularly transformer architectures, and uses techniques such as self-attention to understand and generate content. Despite its advantages,
Generative AI also raises ethical concerns around misinformation, bias, copyright issues, and job displacement. Researchers are actively
working to make these systems more safe, fair, and transparent. The future of generative AI is expected to involve more personalized assistants,
content creators, and AI copilots across domains.
"""
# This is the default passage from which the model tries to extract answers.

print("Welcome! Ask a question based on the provided context. Type 'exit' to quit.\n")


while True:
    user_question = input("You: ")
    if user_question.lower() == "exit":
      print("Goodbye!")
      break

    answer = ask_question(user_question, default_context)
    print("\nAssistant:", textwrap.fill(answer, width=80), "\n")

Enter your token :··········
Welcome! Ask a question based on the provided context. Type 'exit' to quit.

You: what is gen ai

Assistant: Request failed: 401 Client Error: Unauthorized for url: https://api-
inference.huggingface.co/models/deepset/roberta-base-squad2 

You: exit
Goodbye!


**Question 2**

In [3]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel
# loads the tokenizer, loads the model (brain that generates predictions)
# transormer => python library developed by hugging face, contains many pre-trained tools and models
# tokenizer is a tool that splits your text into tiny parts called tokens. usually words or sub words
# GPT-2 cannot directly read English text — it only understands token IDs (numbers).
# LM => model ends with a layer for predicting the next word.
# GPT2LMHeadModel => designed specifically for teext generation.

import torch
# imports PyTorch library to run models, For handling tensor computations (for faster model performance)

import warnings
# To avoid unnecessary warning messages

# Function to generate story using GPT-2
def generate_story(prompt, temperature=0.7):
    try:
        # Load GPT-2 tokenizer (for converting words to tokens and vice versa)
        tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

        # Load GPT-2 model (the actual pre-trained model)
        model = GPT2LMHeadModel.from_pretrained("gpt2")

        # Set padding token to avoid warning (GPT-2 has no pad token by default)
        tokenizer.pad_token = tokenizer.eos_token
        # (end of sentence token to avoid warnings during generation)

        # Convert input text prompt into input tokens (tensor format)
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        # 'pt' means PyTorch tensor
        # tokenizer.encode(): Turns text (e.g., "Once upon a time") into numbers the model understands.
        # return_tensors="pt": Returns the result as a PyTorch tensor (a fancy array for deep learning).

        # Generate the story using model.generate()
        output = model.generate(
            input_ids=input_ids,      # input prompt token
            max_length=200,           # Maximum number of tokens(words) to generate
            temperature=temperature,  # Controls creativity (higher = more creative)
            do_sample=True,           # Tells the model to use sampling, not greedy/boring output.
            top_k=50,                 # Only consider the top 50 most likely words at each step.
            top_p=0.95,               # Nucleus sampling: selects from top 95% probability mass
            repetition_penalty=1.2,   # Discourages repetition
            pad_token_id=tokenizer.eos_token_id  # Prevents warning by specifying what padding to use.
        )

        # Decode the generated token IDs into human-readable text
        story = tokenizer.decode(output[0], skip_special_tokens=True)
        # skip_special_tokens=True tells it to remove special control tokens like ``.

        # Print the generated story
        print("\nGenerated Story (Temperature =", temperature, "):\n")
        print(story)

    except Exception as e:
        print("Something went wrong while generating the story:", e)

user_prompt = input("Enter a prompt for the story: ")
generate_story(user_prompt)



Enter a prompt for the story: Once upon a time


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



Generated Story (Temperature = 0.7 ):

Once upon a time, it will be called "Piercing," and if you want to get out of the game with your feet wet or under pressure.
You may find that many people have never been taught about how this can actually work so there is no point in trying until they are older (and those who know better than I do) just because most games don't come bundled up as tightly together like some other modern video-game systems tend toward doing otherwise.[/quote]

: The reason for using an internal plug on any PS4 controller instead would not only help prevent its from being plugged into another PC by accident but also make playing even more enjoyable when paired with Nintendo's GamePad console,[1][2], which has builtin buttons such heretofore unobtainable through normal use;[3]. This seems to me quite odd since we all knew what was going around though! My experience at E9 came down somewhat sharply after my initial impressions were made earlier during our
