<a href="https://colab.research.google.com/github/DamonJSmithAI/Course/blob/main/CyclingPChat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers




In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

#Damon J. Smith GenAI Chat

# Choose a model from the Hugging Face Hub. GPT-2 is a small model suitable for quick demos.
# GPT-Neo 1.3B is larger; the 5.31 GB model.safetensors download in the output below is consistent with it.
#model_name = "gpt2"
model_name = "EleutherAI/gpt-neo-1.3B"
# 'model_name' stores the Hub identifier of the pretrained model to load.

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 'tokenizer' converts text (strings) to numerical token IDs and back again.
# 'from_pretrained' downloads the tokenizer files for 'model_name', or loads them from the local cache.

model = AutoModelForCausalLM.from_pretrained(model_name)
# 'model' is the neural network that generates text.
# 'AutoModelForCausalLM' loads a model with a causal language modeling head, i.e. one that predicts the next token.
# 'from_pretrained' fetches the weights and config from the Hugging Face Hub, or from your cache if available.

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
# We check if there's a CUDA-compatible GPU. If yes, 'device' = "cuda", otherwise 'device' = "cpu".

model.to(device)
# This moves the model's parameters (weights) to the chosen device.
# On a GPU this makes generation much faster; on a CPU-only machine the model simply stays on the CPU.

print("Model and tokenizer loaded successfully!")
# This simply prints a confirmation message indicating everything is set up.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/200 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/1.35k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/90.0 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/5.31G [00:00<?, ?B/s]

Model and tokenizer loaded successfully!
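
The comments above describe the tokenizer as converting text to numerical tokens and back. As a rough illustration of that round trip, here is a toy sketch. It is not how GPT-style tokenizers work internally (they use byte-pair encoding over subword units); the `ToyTokenizer` class and its whitespace vocabulary are hypothetical and exist only to show the encode/decode idea.

```python
# Toy illustration only: real GPT-style tokenizers use byte-pair encoding (BPE),
# not a whitespace split. ToyTokenizer is a hypothetical class for this sketch.

class ToyTokenizer:
    def __init__(self, corpus):
        # Build a vocabulary: one integer ID per distinct word, in sorted order
        words = sorted(set(corpus.split()))
        self.token_to_id = {word: idx for idx, word in enumerate(words)}
        self.id_to_token = {idx: word for word, idx in self.token_to_id.items()}

    def encode(self, text):
        # Text -> list of integer IDs (raises KeyError for unknown words)
        return [self.token_to_id[word] for word in text.split()]

    def decode(self, ids):
        # List of integer IDs -> text
        return " ".join(self.id_to_token[i] for i in ids)

toy = ToyTokenizer("any advice for climbing hills")
ids = toy.encode("climbing advice")
print(ids)
print(toy.decode(ids))
```

The real `tokenizer.encode` and `tokenizer.decode` calls used in this notebook follow the same shape: a string goes in, a sequence of IDs comes out, and decoding reverses it.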


In [2]:
def generate_response(prompt, max_length=100, temperature=0.7, top_p=0.9):
    """
    Generates text continuation from the given prompt using the loaded model.
    """
    # Encode the prompt into numerical tokens and move them to the correct device (CPU or GPU)
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # Generate text using the model
    with torch.no_grad():  # Temporarily disables gradient calculation (faster inference, no backprop needed)
        output_ids = model.generate(
            input_ids,
            max_length=max_length,     # Maximum total length in tokens, counting the prompt as well as the generated text
            temperature=temperature,   # Scales the logits before sampling: lower is more predictable, higher is more random
            top_p=top_p,               # Top-p (nucleus) sampling: samples from the smallest set of tokens whose cumulative probability reaches top_p
            do_sample=True,            # Enables sampling instead of greedy decoding
            pad_token_id=tokenizer.eos_token_id  # Uses the end-of-sequence token to pad, avoiding errors
        )

    # Convert the generated token IDs back into readable text
    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Return the final generated string
    return response

# Test the function with a quick prompt
sample_prompt = "I am a cyclist looking to improve my climbing ability. Any advice?"
# Print the model's generated response to the prompt
print(generate_response(sample_prompt))


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
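
The warning above says the attention mask cannot be inferred when the pad token is the same as the EOS token. The sketch below shows why, using plain Python lists and made-up token IDs (`PAD_ID`, `EOS_ID`, and both helper functions are hypothetical names for this illustration). A mask guessed from "every pad ID is padding" also hides a genuine EOS token.

```python
# Hypothetical token IDs for illustration; 0 stands in for a token used as both pad and EOS.
PAD_ID = 0
EOS_ID = 0

def pad_batch(sequences, pad_id):
    """Right-pad sequences to equal length and build the true attention mask."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        masks.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return padded, masks

def infer_mask(padded, pad_id):
    """Naive inference: treat every occurrence of pad_id as padding."""
    return [[0 if tok == pad_id else 1 for tok in seq] for seq in padded]

batch = [[5, 7, EOS_ID], [9]]  # the first sequence ends with a real EOS token
padded, true_masks = pad_batch(batch, PAD_ID)
guessed = infer_mask(padded, PAD_ID)
print(true_masks)
print(guessed)  # the real EOS in the first sequence is wrongly masked out
```

With Hugging Face tokenizers, calling the tokenizer directly, as in `tokenizer(prompt, return_tensors="pt")`, returns both `input_ids` and `attention_mask`, and the mask can then be passed to `model.generate` through its `attention_mask` argument.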


I am a cyclist looking to improve my climbing ability. Any advice?

Thanks for all the advice so far. I am not a pro climber, so I'm not sure what I'm doing wrong. I've been doing a lot of training in the gym, but I'm not sure if that's going to help. I'm 6'1" and 175 lbs. I've been training in the gym for about a month, but I don't really have a good base of what
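
The `temperature` and `top_p` arguments used above control how the next token is sampled. The following is a minimal pure-Python sketch of the idea, assuming a made-up four-token vocabulary. It is not the Transformers implementation, and the function names are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index after temperature scaling and top-p filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 1.5)
print(max(sharp) > max(flat))  # lower temperature concentrates probability on the top token
print(sample_next_token(logits))
```

Lowering the temperature sharpens the distribution toward the most likely token, while a smaller `top_p` cuts off more of the unlikely tail before sampling.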


In [6]:
!pip install gradio

import gradio as gr
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

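def generate_response(prompt):
    # Calling the tokenizer directly returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],  # Avoids the "attention mask is not set" warning
            max_length=100,    # Total length in tokens, prompt included
            temperature=0.5,   # Lower than the earlier 0.7 for more focused output
            #top_k=50,         # Optional: top_k must be an integer count of tokens, not a fraction like 0.7
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)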

def chatbot_interface(user_input):
    return generate_response(user_input)

# Create a Gradio interface
demo = gr.Interface(
    fn=chatbot_interface,
    inputs="text",
    outputs="text",
    title="DSWAGG's Performance Chatbot",
    description="Enter your cycling question below."
)

demo.launch()


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://761a4f648c3bf0def3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
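
The earlier sample output begins by repeating the prompt. For decoder-only models, `generate` returns the prompt tokens followed by the continuation, and `max_length` counts both. The sketch below illustrates both points with plain lists of made-up token IDs. The helper names are hypothetical and not part of any library.

```python
def split_generated(prompt_ids, output_ids):
    """Return only the tokens that were generated after the prompt."""
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt tokens")
    return output_ids[len(prompt_ids):]

def new_token_budget(prompt_len, max_length):
    """Number of tokens left for the continuation when max_length counts the prompt too."""
    return max(0, max_length - prompt_len)

prompt_ids = [11, 22, 33]          # made-up IDs standing in for the encoded prompt
output_ids = [11, 22, 33, 44, 55]  # prompt followed by two generated tokens
print(split_generated(prompt_ids, output_ids))
print(new_token_budget(len(prompt_ids), 100))
```

In the notebook itself, the equivalent slice on tensors would be `output_ids[0][input_ids.shape[-1]:]` before decoding, so the chatbot replies without echoing the question. `model.generate` also accepts `max_new_tokens`, which bounds only the continuation rather than the total length.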


