Using the Hugging Face Transformers library with PyTorch as the backend, along with other core libraries.

In [1]:
%pip install torch torchvision torchaudio
%pip install transformers accelerate bitsandbytes
%pip install tf-keras

Note: you may need to restart the kernel to use updated packages.




Setting an environment variable to avoid a Keras 3/Transformers conflict

In [2]:
import os
os.environ['TF_USE_LEGACY_KERAS'] = '1'
print("Environment variable set for Keras compatibility.")

Environment variable set for Keras compatibility.


Imports

AutoModelForCausalLM is one of the Auto classes: it reads the checkpoint's configuration, detects that the architecture is Mistral, and loads the matching PyTorch model class for causal (next-token) language modeling.

AutoTokenizer is the translator. It converts human-readable text into numerical data (token IDs) that the model can understand, and converts the model's numerical output back into readable text. The tokenizer splits text into pieces (tokens) and maps each piece to a unique integer ID; after generation, it maps the output IDs back to their pieces and stitches them together to form the final text.
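The text → IDs → text round trip described above can be sketched with a toy tokenizer. This is a minimal illustration under stated assumptions: the vocabulary TOY_VOCAB and the greedy longest-match rule are hypothetical and far simpler than Mistral's learned SentencePiece vocabulary; only the encode/decode idea carries over.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces
TOY_VOCAB = ["<unk>", " ", "an", "tique", "clock", "s", "time", "piece"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOY_VOCAB)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}


def encode(text: str) -> list[int]:
    """Split text into the longest matching vocabulary pieces and return their IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest possible piece starting at `pos` first
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                pos = end
                break
        else:
            # No vocabulary piece matches this character: emit <unk> and skip it
            ids.append(TOKEN_TO_ID["<unk>"])
            pos += 1
    return ids


def decode(ids: list[int]) -> str:
    """Stitch the piece for each ID back into a string."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


ids = encode("antique clocks")
print(ids)
print(decode(ids))
```

Any character outside the toy vocabulary decodes as "<unk>", so the round trip is lossy; many real tokenizers avoid this, for example by falling back to individual bytes.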

AutoTokenizer automatically detects the specific tokenizer associated with the Mistral model (which might use techniques like Byte-Pair Encoding or SentencePiece) and loads the correct version.
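Byte-Pair Encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. The sketch below shows that training loop on a tiny hypothetical word-count corpus. It is a simplified version of the classic algorithm (no end-of-word markers, no byte fallback) and should not be read as a reproduction of how Mistral's tokenizer was trained.

```python
from collections import Counter


def most_frequent_pair(words: dict[tuple[str, ...], int]) -> tuple[str, str]:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    # max() returns the first maximal item, so ties follow insertion order
    return max(pairs, key=pairs.get)


def merge_pair(words: dict[tuple[str, ...], int], pair: tuple[str, str]) -> dict[tuple[str, ...], int]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged: dict[tuple[str, ...], int] = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: dict[str, int], num_merges: int):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    words = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        if not any(len(symbols) > 1 for symbols in words):
            break  # every word is already a single symbol
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


# Hypothetical word counts from a tiny corpus
merges, words = learn_bpe({"clock": 4, "clocks": 2, "click": 3}, num_merges=3)
print(merges)
print(words)
```

Each learned merge becomes a vocabulary piece, so frequent fragments such as "ck" end up as single tokens.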

The time module is part of the Python standard library; here it is used to measure how long each generation takes.
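The notebook measures generation time with time.time(). For timing intervals, time.perf_counter() is usually the better choice because it is a monotonic, high-resolution clock that system clock adjustments do not affect. A small context manager keeps the timing code out of the way; the name timer and the results dictionary are illustrative, not part of the notebook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(results: dict, key: str):
    """Store the elapsed seconds of the `with` body in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the body raises
        results[key] = time.perf_counter() - start


timings = {}
with timer(timings, "sleep"):
    time.sleep(0.05)  # stand-in for a call such as pipe(...)
print(f"sleep took {timings['sleep']:.3f} s")
```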

The pipeline is a high-level wrapper that combines tokenization, generation, and decoding into a single call. It is often more stable in Jupyter environments than custom, widget-heavy setups.

In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
import time




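Defining the model and loading the tokenizer

This cell sets the model ID and the target device (GPU if available, otherwise CPU) and loads the tokenizer for the 7-billion-parameter Mistral model. No quantization is configured, and the model itself is loaded in the next cell.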
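To see why quantization matters for a 7-billion-parameter model, a back-of-the-envelope estimate of weight memory helps. This sketch assumes a nominal 7e9 parameters and counts weights only; it ignores activations, the KV cache, and the overhead that real quantization schemes add, so actual usage will be higher.

```python
# Approximate storage cost per parameter for common weight formats
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 7e9  # nominal "7B"; the exact parameter count differs slightly

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: ~{weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

The gap between the rows is what BitsAndBytesConfig (imported in this notebook but not used) is meant to exploit, by loading the weights in 8-bit or 4-bit form.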

In [4]:
# Define the checkpoint and pick the device (GPU if available, otherwise CPU)
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Loading {MODEL_ID} components directly onto {DEVICE}.")

try:
    # Load only the tokenizer here; the model weights are loaded in a later cell
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    print("\nTokenizer loaded successfully!")
except Exception as e:
    print(f"\nFATAL ERROR during tokenizer loading: {e}")

Loading mistralai/Mistral-7B-Instruct-v0.2 components directly onto cpu.

Tokenizer loaded successfully!


Loading the Model

In [5]:
# Load the tokenizer and the unquantized model, then move the model to DEVICE
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# No BitsAndBytesConfig is used here, so the model is not quantized
print(f"Loading {MODEL_ID} directly onto {DEVICE}. This may take time.")

try:
    # Load Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Load the model without quantization_config or device_map="auto"
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
    ).to(DEVICE)  # Explicitly move the loaded model to the target device

    print("\nModel and Tokenizer loaded successfully!")
except Exception as e:
    print(f"\nFATAL ERROR during model loading: {e}")

Loading mistralai/Mistral-7B-Instruct-v0.2 directly onto cpu. This may take time.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]


Model and Tokenizer loaded successfully!


This cell defines the prompt, initializes the text-generation pipeline, defines the generation function, and runs the temperature experiment.

In [6]:
# Define the prompt, build the pipeline, and run the temperature experiment

# Define the Prompt
CREATIVE_PROMPT = "Write a short, suspenseful story about an antique clock collector who finds a cryptic message hidden inside a newly acquired timepiece from the 1800s. The message must hint at a long-lost treasure."

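# --- Initialize the pipeline (loads the model and moves it to the device) ---
# Note: passing model=MODEL_ID loads a second copy of the weights from disk (hence the
# second "Loading checkpoint shards" bar below); passing model=model would reuse the
# model loaded in the previous cell.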
print("Initializing text generation pipeline...")

pipe = pipeline(
    "text-generation",
    model=MODEL_ID,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16 if DEVICE == "cuda" else None, 
    device=0 if DEVICE == "cuda" else -1 # 0 for GPU, -1 for CPU
)

def generate_story_pipe(temperature: float, max_length: int = 80):
    """Generate a story with the pipeline.

    max_length is passed as max_new_tokens, so it counts only newly generated
    tokens, not the prompt.
    """
    
    # Format the prompt using the chat template
    messages = [{"role": "user", "content": CREATIVE_PROMPT}]
    prompt_formatted = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    
    start_time = time.time()

    # Generate the output sequence using the pipeline
    output = pipe(
        prompt_formatted,
        max_new_tokens=max_length,
        do_sample=True,
        temperature=temperature,
        pad_token_id=pipe.tokenizer.eos_token_id,
        return_full_text=False # Only return the generated part
    )
    
    generation_time = time.time() - start_time
    output_text = output[0]['generated_text']
    
    return output_text, generation_time, temperature

# --- STEP 4: Parameter Experimentation ---
EXPERIMENT_RESULTS = []
TEMPERATURE_VALUES = [0.2, 0.7, 1.0]
EXPERIMENT_MAX_LENGTH = 80 # Keep it short for speed!

print("\n--- Starting Generation Experiments (Varying Temperature) ---")

for temp in TEMPERATURE_VALUES:
    print(f"\n[Run {len(EXPERIMENT_RESULTS) + 1}/{len(TEMPERATURE_VALUES)}] Generating story with Temperature = {temp}...")
    
    # Generate one story at this temperature
    story_text, gen_time, current_temp = generate_story_pipe(
        temperature=temp, 
        max_length=EXPERIMENT_MAX_LENGTH
    )
    
    EXPERIMENT_RESULTS.append({
        "temperature": current_temp,
        "time": f"{gen_time:.2f} seconds",
        "story": story_text.strip()
    })

# --- Display Results ---
print("\n--- Summary of Generated Outputs ---")
for i, result in enumerate(EXPERIMENT_RESULTS):
    print(f"\n### Test Case {i+1}: Temperature = {result['temperature']} ({result['time']})")
    print("--------------------------------------------------")
    print(result['story'])
    print("--------------------------------------------------")

Initializing text generation pipeline...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cpu



--- Starting Generation Experiments (Varying Temperature) ---

[Run 1/3] Generating story with Temperature = 0.2...

[Run 2/3] Generating story with Temperature = 0.7...

[Run 3/3] Generating story with Temperature = 1.0...

--- Summary of Generated Outputs ---

### Test Case 1: Temperature = 0.2 (206.17 seconds)
--------------------------------------------------
In the heart of the quaint, cobblestoned town of Briarwood, nestled between the labyrinthine lanes and the hallowed halls of history, resided a man of peculiar passions and unquenchable curiosity. His name was Edgar Montrose, a renowned antique clock collector, whose home was a veritable museum of time
--------------------------------------------------

### Test Case 2: Temperature = 0.7 (110.53 seconds)
--------------------------------------------------
In the heart of the bustling city of London, nestled amidst the labyrinthine network of cobblestone streets and quaint, ivy-covered houses, lies an unassuming antique shop, k

Executing the experiments and printing the results for analysis

Parameter experimentation

The goal is to determine how the Temperature (a sampling parameter) influences the model's output quality. Temperature controls the randomness of the token selection:

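Low Temperature (0.2): sharpens the probability distribution so the highest-probability tokens dominate; output is less random and more predictable (safe, coherent).

High Temperature (1.0): samples from the model's unscaled distribution, so lower-probability tokens are picked more often than at 0.2; output is more varied and creative (less predictable, potentially rambling).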
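Temperature is applied by dividing the model's logits by T before the softmax, which is what the Transformers TemperatureLogitsWarper does during sampling. The sketch below uses made-up logits for four candidate tokens to show how T = 0.2 concentrates probability on the top token while T = 1.0 leaves the distribution unchanged; softmax_with_temperature and sample_token are illustrative helpers, not library functions.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for four candidate next tokens
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))

rng = random.Random(0)  # fixed seed so the draws are reproducible
print([sample_token(logits, 1.0, rng) for _ in range(10)])
```

At T = 0.2 the top token takes over 99% of the probability mass, which is why low-temperature outputs read as safe and repetitive; at T = 1.0 the other tokens keep a meaningful share.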

Compare the generated stories for coherence, creativity, and adherence to the prompt (CREATIVE_PROMPT).
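The comparison above is qualitative, but two crude numeric proxies can support it: distinct-n (the share of unique word n-grams, a rough measure of lexical variety) and keyword coverage (how many prompt keywords a story mentions, a rough check on adherence). Both helpers and the keyword list are hypothetical additions and no substitute for reading the stories.

```python
def distinct_n(text: str, n: int = 2) -> float:
    """Fraction of word n-grams in `text` that are unique (0.0 for very short texts)."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def keyword_coverage(text: str, keywords: list[str]) -> float:
    """Fraction of `keywords` that appear (as substrings) in `text`, case-insensitively."""
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


# Hypothetical keywords drawn from CREATIVE_PROMPT
PROMPT_KEYWORDS = ["clock", "collector", "message", "treasure"]

sample_story = "An antique clock collector found a cryptic message inside the clock."
print(f"distinct-2: {distinct_n(sample_story):.2f}")
print(f"keyword coverage: {keyword_coverage(sample_story, PROMPT_KEYWORDS):.2f}")
```

In the notebook, each result['story'] in EXPERIMENT_RESULTS could be passed to these functions. With only 80 new tokens per run, short stories will rarely reach the treasure, so low coverage partly reflects the length cap.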

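| Temperature | Observed output | Best for |
|---|---|---|
| 0.2 | Highly structured, predictable, and formally written prose. Slowest generation (206.17 seconds). | Formal summaries, factual answers, or generating text where accuracy and coherence are paramount. |
| 0.7 | Balanced creativity and quality. Fastest generation for high quality (110.53 seconds). | Creative writing, ideation, and general-purpose answers where you want some novelty but need to stay on topic. |
| 1.0 | Highly novel, poetic language and less predictable narrative choices. Fast generation. | Brainstorming, generating dialogue, or pushing the model away from common phrases for maximum variety. |

Temperature does not change how much computation each token requires, and every run is capped at the same 80 new tokens, so the timing differences most likely reflect run-to-run variation (for example, first-call warm-up on the CPU) rather than the temperature setting.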