# Day 3/4: LangChain - Story Generator & CLI Tool

## Introduction
Welcome to this session on LangChain! Today, we'll explore how to use LangChain to build a simple story generator and then package it as a Command Line Interface (CLI) tool. We will focus on using an open-source LLM suitable for environments like Google Colab with a T4 GPU.

## Learning Objectives
* Understand the basics of LangChain: PromptTemplates and LLMChains.
* Learn how to integrate an open-source LLM (e.g., a quantized Mistral model) with LangChain.
* Build a story generator using LangChain.
* Create a CLI tool using Python's `argparse` to interact with the story generator.
* Understand considerations for running LLMs on T4 GPUs (quantization).

## Part 1: Setup and Installations

In [None]:
!pip install -q langchain langchain-community langchain-huggingface transformers torch accelerate bitsandbytes sentencepiece


### Explanation:
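* `langchain`: The core LangChain library.
* `langchain-community`: Community-maintained integrations for LangChain.
* `langchain-huggingface`: The Hugging Face integration package, which provides `HuggingFacePipeline`.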
* `transformers`: For loading models and tokenizers from Hugging Face.
* `torch`: The PyTorch library, essential for running most Hugging Face models.
* `accelerate`: Simplifies running PyTorch models on any infrastructure (CPU, GPU, multi-GPU).
* `bitsandbytes`: For 4-bit quantization, crucial for running larger models on limited VRAM like a T4 GPU.
* `sentencepiece`: Often required for tokenizers of models like Llama or Mistral.
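Several APIs used in this notebook (for example `LLMChain`, the chain's `.run()` method, and passing `load_in_4bit` directly to `from_pretrained`) have been deprecated or changed across releases, so it helps to know which versions are installed. A minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a helper name chosen here, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip install cell
PACKAGES = [
    "langchain", "langchain-community", "langchain-huggingface",
    "transformers", "torch", "accelerate", "bitsandbytes", "sentencepiece",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:24s} {ver or 'not installed'}")
```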

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import argparse
import os

### Explanation:
* We import necessary modules from PyTorch, Transformers, and LangChain.
* `argparse` will be used later for building the CLI.
* `os` can be useful for environment variable settings if needed.

## Part 2: Loading the Language Model (LLM)

### Explanation:
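We will use Mistral-7B-Instruct, quantized to 4-bit with `bitsandbytes` while it loads. Quantization significantly reduces the model's memory footprint, making it feasible to run on a T4 GPU (16 GB of VRAM, of which Colab typically exposes about 15 GB). The `mistralai/Mistral-7B-Instruct-v0.1` repository is gated: accept its terms on the model's Hugging Face page and log in with an access token from https://huggingface.co/settings/tokens before loading it.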
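To see why 4-bit quantization matters on a T4, a back-of-the-envelope estimate of weight memory helps: parameters × bits per parameter ÷ 8 gives bytes. The sketch assumes roughly 7.24 billion parameters for Mistral-7B and ignores activations, the KV cache, and quantization overhead, so real usage is higher.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for model weights alone: params * bits / 8 bytes, in GiB."""
    return n_params * bits_per_param / 8 / GIB


# Assumption: Mistral-7B has roughly 7.24e9 parameters
N_PARAMS = 7.24e9
for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the float16 weights alone come to roughly 13.5 GiB, uncomfortably close to the T4's capacity, while 4-bit weights need under 4 GiB.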

In [None]:
from huggingface_hub import notebook_login

# Shows a token prompt inside the notebook. Create a token with read access at
# https://huggingface.co/settings/tokens and paste it into the prompt.
notebook_login()


In [None]:
from transformers import BitsAndBytesConfig

# Check for GPU availability (4-bit bitsandbytes loading needs a CUDA GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Model ID. Other quantized checkpoints (e.g., GPTQ builds) can also work on a T4;
# here we load the full-precision model and quantize it to 4-bit while loading.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # gated repo: log in first

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit quantization settings (requires bitsandbytes and accelerate)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # run computations in float16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,  # dtype for the layers that stay unquantized
    device_map="auto"           # let accelerate place the layers on the available GPU
)

# Create a Hugging Face text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,                   # upper bound on story length
    do_sample=True,                       # needed for temperature/top_p to take effect
    temperature=0.7,                      # controls randomness
    top_p=0.95,                           # nucleus sampling
    return_full_text=False,               # return only the generated story, not the prompt
    pad_token_id=tokenizer.eos_token_id,  # Mistral has no pad token; reuse EOS
)

# Create LangChain LLM wrapper
llm = HuggingFacePipeline(pipeline=pipe)

print("LLM and pipeline loaded successfully!")


### Explanation:
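* **Device Check:** Reports whether a CUDA GPU is visible. Actual placement is handled by `device_map="auto"`, and 4-bit `bitsandbytes` loading requires a GPU, so select a GPU (T4) runtime in Colab before running this cell.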
* **Model ID:** We're using `mistralai/Mistral-7B-Instruct-v0.1`. Other quantized models (like GPTQ versions from TheBloke) could also be used, but `load_in_4bit` with `bitsandbytes` is a common and effective approach for Hugging Face models.
* **Tokenizer:** Loads the tokenizer associated with the model.
* **Model Loading (`AutoModelForCausalLM.from_pretrained`):**
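    * `load_in_4bit=True`: Enables 4-bit quantization via `bitsandbytes`. Recent `transformers` versions expect it inside a `BitsAndBytesConfig` passed as `quantization_config`; passing it directly to `from_pretrained` is deprecated.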
    * `torch_dtype=torch.float16`: Reduces memory and can speed up inference on compatible hardware.
    * `device_map="auto"`: `accelerate` handles distributing the model layers. For a single T4, it will load it onto the GPU.
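* **Pipeline:** Creates a Hugging Face `pipeline` for text generation. `max_new_tokens` caps the number of generated tokens, and therefore the story length. `temperature` and `top_p` trade creativity against coherence, but in `transformers` they only take effect when sampling is enabled with `do_sample=True`; otherwise the model's default decoding (typically greedy) is used.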
* **`HuggingFacePipeline`:** Wraps the Hugging Face pipeline for use with LangChain.
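As a rough illustration of what `temperature` and `top_p` do to the model's next-token choice, here is a simplified pure-Python version of temperature scaling followed by nucleus (top-p) filtering. It is a sketch of the technique, not the `transformers` implementation (which works on tensors through logits processors); `sample_top_p` is a hypothetical name.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.95, rng=random):
    """Pick a token index from raw logits using temperature and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max for numerical stability
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample among the kept tokens in proportion to their (renormalized) probabilities
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]  # guard against floating-point round-off


rng = random.Random(42)
print([sample_top_p([2.0, 1.0, 0.5, -1.0], rng=rng) for _ in range(10)])
```

Lowering `temperature` concentrates probability on the top tokens, and lowering `top_p` drops the unlikely tail entirely; together they make the story more predictable.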

## Part 3: LangChain - Prompt Templates

### Explanation:
Prompt templates help in creating dynamic and reusable prompts. We define a template string with placeholders (input variables) that will be filled in at runtime.

In [None]:
story_prompt_template_str = """
<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: {genre}
Main Character: {main_character_description}
Setting: {setting}
Plot Point: {plot_point}

Story: [/INST]
"""

story_prompt_template = PromptTemplate(
    input_variables=["genre", "main_character_description", "setting", "plot_point"],
    template=story_prompt_template_str
)

# Test the prompt template
formatted_prompt = story_prompt_template.format(
    genre="Fantasy",
    main_character_description="A brave knight with a mysterious past",
    setting="An ancient, enchanted forest",
    plot_point="The knight discovers a hidden magical sword"
)
print("--- Formatted Prompt Example ---")
print(formatted_prompt)

--- Formatted Prompt Example ---

<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: Fantasy
Main Character: A brave knight with a mysterious past
Setting: An ancient, enchanted forest
Plot Point: The knight discovers a hidden magical sword

Story: [/INST]



### Explanation:
* We define a template string `story_prompt_template_str` with placeholders like `{genre}`, `{main_character_description}`, etc.
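* `[INST]` and `[/INST]` are the instruction markers Mistral Instruct models were fine-tuned with; they separate the user's instruction from the model's response. `<s>` is the beginning-of-sequence token, which the tokenizer usually prepends automatically, so writing it literally may produce a duplicate.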
* `PromptTemplate` takes the input variables and the template string.
* We then test it by formatting the template with example values.
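By default `PromptTemplate` uses Python `str.format`-style `{name}` placeholders, so the names in `input_variables` must match the names in braces. A small stdlib check with `string.Formatter().parse` can list them; `placeholders` is a helper introduced here for illustration. (LangChain can also infer the variables itself via `PromptTemplate.from_template`.)

```python
from string import Formatter

STORY_TEMPLATE = """<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: {genre}
Main Character: {main_character_description}
Setting: {setting}
Plot Point: {plot_point}

Story: [/INST]"""


def placeholders(template):
    """Return the {name} fields of a str.format-style template, in order, without duplicates."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


declared = ["genre", "main_character_description", "setting", "plot_point"]
found = placeholders(STORY_TEMPLATE)
print("Placeholders:", found)
print("Matches input_variables:", found == declared)
```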

## Part 4: LangChain - LLMChains

### Explanation:
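An `LLMChain` is a fundamental LangChain component that combines a `PromptTemplate` with an `LLM`. It takes user inputs, formats the prompt using the template, and passes the formatted prompt to the LLM to get a response. Recent LangChain releases deprecate `LLMChain` (and print a warning when one is created) in favor of composing the prompt and model directly with the pipe operator, `story_prompt_template | llm`; `LLMChain` still works and is used here for clarity.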
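Conceptually, invoking the chain formats the prompt with the inputs, sends the result to the model, and returns the inputs together with the model output under a `'text'` key (the key the test cells check). The sketch below mimics that flow with the standard library only; `run_chain` and `echo_llm` are hypothetical stand-ins, not LangChain code, and `echo_llm` replaces the real model so the example runs instantly.

```python
def run_chain(template, llm, inputs):
    """Format the prompt, call the model, and return the inputs plus a 'text' key."""
    prompt = template.format(**inputs)
    return {**inputs, "text": llm(prompt)}


def echo_llm(prompt):
    # Stand-in for a real model: just reports how long the prompt was
    return f"[story for a {len(prompt)}-character prompt]"


result = run_chain("Write a {genre} story.", echo_llm, {"genre": "Mystery"})
print(result["text"])  # prints: [story for a 22-character prompt]
```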

In [None]:
story_chain = LLMChain(llm=llm, prompt=story_prompt_template)

print("LLMChain for story generation created.")

LLMChain for story generation created.




### Test the Story Generation Chain

In [None]:
# Example 1
input_data_1 = {
    "genre": "Science Fiction",
    "main_character_description": "A curious robot exploring a new planet",
    "setting": "A vibrant, alien jungle on planet Xylar",
    "plot_point": "The robot finds an ancient artifact that hums with energy"
}
print(f"\n--- Generating Story 1 (Sci-Fi) ---")
# This can take a moment to run. Make sure previous cells (LLM loading, chain creation) have been executed.
try:
    story_1_output = story_chain.invoke(input_data_1)
    if 'text' in story_1_output:
        print(story_1_output['text'])
    else:
        print(f"Output format unexpected. Full output: {story_1_output}")
except NameError as ne:
    print(f"Error: story_chain or other necessary variables might not be defined. {ne}")
except Exception as e:
    print(f"Error generating story 1: {e}")

# Example 2
input_data_2 = {
    "genre": "Mystery",
    "main_character_description": "A witty detective with a keen eye for detail",
    "setting": "A foggy night in 1940s London",
    "plot_point": "The detective finds a cryptic note at a crime scene"
}
print(f"\n--- Generating Story 2 (Mystery) ---")
try:
    story_2_output = story_chain.invoke(input_data_2)
    if 'text' in story_2_output:
        print(story_2_output['text'])
    else:
        print(f"Output format unexpected. Full output: {story_2_output}")
except NameError as ne:
    print(f"Error: story_chain or other necessary variables might not be defined. {ne}")
except Exception as e:
    print(f"Error generating story 2: {e}")


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



--- Generating Story 1 (Sci-Fi) ---


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: Science Fiction
Main Character: A curious robot exploring a new planet
Setting: A vibrant, alien jungle on planet Xylar
Plot Point: The robot finds an ancient artifact that hums with energy

Story: [/INST]

In the heart of the alien jungle on planet Xylar, a small robot named Rizo was on its usual exploration mission. Rizo was a curious robot, always eager to discover new things and learn more about the universe. Its metallic legs clicked rhythmically as it moved through the dense vegetation, the bright lights of its sensors scanning the environment.

The jungle on Xylar was unlike anything Rizo had ever seen. The trees were tall and twisted, with leaves that shimmered in the strange, purple light that filtered through the canopy. The air was thick with the sounds of unknown creatures and the scent of exotic flowers. Rizo felt alive, like it was truly part of this vibrant world.

As R

In [None]:
# Rerun both examples. invoke() returns a dict; the story is under the 'text' key.
story_1 = story_chain.invoke(input_data_1)["text"]
print(story_1)

print("\n--- Generating Story 2 (Mystery) ---")
story_2 = story_chain.invoke(input_data_2)["text"]
print(story_2)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: Science Fiction
Main Character: A curious robot exploring a new planet
Setting: A vibrant, alien jungle on planet Xylar
Plot Point: The robot finds an ancient artifact that hums with energy

Story: [/INST]

In the heart of the alien jungle on planet Xylar, a small robot named Rizo was on its usual exploration mission. Rizo was a curious robot, always eager to discover new things and learn more about the universe. Its metallic legs clicked rhythmically as it moved through the dense vegetation, the bright lights of its sensors scanning the environment.

The jungle on Xylar was unlike anything Rizo had ever seen. The trees were tall and twisted, with leaves that shimmered in the strange, purple light that filtered through the canopy. The air was thick with the sounds of unknown creatures and the scent of exotic flowers. Rizo felt alive, like it was truly part of this vibrant world.

As R

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: Mystery
Main Character: A witty detective with a keen eye for detail
Setting: A foggy night in 1940s London
Plot Point: The detective finds a cryptic note at a crime scene

Story: [/INST]

It was a foggy night in 1940s London, and the rain was pouring down in sheets. The streets were slick and treacherous, and the thick fog made it nearly impossible to see anything beyond a few feet. Despite the gloomy weather, Detective Arthur Wainwright was in high spirits. He had just solved a particularly puzzling case, and he was on his way home when he stumbled upon a new mystery.

As he walked down a dark alley, he noticed a faint light flickering in the distance. Curiosity piqued, he followed the light and soon found himself at the scene of a crime. A man had been found dead on the ground, and there was a cryptic note next to him. The note was written in a code that Wainwright couldn't deciphe

### Explanation:
* We create an `LLMChain` by providing our `llm` instance and the `story_prompt_template`.
* Calling `.invoke()` on the chain with a dictionary of input variables runs it and returns a dictionary containing the inputs plus the generated story under the `'text'` key. The older `.run()` method also works but is deprecated in recent LangChain releases and prints a warning.
* **Note:** Generation can take a while, especially on the first call or with long outputs. Run the model-loading and chain-creation cells before these cells.
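In the sample outputs above, the prompt is echoed before each story, because a Hugging Face text-generation pipeline returns the prompt plus the completion by default (`return_full_text=True`). Passing `return_full_text=False` to `pipeline(...)` avoids this at the source; if you are working with text that already contains the prompt, a small post-processing helper such as the hypothetical `strip_prompt` below keeps only what follows the last `[/INST]` marker.

```python
def strip_prompt(generated, marker="[/INST]"):
    """Return the text after the last instruction marker (or the whole text if absent), trimmed."""
    # rpartition returns ('', '', generated) when the marker is missing
    return generated.rpartition(marker)[2].strip()


raw = "<s>[INST] Write a short story. [/INST]\n\nIn the heart of the alien jungle..."
print(strip_prompt(raw))
```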

## Part 5: Building the CLI Tool with `argparse`

### Explanation:
Now, let's create a Python script that can be run from the command line. We'll use the `argparse` module to accept story elements as command-line arguments.
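Inside a notebook, `parser.parse_args()` with no arguments would read the kernel's own `sys.argv`, so a convenient way to try the parser is to pass an explicit list of strings. The sketch below mirrors the arguments defined in the CLI script; `build_parser` is a hypothetical refactoring, not a function in the script.

```python
import argparse


def build_parser():
    """Same arguments as the CLI script, built in a function so they can be reused and tested."""
    parser = argparse.ArgumentParser(description="LangChain Story Generator CLI")
    parser.add_argument("--genre", type=str, required=True, help="Genre of the story (e.g., Fantasy, Sci-Fi)")
    parser.add_argument("--character", type=str, required=True, help="Description of the main character")
    parser.add_argument("--setting", type=str, required=True, help="Setting of the story")
    parser.add_argument("--plot", type=str, required=True, help="A key plot point")
    parser.add_argument("--max_tokens", type=int, default=512, help="Max new tokens for the story length")
    return parser


# An explicit list stands in for the words typed after `python story_cli.py`
args = build_parser().parse_args([
    "--genre", "Fantasy",
    "--character", "A brave knight with a mysterious past",
    "--setting", "An ancient, enchanted forest",
    "--plot", "The knight discovers a hidden magical sword",
])
print(args.genre, args.max_tokens)  # --max_tokens falls back to its default of 512
```

Omitting a required flag makes `argparse` print a usage message and raise `SystemExit`, which is the same behavior you would see in a terminal.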

In [None]:
import os

# The script below is written to story_cli.py and run from a terminal.
cli_script_content = """
import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

def generate_story_cli():
    parser = argparse.ArgumentParser(description="LangChain Story Generator CLI")
    parser.add_argument("--genre", type=str, required=True, help="Genre of the story (e.g., Fantasy, Sci-Fi)")
    parser.add_argument("--character", type=str, required=True, help="Description of the main character")
    parser.add_argument("--setting", type=str, required=True, help="Setting of the story")
    parser.add_argument("--plot", type=str, required=True, help="A key plot point")
    parser.add_argument("--max_tokens", type=int, default=512, help="Max new tokens for the story length")

    args = parser.parse_args()

    print("Initializing LLM... This may take a moment.")
    model_id = "mistralai/Mistral-7B-Instruct-v0.1"

    try:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        # 4-bit quantization is configured through BitsAndBytesConfig
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            quantization_config=bnb_config,
            torch_dtype=torch.float16,
            device_map="auto"
        )

        pipe = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            max_new_tokens=args.max_tokens,
            do_sample=True,  # required for temperature/top_p to take effect
            temperature=0.7,
            top_p=0.95,
            return_full_text=False,  # return only the story, not the prompt
            pad_token_id=tokenizer.eos_token_id,
        )
        llm = HuggingFacePipeline(pipeline=pipe)
        print("LLM loaded successfully.")

        story_prompt_template_str = \"\"\"<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: {genre}
Main Character: {main_character_description}
Setting: {setting}
Plot Point: {plot_point}

Story: [/INST]\"\"\"

        story_prompt_template = PromptTemplate(
            input_variables=["genre", "main_character_description", "setting", "plot_point"],
            template=story_prompt_template_str
        )

        story_chain = LLMChain(llm=llm, prompt=story_prompt_template)

        print(f"\\nGenerating story with the following elements:")
        print(f"- Genre: {args.genre}")
        print(f"- Character: {args.character}")
        print(f"- Setting: {args.setting}")
        print(f"- Plot: {args.plot}")
        print("----------------------------------------")

        story_input = {
            "genre": args.genre,
            "main_character_description": args.character,
            "setting": args.setting,
            "plot_point": args.plot
        }

        story_output = story_chain.invoke(story_input)  # Updated to invoke
        print("\\n--- Generated Story ---")
        if isinstance(story_output, dict) and 'text' in story_output:
            print(story_output['text'])
        else:
            print(f"Output format unexpected or 'text' key missing. Full output: {story_output}")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    generate_story_cli()
"""

def save_script(content: str, filename: str = "story_cli.py"):
    """
    Saves the given content string to a Python script file.
    """
    path = os.path.abspath(filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content.lstrip("\n"))  # strip leading newline for clean file start
    print(f"🚀 Script saved to {path}")

if __name__ == "__main__":
    save_script(cli_script_content)


🚀 Script saved to /content/story_cli.py


In [None]:
import os

# (… your existing cli_script_content definition here …)
cli_script_content = """
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline  # Updated import
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import argparse
import os

def generate_story_cli():
    parser = argparse.ArgumentParser(description="LangChain Story Generator CLI")
    parser.add_argument("--genre", type=str, required=True, help="Genre of the story (e.g., Fantasy, Sci-Fi)")
    parser.add_argument("--character", type=str, required=True, help="Description of the main character")
    parser.add_argument("--setting", type=str, required=True, help="Setting of the story")
    parser.add_argument("--plot", type=str, required=True, help="A key plot point")
    parser.add_argument("--max_tokens", type=int, default=512, help="Max new tokens for the story length")

    args = parser.parse_args()

    print("Initializing LLM... This may take a moment.")
    model_id = "mistralai/Mistral-7B-Instruct-v0.1"

    try:
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            load_in_4bit=True,
            torch_dtype=torch.float16,
            device_map="auto"
        )

        pipe = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            max_new_tokens=args.max_tokens,
            temperature=0.7,
            top_p=0.95
        )
        llm = HuggingFacePipeline(pipeline=pipe)
        print("LLM loaded successfully.")

        story_prompt_template_str = \"\"\"<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: {genre}
Main Character: {main_character_description}
Setting: {setting}
Plot Point: {plot_point}

Story: [/INST]\"\"\"

        story_prompt_template = PromptTemplate(
            input_variables=["genre", "main_character_description", "setting", "plot_point"],
            template=story_prompt_template_str
        )

        story_chain = LLMChain(llm=llm, prompt=story_prompt_template)

        print(f"\\nGenerating story with the following elements:")
        print(f"- Genre: {args.genre}")
        print(f"- Character: {args.character}")
        print(f"- Setting: {args.setting}")
        print(f"- Plot: {args.plot}")
        print("----------------------------------------")

        story_input = {
            "genre": args.genre,
            "main_character_description": args.character,
            "setting": args.setting,
            "plot_point": args.plot
        }

        story_output = story_chain.invoke(story_input)  # Updated to invoke
        print("\\n--- Generated Story ---")
        if isinstance(story_output, dict) and 'text' in story_output:
            print(story_output['text'])
        else:
            print(f"Output format unexpected or 'text' key missing. Full output: {story_output}")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    generate_story_cli()
"""


In [None]:
story_1 = story_chain.invoke(input_data_1)["text"]
print(story_1)

input_data_2 = {
    "genre": "Mystery",
    "main_character_description": "A witty detective with a keen eye for detail",
    "setting": "A foggy night in 1940s London",
    "plot_point": "The detective finds a cryptic note at a crime scene",
}
print("\n--- Generating Story 2 (Mystery) ---")
story_2 = story_chain.invoke(input_data_2)["text"]
print(story_2)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: Science Fiction
Main Character: A curious robot exploring a new planet
Setting: A vibrant, alien jungle on planet Xylar
Plot Point: The robot finds an ancient artifact that hums with energy

Story: [/INST]

In the heart of the alien jungle on planet Xylar, a small robot named Rizo was on its usual exploration mission. Rizo was a curious robot, always eager to discover new things and learn more about the universe. Its metallic legs clicked rhythmically as it moved through the dense vegetation, the bright lights of its sensors scanning the environment.

The jungle on Xylar was unlike anything Rizo had ever seen. The trees were tall and twisted, with leaves that shimmered in the strange, purple light that filtered through the canopy. The air was thick with the sounds of unknown creatures and the scent of exotic flowers. Rizo felt alive, like it was truly part of this vibrant world.

As R
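
The recorded output above contains the whole prompt, not just the story, because by default the Hugging Face text-generation pipeline returns the prompt together with the generated text. The stdlib sketch below reproduces how the `[INST] ... [/INST]` template is filled in (which is essentially what `PromptTemplate` does with `story_prompt_template_str`) and shows one way to keep only the text after the closing `[/INST]` tag. The helper names `build_story_prompt` and `extract_story` are hypothetical, and splitting on `[/INST]` assumes the Mistral instruction format used in this notebook.

```python
# Same text as story_prompt_template_str; PromptTemplate fills the
# {placeholders} in much the same way str.format does here.
STORY_TEMPLATE = """<s>[INST] You are a creative storyteller. Write a short story based on the following elements:
Genre: {genre}
Main Character: {main_character_description}
Setting: {setting}
Plot Point: {plot_point}

Story: [/INST]"""


def build_story_prompt(elements: dict) -> str:
    """Fill the Mistral-style instruction template (hypothetical helper)."""
    return STORY_TEMPLATE.format(**elements)


def extract_story(generated_text: str) -> str:
    """Return only the text after the first [/INST] tag (hypothetical helper).

    If the tag is absent (e.g. the pipeline already dropped the prompt),
    the text is returned unchanged apart from surrounding whitespace.
    """
    _, sep, story = generated_text.partition("[/INST]")
    return story.strip() if sep else generated_text.strip()


elements = {
    "genre": "Science Fiction",
    "main_character_description": "A curious robot exploring a new planet",
    "setting": "A vibrant, alien jungle on planet Xylar",
    "plot_point": "The robot finds an ancient artifact that hums with energy",
}
prompt = build_story_prompt(elements)
# Simulate an echoed pipeline response: the prompt followed by the story.
raw_output = prompt + "\n\nIn the heart of the alien jungle..."
print(extract_story(raw_output))  # prints: In the heart of the alien jungle...
```

Splitting on the first `[/INST]` is deliberate: the echoed prompt ends at that tag, so any later occurrence would belong to the story itself.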

### How to Run the CLI Script (from your terminal):
1. Save the code above into a file named `story_cli.py` (the script cell above does this for you).
2. Open your terminal.
3. Navigate to the directory where you saved `story_cli.py`.
4. Run the script with arguments, for example:
```bash
python story_cli.py --genre "Adventure" --character "A fearless explorer" --setting "A lost temple deep in the Amazon" --plot "The explorer triggers an ancient trap"
```
Or, to run it from within this notebook (Colab and Jupyter run shell commands prefixed with `!`), uncomment the line below:

In [None]:
# !python /content/story_cli.py --genre "Comedy" --character "A clumsy robot chef" --setting "A chaotic kitchen during dinner rush" --plot "The robot accidentally bakes its own instruction manual into a cake" --max_tokens 256

### Explanation:
* **`argparse.ArgumentParser`**: Sets up the argument parser.
* **`add_argument`**: Defines the command-line arguments we expect (`--genre`, `--character`, etc.). `required=True` makes them mandatory; `--max_tokens` is optional and defaults to 512.
* **`parser.parse_args()`**: Parses the arguments provided when the script is run.
* **LLM and Chain Initialization**: The script loads the model and rebuilds the LangChain components every time it runs, so each invocation pays the full model-loading cost. If the CLI is run frequently, a more efficient design would keep the model loaded in a long-running process rather than reloading it for every story.
* **`if __name__ == "__main__":`**: Ensures the `generate_story_cli()` function runs when the script is executed directly, but not when it is imported as a module.
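* The script is written to `/home/ubuntu/story_cli.py`, and you can then run it from a terminal as the example command shows. On Colab the working directory is `/content` (the path used in the commented-out command above), so make sure the path you pass to `python` matches where the file was actually saved.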
* The `!python ...` line in the cell above is commented out but shows how you could try to run it from within a Jupyter environment that supports shell commands.
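
To see what `parse_args()` produces without a terminal, you can pass it an explicit list of strings instead of letting it read `sys.argv`. The sketch below rebuilds the same five arguments as `story_cli.py` and parses the Comedy example. The `build_parser` helper is hypothetical, and the description and the help texts for `--genre`, `--character` and `--setting` are illustrative, since those lines are not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the arguments of story_cli.py; some help texts are illustrative.
    parser = argparse.ArgumentParser(description="Generate a short story with an LLM.")
    parser.add_argument("--genre", type=str, required=True, help="Story genre")
    parser.add_argument("--character", type=str, required=True, help="Main character description")
    parser.add_argument("--setting", type=str, required=True, help="Story setting")
    parser.add_argument("--plot", type=str, required=True, help="A key plot point")
    parser.add_argument("--max_tokens", type=int, default=512, help="Max new tokens for the story length")
    return parser


parser = build_parser()
# Passing a list makes parse_args() parse it instead of sys.argv[1:].
args = parser.parse_args([
    "--genre", "Comedy",
    "--character", "A clumsy robot chef",
    "--setting", "A chaotic kitchen during dinner rush",
    "--plot", "The robot accidentally bakes its own instruction manual into a cake",
    "--max_tokens", "256",
])
print(args.genre, args.max_tokens)  # prints: Comedy 256
# type=int converted the string "256" into an integer.
print(type(args.max_tokens).__name__)  # prints: int
```

If a required flag is missing, `parse_args()` prints a usage message and exits with status 2, so `required=True` alone is enough to enforce the four story elements.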

## Part 6: Conclusion and Further Exploration

### Explanation:
Today, we've covered the basics of using LangChain with an open-source LLM to build a story generator and a CLI tool. This is just the tip of the iceberg!

### Further Ideas:
* **More Complex Chains:** Explore `SequentialChain` or `RouterChain` for more sophisticated workflows.
* **Memory:** Add memory to chains to allow for conversational interactions.
* **Output Parsers:** Use LangChain's output parsers to structure the LLM's output (e.g., into JSON).
* **Different LLMs:** Experiment with other quantized models or different model architectures.
* **Error Handling:** Add more robust error handling to the CLI tool.
* **Advanced CLI Features:** Use libraries like `Typer` or `Click` for more advanced CLI development.
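
As a starting point for the **Error Handling** idea, the sketch below shows two stdlib techniques: an `argparse` `type=` function that rejects a non-positive `--max_tokens` before any model is loaded, and a `main()` that returns a non-zero exit status on failure. (The current script catches every exception, prints it, and still exits with status 0.) `positive_int`, `main`, `fake_generate` and the `generate` parameter are hypothetical names; in the real script, `generate` would stand for the model-loading and chain code, and only `--plot` and `--max_tokens` are included to keep the sketch short.

```python
import argparse
import sys
from typing import Callable, Optional, Sequence


def positive_int(value: str) -> int:
    """argparse type: accept only integers greater than zero."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if number <= 0:
        raise argparse.ArgumentTypeError(f"must be > 0, got {number}")
    return number


def main(argv: Optional[Sequence[str]],
         generate: Callable[[argparse.Namespace], str]) -> int:
    parser = argparse.ArgumentParser(description="Story generator (sketch)")
    parser.add_argument("--plot", required=True)
    parser.add_argument("--max_tokens", type=positive_int, default=512)
    args = parser.parse_args(argv)  # invalid input exits with status 2
    try:
        print(generate(args))
    except Exception as exc:  # e.g. download failures or CUDA out-of-memory
        print(f"Error: {exc}", file=sys.stderr)
        return 1  # non-zero status lets shell scripts detect the failure
    return 0


def fake_generate(args: argparse.Namespace) -> str:
    # Stand-in for loading the model and invoking the chain.
    return f"A story about: {args.plot}"


print(main(["--plot", "A dragon learns to knit"], fake_generate))
# In a standalone script you would finish with: sys.exit(main(None, real_generate))
```

Validating in the `type=` function means a typo such as `--max_tokens 0` fails in milliseconds instead of after the model has been loaded onto the GPU.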

---
End of Notebook. Remember to uncomment and run the LLM cells if you want to see the generated stories!

## Resources and References

This notebook demonstrates using LangChain with Hugging Face models for text generation. Below are some helpful resources:

*   **LangChain Python Documentation:** [https://python.langchain.com/](https://python.langchain.com/) - The official documentation for LangChain, covering concepts, integrations, and examples.
*   **LangChain Hugging Face Integration:** For details on using Hugging Face models with LangChain, see [https://python.langchain.com/docs/integrations/llms/huggingface_pipelines/](https://python.langchain.com/docs/integrations/llms/huggingface_pipelines/). The `HuggingFacePipeline` class used in this notebook is imported from the separate `langchain-huggingface` package, so check that package's documentation for the latest API.
*   **Hugging Face Transformers:** [https://huggingface.co/docs/transformers/index](https://huggingface.co/docs/transformers/index) - Documentation for the Transformers library, model hub, and pipelines.
*   **Mistral AI & Mistral-7B-Instruct-v0.1:** [https://mistral.ai/](https://mistral.ai/) and model card [https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
*   **TinyLlama Project:** [https://github.com/jzhang38/TinyLlama](https://github.com/jzhang38/TinyLlama) - For information on the TinyLlama models, which are excellent for resource-constrained environments.
*   **Bitsandbytes for Quantization:** [https://github.com/TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes) - Essential for running large models with reduced memory via techniques like 4-bit quantization.
*   **PyTorch:** [https://pytorch.org/](https://pytorch.org/) - The deep learning framework used by Hugging Face Transformers.
*   **Google Colab:** [https://colab.research.google.com/](https://colab.research.google.com/) - The environment this notebook is designed for, offering free access to GPUs like the T4.

### Key Concepts Used
*   **Prompt Engineering:** Crafting effective prompts (like the `story_prompt_template_str`) is crucial for guiding the LLM's output.
*   **Quantization:** Techniques like 4-bit quantization (`load_in_4bit=True`) reduce model size and memory usage, enabling larger models on GPUs like the T4.
*   **LLM Chains (`LLMChain`):** A fundamental LangChain concept for combining an LLM with a prompt template to perform a specific task.
*   **Hugging Face Pipelines:** A high-level API from the Transformers library for easy inference with pre-trained models.
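
A back-of-the-envelope calculation shows why 4-bit quantization matters on a T4, which has 16 GB of memory. The sketch counts only the weights (activations, the KV cache and CUDA overhead add more), assumes roughly 7.24 billion parameters for Mistral-7B, and treats 4-bit storage as exactly half a byte per parameter. In practice bitsandbytes keeps some layers in higher precision and stores extra quantization constants, so real usage is somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7.24e9  # approximate parameter count assumed for Mistral-7B
T4_MEMORY_GB = 16.0  # nominal T4 memory; usable memory is somewhat lower

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    gb = weight_memory_gb(NUM_PARAMS, bits)
    print(f"{label:>8}: {gb:5.2f} GB of weights, {gb / T4_MEMORY_GB:4.0%} of a T4")
```

By this estimate, float16 weights alone take about 14.5 GB, leaving almost no room on a T4 for anything else, while 4-bit weights need about 3.6 GB.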

### Further Exploration
*   **LangChain Expression Language (LCEL):** For more advanced chain construction, explore LCEL for its composability and streaming capabilities.
*   **Other Quantization Methods (e.g., GPTQ):** If you need even smaller models or different performance characteristics, investigate other quantization libraries like AutoGPTQ.
*   **Alternative Open-Source LLMs:** Explore other models on the Hugging Face Hub suitable for T4 GPUs (e.g., Phi-2, other Mistral variants). Remember to check their specific prompt formats.