In [0]:
# Only required for single node, for clusters install through Compute>Libraries
#%pip install -q langchain langchain-nvidia-ai-endpoints gradio

import os
os.environ["NVIDIA_API_KEY"] = "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"
# if you get errors, please sign up for a new key

## If you encounter a typing-extensions issue, restart your runtime and try again
from langchain_nvidia_ai_endpoints import ChatNVIDIA
ChatNVIDIA.get_available_models()

[Model(id='mistralai/mathstral-7b-v0.1', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=None, supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='meta/llama2-70b', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=['ai-llama2-70b', 'playground_llama2_70b', 'llama2_70b', 'playground_llama2_13b', 'llama2_13b'], supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='meta/llama3-70b-instruct', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=['ai-llama3-70b'], supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='rakuten/rakutenai-7b-instruct', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=None, supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='nvidia/neva-22b', model_type='nv-vlm', client='ChatNVIDIA', endpoint='https://ai.api.nvidia.com/v1/vlm/nvidia/neva-22b', aliases=['ai-neva-22b', 'playground_neva_

When exploring a new library, it's important to note what are the core systems of the library and how are they used.

In LangChain, the main building block used to be the classic Chain: a small module of functionality that does something specific and can be linked up with other chains to make a system. So for all intents and purposes, it is a "building-block system" abstraction where the building blocks are easy to create, have consistent methods (invoke, generate, stream, etc), and can be linked up to work together as a system. Some example legacy chains include LLMChain, ConversationChain, TransformationChain, SequentialChain, etc.

More recently, a new recommended specification has emerged that is significantly easier to work with and extremely compact, the LangChain Expression Language (LCEL). This new format relies on a different kind of primitive - a Runnable - which is simply an object that wraps a function. Allow dictionaries to be implicitly converted to Runnables and let a pipe | operator create a Runnable that passes data from the left to the right (i.e. fn1 | fn2 is a Runnable), and you have a simple way to specify complex logic!

Here are some very representative example Runnables, created via the RunnableLambda class:

In [0]:
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from functools import partial

################################################################################
## Very simple "take input and return it"
identity = RunnableLambda(lambda x: x)  ## Or RunnablePassthrough works

################################################################################
## Given an arbitrary function, you can make a runnable with it
def print_and_return(x, preface=""):
    print(f"{preface}{x}")
    return x

rprint0 = RunnableLambda(print_and_return)

################################################################################
## You can also pre-fill some of values using functools.partial
rprint1 = RunnableLambda(partial(print_and_return, preface="1: "))

################################################################################
## And you can use the same idea to make your own custom Runnable generator
def RPrint(preface=""):
    return RunnableLambda(partial(print_and_return, preface=preface))

################################################################################
## Chaining two runnables
chain1 = identity | rprint0
chain1.invoke("Hello World!")
print()

################################################################################
## Chaining that one in as well
output = (
    chain1           ## Prints "Welcome Home!" & passes "Welcome Home!" onward
    | rprint1        ## Prints "1: Welcome Home!" & passes "Welcome Home!" onward
    | RPrint("2: ")  ## Prints "2: Welcome Home!" & passes "Welcome Home!" onward
).invoke("Welcome Home!")

## Final Output Is Preserved As "Welcome Home!"
print("\nOutput:", output)

Hello World!

Welcome Home!
1: Welcome Home!
2: Welcome Home!

Output: Welcome Home!


There's a lot you can do with runnables, but it's important to formalize some best practices. At the moment, it's easiest to use dictionaries as our default variable containers for a few key reasons:

Passing dictionaries helps us keep track of our variables by name.

Since dictionaries allow us to propagate named variables (values referenced by keys), using them is great for locking in our chain components' outputs and expectations.

LangChain prompts expect dictionaries of values.

It's quite intuitive to specify an LLM Chain in LCEL to take in a dictionary and produce a string, and equally easy to raise said string back up to be a dictionary. This is very intentional and is partially due to the above reason.

One of the most fundamental components of classical LangChain is the LLMChain that accepts a prompt and an LLM:

A prompt, usually retrieved from a call like PromptTemplate.from_template("string with {key1} and {key2}"), specifies a template for creating a string as output. A dictionary {"key1" : 1, "key2" : 2} could be passed in to get the output "string with 1 and 2".
For chat models like ChatNVIDIA, you would use ChatPromptTemplate.from_messages instead.
An LLM takes in a string and returns a generated string.
Chat models like ChatNVIDIA work with messages instead, but it's the same idea! Using an StrOutputParser at the end will extract the content from the message.
The following is a lightweight example of a simple chat chain as described above. All it does is take in an input dictionary and use it fill in a system message to specify the overall meta-objective and a user input to specify query the model.

In [0]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

## Simple Chat Pipeline
chat_llm = ChatNVIDIA(model="meta/llama3-8b-instruct")

prompt = ChatPromptTemplate.from_messages([
    ("system", "Only respond in rhymes"),
    ("user", "{input}")
])

rhyme_chain = prompt | chat_llm | StrOutputParser()

print(rhyme_chain.invoke({"input" : "Tell me about birds!"}))

Birds are quite a delightful find,
With feathers and wings, they soar and entwine.
In trees, they alight, with tails so bright,
And sing their songs, with morning light.

Some have beaks that curve, some have beaks that straight,
Their chirps and chatter, fill the air and create
A symphony sweet, of melodic sound,
As birds take flight, their magic's all around.

From robins to sparrows, to hawks on high,
Each species unique, yet all touch the sky.
With colors bright, and forms so grand,
Birds are a wonder, in this world so bland.


Sometimes, you also want to have some quick reasoning that goes on behind the scenes before your response actually comes out to the user. When performing this task, you need a model with a strong instruction-following prior assumption built-in.

The following is an example "zero-shot classification" pipeline which will try to categorize a sentence into one of a couple of classes.

In order, this zero-shot classification chain:

Takes in a dictionary with two required keys, input and options.
Passes it through the zero-shot prompt to get the input to our LLM.
Passes that string to the model to get the result.
Task: Pick out several models that you think would be good for this kind of task and see how well they perform! Specifically:

Try to find models that are predictable across multiple examples. If the format is always easy to parse and extremely predictable, then the model is probably ok.
Try to find models that are also fast! This is important because internal reasoning generally happens behind the hood before the external response gets generated. Thereby, it is a blocking process which can slow down start of "user-facing" generation, making your system feel sluggish.

In [0]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

## Feel free to try out some more models and see if there are better lightweight options
## https://build.nvidia.com
instruct_llm = ChatNVIDIA(model="mistralai/mistral-7b-instruct-v0.2")

sys_msg = (
    "Choose the most likely topic classification given the sentence as context."
    " Only one word, no explanation.\n[Options : {options}]"
)

## One-shot classification prompt with heavy format assumptions.
zsc_prompt = ChatPromptTemplate.from_messages([
    ("system", sys_msg),
    ("user", "[[The sea is awesome]]"),
    ("assistant", "boat"),
    ("user", "[[{input}]]"),
])

## Roughly equivalent as above for <s>[INST]instruction[/INST]response</s> format
#zsc_prompt = ChatPromptTemplate.from_template(
#    f"{sys_msg}\n\n"
#    "[[The sea is awesome]][/INST]boat</s><s>[INST]"
#    "[[{input}]]"
#)

zsc_chain = zsc_prompt | instruct_llm | StrOutputParser()

def zsc_call(input, options=["car", "boat", "airplane", "bike"]):
    return zsc_chain.invoke({"input" : input, "options" : options}).split()[0]

print("-" * 80)
print(zsc_call("Should I take the next exit, or keep going to the next one?"))

print("-" * 80)
print(zsc_call("I get seasick, so I think I'll pass on the trip"))

print("-" * 80)
print(zsc_call("I'm scared of heights, so flying probably isn't for me"))

--------------------------------------------------------------------------------
car
--------------------------------------------------------------------------------
boat
--------------------------------------------------------------------------------
airplane


The previous example showed how we can coerce a dictionary into a string by passing it through a prompt -> LLM chain, so that's one easy structure to motivate the container choice. But is it just as easy to convert the string output back up to a dictionary?

Yes, it is! The simplest way is actually to use the LCEL *"implicit runnable"* syntax, which allows you to use a dictionary of functions (including chains) as a runnable that runs each function and maps the value to the key in the output dictionary.

The following is an example which exercises these utilities while also providing a few extra tools you may find useful in practice.

In [0]:
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from functools import partial

################################################################################
## Example of dictionary enforcement methods
def make_dictionary(v, key):
    if isinstance(v, dict):
        return v
    return {key : v}

def RInput(key='input'):
    '''Coercing method to mold a value (i.e. string) to in-like dict'''
    return RunnableLambda(partial(make_dictionary, key=key))

def ROutput(key='output'):
    '''Coercing method to mold a value (i.e. string) to out-like dict'''
    return RunnableLambda(partial(make_dictionary, key=key))

def RPrint(preface=""):
    return RunnableLambda(partial(print_and_return, preface=preface))

################################################################################
## Common LCEL utility for pulling values from dictionaries
from operator import itemgetter

up_and_down = (
    RPrint("A: ")
    ## Custom ensure-dictionary process
    | RInput()
    | RPrint("B: ")
    ## Pull-values-from-dictionary utility
    | itemgetter("input")
    | RPrint("C: ")
    ## Anything-in Dictionary-out implicit map
    | {
        'word1' : (lambda x : x.split()[0]),
        'word2' : (lambda x : x.split()[1]),
        'words' : (lambda x: x),  ## <- == to RunnablePassthrough()
    }
    | RPrint("D: ")
    | itemgetter("word1")
    | RPrint("E: ")
    ## Anything-in anything-out lambda application
    | RunnableLambda(lambda x: x.upper())
    | RPrint("F: ")
    ## Custom ensure-dictionary process
    | ROutput()
)

up_and_down.invoke({"input" : "Hello World"})

A: {'input': 'Hello World'}
B: {'input': 'Hello World'}
C: Hello World
D: {'word1': 'Hello', 'word2': 'World', 'words': 'Hello World'}
E: Hello
F: HELLO


{'output': 'HELLO'}

Below is a poetry generation example that showcases how you might organize two different tasks under the guise of a single agent. The system calls back to the simple Gradio example, but extends it with some boiler-plate responses and logic behind the scenes.

It's primary feature is as follows:

On the first response, it will generate a poem based on your response.
On subsequent responses, it will keep the format and structure of your original rhyme while modifying the topic of the poem.
Problem: At present, the system should function just fine for the first part, but the second part is not yet implemented.

Objective: Implement the rest of the rhyme_chat2_stream method such that the agent is able to function normally.

To make the gradio component easier to reason with, a simplified queue_fake_streaming_gradio method is provided that will simulate the gradio chat event loop with the standard Python input method

In [0]:
[model for model in ChatNVIDIA.get_available_models()
     if ("mistral" in model.id or "meta/llama" in model.id)
         and model.model_type in ('chat', None)]

[Model(id='mistralai/mathstral-7b-v0.1', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=None, supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='meta/llama2-70b', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=['ai-llama2-70b', 'playground_llama2_70b', 'llama2_70b', 'playground_llama2_13b', 'llama2_13b'], supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='meta/llama3-70b-instruct', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=['ai-llama3-70b'], supports_tools=False, supports_structured_output=False, base_model=None),
 Model(id='mistralai/codestral-22b-instruct-v0.1', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=['ai-codestral-22b-instruct-v01'], supports_tools=False, supports_structured_output=True, base_model=None),
 Model(id='mistralai/mamba-codestral-7b-v0.1', model_type='chat', client='ChatNVIDIA', endpoint=None, aliases=None, supports_tools=False,

In [0]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from copy import deepcopy

instruct_llm = ChatNVIDIA(model="mistralai/mixtral-8x22b-instruct-v0.1")  ## Feel free to change the models

prompt1 = ChatPromptTemplate.from_messages([("user", (
    "INSTRUCTION: Only respond in rhymes"
    "\n\nPROMPT: {input}"
))])

prompt2 =  ChatPromptTemplate.from_messages([("user", (
    "INSTRUCTION: Only responding in rhyme, change the topic of the input poem to be about {topic}!"
    " Make it happy! Try to keep the same sentence structure, but make sure it's easy to recite!"
    " Try not to rhyme a word with itself."
    "\n\nOriginal Poem: {input}"
    "\n\nNew Topic: {topic}"
))])

## These are the main chains, constructed here as modules of functionality.
chain1 = prompt1 | instruct_llm | StrOutputParser()  ## only expects input
chain2 = prompt2 | instruct_llm | StrOutputParser()  ## expects both input and topic

################################################################################
## SUMMARY OF TASK: chain1 currently gets invoked for the first input.
##  Please invoke chain2 for subsequent invocations.

def rhyme_chat2_stream(message, history, return_buffer=True):
    '''This is a generator function, where each call will yield the next entry'''

    first_poem = None
    for entry in history:
        if entry[0] and entry[1]:
            ## If a generation occurred as a direct result of a user input,
            ##  keep that response (the first poem generated) and break out
            first_poem = "\n\n".join(entry[1].split("\n\n")[1:-1])
            break

    if first_poem is None:
        ## First Case: There is no initial poem generated. Better make one up!

        buffer = "Oh! I can make a wonderful poem about that! Let me think!\n\n"
        yield buffer

        ## Iterate over stream generator for first generation
        inst_out = ""
        chat_gen = chain1.stream({"input" : message})
        for token in chat_gen:
            inst_out += token
            buffer += token
            yield buffer if return_buffer else token

        passage = "\n\nNow let me rewrite it with a different focus! What should the new focus be?"
        buffer += passage
        yield buffer if return_buffer else passage

    else:
        ## Subsequent Cases: There is a poem to start with. Generate a similar one with a new topic!

        #yield f"Not Implemented!!!"; return ## <- TODO: Comment this out

        ########################################################################
        ## TODO: Invoke the second chain to generate the new rhymes.

        buffer = f"Sure! Here you go!\n\n" ## <- TODO: Uncomment these lines
        yield buffer

        ## TODO: Iterate over stream generator for second generation (using chain2)
        inst_out = ""
        chat_gen = chain2.stream({"input" : message, "topic": "mantis shrimp"})
        for token in chat_gen:
            inst_out += token
            buffer += token
            yield buffer if return_buffer else token
        ## END TODO
        ########################################################################

        passage = "\n\nThis is fun! Give me another topic!"
        buffer += passage
        yield buffer if return_buffer else passage

################################################################################
## Below: This is a small-scale simulation of the gradio routine.

def queue_fake_streaming_gradio(chat_stream, history = [], max_questions=3):

    ## Mimic of the gradio initialization routine, where a set of starter messages can be printed off
    for human_msg, agent_msg in history:
        if human_msg: print("\n[ Human ]:", human_msg)
        if agent_msg: print("\n[ Agent ]:", agent_msg)

    ## Mimic of the gradio loop with an initial message from the agent.
    for _ in range(max_questions):
        message = input("\n[ Human ]: ")
        print("\n[ Agent ]: ")
        history_entry = [message, ""]
        for token in chat_stream(message, history, return_buffer=False):
            print(token, end='')
            history_entry[1] += token
        history += [history_entry]
        print("\n")

## history is of format [[User response 0, Bot response 0], ...]
history = [[None, "Let me help you make a poem! What would you like for me to write?"]]

## Simulating the queueing of a streaming gradio interface, using python input
queue_fake_streaming_gradio(
    chat_stream = rhyme_chat2_stream,
    history = history
)


[ Agent ]: Let me help you make a poem! What would you like for me to write?



[ Human ]:  burgers


[ Agent ]: 
Oh! I can make a wonderful poem about that! Let me think!

In the realm where the grill meets the flame,
Cooking up burgers is the chef's game.
Ground meat patty, seasoned with care,
On a toasted bun, they're a perfect pair.
With lettuce, tomato, and a slice of cheese,
A drizzle of sauce makes it a breeze.
Enjoy them grilled, broiled, or fried,
A taste of paradise, no need to hide.
So grab some napkins, hold on tight,
To a juicy burger, quite a sight!

Now let me rewrite it with a different focus! What should the new focus be?




[ Human ]:  burgers 


[ Agent ]: 
Sure! Here you go!

Mantis shrimp, oh so fun,
Living life under the sun,
In their colorful armor so bright,
Munching on crustaceans, a delight!

Crunchy meals they do adore,
Crabs and clams for them to explore,
Their club-like claws swiftly move,
In the ocean's rhythm they groove.

With eyes that see the full spectrum,
Seeing colors, no need for reflectum,
Flashing bright, a chromatic display,
Mantis shrimp never gray!

With bubbles and pops, they create their song,
In the sea, it's where they belong,
Mantis shrimp with spirit so free,
Living their life with glee!

This is fun! Give me another topic!




[ Human ]:  asteriod


[ Agent ]: 
Sure! Here you go!

Mantis Shrimp, oh so bright and bold,
In the ocean, their tale is told.
With colors so vivid, like a dream,
On the reef, they stand supreme.

Claws that snap with a mighty sound,
Under waves, they're perfectly found.
A dance of light in the undersea,
Mantis Shrimp, you're a sight to see!

In their home beneath the blue,
A world of wonder, they come to view.
Oh, Mantis Shrimp, so small yet grand,
In the ocean, you take a stand.

Your beauty, your power, it does inspire,
A symphony of life, you ignite a fire.
Mantis Shrimp, with your hues so bright,
You're a true delight, day and night.

This is fun! Give me another topic!

