In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
os.environ['LANGCHAIN_TRACING_V2'] = "true"
os.environ['LANGCHAIN_PROJECT'] = "lg-reflection-agents"

## 1. Generate

In [3]:
from langchain_community.chat_models.fireworks import ChatFireworks
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

In [6]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an essay assistant tasked with writing excellent 5-paragraph essays on efficiently serving open-source large language models. "
            "Generate the best essay possible for the user's prompt."
        ),
        MessagesPlaceholder(variable_name="messages")
    ]
)

llm = ChatFireworks(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    model_kwargs={"max_tokens": 32768},
)

generate = prompt | llm

In [7]:
essay = ""
request = HumanMessage(
    content="Write an essay on efficiently serving open-source large language models.",
)

for chunk in generate.stream({"messages": [request]}):
    print(chunk.content, end="")
    essay += chunk.content

Title: Efficiently Serving Open-Source Large Language Models

Introduction:
Open-source large language models have revolutionized the field of artificial intelligence, offering endless possibilities for natural language processing and understanding. These models, however, require significant computational resources, making efficient deployment essential. This essay will discuss the importance of efficiently serving open-source large language models and the best practices to ensure optimal performance.

Body Paragraph 1: Understanding Open-Source Large Language Models
Large language models are artificial neural networks designed to understand and generate human-like text. They learn patterns from vast datasets and can generate coherent and contextually relevant responses. Open-source models, such as GPT-3 and BERT, are accessible to everyone, promoting innovation and collaboration. However, their size and complexity pose challenges in terms of processing and deployment.

Body Paragraph 

## 2. Reflect

In [8]:
reflection_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a teacher grading an essay submission. Generate critique and recommendations for the user's submission. "
            "Provide detailed recommendations, including requests for length, depth, style, etc."
            #"Reflect on the essay you just wrote. What are the key points you made?"
        ),
        MessagesPlaceholder(variable_name="messages")
    ]
)

reflect = reflection_prompt | llm

In [9]:
reflection = ""

for chunk in reflect.stream({"messages": [request, HumanMessage(content=essay)]}):
    print(chunk.content, end="")
    reflection += chunk.content

Your essay provides a clear overview of the importance of efficiently serving open-source large language models, including the challenges and best practices. However, I would like to suggest some improvements to enhance the depth, style, and organization of your essay. 

Title: Optimizing and Scaling Open-Source Large Language Models: Best Practices and Real-world Applications

Introduction:
Reconsider the title to better reflect the focus on best practices and real-world applications. Within the introduction, consider providing more context about the rapid growth and impact of open-source large language models.

Body Paragraph 1: Understanding Open-Source Large Language Mods
Expand the discussion on the complexities of large language models and the challenges they present. You could mention various model architectures, such as transformers and recurrent neural networks, and their respective benefits and trade-offs. This would help the reader better understand the landscape of large la

## 3. Repeat

In [10]:
for chunk in generate.stream(
    {"messages": [request, AIMessage(content=essay), HumanMessage(content=reflection)]}
):
    print(chunk.content, end="")

Title: Optimizing and Scaling Open-Source Large Language Models: Best Practices and Real-world Applications

Introduction:
The rapid growth and adoption of open-source large language models have revolutionized the field of artificial intelligence, presenting endless opportunities for natural language processing and understanding. However, their immense size and complexity necessitate efficient deployment to ensure quick response times, reduced operational costs, and scalability. This essay will explore best practices for efficiently serving open-source large language models and delve into real-world applications in the healthcare industry.

Body Paragraph 1: Understanding Open-Source Large Language Models
Open-source large language models are artificial neural networks designed to understand and generate human-like text by learning patterns from vast datasets. These models can be categorized into two primary architectures: transformers and recurrent neural networks (RNNs). Transformers
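
The three manual steps above (generate, reflect, regenerate) amount to one round of a loop. The sketch below uses plain Python with stub functions in place of the `generate` and `reflect` chains, so it runs without an API key. `fake_generate`, `fake_reflect`, `run_rounds`, and the `(role, content)` tuples are hypothetical stand-ins, not part of LangChain. It shows how the same history is presented to each side: the generator sees its drafts as `ai` turns and critiques as `human` turns, while the reflector sees the latest draft as a `human` submission.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); a stand-in for LangChain message objects


def fake_generate(messages: List[Message]) -> str:
    """Stub for the `generate` chain: numbers each draft by the critiques seen so far."""
    critiques = sum(1 for role, _ in messages[1:] if role == "human")
    return f"draft v{critiques + 1}"


def fake_reflect(messages: List[Message]) -> str:
    """Stub for the `reflect` chain: critiques the most recent submission."""
    return f"critique of {messages[-1][1]}"


def run_rounds(
    request: str,
    rounds: int,
    generate: Callable[[List[Message]], str] = fake_generate,
    reflect: Callable[[List[Message]], str] = fake_reflect,
) -> List[Message]:
    """Generate a draft, then alternate reflect -> regenerate `rounds` times."""
    history: List[Message] = [("human", request)]
    history.append(("ai", generate(history)))
    for _ in range(rounds):
        # The reflector grades the latest draft as if the user had submitted it.
        reflector_view = [history[0], ("human", history[-1][1])]
        history.append(("human", reflect(reflector_view)))
        # The generator sees the full history, with critiques as user feedback.
        history.append(("ai", generate(history)))
    return history


history = run_rounds("Write an essay.", rounds=1)
for role, content in history:
    print(f"{role}: {content}")
```

With `rounds=1` this reproduces the shape of the manual walkthrough: request, first draft, critique, revised draft.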

## LangGraph

In [11]:
from typing import List, Sequence

from langgraph.graph import END, MessageGraph


async def generation_node(state: Sequence[BaseMessage]):
    return await generate.ainvoke({"messages": state})


async def reflection_node(messages: Sequence[BaseMessage]) -> HumanMessage:
    # Swap roles so the reflector sees the drafts as user submissions
    # and its own earlier critiques as AI turns
    cls_map = {"ai": HumanMessage, "human": AIMessage}
    # The first message is the original user request; it is kept unchanged for all nodes
    translated = [messages[0]] + [
        cls_map[msg.type](content=msg.content) for msg in messages[1:]
    ]
    res = await reflect.ainvoke({"messages": translated})
    # We treat the output of this as human feedback for the generator
    return HumanMessage(content=res.content)


builder = MessageGraph()
builder.add_node("generate", generation_node)
builder.add_node("reflect", reflection_node)
builder.set_entry_point("generate")


def should_continue(state: List[BaseMessage]):
    if len(state) > 6:
        # End after 3 reflection rounds (the 4th draft is the final one)
        return END
    return "reflect"


builder.add_conditional_edges("generate", should_continue)
builder.add_edge("reflect", "generate")
graph = builder.compile()
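
To see when the graph stops, it can help to trace the control flow without calling a model. This is a minimal sketch in plain Python, not LangGraph itself: `trace_loop` is a hypothetical helper that mimics the edges defined above (`generate`, then `should_continue`, then `reflect`, then back to `generate`), using placeholder strings instead of messages and a local `END` sentinel.

```python
from typing import List

END = "__end__"  # local sentinel for this sketch only


def should_continue(state: List[str]) -> str:
    # Same rule as the graph above: stop once the state holds more than 6 messages.
    if len(state) > 6:
        return END
    return "reflect"


def trace_loop(request: str) -> List[str]:
    """Mimic the generate/reflect cycle, appending one placeholder per node call."""
    state = [request]
    while True:
        n_drafts = sum(m.startswith("draft") for m in state)
        state.append(f"draft {n_drafts + 1}")  # the "generate" node
        if should_continue(state) == END:
            return state
        n_critiques = sum(m.startswith("critique") for m in state)
        state.append(f"critique {n_critiques + 1}")  # the "reflect" node


state = trace_loop("request")
drafts = [m for m in state if m.startswith("draft")]
critiques = [m for m in state if m.startswith("critique")]
print(len(state), len(drafts), len(critiques))  # prints "8 4 3"
```

Under this rule the run ends with 8 messages: the request, 4 drafts, and 3 critiques. The check fires only after `generate`, so the last message is always a draft.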

In [13]:
async for event in graph.astream(
    [
        HumanMessage(
            content="Write an essay on efficiently serving open-source large language models."
        )
    ],
):
    print(event)
    print("---")

{'generate': AIMessage(content='Title: Efficiently Serving Open-Source Large Language Models\n\nIntroduction:\nOpen-source large language models have transformed the way we interact with technology, providing more natural and human-like responses. However, efficiently serving these models is crucial to ensure they can be accessed and utilized by a wide range of users. In this essay, we will discuss the importance of efficiently serving open-source large language models, the challenges involved, and potential solutions to overcome these challenges.\n\nBody Paragraph 1 - The Importance of Efficient Serving:\nEfficiently serving open-source large language models ensures that users can access the models quickly and without interruption. This is particularly important for applications that require real-time responses, such as chatbots or virtual assistants. Moreover, efficient serving can help reduce computational costs, making it more accessible for a wider range of users and applications.
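
Each streamed event printed above is a dictionary keyed by the name of the node that just ran. The following is a minimal sketch of pulling the most recent draft out of such a stream. It uses plain dictionaries and strings as stand-ins for the real events and `AIMessage` objects; `latest_draft` and `sample_events` are hypothetical and only assume the `{node_name: payload}` shape seen in the output.

```python
from typing import Any, Dict, Iterable, Optional


def latest_draft(events: Iterable[Dict[str, Any]], node: str = "generate") -> Optional[Any]:
    """Return the payload of the last event emitted by `node`, or None if there is none."""
    latest = None
    for event in events:
        if node in event:
            latest = event[node]
    return latest


# Hypothetical events shaped like the output above: {node_name: payload}.
sample_events = [
    {"generate": "draft 1"},
    {"reflect": "critique 1"},
    {"generate": "draft 2"},
]
print(latest_draft(sample_events))  # prints "draft 2"
```

The same filter could be applied inside the `async for` loop instead of printing every event, which keeps only the newest essay in memory.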

In [14]:
ChatPromptTemplate.from_messages(event[END]).pretty_print()


Write an essay on efficiently serving open-source large language models.


Title: Efficiently Serving Open-Source Large Language Models

Introduction:
Open-source large language models have transformed the way we interact with technology, providing more natural and human-like responses. However, efficiently serving these models is crucial to ensure they can be accessed and utilized by a wide range of users. In this essay, we will discuss the importance of efficiently serving open-source large language models, the challenges involved, and potential solutions to overcome these challenges.

Body Paragraph 1 - The Importance of Efficient Serving:
Efficiently serving open-source large language models ensures that users can access the models quickly and without interruption. This is particularly important for applications that require real-time responses, such as chatbots or virtual assistants. Moreover, efficient serving can help reduce computational costs, making it more accessible for a 