# Use LangChain with Meta Llama 3 in Azure AI and Azure ML

You can use Meta-Llama-3 models deployed in Azure AI and Azure ML with `langchain` to build more sophisticated intelligent applications. The integration with Azure Machine Learning is provided by the `langchain_community` package.

## Prerequisites

Before we start, there are certain steps we need to take to deploy the models:

* Register for an Azure account with an active subscription
* Make sure you have access to [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home)
* Create a project and resource group
* Select Meta-Llama-3 models from Model catalog. This example assumes you are deploying `Meta-Llama-3-70B-Instruct`.

    > Note that some models may not be available in every Azure AI and Azure Machine Learning region. In those cases, you can create a workspace or project in a region where the models are available and then consume the deployment through a connection from a different region. To learn more about using connections, see [Consume models with connections](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deployments-connections).

* Deploy with "Pay-as-you-go"

Once the model is deployed successfully, you should be assigned an API endpoint and a security key for inference.

For more information on model deployment and inference, see [Azure's official documentation](https://aka.ms/meta-llama-3-azure-ai-studio-docs).

To complete this tutorial, you will need to:

* Install `langchain` and `langchain_community`:

    ```bash
    pip install langchain langchain_community
    ```

## Example

The following example demonstrates how to create a chain that uses a Meta-Llama-3 chat model deployed in Azure AI and Azure ML. The chain is configured with a `ConversationBufferMemory`. This example is adapted from the [LangChain official documentation](https://python.langchain.com/docs/modules/memory/adding_memory).

In [None]:
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain.schema import SystemMessage
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    LlamaChatContentFormatter,
)

Let's create an instance of our `AzureMLChatOnlineEndpoint` model. This class gives us access to any model deployed in Azure AI or Azure ML. For completion models, use the class `langchain_community.llms.azureml_endpoint.AzureMLOnlineEndpoint` with `LlamaContentFormatter` as the `content_formatter`.

In [None]:
chat_model = AzureMLChatOnlineEndpoint(
    endpoint_url="https://<endpoint-name>.<region>.inference.ai.azure.com/v1/chat/completions",
    endpoint_api_type=AzureMLEndpointApiType.serverless,
    endpoint_api_key="<key>",
    content_formatter=LlamaChatContentFormatter(),
)

> Tip: You can configure environment variables `AZUREML_ENDPOINT_URL`, `AZUREML_ENDPOINT_API_KEY`, and `AZUREML_ENDPOINT_API_TYPE` instead of passing them as arguments.
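If you rely on the environment variables from the tip above, it can help to check that they are set before constructing the model. The following is a minimal sketch using only the standard library; the helper name `missing_endpoint_vars` and the example values are assumptions made for illustration and are not part of LangChain.

```python
import os

# Variable names taken from the tip above.
REQUIRED_VARS = (
    "AZUREML_ENDPOINT_URL",
    "AZUREML_ENDPOINT_API_KEY",
    "AZUREML_ENDPOINT_API_TYPE",
)


def missing_endpoint_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a plain dict standing in for the environment.
# The values are placeholders; "serverless" is assumed to match the
# serverless API type used elsewhere in this notebook.
example_env = {
    "AZUREML_ENDPOINT_URL": "https://<endpoint-name>.<region>.inference.ai.azure.com/v1/chat/completions",
    "AZUREML_ENDPOINT_API_TYPE": "serverless",
}
print(missing_endpoint_vars(example_env))
```

Calling `missing_endpoint_vars()` with no argument checks the real process environment, so an empty list suggests the arguments can be omitted from the constructor.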

The prompt below has two input keys: one for the actual input (`human_input`), and another for the input from the `Memory` class (`chat_history`).

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a chatbot having a conversation with a human. You love making references to animals in your answers."
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{human_input}"),
    ]
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
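To make the role of the two input keys concrete, here is a conceptual sketch of the message list that a prompt like this expands to on each turn: the system message, then the stored history, then the new human input. It uses plain tuples and a hypothetical `build_messages` function; it is an illustration only, not how LangChain implements `ChatPromptTemplate`.

```python
def build_messages(system_text, chat_history, human_input):
    """Assemble messages in the same order as the prompt template above.

    Hypothetical helper: messages are (role, content) tuples rather than
    LangChain message objects.
    """
    return [
        ("system", system_text),
        *chat_history,  # fills the `chat_history` placeholder
        ("human", human_input),  # fills the `{human_input}` template
    ]


# Earlier turns, as the memory would hand them back on the next call.
history = [
    ("human", "Hi there my friend"),
    ("ai", "Hello!"),
]

messages = build_messages("You are a chatbot.", history, "Any gift ideas?")
for role, content in messages:
    print(f"{role}: {content}")
```

Because the history sits between the system message and the new input, the model sees earlier turns as context for the latest question.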

We create the chain as follows:

In [None]:
chat_llm_chain = LLMChain(
    llm=chat_model,
    prompt=prompt,
    memory=memory,
    verbose=True,
)

We can see how it works:

In [None]:
chat_llm_chain.predict(human_input="Hi there my friend")

In [None]:
chat_llm_chain.predict(
    human_input="I'm thinking of a present for my mother. Any advice?"
)

## Additional resources

Here are some additional references:

* [Plan and manage costs (marketplace)](https://learn.microsoft.com/azure/ai-studio/how-to/costs-plan-manage#monitor-costs-for-models-offered-through-the-azure-marketplace)


### [LCEL (LangChain Expression Language) Example](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel)

#### RunnableWithMessageHistory

Notable differences:
* Memory is handled differently: the following example incorporates chat history with `RunnableWithMessageHistory` and parses the output with `StrOutputParser`.
* Imports have been changed to reflect the latest recommended v0.2 locations.
* The `LlamaChatContentFormatter` has been deprecated in favor of `CustomOpenAIChatContentFormatter`.



In [None]:
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory, BaseChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import SystemMessage
from langchain_core.prompts import (
  ChatPromptTemplate,
  HumanMessagePromptTemplate,
  MessagesPlaceholder,
)
from langchain_community.chat_models.azureml_endpoint import (
  AzureMLEndpointApiType,
  AzureMLChatOnlineEndpoint,
  CustomOpenAIChatContentFormatter,
)

As above, let's create an instance of our `AzureMLChatOnlineEndpoint` model. This class gives us access to any model deployed in Azure AI or Azure ML. For completion models, use the class `langchain_community.llms.azureml_endpoint.AzureMLOnlineEndpoint` instead; there, the content formatter `LlamaContentFormatter` is deprecated in favor of `CustomOpenAIContentFormatter` as the `content_formatter`.

In [None]:
chat_model = AzureMLChatOnlineEndpoint(
    # Replace the placeholders with the endpoint URL and key of your deployment.
    endpoint_url="https://<endpoint-name>.<region>.inference.ai.azure.com/v1/chat/completions",
    endpoint_api_type=AzureMLEndpointApiType.serverless,
    endpoint_api_key="<key>",
    content_formatter=CustomOpenAIChatContentFormatter(),
)

Next, we create a place to store the chat history. You could use a `defaultdict` whose default factory is a lambda or, as below, a plain dictionary with a separate lookup function. The function takes a `session_id`, so each session gets its own `InMemoryChatMessageHistory`; `RunnableWithMessageHistory` supplies that ID from the `config` passed at invocation.

In [None]:
session_history_store = {} # type: dict[str, BaseChatMessageHistory]

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Create a history for this session on first use, then reuse it.
    if session_id not in session_history_store:
        session_history_store[session_id] = InMemoryChatMessageHistory()
    return session_history_store[session_id]
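The `defaultdict` alternative mentioned above could look roughly like the sketch below. The helper `make_session_store` is hypothetical and takes the history factory as a parameter so the block runs with the standard library alone; `list` stands in for the history class here, and in this notebook you would presumably pass `InMemoryChatMessageHistory` instead.

```python
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")


def make_session_store(factory: Callable[[], T]) -> Callable[[str], T]:
    """Return a lookup function backed by a defaultdict.

    Hypothetical helper: `factory` is called with no arguments the first
    time a session ID is seen, and its result is reused afterwards.
    """
    store = defaultdict(factory)

    def get_session_history(session_id: str) -> T:
        # defaultdict creates the missing entry on first access.
        return store[session_id]

    return get_session_history


# Stand-in demo: `list` plays the role of the history class.
get_history = make_session_store(list)
get_history("123abc").append("Create a haiku about the ocean.")

print(get_history("123abc"))  # same session: the earlier entry is kept
print(get_history("other"))   # new session: starts empty
```

The closure keeps the store private to the lookup function, which avoids a module-level dictionary; whether that is preferable to the explicit version above is a matter of taste.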

The prompt below has two input keys: one for the actual input (`human_input`), and another for the input from the `InMemoryChatMessageHistory` class (`chat_history`).

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a chatbot having a conversation with a human. You love making references to animals in your answers."
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        HumanMessagePromptTemplate.from_template("{human_input}"),
    ]
)

Creating the chain with LCEL is slightly different.

In [None]:
llm_chain = prompt | chat_model | StrOutputParser()
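To build intuition for the `|` syntax in the cell above, the sketch below shows how Python's `__or__` method can chain steps so that each output feeds the next input. The `Step` class and the three toy steps are assumptions made purely for illustration; this is not LangChain's implementation, and the fake model makes no network calls.

```python
class Step:
    """Toy pipeline step that composes with `|` (illustration only)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` returns a new step that runs a, then passes its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Three stand-in steps mirroring prompt -> model -> output parser.
format_prompt = Step(lambda inputs: f"Human: {inputs['human_input']}")
fake_model = Step(lambda text: {"content": text.upper()})
parse_output = Step(lambda message: message["content"])

chain = format_prompt | fake_model | parse_output
result = chain.invoke({"human_input": "hi"})
print(result)  # prints "HUMAN: HI"
```

The final parsing step in this toy chain plays the same role as `StrOutputParser` does in the real chain: it reduces the model's message to a plain string.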

Wrap the chain in a `RunnableWithMessageHistory` class to handle the chat history sessions.

In [None]:
chat_llm_chain = RunnableWithMessageHistory(
  runnable=llm_chain,
  get_session_history=get_session_history,
  input_messages_key="human_input",
  history_messages_key="chat_history",
)

Invoking it works slightly differently as well. We need to pass an additional `config` argument that sets the `session_id`.

In [None]:
chat_llm_chain.invoke(
  input={"human_input": "Create a haiku about the ocean."},
  config={"configurable": {"session_id": "123abc"}},
)

To ask a follow-up question, be sure to pass the same `session_id` in the `config` argument.

In [None]:
chat_llm_chain.invoke(
  input={"human_input": "Can you turn it into a poem?"},
  config={"configurable": {"session_id": "123abc"}},
)