## LangChain - Llama2 on Hugging Face Inference Endpoints

Complete the basic setup notebook for Llama2 on Hugging Face before trying to get this one to work.

## Environment variables

Make sure your .env file has the following environment variables set:

```bash
## for hosting Llama2 on Hugging Face
HUGGINGFACEHUB_API_TOKEN=""
LLAMA2_HF_URL=""
```
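
Because the `.env` template above leaves both values as empty strings, it can help to fail fast when one was never filled in. The sketch below is not part of the original notebook; `require_env` is a hypothetical helper that treats unset *and* empty variables as missing.

```python
import os


def require_env(names):
    """Return the requested environment variables as a dict.

    Hypothetical helper: raises RuntimeError listing every variable that is
    unset or empty, so a missing token fails here instead of as an HTTP error.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage, after load_dotenv() has run:
# config = require_env(["HUGGINGFACEHUB_API_TOKEN", "LLAMA2_HF_URL"])
```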

### Why are the answers so weird?

We are likely using a chat-tuned model here, so it may not perform well for simple completion.
When the prompt doesn't match the model type, the output tends to break out into text that reads like training data, and LangChain in particular seems prone to this (see the example below). That is a good sign the prompt is missing the delimiters and stop tokens the model expects, or that there is some other model-specific issue.

```txt
First response: 10 Downing Street, London, SW1A 2AA.
The Prime Minister's official residence is 10 Downing Street, London, SW1A 2AA.
The Prime Minister's office is located in 10 Downing Street, London, SW1A 2AA.
The Prime Minister's contact information is not publicly available.
The Prime Minister's schedule is not publicly available.
```
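
Stop sequences are what normally end generation before the model starts rambling like this. As a rough illustration, the hypothetical helper below applies the same idea after the fact, cutting text at the first stop string, using part of the rambling output above as input. LangChain's `HuggingFaceTextGenInference` also has a `stop_sequences` parameter for doing this on the server (check it against your installed LangChain version); this sketch only shows the effect, not how the server implements it.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest occurrence of any stop sequence.

    Mimics client-side what a stop sequence does during generation:
    everything from the first matching stop string onward is dropped.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at index 0
        index = text.find(stop)
        if index != -1 and index < cut:
            cut = index
    return text[:cut]


rambling = (
    "10 Downing Street, London, SW1A 2AA.\n"
    "The Prime Minister's official residence is 10 Downing Street, London, SW1A 2AA.\n"
)
print(truncate_at_stop(rambling, ["\n"]))  # prints: 10 Downing Street, London, SW1A 2AA.
```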

Also note that Llama2 expects special tokens to mark the beginning and end of prompts and instructions. I've tried to include them, but realistically it takes trial and error and a lot of reading to get each open model working as smoothly as the big cloud-hosted models, which have had much more ease-of-use work put into them.
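
For reference, here is a minimal sketch of the single-turn layout, based on our reading of Meta's Llama 2 reference chat code: the system prompt sits inside `<<SYS>>` tags within the first `[INST]` block, separated from the user message by a blank line. `build_llama2_prompt` is a hypothetical helper, and how sensitive a given checkpoint is to the exact whitespace may vary.

```python
def build_llama2_prompt(system_prompt, user_message):
    """Wrap a single user turn in Llama 2 chat delimiters.

    <s>                  beginning-of-sequence marker
    [INST] ... [/INST]   the user's instruction
    <<SYS>> ... <</SYS>> optional system prompt, inside the first instruction
    """
    if system_prompt:
        user_message = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message}"
    return f"<s>[INST] {user_message} [/INST]"


pm_prompt = build_llama2_prompt(
    "Complete the following:",
    "The Prime Minister of the United Kingdom is",
)
print(pm_prompt)
```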

```python
! pip install -q langchain
! pip install -q python_dotenv
! pip install -q text_generation
```

```python
import os

from dotenv import load_dotenv
from langchain.llms import HuggingFaceTextGenInference

# Load the endpoint URL and API token from .env
load_dotenv(".env", override=True)
LLAMA2_HF_URL = os.environ["LLAMA2_HF_URL"]
HUGGINGFACEHUB_API_TOKEN = os.environ["HUGGINGFACEHUB_API_TOKEN"]

# Protected Inference Endpoints expect the token as a bearer header
headers = {"Authorization": f"Bearer {HUGGINGFACEHUB_API_TOKEN}"}
server_kwargs = {"headers": headers}

llm = HuggingFaceTextGenInference(
    inference_server_url=LLAMA2_HF_URL,
    temperature=0.1,  # low temperature for mostly repeatable answers
    top_k=30,
    # do_sample=True,
    max_new_tokens=512,
    timeout=120,  # seconds
    server_kwargs=server_kwargs,
)
```
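
`HuggingFaceTextGenInference` talks to the endpoint through the `text_generation` client. To see roughly what goes over the wire, here is a standard-library sketch that builds (but does not send) a request to the `/generate` route. It assumes the endpoint runs Hugging Face's text-generation-inference server, which accepts `{"inputs": ..., "parameters": {...}}` and answers with `{"generated_text": ...}`. The helper names, URL, and token are placeholders, not part of the notebook.

```python
import json
import urllib.request


def build_tgi_request(base_url, token, prompt, **parameters):
    """Build (but do not send) a POST to a text-generation-inference /generate route.

    Assumption: the server accepts {"inputs": ..., "parameters": {...}}.
    """
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def read_generated_text(response_body):
    """Extract the generated text from a /generate JSON response (assumed shape)."""
    return json.loads(response_body)["generated_text"]


request = build_tgi_request(
    "https://my-endpoint.example/",  # placeholder, use your LLAMA2_HF_URL
    "hf_xxx",  # placeholder token
    "<s>[INST] Hello [/INST]",
    max_new_tokens=512,
    temperature=0.1,
    top_k=30,
)

# To actually send it (needs a live endpoint):
# with urllib.request.urlopen(request, timeout=120) as resp:
#     print(read_generated_text(resp.read()))
```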



```python
## Let's do a simple completion
first = llm.predict("The Prime Minister of the United Kingdom is ")
## Let's do it again but customized for Llama2 expectations
second = llm.predict("<s>[INST] <<SYS>>Complete the following: <</SYS>> The Prime Minister of the United Kingdom is [/INST] ")

print(f"First response: {first}\nSecond response: {second}")
```

On this run, the endpoint returned an error:

```txt
UnknownError: Bad Gateway
```
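
A `Bad Gateway` from a hosted endpoint is often transient, for example while the endpoint is still starting up. One option, sketched here as an assumption rather than something the notebook does, is to retry with exponential backoff. `call_with_retries` is a hypothetical helper; it retries on any exception for simplicity, so in real use you would likely narrow that to gateway and timeout errors.

```python
import time


def call_with_retries(fn, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff if it raises.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts
    and re-raises the last exception once all attempts are used up.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # hypothetical: narrow this in real code
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage with the notebook's llm object:
# first = call_with_retries(lambda: llm.predict("The Prime Minister of the United Kingdom is "))
```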

```python
## A longer, multi-turn completion wrapped in Llama2's instruction delimiters
chat = """<s>[INST] <<SYS>> Complete the following conversation <</SYS>>
Lou Costello: All I’m trying to find out is what’s the guy’s name on first base.

Bud Abbott: No. What is on second base.

Lou Costello: I’m not asking you who’s on second.

Bud Abbott: Who’s on first.

Lou Costello: One base at a time!

Bud Abbott: Well, don’t change the players around.

Lou Costello: I’m not changing nobody!

Bud Abbott: Take it easy, buddy.

Lou Costello: I’m only asking you, who’s the guy on first base?

Bud Abbott: That’s right.

Lou Costello: Ok.

Bud Abbott: All right.

Lou Costello: What’s the guy’s name on first base?

Bud Abbott:[/INST] 
"""

## The large language model is not always funny
llm.predict(chat)
```


```txt
"\nBud Abbott: No, no, no! What am I, a mind reader? I'm not telling you the guy's name on first base! You're the one who's supposed to be asking the questions here!\n\nLou Costello: Aw, come on, Bud! Just tell me the guy's name!\n\nBud Abbott: Oh, all right. If you must know, the guy's name on first base is... (pauses for dramatic effect) ...Rabbit Maranville!\n\nLou Costello: (excitedly) Oh, wow! Rabbit Maranville! I've heard of him! He's a great player!\n\nBud Abbott: (smirking) Yeah, well, he's not as great as he thinks he is. (chuckles) But hey, that's neither here nor there. What's the next question, hotshot?"
```
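
The cell above packs the whole Abbott and Costello exchange into one instruction. As we understand Meta's reference code, Llama 2 chat models were instead trained on a turn-by-turn layout in which each completed exchange is closed with `</s>` and the next one opens with `<s>[INST]`. A sketch of that layout, with `build_llama2_chat` as a hypothetical helper:

```python
def build_llama2_chat(turns, system_prompt=""):
    """Format a conversation in Llama 2's multi-turn chat layout.

    `turns` is a list of (user, assistant) pairs; pass None as the assistant
    reply for the final turn the model should answer. The optional system
    prompt goes inside the first [INST] block.
    """
    parts = []
    for index, (user, assistant) in enumerate(turns):
        if index == 0 and system_prompt:
            user = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user}"
        segment = f"<s>[INST] {user.strip()} [/INST]"
        if assistant is not None:
            # A finished exchange is closed with the end-of-sequence marker
            segment += f" {assistant.strip()} </s>"
        parts.append(segment)
    return "".join(parts)


abbott_prompt = build_llama2_chat(
    [
        ("Who's on first?", "That's right."),
        ("What's the guy's name on first base?", None),
    ],
    system_prompt="Complete the following conversation",
)
print(abbott_prompt)
```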

```python
## Chatbot with memory in LangChain, using the Llama2 endpoint above

from langchain.chains import LLMChain, ConversationChain
from langchain.prompts.prompt import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory

import json
import textwrap


def json_pretty(input_object):
    """Pretty-print a JSON-serialisable object."""
    print(json.dumps(input_object, indent=4))


def wrap_text(text, width):
    """Wrap text when printing, because Colab scrolls long output to the right."""
    wrapped_text = textwrap.wrap(text, width)
    return "\n".join(wrapped_text)


template = """<s>[INST] <<SYS>>
The following is a serious conversation between a human and a TV
News Anchor named Newsy McNewserson. The Anchor provides authoritative information and commentary in short responses.
Respond in markdown format without emojis. 
Answer concisely and don't make up answers. 
If the Anchor does not know the answer to a question it truthfully says it does not know.
<</SYS>>
Chat History: {history}

{input} [/INST] """

# Keep only the last two exchanges in {history}
MEMORY = ConversationBufferWindowMemory(ai_prefix="Anchor", k=2)

PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
    prompt=PROMPT,
    llm=llm,
    verbose=True,
    memory=MEMORY,
)


def chatLoop():
    print(" -- Have a conversation with a simple AI TV news Anchor: ")
    print(" -- Ask this AI \"what is in the news?\" ")
    print(" -- type 'exit' when done")

    user_input = input("> ")
    while not user_input.lower().startswith("exit"):
        print(conversation.run(user_input))
        print(" -- type 'exit' when done")
        user_input = input("> ")
    print("\n -- end conversation --")
```
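
`ConversationBufferWindowMemory(ai_prefix="Anchor", k=2)` keeps only the most recent two exchanges in `{history}`. The plain-Python approximation below shows how such a window drops older turns; `window_history` is hypothetical and not LangChain's implementation, so the exact transcript formatting LangChain produces may differ.

```python
def window_history(turns, k, human_prefix="Human", ai_prefix="Anchor"):
    """Render the last k (human, ai) exchanges as a transcript.

    Approximates a k-window buffer memory: older exchanges are simply dropped.
    """
    recent = turns[-k:] if k > 0 else []  # turns[-0:] would keep everything
    lines = []
    for human, ai in recent:
        lines.append(f"{human_prefix}: {human}")
        lines.append(f"{ai_prefix}: {ai}")
    return "\n".join(lines)


history = window_history(
    [
        ("hi", "Hello!"),
        ("what is in the news?", "I don't know."),
        ("thanks", "You're welcome."),
    ],
    k=2,
)
print(history)
```

With `k=2`, the opening "hi" exchange has already fallen out of the window by the third turn.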

```python
## start a new chat each time
MEMORY.clear()
## start the chat
chatLoop()
```

 -- Have a conversation with a simple AI TV news Anchor: 
 -- Ask this AI "what is in the news?" 
 -- type 'exit' when done


[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3m<s>[INST] <<SYS>>
The following is a serious conversation between a human and a TV
News Anchor named Newsy McNewserson. The Anchor provides autoritative information and commentary in short responses.
Respond in markdown format without emojis. 
Answer concisely and don't make up answers. 
If the Anchor does not know the answer to a question it truthfully says it does not know.
<</SYS>>
Chat History: 

hi [/INST] [0m

[1m> Finished chain.[0m
Hello! I'm Newsy McNewserson, your trusted TV news anchor. What's on your mind today?
 -- type 'exit' when done


[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3m<s>[INST] <<SYS>>
The following is a serious conversation between a human and a TV
News Anchor named Newsy McNewserson. The Anchor provides autoritative information and co