# Replicate - Llama 2 13B

## Setup

Make sure the `REPLICATE_API_TOKEN` environment variable is set.  
If you don't have a token yet, you can obtain one at https://replicate.com/.  

In [1]:
import os

In [15]:
os.environ["REPLICATE_API_TOKEN"] = "<your API key>"
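Pasting the key into a cell makes it easy to leak when the notebook is shared. One alternative is to set the variable in your shell and only verify it here. The following is a minimal sketch; `require_replicate_token` is a hypothetical helper name and is not part of `llama_index` or the Replicate client.

```python
import os


def require_replicate_token(env=None):
    """Return the Replicate API token, or raise if it is missing.

    Hypothetical helper for this notebook. `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = (env.get("REPLICATE_API_TOKEN") or "").strip()
    # Treat the placeholder from the cell above as "not set".
    if not token or token == "<your API key>":
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; obtain a token at https://replicate.com/"
        )
    return token
```

Calling `require_replicate_token()` before constructing the LLM surfaces a missing token immediately instead of on the first request.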

## Basic Usage

This example uses the `llama13b-v2-chat` model, which you can try directly at https://replicate.com/a16z-infra/llama13b-v2-chat

In [16]:
from llama_index.llms import Replicate

llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"
)
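The `model` string above appears to follow an `owner/name:version` layout, where the part after the colon pins a specific model version. As a rough sketch under that assumption, the hypothetical helper `split_model_ref` (not a `llama_index` function) splits such an identifier so a malformed value is caught before any request is made.

```python
def split_model_ref(model_ref: str):
    """Split 'owner/name:version' into (owner, name, version).

    Hypothetical helper; assumes the layout seen in the identifier above.
    """
    path, colon, version = model_ref.partition(":")
    owner, slash, name = path.partition("/")
    if not (colon and slash and owner and name and version):
        raise ValueError(f"expected 'owner/name:version', got {model_ref!r}")
    return owner, name, version


MODEL_REF = (
    "a16z-infra/llama13b-v2-chat:"
    "df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5"
)
owner, name, version = split_model_ref(MODEL_REF)
```

Keeping the identifier in a single constant such as `MODEL_REF` also avoids repeating the long version hash when the model is configured again later in the notebook.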

#### Call `complete` with a prompt

In [4]:
resp = llm.complete("Who is Paul Graham?")

In [5]:
print(resp)

Paul Graham is a British computer scientist and entrepreneur best known for co-founding the web application framework company, Viaweb, and the venture capital firm, Y Combinator. He is also known for his essays on technology, entrepreneurship, and investing, which are widely read and discussed in the tech industry.

Graham has a reputation as a visionary and a pioneer in the tech industry, and has been involved in several successful startups and investments over the years. He is also known for his unconventional approach to investing and his focus on investing in early-stage companies with high potential for growth.

Here are some key facts about Paul Graham:

* Co-founder of Viaweb, a web application framework company
* Co-founder of Y Combinator, a venture capital firm that has invested in successful startups such as Airbnb, Dropbox, and Reddit
* Known for his essays on technology, entrepreneurship, and investing, which are widely read and discussed in the tech industry
* Reputation 

#### Call `chat` with a list of messages

In [None]:
from llama_index.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)

In [None]:
print(resp)

### Streaming

Using the `stream_complete` endpoint:

In [None]:
response = llm.stream_complete("Who is Paul Graham?")

In [None]:
for r in response:
    print(r.delta, end="")
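The loop above prints each `delta` as it arrives but does not keep the full text. If the complete response is needed afterwards, the pieces can be collected while streaming. The sketch below uses a stand-in `FakeChunk` class so it runs without an API call; both `FakeChunk` and `collect_stream` are hypothetical names, and the only assumption about real stream items is the `.delta` attribute used in the loop above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeChunk:
    """Stand-in for a streamed response item; only `.delta` is modeled."""

    delta: Optional[str]


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Optionally print each delta as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        piece = chunk.delta or ""  # tolerate an empty or missing delta
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    return "".join(parts)


full_text = collect_stream(
    [FakeChunk("Paul "), FakeChunk(None), FakeChunk("Graham")], echo=False
)
```

With a real stream, the same function could be called as `collect_stream(llm.stream_complete("Who is Paul Graham?"))`, assuming the items it yields expose `.delta` as shown in the loop above.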

Using the `stream_chat` endpoint:

In [None]:
from llama_index.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a pirate with a colorful personality"),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.stream_chat(messages)

In [None]:
for r in resp:
    print(r.delta, end="")

## Configure Model

In [None]:
from llama_index.llms import Replicate

llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5",
    temperature=0.9,
    max_tokens=32,
)

In [None]:
resp = llm.complete("Who is Paul Graham?")

In [None]:
print(resp)