## **Unleash Pre-trained LLMs with Replicate: A Step-by-Step Guide** ♋

Large language models (LLMs) are revolutionizing various fields, but hosting and running them yourself can be complex. Replicate offers an API that lets you call pre-trained LLMs directly from your notebook. This guide walks you through using Replicate's API with a pre-trained Llama 2 chat model, specifically the `a16z-infra/llama13b-v2-chat` model used in the inference step.

### ***Prerequisites*** ⚡

*  A computer with an internet connection

*  A text editor or notebook environment (e.g., Jupyter Notebook, Google Colab)



---

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### ***Steps*** 🌟

▶ Create a Replicate Account:

* Visit https://replicate.com/ and sign up for a free account. Replicate offers both free and paid plans. The free plan provides limited credits for running models, but it's sufficient for experimentation.

<figure>
<center>
<img src='https://drive.google.com/uc?id=1D1qDKUcgNx5ipr9fuDv6VT9ss70AH9vR'/>
<figcaption></figcaption></center>
</figure>

▶ Obtain Your API Token:

* Once logged in, open your profile settings by clicking your username in the top-right corner.
* Select "API" from the left-hand menu.
* Click "Generate New Token" and give it a descriptive name (e.g., "My-Notebook-Token").
* ***Copy the generated token. You'll need it to access models via the API.***

<figure>
<center>
<img src='https://drive.google.com/uc?id=1zC5yzMB_ftIqKfD8bWZZ2fSaAfxwzfPy'/>
<figcaption></figcaption></center>
</figure>

▶ Install replicate

In [2]:
!pip install replicate

Collecting replicate
  Downloading replicate-0.25.1-py3-none-any.whl (39 kB)
Collecting httpx<1,>=0.21.0 (from replicate)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.21.0->replicate)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.21.0->replicate)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, replicate
Successfully installed h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 replicate-0.25.1


▶ Import replicate

In [None]:
import replicate

▶ Authenticate with Replicate

In [None]:
# Read the API token without echoing it, then store it in the
# REPLICATE_API_TOKEN environment variable for the replicate client.

from getpass import getpass
import os
api_token = getpass(prompt='Enter your Replicate API token: ')
os.environ['REPLICATE_API_TOKEN'] = api_token
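
Before calling the API, it can help to confirm that the token actually reached the environment. The following is a minimal sketch, not part of the Replicate client: `require_replicate_token` is a hypothetical helper name, and it only checks that the variable is present and non-empty, not that the token is valid.

```python
import os


def require_replicate_token(env_var: str = "REPLICATE_API_TOKEN") -> str:
    """Return the token stored in the environment, or raise if it is missing.

    Hypothetical helper: it checks presence only and never contacts Replicate.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; run the getpass cell first.")
    return token
```

Calling `require_replicate_token()` right before the inference cell turns a forgotten authentication step into an immediate, readable error instead of a failed API request.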

▶ Prepare Your Prompt

In [None]:
# Convert LaTeX format to text format.
# Detect any mathematical jargon/vocabulary and get its definition.
# Think step by step.
# Explain each intermediate step.
# The answer should be given as a non-negative integer modulo 1000.
# The answer will be concise and limited to a single number.

In [None]:
# Prompts
prompt = "How to use LLM models using replicate api?"

prompt_template = f"""
SYSTEM : You are a helpful assistant, you only respond once as 'Assistant'.

USER : {prompt}

ASSISTANT :
"""



▶ Run Inference

In [None]:
output = replicate.run(
    'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5',  # LLM model (owner/name:version)
    input={"prompt": prompt_template,  # Prompt built in the previous cell
           "temperature": 0.1, "top_p": 0.9, "max_length": 450, "repetition_penalty": 1},  # Model parameters
)

▶ Print Output

In [None]:
full_response = ""

for item in output:
  full_response += item

print(full_response)
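
The loop above concatenates the text chunks that the model yields. The same idea can be sketched as a reusable function; `collect_output` is a hypothetical helper, and `fake_stream` is a stand-in iterator used here so the example runs without calling the API.

```python
from typing import Iterable


def collect_output(chunks: Iterable[str]) -> str:
    """Join streamed text chunks into one string and trim outer whitespace."""
    return "".join(chunks).strip()


# Stand-in for the iterator returned by the inference step (assumption for the demo).
fake_stream = iter([" Hello", ", ", "world", "! "])
full = collect_output(fake_stream)
print(full)
```

Using `"".join(...)` avoids building many intermediate strings, which matters little for short replies but keeps the code compact.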