## Running Llama 3 on Mac, Windows or Linux
This notebook shows how to set up and run Llama 3 locally on Mac, Windows, or Linux using [Ollama](https://ollama.com/).

### Steps at a glance:
1. Download and install Ollama.
2. Download and test run Llama 3.
3. Use local Llama 3 via Python.
4. Use local Llama 3 via LangChain.


#### 1. Download and install Ollama

On Mac or Windows, go to the [Ollama download page](https://ollama.com/download), select your platform, and double-click the downloaded file to install Ollama.

On Linux, run `curl -fsSL https://ollama.com/install.sh | sh` in a terminal to download and install Ollama.
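
To confirm the installation from Python, the sketch below looks for the `ollama` executable on your `PATH` and asks it for its version. The helper name `ollama_version` is made up for this example, and the exact version text depends on your Ollama release.

```python
import shutil
import subprocess


def ollama_version():
    """Return the text printed by `ollama --version`, or None if the CLI is not on PATH."""
    exe = shutil.which("ollama")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, timeout=10
    )
    # Some builds may write to stderr, so fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()


version = ollama_version()
print(version if version is not None else "ollama was not found on PATH")
```

If this prints the "not found" message right after installing, opening a new terminal (so that `PATH` is reloaded) is usually worth trying first.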

#### 2. Download and test run Llama 3

In a terminal or console, run `ollama pull llama3` to download the Llama 3 8b chat model in the 4-bit quantized format (about 4.7 GB).

Run `ollama pull llama3:70b` to download the Llama 3 70b chat model, also in the 4-bit quantized format (39 GB).
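
To check which models have finished downloading, you can query the local Ollama server's `/api/tags` endpoint, which lists the models available locally. The sketch below assumes the server is listening on the default `http://localhost:11434`. The helper names `model_names` and `list_local_models` are illustrative only, and the standard library is used so that nothing extra has to be installed.

```python
import json
import urllib.request


def model_names(tags_response):
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as response:
        return model_names(json.load(response))
```

Calling `list_local_models()` with the server running should return a list that includes an entry for each model you pulled.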

Then run `ollama run llama3` and ask Llama 3 questions such as "who wrote the book godfather?" or "who wrote the book godfather? answer in one sentence." You can also try `ollama run llama3:70b`, but inference will most likely be too slow: for example, on an Apple M1 Pro with 32GB RAM, Llama 3 70b chat takes over 10 seconds to generate one token (vs. over 10 tokens per second with Llama 3 8b chat).
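
A back-of-the-envelope estimate shows where the download sizes come from and suggests one plausible reason the 70b model is so slow on a 32GB machine. The sketch assumes roughly 4.5 bits per weight (4-bit values plus per-block scale factors) and uses the nominal 8 and 70 billion parameter counts. Both numbers are assumptions for illustration, not values read from the model files.

```python
def estimate_size_gb(num_params, bits_per_weight=4.5):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


RAM_GB = 32  # the Apple M1 Pro configuration mentioned above

for name, params in [("llama3 (8b)", 8e9), ("llama3:70b", 70e9)]:
    size = estimate_size_gb(params)
    verdict = "fits in" if size < RAM_GB else "exceeds"
    print(f"{name}: ~{size:.1f} GB, {verdict} {RAM_GB} GB of RAM")
```

The estimates come out at about 4.5 GB and 39.4 GB, close to the download sizes quoted above. Under these assumptions the 70b weights alone are larger than 32 GB of RAM, so they cannot all stay in memory at once.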

You can also run the following command to test Llama 3 8b chat:
```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "who wrote the book godfather?"
    }
  ],
  "stream": false
}'
```

The complete Ollama API documentation is available [here](https://github.com/ollama/ollama/blob/main/docs/api.md).
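
The command above sets `"stream": false` to get one complete JSON reply. With `"stream": true`, the chat endpoint instead sends a sequence of JSON objects, one per line, each carrying a piece of the answer in `message.content`, and the last one marked with `"done": true`. The sketch below shows one way to consume that stream with the standard library. The function names `iter_stream_content` and `stream_chat` are made up for this example.

```python
import json
import urllib.request


def iter_stream_content(lines):
    """Yield the text pieces from an iterable of newline-delimited JSON chunks."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break


def stream_chat(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    """Print the reply piece by piece as the local server generates it."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # Iterating over the response yields one line (one JSON chunk) at a time.
        for piece in iter_stream_content(response):
            print(piece, end="", flush=True)
    print()
```

With Ollama running, `stream_chat("who wrote the book godfather?")` prints the answer incrementally instead of waiting for the full reply.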

#### 3. Use local Llama 3 via Python

The Python code below is a port of the curl command above.

In [1]:
import requests

url = "http://localhost:11434/api/chat"

def llama3(prompt):
    # Same request body as the curl command above
    data = {
        "model": "llama3",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "stream": False
    }

    # json=data serializes the body and sets the Content-Type header
    response = requests.post(url, json=data)
    # Raise on HTTP errors instead of failing later with a KeyError
    response.raise_for_status()

    return response.json()["message"]["content"]

In [2]:
response = llama3("who wrote the book godfather")
print(response)

The novel "The Godfather" was written by Mario Puzo. The book was published in 1969 and has since become a classic of American literature.

Mario Puzo (1920-1999) was an American author, screenwriter, and playwright who is best known for his novels about the Italian-American Mafia. In addition to "The Godfather," Puzo wrote several other successful books about organized crime, including "Fools Die" (1978), "The Last Don" (1996), and "The Fourth K" (1990).

Puzo's novel "The Godfather" was a huge success, and it has been adapted into a famous film of the same name directed by Francis Ford Coppola. The book and the movie tell the story of the Corleone crime family and their struggles to maintain power in a rapidly changing world.

It's worth noting that while Puzo wrote the novel, the screenplay for the film adaptation was written by Mario Puzo and Francis Ford Coppola.
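
The `llama3` function above sends a single user message, so each call starts from scratch. Because the chat endpoint takes a list of messages, a follow-up question can be given context by resending the earlier turns. The sketch below is one minimal way to do that. The `Conversation` class and `send_to_ollama` helper are hypothetical names introduced here, not part of Ollama or `requests`.

```python
class Conversation:
    """Keep the message history so that follow-up questions have context."""

    def __init__(self, send, system_prompt=None):
        self.send = send  # callable: list of message dicts -> reply text
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        reply = self.send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def send_to_ollama(messages, model="llama3",
                   url="http://localhost:11434/api/chat"):
    """Post the whole history to the local Ollama chat endpoint."""
    # Imported here so the Conversation class works without requests installed.
    import requests

    data = {"model": model, "messages": messages, "stream": False}
    response = requests.post(url, json=data, timeout=120)
    response.raise_for_status()
    return response.json()["message"]["content"]
```

For example, after `chat = Conversation(send_to_ollama, system_prompt="Answer in one sentence.")`, calling `chat.ask("who wrote the book godfather?")` followed by `chat.ask("when was it published?")` lets the model see the first exchange when it answers the second question.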


#### 4. Use local Llama 3 via LangChain

The code below uses LangChain with Ollama to query Llama 3 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this notebook](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb).

In [7]:
!pip install langchain langchain_community


In [8]:
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3", temperature=0)
response = llm.invoke("who wrote the book godfather?")
print(response.content)


The novel "The Godfather" was written by Mario Puzo. The book was published in 1969 and became a huge success, selling millions of copies worldwide. It tells the story of the Corleone family, an Italian-American Mafia family, and their struggles for power and loyalty.

Mario Puzo's novel was later adapted into a film directed by Francis Ford Coppola, which also became a massive hit. The movie "The Godfather" (1972) is widely considered one of the greatest films of all time, and it won several Academy Awards, including Best Picture.

Puzo went on to write several more novels and screenplays, many of which were inspired by his Italian-American heritage and his fascination with organized crime. He passed away in 1999 at the age of 78.
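
The answers above are fairly long. One way to steer the model is to pass a system message along with the question. The sketch below relies on LangChain chat models accepting a list of `(role, content)` tuples in `invoke`. The helper `ask_with_system_prompt` and the `demo` function are illustrative names, and the system prompt text is just an example.

```python
def ask_with_system_prompt(llm, system_prompt, question):
    """Send a system instruction plus a user question to a LangChain chat model."""
    messages = [
        ("system", system_prompt),
        ("human", question),
    ]
    return llm.invoke(messages).content


def demo():
    # Requires langchain_community and a running Ollama server with llama3 pulled.
    from langchain_community.chat_models import ChatOllama

    llm = ChatOllama(model="llama3", temperature=0)
    print(ask_with_system_prompt(
        llm, "Answer in one sentence.", "who wrote the book godfather?"
    ))
```

Calling `demo()` in a notebook cell should produce a much shorter answer than the one above, although the exact wording will vary.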
