# Scify is back

### Who is Scify?
Scify is an LLM chatbot that is expected to:
1) find books on [Project Gutenberg](https://www.gutenberg.org/)
2) find the plain-text (`.txt`) URL for each book


### How was Scify?
 * In [my experiment](https://github.com/dujm/book-reader/blob/master/notebooks/03-chatbot.ipynb), I used `llama2 7B` (Ollama ID `78e26419b446`).
 * Scify did a good job of finding the relevant books, but failed to find the matching txt URLs.

<br>

### How is Scify different now? 
 * In this experiment, I used Anthropic's Claude 3 Opus model (`claude-3-opus-20240229`).

#### Result
 * Claude 3 significantly improved Scify's performance (i.e., no hallucinated URLs).
 * Scify found not only the relevant books, but also the correct txt URLs.
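
One way to check the "correct txt URLs" result independently of the model is to request each link and see whether the server answers successfully. The sketch below is not part of the original notebook: `url_is_reachable` and `check_links` are hypothetical helper names, and it uses the standard library's `urllib` instead of `requests` so it runs without extra packages. A successful status only shows that a link resolves, not that the file is the right book.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def url_is_reachable(url, timeout=10.0, opener=urlopen):
    """Return True if a HEAD request to `url` ends with a 2xx status.

    `opener` is injectable so the check can be exercised without network access.
    """
    try:
        request = Request(url, method="HEAD")
        with opener(request, timeout=timeout) as reply:
            return 200 <= reply.status < 300
    except (URLError, TimeoutError, ValueError):
        # HTTP errors (e.g. 404), network failures, and malformed URLs
        # all count as "not reachable"
        return False


def check_links(pairs, is_ok=url_is_reachable):
    """Map each book title to True/False depending on whether its URL resolves."""
    return {title: is_ok(url) for title, url in pairs}
```

Calling `check_links(result_display)` on the parsed result at the end of this notebook would return a dictionary such as `{title: True_or_False, ...}`, which makes any broken link easy to spot.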



# Settings

### Packages

In [1]:
import os

# send HTTP requests
import requests

# safely parse Python literals from strings (used for ast.literal_eval)
import ast

# pretty print
from pprint import pprint



# load the .env file (for saving API keys, see details in the `.env_example` file)
import dotenv
dotenv.load_dotenv()

# assign loaded API keys to variables
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
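
`os.getenv` returns `None` when the variable is missing, so a forgotten `.env` entry only surfaces later as an authentication error from the API. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

With this helper, the assignment above could be written as `ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")`.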

### Variables

In [2]:
#----------------------#
# model related
#----------------------#
# API 
API = "https://api.anthropic.com/v1/messages"

# model
llm_model_id = 'claude-3-opus-20240229'


#----------------------#
# query related
#----------------------#
# topic 
topic = "Science Fiction"

# number of results I want to get
n = 5

# Build the backbone of an LLM Chatbot

In [3]:
# system prompt
system_prompt = f"""
    You are a world-class journalist. 
    You are very good at searching relevant information on a website.
    Generate a list of {n} results"""


# human input prompt 
human_prompt = f"""
     Please go to 'https://www.gutenberg.org/', search for  {topic}
     list the most popular {n} books and their matched Plain Text UTF-8 links on 'https://www.gutenberg.org/'
     Respond with the book title and links.
     For example, the most popular Book is
     ['The Time Machine': 'https://www.gutenberg.org/cache/epub/35/pg35.txt']
     Return a pretty list of python readable form. 
"""

#----------------------#
# prepare the POST request
#----------------------#
# headers
headers = {
        "x-api-key": ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json"}



# message
messages = [{"role": "user", 
             "content": human_prompt}]


# data to send
data = {
        "model": llm_model_id,
        "max_tokens": 1000,
        "temperature": 0.5,
        "system": system_prompt,
        "messages": messages,
    }

# Start Chatting

In [4]:
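# send the request; the timeout keeps the cell from hanging indefinitely
response = requests.post(API, 
                         headers=headers, 
                         json=data,
                         timeout=60)

# stop here if the API returned an HTTP error status
response.raise_for_status()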


In [5]:
# convert to text 
response_text = response.json()['content'][0]['text']

response_text

"Here are the 5 most popular Science Fiction books on Project Gutenberg with their Plain Text UTF-8 links:\n\n[\n    ['The War of the Worlds', 'https://www.gutenberg.org/files/36/36-0.txt'],\n    ['The Time Machine', 'https://www.gutenberg.org/files/35/35-0.txt'],\n    ['Flatland: A Romance of Many Dimensions', 'https://www.gutenberg.org/cache/epub/201/pg201.txt'],\n    ['The Invisible Man', 'https://www.gutenberg.org/files/5230/5230-0.txt'],\n    ['A Princess of Mars', 'https://www.gutenberg.org/files/62/62-0.txt']\n]"
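
Indexing `['content'][0]['text']` directly raises a `KeyError` if the API returns an error body instead of a message. The sketch below shows a slightly more defensive extraction. It assumes the Messages API shapes as I understand them (a `content` list of typed blocks on success, and a top-level `"type": "error"` object on failure); `extract_text` is a hypothetical helper name.

```python
def extract_text(payload: dict) -> str:
    """Join the text blocks of a Messages API reply, raising on error payloads."""
    if payload.get("type") == "error":
        err = payload.get("error", {})
        raise RuntimeError(f"API error ({err.get('type')}): {err.get('message')}")

    # keep only blocks that carry text
    texts = [block["text"] for block in payload.get("content", [])
             if block.get("type") == "text"]
    if not texts:
        raise ValueError("the reply contains no text block")
    return "".join(texts)
```

In the cell above, this would be used as `response_text = extract_text(response.json())`.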

### Result 

In [6]:
# get clean result
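# get clean result: keep only the n item lines between the opening '[' and the closing ']'
# (this relies on the exact line layout of the response shown above)
results = "".join(response_text.splitlines()[3:n+3]).lstrip()

# safely evaluate the string as a Python literal
# (the comma-separated items are parsed as a tuple of [title, url] lists)
result_display = ast.literal_eval(results)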

# pretty print
pprint(result_display)

(['The War of the Worlds', 'https://www.gutenberg.org/files/36/36-0.txt'],
 ['The Time Machine', 'https://www.gutenberg.org/files/35/35-0.txt'],
 ['Flatland: A Romance of Many Dimensions',
  'https://www.gutenberg.org/cache/epub/201/pg201.txt'],
 ['The Invisible Man', 'https://www.gutenberg.org/files/5230/5230-0.txt'],
 ['A Princess of Mars', 'https://www.gutenberg.org/files/62/62-0.txt'])
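
The line slicing in cell 6 works only when the reply has exactly three lines before the first item and one item per line. A less layout-dependent sketch, assuming the reply contains a single bracketed Python list of `[title, url]` pairs, is to cut from the first `[` to the last `]` and pass that substring to `ast.literal_eval`. `parse_book_list` is a hypothetical helper, and the sample reply is shortened from the output above.

```python
import ast


def parse_book_list(reply: str) -> list[tuple[str, str]]:
    """Extract a list of (title, url) pairs from a free-text LLM reply."""
    start = reply.find("[")
    end = reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no bracketed list found in the reply")

    # literal_eval only accepts Python literals, so arbitrary code is never executed
    parsed = ast.literal_eval(reply[start:end + 1])
    return [(str(title), str(url)) for title, url in parsed]


sample_reply = (
    "Here are the books:\n\n[\n"
    "    ['The War of the Worlds', 'https://www.gutenberg.org/files/36/36-0.txt'],\n"
    "    ['The Time Machine', 'https://www.gutenberg.org/files/35/35-0.txt']\n"
    "]"
)

for title, url in parse_book_list(sample_reply):
    print(f"{title}: {url}")
```

This still fails if the introductory sentence itself contains a bracket or if the model returns something that is not a valid Python literal, so the `ValueError` and `SyntaxError` paths are worth handling in real use.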
