
# The Built-In Web App


**[OnPrem.LLM](https://github.com/amaiya/onprem)** includes a built-in web app to easily access and use LLMs. After [installing](https://github.com/amaiya/onprem#install) OnPrem.LLM, you can follow these steps to prepare the web app and start it:

#### Step 1: Ingest some documents using the Python API:
```python
# run at Python prompt
from onprem import LLM
llm = LLM()
llm.ingest('/your/folder/of/documents')
```

#### Step 2: Start the Web app:

```shell
# run at command-line
onprem --port 8000
```
Then, enter `localhost:8000` (or `<domain_name>:8000` if running on a remote server) in your Web browser to access the application:

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_screenshot.png" border="1" alt="screenshot" width="775"/>
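Once the server is running, you can confirm that the app is reachable from Python before opening a browser. The sketch below uses only the standard library. The helper names `webapp_url` and `is_webapp_up` are hypothetical and not part of OnPrem.LLM. Proxies are bypassed on the assumption that the app runs on your own machine or network.

```python
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener


def webapp_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the address to enter in the browser (hypothetical helper)."""
    return f"http://{host}:{port}"


def is_webapp_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP requests at `url` (hypothetical helper)."""
    # An empty ProxyHandler ignores proxy environment variables, so the host is contacted directly.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except HTTPError:
        # The server replied, even if with an error status, so it is up.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, etc. (URLError subclasses OSError).
        return False


if __name__ == "__main__":
    url = webapp_url("localhost", 8000)
    print(url, "is up" if is_webapp_up(url) else "is not responding")
```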

The Web app is implemented with [streamlit](https://streamlit.io/), which can be installed with `pip install streamlit`. If it is not already installed, the `onprem` command will ask you to install it.
Here is more information on the `onprem` command:
```sh
$:~/projects/github/onprem$ onprem --help
usage: onprem [-h] [-p PORT] [-a ADDRESS] [-v]

Start the OnPrem.LLM web app
Example: onprem --port 8000

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to use; default is 8501
  -a ADDRESS, --address ADDRESS
                        Address to bind; default is 0.0.0.0
  -v, --version         Print a version
```
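If you prefer to launch the app from a Python script (for example, as part of a larger setup routine), you can pass the `--port` and `--address` flags documented above through `subprocess`. This is a minimal sketch. `build_onprem_command` and `start_webapp` are hypothetical helpers, and their defaults simply mirror those printed by `onprem --help`.

```python
import shutil
import subprocess


def build_onprem_command(port: int = 8501, address: str = "0.0.0.0") -> list[str]:
    """Assemble the CLI invocation using only the flags listed by `onprem --help`."""
    return ["onprem", "--port", str(port), "--address", address]


def start_webapp(port: int = 8501, address: str = "0.0.0.0") -> subprocess.Popen:
    """Start the web app in a background process (hypothetical helper)."""
    if shutil.which("onprem") is None:
        raise RuntimeError("`onprem` not found on PATH; is OnPrem.LLM installed?")
    return subprocess.Popen(build_onprem_command(port, address))
```

For example, `proc = start_webapp(port=8000)` starts the server in the background, and `proc.terminate()` stops it.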


The app requires that a file called `webapp.yml` exist in the `onprem_data` folder in the user's home directory. This file stores information used by the Web app, such as the model to use. If the file does not exist, a default one is created for you. It is shown below:

```yaml
llm:
  # model url (or model file name if previously downloaded)
  model_url: https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF/resolve/main/wizardlm-13b-v1.2.Q4_K_M.gguf
  # number of layers offloaded to GPU
  n_gpu_layers: 32
  # path to vector db folder
  vectordb_path: /home/your_user_name/onprem_data/vectordb
  # path to model download folder
  model_download_path: /home/your_user_name/onprem_data
  # number of source documents used by LLM.ask and LLM.chat
  rag_num_source_docs: 6
  # minimum similarity score for a source to be used by LLM.ask and LLM.chat
  rag_score_threshold: 0.0
ui:
  # title of application
  title: OnPrem.LLM
  # subtitle in "Talk to Your Documents" screen
  rag_title:
  # path to folder containing raw documents (used to construct direct links to document sources)
  rag_source_path:
  # base url (used to construct direct links to document sources)
  rag_base_url:
```
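Because `webapp.yml` lives in an `onprem_data` folder in your home directory, you can check whether it already exists before the first launch. `webapp_config_path` in the sketch below is a hypothetical helper, not an OnPrem.LLM function.

```python
from pathlib import Path
from typing import Optional, Union


def webapp_config_path(home: Optional[Union[str, Path]] = None) -> Path:
    """Return <home>/onprem_data/webapp.yml (hypothetical helper)."""
    base = Path.home() if home is None else Path(home)
    return base / "onprem_data" / "webapp.yml"


if __name__ == "__main__":
    cfg = webapp_config_path()
    if cfg.exists():
        print(f"Using existing config: {cfg}")
    else:
        print(f"No config at {cfg}; a default one will be created when the app starts.")
```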

You can edit the file based on your requirements. Variables in the `llm` section are automatically passed to the `onprem.LLM` constructor, which, in turn, passes extra `**kwargs` to `llama-cpp-python`. For instance, you can add a `temperature` variable in the `llm` section to adjust the temperature of the model in the web app (lower values closer to 0.0 yield more deterministic output, while higher values yield more creative output).
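The forwarding of `llm` variables into the constructor and on to `llama-cpp-python` follows the usual Python `**kwargs` pattern. The sketch below uses a hypothetical stand-in class, not the real `onprem.LLM`, whose constructor accepts many more named parameters. It shows how a `temperature` entry added to the `llm` section ends up among the extra keyword arguments destined for the backend.

```python
from typing import Any, Optional


class StandInLLM:
    """Hypothetical stand-in for onprem.LLM; only illustrates **kwargs forwarding."""

    def __init__(self, model_url: Optional[str] = None, n_gpu_layers: Optional[int] = None, **kwargs: Any):
        self.model_url = model_url
        self.n_gpu_layers = n_gpu_layers
        # Keys that are not named parameters are collected here; in OnPrem.LLM,
        # such extras are passed on to llama-cpp-python.
        self.backend_kwargs = kwargs


# The `llm` section of webapp.yml as a dict, with a `temperature` entry added.
llm_section = {
    "model_url": "https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF/resolve/main/wizardlm-13b-v1.2.Q4_K_M.gguf",
    "n_gpu_layers": 32,
    "temperature": 0.2,
}

llm = StandInLLM(**llm_section)
print(llm.backend_kwargs)  # {'temperature': 0.2}
```

In `webapp.yml` itself, the equivalent change is a single `temperature: 0.2` line under `llm:`.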

The default model in the auto-created YAML file is a [13B parameter model](https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF/resolve/main/wizardlm-13b-v1.2.Q4_K_M.gguf). If this is too large and slow for your system, you can edit `model_url` above to use a [7B parameter model](https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGUF/resolve/main/Wizard-Vicuna-7B-Uncensored.Q4_K_M.gguf) or a [3B parameter model](https://huggingface.co/juanjgit/orca_mini_3B-GGUF/resolve/main/orca-mini-3b.q4_0.gguf). These run faster at the expense of some answer quality. You can also edit `model_url` to use a larger model. Any model in GGUF format can be used.
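The `model_url` comment in `webapp.yml` says a model file name can be used once the model has been downloaded. When switching models, it can therefore help to know which file name a given URL corresponds to. The sketch below assumes, without confirmation from this page, that the downloaded file is named after the last path component of the URL and stored in `model_download_path`. Both helpers are hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def model_filename(model_url: str) -> str:
    """Last path component of the URL (hypothetical helper)."""
    return PurePosixPath(urlparse(model_url).path).name


def is_downloaded(model_url: str, model_download_path: str) -> bool:
    """Assumes the file is stored under its URL file name (hypothetical helper)."""
    return (Path(model_download_path) / model_filename(model_url)).is_file()


url = "https://huggingface.co/juanjgit/orca_mini_3B-GGUF/resolve/main/orca-mini-3b.q4_0.gguf"
print(model_filename(url))  # orca-mini-3b.q4_0.gguf
```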

## Talk To Your Documents
The Web app has two screens. The first screen (shown above) is a UI for [retrieval augmented generation](https://arxiv.org/abs/2005.11401), or RAG (i.e., chatting with documents). Sources considered by the LLM when generating answers are displayed and ranked by answer-to-source similarity. Hovering over the question marks in the sources displays the snippets of text from each document that the LLM considered when generating the answer.

**Hover Example:**

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/hover-example.png" border="1" alt="screenshot" width="775"/>
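The ranking of sources by answer-to-source similarity can be illustrated with cosine similarity over embedding vectors. This is a toy sketch. The 2-D vectors stand in for real embeddings, `rank_sources` is a hypothetical helper, and OnPrem.LLM's actual embedding model and scoring may differ.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sources(answer_vec, sources, top_k=6):
    """Sort (name, vector) pairs by similarity to the answer, highest first."""
    scored = [(name, cosine(answer_vec, vec)) for name, vec in sources]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


answer_embedding = [1.0, 0.0]
source_embeddings = [
    ("c.pdf", [0.0, 1.0]),
    ("a.pdf", [1.0, 0.0]),
    ("b.txt", [1.0, 1.0]),
]
for name, score in rank_sources(answer_embedding, source_embeddings):
    print(f"{name}: {score:.3f}")
# a.pdf: 1.000
# b.txt: 0.707
# c.pdf: 0.000
```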

**Source Hyperlinks:** On Linux and Mac systems where Python is installed in your home directory (e.g., `~/mambaforge`, `~/anaconda3`), displayed sources for the answer should automatically appear as hyperlinks to the original documents (e.g., PDFs, TXTs, etc.) if you populate the `rag_source_path` variable in `webapp.yml` with the absolute path of the folder supplied to `LLM.ingest`. You should leave `rag_base_url` blank in this case.
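The `rag_source_path` and `rag_base_url` settings could combine into a link roughly as follows. This is a hypothetical sketch of the idea, not OnPrem.LLM's implementation. With `rag_base_url` blank, it produces a local `file://` URI. Otherwise, it appends the document's path relative to `rag_source_path` to the base URL.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def source_link(doc_path: str, rag_source_path: str, rag_base_url: str = "") -> str:
    """Build a hyperlink for a source document (hypothetical helper, POSIX paths)."""
    doc = PurePosixPath(doc_path)
    if not rag_base_url:
        # Local case: link straight to the file on disk, percent-encoding unsafe characters.
        return "file://" + quote(doc.as_posix())
    # Remote case: the document's location relative to the ingested folder, appended to the base URL.
    relative = doc.relative_to(rag_source_path)
    return rag_base_url.rstrip("/") + "/" + quote(relative.as_posix())


print(source_link("/home/alice/docs/report 1.pdf", "/home/alice/docs"))
# file:///home/alice/docs/report%201.pdf
```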

## Use Prompts to Solve Problems

The second screen is a UI for general prompting and allows you to supply prompts to the LLM to solve problems.

**Information Extraction Example:**

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/extraction.png" border="1" alt="screenshot" width="775"/>
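You can also assemble an information-extraction prompt of this kind programmatically, for example to reuse the same instructions across many documents, and then paste it into the prompt screen. The template wording and the `build_extraction_prompt` helper below are illustrative assumptions, not prompts shipped with OnPrem.LLM.

```python
def build_extraction_prompt(text: str, fields: list[str]) -> str:
    """Assemble an extraction prompt asking for one 'field: value' line per field."""
    field_lines = "\n".join(f"- {field}" for field in fields)
    return (
        "Extract the following fields from the text below. "
        "Reply with one line per field in the form 'field: value', "
        "and use 'N/A' when a field is not mentioned.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Text:\n{text.strip()}\n"
    )


prompt = build_extraction_prompt(
    "Jane Doe joined Acme Corp as CTO in March 2021.",
    ["person name", "company", "job title", "start date"],
)
print(prompt)
```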

Have fun!

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/cat_names.png" border="1" alt="screenshot" width="775"/>