# A Built-In Web App


**[OnPrem.LLM](https://github.com/amaiya/onprem)** includes a built-in web app to easily access and use LLMs. After [installing](https://github.com/amaiya/onprem#install) OnPrem.LLM, you can start it by running the following command at the command-line:

```shell
# run at command-line
onprem --port 8000
```
Then, enter `localhost:8000` in your Web browser to access the application:

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_welcome.png" border="0" alt="screenshot" width="775"/>

The Web app is implemented with [streamlit](https://streamlit.io/) (`pip install streamlit`). If it is not already installed, the `onprem` command will ask you to install it.
Here is more information on the `onprem` command:
```sh
$:~/projects/github/onprem$ onprem --help
usage: onprem [-h] [-p PORT] [-a ADDRESS] [-v]

Start the OnPrem.LLM web app
Example: onprem --port 8000

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to use; default is 8501
  -a ADDRESS, --address ADDRESS
                        Address to bind; default is 0.0.0.0
  -v, --version         Print a version
```
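The default address `0.0.0.0` listens on all network interfaces, so other machines on your network may be able to reach the app. As a minimal sketch, the hypothetical `onprem_local` shell function below (not part of OnPrem.LLM) binds the app to the loopback address `127.0.0.1` instead, so it is reachable only from the same machine. It uses only the `--port` and `--address` options listed above:

```shell
# Hypothetical wrapper (not part of OnPrem.LLM): start the web app bound to the
# loopback interface only. Uses just the -p/--port and -a/--address options
# shown in `onprem --help`.
onprem_local() {
  port="${1:-8501}"  # fall back to the same default port as `onprem --help`
  echo "Open http://localhost:${port} in your browser"
  onprem --port "$port" --address 127.0.0.1
}
```

Running `onprem_local 8000` would then serve the app at `localhost:8000`, matching the example at the top of this page.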


The app requires that a file called `webapp.yml` exist in the `onprem_data` folder in the user's home directory. This file stores information used by the Web app, such as the model to use. If one does not exist, a default one will be created for you. The default file is shown below:

```yaml
# Default YAML configuration
llm:
  # model url (or model file name if previously downloaded)
  # if changing, be sure to update the prompt_template variable below
  model_url: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf
  # number of layers offloaded to GPU
  n_gpu_layers: 32
  # the vector store type to use (dual, dense, or sparse)
  # dual: a vector store where both Chroma semantic searches and conventional keyword searches are supported
  store_type: dual
  # path to vector db folder
  vectordb_path: {datadir}/vectordb
  # path to model download folder
  model_download_path: {datadir}
  # number of source documents used by LLM.ask and LLM.chat
  rag_num_source_docs: 6
  # minimum similarity score for source to be considered by LLM.ask/LLM.chat
  rag_score_threshold: 0.0
  # verbosity of Llama.cpp
  # additional parameters added in the "llm" YAML section will be fed directly to LlamaCpp (e.g., temperature)
  #temperature: 0.0
prompt:
  # The default prompt_template is specifically for Zephyr-7B.
  # It will need to be changed if you change the model_url above.
  prompt_template: <|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>
ui:
  # title of application
  title: OnPrem.LLM
  # subtitle in "Talk to Your Documents" screen
  rag_title:
  # path to markdown file with contents that will be inserted below rag_title
  rag_text_path:
  # path to folder containing raw documents (i.e., absolute path of folder you supplied to LLM.ingest)
  rag_source_path:
  # base url (leave blank unless you're running your own separate web server to serve source documents)
  # whether to show the Manage page in the sidebar (TRUE or FALSE)
  show_manage: TRUE
```

Other models in GGUF format can also be used by changing `model_url`.

You can edit the file based on your requirements. Variables in the `llm` section are automatically passed to the `onprem.LLM` constructor, which, in turn, passes extra `**kwargs` to `llama-cpp-python` or the `transformers.pipeline`. For instance, you can add a `temperature` variable to the `llm` section to adjust the temperature of the model in the web app (e.g., values closer to 0.0 for more deterministic output and higher values for more creative output).
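The default file already contains a commented-out `#temperature: 0.0` line in the `llm` section, so one way to set the temperature without opening an editor is to rewrite that line. The `set_temperature` function below is a hypothetical helper, not part of OnPrem.LLM. It assumes the two-space indentation of the default file and leaves the file unchanged if it has no `temperature` line:

```shell
# Hypothetical helper (not part of OnPrem.LLM): set the temperature in webapp.yml
# by rewriting the "  temperature:" or "  #temperature:" line of the llm section.
# Assumes the two-space indentation used by the default file.
set_temperature() {
  file="$1"
  value="$2"
  tmp="${file}.tmp"
  # '#\{0,1\}' matches an optional leading '#', so a commented-out line is enabled too
  sed "s/^  #\{0,1\}temperature:.*/  temperature: ${value}/" "$file" > "$tmp" &&
    mv "$tmp" "$file"
}
```

For example, `set_temperature "$HOME/onprem_data/webapp.yml" 0.2` would enable the line with a low temperature for more deterministic output. Writing to a temporary file and then moving it avoids `sed -i`, whose syntax differs between GNU and BSD/macOS `sed`.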

The default model is a 7B-parameter model called [Zephyr-7B](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF).

Note that some models have particular prompt formats.  For instance, if using the default **Zephyr-7B** model above, as described on the [model's home page](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF#prompt-template-zephyr), the `prompt_template` in the YAML file must be set to:
```yaml
prompt:
  prompt_template: <|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>
```

If you change models, don't forget to update the `prompt_template` variable with the prompt format appropriate for your chosen model.
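Since `model_url` and `prompt_template` need to change together, a quick check after editing `webapp.yml` is to view them side by side. The hypothetical `check_prompt_template` function below (not part of OnPrem.LLM) prints the configured `model_url` and warns if `prompt_template` lacks the `{prompt}` placeholder, which the Zephyr-7B template above uses to mark where your input goes. It only looks for the placeholder and cannot tell whether the format actually matches your model:

```shell
# Hypothetical helper (not part of OnPrem.LLM): sanity-check webapp.yml after
# switching models. Assumes the two-space indentation of the default file.
check_prompt_template() {
  file="${1:-$HOME/onprem_data/webapp.yml}"
  # Show which model is configured so you can compare it with the template
  grep '^  model_url:' "$file" || echo "model_url not found in $file"
  template_line=$(grep '^  prompt_template:' "$file" || true)
  case "$template_line" in
    *'{prompt}'*)
      echo "prompt_template contains the {prompt} placeholder" ;;
    *)
      echo "warning: prompt_template is missing or has no {prompt} placeholder" >&2
      return 1 ;;
  esac
}
```

Calling `check_prompt_template` with no argument checks the file in the default `onprem_data` location.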

## Using Prompts to Solve Problems

The first app page is a UI for interactive chatting and prompting to solve various problems with local LLMs.

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_prompting.png" border="1" alt="screenshot" width="775"/>

## Talk To Your Documents
The second screen in the app is a UI for [retrieval augmented generation](https://arxiv.org/abs/2005.11401) or RAG (i.e., chatting with documents). Sources considered by the LLM when generating answers are displayed and ranked by answer-to-source similarity. Hovering over the question marks in the sources displays the snippets of text from a document that the LLM considered when generating answers. Documents you would like to use as sources for question-answering can be uploaded through the Web UI, as discussed below.

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_rag.png" border="0" alt="screenshot" width="775"/>

## Document Search 

The third screen is a UI for searching the documents you've uploaded, using either keyword searches or semantic searches. Documents that you would like to search can be uploaded through the Web app, as discussed next.

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_search.png" border="0" alt="screenshot" width="775"/>

## Uploading Documents

The Web UI also includes a point-and-click interface to upload and index documents into the vector store(s). Documents can either be uploaded individually or as a zip file.

<img src="https://raw.githubusercontent.com/amaiya/onprem/master/images/onprem_upload.png" border="0" alt="screenshot" width="775"/>
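To upload a folder of documents in one step, you can compress it into a zip file first. The hypothetical `bundle_docs` function below (not part of OnPrem.LLM) uses the `zip` tool when it is installed and otherwise falls back to Python's built-in `zipfile` command-line interface (`python3 -m zipfile -c`). How the Web UI treats subfolders inside the archive is not covered here:

```shell
# Hypothetical helper (not part of OnPrem.LLM): zip a folder of documents so it
# can be uploaded through the Web UI as a single file.
bundle_docs() {
  dir="${1%/}"              # strip a trailing slash, e.g. my_docs/ -> my_docs
  archive="${2:-$dir.zip}"  # default archive name: my_docs.zip
  if [ ! -d "$dir" ]; then
    echo "error: $dir is not a directory" >&2
    return 1
  fi
  if command -v zip >/dev/null 2>&1; then
    zip -r -q "$archive" "$dir" || return 1
  else
    # Fallback: Python's standard-library zipfile CLI (-c creates an archive)
    python3 -m zipfile -c "$archive" "$dir" || return 1
  fi
  echo "Created $archive"
}
```

For example, `bundle_docs my_documents` would create `my_documents.zip` (the folder name is only illustrative), which can then be uploaded on the page shown above.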

Have fun!