# LLMSearch Google Colab Demo

This notebook was tested with the following 13B model: https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF

If you run into memory errors, lower `n_gpu_layers` in the model configuration so that fewer layers are offloaded to the GPU and the rest stay on the CPU, or try a smaller model.

## Instructions

* Upload or generate some documents in the `sample_docs` folder (see README.md for the supported formats).
    * Alternatively, use the provided sample PDF book, Pro Git: https://git-scm.com/book/en/v2
* Run the notebook.
* Optional: edit the configuration file to point to a different model.


### Prepare configuration and download the model

In [1]:
%%shell

# Make folder structure
mkdir -p llm/embeddings llm/cache llm/models llm/config sample_docs

# Download sample book
wget -P sample_docs https://github.com/progit/progit2/releases/download/2.1.413/progit.pdf


--2024-10-22 10:06:58--  https://github.com/progit/progit2/releases/download/2.1.413/progit.pdf
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/15400220/8bb7fa36-eb97-4c76-be22-f0dfb9c7bbdd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241022%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241022T100658Z&X-Amz-Expires=300&X-Amz-Signature=c4ede496ad974528544e0bc2e61bebf6ef4c5bfdfc40aca9f8d0f9090e8a70c7&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dprogit.pdf&response-content-type=application%2Foctet-stream [following]
--2024-10-22 10:06:58--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/15400220/8bb7fa36-eb97-4c76-be22-f0dfb9c7bbdd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=release



In [2]:
%%shell

# Generate sample configuration

cat << 'EOF' > llm/config/config.yaml

cache_folder: /content/llm/cache

embeddings:
  embeddings_path: /content/llm/embeddings
  embedding_model: # Optional embedding model specification, default is e5-large-v2. Swap to a smaller model if out of CUDA memory
    type: sentence_transformer # other supported types - "huggingface" and "instruct"
    model_name: "intfloat/e5-large-v2"
  chunk_sizes:
    - 1024
  document_settings:
  - doc_path: sample_docs/
    scan_extensions:
      - md
      - pdf
    additional_parser_settings:
      md:
        skip_first: True
        merge_sections: True
        remove_images: True

semantic_search:
  search_type: similarity # mmr
  max_char_size: 3096

  reranker:
    enabled: True
    model: "marco" # "marco" selects cross-encoder/ms-marco-MiniLM-L-6-v2; BAAI/bge-reranker-base is the other supported reranker
EOF

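Before indexing, it can help to confirm that `sample_docs` actually contains files matching the `scan_extensions` listed in the configuration above. The following is a minimal sketch and not part of llmsearch: the helper name `find_indexable_docs` is hypothetical, it only mirrors the `md` and `pdf` extensions from the sample config, and it searches subfolders recursively, which may differ from how llmsearch itself walks the folder.

```python
from pathlib import Path


def find_indexable_docs(doc_path, extensions=("md", "pdf")):
    """Return sorted file paths under doc_path whose suffix matches one of the extensions."""
    # Normalise "pdf" and ".pdf" to the same form and compare case-insensitively.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    root = Path(doc_path)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # Prints nothing if the folder is missing or holds no matching files.
    for path in find_indexable_docs("sample_docs"):
        print(path)
```

An empty result here suggests that the index creation step would have nothing to embed.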




In [3]:
%%shell


cat << EOF > llm/config/model.yaml
# Generate sample model configuration for llama-cpp
llm:
 type: llamacpp
 params:
   model_path: /content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
   prompt_template: |
         ### Instruction:
         Use the following pieces of context to provide a detailed answer to the question at the end. If the answer isn't in the context, say that you don't know; don't try to make up an answer.

         ### Context:
         ---------------
         {context}
         ---------------

         ### Question: {question}
         ### Response:
   model_init_params:
     n_ctx: 1024
     n_batch: 512
     n_gpu_layers: 43

   model_kwargs:
     max_tokens: 512
     top_p: 0.1
     top_k: 40
     temperature: 0.2

EOF
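If the model does not fit in GPU memory, one option is to lower `n_gpu_layers` in the generated `model.yaml` so that fewer layers are offloaded to the GPU. The snippet below is a rough sketch of doing that from a Python cell using only the standard library; `set_gpu_layers` is a hypothetical helper, and the value `30` is an arbitrary illustration rather than a tuned recommendation.

```python
import re
from pathlib import Path


def set_gpu_layers(yaml_text, n_layers):
    """Return yaml_text with the value of the first `n_gpu_layers:` key replaced."""
    if n_layers < 0:
        raise ValueError("n_layers must be non-negative")
    pattern = re.compile(r"^([ \t]*n_gpu_layers:[ \t]*)\d+", flags=re.MULTILINE)
    # A function replacement avoids ambiguity between the group reference and the digits.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{n_layers}", yaml_text, count=1
    )
    if count == 0:
        raise KeyError("n_gpu_layers not found in configuration")
    return new_text


if __name__ == "__main__":
    config_file = Path("llm/config/model.yaml")
    if config_file.exists():
        # 30 is an illustrative value only; pick what fits the available GPU memory.
        config_file.write_text(set_gpu_layers(config_file.read_text(), 30))
```

A plain text substitution is used here so that the comments and the prompt template in the file are left untouched.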



In [4]:
%%shell

# Download the model
# Sample model - https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/tree/main
# Optionally download a smaller model to test...


cd llm/models
wget https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/resolve/main/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf



--2024-10-22 10:07:25--  https://huggingface.co/TheBloke/airoboros-l2-13B-gpt4-1.4.1-GGUF/resolve/main/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf
Resolving huggingface.co (huggingface.co)... 3.163.189.74, 3.163.189.90, 3.163.189.37, ...
Connecting to huggingface.co (huggingface.co)|3.163.189.74|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.hf.co/repos/e0/3e/e03e7579e79bf744415138d47bbd36e5e98975592cd72e35eea09de76be65b3a/6ed7d9b02895128854b4a9a5e78670880512e64e84163ea86858b95cc16e95c9?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf%3B+filename%3D%22airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf%22%3B&Expires=1729850845&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcyOTg1MDg0NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy9lMC8zZS9lMDNlNzU3OWU3OWJmNzQ0NDE1MTM4ZDQ3YmJkMzZlNWU5ODk3NTU5MmNkNzJlMzVlZWEwOWRlNzZiZTY1YjNhLzZlZDdkOWIwMjg5NTEyODg1NGI0Y
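A failed download can leave a missing file or an HTML error page at the model path. As a quick sanity check, the sketch below tests whether the file starts with the four-byte `GGUF` magic that GGUF files begin with. The helper name `looks_like_gguf` is hypothetical, and the check does not detect a download that was cut off partway through.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes."""
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        return handle.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    model = "llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf"
    status = "looks valid" if looks_like_gguf(model) else "is missing or not a GGUF file"
    print(model, status)
```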



### Enable building with CUDA

In [5]:
%env CMAKE_ARGS=-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DLLAMA_CUBLAS=ON
%env FORCE_CMAKE=1

env: CMAKE_ARGS=-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DLLAMA_CUBLAS=ON
env: FORCE_CMAKE=1
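The `%env` lines above assume that `nvcc` lives at `/usr/local/cuda/bin/nvcc`. The sketch below is an alternative way to set the same two variables from Python, but only when that compiler is actually present. The helper `cuda_build_env` is hypothetical, and the flags are copied verbatim from the cell above.

```python
import os


def cuda_build_env(nvcc_path="/usr/local/cuda/bin/nvcc"):
    """Return the build variables used above, or an empty dict if nvcc is not found."""
    if not (os.path.isfile(nvcc_path) and os.access(nvcc_path, os.X_OK)):
        return {}
    return {
        "CMAKE_ARGS": f"-DCMAKE_CUDA_COMPILER={nvcc_path} -DLLAMA_CUBLAS=ON",
        "FORCE_CMAKE": "1",
    }


if __name__ == "__main__":
    env = cuda_build_env()
    if env:
        # Later `!pip install` calls in the notebook inherit these variables.
        os.environ.update(env)
    else:
        print("nvcc not found at the expected path; CUDA build flags were not set")
```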


In [6]:

# Install torch and torchvision
#
# !pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

In [7]:
# !pip install --no-cache-dir git+https://github.com/snexus/llm-search
!pip install pyllmsearch

Collecting pyllmsearch
  Downloading pyllmsearch-0.8.0-py3-none-any.whl.metadata (7.4 kB)
Collecting llama-cpp-python==0.2.76 (from pyllmsearch)
  Downloading llama_cpp_python-0.2.76.tar.gz (49.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.4/49.4 MB 14.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting chromadb~=0.5.5 (from pyllmsearch)
  Downloading chromadb-0.5.15-py3-none-any.whl.metadata (6.8 kB)
Collecting langchain~=0.2.14 (from pyllmsearch)
  Downloading langchain-0.2.16-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-community~=0.2.12 (from pyllmsearch)
  Downloading langchain_community-0.2.17-py3-none-any.whl.metadata (2.7 kB)
Collecting langchain-openai~=0.1.22 (from pyllmsearch)
  Downloading langchain_openai-0.
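After the installation finishes, the resolved package versions can be checked without scrolling through the pip log. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list simply repeats names that appear in the pip output above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["pyllmsearch", "llama-cpp-python", "chromadb", "langchain"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```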

In [9]:
! llmsearch index create -c llm/config/config.yaml

2024-10-22 10:20:07.904828: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-22 10:20:07.928387: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-22 10:20:07.934257: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-22 10:20:07.948372: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-22 10:20:11.751 | INFO     |

In [10]:
%%shell

llmsearch interact llm -c llm/config/config.yaml -m llm/config/model.yaml

2024-10-22 10:23:56.024551: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-22 10:23:56.044209: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-22 10:23:56.050749: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-22 10:23:56.065313: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-22 10:23:59.508 | INFO     |

CalledProcessError: Command '
llmsearch interact llm -c llm/config/config.yaml -m llm/config/model.yaml
' returned non-zero exit status 1.