
 This example shows how to get started using the ModelCatalog to instantiate models and start running inferences.<br>
    A core design principle of LLMWare is that all models should be accessible in the same way, with the<br>
    underlying configuration details handled at lower levels of the implementation.   Our aspiration is that you<br>
    should be able to substitute models in any pipeline/workflow at any point with minimal, if any, code change -<br>
    and to be able to deploy heterogeneous combinations of local, private network and API-based models,<br>
    with mix-and-match and full 'swap-ability' at any time.<br>
    While LLMWare supports OpenAI and other API-based models, we are "open-source-first" in our view and<br>
    support of models, and all of our examples and classes/methods are optimized first to work with small, specialized<br>
    open source models.   Our view is that essentially anything that can be done with large API models can be<br>
    replicated with smaller fine-tuned models and well-designed data pipelines and workflows.<br>
    We provide more than 50 LLMWare models, all tested and designed for easy integration<br>
    into LLMWare pipelines and workflows, but we are 100% committed to providing an open ecosystem and supporting<br>
    the best in open source, including leading embedding models from Jina and Nomic, Llama-2, Llama-3, Mistral,<br>
    Phi-3, Yi, DeciLM, StableLM, OpenHermes, TinyLlama, RedPajama, Pythia, Sheared LLaMA, Hugging Face Zephyr,<br>
    SentenceTransformers and quantizations from TheBloke and Bartowski.  We also provide full integration with GGUF,<br>
    llama.cpp and whisper.cpp.<br>
    It is easy to extend the ModelCatalog to include any of your favorite models from<br>
    Hugging Face, Ollama, LM Studio, llama.cpp, or llama-cpp-python.<br>
    For open source models, we generally support both PyTorch (Hugging Face) based models and GGUF (llama.cpp)<br>
    quantized models.  For most use cases, we find that GGUF quantized versions are faster, take less memory<br>
    and are generally as accurate, so we use GGUF-based models, aka "tools", in many of our examples.<br>
  


In [1]:
from llmware.models import ModelCatalog

In [3]:
all_models = ModelCatalog().list_all_models()

  ~133 models in the catalog as of May 4, 2024 - and growing all the time!

In [5]:
print("\nAll Models")


All Models


In [7]:
for i, model in enumerate(all_models):
    print("models: ", i, model)

models:  0 {'model_name': 'bert', 'display_name': 'Bert', 'model_family': 'BaseModel', 'model_category': 'base', 'model_location': 'llmware_repo'}
models:  1 {'model_name': 'roberta', 'display_name': 'Roberta', 'model_family': 'BaseModel', 'model_category': 'base', 'model_location': 'llmware_repo'}
models:  2 {'model_name': 'gpt2', 'display_name': 'GPT-2', 'model_family': 'BaseModel', 'model_category': 'base', 'model_location': 'llmware_repo'}
models:  3 {'model_name': 'all-MiniLM-L6-v2', 'display_name': 'mini-lm-sbert', 'model_family': 'HFEmbeddingModel', 'model_category': 'embedding', 'model_location': 'hf_repo', 'embedding_dims': 384, 'context_window': 512, 'link': 'https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2', 'custom_model_files': [], 'custom_model_repo': '', 'hf_repo': 'sentence-transformers/all-MiniLM-L6-v2'}
models:  4 {'model_name': 'all-mpnet-base-v2', 'display_name': 'mpnet-base', 'model_family': 'HFEmbeddingModel', 'model_category': 'embedding', 'model_loc
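
Because each catalog entry is a plain dictionary, the list can be sliced with ordinary Python. The sketch below is illustrative only: the helper names are hypothetical, and the sample entries are the first four records from the output above, trimmed to three keys. With the full catalog you would pass `all_models` instead of `sample_models`.

```python
from collections import Counter

# First four entries from the catalog output above, trimmed to three keys.
sample_models = [
    {"model_name": "bert", "model_family": "BaseModel", "model_category": "base"},
    {"model_name": "roberta", "model_family": "BaseModel", "model_category": "base"},
    {"model_name": "gpt2", "model_family": "BaseModel", "model_category": "base"},
    {"model_name": "all-MiniLM-L6-v2", "model_family": "HFEmbeddingModel",
     "model_category": "embedding"},
]


def count_by_category(models):
    """Count catalog entries per 'model_category' (hypothetical helper)."""
    return Counter(m.get("model_category", "unknown") for m in models)


def names_in_category(models, category):
    """Return the model_name of every entry in the given category."""
    return [m["model_name"] for m in models if m.get("model_category") == category]


category_counts = count_by_category(sample_models)
embedding_names = names_in_category(sample_models, "embedding")
print(category_counts)
print(embedding_names)
```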

  ~30 embedding models - including Nomic, Jina, leading sentence transformers, llmware industry-bert models

In [9]:
embedding_models = ModelCatalog().list_embedding_models()


print("\nEmbedding Models")
for i, model in enumerate(embedding_models):
    print("embedding models: ", i, model)


  ~92 open source models

In [11]:
open_source_models = ModelCatalog().list_open_source_models()


print("\nOpen Source Models")
for i, model in enumerate(open_source_models):
    print("open source models: ", i, model)


  Inference is Easy - same two lines every time<br>
  1.  load_model - use the model_name or display_name, and it will be looked up and instantiated<br>
  2.  inference -  all models support an inference method - call it to run a basic inference

In [13]:
model_name = "slim-tags-3b-tool"
model = ModelCatalog().load_model(model_name)
# Korean prompt; in English: "On a rainy evening, what would be good for dinner?"
response = model.inference("비오는 저녁, 저녁 메뉴는 뭐가 좋을까?")



In [15]:
import pandas as pd

model_name = "phi-3-gguf"
model = ModelCatalog().load_model(model_name)
# Korean prompt; in English: "What poem suits a rainy day?"
response = model.inference("비오는 날 어울리는 시는?")

df = pd.DataFrame(list(response.items()), columns=['Key', 'Value'])
print(df)

            Key                                              Value
0  llm_response   비오는 날에는 일부 한국의 가장 유명한 시인들이나 전통적인 시문록을 사용하는 것이...
1         usage  {'input': 34, 'output': 100, 'total': 134, 'me...


In [17]:
print(f"\ntest inference - {model_name} - response: ", response)


test inference - phi-3-gguf - response:  {'llm_response': ' 비오는 날에는 일부 한국의 가장 유명한 시인들이나 전통적인 시문록을 사용하는 것이 일반적입니다. 예를 들어, 후보선인 "비를 내리는 날"라고', 'usage': {'input': 34, 'output': 100, 'total': 134, 'metric': 'tokens', 'processing_time': 2.8126730918884277}}
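
The response shown above is a dictionary with an `llm_response` string and a nested `usage` dictionary. As a rough sketch of how to work with it, the hypothetical helper below pulls out the text and derives a throughput figure. The `usage` values are copied from the recorded output above; the generated text is replaced by a placeholder, and the key names are assumed to match that output.

```python
def summarize_response(response):
    """Extract the text and basic usage metrics from an inference response.

    Hypothetical helper; assumes the 'llm_response' and 'usage' keys seen in
    the recorded output above.
    """
    usage = response.get("usage", {})
    seconds = usage.get("processing_time")
    output_tokens = usage.get("output")
    # Only compute a rate when both values are present and time is non-zero.
    rate = output_tokens / seconds if seconds and output_tokens is not None else None
    return {
        "text": response.get("llm_response", "").strip(),
        "total_tokens": usage.get("total"),
        "output_tokens_per_second": rate,
    }


# Usage numbers copied from the phi-3-gguf run above; the text is a placeholder.
recorded = {
    "llm_response": " (generated text elided) ",
    "usage": {"input": 34, "output": 100, "total": 134, "metric": "tokens",
              "processing_time": 2.8126730918884277},
}

summary = summarize_response(recorded)
print(summary)
```

Throughput depends heavily on hardware and quantization, so treat the derived rate as a local measurement rather than a benchmark.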


  Check out other examples in Models, Prompts, SLIM-Agents, Embeddings, and Use_Cases<br>
  -- configuring a model at load time - temperature, sample, max_output<br>
  -- add models to the Catalog for easy invocation<br>
  -- integrating sources and building RAG workflows in Prompts<br>
  -- fact-checking and post-processing<br>
  -- using function-calls on small specialized models<br>
  -- agent based workflows<br>
  -- installing embeddings on a library collection