Wassim edited this page Aug 14, 2023 · 13 revisions

Introduction

This page presents the scope of the 'VectorWisdom' organization. 'VectorWisdom' is a non-profit research organization that focuses on the way up the pyramid of data, information, knowledge and wisdom. This is pursued through machine learning, and more specifically through Large Language Models (LLMs); their unit elements, the embeddings, and the transformers that generate them; the vector databases that store the embeddings; and semantic search, which can reference its sources.

Large Language Models

A Large Language Model is a neural network with up to millions or billions of parameters (neuron weights) that targets natural language processing applications. It can have different architectures, such as GPT (Generative Pre-trained Transformer) or BERT (Bidirectional Encoder Representations from Transformers).

Hugging Face is a community-driven website that collects state-of-the-art datasets and models, and also provides a machine learning framework for building, training and deploying the hosted datasets and models.

LangChain is a framework that abstracts other libraries to speed up the development of LLM apps.

Models

Chat libraries

Embeddings

Tokens are words or parts of words. A token can be transformed into a long list of numbers representing features or meanings; these numbers are structured in a vector called an embedding.
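The token-to-vector idea can be sketched as follows. This is a toy, hash-based stand-in for a learned model: `toy_embedding` is a hypothetical helper invented here for illustration, not a real library function; real transformers learn these vectors from data.

```python
import hashlib
import numpy as np

def toy_embedding(token: str, dim: int = 8) -> np.ndarray:
    """Map a token to a deterministic pseudo-random vector.

    A real transformer learns these numbers during training;
    this hash-based stand-in only illustrates the
    token -> vector (embedding) idea.
    """
    # Derive a stable seed from the token so the same token
    # always maps to the same vector across runs.
    seed = int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)  # unit length, like many model outputs

emb = toy_embedding("vector")
print(emb.shape)  # (8,)
```

The only property the sketch preserves is determinism: the same token always yields the same vector, which is what lets embeddings be stored and compared later.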

Transformers

A transformer is a trained model that can generate embeddings.

Image Transformers

  • CNN models: VGG, ResNet, Inception
  • OpenCV feature descriptors: SIFT, SURF, ORB
  • FastText (word embeddings rather than images)
  • ImageNet (a dataset commonly used to pre-train image models)
  • Hugging Face transformers (pre-trained models): ViT, DeiT

Vector databases

A vector database is a storage system for vectors that is optimized for embeddings (dense vectors). It can scale to a large number of embeddings.
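A minimal in-memory sketch of what a vector database does: store embeddings under ids and answer nearest-neighbour queries by cosine similarity. `TinyVectorStore` is a hypothetical class for illustration; real systems (e.g. FAISS, Milvus, Weaviate) add indexing structures to make this scale.

```python
import numpy as np

class TinyVectorStore:
    """Toy vector store: add (id, embedding) pairs, query by
    cosine similarity. Illustration only -- no indexing, so every
    query scans all stored vectors."""

    def __init__(self):
        self.ids = []
        self.vecs = []

    def add(self, id_, vec):
        self.ids.append(id_)
        self.vecs.append(np.asarray(vec, dtype=float))

    def query(self, vec, k=1):
        m = np.stack(self.vecs)                  # (n, dim) matrix
        q = np.asarray(vec, dtype=float)
        sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
        top = np.argsort(-sims)[:k]              # highest similarity first
        return [(self.ids[i], float(sims[i])) for i in top]

store = TinyVectorStore()
store.add("doc1", [1.0, 0.0])
store.add("doc2", [0.0, 1.0])
print(store.query([0.9, 0.1]))  # doc1 ranks first
```

A production database replaces the linear scan with an approximate nearest-neighbour index (e.g. HNSW or IVF), trading a little accuracy for query speed at scale.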

Semantic Search

Semantic search is a search that tries to 'understand' the user query and match it with meaningful results. It can be seen as an extension of text search that includes similarity and meaning in the match, as opposed to full-text search, which only finds exact character matches. Semantic search can convert word tokens into embeddings for fuzzy and synonym search, which makes it rely on the same principles as LLMs.
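The contrast with full-text search can be shown in a few lines. The embeddings below are hand-assigned, hypothetical vectors: in a real system a transformer model would produce them; here synonyms are simply given nearby vectors to illustrate the principle.

```python
import numpy as np

# Hypothetical, hand-assigned embeddings -- a real system would
# obtain these from a transformer model. Synonyms are deliberately
# placed close together in the vector space.
EMB = {
    "car":        np.array([0.90, 0.10]),
    "automobile": np.array([0.85, 0.15]),
    "banana":     np.array([0.10, 0.90]),
}

def full_text(query, docs):
    """Exact character match, as in classic full-text search."""
    return [d for d in docs if query in d]

def semantic(query, docs):
    """Return the document whose embedding is closest (cosine) to the query."""
    q = EMB[query]
    def sim(d):
        v = EMB[d]
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return max(docs, key=sim)

docs = ["automobile", "banana"]
print(full_text("car", docs))  # [] -- no exact character match
print(semantic("car", docs))   # 'automobile' -- nearest in meaning
```

Full-text search misses 'automobile' entirely because no characters match 'car'; the embedding comparison finds it because the two words sit close together in the vector space.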

Comparisons

Analysis

LLMs and search

  • Transformers can be used both for instant search and for providing context input to LLMs; they offer easy integrations with ML ecosystems such as OpenAI and Hugging Face.
  • LLMs can support a generative search, which pipes search results through LLM models.
  • An LLM holds all of its data within its internal parameters and does not know where that data came from; it is therefore not well suited for search, but rather for providing answers (falling from the sky).
  • A semantic search engine has the task of connecting a user query with a known reference that the user can open for further consultation.
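The generative-search idea above (piping search results through an LLM) can be sketched as a two-step pipeline. Everything here is a hypothetical stand-in: the retrieval is a naive keyword filter, and `llm` is a placeholder for any completion function (OpenAI, Hugging Face, ...).

```python
def generative_search(query, documents, llm):
    """Sketch of generative search: retrieve relevant documents,
    then pipe them through an LLM as context. `llm` is a stand-in
    for any text-completion callable."""
    # Naive retrieval: keep documents sharing a word (>3 chars) with the query.
    # A real pipeline would use embedding similarity instead.
    words = [w for w in query.lower().split() if len(w) > 3]
    hits = [d for d in documents if any(w in d.lower() for w in words)]
    context = "\n".join(hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)

# A dummy llm that just echoes the prompt, to show the plumbing:
answer = generative_search(
    "what is a vector database?",
    ["A vector database stores embeddings.", "Bananas are yellow."],
    llm=lambda prompt: prompt,
)
print("vector database stores embeddings" in answer.lower())  # True
```

Because the retrieved documents are passed explicitly, this pattern keeps the references a semantic search engine provides while still producing a generated answer.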

Preferring semantic search over an LLM prompt is a matter of result presentation and use case.