----------------------
# Open vs Closed Models


![Open vs Closed](./data/open-v-closed.jpg)


Proprietary models are still the top-performing general-purpose LLMs. These models are trained on vast amounts of data at considerable expense. Some companies have released smaller versions of their models under various open frameworks and licenses.


## Closed/Proprietary LLMs

Closed-source LLMs keep their internal workings under wraps. These models are typically created by well-funded companies that can pour resources into development and ongoing refinement.

* Limited Access: You might need special permission or a paid subscription to use them.
* Customization: Fine-tuning pre-trained models and adjusting parameters may be your only customization options, as the underlying code and architecture are often inaccessible.
* Support: The companies that develop these models often provide vendor support, which can be helpful if you lack in-house expertise.
* Proprietary Licensing: Expect specific terms and conditions for using the LLM, as the ownership remains with the developer.

e.g. OpenAI ChatGPT, Google Gemini, Anthropic Claude, Cohere Command


## Open LLMs

Most of the top-performing open models are derived from closed models or from much larger models. Open models are freely available to the public, which fosters innovation, collaboration, and community-driven development.

* Customization: Open access to the model's inner workings allows for deep customization and innovation, tailoring the LLM to specific needs with techniques such as:
  * Quantisation
  * LoRA
* Community-Driven Support: A large and active developer community provides assistance, fostering collaboration and knowledge sharing.
* Licensing: Depending on the chosen open-source license, commercial or research use might require specific actions.
* Transparency builds trust: Open development helps foster trust and reliability.
* Accessible base models (LLaMA, Mistral): Several high-quality, open base models are readily available, ranging from smaller (3B-13B parameters) to larger options (30B-70B parameters). While these may not be the absolute biggest models, they offer a good balance of power and accessibility.
* Rapid iteration: Open models are going through very quick iterations (every 1-2 weeks).

e.g. LLaMA family, Mistral, DBRX
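Quantisation, listed above as one of the customizations that open weights make possible, can be illustrated with a minimal sketch. It assumes the simplest symmetric "absmax" int8 scheme with a single scale for the whole tensor; real toolchains typically quantise per block or per channel and use more elaborate formats. The function names and sample weights are hypothetical and chosen only for illustration.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using one shared absmax scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone (no activations or KV cache)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical weights, for illustration only.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
quantized, scale = quantize_absmax_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(quantized)
print(f"largest round-trip error: {max_error:.5f}")
print(f"7B weights at 16 bits: {weight_memory_gb(7e9, 16):.1f} GB")
print(f"7B weights at 4 bits:  {weight_memory_gb(7e9, 4):.1f} GB")
```

The round-trip error is bounded by half the scale, which is the price paid for the smaller memory footprint that makes 7B-class models practical on consumer hardware.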



### Open source vs open weight

The distinction between open source and open weights is a critical one for large language models (LLMs). Releasing a model's weights lets others use the model for certain tasks, but it doesn't provide full transparency or allow the model to be retrained from scratch. Here's a breakdown of the two approaches:

**Open Source:**

True open-source release involves sharing everything necessary to rebuild and train the model from the ground up. This includes:
* Model architecture code: The blueprint for the LLM's structure and organization.
* Training methodology and hyperparameters: The specific steps and settings used to train the model.
* Original training dataset: The data the model was trained on, which can significantly impact its performance and biases.
* Documentation and other relevant details: Any additional information to aid in understanding and using the model.

Open-sourcing an LLM offers several advantages:
* Transparency: Anyone can inspect the inner workings of the model, fostering trust and reliability.
* Customization: Users can tailor the model to their specific needs by modifying the code and training data.
* Collaboration: An open-source LLM fosters a community of developers who can contribute to its improvement.

However, releasing an LLM as open source requires a substantial commitment from the developers, as it necessitates making proprietary information public.


**Open Weights:**

Releasing only the model's weights is a less comprehensive form of openness. The weights are the learned parameters of the neural network that determine the model's behavior. Releasing the weights allows others to:
* Use the model for inference: Apply the model to new data to generate predictions.
* Fine-tune the model: Adjust the model's behavior for a specific task by training it on additional data.

Crucially, it doesn't allow for:
* Retraining the model from scratch: Users cannot rebuild the model with different data or training settings.
* Understanding how the model works: The inner workings of the model remain opaque without access to the training code and dataset.
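Fine-tuning released weights is often done with low-rank adapters such as LoRA, mentioned earlier. The sketch below is a toy illustration rather than any library's implementation: it assumes the usual LoRA formulation, where the frozen weight matrix `W` is left untouched and a trainable low-rank product `B @ A` scaled by `alpha / r` is added to its output. The helper names and the 4096 x 4096 layer size are assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, r):
    """A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * (d_in + d_out)


# Toy layer: 3 inputs, 2 outputs, rank 1.
W = [[1, 0, 0], [0, 1, 0]]   # frozen pretrained weights
A = [[1, 1, 1]]              # shape (r, d_in)
B_init = [[0.0], [0.0]]      # B starts at zero, so the adapter is a no-op
B_tuned = [[0.5], [0.0]]     # hypothetical values after some training
x = [1, 2, 3]

before = lora_forward(W, A, B_init, x, alpha=2, r=1)
after = lora_forward(W, A, B_tuned, x, alpha=2, r=1)
print(before, after)

# Parameter arithmetic for a hypothetical 4096 x 4096 projection with rank 8.
full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, 8)
print(f"{adapter} of {full} parameters trained ({100 * adapter / full:.2f}%)")
```

With these assumed sizes the adapter trains well under 1% of the layer's parameters, which is why this style of fine-tuning is feasible on modest hardware.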



![Open vs closed model Elo](./data/arena_elo.jpg)



-----------------
References:

* Leaked Google document: [We Have No Moat, And Neither Does OpenAI](https://www.semianalysis.com/p/google-we-have-no-moat-and-neither)
* https://promptengineering.org/llm-open-source-vs-open-weights-vs-restricted-weights/




-----------------------
# Popular open models


## LLaMA Family

* LLaMA (Large Language Model Meta AI) is a family of LLMs released by [Meta](https://ai.meta.com/blog/meta-llama-3/). The original model was released to researchers under a non-commercial license in Feb '23; however, the model weights were leaked shortly afterwards (Mar '23).
* A large number of researchers have extended LLaMA models through either instruction tuning or continual pretraining.
  * Instruction tuning LLaMA has become a major approach to developing customized or specialized models, due to its relatively low computational cost.
* LLaMA is available on the HuggingFace [Meta profile](https://huggingface.co/meta-llama).
* The [Purple Llama](https://ai.meta.com/blog/purple-llama-open-trust-safety-generative-ai) project covers tools and evaluations to help developers build responsibly with open generative AI models.
* LLaMA 3 was released in Apr '24.



![LLaMA](./data/llama.png)


* Stanford researchers created [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) by fine-tuning the LLaMA-7B model on the [Alpaca-52K](https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json) instruction-following data.
* Alpaca outperforms the base LLaMA model at following instructions and cost only about \$500 to train.
* [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) is a LLaMA model fine-tuned on 70K user-shared ChatGPT conversations. Training Vicuna-13B cost around \$300.




---------------------
## Mistral

![Mistral](./data/mistral.png)


* Mistral provides two open-weight models that can be used without restrictions (Apache 2.0 license)
  * Mistral-7B
  * Mixtral-8x22B, a mixture-of-experts (MoE) model
* Mistral does not disclose how the models were trained: neither the training recipe nor how the data was collected
* Multiple language support in models
* Focus on developer experience
* Weights available on HuggingFace [Mistral AI profile](https://huggingface.co/mistralai)


-----------------------
## DBRX

* DBRX is a general purpose LLM created by [Databricks](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm)
* Mixture-of-experts (MoE) architecture built from a large number of smaller experts: DBRX has 16 experts and routes each token to 4 of them
* Weights available on HuggingFace [Databricks profile](https://huggingface.co/databricks)
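The "16 experts, chooses 4" routing can be sketched in a few lines. This is a toy illustration of top-k mixture-of-experts routing in general, not DBRX's actual implementation: the router logits are made up, the experts are stand-in scalar functions, and normalising the gate weights with a softmax over only the selected experts is one common choice that real models may not share.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest logits."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_output(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts only."""
    return sum(gate * experts[i](x) for i, gate in route_top_k(router_logits, k))


# 16 stand-in experts: expert i simply multiplies its input by i.
experts = [lambda x, factor=i: x * factor for i in range(16)]

# Hypothetical router logits for one token.
logits = [0.0] * 16
logits[3], logits[7], logits[11], logits[0] = 2.0, 1.5, 1.0, 0.5

routes = route_top_k(logits, k=4)
print([index for index, _ in routes])
print(moe_output(1.0, experts, logits, k=4))

# Number of distinct expert subsets: 16-choose-4 versus 8-choose-2.
print(math.comb(16, 4), math.comb(8, 2))
```

Only the selected experts run for a given token, so compute per token stays well below what the total parameter count suggests, while the larger number of possible expert combinations gives the router more ways to specialise.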


-----------------------
## Other open models


### Google

* [Gemma](https://ai.google.dev/gemma) is a family of lightweight open models from Google


### Microsoft

* Microsoft [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) family of small models trained on curated data


### Apple

* Apple [OpenELM](https://huggingface.co/apple/OpenELM) family of models released April '24
* These are small language models intended for embedding in devices
* Pay attention to the license


### Snowflake

* Snowflake released a massive 480B parameter open model called [Arctic](https://huggingface.co/Snowflake/snowflake-arctic-instruct) in April '24
* Released under the Apache 2.0 license

  
### Grok (xAI)

* Open weights
* Available on HuggingFace [xAI profile](https://huggingface.co/xai-org)


### Alibaba 

* [Qwen](https://github.com/QwenLM/Qwen) is a family of models from Alibaba




-----------------------
# Running LLMs locally

  
## Ollama

* [Ollama](https://ollama.com/) 
  * Simple and fast
  * Large range of [models](https://ollama.com/library) available
  * ```ollama pull llama3```
  * ```ollama run llama3```
  * [Modelfile](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)
    * ```ollama create my-model-name -f my-model-file```
    * ```ollama run my-model-name```
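Besides the CLI, a running Ollama instance can be called over HTTP. The sketch below assumes Ollama is serving on its default local address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming disabled; the helper names, prompt, and timeout are hypothetical and only the standard library is used.

```python
import json
import urllib.request

# Assumption: Ollama is running locally with its default host and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt):
    """Build a POST request asking for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the model already pulled (`ollama pull llama3`), calling `generate("llama3", "Why is the sky blue?")` should return the completion as a string; if the server is not running, `urlopen` raises `urllib.error.URLError`.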
     

-----------------------
## GPT4All

* [GPT4All](https://gpt4all.io/)
  * Can add a local document collection, enabling chat with your private data
  * Limited range of models
   

-----------------------
## llama.cpp

* [llama.cpp](https://github.com/ggerganov/llama.cpp)
  * Open-source implementation of inference for Meta's LLaMA architecture in C/C++
  * Minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud
  * Used as the inference engine in tools like Ollama
  * Uses the **GGUF** format, a modern and efficient model format that bundles all the information needed to load and run a model
  * Supports most popular open weight models including Llama, Mistral, Gemma
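To make the GGUF point concrete, here is a minimal sketch that reads the fixed-size header at the start of a GGUF file. It assumes GGUF version 2 or later, where the header is little-endian and consists of the 4-byte magic `GGUF`, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count. The function name and the synthetic header bytes are hypothetical; parsing the metadata and tensor descriptors that follow is out of scope.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FORMAT = "<4sIQQ"  # magic, version, tensor count, metadata kv count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def read_gguf_header(data):
    """Parse the first bytes of a GGUF file (assumes version 2 or later)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic bytes")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration; the counts are made up.
fake_header = struct.pack(HEADER_FORMAT, GGUF_MAGIC, 3, 291, 24)
info = read_gguf_header(fake_header)
print(info)
```

For a real file, something like `read_gguf_header(open(path, "rb").read(HEADER_SIZE))` would report how many tensors and metadata entries the file bundles, where `path` points to a downloaded `.gguf` model.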
   

----------------------
## Transformers.js - Run transformers in the browser

Hugging Face [Transformers.js](https://huggingface.co/docs/transformers.js/en/index) allows you to run ML/transformer models directly in your browser using Microsoft's [ONNX](https://onnxruntime.ai/) Runtime:
* No server compute
* Privacy: no data is sent to servers
* Rich JS tools to interact with the browser
* [WebGPU](https://caniuse.com/webgpu) support for faster inference


---------------------------
## Other alternatives - Groq

* Specialised chips for running GenAI inference, available through [Groq cloud](https://console.groq.com/playground)
* Groq positions itself as a challenger to Nvidia with its Language Processing Unit (LPU) chips as demand for specialized computer chips has skyrocketed
* https://simonwillison.net/2024/Apr/22/llama-3/



---------------------
## Hugging Face

* [Hugging Face](https://huggingface.co/) is the GitHub of machine learning
  * Models
  * **Datasets** e.g. [Banking77](https://huggingface.co/datasets/PolyAI/banking77)
  * Spaces: Build and publish ML applications
* If interested in building apps using Hugging Face models, learn the [Transformers Library](https://huggingface.co/docs/transformers/index)
  * Open-source implementations of transformer models for text, image, and audio tasks
  * [Transformers.js](https://huggingface.co/docs/transformers.js/index) runs transformers/ML directly in your browser, eliminating the need for a server.



------------------------
# LLM Leaderboard

* HuggingFace [leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
* LMSYS Chatbot Arena [leaderboard](https://chat.lmsys.org/?leaderboard)
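Arena-style leaderboards rank models from pairwise human votes and report Elo-like ratings. The sketch below shows the classic Elo update as one way such ratings can be computed; it is an illustration under stated assumptions (starting rating 1000, K-factor 32, made-up model names and votes) and not a description of how any particular leaderboard calculates its scores.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return new ratings; score_a is 1 for an A win, 0 for a loss, 0.5 for a tie."""
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, for illustration only.
ratings = {"model-x": 1000.0, "model-y": 1000.0}
votes = [("model-x", "model-y", 1), ("model-x", "model-y", 1), ("model-x", "model-y", 0)]

for a, b, score_a in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

print({name: round(value, 1) for name, value in ratings.items()})
```

Because each update moves the two ratings by equal and opposite amounts, the total rating in the pool stays constant and only the gaps between models carry meaning.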



## Model Benchmarks

* [AI2 reasoning challenge](https://allenai.org/data/arc) (ARC) - grade school science questions
* [HellaSwag](https://rowanzellers.com/hellaswag/) - common sense
* [MMLU](https://github.com/hendrycks/test) (Massive Multitask Language Understanding) - how diverse LLM knowledge is
* [TruthfulQA](https://github.com/sylinrl/TruthfulQA) - how truthful is a model
* [WinoGrande](https://winogrande.allenai.org/) - commonsense reasoning
* [GSM8K](https://paperswithcode.com/dataset/gsm8k) - maths reasoning
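Most of these benchmarks reduce to comparing a model's answer with a reference and reporting accuracy. As a rough sketch for a GSM8K-style maths task, the code below assumes reference solutions end with a final line of the form `#### <number>` and that the model has been prompted to finish its answer the same way. The helper names and sample strings are hypothetical, and real evaluation harnesses handle many more edge cases.

```python
import re

FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text):
    """Return the number following '####' as a float, or None if absent."""
    match = FINAL_ANSWER.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference's."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_final_answer(prediction)
        expected = extract_final_answer(reference)
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(references)


# Made-up examples, for illustration only.
references = ["48 + 24 = 72\n#### 72", "Total is 1,200\n#### 1,200", "#### 5"]
predictions = ["She sold 72 clips.\n#### 72", "#### 1200", "I think it is six"]

accuracy = exact_match_accuracy(predictions, references)
print(f"accuracy: {accuracy:.2f}")
```

Comparing parsed numbers rather than raw strings keeps formatting differences such as thousands separators from being counted as errors.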


---------------------
# References


## Survey Papers - arxiv.org
* [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf)
* [Large Language Models: A Survey](https://arxiv.org/pdf/2402.06196.pdf)


## HuggingFace

* Hugging Face [EPAM profile](https://huggingface.co/epam)
* HuggingFace [course](https://www.youtube.com/watch?v=00GKzGyWFEs&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o) - Technical


## LangChain

* LangChain support for [open models](https://python.langchain.com/docs/integrations/llms/)


## Learnings

* DeepLearning.AI [short courses](https://www.deeplearning.ai/short-courses/)
* [Understanding Deep Learning](https://udlbook.github.io/udlbook/)
  
