# Introduction
You have two main options: hosting models on your own servers or calling a hosted API such as OpenAI's. Local hosting gives you control but brings hardware and maintenance costs, and some open models are not licensed for commercial use. The right choice depends on budget, expertise, and data sensitivity. Models also vary in size and capability: GPT-3 Ada is small and fast and best suited to simple tasks, while GPT-4 is larger, slower, and more versatile.

# Popular LLMs Available Through LangChain

### Cohere
- Command: For dialogue-like interactions.
- Generation (base): For generative tasks.
- Summarize (summarize-xlarge): For generating summaries.

Cohere is free to use, with rate limits, while you are learning and prototyping. Usage remains free until you go into production, but production usage might be more expensive than the OpenAI APIs (e.g., $2.5 for 1K tokens). Cohere also provides customized models for specific tasks, which can lead to better results. LangChain's `Cohere` class simplifies access to these models with the syntax `Cohere(model="<MODEL_NAME>", cohere_api_key="<API_KEY>")`.
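
As a rough illustration of the constructor syntax above, the sketch below maps each task to a model name and builds the keyword arguments for LangChain's `Cohere` class. The lowercase identifiers `command` and `base`, the helper names, and the `COHERE_API_KEY` environment variable are assumptions made for illustration; only `summarize-xlarge` and the two keyword names come from the text. The LangChain import is deferred so the mapping can be inspected without the package installed.

```python
import os

# Task -> model name. "summarize-xlarge" appears in the text above; the
# lowercase "command" and "base" identifiers are assumed for illustration.
COHERE_MODELS = {
    "dialogue": "command",
    "generation": "base",
    "summarization": "summarize-xlarge",
}


def cohere_kwargs(task, api_key):
    """Build the keyword arguments for Cohere(model=..., cohere_api_key=...)."""
    if task not in COHERE_MODELS:
        raise ValueError(f"unknown task: {task!r}")
    return {"model": COHERE_MODELS[task], "cohere_api_key": api_key}


def make_cohere_llm(task):
    """Create the LangChain wrapper (needs the langchain and cohere packages)."""
    # Legacy import path (assumed); newer LangChain releases may move this class.
    from langchain.llms import Cohere

    return Cohere(**cohere_kwargs(task, os.environ["COHERE_API_KEY"]))
```

Keeping the task-to-model mapping in one dictionary makes it easy to swap a model name without touching the code that constructs the wrapper.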

### GPT-3.5 Turbo - Key Points

- **Model**: GPT-3.5 is a language model developed by OpenAI.
- **Turbo Version**: The turbo version is recommended for affordability and human-like text generation.
- **API Access**: Accessible through OpenAI API endpoints.
- **Optimized**: Optimized for chat applications and performs well on other generative tasks.
- **Language Support**: Capable of processing text in 96 languages.
- **Context Length**: Offers a context length of up to 16K tokens.
- **Cost-Effective**: Considered the most cost-effective option in the OpenAI collection.
- **Price**: Priced at $0.002 per 1000 tokens.
- **Access**: Pass `"gpt-3.5-turbo"` as the model name when initializing the `ChatOpenAI` or `OpenAI` classes.
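
To make the listed price concrete, here is a small sketch that converts a token count into an estimated cost. The function name `estimate_cost` is hypothetical, and the calculation assumes a single flat rate for all tokens, which is how the price is stated above.

```python
def estimate_cost(num_tokens, price_per_1k=0.002):
    """Return the estimated cost in dollars for num_tokens tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens / 1000 * price_per_1k


# Cost of filling the full 16K-token context once at the listed rate.
print(f"${estimate_cost(16_000):.4f}")  # $0.0320
```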

### GPT-4 - Key Points

- **Model**: GPT-4 is a competent multimodal model developed by OpenAI.
- **Parameters**: The number of parameters and training procedures are undisclosed.
- **Powerful**: Considered the latest and most powerful model published by OpenAI.
- **Multi-Modality**: Capable of processing both text and image inputs.
- **Accessibility**: Not publicly available; access requires submitting an early-access request on the OpenAI platform.
- **Variants**: Two variants are available: `gpt-4` and `gpt-4-32k`.
- **Context Length**: The variants differ in context length: 8,192 tokens for `gpt-4` and 32,768 tokens for `gpt-4-32k`.
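
One way to use the two context lengths is to pick the smallest variant that fits a request. The sketch below assumes the context window must hold both the prompt and the generated tokens; the helper name `pick_gpt4_variant` is hypothetical.

```python
# Context lengths in tokens, as listed above.
GPT4_CONTEXT = {"gpt-4": 8192, "gpt-4-32k": 32768}


def pick_gpt4_variant(prompt_tokens, max_new_tokens):
    """Return the smallest variant whose context fits prompt plus completion."""
    needed = prompt_tokens + max_new_tokens
    # Try variants from the smallest context window to the largest.
    for name, limit in sorted(GPT4_CONTEXT.items(), key=lambda item: item[1]):
        if needed <= limit:
            return name
    raise ValueError(f"{needed} tokens exceed every available context window")


print(pick_gpt4_variant(6000, 1000))  # gpt-4
print(pick_gpt4_variant(8000, 500))   # gpt-4-32k
```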

### AI21 Jurassic-2 - Key Points

- **Model**: AI21's Jurassic-2 is a language model available in three sizes: Jumbo, Grande, and Large.
- **Size and Pricing**: Each size has its own price point.
- **Powerful Jumbo Version**: AI21 describes the Jumbo version as its most powerful model.
- **Generative Capability**: Described as a general-purpose model with excellent performance on various generative tasks.
- **Multilingual Understanding**: The J2 model understands seven languages.
- **Fine-Tuning**: Capable of being fine-tuned on custom datasets.
- **Access and Integration**: Obtain API key from AI21 platform; access models using the AI21() class.

### StableLM Alpha - Key Points

- **Model**: StableLM Alpha is a language model developed by Stability AI, the company behind Stable Diffusion.
- **Access Methods**: Accessible via the Hugging Face Hub (ID: `stabilityai/stablelm-tuned-alpha-3b`) for local hosting, or via the Replicate API.
- **API Pricing**: Replicate API pricing ranges from $0.0002 to $0.0023 per second.
- **Sizes**: Available in two sizes: 3 billion and 7 billion parameters.
- **License**: Weights for StableLM Alpha are available under CC BY-SA 4.0 license with commercial use access.
- **Context Length**: StableLM has a context length of 4096 tokens.


### Dolly-v2-12B - Key Points

- **Model**: Dolly-v2-12B is a language model created by Databricks.
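- **Access Methods**: Accessible via the Hugging Face Hub for local hosting (the ID `databricks/dolly-v2-3b` refers to the smaller 3B variant; the 12B model is `databricks/dolly-v2-12b`), or via the Replicate API.
- **API Pricing**: Replicate API pricing is in the same range as for StableLM Alpha ($0.0002 to $0.0023 per second).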
- **Parameters**: Dolly-v2-12B has 12 billion parameters.
- **License**: Available under an open source license for commercial use.
- **Base Model**: The base model used for Dolly-v2-12B is Pythia-12B.

### GPT4ALL - Key Points

- **Model**: GPT4ALL is based on Meta's LLaMA model with 7B parameters.
- **Developer**: Developed by Nomic-AI.
- **Access Methods**: Accessible through GPT4ALL and Hugging Face Local Pipelines.
- **License**: Published under a GPL 3.0 open-source license.
- **Commercial Use**: Not free for commercial applications.
- **Usage**: Available for researchers to use for projects and experiments.
- **Capability**: Detailed capability and usage process discussed in a previous lesson.


## LLM Platforms That Integrate with LangChain

### Cohere - Key Points

- **Company**: Cohere is a Canada-based startup specializing in natural language processing models.
- **Focus**: Their models are designed to help companies enhance human-machine interactions.
- **Model**: Provides access to the Cohere xlarge model through an API.
- **Parameters**: The model has 52 billion parameters.
- **API Pricing**: Pricing is based on embeddings and costs $1 for every 1000 embeddings.
- **Installation**: Access to the model requires installing Cohere's package, which has an easy-to-follow installation process.
- **Interaction**: Using LangChain, developers can easily interact with Cohere models.
- **Prompts**: Developers can create prompts with input variables to generate responses from the Cohere API.

### OpenAI - Key Points

- **Company**: OpenAI is one of the biggest companies focusing on large language models (LLMs).
- **Conversational Model**: They introduced ChatGPT, a conversational model that drew mainstream media attention for its capabilities.
- **API Endpoints**: OpenAI offers a diverse range of API endpoints for various natural language processing (NLP) tasks.
- **Variety**: Their API provides options for different NLP tasks with varying price points.
- **LangChain Integration**: The LangChain library offers multiple classes to conveniently access OpenAI's models.
- **Examples**: Previous lessons demonstrated accessing ChatGPT and GPT-4 through LangChain classes such as `ChatOpenAI` and `OpenAI`.

### Hugging Face Hub - Key Points

- **Company**: Hugging Face is a company specializing in natural language processing (NLP) technologies.
- **Pre-trained Models**: They develop and offer pre-trained language models and provide a platform for NLP model development and deployment.
- **Model and Dataset Hosting**: The platform hosts a vast collection of over 120k models and 20k datasets.
- **Spaces Service**: Hugging Face offers the Spaces service, allowing researchers and developers to quickly create demos and showcase model capabilities.
- **Large-Scale Models**: The platform hosts various large-scale models like StableLM, Dolly, and Camel.
- **HuggingFaceHub Class**: The HuggingFaceHub class facilitates the downloading and initialization of models.
- **Intel CPU Optimization**: Integration provides access to models optimized for Intel CPUs using the Intel® Extension for PyTorch library.
- **Enhanced Performance**: The optimization library leverages Intel®'s advanced architectural designs to enhance CPU and GPU performance.
- **Speed Enhancements**: Reports show significant speed improvements, such as a 3.8x speedup with the BLOOMZ model on Intel® Xeon® 4s CPU.
- **Efficiency Gains**: Integration with Intel® Xeon® CPU and optimization library leads to up to 6.5x inference speed increase.
- **Efficiency Examples**: Models like Whisper and GPT-J benefit from these efficiency gains.

### Amazon SageMakerEndpoint - Key Points

- **Infrastructure**: Amazon SageMaker enables users to easily train and host machine-learning models.
- **High-Performance and Low-Cost**: Provides a high-performance and cost-effective environment for experiments and large-scale models.
- **LangChain Integration**: The LangChain library offers a user-friendly interface for querying deployed models.
- **Simplified Process**: Users can access models without writing complex API code.
- **Model Loading**: Models are loaded using `endpoint_name`, which is the model's unique name in SageMaker.
- **Authentication**: Credentials are managed using `credentials_profile_name`, which specifies the authentication profile to use.

### Hugging Face Local Pipelines

Hugging Face Local Pipelines is a powerful tool that enables users to run Hugging Face models locally using the `HuggingFacePipeline` class. The Hugging Face Model Hub boasts an impressive collection of:
- Over 120,000 models
- 20,000 datasets
- 50,000 demo apps (Spaces)

All of these resources are publicly available and open source, fostering collaboration and facilitating the creation of machine learning models.

Users have two main ways to access these models:
1. Utilize the local pipeline wrapper to run models locally.
2. Call the hosted inference endpoints using the `HuggingFaceHub` class.

Before starting, ensure the Transformers Python package is installed. Once installed, follow these steps:
1. Load the desired model using `model_id`, `task`, and any additional model arguments.
2. Integrate the model into an `LLMChain` by creating a `PromptTemplate` and `LLMChain` object.
3. Run input through the `LLMChain` for language processing tasks.
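
The three steps above might look like the following minimal sketch. It assumes the legacy `langchain` import layout together with the `transformers` package; the `gpt2` model ID, the `max_length` argument, the template text, and the `build_chain` helper are illustrative choices rather than anything prescribed here. Imports are deferred into the function so the module loads even where those packages are missing.

```python
# Prompt text is an illustrative choice; {question} is the single input variable.
TEMPLATE = "Question: {question}\n\nAnswer: Let's think step by step."


def build_chain(model_id="gpt2", task="text-generation"):
    """Load a local Hugging Face model and wrap it in an LLMChain."""
    # Legacy import paths (assumed); newer LangChain releases may differ.
    from langchain import LLMChain, PromptTemplate
    from langchain.llms import HuggingFacePipeline

    # Step 1: load the model with model_id, task, and extra model arguments.
    llm = HuggingFacePipeline.from_model_id(
        model_id=model_id,
        task=task,
        model_kwargs={"max_length": 64},
    )
    # Step 2: combine a PromptTemplate and the model into an LLMChain.
    prompt = PromptTemplate(template=TEMPLATE, input_variables=["question"])
    return LLMChain(prompt=prompt, llm=llm)


# Step 3: run input through the chain (the model downloads on first use):
#   chain = build_chain()
#   print(chain.run("What is the capital of France?"))
```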


### Azure OpenAI - Key Points

- **Access via Azure**: OpenAI's models can also be accessed through Microsoft's Azure platform.

### AI21 - Key Points

- **Company**: AI21 is a company that offers access to their powerful Jurassic-2 large language models through their API.
- **Model**: Provides access to the Jurassic-2 model with 178 billion parameters.
- **Cost**: The API is reasonably priced at $0.01 for every 1k tokens.
- **Interact**: Developers can interact with AI21 models by creating prompts with LangChain that incorporate input variables.
- **Language Processing**: Take advantage of the powerful language processing capabilities offered by AI21.


### Aleph Alpha - Key Points

- Aleph Alpha is a company that offers a family of large language models known as the Luminous series.
- The Luminous family includes three models: Luminous-base, Luminous-extended, and Luminous-supreme.
- These models vary in terms of complexity and capabilities.
- Aleph Alpha's pricing model is token-based.
- The base prices per model for every 1000 input tokens are as follows:
  - Luminous-base: 0.03€
  - Luminous-extended: 0.045€
  - Luminous-supreme: 0.175€
  - Luminous-supreme-control: 0.21875€
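
The price list above can be turned into a quick comparison. This sketch covers input tokens only, since that is all the list states; the function name `input_cost_eur` and the 10,000-token example are hypothetical.

```python
# Base prices in euros per 1,000 input tokens, as listed above.
LUMINOUS_EUR_PER_1K_INPUT = {
    "Luminous-base": 0.03,
    "Luminous-extended": 0.045,
    "Luminous-supreme": 0.175,
    "Luminous-supreme-control": 0.21875,
}


def input_cost_eur(model, input_tokens):
    """Return the input-token cost in euros for the given model."""
    return input_tokens / 1000 * LUMINOUS_EUR_PER_1K_INPUT[model]


# Compare what a 10,000-token input would cost on each model.
for model in LUMINOUS_EUR_PER_1K_INPUT:
    print(f"{model}: {input_cost_eur(model, 10_000):.4f} EUR")
```

With these figures, the supreme-control price works out to 1.25 times the supreme price.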


### Banana - Machine Learning Infrastructure

- **Company**: Banana
- **Focus**: Machine learning infrastructure
- **Tools**: Provides tools for building machine learning models
- **Integration with LangChain**:
  - Install Banana package
  - Includes SDK for Python
- **Required Tokens**:
  - BANANA_API_KEY
  - YOUR_MODEL_KEY
- **Keys Obtained**: From Banana platform
- **Process**:
  1. Set the keys
  2. Create an object with YOUR_MODEL_KEY
  3. Integrate with LLMChain
  4. Use PromptTemplate for input
  5. Run through LLMChain for processing



### CerebriumAI - Accessing LLM Models

- **Company**: CerebriumAI
- **Alternative to**: AWS SageMaker
- **API Access**: Provides access to LLM models through API
- **Pre-trained Models**:
  - Whisper
  - MT0
  - FlanT5
  - GPT-Neo
  - Roberta
  - Pygmalion
  - Tortoise
  - GPT4All
- **Instance Creation**:
  - Developers create an instance of CerebriumAI
  - Provide endpoint URL and relevant parameters
  - Parameters include max length, temperature, etc.



### DeepInfra - LLM API with A100 GPUs

- **Service**: DeepInfra
- **API**: Offers a range of LLMs (e.g., distilbert-base-multilingual-cased, bert-base, whisper-large, gpt2, dolly-v2-12b)
- **Integration**: Connected to LangChain via API
- **Hardware**: Runs on A100 GPUs
- **Optimized**: GPUs optimized for inference performance and low latency
- **Pricing**: More affordable than Replicate
  - $0.0005/second
  - $0.03/minute
- **Free Trial**: Provides a 1-hour free trial of serverless GPU computing
- **Experimentation**: Allows users to experiment with different models
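
The per-second and per-minute prices above describe the same rate, which a short calculation can confirm. The sketch also compares a hypothetical 10-minute job against the Replicate range quoted earlier in this document ($0.0002 to $0.0023 per second); the job length and the helper name `gpu_cost` are illustrative assumptions.

```python
DEEPINFRA_PER_SECOND = 0.0005
DEEPINFRA_PER_MINUTE = 0.03
# Replicate's per-second range as quoted for StableLM Alpha in this document.
REPLICATE_PER_SECOND_RANGE = (0.0002, 0.0023)


def gpu_cost(seconds, rate_per_second):
    """Return the cost in dollars of running for the given number of seconds."""
    return seconds * rate_per_second


# The two DeepInfra prices are consistent: 60 seconds cost one minute's price.
assert abs(gpu_cost(60, DEEPINFRA_PER_SECOND) - DEEPINFRA_PER_MINUTE) < 1e-12

job_seconds = 10 * 60  # hypothetical 10-minute job
deepinfra = gpu_cost(job_seconds, DEEPINFRA_PER_SECOND)
replicate_low, replicate_high = (
    gpu_cost(job_seconds, rate) for rate in REPLICATE_PER_SECOND_RANGE
)
print(f"DeepInfra: ${deepinfra:.2f}")  # $0.30
print(f"Replicate: ${replicate_low:.2f} to ${replicate_high:.2f}")  # $0.12 to $1.38
```

With these figures, DeepInfra's rate sits inside Replicate's range, so the comparison depends on which Replicate compute option a model runs on.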



### ForefrontAI Platform Integration with LangChain

- **Platform Overview**: ForefrontAI is a platform designed to empower users to fine-tune and utilize various open-source large language models, including GPT-J, GPT-NeoX, T5, and more.
- **Model Variety**: The platform offers access to a diverse range of models, allowing developers to choose models that best suit their specific language processing needs.
- **Pricing Plans**: ForefrontAI provides different pricing plans to accommodate various usage scenarios. For instance, the Starter plan is priced at $29/month. This plan includes:
  - 5 million serverless tokens.
  - Access to 5 fine-tuned models.
  - Support for 1 user.
  - Discord support for assistance and collaboration.

- **Fine-Tuning Capabilities**: With ForefrontAI, developers have the opportunity to fine-tune models according to their requirements. This customization ensures that the models perform optimally for specific tasks and domains.

- **Integration with LangChain**: By integrating ForefrontAI with LangChain, developers can seamlessly access, fine-tune, and utilize a wide array of open-source large language models. This integration extends LangChain's capabilities for enhanced language processing tasks.
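
For comparison with per-token pricing elsewhere in this document, the Starter plan's flat fee can be converted into an effective price per 1,000 tokens. This sketch assumes the fee is paid in full regardless of usage; the function name and the 1-million-token scenario are hypothetical.

```python
STARTER_FEE = 29.0          # dollars per month
STARTER_TOKENS = 5_000_000  # serverless tokens included in the plan


def effective_price_per_1k(monthly_fee, tokens_used):
    """Return dollars paid per 1,000 tokens for a month's actual usage."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return monthly_fee / tokens_used * 1000


# Using the full allowance versus only one fifth of it.
print(f"{effective_price_per_1k(STARTER_FEE, STARTER_TOKENS):.4f}")  # 0.0058
print(f"{effective_price_per_1k(STARTER_FEE, 1_000_000):.4f}")       # 0.0290
```

The effective rate rises quickly when the allowance goes unused, which matters when weighing a subscription against pay-per-token APIs.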


### GooseAI Integration with LangChain

- **Platform Overview**: GooseAI is a fully managed NLP-as-a-Service platform that provides seamless access to various language models, including GPT-Neo, Fairseq, and GPT-J.
- **Pricing Structure**: GooseAI's pricing depends on model size and usage. For instance, the 125M model has a base price of $0.000035 per request covering up to 25 tokens, plus an incremental fee of $0.000001 for each additional token.
- **Integration with LangChain**: To use GooseAI with LangChain, you need to follow a few steps:
  - Install the `openai` package.
  - Obtain an API key from GooseAI.
  - Set the API key as an environment variable in your code.
  - Create a GooseAI instance.
  - Define a Prompt Template for Question and Answer.
  - Initiate the LLMChain in LangChain.
  - Provide a question to run through the LLMChain.

- **Seamless Workflow**: The integration ensures a seamless workflow where users can easily interact with GooseAI models through the LangChain framework. This streamlines the process of running language processing tasks.

- **Enhanced Capabilities**: By integrating GooseAI with LangChain, users can tap into the capabilities of GPT-Neo, Fairseq, GPT-J, and other models, enhancing their language processing endeavors.
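
The two-part pricing quoted above for the 125M model can be sketched as a small function. It assumes the incremental fee applies to each token beyond the first 25 in a request; the function and parameter names are hypothetical.

```python
def gooseai_request_cost(
    tokens,
    base_price=0.000035,       # covers up to base_tokens tokens per request
    base_tokens=25,
    per_extra_token=0.000001,  # assumed to apply per token beyond base_tokens
):
    """Return the estimated cost in dollars of a single request."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    extra_tokens = max(0, tokens - base_tokens)
    return base_price + extra_tokens * per_extra_token


print(f"{gooseai_request_cost(25):.6f}")   # 0.000035
print(f"{gooseai_request_cost(125):.6f}")  # 0.000135
```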


### Llama-cpp Integration with LangChain

- **Integration Purpose**: Llama-cpp (the `llama-cpp-python` package), a Python binding for the llama.cpp library, is integrated into the LangChain framework.
- **Access to Models**: This integration gives users access to the range of large language models (LLMs) supported by Llama-cpp. These models include LLaMA 🦙, Alpaca, GPT4All, Chinese LLaMA / Alpaca, Vigogne (French), Vicuna, Koala, OpenBuddy 🐶 (Multilingual), Pygmalion 7B, and Metharme 7B.
- **Expanded Options**: LangChain users can choose among these models according to their specific language processing requirements.
- **Benefits**: These models can generate human-like, step-by-step responses to input questions.
- **Seamless Interaction**: The integration makes interaction between Llama-cpp models and the LangChain framework efficient and user-friendly.


### Manifest Integration with LangChain

- **Tool Purpose**: Manifest is an integration tool designed to enhance the capabilities of LangChain, boosting its power and user-friendliness for various language processing tasks.
- **Bridge Between LangChain and Hugging Face Models**: Manifest acts as a bridge, connecting LangChain with local Hugging Face models. This integration enables users to easily access and utilize Hugging Face models within LangChain.
- **Seamless Integration**: Manifest has been seamlessly integrated into LangChain, providing users with enhanced functionalities and capabilities for language processing tasks.
- **Usage and Benefits**: To make use of Manifest within LangChain, users can follow the provided instructions. This typically involves installing the `manifest-ml` package and configuring the connection settings as required.
- **Comprehensive Language Processing**: Once integrated, users can leverage the capabilities of Manifest alongside LangChain. This allows for a comprehensive language processing experience, combining the strengths of both tools.


### Modal Integration with LangChain

- **Integration**: Modal seamlessly integrates into LangChain, enhancing the language processing workflow with powerful cloud computing capabilities.
- **Cloud Computing**: While Modal doesn't provide specific language models (LLMs), it serves as the infrastructure that allows LangChain to leverage serverless cloud computing.
- **Benefits**: Integrating Modal into LangChain enables users to directly access on-demand cloud resources from their local computers using Python scripts.
- **Installation and Authentication**: Users can install the Modal client library and generate a new token for authentication, establishing a connection to the Modal server.
- **Usage Example**: In a LangChain example, a Modal LLM can be instantiated using the endpoint URL. A PromptTemplate is then defined to structure the input for the language processing task.
- **Task Execution**: LangChain executes the LLMChain with the specified prompt, running tasks such as answering questions using the Modal-powered cloud resources.


### NLP Cloud Integration with LangChain

- **Integration**: NLP Cloud seamlessly integrates with LangChain, offering a comprehensive suite of high-performance pre-trained and custom models for various natural language processing (NLP) tasks.
- **Model Suite**: The platform provides a diverse range of models, including both pre-trained and custom options.
- **Designed for Production**: The models are designed for production use, ensuring reliability and performance.
- **Access via REST API**: Users can access the models through a REST API, making it easy to integrate into their applications.
- **Task Execution**: By executing the LLMChain with the specified prompt, users can perform NLP tasks such as answering questions seamlessly.


### Petals Integration with LangChain

- **Integration**: Petals is integrated into LangChain, enabling the use of language models with over 100 billion parameters within a decentralized architecture similar to BitTorrent.
- **Guidance**: The LangChain documentation includes a notebook on incorporating Petals into the LangChain workflow.
- **Diverse Range**: Petals offers a diverse range of language models for natural language understanding and generation.
- **Enhanced Capabilities**: Its integration extends LangChain's language processing capabilities.
- **Decentralized Architecture**: Petals operates under a decentralized model, giving users powerful language processing capabilities in a distributed environment.


### PipelineAI - Cloud-based Scaling for LLM Models

- **Integration**: Seamlessly integrated into LangChain
- **Scalability**: Allows users to scale their machine-learning models in the cloud
- **API Access**: Provides API access to a range of models, including large language models (LLMs)
- **Supported Models**: Offers models like GPT-J, Stable Diffusion, ESRGAN, DALL·E, GPT-2, GPT-Neo
- **Customization**: Each model comes with specific parameters and capabilities
- **Cloud Advantage**: Empowers users to leverage cloud scalability and power
- **LangChain Ecosystem**: Enhances machine-learning workflows within the LangChain environment



### PredictionGuard - Enhanced Language Model Usage in LangChain

- **Integration**: Seamlessly integrated into LangChain framework
- **Wrapper**: Provides a powerful wrapper for language model usage
- **Installation**: Requires installation of predictionguard and LangChain libraries
- **Advanced Integration**: Can be integrated into LangChain's LLMChain for more advanced tasks
- **Enhanced Experience**: Adds an additional layer of control and safety to language model outputs
- **Optimized Usage**: Enhances LangChain's capabilities while ensuring safer model outputs


### PromptLayer - Enhanced Control for OpenAI GPT Prompt Engineering in LangChain

- **Integration**: Seamlessly integrated into LangChain framework
- **Control and Management**: Offers enhanced control and management of GPT prompt engineering
- **Middleware**: Acts as a middleware between users' code and OpenAI's Python library
- **Recording and Tracking**: Enables recording, tracking, and exploration of OpenAI API requests
- **Dashboard**: Utilizes the PromptLayer dashboard for managing API requests and outputs
- **Package Installation**: Requires installation of the 'promptlayer' package
- **Template Attachment**: Users can attach templates to requests for evaluation within the dashboard
- **Template and Model Evaluation**: Enables evaluating different templates and models in the PromptLayer dashboard
- **Enhanced Prompt Engineering**: Extends LangChain's prompt engineering capabilities with additional control and insights


### Replicate - Seamlessly Integrated LLM Models in LangChain

- **Integration**: Seamlessly integrated into LangChain framework
- **Model Variety**: Offers a wide range of models for various applications
- **Models Offered**: Includes models like vicuna-13b, bark, speaker-transcription, stablelm-tuned-alpha-7b, Kandinsky-2, stable-diffusion, and more
- **Diverse Applications**: Covers language generation, generative audio, speaker transcription, language modeling, text-to-image generation, and more
- **Specific Parameters**: Each model has specific parameters and capabilities
- **Customization**: Enables users to choose the most suitable model for their specific needs
- **Flexible Pricing**: Provides pricing options based on computational resources required for running the models
- **Deployment Simplification**: Simplifies deployment of custom machine-learning models at scale
- **Effective Interaction**: Integrated into LangChain for effective interaction with the models

### Runhouse - Seamlessly Integrated Remote Compute and Data Management in LangChain

- **Integration**: Seamlessly integrated into LangChain framework
- **Remote Compute**: Provides powerful remote compute capabilities
- **Data Management**: Offers data management capabilities across different environments and users
- **Flexibility**: Allows hosting models on your own GPU infrastructure or using on-demand GPUs from cloud providers like AWS, GCP, and Azure
- **Available Models**: Provides LLMs such as `gpt2` and `google/flan-t5-small` for use within LangChain
- **Hardware Configuration**: Users can specify the desired hardware configuration
- **Advanced Workflows**: Combines with LangChain for advanced language model workflows
- **Efficient Execution**: Enables efficient model execution and collaboration across various environments and users




### StochasticAI - Simplifying Deep Learning Model Workflow in LangChain

- **Aim**: Aims to simplify deep learning model workflows within LangChain
- **Efficient Environment**: Provides users with an efficient and user-friendly environment for model interaction and deployment
- **Lifecycle Management**: Streamlines the lifecycle management of Deep Learning models
- **Acceleration Platform**: StochasticAI's Acceleration Platform simplifies tasks like model uploading, versioning, training, compression, and acceleration
- **Production Deployment**: Facilitates deployment of models into production environments
- **Interact with LangChain**: Users can effortlessly interact with StochasticAI models within LangChain
- **Available Models**: Offers models like FLAN-T5, GPT-J, Stable Diffusion 1, and Stable Diffusion 2
- **Diverse Capabilities**: These models provide diverse capabilities for various language-related tasks

### Writer - Powerful Language Content Generation in LangChain

- **Integration**: Seamlessly integrated into LangChain
- **Powerful Platform**: Provides users with a powerful platform for generating diverse language content
- **Effortless Interaction**: LangChain users can interact with a range of LLMs for language generation
- **Available Models**: Provides various LLMs for different language generation needs:
  - Palmyra Small (128M)
  - Palmyra 3B (3B)
  - Palmyra Base (5B)
  - Camel 🐪 (5B)
  - Palmyra Large (20B)
  - InstructPalmyra (30B)
  - Palmyra-R (30B)
  - Palmyra-E (30B)
  - Silk Road
- **Diverse Capacities**: These models offer different capacities for improving language understanding, generative pre-training, following instructions, and retrieval-augmented generation
