Commit

pgcat docs
levkk committed Apr 25, 2024
1 parent 82dc23f commit 99abe8f
Showing 13 changed files with 212 additions and 55 deletions.
57 changes: 57 additions & 0 deletions pgml-cms/docs/.gitbook/assets/pgcat_1.svg
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_2.png
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_3.png
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_4.png
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_5.png
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_6.png
Binary file added pgml-cms/docs/.gitbook/assets/pgcat_7.png
12 changes: 6 additions & 6 deletions pgml-cms/docs/README.md
@@ -6,17 +6,17 @@ description: The key concepts that make up PostgresML.

PostgresML is a complete MLOps platform built on PostgreSQL. Our operating principle is:

> _Move the models to the database, rather than constantly moving the data to the models._
> _Move models to the database, rather than constantly moving data to the models._
The data for ML & AI systems is inherently larger and more dynamic than the models. It's more efficient, manageable and reliable to move the models to the database, rather than continuously moving data to the models.
Data for ML & AI systems is inherently larger and more dynamic than the models. It's more efficient, manageable and reliable to move models to the database, rather than continuously moving data to the models.

## AI engine

PostgresML allows you to take advantage of the fundamental relationship between data and models, by extending the database with the following capabilities:

* **Model Serving** - GPU accelerated inference engine for interactive applications, with no additional networking latency or reliability costs
* **Model Store** - Access to open-source models including state of the art LLMs from HuggingFace, and track changes in performance between versions
* **Model Training** - Train models with your application data using more than 50 algorithms for regression, classification or clustering tasks; fine tune pre-trained models like LLaMA and BERT to improve performance
* **Model Store** - Access to open-source models including state of the art LLMs from Hugging Face, and track changes in performance between versions
* **Model Training** - Train models with your application data using more than 50 algorithms for regression, classification or clustering tasks; fine tune pre-trained models like Llama and BERT to improve performance
* **Feature Store** - Scalable access to model inputs, including vector, text, categorical, and numeric data: vector database, text search, knowledge graph and application data all in one low-latency system

<figure><img src=".gitbook/assets/ml_system.svg" alt="Machine Learning Infrastructure (2.0) by a16z"><figcaption class="mt-2"><p>PostgresML handles all of the functions <a href="https://a16z.com/emerging-architectures-for-modern-data-infrastructure/">described by a16z</a></p></figcaption></figure>
@@ -34,14 +34,14 @@ The PostgresML team also provides [native language SDKs](https://github.com/post

While using the SDK is completely optional, SDK clients can perform advanced machine learning tasks in a single SQL request, without having to transfer additional data, models, hardware or dependencies to the client application.

Use cases include:
Some of the use cases include:

* Chat with streaming responses from state-of-the-art open source LLMs
* Semantic search with keywords and embeddings
* RAG in a single request without using any third-party services
* Text translation between hundreds of languages
* Text summarization to distill complex documents
* Forecasting timeseries data for key metrics with and metadata
* Forecasting time series data for key metrics with and metadata
* Anomaly detection using application data

## Our mission
8 changes: 4 additions & 4 deletions pgml-cms/docs/SUMMARY.md
@@ -3,7 +3,7 @@
## Introduction

* [Overview](README.md)
* [Getting Started](introduction/getting-started/README.md)
* [Getting started](introduction/getting-started/README.md)
* [Create your database](introduction/getting-started/create-your-database.md)
* [Connect your app](introduction/getting-started/connect-your-app.md)
* [Import your data](introduction/getting-started/import-your-data/README.md)
@@ -52,12 +52,12 @@

## Product

* [Cloud Database](product/cloud-database/README.md)
* [Cloud database](product/cloud-database/README.md)
* [Serverless](product/cloud-database/serverless.md)
* [Dedicated](product/cloud-database/dedicated.md)
* [Enterprise](product/cloud-database/plans.md)
* [Vector Database](product/vector-database.md)
* [PgCat Proxy](product/pgcat/README.md)
* [Vector database](product/vector-database.md)
* [PgCat pooler](product/pgcat/README.md)
* [Features](product/pgcat/features.md)
* [Installation](product/pgcat/installation.md)
* [Configuration](product/pgcat/configuration.md)
16 changes: 9 additions & 7 deletions pgml-cms/docs/introduction/getting-started/README.md
@@ -2,16 +2,18 @@
description: Set up a database and connect your application to PostgresML
---

# Getting Started
# Getting started

A PostgresML deployment consists of multiple components working in concert to provide a complete Machine Learning platform. We provide a fully managed solution in [our cloud](create-your-database), and document a self-hosted installation in [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker).
A PostgresML deployment consists of multiple components working in concert to provide a complete Machine Learning platform:

* PostgreSQL database, with `pgml`, `pgvector` and many other extensions installed, including backups, metrics, logs, replicas and high availability
* PgCat pooler to provide secure access and model load balancing across thousands of clients
* A web application to manage deployed models and share experiments and analysis in SQL notebooks
* PostgreSQL database, with `pgml`, `pgvector` and many other extensions that add features useful in day-to-day and machine learning use cases
* [PgCat pooler](/docs/product/pgcat/) to load balance thousands of concurrent client requests across several database instances
* A web application to manage deployed models and share experiments and analysis with SQL notebooks

<figure class="m-3"><img src="../../.gitbook/assets/architecture.png" alt="PostgresML architecture"><figcaption></figcaption></figure>
We provide a fully managed solution in [our cloud](create-your-database), and document a self-hosted installation in the [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker).

<figure class="my-4"><img src="../../.gitbook/assets/architecture.png" alt="PostgresML architecture"><figcaption></figcaption></figure>

By building PostgresML on top of a mature database, we get reliable backups for model inputs and proven scalability without reinventing the wheel, so that we can focus on providing access to the latest developments in open source machine learning and artificial intelligence.

This guide will help you get started with a generous free account, that includes access to GPU accelerated models and 5 GB of storage, or you can skip to our [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker) to see how to run PostgresML locally with our Docker image.
This guide will help you get started with a generous [free account](create-your-database) that includes access to GPU-accelerated models and 5 GB of storage. Alternatively, you can skip to our [Developer Docs](/docs/resources/developer-docs/quick-start-with-docker) to see how to run PostgresML locally with our Docker image.
46 changes: 42 additions & 4 deletions pgml-cms/docs/product/pgcat/README.md
@@ -2,10 +2,48 @@
description: Nextgen PostgreSQL Pooler
---

# PgCat
# PgCat pooler

PgCat is PostgreSQL connection pooler and proxy which scales PostgresML deployments. It supports read/write query separation, multiple replicas, automatic traffic distribution and load balancing, sharding, and many more features expected out of high availability enterprise grade Postgres databases.
<div class="row">
<div class="col-12 col-md-4">
<figure class="my-4">
<img class="mb-3" src="../../.gitbook/assets/pgcat_1.svg" height="auto" width="185" alt="PgCat logo">
<figcaption></figcaption>
</figure>
</div>
<div class="col-12 col-md-8">
<p>PgCat is a PostgreSQL connection pooler and proxy that scales PostgreSQL (and PostgresML) databases beyond a single instance.</p>
<p>
It supports replicas, load balancing, sharding, failover, and many more features expected of a high-availability, enterprise-grade PostgreSQL deployment.
</p>
<p>
Written in Rust using Tokio, it takes advantage of multiple CPUs and the safety and performance guarantees of the Rust language.
</p>
</div>
</div>

Written in Rust and powered by Tokio, it takes advantage of multiple CPUs, and the safety and performance guarantees of the Rust language.

PgCat, like PostgresML, is free and open source, distributed under the MIT license. It's currently running in our Cloud, powering both Serverless and Dedicated databases.
PgCat, like PostgresML, is free and open source, distributed under the MIT license. It's currently running in our [cloud](https://postgresml.org/signup), powering both Serverless and Dedicated databases.

## [Features](features)

PgCat implements the PostgreSQL wire protocol and can understand and optimally route queries & transactions based on their characteristics. For example, if your database deployment consists of a primary and replica, PgCat can send all `SELECT` queries to the replica, and all other queries to the primary, creating a read/write traffic separation.
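
The read/write separation described above can be illustrated with a minimal Python sketch. This is not PgCat's implementation (PgCat is written in Rust and works at the wire-protocol level); the `Router` class, the `is_read_query` helper, and the server names are all hypothetical, and the "starts with `SELECT`" check is a deliberate simplification.

```python
import itertools


def is_read_query(sql: str) -> bool:
    """Naive classification: a statement is a read if it starts with SELECT.

    Assumption for illustration only; a real pooler has to consider
    transactions, locking reads, and functions with side effects.
    """
    return sql.lstrip().upper().startswith("SELECT")


class Router:
    """Hypothetical router: reads go to replicas, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None means there are no replicas to use.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str):
        if self._replicas is not None and is_read_query(sql):
            return next(self._replicas)
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
for query in ("SELECT 1", "UPDATE users SET name = 'a' WHERE id = 1"):
    print(query, "->", router.route(query))
```

With no replicas configured, the sketch falls back to sending every query to the primary.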

<figure>
<img class="mb-3" src="../../.gitbook/assets/pgcat_4.png" alt="PgCat architecture" width="95%" height="auto">
<figcaption><i>PgCat deployment at scale</i></figcaption>
</figure>

<br>

If you have more than one primary, sharded with either the Postgres hashing algorithm or a custom sharding function, PgCat can parse queries, extract the sharding key, and route the query to the correct shard without requiring any modifications on the client side.
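
As a rough sketch of the idea, the Python below extracts an integer sharding key from a query and maps it to a shard. Everything here is an assumption made for illustration: the regular expression stands in for real query parsing, the `id` column name is hypothetical, and the SHA-256 based hash is a stand-in that does not reproduce the Postgres hashing algorithm mentioned above.

```python
import hashlib
import re


def extract_sharding_key(sql: str, column: str = "id"):
    """Find `<column> = <integer>` in the query text; return None if absent.

    Hypothetical helper: a regex is far less robust than a SQL parser.
    """
    pattern = rf"\b{re.escape(column)}\s*=\s*(\d+)"
    match = re.search(pattern, sql, re.IGNORECASE)
    return int(match.group(1)) if match else None


def shard_for(key: int, shard_count: int) -> int:
    """Map a key to a shard number with a stable stand-in hash.

    NOT the Postgres partition hash; used only so the example is deterministic.
    """
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(sql: str, shard_count: int, column: str = "id"):
    """Return the shard number for a query, or None if no key was found."""
    key = extract_sharding_key(sql, column)
    if key is None:
        return None
    return shard_for(key, shard_count)


print(route("SELECT * FROM users WHERE id = 42", shard_count=3))
```

The same key always maps to the same shard, which is the property that lets the client stay unaware of the sharding layout.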

PgCat has many more features which are more thoroughly described in the [PgCat features](features) section.

## [Installation](installation)

PgCat is open source and available from our [GitHub repository](https://github.com/postgresml/pgcat) and, if you're running Ubuntu 22.04, from our APT repository. You can read more about how to install PgCat in the [installation](installation) section.

## [Configuration](configuration)

PgCat, like many other PostgreSQL poolers, has its own configuration file format (it's written in Rust, so of course we use TOML). The settings and their meaning are documented in the [configuration](configuration) section.
