LLM As Database Administrator

Features • News • QuickStart • Cases • Customization • FAQ • Community • Contributors

【English | 中文】

🧗 We aim to provide a collection of useful, user-friendly, and advanced database tools. These tools are built around LLMs, including query optimization (online demo), system diagnosis (D-Bot), and anomaly simulation (attacker)

✨ Features

System Diagnosis (D-Bot)

Well-Founded Diagnosis: D-Bot can provide founded diagnosis by utilizing relevant database knowledge (with document2experience).
Practical Tool Utilization: D-Bot can utilize both monitoring and optimization tools to improve the maintenance capability (with tool learning and tree of thought).
In-depth Reasoning: Compared with vanilla LLMs, D-Bot will achieve competitive reasoning capability to analyze root causes (with multi-llm communications).

A demo of using D-Bot

db_diag.mp4

📰 What's New

[2023/9/10] Add diagnosis logs 🔗 link and replay button in the frontend ⏱ link
[2023/9/09] Add typical anomalies 🔗 link
[2023/9/05] A unique framework is available! Start diag+tool service with a single command, experiencing 5x speed up!! 🚀 link
[2023/8/25] Support vue-based website interface. More flexible and beautiful! 🔗 link
[2023/8/22] Support tool retrieval for 60+ APIs 🔗 link
[2023/8/16] Support multi-level optimization functions 🔗 link
[2023/8/10] Our vision papers are released (continuously update)
- LLM As DBA. [paper] [中文解读] [twitter] [slides]
- DB-GPT: Large Language Model Meets Database. [paper]

This project is evolving with new features 👫👫
Don't forget to star ⭐ and watch 👀 to stay up to date :)

🕹 QuickStart

D-Bot

Folder Structure

.
├── multiagents
│   ├── agent_conf                        # Settings of each agent
│   ├── agents                            # Implementation of different agent types 
│   ├── environments                      # E.g., chat orders / chat update / terminal conditions
│   ├── knowledge                         # Diagnosis experience from documents
│   ├── llms                              # Supported models
│   ├── memory                            # The content and summary of chat history
│   ├── response_formalize_scripts        # Useless content removal of model response
│   ├── tools                             # External monitoring/optimization tools for models
│   └── utils                             # Other functions (e.g., database/json/yaml operations)

1. Prerequisites

PostgreSQL v12 or higher

Additionally, install extensions like pg_stat_statements (track slow queries), pg_hint_plan (optimize physical operators), and hypopg (create hypothetical Indexes).
Prometheus ~~and Grafana (tutorial)~~

Check prometheus.md for detailed installation guides.

Grafana is no longer a necessity with our vue-based frontend.

2. Package Installation

Step 1: Install python packages.

pip install -r requirements.txt

Step 2: Configure environment variables.

Export your OpenAI API key

# macos
export OPENAI_API_KEY="your_api_key_here"

# windows
set OPENAI_API_KEY="your_api_key_here"

Step 3: Add database/anomaly/prometheus settings into tool_config_example.yaml and rename into tool_config.yaml:

```bash
POSTGRESQL:
  host: 182.92.xxx.x
  port: 5432
  user: xxxx
  password: xxxxx
  dbname: postgres

BENCHSERVER:
  server_address: 8.131.xxx.xx
  username: root
  password: xxxxx
  remote_directory: /root/benchmark

PROMETHEUS:
  api_url: http://8.131.xxx.xx:9090/
  postgresql_exporter_instance: 172.27.xx.xx:9187
  node_exporter_instance: 172.27.xx.xx:9100
```

You can ignore the settings of BENCHSERVER, which is not used in this version.

If accessing openai service via vpn, execute this command:

# macos
export https_proxy=http://127.0.0.1:7890 http_proxy=http://127.0.0.1:7890 all_proxy=socks5://127.0.0.1:7890

Test your openai key

cd others
python openai_test.py

3. Diagnosis & Optimization

Website Interface

We also provide a local website demo for this environment. You can launch it with

# cd website
cd front_demo
rm -rf node_modules/
rm -r package-lock.json
# install dependencies for the first run (nodejs, ^16.13.1 is recommended)
npm install  --legacy-peer-deps
# back to root directory
cd ..
# launch the local server and open the website
sh run_demo.sh

Modify the "python app.py" command within run_demo.sh if multiple versions of Python are installed.

After successfully launching the local server, visit http://127.0.0.1:9228/ to trigger the diagnosis procedure.

Command-line Interface

python main.py

🎩 Anomalies

Within the anomaly_trigger directory, we aim to offer scripts that could incur typical anomalies, e.g.,

Root Cause	Description	Case
	Long execution time for large data insertions
	Long execution time for large data fetching
	Missing indexes causing performance issues	🔗 link
	Unnecessary and redundant indexes in tables
	Unused space caused by data modifications
	Poor performance of Join operators
	Non-promotable subqueries in SQL
	Outdated statistical info affecting execution plan
	Lock contention issues
	Severe external CPU resource contention
	IO resource contention affecting SQL performance
	High-concurrency inserts affecting SQL execution	🔗 link
	High-concurrency commits affecting SQL execution	🔗 link
	Workload concentration affecting SQL execution	🔗 link
	Tool small allocated memory space
	Reach the max I/O capacity or throughput

📎 Customize Your KnowledgeBase And Tools

1. Knowledge Preparation

Extract knowledge from both code (./knowledge_json/knowledge_from_code) and documents (./knowledge_json/knowledge_from_document).
- Add code blocks into diagnosis_code.txt file -> Rerun the extract_knowledge.py script -> Check the update results and sync to root_causes_dbmind.jsonl.

2. Tool Preparation

Tool APIs (for optimization)

Module Functions

index_selection (equipped) heuristic algorithm

query_rewrite (equipped) 45 rules

physical_hint (equipped) 15 parameters

For functions within [query_rewrite, physical_hint], you can use api_test.py script to verify the effectiveness.

If the function actually works, append it to the api.py of corresponding module.

Tool Usage Algorithm (tree of thought)
```
cd tree_of_thought
python test_database.py
```
History messages may take up many tokens, and so carefully decide the turn number.

💁 FAQ

🤨 The '.sh' script command cannot be executed on windows system.

Switch the shell to *git bash* or use *git bash* to execute the '.sh' script.

🤨 "No module named 'xxx'" on windows system.

This error is caused by issues with the Python runtime environment path. You need to perform the following steps:

Step 1: Check Environment Variables.

You must configure the "Scripts" in the environment variables.

Step 2: Check IDE Settings.

For VS Code, download the Python extension for code. For PyCharm, specify the Python version for the current project.

⏱ Todo

~~Project cleaning~~
~~Support more anomalies~~
Strictly constrain the llm outputs (excessive irrelevant information) based on the matched knowledge
Add more communication mechanisms
Support more knowledge sources
Support localized private models (e.g., llama/vicuna/luca)
Release training datasets
Support other databases (e.g., mysql/redis)

👫 Community

Relevant Projects

https://github.com/OpenBMB/AgentVerse

https://github.com/OpenBMB/BMTools

Citation

Feel free to cite us if you like this project.

@misc{zhou2023llm4diag,
      title={LLM As DBA}, 
      author={Xuanhe Zhou, Guoliang Li, Zhiyuan Liu},
      year={2023},
      eprint={2308.05481},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}

@misc{zhou2023dbgpt,
      title={DB-GPT: Large Language Model Meets Database}, 
      author={Xuanhe Zhou, Zhaoyan Sun, Guoliang Li},
      year={2023},
      archivePrefix={Data Science and Engineering},
}

📧 Contributors

Other Collaborators: Wei Zhou, Kunyi Li.

We thank all the contributors to this project. Do not hesitate if you would like to get involved or contribute!

Name		Name	Last commit message	Last commit date
Latest commit History 195 Commits
anomaly_trigger		anomaly_trigger
api		api
case_analysis		case_analysis
config		config
front_demo		front_demo
imgs		imgs
knowledge_json		knowledge_json
logs		logs
materials		materials
models		models
monitoring_scripts		monitoring_scripts
multiagents		multiagents
others		others
prompt_template_scripts		prompt_template_scripts
prompt_templates		prompt_templates
response_formalize_scripts		response_formalize_scripts
tree_of_thought		tree_of_thought
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README-Chinese.md		README-Chinese.md
README.md		README.md
app.py		app.py
app_factory.py		app_factory.py
classpath.txt		classpath.txt
diag_time.txt		diag_time.txt
host_local_tools.py		host_local_tools.py
main.py		main.py
requirements.txt		requirements.txt
run_demo.sh		run_demo.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM As Database Administrator

✨ Features

System Diagnosis (D-Bot)

📰 What's New

🕹 QuickStart

D-Bot

Folder Structure

1. Prerequisites

2. Package Installation

3. Diagnosis & Optimization

Website Interface

Command-line Interface

🎩 Anomalies

📎 Customize Your KnowledgeBase And Tools

1. Knowledge Preparation

2. Tool Preparation

💁 FAQ

⏱ Todo

👫 Community

Relevant Projects

Citation

📧 Contributors

About

Releases

Packages

Languages

Module	Functions
index_selection (equipped)	heuristic algorithm
query_rewrite (equipped)	45 rules
physical_hint (equipped)	15 parameters

License

Ideny42/DB-GPT

Folders and files

Latest commit

History

Repository files navigation

LLM As Database Administrator

✨ Features

System Diagnosis (D-Bot)

📰 What's New

🕹 QuickStart

D-Bot

Folder Structure

1. Prerequisites

2. Package Installation

3. Diagnosis & Optimization

Website Interface

Command-line Interface

🎩 Anomalies

📎 Customize Your KnowledgeBase And Tools

1. Knowledge Preparation

2. Tool Preparation

💁 FAQ

⏱ Todo

👫 Community

Relevant Projects

Citation

📧 Contributors

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages