TaoGPT-7B is a pioneering project that blends technology with the emerging field of Tao Science. The objective is to fine-tune the Mistral 7B large language model (LLM) on Tao Science data to improve its proficiency in this domain. A retrieval-augmented generation (RAG) pipeline enriches the model's outputs with relevant source material.
To get started with TaoGPT-7B:

- Fine-tune Mistral 7B on Tao Science data.
- Use the Colab notebooks for training and inference:
  - Fine-Tuning Mistral 7B with TaoScience Dataset
  - Instructed Fine-Tuning of Mistral 7B with TaoScience Dataset
  - Inference with TaoGPT-7B on Google Colab
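Before fine-tuning, each Tao Science question/answer pair has to be rendered into Mistral's instruction template (the `[INST] ... [/INST]` wrapping used by Mistral 7B Instruct). A minimal sketch of that formatting step; `format_example` and the sample text are illustrative, not functions from the TaoGPT notebooks:

```python
# Illustrative sketch: build one training string in Mistral's instruction
# format. The function name and sample text are hypothetical.
def format_example(question: str, answer: str) -> str:
    # Mistral 7B Instruct wraps the user prompt in [INST] ... [/INST]
    return f"<s>[INST] {question.strip()} [/INST] {answer.strip()}</s>"

sample = format_example(
    "What is Tao Science?",
    "Tao Science studies soul, mind, and body as information and energy.",
)
```

The fine-tuning notebook would apply a template like this over the whole structured dataset before tokenization.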
The project is structured into several components:

- **Data**: `/unstructured` contains PDFs and unstructured data on Tao Science; `/structured` contains datasets derived from the PDFs.
- **Data Preparation**: `dataprep.ipynb` transforms the unstructured data into a structured format.
- **Fine-Tuning**: `finetuning.ipynb` covers the supervised fine-tuning of Mistral 7B using the Tao Science datasets.
- **Inference**: `inference.ipynb` tests the capabilities of the fine-tuned model using RAG, with Gradio for user interaction.
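The data-preparation step boils down to turning text extracted from the PDFs into one JSON record per line (JSONL), the common format for fine-tuning datasets. A hypothetical sketch, assuming the pairs have already been extracted; the helper name and sample content are illustrative:

```python
import json

# Hypothetical sketch of the dataprep step: serialize (question, answer)
# pairs extracted from the PDFs into JSONL records for fine-tuning.
def to_jsonl_lines(pairs):
    # One JSON object per line; keys here are an assumption, not the
    # notebook's actual schema.
    return [
        json.dumps({"instruction": q, "response": a}, ensure_ascii=False)
        for q, a in pairs
    ]

lines = to_jsonl_lines(
    [("What is Tao?", "Tao is the source of all universes.")]
)
```

Writing `"\n".join(lines)` to a file yields a dataset that standard loaders (e.g. Hugging Face `datasets`) can read directly.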
TaoGPT-7B employs several technologies:

- Mistral 7B (LLM): the central model, fine-tuned for Tao Science.
- LangChain: ties the pipeline components together.
- Transformers library: provides the fine-tuning and inference tooling.
- Weaviate: vector database for efficient data retrieval.
- Gradio: interactive interface for engaging with the model.
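The RAG flow is: retrieve the Tao Science passages most relevant to a query, then feed them to the model as context. As a toy stand-in for Weaviate's vector search (which uses embeddings, not word overlap), here is a sketch of the retrieve-then-prompt idea; the function and sample passages are illustrative only:

```python
# Toy stand-in for the Weaviate retrieval step: rank passages by word
# overlap with the query. The real pipeline uses vector similarity;
# this only illustrates the retrieve-then-prompt pattern.
def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    def overlap(p: str) -> int:
        return len(set(query.lower().split()) & set(p.lower().split()))
    return sorted(passages, key=overlap, reverse=True)[:k]

docs = [
    "Tao Science studies information and energy.",
    "Gradio builds web interfaces for models.",
    "Weaviate is a vector database.",
]
top = retrieve("What does Tao Science study?", docs, k=1)
```

The retrieved passages would then be prepended to the user's question in the prompt sent to the fine-tuned model.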
Contributions to TaoGPT-7B are appreciated. For contribution guidelines, see CONTRIBUTING.md.
For a list of contributors, visit:
TaoGPT-7B is released under the MIT License. See LICENSE.md for details.