Simple Copilot for Specialized Question-Answering

About

This project is an implementation of OpenAI's Search and Ask Cookbook. It serves as a specialized Copilot capable of answering specific questions based on provided data sets. The system is versatile, allowing for the incorporation of data ranging from Mobile App Reviews to Corporate Handbooks.

Features

Answer questions in natural language
Uses OpenAI's embeddings for precise search
Uses OpenAI's chat/completions for answering questions
Extensible to various data sources and types
Easily configurable for different use-cases

How it works

OpenAI embeddings convert documents and queries into vector representations for comparison. They map text and code to vectors in a high-dimensional space, with closer embeddings indicating similar data. Practical applications include search, clustering and recommendations.
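As a rough illustration of "closer embeddings indicating similar data", the sketch below compares toy vectors with cosine similarity. The three-dimensional vectors and their names are made up for the example; real embeddings have many more dimensions, and this project may use a different distance helper (for instance one from scipy).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
refund_policy = [0.9, 0.1, 0.0]
refund_question = [0.8, 0.2, 0.1]
office_hours = [0.0, 0.1, 0.9]

# A related pair scores close to 1, an unrelated pair scores close to 0.
print(cosine_similarity(refund_question, refund_policy))
print(cosine_similarity(refund_question, office_hours))
```

Ranking documents by this score against a query embedding is the basis of the search step described next.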

We can expand to chat-based applications using the search-and-ask method:

Search: A knowledge base is formed with document embeddings for each section. When a user queries, the question is converted into a query embedding to find relevant sections in the knowledge base.
Ask: The relevant sections found by the search, together with the user's query, are then combined into a prompt that is sent to the model to generate the answer.
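The two steps above can be sketched as follows. This is a minimal illustration, not the code in this repository: the `search` and `build_prompt` helpers, the prompt wording, and the toy embeddings are all assumptions. A real run would get its embeddings from OpenAI and send the resulting prompt to the chat/completions endpoint.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, knowledge_base, top_n=2):
    """Search step: return the top_n section texts most similar to the query."""
    ranked = sorted(
        knowledge_base,
        key=lambda section: cosine_similarity(query_embedding, section["embedding"]),
        reverse=True,
    )
    return [section["text"] for section in ranked[:top_n]]


def build_prompt(question, sections):
    """Ask step: combine the retrieved sections and the question into one prompt."""
    context = "\n".join(f"- {text}" for text in sections)
    return (
        "Answer the question using only the sections below.\n\n"
        f"Sections:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base with precomputed toy embeddings.
knowledge_base = [
    {"text": "Refunds are issued within 14 days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "The office opens at 9am.", "embedding": [0.0, 0.1, 0.9]},
    {"text": "Refund requests go through the support form.", "embedding": [0.8, 0.3, 0.1]},
]

question = "How do I get a refund?"
query_embedding = [0.8, 0.2, 0.1]  # stands in for the embedded question

sections = search(query_embedding, knowledge_base, top_n=2)
prompt = build_prompt(question, sections)
print(prompt)
```

Limiting the prompt to the top few sections keeps it short and focused on the material most likely to contain the answer.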

Dependencies

Run the following command to install the necessary packages:

pip3 install -U openai pandas pyyaml scipy tiktoken colorama

Quick Start

Generate Embeddings: First, you'll need to create embeddings for your text data.
```
python generate_embeddings.py --config config-custom.yml
```

Query the System: Once the embeddings are ready, start querying.

python search_and_ask_embeddings.py --config config-custom.yml --query "Your question goes here"
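For readers curious how the `--config` and `--query` flags might be handled, here is a minimal argparse sketch. It is an assumption about the command-line interface based only on the commands shown here; the actual scripts may define defaults or additional options.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the Quick Start commands."""
    parser = argparse.ArgumentParser(
        description="Ask a question against precomputed embeddings."
    )
    # Marking both flags as required is an assumption made for this sketch.
    parser.add_argument("--config", required=True, help="Path to a YAML configuration file")
    parser.add_argument("--query", required=True, help="Question to answer")
    return parser.parse_args(argv)


# Passing an explicit list mimics the command line without touching sys.argv.
args = parse_args(["--config", "config-custom.yml", "--query", "Your question goes here"])
print(args.config)
print(args.query)
```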

Data Preparation

Place your .csv or .tsv data files under the data/ folder. It's advisable to use a single row for each complete piece of information, such as a sentence or definition. Afterward, generate the embeddings to enable natural language querying.
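The one-row-per-fact layout can be pictured with the sketch below. It uses Python's standard `csv` module to stay dependency-free, whereas the project itself installs pandas; the `id` and `text` column names, the sample rows, and the helper names are hypothetical.

```python
import csv
import io


def delimiter_for(filename):
    """Pick the field delimiter from the file extension."""
    return "\t" if filename.lower().endswith(".tsv") else ","


def load_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Each row holds one complete piece of information (hypothetical sample data).
sample_tsv = (
    "id\ttext\n"
    "1\tRefunds are issued within 14 days.\n"
    "2\tThe office opens at 9am.\n"
)

rows = load_rows(sample_tsv, delimiter=delimiter_for("data/handbook.tsv"))
for row in rows:
    print(row["id"], row["text"])
```

Keeping each row self-contained means every embedding represents a single fact, which tends to make the retrieved sections easier for the model to use.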

Configuration

Create different configuration YAML files tailored to various data sets and use-cases. This modular approach allows you to reuse the same source code while easily switching between different configurations.

Helpful Resources

License

This project is open source, under the Unlicense.

Contributing

Contributions are welcome! Please read the contributing guidelines to get started.
