<a href="https://colab.research.google.com/github/Maplemx/Agently/blob/main/playground/long_text_to_qa_pairs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Long Text to Question & Answer Pairs

## Demo Description

**Author:** Agently Team

**Prompt Language:** English

**Agent Components:** None

**Description:**

When we build a vector database from our own knowledge (for example, for retrieval-augmented generation, RAG) or fine-tune a language model, question & answer pairs are more useful than one very long piece of text.

How can we easily generate question & answer pairs from a long text and store them in a structured data format, such as a list of dictionaries that all share the same keys? This demo shows a simple solution powered by the Agently framework.
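The "list of dictionaries" format described above can be sketched as follows. This is only an illustration and not part of the Agently API: the sample pair, the helper names `is_valid_qa_list` and `to_jsonl`, and the choice of JSON Lines as a storage format are all assumptions made for this example.

```python
import json

# Hypothetical sample data in the target shape: a list of dicts,
# each holding a "question" string and an "answer" string.
sample_qa_pairs = [
    {
        "question": "What does this demo produce?",
        "answer": "Question and answer pairs generated from a long text.",
    },
]

def is_valid_qa_list(data):
    """Return True if data is a list of dicts with string 'question' and 'answer' values."""
    if not isinstance(data, list):
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("question"), str)
        and isinstance(item.get("answer"), str)
        for item in data
    )

def to_jsonl(data):
    """Serialize each pair as one JSON object per line (JSON Lines)."""
    return "\n".join(json.dumps(item, ensure_ascii=False) for item in data)

if is_valid_qa_list(sample_qa_pairs):
    print(to_jsonl(sample_qa_pairs))
```

A check like this may be useful before storing model output, since a language model is not guaranteed to return every field in every item.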

## Step 1: Install Packages

In [None]:
!pip install -U Agently
!pip install requests

## Step 2: Demo Code

> ⚠️ Running this demo as-is may take a long time and consume a large number of tokens, which can cost a lot of money, because by default it processes the [Agently README document](http://github.com/Maplemx/Agently), which is a very long markdown document.

If you want to try this demo, make sure you first change `document_link` to point to a shorter markdown document.
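Before spending any tokens, it can help to estimate how many requests a document will trigger. The sketch below follows the same splitting idea as the demo code (group chunks under headings, then merge the groups into pieces of limited length) without calling any model. The function name `split_into_pieces` and the sample text are hypothetical and exist only for this illustration.

```python
def split_into_pieces(document_content, piece_length_control=1000):
    """Split markdown text into pieces, one model request per piece.

    Blank-line-separated chunks are grouped under the heading that precedes
    them, then groups are merged until adding one more would exceed
    piece_length_control characters.
    """
    paragraphs = []
    for chunk in document_content.split("\n\n"):
        if chunk.startswith("#") or not paragraphs:
            paragraphs.append(chunk + "\n\n")
        else:
            paragraphs[-1] += chunk + "\n\n"

    text_pieces = []
    for paragraph in paragraphs:
        if not text_pieces or len(text_pieces[-1]) + len(paragraph) > piece_length_control:
            text_pieces.append(paragraph)
        else:
            text_pieces[-1] += paragraph
    return text_pieces

# Hypothetical short document used only for illustration
sample = "# Title\n\nIntro text.\n\n## Section A\n\nBody A.\n\n## Section B\n\nBody B."
pieces = split_into_pieces(sample, piece_length_control=40)
sleep_time = 5  # same pause between requests as in the demo settings
print("Requests needed:", len(pieces))
print("Total pause time (seconds):", len(pieces) * sleep_time)
```

If the number of requests looks too high, pointing `document_link` at a shorter document or raising `piece_length_control` would reduce it.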

In [16]:
import Agently
import requests
import time

# Model Settings
agent_factory = Agently.AgentFactory()\
    .set_settings("model.OpenAI.auth", { "api_key": "" })\
    .set_settings("model.OpenAI.options", { "model": "gpt-3.5-turbo-16k" })
    # A model with a 16k or larger context window is recommended for this kind of task

# Download document
document_link = "https://raw.githubusercontent.com/Maplemx/Agently/main/README.md"
response = requests.get(document_link, timeout=30)
response.raise_for_status()  # fail early rather than continue with an empty document
document_content = response.content.decode("utf-8")

# Work Settings
piece_length_control = 1000 # target maximum characters per text piece (a single long section may exceed it)
sleep_time = 5 # pause between requests to avoid hitting the API rate limit

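# Chop document: group blank-line-separated chunks under the heading that precedes them
chunks = document_content.split("\n\n")
paragraphs = []
for chunk in chunks:
    # Note: any chunk starting with "#" counts as a heading, including comments inside code samples
    if chunk.startswith("#") or not paragraphs:
        # Also start a new paragraph when the document does not begin with a heading
        paragraphs.append(chunk + "\n\n")
    else:
        paragraphs[-1] += chunk + "\n\n"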

text_pieces = []
for paragraph in paragraphs:
    # Start a new piece when adding this paragraph would exceed the length limit
    if not text_pieces or len(text_pieces[-1]) + len(paragraph) > piece_length_control:
        text_pieces.append(paragraph)
    else:
        text_pieces[-1] += paragraph

# Generate QA Pairs
qa_pairs = []
agent = agent_factory.create_agent()
for text_piece in text_pieces:
    print("[Working on]: ", text_piece.split("\n")[0])
    result = agent\
        .input({"text": text_piece })\
        .instruct("Generate at least 5 question and answer pairs about {text}")\
        .output([{
            "question": ("String", "Question you may ask about {text}"),
            "answer": ("String", "Your answer to {question} according to {text}"),
        }])\
        .start()
    qa_pairs.append({
        "origin_piece": text_piece,
        "qa_pairs": result,
    })
    print("[Done] Start next work in " + str(sleep_time) + " seconds.")
    time.sleep(sleep_time)
print("[All Works Done]\n")

# Print QA Pairs
for item in qa_pairs:
    print("[Origin Text Piece]: \n", item["origin_piece"], end="\n")
    for qa in item["qa_pairs"]:
        print("Question: ", qa["question"])
        print("Answer: ", qa["answer"], end="\n")
    print("------")

[Working on]:  ## **_<font color = "red">Agent</font><font color = "blue">ly</font>_ 3.0 Guidebook**
[Done] Start next work in 5 seconds.
[Working on]:  ### Chat Group & Community
[Done] Start next work in 5 seconds.
[Working on]:  # Interact with the agent instance like calling a function
[Done] Start next work in 5 seconds.
[Working on]:  ### What is AI Agent Native Application?
[Done] Start next work in 5 seconds.
[Working on]:  ### DEMO 1: SQL Generator
[Done] Start next work in 5 seconds.
[Working on]:  ### DEMO 2: Character Creator (and Chat with the Character)
[Done] Start next work in 5 seconds.
[Working on]:  # Create Character
[Done] Start next work in 5 seconds.
[Working on]:  ## Easy to Enhance and Update: Enhance AI Agent using Plugins instead of Rebuild a Whole New Agent
[Done] Start next work in 5 seconds.
[Working on]:  ### Why does **_<font color = "red">Agent</font><font color = "blue">ly</font>_** care plugin-to-enhance so much?
[Done] Start next work in 5 seconds.
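In the log above, some pieces are labeled with lines such as `# Create Character`, which look like comments from code samples rather than document headings; a check based only on `startswith("#")` cannot tell the two apart. One possible refinement, shown here as a sketch under that assumption, tracks whether the current line is inside a fenced code block and treats `#` lines as headings only outside fences. The function name `split_by_headings` and the sample text are hypothetical, and the sketch only handles triple-backtick fences.

```python
FENCE = "`" * 3  # markdown code fence marker, built this way to keep the example readable

def split_by_headings(markdown_text):
    """Group lines into sections that start at headings outside fenced code blocks."""
    sections = []
    in_code_block = False
    for line in markdown_text.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_code_block = not in_code_block  # toggle on every opening or closing fence
            is_heading = False
        else:
            is_heading = line.startswith("#") and not in_code_block
        if is_heading or not sections:
            sections.append(line + "\n")
        else:
            sections[-1] += line + "\n"
    return sections

# Hypothetical sample: the "# Create Character" line is a Python comment, not a heading
sample = "\n".join([
    "## Demo",
    "",
    FENCE + "python",
    "# Create Character",
    "agent = None",
    FENCE,
    "",
    "## Next Section",
    "More text.",
])
for section in split_by_headings(sample):
    print("[Section starts with]:", section.split("\n")[0])
```

The resulting sections could then be merged into text pieces with the same length check the demo uses.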

---

[**_<font color = "red">Agent</font><font color = "blue">ly</font>_** Framework - Speed up your AI Agent Native application development](https://github.com/Maplemx/Agently)