# Example of generating QAs for IRS PDF
In this example, we show how to generate question-answer pairs (QAs) from a PDF using Hugging Face models via `uniflow`'s [TransformQAHuggingFaceJsonFormatConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/config.py#L168).

For this example, we're using [IRS 2023](https://www.irs.gov/pub/irs-pdf/p535.pdf).

### Before running the code

You will need the `uniflow` conda environment to run this notebook. You can set it up by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.

For more details on the Hugging Face model configuration, see these [instructions](https://github.com/CambioML/uniflow/tree/main?tab=readme-ov-file#huggingfacemodelconfig).

Finally, we store the IRS dataset in the `data/raw_input` directory as `IRS_2023.pdf`. You can download the file from [here](https://www.irs.gov/pub/irs-pdf/p535.pdf).
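If you prefer to fetch the file from code, the download step can be sketched as below. This is a minimal sketch, not part of `uniflow`: the helper name `ensure_pdf` and its `fetch` parameter are hypothetical, and it assumes the URL above is still reachable.

```python
from pathlib import Path
from urllib.request import urlretrieve

PDF_URL = "https://www.irs.gov/pub/irs-pdf/p535.pdf"  # URL given in this notebook


def ensure_pdf(dest_dir, filename="IRS_2023.pdf", url=PDF_URL, fetch=urlretrieve):
    """Return the local path to the PDF, downloading it only if it is missing.

    `fetch` is a hypothetical hook (defaults to urllib's urlretrieve) so the
    download can be replaced in tests or offline environments.
    """
    dest = Path(dest_dir) / filename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(dest))  # urlretrieve(url, filename) writes the file to disk
    return dest
```

Calling `ensure_pdf("data/raw_input")` from the notebook's directory would place the file where the cells below expect it.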

### Update system path

In [1]:
%reload_ext autoreload
%autoreload 2

import sys

sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")

### Install helper packages

In [2]:
!{sys.executable} -m pip install langchain pandas pypdf



### Import dependencies

In [31]:
import os

import pandas as pd
from dotenv import load_dotenv

from uniflow.flow.client import ExtractClient, TransformClient
from uniflow.flow.config import ExtractPDFConfig, TransformQAHuggingFaceJsonFormatConfig
from uniflow.op.extract.split.constants import MARKDOWN_HEADER_SPLITTER
from uniflow.op.model.model_config import HuggingfaceModelConfig, NougatModelConfig
from uniflow.op.prompt import PromptTemplate, Context

load_dotenv()

False

### Prepare the input data
First, we need to pre-process the PDF to get text chunks that we can feed into the model. We will use `uniflow`'s `ExtractPDFConfig`.

#### Load PDF

In [4]:
pdf_file = "IRS_2023.pdf"

##### Set current directory and input data directory.

In [5]:
dir_cur = os.getcwd()
input_file = os.path.join(dir_cur, "data", "raw_input", pdf_file)

#### Set correct path

In [6]:
data = [
    {"filename": input_file},
]

from pprint import pprint

pprint(data)

[{'filename': '/home/ubuntu/uniflow/example/transform/data/raw_input/IRS_2023.pdf'}]


#### Create extract_config

In [7]:
extract_config = ExtractPDFConfig(
    model_config=NougatModelConfig(
        model_name="0.1.0-small",
        batch_size=1,  # When batch_size>1, nougat will run on CUDA, otherwise it will run on CPU
    ),
    splitter=MARKDOWN_HEADER_SPLITTER,
)

#### Create extract_client

In [8]:
extract_client = ExtractClient(extract_config)



#### Run extract_client

In [9]:
extract_output = extract_client.run(data)

  0%|          | 0/1 [00:00<?, ?it/s]

INFO: likely hallucinated title at the end of the page: ## Costs You Can Deduct or Capitalize Page 27


100%|██████████| 1/1 [08:53<00:00, 533.49s/it]


#### Prepare sample prompts

Next, we tell the LLM what to generate. We do this by passing a sample instruction to the `PromptTemplate` class.

In [11]:
sample_instruction = """Assume you are an expert on tax, please generate as many question as possible based on the context. 
Make sure those questions can cover any question people can think of by reading the context."""

guided_prompt = PromptTemplate(instruction=sample_instruction)
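To make the role of the instruction concrete, here is a minimal sketch of how an instruction and one context chunk could be combined into the text sent to the model. The `instruction: ...` / `context: ...` layout mirrors what is echoed back in the raw responses later in this notebook; the function `render_prompt` is hypothetical and is not `uniflow`'s actual implementation.

```python
def render_prompt(instruction, context):
    """Join an instruction and one context chunk in a 'key: value' layout.

    Assumption: this only imitates the layout visible in the echoed model
    responses; uniflow builds its prompts internally.
    """
    return f"instruction: {instruction}\ncontext: {context}"


prompt = render_prompt(
    "Generate questions based on the context.",
    "## Future Developments",
)
print(prompt)
```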

In [13]:
input_context = [Context(context=ctx) for ctx in extract_output[0]["output"][0]["text"]]

print("sample size of processed input data: ", len(input_context))

input_context[:2]

sample size of processed input data:  197


[Context(context="**Publication 535**\n**Publication 535**\npublication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed.\n_Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications.\n_Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online."),
 Context(context='## Future Developments\nFor the latest information about developments related to Pub. 535, such as legislation enacted after it was published, go to _IRS
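The indexing `extract_output[0]["output"][0]["text"]` used above picks the chunk list for the first (and only) input file. As a rough sketch of that nesting, the snippet below uses a hand-written stand-in for `extract_output` and gathers the chunks from every file. The helper `collect_chunks` and the stand-in data are hypothetical, and the shape is assumed only from the indexing in the cell above.

```python
# Stand-in for extract_client.run(data): one entry per input file, with the
# split text chunks stored under ["output"][0]["text"].
mock_extract_output = [
    {"output": [{"text": ["**Publication 535**", "## Future Developments"]}]},
]


def collect_chunks(extract_output):
    """Flatten the text chunks of all files into a single list."""
    chunks = []
    for file_result in extract_output:
        for item in file_result.get("output", []):
            chunks.extend(item.get("text", []))
    return chunks


chunks = collect_chunks(mock_extract_output)
print(len(chunks))
```

With several PDFs in `data`, a flattening step like this would let you wrap every chunk in a `Context` rather than only those of the first file.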

### Use LLM to generate data

In this example, we use the LLM behind `TransformQAHuggingFaceJsonFormatConfig` to generate questions and answers. The config and `TransformClient` were already imported above.

Here, we pass in our `guided_prompt` to the `TransformQAHuggingFaceJsonFormatConfig` to use our customized instructions, instead of the `uniflow` default ones.

We also want to get the response in the `json` format instead of the `text` default, so we set the `response_format` to `json_object`.

You can adjust the `batch_size` based on the size of the data.
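As a rough illustration of what `batch_size` changes, the sketch below groups a list of items into batches and counts them. The helpers `make_batches` and `count_batches` are hypothetical and only illustrate the arithmetic; how `uniflow` batches internally is an assumption here.

```python
import math


def count_batches(num_items, batch_size):
    """Number of batches needed to cover num_items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_items / batch_size)


def make_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


print(count_batches(197, 1))  # one batch per context chunk
print(count_batches(197, 4))
```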

In [12]:
question_config = TransformQAHuggingFaceJsonFormatConfig(
    prompt_template=guided_prompt,
    model_config=HuggingfaceModelConfig(
        batch_size=1,
        response_start_key="question",
        response_format={"type": "json_object"},
    ),
)
question_client = TransformClient(question_config)

Loading checkpoint shards: 100%|██████████| 3/3 [00:04<00:00,  1.60s/it]


Now we call the `run` method on the `question_client` object to execute the question generation operation on the data shown above.

In [14]:
output_quesiton = question_client.run(input_context)

  0%|          | 0/197 [00:00<?, ?it/s]

100%|██████████| 197/197 [4:56:10<00:00, 90.20s/it]   
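The progress bar above reports roughly 90 seconds per chunk, which adds up to almost five hours for 197 chunks. A small sketch of that arithmetic can help you budget time before starting a run on your own document; the function name `estimate_runtime` is hypothetical, and the per-chunk time depends heavily on your hardware and model.

```python
from datetime import timedelta


def estimate_runtime(num_chunks, seconds_per_chunk):
    """Estimate the total wall-clock time for a sequential run."""
    return timedelta(seconds=round(num_chunks * seconds_per_chunk))


# Values taken from the progress bar above: 197 chunks at 90.20 s each.
print(estimate_runtime(197, 90.20))
```

The estimate comes out within a second of the 4:56:10 shown in the progress bar; the small gap is due to the rounded per-chunk time.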


In [16]:
output_quesiton

[{'output': [{'response': ["instruction: Assume you are an expert on tax, please generate as many question as possible based on the context. \nMake sure those questions can cover any question people can think of by reading the context.\ncontext: **Publication 535**\n**Publication 535**\npublication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed.\n_Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications.\n_Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent 

### Process the output

Let's take a look at the generated output. We need to do a little postprocessing on the raw output.

In [68]:
# Extracting context, question, and answer into a DataFrame
contexts = []
questions = []
answers = []

In [69]:
for output in output_quesiton:
    for item in output.get("output", []):
        for response in item.get("response", []):
            # Each response is a block of "key: value" lines; lines without
            # a colon continue the value of the most recent key.
            parts = response.split("\n")
            response_dict = {}
            last_key = None

            for part in parts:
                if ":" in part:
                    key, value = part.split(":", 1)
                    key = key.strip()
                    value = value.strip()
                    response_dict[key] = [value]
                    last_key = key
                elif last_key is not None:
                    if len(part) == 0:
                        continue
                    response_dict[last_key].append(part)

            # Skip responses that lack any of the expected sections.
            if any(
                key not in response_dict
                for key in ["instruction", "context", "question"]
            ):
                continue

            contexts.append(response_dict["context"])
            questions.append(response_dict["question"])

In [None]:
pd.set_option("display.max_colwidth", 1000)
pd.set_option("display.width", 1000)

df = pd.DataFrame(
    {
        "Context": [
            " ".join(context) if isinstance(context, list) else context
            for context in contexts
        ],
        "Question": questions,
    }
)

df = df.explode("Question")
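`explode` turns each row holding a list of questions into one row per question, repeating the context. The plain-Python sketch below mimics that reshaping on toy data so the effect is easy to see; `explode_rows` is a hypothetical helper, not a pandas function. Unlike pandas, which emits a NaN row for an empty list, this sketch simply produces no rows for it.

```python
def explode_rows(contexts, questions):
    """Return one {'Context', 'Question'} row per question."""
    rows = []
    for context_lines, question_list in zip(contexts, questions):
        # Join multi-line contexts with spaces, as in the DataFrame above.
        if isinstance(context_lines, list):
            context = " ".join(context_lines)
        else:
            context = context_lines
        for question in question_list:
            rows.append({"Context": context, "Question": question})
    return rows


rows = explode_rows(
    [["**Publication 535**", "intro"]],
    [["", "1. First question?", "2. Second question?"]],
)
print(len(rows))
```

The empty first question in the toy data mirrors the blank `Question` entry that appears at the top of the DataFrame output in this notebook.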

In [92]:
df

Unnamed: 0,Context,Question
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",1. Where can I find specific tax topics in Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",2. How do I use the search feature on the IRS Interactive Tax Assistant (ITA) page to find tax topics related to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",3. What categories of tax topics are available on the IRS Interactive Tax Assistant (ITA) page that relate to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","4. Can I download both current and prior-year forms, instructions, and publications related to Publication 535 from IRS.gov/Forms?"
...,...,...
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",36. What are some common misconceptions about tax filing requirements?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",37. How does percentage depletion work for oil and gas wells?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",38. What is the difference between exploratory expenses and development costs for tax purposes?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",39. What are the key differences between partnerships and corporations for tax purposes?


In [94]:
df = df.drop_duplicates()
df

Unnamed: 0,Context,Question
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",1. Where can I find specific tax topics in Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",2. How do I use the search feature on the IRS Interactive Tax Assistant (ITA) page to find tax topics related to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",3. What categories of tax topics are available on the IRS Interactive Tax Assistant (ITA) page that relate to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","4. Can I download both current and prior-year forms, instructions, and publications related to Publication 535 from IRS.gov/Forms?"
...,...,...
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",36. What are some common misconceptions about tax filing requirements?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",37. How does percentage depletion work for oil and gas wells?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",38. What is the difference between exploratory expenses and development costs for tax purposes?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",39. What are the key differences between partnerships and corporations for tax purposes?


In [96]:
df = df[df["Question"].str.strip().astype(bool)]
df

Unnamed: 0,Context,Question
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",1. Where can I find specific tax topics in Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",2. How do I use the search feature on the IRS Interactive Tax Assistant (ITA) page to find tax topics related to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",3. What categories of tax topics are available on the IRS Interactive Tax Assistant (ITA) page that relate to Publication 535?
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","4. Can I download both current and prior-year forms, instructions, and publications related to Publication 535 from IRS.gov/Forms?"
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","5. How long does it take for the IRS to process my request for ordering current forms, instructions, and publications through their website?"
...,...,...
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",36. What are some common misconceptions about tax filing requirements?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",37. How does percentage depletion work for oil and gas wells?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",38. What is the difference between exploratory expenses and development costs for tax purposes?
195,"### How Can You Learn About Your Taxpayer Rights? The Taxpayer Bill of Rights describes 10 basic rights that all taxpayers have when dealing with the IRS. Go to _Taxpayer.ak/.pick/.pick_ to help you understand what these rights mean to you and how they apply. These are _your_ rights. Know them. Use them. You can find a list of your rights and the IRS's obligations to protect them in _Pub.L.Y. Your Rights as a Taxayer_. It includes the following. 1. **The Right To Be Informed.** Taxayers have the right to know what they need to do to comply with the tax laws. They are entitled to clear explanations of the laws and IRS procedures in all tax forms, instructions, publications, notices, and correspondence. They have the right to be informed of IRS decisions about their tax accounts and to receive clear explanations of the outcomes. 2. **The Right to Quality Service.** Taxayers have the right to receive prompt, courous, and professional assistance in their dealings with the IRS, to be sp...",39. What are the key differences between partnerships and corporations for tax purposes?


#### Optional: save the output now and retrieve it later to finish the rest of the generation

In [97]:
# df.to_pickle('my_dataframe.pkl')

In [4]:
# df_new = pd.read_pickle('my_dataframe.pkl')

In [26]:
# df_new

In [27]:
# df = df_new
# df

#### Create the prompt and instruction for answer generation

In [12]:
answer_instruction = """
Based on the context provided, generate an answer that directly addresses the question. Start your response with the question number followed by a period and a space. For example, if the question is number 1, begin your answer with '1. ' followed by the response.
"""


answer_prompt = PromptTemplate(instruction=answer_instruction)

print("answer_instruction:")
print(answer_instruction, "\n")

answer_instruction:

Based on the context provided, generate an answer that directly addresses the question. Start your response with the question number followed by a period and a space. For example, if the question is number 1, begin your answer with '1. ' followed by the response.
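
The instruction asks the model to start each answer with the question number followed by a period and a space. The following is a minimal sketch of how that convention could be checked after generation. The helper names `leading_number` and `numbering_matches` are hypothetical and are not part of `uniflow`.

```python
import re


def leading_number(text):
    """Return the integer in a leading '<number>. ' prefix, or None if absent."""
    match = re.match(r"\s*(\d+)\.\s", text)
    return int(match.group(1)) if match else None


def numbering_matches(question, answer):
    """True when the answer starts with the same number as the question."""
    question_number = leading_number(question)
    return question_number is not None and question_number == leading_number(answer)


# Illustrative strings only; real pairs come from the DataFrame built later.
print(numbering_matches("1. Where can I find tax topics?", "1. On the ITA page."))
print(numbering_matches("2. How do I search?", "Use the search box."))
```

A check like this could be used to flag answers where the model ignored the numbering instruction before the pairs are saved.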
 



#### TransformConfig for answer

In [13]:
answer_config = TransformQAHuggingFaceJsonFormatConfig(
    prompt_template=answer_prompt,
    model_config=HuggingfaceModelConfig(
        batch_size=16,
        response_start_key="answer",
        response_format={"type": "json_object"},
    ),
)
answer_client = TransformClient(answer_config)

Loading checkpoint shards: 100%|██████████| 3/3 [00:04<00:00,  1.50s/it]


#### Format data to feed into answer_client

In [10]:
input_question = [
    Context(
        context=row["Context"],
        question=row["Question"],
    )
    for index, row in df.iterrows()
]

print("sample size of processed input data: ", len(input_question))

input_question[:2]

sample size of processed input data:  2622


[Context(context="**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online. ", question='1. Where can I find specific tax topics in Publication 535?'),
 Context(context="**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ sec

#### `run` the `answer_client`

In [11]:
output_answer = answer_client.run(input_question)

100%|██████████| 164/164 [2:52:01<00:00, 62.93s/it]   
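
The progress bar above reports 164 iterations, which is consistent with splitting the 2622 inputs into batches of 16. The sketch below reproduces that arithmetic. It assumes each progress-bar iteration corresponds to one batch, which this notebook does not state explicitly.

```python
import math

num_inputs = 2622  # "sample size of processed input data" printed above
batch_size = 16  # batch_size passed to HuggingfaceModelConfig

# Assumption: one progress-bar iteration equals one batch.
num_batches = math.ceil(num_inputs / batch_size)
last_batch_size = num_inputs - (num_batches - 1) * batch_size

seconds_per_batch = 62.93  # average reported by the progress bar
total_seconds = int(num_batches * seconds_per_batch)
hours, remainder = divmod(total_seconds, 3600)
minutes, _ = divmod(remainder, 60)

print(num_batches, last_batch_size)
print(f"about {hours} h {minutes} min")
```

The same calculation gives a rough runtime estimate before starting a run with a different input size or batch size.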


#### Process the output

In [14]:
contexts = []
questions = []
answers = []

for output in output_answer:
    for item in output.get("output", []):
        for response in item.get("response", []):
            # Each response is expected to consist of "key: value" lines.
            parts = response.split("\n")
            response_dict = {}
            last_key = None

            for part in parts:
                if not part:
                    continue
                if ":" in part:
                    key, value = part.split(":", 1)
                    key, value = key.strip(), value.strip()
                    if key not in response_dict:
                        response_dict[key] = value
                    else:
                        print("duplicate values")
                    last_key = key
                elif last_key is not None:
                    # Continuation line: append it to the most recent key.
                    response_dict[last_key] += " " + part

            if any(
                key not in response_dict
                for key in ["instruction", "context", "question", "answer"]
            ):
                continue

            contexts.append(response_dict["context"])
            questions.append(response_dict["question"])
            answers.append(response_dict["answer"])
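
The parsing step above can be sketched as a standalone function and tried on a sample string. The layout of `sample` is an assumption inferred from the keys the loop checks for (`instruction`, `context`, `question`, `answer`); it is not an actual response from this run, and `parse_response` is a hypothetical helper name.

```python
def parse_response(response):
    """Parse 'key: value' lines; lines without a colon extend the previous key."""
    fields = {}
    last_key = None
    for line in response.split("\n"):
        if not line:
            continue
        if ":" in line:
            key, value = (piece.strip() for piece in line.split(":", 1))
            if key not in fields:
                fields[key] = value
            last_key = key
        elif last_key is not None:
            # Skip continuation lines that appear before any key is seen.
            fields[last_key] += " " + line
    return fields


# Hypothetical response text, for illustration only.
sample = (
    "instruction: Answer the question.\n"
    "context: TAS is an independent organization\n"
    "within the IRS.\n"
    "question: 1. What is TAS?\n"
    "answer: 1. An independent organization within the IRS."
)
parsed = parse_response(sample)
print(parsed["context"])
print(parsed["answer"])
```

One limitation of this approach is that a continuation line containing a colon is treated as a new key, so multi-line answers with colons may be split into spurious fields.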

In [15]:
pd.set_option("display.max_colwidth", 1000)
pd.set_option("display.width", 100)

print(len(contexts))
print(len(questions))
print(len(answers))

df = pd.DataFrame({"Context": contexts, "Question": questions, "Answer": answers})

2368
2368
2368


In [16]:
df

Unnamed: 0,Context,Question,Answer
0,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",1. Where can I find specific tax topics in Publication 535?,1. You can find specific tax topics in Publication 535 by going to the IRS Interactive Tax Assistant page at IRS.gov ( Hela/ITA ). Use the search feature or view the categories listed to find topics related to your query.
1,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",2. How do I use the search feature on the IRS Interactive Tax Assistant (ITA) page to find tax topics related to Publication 535?,"2. To use the search feature on the IRS Interactive Tax Assistant (ITA) page to find tax topics related to Publication 535, follow these steps: 1. Go to the IRS Interactive Tax Assistant (ITA) page at IRS.gov/Help/ITA. 2. In the Search box located in the top right corner of the page, type ""Publication 535"" or specific keywords related to the topic within the publication. 3. Press Enter or click the magnifying glass icon next to the search box to initiate the search. 4. Review the results displayed on the screen and select the relevant tax topic to access more information and resources."
2,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.",3. What categories of tax topics are available on the IRS Interactive Tax Assistant (ITA) page that relate to Publication 535?,"3. On the IRS Interactive Tax Assistant (ITA) page, the following categories related to Publication 535 may be found: ""Getting tax help,"" which includes topics on how to obtain assistance with tax issues, including information about Publication 535 itself. Other relevant categories might include ""Tax credits and deductions"" or ""Retirement plans and IRAs."" Use the search feature on the IRS Interactive Tax Assistant page to explore these and other potential categories in more detail."
3,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","4. Can I download both current and prior-year forms, instructions, and publications related to Publication 535 from IRS.gov/Forms?","4. Yes, you can download both current and prior-year forms, instructions, and publications related to Publication 535 from IRS.gov/Forms."
4,"**Publication 535** **Publication 535** publication or the _How To Get Tax Help_ section at the end of this publication, go to the IRS Interactive Tax Assistant page at _IRS.gov_ _Hela/ITA_ where you can find topics by using the search feature or viewing the categories listed. _Getting tax forms, instructions, and publications_. Go to _IRS.gov/Forms_ to download current and prior-year forms, instructions, and publications. _Ordering tax forms, instructions, and publications._ Go to _IRS.gov/OrderForms_ to order current forms, instructions, and publications; call 800-829-3676 to order prior-year forms and instructions. The IRS will process your order for forms and publications as soon as possible. **Don't** resmbur requests you already sent us. You can get forms and publications faster online.","5. How long does it take for the IRS to process my request for ordering current forms, instructions, and publications through their website?","5. There is no specific time mentioned in the context for the IRS to process a request for ordering current forms, instructions, and publications through their website. However, it is suggested that individuals can get these materials faster online compared to ordering them."
...,...,...,...
2363,### What Is TAS? TAS is an _Independent_ organization within the IRS that helps taxpayers and protects taxpayer rights. Their job is to ensure that every taxpayer is treated fairly and that you know and understand your rights under the _Taxpayer Bill of Rights_.,5. What types of issues can taxpayers bring before TAS for resolution?,"5. Taxpayers can bring various issues before TAS for resolution, including but not limited to: disputes involving income tax, payroll tax, estate and gift tax, and collection matters. They can also seek assistance in resolving identity theft cases or appeals related to penalties or interest assessments. Additionally, TAS may help taxpayers who believe they have received incorrect refunds or overassessments. However, it's important to note that TAS does not have the authority to change the law or grant amnesty from taxes owed."
2364,### What Is TAS? TAS is an _Independent_ organization within the IRS that helps taxpayers and protects taxpayer rights. Their job is to ensure that every taxpayer is treated fairly and that you know and understand your rights under the _Taxpayer Bill of Rights_.,6. How long does it typically take for TAS to review and respond to a case once it has been submitted?,"6. The length of time it takes for TAS to review and respond to a case can vary greatly depending on the complexity of the issue and the volume of cases they are currently handling. On average, it may take anywhere from several months to over a year for a decision to be made. However, TAS strives to resolve each case as quickly as possible while ensuring thoroughness and fairness. It's important to note that the processing time can also depend on whether additional information or documentation is required from the taxpayer during the review process."
2365,### What Is TAS? TAS is an _Independent_ organization within the IRS that helps taxpayers and protects taxpayer rights. Their job is to ensure that every taxpayer is treated fairly and that you know and understand your rights under the _Taxpayer Bill of Rights_.,7. Are there any costs associated with filing a request with TAS for help with a tax issue?,"7. No, there are no fees or costs associated with filing a request for assistance with Taxpayer Advocacy Service (TAS) regarding a tax issue. The service is free to all eligible taxpayers. However, keep in mind that some actions taken as part of resolving a tax matter may require payment, such as owing back taxes or penalties. But the act of contacting TAS itself does not come with any cost."
2366,### What Is TAS? TAS is an _Independent_ organization within the IRS that helps taxpayers and protects taxpayer rights. Their job is to ensure that every taxpayer is treated fairly and that you know and understand your rights under the _Taxpayer Bill of Rights_.,8. Does contacting TAS affect any ongoing audit or collection process with the IRS?,"8. Contacting the Taxpayer Advocacy Service (TAS) does not automatically stop any ongoing audits or collections processes with the Internal Revenue Service (IRS). However, TAS may be able to help taxpayers resolve issues they have with the IRS, including those related to audits and collections. If you are concerned about how contacting TAS might impact your specific situation, it's best to consult with a tax professional or contact TAS directly for guidance."


In [20]:
df.to_csv("output.csv", index=False)
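
Many contexts and answers contain commas and double quotes, so the CSV relies on quoting to stay readable. The sketch below uses Python's standard `csv` module on an in-memory buffer to show that such fields survive a write and read round trip. The row is a made-up example, and it is an assumption that pandas' default `to_csv` quoting behaves the same way.

```python
import csv
import io

# Made-up row containing a comma and embedded double quotes.
rows = [
    {
        "Context": "Go to IRS.gov/Forms, then search.",
        "Question": "1. Where can I download forms?",
        "Answer": '1. Type "Publication 535" in the search box.',
    }
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Context", "Question", "Answer"])
writer.writeheader()
writer.writerows(rows)

# Read the same text back and compare it with the original rows.
buffer.seek(0)
restored = [dict(row) for row in csv.DictReader(buffer)]
print(restored == rows)
```

Reading `output.csv` back once after saving is a cheap way to confirm that no rows were split or merged by stray delimiters.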