In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

## Generative Question-Answering in `ktrain` Using OpenAI Models

As of v0.37.x, **ktrain** supports **Generative Question-Answering** using OpenAI models such as GPT-3.5-turbo. You can get an API key at the [OpenAI website](https://platform.openai.com/account/api-keys) and set it in the cell below.


In [None]:
# Running this notebook should incur only small charges; you
# can go to openai.com to view the charges incurred by API calls.
os.environ['OPENAI_API_KEY'] = 'ENTER YOUR OPENAI API KEY HERE'
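
As an alternative to pasting the key into the notebook, the key can be read from the environment and prompted for only when it is missing. The following is a minimal sketch; the helper name `ensure_openai_key` and its parameters are hypothetical and are not part of **ktrain**.

```python
import os
from getpass import getpass


def ensure_openai_key(env=os.environ, ask=getpass):
    """Return the OpenAI API key, prompting for it only if it is not already set.

    `env` and `ask` are injectable so the helper can be exercised without
    touching the real environment or blocking on user input.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # getpass hides the typed key so it is not echoed into the notebook output
        key = ask("OpenAI API key: ").strip()
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_key()` before constructing `GenerativeQA` leaves `OPENAI_API_KEY` set in `os.environ`, which is the same variable the cell above assigns.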

In [None]:
from ktrain.text.qa import GenerativeQA
genqa = GenerativeQA()

#### Let's download the ktrain paper from arXiv and extract text from it using the `TextExtractor`

In [None]:
!wget --user-agent="Mozilla" https://arxiv.org/pdf/2004.10703.pdf -O /tmp/downloaded_paper.pdf -q
from ktrain.text.textextractor import TextExtractor
text = TextExtractor().extract('/tmp/downloaded_paper.pdf')

#### Adding documents to the index
Although we could add the document by supplying the path to the downloaded PDF directly, we will instead use the extracted text.

In [None]:
genqa.add_doc(text=text)

#### Let's submit a query

The `GenerativeQA` module returns an answer with citations to documents in your index (in our case, there is only one). `GenerativeQA` is a simple wrapper around the `paper-qa` package. By default, citations are in the form of MD5 hashes of the text supplied as input. You can provide custom citations and citation keys through the `citation` and `key` parameters of `add_doc`.
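
To make the default citation format concrete, here is a minimal sketch of how an `md5:<hexdigest>` style key can be derived from a piece of text with Python's standard `hashlib`. The helper name `md5_citation_key` is hypothetical, and the exact encoding and normalization that **ktrain** applies before hashing are assumptions, so the result is not guaranteed to match the keys shown in the output below.

```python
import hashlib


def md5_citation_key(text: str) -> str:
    """Build an illustrative 'md5:<hexdigest>' key from the given text.

    Assumption: the text is hashed as UTF-8 bytes with no other preprocessing.
    """
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return f"md5:{digest}"


print(md5_citation_key("hello"))  # md5:5d41402abc4b2a76b9719d911017c592

# With a readable citation and key supplied instead (placeholder values):
# genqa.add_doc(text=text, citation="<your citation string>", key="<your key>")
```

Supplying a readable `citation` and `key` makes the References section of an answer easier to trace back to a source than a hash.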

In [None]:
print(genqa.query('What is ktrain?'))

Question: What is ktrain?

Ktrain is a low-code Python library designed to make machine learning more accessible and easier to apply for both beginners and experienced practitioners. It provides a simple unified interface enabling one to quickly solve a wide range of tasks in as little as three or four "commands" or lines of code. Ktrain can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) and includes out-of-the-box support for text data (e.g., text classification, sequence tagging, open-domain question-answering), vision data (e.g., image classification), graph data (e.g., node classification, link prediction), and tabular data. (md5:1daab15d256e4843ffe094079711bf9c)

However, it should be noted that the provided context only provides a brief overview of ktrain and its capabilities. For more detailed information, it is recommended to refer to the official documentation and resources.

References

1. (md5:1daab15d256e4843ffe094079711bf9c): Document md

#### Save the current state of the document index and other data

In [None]:
genqa.save('/tmp/my_generative_qa')

#### Re-load the document index

In [None]:
genqa = GenerativeQA()

In [None]:
genqa.load('/tmp/my_generative_qa')

In [None]:
print(genqa.query('What is ktrain?'))

Question: What is ktrain?

Ktrain is a low-code Python library that simplifies the process of building, training, inspecting, and applying machine learning models. It provides a unified interface that enables both beginners and experienced practitioners to quickly solve a wide range of tasks with just a few lines of code. Ktrain can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) and includes out-of-the-box support for text data, vision data, graph data, and tabular data. The text also mentions that ktrain provides examples of both supervised and non-supervised machine learning tasks, including named entity recognition, node classification with graph neural networks, theme discovery, and zero-shot topic classification. (md5:1daab15d256e4843ffe094079711bf9c) However, it is not clear from the context what specific machine learning models are supported by ktrain.

References

1. (md5:1daab15d256e4843ffe094079711bf9c): Document md5:1daab15d256e4843ffe0940

#### Delete the document index to start over

In [None]:
genqa.clear_index()

are you sure you want to delete the vector index? (y/n)y


#### Since the documents were deleted, the index no longer contains data to answer the question

In [None]:
print(genqa.query('What is ktrain?'))

Question: What is ktrain?

I cannot answer this question due to insufficient information.


