-
Notifications
You must be signed in to change notification settings - Fork 268
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
4 changed files
with
302 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,262 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%reload_ext autoreload\n", | ||
"%autoreload 2\n", | ||
"%matplotlib inline\n", | ||
"import os\n", | ||
"os.environ[\"CUDA_DEVICE_ORDER\"]=\"PCI_BUS_ID\";\n", | ||
"os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"; " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Generative Question-Answering in `ktrain` Using OpenAI Models\n", | ||
"\n", | ||
"As of v0.37.x of **ktrain** supports **Generative Question-Answering** using OpenAI's models like GPT-3.5-turbo. You can get an API key at the [OpenAI website](https://platform.openai.com/account/api-keys) and set it in the cell below.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"os.environ['OPENAI_API_KEY'] = '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'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This notebook won't incur very many charges, but you\n", | ||
"# can go to openai.com to view incurred charges from API calls\n", | ||
"os.environ['OPENAI_API_KEY'] = 'ENTER YOUR OPENAI API KEY HERE'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from ktrain .text.qa import GenerativeQA\n", | ||
"genqa = GenerativeQA()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Let's download the ktrain paper from ArXiv and extract text from it using the `TextExtractor`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!wget --user-agent=\"Mozilla\" https://arxiv.org/pdf/2004.10703.pdf -O /tmp/downloaded_paper.pdf -q\n", | ||
"from ktrain.text.textextractor import TextExtractor\n", | ||
"text = TextExtractor().extract('/tmp/downloaded_paper.pdf')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Adding documents to the index\n", | ||
"Although we could add the document suppyling the path to the downloaded PDF paper directly, we will instead just use the extracted text." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"genqa.add_doc(text=text)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Let's submit a query\n", | ||
"\n", | ||
"The `GenerativeQA` module will return an answer with citations to documents in your index (in our case, there is only one). The `GenerativeQA` model is a simple wrapper to the `paper-qa` package. By default, citations are in the form of MD5 hashes of the text supplied as input. You can supply custom citations and citation keys by supplying the `citation` and `key` parameters to `add_doc`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Question: What is ktrain?\n", | ||
"\n", | ||
"Ktrain is a low-code Python library designed to make machine learning more accessible and easier to apply for both beginners and experienced practitioners. It provides a simple unified interface enabling one to quickly solve a wide range of tasks in as little as three or four \"commands\" or lines of code. Ktrain can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) and includes out-of-the-box support for text data (e.g., text classification, sequence tagging, open-domain question-answering), vision data (e.g., image classification), graph data (e.g., node classification, link prediction), and tabular data. (md5:1daab15d256e4843ffe094079711bf9c)\n", | ||
"\n", | ||
"However, it should be noted that the provided context only provides a brief overview of ktrain and its capabilities. For more detailed information, it is recommended to refer to the official documentation and resources.\n", | ||
"\n", | ||
"References\n", | ||
"\n", | ||
"1. (md5:1daab15d256e4843ffe094079711bf9c): Document md5:1daab15d256e4843ffe094079711bf9c\n", | ||
"\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(genqa.query('What is ktrain?'))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Save the current state of the document index and other data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"genqa.save('/tmp/my_generative_qa')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Re-load the document index" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"genqa = GenerativeQA()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"genqa.load('/tmp/my_generative_qa')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Question: What is ktrain?\n", | ||
"\n", | ||
"Ktrain is a low-code Python library that simplifies the process of building, training, inspecting, and applying machine learning models. It provides a unified interface that enables both beginners and experienced practitioners to quickly solve a wide range of tasks with just a few lines of code. Ktrain can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) and includes out-of-the-box support for text data, vision data, graph data, and tabular data. The text also mentions that ktrain provides examples of both supervised and non-supervised machine learning tasks, including named entity recognition, node classification with graph neural networks, theme discovery, and zero-shot topic classification. (md5:1daab15d256e4843ffe094079711bf9c) However, it is not clear from the context what specific machine learning models are supported by ktrain.\n", | ||
"\n", | ||
"References\n", | ||
"\n", | ||
"1. (md5:1daab15d256e4843ffe094079711bf9c): Document md5:1daab15d256e4843ffe094079711bf9c\n", | ||
"\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(genqa.query('What is ktrain?'))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Delete the document index to start over" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"are you sure you want to delete the vector index? (y/n)y\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"genqa.clear_index()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Since the documents were deleted, there is no longer data to answer the question" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Question: What is ktrain?\n", | ||
"\n", | ||
"I cannot answer this question due to insufficient information.\n", | ||
"\n", | ||
"\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(genqa.query('What is ktrain?'))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "python3", | ||
"language": "python", | ||
"name": "python3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters