Commit

Merge branch 'develop'
amaiya committed Oct 13, 2021
2 parents a882d96 + eace7d6 commit 72a0299
Showing 20 changed files with 1,567 additions and 47 deletions.
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,19 @@ Most recent releases are shown at the top. Each release shows:
- **Changed**: Additional parameters, changes to inputs or outputs, etc
- **Fixed**: Bug fixes that don't change documented behaviour

## 0.28.0 (2021-10-13)

### New:
- `text.AnswerExtractor` is a universal information extractor powered by a Question-Answering module and capable of extracting user-specified information from texts.
- `text.TextExtractor` is a text extraction pipeline (e.g., convert PDFs to plain text).

### Changed
- changed transformers pin to `transformers>=4.0.0,<=4.10.3`
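
To check whether a given `transformers` version falls inside this pin, a minimal sketch along the following lines may help. The helpers `parse_version` and `satisfies_pin` are made up for illustration and are not part of ktrain. The parser only handles plain dotted numeric versions (no pre-release or local tags), whereas pip applies the full version-specifier rules.

```python
# Illustrative check of the pin `transformers>=4.0.0,<=4.10.3`.
# `parse_version` and `satisfies_pin` are hypothetical helpers, not ktrain APIs.

def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '4.10.3' into the tuple (4, 10, 3)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str, lower: str = "4.0.0", upper: str = "4.10.3") -> bool:
    """Return True when lower <= version <= upper (both bounds inclusive)."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


# Comparing integer tuples avoids the string-comparison trap where "4.9" > "4.10".
for candidate in ("3.5.1", "4.0.0", "4.9.2", "4.10.3", "4.11.0"):
    print(candidate, satisfies_pin(candidate))
```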

### Fixed:
- N/A


## 0.27.3 (2021-09-03)

### New:
@@ -19,7 +32,6 @@ Most recent releases are shown at the top. Each release shows:
- change API call to support newest `causalnlp`



## 0.27.2 (2021-07-28)

### New:
2 changes: 1 addition & 1 deletion FAQ.md
@@ -806,7 +806,7 @@ You can safely ignore the error, if it arises from downloading Hugging Face **tr

If you have documents in formats like `.pdf`, `.docx`, or `.pptx` and want to use them in a training set or with various **ktrain** features
like zero-shot-learning or text summarization, they will need to be converted to plain text format first (i.e., `.txt` files). You can use the
`ktrain.text.textutils.extract_copy` function to automatically do this. Alternatively, you can use other tools like [Apache Tika](https://tika.apache.org/) to do the conversion.
`ktrain.text.textutils.extract_copy` function to automatically do this. As of v0.28.x of ktrain, there is also the [TextExtractor](https://nbviewer.org/github/amaiya/ktrain/blob/develop/examples/text/text_extraction_example.ipynb) that can be used for conversion. Alternatively, you can use other tools like [Apache Tika](https://tika.apache.org/) to do the conversion.

With respect to Question-Answering, the `SimpleQA.index_from_folder` method includes a `use_text_extraction` argument. When set to `True`, question-answering can be performed on document sets
composed of many different file types. More information on this is included in the [question-answering example notebook](https://github.com/amaiya/ktrain/blob/master/examples/text/question_answering_with_bert.ipynb).
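
The plain-text conversion step described in this answer can be sketched as a small batch loop. This is an illustration rather than ktrain's own implementation: `convert_folder` is a hypothetical helper, and its `extract` argument stands for any callable that takes a file path and returns the document's text. It is assumed here that `TextExtractor().extract` from ktrain v0.28.x can fill that role; the linked notebook is the place to confirm the exact signature.

```python
from pathlib import Path
from typing import Callable, List

# Extensions that need conversion before they can be used as plain text.
CONVERTIBLE = {".pdf", ".docx", ".pptx"}


def convert_folder(src: str, dst: str, extract: Callable[[str], str]) -> List[Path]:
    """Write a .txt copy of every convertible document found under `src`.

    `extract` is any function mapping a file path to its plain text
    (a hypothetical parameter for this sketch, not a ktrain API).
    """
    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src).rglob("*")):
        # Match extensions case-insensitively so that 'REPORT.PDF' is converted too.
        if path.is_file() and path.suffix.lower() in CONVERTIBLE:
            target = out_dir / (path.stem + ".txt")
            target.write_text(extract(str(path)), encoding="utf-8")
            written.append(target)
    return written
```

With ktrain installed, a call might look like `convert_folder('docs', 'docs_txt', TextExtractor().extract)`, assuming that method accepts a path and returns a string. Files that share a stem (e.g., `a.pdf` and `a.docx`) would overwrite each other in this sketch.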
29 changes: 29 additions & 0 deletions README.md
@@ -10,6 +10,31 @@


### News and Announcements
- **2021-10-15**
- **ktrain v0.28.x** is released and now includes the `AnswerExtractor`, which allows you to extract any information of interest from documents by simply phrasing it in the form of a question. A short example is shown here, but see the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/qa_information_extraction.ipynb) for more information.
```python
# QA-Based Information Extraction

# DataFrame BEFORE
df.head()
# Text
#0 Three major risk factors for COVID-19 were sex (male), age (≥60), and severe pneumonia.
#1 His speciality is medical risk assessments, and he is 30 years old.
#2 A total of nine studies including 356 patients were included in this study.

# AnswerExtractor will create two new columns: 'Risk Factors' and 'Sample Size'
from ktrain.text import AnswerExtractor
ae = AnswerExtractor()
df = ae.extract(df.Text.values, df, [('What are the risk factors?', 'Risk Factors'),
('How many individuals in sample?', 'Sample Size')])

# DataFrame AFTER
df[['Risk Factors', 'Sample Size']].head()
#                                  Risk Factors Sample Size
#0  sex (male), age (≥60), and severe pneumonia        None
#1                                         None        None
#2                                         None         356
```
- **2021-07-20**
- **ktrain v0.27.x** is released and now supports causal inference using [meta-learners](https://arxiv.org/abs/1706.03461). See the [example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/tabular/causal_inference_example.ipynb) for more information.
- **2021-07-15**
@@ -35,6 +60,8 @@
- **Easy-to-Use Built-In Search Engine**: perform keyword searches on large collections of documents <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/question_answering_with_bert.ipynb)]</sup></sub>
- **Zero-Shot Learning**: classify documents into user-provided topics **without** training examples <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/zero_shot_learning_with_nli.ipynb)]</sup></sub>
- **Language Translation**: translate text from one language to another <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/language_translation_example.ipynb)]</sup></sub>
- **Text Extraction**: Extract text from PDFs, Word documents, etc. <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/text/text_extraction_example.ipynb)]</sup></sub>
- **Universal Information Extraction**: extract any kind of information from documents by simply phrasing it in the form of a question <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/develop/examples/text/qa_information_extraction.ipynb)]</sup></sub>
- `vision` data:
- **image classification** (e.g., [ResNet](https://arxiv.org/abs/1512.03385), [Wide ResNet](https://arxiv.org/abs/1605.07146), [Inception](https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf)) <sub><sup>[[example notebook](https://colab.research.google.com/drive/1WipQJUPL7zqyvLT10yekxf_HNMXDDtyR)]</sup></sub>
- **image regression** for predicting numerical targets from photos (e.g., age prediction) <sub><sup>[[example notebook](https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/examples/vision/utk_faces_age_prediction-resnet50.ipynb)]</sup></sub>
@@ -314,6 +341,8 @@ pip install torch
pip install shap
# for ktrain.tabular.causal_inference_model
pip install causalnlp
# for ktrain.text.TextExtractor
pip install textract
```
If the above libraries are not installed, **ktrain** will complain when a method or function needing any of them is invoked.
Notice that **ktrain** is using forked versions of the `eli5` and `stellargraph` libraries above in order to support TensorFlow2.
4 changes: 4 additions & 0 deletions examples/README.md
@@ -14,6 +14,8 @@ This directory contains various example notebooks using *ktrain*. The directory
- [Open-Domain Question-Answering](#textqa): ask questions to a large text corpus and receive exact candidate answers
- [Zero-Shot Learning](#zsl): classify documents by user-supplied topics **without** any training examples
- [Language Translation](#translation): an example of language translation using pretrained MarianMT models
- [Text Extraction](#textextraction): extract text from PDFs, Word documents, etc.
- [Universal Information Extraction](#extraction): an example of using a Question-Answering model for information extraction
- `vision`:
  - [image classification](#imageclass): image classification examples using various models and datasets
- [image regression](#imageregression): example of predicting numerical values purely from images/photos
@@ -142,6 +144,8 @@ The objective of the CoNLL2003 task is to classify sequences of words as belongi
### <a name="textqa"></a>Open-Domain Question-Answering: [question_answering_with_bert.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
### <a name="zsl"></a>Zero-Shot Learning: [zero_shot_learning_with_nli.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
### <a name="translation"></a>Language Translation: [language_translation_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
### <a name="textextraction"></a>Text Extraction: [text_extraction_example.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)
### <a name="extraction"></a>Universal Information Extraction: [qa_information_extraction.ipynb](https://github.com/amaiya/ktrain/tree/master/examples/text)


## Vision Data