Merge branch 'develop'
amaiya committed Feb 1, 2020
2 parents 760ce6e + beb8363 commit 995fcdd
Showing 6 changed files with 23 additions and 6 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -7,6 +7,19 @@ Most recent releases are shown at the top. Each release shows:
- **Fixed**: Bug fixes that don't change documented behaviour


## 0.9.1 (2020-02-01)

### New:
- N/A

### Changed:
- `text.TextPreprocessor` prints sequence length statistics

### Fixed:
- fixed `utils.nclasses_from_data` for `ktrain.Dataset` instances
- prevent `detect_lang` failing when Pandas Series is supplied


## 0.9.0 (2020-01-31)

### New:
2 changes: 1 addition & 1 deletion examples/text/text_regression_example.ipynb
@@ -330,7 +330,7 @@
"output_type": "stream",
"text": [
"fasttext: a fastText-like model [http://arxiv.org/pdf/1607.01759.pdf]\n",
-"linreg: linear text ression using a trainable Embedding layer\n",
+"linreg: linear text regression using a trainable Embedding layer\n",
"bigru: Bidirectional GRU with pretrained word vectors [https://arxiv.org/abs/1712.09405]\n",
"standard_gru: simple 2-layer GRU with randomly initialized embeddings\n",
"bert: Bidirectional Encoder Representations from Transformers (BERT) [https://arxiv.org/abs/1810.04805]\n",
6 changes: 5 additions & 1 deletion ktrain/text/preprocessor.py
@@ -326,7 +326,11 @@ def detect_lang(texts, sample_size=32):
"""
detect language
"""
-    if not isinstance(texts, (list, np.ndarray)): texts = [texts]
+    if isinstance(texts, (pd.Series, pd.DataFrame)):
+        texts = texts.values
+    if isinstance(texts, str): texts = [texts]
+    if not isinstance(texts, (list, np.ndarray)):
+        raise ValueError('texts must be a list or NumPy array of strings')
lst = []
for doc in texts[:sample_size]:
try:
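The patched `detect_lang` now normalizes its input before sampling documents. A standalone sketch of that normalization step (assuming only pandas and NumPy; `normalize_texts` is an illustrative helper, not part of ktrain's API):

```python
import numpy as np
import pandas as pd

def normalize_texts(texts):
    """Coerce supported input types to a list or NumPy array of strings,
    mirroring the normalization added to detect_lang in this commit."""
    # unwrap pandas objects to their underlying NumPy array
    if isinstance(texts, (pd.Series, pd.DataFrame)):
        texts = texts.values
    # a single string becomes a one-element list
    if isinstance(texts, str):
        texts = [texts]
    if not isinstance(texts, (list, np.ndarray)):
        raise ValueError('texts must be a list or NumPy array of strings')
    return texts
```

With this in place, passing a `pd.Series` no longer falls through to the language-detection loop as an unsupported type, which is the failure mode the changelog entry describes.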
2 changes: 1 addition & 1 deletion ktrain/utils.py
@@ -217,7 +217,7 @@ def nsamples_from_data(data):

def nclasses_from_data(data):
if is_iter(data):
-    if isinstance(data, Dataset): return data.nsamples()
+    if isinstance(data, Dataset): return data.nclasses()
elif is_ner(data=data): return len(data.p._label_vocab._id2token) # NERSequence
elif is_huggingface(data=data): # Hugging Face Transformer
return data.y.shape[1]
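The bug here was a single wrong method call: `nclasses_from_data` asked a `ktrain.Dataset` for its sample count instead of its class count. A toy illustration of why the two differ (`ToyDataset` is a hypothetical stand-in, not ktrain's actual class):

```python
import numpy as np

class ToyDataset:
    """Hypothetical stand-in for a ktrain.Dataset with one-hot labels."""
    def __init__(self, x, y):
        self.x = x          # features: (n_samples, n_features)
        self.y = y          # one-hot labels: (n_samples, n_classes)

    def nsamples(self):
        # number of rows -- what the buggy code returned
        return self.x.shape[0]

    def nclasses(self):
        # width of the one-hot label matrix -- what callers actually need
        return self.y.shape[1]

ds = ToyDataset(np.zeros((100, 8)), np.zeros((100, 3)))
```

For this dataset, the pre-fix code would have reported 100 classes instead of 3, so anything sized from `nclasses_from_data` (such as a model's output layer) would have been wrong.
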
2 changes: 1 addition & 1 deletion ktrain/version.py
@@ -1,2 +1,2 @@
__all__ = ['__version__']
-__version__ = '0.9.0'
+__version__ = '0.9.1'
@@ -430,7 +430,7 @@
"## Wrapping our Data in an Instance of `ktrain.Dataset`\n",
"To use this custom data format of two inputs in *ktrain*, we will wrap it in a `ktrain.Dataset` instance, which is simply a `tf.keras` Sequence wrapper. We must be sure to override and implement the required methods (e.g., `def nsamples` and `def get_y`). The `ktrain.Dataset` class is a subclass of `tf.keras.utils.Sequence`. See the TensorFlow documentation on the [Sequence class](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) for more information on how Sequence wrappers work.\n",
"\n",
-"Note that, in the implementation below, we have made `MyCustomDataset` more general such that it can wrap lists of "
+"Note that, in the implementation below, we have made `MyCustomDataset` more general such that it can wrap lists containing an arbitrary number of inputs instead of just the two needed in our example. "
]
},
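The cell above describes a Sequence-style wrapper with overridable `nsamples` and `get_y` methods and an arbitrary list of inputs. A plain-Python sketch of that shape (the real `MyCustomDataset` subclasses `ktrain.Dataset`, itself a `tf.keras.utils.Sequence`; the names and batching details below are illustrative assumptions, not ktrain's exact code):

```python
import math
import numpy as np

class MyCustomDataset:
    """Sketch of a Sequence-like wrapper over a list of input arrays."""
    def __init__(self, inputs, y, batch_size=32):
        self.inputs = inputs            # list of N input arrays, not just two
        self.y = y                      # targets, aligned with each input array
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.y) / self.batch_size)

    def __getitem__(self, idx):
        # one batch: (list of input slices, target slice)
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return [x[sl] for x in self.inputs], self.y[sl]

    def nsamples(self):
        # required override: total number of examples
        return len(self.y)

    def get_y(self):
        # required override: the full target array
        return self.y
```

Because batching is driven by `len(self.y)` and a list comprehension over `self.inputs`, the same wrapper works for two inputs or twenty.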
{
@@ -864,7 +864,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Let's look at our most expensive prediction. Our most expensive prediction (`$404`) is associated with an expensive wine priced at `$800`, which is good. However, we are `~$400` off. Again, our model has trouble with expensive wines. This is somewhat understandable since our model only looks at short textual descriptions and the winer - neither of which contain clear indicators of their exorbitant prices."
+"Let's look at our most expensive prediction. Our most expensive prediction (`$404`) is associated with an expensive wine priced at `$800`, which is good. However, we are `~$400` off. Again, our model has trouble with expensive wines. This is somewhat understandable since our model only looks at short textual descriptions and the winery - neither of which contain clear indicators of their exorbitant prices."
]
},
{
