This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Migrating to 0.4.0 #909

Merged
merged 5 commits into from Feb 21, 2018

joelgrus
Contributor

my plan is to link to this from the release notes.

let me know if I missed anything or explained anything poorly.

@joelgrus
Contributor Author

@nelson-liu also interested in your feedback

@schmmd left a comment
Member

Fantastic!

@nelson-liu
Contributor

thanks, this looks great!

@matt-gardner left a comment
Contributor

Looks great!

In addition, the base `DatasetReader` constructor now takes a `lazy: bool` parameter,
which means that your subclass constructor should also take that parameter
(unless you don't want to allow laziness, but why wouldn't you?)
and explicitly pass it the superclass constructor:
Contributor

"pass it to"?

```python
# whatever other initialization you need
```

(For the reasoning behind this change, see the [Laziness tutorial](https://github.com/allenai/allennlp/blob/master/tutorials/getting_started/laziness.md).)
Contributor

I'd remove the parentheses here.
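
For reference, the constructor pattern discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the guide's own example: the `DatasetReader` class here is a stand-in for the library's base class (only the `lazy` flag is modelled), and `MyDatasetReader` and its `delimiter` argument are hypothetical.

```python
class DatasetReader:
    """Stand-in for the library base class; only the `lazy` flag is modelled."""

    def __init__(self, lazy: bool = False) -> None:
        self.lazy = lazy


class MyDatasetReader(DatasetReader):
    """Hypothetical subclass that accepts `lazy` and forwards it."""

    def __init__(self, delimiter: str = "\t", lazy: bool = False) -> None:
        # Explicitly pass `lazy` to the superclass constructor.
        super().__init__(lazy)
        # Whatever other initialization the subclass needs (hypothetical).
        self.delimiter = delimiter


reader = MyDatasetReader(lazy=True)
default_reader = MyDatasetReader()
```

With the stand-in base class defaulting to `lazy=False`, a subclass that forwards the parameter stays eager unless the caller opts in.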


# CHANGES YOU ARE MUCH LESS LIKELY TO RUN INTO

If you only ever use our the command line tools (`python -m allennlp.run ...`) to train / evaluate models,
Contributor

You could say something in here about writing model / dataset reader code ("use our command line tools" doesn't make it obvious that you also shouldn't have to worry about anything else if you only wrote Models or DatasetReaders). Also, you have "use our the command line tools".


In 0.4.0, `DatasetReader.read()` returns an `Iterable[Instance]`,
which could be a list of instances or could produce them lazily.
This means that the indexing and tensorization needs to happen elsewhere.
Contributor

A sentence saying where the logic moved would be nice.
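
To make the "list of instances or could produce them lazily" distinction concrete, here is a rough sketch of one way a reader could return either kind of iterable. Plain strings stand in for `Instance` objects, and `LazyInstances` and `read_instances` are hypothetical names chosen for this illustration, not the library's API.

```python
from typing import Callable, Iterable, Iterator, List


class LazyInstances:
    """Hypothetical wrapper that re-runs a generator function on every iteration."""

    def __init__(self, instance_generator: Callable[[], Iterator[str]]) -> None:
        self.instance_generator = instance_generator

    def __iter__(self) -> Iterator[str]:
        # A fresh generator per call, so the data can be iterated once per epoch.
        return self.instance_generator()


def read_instances(lines: List[str], lazy: bool) -> Iterable[str]:
    """Return the instances as a list (eager) or as a re-iterable wrapper (lazy)."""

    def generate() -> Iterator[str]:
        for line in lines:
            yield line.strip()

    if lazy:
        return LazyInstances(generate)
    return list(generate())


eager = read_instances(["a\n", " b "], lazy=False)
lazy_result = read_instances(["a\n", " b "], lazy=True)
```

Either result can be iterated more than once; the lazy one simply redoes the reading work each time instead of keeping everything in memory.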


As `Dataset` no longer exists, we replaced `Vocabulary.from_dataset()`
with `Vocabulary.from_instances()`, which accepts an `Iterable[Instance]`.
In particulary, you'd most likely call this with the results of one or more calls
Contributor

"particulary"


To handle tensorization,
0.4.0 introduces the notion of a `Batch`,
which is basically just a list of `Instance`s.
Contributor

Above you have Tensor s (with a space) and here you have it with no space. Not sure which is right.
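
As a loose illustration of the "list of instances that handles tensorization" idea, the sketch below pads already-indexed instances to a common length. It is an assumption-laden stand-in: instances are plain lists of token ids, `PADDING_ID` is an assumed padding index, and `as_padded_lists` is a hypothetical method name, not the library's `Batch` API.

```python
from typing import List

PADDING_ID = 0  # assumed padding index for this sketch


class Batch:
    """Stand-in: holds a list of already-indexed instances (lists of token ids)."""

    def __init__(self, instances: List[List[int]]) -> None:
        self.instances = instances

    def as_padded_lists(self) -> List[List[int]]:
        # Pad every instance to the length of the longest one in the batch.
        max_length = max((len(ids) for ids in self.instances), default=0)
        return [
            ids + [PADDING_ID] * (max_length - len(ids))
            for ids in self.instances
        ]


batch = Batch([[3, 1], [2], [4, 5, 6]])
padded = batch.as_padded_lists()
```

The rectangular result is the shape a tensor library would need; converting it to actual tensors is left out here.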


Furthermore, each `Instance` now knows whether it's been indexed,
so in the eager case (when all instances stay in memory),
the indexing only happens in the first iteration.
Contributor

Something like "This means that your first epoch will be a little bit slower than all other epochs" would be good, I think.
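
The "indexed only once" behavior can be sketched as below, which also shows why the first epoch does more work than later ones in the eager case. The `Instance` class here is a simplified stand-in (a token list plus an `indexed` flag), and the return value of `index_fields` is an addition made purely so the work can be counted.

```python
from typing import Dict, List, Optional


class Instance:
    """Stand-in instance that remembers whether it has been indexed."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.indexed = False
        self.token_ids: Optional[List[int]] = None

    def index_fields(self, vocab: Dict[str, int]) -> bool:
        """Index once; return True only if indexing work was actually done."""
        if self.indexed:
            return False
        self.token_ids = [vocab[token] for token in self.tokens]
        self.indexed = True
        return True


vocab = {"a": 0, "cat": 1, "dog": 2}
instances = [Instance(["a", "cat"]), Instance(["a", "dog"])]

# Eager case: the same Instance objects are reused across epochs.
work_per_epoch = []
for _ in range(3):
    work_per_epoch.append(
        sum(instance.index_fields(vocab) for instance in instances)
    )
# work_per_epoch is [2, 0, 0]: only the first epoch pays the indexing cost.
```

In a lazy setting the instances would be recreated on each pass, so this saving would not apply.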

@joelgrus joelgrus merged commit 6ad7d5b into allenai:master Feb 21, 2018
gabrielStanovsky pushed a commit to gabrielStanovsky/allennlp that referenced this pull request Sep 7, 2018
* migration guide

* tweaks

* address PR feedback