Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

second tutorial (configuration.md) #212

Merged
merged 18 commits into from
Aug 30, 2017
Merged

second tutorial (configuration.md) #212

merged 18 commits into from
Aug 30, 2017

Conversation

joelgrus
Copy link
Contributor

@joelgrus joelgrus commented Aug 28, 2017

in addition,

  • renames run to run.py (I left run there so as not to break anyone's workflow, but it logs a "deprecated" warning, and I plan to get rid of it eventually)
  • changes "getting started" tutorial to use installation-independent python -m allennlp.run
  • exports main as allennlp.commands.main() so that we can tell people to make their own run scripts by importing and calling it

@joelgrus joelgrus changed the title [WIP] second tutorial second tutorial (configuration.md) Aug 29, 2017
@joelgrus joelgrus requested review from matt-gardner, schmmd and DeNeutoy and removed request for DeNeutoy August 29, 2017 18:35
let's take a deeper look at our experiment configuration file,
[tutorials/getting_started/simple_tagger.json](https://github.com/allenai/allennlp/blob/master/tutorials/getting_started/simple_tagger.json).

The configuration is a JSON (or [HOCON](https://github.com/typesafehub/config/blob/master/HOCON.md)) object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this confusing. Is it JSON or HOCON? Perhaps you mean "The configuration is HOCON (but don't worry if you are not familiar with HOCON--JSON is a valid subset of HOCON)"

that we need to override the list of non-padded namespaces:

```js
"non_padded_namespaces": [],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing this to get padding, or simply @@UNKNOWN@@ tokens for certain postags?

Let's first look at the text field embedder configuration:

```js
"text_field_embedder": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation here is quite weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help to have comments on these lines?

The `"tokens"` namespace (which consists of integer encodings of the lowercased words in the input)
gets fed into an
[`Embedding`](http://docs.allennlp.org/en/latest/api/allennlp.modules.token_embedders.html?highlight=embedding#allennlp.modules.token_embedders.embedding.Embedding)
module that embeds the vocabulary words in a 50-dimensional space.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append "(specified by embedding-dim)"?


```js
"stacked_encoder": {
"type": "lstm",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only indent by 4, rather than by 8. You could also remove the initial whitespace.

concatenated with a 50-dimensional vector for `"token_characters`";
that is, a 100-dimensional vector.

### The Seq2SeqEncoder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a Seq2SeqEncoder? The JSON below has nothing about Seq2Seq but rather specifies stacked_encoder. I'm guessing that stacked_encoder = Seq2SeqEncoder but it's an implciit connection.


Finally, we'll run the training for 40 epochs;
we'll stop prematurely if we get no improvement for 10 epochs;
and we'll train on the CPU.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might add instruction (or point somewhere) for how to specify a GPU, as this field is a bit opaque.


## What's Next

TODO(joelgrus): next part of the tutorial
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

Copy link
Member

@schmmd schmmd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

Each encoding has a "namespace", in this case `"tokens"` and `"token_characters"`.
The `SequenceLabelField` also has a namespace, `"labels"`.

## non-padded namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping this would be a topic almost all users could ignore, that just shows up in some deep dive. It's unfortunate that we need to use it here. Can you construct the validation dataset such that it's not necessary in this tutorial?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's a good idea, I'll just filter out the lines with the stray tags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I fixed the datasets and removed all references to non-padded namespaces from the tutorial and config file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@joelgrus joelgrus merged commit 61b6813 into master Aug 30, 2017
@joelgrus joelgrus deleted the click branch August 30, 2017 22:33
schmmd pushed a commit that referenced this pull request Feb 26, 2020
* Set default input limits for forms (#209)
* Use rudder for deploy related functionality. (#211)
schmmd pushed a commit that referenced this pull request Feb 26, 2020
* Set default input limits for forms (#209)
* Use rudder for deploy related functionality. (#211)
schmmd pushed a commit that referenced this pull request Feb 26, 2020
* Set default input limits for forms (#209)

* Limit the max input of text inputs and areas.

* maxlength -> maxLength

* Use rudder for deploy related functionality. (#211)

* Use rudder for deploy related functionality.

This ensures that the application is deployed to the latest n'
greatest skiff cluster.

* Use the right image path.

* Release pending changes. (#212) (#213)

* Set default input limits for forms (#209)
* Use rudder for deploy related functionality. (#211)

* Bert srl (#214)

* update srl model

* update description

* bump to commit which includes srl model

* set timeout to 15 minutes

* update to latest image, add numbers (#216)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants