
[Model][MXNet] MXNet Tree LSTM example #279

Merged: 8 commits into dmlc:master on Dec 27, 2018

Conversation

@szha (Member) commented Dec 8, 2018

Description

Continues #234.

@jermainewang (Member)

Continuing from your last comments:

> I wanted to use gluonnlp for this. The thing is, the vocabulary of SST is never exposed, so I cannot shuffle the embedding outside of the SST class beforehand.

Maybe you could use this? https://github.com/dmlc/dgl/blob/master/python/dgl/data/tree.py#L68. We could expose this member and document it clearly.

@szha (Member, Author) commented Dec 9, 2018

Not really. That's still only available after instantiating the SST class.

@jermainewang (Member)

@szha feel free to change the data module as you wish. We could change the dataset to accept an external vocabulary.
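
For illustration, a minimal sketch of what handing an external vocabulary to the dataset could look like, using gluonnlp to build the vocabulary and attach pretrained GloVe vectors; the `vocab` keyword on `data.SST` is a hypothetical API suggested by this discussion, not the current one:

```python
import gluonnlp as nlp
from dgl import data

# Build the vocabulary outside the dataset (token counts would come from
# the actual SST corpus; this tiny counter is only illustrative).
counter = nlp.data.count_tokens(['the', 'rock', 'is', 'destined', 'to', 'be'])
vocab = nlp.Vocab(counter)

# Attach pretrained GloVe vectors to the vocabulary via gluonnlp.
vocab.set_embedding(nlp.embedding.create('glove', source='glove.840B.300d'))

# Hypothetical constructor argument: the dataset would need to be changed
# to accept an external vocabulary, as proposed above.
trainset = data.SST(mode='train', vocab=vocab)
```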

@szha (Member, Author) commented Dec 14, 2018

Epoch 00019 | Step 00005 | Loss 4429.1836 | Acc 0.8113 | Root Acc 0.4727 | Time(s) 0.1602
Epoch 00019 | Step 00010 | Loss 4375.8833 | Acc 0.8147 | Root Acc 0.5352 | Time(s) 0.1601
Epoch 00019 | Step 00015 | Loss 4424.3398 | Acc 0.8081 | Root Acc 0.5703 | Time(s) 0.1600
Epoch 00019 | Step 00020 | Loss 4459.7549 | Acc 0.8126 | Root Acc 0.5156 | Time(s) 0.1598
Epoch 00019 | Step 00025 | Loss 4357.2935 | Acc 0.8135 | Root Acc 0.4961 | Time(s) 0.1596
Epoch 00019 | Step 00030 | Loss 4382.1328 | Acc 0.8193 | Root Acc 0.4961 | Time(s) 0.1593
Epoch 00019 training time 7.0636s
Epoch 00019 | Dev Acc 0.8139 | Root Acc 0.4723
0.04089534687986153
0.04089534687986153
Epoch 00020 | Step 00005 | Loss 4621.7788 | Acc 0.8121 | Root Acc 0.5508 | Time(s) 0.1593
Epoch 00020 | Step 00010 | Loss 4439.5488 | Acc 0.8166 | Root Acc 0.5117 | Time(s) 0.1593
Epoch 00020 | Step 00015 | Loss 4391.4717 | Acc 0.8120 | Root Acc 0.5430 | Time(s) 0.1593
Epoch 00020 | Step 00020 | Loss 4558.4761 | Acc 0.8156 | Root Acc 0.5586 | Time(s) 0.1594
Epoch 00020 | Step 00025 | Loss 4441.6011 | Acc 0.8065 | Root Acc 0.5977 | Time(s) 0.1592
Epoch 00020 | Step 00030 | Loss 4231.6099 | Acc 0.8100 | Root Acc 0.5195 | Time(s) 0.1593
Epoch 00020 training time 8.3417s
Epoch 00020 | Dev Acc 0.8143 | Root Acc 0.4668
0.040486393411062915
0.040486393411062915
Epoch 00021 | Step 00005 | Loss 4228.3027 | Acc 0.8208 | Root Acc 0.5469 | Time(s) 0.1595
Epoch 00021 | Step 00010 | Loss 4437.4014 | Acc 0.8099 | Root Acc 0.5117 | Time(s) 0.1594
Epoch 00021 | Step 00015 | Loss 4464.4297 | Acc 0.8190 | Root Acc 0.5273 | Time(s) 0.1595
Epoch 00021 | Step 00020 | Loss 4361.5220 | Acc 0.8083 | Root Acc 0.5117 | Time(s) 0.1598
Epoch 00021 | Step 00025 | Loss 4393.3721 | Acc 0.8164 | Root Acc 0.4961 | Time(s) 0.1598
Epoch 00021 | Step 00030 | Loss 4480.3940 | Acc 0.8085 | Root Acc 0.4727 | Time(s) 0.1601
Epoch 00021 training time 8.7110s
Epoch 00021 | Dev Acc 0.8138 | Root Acc 0.4714
------------------------------------------------------------------------------------
Epoch 00011 | Test Acc 0.8063 | Root Acc 0.4855

@jermainewang (Member)

It seems that the speed has improved a lot!

@jermainewang (Member)

Is this ready to be reviewed?

@szha (Member, Author) commented Dec 17, 2018

Yes, it's ready to be reviewed.

@szha requested a review from zheng-da on December 18, 2018

@yzh119 (Member) left a comment:

Looks good to me.

> [**Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks**](http://arxiv.org/abs/1503.00075)
> *Kai Sheng Tai, Richard Socher, and Christopher Manning*.

The provided implementation can achieve a test accuracy of 51.72, which is comparable to the result reported in the original paper: 51.0 (±0.5).

Review comment:

Does MXNet Tree-LSTM produce the same result as PyTorch? That's interesting.

    return batcher_dev

def prepare_glove():
    if not (os.path.exists('glove.840B.300d.txt')
Review comment:

I think the PyTorch Tree LSTM example should prepare GloVe inside the training script too.
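
For reference, a minimal sketch of what such a `prepare_glove` helper might look like, assuming the standard Stanford download URL; the PR's actual implementation may differ:

```python
import os
import urllib.request
import zipfile

def prepare_glove(txt_path='glove.840B.300d.txt'):
    """Download and unpack the GloVe vectors if not already present."""
    if os.path.exists(txt_path):
        return
    url = 'http://nlp.stanford.edu/data/glove.840B.300d.zip'
    zip_path = 'glove.840B.300d.zip'
    urllib.request.urlretrieve(url, zip_path)  # large download (~2 GB)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall('.')
    os.remove(zip_path)  # keep only the extracted .txt file
```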

                            {'learning_rate': args.lr})

    dur = []
    L = gluon.loss.SoftmaxCrossEntropyLoss(axis=1)
Review comment:

In the DyNet implementation, they use reduction=sum instead of mean. I'm also not sure which one is better, but in practice using sum produces better results.
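
For context, Gluon's `SoftmaxCrossEntropyLoss` returns per-example losses, so the sum-vs-mean choice is made when reducing before calling `backward`; a minimal sketch of the two options:

```python
import mxnet as mx
from mxnet import autograd, gluon

L = gluon.loss.SoftmaxCrossEntropyLoss(axis=1)

logits = mx.nd.random.randn(4, 5)   # (batch, num_classes)
logits.attach_grad()
labels = mx.nd.array([0, 2, 1, 4])

with autograd.record():
    losses = L(logits, labels)      # per-example losses, shape (4,)
    total = losses.sum()            # DyNet-style 'sum' reduction
    # 'mean' reduction would instead be: total = losses.mean()
total.backward()                    # with sum, gradients scale with batch size
```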

@zheng-da (Collaborator)

@szha what did you do to improve its speed?

@zheng-da merged commit 1e50cd2 into dmlc:master on Dec 27, 2018
@szha deleted the treelstm branch on December 27, 2018 at 20:13
@szha (Member, Author) commented Dec 27, 2018

Hybridization has a more noticeable effect on throughput when the batch size is larger.
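
For readers unfamiliar with Gluon, `hybridize()` compiles imperative Gluon code into a cached symbolic graph; a minimal sketch of how it is enabled (the small network here is illustrative, not this PR's Tree-LSTM cell):

```python
from mxnet import nd
from mxnet.gluon import nn

# A toy hybrid network; any HybridBlock works the same way.
net = nn.HybridSequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(5))
net.initialize()

net.hybridize()                          # compile to a cached symbolic graph
out = net(nd.random.randn(64, 300))      # first call triggers graph caching
```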

@jermainewang mentioned this pull request on Feb 18, 2019