Retraining on floydhub.com is not available #20

cnlinxi · 2020-07-27T08:48:15Z

Could you provide the corpus? We cannot retrain this model on floydhub.com as described in the README. Thanks a lot.

p16i · 2020-07-28T08:41:49Z

@cnlinxi sorry for the inconvenience. I thought using floydhub would be sustainable, but it turned out to be very costly in the long run. So I decided to cancel my subscription, and with it I lost the datasets stored there.

I'll get back to you regarding the corpus. Would you mind sharing a bit on what you plan to do with the code?

cnlinxi · 2020-08-01T02:18:11Z

@heytitle Sorry for the late reply. I want to use this model to segment Thai words and hope to improve it. My goal is to provide a good Thai text regularization method.

p16i · 2020-08-08T08:58:58Z

@cnlinxi sorry again for the late response. You can find the data at https://codeforthailand.s3-ap-southeast-1.amazonaws.com/attacut-related/data.zip

Please unzip it and make sure the root directory is ./data. The archive contains

Only the first two are relevant for training: sampling-0 uses the entire dataset, while sampling-10 uses only 10 files. You can use sampling-10 for quick training runs.
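A minimal sketch of the unzip step, assuming the archive has already been downloaded as data.zip and that it unpacks to a top-level `data/` folder (both are assumptions; the thread only says the root must end up at ./data). The `sampling-0` directory name is taken from the training command in this thread; the `sampling-10` name is assumed to follow the same pattern. `extract_data` and `training_dir` are hypothetical helpers, not part of the repository.

```python
import zipfile
from pathlib import Path

# Name from the training command in this thread; the sampling-10 variant
# is ASSUMED to use the same pattern with n=10.
FEATURE_DIR = "best-syllable-crf-and-character-seq-feature-sampling-{n}"


def extract_data(zip_path: str, dest: str = ".") -> Path:
    """Extract the archive into `dest` and check that `dest`/data exists.

    Assumption: the archive contains a top-level 'data' directory.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    root = Path(dest) / "data"
    if not root.is_dir():
        raise FileNotFoundError(f"expected a top-level 'data' directory in {zip_path}")
    return root


def training_dir(root: Path, quick: bool = False) -> Path:
    """sampling-0 = entire dataset; sampling-10 = only 10 files (quick runs)."""
    return root / FEATURE_DIR.format(n=10 if quick else 0)
```

For example, `training_dir(extract_data("data.zip"), quick=True)` gives the path to pass as `--data-dir` for a short trial run.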

Before running the training command below, make sure that you have the ./artifacts directory.

python ./scripts/train.py --model-name seq_sy_ch_conv_concat \
 --model-params "embc:8|embs:8|conv:8|l1:6|do:0.1" \
 --data-dir ./data/best-syllable-crf-and-character-seq-feature-sampling-0  \
 --output-dir ./artifacts/model-xx  \
 --epoch 2 \
 --batch-size 1024 \
 --lr 0.001 \
 --lr-schedule "step:5|gamma:0.5"
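For anyone who wants to script the command above, here is a rough sketch that rebuilds the same argument list in Python and creates ./artifacts first. It uses only the flags shown in the command; the helper names are hypothetical. It also assumes (not confirmed in this thread) that `--lr-schedule "step:5|gamma:0.5"` behaves like a StepLR-style schedule, multiplying the learning rate by `gamma` every `step` epochs. Under that assumption, a run with `--epoch 2` never reaches the first decay.

```python
"""Hypothetical helpers for preparing the AttaCut training run."""
import shlex
from pathlib import Path
from typing import Dict, List


def format_spec(params: Dict[str, object]) -> str:
    """Join a dict into the "key:value|key:value" string used by
    --model-params and --lr-schedule."""
    return "|".join(f"{key}:{value}" for key, value in params.items())


def parse_spec(spec: str) -> Dict[str, str]:
    """Inverse of format_spec; values are kept as strings."""
    return dict(item.split(":", 1) for item in spec.split("|"))


def step_lr(base_lr: float, epoch: int, step: int, gamma: float) -> float:
    """Learning rate at a 0-based epoch, ASSUMING a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step)


def build_train_command(data_dir: str, output_dir: str, epochs: int = 2) -> List[str]:
    """Same flags and values as the command in this thread."""
    model_params = format_spec({"embc": 8, "embs": 8, "conv": 8, "l1": 6, "do": 0.1})
    lr_schedule = format_spec({"step": 5, "gamma": 0.5})
    return [
        "python", "./scripts/train.py",
        "--model-name", "seq_sy_ch_conv_concat",
        "--model-params", model_params,
        "--data-dir", data_dir,
        "--output-dir", output_dir,
        "--epoch", str(epochs),
        "--batch-size", "1024",
        "--lr", "0.001",
        "--lr-schedule", lr_schedule,
    ]


if __name__ == "__main__":
    # The training script expects ./artifacts to exist.
    Path("./artifacts").mkdir(exist_ok=True)
    cmd = build_train_command(
        "./data/best-syllable-crf-and-character-seq-feature-sampling-0",
        "./artifacts/model-xx",
    )
    print(shlex.join(cmd))
    # To launch training from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list (rather than one shell string) avoids quoting problems with the `|` characters in the spec strings.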

cnlinxi · 2020-08-09T14:28:52Z

@heytitle thank you very much. I have trained this model on BEST 2010. Great work:)

charlesfufu · 2021-01-01T01:35:07Z

https://codeforthailand.s3-ap-southeast-1.amazonaws.com/attacut-related/data.zip

Are words separated by "~" in the "best-syllable-tokenized" dataset?
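The thread does not answer this question, so rather than assume a format, one way to check is to count candidate separators in a few lines of the files yourself. This is a minimal sketch; the candidate characters ("~" and "|") and the helper names are illustrative assumptions, not documented properties of the dataset.

```python
from collections import Counter
from typing import Iterable, List


def delimiter_counts(lines: Iterable[str], candidates: str = "~|") -> Counter:
    """Count how often each candidate separator character appears."""
    counts: Counter = Counter()
    for line in lines:
        for ch in candidates:
            counts[ch] += line.count(ch)
    return counts


def split_units(line: str, sep: str = "~") -> List[str]:
    """Split one line on `sep`, dropping empty pieces (e.g. from a trailing sep)."""
    return [piece for piece in line.strip().split(sep) if piece]
```

For example, reading the first hundred lines of a file under ./data/best-syllable-tokenized (path assumed) and passing them to `delimiter_counts` shows which separator dominates. Comparing the `split_units` pieces against known words then tells you whether "~" marks word or syllable boundaries.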

cnlinxi changed the title ~~Retraining on floydhub.com is not availuable~~ Retraining on floydhub.com is not available Jul 27, 2020

cnlinxi closed this as completed Aug 9, 2020

charlesfufu mentioned this issue Jan 1, 2021: the meaning of label data #27 (closed)

p16i mentioned this issue May 24, 2021: What is the format of the input data? #29 (closed)

p16i mentioned this issue Aug 8, 2021: Datasets cannot be accessed #31 (closed)
