How many iterations does it run and ETA ? #36

Closed
mohammedayub44 opened this issue Nov 13, 2018 · 3 comments

@mohammedayub44

Hi,

How many iterations does it run?
Any idea how long it takes to finish on a t2.xlarge instance?
@armatthews

Thanks!

Mohammed Ayub

@mohammedayub44

mohammedayub44 commented Nov 13, 2018

I figured it out by going through the code base: I think it runs 5 iterations by default. It takes about 20 min per iteration on a t2.xlarge machine.

Mohammed Ayub
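
For a rough ETA, the two figures above can simply be multiplied out. This is a minimal sketch, assuming the time per iteration stays roughly constant across iterations; `eta_minutes` is a hypothetical helper for illustration and not part of fast_align. If I recall correctly, fast_align's usage text also lists an `-I` option for changing the number of EM iterations, which would scale the estimate accordingly.

```shell
# Hypothetical helper (not part of fast_align): estimate total run time.
# Usage: eta_minutes ITERATIONS MINUTES_PER_ITERATION
eta_minutes() {
  echo $(( $1 * $2 ))
}

# 5 iterations (the default reported above) at about 20 min each
eta_minutes 5 20   # prints 100
```

So a default run on a comparable machine and dataset should take somewhere around 100 minutes, or 200 minutes for a forward plus a reverse run.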

@ryan-minyu

I have a problem with larger datasets when using ./fast_align.
I found that I could not process data with more than about 1000 lines. Have you ever run into the same problem, and how do you usually run the script?
This is the command I run:
./fast_align -i ~/sockeye_autopilot/systems/wmt14_en_de/data/bpe/new_combined_srctrgaa -d -o -v > forward.align_endea
Thanks a lot!

@mohammedayub44

mohammedayub44 commented Aug 4, 2019

@Ryanight I have no problem running files with more than 1000 lines. Maybe the input file has some problems. I created the input file and ran the commands as shown below:

create the input file:

pr -mtJS' ||| ' /home/ubuntu/mayub/datasets/train_tokenized_bpe_applied.en /home/ubuntu/mayub/datasets/train_tokenized_bpe_applied.es > /home/ubuntu/mayub/datasets/fastalign_input_en_es.txt

forward train:
fast_align/build/fast_align -i /home/ubuntu/mayub/datasets/fastalign_input_en_es.txt -d -o -v > /home/ubuntu/mayub/datasets/forward_align_en_es.align

reverse train:
fast_align/build/fast_align -i /home/ubuntu/mayub/datasets/fastalign_input_en_es.txt -d -o -v -r > /home/ubuntu/mayub/datasets/reverse_align_en_es.align

symmetrize:
fast_align/build/atools -i /home/ubuntu/mayub/datasets/forward_align_en_es.align -j /home/ubuntu/mayub/datasets/reverse_align_en_es.align -c grow-diag-final-and > /home/ubuntu/mayub/datasets/corpus_en_es.gdfa

Unfortunately I have not found a good tool that can detect errors or troubleshoot the above process if the files contain alignment errors, which may happen in some cases. Open to suggestions.
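
In the absence of a dedicated tool, a couple of basic sanity checks can be scripted with awk. The sketch below is only a starting point under stated assumptions: `check_input` and `check_alignment` are hypothetical helper names, the input is assumed to be one `source ||| target` pair per line with whitespace-separated tokens, and the alignment file is assumed to hold zero-indexed `i-j` pairs (source index, then target index) as produced by the forward run. It catches structural problems only, not poor-quality alignments.

```shell
# Hypothetical helper: report lines that are not "source ||| target"
# with exactly one separator and non-empty text on both sides.
check_input() {
  awk -F ' [|][|][|] ' '
    NF != 2 || $1 ~ /^[ \t]*$/ || $2 ~ /^[ \t]*$/ { print "line " NR ": malformed" }
  ' "$1"
}

# Hypothetical helper: compare an input file with an alignment file line by line.
# Reports missing or extra alignment lines and i-j pairs whose indices
# fall outside the token counts of the corresponding sentence pair.
# Usage: check_alignment INPUT_FILE ALIGNMENT_FILE
check_alignment() {
  awk -v align="$2" -F ' [|][|][|] ' '
    {
      ns = split($1, s, " ")          # number of source tokens
      nt = split($2, t, " ")          # number of target tokens
      if ((getline a < align) <= 0) { print "line " NR ": missing alignment"; next }
      n = split(a, pairs, " ")
      for (k = 1; k <= n; k++) {
        split(pairs[k], ij, "-")
        if (ij[1] + 0 >= ns || ij[2] + 0 >= nt)
          print "line " NR ": out of range " pairs[k]
      }
    }
    END { if ((getline a < align) > 0) print "alignment file has extra lines" }
  ' "$1"
}
```

For example, `check_input /home/ubuntu/mayub/datasets/fastalign_input_en_es.txt` prints nothing if every line is well formed, and `check_alignment` can then be run with the same input file and `forward_align_en_es.align`. Empty output from both means only that the files are structurally consistent.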
