A simple bs4-based web crawler that builds a corpus for statistical machine translation.
This project collects a Bible dataset for Ethiopian languages together with the respective English translations:
From [https://www.jw.org/am/]
This is an NLP data collection effort to increase the NLP data available for under-resourced languages.
- print(get_book_data('english'))
- print(get_book_data('amharic'))
- print(get_book_data('tigrigna'))
- print(get_book_data('oromifa'))
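
Because the goal is a corpus for machine translation, the per-language outputs eventually need to be paired verse by verse. The sketch below shows one way this could be done. It assumes, hypothetically, that the data for each language is available as a dict keyed by `(book, chapter, verse)`; the actual return type of `get_book_data` is not specified here, and `align_verses` is an illustrative helper, not part of this project.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (book, chapter, verse). This shape is an assumption;
# the real output format of get_book_data may differ.
VerseKey = Tuple[str, int, int]


def align_verses(
    source: Dict[VerseKey, str],
    target: Dict[VerseKey, str],
) -> List[Tuple[str, str]]:
    """Pair verses that exist in both languages, in sorted key order."""
    # Only keys present on both sides can form a translation pair.
    shared_keys = sorted(source.keys() & target.keys())
    pairs = []
    for key in shared_keys:
        src_text = source[key].strip()
        tgt_text = target[key].strip()
        # Skip verses that are empty on either side after trimming.
        if src_text and tgt_text:
            pairs.append((src_text, tgt_text))
    return pairs


if __name__ == "__main__":
    # Placeholder strings stand in for real verse text.
    english = {("Genesis", 1, 1): "en verse 1", ("Genesis", 1, 2): "en verse 2"}
    amharic = {("Genesis", 1, 1): "am verse 1", ("Genesis", 1, 3): "am verse 3"}
    for src, tgt in align_verses(english, amharic):
        print(src, "|||", tgt)
```

Keying on the verse reference rather than on list position means a verse missing from one language drops only that pair instead of shifting every pair after it.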