Recommend Wikipedia articles for creation
- `pyspark` (assumed to be available on the cluster)
The output looks something like this:
wikidata_id | normalized_rank |
---|---|
Q125576 | 0.930232 |
Q127418 | 0.928457 |
Q125576 | 0.927625 |
Q226697 | 0.919053 |
… |
Code documentation adheres to the Google Style Python Docstrings[fn:1].
For other documentation, see the doc folder.
[fn:1] https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
- From the root directory run:
  - `python3 setup.py sdist bdist_wheel`
- Generated files will be in `./dist/`.
- Upload to PyPI:
  - `twine upload dist/*`
- Make changes, commit, upload to Gerrit.
- Clone your patch to stat1007.
- cd article-recommender
- Optional: for faster training times, edit `article_recommender/recommend.py` and set `TRAIN_RANGE_DAYS` to 10 and `TOP_LANGUAGES_COUNT` to 1.
- Generate recommendations, e.g.:
  - `spark2-submit --master yarn --deploy-mode client article_recommender/recommend.py kk uz 20190401`
- In case of an error, find the application ID in the logs and use it to look up the cause of the error, e.g.: `yarn logs -applicationId application_1553764233554_53161`
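
The last step can be sketched in Python: pull the application ID out of captured output and build the matching `yarn logs` command. This is a minimal sketch, not part of the project. The helper names `find_application_id` and `yarn_logs_command` are hypothetical, and the sample text is a placeholder that reuses the application ID from the step above rather than real log output.

```python
import re
from typing import List, Optional

# YARN application IDs look like application_<cluster timestamp>_<sequence number>.
APPLICATION_ID_PATTERN = re.compile(r"application_\d+_\d+")


def find_application_id(log_text: str) -> Optional[str]:
    """Return the first YARN application ID found in the text, or None."""
    match = APPLICATION_ID_PATTERN.search(log_text)
    return match.group(0) if match else None


def yarn_logs_command(application_id: str) -> List[str]:
    """Build the argument list for fetching the logs of one application."""
    return ["yarn", "logs", "-applicationId", application_id]


# Placeholder text for illustration only; real output will look different.
sample_text = "some driver output mentioning application_1553764233554_53161"

app_id = find_application_id(sample_text)
if app_id is not None:
    # Print the command instead of running it, so the sketch works off-cluster.
    print(" ".join(yarn_logs_command(app_id)))
```

On the cluster, the returned argument list could be passed to `subprocess.run` to fetch the logs directly.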