ADSTurboBee

A pipeline (a set of workers) for populating the TurboBee store.

Short Summary

Create and cache static HTML abstract pages that hydrate into a Bumblebee page. This is useful for crawlers and for faster page loading. Each cached page is built from a template and the results of a single request to /v1/search/query. The abstract page template is produced by a Chromium browser running in a separate process; this browser downloads the Bumblebee abstract page. The cached page is persisted with a POST to /v1/store/update, which is handled by the turbobee service.
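
In Python, the flow looks roughly like this. This is a minimal sketch assuming the requests library; the token, the `fl` field list, and `fill_template` are illustrative, not the pipeline's actual code:

```python
import requests

API = 'https://api.adsabs.harvard.edu/v1'
TOKEN = '...'           # hypothetical: a valid ADS API token
HEADERS = {'Authorization': 'Bearer ' + TOKEN}

def fill_template(template, doc):
    # Hypothetical stand-in for the real merge step: the pipeline
    # injects the record into markup scraped by the Chromium process.
    return template.replace('{{title}}', doc.get('title', [''])[0])

def build_and_store(bibcode, template):
    # One request to /v1/search/query fetches the record.
    r = requests.get(API + '/search/query',
                     params={'q': 'bibcode:' + bibcode,
                             'fl': 'title,abstract,author'},
                     headers=HEADERS)
    r.raise_for_status()
    doc = r.json()['response']['docs'][0]
    html = fill_template(template, doc)
    # Persist the rendered page; the turbobee service handles this POST.
    requests.post(API + '/store/update', data=html,
                  headers=HEADERS).raise_for_status()
```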

To process a single bibcode, one could run:

`root@linuxkit-025000000001:/app# python run.py -q '{"q": "bibcode:2003ASPC..295..361M"}'`

This is essentially what happens when you use the --filename option to supply a list of bibcodes, as in the example below.
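
For instance, assuming the input file lists one bibcode per line (the file name here is illustrative):

`$ python run.py --filename bibcodes.txt`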

One can test the scraper in the turbobee pipeline container with:

`root@linuxkit-025000000001:/app# curl 'localhost:3001/scrape' -X POST -d '["https://dev.adsabs.harvard.edu/#abs/2019LRR....22....1I/abstract"]' -H 'Content-Type: application/json'`
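
The same request can be made from Python, e.g. inside a test; a minimal sketch assuming the requests library and that the scrape endpoint is reachable on localhost:3001:

```python
import requests

# POST a JSON list of abstract-page URLs to the scraper endpoint.
urls = ['https://dev.adsabs.harvard.edu/#abs/2019LRR....22....1I/abstract']
resp = requests.post('http://localhost:3001/scrape', json=urls)
resp.raise_for_status()
print(resp.text)  # response body (assumed to contain the scraped template)
```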

Queues and objects

- bumblebee
    Messages on this queue are consumed by workers that update static pages.
- user
    Messages on this queue are consumed by workers that update user-specific content; it is exactly parallel to bumblebee but deals with different content (API requests etc.). A sketch of such a worker follows this list.
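
A worker consuming from one of these queues might look as follows. This is a minimal sketch assuming Celery; the broker URL and the task/function names are illustrative, and only the queue name 'bumblebee' comes from this README:

```python
from celery import Celery

# Illustrative broker URL; not the pipeline's actual configuration.
app = Celery('adstb', broker='pyamqp://guest@localhost//')
app.conf.task_routes = {
    'tasks.update_static_page': {'queue': 'bumblebee'},
}

@app.task(name='tasks.update_static_page')
def update_static_page(bibcode):
    """Consume one message and refresh that record's static page."""
    ...  # fetch the record, rebuild the cached page, POST to /v1/store/update
```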

Setup (recommended)

`$ cd adstb/`
`$ virtualenv python`
`$ source python/bin/activate`
`$ pip install -r requirements.txt`
`$ pip install -r dev-requirements.txt`
`$ vim local_config.py` # edit, edit (illustrative example below)
`$ alembic upgrade head` # initialize database
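
The exact contents of local_config.py depend on the deployment; purely as an illustration (these keys are assumptions, not the pipeline's documented settings):

```python
# local_config.py -- illustrative placeholders only; the real setting
# names are defined by the pipeline's own config.
SQLALCHEMY_URL = 'postgres://user:password@localhost:5432/turbobee'
API_TOKEN = '...'  # token for the ADS API endpoints the workers call
```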

Testing

Always write unit tests (better yet: write them first!). Travis CI will run them automatically. On your desktop, run:

`$ py.test`

Maintainer(s)

Roman