al-serebrov/scrapinghub-elasticsearch-loader

Load items from Scrapy Cloud into an ElasticSearch instance

Installation

Install dependencies:

virtualenv venv
source venv/bin/activate
pip install .

You also need an ElasticSearch instance: either install ElasticSearch locally, or install docker and docker-compose and use the docker-compose.yml config from this project.

Usage

Fire up ElasticSearch

If you have a local installation, launch it and make sure it is running. Otherwise, use the configuration from this project to start ElasticSearch and Kibana with the command:

docker-compose up -d
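
Before loading items it can help to confirm that ElasticSearch is actually answering. The following is a minimal sketch, not part of this project: `es_is_up` is a hypothetical helper, and it assumes the instance listens on the default address `localhost:9200` over plain HTTP without authentication. A secured instance that answers with 401 is reported as not up.

```python
import http.client
import urllib.request


def es_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200.

    Hypothetical helper; the URL is the assumed default address.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failures, timeouts and HTTP error
        # statuses all end up here: treat them as "not reachable".
        return False


if __name__ == "__main__":
    print("ElasticSearch reachable:", es_is_up())
```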

Set environment variables

To use this script you need your Scrapy Cloud API key. Add it to the environment variable SH_APIKEY:

export SH_APIKEY="your_key"
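
If the variable is missing, it is easier to fail early with a clear message than to hit an authentication error later. The sketch below shows one way to read and validate the key in Python. `get_api_key` is a hypothetical helper for illustration and is not necessarily how shes.py reads the key.

```python
import os


def get_api_key(environ=os.environ):
    """Return the Scrapy Cloud API key from SH_APIKEY, or raise if unset.

    Hypothetical helper; `environ` is injectable to make testing easy.
    """
    key = environ.get("SH_APIKEY", "").strip()
    if not key:
        raise RuntimeError("SH_APIKEY is not set; export it before running shes.py")
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("SH_APIKEY found")
    except RuntimeError as error:
        print(error)
```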

Run script

The project has a command-line interface, "shes" (ScrapingHub - ElasticSearch). Run it with -h to see the help message:

$ ./shes.py -h
Download items to ElasticSearch.

usage: shes.py -j JOB_ID [-e ELASTICSEARCH_URL] [-i INDEX] [-t DOC_TYPE] [-h]

Download items from Scrapinhub cloud and upload them to ElasticSearch index.

optional arguments:
  -h, --help            show this help message and exit
  -j JOB_ID, --job_id JOB_ID                                Required Scrapy Cloud job idetentifier
  -e ELASTICSEARCH_URL, --elasticsearch ELASTICSEARCH_URL   URL of ElasticSearch instance, [default: localhost:9200]
  -i INDEX, --index index                                   Index name, defaults to job_id
  -t DOC_TYPE, --type DOC_TYPE                              Document type, [default: product]
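
The defaults in the help message can be summarized in code. The following is a sketch using the standard library's argparse. It mirrors the documented options and defaults only, it is not the project's actual parser, and the names `parse_args` and `doc_type` are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse options that mirror the shes.py help message (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="shes.py",
        description="Download items from Scrapy Cloud and upload them to an ElasticSearch index.",
    )
    parser.add_argument("-j", "--job_id", required=True,
                        help="Scrapy Cloud job identifier")
    parser.add_argument("-e", "--elasticsearch", default="localhost:9200",
                        help="URL of the ElasticSearch instance")
    parser.add_argument("-i", "--index", default=None,
                        help="Index name, defaults to job_id")
    parser.add_argument("-t", "--type", dest="doc_type", default="product",
                        help="Document type")
    args = parser.parse_args(argv)
    if args.index is None:
        # Documented behavior: the index name falls back to the job id.
        args.index = args.job_id
    return args


if __name__ == "__main__":
    # "123/4/5" is a made-up job id used only for this demonstration.
    print(parse_args(["-j", "123/4/5"]))
```

With only `-j` given, the index name equals the job id, the ElasticSearch URL is `localhost:9200`, and the document type is `product`.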
