Merge pull request #11 from Gathondu/failing-scraper-slack-notification
Failing scraper slack notification
DavidLemayian committed Jun 22, 2017
2 parents 2212a2e + f9a84ae commit 510b0cd
Showing 5 changed files with 55 additions and 25 deletions.
20 changes: 17 additions & 3 deletions README.md
@@ -27,13 +27,27 @@ You can set the required environment variables like so
 $ export MORPH_AWS_REGION=<aws_region>
 $ export MORPH_AWS_ACCESS_KEY_ID= <aws_access_key_id>
 $ export MORPH_AWS_SECRET_KEY= <aws_secret_key>
-$ export ES_HOST='<elastic_search_host_endpoint>'
-$ export WEBHOOK_URL='<slack_webhook_url>'
+$ export ES_HOST= <elastic_search_host_endpoint> (DO NOT SET THIS IF YOU WOULD LIKE TO USE ELASTIC SEARCH LOCALLY ON YOUR MACHINE)
+$ export WEBHOOK_URL= <slack_webhook_url> (DO NOT SET THIS IF YOU DON'T WANT TO POST ERROR MESSAGES ON SLACK)
 ```
+**If you want to use elasticsearch locally on your machine use the following instructions to set it up**
+
+For linux and windows users, follow instructions from this [link](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html)
+
+For mac users run `brew install elasticsearch` on your terminal
+
+**If you want to post messages on slack**
+
+Set up `Incoming Webhooks` [here](https://slack.com/signin?redir=%2Fservices%2Fnew%2Fincoming-webhook) and set the global environment for the `WEBHOOK_URL`
+
+If you set up elasticsearch locally run it `$ elasticsearch`
+
-You can now run the scrapers `$ python scraper.py` (It might take a while)
+
+You can now run the scrapers `$ python scraper.py` (It might take a while and you might need to change the endpoints in config.py if you haven't authorization for them)

 ## Running the tests
+_**make sure if you use elasticsearch locally, it's running**_

 Use nosetests to run tests (with stdout) like this:
 ```$ nosetests --nocapture```

9 changes: 8 additions & 1 deletion circle.yml
@@ -1,9 +1,16 @@
 machine:
   python:
-    version: 2.7.13
+    version: 2.7.5
+  java:
+    version: openjdk8
 dependencies:
   pre:
     - pip install -r requirements.txt
+  post:
+    - wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.2.tar.gz
+    - tar -xzf elasticsearch-5.4.2.tar.gz
+    - elasticsearch-5.4.2/bin/elasticsearch: {background: true}
+    - sleep 10 && wget --waitretry=5 --retry-connrefused -v http://127.0.0.1:9200/

 test:
   override:
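
Review note: the `post` step waits for Elasticsearch with a fixed `sleep 10` followed by a retrying `wget`. The same idea can be expressed as a bounded polling loop. The following is a minimal Python 3 sketch, not part of this commit; `wait_for_http`, its parameters, and the injectable `opener` and `sleep` arguments are hypothetical names chosen for illustration.

```python
import time
import urllib.request


def wait_for_http(url, attempts=10, delay=1.0,
                  opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers or the attempts run out.

    Returns True as soon as one request succeeds, False otherwise.
    `opener` and `sleep` are injectable so the loop can be exercised
    without a network connection or real waiting.
    """
    for attempt in range(attempts):
        try:
            # urlopen raises URLError (a subclass of OSError) while the
            # service is still refusing connections.
            opener(url).close()
            return True
        except OSError:
            # Not up yet: pause before the next try, but not after the last one.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

Called as `wait_for_http("http://127.0.0.1:9200/")`, this would return once the local node responds instead of always paying the full fixed delay.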
4 changes: 2 additions & 2 deletions healthtools/config.py
@@ -14,12 +14,12 @@
     "region_name": os.getenv("MORPH_AWS_REGION", 'eu-west-1')
 }
 ES = {
-    "host": os.getenv("ES_HOST"),
+    "host": os.getenv("ES_HOST", None),
     "index": "healthtools"
 }

 TEST_DIR = os.getcwd() + "/healthtools/tests"

 SLACK = {
-    "url": os.getenv("WEBHOOK_URL")
+    "url": os.getenv("WEBHOOK_URL", None)
 }
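
Review note: `os.getenv(name)` already returns `None` when the variable is unset, so the explicit `None` default in this hunk documents intent rather than changing behaviour. The Python 3 sketch below illustrates how the two optional settings could resolve. `load_settings` and `es_target` are hypothetical helpers, not functions in this repository, and the `127.0.0.1` fallback mirrors the one added in `base_scraper.py`.

```python
import os


def load_settings(environ=os.environ):
    """Read the two optional settings from an environment mapping."""
    return {
        # Mapping.get() yields None for a missing key, just like os.getenv(name).
        "es_host": environ.get("ES_HOST"),
        "slack_url": environ.get("WEBHOOK_URL"),
    }


def es_target(settings):
    """Return the configured remote host, or the local default when unset.

    An empty string counts as unset here, matching a truthiness check
    such as `if ES['host']:`.
    """
    return settings["es_host"] or "127.0.0.1"
```

With neither variable exported, `es_target(load_settings())` would resolve to the local node and `slack_url` would stay `None`.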
40 changes: 23 additions & 17 deletions healthtools/scrapers/base_scraper.py
@@ -29,15 +29,18 @@ def __init__(self):
         # set up authentication credentials
         awsauth = AWS4Auth(AWS["aws_access_key_id"], AWS["aws_secret_access_key"], AWS["region_name"], 'es')
         # client host for aws elastic search service
-        self.es_client = Elasticsearch(
-            hosts=ES['host'],
-            port=443,
-            http_auth=awsauth,
-            use_ssl=True,
-            verify_certs=True,
-            connection_class=RequestsHttpConnection,
-            serializer=JSONSerializerPython2()
-        )
+        if ES['host']:
+            self.es_client = Elasticsearch(
+                hosts=ES['host'],
+                port=443,
+                http_auth=awsauth,
+                use_ssl=True,
+                verify_certs=True,
+                connection_class=RequestsHttpConnection,
+                serializer=JSONSerializerPython2()
+            )
+        else:
+            self.es_client = Elasticsearch('127.0.0.1')

     def scrape_site(self):
         '''
@@ -251,14 +254,17 @@ def format_for_elasticsearch(self, entry):

     def print_error(self, message):
         """
-        post messages to slack and print them on the terminal
+        print error messages in the terminal
+        if slack webhook is set up post the errors to slack
         """
-        print(message)
-        response = requests.post(
-            SLACK['url'],
-            data=json.dumps(
-                {"text": "```{}```".format(message)}),
-            headers={'Content-Type': 'application/json'}
-        )
+        print('{{{0}}} - '.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S')) + message)
+        response = None
+        if SLACK['url']:
+            response = requests.post(
+                SLACK['url'],
+                data=json.dumps(
+                    {"text": "```{}```".format(message)}),
+                headers={'Content-Type': 'application/json'}
+            )
         return response
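
Review note: the new `print_error` always prints a timestamped message and only posts to Slack when a webhook URL is configured. The standalone Python 3 sketch below mirrors that flow so it can be read and run in isolation. It is not the method from this commit: the `slack_url`, `post`, and `now` parameters are assumptions added for illustration, with `post` standing in for `requests.post`.

```python
import json
from datetime import datetime


def print_error(message, slack_url=None, post=None, now=datetime.now):
    """Print a timestamped error; post it to Slack only if a webhook is set.

    Returns whatever `post` returns, or None when nothing was sent.
    """
    stamp = now().strftime('%Y-%m-%d %H:%M:%S')
    # '{{' and '}}' are literal braces, so this prints "{<timestamp>} - <message>".
    print('{{{0}}} - {1}'.format(stamp, message))
    if not slack_url or post is None:
        return None
    fence = "`" * 3  # wrap the message so Slack renders it as a code block
    return post(
        slack_url,
        data=json.dumps({"text": fence + message + fence}),
        headers={'Content-Type': 'application/json'},
    )
```

In real use one would pass `post=requests.post`. Injecting the sender keeps the no-webhook path free of network calls and makes both branches easy to check.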

7 changes: 5 additions & 2 deletions healthtools/tests/test_scrapers.py
@@ -100,7 +100,10 @@ def test_health_facilities_scraper_gets_token(self):
         self.health_facilities_scraper.get_token()
         self.assertIsNotNone(self.health_facilities_scraper.access_token)

-    def test_scrapper_sends_slack_notification(self):
+    def test_scrapper_prints_notification(self):
         response = self.base_scraper.print_error("Tests are passing")
-        self.assertEqual(response.status_code, 200)
+        if SLACK['url']:
+            self.assertEqual(response.status_code, 200)
+        else:
+            self.assertIsNone(response)
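
Review note: the updated test takes whichever branch the current environment selects, so a single run exercises only one of them and the Slack branch still depends on a live webhook. One possible alternative is to pin the configuration inside each test. The sketch below assumes Python 3 (`unittest.mock`; on Python 2.7 the separate `mock` package would be needed). `SLACK` and `notify` are hypothetical stand-ins for `healthtools.config.SLACK` and `print_error`.

```python
import unittest
from unittest import mock

SLACK = {"url": None}  # stand-in for the SLACK dict in healthtools/config.py


def notify(message, post):
    """Hypothetical stand-in for print_error: post only when a URL is set."""
    if SLACK["url"]:
        return post(SLACK["url"], message)
    return None


class NotificationTest(unittest.TestCase):
    def test_posts_when_webhook_is_set(self):
        # A fake sender that answers like a successful HTTP response.
        post = mock.Mock(return_value=mock.Mock(status_code=200))
        with mock.patch.dict(SLACK, {"url": "https://hooks.example.invalid/T000"}):
            response = notify("Tests are passing", post)
        self.assertEqual(response.status_code, 200)
        post.assert_called_once_with(
            "https://hooks.example.invalid/T000", "Tests are passing")

    def test_returns_none_without_webhook(self):
        post = mock.Mock()
        with mock.patch.dict(SLACK, {"url": None}):
            self.assertIsNone(notify("Tests are passing", post))
        post.assert_not_called()
```

`mock.patch.dict` restores the original dictionary contents when each `with` block exits, so the tests do not leak configuration into one another.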
