This repository has been archived by the owner on Nov 4, 2022. It is now read-only.

adobe/helix-theblog-scanner

Helix - TheBlog Scanner

TheBlog Scanner should run periodically (via an OpenWhisk trigger) and scan theblog.adobe.com to detect new blog entries. For each new blog entry it detects, it invokes TheBlog Importer.

The execution flow looks like this:

  • fetch the content of the theblog.adobe.com homepage
  • compute the list of links on the page
  • for each link, check whether it is present in a list of already processed URLs stored in a OneDrive XLSX file (/importer/urls.xlsx)
  • if not present, invoke the helix-theblog-importer action
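
The link extraction and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the helper names extractLinks and findNewUrls are hypothetical, the regular expression is a simplification of real HTML parsing, and fetching the homepage, reading urls.xlsx and invoking the importer are left out.

```javascript
// Hypothetical helpers illustrating the scan flow; not the scanner's real implementation.

// Collect the absolute http(s) URLs found in the href attributes of anchor tags.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      // Resolve relative links against the homepage URL.
      const url = new URL(match[2], baseUrl);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        url.hash = ''; // drop fragments so the same page is not listed twice
        links.add(url.href);
      }
    } catch (e) {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}

// Keep only the links that are not in the list of already processed URLs
// (assumed here to be a plain array of strings read from /importer/urls.xlsx).
function findNewUrls(links, processedUrls) {
  const seen = new Set(processedUrls);
  return links.filter((link) => !seen.has(link));
}
```

Each URL returned by findNewUrls would then be handed to the helix-theblog-importer action. Using a Set for the processed URLs keeps the lookup cheap even when the spreadsheet grows large.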

Posts published on theblog.adobe.com are sometimes corrupted and get fixed later. The scanner may already have detected the corrupted version and triggered its import. To re-trigger the import, delete the entry's row from the /importer/urls.xlsx file: if the blog entry is still visible on the homepage, it will be re-imported. If not, you need to trigger the import manually: change the URL and run the test https://github.com/adobe/helix-theblog-importer/blob/master/test/index.test.js#L24.

Status

CircleCI GitHub license GitHub issues LGTM Code Quality Grade: JavaScript semantic-release

Setup

Installation

Deploy the action:

npm run deploy

Create a trigger that fires every five minutes:

wsk trigger create five-mins-trigger --feed /whisk.system/alarms/alarm --param cron "*/5 * * * *"

Link the trigger to the action with a rule:

wsk rule update five-mins-scan five-mins-trigger helix-theblog/helix-theblog-scanner@latest

Required env variables:

Connection to OneDrive:

  • AZURE_ONEDRIVE_CLIENT_ID
  • AZURE_ONEDRIVE_CLIENT_SECRET
  • AZURE_ONEDRIVE_REFRESH_TOKEN

OneDrive shared folder that contains the /importer/urls.xlsx file:

  • AZURE_ONEDRIVE_ADMIN_LINK

OpenWhisk credentials to invoke the helix-theblog-importer action:

  • OPENWHISK_API_KEY
  • OPENWHISK_API_HOST

Coralogix credentials to log:

  • CORALOGIX_API_KEY
  • CORALOGIX_LOG_LEVEL
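
Because a missing variable tends to surface only as a failure deep inside a scan, it can help to check the configuration up front. The following is a minimal sketch under stated assumptions: the variable names come from the lists above, but the missingEnvVars helper is hypothetical and is not part of the scanner.

```javascript
// Names taken from the lists above; the helper itself is a hypothetical illustration.
const REQUIRED_ENV_VARS = [
  'AZURE_ONEDRIVE_CLIENT_ID',
  'AZURE_ONEDRIVE_CLIENT_SECRET',
  'AZURE_ONEDRIVE_REFRESH_TOKEN',
  'AZURE_ONEDRIVE_ADMIN_LINK',
  'OPENWHISK_API_KEY',
  'OPENWHISK_API_HOST',
  'CORALOGIX_API_KEY',
  'CORALOGIX_LOG_LEVEL',
];

// Return the names of the required variables that are absent or empty in `env`.
function missingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}
```

Calling missingEnvVars(process.env), or passing whichever parameters object the action receives, returns an empty array when everything is set and otherwise lists the names that still need a value.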

Development

Deploying Helix Service

Deploying Helix Service requires the wsk command line client, authenticated to a namespace of your choice. For Project Helix, we use the helix namespace.

All commits to master that pass testing are deployed automatically. All commits to branches that pass testing are deployed as /helix-theblog/helix-theblog-scanner@ci<num> and tagged with the CI build number.