First steps to lambda deploy #13
Conversation
I also added the …
This is because requests_cache uses SQLite on disk, which won't be possible in AWS Lambda. Ideally this will be reinstated for use when running locally.
The Makefile is mostly there to create requirements.txt. sam-template.yaml defines a CodeCommit repository and a scraper queue. The plan is to load scrapers into the queue with one Lambda function, then use the queue to trigger another Lambda per scraper to actually run it. Scraped data will then be committed to the CodeCommit repo. (A rough sketch of the queue-loading half follows.)
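A minimal sketch of that loading function, assuming boto3; the handler name, the SCRAPER_QUEUE_URL environment variable, and the hard-coded scraper list are all illustrative, not code from this PR:

```python
import os

import boto3


def queue_scrapers_handler(event, context):
    """Hypothetical first Lambda: push one SQS message per scraper."""
    sqs = boto3.client("sqs")
    queue_url = os.environ["SCRAPER_QUEUE_URL"]  # assumed to be wired up in sam-template.yaml
    for council_id in ["ABC", "DEF"]:  # placeholder for however scrapers are actually listed
        # Each message should trigger one worker Lambda invocation downstream.
        sqs.send_message(QueueUrl=queue_url, MessageBody=council_id)
```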
- class BaseCouncillorScraper(ScraperBase):
+ class BaseCouncillorScraper(CodeCommitMixin, ScraperBase):
This feels a bit brittle, as the order matters here. But I guess that's just multiple inheritance.
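The order matters because Python resolves attributes left to right along the MRO, so the mixin's methods override the base's. A toy illustration (the save method here is made up for the example):

```python
class ScraperBase:
    def save(self):
        print("saving to the local filesystem")


class CodeCommitMixin:
    def save(self):
        print("saving to CodeCommit")


class BaseCouncillorScraper(CodeCommitMixin, ScraperBase):
    pass


BaseCouncillorScraper().save()  # "saving to CodeCommit": the mixin wins; swap the bases and it won't
```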
- requests_cache.install_cache("scraper_cache", expire_after=60 * 60 * 24)
+ # import requests_cache
Commented out because it should probably be reinstated behind a check for the Lambda environment.
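A minimal sketch of such a check, assuming the AWS_LAMBDA_FUNCTION_NAME environment variable (which Lambda sets in its execution environment) is a good enough signal:

```python
import os

# Lambda sets AWS_LAMBDA_FUNCTION_NAME, so its absence suggests a local run
# where the sqlite-backed cache is safe to use.
if not os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
    import requests_cache

    requests_cache.install_cache("scraper_cache", expire_after=60 * 60 * 24)
```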
@@ -97,3 +100,193 @@ def save_raw(self, filename, content):
     def save_json(self, obj):
         file_name = "{}.json".format(obj.as_file_name())
         self._save_file("json", file_name, obj.as_json())
+
+
+class CodeCommitMixin:
Had a stab at pulling out the CodeCommit logic into a mixin for the scrapers. Not sure it's the best way of doing it, and I haven't made it clear which methods the child classes need to implement (one option is sketched below), but I thought I'd get a better handle on whether it's a workable system when I do the polling station scrapers.
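One conventional way to make those requirements explicit would be abstract methods on the mixin; the method names below are purely illustrative, not something in this PR:

```python
from abc import ABC, abstractmethod


class CodeCommitMixin(ABC):
    @abstractmethod
    def get_files_to_commit(self):
        """Child classes must say which scraped files should be committed."""

    def commit(self, message):
        files = self.get_files_to_commit()
        ...  # push `files` to the CodeCommit repo (details omitted)
```

A child class that forgets to implement get_files_to_commit then fails loudly at construction time rather than midway through a scrape.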
This seems ok to me. The only other pattern we could investigate is the way Django does pluggable storage in some places: define a storage interface that is subclassed, and then set that storage class in settings, globally, or by some other logic. So for example, we'd have:

    class Storage:
        def save(self, *args, **kwargs):
            raise NotImplementedError


    class LocalFileSystemStorage(Storage):
        pass


    class CodeCommitStorage(Storage):
        pass

And then BaseCouncillorScraper could assign self.storage = CodeCommitStorage() and later call self.storage.save() or whatever.

Happy to talk more about this pattern if you think it's useful. Some more reading:
Notes on workflow
Log in to AWS SSO via the CLI
aws sso login --profile dc-lgsf-dev
Build
sam build --template sam-template.yaml
Test a function locally
sam local invoke ScraperWorkerFunction --event lgsf/aws_lambda/fixtures/sqs-message.json --profile dc-lgsf-dev
NB: sqs-message.json was adapted from the output of sam local generate-event sqs receive-message
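For reference, the generated event wraps each message in a Records list, so a worker handler consuming that fixture would unpack it roughly like this (the handler body and the meaning of the message body are assumptions):

```python
def run_scraper(scraper_id):
    print(f"running scraper {scraper_id}")  # stand-in for the real scraper run


def handler(event, context):
    # SQS-triggered invocations deliver a batch of records; the body is
    # assumed to name the scraper to run.
    for record in event["Records"]:
        run_scraper(record["body"])
```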
Deploy to dev
sam deploy --profile dc-lgsf-dev
ToDo (turn into issues):
- self.options["aws_lambda"]
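If that option ends up gating behaviour, one speculative way it could plug into the storage pattern discussed above (CodeCommitStorage and LocalFileSystemStorage are the names from that sketch, not existing code):

```python
def get_storage(options):
    # Speculative: pick the storage backend from the aws_lambda option.
    if options.get("aws_lambda"):
        return CodeCommitStorage()
    return LocalFileSystemStorage()
```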