This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Bring current with changes in Glue #23

Open
dacort opened this issue Mar 16, 2021 · 5 comments

Comments

@dacort
Contributor

dacort commented Mar 16, 2021

I left and returned to AWS. Since that time, Glue has:

I want to research what impact these changes have and whether they should be incorporated. Initial thoughts:

  • Some of the manual table creation I'm doing could be replaced with crawlers
  • Python3 upgrade is a must
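To illustrate the crawler idea, here is a minimal sketch (Python 3) of the request a crawler could be created from, instead of defining tables by hand. The helper `build_crawler_request`, the bucket path, role ARN, and database name are all hypothetical placeholders; only the keyword names follow boto3's Glue `create_crawler` call.

```python
def build_crawler_request(log_type, s3_path, database, role_arn, classifiers=()):
    """Build keyword arguments for boto3's glue.create_crawler call.

    All values passed in are illustrative; nothing here is taken from this project.
    """
    return {
        "Name": f"{log_type}-crawler",
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans this S3 prefix and infers table definitions.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Optional custom classifiers (by name) tried before the built-in ones.
        "Classifiers": list(classifiers),
        "TablePrefix": f"{log_type}_",
    }


request = build_crawler_request(
    log_type="alb",                                   # hypothetical log type
    s3_path="s3://example-bucket/alb-logs/",          # hypothetical bucket
    database="example_logs_db",                       # hypothetical database
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical role
)

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("glue").create_crawler(**request)
```

Whether a crawler infers the same schema as the hand-written table definitions would still need to be checked per log type.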

In addition, I need to go through the backlog. :)

@dacort dacort pinned this issue Mar 16, 2021
@CalvinLeather
Contributor

CalvinLeather commented Mar 30, 2021

As a band-aid, I opened this PR while waiting on the Python 3 upgrade. Happy to help contribute to or test the Python 3 upgrade; our team is actively using the tool.

@dacort
Contributor Author

dacort commented Mar 30, 2021

Thanks @CalvinLeather! Much appreciated. I hope to be able to devote some time to this in the coming weeks.

@dacort
Contributor Author

dacort commented May 5, 2021

Adding a note here about blueprints - they could be useful for building more comprehensive Glue deployments for this project, specifically workflows which could schedule the jobs.

@dacort
Contributor Author

dacort commented May 6, 2021

I looked into Blueprints a little yesterday. It looks like they could be used to bootstrap Classifiers, Crawlers, and table definitions. A schedule/trigger can also be set up, so they could be a good end-to-end direction to go.

We can still make use of this library – all of the table definitions and classifiers are still necessary, as are some of the source-specific transforms.

We could essentially parameterize the whole thing:

  • Create a single AGSL Blueprint
  • A Workflow per log type can be created from that Blueprint
  • Based on the log type selected, the necessary crawlers and tables get created
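The parameterization above could be sketched roughly as follows. This is plain Python that only models the "log type in, resources out" mapping; it does not use the actual Glue Blueprint API, and `LogTypeSpec`, `LOG_TYPES`, `plan_workflow`, and every resource name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LogTypeSpec:
    """What one log type needs (hypothetical structure)."""
    classifier: Optional[str]   # custom classifier name, if any
    tables: Tuple[str, ...]     # table definitions to create


# Illustrative registry; the real set of log types and tables would come
# from this library's existing definitions.
LOG_TYPES = {
    "alb": LogTypeSpec(classifier="alb-classifier", tables=("alb_raw", "alb_optimized")),
    "cloudtrail": LogTypeSpec(classifier=None, tables=("cloudtrail_raw", "cloudtrail_optimized")),
}


def plan_workflow(log_type: str, s3_path: str) -> dict:
    """Return the resources a single workflow for `log_type` would create."""
    try:
        spec = LOG_TYPES[log_type]
    except KeyError:
        raise ValueError(f"Unsupported log type: {log_type!r}") from None
    return {
        "workflow": f"agsl-{log_type}",
        "classifiers": [spec.classifier] if spec.classifier else [],
        "crawlers": [{"name": f"{log_type}-crawler", "path": s3_path}],
        "tables": list(spec.tables),
    }


plan = plan_workflow("alb", "s3://example-bucket/alb-logs/")
```

In a real Blueprint, the log type would be one of the user-supplied parameters and the layout script would turn a plan like this into workflow entities.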

@brandon-fryslie

Is there any documentation on how to achieve what this library achieves with AWS Glue directly?
