Question: Differentiating between Lambda and Spark architectures #1

Closed
jmandel1027 opened this issue Aug 16, 2019 · 3 comments

@jmandel1027

Hey there!

Really enjoyed your article on Towards Data Science! Great explainer on the architecture, and it shows the scope of all the different moving parts quite well.

One thing that might be helpful for other peeps coming to this repo would be separate branches (or a separate repo, what have you) for the Spark and AWS setups respectively. At a glance it's kind of confusing to see which elements belong to which config.

That said, this is a pretty sweet project and reference point for peeps trying to dig into these tools!

@chollinger93
Owner

Hey,

thanks :)

This is a good point, but I am not a huge fan of persistent branches (as opposed to release branches) for functionality. However, the README.md could definitely use some work.

The long-term plan is to add a third version that reuses part of the Spark code but relies on less Hadoop-focused technologies.

@jmandel1027
Author

jmandel1027 commented Aug 24, 2019

For sure, yeah, release tags would totally do the trick and provide an easy way to access the correct state for each version.

Ooh, interesting, how so? Like a more optimized way to mount the files from S3 and run the queries over Spark? That's super interesting. Athena is very powerful but sometimes not well suited to real-time use.

One thing that might be interesting to explore is a Lambda that activates a Fargate task. This would give you more flexibility to execute the Spark jobs without worrying about hitting the Lambda timeout, although I think that's like 15 minutes now, so it might be moot. Pairing a Lambda step function with Fargate tasks might be an ergonomic way to handle the query and return the results.
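
Something like this is roughly what I have in mind for the Lambda side, as a minimal sketch and not anything from this repo. The environment variable names (`ECS_CLUSTER`, `TASK_DEFINITION`, `SUBNET_IDS`, `CONTAINER_NAME`), the `input_s3_uri` event field, and the `INPUT_S3_URI` variable passed to the container are all made up for illustration. The only real API call is boto3's ECS `run_task`.

```python
import os


def build_run_task_params(event, cluster, task_definition, subnets, container_name):
    """Build keyword arguments for the ECS RunTask call from a Lambda event.

    The event shape (an "input_s3_uri" key) is a hypothetical example.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Hand the job its input location through the environment.
                    "environment": [
                        {"name": "INPUT_S3_URI", "value": event["input_s3_uri"]},
                    ],
                }
            ]
        },
    }


def handler(event, context):
    """Lambda entry point: start the Fargate task and return right away."""
    # Imported here so the parameter builder can be exercised without AWS access.
    import boto3

    params = build_run_task_params(
        event,
        cluster=os.environ["ECS_CLUSTER"],
        task_definition=os.environ["TASK_DEFINITION"],
        subnets=os.environ["SUBNET_IDS"].split(","),
        container_name=os.environ["CONTAINER_NAME"],
    )
    response = boto3.client("ecs").run_task(**params)
    # The long-running work happens in the task, so the Lambda only reports
    # which tasks it started.
    return {"taskArns": [task["taskArn"] for task in response["tasks"]]}
```

The Lambda only kicks off the task and returns, so its own timeout stops mattering for the job itself. Polling for completion and fetching results would be the part a step function could handle.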

@chollinger93
Owner

Created separate branches
