Hey there!
Really enjoyed your article on Towards Data Science! It's a great explainer on the architecture and shows the scope of all the different moving parts well.
One thing that might help other people coming to this repo would be separate branches (or a separate repo) for the Spark and AWS setups respectively. At a glance it's hard to tell which elements belong to which configuration.
That said, this is a pretty sweet project and a great reference point for anyone trying to dig into these tools!
This is a good point, but I am not a huge fan of persistent branches (as opposed to release branches) for separating functionality. However, the README.md could definitely use some work.
The long-term plan is to add a third version that reuses part of the Spark code but relies on less Hadoop-focused technologies.
For sure, release tags would totally do the trick and provide an easy way to access the correct state for each version.
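The release-tag idea above could look something like this (a minimal sketch; the tag names `v1.0-spark` and `v2.0-aws` are hypothetical placeholders, not actual tags in this repo):

```shell
# Tag the commit that matches each setup, so readers can check out
# the exact state for the Spark-only or AWS version.
git tag -a v1.0-spark -m "Standalone Spark setup"
git tag -a v2.0-aws -m "AWS / Athena setup"
git push origin --tags   # publish the tags to the remote

# A reader can then jump straight to the matching state:
git checkout v1.0-spark
```

GitHub also surfaces annotated tags on the Releases page, which would give each version a visible landing spot in the repo.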
Ooh, interesting! How so? As in a more optimized way to mount the files from S3 and run the queries over Spark? That's super interesting. Athena is very powerful, but sometimes not well suited to real-time use cases.
One thing that might be interesting to explore is a Lambda that activates a Fargate task. That would give you more flexibility to execute the Spark jobs without worrying about hitting a timeout, although I think Lambda timeouts are up to 15 minutes now, so it might be moot. Pairing Lambda plus Step Functions with Fargate tasks might be an ergonomic way to handle the query and return the results.
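The Lambda-to-Fargate handoff described above could be sketched roughly like this. All the names here (the `spark-jobs` cluster, the `spark-query-runner` task definition, the subnet, the container name) are hypothetical placeholders, and the `ecs_client` parameter is just an assumption to keep the handler testable:

```python
def launch_spark_task(event, ecs_client=None):
    """Lambda handler sketch: start a Fargate task to run a Spark job.

    Offloading the job to Fargate means the Lambda itself only issues
    the run request, so the Spark job is not bound by Lambda's
    execution-time limit. Returns the launched task's ARN.
    """
    if ecs_client is None:
        import boto3  # imported lazily so the function is easy to unit test
        ecs_client = boto3.client("ecs")

    response = ecs_client.run_task(
        cluster="spark-jobs",                 # hypothetical cluster name
        taskDefinition="spark-query-runner",  # hypothetical task definition
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-placeholder"],  # replace with real subnets
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [
                {
                    "name": "spark",  # container name in the task definition
                    "command": ["spark-submit", event["job_path"]],
                }
            ]
        },
    )
    return response["tasks"][0]["taskArn"]
```

A Step Functions state machine could then wrap this: one state launches the task, a wait/poll loop checks `describe_tasks` until it stops, and a final state fetches the results from S3.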