Stars
List of Computer Science courses with video lectures.
My notes for AWS Solutions Architect Associate.
This is a repo documenting the best practices in PySpark.
A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.
An evolving how-to guide for securing a Linux server.
Resumes generated using GitHub information
😎 Awesome lists about all kinds of interesting topics
A curated list of awesome big data frameworks, resources and other awesomeness.
A curated list of data engineering tools for software developers
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
ReAir is a collection of easy-to-use tools for replicating tables and partitions between Hive data warehouses.
Data cleansing tutorial for the ChiPy scientific SIG
📚 Parameterize, execute, and analyze notebooks
📘 The interactive computing suite for you! ✨
A Python WebHDFS-based tool for inter/intra-cluster data copying.
The Python micro framework for building web applications.
50+ DockerHub public images for Docker & Kubernetes - DevOps, CI/CD, GitHub Actions, CircleCI, Jenkins, TeamCity, Alpine, CentOS, Debian, Fedora, Ubuntu, Hadoop, Kafka, ZooKeeper, HBase, Cassandra,…
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers
Chaos Monkey is a resiliency tool that helps applications tolerate random instance failures.
📖 A collection of pure bash alternatives to external processes.
A list of helpful Scala-related questions you can use to interview potential candidates.
A curated list of awesome Apache Spark packages and resources.
Examples for High Performance Spark