This repository contains lab work, Jupyter notebooks, and concise notes produced during the Big Data Summer Training. It focuses on practical commands, examples, and reusable snippets.
- Jupyter Notebooks & lab exercises (organized by topic folders)
- Technical notes and key takeaways
- Practice examples, datasets, and use-case simulations
- Commands, configuration snippets, and environment setup

Topics covered:
- Big Data Era & Kunpeng Architecture
- HDFS + ZooKeeper — distributed storage and cluster coordination
- HBase + Hive — NoSQL and distributed data warehousing (SQL-like)
- ClickHouse — OLAP database for fast analytics
- MapReduce + YARN — distributed processing and resource manager
- Spark + Flink — batch and stream processing
- Flume + Kafka — data ingestion and real-time messaging pipelines
- Elasticsearch — search and analytics
| Tool / Tech | Use case |
|---|---|
| Linux, SQL, Python | Foundations for scripting and querying |
| HDFS | Distributed data storage |
| Hive | SQL-style querying on big data |
| HBase | NoSQL for large-scale datasets |
| Kafka | Real-time messaging |
| Spark & Flink | Data processing engines (batch & stream) |
| ClickHouse | High-performance analytics |
| Flume, Sqoop | Data ingestion from logs and DBs |
| Elasticsearch | Search and analytics |
| ZooKeeper | Cluster coordination |
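# HDFS (pseudo-distributed)
# One-time step: formatting the NameNode erases existing HDFS metadata
hdfs namenode -format
# Start the HDFS daemons, then the YARN daemons
start-dfs.sh
start-yarn.sh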
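Once the daemons above are started, a quick round trip through HDFS can confirm that the cluster is usable. The following is a minimal sketch, not part of the original training material: it assumes `hdfs` and `jps` are on `PATH`, and the scratch directory name and the helper functions `have_cmd` and `hdfs_smoke_test` are hypothetical.

```shell
# Smoke-test sketch for a pseudo-distributed HDFS setup.
# Assumption: `hdfs` and `jps` are on PATH; the scratch path is illustrative.

# Return 0 if the named command is available, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Write a small file into HDFS, read it back, then clean up.
hdfs_smoke_test() {
  smoke_dir="/tmp/smoke-test-$$"   # hypothetical scratch directory in HDFS
  jps                              # lists running JVMs (NameNode, DataNode, ...)
  hdfs dfs -mkdir -p "$smoke_dir" &&
  echo "hello hdfs" | hdfs dfs -put - "$smoke_dir/hello.txt" &&
  hdfs dfs -cat "$smoke_dir/hello.txt" &&
  hdfs dfs -rm -r -skipTrash "$smoke_dir"
}

if have_cmd hdfs && have_cmd jps; then
  hdfs_smoke_test
else
  echo "hdfs or jps not found on PATH; skipping smoke test" >&2
fi
```

If the `-cat` step prints the line that was written, the NameNode and at least one DataNode are accepting reads and writes.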
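# ZooKeeper + Kafka (run from the Kafka install directory)
# ZooKeeper runs in the background; the broker stays in the foreground
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties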
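With ZooKeeper and the broker running, a produce/consume round trip is a simple way to check the setup. This is a hedged sketch under several assumptions that are not in the original notes: the broker listens on `localhost:9092`, the Kafka version is recent enough for the console producer to accept `--bootstrap-server` (older releases use `--broker-list`), and the topic name `demo-events`, the `KAFKA_HOME` and `BOOTSTRAP` variables, and both helper functions are hypothetical.

```shell
# Sketch: create a topic, send one message, read it back.
# Assumptions: broker on localhost:9092; KAFKA_HOME is the Kafka install
# directory (defaults to the current directory, matching the commands above).
KAFKA_HOME="${KAFKA_HOME:-.}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
TOPIC="demo-events"   # hypothetical topic name

# Return 0 if the Kafka CLI scripts exist under the given directory.
kafka_scripts_present() {
  [ -x "$1/bin/kafka-topics.sh" ] &&
  [ -x "$1/bin/kafka-console-producer.sh" ] &&
  [ -x "$1/bin/kafka-console-consumer.sh" ]
}

# Create the topic if needed, produce one message, consume one message.
kafka_round_trip() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create --if-not-exists --topic "$TOPIC" \
    --partitions 1 --replication-factor 1 --bootstrap-server "$BOOTSTRAP" &&
  echo "hello kafka" | "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --topic "$TOPIC" --bootstrap-server "$BOOTSTRAP" &&
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$TOPIC" \
    --bootstrap-server "$BOOTSTRAP" --from-beginning \
    --max-messages 1 --timeout-ms 10000
}

if kafka_scripts_present "$KAFKA_HOME"; then
  kafka_round_trip
else
  echo "Kafka scripts not found under $KAFKA_HOME/bin; skipping" >&2
fi
```

The consumer exits after one message or after ten seconds without data, so the script does not hang if the broker is unreachable.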
/README.html ← this file (HTML README)
/notebooks/ ← Jupyter notebooks organized by topic
/data/ ← sample datasets (small, non-sensitive)
/scripts/ ← helper scripts and setup commands
/notes/ ← short markdown notes and key takeaways

- Personal reference and step-by-step notes
- Complete recap of the training with runnable examples
- Practical showcase of Big Data skills for projects, interviews, or collaborations
If you'd like to collaborate or discuss Big Data topics, reach out on LinkedIn or open an issue in this repo.
Duaa Abd-Elati (Connect on LinkedIn)

Made during the NTI Big Data Summer Training. You may reuse or adapt this README.