DuaA-A/Big-Data

Big Data Training — NTI

This repository contains lab work, Jupyter notebooks, and concise notes produced during the Big Data Summer Training. It focuses on practical commands, examples, and reusable snippets.

What you'll find here

  • Jupyter Notebooks & lab exercises (organized by topic folders)
  • Technical notes and key takeaways
  • Practice examples, datasets, and use-case simulations
  • Commands, configuration snippets, and environment setup

Topics covered

  • Big Data Era & Kunpeng Architecture
  • HDFS + ZooKeeper — distributed storage and cluster coordination
  • HBase + Hive — NoSQL and distributed data warehousing (SQL-like)
  • ClickHouse — OLAP database for fast analytics
  • MapReduce + YARN — distributed processing and resource manager
  • Spark + Flink — batch and stream processing
  • Flume + Kafka — data ingestion and real-time messaging pipelines
  • Elasticsearch — search and analytics

Tools & technologies

Tool / Tech           Use case
Linux, SQL, Python    Foundations for scripting and querying
HDFS                  Distributed data storage
Hive                  SQL-style querying on big data
HBase                 NoSQL for large-scale datasets
Kafka                 Real-time messaging
Spark & Flink         Data processing engines (batch & stream)
ClickHouse            High-performance analytics
Flume, Sqoop          Data ingestion from logs and DBs
Elasticsearch         Search and analytics
ZooKeeper             Cluster coordination

Example commands

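# HDFS + YARN (pseudo-distributed)
# First run only: formatting wipes any existing NameNode metadata
hdfs namenode -format
# Start the HDFS daemons, then the YARN daemons
start-dfs.sh
start-yarn.sh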
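
Before running the commands above, it can help to confirm that the Hadoop scripts are actually on the PATH. The following is a minimal Python sketch, assuming a standard Hadoop install whose bin and sbin directories have been exported; the helper name `missing_commands` is hypothetical and not part of this repo.

```python
import shutil

# Commands used in the HDFS example above.
HADOOP_COMMANDS = ["hdfs", "start-dfs.sh", "start-yarn.sh"]


def missing_commands(commands=HADOOP_COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All Hadoop commands found.")
```

An empty result means all three commands resolve; anything listed usually points to a missing PATH entry for the Hadoop bin or sbin directory.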

Kafka (local)

bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties
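
The Kafka command above backgrounds ZooKeeper and launches the broker right away, so the broker may start before ZooKeeper is accepting connections. One possible way to guard against that is to poll the ZooKeeper port first. This is a sketch only: `wait_for_port` is a hypothetical helper, and the port to check is assumed to be 2181, the `clientPort` set in Kafka's bundled `config/zookeeper.properties`.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); pause briefly and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 2181)` after starting ZooKeeper and before starting the broker gives a simple readiness check. It only confirms that the port is open, not that ZooKeeper is fully healthy.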

Repository structure (suggested)

/README.html        ← this file (HTML README)
/notebooks/          ← Jupyter notebooks organized by topic
/data/               ← sample datasets (small, non-sensitive)
/scripts/            ← helper scripts and setup commands
/notes/              ← short markdown notes and key takeaways
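
The suggested layout above could be scaffolded with a few lines of Python. This is a minimal sketch; the function name `scaffold` is hypothetical, and only the four folder names come from the structure listed above.

```python
from pathlib import Path

# Folder names taken from the suggested structure above.
FOLDERS = ["notebooks", "data", "scripts", "notes"]


def scaffold(root):
    """Create the suggested folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok keeps the call safe to repeat on an existing checkout.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Running `scaffold(".")` from the repository root creates any missing folders and leaves existing ones untouched.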

Goal of this repo

  • Personal reference and step-by-step notes
  • Complete recap of the training with runnable examples
  • Practical showcase of Big Data skills for projects, interviews, or collaborations

Let's connect

If you'd like to collaborate or discuss Big Data topics, reach out on LinkedIn or open an issue in this repo.

Duaa Abd-Elati

Made during the NTI Big Data Summer Training. You may reuse or adapt this README.
