Big Data Training — NTI

This repository contains lab work, Jupyter notebooks, and concise notes produced during the Big Data Summer Training. It focuses on practical commands, examples, and reusable snippets.

What you'll find here

Jupyter notebooks and lab exercises, organized into topic folders
Technical notes and key takeaways
Practice examples, datasets, and use-case simulations
Commands, configuration snippets, and environment setup

Topics covered

Big Data Era & Kunpeng Architecture
HDFS + ZooKeeper: distributed storage and cluster coordination
HBase + Hive: NoSQL storage and SQL-like distributed data warehousing
ClickHouse: column-oriented OLAP database for fast analytics
MapReduce + YARN: distributed processing and cluster resource management
Spark + Flink: batch and stream processing
Flume + Kafka: data ingestion and real-time messaging pipelines
Elasticsearch: search and analytics
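The MapReduce model listed above splits a job into three phases. A map step emits key/value pairs, a shuffle groups the values by key, and a reduce step combines each group. The following is a minimal single-process sketch of a word count that mirrors only that data flow. It is plain Python, not the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    pairs = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical input; on a cluster these lines would come from HDFS input splits.
    text = ["big data on hdfs", "spark and big data"]
    print(reduce_phase(shuffle(map_phase(text))))
    # → {'and': 1, 'big': 2, 'data': 2, 'hdfs': 1, 'on': 1, 'spark': 1}
```

On a real cluster, the framework runs many mappers and reducers in parallel and YARN allocates the resources they run in. The three functions here only trace how the data moves between phases.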

Tools & technologies

Tool / technology     Use case
Linux, SQL, Python    Foundations for scripting and querying
HDFS                  Distributed data storage
Hive                  SQL-style querying over data stored in HDFS
HBase                 NoSQL (wide-column) store for large-scale datasets
Kafka                 Real-time publish/subscribe messaging
Spark & Flink         Data processing engines (batch and stream)
ClickHouse            High-performance analytics (OLAP)
Flume, Sqoop          Data ingestion: Flume for logs, Sqoop for relational databases
Elasticsearch         Search and analytics
ZooKeeper             Cluster coordination
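Hive's main appeal is that ordinary SQL can run over big data. The sketch below rehearses a typical join-and-aggregate query without a cluster, using Python's built-in `sqlite3` module as a local stand-in engine. The `employee` and `department` tables, their columns, and the rows are all hypothetical. The SELECT statement should also be valid HiveQL. The difference is that Hive would compile it into distributed jobs instead of running it in-process.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema, for illustration only.
DDL = """
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER, salary REAL);
"""

# Join + aggregate in plain SQL; the same statement should run in Hive.
QUERY = """
SELECT d.dept_name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
FROM employee e
JOIN department d ON e.dept_id = d.dept_id
GROUP BY d.dept_name
ORDER BY d.dept_name
"""


def avg_salary_by_department() -> List[Tuple[str, int, float]]:
    """Load hypothetical rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")  # local stand-in for a Hive warehouse
    try:
        conn.executescript(DDL)
        conn.executemany("INSERT INTO department VALUES (?, ?)",
                         [(1, "Engineering"), (2, "Sales")])
        conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                         [(1, "A", 1, 9000.0), (2, "B", 1, 7000.0),
                          (3, "C", 2, 5000.0)])
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in avg_salary_by_department():
        print(row)
    # → ('Engineering', 2, 8000.0)
    # → ('Sales', 1, 5000.0)
```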

Example commands

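# HDFS + YARN (pseudo-distributed)
hdfs namenode -format   # first run only: erases any existing HDFS metadata
start-dfs.sh
start-yarn.sh

Kafka (local)

# Start ZooKeeper in the background, then the Kafka broker
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties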
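Starting ZooKeeper with `&` and then launching the broker immediately can race, because Kafka may try to connect before ZooKeeper accepts connections. A small helper such as the hypothetical `wait_for_port` below polls a TCP port until it opens. The sample `zookeeper.properties` uses client port 2181 and the broker listens on 9092 by default, but check your own configs.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds,
    or False if `timeout` seconds pass without one."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the service is not accepting connections yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, run `wait_for_port("localhost", 2181)` after launching ZooKeeper, and start the broker only once it returns `True`. The same check against 9092 confirms that the broker is listening. An open port only shows that the service accepts TCP connections, not that it is fully healthy.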

Repository structure (suggested)

/README.html        ← this file (HTML README)
/notebooks/          ← Jupyter notebooks organized by topic
/data/               ← sample datasets (small, non-sensitive)
/scripts/            ← helper scripts and setup commands
/notes/              ← short markdown notes and key takeaways
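To set up the suggested layout in a fresh clone, a small helper such as the hypothetical `scaffold` function below creates the four top-level folders listed above. Because it passes `exist_ok=True`, it is safe to run repeatedly and leaves existing folders and files untouched.

```python
from pathlib import Path
from typing import List

# Folder names taken from the suggested structure.
FOLDERS = ["notebooks", "data", "scripts", "notes"]


def scaffold(root: str = ".") -> List[Path]:
    """Create the suggested top-level folders under `root` and return their paths."""
    base = Path(root)
    created = []
    for name in FOLDERS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `scaffold()` from the repository root creates `notebooks/`, `data/`, `scripts/`, and `notes/` in place. The README itself is left alone.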

Goal of this repo

Personal reference and step-by-step notes
Complete recap of the training with runnable examples
Practical showcase of Big Data skills for projects, interviews, or collaborations

Let's connect

If you'd like to collaborate or discuss Big Data topics, reach out on LinkedIn or open an issue in this repo.

Author: Duaa Abd-Elati (LinkedIn). Made during the NTI Big Data Summer Training; you may reuse or adapt this README.

