# Exercise 1: HDFS Basics

## Learning Objectives
- Understand HDFS architecture (NameNode, DataNodes, blocks)
- Perform file operations: upload, list, download, delete
- Explore replication factor and block concepts
- Use the HDFS Web UI to visualize the filesystem

## Prerequisites
- Cluster is running (`./scripts/start-lab.sh`)
- Sanity checks passed

---

## Part 1: Exploring HDFS Commands

HDFS commands resemble Linux filesystem commands, but each one is prefixed with `hdfs dfs -`.
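To make the naming pattern concrete, the sketch below rewrites a Linux-style command into its `hdfs dfs` form. The helper name `to_hdfs_command` and the small set of supported commands are assumptions made for illustration; the function only builds a command string and never runs it.

```python
import shlex

# Linux commands whose HDFS counterpart shares the same name
# (an illustrative subset, not an exhaustive list).
SAME_NAME_COMMANDS = {"ls", "mkdir", "cat", "du", "rm", "mv", "cp", "tail"}


def to_hdfs_command(linux_command: str) -> str:
    """Rewrite e.g. 'ls /' as 'hdfs dfs -ls /' (hypothetical helper)."""
    parts = shlex.split(linux_command)
    if not parts or parts[0] not in SAME_NAME_COMMANDS:
        raise ValueError(f"no direct HDFS equivalent known for: {linux_command!r}")
    # The subcommand gains a leading dash; arguments are passed through unchanged.
    return shlex.join(["hdfs", "dfs", f"-{parts[0]}", *parts[1:]])


print(to_hdfs_command("ls /"))
print(to_hdfs_command("mkdir -p /user/student/data"))
```

Passing flags through unchanged is a simplification: not every Linux flag has an HDFS counterpart, so check `hdfs dfs -help <command>` before relying on one.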

In [None]:
# List the root directory of HDFS
!hdfs dfs -ls /

In [None]:
# Create a directory in HDFS for our exercises
!hdfs dfs -mkdir -p /user/student/data

In [None]:
# Verify the directory was created
!hdfs dfs -ls /user/student

## Part 2: Uploading Files to HDFS

Let's upload our sample data to HDFS.

In [None]:
# Check which files are available locally
!ls -la /home/jovyan/data/sales/

In [None]:
# Upload the transactions file to HDFS
!hdfs dfs -put /home/jovyan/data/sales/transactions.csv /user/student/data/

In [None]:
# Upload the product catalog (CSV and JSON versions)
!hdfs dfs -put /home/jovyan/data/products/catalog.csv /user/student/data/
!hdfs dfs -put /home/jovyan/data/products/catalog.json /user/student/data/

In [None]:
# Verify uploads
!hdfs dfs -ls -h /user/student/data/

## Part 3: Understanding Blocks and Replication

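HDFS splits files into fixed-size blocks (128 MB by default; our lab uses 16 MB). A file smaller than one block still occupies a single block, but only takes up its actual size on disk.
Each block is replicated across multiple DataNodes for fault tolerance.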
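The arithmetic behind blocks and replication can be sketched in a few lines of Python. The function name `block_layout` and the 40 MB example file are assumptions for illustration, not values taken from the lab data.

```python
MB = 1024 * 1024


def block_layout(file_size: int, block_size: int, replication: int) -> dict:
    """Estimate block count and raw storage for one file (all sizes in bytes)."""
    # Ceiling division: full blocks plus one final partial block, if any.
    num_blocks = (file_size + block_size - 1) // block_size
    # The final block holds only the remaining bytes, not a full block.
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    return {
        "blocks": num_blocks,
        "last_block_bytes": last_block,
        # Every byte is stored once per replica.
        "raw_bytes": file_size * replication,
    }


# Hypothetical 40 MB file, the lab's 16 MB block size, replication factor 3.
layout = block_layout(40 * MB, 16 * MB, 3)
print(layout)
```

For this example the file needs three blocks (16 MB, 16 MB, and 8 MB) and consumes 120 MB of raw storage across the cluster.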

In [None]:
# Check block information for our file
!hdfs fsck /user/student/data/transactions.csv -files -blocks -locations

In [None]:
# Check the current replication factor
!hdfs dfs -stat '%r' /user/student/data/transactions.csv

In [None]:
# Change replication factor to 3
!hdfs dfs -setrep 3 /user/student/data/transactions.csv

In [None]:
# Verify the new replication factor
!hdfs fsck /user/student/data/transactions.csv -files -blocks -locations
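
`-setrep` only changes the target replication factor. HDFS stores at most one replica of a given block on each DataNode, so a target above the number of live DataNodes cannot be met, and `fsck` then reports the blocks as under-replicated. A minimal sketch of that rule follows; the function name and the DataNode counts are illustrative assumptions, not values read from the lab cluster.

```python
def replication_status(target: int, live_datanodes: int) -> str:
    """Describe whether a target replication factor can be met (simplified model)."""
    # At most one replica of a given block can live on each DataNode.
    achievable = min(target, live_datanodes)
    if achievable < target:
        return f"under-replicated: {achievable} of {target} replicas"
    return f"fully replicated: {target} replicas"


for nodes in (2, 3):
    print(nodes, "DataNodes ->", replication_status(3, nodes))
```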

### 🔍 Checkpoint Question 1
Open the HDFS NameNode UI at http://localhost:9870 and navigate to:
**Utilities → Browse the file system**

Find the `transactions.csv` file and answer the following:
- How many blocks does it have?
- What is the block size?
- On which DataNodes are the blocks stored?

**Your Answer:**

(Write your observations here)

## Part 4: Reading Files from HDFS

In [None]:
# Preview the start of the file (-head prints the first kilobyte)
!hdfs dfs -head /user/student/data/transactions.csv

In [None]:
# Count total lines
!hdfs dfs -cat /user/student/data/transactions.csv | wc -l
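
The same count can be taken from Python when the result is needed in later cells. This sketch assumes the `hdfs` CLI is on the notebook's PATH; `count_lines` and `count_hdfs_lines` are hypothetical helper names. Like `wc -l`, it counts newline characters, so a final line without a trailing newline is not counted.

```python
import subprocess
from typing import BinaryIO


def count_lines(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes in a binary stream, reading it chunk by chunk."""
    total = 0
    while chunk := stream.read(chunk_size):
        total += chunk.count(b"\n")
    return total


def count_hdfs_lines(hdfs_path: str) -> int:
    """Stream a file out of HDFS with `hdfs dfs -cat` and count its lines."""
    with subprocess.Popen(
        ["hdfs", "dfs", "-cat", hdfs_path], stdout=subprocess.PIPE
    ) as proc:
        lines = count_lines(proc.stdout)
    # Leaving the `with` block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"hdfs dfs -cat failed for {hdfs_path}")
    return lines
```

In the lab notebook this could be called as `count_hdfs_lines("/user/student/data/transactions.csv")`. Reading in chunks keeps memory use flat even for files much larger than the notebook container's RAM.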

In [None]:
# Show disk usage per file (file size, then space consumed across all replicas)
!hdfs dfs -du -h /user/student/data/

## Part 5: Cluster Health Check

In [None]:
# Check DataNode status
!hdfs dfsadmin -report