# ⚡ Store Real-Time Streamed Data from Spark Streaming into HDFS

This guide shows how to **capture live streaming data** using Spark Streaming and **save it to HDFS** at regular intervals.

---

## ✅ Step 1: Start a Socket Stream

Run this in a terminal:

```bash
nc -lk 9999
```

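This starts netcat listening on port **9999** (`-l` listens, `-k` keeps listening after a client disconnects). Each line you type in this terminal is sent to the connected Spark application.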

---

## 🔹 Step 2: Spark Streaming Script to Save Data to HDFS

Create a Spark Streaming application (`SaveToHDFS.scala`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SaveToHDFS {
  def main(args: Array[String]): Unit = {

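    // local[*] runs Spark locally; the socket receiver occupies one thread,
    // so at least two threads are needed for batches to be processed
    val conf = new SparkConf().setAppName("SaveStreamToHDFS").setMaster("local[*]")
    // Group incoming lines into 5-second batches
    val ssc = new StreamingContext(conf, Seconds(5))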

    // Connect to socket stream
    val lines = ssc.socketTextStream("localhost", 9999)

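    // Save each batch to HDFS as <prefix>-<batch time in ms>.txt
    // (replace <namenode> with your NameNode host)
    lines.saveAsTextFiles("hdfs://<namenode>:9000/user/stream/output/streamdata", "txt")

    // Start receiving data, then block until the application is stopped
    ssc.start()
    ssc.awaitTermination()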
  }
}
```
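
`saveAsTextFiles` writes an output directory for every batch interval, which can leave many near-empty directories when nothing arrives on the socket. The variant below is a minimal sketch of one way to skip empty batches, using `foreachRDD` with an `isEmpty` check. The object name `SaveNonEmptyToHDFS` and the helper `batchPath` are hypothetical names introduced here for illustration; the helper rebuilds the `prefix-TIME_IN_MS.suffix` naming that `saveAsTextFiles` uses.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SaveNonEmptyToHDFS {

  // Hypothetical helper: builds a "prefix-TIME_IN_MS.suffix" output path
  def batchPath(prefix: String, timeMs: Long, suffix: String): String =
    s"$prefix-$timeMs.$suffix"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SaveNonEmptyStreamToHDFS").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("localhost", 9999)

    // <namenode> is a placeholder; replace it with your NameNode host
    val prefix = "hdfs://<namenode>:9000/user/stream/output/streamdata"

    // Write a batch only when it contains at least one line
    lines.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(batchPath(prefix, time.milliseconds, "txt"))
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The trade-off is one extra small job per batch for the `isEmpty` check, in exchange for a cleaner output directory.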

---

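## ▶️ Step 3: Run the Spark Streaming App

Package the application into `SaveToHDFS.jar`, then submit it while the `nc` listener from Step 1 is still running: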

```bash
spark-submit --class SaveToHDFS SaveToHDFS.jar
```
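
The command above expects a packaged jar. If you build with sbt, a build definition along the following lines is one possible starting point. The Scala and Spark versions shown are assumptions, not requirements of this guide; match them to the versions installed on your cluster.

```scala
// build.sbt (versions are assumptions; align them with your Spark installation)
name := "SaveToHDFS"
version := "0.1"
scalaVersion := "2.12.18"

// "provided" keeps Spark out of the jar, since spark-submit supplies it at run time
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "3.5.1" % "provided"
```

Running `sbt package` then produces a jar under `target/`, and that path is what you pass to `spark-submit` in place of `SaveToHDFS.jar`.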

---

## 🔥 Output

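* Spark groups the incoming lines into a batch every **5 seconds**
* Each batch is written to HDFS as a directory (containing `part-*` files) whose name includes the batch time in milliseconds: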

```
/user/stream/output/streamdata-<timestamp>.txt
```
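
The `<timestamp>` in each name is the batch time in milliseconds since the Unix epoch. When inspecting the output, it can help to turn that number back into a readable time. The following is a small sketch in plain Scala with no Spark dependency; `BatchDirName` and `batchTime` are hypothetical names used only for illustration.

```scala
import java.time.Instant

object BatchDirName {
  // Matches names such as "streamdata-1700000000000.txt"
  private val Pattern = """.*-(\d{1,18})\.txt""".r

  // Returns the batch time encoded in the name, or None if the name does not match
  def batchTime(name: String): Option[Instant] = name match {
    case Pattern(ms) => Some(Instant.ofEpochMilli(ms.toLong))
    case _           => None
  }
}
```

For example, `BatchDirName.batchTime("streamdata-1700000000000.txt")` yields `Some(2023-11-14T22:13:20Z)`, while a name without a millisecond timestamp yields `None`.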
