# Handling Event Time and Window Operations

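We want to apply processing logic along the timeline on which events were actually generated, so we use the timestamp that each device stamps on the data it sends.

Because each event in the stream becomes a single row, the timestamp is simply one more column in that row.

The timeline of events has to be interpreted from the perspective of the producing system, based on the stamped timestamps, and not from the internal clock of the stream-processing system.

When we want to look at the data the producing system generated during a specific period, as measured by those timestamps, we use a window.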

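### Sliding windows

Suppose we want a ranking of real-time trending search terms over the last 10 minutes, and the ranking is refreshed every 5 minutes.

![sliding](https://spark.apache.org/docs/3.0.3/img/structured-streaming-window.png)

### 5-minute slide, 10-minute window

Keywords aggregated from 11:55 to 12:05 -> trending terms as of 12:05 => a 10-minute window

Keywords aggregated from 12:00 to 12:10 -> trending terms as of 12:10 => a 10-minute window

Keywords aggregated from 12:05 to 12:15 -> trending terms as of 12:15 => a 10-minute window

...

Keywords aggregated from 12:50 to 13:00 -> trending terms as of 13:00 => a 10-minute window

Slide (reporting interval) => 5 minutes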
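
The window/slide relationship above can be sketched in plain Python. This is only an illustration of how one event time maps to overlapping windows. The helper name `sliding_windows` is hypothetical, and the half-open `[start, end)` convention with window starts aligned to multiples of the slide is an assumption of the sketch.

```python
from datetime import datetime, timedelta

def sliding_windows(ts, window=timedelta(minutes=10), slide=timedelta(minutes=5)):
    """Return every [start, end) window that contains ts, earliest first."""
    epoch = datetime(1970, 1, 1)
    # Latest window start at or before ts (starts are multiples of the slide).
    start = ts - (ts - epoch) % slide
    windows = []
    while start + window > ts:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)

# An event stamped 12:02 is counted in two overlapping 10-minute windows.
for start, end in sliding_windows(datetime(2021, 10, 3, 12, 2)):
    print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
```

With a 10-minute window and a 5-minute slide, every event lands in exactly two windows, which is why the same keyword is counted toward two consecutive reports.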

---

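## Watermarks

A field declared as the event timestamp generally increases over time, and the engine's internal notion of event time advances with it. Data can still arrive late, however.

A `watermark` is the mechanism that discards events lagging the engine's current event-time position by more than a set threshold.

Say we are busy aggregating keywords for the 12:05 to 12:15 window when a record created at 12:02 suddenly arrives.

Structured Streaming can still account for it: it keeps the intermediate aggregation state for a certain period, so late records are aggregated and the window they belong to is updated.

That state cannot be kept forever, though, so we have to put a bound on that `certain period`.

This bound decides how late a record may arrive before it is left out of the aggregation. Setting it is called watermarking.

![watermark](https://spark.apache.org/docs/3.0.3/img/structured-streaming-watermark-update-mode.png)

The Spark engine tracks the current event time and stops keeping state for data that is too old.

As data keeps flowing into the engine, the timestamps keep increasing, and the engine tracks the maximum timestamp seen so far (it refreshes this value only at reporting time, that is, at each trigger, and not per record).

Only records whose timestamps fall within the watermark threshold of that tracked maximum are reflected in the corresponding window's aggregate.

So windows that have fallen behind the watermark are no longer updated, and records older than the watermark are simply dropped even if they show up later.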
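
The bookkeeping described above can be modelled with a few lines of plain Python. This is a deliberately simplified sketch and not Spark's implementation: the function name `run_batches`, the 10-minute delay, and the sample timestamps are all assumptions, and Spark itself applies the check per window and only guarantees that data inside the threshold is kept.

```python
from datetime import datetime, timedelta

def run_batches(batches, delay=timedelta(minutes=10)):
    """Simplified model: watermark = (max event time seen so far) - delay,
    and it only moves between batches, never in the middle of one."""
    max_seen = None
    kept, dropped = [], []
    for batch in batches:
        # The watermark used for this batch comes from earlier batches only.
        watermark = max_seen - delay if max_seen is not None else None
        for ts in batch:
            if watermark is not None and ts < watermark:
                dropped.append(ts)  # older than the watermark: ignored
            else:
                kept.append(ts)     # on time or acceptably late: aggregated
        if batch:
            batch_max = max(batch)
            if max_seen is None or batch_max > max_seen:
                max_seen = batch_max
    return kept, dropped

def t(hour, minute):
    return datetime(2021, 10, 3, hour, minute)

batches = [
    [t(12, 4), t(12, 14)],  # max event time becomes 12:14
    [t(12, 9), t(12, 2)],   # watermark is now 12:04
]
kept, dropped = run_batches(batches)
print("kept:   ", [x.strftime("%H:%M") for x in kept])
print("dropped:", [x.strftime("%H:%M") for x in dropped])
```

In the second batch the 12:09 record is late but still inside the threshold, so it is aggregated, while the 12:02 record is behind the watermark and is discarded.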

---

## Writing the table-processing logic (batch query)

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.appName("sparkdf").getOrCreate()

# The CSV files have no header row, so the schema is declared explicitly.
schema = StructType().add("time", "string").add("id", "string").add("text", "string").add("source", "string")
table = spark.read.format("csv").option("header", "false").schema(schema).load("/data/Structured_Streaming/Twitter/twitter_01")
table.show(5)
```

+-------------+-------------------+--------------------+--------------------+
|         time|                 id|                text|              source|
+-------------+-------------------+--------------------+--------------------+
|1633253932947|1444597814432915464|RT @myhappytots: ...|<a href="http://t...|
|1633253933522|1444597816844771328|The latest The ph...|<a href="https://...|
|1633253934811|1444597822251220998|RT @sayalook: My ...|<a href="https://...|
|1633253935379|1444597824633643009|RT @AlkayalWajdi:...|<a href="https://...|
|1633253936413|1444597828970549250|RT @AlkayalWajdi:...|<a href="https://...|
+-------------+-------------------+--------------------+--------------------+
only showing top 5 rows
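
The `time` values above are 13-digit epoch milliseconds stored as strings, and the next cell keeps only the first 10 characters (the seconds) before converting them to a timestamp. A minimal plain-Python sketch of that conversion follows. The helper name `event_time` is hypothetical, and interpreting the value in UTC is an assumption, since Spark renders timestamps in the session time zone.

```python
from datetime import datetime, timezone

def event_time(raw_ms: str) -> datetime:
    """Convert a 13-digit epoch-millisecond string to a UTC datetime, truncated to seconds."""
    seconds = int(raw_ms[:10])  # same idea as substring("time", 1, 10)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(event_time("1633253932947"))  # 2021-10-03 09:38:52+00:00
```

Dropping the last three digits discards the milliseconds, which is acceptable here because the windows are measured in minutes.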



                                                                                

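```python
from pyspark.sql.functions import from_unixtime, substring, to_timestamp
from pyspark.sql.functions import col, udf
from pyspark.sql.functions import explode, split, regexp_replace, trim
from pyspark.sql.types import StringType

# Extract the lower-cased link text from the <a ...>client name</a> markup in `source`.
# Rows with a missing `source` return None instead of raising inside the UDF.
func = udf(lambda x: x.lower().split(">")[1].split("<")[0] if x else None, StringType())

# `time` is epoch milliseconds stored as a string; its first 10 characters are the epoch seconds.
# Each non-letter in the client name becomes a space, and every resulting token gets its own row.
devices = (
    table
    .withColumn("timestamp", to_timestamp(from_unixtime(substring("time", 1, 10), format="yyyy-MM-dd HH:mm:ss"), "yyyy-MM-dd HH:mm:ss"))
    .withColumn("device", explode(split(trim(regexp_replace(func("source"), r"[^a-z]", " ")), " ")))
    .select("timestamp", "device")
)
devices.show(10)
```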

+-------------------+-------+
|          timestamp| device|
+-------------------+-------+
|2021-10-03 09:38:52|twitter|
|2021-10-03 09:38:52|    for|
|2021-10-03 09:38:52|android|
|2021-10-03 09:38:53|  paper|
|2021-10-03 09:38:53|     li|
|2021-10-03 09:38:54|whopcod|
|2021-10-03 09:38:54|twitter|
|2021-10-03 09:38:54|    app|
|2021-10-03 09:38:55| python|
|2021-10-03 09:38:55|retweet|
+-------------------+-------+
only showing top 10 rows
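
To see what the UDF, `regexp_replace`, `trim`, and `split` chain does to a single `source` value, the same steps can be reproduced in plain Python. This is a sketch that mirrors the logic and is not the Spark code path. The helper name `device_tokens` and the sample anchor strings are hypothetical, because the real `source` values are truncated in the output above.

```python
import re

def device_tokens(source):
    """Mirror the UDF + regexp_replace + trim + split chain for one `source` value."""
    if not source:
        return []
    label = source.lower().split(">")[1].split("<")[0]  # text between <a ...> and </a>
    cleaned = re.sub(r"[^a-z]", " ", label).strip()     # each non-letter becomes one space
    return cleaned.split(" ")                            # consecutive spaces yield "" tokens

# Hypothetical sample values for illustration only.
print(device_tokens('<a href="https://example.com" rel="nofollow">Twitter for Android</a>'))
print(device_tokens('<a href="https://example.com" rel="nofollow">dlvr.it</a>'))
print(device_tokens('<a href="https://example.com" rel="nofollow">Foo - Bar</a>'))
```

The last call shows why an empty-string `device` can appear in the results: a run of non-letters turns into several spaces, and splitting on a single space leaves empty tokens between them.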



                                                                                

```python
# Drop generic tokens that appear in many client names and say little about the device.
results = devices.where("device not in ('twitter', 'for')")
results.show(5)
```

+-------------------+-------+
|          timestamp| device|
+-------------------+-------+
|2021-10-03 09:38:52|android|
|2021-10-03 09:38:53|  paper|
|2021-10-03 09:38:53|     li|
|2021-10-03 09:38:54|whopcod|
|2021-10-03 09:38:54|    app|
+-------------------+-------+
only showing top 5 rows



                                                                                

```python
# Top 3 tokens by count over the whole batch.
test = results.groupBy("device").count().orderBy(col("count").desc()).limit(3)
test.show()
```



+-------+-----+
| device|count|
+-------+-----+
|    bot|   13|
|android|   12|
|    app|   11|
+-------+-----+





---

## Applying the same batch query to stream processing

```python
spark = SparkSession.builder.appName("StructuredStreamingTest").getOrCreate()

schema = StructType().add("time", "string").add("id", "string").add("text", "string").add("source", "string")
# Every CSV file that appears under this directory is read as new stream input.
lines = spark.readStream.option("sep", ",").csv("/data/Structured_Streaming/", schema=schema)
```

```python
# Same transformation as the batch query, applied to the streaming DataFrame.
func = udf(lambda x: x.lower().split(">")[1].split("<")[0] if x else None, StringType())

devices = (
    lines
    .withColumn("timestamp", to_timestamp(from_unixtime(substring("time", 1, 10), format="yyyy-MM-dd HH:mm:ss"), "yyyy-MM-dd HH:mm:ss"))
    .withColumn("device", explode(split(trim(regexp_replace(func("source"), r"[^a-z]", " ")), " ")))
    .select("timestamp", "device")
)
results = devices.where("device not in ('twitter', 'for', 'bot')")
```

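```python
from pyspark.sql.functions import window

# 10-minute windows sliding every 5 minutes, with a 20-minute watermark on `timestamp`.
# Sorting a streaming aggregation is only supported in complete output mode.
wwwc = (
    results.withWatermark("timestamp", "20 minutes")
    .groupBy(window(results.timestamp, "10 minutes", "5 minutes"), results.device)
    .count()
    .sort(["window", "count"], ascending=False)  # same as orderBy(col("window").desc(), col("count").desc())
    .limit(5)
)
# Flatten the window struct into separate start and end columns.
wwwc = wwwc.withColumn("end", wwwc.window["end"]).withColumn("start", wwwc.window["start"])\
            .select("start", "end", "device", "count")
```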
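
The windowed count, the descending sort, and `limit(5)` can be traced by hand with a small plain-Python model. This is only a sketch of the query's logic on in-memory rows, not of how Spark executes it. The names `window_starts` and `top_devices` and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
SLIDE = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)

def window_starts(ts):
    """Starts of all [start, start + WINDOW) windows that contain ts."""
    start = ts - (ts - EPOCH) % SLIDE
    starts = []
    while start + WINDOW > ts:
        starts.append(start)
        start -= SLIDE
    return starts

def top_devices(rows, n=5):
    """Count (window start, device) pairs, then sort by window and count, both descending."""
    counts = Counter()
    for ts, device in rows:
        for start in window_starts(ts):
            counts[(start, device)] += 1
    ranked = sorted(counts.items(), key=lambda kv: (kv[0][0], kv[1]), reverse=True)
    return [(start, start + WINDOW, device, c) for (start, device), c in ranked[:n]]

# Hypothetical (timestamp, device) rows for illustration.
rows = [
    (datetime(2021, 10, 3, 9, 41), "android"),
    (datetime(2021, 10, 3, 9, 42), "android"),
    (datetime(2021, 10, 3, 9, 43), "web"),
    (datetime(2021, 10, 3, 9, 47), "app"),
]
for row in top_devices(rows, n=3):
    print(row)
```

Because the sort key puts the window first, the newest window always fills the top of the result, and older windows only appear when the newest one has fewer than `n` distinct devices.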

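```python
# Complete mode rewrites the whole result table (here the top 5 rows) to the console on every trigger.
# In this mode Spark keeps all aggregation state, so the watermark does not purge old windows.
query = wwwc.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```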

21/10/04 05:58:41 WARN StreamingQueryManager: Temporary checkpoint location created which is deleted normally when the query didn't fail: /tmp/temporary-15496724-3eb3-443a-93ba-40afdef01eb2. If it's required to delete it under any circumstances, please set spark.sql.streaming.forceDeleteTempCheckpointLocation to true. Important to know deleting temp checkpoint folder is best effort.
                                                                                

-------------------------------------------
Batch: 0
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 09:40:00|2021-10-03 09:50:00|incorrect|   11|
|2021-10-03 09:40:00|2021-10-03 09:50:00| socially|   11|
|2021-10-03 09:40:00|2021-10-03 09:50:00|         |   11|
|2021-10-03 09:40:00|2021-10-03 09:50:00|      app|    4|
|2021-10-03 09:40:00|2021-10-03 09:50:00|  android|    4|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 1
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 09:40:00|2021-10-03 09:50:00|  android|   21|
|2021-10-03 09:40:00|2021-10-03 09:50:00| socially|   18|
|2021-10-03 09:40:00|2021-10-03 09:50:00|incorrect|   18|
|2021-10-03 09:40:00|2021-10-03 09:50:00|         |   15|
|2021-10-03 09:40:00|2021-10-03 09:50:00|      app|   14|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 2
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 09:40:00|2021-10-03 09:50:00|  android|   36|
|2021-10-03 09:40:00|2021-10-03 09:50:00|      app|   27|
|2021-10-03 09:40:00|2021-10-03 09:50:00| socially|   19|
|2021-10-03 09:40:00|2021-10-03 09:50:00|incorrect|   19|
|2021-10-03 09:40:00|2021-10-03 09:50:00|      web|   19|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 3
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:45:00|2021-10-03 09:55:00|android|   15|
|2021-10-03 09:45:00|2021-10-03 09:55:00|    app|   11|
|2021-10-03 09:45:00|2021-10-03 09:55:00|    web|   10|
|2021-10-03 09:45:00|2021-10-03 09:55:00| iphone|    8|
|2021-10-03 09:45:00|2021-10-03 09:55:00|retweer|    3|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 4
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:45:00|2021-10-03 09:55:00|android|   26|
|2021-10-03 09:45:00|2021-10-03 09:55:00|    app|   25|
|2021-10-03 09:45:00|2021-10-03 09:55:00|    web|   21|
|2021-10-03 09:45:00|2021-10-03 09:55:00| iphone|   14|
|2021-10-03 09:45:00|2021-10-03 09:55:00| nodejs|   10|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 5
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:45:00|2021-10-03 09:55:00|    app|   39|
|2021-10-03 09:45:00|2021-10-03 09:55:00|    web|   34|
|2021-10-03 09:45:00|2021-10-03 09:55:00|android|   34|
|2021-10-03 09:45:00|2021-10-03 09:55:00| nodejs|   23|
|2021-10-03 09:45:00|2021-10-03 09:55:00| iphone|   20|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 6
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:50:00|2021-10-03 10:00:00|    app|   15|
|2021-10-03 09:50:00|2021-10-03 10:00:00|    web|   14|
|2021-10-03 09:50:00|2021-10-03 10:00:00|android|   10|
|2021-10-03 09:50:00|2021-10-03 10:00:00| nodejs|    8|
|2021-10-03 09:50:00|2021-10-03 10:00:00|       |    8|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 7
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:50:00|2021-10-03 10:00:00|    app|   26|
|2021-10-03 09:50:00|2021-10-03 10:00:00|    web|   23|
|2021-10-03 09:50:00|2021-10-03 10:00:00|android|   16|
|2021-10-03 09:50:00|2021-10-03 10:00:00| iphone|   16|
|2021-10-03 09:50:00|2021-10-03 10:00:00| nodejs|   12|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 8
-------------------------------------------
+-------------------+-------------------+--------+-----+
|              start|                end|  device|count|
+-------------------+-------------------+--------+-----+
|2021-10-03 09:55:00|2021-10-03 10:05:00|     web|    4|
|2021-10-03 09:55:00|2021-10-03 10:05:00|     app|    4|
|2021-10-03 09:55:00|2021-10-03 10:05:00|  python|    1|
|2021-10-03 09:55:00|2021-10-03 10:05:00| retweet|    1|
|2021-10-03 09:55:00|2021-10-03 10:05:00|azuerbot|    1|
+-------------------+-------------------+--------+-----+



                                                                                

-------------------------------------------
Batch: 9
-------------------------------------------
+-------------------+-------------------+------------+-----+
|              start|                end|      device|count|
+-------------------+-------------------+------------+-----+
|2021-10-03 09:55:00|2021-10-03 10:05:00|     android|   14|
|2021-10-03 09:55:00|2021-10-03 10:05:00|         app|   13|
|2021-10-03 09:55:00|2021-10-03 10:05:00|codedailybot|   12|
|2021-10-03 09:55:00|2021-10-03 10:05:00|      iphone|   11|
|2021-10-03 09:55:00|2021-10-03 10:05:00|         web|    9|
+-------------------+-------------------+------------+-----+



                                                                                

-------------------------------------------
Batch: 10
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 09:55:00|2021-10-03 10:05:00|    app|   33|
|2021-10-03 09:55:00|2021-10-03 10:05:00|android|   30|
|2021-10-03 09:55:00|2021-10-03 10:05:00|    web|   24|
|2021-10-03 09:55:00|2021-10-03 10:05:00| nodejs|   22|
|2021-10-03 09:55:00|2021-10-03 10:05:00| iphone|   18|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 11
-------------------------------------------
+-------------------+-------------------+---------------+-----+
|              start|                end|         device|count|
+-------------------+-------------------+---------------+-----+
|2021-10-03 10:00:00|2021-10-03 10:10:00|     continuous|   17|
|2021-10-03 10:00:00|2021-10-03 10:10:00|       learning|   17|
|2021-10-03 10:00:00|2021-10-03 10:10:00|        goaidev|    4|
|2021-10-03 10:00:00|2021-10-03 10:10:00|thedeveloperbot|    4|
|2021-10-03 10:00:00|2021-10-03 10:10:00|            app|    2|
+-------------------+-------------------+---------------+-----+



                                                                                

-------------------------------------------
Batch: 12
-------------------------------------------
+-------------------+-------------------+------------------+-----+
|              start|                end|            device|count|
+-------------------+-------------------+------------------+-----+
|2021-10-03 10:00:00|2021-10-03 10:10:00|        continuous|   17|
|2021-10-03 10:00:00|2021-10-03 10:10:00|          learning|   17|
|2021-10-03 10:00:00|2021-10-03 10:10:00|thesecretjuniordev|   13|
|2021-10-03 10:00:00|2021-10-03 10:10:00|               app|   13|
|2021-10-03 10:00:00|2021-10-03 10:10:00|            nodejs|   10|
+-------------------+-------------------+------------------+-----+



                                                                                

-------------------------------------------
Batch: 13
-------------------------------------------
+-------------------+-------------------+----------+-----+
|              start|                end|    device|count|
+-------------------+-------------------+----------+-----+
|2021-10-03 10:00:00|2021-10-03 10:10:00|   twitchi|   27|
|2021-10-03 10:00:00|2021-10-03 10:10:00|   android|   26|
|2021-10-03 10:00:00|2021-10-03 10:10:00|       app|   22|
|2021-10-03 10:00:00|2021-10-03 10:10:00|continuous|   17|
|2021-10-03 10:00:00|2021-10-03 10:10:00|  learning|   17|
+-------------------+-------------------+----------+-----+



                                                                                

-------------------------------------------
Batch: 14
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 10:00:00|2021-10-03 10:10:00|android|   36|
|2021-10-03 10:00:00|2021-10-03 10:10:00|    app|   32|
|2021-10-03 10:00:00|2021-10-03 10:10:00|twitchi|   27|
|2021-10-03 10:00:00|2021-10-03 10:10:00|  tweet|   24|
|2021-10-03 10:00:00|2021-10-03 10:10:00|    web|   23|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 15
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 10:05:00|2021-10-03 10:15:00|android|   19|
|2021-10-03 10:05:00|2021-10-03 10:15:00|    app|   15|
|2021-10-03 10:05:00|2021-10-03 10:15:00|    web|   13|
|2021-10-03 10:05:00|2021-10-03 10:15:00|   dlvr|   12|
|2021-10-03 10:05:00|2021-10-03 10:15:00|     it|   12|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 16
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 10:05:00|2021-10-03 10:15:00|android|   35|
|2021-10-03 10:05:00|2021-10-03 10:15:00|    app|   27|
|2021-10-03 10:05:00|2021-10-03 10:15:00|    web|   22|
|2021-10-03 10:05:00|2021-10-03 10:15:00| iphone|   16|
|2021-10-03 10:05:00|2021-10-03 10:15:00|   dlvr|   14|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 17
-------------------------------------------
+-------------------+-------------------+---------------+-----+
|              start|                end|         device|count|
+-------------------+-------------------+---------------+-----+
|2021-10-03 10:10:00|2021-10-03 10:20:00|               |    7|
|2021-10-03 10:10:00|2021-10-03 10:20:00|            app|    5|
|2021-10-03 10:10:00|2021-10-03 10:20:00|            web|    4|
|2021-10-03 10:10:00|2021-10-03 10:20:00|thedeveloperbot|    3|
|2021-10-03 10:10:00|2021-10-03 10:20:00|         python|    2|
+-------------------+-------------------+---------------+-----+



                                                                                

-------------------------------------------
Batch: 18
-------------------------------------------
+-------------------+-------------------+------------+-----+
|              start|                end|      device|count|
+-------------------+-------------------+------------+-----+
|2021-10-03 10:10:00|2021-10-03 10:20:00|    socially|   22|
|2021-10-03 10:10:00|2021-10-03 10:20:00|   incorrect|   22|
|2021-10-03 10:10:00|2021-10-03 10:20:00|codedailybot|   13|
|2021-10-03 10:10:00|2021-10-03 10:20:00|         app|   12|
|2021-10-03 10:10:00|2021-10-03 10:20:00|     android|   10|
+-------------------+-------------------+------------+-----+



                                                                                

-------------------------------------------
Batch: 19
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 10:10:00|2021-10-03 10:20:00|  android|   23|
|2021-10-03 10:10:00|2021-10-03 10:20:00|incorrect|   22|
|2021-10-03 10:10:00|2021-10-03 10:20:00| socially|   22|
|2021-10-03 10:10:00|2021-10-03 10:20:00|      app|   22|
|2021-10-03 10:10:00|2021-10-03 10:20:00|  twitchi|   20|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 20
-------------------------------------------
+-------------------+-------------------+--------+-----+
|              start|                end|  device|count|
+-------------------+-------------------+--------+-----+
|2021-10-03 10:15:00|2021-10-03 10:25:00|     app|    5|
|2021-10-03 10:15:00|2021-10-03 10:25:00|     web|    5|
|2021-10-03 10:15:00|2021-10-03 10:25:00| android|    4|
|2021-10-03 10:15:00|2021-10-03 10:25:00|   cyber|    2|
|2021-10-03 10:15:00|2021-10-03 10:25:00|security|    2|
+-------------------+-------------------+--------+-----+



                                                                                

-------------------------------------------
Batch: 21
-------------------------------------------
+-------------------+-------------------+-----------------+-----+
|              start|                end|           device|count|
+-------------------+-------------------+-----------------+-----+
|2021-10-03 10:15:00|2021-10-03 10:25:00|          android|   18|
|2021-10-03 10:15:00|2021-10-03 10:25:00|              app|   18|
|2021-10-03 10:15:00|2021-10-03 10:25:00|              web|   14|
|2021-10-03 10:15:00|2021-10-03 10:25:00|           iphone|   11|
|2021-10-03 10:15:00|2021-10-03 10:25:00|econometriclubbot|    6|
+-------------------+-------------------+-----------------+-----+



                                                                                

-------------------------------------------
Batch: 22
-------------------------------------------
+-------------------+-------------------+-------+-----+
|              start|                end| device|count|
+-------------------+-------------------+-------+-----+
|2021-10-03 10:15:00|2021-10-03 10:25:00|    app|   37|
|2021-10-03 10:15:00|2021-10-03 10:25:00|    web|   31|
|2021-10-03 10:15:00|2021-10-03 10:25:00|android|   25|
|2021-10-03 10:15:00|2021-10-03 10:25:00|   tech|   17|
|2021-10-03 10:15:00|2021-10-03 10:25:00| iphone|   13|
+-------------------+-------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 23
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 10:20:00|2021-10-03 10:30:00|incorrect|   19|
|2021-10-03 10:20:00|2021-10-03 10:30:00| socially|   19|
|2021-10-03 10:20:00|2021-10-03 10:30:00|      app|    8|
|2021-10-03 10:20:00|2021-10-03 10:30:00| azuerbot|    6|
|2021-10-03 10:20:00|2021-10-03 10:30:00|      web|    6|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 24
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 10:20:00|2021-10-03 10:30:00|  twitchi|   21|
|2021-10-03 10:20:00|2021-10-03 10:30:00|      app|   19|
|2021-10-03 10:20:00|2021-10-03 10:30:00|incorrect|   19|
|2021-10-03 10:20:00|2021-10-03 10:30:00| socially|   19|
|2021-10-03 10:20:00|2021-10-03 10:30:00|  android|   16|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 25
-------------------------------------------
+-------------------+-------------------+---------------+-----+
|              start|                end|         device|count|
+-------------------+-------------------+---------------+-----+
|2021-10-03 10:25:00|2021-10-03 10:35:00|thedeveloperbot|    1|
|2021-10-03 10:25:00|2021-10-03 10:35:00|            web|    1|
|2021-10-03 10:25:00|2021-10-03 10:35:00|        goaidev|    1|
|2021-10-03 10:25:00|2021-10-03 10:35:00|            app|    1|
|2021-10-03 10:20:00|2021-10-03 10:30:00|            app|   35|
+-------------------+-------------------+---------------+-----+



                                                                                

-------------------------------------------
Batch: 26
-------------------------------------------
+-------------------+-------------------+------------------+-----+
|              start|                end|            device|count|
+-------------------+-------------------+------------------+-----+
|2021-10-03 10:25:00|2021-10-03 10:35:00|               app|   17|
|2021-10-03 10:25:00|2021-10-03 10:35:00|thesecretjuniordev|   14|
|2021-10-03 10:25:00|2021-10-03 10:35:00|               web|   14|
|2021-10-03 10:25:00|2021-10-03 10:35:00|           android|    9|
|2021-10-03 10:25:00|2021-10-03 10:35:00|      codedailybot|    8|
+-------------------+-------------------+------------------+-----+



                                                                                

-------------------------------------------
Batch: 27
-------------------------------------------
+-------------------+-------------------+------------------+-----+
|              start|                end|            device|count|
+-------------------+-------------------+------------------+-----+
|2021-10-03 10:25:00|2021-10-03 10:35:00|           android|   27|
|2021-10-03 10:25:00|2021-10-03 10:35:00|               app|   24|
|2021-10-03 10:25:00|2021-10-03 10:35:00|               web|   21|
|2021-10-03 10:25:00|2021-10-03 10:35:00|               fab|   20|
|2021-10-03 10:25:00|2021-10-03 10:35:00|thesecretjuniordev|   14|
+-------------------+-------------------+------------------+-----+



                                                                                

-------------------------------------------
Batch: 28
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 10:30:00|2021-10-03 10:40:00|incorrect|   13|
|2021-10-03 10:30:00|2021-10-03 10:40:00| socially|   13|
|2021-10-03 10:30:00|2021-10-03 10:40:00|  goaidev|    8|
|2021-10-03 10:30:00|2021-10-03 10:40:00|  android|    6|
|2021-10-03 10:30:00|2021-10-03 10:40:00|      app|    5|
+-------------------+-------------------+---------+-----+



                                                                                

-------------------------------------------
Batch: 29
-------------------------------------------
+-------------------+-------------------+---------+-----+
|              start|                end|   device|count|
+-------------------+-------------------+---------+-----+
|2021-10-03 10:30:00|2021-10-03 10:40:00|  twitchi|   21|
|2021-10-03 10:30:00|2021-10-03 10:40:00|      app|   18|
|2021-10-03 10:30:00|2021-10-03 10:40:00|      web|   15|
|2021-10-03 10:30:00|2021-10-03 10:40:00|incorrect|   13|
|2021-10-03 10:30:00|2021-10-03 10:40:00| socially|   13|
+-------------------+-------------------+---------+-----+



## Generating the CSV files

Run the command below in the terminal where `bash start-cluster.sh` was executed.

```bash
python3 generator.py 3
```

Every 5 seconds, the Python script moves Twitter data from another path into the `/data/Structured_Streaming` path, so that the data appears to arrive as a stream.
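
`generator.py` itself is not shown in these notes, so the following is only a hypothetical sketch of a script with the behaviour described above: it moves files from a staging directory into the watched directory one at a time. The function name `drip_files`, its parameters, and the demo paths are assumptions, and the meaning of the `3` argument in the command is not modelled.

```python
import shutil
import tempfile
import time
from pathlib import Path

def drip_files(src_dir, dst_dir, interval_seconds=5.0):
    """Move files from src_dir to dst_dir one at a time, pausing between moves."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(p for p in src.iterdir() if p.is_file()):
        # Within one filesystem a move is a rename, so the file shows up in dst all at once.
        target = dst / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
        time.sleep(interval_seconds)
    return moved

# Small demo in a temporary directory, with no pause so it finishes immediately.
with tempfile.TemporaryDirectory() as tmp:
    staging, watched = Path(tmp) / "staging", Path(tmp) / "watched"
    staging.mkdir()
    for name in ("twitter_01.csv", "twitter_02.csv"):
        (staging / name).write_text("1633253932947,1,hello,<a>web</a>\n")
    print([p.name for p in drip_files(staging, watched, interval_seconds=0)])
```

Moving complete files, as opposed to writing them in place, matters because the file source expects each file to appear in the watched directory atomically.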