<a href="https://colab.research.google.com/github/JotaBlanco/QuixStreamsNotebooks/blob/main/Tutorials/Quix_Streams_PUB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Quix Streams
Use `pip install` to download the Quix Streams library.

[Quix Streams](https://github.com/quixio/quix-streams) is an open source Python library for processing streaming data. It’s aimed at people who work with time-series data streams — from developers and ML engineers to data scientists and data engineers.

In [None]:
! pip install quixstreams

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting quixstreams
  Downloading quixstreams-0.5.0-py3-none-manylinux2014_x86_64.whl (47.8 MB)
Collecting Deprecated<2,>=1.1
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: Deprecated, quixstreams
Successfully installed Deprecated-1.2.13 quixstreams-0.5.0


# Import the libraries
We will mainly use pandas and Quix Streams.

In [None]:
import pandas as pd
import quixstreams as qx

# 1 - Create client
Let's start by creating a Quix client that we'll use to publish and subscribe to Kafka topics.

In [None]:
token = 'sdk-296f2b9decff4770a525ff7d8855a78d'
client = qx.QuixStreamingClient(token)
# client.api_url = "https://portal-api.dev.quix.ai"
client

<quixstreams.quixstreamingclient.QuixStreamingClient at 0x7f034cbfc6d0>
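
The token above is written directly into the notebook. That is convenient for a tutorial, but it is easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead; the variable name `QUIX_SDK_TOKEN` and the helper `load_token` are hypothetical and not part of Quix Streams:

```python
import os


def load_token(env=None, var_name="QUIX_SDK_TOKEN"):
    """Return the SDK token stored in an environment variable.

    `var_name` is a hypothetical name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Example with an explicit mapping, so the sketch runs without any setup.
print(load_token({"QUIX_SDK_TOKEN": "sdk-example-token"}))
```

The returned value could then be passed to the client, as in `qx.QuixStreamingClient(load_token())`.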

# 2 - Producer client
To publish data to one topic, we will need to create a producer client pointing to that topic.

In [None]:
topic_name = "test-topic"
topic_producer = client.get_topic_producer(topic_name)
topic_producer

<quixstreams.topicproducer.TopicProducer at 0x7f032c5be790>

# 3 - Streams
Streams distribute the message load within a topic efficiently, which lets the system scale while preserving chronological order within each stream. Streams are to topics what lanes are to highways.
We don't need many streams yet, but let's see how they are created:

In [None]:
stream_id = "test-stream_1"
test_stream_1 = topic_producer.create_stream(stream_id)
test_stream_1

<quixstreams.streamproducer.StreamProducer at 0x7f032c27ca90>

In [None]:
stream_id = "test-stream_2"
test_stream_2 = topic_producer.create_stream(stream_id)
test_stream_2

<quixstreams.streamproducer.StreamProducer at 0x7f032c27cee0>

You can add properties to streams, like names and metadata.

In [None]:
test_stream_1.properties.name = "Tutorial Test Stream 1"
test_stream_1.properties.metadata["Test Number"] = "1"
test_stream_2.properties.name = "Tutorial Test Stream 2"
test_stream_2.properties.metadata["Test Number"] = "2"
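
To make the lane analogy concrete, here is a plain-Python model of a topic. It is only an illustration and does not use or mirror the Quix Streams internals; `ToyTopic` and its methods are hypothetical names. Messages from different streams interleave in the topic, yet reading one stream back returns its messages in the order they were published:

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic that carries several streams."""

    def __init__(self):
        self.log = []                       # every message, in arrival order
        self.by_stream = defaultdict(list)  # per-stream view of the same messages

    def publish(self, stream_id, message):
        self.log.append((stream_id, message))
        self.by_stream[stream_id].append(message)

    def read(self, stream_id):
        return list(self.by_stream[stream_id])


topic = ToyTopic()
for i in range(3):
    topic.publish("test-stream_1", f"s1-msg{i}")
    topic.publish("test-stream_2", f"s2-msg{i}")

print(topic.log[:3])                 # interleaved arrival order
print(topic.read("test-stream_1"))   # per-stream order is preserved
```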

# 4 - Data format
Quix Streams can publish data to topics in two formats:

## 4.1 TimeseriesData
TimeseriesData is the formal class in Quix Streams which represents a time series data packet in memory. The format consists of a list of Timestamps with their corresponding parameter names and values for each timestamp.

Think of a TimeseriesData object as a table whose first column is the Timestamp and whose remaining columns hold the values of each parameter.

In [None]:
# This dataframe follows the proper TimeseriesData format: timestamp and different parameters
df = pd.DataFrame({
    "Timestamp": [pd.Timestamp.now(), pd.Timestamp.now() + pd.Timedelta("5sec")],
    "Param A": [10, 20],
    "Column B": [12, 9]
})
df

Unnamed: 0,Timestamp,Param A,Column B
0,2023-03-10 13:34:56.217320,10,12
1,2023-03-10 13:35:01.217375,20,9


In [None]:
# This is the actual way to define a qx.TimeseriesData object for that same data
timeseries_data = qx.TimeseriesData()
timeseries_data.add_timestamp(pd.Timestamp.now()) \
                .add_value("Param A", 10) \
                .add_value("Column B", 12)
timeseries_data.add_timestamp(pd.Timestamp.now() + pd.Timedelta("5sec")) \
                .add_value("Param A", 20) \
                .add_value("Column B", 9)

<quixstreams.models.timeseriesdatatimestamp.TimeseriesDataTimestamp at 0x7fab64667730>

Converting between the qx.TimeseriesData and pd.DataFrame formats is easy:

In [None]:
timeseries_data.to_dataframe()

Unnamed: 0,timestamp,Param A,Column B
0,1678389096492070000,10.0,12.0
1,1678389101492699000,20.0,9.0
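
The `timestamp` column returned by `to_dataframe()` holds large integers rather than datetimes. Assuming they are nanoseconds since the Unix epoch, which the values above suggest, they can be turned back into readable times. The helper name `ns_to_datetime` is hypothetical and uses only the standard library:

```python
from datetime import datetime, timedelta, timezone


def ns_to_datetime(ns):
    """Convert integer nanoseconds since the Unix epoch to a UTC datetime.

    Python datetimes stop at microsecond resolution, so the last three
    digits of the nanosecond value are dropped.
    """
    seconds, remainder_ns = divmod(ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=remainder_ns // 1_000)


# First timestamp from the table above.
print(ns_to_datetime(1678389096492070000).isoformat())
```

With pandas, `pd.to_datetime(frame["timestamp"], unit="ns")` performs the same conversion on a whole column.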


In [None]:
qx.TimeseriesData.from_panda_dataframe(df)

<quixstreams.models.timeseriesdata.TimeseriesData at 0x7fab64667ca0>

## 4.2 EventData
EventData consists of a record with a Timestamp, an EventId and an EventValue.

Think of a list of EventData instances as a simple three-column table: the Timestamp is the first column, and the EventId and EventValue are the second and third.

In [None]:
event_data_1 = qx.EventData(
    event_id = "Door Open", 
    time = pd.Timestamp.now(), 
    value = "The front door of the house has just been opened")
event_data_1

<quixstreams.models.eventdata.EventData at 0x7fab6465d6a0>

In [None]:
event_data_2 = qx.EventData(
    event_id = "Door Closed", 
    time = pd.Timestamp.now(), 
    value = "The front door of the house is back in the closed state")
event_data_2

<quixstreams.models.eventdata.EventData at 0x7fab6465d760>
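
The three-column picture described above can be sketched with plain Python tuples. `EventRow` is a hypothetical stand-in used only to show the shape of the data; it is not the `qx.EventData` class:

```python
from datetime import datetime
from typing import NamedTuple


class EventRow(NamedTuple):
    """Hypothetical row type: one event per row, three columns."""
    timestamp: datetime
    event_id: str
    event_value: str


def format_events(rows):
    """Render events as a simple text table, oldest first."""
    lines = ["Timestamp | EventId | EventValue"]
    for row in sorted(rows, key=lambda r: r.timestamp):
        lines.append(f"{row.timestamp.isoformat()} | {row.event_id} | {row.event_value}")
    return "\n".join(lines)


events = [
    EventRow(datetime(2023, 3, 10, 13, 35, 5), "Door Closed", "closed"),
    EventRow(datetime(2023, 3, 10, 13, 35, 0), "Door Open", "opened"),
]
print(format_events(events))
```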

# 5 - Publish data to the topic
Let's now publish the data we created to the streams:

## 5.1 TimeseriesData
Let's see how to publish the TimeseriesData object we created earlier.

In [None]:
timeseries_data

<quixstreams.models.timeseriesdata.TimeseriesData at 0x7fab64667400>

In [None]:
# Publishing it to stream 1
test_stream_1.timeseries.publish(timeseries_data)

In [None]:
# Publishing it to stream 2
test_stream_2.timeseries.publish(timeseries_data)

In [None]:
# Also, pd.DataFrame objects can be published very simply  
# (conversion to qx.TimeseriesData object is done automatically)
df

Unnamed: 0,Timestamp,Param A,Column B
0,2023-03-09 19:11:35.939881,10,12
1,2023-03-09 19:11:40.940005,20,9


In [None]:
# Publishing it to stream 1
test_stream_1.timeseries.publish(df)

In [None]:
# Publishing it to stream 2
test_stream_2.timeseries.publish(df)

## 5.2 EventData
Let's now publish the EventData messages we created earlier:

In [None]:
# Publishing event 1 to stream 1
test_stream_1.events.publish(event_data_1)

In [None]:
# Publishing event 1 to stream 2
test_stream_2.events.publish(event_data_1)

In [None]:
# Publishing event 2 to stream 1
test_stream_1.events.publish(event_data_2)

In [None]:
# Publishing event 2 to stream 2
test_stream_2.events.publish(event_data_2)

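# 6 - Publishing a CSV
A CSV file can be loaded into a pandas DataFrame and published like the DataFrame in section 5.1. The time column is taken to be the one named 'time', 'timestamp' or 'datetime', or otherwise the first integer column.

Dataset source: https://www.kaggle.com/datasets/taranvee/smart-home-dataset-with-weather-information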

In [None]:
csv_url = "https://raw.githubusercontent.com/JotaBlanco/Telemetry-Data/main/Datasets/IoT_home_data_sample.csv"

df = pd.read_csv(csv_url)
df.head()

Unnamed: 0,time,use [kW],gen [kW],House overall [kW],Dishwasher [kW],Furnace 1 [kW],Furnace 2 [kW],Home office [kW],Fridge [kW],Wine cellar [kW],...,visibility,summary,apparentTemperature,pressure,windSpeed,cloudCover,windBearing,precipIntensity,dewPoint,precipProbability
0,1452036690,0.67355,0.042183,0.67355,1.7e-05,0.020533,0.065133,0.040833,0.113383,0.00805,...,10.0,Clear,35.13,1017.59,6.69,0.0,254.0,0.0,30.58,0.0
1,1452036691,0.658683,0.043633,0.658683,0.0,0.020567,0.065583,0.0408,0.113267,0.007917,...,10.0,Clear,35.13,1017.59,6.69,0.0,254.0,0.0,30.58,0.0
2,1452036692,0.658417,0.044217,0.658417,0.0,0.0206,0.06625,0.0408,0.1131,0.008033,...,10.0,Clear,35.13,1017.59,6.69,0.0,254.0,0.0,30.58,0.0
3,1452036693,0.656833,0.045233,0.656833,1.7e-05,0.020617,0.065983,0.040817,0.113083,0.007967,...,10.0,Clear,35.13,1017.59,6.69,0.0,254.0,0.0,30.58,0.0
4,1452036694,0.55315,0.046667,0.55315,0.0,0.02065,0.065033,0.040733,0.112933,0.007983,...,10.0,Clear,35.13,1017.59,6.69,0.0,254.0,0.0,30.58,0.0
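
The note above lists how the time column is recognised. A sketch of that rule in plain Python, assuming the name match is case-insensitive and that "first integer" means the first column whose values are all integers. `guess_time_column` is a hypothetical helper, not the library's own code:

```python
import csv
import io

TIME_LABELS = {"time", "timestamp", "datetime"}


def _is_int(text):
    """Return True if the CSV cell parses as an integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def guess_time_column(rows):
    """Pick the time column from a list of dict rows (as read by csv.DictReader)."""
    columns = list(rows[0].keys())
    # Rule 1: a column with one of the well-known names.
    for name in columns:
        if name.lower() in TIME_LABELS:
            return name
    # Rule 2: otherwise the first column that holds only integers.
    for name in columns:
        if all(_is_int(row[name]) for row in rows):
            return name
    raise ValueError("no suitable time column found")


# Two rows shaped like the sample data (only a few columns kept).
sample = "time,use [kW],summary\n1452036690,0.67355,Clear\n1452036691,0.658683,Clear\n"
rows = list(csv.DictReader(io.StringIO(sample)))
time_col = guess_time_column(rows)

# The sample values look like epoch seconds; scaling them to nanoseconds is an
# assumption about what the publisher expects, based on the output in 4.1.
time_ns = [int(row[time_col]) * 1_000_000_000 for row in rows]
print(time_col, time_ns[0])
```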



# 7 - Publish random data

In [None]:
import random

In [None]:
df = pd.DataFrame({
    "Timestamp": [pd.Timestamp.now()],
    "Param A": [random.randint(10, 20)],
    "Param B": [random.randint(0, 10)]
})
stream = random.choice([test_stream_1, test_stream_2])
stream.timeseries.publish(df)
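
The cell above publishes one random row. To simulate a feed, the same step can be repeated in a loop. In this sketch the `publish` argument is a hypothetical callback, so the code runs without a broker; in the notebook it could wrap a call such as `stream.timeseries.publish(pd.DataFrame([row]))`. The function names are illustrative only:

```python
import random
from datetime import datetime, timedelta


def random_rows(count, start, step_seconds=1.0, seed=None):
    """Yield `count` rows with evenly spaced timestamps and random values."""
    rng = random.Random(seed)
    for i in range(count):
        yield {
            "Timestamp": start + timedelta(seconds=i * step_seconds),
            "Param A": rng.randint(10, 20),
            "Param B": rng.randint(0, 10),
        }


def simulate(count, stream_ids, publish, seed=None):
    """Send each random row to a randomly chosen stream via `publish`."""
    rng = random.Random(seed)
    for row in random_rows(count, datetime(2023, 3, 10, 13, 0, 0), seed=seed):
        publish(rng.choice(stream_ids), row)


# Collect the messages in a list instead of sending them anywhere.
sent = []
simulate(5, ["test-stream_1", "test-stream_2"],
         publish=lambda stream_id, row: sent.append((stream_id, row)), seed=42)
print(len(sent), sent[0][0])
```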