<div style="display: flex; align-items: center;">
    <img src="SAGE_logo.jpeg" alt="SAGE logo" width="150" style="margin-right: 10px; vertical-align: middle;">
    <h1>NSF National Data Platform (NDP)</h1>
</div>

<h3 style="text-align: center; margin-top: 0;">Streaming Data from SAGE Pilot</h3>

**Contact:** Scientific and Computing Imaging Institute, University of Utah ([ivan.rodero@utah.edu](mailto:ivan.rodero@utah.edu))

<div style="display: flex; align-items: center;">
    <img src="https://new.nsf.gov/themes/custom/nsf_theme/components/images/logo/logo-desktop.svg" alt="NSF Logo" width="120" style="margin-right: 10px; vertical-align: middle;">
    <span style="font-size: 10px; margin-top:10px;">The National Data Platform was funded by NSF 2333609 under CI, CISE Research Resources programs. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funders.</span>
</div>

### Importing Libraries

- `import asyncio`: Supports writing concurrent code with the async/await syntax.
- `import json`: Parses JSON data.
- `import os`: Provides operating-system-dependent functionality such as reading and writing files.
- `import webbrowser`: Displays web-based documents to users in a browser.
- `from datetime import datetime`: Handles date and time information.
- `from IPython.display import clear_output, display`: Clears and displays output in Jupyter notebooks, respectively.
- `import ipywidgets as widgets` and `from ipywidgets import IntText, HBox, Label`: Provide interactive HTML widgets for Jupyter notebooks.
- `from aiokafka import AIOKafkaConsumer`: Asynchronous Kafka client that consumes messages from a Kafka topic.
- `from plotly.graph_objs import FigureWidget, Scatter`: Create interactive Plotly visualizations within Jupyter notebooks.
- `from plotly.subplots import make_subplots`: Creates figures with multiple subplots.
- `import plotly.graph_objects as go`: Contains graph objects for all types of visualizations.
- `import numpy as np`: Fundamental package for scientific computing with Python.
- `import nest_asyncio`: Patches the asyncio module to allow nested use of `asyncio.run` and `loop.run_until_complete`; calling `nest_asyncio.apply()` activates the patch so async code can run inside Jupyter's already-running event loop.
- `from ai_forecast import IncrementalModel, parse_timestamp`: Imports custom classes and functions for forecasting and timestamp parsing.
- `import requests`: Makes HTTP requests.
- `import pandas as pd`: Provides DataFrames for working with tabular data.
- `from basic_functions import get_and_display_consumer_data, stream_and_visualize_data`: Imports the helper functions that list the active consumers and stream and visualize their data.

```python
# Standard library
import asyncio
import json
import os
import webbrowser
from datetime import datetime

# Jupyter display and widgets
from IPython.display import clear_output, display
import ipywidgets as widgets
from ipywidgets import IntText, HBox, Label

# Kafka client, plotting, and data handling
from aiokafka import AIOKafkaConsumer
from plotly.graph_objs import FigureWidget, Scatter
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import numpy as np
import requests
import pandas as pd

# Allow nested event loops so async code can run inside Jupyter's running loop
import nest_asyncio
nest_asyncio.apply()

# Project-specific modules
from ai_forecast import IncrementalModel, parse_timestamp
from basic_functions import get_and_display_consumer_data, stream_and_visualize_data
```

```python
# Kafka broker address
KAFKA_HOST = "155.101.6.194"
KAFKA_PORT = "9092"

# Available consumer IDs
consumers = ["wind", "temperature", "humidity", "pressure", "air_quality"]
```

```python
# To see the active consumers, call the Master API endpoint that retrieves and displays them:
# consumer_ids = get_and_display_consumer_data('http://master_api:8000')
```

### Real-Time Data Streaming and Visualization

The `stream_and_visualize_data` function is at the heart of our real-time data analysis and visualization tool. It connects to a Kafka topic as a consumer using a given `consumer_id` and streams data in real time. The function then processes and visualizes this data dynamically, providing insights into trends as they occur.

Key components of this function include:
- **Kafka Consumer Initialization**: Establishes a connection to a Kafka topic to consume messages.
- **Data Processing**: Upon receiving data, it parses the JSON payload, extracts relevant information, and updates the data model.
- **Model Training and Prediction**: Updates an incremental model with each newly arrived data point and uses it to forecast future values.
- **Dynamic Visualization**: Uses Plotly to plot real-time data and predictions. The visualization shows both historical data and the latest data points to reveal trends over time.
- **Saving Option**: Optionally saves the visualization as an HTML file for offline viewing or sharing.
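The first two components, consumer initialization and data processing, can be sketched as a small async loop. This is a minimal sketch, not the code inside `basic_functions`: it assumes each message value is UTF-8 JSON with hypothetical `timestamp` and `value` fields, and `collect` accepts any async iterable of records exposing a `.value` attribute, which is what iterating an `AIOKafkaConsumer` yields. The `aiokafka` import is deferred into `consume_topic` so the parsing logic can be exercised without a broker; `fake_records` is a hypothetical stand-in used only for the demo.

```python
import asyncio
import json
from types import SimpleNamespace


def parse_message(raw: bytes) -> tuple[str, float]:
    """Decode one message value into (timestamp, value).

    Assumes a hypothetical payload shape: {"timestamp": "...", "value": <number>}.
    """
    payload = json.loads(raw.decode("utf-8"))
    return payload["timestamp"], float(payload["value"])


async def collect(records, max_messages: int) -> list[tuple[str, float]]:
    """Parse up to max_messages records from any async iterable whose items have a .value."""
    points: list[tuple[str, float]] = []
    async for record in records:
        points.append(parse_message(record.value))
        if len(points) >= max_messages:
            break
    return points


async def consume_topic(host: str, port: str, topic: str, max_messages: int = 10):
    """Connect to Kafka with aiokafka and return up to max_messages parsed points."""
    # Imported here so parse_message/collect can be used without aiokafka installed
    from aiokafka import AIOKafkaConsumer

    consumer = AIOKafkaConsumer(topic, bootstrap_servers=f"{host}:{port}")
    await consumer.start()
    try:
        return await collect(consumer, max_messages)
    finally:
        await consumer.stop()  # release the connection even if parsing fails


async def fake_records(payloads):
    """Hypothetical stand-in for a Kafka consumer: yields objects with a bytes .value."""
    for payload in payloads:
        yield SimpleNamespace(value=json.dumps(payload).encode("utf-8"))


if __name__ == "__main__":
    demo = [{"timestamp": "t1", "value": 1.5}, {"timestamp": "t2", "value": 2}]
    print(asyncio.run(collect(fake_records(demo), max_messages=2)))
```

In this notebook, `await consume_topic(KAFKA_HOST, KAFKA_PORT, consumers[0])` would read a bounded batch of messages, assuming the consumer ID doubles as the topic name (not confirmed here). The `asyncio.run` demo also works inside Jupyter because `nest_asyncio.apply()` has been called.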

This approach enables real-time monitoring and forecasting, which makes it well suited to applications that need up-to-the-minute data analysis, such as environmental monitoring, financial market tracking, or IoT device management.
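The **Model Training and Prediction** step relies on `IncrementalModel` from `ai_forecast`, whose API is not shown in this notebook. As a hypothetical stand-in, the sketch below swaps in online simple linear regression: running sums over the sample index are updated one point at a time, and the fitted line is extrapolated a given number of steps ahead. It only illustrates the train-on-each-message, then-forecast pattern; the real model may work quite differently, and the class and method names are assumptions.

```python
class OnlineLinearTrend:
    """Hypothetical incremental model: least-squares line y = a + b*t fitted
    from running sums, so each update is O(1) and no history is stored."""

    def __init__(self) -> None:
        self.n = 0
        self.sum_t = 0.0
        self.sum_y = 0.0
        self.sum_tt = 0.0
        self.sum_ty = 0.0

    def learn_one(self, y: float) -> None:
        """Add one observation; its time index is the number of points seen so far."""
        t = float(self.n)
        self.n += 1
        self.sum_t += t
        self.sum_y += y
        self.sum_tt += t * t
        self.sum_ty += t * y

    def _coefficients(self) -> tuple[float, float]:
        if self.n == 0:
            raise ValueError("no data yet")
        denom = self.n * self.sum_tt - self.sum_t ** 2
        if denom == 0:  # only one point: flat line through it
            return self.sum_y / self.n, 0.0
        slope = (self.n * self.sum_ty - self.sum_t * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_t) / self.n
        return intercept, slope

    def forecast(self, steps: int) -> list[float]:
        """Extrapolate the fitted line over the next `steps` time indices."""
        intercept, slope = self._coefficients()
        return [intercept + slope * (self.n + k) for k in range(steps)]
```

In a streaming loop you would call `learn_one` on each parsed value and then `forecast(20)` to mirror the `predictions=20` argument used in this notebook. Running sums keep memory constant regardless of stream length, which is the main appeal of incremental models for unbounded data.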


### Visualizing Real-Time Data for a Selected Consumer

Once the active consumers are known, we can focus on a specific consumer and visualize its data in real time. Selecting a consumer ID from the `consumers` list defined above tailors the visualization to the trends and predictions of that consumer's data stream.

The code below selects the first consumer in the list (`consumers[0]`, which is `wind`, since Python uses zero-based indexing) and visualizes its real-time data along with future predictions:

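```python
# Select the first consumer ("wind") and stream, forecast, and plot its data.
# Top-level await works here because the notebook already runs an event loop.
consumer_id = consumers[0]
await stream_and_visualize_data(KAFKA_HOST, KAFKA_PORT, consumer_id, predictions=20, timestamp=False, save=False)
```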
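The call above appears to stream until the cell is interrupted. If you would rather stop automatically, one option is to wrap the coroutine in `asyncio.wait_for` with a time limit. This is a sketch under an assumption: that `stream_and_visualize_data` tolerates cancellation cleanly, which is not documented here. `run_for` is a hypothetical helper, and `endless_stream` is a stand-in coroutine used only to demonstrate it.

```python
import asyncio


async def run_for(coro, seconds: float) -> bool:
    """Run `coro` for at most `seconds`.

    Returns True if it finished on its own, False if wait_for cancelled it
    because the time limit was reached.
    """
    try:
        await asyncio.wait_for(coro, timeout=seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def endless_stream(ticks: list) -> None:
    """Stand-in for a never-ending stream consumer: records a tick every 10 ms."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)
```

In the notebook, this would look like `finished = await run_for(stream_and_visualize_data(KAFKA_HOST, KAFKA_PORT, consumer_id, predictions=20, timestamp=False, save=False), seconds=60)`.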

Output: interactive Plotly figure "Real-Time Data and Predictions for consumer wind" with four traces: Real Data - Historical, Real Data - Latest, Predictions - Historical, and Predictions - Latest (prediction traces dotted).


```text
The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
```
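This pandas deprecation warning is raised when `pd.concat` receives empty or all-NA entries; since this notebook does not call `concat` directly, it most likely comes from the helper code. Following the warning's own advice, a minimal sketch of excluding such entries before concatenating is shown below. `safe_concat` is a hypothetical helper, not part of `basic_functions`, and it only drops frames that are empty or entirely NA; a frame with a single all-NA column may still trigger the warning.

```python
import pandas as pd


def safe_concat(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate only the frames that are non-empty and not entirely NA."""
    kept = [f for f in frames if not f.empty and not f.isna().all().all()]
    if not kept:
        # Nothing left to concatenate: keep the first frame's columns, zero rows.
        return frames[0].iloc[0:0] if frames else pd.DataFrame()
    return pd.concat(kept, ignore_index=True)
```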




