<div style="display: flex; align-items: center;">
    <img src="SAGE_logo.jpeg" alt="SAGE logo" width="150" style="margin-right: 10px; vertical-align: middle;">
    <h1>NSF National Data Platform (NDP)</h1>
</div>

<h3 style="text-align: center; margin-top: 0;">Streaming Data from SAGE Pilot</h3>

**Contact:** Scientific Computing and Imaging Institute, University of Utah ([ivan.rodero@utah.edu](mailto:ivan.rodero@utah.edu))

<div style="display: flex; align-items: center;">
    <img src="https://new.nsf.gov/themes/custom/nsf_theme/components/images/logo/logo-desktop.svg" alt="NSF Logo" width="120" style="margin-right: 10px; vertical-align: middle;">
    <span style="font-size: 10px; margin-top:10px;">The National Data Platform was funded by NSF 2333609 under CI, CISE Research Resources programs. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funders.</span>
</div>

In [None]:
from basic_functions import get_and_display_consumer_data, stream_and_visualize_data
import warnings
from IPython.display import display, HTML

css_style = """
<style>
.output {
    max-height: 600px;
    overflow-y: auto;
    max-width: 100%;
    overflow-x: auto;
}
</style>
"""

display(HTML(css_style))
warnings.filterwarnings("ignore", message="The behavior of DataFrame concatenation with empty or all-NA entries is deprecated")


In [None]:
KAFKA_HOST = "155.101.6.194"
KAFKA_PORT = "9092"
consumers = ["wind", "temperature", "humidity", "pressure", "air_quality"]

### Real-Time Data Streaming and Visualization

The `stream_and_visualize_data` function is at the heart of our real-time data analysis and visualization tool. It connects to a Kafka topic as a consumer using a given `consumer_id` and streams data in real time. The function then processes and visualizes this data dynamically, providing insights into trends as they occur.

Key components of this function include:
- **Kafka Consumer Initialization**: Establishes a connection to a Kafka topic to consume messages.
- **Data Processing**: Parses the JSON payload of each incoming message, extracts the relevant fields, and updates the data model.
- **Model Training and Prediction**: Trains an incremental model on the newly arrived data and uses it to predict future values.
- **Dynamic Visualization**: Uses Plotly to plot the data and predictions as they arrive. The plot includes both historical and the latest data points to show trends over time.
- **Saving Option**: Optionally saves the visualization as an HTML file for offline viewing and sharing.

This approach enables real-time monitoring and forecasting, which is useful for applications that need up-to-the-minute data analysis, such as environmental monitoring, financial market tracking, or IoT device management.
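The processing and prediction steps listed above can be sketched without a Kafka broker. The example below is a minimal illustration, not the implementation inside `stream_and_visualize_data`: the message format (a JSON object with a `value` field), the `OnlineLinearModel` class, and the `process_stream` helper are all assumptions made for this sketch, and a simple online least-squares line stands in for whatever incremental model the notebook actually uses.

```python
import json


class OnlineLinearModel:
    """Hypothetical stand-in model: least-squares line updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def learn_one(self, x, y):
        # Update the running sums; no past samples need to be stored.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict_one(self, x):
        if self.n == 0:
            return 0.0
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            # Only one distinct x seen so far: fall back to the mean.
            return self.sy / self.n
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept + slope * x


def process_stream(raw_messages, predictions=3):
    """Parse JSON messages, train incrementally, then forecast `predictions` steps."""
    model = OnlineLinearModel()
    history = []
    for step, raw in enumerate(raw_messages):
        record = json.loads(raw)          # assumed payload: {"value": <number>}
        value = float(record["value"])
        history.append(value)
        model.learn_one(step, value)      # train on the newly arrived point
    next_step = len(history)
    forecast = [model.predict_one(next_step + k) for k in range(predictions)]
    return history, forecast


# Simulated messages in place of a live Kafka topic.
sample = [json.dumps({"value": v}) for v in (10.0, 12.0, 14.0, 16.0)]
history, forecast = process_stream(sample, predictions=2)
print(history, forecast)
```

In a live setting the list of simulated messages would be replaced by the values yielded by a Kafka consumer, and the history and forecast would feed the Plotly figure.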


### Visualizing Real-Time Data for a Selected Consumer

With the list of consumers defined above, we can focus on a specific consumer and visualize its data in real time. Selecting a consumer ID from the `consumers` list tailors the visualization to the trends and predictions of that consumer's data stream.
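Selecting by name rather than by position avoids off-by-one mistakes with zero-based indexing. The helper below is a hypothetical convenience for this notebook (`select_consumer` is not part of `basic_functions`); it only checks the requested name against the list.

```python
consumers = ["wind", "temperature", "humidity", "pressure", "air_quality"]


def select_consumer(consumers, name):
    """Return `name` if it is a known consumer, otherwise raise a descriptive error."""
    if name not in consumers:
        raise ValueError(f"Unknown consumer {name!r}; choose one of {consumers}")
    return name


consumer_id = select_consumer(consumers, "humidity")
print(consumer_id, consumers.index(consumer_id))
```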

The code snippet below selects the first consumer in our list (index `0`, since Python uses zero-based indexing) and visualizes its real-time data along with 20 future predictions:

In [None]:
consumer_id = consumers[0]
await stream_and_visualize_data(KAFKA_HOST, KAFKA_PORT, consumer_id, predictions=20)