In [1]:
from IPython.display import display, Markdown

from netdata_llm_agent import NetdataLLMAgent


def print_md(text):
    """Render a Markdown string as rich output in the notebook."""
    display(Markdown(text))


# list of Netdata URLs to interact with
netdata_urls = [
    'https://london3.my-netdata.io/',
    'https://bangalore.my-netdata.io/',
    'https://newyork.my-netdata.io/',
    'https://sanfrancisco.my-netdata.io/',
    'https://singapore.my-netdata.io/',
    'https://toronto.my-netdata.io/',
]

# create agent
agent = NetdataLLMAgent(netdata_urls)

In [12]:
# chat with agent
msg = 'What hosts are reachable from the london node?'
print_md(agent.chat(msg, return_last=True))

The london node you're referring to is likely `https://london3.my-netdata.io/`. The nodes or hosts that are generally part of this setup and are reachable from this node include:

- `https://bangalore.my-netdata.io/`
- `https://newyork.my-netdata.io/`
- `https://sanfrancisco.my-netdata.io/`
- `https://singapore.my-netdata.io/`
- `https://toronto.my-netdata.io/`

These URLs correspond to nodes which are part of a distributed Netdata monitoring setup.
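The agent had to guess that "london" means `https://london3.my-netdata.io/`. As a minimal sketch, assuming each node's short name is simply the first label of its hostname, the mapping from short name to URL can be derived from the list above. The helper name `node_labels` is hypothetical and not part of `netdata_llm_agent`.

```python
from urllib.parse import urlparse

# Same URLs as in the first cell of the notebook.
netdata_urls = [
    'https://london3.my-netdata.io/',
    'https://bangalore.my-netdata.io/',
    'https://newyork.my-netdata.io/',
    'https://sanfrancisco.my-netdata.io/',
    'https://singapore.my-netdata.io/',
    'https://toronto.my-netdata.io/',
]


def node_labels(urls):
    """Map the first hostname label (e.g. 'london3') to its full URL.

    Hypothetical helper: assumes the first DNS label identifies the node.
    """
    labels = {}
    for url in urls:
        host = urlparse(url).hostname or ''
        labels[host.split('.')[0]] = url
    return labels


labels = node_labels(netdata_urls)
for name, url in labels.items():
    print(f'{name}: {url}')
```

Using the exact label (for example `london3`) in a prompt may reduce the need for the agent to guess which node is meant.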

In [3]:
msg = 'What is my CPU utilization recently on london?'
print_md(agent.chat(msg, return_last=True))

Recently, the CPU utilization on the London host has shown the following metrics (averaged over roughly the last hour):

- **User**: Approximately 4.5% to 5%
- **System**: Approximately 1.6% to 2.1%
- **I/O Wait**: Less than 0.1%
- **Steal**: Less than 0.2%
- **SoftIRQ**: Around 0.45% to 0.55%
- **Nice**, **IRQ**, and **Guest**: Negligible across the board.

The user and system components of CPU utilization are the most significant contributors during this period.

In [4]:
msg = 'What apps are using most cpu on new york? over the last 15 minutes lets say'
print_md(agent.chat(msg, return_last=True))

Over the last 15 minutes, the "ebpf_plugin" application on the New York server has shown varying CPU utilization. Here's a summary of the CPU usage (in fractions of a core):

- User CPU utilization ranges from 0.00 to 0.04.
- System CPU utilization ranges from 0.02 to 0.10.

Apps are low on user CPU usage but show a slight increase periodically in system CPU usage. If you're looking for applications utilizing more CPU resources, we might need to check other specific applications. Please let me know if you would like further details on a particular app.

In [5]:
msg = 'What users are using most cpu on sanfrancisco the last hour?'
print_md(agent.chat(msg, return_last=True))

In the last hour on the San Francisco node, the following users showed noteworthy CPU utilization:

1. **netdata**: Consistently showed CPU utilization, albeit relatively low.
2. **messagebus**: Had occasional spikes in CPU usage but generally low.
3. **root**: Exhibited noticeable CPU utilization.
4. **Debian-exim**: Occasionally showed some CPU utilization peaks.
5. **systemd**: Other users like `systemd-journal-upload` and sub-components showed minimal to no CPU activity.

The `user` processes generally exhibit very low CPU utilization during this period on the San Francisco node. If you're looking for specific high CPU usage instances, we might need to explore specific timestamps or application-specific charts. Please let me know if there is any specific application or time period you'd like to investigate further.

In [6]:
msg = 'What specific users are using most cpu on bangalore?'
print_md(agent.chat(msg, return_last=True))

Based on the recent data for CPU utilization by users on the Bangalore server, the users consuming the most CPU resources are as follows:

1. **user.netdata:**
   - CPU Utilization: Varies between approximately 9% and 17% of a single core. This appears to be the most CPU-intensive user.

2. **user.sshd:**
   - CPU Usage: Typically shows 0% usage, with occasional spikes. Not significant under typical operation.

3. **user.root:**
   - Occasional CPU spikes with values like 2.17% at some points, but generally lower when compared to user.netdata.

Other users such as **user.www-data, user.Debian-exim,** and **user.unbound** display negligible or zero CPU consumption during this observed period.

If you require further analysis on other specific timelines or additional user data, please let me know!

In [7]:
msg = 'what is the ram usage like on bangalore node?'
print_md(agent.chat(msg, return_last=True))

The current RAM usage on the Bangalore node is as follows:

- **Free**: 89.37 MiB
- **Used**: 937.57 MiB
- **Cached**: 808.17 MiB
- **Buffers**: 131.84 MiB

If you have any other questions or need further details, feel free to ask!
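The raw MiB figures are easier to read as shares of the total. The sketch below is a quick check of the arithmetic, assuming the four reported dimensions (free, used, cached, buffers) together account for all of the node's RAM; the answer above does not state that explicitly.

```python
# Figures copied from the agent's answer above (MiB).
ram_mib = {
    'free': 89.37,
    'used': 937.57,
    'cached': 808.17,
    'buffers': 131.84,
}

# Assumption: these four dimensions sum to the node's total RAM.
total_mib = sum(ram_mib.values())

# Share of the total for each dimension, in percent.
shares = {name: 100 * value / total_mib for name, value in ram_mib.items()}

print(f'total: {total_mib:.2f} MiB')
for name, pct in shares.items():
    print(f'{name:>8}: {ram_mib[name]:8.2f} MiB ({pct:4.1f}%)')
```

Under that assumption the node has a little under 2 GiB of RAM, with just under half of it in the used dimension and most of the rest held as cache and buffers.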

In [8]:
msg = 'How are the mysql metrics looking on the london?'
print_md(agent.chat(msg, return_last=True))

Here's an overview of the MySQL metrics for the London server over the past hour:

1. **Threads**:
   - The number of connected threads fluctuated around 1, with running threads consistently at 1.

2. **Active Connections**:
   - Active connections remained stable at 1 with a connection limit of 151.

3. **InnoDB I/O**:
   - No read or write operations were reported, indicating low or no disk activity within the monitored period.

4. **Queries**:
   - An average of around 6 queries were handled per second, with no slow queries recorded during this time.

Overall, the MySQL server is experiencing low activity with stable connections and query management. If you need more detailed analysis or further metrics, feel free to ask!

In [9]:
msg = 'any active alarms on the toronto node?'
print_md(agent.chat(msg, return_last=True))

I couldn't retrieve the current active alarms from the Toronto node. Please try again later or check the node's status directly.
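When the agent cannot retrieve alarms, one option is to check the node directly, as the answer suggests. The sketch below assumes the node exposes the Netdata agent's v1 REST endpoint `/api/v1/alarms` and that the response contains an `alarms` object whose entries carry `status`, `chart`, and `name` fields. The helper names are hypothetical, and this is not necessarily how `NetdataLLMAgent` queries alarms internally.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def alarms_url(base_url, active_only=True):
    """Build the alarms endpoint URL for a node (assumed v1 API path)."""
    url = urljoin(base_url, 'api/v1/alarms')
    return url + ('?active' if active_only else '?all')


def summarize_alarms(payload):
    """Reduce an alarms response to sorted (status, chart, name) tuples.

    Missing fields fall back to placeholders instead of raising.
    """
    alarms = payload.get('alarms', {})
    return sorted(
        (alarm.get('status', 'UNKNOWN'), alarm.get('chart', '?'), alarm.get('name', key))
        for key, alarm in alarms.items()
    )


def fetch_alarms(base_url, timeout=10):
    """Fetch and summarize alarms from one node (performs a network call)."""
    with urlopen(alarms_url(base_url), timeout=timeout) as resp:
        return summarize_alarms(json.load(resp))
```

Calling `fetch_alarms('https://toronto.my-netdata.io/')` would then return an empty list when no alarms are active, or raise a `urllib.error.URLError` if the node is unreachable, which helps distinguish "no alarms" from "could not retrieve".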

In [10]:
msg = 'what does that alarm mean?'
print_md(agent.chat(msg, return_last=True, continue_chat=True))

I apologize for any confusion. Since I couldn't retrieve the active alarms from the Toronto node, I'm unable to provide information about a specific alarm's meaning at this time. If you have a specific alarm message or description, please provide it, and I can help explain what it typically means.
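The follow-up above relies on `continue_chat=True` so that "that alarm" refers to the previous question. A minimal sketch of wrapping this pattern is shown below; it uses only the `chat` arguments that appear in this notebook and assumes, based on the parameter name, that `continue_chat=True` keeps the earlier messages in context. The helper `ask_conversation` is hypothetical.

```python
def ask_conversation(agent, questions):
    """Ask a list of questions as one conversation and return all answers.

    The first question starts a fresh chat; each later question is sent
    with continue_chat=True so it can refer back to earlier turns.
    """
    answers = []
    for index, question in enumerate(questions):
        if index == 0:
            answer = agent.chat(question, return_last=True)
        else:
            answer = agent.chat(question, return_last=True, continue_chat=True)
        answers.append(answer)
    return answers
```

With the agent created in the first cell, `ask_conversation(agent, ['any active alarms on the toronto node?', 'what does that alarm mean?'])` would reproduce the two cells above in a single call.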

In [11]:
msg = 'How are the nginx metrics looking on the london node over the last couple of hours?'
print_md(agent.chat(msg, return_last=True))

For the last couple of hours, here's how the nginx metrics are performing on the London node:

### Bandwidth:
**Received** and **Sent Bandwidth** peaked and varied over time. Here are the latest data points:
- Latest Median Received: **~60 Mbps**
- Latest Median Sent: **~540 Mbps**

### Request Processing Time:
The request processing time shows distinct peaks, but remains fairly consistent:
- Latest Average Processing Time: **~0.85 - 0.9 seconds**

### Requests by URL Pattern:
Different endpoints and patterns vary significantly in terms of requests, where key patterns like "charts" and "registry" show consistent activity:
- Majority of requests are for `charts` and `registry` patterns with steady values.

This overview gives you a solid high-level view of the recent nginx activity. Adjustments may be needed if these numbers do not meet your performance goals.