In [18]:
from IPython.display import display, Markdown

from netdata_llm_agent import NetdataLLMAgent


def print_md(text):
    display(Markdown(text))


# list of netdata urls to interact with
netdata_urls = [
    'https://localhost:19999/',
    'https://london3.my-netdata.io/',
    'https://bangalore.my-netdata.io/',
    'https://newyork.my-netdata.io/',
    'https://sanfrancisco.my-netdata.io/',
    'https://singapore.my-netdata.io/',
    'https://toronto.my-netdata.io/',
]

# create agent
agent = NetdataLLMAgent(netdata_urls, model='gpt-4o-mini')
# create agent using anthropic
# agent = NetdataLLMAgent(netdata_urls, model='claude-3-5-sonnet-20241022', platform='anthropic')
# create agent using ollama
# agent = NetdataLLMAgent(netdata_urls, model='llama3.1', platform='ollama')

# chat with the agent
agent.chat('How much disk space is on london?', verbose=True, no_print=False)


How much disk space is on london?
Tool Calls:
  get_charts (call_Ail79vhFSkyZeCBCleqC5pTE)
 Call ID: call_Ail79vhFSkyZeCBCleqC5pTE
  Args:
    netdata_host_url: https://london3.my-netdata.io/
Name: get_charts

[
  [
    "system.idlejitter",
    "CPU Idle Jitter (system.idlejitter)"
  ],
  [
    "netdata.statsd_metrics",
    "Metrics in the netdata statsd database (netdata.statsd_metrics)"
  ],
  [
    "netdata.statsd_useful_metrics",
    "Useful metrics in the netdata statsd database (netdata.statsd_useful_metrics)"
  ],
  [
    "netdata.statsd_events",
    "Events processed by the netdata statsd server (netdata.statsd_events)"
  ],
  [
    "netdata.statsd_reads",
    "Read operations made by the netdata statsd server (netdata.statsd_reads)"
  ],
  [
    "netdata.statsd_bytes",
    "Bytes read by the netdata statsd server (netdata.statsd_bytes)"
  ],
  [
    "netdata.statsd_packets",
    "Network packets processed by the netdata statsd server (netdata.statsd_packets)"
  ],
  [
    "ne

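The truncated output above shows `get_charts` returning a list of `[chart_id, title]` pairs. The agent's internals are not shown in this notebook, so the following is only a sketch of how such pairs could be derived, assuming the tool reads a payload shaped like Netdata's `/api/v1/charts` response (a `charts` mapping keyed by chart id, each entry carrying a `title`). The helper name `charts_to_pairs` and the sample payload are hypothetical.

```python
def charts_to_pairs(charts_payload):
    """Return [chart_id, title] pairs from a /api/v1/charts-style payload.

    Hypothetical helper: the payload layout is an assumption, not taken
    from the agent's source code.
    """
    charts = charts_payload.get("charts", {})
    # Fall back to the chart id when an entry has no title.
    return [[chart_id, info.get("title", chart_id)] for chart_id, info in charts.items()]


# Hand-written sample mirroring the first two entries in the output above.
# A live payload could be fetched with something like
# requests.get(netdata_host_url + "api/v1/charts").json() (assumption).
sample_payload = {
    "charts": {
        "system.idlejitter": {"title": "CPU Idle Jitter (system.idlejitter)"},
        "netdata.statsd_metrics": {
            "title": "Metrics in the netdata statsd database (netdata.statsd_metrics)"
        },
    }
}

pairs = charts_to_pairs(sample_payload)
print(pairs)
```
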
In [17]:
# chat with agent
msg = 'What hosts are reachable from the london node?'
print_md(agent.chat(msg, return_last=True))

The following hosts are reachable from the London node:

1. **registry.my-netdata.io** - 0 hops
2. **toronto.netdata.rocks** - 6 hops
3. **sanfrancisco** - 6 hops
4. **frankfurt.netdata.rocks** - 4 hops
5. **newyork.netdata.rocks** - 4 hops
6. **singapore.netdata.rocks** - 6 hops
7. **d1.firehol.org** - 4 hops
8. **bangalore** - 3 hops

The host **cdn77** is not reachable.

In [3]:
msg = 'What is my CPU utilization recently on london?'
print_md(agent.chat(msg, return_last=True))

Over the past hour, the CPU utilization on the London server (`https://london3.my-netdata.io/`) has varied across different components. Here's a summary of CPU usage in percentages for key components:

- **User** CPU utilization has fluctuated, with an average around 4.3%.
- **System** CPU utilization has averaged around 1.8%.
- **I/O Wait** has been minimal, averaging around 0.02%.
- **SoftIRQ** has hovered around 0.64%.
- **Steal** has been very low, averaging around 0.09%.

These values indicate a relatively low load on the system, with user processes consuming the majority of CPU resources. If you need more detailed analysis or further breakdowns, feel free to ask!

In [5]:
msg = 'What apps are using most cpu on new york over the last 15 minutes?'
print_md(agent.chat(msg, return_last=True))

Over the last 15 minutes, here are the CPU utilization metrics for several applications on the New York node:

1. **app.go_d_plugin_cpu_utilization**:
   - Average User CPU: 0.58
   - Average System CPU: 0.28

2. **app.ebpf_plugin_cpu_utilization**:
   - Average User CPU: 0.00
   - Average System CPU: 0.05

3. **app.NETWORK-VIEWER_cpu_utilization**:
   - Average User CPU: 0.00
   - Average System CPU: 0.00

4. **app.nfacct_plugin_cpu_utilization**:
   - Average User CPU: 0.00
   - Average System CPU: 0.00

5. **app.systemd_cpu_utilization**:
   - Average User CPU: 0.51
   - Average System CPU: 0.02

6. **app.netdata_cpu_utilization**:
   - Average User CPU: 0.49
   - Average System CPU: 0.42

7. **app.httpd_cpu_utilization**:
   - Average User CPU: 9.00 (notable peak)
   - Average System CPU: 2.07

8. **app.vpn_cpu_utilization**:
   - Average User CPU: 0.00
   - Average System CPU: 0.09

9. **app.logs_cpu_utilization**:
   - Average User CPU: 0.00
   - Average System CPU: 0.09

10. **app.nfs_cpu_utilization**:
    - Average User CPU: 0.00
    - Average System CPU: 0.00

### Notable Observations:
- The **app.httpd_cpu_utilization** has the highest CPU utilization, particularly in User CPU usage.
- Several applications show negligible or zero CPU utilization over this period.

Please let me know if you need more specific information or other charts!

In [5]:
msg = 'What users are using most cpu on sanfrancisco over the last hour?'
print_md(agent.chat(msg, return_last=True))

Here are the user processes' CPU utilization percentages over the last hour in San Francisco. The values represent the average utilization for each data point taken at intervals within the hour:

- User CPU Utilization: Averages between roughly 8.5% to 10.5%.
- System CPU Utilization: Ranges between approximately 2.1% to 3.0%.

Note that there are some missing data points. However, overall trends indicate that user processes are consuming significantly more CPU than system processes. If you need a specific breakdown by users, further analysis can be performed based on user-specific data if available.
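
The answer above mentions missing data points. A minimal sketch of averaging a series while skipping such gaps, assuming they arrive as `None` (JSON `null`); the helper name `mean_ignoring_gaps` and the sample values are hypothetical and are not taken from the San Francisco node.

```python
def mean_ignoring_gaps(values):
    """Average the numeric entries of a series, skipping None gaps."""
    present = [v for v in values if v is not None]
    if not present:
        # No usable samples: report "unknown" instead of dividing by zero.
        return None
    return sum(present) / len(present)


# Hypothetical user-CPU samples (percent) with two missing points.
user_cpu = [9.0, None, 10.0, 8.5, None, 10.5]

average = mean_ignoring_gaps(user_cpu)
print(average)
```

Skipping gaps rather than treating them as zero keeps the average from being pulled down by intervals where no sample was collected.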

In [14]:
msg = 'What specific users are using most cpu on bangalore?'
print_md(agent.chat(msg, return_last=True))

On the Bangalore node, the current CPU utilization by specific users is as follows:

1. **User `netdata`**
   - User CPU Utilization: 12.000247%
   - System CPU Utilization: 4.6673811%

2. **User `root`**
   - User CPU Utilization: 2.0012143%
   - System CPU Utilization: 3.3342598%
   - Total CPU Utilization: 5.3354741%

3. **User `www-data`**
   - User CPU Utilization: 0%
   - System CPU Utilization: 0%
   - Total CPU Utilization: 0%

4. **User `Debian-exim`**
   - User CPU Utilization: 0%
   - System CPU Utilization: 0%
   - Total CPU Utilization: 0%

5. **User `unbound`**
   - User CPU Utilization: 0%
   - System CPU Utilization: 0%
   - Total CPU Utilization: 0%

6. **User `daemon`**
   - User CPU Utilization: 0%
   - System CPU Utilization: 0%
   - Total CPU Utilization: 0%

7. **User `logind`** (in the context of users, like sessions)
   - Online Users: 0
   - Active Users: 0

Based on the current data, the user `netdata` is utilizing the highest CPU among the users, followed by `root`. Other users are not consuming any CPU resources at this time.
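
The answer above lists no total for `netdata`. Summing the user and system figures it quotes reproduces the ranking in the summary. This is a small arithmetic sketch using only the numbers in the output; the helper name `rank_by_total_cpu` is hypothetical.

```python
# (user %, system %) pairs copied from the agent's answer above.
cpu_by_user = {
    "netdata": (12.000247, 4.6673811),
    "root": (2.0012143, 3.3342598),
    "www-data": (0.0, 0.0),
    "Debian-exim": (0.0, 0.0),
    "unbound": (0.0, 0.0),
    "daemon": (0.0, 0.0),
}


def rank_by_total_cpu(usage):
    """Return (user, total %) pairs sorted by user + system CPU, highest first."""
    totals = {user: round(u + s, 7) for user, (u, s) in usage.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total_cpu(cpu_by_user)
for user, total in ranking:
    print(f"{user}: {total:.7f}%")
```

The `root` total computed this way matches the 5.3354741% reported in the answer.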

In [7]:
msg = 'what is the ram usage like on bangalore node?'
print_md(agent.chat(msg, return_last=True))

The RAM usage on the Bangalore node is as follows (in MB):

- **Free RAM**: varies between approximately 157 MB to 170 MB.
- **Used RAM**: hovers around 951 MB to 964 MB.
- **Cached**: remains steady at approximately 717 MB.
- **Buffers**: remains steady at approximately 127 MB. 

These values give a snapshot of the current RAM utilization.

In [8]:
msg = 'How are the mysql metrics looking on the london node?'
print_md(agent.chat(msg, return_last=True))

Here's a quick summary of the current MySQL metrics for the London host:

1. **CPU Utilization**: 
   - User CPU utilization fluctuates between approximately 0.28 to 0.48.
   - System CPU utilization remains much lower in comparison, occasionally spiking.

2. **Memory Usage**: 
   - Unfortunately, there was an SSL error while attempting to retrieve memory usage data, preventing a complete analysis. It may be necessary to check the connection or the server's SSL configuration.

3. **Connections**: 
   - The data indicates no open or aborted connections for the checked interval.

4. **Queries**: 
   - Both regular and slow queries appear steady, maintaining a rate of approximately six queries/sec, with no slow queries detected in the last few minutes.

If you have any specific metrics you'd like me to investigate or need further assistance, feel free to ask!

In [11]:
msg = 'any active alarms on toronto?'
print_md(agent.chat(msg, return_last=True))

There are currently two active alarms on the Toronto Netdata instance:

1. **Alarm Name:** 10min_qos_packet_drops
   - **Chart:** `tc.eth0_dropped`
   - **Status:** WARNING
   - **Value:** 2 packets
   - **Info:** Dropped packets in the last 5 minutes
   - **Summary:** QOS packet drops

2. **Alarm Name:** 10min_qos_packet_drops
   - **Chart:** `tc.eth0-ifb_dropped`
   - **Status:** WARNING
   - **Value:** 4 packets
   - **Info:** Dropped packets in the last 5 minutes
   - **Summary:** QOS packet drops

Both alarms indicate that there are packet drops occurring in the specified time frame.
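
For comparison with the answer above, here is a sketch of selecting raised alarms from a payload shaped like Netdata's `/api/v1/alarms` response. The layout (an `alarms` mapping whose entries carry `name`, `chart`, `status`, and `value`) is an assumption, and the agent may gather this information differently. The helper name `raised_alarms`, the mapping keys, and the third sample entry are hypothetical; the first two entries are transcribed from the answer.

```python
def raised_alarms(alarms_payload, statuses=("WARNING", "CRITICAL")):
    """Return a short summary of every alarm whose status is in `statuses`."""
    alarms = alarms_payload.get("alarms", {})
    return [
        {
            "name": alarm.get("name"),
            "chart": alarm.get("chart"),
            "status": alarm.get("status"),
            "value": alarm.get("value"),
        }
        for alarm in alarms.values()
        if alarm.get("status") in statuses
    ]


# Sample payload: two entries mirror the Toronto answer, one is invented
# to show that non-raised alarms are filtered out.
sample_payload = {
    "alarms": {
        "tc.eth0_dropped.10min_qos_packet_drops": {
            "name": "10min_qos_packet_drops",
            "chart": "tc.eth0_dropped",
            "status": "WARNING",
            "value": 2,
        },
        "tc.eth0-ifb_dropped.10min_qos_packet_drops": {
            "name": "10min_qos_packet_drops",
            "chart": "tc.eth0-ifb_dropped",
            "status": "WARNING",
            "value": 4,
        },
        "hypothetical.cleared_example": {
            "name": "example_cleared_alarm",
            "chart": "system.cpu",
            "status": "CLEAR",
            "value": 0,
        },
    }
}

raised = raised_alarms(sample_payload)
for alarm in raised:
    print(alarm["chart"], alarm["status"], alarm["value"])
```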

In [12]:
msg = 'what do the alarms mean? should i be worried?'
print_md(agent.chat(msg, return_last=True, continue_chat=True))

The active alarms indicate that there are packets being dropped in the network interface cards monitored by the respective charts. Here's a breakdown of what this means:

1. **Packet Drops**: Packet drops occur when a device cannot process incoming packets fast enough. This can be due to several reasons such as high traffic, insufficient resources (CPU, memory), network congestion, or hardware issues.

2. **Implications**:
   - If your application relies on real-time communication (like VoIP, online gaming, or streaming services), packet drops can lead to performance issues like lag, interruptions, or lower quality service.
   - If this is happening on a critical service or application, it may affect the availability or reliability of your services.

3. **Should You Be Worried?**:
   - **Short Intervals**: If this is a transient issue (i.e., it happens for a short time and is followed by no further drops), it may not be a major concern.
   - **Persisting Issues**: If these alarms persist, you should investigate the cause. It may require tuning your network settings, allocating more resources, or checking for hardware issues.

### Recommended Actions:
- Monitor the charts for trends in packet drops.
- Check the health of your network equipment (routers, switches).
- Analyze traffic patterns to see if there are spikes in usage that coincide with the packet drops.
- Consider setting up additional alarms or performance metrics to get alerted if the situation worsens.

In summary, while packet drops are a cause for concern, especially if they’re consistent, they are not necessarily an immediate crisis. Proactive monitoring and analysis will help ensure that any potential issues are managed swiftly.

In [11]:
msg = 'How are the nginx metrics looking on the london node over the last couple of hours?'
print_md(agent.chat(msg, return_last=True))

Here's a summary of the Nginx metrics on the London node over the last couple of hours:

1. **Connections**: The average number of active connections is around 151, with fluctuations between 135 and 167 observed periodically.

2. **Requests**: Approximately 10–11 requests were handled on average per data point during the period.

3. **Accepted and Handled Connections**: There's a steady rate of accepted and handled connections, which appears to hover slightly under 1 connection per data interval on average.

4. **Connection Status**:
   - Most connections are idle, ranging from around 128 to 166 on average.
   - The number of active writing connections is generally stable at about 1.

These metrics indicate relatively steady activity on the Nginx server in London with no significant variations observed in the last couple of hours. If you need further details or have specific questions about these metrics, feel free to ask!