<a href="https://colab.research.google.com/github/brendanpshea/intro_to_networks/blob/main/Networks_12_Monitoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction: The Importance of Network Monitoring

**Network monitoring** is like having a health monitoring system for your computer network. Just as doctors monitor vital signs in patients, network administrators monitor the health and performance of their networks to prevent problems and ensure everything runs smoothly.

Key reasons why network monitoring is essential:
* Detecting and preventing network outages before users are affected
* Identifying security threats in real-time
* Optimizing network performance for better user experience
* Planning for future network growth based on usage patterns
* Troubleshooting network issues quickly when they do occur

Without proper monitoring, network problems might only be discovered when users complain that "the internet is down" or "the network is slow." By then, the damage is done, and users are already frustrated. Monitoring tools provide early warning of developing issues, much as a fever can signal illness before more serious symptoms appear.

In this chapter, we'll explore the various technologies and solutions used for effective network monitoring.

In [None]:
# @title
import base64
import requests
from IPython.display import HTML, display

def mm(graph: str) -> None:
    """
    Fetch and display a Mermaid diagram as SVG.

    Parameters:
      graph (str): Mermaid graph definition.
    """
    # 1. Encode the graph to URL-safe Base64
    b64 = base64.urlsafe_b64encode(graph.encode('utf-8')).decode('ascii')
    # 2. Construct the SVG URL
    url = f'https://mermaid.ink/svg/{b64}'
    # 3. Fetch SVG content, failing loudly on timeouts or HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    # 4. Render inline in Jupyter
    display(HTML(response.text))

mm("""
flowchart TD
    A[Network Discovery] --> B[Monitoring Methods]
    B --> C1[SNMP]
    B --> C2[Packet Capture]
    B --> C3[Flow Data]
    B --> C4[Port Mirroring]
    B --> C5[Log Aggregation]
    C1 --> D[Monitoring Solutions]
    C2 --> D
    C3 --> D
    C4 --> D
    C5 --> D
    D --> E1[Traffic Analysis]
    D --> E2[Performance Monitoring]
    D --> E3[Availability Monitoring]
    D --> E4[Configuration Monitoring]
    E1 --> F[Network Health]
    E2 --> F
    E3 --> F
    E4 --> F

    classDef methods fill:#d4f1f9,stroke:#05b2dc,stroke-width:2px
    classDef solutions fill:#ffe6cc,stroke:#ff9900,stroke-width:2px

    class A,B methods
    class C1,C2,C3,C4,C5 methods
    class D,E1,E2,E3,E4,F solutions""")

## Network Discovery: Ad Hoc and Scheduled Approaches

Before you can monitor a network, you need to know what devices are connected to it. **Network discovery** is like taking inventory of everything in your network - it finds and identifies all computers, routers, switches, printers, and other devices.

Network discovery can be performed in two main ways:

* **Ad hoc discovery** - Performed on demand when troubleshooting specific issues or after network changes. This is like doing a quick count of people in a room only when you need to know.

* **Scheduled discovery** - Runs automatically at regular intervals (hourly, daily, weekly). This is like having an automated attendance system that regularly counts and records who is present.

Example of network discovery results:

| Device Name | IP Address | Type | Location | Status |
|-------------|------------|------|----------|--------|
| Router-Main | 192.168.1.1 | Router | Server Room | Online |
| Switch-Floor1 | 192.168.1.2 | Switch | 1st Floor Closet | Online |
| Printer-HR | 192.168.1.101 | Printer | HR Department | Online |
| Server-File | 192.168.1.10 | Server | Server Room | Online |
| AP-Lobby | 192.168.1.201 | Access Point | Main Lobby | Offline |

Notice how discovery not only finds devices but also categorizes them by type and notes their status. This information provides the foundation for all network monitoring activities.
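
One practical use of scheduled discovery is comparing each new inventory with the previous one. The sketch below is a minimal illustration, assuming each scan produces a list of records like the rows in the table above; the `Device` class and `diff_inventories` function are hypothetical names, not part of any particular discovery tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    ip: str
    status: str  # "Online" or "Offline", as in the table above


def diff_inventories(previous, current):
    """Compare two discovery snapshots, keyed by IP address."""
    prev_by_ip = {d.ip: d for d in previous}
    curr_by_ip = {d.ip: d for d in current}

    added = sorted(curr_by_ip.keys() - prev_by_ip.keys())
    missing = sorted(prev_by_ip.keys() - curr_by_ip.keys())
    status_changed = sorted(
        ip for ip in curr_by_ip.keys() & prev_by_ip.keys()
        if curr_by_ip[ip].status != prev_by_ip[ip].status
    )
    return {"added": added, "missing": missing, "status_changed": status_changed}


# Hypothetical results from two consecutive scheduled scans
yesterday = [
    Device("Router-Main", "192.168.1.1", "Online"),
    Device("Switch-Floor1", "192.168.1.2", "Online"),
    Device("AP-Lobby", "192.168.1.201", "Online"),
]
today = [
    Device("Router-Main", "192.168.1.1", "Online"),
    Device("Switch-Floor1", "192.168.1.2", "Online"),
    Device("Printer-HR", "192.168.1.101", "Online"),
    Device("AP-Lobby", "192.168.1.201", "Offline"),
]

report = diff_inventories(yesterday, today)
print(report)
```

In this made-up data, the printer appears as a newly added device and the lobby access point shows up as a status change, which are the kinds of differences a scheduled scan is meant to surface.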

## Understanding SNMP: Versions, Community Strings, and Authentication

**Simple Network Management Protocol (SNMP)** is a standard protocol used to collect information from and manage network devices such as routers, switches, servers, printers, and more. It provides a common language for network devices to communicate status and configuration information.

SNMP comes in several versions, which differ in their capabilities and security features:

* **SNMP v2c**:
  * Uses community strings for authentication
  * Data transmitted in plain text (not encrypted)
  * Widely supported but less secure

* **SNMP v3**:
  * Adds authentication and encryption
  * Provides message integrity
  * Offers access control features
  * Most secure version of SNMP

* **Community strings**:
  * Act like passwords for SNMP v2c
  * Default strings (like "public" and "private") should be changed
  * Example: `snmpget -v2c -c my_community_string 192.168.1.1 system.sysUpTime.0`

SNMP is the foundation of many network monitoring systems, providing standardized access to device information. When implementing SNMP, always use v3 when possible for better security, and never use default community strings in production environments.
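
To make the difference between v2c and v3 concrete, the following sketch builds (but does not run) `snmpget` command lines for both versions. It assumes the Net-SNMP command-line tools and their usual options (`-c` for the community string; `-l`, `-u`, `-a`, `-A`, `-x`, `-X` for v3 security settings). The function name `build_snmpget` and the user name and passphrases are hypothetical placeholders.

```python
DEFAULT_COMMUNITIES = {"public", "private"}


def build_snmpget(host, oid, community=None, user=None,
                  auth_pass=None, priv_pass=None):
    """Return an snmpget argument list for SNMPv3 (if a user is given) or v2c."""
    if user is not None:
        # SNMPv3 with both authentication and encryption ("authPriv")
        if not (auth_pass and priv_pass):
            raise ValueError("authPriv needs an auth and a privacy passphrase")
        return ["snmpget", "-v3", "-l", "authPriv", "-u", user,
                "-a", "SHA", "-A", auth_pass,
                "-x", "AES", "-X", priv_pass, host, oid]
    if community is None:
        raise ValueError("SNMP v2c needs a community string")
    if community in DEFAULT_COMMUNITIES:
        raise ValueError("refusing to use a default community string")
    return ["snmpget", "-v2c", "-c", community, host, oid]


SYS_UPTIME = "1.3.6.1.2.1.1.3.0"

v2c_cmd = build_snmpget("192.168.1.1", SYS_UPTIME,
                        community="my_community_string")
v3_cmd = build_snmpget("192.168.1.1", SYS_UPTIME, user="monitor_user",
                       auth_pass="example-auth-pass",
                       priv_pass="example-priv-pass")
print(" ".join(v2c_cmd))
print(" ".join(v3_cmd))
```

The v2c command carries only a community string, while the v3 command names a user plus separate authentication and privacy settings. In a real deployment the passphrases would normally come from a protected configuration file rather than the command line.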

## MIBs and SNMP Traps: Getting Critical Network Information

SNMP uses two important components to manage and communicate network device information: MIBs and traps.

**Management Information Base (MIB)** is like a catalog of all the information available on a network device. Imagine a library card catalog that tells you exactly where to find any book - the MIB does the same for network information.

Each piece of information in a MIB has a unique address called an Object Identifier (OID). Some common OIDs include:

| OID | What It Represents | Example Value |
|-----|-------------------|---------------|
| 1.3.6.1.2.1.1.1.0 | System Description | "Cisco IOS Software, Version 15.2" |
| 1.3.6.1.2.1.1.3.0 | System Uptime | "15 days, 7 hours, 23 minutes" |
| 1.3.6.1.2.1.2.2.1.8.1 | Interface Status | "1" (up) or "2" (down) |
| 1.3.6.1.2.1.25.3.3.1.2.1 | CPU Load (hrProcessorLoad; the final index varies by device) | "75" (percent) |

**SNMP Traps** are automatic alerts sent by devices when something important happens. Instead of the administrator checking each device constantly, traps allow devices to report problems on their own - like a smoke detector that alerts you when there's smoke.

Example SNMP trap message:
```
May 12 14:32:45 Switch1 TRAP:
  Type: linkDown
  Interface: GigabitEthernet1/0/1
  Reason: "Admin shutdown"
```

This trap shows that an interface on Switch1 has gone down because an administrator disabled it.

MIBs and traps work together to provide both polling (the manager queries MIB objects on a schedule) and event-driven alerting (devices send traps as soon as something happens), giving complete visibility into network health and performance.
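
Raw SNMP values usually need some interpretation before they are readable. As a rough sketch, the code below converts a system uptime value (SNMP reports it as TimeTicks, in hundredths of a second) and an interface status code into the friendly forms shown in the OID table above. The helper names `format_uptime` and `describe` are illustrative, and only the two status codes from the table are mapped.

```python
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"
IF_STATUS_OID = "1.3.6.1.2.1.2.2.1.8.1"

OID_NAMES = {
    SYS_UPTIME_OID: "System Uptime",
    IF_STATUS_OID: "Interface Status",
}

# Only the two codes listed in the table; other codes are reported as "other"
IF_OPER_STATUS = {1: "up", 2: "down"}


def format_uptime(timeticks):
    """Convert TimeTicks (hundredths of a second) to days, hours, minutes."""
    total_seconds = timeticks // 100
    days, remainder = divmod(total_seconds, 86_400)
    hours, remainder = divmod(remainder, 3_600)
    minutes = remainder // 60
    return f"{days} days, {hours} hours, {minutes} minutes"


def describe(oid, raw_value):
    """Turn a raw (OID, value) pair into a readable line."""
    name = OID_NAMES.get(oid, "Unknown OID")
    if oid == SYS_UPTIME_OID:
        return f"{name}: {format_uptime(raw_value)}"
    if oid == IF_STATUS_OID:
        return f"{name}: {IF_OPER_STATUS.get(raw_value, 'other')}"
    return f"{name}: {raw_value}"


print(describe(SYS_UPTIME_OID, 132_258_000))
print(describe(IF_STATUS_OID, 2))
```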

In [None]:
# @title
mm("""
flowchart LR
    A[SNMP Manager] -- "GET Request" --> B[SNMP Agent]
    B -- "Response (MIB Data)" --> A
    B -- "SNMP Trap (Alert)" --> A

    subgraph NMS[Network Management System]
        A
        C[Monitoring Dashboard]
        D[Alert System]
        A --- C
        A --- D
    end

    subgraph Network Devices
        B
        E[MIB Database]
        B --- E
    end

    classDef manager fill:#d4f1f9,stroke:#05b2dc,stroke-width:2px
    classDef agent fill:#ffe6cc,stroke:#ff9900,stroke-width:2px

    class A,C,D manager
    class B,E agent""")

## Packet Capture and Flow Data: Seeing Network Traffic in Action

Network administrators need to analyze the actual data moving through their networks to troubleshoot problems and detect security issues.

**Packet Capture** records and analyzes the complete contents of data packets transmitted across a network. Using tools like Wireshark, administrators can see exactly what's happening on the network.

Sample packet capture output:
```
Time         Source IP     Destination IP  Protocol  Info
10:15:22.350 192.168.1.5   8.8.8.8         DNS       Standard query A example.com
10:15:22.425 8.8.8.8       192.168.1.5     DNS       Standard query response A 93.184.216.34
10:15:22.426 192.168.1.5   93.184.216.34   TCP       56788 → 80 [SYN]
```

How to interpret this packet capture:
1. A computer (192.168.1.5) asked a DNS server (8.8.8.8) for the IP address of example.com
2. The DNS server responded that example.com is at IP address 93.184.216.34
3. The computer then started to set up a TCP connection to that IP address on port 80 (web)

**Flow Data** provides summary information about network conversations without capturing the actual data content. It's like seeing that two people had a 5-minute phone call without hearing what they said.

Sample flow data:

| Start Time | Source IP | Destination IP | Protocol | Bytes | Packets |
|------------|-----------|----------------|----------|-------|---------|
| 14:20:15   | 192.168.1.5 | 8.8.8.8      | UDP      | 325   | 5      |
| 14:20:16   | 10.0.0.12  | 172.16.5.10   | TCP      | 1420  | 12     |

While packet capture provides more detail, it requires more storage and processing power. Flow data offers a lighter-weight alternative that still provides valuable information about network traffic patterns and can help identify **baseline metrics** - the normal operational patterns against which anomalies can be detected.
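
The relationship between the two approaches can be illustrated by summarizing packets into flow records. This is a simplified sketch: real flow exporters also key on port numbers and track timestamps, and the individual packet sizes below are invented so that the totals match the sample flow table. `Packet` and `summarize_flows` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    protocol: str
    size: int  # bytes


def summarize_flows(packets):
    """Group packets by (source, destination, protocol) and total them."""
    totals = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for p in packets:
        key = (p.src, p.dst, p.protocol)
        totals[key]["bytes"] += p.size
        totals[key]["packets"] += 1
    return dict(totals)


# Invented packet sizes chosen to add up to the totals in the flow table
packets = (
    [Packet("192.168.1.5", "8.8.8.8", "UDP", 65)] * 5
    + [Packet("10.0.0.12", "172.16.5.10", "TCP", 120)] * 10
    + [Packet("10.0.0.12", "172.16.5.10", "TCP", 110)] * 2
)

flows = summarize_flows(packets)
for (src, dst, proto), stats in flows.items():
    print(src, dst, proto, stats["bytes"], stats["packets"])
```

Seventeen packet records collapse into two flow records, which is why flow data needs far less storage than a full capture.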

In [None]:
# @title
%%html
<svg viewBox="0 0 800 500" xmlns="http://www.w3.org/2000/svg">
  <style>
    .title { font-family: Arial, sans-serif; font-size: 24px; font-weight: bold; text-anchor: middle; }
    .subtitle { font-family: Arial, sans-serif; font-size: 18px; font-weight: bold; text-anchor: middle; }
    .label { font-family: Arial, sans-serif; font-size: 14px; text-anchor: middle; }
    .content { font-family: 'Courier New', monospace; font-size: 12px; }
    .machine { font-family: Arial, sans-serif; font-size: 14px; text-anchor: middle; }
    .arrow { stroke: #333; stroke-width: 2; fill: none; marker-end: url(#arrowhead); }
    .packet { fill: #e1f5fe; stroke: #0288d1; stroke-width: 2; }
    .device { fill: #e8f5e9; stroke: #2e7d32; stroke-width: 2; }
    .tool { fill: #fff3e0; stroke: #e65100; stroke-width: 2; }
    .header { fill: #f9fbe7; }
    .protocol-dns { fill: #bbdefb; }
    .protocol-tcp { fill: #c8e6c9; }
    .protocol-http { fill: #ffccbc; }
  </style>

  <defs>
    <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#333" />
    </marker>
  </defs>

  <text x="400" y="30" class="title">Packet Capture Analysis</text>

  <!-- Network Diagram -->
  <rect x="100" y="70" width="600" height="160" rx="10" fill="#f5f5f5" stroke="#bdbdbd" stroke-width="2" />
  <text x="400" y="90" class="subtitle">Network Setup</text>

  <!-- Client Computer -->
  <rect x="150" y="120" width="120" height="60" rx="10" class="device" />
  <text x="210" y="155" class="machine">Client Computer</text>
  <text x="210" y="175" class="machine">192.168.1.5</text>

  <!-- DNS Server -->
  <rect x="400" y="120" width="120" height="60" rx="10" class="device" />
  <text x="460" y="155" class="machine">DNS Server</text>
  <text x="460" y="175" class="machine">8.8.8.8</text>

  <!-- Web Server -->
  <rect x="650" y="120" width="120" height="60" rx="10" class="device" />
  <text x="710" y="155" class="machine">Web Server</text>
  <text x="710" y="175" class="machine">93.184.216.34</text>

  <!-- Traffic Flows -->
  <path d="M 270 140 L 390 140" class="arrow" />
  <text x="330" y="130" class="label">DNS Query</text>

  <path d="M 400 160 L 280 160" class="arrow" transform="translate(10, 0)" />
  <text x="330" y="180" class="label">DNS Response</text>

  <path d="M 270 150 C 350 150, 550 100, 640 150" class="arrow" />
  <text x="450" y="110" class="label">Web Request</text>

  <!-- Packet Capture Display -->
  <rect x="100" y="250" width="600" height="230" rx="10" fill="#f5f5f5" stroke="#bdbdbd" stroke-width="2" />
  <text x="400" y="270" class="subtitle">Wireshark Packet Capture</text>

  <!-- Packet Table Header -->
  <rect x="120" y="290" width="560" height="25" class="header" stroke="#bdbdbd" />
  <text x="150" y="307" class="content">No.</text>
  <text x="200" y="307" class="content">Time</text>
  <text x="280" y="307" class="content">Source</text>
  <text x="380" y="307" class="content">Destination</text>
  <text x="480" y="307" class="content">Protocol</text>
  <text x="580" y="307" class="content">Info</text>

  <!-- DNS Query Packet -->
  <rect x="120" y="315" width="560" height="25" class="protocol-dns" stroke="#bdbdbd" />
  <text x="150" y="332" class="content">1</text>
  <text x="200" y="332" class="content">0.000</text>
  <text x="280" y="332" class="content">192.168.1.5</text>
  <text x="380" y="332" class="content">8.8.8.8</text>
  <text x="480" y="332" class="content">DNS</text>
  <text x="580" y="332" class="content">Query A example.com</text>

  <!-- DNS Response Packet -->
  <rect x="120" y="340" width="560" height="25" class="protocol-dns" stroke="#bdbdbd" />
  <text x="150" y="357" class="content">2</text>
  <text x="200" y="357" class="content">0.075</text>
  <text x="280" y="357" class="content">8.8.8.8</text>
  <text x="380" y="357" class="content">192.168.1.5</text>
  <text x="480" y="357" class="content">DNS</text>
  <text x="580" y="357" class="content">Response A 93.184.216.34</text>

  <!-- TCP SYN Packet -->
  <rect x="120" y="365" width="560" height="25" class="protocol-tcp" stroke="#bdbdbd" />
  <text x="150" y="382" class="content">3</text>
  <text x="200" y="382" class="content">0.076</text>
  <text x="280" y="382" class="content">192.168.1.5</text>
  <text x="380" y="382" class="content">93.184.216.34</text>
  <text x="480" y="382" class="content">TCP</text>
  <text x="580" y="382" class="content">56788→80 [SYN]</text>

  <!-- TCP SYN-ACK Packet -->
  <rect x="120" y="390" width="560" height="25" class="protocol-tcp" stroke="#bdbdbd" />
  <text x="150" y="407" class="content">4</text>
  <text x="200" y="407" class="content">0.152</text>
  <text x="280" y="407" class="content">93.184.216.34</text>
  <text x="380" y="407" class="content">192.168.1.5</text>
  <text x="480" y="407" class="content">TCP</text>
  <text x="580" y="407" class="content">80→56788 [SYN, ACK]</text>

  <!-- HTTP GET Packet -->
  <rect x="120" y="415" width="560" height="25" class="protocol-http" stroke="#bdbdbd" />
  <text x="150" y="432" class="content">5</text>
  <text x="200" y="432" class="content">0.153</text>
  <text x="280" y="432" class="content">192.168.1.5</text>
  <text x="380" y="432" class="content">93.184.216.34</text>
  <text x="480" y="432" class="content">HTTP</text>
  <text x="580" y="432" class="content">GET / HTTP/1.1</text>

  <!-- Packet Analyzer Tool -->
  <rect x="30" y="330" width="50" height="100" rx="5" class="tool" />
  <text x="55" y="385" class="label" transform="rotate(-90, 55, 385)">Wireshark</text>
</svg>

## Port Mirroring: Creating a Window into Your Network

**Port mirroring** is like setting up a security camera in a store - it creates a copy of all the network traffic passing through selected ports so you can analyze it without disrupting the actual traffic. A switch normally forwards unicast traffic only to the port where the destination device is connected, so without port mirroring a monitoring device plugged into another port would not see the traffic flowing between other devices.

Port mirroring can be configured in different ways depending on what you want to monitor:

* **One-to-one mirroring** - Copies traffic from a single port to the monitor port
* **Many-to-one mirroring** - Copies traffic from multiple ports to one monitor port
* **VLAN-based mirroring** - Copies all traffic in a VLAN to the monitor port

Here's how port mirroring works in a network:

| Original Path | With Port Mirroring | Purpose |
|---------------|---------------------|---------|
| Client → Switch → Server | Client → Switch → Server<br>↓<br>Monitor | Analyze client-server communication |
| Computer A → Switch → Computer B | Computer A → Switch → Computer B<br>↓<br>IDS/IPS System | Detect security threats |
| User → Switch → Internet | User → Switch → Internet<br>↓<br>Traffic Analyzer | Track bandwidth usage |

Example configuration for a Cisco switch:
```
Switch# configure terminal
Switch(config)# monitor session 1 source interface gigabitethernet1/0/1
Switch(config)# monitor session 1 destination interface gigabitethernet1/0/24
```

This configuration copies all traffic from port 1/0/1 to port 1/0/24, where a monitoring device would be connected.
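
The mirroring modes listed earlier differ mainly in what is named as the source. The sketch below generates configuration lines in the same style as the example for one-to-one, many-to-one, and VLAN-based mirroring. It assumes Cisco IOS-style `monitor session` syntax; exact commands vary by platform and software version, and `span_commands` is a hypothetical helper that only produces text.

```python
def span_commands(session, destination, source_interfaces=(), source_vlan=None):
    """Build monitor-session lines for interface-based or VLAN-based mirroring."""
    if bool(source_interfaces) == (source_vlan is not None):
        raise ValueError("give exactly one of source_interfaces or source_vlan")

    lines = []
    if source_vlan is not None:
        lines.append(f"monitor session {session} source vlan {source_vlan}")
    else:
        # One line per mirrored port; several lines give many-to-one mirroring
        for interface in source_interfaces:
            lines.append(f"monitor session {session} source interface {interface}")
    lines.append(f"monitor session {session} destination interface {destination}")
    return lines


one_to_one = span_commands(1, "gigabitethernet1/0/24",
                           source_interfaces=["gigabitethernet1/0/1"])
many_to_one = span_commands(1, "gigabitethernet1/0/24",
                            source_interfaces=["gigabitethernet1/0/1",
                                               "gigabitethernet1/0/2"])
vlan_based = span_commands(1, "gigabitethernet1/0/24", source_vlan=10)

for line in many_to_one:
    print(line)
```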

## Log Aggregation: Centralizing Network Information

Network devices generate large volumes of log data that contain valuable information about network status, security events, and performance issues. **Log aggregation** refers to collecting log messages from multiple network devices and storing them in one central location for analysis and alerting. This makes it easier to spot patterns and troubleshoot problems; without it, administrators would need to log into each device separately to view its logs.

There are two main components used in log aggregation systems:

* **Syslog collector** - A central server that receives logs from network devices in a standardized format
* **Security Information and Event Management (SIEM)** - An advanced system that correlates events across devices, provides real-time analysis, and alerts on security incidents

Here's an example of aggregated log messages from multiple devices:

| Timestamp | Device | Severity | Message |
|-----------|--------|----------|---------|
| 14:05:23 | Router1 | Warning | Interface GigabitEthernet0/1 down |
| 14:05:24 | Switch3 | Info | User admin logged in |
| 14:05:30 | Firewall | Critical | Multiple failed login attempts detected |
| 14:05:45 | Router1 | Info | Interface GigabitEthernet0/1 up |

Notice how logs from different devices appear in chronological order, making it easy to see that Router1's interface went down and then came back up, while during that same period there was a login to Switch3 and a security alert from the firewall.

Benefits of centralized logging:
* Faster troubleshooting of network issues
* Comprehensive security monitoring
* Historical data for trend analysis
* Compliance with regulatory requirements
* **Anomaly alerting/notification** when unusual patterns are detected

By aggregating logs from all network devices into a single system, administrators gain a holistic view of network operations and can more easily identify patterns and issues that might otherwise go unnoticed.
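
As a rough illustration of what a collector does, the sketch below merges per-device logs into a single chronological timeline and then filters it by severity. It reuses the entries from the table above. The tuple layout, the three-level severity ranking, and the function names are assumptions made for this example; real syslog uses eight severity levels.

```python
import heapq

# Each entry: (timestamp, device, severity, message); each list is time-ordered
router1 = [
    ("14:05:23", "Router1", "Warning", "Interface GigabitEthernet0/1 down"),
    ("14:05:45", "Router1", "Info", "Interface GigabitEthernet0/1 up"),
]
switch3 = [("14:05:24", "Switch3", "Info", "User admin logged in")]
firewall = [("14:05:30", "Firewall", "Critical",
             "Multiple failed login attempts detected")]

SEVERITY_RANK = {"Info": 0, "Warning": 1, "Critical": 2}


def aggregate(*device_logs):
    """Merge already-sorted per-device logs into one timeline."""
    return list(heapq.merge(*device_logs, key=lambda entry: entry[0]))


def at_least(entries, minimum):
    """Keep only entries at or above the given severity."""
    threshold = SEVERITY_RANK[minimum]
    return [e for e in entries if SEVERITY_RANK[e[2]] >= threshold]


timeline = aggregate(router1, switch3, firewall)
for timestamp, device, severity, message in timeline:
    print(timestamp, device, severity, message)

important = at_least(timeline, "Warning")
```

Comparing timestamps as text works here only because every entry uses the same `HH:MM:SS` format on the same day; a production system would parse full timestamps with dates and time zones.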

In [None]:
# @title
mm("""
flowchart LR
    A[Router] -- "Syslog Messages" --> E
    B[Switch] -- "Syslog Messages" --> E
    C[Firewall] -- "Syslog Messages" --> E
    D[Server] -- "Syslog Messages" --> E

    E[Syslog Collector] --> F[SIEM System]

    F --> G[Alert System]
    F --> H[Analysis Dashboard]
    F --> I[Long-term Storage]

    classDef devices fill:#d4f1f9,stroke:#05b2dc,stroke-width:2px
    classDef logSystem fill:#ffe6cc,stroke:#ff9900,stroke-width:2px
    classDef outputs fill:#e6ffe6,stroke:#00cc00,stroke-width:2px

    class A,B,C,D devices
    class E,F logSystem
    class G,H,I outputs""")

## Monitoring Solutions: Traffic, Performance, and Availability

Network monitoring solutions combine various techniques to provide comprehensive visibility into network health and performance. They focus on three essential types of monitoring, each providing different insights, much like medical tests that check different aspects of a patient's health.

**Traffic Analysis** examines what data is flowing through your network. This is like monitoring the vehicles on a highway - you can see how many cars, trucks, and motorcycles are using each road.

Sample traffic analysis report:

| Application | Bandwidth Used | % of Total | Change from Last Week |
|-------------|---------------|------------|------------------------|
| Web Browsing | 350 Mbps | 35% | +5% |
| Video Streaming | 250 Mbps | 25% | +10% |
| File Transfers | 200 Mbps | 20% | -3% |
| Email | 100 Mbps | 10% | No change |
| Other | 100 Mbps | 10% | -2% |

**Performance Monitoring** measures how well the network is functioning. This helps identify problems before users complain about slow connections.

Common performance metrics and their interpretations:

| Metric | Good | Warning | Critical | What It Means |
|--------|------|---------|----------|---------------|
| Latency | <50ms | 50-100ms | >100ms | Time for data to travel (like postal delivery time) |
| Packet Loss | <0.1% | 0.1-1% | >1% | Percentage of data lost in transit (like missing mail) |
| Bandwidth Usage | <70% | 70-90% | >90% | How full your network "pipe" is |

**Availability Monitoring** tracks whether network devices and services are functioning. This is the most basic monitoring - is the device on and responding?
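
The Good/Warning/Critical bands from the performance metrics table can be expressed as a small lookup. This is only a sketch: the threshold values are copied from that table, while the metric keys and the `classify` function are hypothetical names, and real thresholds should be tuned to each network.

```python
# (warning starts at, critical above), taken from the performance metrics table
THRESHOLDS = {
    "latency_ms": (50, 100),
    "packet_loss_pct": (0.1, 1),
    "bandwidth_pct": (70, 90),
}


def classify(metric, value):
    """Return "Good", "Warning", or "Critical" for a measured value."""
    warning_at, critical_above = THRESHOLDS[metric]
    if value > critical_above:
        return "Critical"
    if value >= warning_at:
        return "Warning"
    return "Good"


readings = {"latency_ms": 27, "packet_loss_pct": 0.05, "bandwidth_pct": 78}
statuses = {metric: classify(metric, value) for metric, value in readings.items()}
print(statuses)
```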

**Baseline metrics** are the normal measurements for your specific network, established for each of these monitoring areas. Once you know what's normal, you can identify what's abnormal: automated alerts can be configured to notify administrators when measurements deviate significantly from normal patterns.
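
One simple way to turn a baseline into an alert rule is to flag values that fall far from the historical average. The sketch below uses the mean and standard deviation of past samples. The latency figures are invented, and the three-standard-deviation tolerance is a common rule of thumb rather than a setting from any particular product.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (average, standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, tolerance=3.0):
    """True if value is more than `tolerance` standard deviations from average."""
    average, spread = baseline
    return abs(value - average) > tolerance * spread


# Hypothetical latency samples (ms) collected during normal operation
history = [24, 27, 25, 30, 26, 28, 25, 27]
baseline = build_baseline(history)

for reading in (27, 95):
    label = "ALERT" if is_anomalous(reading, baseline) else "normal"
    print(f"{reading} ms: {label}")
```

A reading of 27 ms sits close to the average of these samples, while 95 ms is far outside the usual spread and would trigger an alert.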

In [None]:
# @title
%%html
<svg viewBox="0 0 800 500" xmlns="http://www.w3.org/2000/svg">
  <style>
    .title { font-family: Arial, sans-serif; font-size: 24px; font-weight: bold; text-anchor: middle; }
    .subtitle { font-family: Arial, sans-serif; font-size: 18px; font-weight: bold; text-anchor: middle; }
    .label { font-family: Arial, sans-serif; font-size: 14px; }
    .value { font-family: Arial, sans-serif; font-size: 16px; font-weight: bold; }
    .small-text { font-family: Arial, sans-serif; font-size: 10px; }
    .chart-label { font-family: Arial, sans-serif; font-size: 12px; text-anchor: middle; }
    .axis { stroke: #333; stroke-width: 1; }
    .grid { stroke: #ccc; stroke-width: 1; stroke-dasharray: 5, 5; }
    .data-good { stroke: #4caf50; stroke-width: 2; fill: none; }
    .data-warning { stroke: #ff9800; stroke-width: 2; fill: none; }
    .data-critical { stroke: #f44336; stroke-width: 2; fill: none; }
    .panel { fill: white; stroke: #bdbdbd; stroke-width: 1; }
    .good { fill: #c8e6c9; stroke: #4caf50; stroke-width: 1; }
    .warning { fill: #ffe0b2; stroke: #ff9800; stroke-width: 1; }
    .critical { fill: #ffcdd2; stroke: #f44336; stroke-width: 1; }
    .meter-bg { fill: #eeeeee; stroke: #bdbdbd; stroke-width: 1; }
    .meter-fill-good { fill: #4caf50; }
    .meter-fill-warning { fill: #ff9800; }
    .meter-fill-critical { fill: #f44336; }
  </style>

  <text x="400" y="30" class="title">Network Performance Dashboard</text>

  <!-- Latency Panel -->
  <rect x="50" y="60" width="220" height="200" rx="5" class="panel" />
  <text x="160" y="85" class="subtitle">Latency</text>

  <!-- Latency Chart -->
  <line x1="70" y1="180" x2="250" y2="180" class="axis" />
  <line x1="70" y1="180" x2="70" y2="100" class="axis" />

  <line x1="70" y1="160" x2="250" y2="160" class="grid" />
  <line x1="70" y1="140" x2="250" y2="140" class="grid" />
  <line x1="70" y1="120" x2="250" y2="120" class="grid" />
  <line x1="70" y1="100" x2="250" y2="100" class="grid" />

  <text x="65" y="160" class="small-text" text-anchor="end">25ms</text>
  <text x="65" y="140" class="small-text" text-anchor="end">50ms</text>
  <text x="65" y="120" class="small-text" text-anchor="end">75ms</text>
  <text x="65" y="100" class="small-text" text-anchor="end">100ms</text>

  <polyline points="70,130 90,125 110,135 130,120 150,125 170,115 190,145 210,125 230,120 250,110" class="data-good" />

  <text x="70" y="195" class="small-text">9AM</text>
  <text x="160" y="195" class="small-text">12PM</text>
  <text x="250" y="195" class="small-text">3PM</text>

  <!-- Latency Current Value -->
  <rect x="100" y="210" width="120" height="30" rx="15" class="good" />
  <text x="160" y="230" class="value" text-anchor="middle">27ms</text>

  <!-- Packet Loss Panel -->
  <rect x="290" y="60" width="220" height="200" rx="5" class="panel" />
  <text x="400" y="85" class="subtitle">Packet Loss</text>

  <!-- Packet Loss Meter -->
  <rect x="340" y="110" width="120" height="20" rx="10" class="meter-bg" />
  <rect x="340" y="110" width="6" height="20" rx="10" class="meter-fill-good" />

  <text x="400" y="150" class="value" text-anchor="middle">0.05%</text>

  <rect x="320" y="180" width="50" height="20" rx="5" class="good" />
  <text x="345" y="194" class="small-text" text-anchor="middle">&lt; 0.1%</text>

  <rect x="380" y="180" width="50" height="20" rx="5" class="warning" />
  <text x="405" y="194" class="small-text" text-anchor="middle">0.1-1%</text>

  <rect x="440" y="180" width="50" height="20" rx="5" class="critical" />
  <text x="465" y="194" class="small-text" text-anchor="middle">> 1%</text>

  <!-- Bandwidth Panel -->
  <rect x="530" y="60" width="220" height="200" rx="5" class="panel" />
  <text x="640" y="85" class="subtitle">Bandwidth Usage</text>

  <!-- Bandwidth Chart -->
  <line x1="550" y1="180" x2="730" y2="180" class="axis" />
  <line x1="550" y1="180" x2="550" y2="100" class="axis" />

  <line x1="550" y1="160" x2="730" y2="160" class="grid" />
  <line x1="550" y1="140" x2="730" y2="140" class="grid" />
  <line x1="550" y1="120" x2="730" y2="120" class="grid" />
  <line x1="550" y1="100" x2="730" y2="100" class="grid" />

  <text x="545" y="160" class="small-text" text-anchor="end">25%</text>
  <text x="545" y="140" class="small-text" text-anchor="end">50%</text>
  <text x="545" y="120" class="small-text" text-anchor="end">75%</text>
  <text x="545" y="100" class="small-text" text-anchor="end">100%</text>

  <polyline points="550,160 570,150 590,155 610,140 630,130 650,145 670,125 690,115 710,90 730,105" class="data-warning" />

  <text x="550" y="195" class="small-text">9AM</text>
  <text x="640" y="195" class="small-text">12PM</text>
  <text x="730" y="195" class="small-text">3PM</text>

  <!-- Bandwidth Current Value -->
  <rect x="580" y="210" width="120" height="30" rx="15" class="warning" />
  <text x="640" y="230" class="value" text-anchor="middle">78%</text>

  <!-- Device Status Panel -->
  <rect x="50" y="280" width="700" height="200" rx="5" class="panel" />
  <text x="400" y="305" class="subtitle">Device Status</text>

  <!-- Device Status Table -->
  <rect x="70" y="320" width="660" height="30" fill="#f5f5f5" stroke="#bdbdbd" />
  <text x="100" y="340" class="label">Device</text>
  <text x="280" y="340" class="label">IP Address</text>
  <text x="430" y="340" class="label">Response Time</text>
  <text x="560" y="340" class="label">Packet Loss</text>
  <text x="680" y="340" class="label">Status</text>

  <!-- Router Row -->
  <rect x="70" y="350" width="660" height="30" fill="white" stroke="#bdbdbd" />
  <text x="100" y="370" class="label">Main Router</text>
  <text x="280" y="370" class="label">192.168.1.1</text>
  <text x="430" y="370" class="label">5ms</text>
  <text x="560" y="370" class="label">0%</text>
  <rect x="660" y="355" width="40" height="20" rx="10" class="good" />
  <text x="680" y="369" class="small-text" text-anchor="middle">UP</text>

  <!-- Switch Row -->
  <rect x="70" y="380" width="660" height="30" fill="white" stroke="#bdbdbd" />
  <text x="100" y="400" class="label">Core Switch</text>
  <text x="280" y="400" class="label">192.168.1.2</text>
  <text x="430" y="400" class="label">3ms</text>
  <text x="560" y="400" class="label">0%</text>
  <rect x="660" y="385" width="40" height="20" rx="10" class="good" />
  <text x="680" y="399" class="small-text" text-anchor="middle">UP</text>

  <!-- Server Row -->
  <rect x="70" y="410" width="660" height="30" fill="white" stroke="#bdbdbd" />
  <text x="100" y="430" class="label">Web Server</text>
  <text x="280" y="430" class="label">192.168.1.10</text>
  <text x="430" y="430" class="label">12ms</text>
  <text x="560" y="430" class="label">0%</text>
  <rect x="660" y="415" width="40" height="20" rx="10" class="good" />
  <text x="680" y="429" class="small-text" text-anchor="middle">UP</text>

  <!-- Firewall Row -->
  <rect x="70" y="440" width="660" height="30" fill="white" stroke="#bdbdbd" />
  <text x="100" y="460" class="label">Firewall</text>
  <text x="280" y="460" class="label">192.168.1.254</text>
  <text x="430" y="460" class="label">35ms</text>
  <text x="560" y="460" class="label">0.2%</text>
  <rect x="660" y="445" width="40" height="20" rx="10" class="warning" />
  <text x="680" y="459" class="small-text" text-anchor="middle">WARN</text>
</svg>

## Configuration Monitoring and API Integration

Modern networks require not only monitoring of traffic and performance but also tracking changes to device configurations and integrating with other IT systems.

**Configuration Monitoring** tracks and manages changes to network device settings. Instead of manually checking each device's settings, configuration monitoring tools automatically detect and report any changes.

Configuration monitoring features:
* Backup of router and switch configurations
* Change detection and alerting
* Compliance checking
* Historical configuration tracking

**Application Programming Interface (API)** is a way for different software programs to communicate with each other. Think of an API as a waiter in a restaurant - the waiter takes your order (request), brings it to the kitchen (system), and returns with your food (response).

In networking, APIs allow monitoring tools to:
* Request information from network devices
* Send commands to devices
* Share data with other IT systems
* Automate repetitive tasks

Example of a simple API request and response:

| API Request | What It Means |
|-------------|---------------|
| GET https://networkmonitor.com/api/devices/router1/status | "Tell me the status of router1" |
| Response: {"status": "online", "uptime": "15 days, 7 hours", "cpu": "12%"} | "Router1 is online with 15 days uptime and 12% CPU usage" |

This simple example shows how a monitoring dashboard might request information about a router using an API call. The router's management system responds with the requested data in a format that's easy for computers to process.
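
On the receiving side, a monitoring tool typically parses the JSON response into values it can act on. The sketch below works on the response body from the table above as a plain string, so it makes no network call. The `summarize_status` function and the output field names are assumptions for illustration.

```python
import json

# Response body copied from the example above
raw_response = '{"status": "online", "uptime": "15 days, 7 hours", "cpu": "12%"}'


def summarize_status(device, body):
    """Parse a status response and convert fields to convenient types."""
    data = json.loads(body)
    return {
        "device": device,
        "online": data["status"] == "online",
        "uptime": data["uptime"],
        "cpu_percent": int(data["cpu"].rstrip("%")),
    }


summary = summarize_status("router1", raw_response)
print(summary)
```

Converting `"12%"` into the number 12 makes it possible to compare CPU usage against alert thresholds instead of treating it as text.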

By monitoring configurations and leveraging APIs, organizations can maintain better control over network changes and integrate network monitoring into broader IT management processes. This approach helps prevent misconfigurations, which are a common cause of network outages and security incidents.
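
Change detection often comes down to comparing a saved backup with the configuration currently running on the device. The following sketch uses Python's standard `difflib` module to list the lines that differ. The configuration text is a made-up fragment, and `config_changes` is a hypothetical helper rather than a feature of any specific tool.

```python
import difflib


def config_changes(old, new):
    """Return the removed ("-") and added ("+") lines between two configs."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="backup", tofile="running", lineterm="",
    )
    body = list(diff)[2:]  # skip the two file-header lines
    return [line for line in body if line.startswith(("+", "-"))]


# Made-up configuration fragments for illustration
backup_config = """hostname Router1
interface GigabitEthernet0/1
 description Uplink
 no shutdown
"""
running_config = """hostname Router1
interface GigabitEthernet0/1
 description Uplink
 shutdown
"""

changes = config_changes(backup_config, running_config)
for line in changes:
    print(line)
```

An empty result means the running configuration still matches the backup; any other result can be sent to administrators as a change alert.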

In [None]:
# @title
%%html
<svg viewBox="0 0 800 500" xmlns="http://www.w3.org/2000/svg">
  <style>
    .title { font-family: Arial, sans-serif; font-size: 24px; font-weight: bold; text-anchor: middle; }
    .layer-title { font-family: Arial, sans-serif; font-size: 18px; font-weight: bold; fill: white; text-anchor: middle; }
    .item-text { font-family: Arial, sans-serif; font-size: 14px; text-anchor: middle; }
    .arrow { stroke: #333; stroke-width: 2; fill: none; marker-end: url(#arrowhead); }
  </style>

  <defs>
    <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#333" />
    </marker>
  </defs>

  <text x="400" y="40" class="title">Layers of Network Monitoring</text>

  <!-- Layer 1: Discovery -->
  <rect x="100" y="80" width="600" height="80" rx="10" fill="#4285f4" />
  <text x="400" y="120" class="layer-title">DISCOVERY LAYER</text>
  <rect x="150" y="140" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="210" y="160" class="item-text">Ad Hoc Discovery</text>
  <rect x="340" y="140" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="400" y="160" class="item-text">Scheduled Discovery</text>
  <rect x="530" y="140" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="590" y="160" class="item-text">Network Mapping</text>

  <!-- Layer 2: Data Collection -->
  <rect x="100" y="180" width="600" height="80" rx="10" fill="#ea4335" />
  <text x="400" y="220" class="layer-title">DATA COLLECTION LAYER</text>
  <rect x="120" y="240" width="80" height="30" rx="5" fill="white" stroke="#333" />
  <text x="160" y="260" class="item-text">SNMP</text>
  <rect x="230" y="240" width="80" height="30" rx="5" fill="white" stroke="#333" />
  <text x="270" y="260" class="item-text">Traps</text>
  <rect x="340" y="240" width="80" height="30" rx="5" fill="white" stroke="#333" />
  <text x="380" y="260" class="item-text">Flow Data</text>
  <rect x="450" y="240" width="110" height="30" rx="5" fill="white" stroke="#333" />
  <text x="505" y="260" class="item-text">Packet Capture</text>
  <rect x="590" y="240" width="90" height="30" rx="5" fill="white" stroke="#333" />
  <text x="635" y="260" class="item-text">Syslog</text>

  <!-- Layer 3: Analysis -->
  <rect x="100" y="280" width="600" height="80" rx="10" fill="#fbbc05" />
  <text x="400" y="320" class="layer-title">ANALYSIS LAYER</text>
  <rect x="130" y="340" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="190" y="360" class="item-text">Traffic Analysis</text>
  <rect x="270" y="340" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="330" y="360" class="item-text">Performance</text>
  <rect x="410" y="340" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="470" y="360" class="item-text">Availability</text>
  <rect x="550" y="340" width="120" height="30" rx="5" fill="white" stroke="#333" />
  <text x="610" y="360" class="item-text">Configuration</text>

  <!-- Layer 4: Reporting -->
  <rect x="100" y="380" width="600" height="80" rx="10" fill="#34a853" />
  <text x="400" y="420" class="layer-title">REPORTING LAYER</text>
  <rect x="150" y="440" width="110" height="30" rx="5" fill="white" stroke="#333" />
  <text x="205" y="460" class="item-text">Dashboards</text>
  <rect x="290" y="440" width="110" height="30" rx="5" fill="white" stroke="#333" />
  <text x="345" y="460" class="item-text">Alerts</text>
  <rect x="430" y="440" width="110" height="30" rx="5" fill="white" stroke="#333" />
  <text x="485" y="460" class="item-text">Reports</text>
  <rect x="570" y="440" width="110" height="30" rx="5" fill="white" stroke="#333" />
  <text x="625" y="460" class="item-text">API Integration</text>

  <!-- Arrows between layers -->
  <path d="M 400 160 L 400 180" class="arrow" />
  <path d="M 400 260 L 400 280" class="arrow" />
  <path d="M 400 360 L 400 380" class="arrow" />
</svg>

## Conclusion: Building Effective Network Monitoring Systems

Effective network monitoring combines multiple technologies and approaches, much like how home security uses door sensors, motion detectors, and cameras together for complete protection.

Key takeaways from this chapter:

* Start with thorough network discovery to identify all assets
* Implement SNMP for standardized device monitoring
* Use packet capture and flow data for detailed traffic analysis
* Set up port mirroring to access network traffic for analysis
* Centralize logs with syslog and SIEM solutions
* Monitor traffic, performance, and availability metrics
* Track configuration changes to prevent problems
* Leverage APIs to integrate with other IT systems

Example benefits of comprehensive network monitoring (the figures illustrate the kinds of results organizations aim for):

| Business Need | Monitoring Solution | Result |
|---------------|---------------------|--------|
| Reduce downtime | Availability monitoring | 99.9% network uptime achieved |
| Improve security | Log aggregation with SIEM | Security incidents detected 75% faster |
| Optimize performance | Traffic analysis | User complaints reduced by 60% |
| Meet compliance requirements | Configuration monitoring | Passed security audit with no findings |
| Better troubleshooting | Packet capture | Average resolution time decreased by 45% |
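
An uptime percentage such as the 99.9% in the table is easier to judge once it is converted into allowed downtime. The short sketch below does that arithmetic; the function name `allowed_downtime_hours` is hypothetical, and it assumes a 365-day year.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Return the hours of downtime permitted by an uptime target over a period."""
    return period_hours * (1 - uptime_percent / 100)

# Compare a few common uptime targets over one year (8,760 hours).
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours of downtime per year")
```

At 99.9%, the network can be down for roughly 8.76 hours per year, which is why each additional "nine" is a much stricter goal.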

Remember that network monitoring is not a "set it and forget it" activity but an ongoing process that requires regular attention and refinement as your network evolves and grows.

## Review With Quizlet

In [7]:
%%html
<iframe src="https://quizlet.com/1042947020/learn/embed?i=psvlh&x=1jj1" height="700" width="100%" style="border:0"></iframe>

## Network Monitoring Glossary

| Term | Definition |
|------|------------|
| SNMP | Simple Network Management Protocol - a standard used for collecting information from and managing network devices. |
| MIB | Management Information Base - a hierarchical database that defines what information is available from a network device through SNMP. |
| OID | Object Identifier - a unique address for a specific piece of information within a MIB, such as system uptime or interface status. |
| Community String | A text string that acts like a password in SNMPv1 and SNMPv2c, allowing access to a device's SNMP information. |
| SNMP Trap | A notification sent from a network device to a management system when specific events occur, such as interface status changes. |
| Packet Capture | The process of recording and analyzing the complete contents of data packets transmitted across a network. |
| Flow Data | Summary information about network conversations, showing source/destination addresses, protocols, and data volumes without capturing content. |
| Port Mirroring | A technique that copies network traffic from one or more switch ports to a monitoring port for analysis. |
| Baseline Metrics | Normal operational measurements for a specific network, used to identify abnormal behavior or performance issues. |
| Anomaly Detection | The identification of unusual patterns that do not conform to expected network behavior. |
| Syslog | A standard protocol used by network devices to send event messages to a logging server. |
| SIEM | Security Information and Event Management - a system that provides real-time analysis of security alerts generated by network hardware and applications. |
| Traffic Analysis | The process of examining data flow through a network to identify patterns, bottlenecks, and security issues. |
| Latency | The time delay between the transmission and receipt of data, typically measured in milliseconds (ms). |
| Packet Loss | The failure of transmitted packets to reach their destination, often expressed as a percentage of total packets sent. |
| Jitter | Variation in packet delay, which can affect real-time applications like VoIP and video conferencing. |
| Ad Hoc Discovery | Network device identification performed on demand, typically for troubleshooting or after network changes. |
| Scheduled Discovery | Automated, regular scanning of a network to identify devices and track changes over time. |
| Availability Monitoring | Tracking whether network devices and services are operational and responding to requests. |
| Performance Monitoring | Measuring metrics like bandwidth utilization, response time, and throughput to assess network quality. |
| Configuration Monitoring | Tracking changes to network device settings to detect unauthorized modifications and ensure compliance. |
| API | Application Programming Interface - allows different software systems to communicate with each other programmatically. |
| Bandwidth Utilization | The percentage of a network link's capacity that is being used for data transmission. |
| Alert Threshold | A predefined value that, when exceeded, triggers a notification to network administrators. |
| Protocol Analyzer | A tool that decodes network protocols to help troubleshoot and optimize network communications. |
| Uptime | The period during which a device or service is operational and available, often expressed as a percentage. |
| QoS | Quality of Service - mechanisms to control resources and guarantee performance levels for specific network traffic types. |
| SPAN | Switched Port Analyzer - Cisco's term for port mirroring, creating a copy of network traffic for analysis. |
| TAP | Test Access Point - a hardware device that provides access to network traffic without disrupting the flow of data. |
| RMON | Remote Network Monitoring - a standard specification that extends SNMP with advanced network monitoring capabilities. |
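
Several of the glossary terms (latency, packet loss, jitter, and alert threshold) are simple calculations over probe results. The sketch below works through them on a small set of made-up round-trip times. The sample values, the function names, and the 5% threshold are all assumptions for illustration, and jitter is computed here as the average change between consecutive samples, which is only one of several definitions in use.

```python
from statistics import mean

# Hypothetical round-trip times in milliseconds; None marks a lost probe.
samples = [20.0, 22.0, None, 21.0, 25.0]

def packet_loss_percent(rtts):
    """Percentage of probes that never received a reply."""
    lost = sum(1 for r in rtts if r is None)
    return 100 * lost / len(rtts)

def average_latency_ms(rtts):
    """Mean round-trip time of the probes that were answered."""
    return mean(r for r in rtts if r is not None)

def jitter_ms(rtts):
    """Mean absolute difference between consecutive answered probes."""
    received = [r for r in rtts if r is not None]
    return mean(abs(b - a) for a, b in zip(received, received[1:]))

LOSS_ALERT_THRESHOLD = 5.0  # percent; an assumed alert threshold

loss = packet_loss_percent(samples)
print(f"latency: {average_latency_ms(samples):.1f} ms")
print(f"jitter: {jitter_ms(samples):.2f} ms")
print(f"packet loss: {loss:.1f}%")
if loss > LOSS_ALERT_THRESHOLD:
    print("ALERT: packet loss exceeds threshold")
```

With these samples, one of five probes is lost (20% loss), so the alert line is printed.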