
# Introduction to Network Monitoring
**Brendan Shea, PhD**

In today's interconnected world, networks are the backbone of modern business operations. From small offices to global enterprises, organizations rely on their networks to support critical applications, services, and communications. However, a network is only as good as its reliability and performance. This is where network monitoring comes into play.

**Network monitoring** is the systematic process of observing, analyzing, and maintaining computer networks to ensure optimal performance, reliability, and security. Think of it as a health monitoring system for your network – constantly checking vital signs and alerting you when something isn't quite right.

## Why Monitor Networks?

Network administrators face numerous challenges in maintaining healthy networks. Without proper monitoring, issues can go undetected until they cause significant problems. **Network outages** (periods when network services are unavailable) can result in lost productivity, decreased customer satisfaction, and financial losses. A frequently cited Gartner estimate puts the average cost of network downtime at $5,600 per minute.

Effective network monitoring helps organizations:
- Detect and resolve issues before they impact users
- Optimize network performance and resource utilization
- Maintain security by identifying suspicious activities
- Plan for future growth by understanding usage patterns
- Ensure compliance with service level agreements (SLAs)

## Core Components of Network Monitoring

To effectively monitor a network, several key components work together:

**Monitoring tools** are software applications or hardware devices that collect and analyze network data. These range from simple ping tests to sophisticated enterprise monitoring platforms.

**Network devices** such as routers, switches, and servers generate valuable data about their operation and performance. This data is collected through various protocols and mechanisms, which we'll explore throughout this chapter.

**Metrics** are specific measurements that indicate the health and performance of network components. Common metrics include:
- **Bandwidth utilization**: The amount of network capacity being used
- **Latency**: The time it takes for data to travel from source to destination
- **Packet loss**: The percentage of data packets that fail to reach their destination
- **Error rates**: The frequency of transmission errors on network interfaces
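To make these definitions concrete, here is a minimal Python sketch of how a monitoring tool might turn raw readings into metrics. It assumes utilization is computed from two samples of an interface's 32-bit octet counter (SNMP's `ifInOctets` is one such counter) taken a known interval apart, plus the link speed in bits per second. The function names and readings are hypothetical.

```python
COUNTER32_MAX = 2**32  # 32-bit interface counters wrap back to 0 at this value


def bandwidth_utilization(octets_start: int, octets_end: int,
                          interval_s: float, link_bps: int) -> float:
    """Percent of link capacity used between two octet-counter samples."""
    delta = octets_end - octets_start
    if delta < 0:  # the counter wrapped between samples (assumes at most one wrap)
        delta += COUNTER32_MAX
    return 100.0 * delta * 8 / (interval_s * link_bps)  # octets -> bits


def packet_loss(sent: int, received: int) -> float:
    """Percent of packets that never reached their destination."""
    return 100.0 * (sent - received) / sent


def error_rate(errors: int, packets: int) -> float:
    """Percent of packets on an interface that had transmission errors."""
    return 100.0 * errors / packets


# Hypothetical readings: a 100 Mbps link sampled 60 seconds apart
print(bandwidth_utilization(1_000_000, 376_000_000, 60, 100_000_000))  # 50.0
print(packet_loss(sent=1000, received=985))                           # 1.5
```

Latency is left out because it is measured directly (for example, as a ping round-trip time) rather than derived from counters.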

## Evolution of Network Monitoring

Network monitoring has evolved significantly from its early days. Traditional monitoring relied heavily on simple tools like ping and traceroute. Modern monitoring solutions now incorporate:

- **Automation**: Automatic discovery and monitoring of network devices
- **Machine learning**: Intelligent analysis of network patterns to predict issues
- **Cloud integration**: Monitoring of both on-premises and cloud resources
- **Real-time analytics**: Immediate insights into network behavior
- **Visualization**: Clear, graphical representations of network status

## The Importance of Proactive Monitoring

One of the most critical concepts in network monitoring is the shift from reactive to proactive monitoring. **Reactive monitoring** involves responding to problems after they occur, while **proactive monitoring** focuses on identifying and addressing potential issues before they impact users.

Consider a scenario where a network link is gradually becoming saturated. A reactive approach would only address this when users complain about slow performance. In contrast, proactive monitoring would identify the trend early, allowing network administrators to add capacity or redistribute traffic before users experience problems.

## Network Monitoring Skills

As we progress through this chapter, you'll learn essential skills for network monitoring, including:
- Understanding monitoring protocols and standards
- Configuring monitoring tools and alerts
- Analyzing network metrics and trends
- Troubleshooting network issues
- Documenting network behavior and changes

With these foundational concepts in mind, let's explore how these principles are applied in real-world scenarios through our case study at Wonka's Chocolate Factory, where maintaining a reliable network is as crucial as maintaining the perfect chocolate temperature.

# Network Monitoring at Wonka's: A Sweet Success Story

Welcome to Wonka's Chocolate Factory, where the magic of chocolate-making meets the challenges of modern network management. As the world's most innovative candy manufacturer, Wonka's relies heavily on its network infrastructure to maintain its competitive edge and ensure the safety of its unique manufacturing processes.

## The Factory Environment

Wonka's main production facility spans 500,000 square feet and operates 24/7, producing countless chocolate bars, everlasting gobstoppers, and other confectionery delights. The facility's network supports everything from automated manufacturing systems to temperature control sensors, inventory management, and office operations.

The network infrastructure at Wonka's has grown organically over the years, much like the whimsical additions to the factory itself. This growth has created unique challenges for the network team, led by Chief Network Engineer Charlie Bucket and his team of Oompa Loompa technicians.

## Critical Network Requirements

The factory's network must support several mission-critical operations:

* Manufacturing Systems
  * Automated production line controls
  * Real-time temperature and humidity monitoring
  * Quality control sensors
  * Inventory tracking systems
  * Recipe database access

* Business Operations
  * Employee workstations
  * Security systems
  * Environmental controls
  * Vendor management systems
  * Customer order processing

## Network Infrastructure Overview

The current network infrastructure includes:

| Component Type | Quantity | Primary Function |
|---------------|----------|------------------|
| Core Switches | 4 | High-speed backbone connectivity |
| Access Switches | 50 | End-device connectivity |
| Wireless APs | 100 | Mobile device support |
| IoT Sensors | 1000+ | Manufacturing process monitoring |
| Firewalls | 2 | Security and access control |

## The Monitoring Challenge

When Wonka's experienced a network outage that caused a batch of chocolate to overheat, resulting in a loss of $500,000 worth of product, management realized they needed a more robust network monitoring solution. The incident highlighted several critical issues:

The factory's network had grown more complex than their existing monitoring tools could effectively handle. Security requirements had evolved, particularly around protecting secret recipes and manufacturing processes. The increasing number of IoT devices and sensors required more sophisticated monitoring approaches.

## Project Goals

Charlie Bucket's team was tasked with implementing a comprehensive network monitoring solution that would address these challenges. Throughout this chapter, we'll follow their journey as they implement various monitoring technologies and best practices. Their experience will serve as a practical example of how organizations can:

1. Design and implement effective monitoring strategies
2. Select and configure appropriate monitoring tools
3. Establish baselines and thresholds
4. Develop response procedures for various scenarios
5. Balance security requirements with operational needs

As we explore each monitoring technology and concept, we'll see how Wonka's implemented it in their environment and the specific challenges they faced. Their successes (and occasional failures) will provide valuable insights into real-world network monitoring practices.

By studying Wonka's approach to network monitoring, you'll learn how to apply these concepts in your own environment, regardless of whether you're monitoring chocolate production or other critical business operations. Remember, as Willy Wonka himself often says, "A little nonsense now and then is relished by the wisest men" – but not when it comes to network monitoring!

# Simple Network Management Protocol (SNMP) and Traps

**Simple Network Management Protocol (SNMP)** is a standardized protocol designed to collect and organize information about managed devices on IP networks. Think of SNMP as a universal language that allows network devices to share information about their status, performance, and configuration with monitoring systems.

## Basic SNMP Operation

In its most basic form, SNMP operates using a manager-agent model. The **SNMP manager** is typically a centralized monitoring system that collects and processes information. **SNMP agents** are software components that run on network devices (like routers, switches, or servers) and provide information to the manager.

Communication between managers and agents happens in two primary ways:
1. Polling: The manager regularly asks agents for specific information
2. Traps: Agents proactively send alerts to the manager when something significant occurs
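The practical difference between the two is detection delay. With polling, an event that happens between polls goes unnoticed until the next poll. The back-of-the-envelope sketch below assumes polls at fixed times (0, one interval, two intervals, and so on) and ignores network and processing delays; the numbers are hypothetical.

```python
import math


def polling_delay(event_time_s: float, interval_s: float) -> float:
    """Seconds an event goes unnoticed when the manager polls at 0, interval, 2*interval, ..."""
    next_poll = math.ceil(event_time_s / interval_s) * interval_s
    return next_poll - event_time_s


# Hypothetical: an interface fails 61 s into a 5-minute (300 s) polling cycle
print(polling_delay(61, 300))  # 239
```

In the worst case the delay approaches one full polling interval, whereas a trap is sent as soon as the agent notices the event. Because traps travel over UDP and are not acknowledged (SNMP informs, introduced with SNMPv2, are), many deployments use both: traps for speed and polling as a safety net.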

## Understanding SNMP Traps

**SNMP traps** are unsolicited messages sent from SNMP agents to managers to notify them about significant events. Think of traps as urgent messages that say "Hey, something important just happened!" Unlike regular SNMP polling where the manager asks for information, traps are initiated by the agents themselves.

Common scenarios that might trigger SNMP traps include:
- A network interface going down
- Temperature exceeding a threshold
- CPU utilization reaching critical levels
- Authentication failures
- Power supply failures

### How Traps Work

When a monitored condition occurs, the following sequence takes place:

1. The agent detects a predefined condition or event
2. The agent generates a trap message containing:
   - The type of event
   - The time it occurred
   - Other relevant details about the event
3. The agent sends the trap to one or more configured trap receivers (managers)
4. The manager processes the trap and takes appropriate action (like generating alerts)
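The four steps above can be modeled in a few lines of Python. This is a simulation, not real SNMP: the `Trap` record, the `Manager` class, and the 31.5°C threshold (borrowed from the Wonka example that follows) are assumptions. A real agent would encode the trap as an SNMP PDU and send it over UDP, to port 162 by default.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Trap:
    event_type: str                              # step 2: the type of event
    timestamp: datetime                          # step 2: when it occurred
    details: dict = field(default_factory=dict)  # step 2: other relevant details


class Manager:
    """A trap receiver that turns incoming traps into alerts (step 4)."""

    def __init__(self, name: str):
        self.name = name
        self.alerts: list[str] = []

    def receive(self, trap: Trap) -> None:
        self.alerts.append(f"[{self.name}] {trap.event_type}: {trap.details}")


def agent_check(temperature_c: float, threshold_c: float,
                receivers: list[Manager]) -> Optional[Trap]:
    # Step 1: detect the predefined condition
    if temperature_c <= threshold_c:
        return None
    # Step 2: build the trap message
    trap = Trap("temperatureHigh", datetime.now(timezone.utc),
                {"current_c": temperature_c, "threshold_c": threshold_c})
    # Step 3: send it to every configured trap receiver
    for manager in receivers:
        manager.receive(trap)
    return trap


nms = Manager("nms-192.168.1.10")
agent_check(30.9, 31.5, [nms])  # within limits: no trap
agent_check(33.5, 31.5, [nms])  # over threshold: trap sent
print(nms.alerts)
```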

## Case Study Example: Wonka's SNMP Implementation

At Wonka's Chocolate Factory, Charlie's team configured SNMP traps on their chocolate tempering machines. If a machine's temperature rises above its configured threshold (31.5°C in the configuration below), an SNMP trap is generated. This allows the monitoring system to alert technicians immediately, before the chocolate quality is affected.

**Basic SNMP trap configuration on a Wonka tempering machine:**
```
snmp-server enable traps
snmp-server host 192.168.1.10 version 2c PUBLIC
snmp-server trap-source GigabitEthernet0/1
temperature-monitor threshold 31.5 trap
```

### Understanding the Configuration Commands

- `snmp-server enable traps`: Activates SNMP trap functionality on the device
- `snmp-server host 192.168.1.10 version 2c PUBLIC`: Specifies the trap receiver's IP address (192.168.1.10), SNMP version (2c), and community string (PUBLIC)
- `snmp-server trap-source GigabitEthernet0/1`: Defines which interface will be used as the source for sending traps
- `temperature-monitor threshold 31.5 trap`: Sets the temperature threshold to 31.5°C and enables trap generation when exceeded

## Sample SNMP Trap Messages

| Timestamp | Severity | Source IP | Trap Type | Message |
|-----------|----------|-----------|------------|---------|
| 2025-01-17 08:23:15 | Critical | 192.168.1.100 | Temperature | Tempering machine #3 temperature exceeds threshold: Current: 33.5°C, Threshold: 31.5°C |
| 2025-01-17 08:45:22 | Warning | 192.168.1.101 | Link Down | Interface GigabitEthernet0/1 changed state to down |
| 2025-01-17 09:12:03 | Info | 192.168.1.102 | Authentication | New SNMP manager connection from 192.168.1.50 |
| 2025-01-17 09:30:45 | Critical | 192.168.1.103 | Power Supply | Power supply unit 2 failure detected |
| 2025-01-17 10:15:33 | Warning | 192.168.1.104 | CPU Utilization | CPU usage exceeded 85% threshold: Current: 92% |

## The Importance of Trap Management

While SNMP traps are valuable for real-time monitoring, they need to be carefully managed to be effective:

- **Trap Filtering**: Not all traps are equally important. Organizations need to determine which traps require immediate attention and which can be logged for later analysis.
- **Trap Storm Prevention**: During major events, devices might generate many traps simultaneously. Monitoring systems need mechanisms, such as rate limiting and de-duplication, to handle these "trap storms" without overwhelming operators.
- **Trap Validation**: Some traps might be false positives. Monitoring systems often validate traps against other data sources before triggering alerts.
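The three practices above can be combined in a small trap-processing pipeline. In this hypothetical sketch, traps below a minimum severity are only logged (filtering), repeats from the same source with the same trap type inside a time window are suppressed (storm prevention), and a trap is escalated only if an optional `confirm` check, such as re-polling the device, agrees (validation). The severity names mirror the sample table above; the 60-second window and the `confirm` hook are assumptions.

```python
SEVERITY_RANK = {"Info": 0, "Warning": 1, "Critical": 2}


class TrapProcessor:
    def __init__(self, min_severity="Warning", window_s=60.0, confirm=None):
        self.min_rank = SEVERITY_RANK[min_severity]
        self.window_s = window_s
        # Optional validation hook: confirm(trap) -> bool
        self.confirm = confirm or (lambda trap: True)
        self.last_alert = {}  # (source, trap type) -> time of the last alert
        self.alerts, self.logged = [], []

    def process(self, trap: dict, now: float) -> str:
        # Filtering: low-severity traps are kept for later analysis only
        if SEVERITY_RANK[trap["severity"]] < self.min_rank:
            self.logged.append(trap)
            return "logged"
        # Storm prevention: suppress duplicates inside the window
        key = (trap["source"], trap["type"])
        last = self.last_alert.get(key)
        if last is not None and now - last < self.window_s:
            return "suppressed"
        # Validation: check another data source before alerting
        if not self.confirm(trap):
            self.logged.append(trap)
            return "unconfirmed"
        self.last_alert[key] = now
        self.alerts.append(trap)
        return "alert"


p = TrapProcessor()
link_down = {"severity": "Warning", "source": "192.168.1.101", "type": "Link Down"}
print([p.process(link_down, now=t) for t in (0, 5, 10, 75)])
# ['alert', 'suppressed', 'suppressed', 'alert']
```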

In our next section, we'll explore the Management Information Base (MIB), which defines what information can be monitored and what traps can be generated by SNMP-enabled devices.

# Management Information Base (MIB)

When we first set up network monitoring, one of the most confusing aspects can be understanding where monitored information comes from and how it's organized. The **Management Information Base (MIB)** provides this organization. Think of a MIB as a catalog or dictionary that defines what information we can monitor on our network devices. Just as a library catalog tells you which books are available and where to find them, a MIB tells monitoring systems what information is available from network devices and how to access it.

## Understanding MIB Organization

MIBs are organized in a tree structure, similar to how you might organize files on your computer. At the top is a root directory, and below it are branches for different types of information. Each piece of information in this tree has a unique address called an **Object Identifier (OID)**.

Let's use a real-world example. Imagine you're looking for a book in a library. You might follow a path like: Floor 3 → Technology Section → Networking → TCP/IP. Similarly, when looking for information in a MIB, you follow a path through the tree. For instance, if you want to know how long a device has been running, you would look for the information at OID 1.3.6.1.2.1.1.3 (sysUpTime).
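Following such a path programmatically amounts to walking down the tree one arc at a time. The tiny name table below covers only the standard arcs leading to `sysUpTime` plus the `private.enterprises` branch; real tools load these names from MIB files, and `translate` is a hypothetical helper.

```python
# A small slice of the OID tree: each arc number maps to (name, children)
OID_TREE = {
    1: ("iso", {
        3: ("org", {
            6: ("dod", {
                1: ("internet", {
                    2: ("mgmt", {
                        1: ("mib-2", {
                            1: ("system", {
                                1: ("sysDescr", {}),
                                3: ("sysUpTime", {}),
                                5: ("sysName", {}),
                            }),
                        }),
                    }),
                    4: ("private", {
                        1: ("enterprises", {}),
                    }),
                }),
            }),
        }),
    }),
}


def translate(oid: str) -> str:
    """Replace each known arc with its name; keep unknown arcs as numbers."""
    names, level = [], OID_TREE
    for arc in (int(part) for part in oid.split(".")):
        if arc in level:
            name, level = level[arc]
            names.append(name)
        else:
            names.append(str(arc))
            level = {}  # once off the known tree, stop looking up names
    return ".".join(names)


print(translate("1.3.6.1.2.1.1.3"))  # iso.org.dod.internet.mgmt.mib-2.system.sysUpTime
```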

## Types of MIBs

Network administrators work with two main categories of MIBs:

| Standard MIBs | Enterprise MIBs |
|---------------|-----------------|
| Defined by Internet standards (IETF RFCs) | Created by equipment vendors |
| Supported by most SNMP-enabled devices | Specific to particular devices or product lines |
| Contain common information (like interface status) | Contain vendor-specific information |
| Examples: system uptime, interface statistics | Examples: Cisco power supply status |

## Working with MIBs in Practice

Let's return to our Wonka's Chocolate Factory case study to see how MIBs work in the real world. The tempering machines at Wonka's need constant monitoring to maintain perfect chocolate consistency. The machines' vendor provided a custom MIB that defines all the information available for monitoring.

When Charlie's team wants to monitor the temperature of Tempering Machine #3, they use the MIB to understand exactly how to request this information. The MIB tells them that temperature readings are stored at OID 1.3.6.1.4.1.9999.1.1.1. More importantly, it tells them that this value is measured in degrees Celsius and should fall between 30°C and 32°C.

But the MIB provides more than just the location of information. It also tells Charlie's team:
- What type of value to expect (in this case, a temperature reading)
- What range of values is normal
- What the values mean (31.5 means 31.5 degrees Celsius)
- Whether the values can be changed (read-only or read-write)
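Those four pieces of information map naturally onto a small record. The sketch below is hypothetical: it models the Wonka temperature object and assumes the vendor encodes temperature as an `Integer32` in tenths of a degree. That is a common convention because SNMP's SMIv2 has no floating-point type (MIB authors typically express the scaling with a textual convention whose `DISPLAY-HINT` is `"d-1"`). The name `temperingMachineTemp` comes from the text; the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MibObject:
    name: str
    oid: str
    syntax: str                        # what type of value to expect
    units: str                         # what the value means
    scale: int                         # raw integer / scale = value in `units`
    valid_range: tuple[float, float]   # what range is normal
    access: str                        # read-only or read-write

    def interpret(self, raw: int) -> str:
        value = raw / self.scale
        low, high = self.valid_range
        status = "normal" if low <= value <= high else "OUT OF RANGE"
        return f"{self.name} = {value:.1f} {self.units} ({status})"  # one decimal place


temp = MibObject(
    name="temperingMachineTemp",
    oid="1.3.6.1.4.1.9999.1.1.1",
    syntax="Integer32",
    units="degrees Celsius",
    scale=10,
    valid_range=(30.0, 32.0),
    access="read-only",
)
print(temp.interpret(315))  # temperingMachineTemp = 31.5 degrees Celsius (normal)
print(temp.interpret(335))  # temperingMachineTemp = 33.5 degrees Celsius (OUT OF RANGE)
```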

Network administrators use special software called MIB browsers to work with this information. A MIB browser translates the cryptic OID numbers into human-readable names and helps navigate the tree structure. Instead of memorizing that 1.3.6.1.4.1.9999.1.1.1 means "temperature," administrators can simply look for "temperingMachineTemp" in their MIB browser.

## MIB Management in Practice

Managing MIBs is an important part of network monitoring. At Wonka's, Charlie maintains a central repository of all MIBs used in the factory. When new equipment is installed, its MIBs are added to this repository. This ensures that their monitoring systems always know how to interpret the information they receive from every device.

Some information might appear in multiple MIBs. For example, both the standard MIB and the tempering machine's MIB might report temperature values. Charlie's team documents which MIB they use for each type of monitoring to ensure consistency in their measurements and alerts.

Understanding MIBs is crucial for effective network monitoring, but don't worry if it seems complex at first. Start with the basics - understand that MIBs define what you can monitor, and gradually explore more detailed aspects as you become comfortable with network monitoring concepts. In our next section, we'll look at different versions of SNMP and how they affect network monitoring security.

# @title
%%html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Wonka's Simple MIB Browser</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 20px auto;
            padding: 20px;
        }
        .mib-tree {
            border: 1px solid #ccc;
            padding: 20px;
            margin-bottom: 20px;
        }
        .mib-details {
            border: 1px solid #ccc;
            padding: 20px;
            background-color: #f9f9f9;
        }
        .folder {
            cursor: pointer;
            color: #0066cc;
        }
        .folder:hover {
            text-decoration: underline;
        }
        .oid {
            color: #666;
            font-size: 0.9em;
        }
        button {
            margin: 5px;
            padding: 5px 10px;
        }
        .value {
            font-weight: bold;
            color: #009900;
        }
    </style>
</head>
<body>
    <h1>Wonka's Simple MIB Browser</h1>
    <div>Factory Location: East Wing, Floor 2</div>
    <hr>

    <div class="mib-tree">
        <h3>MIB Tree</h3>
        <div id="mibTree"></div>
    </div>

    <div class="mib-details">
        <h3>Selected OID Details</h3>
        <div id="details"></div>
    </div>

    <script>
        const mibData = {
            'enterprises.wonka.tempering': {
                oid: '1.3.6.1.4.1.9999.1',
                children: {
                    'Machine1': {
                        oid: '1.3.6.1.4.1.9999.1.1',
                        children: {
                            'temperature': {
                                oid: '1.3.6.1.4.1.9999.1.1.1',
                                type: 'Integer32 (tenths of °C, shown in °C)',
                                access: 'read-only',
                                value: '31.5',
                                description: 'Current temperature of tempering machine 1 (Celsius)',
                                range: '30.0 to 32.0'
                            },
                            'status': {
                                oid: '1.3.6.1.4.1.9999.1.1.2',
                                type: 'DisplayString',
                                access: 'read-only',
                                value: 'Running',
                                description: 'Operational status of tempering machine 1',
                                range: 'Running, Stopped, Maintenance'
                            }
                        }
                    },
                    'Machine2': {
                        oid: '1.3.6.1.4.1.9999.1.2',
                        children: {
                            'temperature': {
                                oid: '1.3.6.1.4.1.9999.1.2.1',
                                type: 'Integer32 (tenths of °C, shown in °C)',
                                access: 'read-only',
                                value: '30.8',
                                description: 'Current temperature of tempering machine 2 (Celsius)',
                                range: '30.0 to 32.0'
                            },
                            'status': {
                                oid: '1.3.6.1.4.1.9999.1.2.2',
                                type: 'DisplayString',
                                access: 'read-only',
                                value: 'Maintenance',
                                description: 'Operational status of tempering machine 2',
                                range: 'Running, Stopped, Maintenance'
                            }
                        }
                    }
                }
            }
        };

        function displayMibTree(data, parent = 'mibTree', indent = 0) {
            const container = document.getElementById(parent);
            if (indent === 0) container.innerHTML = '';

            for (let key in data) {
                const div = document.createElement('div');
                div.style.marginLeft = `${indent * 20}px`;

                if (data[key].children) {
                    // This is a folder
                    div.innerHTML = `<span class="folder">📁 ${key}</span> <span class="oid">${data[key].oid || ''}</span>`;
                    div.querySelector('.folder').onclick = () => showDetails(key, data[key]);
                    container.appendChild(div);
                    displayMibTree(data[key].children, parent, indent + 1);
                } else {
                    // This is a leaf node
                    div.innerHTML = `<span class="folder">📄 ${key}</span> <span class="oid">${data[key].oid}</span>`;
                    div.querySelector('.folder').onclick = () => showDetails(key, data[key]);
                    container.appendChild(div);
                }
            }
        }

        function showDetails(name, node) {
            const details = document.getElementById('details');
            if (!node.type) {
                details.innerHTML = `<h4>${name}</h4>
                    <p>OID: ${node.oid}</p>
                    <p>Type: Branch node (contains sub-elements)</p>`;
                return;
            }

            details.innerHTML = `
                <h4>${name}</h4>
                <p>OID: ${node.oid}</p>
                <p>Type: ${node.type}</p>
                <p>Access: ${node.access}</p>
                <p>Current Value: <span class="value">${node.value}</span></p>
                <p>Description: ${node.description}</p>
                <p>Valid Range: ${node.range}</p>
                <button onclick="refreshValue('${node.oid}')">Refresh Value</button>
            `;
        }

        function refreshValue(oid) {
            // Simulate getting a new value
            const randomValues = {
                '1.3.6.1.4.1.9999.1.1.1': (30 + Math.random() * 2).toFixed(1),
                '1.3.6.1.4.1.9999.1.2.1': (30 + Math.random() * 2).toFixed(1),
                '1.3.6.1.4.1.9999.1.1.2': ['Running', 'Stopped', 'Maintenance'][Math.floor(Math.random() * 3)],
                '1.3.6.1.4.1.9999.1.2.2': ['Running', 'Stopped', 'Maintenance'][Math.floor(Math.random() * 3)]
            };

            const valueSpan = document.querySelector('.value');
            if (valueSpan) {
                valueSpan.textContent = randomValues[oid] || 'Error reading value';
            }
        }

        // Initial display
        displayMibTree(mibData);
    </script>
</body>
</html>

# SNMP Versions: Security and Evolution

Network protocols evolve over time, usually to address security concerns or add new features. **SNMP version 2c (v2c)** and **SNMP version 3 (v3)** represent different approaches to balancing security with ease of use. Understanding these versions is crucial for protecting your network monitoring infrastructure.

## SNMP Version 2c Overview

SNMPv2c is widely used due to its simplicity, but it has significant security limitations. The 'c' in v2c stands for "community-based" security, which is a basic password-like authentication system. Think of community strings like a shared password for a group - anyone who knows the password can access the information.

A typical SNMPv2c configuration might look like this:
```
snmp-server community READ-ONLY-ACCESS ro
snmp-server community FULL-ACCESS rw
```

In this example:
- `READ-ONLY-ACCESS` is the community string for reading data
- `FULL-ACCESS` is the community string for reading and writing data
- `ro` means read-only access
- `rw` means read-write access

## Security Concerns with SNMPv2c

At Wonka's, Charlie's team initially used SNMPv2c because it was easy to set up. However, they quickly discovered its limitations:

1. Community strings are sent in plain text, making them vulnerable to network sniffing
2. There's no way to verify which person or system sent a message
3. There's no protection against message tampering
4. Monitoring data is not encrypted
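To see why the first point matters, it helps to look at the bytes on the wire. The sketch below hand-encodes an SNMPv2c GetRequest for `sysUpTime.0` using the BER encoding rules SNMP relies on (a version field of 1 means v2c; 0 is v1) and shows that the community string sits in the packet as ordinary text. It only builds the bytes and sends nothing; the helper names, request ID, and community string are illustrative.

```python
def encode_length(n: int) -> bytes:
    """BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Tag-length-value triple, the basic BER building block."""
    return bytes([tag]) + encode_length(len(content)) + content


def encode_int(value: int) -> bytes:
    # Two's-complement INTEGER; minimal for the non-negative values used here
    size = max(1, (value.bit_length() + 8) // 8)
    return tlv(0x02, value.to_bytes(size, "big", signed=True))


def encode_oid(dotted: str) -> bytes:
    arcs = [int(a) for a in dotted.split(".")]
    out = bytearray()
    # First two arcs share one value; each value is base-128 with a continuation bit
    for arc in [40 * arcs[0] + arcs[1]] + arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        out.extend(reversed(chunk))
    return tlv(0x06, bytes(out))


def snmpv2c_get(community: str, oid: str, request_id: int = 1) -> bytes:
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")        # OID + NULL value
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0)    # GetRequest-PDU: id, error-status,
              + encode_int(0) + tlv(0x30, varbind))           # error-index, variable bindings
    return tlv(0x30, encode_int(1) + tlv(0x04, community.encode()) + pdu)


packet = snmpv2c_get("ChocolateReader", "1.3.6.1.2.1.1.3.0")
print(packet.hex(" "))
print(b"ChocolateReader" in packet)  # True: anyone capturing this packet can read it
```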

After a competitor attempted to intercept temperature data from their tempering machines, Charlie's team realized they needed stronger security.

## SNMP Version 3 Features

SNMPv3 addresses these security concerns by adding several critical security features:

- **Authentication**: Verifies that messages come from valid sources and were not altered in transit, using keyed hash algorithms such as HMAC-SHA
- **Privacy**: Encrypts SNMP messages (for example, with AES) to prevent eavesdropping
- **Access Control**: Provides fine-grained control over who can access what information

A typical SNMPv3 configuration looks more complex:
```
snmp-server group WONKA-ADMINS v3 priv
snmp-server user Charlie WONKA-ADMINS v3 auth sha AuthPass1234 priv aes 128 PrivPass5678
```

## Choosing Between Versions

To help network administrators choose the appropriate SNMP version, here's a comparison:

| Feature | SNMPv2c | SNMPv3 |
|---------|---------|---------|
| Security Level | Basic | Advanced |
| Configuration Complexity | Simple | Complex |
| CPU Usage | Low | Higher |
| Message Size | Smaller | Larger |
| Authentication | Community String | Username/Password |
| Encryption | None | Available |

## Implementing SNMPv3 at Wonka's

When Charlie's team migrated to SNMPv3, they developed a clear transition plan:

First, they created an inventory of all SNMP-managed devices, from tempering machines to network switches. They then configured SNMPv3 on a test group of devices, maintaining v2c access temporarily. After confirming everything worked properly, they gradually rolled out SNMPv3 across the factory.

The team found that while SNMPv3 required more initial setup time, the improved security was worth the effort. They could now ensure that only authorized personnel could access sensitive manufacturing data, and all monitoring traffic was encrypted to protect their secret recipes.

In our next section, we'll explore how community strings work in more detail, even though they're primarily used in v2c, as understanding them helps build a foundation for more advanced SNMP security concepts.

# SNMP Community Strings: Basic Access Control

When first learning about SNMP security, many network administrators start with community strings. A **community string** acts like a password that allows access to a device's SNMP data. While modern networks should use SNMPv3 for security, understanding community strings helps build a foundation for SNMP security concepts.

## Understanding Community Strings

Think of a community string like the password to a shared file server. Everyone who needs access uses the same password, and the password determines what they can do - either read files, or read and write files. Similarly, SNMP community strings come in two main types:

- **Read-only (ro)**: Allows monitoring tools to read device information but not make changes
- **Read-write (rw)**: Allows both reading information and making configuration changes
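The decision an SNMPv2c agent makes can be sketched as a simple lookup: the community string in the request selects an access level, and write operations (SNMP `set`) require `rw`. This is a simplified, hypothetical model; real agents may also restrict access with MIB views and access lists. The community names are illustrative.

```python
COMMUNITIES = {"ChocolateReader": "ro", "ChocolateManager": "rw"}  # illustrative


def authorize(community: str, operation: str) -> bool:
    """Return True if a v2c request with this community may perform `operation`."""
    access = COMMUNITIES.get(community)
    if access is None:
        return False              # unknown community: the request is discarded
    if operation in ("get", "getnext", "getbulk"):
        return True               # both ro and rw may read
    if operation == "set":
        return access == "rw"     # only rw may change values
    return False


print(authorize("ChocolateReader", "get"))   # True
print(authorize("ChocolateReader", "set"))   # False
print(authorize("ChocolateManager", "set"))  # True
```

Notice that nothing in this check identifies a person: anyone who presents the read-write string gets write access.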

## Real-World Example: Wonka's Initial SNMP Setup

When Charlie first set up SNMP monitoring at Wonka's, he configured basic community strings on a tempering machine:

```
! Basic community string configuration
snmp-server community ChocolateReader ro
snmp-server community ChocolateManager rw
snmp-server location "Tempering Room 3"
snmp-server contact "Charlie Bucket"
```

Let's break down what each line does:
- The first `snmp-server community` command creates a read-only community named "ChocolateReader"
- The second creates a read-write community named "ChocolateManager"
- The `snmp-server location` and `snmp-server contact` commands set basic device information

## Common Community String Problems

This configuration had several problems that led Wonka's to upgrade to SNMPv3:

| Problem | Risk | Solution |
|---------|------|----------|
| Plain-text transmission | Anyone could capture community strings with a network sniffer | Use SNMPv3 encryption |
| Shared passwords | No way to track who made changes | Use SNMPv3 authentication |
| Simple passwords | Easy to guess common community strings | Use complex SNMPv3 credentials |

## Learning from Wonka's Mistakes

One day, Charlie discovered that a tempering machine's temperature had been changed. Because they were using community strings, they couldn't determine:
- Who made the change
- When exactly it happened
- Whether it was an authorized change

This incident highlighted why shared passwords (community strings) weren't secure enough for their industrial environment. Imagine if multiple people had copies of the same key to your house - if something went missing, you wouldn't know who to ask about it.

## Basic Security Measures for Community Strings

If you must use community strings (for example, with legacy devices), follow these essential practices:

1. Never use default community strings like "public" or "private"
2. Use long, complex strings that include:
   - Upper and lowercase letters
   - Numbers
   - Special characters
3. Change community strings regularly
4. Use different strings for different device groups
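Rather than inventing strings by hand, you can let a cryptographically secure random generator produce them. This minimal sketch uses Python's standard `secrets` module. The 24-character default and the symbol set are assumptions: the symbols deliberately exclude spaces, quotes, and `?` (which triggers help on Cisco-style command lines). Check your devices' documentation for their exact length and character limits.

```python
import secrets
import string

# A conservative symbol set that avoids quoting and CLI-help problems
SYMBOLS = "-_.+=@%^*"
ALPHABET = string.ascii_letters + string.digits + SYMBOLS


def new_community_string(length: int = 24) -> str:
    """Random string containing upper, lower, digit, and symbol characters."""
    if length < 4:
        raise ValueError("need room for all four character classes")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class from the checklist is present
        if (any(c.isupper() for c in candidate)
                and any(c.islower() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


# One string per device group (practice 4); group names are hypothetical
for group in ("tempering-machines", "office-switches"):
    print(group, new_community_string())
```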

## Moving Beyond Community Strings

Community strings represent an old approach to network security. Modern networks face sophisticated threats that require stronger protection. Just as you wouldn't secure your online banking with a shared password, critical network infrastructure needs stronger security.

In our next section, we'll explore authentication in SNMPv3, which provides the individual accountability and security tracking that community strings lack.

# SNMP Authentication: Securing Your Network Monitoring

Moving beyond simple community strings, SNMP authentication provides a robust way to verify who can access your network monitoring system. **Authentication** in SNMP ensures that only authorized users can view or modify device information. Think of it like showing your ID card at a secure facility - the system needs to know exactly who you are before granting access.

## SNMPv3 Authentication Methods

SNMPv3 supports several methods of authentication, but the two most common are:
- **MD5 (Message Digest 5)**: An older method, still widely supported but considered weak today
- **SHA (Secure Hash Algorithm)**: A stronger, recommended method (newer implementations also offer SHA-2 variants such as SHA-256)

While both methods work, SHA is more secure. It's like the difference between a regular door lock and a high-security lock - both will keep most people out, but the high-security lock provides better protection.

## Authentication Levels in SNMPv3

SNMPv3 defines three security levels, each offering a different combination of authentication and privacy:

| Security Level | Authentication | Privacy (Encryption) | Use Case |
|----------------|----------------|---------------------|-----------|
| noAuthNoPriv | None | None | Testing only |
| authNoPriv | Yes | None | Basic security |
| authPriv | Yes | Yes | Production use |
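The table has no "privacy without authentication" row, and that is by design. In the SNMPv3 message header (RFC 3412), the security level travels as two bits of the `msgFlags` field, an auth flag and a priv flag, and a message with priv set but auth clear is invalid and discarded. This small sketch decodes those bits into the level names above; the function name is illustrative.

```python
AUTH_FLAG = 0x01        # msgFlags bit: message is authenticated
PRIV_FLAG = 0x02        # msgFlags bit: message is encrypted
REPORTABLE_FLAG = 0x04  # msgFlags bit: a Report PDU may be sent back (not used below)


def security_level(msg_flags: int) -> str:
    auth = bool(msg_flags & AUTH_FLAG)
    priv = bool(msg_flags & PRIV_FLAG)
    if priv and not auth:
        # Privacy without authentication is not allowed
        raise ValueError("invalid msgFlags: priv set without auth")
    if priv:
        return "authPriv"
    return "authNoPriv" if auth else "noAuthNoPriv"


print(security_level(0x07))  # authPriv
```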

## Real-World Implementation at Wonka's

After their security incident with community strings, Charlie's team implemented SNMPv3 authentication on their tempering machines. Here's how they configured a secure monitoring setup:

```
! Create an SNMPv3 user with authentication and privacy
snmp-server user Charlie WONKA-OPERATORS v3 auth sha Willy-Wonka-123! priv aes 128 Sweet-Secret-456
! Define the group's required security level and its read/write views
snmp-server group WONKA-OPERATORS v3 priv read CHOC-TEMP-DATA write CHOC-TEMP-CONTROL
```

Let's break down this configuration:

The `snmp-server user` command creates a user account for Charlie with:
- Username: Charlie
- Group: WONKA-OPERATORS
- Authentication protocol: SHA
- Authentication password: Willy-Wonka-123!
- Privacy protocol: AES with a 128-bit key
- Privacy password: Sweet-Secret-456

The `snmp-server group` command sets up access control:
- Group name: WONKA-OPERATORS
- Security level: Privacy required (priv), which corresponds to authPriv
- Read permissions: CHOC-TEMP-DATA view
- Write permissions: CHOC-TEMP-CONTROL view

The two views are defined separately with `snmp-server view` commands, which list the MIB objects each view includes.

## How Authentication Works in Practice

When Charlie needs to check the temperature of a tempering machine, here's what happens behind the scenes:

1. Charlie's monitoring software sends his username (Charlie) with the request, plus an authentication code computed from the message and a key derived from his password (the password itself is never sent)
2. The tempering machine receives the request and verifies:
   - Is this a valid username?
   - Was the message properly authenticated with the correct password?
   - Does this user have permission to read temperature data?
3. If all checks pass, the temperature data is returned
4. The entire exchange is logged for security auditing
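The agent-side checks in step 2 can be sketched in Python. This is a simplified model, not the real SNMPv3 message format. The user and group tables, the key, and `handle_request` are hypothetical, and views are matched by name rather than by MIB object identifiers. The authentication code is HMAC-SHA1 truncated to 12 bytes, which is how SNMPv3's HMAC-SHA-96 protocol works.

```python
import hashlib
import hmac

# Hypothetical agent-side tables for one tempering machine
USERS = {"Charlie": {"group": "WONKA-OPERATORS", "auth_key": b"example-localized-key"}}
GROUPS = {"WONKA-OPERATORS": {"read": "CHOC-TEMP-DATA", "write": "CHOC-TEMP-CONTROL"}}


def auth_code(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 truncated to 12 bytes, as in SNMPv3's HMAC-SHA-96 protocol."""
    return hmac.new(key, message, hashlib.sha1).digest()[:12]


def handle_request(user: str, message: bytes, code: bytes, view: str, action: str) -> str:
    account = USERS.get(user)
    if account is None:  # Is this a valid username?
        return "rejected: unknown user"
    expected = auth_code(account["auth_key"], message)
    if not hmac.compare_digest(expected, code):  # Was the message properly authenticated?
        return "rejected: authentication failed"
    if GROUPS[account["group"]].get(action) != view:  # Does the user's group allow this?
        return "rejected: no access to view"
    return "ok"


request = b"GET temperature"
good_code = auth_code(b"example-localized-key", request)
print(handle_request("Charlie", request, good_code, "CHOC-TEMP-DATA", "read"))     # ok
print(handle_request("Charlie", request, bytes(12), "CHOC-TEMP-DATA", "read"))     # rejected: authentication failed
print(handle_request("Charlie", request, good_code, "CHOC-TEMP-CONTROL", "read"))  # rejected: no access to view
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many bytes of a forged code were correct.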

## Authentication Best Practices

Learning from their experience, Wonka's developed these authentication guidelines:

1. Always use strong passwords that:
   - Are at least 16 characters long
   - Include upper and lowercase letters, numbers, and symbols
   - Avoid any references to chocolate or candy that competitors might guess

2. Create individual accounts for each user instead of sharing credentials
3. Regularly review and update access permissions
4. Monitor and log all authentication attempts
5. Remove accounts promptly when employees leave
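Guideline 1 is easy to automate. Below is a minimal Python checker. It assumes a hypothetical `FORBIDDEN_WORDS` list stands in for "references to chocolate or candy".

```python
import string

# Hypothetical list of guessable, factory-themed words; adjust for your organization
FORBIDDEN_WORDS = ("chocolate", "candy", "cocoa", "wonka")


def password_problems(password: str) -> list:
    """Return the ways a password breaks the guidelines above (empty list means OK)."""
    problems = []
    if len(password) < 16:
        problems.append("shorter than 16 characters")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no number")
    if not any(c in string.punctuation for c in password):
        problems.append("no symbol")
    lowered = password.lower()
    problems.extend(f"contains guessable word '{w}'" for w in FORBIDDEN_WORDS if w in lowered)
    return problems


print(password_problems("cocoa1"))
```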

## Troubleshooting Authentication Issues

Sometimes authentication problems occur. Charlie's team created this simple troubleshooting process:

1. Verify that the username and passwords are correct
2. Check that authentication protocols match on both sides
3. Ensure time is synchronized between devices
4. Review firewall rules for blocked SNMP traffic (UDP port 161 for requests, UDP port 162 for traps and notifications)
5. Check system logs for specific error messages
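Step 3 matters because SNMPv3 protects against replayed messages with a timeliness check. Under RFC 3414, each authenticated message carries the receiving device's boot counter and engine time (seconds since its SNMP engine last started). The device compares these with its own values. It rejects the message (counting it in `usmStatsNotInTimeWindows`) if the boot counters differ or the times are more than 150 seconds apart. The manager learns this engine time during discovery rather than from the wall clock, so the check is separate from NTP, but both are worth verifying. Here is a Python sketch of the rule.

```python
TIME_WINDOW_SECONDS = 150      # RFC 3414 timeliness window
MAX_ENGINE_BOOTS = 2147483647  # at this snmpEngineBoots value, every authenticated message is rejected


def outside_time_window(local_boots: int, local_time: int, msg_boots: int, msg_time: int) -> bool:
    """Agent-side check from RFC 3414: should an authenticated message be rejected as stale?"""
    return (
        local_boots == MAX_ENGINE_BOOTS
        or msg_boots != local_boots
        or abs(msg_time - local_time) > TIME_WINDOW_SECONDS
    )


print(outside_time_window(5, 10_000, 5, 10_100))  # False: 100 seconds apart
print(outside_time_window(5, 10_000, 5, 10_200))  # True: 200 seconds apart
print(outside_time_window(6, 30, 5, 10_000))      # True: the device has rebooted
```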

By implementing strong authentication, Wonka's not only protected their valuable manufacturing data but also gained the ability to track who accessed what information and when. This accountability helped them maintain the secrecy of their famous recipes while still allowing necessary monitoring of their production systems.

In our next section, we'll explore log aggregation and how to collect and analyze all the security information generated by our authenticated SNMP systems.

# Log Aggregation: Making Sense of Network Data

Every device in your network generates logs - records of what's happening moment by moment. **Log aggregation** is the process of collecting these logs from multiple sources into a central location where they can be analyzed together. Think of it like collecting security camera footage from different locations into a single monitoring room.

## Why Centralize Logs?

At Wonka's Chocolate Factory, Charlie's team initially looked at logs on each device separately. When a temperature problem occurred in Tempering Machine #3, they had to check:
- The machine's own logs
- Network switch logs
- Authentication server logs
- SNMP manager logs

This process was time-consuming and often missed important connections between events. It was like trying to solve a puzzle while only looking at one piece at a time.
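At its core, aggregation is a merge problem. Each source produces its own time-ordered stream, and the collector interleaves them into one timeline. Below is a minimal Python sketch using the standard library's `heapq.merge`. The log entries are hypothetical, and the sketch assumes all sources have synchronized clocks and share one timestamp format.

```python
import heapq

# Hypothetical per-source logs, each already sorted by time: (timestamp, source, message)
machine_log = [
    ("2025-01-17 02:16:30", "TempMachine3", "Temperature threshold modified"),
    ("2025-01-17 02:20:05", "TempMachine3", "Temperature reading: 33.1C"),
]
switch_log = [("2025-01-17 02:15:10", "Switch2", "Port Gi1/0/7 link up")]
auth_log = [
    ("2025-01-17 02:15:33", "AuthServer", "Failed login from 192.168.1.50"),
    ("2025-01-17 02:16:01", "AuthServer", "Login 'maintenance' from 192.168.1.50"),
]

# Merge the already-sorted streams into one timeline without re-sorting everything.
# ISO-style "YYYY-MM-DD HH:MM:SS" timestamps sort correctly as plain strings.
for timestamp, source, message in heapq.merge(machine_log, switch_log, auth_log):
    print(timestamp, f"{source:<13}", message)
```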

## The Syslog Protocol

**Syslog** is a standard protocol that devices use to send their log messages to a central collector. Each syslog message contains essential information:

| Field | Example | Purpose |
|-------|---------|----------|
| Timestamp | 2025-01-17 14:30:22 | When the event occurred |
| Severity | Critical | How important the event is |
| Hostname | TempMachine3 | Which device sent the message |
| Process | temp-monitor | What generated the message |
| Message | Temperature exceeded threshold: 33.5°C | What happened |

## Implementing Log Aggregation at Wonka's

After several incidents where troubleshooting was delayed by scattered logs, Charlie's team implemented centralized logging. Here's a basic syslog configuration they applied to their tempering machines:

```
! Configure syslog on a tempering machine
logging host 192.168.1.100
logging trap critical
logging facility local5
logging source-interface GigabitEthernet0/1
```

This configuration:
- Sends logs to a central server at 192.168.1.100
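- Forwards messages of severity critical or higher (critical, alerts, and emergencies); less severe messages such as warnings are not sent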
- Uses facility "local5" to identify tempering machine logs
- Sends logs through a specific network interface
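Two numbers travel with each message: a severity (0 is most severe, 7 least) and a facility. `logging trap` acts as a cutoff on severity. The receiver sees both numbers combined into a priority value, PRI = facility * 8 + severity, at the start of each syslog message. Below is a Python sketch using the IOS level keywords. The function names are illustrative.

```python
# Cisco IOS logging levels: lower numbers are more severe
IOS_LEVELS = {
    "emergencies": 0, "alerts": 1, "critical": 2, "errors": 3,
    "warnings": 4, "notifications": 5, "informational": 6, "debugging": 7,
}
FACILITY_CODES = {f"local{n}": 16 + n for n in range(8)}  # local0..local7 are codes 16..23


def is_forwarded(level: str, trap_level: str = "critical") -> bool:
    """A message is sent if it is at least as severe as the 'logging trap' level."""
    return IOS_LEVELS[level] <= IOS_LEVELS[trap_level]


def priority(facility: str, level: str) -> int:
    """Syslog PRI value: facility code * 8 + severity code."""
    return FACILITY_CODES[facility] * 8 + IOS_LEVELS[level]


print([level for level in IOS_LEVELS if is_forwarded(level)])
# ['emergencies', 'alerts', 'critical']
print(priority("local5", "critical"))
# 170, so these messages start with "<170>"
```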

## Making Logs Useful

Having all logs in one place is just the beginning. The real value comes from making sense of this information. Wonka's implemented several key practices:

First, they standardized log formats across all devices. For example, temperature alerts always follow this pattern:
```
TEMP-ALERT: Machine=[name] Current=[temp]C Threshold=[limit]C Location=[room]
```

This standardization makes it easier to:
- Search for specific types of events
- Create meaningful alerts
- Generate reports
- Identify patterns
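Because every alert follows the same pattern, a single regular expression can turn a log line into structured fields. Here is a Python sketch. It assumes machine names contain no spaces and the location runs to the end of the line. The sample line is hypothetical.

```python
import re

TEMP_ALERT = re.compile(
    r"TEMP-ALERT: Machine=(?P<machine>\S+) Current=(?P<current>-?[\d.]+)C "
    r"Threshold=(?P<threshold>-?[\d.]+)C Location=(?P<location>.+)"
)


def parse_temp_alert(line: str):
    """Return the alert's fields as a dict, or None if the line is not a TEMP-ALERT."""
    match = TEMP_ALERT.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["current"] = float(fields["current"])      # numbers become floats for comparisons
    fields["threshold"] = float(fields["threshold"])
    return fields


print(parse_temp_alert(
    "TEMP-ALERT: Machine=TempMachine3 Current=33.5C Threshold=32.0C Location=Room 102"
))
# {'machine': 'TempMachine3', 'current': 33.5, 'threshold': 32.0, 'location': 'Room 102'}
```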

## Real-World Example: Solving a Mystery

One morning, Charlie's team received complaints about inconsistent chocolate texture. Looking at their centralized logs, they could see:

```
2025-01-17 02:15:33 TempMachine3 AUTH-WARNING: Failed login attempt from 192.168.1.50
2025-01-17 02:15:45 TempMachine3 AUTH-WARNING: Failed login attempt from 192.168.1.50
2025-01-17 02:16:01 TempMachine3 AUTH-SUCCESS: User 'maintenance' logged in from 192.168.1.50
2025-01-17 02:16:30 TempMachine3 TEMP-CHANGE: Temperature threshold modified from 31.5C to 33.0C
2025-01-17 02:16:45 TempMachine3 AUTH-LOGOUT: User 'maintenance' logged out
```

With centralized logging, they could immediately see that someone had changed the temperature settings during the night shift. This would have been much harder to discover if they had to check each system's logs separately.
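The pattern that stood out here can be written as a simple correlation rule: repeated failed logins, then a successful login from the same address, then a configuration change. Below is a Python sketch that scans the log lines above. `find_suspicious_changes` and its two-failure threshold are illustrative choices. A real SIEM would also limit the rule to a time window.

```python
import re

LOG_LINES = [
    "2025-01-17 02:15:33 TempMachine3 AUTH-WARNING: Failed login attempt from 192.168.1.50",
    "2025-01-17 02:15:45 TempMachine3 AUTH-WARNING: Failed login attempt from 192.168.1.50",
    "2025-01-17 02:16:01 TempMachine3 AUTH-SUCCESS: User 'maintenance' logged in from 192.168.1.50",
    "2025-01-17 02:16:30 TempMachine3 TEMP-CHANGE: Temperature threshold modified from 31.5C to 33.0C",
    "2025-01-17 02:16:45 TempMachine3 AUTH-LOGOUT: User 'maintenance' logged out",
]
IP_PATTERN = re.compile(r"from (\d{1,3}(?:\.\d{1,3}){3})\b")


def find_suspicious_changes(lines, min_failures=2):
    failures = {}       # source IP -> number of failed logins seen so far
    risky_session = {}  # hostname -> IP of a login that followed repeated failures
    findings = []
    for line in lines:
        day, clock, host, event, text = line.split(" ", 4)
        match = IP_PATTERN.search(text)
        ip = match.group(1) if match else None
        if event == "AUTH-WARNING:" and "Failed login" in text:
            failures[ip] = failures.get(ip, 0) + 1
        elif event == "AUTH-SUCCESS:" and failures.get(ip, 0) >= min_failures:
            risky_session[host] = ip
        elif event == "AUTH-LOGOUT:":
            risky_session.pop(host, None)
        elif event == "TEMP-CHANGE:" and host in risky_session:
            findings.append(f"{day} {clock} {host}: {text} (login from {risky_session[host]})")
    return findings


for finding in find_suspicious_changes(LOG_LINES):
    print(finding)
```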

## Log Retention and Management

Like a security camera system, log management requires planning:

1. Storage: Logs need enough space to grow
2. Retention: Decide how long to keep logs
3. Backup: Protect historical log data
4. Access: Control who can view logs

Wonka's keeps logs for:
- 30 days of detailed logs (all messages)
- 1 year of security events
- 7 years of critical system changes
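A retention policy like this reduces to a simple age check per category. Here is a Python sketch. The category names are hypothetical labels, and the sketch assumes each log entry has already been assigned the category with the longest applicable retention period.

```python
from datetime import date, timedelta

# Wonka's retention periods; "7 years" is approximated as 7 * 365 days
RETENTION = {
    "detailed": timedelta(days=30),
    "security": timedelta(days=365),
    "critical_change": timedelta(days=7 * 365),
}


def should_keep(category: str, logged_on: date, today: date) -> bool:
    """True while a log entry is still inside its category's retention period."""
    return today - logged_on <= RETENTION[category]


print(should_keep("detailed", date(2025, 1, 17), date(2025, 2, 16)))  # True: 30 days old
print(should_keep("detailed", date(2025, 1, 17), date(2025, 2, 17)))  # False: 31 days old
print(should_keep("security", date(2025, 1, 17), date(2025, 2, 17)))  # True
```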

## Security Information and Event Management (SIEM)

A **SIEM** system takes log aggregation to the next level by adding:
- Real-time analysis of log data
- Automated alerts for suspicious patterns
- Correlation between different types of events
- Compliance reporting
- Long-term trend analysis

Think of a SIEM as a smart security guard who not only watches all the security cameras but can instantly recognize patterns and potential problems.

By centralizing their logs and implementing a SIEM system, Wonka's gained:
- Faster problem resolution
- Better security monitoring
- Improved compliance reporting
- Historical analysis capabilities

In our next section, we'll explore how to use APIs (Application Programming Interfaces) to automate the collection and analysis of this monitoring data.

# @title
%%html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Wonka's SIEM Simulator</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 1200px;
            margin: 20px auto;
            padding: 20px;
        }
        .container {
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 20px;
        }
        .log-viewer, .alert-config, .alert-viewer {
            border: 1px solid #ccc;
            padding: 15px;
            margin-bottom: 20px;
            border-radius: 5px;
        }
        .log-entry {
            margin: 5px 0;
            padding: 5px;
            border-bottom: 1px solid #eee;
            font-family: monospace;
        }
        .critical { color: red; font-weight: bold; }
        .warning { color: orange; }
        .info { color: blue; }
        .alert {
            background-color: #ffe6e6;
            padding: 10px;
            margin: 5px 0;
            border-radius: 5px;
        }
        button {
            margin: 5px;
            padding: 5px 10px;
        }
        input, select {
            margin: 5px;
            padding: 5px;
        }
        /* Help modal styling */
        #helpModal {
            position: fixed;
            top: 50%;
            left: 50%;
            transform: translate(-50%, -50%);
            background-color: #fafafa;
            border: 1px solid #ccc;
            padding: 20px;
            width: 80%;
            max-width: 600px;
            z-index: 1000;
            border-radius: 5px;
            box-shadow: 0px 0px 10px rgba(0,0,0,0.5);
            display: none;
        }
        #helpModal h2 {
            margin-top: 0;
        }
        #modalOverlay {
            position: fixed;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            background: rgba(0,0,0,0.4);
            z-index: 999;
            display: none;
        }
        #closeHelp {
            float: right;
            cursor: pointer;
            background: none;
            border: none;
            font-size: 16px;
        }
        .alert-config-help {
            font-size: 14px;
            margin: 10px 0;
            color: #555;
            border-top: 1px solid #eee;
            padding-top: 10px;
        }
    </style>
</head>
<body>
    <h1>Wonka's SIEM Simulator</h1>
    <p>Monitor your chocolate factory's security events in real-time!</p>
    <button onclick="toggleHelp()">Help</button>

    <div id="modalOverlay" onclick="toggleHelp()"></div>
    <div id="helpModal">
        <button id="closeHelp" onclick="toggleHelp()">X</button>
        <h2>Help: How to Use the Simulator</h2>
        <p>This simulator demonstrates the principles of a <strong>Security Information and Event Management (SIEM)</strong> system. A <strong>SIEM</strong> aggregates security-related data in real time.</p>
        <p>Key components of the interface are explained in the table below:</p>
        <table border="1" cellspacing="0" cellpadding="5">
            <thead>
                <tr>
                    <th>Component</th>
                    <th>Definition</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td><strong>Live Log Stream</strong></td>
                    <td>A display showing up to 10 of the most recent log messages from system events.</td>
                </tr>
                <tr>
                    <td><strong>Alert Configuration</strong></td>
                    <td>An interface for defining conditions that trigger notifications. Specify an alert type and threshold value.</td>
                </tr>
                <tr>
                    <td><strong>Active Alerts</strong></td>
                    <td>Displays the alerts that you have configured.</td>
                </tr>
                <tr>
                    <td><strong>Triggered Alerts</strong></td>
                    <td>Shows up to the 10 most recent triggered alerts based on current log events.</td>
                </tr>
            </tbody>
        </table>
        <p>To start, press the <strong>Start/Stop Logs</strong> button in the Live Log Stream panel. Then use the Alert Configuration box to set criteria; for example, choose "Temperature Alert" and set a threshold of <strong>32.5</strong> to trigger an alert for high temperatures.</p>
    </div>

    <div class="container">
        <div>
            <div class="log-viewer">
                <h2>Live Log Stream</h2>
                <button onclick="toggleLogGeneration()">Start/Stop Logs</button>
                <div id="logStream"></div>
            </div>
        </div>
        <div>
            <div class="alert-config">
                <h2>Alert Configuration</h2>
                <div>
                    <select id="alertType">
                        <option value="temperature">Temperature Alert</option>
                        <option value="auth">Authentication Alert</option>
                        <option value="status">Status Change Alert</option>
                    </select>
                    <input type="text" id="alertThreshold" placeholder="Threshold (e.g., 32.5)">
                    <button onclick="addAlert()">Add Alert</button>
                </div>
                <div class="alert-config-help">
                    <p><strong>Temperature Alert:</strong> Fires when a temperature reading exceeds the specified threshold (numeric value, e.g., <em>32.5</em>).</p>
                    <p><strong>Authentication Alert:</strong> Fires upon an authentication failure.</p>
                    <p><strong>Status Change Alert:</strong> Fires when a machine status changes to "Stopped".</p>
                </div>
            </div>
            <div class="alert-viewer">
                <h2>Active Alerts</h2>
                <div id="activeAlerts"></div>
                <h2>Triggered Alerts</h2>
                <div id="triggeredAlerts"></div>
            </div>
        </div>
    </div>

    <script>
        let isGeneratingLogs = false;
        let logInterval;
        const alerts = [];

        // Sample data for log generation
        const machines = ['TempMachine1', 'TempMachine2', 'TempMachine3'];
        const users = ['charlie', 'wonka', 'augustus', 'violet'];
        const severities = ['INFO', 'WARNING', 'CRITICAL'];
        const eventTypes = ['TEMP', 'AUTH', 'STATUS'];

        function generateLog() {
            const timestamp = new Date().toISOString().split('T').join(' ').split('.')[0];
            const machine = machines[Math.floor(Math.random() * machines.length)];
            const severity = severities[Math.floor(Math.random() * severities.length)];
            const eventType = eventTypes[Math.floor(Math.random() * eventTypes.length)];

            let message;
            switch(eventType) {
                case 'TEMP':
                    const temp = (30 + Math.random() * 4).toFixed(1);
                    message = `Temperature reading: ${temp}°C`;
                    break;
                case 'AUTH':
                    const user = users[Math.floor(Math.random() * users.length)];
                    const success = Math.random() > 0.3;
                    message = `Authentication ${success ? 'success' : 'failure'} for user ${user}`;
                    break;
                case 'STATUS':
                    const status = ['Running', 'Maintenance', 'Stopped'][Math.floor(Math.random() * 3)];
                    message = `Machine status changed to: ${status}`;
                    break;
            }

            return {
                timestamp,
                machine,
                severity,
                eventType,
                message
            };
        }

        function displayLog(log) {
            const logStream = document.getElementById('logStream');
            const logEntry = document.createElement('div');
            logEntry.className = `log-entry ${log.severity.toLowerCase()}`;
            logEntry.textContent = `${log.timestamp} ${log.machine} ${log.severity} ${log.eventType}: ${log.message}`;

            logStream.insertBefore(logEntry, logStream.firstChild);
            // Retain only the 10 most recent log messages.
            while (logStream.children.length > 10) {
                logStream.removeChild(logStream.lastChild);
            }

            checkAlerts(log);
        }

        function toggleLogGeneration() {
            isGeneratingLogs = !isGeneratingLogs;
            if (isGeneratingLogs) {
                logInterval = setInterval(() => {
                    displayLog(generateLog());
                }, 2000);
            } else {
                clearInterval(logInterval);
            }
        }

        function addAlert() {
            const type = document.getElementById('alertType').value;
            const threshold = document.getElementById('alertThreshold').value;
            const alert = { type, threshold, id: Date.now() };
            alerts.push(alert);
            displayActiveAlerts();
        }

        function displayActiveAlerts() {
            const activeAlerts = document.getElementById('activeAlerts');
            activeAlerts.innerHTML = alerts.map(alert =>
                `<div>
                    ${alert.type}: ${alert.threshold}
                    <button onclick="removeAlert(${alert.id})">Remove</button>
                </div>`
            ).join('');
        }

        function removeAlert(id) {
            const index = alerts.findIndex(alert => alert.id === id);
            if (index > -1) {
                alerts.splice(index, 1);
                displayActiveAlerts();
            }
        }

        function checkAlerts(log) {
            alerts.forEach(alert => {
                let shouldTrigger = false;

                if (alert.type === 'temperature' && log.eventType === 'TEMP') {
                    const temp = parseFloat(log.message.split(': ')[1]);
                    shouldTrigger = temp > parseFloat(alert.threshold);
                } else if (alert.type === 'auth' && log.eventType === 'AUTH') {
                    shouldTrigger = log.message.includes('failure');
                } else if (alert.type === 'status' && log.eventType === 'STATUS') {
                    shouldTrigger = log.message.includes('Stopped');
                }

                if (shouldTrigger) {
                    triggerAlert(alert, log);
                }
            });
        }

        function triggerAlert(alert, log) {
            const triggeredAlerts = document.getElementById('triggeredAlerts');
            const alertDiv = document.createElement('div');
            alertDiv.className = 'alert';
            alertDiv.textContent = `${log.timestamp}: ${alert.type} alert triggered - ${log.message}`;
            triggeredAlerts.insertBefore(alertDiv, triggeredAlerts.firstChild);
            // Retain only the 10 most recent triggered alerts.
            while (triggeredAlerts.children.length > 10) {
                triggeredAlerts.removeChild(triggeredAlerts.lastChild);
            }
        }

        function toggleHelp() {
            const helpModal = document.getElementById('helpModal');
            const overlay = document.getElementById('modalOverlay');
            if (helpModal.style.display === 'block') {
                helpModal.style.display = 'none';
                overlay.style.display = 'none';
            } else {
                helpModal.style.display = 'block';
                overlay.style.display = 'block';
            }
        }
    </script>
</body>
</html>



# Application Programming Interface (API): Automating Network Monitoring

In modern network monitoring, APIs play a crucial role in automating data collection and system integration. An **Application Programming Interface (API)** provides a way for different software systems to communicate with each other using a defined set of rules. Think of an API as a waiter in a restaurant - it takes requests from customers (applications), delivers them to the kitchen (servers), and returns with the requested items (data).

## Understanding Data Exchange: Introduction to JSON

Before we dive into APIs, let's understand how they share information. Most modern APIs use a format called **JSON (JavaScript Object Notation)** to send and receive data. JSON is like filling out a form with organized information. Let's look at a simple example:

```json
{
    "name": "Tempering Machine 3",
    "location": "Room 102",
    "temperature": 31.5
}
```

In this JSON example:
- Curly braces `{ }` mark the start and end of the data
- Each piece of information is a name/value pair: the name is always text in double quotes, followed by a colon and the value
- Text values are in double quotes: `"name": "Tempering Machine 3"`
- Numbers don't need quotes: `"temperature": 31.5`
- Pairs are separated by commas, with no comma after the last pair

JSON can also contain lists using square brackets `[ ]`:
```json
{
    "name": "Tempering Machine 3",
    "recent_temperatures": [
        31.5,
        31.6,
        31.4
    ]
}
```
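Most programming languages can read JSON directly. In Python, the standard `json` module turns the second example into a dictionary containing a list, which makes the values easy to work with.

```python
import json

text = """
{
    "name": "Tempering Machine 3",
    "recent_temperatures": [31.5, 31.6, 31.4]
}
"""

machine = json.loads(text)  # JSON object -> Python dict, JSON list -> Python list
temps = machine["recent_temperatures"]
average = sum(temps) / len(temps)
print(machine["name"], "average:", round(average, 2))

# A trailing comma after the last pair is not valid JSON
try:
    json.loads('{"temperature": 31.5,}')
except json.JSONDecodeError as err:
    print("Invalid JSON:", err.msg)
```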

## Understanding APIs in Network Monitoring

At Wonka's Chocolate Factory, the network monitoring system needs to communicate with various devices and services. Before implementing APIs, Charlie's team had to manually:
- Check temperature readings from each machine
- Update monitoring thresholds
- Generate daily reports
- Configure new devices

This manual process was time-consuming and prone to errors. By implementing APIs, they automated these tasks and improved efficiency.

## Types of API Requests

When working with APIs, there are different types of requests you can make:

| Request Type | What It Does | Real-World Example |
|-------------|--------------|-------------------|
| GET | Asks for information | Checking a machine's temperature |
| POST | Sends new information | Setting temperature limits |
| PUT | Updates existing information | Updating machine settings |
| DELETE | Removes information | Removing old alert settings |

## A Simple API Example

Here's how Charlie's team uses an API to check on a tempering machine:

```
Asking for information (GET request):
Web Address: /api/v1/machines/tempering3/status

The response comes back as JSON:
{
    "machine_name": "Tempering Machine 3",
    "temperature": 31.5,
    "is_running": true,
    "last_checked": "10 minutes ago"
}
```

When they need to set new temperature limits:
```
Sending information (POST request):
Web Address: /api/v1/machines/tempering3/settings

Information being sent:
{
    "max_temperature": 32.0,
    "min_temperature": 30.0
}

Response received:
{
    "message": "Settings updated successfully"
}
```
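In code, those two requests might look like the Python sketch below, built on the standard library's `urllib.request`. The base URL `https://monitoring.example.com/api/v1` and the helper names are hypothetical. Only the paths and JSON fields come from the example above. Building a request does not contact any server; only `send` does.

```python
import json
import urllib.request

# Hypothetical base URL for the factory's monitoring API
BASE_URL = "https://monitoring.example.com/api/v1"


def status_request(machine_id: str) -> urllib.request.Request:
    """GET /machines/<id>/status: ask for information."""
    return urllib.request.Request(f"{BASE_URL}/machines/{machine_id}/status", method="GET")


def settings_request(machine_id: str, max_temp: float, min_temp: float) -> urllib.request.Request:
    """POST /machines/<id>/settings: send new temperature limits as JSON."""
    body = json.dumps({"max_temperature": max_temp, "min_temperature": min_temp}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/machines/{machine_id}/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a reachable server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Against a real server, `send(status_request("tempering3"))["temperature"]` would return the current reading, and `send(settings_request("tempering3", 32.0, 30.0))` would apply the new limits.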

## Real-World Application: Automated Temperature Monitoring

Let's look at how Wonka's uses APIs to automate their temperature monitoring process:

1. Every 5 minutes, the monitoring system asks each machine "What's your temperature?"
2. Each machine responds with its current temperature
3. If a temperature is too high or too low:
   - The system sends alerts to the maintenance team
   - The air conditioning system is adjusted automatically
   - A maintenance ticket is created

Before APIs, each of these steps required someone to manually check and adjust things. Now, the entire process happens automatically in seconds.
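The decision logic in step 3 can be kept separate from the API calls that feed it. Here is a Python sketch in which `fetch_readings` and `perform` are hypothetical stand-ins for the real API requests. The limits are borrowed from the settings example above.

```python
import time

# Hypothetical limits, matching the settings example above (30.0 to 32.0 °C)
MIN_TEMP, MAX_TEMP = 30.0, 32.0


def actions_for(readings: dict) -> list:
    """Turn {machine: temperature} readings into follow-up actions for each problem."""
    actions = []
    for machine, temp in sorted(readings.items()):
        if MIN_TEMP <= temp <= MAX_TEMP:
            continue
        problem = "too high" if temp > MAX_TEMP else "too low"
        actions.append(f"alert maintenance: {machine} is {problem} ({temp} °C)")
        actions.append(f"adjust air conditioning near {machine}")
        actions.append(f"open maintenance ticket for {machine}")
    return actions


def monitor_forever(fetch_readings, perform, interval_seconds=300):
    """Poll every 5 minutes; fetch_readings and perform would wrap the real API calls."""
    while True:
        for action in actions_for(fetch_readings()):
            perform(action)
        time.sleep(interval_seconds)


print(actions_for({"tempering1": 31.2, "tempering3": 32.4}))
```

Keeping `actions_for` free of network calls means the rules can be tested without touching any machines.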

## API Security

Just as Wonka's protects their secret recipes, they must also protect their APIs. They use several security measures:

1. Authentication: Systems must prove they have permission to use the API
2. Rate Limiting: Systems can only make a certain number of requests per minute
3. Encryption: All requests and responses are encrypted in transit (typically with HTTPS/TLS)
4. Access Control: Different systems get different levels of access
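Rate limiting is often implemented with a *token bucket*. Each client gets a bucket that refills at a steady rate, and each request spends one token. The text does not say which algorithm Wonka's uses; this Python sketch shows the token-bucket approach. For example, `TokenBucket(capacity=60, rate=1.0)` allows about 60 requests per minute, with short bursts.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last_refill = clock()

    def allow(self) -> bool:
        """Spend one token if available; return False if the request should be rejected."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per API client (the client name is hypothetical)
buckets = {"dashboard": TokenBucket(capacity=60, rate=1.0)}
print(buckets["dashboard"].allow())  # True: the bucket starts full
```

An API that rejects a request this way would typically answer with HTTP status 429 (Too Many Requests).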

## Putting It All Together

In Wonka's monitoring center, their dashboard combines information from many different sources using APIs:
- Machine temperatures and status
- Ingredient inventory levels
- Security camera feeds
- Room conditions (temperature, humidity)

This gives operators a complete view of the factory's operations. When something goes wrong, they can quickly see all the relevant information in one place.

## Learning from Experience

When Wonka's first started using APIs, they learned some important lessons:

1. Always have backup plans for when APIs fail
2. Keep good documentation about how to use each API
3. Start simple and add features as needed
4. Test thoroughly before making changes
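Lesson 1 often takes the form of retrying a failed call a few times with increasing delays, then falling back to something safe, such as the last known reading. Here is a Python sketch in which `call` and `fallback` are any functions you supply. The three attempts and the 1-second starting delay are arbitrary choices.

```python
import time


def call_with_fallback(call, fallback, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Try call() up to `attempts` times, doubling the wait between tries; then use fallback()."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:  # network failures (timeouts, refused connections, URLError) are OSErrors
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback()


last_known = {"temperature": 31.5}  # hypothetical cached reading


def unreachable_api():
    raise ConnectionError("tempering3 API did not respond")


print(call_with_fallback(unreachable_api, lambda: last_known, sleep=lambda seconds: None))
```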

In our next section, we'll explore port mirroring, another crucial technology for comprehensive network monitoring.

# Port Mirroring: Creating Network Traffic Copies

**Port mirroring**, also known as **SPAN** (Switched Port Analyzer), is like having a security camera for your network traffic. Just as a security camera creates a copy of everything happening in an area without interfering with people's activities, port mirroring creates a copy of network traffic for analysis without disrupting the actual communication.

## Understanding Port Mirroring

Imagine you're watching chocolate flow through clear pipes in Wonka's factory. You can see what's passing through, but you can't interfere with the flow. Port mirroring works similarly with network traffic:

- **Source Port**: The network connection you want to monitor (like the main pipe)
- **Mirror Port**: The connection where you send the copy (like a small observation pipe)
- **Analyzer**: The tool that examines the copied traffic (like a quality control station)

## Real-World Example at Wonka's

When Charlie's team noticed some unusual delays in the temperature reporting system, they needed to understand what was happening on the network. Here's how they used port mirroring:

| Original Traffic | Mirrored Traffic |
|-----------------|------------------|
| Temperature sensors → Control system | Copy sent to network analyzer |
| Control system → Cooling units | Copy sent to network analyzer |
| Maintenance alerts → Staff phones | Copy sent to network analyzer |

## Basic Port Mirroring Configuration

On a network switch at Wonka's, a basic port mirroring setup looks like this:

```
! Configure port mirroring on a switch
monitor session 1 source interface GigabitEthernet1/0/1
monitor session 1 destination interface GigabitEthernet1/0/24
```

This configuration:
- Watches all traffic received and transmitted on port 1, where the temperature monitoring system is connected (both directions is the default when no direction is specified)
- Creates copies of this traffic
- Sends these copies to port 24 (where the network analyzer is connected)

## Using Port Mirroring in Practice

At Wonka's, port mirroring helped solve several problems:

1. Performance Issues
   - Noticed temperature updates were delayed
   - Used port mirroring to observe network traffic
   - Discovered bandwidth congestion from unrelated backup tasks
   - Rescheduled backups to avoid interference

2. Security Monitoring
   - Created copies of all traffic to the recipe database
   - Analyzed access patterns
   - Detected and blocked unauthorized access attempts

## Common Port Mirroring Mistakes

Charlie's team learned some important lessons about port mirroring:

1. Don't Mirror Too Much Traffic
   - Problem: Like trying to pour too much chocolate through a small pipe
   - Solution: Mirror only the specific ports you need to analyze

2. Watch Your Bandwidth
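- Mirroring both directions of a busy full-duplex port can send the destination port up to twice the source port's line rate (received plus transmitted traffic); copies beyond the destination port's capacity are dropped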
   - Don't send more traffic than your analyzer can handle
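A quick calculation shows why. The Python sketch below adds up received and transmitted traffic on the source ports and compares the total with the destination port's speed. The traffic figures are hypothetical.

```python
def mirror_load_mbps(sources: dict) -> float:
    """Total traffic (Mbps) a SPAN session copies: received + transmitted on every source port."""
    return sum(rx + tx for rx, tx in sources.values())


# Hypothetical peak traffic on the mirrored port, in Mbps as (received, transmitted)
sources = {"GigabitEthernet1/0/1": (600, 550)}
destination_capacity_mbps = 1000  # GigabitEthernet1/0/24 as a 1 Gbps port

load = mirror_load_mbps(sources)
if load > destination_capacity_mbps:
    print(f"{load} Mbps to mirror: oversubscribed, expect dropped copies")
else:
    print(f"{load} Mbps to mirror: fits")
```

Neither direction of the source port exceeds 1 Gbps on its own, but the combined copy does, which is exactly the situation the guideline warns about.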

## Best Practices from Wonka's Experience

Through trial and error, Charlie's team developed these guidelines:

1. Plan Before Mirroring
   - Identify exactly what traffic you need to monitor
   - Ensure your analyzer can handle the traffic volume
   - Document all port mirroring configurations

2. Monitor the Monitoring
   - Check that port mirroring isn't affecting network performance
   - Regularly verify that you're capturing the right traffic
   - Remove port mirroring when no longer needed

## Troubleshooting with Port Mirroring

When Wonka's temperature monitoring system started acting strangely, Charlie's team used this troubleshooting process:

1. Identified the ports carrying temperature monitoring traffic
2. Configured port mirroring to copy this traffic
3. Used a network analyzer to examine the copies
4. Discovered malformed temperature data packets
5. Fixed a software bug in the temperature sensors

Without port mirroring, finding this problem would have been like trying to find a chocolate chip in a vat of vanilla ice cream - nearly impossible!

In our next section, we'll explore network discovery, which helps us understand what devices are connected to our network and how they communicate.

# Network Discovery: Finding What's Connected

**Network discovery** is the process of finding and identifying all devices connected to your network. Think of it like taking inventory in a store - you need to know what you have before you can manage it effectively. In network terms, discovery helps you find every computer, printer, sensor, or other device using your network.

## Types of Network Discovery

Network discovery happens in two main ways:

| Discovery Type | When It Happens | Best Used For |
|---------------|-----------------|---------------|
| Ad hoc | On demand, when needed | Troubleshooting specific issues |
| Scheduled | Regular, automated intervals | Maintaining network inventory |

## Network Discovery at Wonka's

When Charlie first started as network administrator at Wonka's, he had a problem: nobody knew exactly how many devices were connected to the factory network. Some equipment had been installed years ago and was forgotten. Other devices, like wireless temperature sensors, were added without documentation.

### Initial Network Discovery

Charlie's first network-wide discovery found some surprising things:
- An old computer still running chocolate mold designs
- Several unauthorized wireless access points
- Temperature sensors that nobody remembered installing
- Candy-wrapping machines using outdated software

## Basic Discovery Tools

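Network discovery often starts with simple tools. A basic technique is a **ping sweep**: ping every address in a range and note which ones answer.

```
# Ping sweep (Linux shell): one echo request to each address in 192.168.1.0/24
for i in $(seq 1 254); do
  ping -c 1 -W 1 192.168.1.$i > /dev/null && echo "Reply from 192.168.1.$i"
done

# Output might show (device names added from the asset inventory):
Reply from 192.168.1.10    # Tempering Machine 1
Reply from 192.168.1.11    # Tempering Machine 2
Reply from 192.168.1.12    # Tempering Machine 3
Reply from 192.168.1.14    # Office Printer
```

Here 192.168.1.13 does not appear because nothing answered at that address.

This basic test:
- Sends an ICMP echo request (ping) to each address in the network range
- Waits up to one second for each reply
- Shows which addresses are in use
- Gives little device information on its own: names come from DNS or an inventory, and devices whose firewalls block ping will not appear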
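A ping sweep can also be scripted. The Python sketch below walks every host address in a subnet using the standard `ipaddress` module and calls the system `ping` once per address. The flags `-c 1 -W 1` are the Linux (iputils) spelling and are an assumption about your platform. The `probe` parameter lets you swap in another check, or a fake one for testing.

```python
import ipaddress
import subprocess


def ping_once(address: str) -> bool:
    """Send one ICMP echo request with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ping_sweep(network: str, probe=ping_once) -> list:
    """Return the host addresses in `network` that answered the probe."""
    return [str(host) for host in ipaddress.ip_network(network).hosts() if probe(str(host))]
```

Calling `ping_sweep("192.168.1.0/24")` checks all 254 host addresses one at a time, so with a 1-second timeout a quiet subnet can take several minutes. Real discovery tools probe many addresses in parallel.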

## Automated Discovery Process

Modern network discovery tools do much more than just ping devices. At Wonka's, their automated discovery process:

1. Scans the Network
   - Checks all possible network addresses
   - Tests different communication protocols
   - Identifies device types and models

2. Creates an Inventory
   - Records each device's location
   - Notes operating systems and software versions
   - Maps connections between devices

3. Flags Potential Issues
   - Highlights unauthorized devices
   - Identifies outdated software
   - Notes unusual configurations
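Step 3 amounts to comparing what the scan found against what should be there. Here is a Python sketch. It assumes a hypothetical list of approved MAC addresses and minimum firmware versions per model, and all device data is made up.

```python
def version_tuple(version: str) -> tuple:
    """'2.10.1' -> (2, 10, 1), so versions compare numerically rather than as text."""
    return tuple(int(part) for part in version.split("."))


def flag_issues(devices, approved_macs, min_firmware):
    """Return (ip, issue) pairs for unauthorized devices and outdated firmware."""
    issues = []
    for device in devices:
        if device["mac"] not in approved_macs:
            issues.append((device["ip"], "unauthorized device"))
        minimum = min_firmware.get(device["model"])
        if minimum and version_tuple(device["firmware"]) < version_tuple(minimum):
            issues.append((device["ip"], f"outdated firmware {device['firmware']} (minimum {minimum})"))
    return issues


discovered = [
    {"ip": "192.168.1.10", "mac": "00:1a:2b:3c:4d:10", "model": "Tempering-X", "firmware": "2.10.1"},
    {"ip": "192.168.1.60", "mac": "00:1a:2b:3c:4d:60", "model": "Wrapper-9", "firmware": "1.4.0"},
    {"ip": "192.168.1.77", "mac": "de:ad:be:ef:00:77", "model": "AP-Home", "firmware": "5.0"},
]
approved = {"00:1a:2b:3c:4d:10", "00:1a:2b:3c:4d:60"}
minimums = {"Tempering-X": "2.9.0", "Wrapper-9": "1.6.0"}

for ip, issue in flag_issues(discovered, approved, minimums):
    print(ip, issue)
```

Comparing version tuples matters here: as plain text, "2.10.1" would sort before "2.9.0" and the up-to-date tempering machine would be wrongly flagged.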

## Real-World Discovery Example

One day, Wonka's quality control system reported inconsistent chocolate thickness. Charlie's team used ad hoc network discovery to investigate:

```
Discovery Results for Production Line 3:
- Main Controller (192.168.1.50)
  ├─ Thickness Sensor 1 (192.168.1.51) - OK
  ├─ Thickness Sensor 2 (192.168.1.52) - OK
  ├─ Thickness Sensor 3 (192.168.1.53) - Not Responding
  └─ Thickness Sensor 4 (192.168.1.54) - OK
```

This discovery quickly showed that one sensor wasn't responding, allowing maintenance to fix the problem before more chocolate was wasted.

## Scheduled Discovery

Charlie set up regular network discovery to run every night at 2 AM when production was lowest. This helps:
- Maintain an accurate network inventory
- Detect new or missing devices
- Track changes over time
- Ensure security policies are followed

## Learning from Discovery Results

Each network discovery at Wonka's teaches something new:

1. Missing Devices
   - If a device stops appearing in discovery
   - Might indicate equipment failure
   - Could signal unauthorized removal

2. New Devices
   - Unexpected new devices need investigation
   - Might be legitimate new equipment
   - Could indicate security problems

3. Changed Configurations
   - Devices with new settings
   - Might show unauthorized changes
   - Could indicate needed updates
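Each of these lessons comes from comparing tonight's scan with the previous one. The minimal Python sketch below assumes each scan is stored as a dictionary keyed by IP address, with values holding that device's settings (a hypothetical format chosen for illustration). Set operations on the dictionary keys then find missing, new, and changed devices.

```python
import ipaddress


def compare_scans(previous: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Report devices that disappeared, appeared, or changed settings between two scans."""
    by_ip = ipaddress.ip_address  # sort key: numeric order, not string order
    missing = sorted(previous.keys() - current.keys(), key=by_ip)
    new = sorted(current.keys() - previous.keys(), key=by_ip)
    changed = sorted(
        (ip for ip in previous.keys() & current.keys() if previous[ip] != current[ip]),
        key=by_ip,
    )
    return {"missing": missing, "new": new, "changed": changed}


# Hypothetical scan snapshots for illustration.
last_night = {
    "192.168.1.10": {"name": "Tempering Machine 1", "firmware": "4.2"},
    "192.168.1.14": {"name": "Office Printer", "firmware": "1.0"},
    "192.168.1.53": {"name": "Thickness Sensor 3", "firmware": "2.1"},
}
tonight = {
    "192.168.1.10": {"name": "Tempering Machine 1", "firmware": "4.3"},
    "192.168.1.14": {"name": "Office Printer", "firmware": "1.0"},
    "192.168.1.77": {"name": "Unknown", "firmware": "unknown"},
}
print(compare_scans(last_night, tonight))
# {'missing': ['192.168.1.53'], 'new': ['192.168.1.77'], 'changed': ['192.168.1.10']}
```

Sorting with `ipaddress.ip_address` as the key keeps addresses in numeric order, so `192.168.1.9` comes before `192.168.1.10`.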

## Best Practices for Network Discovery

Through experience, Charlie's team developed these guidelines:

1. Regular Schedule
   - Run full discovery nightly
   - Quick scans during shift changes
   - Detailed audits monthly

2. Documentation
   - Keep discovery logs
   - Update network diagrams
   - Maintain device inventory
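One lightweight way to keep discovery logs is the JSON Lines convention: one JSON object per scan, appended to a text file that is easy to search and to compare between nights. The sketch below is a hypothetical example; the `append_log` and `read_log` helpers and the record fields are assumptions for illustration, not a standard format used by discovery tools.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_log(path: Path, live_ips: list[str], when: Optional[datetime] = None) -> None:
    """Append one scan result to `path` as a single JSON line."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "timestamp": when.isoformat(),
        "device_count": len(live_ips),
        "devices": live_ips,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_log(path: Path) -> list[dict]:
    """Load every logged scan, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Usage: log two nightly scans to a temporary file, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "discovery.jsonl"
    append_log(log_path, ["192.168.1.10", "192.168.1.14"],
               datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc))
    append_log(log_path, ["192.168.1.10"],
               datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc))
    entries = read_log(log_path)

for entry in entries:
    print(entry["timestamp"], entry["device_count"])
# 2024-05-01T02:00:00+00:00 2
# 2024-05-02T02:00:00+00:00 1
```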

In our next section, we'll look at traffic analysis, which helps us understand how these discovered devices are actually using the network.

# @title
%%html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Wonka's Network Discovery Tool</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 1200px;
            margin: 20px auto;
            padding: 20px;
        }
        .container {
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 20px;
        }
        .panel {
            border: 1px solid #ccc;
            padding: 15px;
            margin-bottom: 20px;
            border-radius: 5px;
        }
        .device {
            margin: 5px 0;
            padding: 10px;
            border: 1px solid #eee;
            border-radius: 3px;
        }
        .online { background-color: #e6ffe6; }
        .offline { background-color: #ffe6e6; }
        .new { background-color: #e6e6ff; }
        .controls {
            margin-bottom: 20px;
        }
        button {
            margin: 5px;
            padding: 8px 16px;
            cursor: pointer;
        }
        .loading {
            color: #666;
            font-style: italic;
        }
        .tree-view {
            font-family: monospace;
        }
        .status-dot {
            display: inline-block;
            width: 10px;
            height: 10px;
            border-radius: 50%;
            margin-right: 5px;
        }
        .status-dot.up { background-color: #00ff00; }
        .status-dot.down { background-color: #ff0000; }
    </style>
</head>
<body>
    <h1>Wonka's Network Discovery Tool</h1>
    <div class="controls">
        <button onclick="startAdHocDiscovery()">Run Ad Hoc Discovery</button>
        <button onclick="toggleScheduledDiscovery()">Start/Stop Scheduled Discovery</button>
        <button onclick="addRandomDevice()">Simulate New Device</button>
        <button onclick="toggleRandomDevice()">Toggle Random Device</button>
    </div>

    <div class="container">
        <div class="panel">
            <h2>Network Devices</h2>
            <div id="deviceList"></div>
        </div>
        <div class="panel">
            <h2>Discovery Log</h2>
            <div id="discoveryLog"></div>
        </div>
    </div>

    <script>
        let devices = [
            { id: 1, name: "Tempering Machine 1", ip: "192.168.1.10", type: "Production", status: "online" },
            { id: 2, name: "Tempering Machine 2", ip: "192.168.1.11", type: "Production", status: "online" },
            { id: 3, name: "Tempering Machine 3", ip: "192.168.1.12", type: "Production", status: "online" },
            { id: 4, name: "Office Printer", ip: "192.168.1.14", type: "Office", status: "online" },
            { id: 5, name: "Security Camera 1", ip: "192.168.1.20", type: "Security", status: "online" },
            { id: 6, name: "Temperature Sensor 1", ip: "192.168.1.30", type: "Sensor", status: "online" }
        ];

        let discoveryInterval;
        let discoveryRunning = false;

        function logDiscovery(message) {
            const log = document.getElementById('discoveryLog');
            const timestamp = new Date().toLocaleTimeString();
            const entry = document.createElement('div');
            entry.textContent = `${timestamp}: ${message}`;
            log.insertBefore(entry, log.firstChild);
            if (log.children.length > 20) log.removeChild(log.lastChild);
        }

        function displayDevices() {
            const list = document.getElementById('deviceList');
            list.innerHTML = '';

            // Group devices by type
            const groupedDevices = devices.reduce((acc, device) => {
                acc[device.type] = acc[device.type] || [];
                acc[device.type].push(device);
                return acc;
            }, {});

            // Create tree view: the last entry at each level gets the closing "└─" branch
            const groups = Object.entries(groupedDevices);
            groups.forEach(([type, typeDevices], groupIndex) => {
                const typeDiv = document.createElement('div');
                const groupBranch = groupIndex === groups.length - 1 ? '└─' : '├─';
                typeDiv.innerHTML = `<strong>${groupBranch} ${type}</strong>`;
                list.appendChild(typeDiv);

                typeDevices.forEach((device, deviceIndex) => {
                    const deviceDiv = document.createElement('div');
                    deviceDiv.style.marginLeft = '20px';
                    deviceDiv.className = `device ${device.status}`;
                    const statusDot = `<span class="status-dot ${device.status === 'online' ? 'up' : 'down'}"></span>`;
                    const branch = deviceIndex === typeDevices.length - 1 ? '└─' : '├─';
                    deviceDiv.innerHTML = `${branch} ${statusDot}${device.name} (${device.ip})`;
                    list.appendChild(deviceDiv);
                });
            });
        }

        async function startAdHocDiscovery() {
            logDiscovery("Starting ad hoc network discovery...");

            for (let device of devices) {
                await new Promise(resolve => setTimeout(resolve, 500));
                logDiscovery(`Scanning ${device.ip} (${device.name})... ${device.status}`);
            }

            logDiscovery("Ad hoc discovery complete.");
            displayDevices();
        }

        function toggleScheduledDiscovery() {
            if (discoveryRunning) {
                clearInterval(discoveryInterval);
                discoveryRunning = false;
                logDiscovery("Scheduled discovery stopped");
            } else {
                discoveryRunning = true;
                logDiscovery("Scheduled discovery started (runs every 10 seconds)");
                discoveryInterval = setInterval(() => {
                    logDiscovery("Running scheduled discovery scan...");
                    devices.forEach(device => {
                        if (Math.random() > 0.8) {
                            const newStatus = device.status === 'online' ? 'offline' : 'online';
                            device.status = newStatus;
                            logDiscovery(`Status change: ${device.name} is now ${newStatus}`);
                        }
                    });
                    displayDevices();
                }, 10000);
            }
        }

        function addRandomDevice() {
            const id = devices.length + 1;
            const types = ['Sensor', 'Production', 'Security', 'Office'];
            const type = types[Math.floor(Math.random() * types.length)];
            const newDevice = {
                id,
                name: `${type} Device ${id}`,
                ip: `192.168.1.${50 + id}`,
                type,
                status: 'online'
            };
            devices.push(newDevice);
            logDiscovery(`New device detected: ${newDevice.name} (${newDevice.ip})`);
            displayDevices();
        }

        function toggleRandomDevice() {
            const index = Math.floor(Math.random() * devices.length);
            devices[index].status = devices[index].status === 'online' ? 'offline' : 'online';
            logDiscovery(`Status change: ${devices[index].name} is now ${devices[index].status}`);
            displayDevices();
        }

        // Initial display
        displayDevices();
    </script>
</body>
</html>