CRITICAL: TOCTOU race condition in cache checks causes NullPointerException #58

@sfloess

Severity: CRITICAL

Files:

  • NexusClassSource.java (lines 119-120)
  • MavenRepositoryClassSource.java (lines 70-72)

Problem

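Both classes have a Time-Of-Check-Time-Of-Use (TOCTOU) race condition in their cache access logic: the cache is checked with containsKey() and then read with a separate get(). If the entry is removed between the two calls, the method returns null, and the caller later fails with a NullPointerException.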

Bug Details

NexusClassSource.java lines 119-120

String cachedKey = packagePath;
if (jarCache.containsKey(cachedKey)) {
    return jarCache.get(cachedKey);  // ← Can return null!
}

MavenRepositoryClassSource.java lines 70-72

String cacheKey = className;
if (classCache.containsKey(cacheKey)) {
    return classCache.get(cacheKey);  // ← Can return null!
}

Root Cause

TOCTOU race condition:

  1. Thread A: containsKey(key) → returns true
  2. Thread B: remove(key) → removes entry
  3. Thread A: get(key) → returns null because entry was removed

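ConcurrentHashMap makes each individual operation thread-safe, but a containsKey() call followed by a get() call is not atomic as a pair. Another thread can modify the map in the gap between them.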
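
The interleaving above can be replayed deterministically on one thread. This is a minimal sketch, not code from the repository: ToctouDemo and racyLookup are hypothetical names, and the Runnable stands in for whatever Thread B does in the gap between the check and the read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical demo class: replays the check-then-act interleaving without real threads. */
final class ToctouDemo {

    /**
     * The racy pattern from the issue. 'betweenCheckAndUse' simulates another
     * thread being scheduled after containsKey() and before get().
     */
    static byte[] racyLookup(Map<String, byte[]> cache, String key, Runnable betweenCheckAndUse) {
        if (cache.containsKey(key)) {      // time of check
            betweenCheckAndUse.run();      // "Thread B" runs here
            return cache.get(key);         // time of use: null if the entry was removed
        }
        return null;                       // cache miss
    }

    public static void main(String[] args) {
        Map<String, byte[]> cache = new ConcurrentHashMap<>();
        cache.put("com/example", new byte[] {1, 2, 3});

        // Simulate Thread B removing the entry inside the gap.
        byte[] result = racyLookup(cache, "com/example", () -> cache.remove("com/example"));
        System.out.println("racy lookup returned null: " + (result == null)); // prints true
    }
}
```

The map being a ConcurrentHashMap makes no difference here. The null comes from the gap between the two calls, not from any unsafe access inside either call.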

Exploitation Scenario

// Thread 1: loading a class
byte[] data = nexusSource.loadClassData("com.example.MyClass");
//   Inside loadClassData():
//   1. containsKey() returns true
//   2. Context switch: Thread 2 clears the cache
//   3. get() returns null
//   4. null is returned to the caller

// Thread 1 continues with data == null:
Class<?> clazz = defineClass(name, data, 0, data.length);  // ← NPE!

Stack trace:

java.lang.NullPointerException: Cannot read the array length because "classData" is null
    at java.lang.ClassLoader.defineClass(ClassLoader.java:XXX)
    at org.flossware.jclassloader.JClassLoader.findClassInternal(JClassLoader.java:XXX)

Impact

Symptoms:

  • Random NullPointerExceptions in production
  • Only happens under load with concurrent class loading
  • Hard to reproduce in testing
  • Intermittent failures that look like "ghosts"

Affected scenarios:

  1. Multiple threads loading classes concurrently
  2. Cache eviction or clearing during class loading
  3. Long-running applications with many ClassLoaders

Proof of Failure

@Test
public void testConcurrentCacheAccess() throws Exception {
    NexusClassSource source = new NexusClassSource(nexusUrl, "repo");
    
    // Pre-load class into cache
    byte[] original = source.loadClassData("com.example.Test");
    
    ExecutorService executor = Executors.newFixedThreadPool(2);
    
    CountDownLatch latch = new CountDownLatch(2);
    AtomicReference<byte[]> result = new AtomicReference<>();
    
    // Thread 1: Load from cache
    executor.submit(() -> {
        try {
            latch.countDown();
            latch.await();  // Synchronize start
            result.set(source.loadClassData("com.example.Test"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
    
    // Thread 2: Clear cache at the same time
    executor.submit(() -> {
        try {
            latch.countDown();
            latch.await();  // Synchronize start  
            // Assumes jarCache is visible to the test; removing the specific key works too
            source.jarCache.clear();
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
    
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
    
    // Result is null only if clear() lands between containsKey() and get(),
    // so this assertion fails intermittently, not on every run.
    assertNotNull("Should not return null", result.get());  // ← can fail
}

Fix for NexusClassSource

private byte[] loadFromMaven(String className) throws IOException {
    String packagePath = getPackagePath(className);
    if (packagePath == null) {
        throw new IOException("Cannot determine Maven coordinates for class: " + className);
    }

    String cachedKey = packagePath;
    
    // ATOMIC: Single operation, no TOCTOU
    byte[] cachedData = jarCache.get(cachedKey);
    if (cachedData != null) {
        return cachedData;
    }

    String simpleClassName = getSimpleClassName(className);
    String classFileInJar = ClassNameUtil.toClassFilePath(className);

    byte[] classData = searchInJars(packagePath, classFileInJar);
    if (classData != null) {
        jarCache.put(cachedKey, classData);
        return classData;
    }

    throw new IOException("Class not found in Nexus Maven repository: " + className);
}

Fix for MavenRepositoryClassSource

@Override
public byte[] loadClassData(String className) throws IOException {
    String cacheKey = className;
    
    // ATOMIC: Single operation, no TOCTOU
    byte[] cachedData = classCache.get(cacheKey);
    if (cachedData != null) {
        return cachedData;
    }

    String classFileName = ClassNameUtil.toClassFilePath(className);

    for (MavenArtifact artifact : artifacts) {
        try {
            String jarUrl = buildJarUrl(artifact);
            byte[] classData = extractClassFromJar(jarUrl, classFileName);
            classCache.put(cacheKey, classData);
            return classData;
        } catch (IOException e) {
            // Continue to next artifact if class not found in this one
        }
    }

    throw new IOException("Class not found in any configured Maven artifacts: " + className);
}
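
The fixes above remove the null return, but two threads that miss at the same time will both fetch and both put. That is harmless if the data is identical. If a single load per key is wanted, computeIfAbsent() is one option. The sketch below is an assumption-laden illustration, not the project's code: AtomicLoadCache and its Loader interface are hypothetical stand-ins for the real sources and their JAR lookup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical cache wrapper: check and load happen as one atomic map operation. */
final class AtomicLoadCache {

    /** Hypothetical stand-in for the real lookup (for example, searching JARs). */
    interface Loader {
        byte[] load(String key) throws IOException;
    }

    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Loader loader;

    AtomicLoadCache(Loader loader) {
        this.loader = loader;
    }

    byte[] get(String key) throws IOException {
        try {
            // The mapping function runs at most once per absent key.
            return cache.computeIfAbsent(key, k -> {
                try {
                    byte[] data = loader.load(k);
                    if (data == null) {
                        throw new IOException("Class not found: " + k);
                    }
                    return data;
                } catch (IOException e) {
                    // Lambdas passed to computeIfAbsent cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // unwrap back to the original IOException
        }
    }
}
```

Trade-off: ConcurrentHashMap runs the mapping function while other updates to nearby keys may be blocked, and its documentation asks for the computation to be short. A slow network fetch inside computeIfAbsent() can therefore stall other writers, so the plain get() then put() fix may still be the better choice for these classes.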

Key Principle

Never split a cache lookup into containsKey() followed by get(). Read once with get() and test the result.

// WRONG - TOCTOU race:
if (map.containsKey(key)) {
    return map.get(key);  // Can return null!
}

// CORRECT - Atomic:
Value value = map.get(key);
if (value != null) {
    return value;
}

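Note: ConcurrentHashMap does not permit null keys or values; put() throws NullPointerException for either. So get() returning null always means the key has no mapping at that moment. The ambiguity between "key doesn't exist" and "key exists but value is null" applies to HashMap, not to ConcurrentHashMap.

In these classes the caches can never hold null values, so a single get() followed by a null check is safe to use.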
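
A quick way to confirm how each map type treats null values, which decides whether get() == null is ambiguous. NullValueCheck is a hypothetical helper written for this issue, not part of the codebase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: probes whether a map implementation accepts null values. */
final class NullValueCheck {

    /** Returns true if the map stores a null value, false if put() rejects it. */
    static boolean acceptsNullValue(Map<String, byte[]> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap: " + acceptsNullValue(new HashMap<>()));                     // true
        System.out.println("ConcurrentHashMap: " + acceptsNullValue(new ConcurrentHashMap<>())); // false
    }
}
```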

Required Actions

  1. Replace containsKey() + get() with single get() call in NexusClassSource
  2. Replace containsKey() + get() with single get() call in MavenRepositoryClassSource
  3. Add concurrency test to verify fix
  4. Audit ALL cache access patterns in codebase for similar issues
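
For action 3, a stress test along these lines could check the fixed pattern. This is a self-contained sketch under stated assumptions: CacheStress is a hypothetical stand-in for the real sources, and its load() uses a constant array in place of the remote fetch. It counts null results while reader threads race a thread that keeps clearing the cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a class source that uses the fixed single-get() pattern. */
final class CacheStress {
    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();

    /** Fixed pattern: one get(), reload on miss. */
    byte[] load(String key) {
        byte[] cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        byte[] data = new byte[] {1, 2, 3}; // stand-in for the remote fetch
        cache.put(key, data);
        return data;
    }

    void clear() {
        cache.clear();
    }

    /** Races 'readers' loading threads against one clearing thread; returns the number of null results. */
    static int countNulls(int readers, int iterations) {
        CacheStress source = new CacheStress();
        AtomicInteger nulls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(readers + 1);
        try {
            for (int r = 0; r < readers; r++) {
                pool.execute(() -> {
                    awaitQuietly(start);
                    for (int i = 0; i < iterations; i++) {
                        if (source.load("com.example.Test") == null) {
                            nulls.incrementAndGet();
                        }
                    }
                });
            }
            pool.execute(() -> {
                awaitQuietly(start);
                for (int i = 0; i < iterations; i++) {
                    source.clear();
                }
            });
            start.countDown(); // release all threads together
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stress run timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return nulls.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the single-get() pattern, countNulls(4, 10_000) returns 0 on every run, because a miss always falls through to a reload. The same harness pointed at the containsKey() + get() variant would report nulls only occasionally, so a passing run there proves little; many iterations and repeated runs are needed to catch it.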

Related Bug Patterns

Search codebase for:

grep -rn "containsKey" --include="*.java" src/

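For each hit, check whether it is followed by a get() on the same key. Every such pair is a potential TOCTOU bug if the map is shared between threads and entries can be removed or cleared.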

References

  • CWE-367: Time-of-check Time-of-use (TOCTOU) Race Condition
  • Java Concurrency in Practice, Section 5.2: Concurrent Collections

This is a CRITICAL concurrency bug that causes production failures under load.

Labels: bug (Something isn't working)
