Discrete event simulation of a cluster of machines with .NET Core async/await
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Fdb
Runtime
.gitignore
LICENSE.txt
SimMach.sln
readme.md
screenshot.png

readme.md

sim-cluster

This is work-in-progress on running event-driven distributed systems inside discrete event simulation.

The purpose of this simulation research - to be able to run a distributed application inside a deterministic simulation while bombarding it with various faults that are hard to reproduce in the real world (but are still disruptive for the production systems).

This project builds upon the SymAsync project, extending it with more features (including a simplified networking stack).

If you want to discuss this project, don't hesitate to write me an email at rinat@abdullin.

Try it out

This is a .NET Core 2.0. You should be able to open it in a IDE (e.g. in JetBrains Rider) and run Runtime/SimMach.csproj project.

The output should be something like this:

screenshot.png

Alternatively, you could try launching everything from the CLI with something like:

$ dotnet run --project Runtime

Details

Sim-cluster builds up on the previous work:

  1. SimCPU - simulate CPU job scheduler (easier than it sounds);
  2. SimRing - simulate ring benchmark;
  3. SimAsync - plug into .NET Core async/await to simulate processes running in parallel;
  4. SimCluster - this.

This project introduces:

  • Simplified simulation of TCP/IP. This includes connection handshake, SEQ/ACK numbers and reorder buffers. There is now proper shutdown sequence and no packet re-transmissions.
  • Durable node storage in form of per-machine folders used by the LMDB database.
  • Configurable system topology - machines, services and network connections.
  • Simulation plans that specify how we want to run the simulated topology. This includes a graceful chaos monkey.
  • Simulating power outages by erasing future for the affected systems.
  • Network profiles - ability to configure latency, packet loss ratio and logging per network connection.

Dive in

To dive in take a look at the Program.cs. It generates a simulation scenario that is then executed.

A scenario could look like this:

public static ScenarioDef InventoryMoverBotOver3GConnection() {
    var test = new ScenarioDef();
    // define network connections and provide network profiles for them
    test.Connect("botnet", "public", NetworkProfile.Mobile3G);
    test.Connect("public", "internal", NetworkProfile.AzureIntranet);
    // install services on the machines
    test.AddService("cl.internal", InstallCommitLog);
    test.AddService("api1.public", InstallBackend("cl.internal"));
    test.AddService("api2.public", InstallBackend("cl.internal"));
    // configure a bot that will create workload and verify results 
    var mover = new InventoryMoverBot {
        Servers = new []{"api1.public", "api2.public"},
        RingSize = 7,
        Iterations = 30,
        Delay = 4.Sec(),
        HaltOnCompletion = true
    };
    
    test.AddBot(mover);
    
    // define a plan for the simulation (who will control the machines)
    // this is optional, but a chaos monkey is cute...
    var monkey = new GracefulChaosMonkey {
        ApplyToMachines = s => s.StartsWith("api"),
        DelayBetweenStrikes = r => r.Next(5,10).Sec()
    };
    test.Plan = monkey.Run;
    return test;
}

Installer functions bring together the necessary dependencies and return an instance of IEngine:

static Func<IEnv, IEngine> InstallBackend(string cl) {
    return env => {
        var client = new CommitLogClient(env, cl + ":443");
        return new BackendServer(env, 443, client);
    };
}
static IEngine InstallCommitLog(IEnv env) {
    return new CommitLogServer(env, 443);
}

BackendServer is a simplistic event-driven server that has its own projection thread and a (command) request handler. It commits data to the CommitLog from which other server instances could get the same data.

In theory, the same business logic should be able to run in the real world environment as well. I didn't get to that part, yet.

Licenses

This project is licensed under MIT license and uses: