
Analyzing Performance (Documenting Progress, Results, etc.) #5150

@jdrueckert

Description


Assumptions

  • probable culprits:
    • chunk generation
    • lighting
  • we use multithreading
  • we don't support dynamic thread scaling
  • chunk generation phases:
    • Base terrain generation (noise)
    • Terrain Additions (facets?)
    • Chunk Mesh generation
    • Chunk Lighting calculation

Next Steps

  • clarify chunk generation phase assumption using code
  • measure duration of chunk generation phases
  • confirm which threads are "ours"
  • find out which threads are used for what
  • visualize chunk generation flow
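For the "measure duration of chunk generation phases" step, a minimal sketch of per-phase wall-clock timing could look like the following. The `PhaseTimer` class and the `Phase` values are hypothetical and only mirror the assumed phases listed above. Real instrumentation would wrap the engine's actual generation calls, which this sketch does not know about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/**
 * Hypothetical helper that accumulates wall-clock time per chunk generation phase.
 * Safe to call from several worker threads at once.
 */
public class PhaseTimer {
    // Assumed phases, mirroring the list above; not engine identifiers.
    enum Phase { BASE_TERRAIN, TERRAIN_ADDITIONS, MESH_GENERATION, LIGHTING }

    private final Map<Phase, LongAdder> totalNanos = new ConcurrentHashMap<>();
    private final Map<Phase, LongAdder> calls = new ConcurrentHashMap<>();

    /** Runs the work and adds its elapsed time to the phase, even if the work throws. */
    public <T> T time(Phase phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.computeIfAbsent(phase, p -> new LongAdder()).add(elapsed);
            calls.computeIfAbsent(phase, p -> new LongAdder()).increment();
        }
    }

    public long callCount(Phase phase) {
        LongAdder adder = calls.get(phase);
        return adder == null ? 0 : adder.sum();
    }

    /** Mean duration per call in milliseconds, or 0 if the phase never ran. */
    public double meanMillis(Phase phase) {
        long n = callCount(phase);
        if (n == 0) {
            return 0.0;
        }
        return totalNanos.get(phase).sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 3; i++) {
            // Stand-in workload; real instrumentation would wrap the actual phase call.
            timer.time(Phase.BASE_TERRAIN, () -> {
                double acc = 0;
                for (int j = 0; j < 100_000; j++) {
                    acc += Math.sin(j);
                }
                return acc;
            });
        }
        for (Phase phase : Phase.values()) {
            System.out.printf("%-17s %d calls, %.3f ms/call%n",
                    phase, timer.callCount(phase), timer.meanMillis(phase));
        }
    }
}
```

Because the timer only records durations after the work finishes, it does not serialize the worker threads it measures.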

Collected Insights

Past Insights

Related Issues

Concurrency Providers & Consumers

[Diagram: current state of concurrency providers & consumers, as created by @skaldarnar for the Reactor effort]

Time-consuming tasks

(as compiled by @DarkWeird):
Most of the time is spent on:

  1. generating/loading chunks
  2. the Exposure node
  3. the shadow map
  4. NUI

Chunks take up most of the memory, but there is almost no room to shrink them. Bytewise operations take a lot of time, and any object structure (such as an octree) takes so much memory that the current implementation is optimal. Java modules might allow more aggressive optimization if we hide it in a separate module (or they might not). An octree could also become more memory-efficient once Java implements compact object headers. (Or we could implement chunks in Rust.)
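To illustrate why a flat array is hard to beat on memory, here is a sketch of array-backed chunk storage with one 16-bit block id per position. The 32x64x32 dimensions and the `FlatChunkStorage` class are assumptions for illustration, not the engine's actual chunk implementation.

```java
/** Hypothetical flat chunk storage: one short block id per position, no per-block objects. */
public class FlatChunkStorage {
    // Assumed chunk dimensions (illustrative; check the engine's own chunk constants).
    static final int SIZE_X = 32;
    static final int SIZE_Y = 64;
    static final int SIZE_Z = 32;

    private final short[] blocks = new short[SIZE_X * SIZE_Y * SIZE_Z];

    /** Maps (x, y, z) to a single array index; x varies fastest, then z, then y. */
    static int index(int x, int y, int z) {
        return x + SIZE_X * (z + SIZE_Z * y);
    }

    short get(int x, int y, int z) {
        return blocks[index(x, y, z)];
    }

    void set(int x, int y, int z, short id) {
        blocks[index(x, y, z)] = id;
    }

    /** Size of the block payload in bytes, ignoring the array's own header. */
    static long payloadBytes() {
        return (long) SIZE_X * SIZE_Y * SIZE_Z * Short.BYTES;
    }

    public static void main(String[] args) {
        FlatChunkStorage chunk = new FlatChunkStorage();
        chunk.set(1, 2, 3, (short) 7);
        System.out.println(chunk.get(1, 2, 3));
        System.out.println(payloadBytes());
    }
}
```

With these assumed dimensions the payload is 65,536 blocks x 2 bytes = 128 KiB per chunk, with no per-block object overhead. An octree instead pays an object header plus child references for every node, which is the overhead that compact object headers would reduce.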

Most frequently called methods

(as compiled by @BenjaminAmos via JFR)

I don't know if this is right, but a quick JFR recording seems to indicate void org.terasology.core.world.generator.facetProviders.DensityNoiseProvider.process(GeneratingRegion, float) as appearing in an awful lot of samples (24% of them). That doesn't necessarily mean that it's a bottleneck, though (sampling does not measure execution time).
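Percentages like the one above come from counting how often a method shows up in execution samples. A minimal sketch of counting the top-most frame of each `jdk.ExecutionSample` event with the JDK's `jdk.jfr.consumer` API follows. The `TopFrameCounter` class is hypothetical, and tools such as JDK Mission Control may count differently (for example, by also attributing samples to callers).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: which methods are on top of the stack in JFR execution samples? */
public class TopFrameCounter {

    /** Counts the top-most frame of every jdk.ExecutionSample event in a .jfr file. */
    static Map<String, Integer> countTopFrames(Path recording) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        for (RecordedEvent event : RecordingFile.readAllEvents(recording)) {
            if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                continue;
            }
            RecordedStackTrace stack = event.getStackTrace();
            if (stack == null || stack.getFrames().isEmpty()) {
                continue;
            }
            // Frame 0 is the method that was executing when the sample was taken.
            RecordedFrame top = stack.getFrames().get(0);
            RecordedMethod method = top.getMethod();
            counts.merge(method.getType().getName() + "." + method.getName(), 1, Integer::sum);
        }
        return counts;
    }

    /** Converts counts into percentages of all samples, highest first. */
    static Map<String, Double> percentages(Map<String, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        if (total == 0) {
            return result;
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        for (Map.Entry<String, Integer> e : sorted) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TopFrameCounter <recording.jfr>");
            return;
        }
        int shown = 0;
        for (Map.Entry<String, Double> e : percentages(countTopFrames(Path.of(args[0]))).entrySet()) {
            if (shown++ == 10) {
                break; // print the ten most frequent top frames only
            }
            System.out.printf("%6.2f%%  %s%n", e.getValue(), e.getKey());
        }
    }
}
```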

Interestingly, on the slow server recording the most frequently sampled methods were:

  • com.google.common.collect.MapMakerInternalMap$HashIterator.nextInTable()
  • java.lang.invoke.VarHandleObjects$Array.getVolatile(VarHandleObjects$Array, Object, int)

The HashIterator method was generally (indirectly) called from:

  • void org.terasology.engine.rendering.logic.LightFadeSystem.update(float)
  • void org.terasology.engine.logic.behavior.BehaviorSystem.update(float)
  • void org.terasology.engine.logic.characters.CharacterSystem.update(float)
  • void org.terasology.engine.logic.common.lifespan.LifespanSystem.update(float)
  • void org.terasology.engine.logic.behavior.CollectiveBehaviorSystem.update(float)

In fact, those same systems are the callers of both frequently sampled methods.
Inside those methods, the stack generally goes:

  • boolean com.google.common.collect.Iterators$ConcatenatedIterator.hasNext()
  • boolean java.util.Spliterators$1Adapter.hasNext()
  • boolean java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Consumer)

This implicates java.util.stream again (a known big performance issue in general).

DensityNoiseProvider is not as big an issue on that machine, though: it accounts for only 1.52% of samples.
Could this be related to either of the following?

    public final Iterable<EntityRef> getEntitiesWith(Class<? extends Component>... componentClasses) {
        // Concatenate matches from the global pool, the current world pool (if separate) and the sector manager
        if (isWorldPoolGlobalPool()) {
            return Iterables.concat(globalPool.getEntitiesWith(componentClasses),
                    sectorManager.getEntitiesWith(componentClasses));
        }
        return Iterables.concat(globalPool.getEntitiesWith(componentClasses),
                getCurrentWorldPool().getEntitiesWith(componentClasses),
                sectorManager.getEntitiesWith(componentClasses));
    }

or

    public final Iterable<EntityRef> getEntitiesWith(Class<? extends Component>... componentClasses) {
        return () -> entityStore.keySet().stream()
                // Keep entities that have all of the required components
                .filter(id -> {
                    for (Class<? extends Component> component : componentClasses) {
                        if (componentStore.get(id, component) == null) {
                            return false;
                        }
                    }
                    return true;
                })
                .map(id -> getEntity(id))
                .iterator();
    }

The second one does use Java streams.
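One way to check whether the stream adapter layers in the stack above matter is to compare the stream-based query shape with a hand-written lazy iterator. The sketch below uses a hypothetical, simplified `EntityQuerySketch` store (plain `Long` ids and arbitrary objects standing in for components) rather than the engine's `EntityRef`/`Component` types. It only shows that both variants return the same matches; it does not show that the iterator version is faster in the engine, and it does not touch the Guava map iteration that was also sampled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/**
 * Hypothetical, simplified entity store: entity id -> (component type -> component).
 * Compares a stream-backed query with a hand-written lazy iterator.
 */
public class EntityQuerySketch {
    private final Map<Long, Map<Class<?>, Object>> store = new HashMap<>();

    void add(long id, Object... components) {
        Map<Class<?>, Object> byType = new HashMap<>();
        for (Object component : components) {
            byType.put(component.getClass(), component);
        }
        store.put(id, byType);
    }

    private boolean hasAll(long id, Class<?>... required) {
        Map<Class<?>, Object> byType = store.get(id);
        for (Class<?> type : required) {
            if (byType.get(type) == null) {
                return false;
            }
        }
        return true;
    }

    /** Same shape as the stream-based snippet: filter, then adapt the stream to an Iterator. */
    Iterable<Long> withStream(Class<?>... required) {
        return () -> store.keySet().stream().filter(id -> hasAll(id, required)).iterator();
    }

    /** Same matches without a stream pipeline: scans ahead only as far as the next match. */
    Iterable<Long> withIterator(Class<?>... required) {
        return () -> new Iterator<Long>() {
            private final Iterator<Long> ids = store.keySet().iterator();
            private Long pending; // next matching id, or null if none found yet

            @Override
            public boolean hasNext() {
                while (pending == null && ids.hasNext()) {
                    Long id = ids.next();
                    if (hasAll(id, required)) {
                        pending = id;
                    }
                }
                return pending != null;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long current = pending;
                pending = null;
                return current;
            }
        };
    }

    static List<Long> toList(Iterable<Long> ids) {
        List<Long> out = new ArrayList<>();
        for (Long id : ids) {
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        EntityQuerySketch entities = new EntityQuerySketch();
        entities.add(1L, "name", 42);   // String and Integer "components"
        entities.add(2L, "name");       // String only
        entities.add(3L, 7, "other");   // Integer and String
        System.out.println(toList(entities.withStream(String.class, Integer.class)));
        System.out.println(toList(entities.withIterator(String.class, Integer.class)));
    }
}
```

Both variants stay lazy; the difference is only how many wrapper objects and virtual calls sit between the caller's `hasNext()` and the underlying key iterator, which is what a profiler comparison would need to confirm.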

Multi-Threading

@BenjaminAmos found the following threads in a JFR recording (threads marked with '*' are assumed to be "ours"):

C1
C2
*Chunk-Processing-0
*Chunk-Processing-Reactor
*Chunk-Unloader-0
*Chunk-Unloader-1
*Chunk-Unloader-2
*Chunk-Unloader-3
Common-Cleaner
FileSystemWatchService
FileSystemWatchService
Finalizer
G1
Java2D
JFR
JFR
JFR
JFR:
Logging-Cleaner
*main
nioEventLoopGroup-2-1
nioEventLoopGroup-3-1
nioEventLoopGroup-3-2
nioEventLoopGroup-3-3
Reference
*Saving-0
Service
Signal
SIGTERM
StreamCloser
Sweeper
*Thread
*Thread-1
*Thread-2
VM
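To check at runtime which threads exist and which ones match the starred names, a sketch like the following lists live threads and flags known name prefixes. The prefix list is an assumption taken from the starred entries above. Default-named threads such as `Thread-1` cannot be attributed by name alone, so the sketch also includes a hypothetical `namedFactory` helper that gives pool threads descriptive names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Lists live JVM threads and marks the ones whose names look like engine threads. */
public class ThreadCensus {
    // Assumed prefixes, taken from the starred JFR thread names; not an authoritative list.
    static final List<String> ENGINE_PREFIXES =
            List.of("Chunk-Processing", "Chunk-Unloader", "Saving", "main");

    static boolean isEngineThread(String name) {
        for (String prefix : ENGINE_PREFIXES) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical helper: a ThreadFactory that names threads "<prefix>-0", "<prefix>-1", ... */
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> new Thread(runnable, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) {
        List<Thread> threads = new ArrayList<>(Thread.getAllStackTraces().keySet());
        threads.sort(Comparator.comparing(Thread::getName));
        for (Thread thread : threads) {
            String marker = isEngineThread(thread.getName()) ? "*" : " ";
            System.out.printf("%s %-32s %-13s daemon=%b%n",
                    marker, thread.getName(), thread.getState(), thread.isDaemon());
        }
    }
}
```

Passing such a factory to `Executors.newFixedThreadPool(n, factory)` would make that pool's threads appear under descriptive names in JFR instead of as `Thread-N`.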

Code Areas with Longest Per-Call Durations

Based on the statistical info in https://benjaminamos.github.io/TerasologyPerformanceTracyView/tracy-profiler.html

TODO: Refactor the individual code areas to improve their performance and reduce their per-call run time.
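Ranking by mean time per call (total time divided by call count) can surface different code areas than ranking by total time. A small sketch of that ranking follows; `PerCallRanking` and the zone names and numbers in `main` are made up for illustration and are not taken from the Tracy data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Ranks profiler zones by mean duration per call instead of by total time. */
public class PerCallRanking {
    static final class ZoneStats {
        final String name;
        final long calls;
        final double totalMillis;

        ZoneStats(String name, long calls, double totalMillis) {
            this.name = name;
            this.calls = calls;
            this.totalMillis = totalMillis;
        }

        /** Mean time per call; zones that never ran count as 0. */
        double meanMillis() {
            return calls == 0 ? 0.0 : totalMillis / calls;
        }
    }

    static List<ZoneStats> byMeanDescending(List<ZoneStats> zones) {
        List<ZoneStats> sorted = new ArrayList<>(zones);
        sorted.sort(Comparator.comparingDouble(ZoneStats::meanMillis).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up numbers: a cheap zone called often and an expensive zone called rarely.
        List<ZoneStats> zones = List.of(
                new ZoneStats("frequentCheapZone", 1000, 50.0),
                new ZoneStats("rareExpensiveZone", 2, 30.0));
        // rareExpensiveZone is listed first despite its lower total time.
        for (ZoneStats zone : byMeanDescending(zones)) {
            System.out.printf("%-18s %5d calls %8.3f ms/call %8.1f ms total%n",
                    zone.name, zone.calls, zone.meanMillis(), zone.totalMillis);
        }
    }
}
```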

References

Reactor Effort:

Potentially Helpful Tooling

  • Java Flight Recording
  • in-game Performance Monitor (F3 to open, F4 to cycle through individual tools)
    • Means Mode
    • Spikes Mode
    • Memory Allocations Mode
    • Running Threads Mode
    • World Renderer Mode
    • Rendering Execution Mode
  • Enable the Monitoring option in Settings->Autoconfig->System Settings to show this information in a separate window (requires restarting the game). The Performance tab will only function when the in-game performance monitor is open (F3) and F4 has been pressed once.

Information Sources

Performance-related issues:

Tooling-related issues:

Follow-Up Actions

  • improve documentation of in-game debug/analytics tooling

Metadata

Labels

  • Category: Doc
  • Category: Performance
  • Revive: Keep
  • Size: L
  • Status: Needs Discussion
  • Status: Needs Investigation
  • Topic: Architecture
  • Topic: Concurrency
  • Topic: Stabilization
  • Topic: WorldGen
  • Type: Question
