@obastemur
Collaborator

Brings a ~4% perf improvement [ http-load test, measured on xplat ]

  • Disables RecyclerWatsonTelemetry for ChakraCore
    [ reduces the number of calls to the system clock API, <= ~1.5% ]
  • Uses RDTSC for GetTickCount
    [ affects only xplat, <= ~3% ]

inline size_t rdtsc()
{
uint32_t H, L;
__asm volatile ("rdtsc":"=a"(L), "=d"(H));
Collaborator

@ThomsonTan Oct 19, 2016

Is this for the cross-platform implementation? I remember the keyword having two more underscores as a suffix. Also, is it fine to specify two variables on x86?

Collaborator Author

Yes, it is for xplat, and it is perfectly fine.
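For readers following the two-operand question, here is a minimal sketch of a TSC read with GCC/Clang extended inline assembly. It is not the PR's code: the name read_tsc and the std::chrono fallback for non-x86 targets are assumptions added for illustration. RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX, which is why two output operands are used.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper, not the function from this PR.
// Reads the CPU time-stamp counter on x86/x86-64 with GCC or Clang.
inline uint64_t read_tsc()
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    uint32_t lo, hi;
    // "=a" binds EAX (low half), "=d" binds EDX (high half).
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    // Widen before shifting so the high half is not lost.
    return (static_cast<uint64_t>(hi) << 32) | lo;
#else
    // Assumed fallback for other targets: a monotonic clock reading.
    return static_cast<uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Returning a 64-bit value on every target avoids the wrap-around that a 32-bit return would hit after a few seconds on a GHz-class counter.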

@obastemur force-pushed the perf branch 3 times, most recently from b1a818c to 3ca2038, October 20, 2016 15:58
@obastemur
Collaborator Author

@dotnet-bot test Ubuntu ubuntu_linux_debug_static

@obastemur
Collaborator Author

@dotnet-bot test Ubuntu ubuntu_linux_release_static

@obastemur
Collaborator Author

@dotnet-bot test Ubuntu ubuntu_linux_debug_static

#ifdef HEAP_ENUMERATION_VALIDATION
,pfPostHeapEnumScanCallback(nullptr)
#endif
#ifndef TARGET_CHAKRACORE

Please use existing #ifdef NTBUILD instead of introducing a new one.

Collaborator Author

I will, but NTBUILD does not sound like it is for the ChakraFull build. It reminds me of WinNT instead.

I agree... (and I actually think it is WinNT).

#ifdef _X86_
return L;
#else
return ((ULONGLONG)H << 32) | L;

(ULONGLONG)H

nit: H is already of type ULONGLONG, so why the cast?

Collaborator Author

That was supposed to be uint32 instead, but the PAL internals use ULONGLONG. I will update this one.

return (end - start) / usec;
}

static ULONGLONG cpu_speed = CPUFreq() * 1e3; // 1000 + 1e6 => ns to ms

1000 + 1e6

nit: Is the comment 1000 + ... meant to be ... * 1000?

Collaborator Author

No, 1e3 means 1000. The comment is explaining the nanosec -> millisec conversion.

I meant that the "+" character in the comment looked like an arithmetic operator, which confused me.
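Since the unit conversion caused some confusion in this thread, the arithmetic can be sketched in isolation. This is an illustration under stated assumptions, not the PR's code: it assumes the frequency routine returns ticks per microsecond (as the visible `return (end - start) / usec;` line suggests), and all function names here are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the unit arithmetic only.

// Ticks counted over a measured interval, divided by the interval length
// in microseconds, gives ticks per microsecond.
inline uint64_t ticks_per_usec(uint64_t start, uint64_t end, uint64_t elapsed_usec)
{
    return elapsed_usec ? (end - start) / elapsed_usec : 0;
}

// One millisecond is 1000 microseconds, so multiply by 1000 (1e3).
inline uint64_t ticks_per_msec(uint64_t per_usec)
{
    return per_usec * 1000;
}

// Dividing a raw tick count by ticks-per-millisecond yields milliseconds.
// A zero rate means calibration failed; the caller should use another clock.
inline uint64_t ticks_to_msec(uint64_t ticks, uint64_t per_msec)
{
    return per_msec ? ticks / per_msec : 0;
}
```

As a worked case, a 3 GHz counter advances 3000 ticks per microsecond, so 3,000,000 ticks per millisecond, and a raw reading of 6,000,000,000 ticks corresponds to 2000 ms.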

{
return rdtsc() / cpu_speed;
}
GetTickCount64FallbackCB getTickCount64FallbackCB = cpu_speed ? FastTickCount : GetTickCount64Fallback;

GetTickCount64FallbackCB getTickCount64FallbackCB

nit: "static" GetTickCount64FallbackCB getTickCount64FallbackCB?
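The pattern under review here, choosing a tick-count routine once through a function pointer and keeping that pointer file-local, might look roughly like the sketch below. Every name is hypothetical and the two routines are stand-ins that return fixed values. The real code would call the TSC path and the system-clock path.

```cpp
#include <cstdint>

// Hypothetical alias for a tick-count routine.
using TickCountFn = uint64_t (*)();

// Stand-ins for the fast (TSC-based) and fallback (system clock) paths.
static uint64_t FastTicks()     { return 1; }
static uint64_t FallbackTicks() { return 2; }

// Pick the fast path only when calibration produced a non-zero rate.
inline TickCountFn SelectTickCount(uint64_t ticksPerMsec)
{
    return ticksPerMsec ? FastTicks : FallbackTicks;
}

// "static" gives the pointer internal linkage, so it is not visible
// to other translation units. Zero here simulates failed calibration.
static TickCountFn g_tickCount = SelectTickCount(0);
```

Marking the pointer static keeps the symbol out of the global namespace, which is the point of the reviewer's nit.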

@obastemur
Collaborator Author

@dotnet-bot test Ubuntu ubuntu_linux_debug_static

@obastemur
Collaborator Author

@dotnet-bot test Ubuntu ubuntu_linux_release_static

@obastemur
Collaborator Author

@jianchun @ThomsonTan Thanks for the reviews

@chakrabot merged commit fe917d0 into chakra-core:master Oct 25, 2016
chakrabot pushed a commit that referenced this pull request Oct 25, 2016
…C for GetTickCount

Merge pull request #1780 from obastemur:perf

@obastemur deleted the perf branch November 15, 2016 13:19