Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix verbose log output of freePhysicalMemory on 31-bit z/OS #424

Closed
fjeremic opened this issue Oct 23, 2017 · 12 comments
Closed

Fix verbose log output of freePhysicalMemory on 31-bit z/OS #424

fjeremic opened this issue Oct 23, 2017 · 12 comments

Comments

@fjeremic
Copy link
Contributor

Currently on 31-bit z/OS the verbose log printout of -Xjit:verbose={compile*} says that freePhysicalMemory=1125899906842624 KB which is definitely false. There seems to be either a bug in the printout of the wrong value is returned by on of the APIs. If we are unable to retrieve this value we should not be printing it at all rather than misleading the user with an incorrect value.

@fjeremic
Copy link
Contributor Author

@dsouzai / @mpirvu FYI.

@mpirvu
Copy link
Contributor

mpirvu commented Oct 24, 2017

1125899906842624 in hex is 0x4,0000,0000,0000
The value is actually shifted left 10 digits before being printed, so (ignoring the bits that were lost through shifting) we are trying to print 0x1000,0000,0000,0000.
Indeed, on my windows system, if I force the return value from computeFreePhysicalMemory() to be 0x1000000000000000 then the printed value is 1125899906842624. However, I cannot explain how that value can be produced.

What we return is _cachedFreePhysicalMemoryB which is a int64_t value. This field is initialized to 0 and the first time computeFreePhysicalMemory() is called it gets set to some meaningful value. If the call to j9sysinfo_get_memory_info fails, then _cachedFreePhysicalMemoryB becomes -1 and we should print -1 even after shifting right.
If the call succeeds, then we compute

uint64_t freePhysicalMemorySizeB = memInfo.availPhysical + buffered + cached + availSwap;
 _cachedFreePhysicalMemoryB = freePhysicalMemorySizeB > LONG_MAX ? LONG_MAX : freePhysicalMemorySizeB;

I am unsure about the intent of the statement that limits _cachedFreePhysicalMemoryB to LONG_MAX. On my system LONG_MAX is 2,147,483,647 equivalent to 2GB, but in practice we could easily have physical memory larger than that.
Also, I don't think we should include availSwap in the physical memory calculation.

To sum up, I propose the following changes:

  1. _cachedFreePhysicalMemoryB should be of type uint64_t because all the fields in J9MemoryInfo are of this type
  2. availSwap should not be part of the physical address calculation
  3. we should eliminate the test for LONG_MAX
  4. In case we cannot read the physical memory from OS we should cache OMRPORT_MEMINFO_NOT_AVAILABLE into cachedFreePhysicalMemoryB . We should also update lastUpdateTime so that we don't try to read the physical memory over and over just to fail every time.
  5. Printing specifier should be changed to %llu (this may not be available on all platforms, but if %lld worked I see no reason why %llu would not work).
  6. Since computeFreePhysicalMemory() also informs us about incompleteInfo maybe we shouldn't print the reported physical memory when this is true. I don't feel strong about this one though.
    However, we shoudn't print anything if the value to be printed is OMRPORT_MEMINFO_NOT_AVAILABLE

@mpirvu
Copy link
Contributor

mpirvu commented Oct 24, 2017

Another suggestion: we should print in MB (not KB) since it's easier to read.

@mpirvu
Copy link
Contributor

mpirvu commented Jan 16, 2018

According to the investigation related to PR #909 the output coming from j9sysinfo_get_memory_info() for memInfo.cached looks bogus: 0xffffffff3c5c4000.
This is not error code and it is too big to be real. I am assuming this is a bug/limitation in the portlib implementation for zOS.

@mpirvu mpirvu added bug and removed comp:jit labels Jan 16, 2018
@pdbain-ibm
Copy link
Contributor

I will investigate.

@pdbain-ibm
Copy link
Contributor

J9MemoryInfo uses uint64_t to store values.
These are populated via retrieveZOSMemoryStats():
memInfo->totalPhysical = ((uint64_t)rcep->rcepool * J9BYTES_PER_PAGE);
memInfo->availPhysical = ((uint64_t)rcep->rceafc * J9BYTES_PER_PAGE);
memInfo->totalSwap = ((uint64_t)asmvtp->asmslots * J9BYTES_PER_PAGE);
memInfo->availSwap = ((uint64_t)(asmvtp->asmslots - (asmvtp->asmvsc + asmvtp->asmnvsc

@mpirvu
Copy link
Contributor

mpirvu commented Jan 16, 2018

The problem is related to the cached memory, so the relevant code is

	rc = getZFSUserCacheUsed(&userCacheUsed);
	if (0 == rc) {
		memInfo->cached = userCacheUsed;
	}

@pdbain-ibm
Copy link
Contributor

runtime/compiler/trj9/control/CompilationThread.cpp
Two issues:

  1. int64_t TR::CompilationInfo::computeFreePhysicalMemory()
    _cachedFreePhysicalMemoryB = freePhysicalMemorySizeB > LONG_MAX ? LONG_MAX : freePhysicalMemorySizeB;
    LONG_MAX evaluates to 2 GB on 31-bit z/OS, so we cap it at an unreasonably low value. Since we return a 64-bit number, there is no reason to cap the value.

  2.      TR_VerboseLog::writeLineLocked(
         TR_Vlog_COMPSTART,
         "(%s%s) Compiling %s %s %s j9m=%p t=%u compThread=%d memLimit=%d KB freePhysicalMemory=%lld KB",
         compilationTypeString,
         compiler->getHotnessName(compiler->getMethodHotness()),
         compiler->signature(),
         compiler->isDLT() ? "DLT" : "",
         details.name(),
         method,
         _compInfo.getPersistentInfo()->getElapsedTime(),
         compiler->getCompThreadID(),
         scratchSegmentProvider.allocationLimit() >> 10,
         _compInfo.computeFreePhysicalMemory(incomplete) >> 10
         );
    

The format specifiers for elapsed time and memLimit don't match the actual values.

The old result was:

(warm) Compiling java/lang/Math.powerOfTwoD(I)D OrdinaryMethod j9m=4B9DDD28 t=0 compThread=192 memLimit=0 KB freePhysicalMemory=1125899906842624 KB

Fixing them gives:

Compiling java/lang/Math.powerOfTwoD(I)D OrdinaryMethod j9m=4B9DDC28 t=184 compThread=0 memLimit=262144 KB freePhysicalMemory=21911196 KB**

@harryyu1994
Copy link
Contributor

I will add in the fix, the first two issues mentioned was caught, missed the memLimit one.

@mpirvu
Copy link
Contributor

mpirvu commented Jan 18, 2018

@pdbain-ibm The fixes you are proposing are awaiting to be integrated in PR #909. While working on it we discovered that the value for memInfo.cached in particular looks bad. It's a very large value, that when added to the value for physical memory results in an overflow. The outcome is that we print a value that looks reasonable, but it's bogus in reality.

@harryyu1994
Copy link
Contributor

memInfo.availPhysical=54154899456 Bytes 0xc9be23000 in Hex
memInfo.cached=18446744070427262976 Bytes 0xffffffff3c5c4000 in Hex
freePhysicalMemory=50872610816 Bytes; After Shift by 20 = 48515 MB

Since freePhysicalMemory = memInfo.availPhysical + memInfo.cached,
it's probably just a wrap around value that looks reasonable.

harryyu1994 added a commit to harryyu1994/openj9 that referenced this issue Jan 18, 2018
Made the following changes:
- Change _cachedFreePhysicalMemoryB from int64_t to uint64_t because all the fields in J9MemoryInfo are of this type
- Remove availSwap from physical memory calculation
- Remove the test for LONG_MAX because in practice we could easily have physical memory larger than that (2GB)
- Change freePhysicalMemory=%lld to freePhysicalMemory=%llu
- Print physical memory report only if it is complete
- Change t=%u to t=%llu
- Change memLimit=%d to memLimit=%zu

Issue: eclipse-openj9#424
Signed-off-by: Harry Yu <harryyu1994@gmail.com>
@dsouzai
Copy link
Contributor

dsouzai commented Feb 16, 2018

I think this can be closed since #909 is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants