Skip to content

Conversation

@steveloughran
Copy link
Contributor

This patch adds a new subclass, MonotonicClock, and switches ExecutorAllocationManager to using it (it used to use System.nanoTime()).

A lot of the code which uses SystemClock would appear to benefit from moving to a monotonic clock —but that would a more significant piece of work, needing understanding of what's happening. Furthermore, some of the uses are of fields which may also be converted to human-formatted time; the nanotime clock shouldn't be used that way.

@SparkQA
Copy link

SparkQA commented Sep 15, 2015

Test build #42485 has finished for PR 8766 at commit a71af05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ordering

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's interesting: clearly IntelliJ idea makes ordering decisions within an import line than between them. Fixed to alpha order —and something to look out for in future.

@vanzin
Copy link
Contributor

vanzin commented Sep 15, 2015

Mostly style nits, but there seems to be a real bug in there. Otherwise, looks good.

…is patch adds a new subclass, MonotonicClock, and switches ExecutorAllocationManager to using it (it used to use System.nanoTime())
@steveloughran steveloughran force-pushed the stevel/feature/SPARK-10614-monotonic-time branch from a71af05 to 10ea2f3 Compare September 16, 2015 12:30
@SparkQA
Copy link

SparkQA commented Sep 16, 2015

Test build #42535 has finished for PR 8766 at commit 10ea2f3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Copy link
Contributor Author

test failure is org.apache.spark.streaming.CheckpointSuite.recovery with invertible reduceByKeyAndWindow operation; appears unrelated.

@vanzin
Copy link
Contributor

vanzin commented Sep 17, 2015

retest this please

@steveloughran
Copy link
Contributor Author

Marcelo, I'm not sure I trust nanoTime any more: I did some reading on the topic and don't think code can be confident it works (i.e. is monotonic and consistent) on multi-socket systems or VMs:

http://steveloughran.blogspot.co.uk/2015/09/time-on-multi-core-multi-socket-servers.html

if you agree, there's no regression in the code, and we could close this as a WONTFIX.

@vanzin
Copy link
Contributor

vanzin commented Sep 17, 2015

This is what my clock_gettime(3) man page says:

Historical note for SMP systems
   Before Linux added kernel support for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, glibc implemented these clocks  on  many  plat‐
   forms  using  timer registers from the CPUs (TSC on i386, AR.ITC on Itanium).  These registers may differ between CPUs and as a consequence
   these clocks may return bogus results if a process is migrated to another CPU.

   If the CPUs in an SMP system have different clock sources, then there is no way to maintain a correlation between the timer registers since
   each  CPU  will run at a slightly different frequency.  If that is the case, then clock_getcpuclockid(0) will return ENOENT to signify this
   condition.  The two clocks will then be useful only if it can be ensured that a process stays on a certain CPU.

   The processors in an SMP system do not start all at exactly the same time and therefore the timer registers are  typically  running  at  an
   offset.   Some architectures include code that attempts to limit these offsets on bootup.  However, the code cannot guarantee to accurately
   tune the offsets.  Glibc contains no provisions to deal with these offsets (unlike the Linux Kernel).  Typically these  offsets  are  small
   and therefore the effects may be negligible in most cases.

   Since  glibc 2.4, the wrapper functions for the system calls described in this page avoid the abovementioned problems by employing the ker‐
   nel implementation of CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, on systems that provide such  an  implementation  (i.e.,  Linux
   2.6.12 and later).

That, and some comments on the links that you posted, lead me to believe that System.nanoTime is, in the worst case, no worse than System.currentTimeMillis. But, caveat emptor, since the above man page only applies to Linux. java.util.concurrent seems to use nanoTime, so we already rely on it more than we might expect.

I haven't read the code in ExecutorAllocationManager in detail to see how it would behave when System.currentTimeMillis goes backwards. It seems to me that at least that case would not happen with nanoTime, assuming the conditions listed in the man page exist.

@SparkQA
Copy link

SparkQA commented Sep 17, 2015

Test build #42610 has finished for PR 8766 at commit 10ea2f3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

Actually, why introduce another clock here, why not just change SystemClock to use System.nanoTime? I think System.currentTimeMillis is known to be not monotonic, and usually switching to nanoTime fixes it. As @vanzin mentioned it's no worse than before, so I think we might as well just do it.

@steveloughran
Copy link
Contributor Author

If people are happy with that, I'll do that

@andrewor14
Copy link
Contributor

Yes let's just do that. There's no point in introducing another system clock and call it something else. Let's just fix the existing one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants