Issue 6989 - create a better string representation of Tid #2482

ghost · 2014-09-02T19:14:59Z

Original discussion here: #1582

The basic premise is that we will let Tid's expose the Thread that is receiving messages, without actually exposing the MessageBox itself. There exists a short window after spawn()/spawnLinked() where Tid.receiveThread is null (the thread hasn't yet set it in _spawn()).

I am very new to the Phobos codebase, my apologies if I am grossly breaking the process. Two areas I know this change is weak and needs help:

Tid.toString() should be changed to not allocate.
A proper unit test. I've got a quick tester program below that seems to work for me.

import core.thread;
import std.concurrency;
import std.stdio;

void spawnTest(int threadNum) {
    stdout.writefln("%d Thread.toHash(): 0x%x Tid: %s", threadNum, Thread.getThis.toHash(), thisTid);
}

void main(string [] args) {
    stdout.writefln("Current thread: 0x%x", Thread.getThis.toHash());
    Thread [] threads;
    for (auto i = 0; i < 100; i++) {
    auto child = spawn(&spawnTest, i);
    stdout.writefln("[main] Tid: %s", child);
    while (child.receiveThread is null) {
        Thread.sleep(dur!("msecs")(1));
    }
    threads ~= child.receiveThread;
    }
    foreach (t; threads) {
    t.join(false);
    }
    stdout.writeln("Done");
}

quickfur · 2014-09-02T20:42:56Z

std/concurrency.d

Don't use this overload if you want to avoid the GC; use this instead:

void toString(scope void delegate(const(char)[]) sink)

Instead of constructing a string, use sink.put(...) to append data to the output. If you're using formattedWrite, you can pass sink in place of the writer.

quickfur · 2014-09-02T20:45:24Z

Also, please follow the Phobos coding style guide for all code submissions: http://dlang.org/dstyle.html

Specifically, don't use "Egyptian braces"; instead, put the opening { on its own line.

schuetzm · 2014-09-03T19:33:13Z

std/concurrency.d

Please use 4 spaces instead of tabs. (Same elsewhere.)

quickfur · 2014-09-04T17:24:29Z

As far as a unittest is concerned, you can probably just compare the result of format("%s", tid) with the result of formatting the current thread's ID with the known format of the output. Something along these lines:

// Note: untested pseudocode
unittest
{
    auto tid = ... /* get current thread's Tid */;
    assert(format("%s", tid) == format("Tid(0x%x)", ... /* current thread ID */));
}

Note that unittests have access to private members within the module, so you don't have to worry about adding accessors to obtain the information you need to generate the expected formatting of Tid.

quickfur · 2014-09-04T17:26:12Z

Oh, and the unittest should probably also test for the Tid(NOT STARTED) case as well.

mihails-strasuns · 2014-09-06T17:13:55Z

There is one major issue with the concept of this PR : Tid is not supposed to always represent thread ID. Once #1910 is merged it can also be a fiber and overall idea is that it can be anything - even a process ID or remote server endpoint ID. So while some sort of identity string representation is necessary it can't attached to thread identity.

DmitryOlshansky · 2014-09-17T16:44:46Z

A trivial way out of this problem is to print TId with some sort of schema a-la URIs.

thread:<thread-id>
fiber:<fiber-ref>

These could be followed by @hostname to identify remote processes. I'm not proposing the above as is but surely something simple, sensible, and general enough can be worked out.

@complexmath What do you think?

complexmath · 2014-09-17T19:35:58Z

std/concurrency.d

Doesn't importing std.format implicitly pull in the better part of Phobos? Can we not do this?

Yes, please move the import to function body

complexmath · 2014-09-17T19:50:07Z

The big question for me is whether we want to try and have this identifier remain unique for the entire length of the program, or if we can accept them only being unique for as long as a thread is alive. Like it would be easy enough to use the address of the MessageBox plus some adornment in the latter case.

The other question I have is whether this is just for logging or if we want these to actually be passed around and used as a means to send a message to a thread. Erlang works this way, for example, but I think it tends to be problematic, particularly for long-running programs.

Ultimately, what I'd like is for threads that are intended to be referenced by name to be done via a registry. Then what we're really doing is referencing a thread that we know will handle some specific request and we don't actually care if it's the same thread we talked to last time or if it's a different one. I created register() for this purpose, and when std.concurrency supports IPC there will be a registry host process to serve the same function. What such a registry would return is more a bunch of info for how to contact a thread rather than a pretty identifier, as the needed info might include an IP address, port, protocol, registered thread name, etc.

So... for plain old in-process thread identifiers, I think we need to establish just what the use case is. I'd prefer to go the easy route and use something like the MessageBox address, but could probably be convinced to create an atomic integer so every spawned thread gets its own "unique" id, though efficiently dealing will rollover in this case is tricky.

schuetzm · 2014-09-18T08:34:52Z

A registry is the way to go for named tasks, I agree. There is already std.concurrency.register() for that purpose, although it might not be sufficient when remote tasks are involved, don't know. If someone want's to pass around a task reference, they should just use the Tid.

DmitryOlshansky · 2014-09-18T13:14:11Z

std/concurrency.d

AFAIK hash of a pointer is not unique nor meaningful.

mihails-strasuns · 2014-09-18T14:07:22Z

I think non-unique thread/fiber ID is good enough initially. Right now it is most useful for debugging purposes and introduction of cross-process messages is likely to warrant more complicated naming scheme anyway. Let's do KISS for now.

yebblies · 2014-09-18T15:02:59Z

How about document that the exact contents of the string may change and it is only for debugging purposes?

dnadlinger · 2014-09-18T18:00:57Z

@yebblies: Yep, we definitely need something along the lines of that anyway.

mihails-strasuns · 2014-09-19T08:18:29Z

Absolutely. It still needs to be practically useful though :)

mihails-strasuns · 2014-10-05T16:32:09Z

This has popped up recently again in context of logging.
@complexmath do you have any opinion about most pragmaitical approach to implement right now? I guess you are the person in position to decide here.

mihails-strasuns · 2014-10-11T03:18:41Z

@klamonte do you have time to move forward with this?

DmitryOlshansky · 2014-10-13T22:07:18Z

@klamonte please avoid merge commits - 55 changed files is certainly misrepresentation of this pull's changes.

And a hint - github doesn't notify contributors of any changes to commits of a pull, only comments.
So don't hesitate to ping reviewers after updating the pull ;)

ghost · 2014-10-14T01:06:46Z

@DmitryOlshansky Sorry about that, thanks for the tips! Is there anything I can do at this stage to reduce the clutter from the upstream merge?

@Dicebot How does this look? (Sorry about the clutter from the upstream merge.)

mihails-strasuns · 2014-10-14T02:47:59Z

@klamonte please do this to get rid of merge commit:

# if don't have upstream set up already:
# git remote add upstream https://github.com/D-Programming-Language/phobos.git
git fetch upstream
git rebase upstream/master
git push -f origin master

Also I recommend to create dedicated branches for pull requests if you ever will want to create another one ;)

ghost · 2014-10-14T09:54:51Z

@DmitryOlshansky @Dicebot How does this look?

mihails-strasuns · 2014-10-15T03:21:09Z

Thanks, commit history is much better.

On actual changes - right now I am convinced that tracking actual thread is impractical and some sort of message box pointer representation is good enough. Ideally this needs @complexmath judgement call but he seems a bit busy lately.

DmitryOlshansky · 2014-10-19T19:10:51Z

On actual changes - right now I am convinced that tracking actual thread is impractical and some sort of message box pointer representation is good enough. Ideally this needs @complexmath judgement call but he seems a bit busy lately.

Good idea.

@klamonte - one way to make sure history is clutter free is to always use:

pull --rebase upstream master

before pushing new changes.

That would however require forced push (push -f) that simply overwrites history on remote.
So be careful about what you force push if you work from many devices ;)

complexmath · 2014-10-20T18:09:13Z

I'm in favor of using the MessageBox pointer. It's already there and is guaranteed to be unique for the lifetime of a thread. Just format it in a way that makes sense, say a hex number without the leading "0X". Also, do we need a toString call? It would be nice if we could simply return the id as a size_t or something and let the caller format it.

ghost · 2014-10-30T22:20:35Z

@complexmath Doh, I used its hash rather than address, but is this closer to what you have in mind?

I'd be fine with letting the caller have an id in lieu of a string, but would also like formattedWrite(writer, "%s", someTid) to emit that id. Can that be done with just a change in std.concurrency?

DmitryOlshansky · 2014-10-31T19:43:32Z

@klamonte Hash is NOT unique. Just use the pointer.

Also formattedWrite(writer, "%s", someTid) should pick up toString of Tid.

mihails-strasuns · 2014-11-02T13:52:06Z

I was thinking about possibility of using some simple perfect hashing for a pointer to ensure that no one will be tempted to do cast(MessageBox*) tid - is it worth the headache?

complexmath · 2014-11-02T17:32:21Z

The hash of a pointer is probably just the pointer itself anyway. Either way, there's no point in the extra computation.

@Dicebot I don't think so. The id we return is just an identifier. If someone exploits knowledge of where this identifier comes from it's their problem if their code breaks or behaves erratically, say if we change how the id is computed.

DmitryOlshansky · 2014-11-03T20:24:10Z

Well XOR-masking a pointer or subtracting some big value is a 1:1 mapping that's cheap and would undermine blind use of casts.

mihails-strasuns · 2014-11-04T14:51:46Z

Unrelated tester failure?

DmitryOlshansky · 2014-11-27T02:53:09Z

Seems to fell out of sight. Just rebase and let's pull it?

quickfur · 2014-11-27T18:28:03Z

I agree, let's pull this.

quickfur · 2014-11-28T19:10:27Z

Superseded by: #2772

ghost mentioned this pull request Sep 2, 2014

Issue 6989 - Create a better string representation of Tid's for easier thread identification and debugging. #1582

Closed

quickfur reviewed Sep 2, 2014
View reviewed changes

schuetzm reviewed Sep 3, 2014
View reviewed changes

std/concurrency.d Outdated

Copy link

Contributor

schuetzm Sep 3, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use 4 spaces instead of tabs. (Same elsewhere.)

complexmath reviewed Sep 17, 2014
View reviewed changes

DmitryOlshansky reviewed Sep 18, 2014
View reviewed changes

std/concurrency.d Outdated

Copy link

Member

DmitryOlshansky Sep 18, 2014

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK hash of a pointer is not unique nor meaningful.

Kevin Lamonte added 3 commits October 13, 2014 22:56

#6989 - attempt 1 on Tid.toString()

d68e4da

code formatting, use sink

dc863b3

update docs, style, use thread:id scheme

a6f503e

use MessageBox hash, remove running thread

482608c

use mbox address

bf51ee8

moved import

b0edb88

quickfur mentioned this pull request Nov 28, 2014

Rebase of PR 2482: Issue 6989 - create a better string representation of Tid #2772

Merged

quickfur closed this Nov 28, 2014

Uh oh!

Issue 6989 - create a better string representation of Tid #2482

Issue 6989 - create a better string representation of Tid #2482

Uh oh!

Conversation

ghost commented Sep 2, 2014

Uh oh!

quickfur Sep 2, 2014

Choose a reason for hiding this comment

Uh oh!

quickfur commented Sep 2, 2014

Uh oh!

schuetzm Sep 3, 2014

Choose a reason for hiding this comment

Uh oh!

quickfur commented Sep 4, 2014

Uh oh!

quickfur commented Sep 4, 2014

Uh oh!

mihails-strasuns commented Sep 6, 2014

Uh oh!

DmitryOlshansky commented Sep 17, 2014

Uh oh!

complexmath Sep 17, 2014

Choose a reason for hiding this comment

Uh oh!

mihails-strasuns Nov 4, 2014

Choose a reason for hiding this comment

Uh oh!

complexmath commented Sep 17, 2014

Uh oh!

schuetzm commented Sep 18, 2014

Uh oh!

DmitryOlshansky Sep 18, 2014

Choose a reason for hiding this comment

Uh oh!

mihails-strasuns commented Sep 18, 2014

Uh oh!

yebblies commented Sep 18, 2014

Uh oh!

dnadlinger commented Sep 18, 2014

Uh oh!

mihails-strasuns commented Sep 19, 2014

Uh oh!

mihails-strasuns commented Oct 5, 2014

Uh oh!

mihails-strasuns commented Oct 11, 2014

Uh oh!

DmitryOlshansky commented Oct 13, 2014

Uh oh!

ghost commented Oct 14, 2014

Uh oh!

mihails-strasuns commented Oct 14, 2014

Uh oh!

ghost commented Oct 14, 2014

Uh oh!

mihails-strasuns commented Oct 15, 2014

Uh oh!

DmitryOlshansky commented Oct 19, 2014

Uh oh!

complexmath commented Oct 20, 2014

Uh oh!

ghost commented Oct 30, 2014

Uh oh!

DmitryOlshansky commented Oct 31, 2014

Uh oh!

mihails-strasuns commented Nov 2, 2014

Uh oh!

complexmath commented Nov 2, 2014

Uh oh!

DmitryOlshansky commented Nov 3, 2014

Uh oh!

mihails-strasuns commented Nov 4, 2014

Uh oh!

DmitryOlshansky commented Nov 27, 2014

Uh oh!

quickfur commented Nov 27, 2014

Uh oh!

quickfur commented Nov 28, 2014

Uh oh!