Skip to content

Conversation

ghost
Copy link

@ghost ghost commented Sep 2, 2014

Original discussion here: #1582

The basic premise is that we will let Tid's expose the Thread that is receiving messages, without actually exposing the MessageBox itself. There exists a short window after spawn()/spawnLinked() where Tid.receiveThread is null (the thread hasn't yet set it in _spawn()).

I am very new to the Phobos codebase, my apologies if I am grossly breaking the process. Two areas I know this change is weak and needs help:

  • Tid.toString() should be changed to not allocate.
  • A proper unit test. I've got a quick tester program below that seems to work for me.
import core.thread;
import std.concurrency;
import std.stdio;

void spawnTest(int threadNum) {
    stdout.writefln("%d Thread.toHash(): 0x%x Tid: %s", threadNum, Thread.getThis.toHash(), thisTid);
}

void main(string [] args) {
    stdout.writefln("Current thread: 0x%x", Thread.getThis.toHash());
    Thread [] threads;
    for (auto i = 0; i < 100; i++) {
    auto child = spawn(&spawnTest, i);
    stdout.writefln("[main] Tid: %s", child);
    while (child.receiveThread is null) {
        Thread.sleep(dur!("msecs")(1));
    }
    threads ~= child.receiveThread;
    }
    foreach (t; threads) {
    t.join(false);
    }
    stdout.writeln("Done");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use this overload if you want to avoid the GC; use this instead:

void toString(scope void delegate(const(char)[]) sink)

Instead of constructing a string, use sink.put(...) to append data to the output. If you're using formattedWrite, you can pass sink in place of the writer.

@quickfur
Copy link
Member

quickfur commented Sep 2, 2014

Also, please follow the Phobos coding style guide for all code submissions: http://dlang.org/dstyle.html

Specifically, don't use "Egyptian braces"; instead, put the opening { on its own line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use 4 spaces instead of tabs. (Same elsewhere.)

@quickfur
Copy link
Member

quickfur commented Sep 4, 2014

As far as a unittest is concerned, you can probably just compare the result of format("%s", tid) with the result of formatting the current thread's ID with the known format of the output. Something along these lines:

// Note: untested pseudocode
unittest
{
    auto tid = ... /* get current thread's Tid */;
    assert(format("%s", tid) == format("Tid(0x%x)", ... /* current thread ID */));
}

Note that unittests have access to private members within the module, so you don't have to worry about adding accessors to obtain the information you need to generate the expected formatting of Tid.

@quickfur
Copy link
Member

quickfur commented Sep 4, 2014

Oh, and the unittest should probably also test for the Tid(NOT STARTED) case as well.

@mihails-strasuns
Copy link

There is one major issue with the concept of this PR : Tid is not supposed to always represent thread ID. Once #1910 is merged it can also be a fiber and overall idea is that it can be anything - even a process ID or remote server endpoint ID. So while some sort of identity string representation is necessary it can't attached to thread identity.

@DmitryOlshansky
Copy link
Member

A trivial way out of this problem is to print TId with some sort of schema a-la URIs.

thread:<thread-id>
fiber:<fiber-ref>

These could be followed by @hostname to identify remote processes. I'm not proposing the above as is but surely something simple, sensible, and general enough can be worked out.

@complexmath What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't importing std.format implicitly pull in the better part of Phobos? Can we not do this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please move the import to function body

@complexmath
Copy link
Contributor

The big question for me is whether we want to try and have this identifier remain unique for the entire length of the program, or if we can accept them only being unique for as long as a thread is alive. Like it would be easy enough to use the address of the MessageBox plus some adornment in the latter case.

The other question I have is whether this is just for logging or if we want these to actually be passed around and used as a means to send a message to a thread. Erlang works this way, for example, but I think it tends to be problematic, particularly for long-running programs.

Ultimately, what I'd like is for threads that are intended to be referenced by name to be done via a registry. Then what we're really doing is referencing a thread that we know will handle some specific request and we don't actually care if it's the same thread we talked to last time or if it's a different one. I created register() for this purpose, and when std.concurrency supports IPC there will be a registry host process to serve the same function. What such a registry would return is more a bunch of info for how to contact a thread rather than a pretty identifier, as the needed info might include an IP address, port, protocol, registered thread name, etc.

So... for plain old in-process thread identifiers, I think we need to establish just what the use case is. I'd prefer to go the easy route and use something like the MessageBox address, but could probably be convinced to create an atomic integer so every spawned thread gets its own "unique" id, though efficiently dealing will rollover in this case is tricky.

@schuetzm
Copy link
Contributor

A registry is the way to go for named tasks, I agree. There is already std.concurrency.register() for that purpose, although it might not be sufficient when remote tasks are involved, don't know. If someone want's to pass around a task reference, they should just use the Tid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK hash of a pointer is not unique nor meaningful.

@mihails-strasuns
Copy link

I think non-unique thread/fiber ID is good enough initially. Right now it is most useful for debugging purposes and introduction of cross-process messages is likely to warrant more complicated naming scheme anyway. Let's do KISS for now.

@yebblies
Copy link
Contributor

How about document that the exact contents of the string may change and it is only for debugging purposes?

@dnadlinger
Copy link
Contributor

@yebblies: Yep, we definitely need something along the lines of that anyway.

@mihails-strasuns
Copy link

Absolutely. It still needs to be practically useful though :)

@mihails-strasuns
Copy link

This has popped up recently again in context of logging.
@complexmath do you have any opinion about most pragmaitical approach to implement right now? I guess you are the person in position to decide here.

@mihails-strasuns
Copy link

@klamonte do you have time to move forward with this?

@DmitryOlshansky
Copy link
Member

@klamonte please avoid merge commits - 55 changed files is certainly misrepresentation of this pull's changes.

And a hint - github doesn't notify contributors of any changes to commits of a pull, only comments.
So don't hesitate to ping reviewers after updating the pull ;)

@ghost
Copy link
Author

ghost commented Oct 14, 2014

@DmitryOlshansky Sorry about that, thanks for the tips! Is there anything I can do at this stage to reduce the clutter from the upstream merge?

@Dicebot How does this look? (Sorry about the clutter from the upstream merge.)

@mihails-strasuns
Copy link

@klamonte please do this to get rid of merge commit:

# if don't have upstream set up already:
# git remote add upstream https://github.com/D-Programming-Language/phobos.git
git fetch upstream
git rebase upstream/master
git push -f origin master

Also I recommend to create dedicated branches for pull requests if you ever will want to create another one ;)

@ghost
Copy link
Author

ghost commented Oct 14, 2014

@DmitryOlshansky @Dicebot How does this look?

@mihails-strasuns
Copy link

Thanks, commit history is much better.

On actual changes - right now I am convinced that tracking actual thread is impractical and some sort of message box pointer representation is good enough. Ideally this needs @complexmath judgement call but he seems a bit busy lately.

@DmitryOlshansky
Copy link
Member

On actual changes - right now I am convinced that tracking actual thread is impractical and some sort of message box pointer representation is good enough. Ideally this needs @complexmath judgement call but he seems a bit busy lately.

Good idea.

@klamonte - one way to make sure history is clutter free is to always use:

pull --rebase upstream master

before pushing new changes.

That would however require forced push (push -f) that simply overwrites history on remote.
So be careful about what you force push if you work from many devices ;)

@complexmath
Copy link
Contributor

I'm in favor of using the MessageBox pointer. It's already there and is guaranteed to be unique for the lifetime of a thread. Just format it in a way that makes sense, say a hex number without the leading "0X". Also, do we need a toString call? It would be nice if we could simply return the id as a size_t or something and let the caller format it.

@ghost
Copy link
Author

ghost commented Oct 30, 2014

@complexmath Doh, I used its hash rather than address, but is this closer to what you have in mind?

I'd be fine with letting the caller have an id in lieu of a string, but would also like formattedWrite(writer, "%s", someTid) to emit that id. Can that be done with just a change in std.concurrency?

@DmitryOlshansky
Copy link
Member

@klamonte Hash is NOT unique. Just use the pointer.

Also formattedWrite(writer, "%s", someTid) should pick up toString of Tid.

@mihails-strasuns
Copy link

I was thinking about possibility of using some simple perfect hashing for a pointer to ensure that no one will be tempted to do cast(MessageBox*) tid - is it worth the headache?

@complexmath
Copy link
Contributor

The hash of a pointer is probably just the pointer itself anyway. Either way, there's no point in the extra computation.

@Dicebot I don't think so. The id we return is just an identifier. If someone exploits knowledge of where this identifier comes from it's their problem if their code breaks or behaves erratically, say if we change how the id is computed.

@DmitryOlshansky
Copy link
Member

Well XOR-masking a pointer or subtracting some big value is a 1:1 mapping that's cheap and would undermine blind use of casts.

@mihails-strasuns
Copy link

Unrelated tester failure?

@DmitryOlshansky
Copy link
Member

Seems to fell out of sight. Just rebase and let's pull it?

@quickfur
Copy link
Member

I agree, let's pull this.

@quickfur
Copy link
Member

Superseded by: #2772

@quickfur quickfur closed this Nov 28, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants