Skip to content

Conversation

This reverts commit 4f3b3f4.

> This fix is incorrect and dangerous. You've removed a potential
> InvalidMemoryOperationError, which is deadly but easy to detect,
> but retained the possibility of closing the file handle of a
> completely unrelated file, if that file handle has since been
> reused. With this change, the bug will always be silent and much
> harder to find the real cause of.
return;
}
}
errnoEnforce(fd == -1 || fd <= 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforce still going to InvalidMemoryOperation so maybe just call abort?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't be an improvement. enforce will have to do. Let's not forget that class destructors are not always invoked by the GC (there's e.g. scoped), so removing enforce would penalize uses that do not use the GC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure to close a fd which should be open indicates a dangling fd, which is a logic error... I think the response is apt, though a stack trace and better error message would be better, yeah.

@CyberShadow CyberShadow changed the title fix Issue 14868 - MmFile destructor seems to corrupt memory std.mmfile - fix multiple issues Sep 1, 2015
@CyberShadow
Copy link
Member Author

Added fixes for 2 more bugs.

@CyberShadow CyberShadow force-pushed the pull-20150901-074214 branch 2 times, most recently from 0753da3 to 21a007b Compare September 1, 2015 14:50
@DmitryOlshansky
Copy link
Member

@sdegtiarev any new issues? Please post them here.

@CyberShadow
Copy link
Member Author

There is https://issues.dlang.org/show_bug.cgi?id=15013 but it'll take a bit more work. Generally the whole page-size calculation is messed up and has to be reviewed completely. Not blocking this PR.

@DmitryOlshansky
Copy link
Member

Generally the whole page-size calculation is messed up and has to be reviewed completely. Not blocking this PR.

Maybe re-writing the underlying implementation is more sensible, in particular decomposing OS syscall abstraction from higher-level wrapper object. Thoughts?

@CyberShadow
Copy link
Member Author

That wouldn't solve any problems other than cleaning up the code a bit, like any refactoring.

@sdegtiarev
Copy link
Contributor

@CyberShadow

the whole page-size calculation is messed up and has to be reviewed completely

I disagree, the root problem is, std.mmfile reports more memory allocated than is actually available. I believe, an error is to be thrown at this point.
@DmitryOlshansky
I'm planning to post a list of requirements and preconditions for shared memory API, that will hopefully show the direction to move. Give me couple days to finish, I had occasional fire at work last week and it's still smoking, need time to finish.

@sdegtiarev
Copy link
Contributor

I think we all agree that std.perpetual should be built on top of some memory allocator, std.mmfile or similar. To be usable, this allocator class should provide certain guarantees above regular memory allocator, which mostly fall in two categories, concurrent initialization and data mutability.

  • Concurrent initialization issue arises when the file is opened and/or created, in the latter case it is to be initialized, in the first - not, this is the point of persistent data. So we need something like
    @property bool creator() const;
    indicating if the file was created by caller. More, if two or more processes simultaneously try to open the file, at most one can be creator, in my sample impl. I had something like this:
    open(file, O_CREAT|O_EXCL); // creates file or fails
    if(success) _creator=true;
    else { _creator=false; open(file); } // never creates, opens existing file only
    This atomically guarantees that only one process creates and initializes the shared memory. There is also gap between file creation and initialization by master process and it can't be done atomically, I think we can leave it for user of the class, at least for now. More, we also need to know which part of the data was already there if the file was extended:
    @property size_t unplowedData() const;
    returning 0 if the entire file was created, requested size if the file already exists and is big enough, and something in between if the file has been extended. It also would be useful to have following creation options:

    exclusiveCreate // fails if the file exists

    noCreate // fails if the file does not exist

    noExtend // fails if the file is smaller than requested

    exactSize // fails if the file exists, but is different size than requested, even bigger

    multipleOf(type.sizeof) // fails if the file exists, is not extended and its size is not multiple of type

There might be more and not all of them are orthogonal, so they may be reshuffled.

  • Data mutability: std.perpetual tries to conform to requested type, so that
    perpetual!(immutable(type) will be returning reference to immutable, unfortunately std.mmfile provides little help with this. It always returns void[] providing user to cast it to desired type, even if the file was opened in read-only mode. Any attempt to modify such memory will result in segmentation fault, no escape. Providing such raw ambiguous interface is usually regarded as bad thing, especially If used as final user interface. Perhaps, it would be better to provide couple of template cast functions?
    , like:

    type[] get(type)();
    const(type)[] getConst(type)() const;
    It also could make sense to have two sibling classes, for mutable and immutable memory.
    In particular, in std.perpetual I need to distinguish three cases:

    • mutable object mapped on read/write file
    • immutable object mapped on read/write file
    • immutable object mapped on read-only file

Also, I don't especially like the ensureMapped() function in std.mmfile, it silently unmaps and maps again the shared memory when requested index is beyond mapped area. This is nice feature to have, but not in generic class because it violates time guarantee for memory access. While accessing shared memory is as fast as any other variable in process address space, remapping makes it as slow as reading from file, it does not make it true random access data. That makes sense for big data, but not for generic use. It also probably should not be transparent for the user. In short, I would think about moving it to the derived class.

For std.perpetual module I'm going to exclude functions to be placed in shared memory first, then to handle differently mutable and immutable types and then to sort out all these different variants of creation, and finally, to provide arguments forwarding.
I apologize for the long post, it's a roadmap I'm going to follow.
What do you think?

@CyberShadow
Copy link
Member Author

the whole page-size calculation is messed up and has to be reviewed completely

I disagree, the root problem is, std.mmfile reports more memory allocated than is actually available. I believe, an error is to be thrown at this point.

When you have code like:

            size_t initial_map = (window && 2*window<size)
                ? 2*window : cast(size_t)size;

it's clear that the original author didn't really know what he was doing and we need to review all related code entirely.

I think we all agree that std.perpetual should be built on top of some memory allocator, std.mmfile or similar. To be usable, this allocator class should provide certain guarantees above regular memory allocator, which mostly fall in two categories, concurrent initialization and data mutability.

This is beyond the scope of this PR. We should try to fix std.mmfile as much as possible without breaking existing code. New modules are orthogonal to this goal.

I'm not really in a position currently to think about an entire new Phobos module. Perhaps it would be better to post this in a new forum thread instead of a bugfix PR.

This is nice feature to have, but not in generic class because it violates time guarantee for memory access.

Generally I'd think page faults would outweigh OS syscall overhead.

@DmitryOlshansky
Copy link
Member

Focusing on the current bugfixes - anything missing - @sdegtiarev ?

@sdegtiarev
Copy link
Contributor

@CyberShadow

page faults would outweigh OS syscall overhead

These are not just syscalls, but HD file system requests, they are in general thousands times slower and nothing can possible overweight them.

@DmitryOlshansky

current bugfixes - anything missing?

For std.mmfile or std.perpetual?
The first one has only last bus error bug to fix I hope, but as it is now, it's not very suitable to be used in std.perpetual.
The perpetual module has a whole bunch of initialization problems to be solved, not bugs, but rather architectural decisions to be made. Unfortunately, it depends on underlying allocator to be used.
Plus, number of lesser bugs, as eliminating functions and delegates as stored objects and so on. I wrote about it in the last paragraph above.

@CyberShadow
Copy link
Member Author

These are not just syscalls, but HD file system requests, they are in general thousands times slower and nothing can possible overweight them.

What are you talking about? Why would merely mapping a file incur HD access (when file metadata is cached), and how would this possibly outweigh actual pagefaults when accessing non-paged memory? You realize that simply mapping a file does not load it into memory, right?

@sdegtiarev
Copy link
Contributor

@CyberShadow
No, I don't. Mapping existing file into memory means opening it and reading at least one page from HD into memory buffer. Remapping means as minimum reading another page. If the memory is accessed concurrently by different processes it also involves all kinds of interprocess sync. Right?

@CyberShadow
Copy link
Member Author

@sdegtiarev Either we are misunderstanding each other, or you should read up on virtual memory before you continue working on std.mmfile or std.perpetual. Mapping files simply allocates memory addresses for them, no physical memory is used and no I/O is done until those memory locations are accessed, which causes a page fault and the kernel pages in file data on demand.

If the memory is accessed concurrently by different processes it also involves all kinds of interprocess sync.

It is exactly as when multiple threads access the same data within the same process.

@sdegtiarev
Copy link
Contributor

@CyberShadow
I hope, we're talking about different things. Common point: if you need to have ACTUAL data residing on HD, you have to READ it at least once. Anyway, experiment:
` // note, we open eight-pages long file, but limit the window to one page
// (in fact, two pages are read)
auto m=new MmFile("ttt", MmFile.Mode.readWrite, 0x8000, null, 0x1000);
StopWatch s;

s.start();
    // memory access does not need remapping here
foreach(n; 0..N) {
    auto m0=m[0];
    auto m1=m[0x1100]; // address is at second page
}
s.stop();
writeln("no remap: ", s.peek/N);

s.reset();
s.start();
    // here, every read/writ causes unmap()/mmap() pair 
foreach(n; 0..N) {
    auto m0=m[0];
    auto m1=m[0x2100];  // address is at third page, beyond the window
}
s.stop();
writeln("do remap: ", s.peek/N);

`
Output:

` no remap: TickDuration(65) `

` do remap: TickDuration(17126) `

Is that what you mean?

@DmitryOlshansky
Copy link
Member

Okay the discussion goes sideways... Time to merge? @CyberShadow

@CyberShadow
Copy link
Member Author

Is that what you mean?

17126 ticks is not disk access, it's essentially the cost of the syscall.

Time to merge?

Sure why not.

@DmitryOlshansky
Copy link
Member

Auto-merge toggled on

DmitryOlshansky added a commit that referenced this pull request Sep 18, 2015
@DmitryOlshansky DmitryOlshansky merged commit b6dd28d into dlang:master Sep 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants