Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revival of DCache, now with SHA1 as hash function + dependencies #7843

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

DmitryOlshansky
Copy link
Member

@DmitryOlshansky DmitryOlshansky commented Feb 5, 2018

Getting closer to building Phobos with DCache enabled

Reboot of #7239 moved here because the original discussion got super long.

Getting closer to building Phobos with DCache
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @DmitryOlshansky!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

import core.stdc.string;
import core.atomic;

version(Posix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows has memory-mapped files, too.

/// Load the cache identified by 'keyspace'
void initialize(const(char)[] keyspace)
{
char[512] buf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= void;

void initialize(const(char)[] keyspace)
{
char[512] buf;
const(char)* home = getenv("HOME");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what all the following string manipulation and file calls are doing without very carefully reading the code. Please add a block comment.

struct DCache
{
/// Load the cache identified by 'keyspace'
void initialize(const(char)[] keyspace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Global comment: Please add Ddoc headers for all functions.

}
}

/// Clean the cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "cleaning" the cache mean? Deleting it? Filling it with 0's?

}
static assert(TocEntry.sizeof == 8);

static struct Data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data is very generic. What sort of data is it?

}
}

bool put(const(ubyte)[] key, const(ubyte)[] value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ddoc comment needed. What the function does, what the parameters are, what the return value means.

{
uint h = first32bits(key);
size_t offset = h & (toc.length-1);
uint itemSize = cast(uint)(Data.sizeof + key.length + value.length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any possibility of integer overflow or truncation?

bool put(const(ubyte)[] key, const(ubyte)[] value)
{
uint h = first32bits(key);
size_t offset = h & (toc.length-1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means toc.length must be a power of 2. I don't see where this is documented.

lock();
scope(exit) unlock();
for (;;)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of hashing algorithms. It'd be nice if there was an explanation of which one this is, maybe with a URL to an explanation if it's a standard one.

shared(pid_t)* owner; // owner of a lock, points to shared memory
uint* nextAlloc; // last allocated block to avoid linear scan
pid_t ourPid; // pid of this process
TocEntry[] toc; // table of contents, fixed-size open-addressing hash map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document that the length must be a power of 2.

*owner = 0;
}

Data* dataEnd(){ return cast(Data*)(cast(void*)data + DATA_SIZE); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigate using attributes. This one can be pure, for example.

import dmd.dclass;
import dmd.declaration;
import dmd.dmodule;
import dmd.doc;
import dmd.dimport;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this being imported?

if (ti.aliasdecl)
buf.writestring(ti.aliasdecl.toChars());
else
buf.writestring(ti.name.toChars());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like part of another PR?

@@ -1543,6 +1546,10 @@ private bool parseCommandLine(const ref Strings arguments, const size_t argc, re
else if (p[4])
goto Lerror;
}
else if (strcmp(p + 1, cast(char*)"cache=mmap") == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't use strcmp in mars.d anymore.

{
foreach (id; *global.versionids)
{
const(char)* s = id.toChars();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use id.toString() and don't use strlen

bool cacheable = (istate == null || istate.caller == null)
&& !thisarg // if have this then can mutate the object as "side-effect"
&& !fd.isNogc() // compiler bug - should not consider GC in CTFE closures
&& strncmp(fd.ident.toChars(), "__", 2) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strncmp, strlen, strcmp, etc., must die.

@@ -836,6 +1012,53 @@ private Expression interpret(FuncDeclaration fd, InterState* istate, Expressions
}
eargs[i] = earg;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following large chunks of code should be factored out into separate functions.

@@ -836,6 +1012,53 @@ private Expression interpret(FuncDeclaration fd, InterState* istate, Expressions
}
eargs[i] = earg;
}

// Skip generated symbols and only permit caching of top-level CTFE calls
bool cacheable = (istate == null || istate.caller == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having cacheable be off if global.params.cache is off? I.e. why test both repeatedly in subsequent code?

@ibuclaw
Copy link
Member

ibuclaw commented Feb 10, 2018

As a small security detail, shouldn't you be using SHA2?

@MartinNowak
Copy link
Member

As a small security detail, shouldn't you be using SHA2?

Indeed would make sense, given that such a cache could be used on sites like run.dlang.io.

@MartinNowak
Copy link
Member

MartinNowak commented Feb 23, 2018

Just making a list so we can collaborate a bit on this.
TODO:

  • fix visibility issues when reifying cached literals (@MartinNowak)
  • switch to SHA2 (no SSE accelerated SHA2 for now, SHA1 attacks on valid code would be tricky and we can harden the hash once we have a suiteable replacement)
  • AST hashing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants