Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange code generated from streamAsVector/streams. #9

Open
pjonsson opened this issue Oct 29, 2012 · 7 comments
Open

Strange code generated from streamAsVector/streams. #9

pjonsson opened this issue Oct 29, 2012 · 7 comments
Labels

Comments

@pjonsson
Copy link
Member

This is a distilled test case that at least shows some inefficient code being generated. Putting it here so that it doesn't get lost.

cexp :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp n = streamAsVector (scan (\p k -> p) n)

I get the generated code:

    uint32_t v43;
    struct array e0 = {0};
    uint32_t e1;
    struct array e2 = {0};
    int e3;

    v43 = getLength(v1);
    initArray(&e0, (0 - sizeof(struct array)), v43);
    e1 = 0;
    initArray(&e2, sizeof(uint32_t), getLength(v0));
    copyArray(&e2, v0);
    for(uint32_t v34 = 0; v34 < v43; v34 += 1)
    {
        e1 = (e1 + 1);
        initArray(&e2, sizeof(uint32_t), getLength(&e2));
        copyArray(&e2, &e2);
        initArray(&at(struct array,&e0,v34), sizeof(uint32_t), getLength(&e2));
        copyArray(&at(struct array,&e0,v34), &e2);
    }
    initArray(out, (0 - sizeof(struct array)), getLength(&e0));
    copyArray(out, &e0);
    freeArray(&e0);
    freeArray(&e2);

e2 is initialized and populated just before the for-loop. Inside the for-loop e2 is re-initialized and copied onto itself. The generated code might be correct, but should in that case not init and copy arrays to themselves. I have done some preliminary tracing and the generated program structure is immediately visible in the output from fromCore.

@emwap
Copy link
Member

emwap commented Oct 30, 2012

Thank you. Using arrays as the state in a loop is a known deficiency. The copying of e2 onto itself comes from scan (\p k -> p). I guess we could optimize that...

I assume that \p k -> p is something more complicated in your code. If not, the code for cexp can be expressed as follows:

cexp2 :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp2 v1 v2 = replicate (length v2) v1

@pjonsson
Copy link
Member Author

Known deficiency as "does not work" or "is slow"?

My code is more complicated; here's an excerpt from a larger testcase:

initArray(&e3, sizeof(int32_t), 8);
at(int32_t,&e3,0) = -32768;
...
at(int32_t,&e3,7) = -32768;
for(uint32_t v23 = 0; v23 < v35; v23 += 1)
{
e2 = (e2 + 1);
initArray(&e3, sizeof(int32_t), 8);
for(uint32_t v29 = 0; v29 < 8; v29 += 1)
{
at(int32_t,&e3,v29) = at(int32_t,&e3,at(uint32_t,&v36,v29));
}
initArray(&at(struct array,&e1,v23), sizeof(int32_t), getLength(&e3));
copyArray(&at(struct array,&e1,v23), &e3);
}

So all elements of e3 are initialized to -32768 before the loop. First thing that happens inside the loop is a call to initArray() which might (?) allocate new memory for e3. After that there's a small loop that both read and write elements in e3.

It might work for this test case as well, but is this safe in general? My concern is that we might read uninitialized memory in the first round of the loop if we are unlucky.

@emwap
Copy link
Member

emwap commented Oct 30, 2012

Known deficiency as "does not work" or "is slow"?

Is slow. The code should be correct but it does too much copying.
My code is more complicated; here's an excerpt from a larger testcase:

initArray(&e3, sizeof(int32_t), 8);
at(int32_t,&e3,0) = -32768;
...
at(int32_t,&e3,7) = -32768;
for(uint32_t v23 = 0; v23 < v35; v23 += 1)
{
e2 = (e2 + 1);
initArray(&e3, sizeof(int32_t), 8);
for(uint32_t v29 = 0; v29 < 8; v29 += 1)
{
at(int32_t,&e3,v29) = at(int32_t,&e3,at(uint32_t,&v36,v29));
}
initArray(&at(struct array,&e1,v23), sizeof(int32_t), getLength(&e3));
copyArray(&at(struct array,&e1,v23), &e3);
}

So all elements of e3 are initialized to -32768 before the loop. First thing that happens inside the loop is a call to initArray() which might (?) allocate new memory for e3. After that there's a small loop that both read and write elements in e3.

It might work for this test case as well, but is this safe in general? My concern is that we might read uninitialized memory in the first round of the loop if we are unlucky.

It should be safe since it relies on the invariant that initArray will only reallocate the buffer if there is not enough room to hold the data. The generated code will in this case always have the same parameters to initArray and thus there will be no reallocation.

It is of course a bit fragile and I will change the classification of this issue to a bug. We need to remove the extra calls to initArray.

@pjonsson
Copy link
Member Author

I figured out why I'm worried. Here's a test case:

cexp3 :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp3 n = streamAsVector (scan (\p k -> p ++ p) n)

Generated code:

    initArray(&e2, sizeof(uint32_t), getLength(v0));
    copyArray(&e2, v0);
    for(uint32_t v36 = 0; v36 < v46; v36 += 1) {
        e1 = (e1 + 1);
        initArray(&e2, sizeof(uint32_t), (getLength(&e2) + getLength(&e2)));
        copyArray(&e2, &e2);
        copyArrayPos(&e2, getLength(&e2), &e2);
        initArray(&at(struct array,&e0,v36), sizeof(uint32_t), getLength(&e2));
        copyArray(&at(struct array,&e0,v36), &e2);
    }

The problem is definitely there in the output from fromCore.

@emwap
Copy link
Member

emwap commented Oct 31, 2012

That is most certainly a bug. The length of e2 will double for each lap.

@emwap
Copy link
Member

emwap commented Oct 31, 2012

The fix for aliasing in 3046e0b (issue #12), helps for this issue in that the generated code is correct. However, the increased copying really hurts in this example since we are copying arrays.

I will leave this issue open until the extra copying has been removed.

    uint32_t v46;
    struct array v32 = {0};
    uint32_t v33;
    struct array v35 = {0};
    uint32_t v37;
    struct array v39 = {0};

    v46 = getLength(v1);
    initArray(&v32, (0 - sizeof(struct array)), v46);
    v33 = 0;
    initArray(&v35, sizeof(uint32_t), getLength(v0));
    copyArray(&v35, v0);
    for(uint32_t v36 = 0; v36 < v46; v36 += 1)
    {
        v37 = v33;
        v33 = (v37 + 1);
        initArray(&v39, sizeof(uint32_t), getLength(&v35));
        copyArray(&v39, &v35);
        initArray(&v35, sizeof(uint32_t), (getLength(&v39) + getLength(&v39)));
        copyArray(&v35, &v39);
        copyArrayPos(&v35, getLength(&v39), &v39);
        initArray(&at(struct array,&v32,v36), sizeof(uint32_t), getLength(&v39));
        copyArray(&at(struct array,&v32,v36), &v39);
    }

@pjonsson
Copy link
Member Author

pjonsson commented Nov 1, 2012

The code given to feldspar-compiler for cexp7 looks like this:

(bind (getRef var10) (\var12 -> (then (setRef var10 (var12 + 1)) (setMArr var9 var11 (var0 ! var12)))))

So there's a sequence of evaluation order imposed upon us. Looking at the generated code in the previous comment: why do we have to update v33 at the beginning of the loop? If that update was performed at the end of the loop things would look much better. I tried wiggling with the Stream library implementation but I seem unable to change the order of the array update and the setRef.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants