[clang] Allow inlining of copies without capabilities #506

@arichardson arichardson commented Jan 25, 2021

This pull request allows inlining of memcpy/memmove for structures/types
that are guaranteed not to contain tags, even if the size is >= cap_size.
The motivating case is that we currently call memcpy when
performing structure assignment for struct { long a; long b; } rather than
using two loads and two stores.

The backends still conservatively assume that a copy may contain capabilities
if its size is at least sizeof(void*), but at least we can now inline copies that
are explicitly marked as non-tag-preserving.
We do this by adding a new no_preserve_cheri_tags attribute. If
neither no_preserve_cheri_tags nor must_preserve_cheri_tags is set,
we still fall back to the conservative behaviour of calling memcpy for
size > cap_size && align < cap_align.
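
As a rough illustration of the motivating case (a sketch, not code from this patch; `struct pair`, `copy_assign`, `copy_fields` and `check_pair_copy` are made-up names), the assignment below is the kind of copy that should become two loads and two stores rather than a memcpy call. The struct is typically capability-sized on a 64-bit CHERI target, yet it has no member that could hold a tagged value:

```c
#include <assert.h>

/* Illustrative type only: two longs, so typically 16 bytes on an LP64
 * target, with no pointer (capability) members. */
struct pair {
    long a;
    long b;
};

/* Structure assignment: the copy that this change wants inlined. */
void copy_assign(struct pair *dst, const struct pair *src) {
    *dst = *src;
}

/* The same copy written as explicit per-field loads and stores. */
void copy_fields(struct pair *dst, const struct pair *src) {
    dst->a = src->a;
    dst->b = src->b;
}

/* Self-check: both forms leave the destination with identical contents. */
void check_pair_copy(void) {
    struct pair src = {1, 2};
    struct pair x = {0, 0};
    struct pair y = {0, 0};
    copy_assign(&x, &src);
    copy_fields(&y, &src);
    assert(x.a == y.a && x.b == y.b);
}
```

Whether a given compiler actually emits a library call for `copy_assign` depends on the target and options, so the sketch only shows the two source-level forms being compared.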

Due to C's effective type rules, we have to be careful when setting the
attribute and only perform the type-based tag-preservation analysis if we
know the effective type. For example, marking a memcpy() to/from long*
as not tag-preserving could result in tag stripping for code that uses
type casts. Such code is correct even under strict aliasing rules since
the first store to a memory location determines the type. Example from
#506:

void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}

Despite the memcpy() argument being a long* (and therefore intuitively
not tag preserving), we can't add the attribute since we don't actually
know the type of the underlying object (malloc creates an allocated object with
no declared type). From C99:

The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
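
A minimal sketch of how these rules separate the two cases (the function names are hypothetical, and the `static_assert` states the alignment assumption the example relies on): a copy from a declared `long[2]` can only ever hold longs, while a copy from malloc'd storage must be assumed to hold whatever was last stored there, pointers included.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a pointer stored right after two longs is suitably aligned. */
static_assert((2 * sizeof(long)) % _Alignof(void *) == 0,
              "pointer slot after two longs must be aligned");

/* Declared object: its effective type is long[2], so the copied bytes
 * cannot legitimately hold a pointer. */
long copy_declared(void) {
    long src[2] = {1, 2};
    long dst[2];
    memcpy(dst, src, sizeof dst);
    return dst[0] + dst[1];
}

/* Allocated objects: each store sets the effective type of the bytes it
 * writes, so a pointer may live behind a long* parameter.
 * Returns 1 if the pointer survives the copy, 0 if it does not, and
 * -1 if allocation failed. */
int copy_allocated(void) {
    static int target;
    size_t size = 2 * sizeof(long) + sizeof(void *);
    long *p = malloc(size);
    long *q = malloc(size);
    int ok = -1;
    if (p != NULL && q != NULL) {
        p[0] = 1;
        p[1] = 2;
        *(void **)(p + 2) = &target; /* these bytes now have type void * */
        memcpy(q, p, size);          /* q takes its effective types from p */
        ok = q[0] == 1 && q[1] == 2 && *(void **)(q + 2) == (void *)&target;
    }
    free(p);
    free(q);
    return ok;
}
```

On a CHERI target the second copy is the one where stripping tags would break valid code, which is why the pointee type of the memcpy() argument alone is not enough to add the attribute.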

@jrtc27 jrtc27 left a comment

What happens with:

void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}

I believe that's valid C, provided 2 * sizeof(long) is a multiple of the alignment of void (*)(long **, long **).

clang/lib/CodeGen/CGBuiltin.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CodeGenTypes.cpp Outdated Show resolved Hide resolved

jrtc27 commented Jan 25, 2021

Imagine it as being that you have a pointer to a header and want to copy the header plus some amount of the body, where the body contains capabilities.
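
The header-plus-body pattern could look roughly like the sketch below. All names (`struct header`, `message_new`, `message_body`, `message_clone`) are hypothetical, and the layout assumes the header size is a multiple of the pointer alignment. The point is that `message_clone` copies through a pointer to a type with no pointer members, yet the bytes it copies include the pointers stored in the body.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header type: contains no pointers itself. */
struct header {
    long kind;
    long count; /* number of pointer-sized body entries that follow */
};

/* Assumption: the body starts at a pointer-aligned offset. */
static_assert(sizeof(struct header) % _Alignof(void *) == 0,
              "body must start at a pointer-aligned offset");

/* Allocate a zero-filled header followed by `count` pointer slots. */
struct header *message_new(long kind, long count) {
    struct header *h = calloc(1, sizeof *h + (size_t)count * sizeof(void *));
    if (h != NULL) {
        h->kind = kind;
        h->count = count;
    }
    return h;
}

/* Body entries live directly after the header. */
void **message_body(struct header *h) {
    return (void **)(h + 1);
}

/* Copy header plus body through a header pointer. The pointee type
 * suggests "no pointers", but the copied bytes include the body. */
struct header *message_clone(const struct header *h) {
    size_t size = sizeof *h + (size_t)h->count * sizeof(void *);
    struct header *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, h, size);
    return copy;
}
```

If the memcpy() in `message_clone` were treated as not tag-preserving because its argument is a `struct header *`, the cloned body would lose its capabilities on a CHERI target.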

arichardson added a commit to arichardson/llvm-project that referenced this pull request Mar 10, 2021
…iasing

Marking a memcpy() to/from a long* as not tag-preserving could result in
tag stripping for code that assumes relaxed aliasing.

Example from CTSRD-CHERI#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Adding the no_preserve_tags attribute to memcpy() is safe under
-fstrict-aliasing (since the store above is invalid), but with
-fno-strict-aliasing this aligned memcpy must preserve tags.

arichardson commented Mar 10, 2021

void (**)(long **, long **)

I believe storing a void (*)(long **, long **) to a variable of type long * is a strict aliasing violation, so in the new version of this PR I've made the optimization conditional on -f(no-)strict-aliasing. Struct assignment is always optimized though.

@arichardson arichardson requested a review from jrtc27 March 10, 2021 21:39
arichardson added a commit to arichardson/llvm-project that referenced this pull request Mar 12, 2021
…type

Marking a memcpy() to/from a long* as not tag-preserving could result in
tag stripping for code that uses type casts and is correct under strict
aliasing rules since the first store to a memory location determines the
type. Example from CTSRD-CHERI#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Despite the memcpy() argument being a long* (and therefore intuitively not
tag preserving), we can't add the attribute since we don't actually know the
type of the underlying object (malloc creates an allocated object with no
declared type). From C99:

The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
@arichardson

As discussed earlier this week, my -fstrict-aliasing/-fno-strict-aliasing assumptions were incorrect, so this new version makes decisions based on whether a valid variable declaration is visible and if not assumes it is an "allocated type" where the first store defines the underlying type.
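
The decision described in this comment can be modelled very roughly as follows. This is a simplified sketch and not the code in the patch: `struct copy_source`, its fields, and `may_mark_no_preserve_tags` are invented for illustration, and the real analysis in clang works on AST types rather than two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the frontend knows about a copied object. */
struct copy_source {
    bool declaration_visible; /* a variable declaration fixes the effective type */
    bool type_can_hold_caps;  /* the declared type contains a pointer/capability */
};

/* Returns true only when it is safe to mark the copy as not preserving tags. */
bool may_mark_no_preserve_tags(struct copy_source src) {
    if (!src.declaration_visible)
        return false; /* allocated storage: the first store defines the type */
    return !src.type_can_hold_caps;
}

/* Self-check covering the three cases discussed in this thread. */
void check_decision(void) {
    struct copy_source local_longs = {true, false};
    struct copy_source malloced = {false, false};
    struct copy_source local_ptrs = {true, true};
    assert(may_mark_no_preserve_tags(local_longs));
    assert(!may_mark_no_preserve_tags(malloced));
    assert(!may_mark_no_preserve_tags(local_ptrs));
}
```

In this model a copy from malloc'd memory never gets the attribute, regardless of the pointer type used at the call site, which matches the conservative fallback described above.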

arichardson added a commit to arichardson/llvm-project that referenced this pull request Jul 15, 2021
…type

@arichardson

rebased

arichardson added a commit to arichardson/llvm-project that referenced this pull request Aug 17, 2021
…type

@arichardson

rebased

arichardson added a commit to arichardson/llvm-project that referenced this pull request Oct 4, 2021
…type

@arichardson

rebased after upstream merge.

arichardson added a commit to arichardson/llvm-project that referenced this pull request Oct 15, 2021
…type

arichardson added a commit to arichardson/llvm-project that referenced this pull request Oct 21, 2021
…type

Marking a memcpy() to/from a long* as not tag-preserving could result in
tag stripping for code that uses type casts. Such code is correct even
under strict aliasing rules since the first store to a memory location
determines the type. Example from
CTSRD-CHERI#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Despite the memcpy() argument being a long* (and therefore intuitively not
tag preserving), we can't add the attribute since we don't actually know the
type of the underlying object (malloc creates an allocated object with no
declared type). From C99:

The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
@arichardson arichardson marked this pull request as draft October 21, 2021 13:26
arichardson added a commit to arichardson/llvm-project that referenced this pull request Oct 21, 2021
Once the backends handle the new attribute, this will allow inlining
structure assignments for structs that are at least capability size
but do not contain any capabilities (e.g. struct { long a; long b; }).
We can also set the attribute for all trivial auto var-init cases since
those patterns never contain valid capabilities.

Due to C's effective type rules, we have to be careful when setting the
attribute and only perform the type-based tag-preservation analysis if we
know the effective type. For example, marking a memcpy() to/from `long*`
as not tag-preserving could result in tag stripping for code that uses
type casts. Such code is correct even under strict aliasing rules since
the first store to a memory location determines the type. Example from
CTSRD-CHERI#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Despite the memcpy() argument being a long* (and therefore intuitively
not tag preserving), we can't add the attribute since we don't actually
know the type of the underlying object (malloc creates an allocated object with
no declared type). From C99:
```
The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
```
@arichardson arichardson force-pushed the no-preserve-tags-codegen branch 2 times, most recently from 0c5b087 to 2c713b3 Compare August 16, 2022 09:55
@arichardson arichardson force-pushed the no-preserve-tags-codegen branch 2 times, most recently from 1539e37 to 66d47e7 Compare August 26, 2022 10:53
arichardson added a commit to arichardson/llvm-project that referenced this pull request Aug 26, 2022
This allows inlining of structure assignments for structs that are at
least capability size but do not contain any capabilities (e.g.
`struct { long a; long b; }`). We can also set the attribute for all
trivial auto var-init cases since those patterns never contain valid
capabilities.

Due to C's effective type rules, we have to be careful when setting the
attribute and only perform the type-based tag-preservation analysis if we
know the effective type. For example, marking a memcpy() to/from `long*`
as not tag-preserving could result in tag stripping for code that uses
type casts. Such code is correct even under strict aliasing rules since
the first store to a memory location determines the type. Example from
CTSRD-CHERI#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Despite the memcpy() argument being a long* (and therefore intuitively
not tag preserving), we can't add the attribute since we don't actually
know the type of the underlying object (malloc creates an allocated object with
no declared type). From C99:
```
The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
```
@arichardson arichardson changed the title Allow inlining of copies without capabilities [clang] Allow inlining of copies without capabilities Aug 26, 2022
arichardson added a commit that referenced this pull request Sep 1, 2022
This allows inlining of structure assignments for structs that are at
least capability size but do not contain any capabilities (e.g.
`struct { long a; long b; }`). We can also set the attribute for all
trivial auto var-init cases since those patterns never contain valid
capabilities.

Due to C's effective type rules, we have to be careful when setting the
attribute and only perform the type-base tag-preservation analysis if we
know the effective type. For example, marking a memcpy() to/from `long*`
as not tag-preserving could result in tag stripping for code that uses
type casts. Such code is correct even under strict aliasing rules since
the first store to a memory location determines the type. Example from
#506:
```
void *malloc(__SIZE_TYPE__);
void *memcpy(void *, const void *, __SIZE_TYPE__);

void foo(long **p, long **q) {
    *p = malloc(32);
    *q = malloc(32);
    (*p)[0] = 1;
    (*p)[1] = 2;
    *(void (**)(long **, long **))(*p + 2) = &foo;
    memcpy(*q, *p, 32);
}
```

Despite the memcpy() argument being a `long*` (and therefore intuitively
not tag-preserving), we can't add the attribute since we don't actually
know the type of the underlying object (malloc creates an allocated
object with no declared type). From C99:
```
The effective type of an object for an access to its stored value is the
declared type of the object, if any (footnote 75: Allocated objects have
no declared type).

If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the
lvalue becomes the effective type of the object for that access and for
subsequent accesses that do not modify the stored value.

If a value is copied into an object having no declared type using memcpy
or memmove, or is copied as an array of character type, then the effective
type of the modified object for that access and for subsequent accesses
that do not modify the value is the effective type of the object from
which the value is copied, if it has one.

For all other accesses to an object having no declared type, the effective
type of the object is simply the type of the lvalue used for the access.
```
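
To make the distinction in the quoted rules concrete, here is a small sketch contrasting the two situations. The names `struct counters`, `copy_declared` and `copy_unknown` are hypothetical and chosen only for illustration. In the first function both objects have a declared type, so their effective type is known; in the second, only the pointer type is known and the pointee may be allocated storage whose effective type was set by an earlier store.

```c
#include <string.h>

/* Hypothetical example type without any pointer members. */
struct counters {
    long hits;
    long misses;
};

/* Both objects are declared as struct counters, so their effective type is
 * known and the copied bytes cannot hold a capability. */
long copy_declared(void) {
    struct counters src = {1, 2};
    struct counters dst;
    memcpy(&dst, &src, sizeof(dst));
    return dst.hits + dst.misses;
}

/* Only the pointer type is known here. The storage may come from malloc()
 * and may have had a pointer stored into it through a cast, so a compiler
 * has to assume that tags might need to be preserved. */
void copy_unknown(long *dst, const long *src, size_t n) {
    memcpy(dst, src, n * sizeof(long));
}
```

Under the reasoning above, only the copy in `copy_declared` would be safe to treat as non-tag-preserving based on types alone.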
arichardson added a commit to arichardson/llvm-project that referenced this pull request Sep 2, 2022
There is another important caveat: if we don't know the copy size, we
have to conservatively assume that the copy affects adjacent data (e.g.
C++ subclass fields) that could hold capabilities. If the copy size is
<= sizeof(T), we can mark the copy as non-tag-preserving since it cannot
affect trailing fields (even if we are actually copying a subclass).
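
The size caveat can be sketched in C by modelling a subclass as a struct that embeds its base as the first member. All names here (`struct base`, `struct derived`, `copy_base_only`, `copy_n`) are hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <string.h>

struct base {
    long a;
    long b;
};

/* Models a subclass: the base comes first, followed by a pointer field
 * that would be a capability on a CHERI target. */
struct derived {
    struct base base;
    void *extra;
};

/* The size is a constant no larger than sizeof(struct base), so the copy
 * cannot reach the trailing pointer field of a derived object. */
void copy_base_only(struct base *dst, const struct base *src) {
    memcpy(dst, src, sizeof(struct base));
}

/* The size is only known at run time. A caller may pass
 * sizeof(struct derived), in which case the pointer field is copied too,
 * so this copy has to be treated conservatively. */
void copy_n(struct base *dst, const struct base *src, size_t n) {
    memcpy(dst, src, n);
}
```

Calling `copy_n` with `sizeof(struct derived)` on pointers to two `struct derived` objects copies `extra` as well, which is the situation the conservative assumption guards against.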
These tests highlight some places where we can easily add the
no_preserve_tags attribute to allow inlining small copies.

We are also conservative if the structure contains an array of type
((un)signed) char or std::byte since those are often used to store
arbitrary data (including capabilities). We could make this check
stricter and require the array to be capability aligned, but that could
be done as a follow-up change.
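
A sketch of why byte arrays are treated conservatively: the hypothetical `struct message` below has no pointer-typed member, yet its payload can hold a pointer that was copied in bytewise. The type and function names are assumptions made for this example, and `_Alignas` requires C11.

```c
#include <string.h>

/* Hypothetical message type: the payload is a raw byte buffer that is
 * aligned and sized so that it can hold one pointer. */
struct message {
    long tag;
    _Alignas(void *) unsigned char payload[sizeof(void *)];
};

/* Store a pointer into the byte buffer. */
void message_set_ptr(struct message *m, void *p) {
    memcpy(m->payload, &p, sizeof(p));
}

/* Read the pointer back out of the byte buffer. */
void *message_get_ptr(const struct message *m) {
    void *p;
    memcpy(&p, m->payload, sizeof(p));
    return p;
}
```

If a copy of `struct message` were marked as non-tag-preserving on a CHERI target, the pointer returned by `message_get_ptr` on the copy would presumably lose its tag, which is why such structs are left alone.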
@arichardson
Member Author

I've dropped all the more complex logic from this patch - should be easier to review now.

  if (Ty->isCHERICapabilityType(*this))
    return false;
  else if (const RecordType *RT = Ty->getAs<RecordType>()) {
    if (!cannotContainCapabilities(RT->getDecl()))

The double negatives make this function implementation rather ugly, but I think it makes more sense for callers such as CodeGenTypes::copyShouldPreserveTagsForPointee().
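
For readers unfamiliar with the shape of such a predicate, here is a toy model in C. It is not Clang's implementation or API: `enum kind`, `struct type_desc` and `cannot_contain_capabilities` are hypothetical stand-ins that only mirror the "true means provably capability-free" convention discussed in this comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified type descriptors. */
enum kind { KIND_INT, KIND_CHAR_ARRAY, KIND_CAPABILITY, KIND_RECORD };

struct type_desc {
    enum kind kind;
    const struct type_desc *const *fields; /* only used for KIND_RECORD */
    size_t num_fields;
};

/* Returns true only if a value of this type can never hold a capability.
 * Anything uncertain yields false, i.e. "tags may need preserving". */
bool cannot_contain_capabilities(const struct type_desc *ty) {
    switch (ty->kind) {
    case KIND_INT:
        return true;
    case KIND_CAPABILITY:
        return false;
    case KIND_CHAR_ARRAY:
        /* Byte arrays may carry arbitrary data, so stay conservative. */
        return false;
    case KIND_RECORD:
        for (size_t i = 0; i < ty->num_fields; i++)
            if (!cannot_contain_capabilities(ty->fields[i]))
                return false;
        return true;
    }
    return false; /* Unknown kinds are treated conservatively. */
}
```

With this convention a caller can write `if (cannot_contain_capabilities(ty))` to decide that a copy need not preserve tags, at the cost of the negated recursive call inside the record case.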

@arichardson arichardson deleted the branch CTSRD-CHERI:no-preserve-tags-codegen October 6, 2022 12:56
@arichardson arichardson closed this Oct 6, 2022
arichardson added a commit that referenced this pull request Oct 6, 2022
arichardson added a commit that referenced this pull request Oct 6, 2022
arichardson added a commit that referenced this pull request Oct 6, 2022
arichardson added a commit that referenced this pull request Oct 7, 2022