Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#1569 AArch64: Implement trace optimisation. #2442

Closed
wants to merge 2 commits into from
Closed

Conversation

egrimley
Copy link
Contributor

The optimisation is not yet enabled by default, though some programs
run faster, and correctly, with "-enable_traces".

This is based on work done by Kevin Zhou in July 2016.

Change-Id: I6ab3bf0a2f48b25002f1586ff1b85e6232b4357d

The optimisation is not yet enabled by default, though some programs
run faster, and correctly, with "-enable_traces".

This is based on work done by Kevin Zhou in July 2016.

Change-Id: I6ab3bf0a2f48b25002f1586ff1b85e6232b4357d
@egrimley
Copy link
Contributor Author

The Travis failure is i#2003.

@@ -107,6 +107,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* FIXME i#1575: coarse-grain NYI on ARM */
ASSERT_NOT_IMPLEMENTED(!TEST(FRAG_COARSE_GRAIN, f->flags));
if (LINKSTUB_DIRECT(l_flags)) {
/* We put a NOP here for future linking. */
*pc++ = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

@@ -119,6 +121,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* br x1 */
*pc++ = 0xd61f0000 | 1 << 5;
} else {
/* We put a NOP here for trace building. */
*pc++ = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

*(uint *)stub_pc = (0xa9000000 | 0 | 1 << 10 | (dr_reg_stolen - DR_REG_X0) << 5 |
TLS_REG0_SLOT >> 3 << 15);
/* Restore the NOP instruction. */
*(uint *)stub_pc = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

@@ -107,6 +107,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* FIXME i#1575: coarse-grain NYI on ARM */
ASSERT_NOT_IMPLEMENTED(!TEST(FRAG_COARSE_GRAIN, f->flags));
if (LINKSTUB_DIRECT(l_flags)) {
/* We put a NOP here for future linking. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But linking was already working fine, so it is unclear why we'd need a nop now. Please elaborate in the comment -- maybe best in the pre-function comment. Putting an asm listing is also helpful (esp with this stub getting very large: 8 instructions now). This is unusual.

@@ -119,6 +121,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* br x1 */
*pc++ = 0xd61f0000 | 1 << 5;
} else {
/* We put a NOP here for trace building. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does trace building need a nop? I think there's some code missing from this diff (e.g., fixup_indirect_trace_exit): I can't get a clear picture of what the trace building strategy is on A64 vs x86 and why a nop would help.

(dr_reg_stolen - DR_REG_X0) << 5 |
ENTRY_PC_SPILL_SLOT >> 3 << 10);
pc += 4;
f->prefix_size = (byte)((cache_pc)pc - f->start_pc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks identical to the else code below which looks identical to the original code: so the original was sufficient and there's nothing to change here, right?

#ifdef AARCH64
/* For AArch64, there is no need to save the flags
* so we always have the same ibt prefix. */
return fragment_ibt_prefix_size(flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code needs to be here: use_ibt_prefix should be true always for AARCH64, right? So this routine was already returning fragment_ibt_prefix_size(flags). Cleaner to not have an ifdef.

@@ -295,7 +295,7 @@ typedef enum _dr_pred_type_t {
#define PREFIX_PRED_BITS 5
#define PREFIX_PRED_BITPOS (32 - PREFIX_PRED_BITS)
#define PREFIX_PRED_MASK \
(((1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */
((((uint)1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just 1U

@@ -705,7 +705,8 @@ instr_get_predicate(instr_t *instr)
instr_t *
instr_set_predicate(instr_t *instr, dr_pred_type_t pred)
{
instr->prefixes |= ((pred << PREFIX_PRED_BITPOS) & PREFIX_PRED_MASK);
instr->prefixes = ((instr->prefixes & ~PREFIX_PRED_MASK ) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect: I think there are other bits in prefixes besides predicates, which this routine should not clobber

@@ -1261,6 +1261,9 @@ end_and_emit_trace(dcontext_t *dcontext, fragment_t *cur_f)
md->emitted_size -= local_exit_stub_size(dcontext, target, md->trace_flags);
}

/* Fixup all indirect trace exits to comform ibl routine */
IF_AARCH64(md->emitted_size += fixup_indirect_trace_exit(dcontext, trace));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is missing from this diff: this function does not exist.

Also, while I don't know precisely what it does, are you sure this shouldn't be done during incremental trace building as each exit is added, rather than at the end.

@fhahn
Copy link
Contributor

fhahn commented Dec 13, 2017

run aarch64 tests

Vincent-lau added a commit that referenced this pull request Aug 11, 2021
…part of support

This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

A large part of this work was based on work done by Kevin Zhou. Please see branch
i1569-trace and patch #2442 for the original version.

Issues: #1569, #2974
Vincent-lau added a commit that referenced this pull request Aug 11, 2021
…part of support

This patch incorporated changes from patch #2442, and fixed some corner cases
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

A large part of this work was based on work done by Kevin Zhou. Please see branch
i1569-trace and patch #2442 for the original version.

Issues: #1569, #2974
Vincent-lau added a commit that referenced this pull request Sep 1, 2021
…5045)

This patch incorporated changes from PR #2442 that implemented the initial version
of trace support for AArch64.

This patch also fixed some corner cases not considered in PR #2442
where the assumption was incorrect and caused the program to crash.

Trace support is not yet enabled by default, but can be enabled with "-enable_traces".

This commit introduces internal control flow by adding a trace_exit_label in
fixup_indirect_trace_exit, which might break code that assumes linear control
flow (such as translate.c). 
Either special support is needed for this trace_exit_label or alternative
schemes should be used that has a linear control.

Some complexities in this commit can be removed once we have #5062 
implemented and decode_fragment eliminated.

Co-authored-by: Kevin Zhou <Kevin.Zhou@arm.com>

Issues: #1569, #2974
@egrimley
Copy link
Contributor Author

An improved version of this has already been merged by #5045.

@egrimley egrimley closed this Nov 29, 2023
@egrimley egrimley deleted the i1569-trace branch November 29, 2023 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants