added google workload to champsim trace converter #379
## About Wrapper Script

### Google Workload Trace Format
Google Workload traces are stored as records, each record being of type `trace_entry_t` and size 12 bytes. Following is the structure of `trace_entry_t`:
For the conversion, we would highly recommend using our libraries which read the disk format into a record format suitable for analysis and simulation (the memref_t type). The disk format is subject to change without considering that a compatibility change, as it is not meant to be a public relied-upon interface. Furthermore, there are numerous complexities built into the disk format, such as including instruction encodings or physical address values only the first time observed, which are abstracted away by the libraries. Trying to directly convert the disk format would require duplicating the handling of those complexities. Best to work at the recommended and documented higher level for simplicity and to avoid breaking changes.
Thus, we would request to not take this approach and to instead create an analysis tool in our framework and convert from the memref_t level. There are numerous changes at the disk level that have been made in the last year which break this converter as-is: it won't work on today's traces.
Going further, for our user-mode software-thread-oriented traces, we would recommend building a dynamic converter from memref_t, rather than an offline static converter, for use with our dynamic scheduler, which re-schedules our traces onto the simulated virtual cores. This solves several issues: it allows mixing multiple workloads to model multi-tenant datacenters while still keeping the traces per-workload; it deflates the context-switch inflation caused by tracing overhead; and it (will) provide some functionality typically missing from trace-based simulation, such as speculative path support by pulling instructions from prior points in the trace or using other schemes.
What do you suggest we do instead? Would this fall under the category of analysis tools, as described here? Or is there a different way to get to a memref_t?
Yes, an analysis tool. If ChampSim files are per-software-thread, a parallel tool with software threads as the parallel shards would work. You could start with an offline conversion to ChampSim files in the analysis tool. The same analysis tool code could be run with the shards as virtual cores (possibly with a dynamic reschedule of the software threads) to produce per-hardware-thread trace files.
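To make the per-software-thread idea concrete, here is a minimal sketch of a converter that keeps one shard of state per thread and emits ChampSim-style records. It does not use the real drmemtrace headers: `sample_record` is a hypothetical, heavily simplified stand-in for `memref_t`, `champsim_record` is a local copy of the layout quoted in this PR, and `per_thread_converter` is an illustrative name. It assumes each instruction record is followed by that instruction's data accesses on the same thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for a decoded trace record (NOT memref_t).
struct sample_record {
  std::int64_t tid;    // software thread id
  bool is_instr;       // true: instruction fetch, false: data access
  bool is_write;       // only meaningful for data accesses
  std::uint64_t addr;  // pc for instructions, data address otherwise
};

// Local stand-in mirroring the ChampSim record layout quoted in this PR.
struct champsim_record {
  unsigned long long ip = 0;
  unsigned char is_branch = 0;
  unsigned char branch_taken = 0;
  unsigned char destination_registers[2] = {};
  unsigned char source_registers[4] = {};
  unsigned long long destination_memory[2] = {};
  unsigned long long source_memory[4] = {};
};

// One shard per software thread; records from different threads never mix.
class per_thread_converter {
public:
  void process(const sample_record& rec) {
    shard& s = shards_[rec.tid];
    if (rec.is_instr) {
      flush(s);  // the previous instruction on this thread is now complete
      s.pending = champsim_record{};
      s.pending.ip = rec.addr;
      s.has_pending = true;
      s.num_src = 0;
      s.num_dst = 0;
    } else if (s.has_pending) {
      // Attach the data access to the pending instruction, dropping any
      // accesses beyond the fixed number of slots in the record.
      if (rec.is_write) {
        if (s.num_dst < 2) s.pending.destination_memory[s.num_dst++] = rec.addr;
      } else {
        if (s.num_src < 4) s.pending.source_memory[s.num_src++] = rec.addr;
      }
    }
  }

  // Emit the last pending instruction of every thread.
  void finish() {
    for (auto& kv : shards_) flush(kv.second);
  }

  const std::vector<champsim_record>& output(std::int64_t tid) const {
    return shards_.at(tid).out;
  }

private:
  struct shard {
    champsim_record pending;
    bool has_pending = false;
    std::size_t num_src = 0;
    std::size_t num_dst = 0;
    std::vector<champsim_record> out;
  };

  static void flush(shard& s) {
    if (s.has_pending) {
      s.out.push_back(s.pending);
      s.has_pending = false;
    }
  }

  std::map<std::int64_t, shard> shards_;
};
```

In a real analysis tool the per-shard state would presumably live behind the framework's parallel shard hooks, and each shard would write to its own output file rather than to an in-memory vector.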

### Conversion

Converting non-branch instructions to the ChampSim format is straightforward: we copy the instruction pointer value and set `is_branch=0` and `branch_taken=0`.
The drmemtrace records now have different types for taken or untaken conditional branches.
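As a rough illustration of what separate taken and untaken record types mean for the converter, the sketch below maps an instruction category to the `is_branch` and `branch_taken` fields. The `instr_kind` enum is a hypothetical stand-in, not the real drmemtrace type enum, and the `conditional_jump` case (a conditional branch whose direction is not encoded in the type) is an assumption about older traces in which the caller supplies its own guess.

```cpp
#include <cassert>

// Hypothetical stand-in for branch-related instruction categories.
// This is NOT the real drmemtrace enum; names are illustrative only.
enum class instr_kind {
  non_branch,
  direct_jump,
  indirect_jump,
  direct_call,
  indirect_call,
  ret,
  conditional_jump,  // assumed legacy case: direction not encoded in the type
  taken_jump,        // conditional branch that was taken
  untaken_jump       // conditional branch that was not taken
};

struct branch_flags {
  unsigned char is_branch;
  unsigned char branch_taken;
};

// Map an instruction category to the two ChampSim branch fields.
// legacy_taken_guess is only consulted for the undifferentiated
// conditional_jump case, where the caller must infer the direction.
inline branch_flags to_branch_flags(instr_kind kind, bool legacy_taken_guess = false) {
  switch (kind) {
  case instr_kind::non_branch:
    return {0, 0};
  case instr_kind::untaken_jump:
    return {1, 0};
  case instr_kind::conditional_jump:
    return {1, static_cast<unsigned char>(legacy_taken_guess ? 1 : 0)};
  default:
    // Unconditional jumps, calls, returns, and taken conditionals.
    return {1, 1};
  }
}
```

With distinct taken and untaken categories, the converter would not need to compare a branch against the address of the following instruction to recover its direction.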

`./a.out <trace>`

`<trace>` should be a Google workload trace file in compressed form with a `.gz` extension. Google workload traces can be obtained [here](https://dynamorio.org/google_workload_traces.html).
drmemtrace traces are now in .zip format. Sample traces are in the https://github.com/DynamoRIO/drmemtrace_samples repository.
This probably still needs work before it can be merged in. I think Derek's comments are important, and I also have some concerns about the ChampSim trace that is produced.
If we want to, we can merge this into its own branch and create a series of PRs against that. It would allow us to iterate on this over time.

#define REG_STACK_POINTER 6
#define REG_FLAGS 25
#define REG_INSTRUCTION_POINTER 26

using namespace std;

#define NUM_INSTR_DESTINATIONS 2
#define NUM_INSTR_SOURCES 4

#define READ_BUFF_SIZE 12 * 1024
#define WRITE_BUFF_SIZE 64 * 1024

typedef struct trace_instr_format
{
    unsigned long long int ip; // instruction pointer (program counter) value

    unsigned char is_branch; // is this branch
    unsigned char branch_taken; // if so, is this taken

    unsigned char destination_registers[NUM_INSTR_DESTINATIONS]; // output registers
    unsigned char source_registers[NUM_INSTR_SOURCES]; // input registers

    unsigned long long int destination_memory[NUM_INSTR_DESTINATIONS]; // output memory
    unsigned long long int source_memory[NUM_INSTR_SOURCES]; // input memory
} trace_instr_format_t;
We should reference the trace format that is already in the codebase:
#include "../inc/trace_instruction.h"
using namespace std;
#define READ_BUFF_SIZE 12 * 1024
#define WRITE_BUFF_SIZE 64 * 1024
using trace_instr_format_t = input_instr;
This reduces the possibility for bugs that come from the formats being slightly different. You might need to reference the special registers with `champsim::REG_STACK_POINTER`.
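To illustrate how a silent format mismatch can be caught, the sketch below checks the layout of a record type at compile time. `input_instr_sketch` is a local stand-in that copies the field list quoted in this PR; it is not ChampSim's actual `input_instr`, and the expected size and offsets assume a typical 64-bit ABI where `unsigned long long` is 8 bytes with 8-byte alignment.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in copying the field list quoted in this PR (an assumption;
// the authoritative definition is the one in the ChampSim codebase).
struct input_instr_sketch {
  unsigned long long ip;
  unsigned char is_branch;
  unsigned char branch_taken;
  unsigned char destination_registers[2];
  unsigned char source_registers[4];
  unsigned long long destination_memory[2];
  unsigned long long source_memory[4];
};

// 8 (ip) + 8 (flag and register bytes) + 16 + 32 = 64 bytes on common ABIs.
static_assert(sizeof(input_instr_sketch) == 64,
              "record size differs from the expected on-disk size");
static_assert(offsetof(input_instr_sketch, destination_memory) == 16,
              "register bytes are not packed as expected");
static_assert(offsetof(input_instr_sketch, source_memory) == 32,
              "memory operand arrays are not laid out as expected");
```

Aliasing the shared type, as suggested above, makes checks like these unnecessary in the converter itself, because there is then only one definition to keep correct.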
cs.source_registers[1] = REG_INSTRUCTION_POINTER; // reads_ip
cs.destination_registers[0] = REG_STACK_POINTER; // writes_sp
cs.destination_registers[1] = REG_INSTRUCTION_POINTER; // writes_ip
cs.source_registers[2] = 99; // reads_other
I'm concerned that instructions do not get any register information unless they're branches. It implies that there are no data dependencies in the trace, which is almost certainly not true. Furthermore, all indirect branches have data dependencies on each other (on register 99), which could lead to some strange results.
I guess this is a limitation of the Google traces: they do not include this information.
If we do not have this information, I'm not sure we can move forward converting the Google traces to ChampSim traces. Memory timing is extremely important, and register information is the way that ChampSim generates timing.
Regular drmemtrace traces have instruction encodings with this information so a converter is feasible there. The version 1 release of the Google Workload traces had that information stripped out, and most core studies then had to perform limit studies assuming full deps vs zero deps. We are working on a version 2 release of the Google workloads where we hope to get approval to release information on dependencies, though likely not full encodings.