Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add performance counter readonly register to sim65 #2355

Open
jroweboy opened this issue Jan 14, 2024 · 5 comments
Open

[Feature Request] Add performance counter readonly register to sim65 #2355

jroweboy opened this issue Jan 14, 2024 · 5 comments

Comments

@jroweboy
Copy link

jroweboy commented Jan 14, 2024

A feature request for sim65, right now it only supports printing out the cycle count after program execution, but for performance testing, it would be much more handy if you could access the performance counter directly in the executing application.

Motivation

The challenge with performance testing with sim65 right now is there's no way to filter out any of the driver and setup code. Additionally, even if you have a minimal test driver, there's not a good way to do average performance metrics, or finding the minimum cycle count/maximum cycle count for code that will change over multiple runs.

Implementation

One way to achieve this without affecting any existing code would be to add a virtual file that the application can open with open with a name that is illegal/unlikely for a file to contain. For Linux, its not really possible to have such a filename, as the only illegal characters in filenames are / and \0 which wouldn't work, but its close enough to just use something illegal in windows like :.

The cycle counter virtual register can be an unsigned long long, and reading this will return the number of cycles since the start of the application. The bytes returned can be in little endian order, so if the user only needs 16 or 32 bits of counter, they will be able to just read 2 or 4 bytes from the virtual register instead of the full 8 bytes.

The file descriptor returned can be a negative number. Since this usually indicates error, on the very very very small chance the user has a file named :cyclecount that they actually intended to open, the negative return value should tip them off that something is up. The actual number for the FD could be a constant like -2 to disambiguate with the OS provided error. This is admittedly a bit counter intuitive to return an error, but this way it shouldn't conflict with any real file descriptors on the OS.

Attempting to write to the virtual register will return an error. I don't think its necessary to prevent opening the file with write mode enabled (as it appears to be the default for the open call).

Example

So with that in mind here's how I could see it working with a small example.

static unsigned long start_count;
static unsigned long end_count;
static unsigned long cycles;
static unsigned long min_cycles, max_cycles, avg_cycles, total_cycles;

int main() {
  // Open the virtual cycle count register
  int perf = open(":cyclecount");
  if (perf >= 0) {
    // Error! Expected virtual register
  }
  
  // Find out how much overhead the reads add.
  read(perf, (void*)&start_count, sizeof(start_count));
  read(perf, (void*)&end_count, sizeof(end_count));
  unsigned int overhead = end_count - start_count;
  
  // Run 100 samples of the task that we are measuring
  for (char i=0; i < 100; i++) {
    read(perf, (void*)&start_count, sizeof(start_count));
    // Do Task
    read(perf, (void*)&end_count, sizeof(end_count));
    // Calculate our perf metrics
    cycles = end_count - start_count - overhead;
    if (cycles < min_cycles) {
      min_cycles = cycles;
    }
    if (cycles > max_cycles) {
      max_cycles = cycles;
    }
    total_cycles += cycles;
  }
  // Finally print out the results
  avg_cycles = total_cycles / 100;
  printf("Estimated cycle count. Min: %ul Max: %ul Avg: %ul", min_cycles, max_cycles, avg_cycles);
}
@mrdudz
Copy link
Contributor

mrdudz commented Jan 15, 2024

The idea is good - but implementing it like this seems a bit overkill - I'd really have a "register" for this - we could reserve a couple bytes right before the hardware vectors for this (i was also thinking about putting a debug register there, as required by the VICE test bench, so we can run the CPU tests on sim65)

@jroweboy
Copy link
Author

I'd like that too, but I was concerned about a few things. If these are acceptable tradeoffs I am fine with either way

  1. backwards compat. If an existing application has code that accessed that address it would break.

  2. cc65 seems to not support a 64 bit int type, but the internal counter is 64bit, meaning it could be annoying to handle in c.

  3. you can only move 1 byte of the counter at a time, which makes reading it consistently a pain. As you are reading the counter it will be constantly changing so it will be non trivial to copy the value.

3b) alternatively it could be two registers, one to clear the read latch and freeze the cycle value and one to read one byte at a time. This is probably good enough but maybe a little bit more complex that originally planned.

Lemme know what your thoughts are, thanks for the response!

@mrdudz
Copy link
Contributor

mrdudz commented Jan 15, 2024

I'm not concerned about backwards compatibility at all (and i see sim65 mostly as a tool for testing cc65 anyway)

As for reading the counter, it could be done the same way it works in many I/O chips - eg reading the MSB would latch the value, all subsequent reads read the latched value.

Not sure if 32bit wouldn't be enough for any practical purposes either.... 2^32 cycles is already quite some time. I never needed more than 16bit for anything i did on 8bit machines :)

@polluks
Copy link
Contributor

polluks commented Jan 20, 2024 via email

@clbr
Copy link
Contributor

clbr commented Feb 29, 2024

Did you see the sim65 profiling PR? Seems like the profiling output would also solve your issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants