Byte-addressable vs word-addressable memory architectures and terminology #100

jreineckearm · 2024-03-08T16:58:26Z

Description

~~The recently introduced Bytes per Word setting isn't considered when calculating the values for the address column. This causes incorrect addresses to be displayed.~~

UPDATE: What was initially considered as a bug showed the need to consider byte-addressable memory architectures that use the term word for a data unit consisting of multiple directly addressable bytes. We need to ensure that

terminology used in the Memory Inspector either scales across byte- and word-addressable architectures. Or is easily understandable for users of both architecture types.
address calculations, for example for the address column, reflect this architecture characteristic correctly for both.

The solution may require the introduction of a new configuration option.

Below the original "bug reproducer" to allow understanding of the evolution of this Github issue.

How to reproduce:

Open Memory Inspector
Try different "Bytes per Word", "Words per Group", "Groups per Row" settings.
Check the address values in the left column.

Expected behavior

The addresses should be incremented by the effective number of Bytes displayed per previous row. At the moment, it looks like the "Bytes per Word" is not taken into account.

Environment

OS: Windows 10 Enterprise
Browser: N/A
Theia or VS Code Version: 1.87.0

Additional information

Correct address values:

Address values not taking "Bytes per Word" into account:

The text was updated successfully, but these errors were encountered:

colin-grant-work · 2024-03-08T17:31:03Z

@jreineckearm, shouldn't each address cover one word, so that the current behavior is correct? Since you've changed the bytes per word, but not the words/group or groups/row values, the number of words/row hasn't changed, and the address difference between rows should be the same.

jreineckearm · 2024-03-08T17:51:38Z

@colin-grant-work , interesting. I think we are hitting here a CPU architecture terminology ~~problem~~ challenge. :-)
In general, the Arm architecture (yes, I may be a little biased :-) ) works with byte-addressable memory. But uses the term word for example for the M-class flavors as a 32-bit unit. See for example: https://developer.arm.com/documentation/ddi0419/c/Application-Level-Architecture/ARM-Architecture-Memory-Model/Address-space
I realize only now that other CPU architectures can be word-addressable. This is may become more than just a quick calculation fix and require a little more thinking....

colin-grant-work · 2024-03-08T17:54:37Z

Yes... it's not very nice. I was thinking through some similar issues recently and found this sentence:

A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.

So we can't even technically use the term 'byte' with any definite meaning!

jreineckearm · 2024-03-08T18:06:13Z

So we can't even technically use the term 'byte' with any definite meaning!

Hm....I am sure we can come up with a good story around it. At the end of the day, each memory window instance is used in the context of a specific CPU architecture. But we may need to carefully think about how to feed in such a-priori-knowledge:
A DAP should know how to interpret an address based on the targeted CPU. So, it should be able to return the right amount of bits based on address and length (sorry, if wrong, don't have the MS DAP spec at hand). We just may need more meta-information about to how many bits/digits a byte refers. And if a CPU is byte- or word-addressable. All of that without the need to make a user an architecture expert to configure the window correctly for their targeted HW.
Let me think this a little through over the weekend (it's getting late over here in Germany). But I believe we are not that far off with what we have already now...

jreineckearm · 2024-03-08T18:07:50Z

Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit problem with your use cases already? Just curious where we can take shortcuts for now and tackle the additional complexity when the demand is there.

colin-grant-work · 2024-03-08T18:20:45Z

Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit problem with your use cases already?

No, byte != 8 bits hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.

jreineckearm · 2024-03-08T19:06:09Z

No, byte != 8 bits hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.

OK, thanks! Let me digest this a little. But I am confident we'll find a solution for this.

@colin-grant-work , @thegecko , @planger , could one of you please remove the bug label? I think this is no longer a defect but an enhancement.

colin-grant-work · 2024-03-08T19:37:59Z

This probably fits together with the issues raised in #77, as well.

jreineckearm · 2024-03-08T20:24:08Z

This probably fits together with the issues raised in #77, as well.

Yes, it potentially does.

Thinking this further through, you are absolutely right with this statement:

Though I'm not sure that those aren't just paraphrasings of the same problem.

I believe what we are really facing is a "just" a terminology problem. And we may get this solved by "just" updating option names and descriptions. This would then be closer to the Theia Memory Inspector options which I now understand better by this exercise here. The implemented logic wouldn't need to change.

The problem we tried to solve with adding the "Bytes per Word" option was to express what, I believe to understand now, the group was supposed to express. In terms of an M-profile Arm CPU, the group would be one of BYTE (8-bit), H(alf)WORD (16-bit), WORD (32-bit). Other Arm architecture profiles have comparable terminology and include 64-bit representations.

[Action] What we effectively need to change is the introduced "Bytes per Word" to a "Bytes per Unit" (or similar). Where "Unit" is the minimum addressable unit. Open for suggestions for a better name.

[Action] "Words per Group" would change to "Units per Group". Though we may want to rename "Group" to something expressing the above relation more easily understandable for users of our existing debug tools. No good idea yet, but I'll keep thinking about this. Again, open for suggestions.

The rest would stay the same.

We could consider additionally to change the "Bytes per to Unit" to a "Bits per Unit". But how realistic would it be today to that there are units that are not a multiple of 8 Bits? My worry really is the involved effort to make that change and the resulting testing.

What do you think?

jreineckearm · 2024-03-15T13:34:50Z

@colin-grant-work , I spent a bit more time thinking about this topic and discussed this with others internally.
And I still believe that this is "only" a terminology problem.

Current situation

We have the following options (I leave Groups per Row out of the equation, it's not relevant for the problem):

Bytes per Word: A size in Bytes of the finest granular data unit in memory. The number of bytes returned for a single address.
Words per Group: Number of above words combined into a value. Groups are separated by spaces. Endianess affects the order of those words in the group display.

For byte-addressable machines, a word as per above terminology is always 1 byte.
For word-addressable machines, a word as per above terminology is a value greater than 1 byte. Unless an architecture specifies the meaning of word to be one byte.

The term word can have various meanings across architectures. In the Arm architecture, it for example means the natural data word size for CPU instructions. For Arm M-profile CPUs (32-bit), it is 4 bytes. However, there also exist instructions working on other sized data units like bytes or half-words. Which is possible because of the byte-addressable character.

Due to the ambiguity that I see for the term word, I don't think we can use it in our displays/documentation except from descriptions of specific examples and how they map to what we have.

Proposal

I have explored various alternatives for word:

unit: IMO too generic and not meaningful enough.
addressable unit: A little better, but still leaves room for interpretation.
memory cell: Sounds a little too "physical" for SW developers which is our main user base.
minimum addressable unit: IMO the most scalable, yet still specific term I can think of. And it covers the address aspect. I've seen it being used in software APIs for the very same purpose before. Though the name is a bit lengthy for option names.
MAU: Short form of the above.

I believe MAU could work with appropriate tooltips, in-tool descriptions, and documentation. It would leave us with renamed options:

Bytes per MAU: Number of bytes of a Minimum Addressable Unit in memory. Always 1 for byte-addressable architectures. Values greater than 1 for word-addressable architectures.
MAUs per Group: Number of MAUs to form a value representation. Displayed groups are separated by spaces. Endianess affects the order of MAUs within a displayed group.

Is this something that would work for your use cases as well? I sense there is more to discuss from this comment: #108 (comment)

The resulting actions to make this change are:

Update the option names.
Add tooltips to the options to explain its meaning in place.
Update description texts.
Describe this in any future documentation and explain how this would map to different architecture types (address vs word addressing).
Update source code to replace the mentioning of word with mau and similar.

jreineckearm · 2024-03-19T19:54:18Z

@colin-grant-work , @planger , thanks again for joining today's call! This was a useful discussion to agree on a couple of topics.
Below, the essence of the call and decisions. Please correct or amend if I missed to write something down. And apologies if this is partly repeating information from above. I'll update other PRs and Issues accordingly.

We agreed on using the term byte as a data unit of 8 bit. We revisit once we hit use cases where this assumption no longer holds.
We agreed to replace the term word with Minimum Addressable Unit (MAU). This helps to abstract away the meaning of word which can mean different things across CPU/memory architectures.
- This change is accompanied by in-tool help texts/tooltips and documentation of its meaning.
We agreed to keep the term group for a grouped number of MAUs. While the term group is rather generic, it is a means to transform a series of MAUs into a hexadecimal value that considers the selected endianess.
- This a temporary step. In future we'd like to replace this with a more meaningful value display selection like "char", "short", "int", "long", "float", "double" etc. To be discussed whether to have this built-in and/or a selection that is customizable from a debug adapter when the time comes.
We keep the approach of applying endianess on group level.
We keep the approach of editing memory contents based on groups. Further investigation on enhancing in future with an "edit-by-selection" in the sense of selecting a range via mouse and turn the selection into an in-place edit.
- Reconsider/refine this approach if we receive strong/convincing user feedback.

Resulting actions

1. Update the use of word with MAU (and mau, respectively) in both user visible text (options) and source code. Also add tooltips/descriptions to clarify the meaning of MAU within the tool.
- @planger , as discussed, could please your team pick this up? This would be a new PR.
2. Update the README with a terminology section to clarify the most important terms we agreed on: byte, mau, group. This shall be complemented by a screenshot which highlights each of the three in a specific setup.
- @jreineckearm : happy to provide a draft myself and add as a comment/suggestion to PR Improve readme and CDT Cloud update logo #109 .
3. Continue with the approach for editing memory contents as started with Allow to write memories within the memory-table #108 . I'll comment there separately.
4. @colin-grant-work agreed to investigate further a potential edit-by-selection approach. Probably best to capture in a separate issue. See Memory Table Editability #110
5. @jreineckearm to create a new issue for replacing group with a more meaningful value display selection. See Review definition and meaning of "Group" #114

We agreed on making a new Memory Inspector release after items 1. - 3. have been addressed (FYI for @thegecko )

eclipse-cdt-cloud#100

jreineckearm added the bug Something isn't working label Mar 8, 2024

jreineckearm changed the title ~~Values calculation for address column broken~~ Value calculation for address column does not consider 'Bytes per Word' Mar 8, 2024

jreineckearm changed the title ~~Value calculation for address column does not consider 'Bytes per Word'~~ Consider byte-addressable vs word-addressable memory architectures Mar 8, 2024

colin-grant-work added enhancement New feature or request and removed bug Something isn't working labels Mar 8, 2024

jreineckearm mentioned this issue Mar 15, 2024

Allow to write memories within the memory-table #108

Merged

1 task

jreineckearm changed the title ~~Consider byte-addressable vs word-addressable memory architectures~~ Byte-addressable vs word-addressable memory architectures and terminology Mar 19, 2024

jreineckearm mentioned this issue Mar 19, 2024

Improve readme and CDT Cloud update logo #109

Merged

1 task

planger added a commit to eclipsesource/vscode-memory-inspector that referenced this issue Mar 21, 2024

Rename word to MAU and add link to settings docs

61b4b98

eclipse-cdt-cloud#100

planger added a commit to eclipsesource/vscode-memory-inspector that referenced this issue Mar 21, 2024

Rename word to MAU and add link to settings docs

10c186e

eclipse-cdt-cloud#100

planger mentioned this issue Mar 21, 2024

Rename word to MAU and add link to settings docs #111

Merged

1 task

planger closed this as completed Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Byte-addressable vs word-addressable memory architectures and terminology #100

Byte-addressable vs word-addressable memory architectures and terminology #100

jreineckearm commented Mar 8, 2024 •

edited

Loading

colin-grant-work commented Mar 8, 2024 •

edited

Loading

jreineckearm commented Mar 8, 2024 •

edited

Loading

colin-grant-work commented Mar 8, 2024

jreineckearm commented Mar 8, 2024

jreineckearm commented Mar 8, 2024 •

edited

Loading

colin-grant-work commented Mar 8, 2024

jreineckearm commented Mar 8, 2024

colin-grant-work commented Mar 8, 2024 •

edited

Loading

jreineckearm commented Mar 8, 2024

jreineckearm commented Mar 15, 2024

jreineckearm commented Mar 19, 2024 •

edited by planger

Loading

Byte-addressable vs word-addressable memory architectures and terminology #100

Byte-addressable vs word-addressable memory architectures and terminology #100

Comments

jreineckearm commented Mar 8, 2024 • edited Loading

colin-grant-work commented Mar 8, 2024 • edited Loading

jreineckearm commented Mar 8, 2024 • edited Loading

colin-grant-work commented Mar 8, 2024

jreineckearm commented Mar 8, 2024

jreineckearm commented Mar 8, 2024 • edited Loading

colin-grant-work commented Mar 8, 2024

jreineckearm commented Mar 8, 2024

colin-grant-work commented Mar 8, 2024 • edited Loading

jreineckearm commented Mar 8, 2024

jreineckearm commented Mar 15, 2024

Current situation

Proposal

jreineckearm commented Mar 19, 2024 • edited by planger Loading

jreineckearm commented Mar 8, 2024 •

edited

Loading

colin-grant-work commented Mar 8, 2024 •

edited

Loading

jreineckearm commented Mar 8, 2024 •

edited

Loading

jreineckearm commented Mar 8, 2024 •

edited

Loading

colin-grant-work commented Mar 8, 2024 •

edited

Loading

jreineckearm commented Mar 19, 2024 •

edited by planger

Loading