Memory pattern configurability #6

Closed
rbertran opened this issue Sep 30, 2019 · 4 comments

@rbertran
Collaborator

From @rgokulsm's comment on issue #5:

I think it might be a useful feature if each loop iteration had some configurability. For example, if the memory addresses could be changed from one iteration to another (say, as some function of the iteration number), this might be useful in stressing the prefetcher. If you think this would be a useful use case, then maybe adding a pass to enable this might be a better option than doing so directly from the wrapper?

There are already various passes that control the memory access pattern. All of them rely on an endless-loop behavior so that the requested access pattern can be generated over time. The two main passes are the following (there are others for more specific use cases):

  • GenericMemoryModelPass: Given the definition of the uarch details (e.g. associativity, sets, cache size, hierarchy, etc.), one can specify that 20% of accesses go to L1, 30% of accesses go to L2 and 50% of accesses go to L3. The pass generates the required addresses to guarantee that behavior (or similar).
  • GenericMemoryStreamsPass: Given a list of memory access streams (each with its size, stride access pattern, and weight), the pass generates the addresses according to the streams defined. E.g. 1 stream of length 10K with a stride of 256 bytes, and 2 streams of length 4MB with a stride of 16 bytes, each stream with the same weight. The generated code will traverse/access three different memory regions using those stride patterns, and whenever the end of a stream is reached, the access stream starts again from the beginning.

I am not sure about the level of support for these passes in the RISCV port, but if you let me know which type of patterns you'd like to generate, I can prioritize that effort.
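To make the stream example concrete, here is a minimal sketch in plain Python of the address sequence that such a configuration describes. It is not Microprobe code and does not use the pass's API: the function names, the base addresses, and the round-robin interleaving used to model equal weights are all assumptions made for illustration.

```python
from itertools import islice


def stream_addresses(base, size, stride):
    """Yield base, base+stride, ... and wrap to base at the end of the region."""
    offset = 0
    while True:
        yield base + offset
        offset = (offset + stride) % size


def interleave(streams):
    """Round-robin over the streams (models equal weights; an assumption)."""
    while True:
        for stream in streams:
            yield next(stream)


# Hypothetical base addresses; sizes and strides follow the example above.
streams = [
    stream_addresses(0x10000, 10 * 1024, 256),         # 10K region, stride 256
    stream_addresses(0x100000, 4 * 1024 * 1024, 16),   # 4MB region, stride 16
    stream_addresses(0x800000, 4 * 1024 * 1024, 16),   # 4MB region, stride 16
]

# First six accesses: one from each stream, then each stream advances.
trace = list(islice(interleave(streams), 6))
print([hex(a) for a in trace])
```

Each stream restarts from its base address once its region has been fully traversed, which is the wrap-around behavior described in the second bullet.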

@rbertran rbertran added the question Further information is requested label Sep 30, 2019
@rbertran rbertran self-assigned this Sep 30, 2019
@rgokulsm

Oh okay, thanks! I have used 'GenericMemoryStreamsPass' with RISCV and it works. I haven't tried this with the looping aspect enabled. I will try that out and update.

Thanks!

@rgokulsm

rgokulsm commented Oct 3, 2019

Thanks, I am able to use the memory stream pass ('GenericMemoryStreamsPass' with RISCV) with the endless loop enabled. It does seem to me, though, that in every iteration of the loop the addresses are repeated from the beginning of the stream.

For example:

/* Building block start */
first:FLD f0, 0x0(x2) /* Address: Address((char) ST_4096_4_0[16388]+0x0000000000000000) */
/* Loop start */
FADD.S f2, f1, f0
FLD f3, 0x4(x2) /* Address: Address((char) ST_4096_4_0[16388]+0x0000000000000004) */
LW x0, 0x8(x2) /* Address: Address((char) ST_4096_4_0[16388]+0x0000000000000008) */

In the code above, I think the LW instruction in the loop always loads from the same address. In other words, there does not seem to be any 'per-loop-iteration' configurability. This might be the intended objective (i.e. that the iterations of the loop are exactly the same), but I guess having some per-iteration configurability might be useful in creating some streaming use cases. Let me know what you think and/or if I'm missing anything.
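The difference between the two behaviors can be sketched with a small model in plain Python. It is not Microprobe code: `loop_addresses`, the base address `0x1000`, and the `step` parameter are hypothetical names and values chosen for illustration. The model assumes the loop body uses fixed displacements from a base register, as in the listing above, and that `step` is whatever amount would be added to that register at the end of each iteration.

```python
def loop_addresses(base, offsets, iterations, step=0, region_size=4096):
    """Return the addresses touched in each iteration of a loop body.

    base        -- hypothetical initial value of the base register
    offsets     -- fixed displacements used by the loads in the loop body
    step        -- amount added to the base register per iteration
                   (0 models a loop that repeats the same addresses)
    region_size -- size of the memory region; the advance wraps at its end
    """
    trace = []
    for i in range(iterations):
        advance = (step * i) % region_size
        trace.append([base + advance + off for off in offsets])
    return trace


# The two loads inside the loop use displacements 0x4 and 0x8.
fixed = loop_addresses(0x1000, [0x4, 0x8], iterations=3)
streaming = loop_addresses(0x1000, [0x4, 0x8], iterations=3, step=8)

print(fixed)      # identical addresses in every iteration
print(streaming)  # addresses advance by 8 bytes per iteration
```

With `step=0` every iteration touches the same two addresses, which matches the reading of the listing given above. A non-zero `step` yields the streaming pattern that per-iteration configurability would provide.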

@rbertran
Collaborator Author

rbertran commented Oct 3, 2019

@rgokulsm without more information I cannot debug or check what is going on. Can you provide the actual set of passes and parameters that you are using?

@rgokulsm

rgokulsm commented Oct 3, 2019

Ramon, I just realized I'm using the 'SingleMemoryStreamPass' and not the 'GenericMemoryStreamsPass'. Not sure if that has anything to do with it. My apologies, I will test the latter and get back to you.

The code I'm running at the moment is:
```python
# Wrapper with the endless loop enabled
cwrapper = get_wrapper('RiscvTestsP')
synth = Synthesizer(
    self.target,
    cwrapper(endless=True),
    value=0b01010101,
)

# Passes are applied in the order listed
passes = [
    structure.SimpleBuildingBlockPass(self.args.loop_size),
    initialization.InitializeRegistersPass(),
    initialization.InitializeRegistersPass(v_value=(1.000000000000001, 64)),
    instruction.SetInstructionTypeByProfilePass(thisdict),
    address.UpdateInstructionAddressesPass(),
    memory.SingleMemoryStreamPass(4096, 4),
    branch.BranchNextPass(),
    register.DefaultRegisterAllocationPass(dd=1),
    address.UpdateInstructionAddressesPass(),
]

for p in passes:
    synth.add_pass(p)

bench = synth.synthesize()
```
