Photon Batching #78
Log with different index
Change propagation kernel
The problem occurred because .build always reset the hostBuffer to its default value. An autoReset flag is now available on each CLObject and is set to False by default. There was also another problem: when the last batch had fewer photons than the number of work units, a work unit with no photons tried to access unallocated memory.
Taking the ceiling of the number of photons per work unit made the propagation kernel believe there were more photons than there actually were. For example, with 12 photons to propagate on 10 cores, np.ceil gave 2 photons per work unit, i.e. 20 photons, 8 of which did not exist.
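To make the 12-photons-on-10-cores case concrete, here is a minimal Python sketch (not necessarily the fix used in this PR) that splits a batch over work units without inventing photons: each unit gets floor(N / W) photons and the first N mod W units get one extra. The function name photonsPerWorkUnit is hypothetical.

```python
import numpy as np


def photonsPerWorkUnit(photonCount: int, workUnitCount: int) -> np.ndarray:
    """Split photonCount as evenly as possible over workUnitCount work units.

    The first (photonCount % workUnitCount) units receive one extra photon, so the
    counts always sum to photonCount and no unit is handed a photon that does not exist.
    """
    base, remainder = divmod(photonCount, workUnitCount)
    counts = np.full(workUnitCount, base, dtype=np.int32)
    counts[:remainder] += 1
    return counts


# Ceiling approach from the description: 2 photons per unit, 20 in total for 12 real photons.
ceilTotal = int(np.ceil(12 / 10)) * 10
print(ceilTotal)                           # 20

counts = photonsPerWorkUnit(12, 10)
print(counts.tolist())                     # [2, 2, 1, 1, 1, 1, 1, 1, 1, 1]
print(int(counts.sum()))                   # 12

# Fewer photons than work units: some units get 0 and must not touch memory.
print(photonsPerWorkUnit(5, 10).tolist())  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

The last case is the situation from the description where a work unit has no photons; in a kernel, such a unit would need to return early instead of reading the photon buffer.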
Nice. The OpenCL dev will be mostly done once we merge this with IntersectionFinder and write some form of auto batching parameters.
The ability to send more photons will put a bigger load on the logger parsing / binning. Can't wait to see how this will perform with the IntersectionFinder and the interaction key logger parsing.
Since we have OpenCL intersections, it is time to create a more realistic tissue to test performance. This infinite tissue propagates photons within a 10 cm radius, which seems large for our typical applications.
while photonCount < self._N:
    logger = DataPointCL(size=params.maxLoggableInteractions)
Shouldn't we be using the autoReset behavior instead of creating a new logger object?
yes
But the size of the logger might vary.
I guess for now it isn't too bad if the last batch is bigger than necessary.
dataPointSize = 16
photonSize = 64
seedSize = 4
materialSize = 32
These have changed in other branches and can change again in the future, so they are hard to maintain here. We should consider getting this info from the CLObject itself.
yes
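As a rough illustration of the suggestion above, the byte sizes could be derived from a NumPy dtype owned by each object rather than hardcoded. The field names and the HypotheticalCLObject class are assumptions; only the 16-byte and 4-byte totals come from the diff. A NumPy dtype does not know OpenCL's alignment rules (for example, a float3 occupies 16 bytes), so any real layout would still have to be checked against the kernel structs.

```python
import numpy as np

# Hypothetical layouts: field names are illustrative; only the byte totals
# (16 and 4) come from the hardcoded constants in the diff.
DATA_POINT_DTYPE = np.dtype(
    [("deltaWeight", np.float32), ("x", np.float32), ("y", np.float32), ("z", np.float32)]
)
SEED_DTYPE = np.dtype(np.uint32)


class HypotheticalCLObject:
    """Minimal stand-in for a CLObject that owns its element layout."""

    def __init__(self, dtype, size: int):
        self._dtype = np.dtype(dtype)
        self._size = size

    @property
    def itemSize(self) -> int:
        # Bytes per element, derived from the layout instead of a magic number.
        return self._dtype.itemsize

    @property
    def nbytes(self) -> int:
        # Total buffer size in bytes.
        return self.itemSize * self._size


logger = HypotheticalCLObject(DATA_POINT_DTYPE, size=1000)
seeds = HypotheticalCLObject(SEED_DTYPE, size=100)
print(logger.itemSize, seeds.itemSize)  # 16 4
print(logger.nbytes)                    # 16000
```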
@property
def workItemAmount(self):
    return np.int32(self._workItemAmount)
These type conversions are very implicit and all over the place. I hope CLProgram can soon take this responsibility.
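One possible shape for gathering these conversions in a single place (for example inside CLProgram) is a table of declared kernel scalar types plus a helper that casts arguments against it. KERNEL_SCALAR_TYPES and toKernelScalars are hypothetical names, and apart from workItemAmount the argument names are illustrative only; this is not CLProgram's actual API.

```python
import numpy as np

# Hypothetical: declared C type of each scalar kernel argument.
KERNEL_SCALAR_TYPES = {
    "workItemAmount": np.int32,
    "photonAmount": np.int32,
    "weightThreshold": np.float32,
}


def toKernelScalars(**kwargs):
    """Cast Python scalars to the NumPy types the kernel expects, in one place."""
    converted = {}
    for name, value in kwargs.items():
        if name not in KERNEL_SCALAR_TYPES:
            raise KeyError(f"No declared kernel type for argument '{name}'")
        converted[name] = KERNEL_SCALAR_TYPES[name](value)
    return converted


args = toKernelScalars(workItemAmount=100, weightThreshold=1e-4)
print(type(args["workItemAmount"]).__name__, type(args["weightThreshold"]).__name__)  # int32 float32
```

Failing loudly on an undeclared argument keeps the conversion explicit instead of silently passing a Python int or float to the kernel.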
@@ -17,10 +17,10 @@
tissue = InfiniteTissue(material=ScatteringMaterial(30, 0.1, 0.8, 1.4))
-source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=10000, useHardwareAcceleration=True)
+source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=100000, useHardwareAcceleration=True)
100k is a bit too much for an example. It crashes on my PC.
if self._autoReset:
    self._initializeHostBuffer()
This won't work: _initializeHostBuffer only returns the array without assigning it. Maybe rename it to _initialHostBuffer(), then properly set hostBuffer before returning the array (similar problem on line 43).
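A minimal sketch of the suggestion above, assuming build() is where the reset happens: _initialHostBuffer() only creates a fresh array, and build() assigns it to the stored buffer before returning it. HostBufferSketch is a hypothetical stand-in for a CLObject, and float32 zeros are an assumed default value.

```python
import numpy as np


class HostBufferSketch:
    """Hypothetical CLObject-like holder showing the suggested autoReset fix."""

    def __init__(self, size: int, autoReset: bool = False):
        self._size = size
        self._autoReset = autoReset
        self._hostBuffer = None

    def _initialHostBuffer(self) -> np.ndarray:
        # Only creates and returns a fresh array; it does not store it.
        return np.zeros(self._size, dtype=np.float32)

    def build(self) -> np.ndarray:
        # Store the fresh array before returning it. Calling an initializer
        # that only returns a value would leave the old buffer untouched.
        if self._hostBuffer is None or self._autoReset:
            self._hostBuffer = self._initialHostBuffer()
        return self._hostBuffer

    @property
    def hostBuffer(self) -> np.ndarray:
        return self._hostBuffer


kept = HostBufferSketch(4, autoReset=False)
kept.build()[0] = 5.0
print(kept.build()[0])   # 5.0, data survives a rebuild

reset = HostBufferSketch(4, autoReset=True)
reset.build()[0] = 5.0
print(reset.build()[0])  # 0.0, buffer is reinitialized on every build
```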
def testWhenGet3DScatterOfSurface_shouldReturnScatterOfAllPointsLeavingTheSurface(self):
    frontScatter = self.stats._get3DScatter(solidLabel="cube", surfaceLabel="front")
    self.assertEqual(0, frontScatter.size)
    self.assertEqual(0, frontScatter.length)
Is this change intentional?
no
Photon Batching
The main advantage of the GPU is a shorter computation time.
With reasonable computation times achieved, one may want to increase the number of photons propagated in the scene.
Because memory on the CPU/GPU is limited, this was not possible in a single batch. With batching, photons can now be propagated until your device's VRAM or storage memory is full.
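The batching idea can be sketched as a loop in the spirit of the while photonCount < self._N loop from the diff: keep sending batches until all N photons are propagated, with the last batch carrying only the remainder. propagateInBatches and its propagateBatch callback are hypothetical; the callback stands in for the per-batch work (resetting the logger, uploading photons, running the kernel, reading back the data).

```python
from typing import Callable, List


def propagateInBatches(totalPhotons: int, photonsPerBatch: int,
                       propagateBatch: Callable[[int], None]) -> List[int]:
    """Call propagateBatch once per batch and return the batch sizes used."""
    if photonsPerBatch <= 0:
        raise ValueError("photonsPerBatch must be positive")
    sizes = []
    photonCount = 0
    while photonCount < totalPhotons:
        # The last batch only carries the remaining photons.
        size = min(photonsPerBatch, totalPhotons - photonCount)
        propagateBatch(size)
        sizes.append(size)
        photonCount += size
    return sizes


print(propagateInBatches(50_000, 10_000, lambda n: None))  # [10000, 10000, 10000, 10000, 10000]
print(propagateInBatches(12_345, 5_000, lambda n: None))   # [5000, 5000, 2345]
```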
New Objects
CLParameters: a class that stores the parameters of a batch and automatically computes other significant values via properties.
New logic
The main changes are in the CLPhotons().propagate() method and in the propagate kernel of propagation.c.
CLPhotons provides the batching functionality in its propagate method. It manages the logger buffer reset, sets the number of work items, changes the size of the buffers to be sent, and captures the data.
For the developer
The batching parameters are not set automatically yet; for now, the user/developer must choose them.
Three parameters must be given when creating a CLParameters instance:
- LoggerMemory: the memory to allocate on the device, in bytes (usually > 1 GB).
- WorkItem amount: the number of work items, i.e. the number of threads launched on the device. The ideal number of work items is believed to be the number of logical cores, but this has not always held in our experience.
- Total amount of photons per batch: the number of photons sent to the device in each batch.
The WorkItem amount mostly influences speed.
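A minimal sketch of what CLParameters could look like, assuming it takes the three inputs above in the order used in the example (loggerMemory, workItemAmount, photonAmount) and a fixed 16-byte data point. Apart from maxLoggableInteractions and workItemAmount, which appear in the diff, the names here are hypothetical.

```python
import numpy as np

DATA_POINT_SIZE = 16  # bytes per logged interaction (value from the diff, assumed fixed)


class CLParametersSketch:
    """Hypothetical stand-in for CLParameters: stores the three batch inputs
    and derives other values through properties."""

    def __init__(self, loggerMemory, workItemAmount, photonAmount):
        self._loggerMemory = int(loggerMemory)  # bytes
        self._workItemAmount = int(workItemAmount)
        self._photonAmount = int(photonAmount)  # photons per batch

    @property
    def workItemAmount(self):
        return np.int32(self._workItemAmount)

    @property
    def photonAmount(self):
        return np.int32(self._photonAmount)

    @property
    def maxLoggableInteractions(self):
        # Number of data points that fit in the logger memory.
        return np.int32(self._loggerMemory // DATA_POINT_SIZE)

    @property
    def maxLoggableInteractionsPerWorkItem(self):
        # Share of the logger available to each work item.
        return np.int32(self.maxLoggableInteractions // self._workItemAmount)


params = CLParametersSketch(1e8, 100, 10000)
print(params.maxLoggableInteractions)             # 6250000
print(params.maxLoggableInteractionsPerWorkItem)  # 62500
```

Casting to np.int32 inside the properties keeps the kernel-facing types in one place, which is also what the review comment on workItemAmount is after.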
The other two are interdependent and must be balanced. In a given scene, a photon undergoes on average X logged interactions; the idea is to allocate a logger memory that is just large enough to log all photons sent to the device, but no larger. This minimizes both the time spent transferring data back and forth between CPU and GPU (logger buffer too small) and the time spent reading many zeros from the logger (logger buffer too large). In short, the device memory should be almost full, so that most of the time is spent computing on the device rather than transferring data.
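The balance can be made concrete with a little arithmetic, assuming each logged interaction takes 16 bytes (the dataPointSize in the diff) and that X, the average number of logged interactions per photon in the scene, has been estimated beforehand, for instance from a small trial run. Both helper names are hypothetical.

```python
DATA_POINT_SIZE = 16  # bytes per logged interaction (value from the diff)


def photonsPerBatchFor(loggerMemory: float, meanInteractionsPerPhoton: float) -> int:
    """Largest batch whose expected log still fits in loggerMemory bytes."""
    bytesPerPhoton = meanInteractionsPerPhoton * DATA_POINT_SIZE
    return int(loggerMemory // bytesPerPhoton)


def loggerMemoryFor(photonAmount: int, meanInteractionsPerPhoton: float) -> float:
    """Logger memory, in bytes, expected to hold the log of photonAmount photons."""
    return photonAmount * meanInteractionsPerPhoton * DATA_POINT_SIZE


# CLParameters(1e8, 100, 10000) budgets 1e8 / (10000 * 16) = 625 interactions per photon.
print(1e8 / (10_000 * DATA_POINT_SIZE))  # 625.0
print(photonsPerBatchFor(1e8, 500))      # 12500
print(loggerMemoryFor(10_000, 625))      # 100000000
```

Because X is an average, individual photons may log more interactions than budgeted, so in practice some margin is probably needed.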
Examples
Sending 50 000 photons in batches of 10 000
example.py
In CLPhotons.propagate, CLParameters(1e8, 100, 10000) means 0.1 GB of logger memory, 100 work items on the device, and 10k photons sent to the device per batch.
Speed Tests
For an infinite medium with mu_s = 30, mu_a = 0.1 and g = 0.8
Limitations