
Photon Batching #78

Merged: 20 commits from batch-dynamic-logger merged into opencl-dev on Oct 31, 2022

Conversation

@PyMarc2 (Contributor) commented Oct 27, 2022

Photon Batching

The main advantage of the GPU implementation is reduced computation time.
Now that reasonable times are achieved, one may want to increase the number of photons propagated in the scene.
Because CPU/GPU memory is limited, this was not possible in a single batch. With batching, it is now possible to propagate photons until the device's VRAM or storage memory is full.

New Objects

  • CLParameters: a class that stores the parameters of a batch and automatically computes other significant values via properties (see the sketch below).
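
A minimal sketch of what this class could look like, assuming the CLParameters(1e8, 100, 10000) signature used in the example below and the workItemAmount / maxLoggableInteractions names quoted in the review threads; the exact derivations are assumptions:

import numpy as np

DATA_POINT_SIZE = 16  # bytes per logged interaction (see review thread below)

class CLParameters:
    def __init__(self, maxLoggerMemory, workItemAmount, photonAmount):
        self._maxLoggerMemory = maxLoggerMemory
        self._workItemAmount = workItemAmount
        self._photonAmount = photonAmount

    @property
    def workItemAmount(self):
        return np.int32(self._workItemAmount)

    @property
    def maxLoggableInteractions(self):
        # How many interaction records fit in the allocated logger memory.
        return np.int32(self._maxLoggerMemory // DATA_POINT_SIZE)

    @property
    def photonsPerWorkItem(self):
        # np.ceil here is the source of the last-batch bug fixed in one of
        # the commits below.
        return np.int32(np.ceil(self._photonAmount / self._workItemAmount))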

New logic

The main changes are in the CLPhotons.propagate() method and in the propagate kernel of propagation.c.
CLPhotons implements the batching logic in its propagate method: it resets the logger buffer, sets the number of work items, resizes the buffers sent to the device, and retrieves the data after each batch.
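
A minimal, standalone sketch of that batching loop (not the actual implementation: DataPointCL and maxLoggableInteractions are names taken from the review snippets below, and the kernel launch is stubbed out):

import numpy as np

def propagateInBatches(totalPhotons, photonsPerBatch, maxLoggableInteractions):
    photonCount = 0
    batchResults = []
    while photonCount < totalPhotons:
        batchSize = min(photonsPerBatch, totalPhotons - photonCount)
        # Stand-in for DataPointCL(size=maxLoggableInteractions): a zeroed
        # host buffer of (x, y, z, deltaWeight) interaction records.
        loggerBuffer = np.zeros((maxLoggableInteractions, 4), dtype=np.float32)
        # ... launch the propagate kernel here; it fills loggerBuffer ...
        # Keep only the rows the kernel actually wrote before storing them.
        batchResults.append(loggerBuffer[loggerBuffer[:, 3] != 0])
        photonCount += batchSize
    return batchResults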

For the developer

The batching parameters are not set automatically yet; for now, the user/developer must choose them.
Three parameters must be given when creating a CLParameters instance:

  • LoggerMemory: the memory to allocate on the device, in bytes (usually > 1 GB).
  • WorkItemAmount: the number of work items, i.e. the number of threads launched on the device. The ideal amount is believed to be the number of logical cores, but this has not always been true in our experience.
  • PhotonsPerBatch: the total number of photons sent to the device per batch.

The work item amount mostly influences speed.
The other two are coupled and must be kept in balance. In a given scene, a photon interacts a certain average number of times, so the idea is to pick a logger memory just large enough for all the photons sent to the device, but no larger. A large logger buffer minimizes the time spent transferring data between CPU and GPU, while an oversized one wastes time reading back many zeroed entries. In short, aim to almost fill the device memory so that most of the time is spent computing on the device rather than going back and forth. A rough estimator is sketched below.
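
As a rough back-of-the-envelope helper for choosing LoggerMemory (an assumption, not part of this PR: it models weight-based absorption with a roulette weight threshold and uses the 16-byte dataPointSize quoted in a review thread below):

import numpy as np

def estimateLoggerMemory(photonsPerBatch, mu_s, mu_a, weightThreshold=1e-4,
                         dataPointSize=16):
    # Each interaction multiplies the photon weight by the albedo
    # mu_s / (mu_s + mu_a), so a photon logs roughly
    # ln(threshold) / ln(albedo) interactions before termination.
    albedo = mu_s / (mu_s + mu_a)
    interactionsPerPhoton = np.log(weightThreshold) / np.log(albedo)
    return int(photonsPerBatch * interactionsPerPhoton * dataPointSize)

print(estimateLoggerMemory(10_000, 30, 0.1))  # ~4.4e8 bytes, i.e. ~0.44 GB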

Examples

Sending 50,000 photons in batches of 10,000

example.py

# imports assumed for a self-contained example; exact module paths may differ
from pytissueoptics import *

tissue = InfiniteTissue(material=ScatteringMaterial(30, 0.1, 0.8, 1.4))
source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=50000, useHardwareAcceleration=True)
logger = Logger()
source.propagate(tissue, logger=logger)
stats = Stats(logger, source, tissue)
stats.showEnergy2D(bins=1001, logScale=True, limits=[[-10, 10], [-10, 10]])

In CLPhotons.propagate, CLParameters(1e8, 100, 10000) means 0.1 GB of logger memory, 100 work items on the device, and 10k photons on the device per batch.
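
As a quick sanity check of those numbers (the 16-byte dataPointSize is quoted in a review thread below; the derivation is an assumption):

maxLoggerMemory = 1e8                # 0.1 GB of logger memory
dataPointSize = 16                   # bytes per logged interaction
photonsPerBatch = 10_000

maxLoggableInteractions = int(maxLoggerMemory // dataPointSize)  # 6_250_000
budgetPerPhoton = maxLoggableInteractions / photonsPerBatch      # 625 interactions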

Speed Tests

For an infinite medium with mu_s = 30, mu_a = 0.1, g = 0.8:

  • GPU GTX 1060 3 GB, 1.5 GHz, 1024 CUDA cores: 10 µs/photon
  • CPU Intel quad-core i7, 1.2 GHz + boost: 120 µs/photon

Limitations

  • The parameters are not set automatically yet; they are hardcoded in CLPhotons.propagate.
  • The data is not binned live, which means the raw data size grows quickly. For 1M photons at mu_s = 30, mu_a = 0.1, the total interaction data exceeds 150 GB.

Log with different index
Change propagation kernel
A problem occurred because .build always reset the hostBuffer to its default value. A resetAuto flag is now available on each CLObject and is set to False by default. Another problem: when the last batch had fewer photons than the number of work units, a work unit with no photons would try to access non-allocated memory, causing a memory access error.
Ceiling the photons per work unit led the propagation kernel to believe there were more photons than actually existed: with 12 photons to propagate on 10 cores, np.ceil would yield 20 photons, 8 of which did not exist.
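
A small illustration of that rounding issue and one way to clamp it (the clamping shown is an assumption, not the exact kernel code):

import numpy as np

totalPhotons, workUnits = 12, 10
photonsPerUnit = int(np.ceil(totalPhotons / workUnits))  # -> 2
kernelPhotons = photonsPerUnit * workUnits               # -> 20, but only 12 exist

# Each work unit must clamp its photon range to avoid touching
# the 8 photons that were never allocated:
for unitID in range(workUnits):
    start = unitID * photonsPerUnit
    end = min(start + photonsPerUnit, totalPhotons)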
@PyMarc2 PyMarc2 requested a review from JLBegin October 27, 2022 18:59
@PyMarc2 PyMarc2 self-assigned this Oct 29, 2022
@JLBegin (Contributor) left a comment

Nice. The OpenCL dev will be mostly done once we merge this with IntersectionFinder and write some form of auto batching parameters.

The ability to send more photons will put a bigger load on the logger parsing / binning. Can't wait to see how this will perform with the IntersectionFinder and the interaction key logger parsing.

Now that we have OpenCL intersections, it would be time to create a more realistic tissue for performance testing. This infinite tissue propagates photons within a 10 cm radius, which seems big for our typical applications.

pytissueoptics/rayscattering/opencl/CLObjects.py

while photonCount < self._N:
    logger = DataPointCL(size=params.maxLoggableInteractions)
Contributor:
Shouldn't we be using the autoReset behavior instead of creating a new logger object?

Contributor Author:
yes

Contributor Author:
But the size of the logger might vary.

Contributor Author:
I guess for now it isn't too bad if the last batch is bigger than necessary.

Comment on lines +6 to +9
dataPointSize = 16
photonSize = 64
seedSize = 4
materialSize = 32
Contributor:
These have changed in other branches and can change in the future, which is hard to maintain here. We should consider getting this info from the CLObject itself.

Contributor Author:
yes


@property
def workItemAmount(self):
return np.int32(self._workItemAmount)
Contributor:
These type conversions are very implicit and all over the place. I hope CLProgram can soon take this responsibility.

@@ -17,10 +17,10 @@


  tissue = InfiniteTissue(material=ScatteringMaterial(30, 0.1, 0.8, 1.4))
- source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=10000, useHardwareAcceleration=True)
+ source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=100000, useHardwareAcceleration=True)
Contributor:
100k is a bit too much for an example. It crashes on my PC.

@PyMarc2 PyMarc2 merged commit 62f9176 into opencl-dev Oct 31, 2022
@PyMarc2 PyMarc2 deleted the batch-dynamic-logger branch October 31, 2022 16:11
Comment on lines +38 to +39
if self._autoReset:
    self._initializeHostBuffer()
Contributor:
This won't work: _initializeHostBuffer only returns the buffer without setting it. Maybe rename it to _initialHostBuffer(), then properly set the host buffer before returning the array (similar problem on line 43).

Comment on lines 32 to 35
def testWhenGet3DScatterOfSurface_shouldReturnScatterOfAllPointsLeavingTheSurface(self):
    frontScatter = self.stats._get3DScatter(solidLabel="cube", surfaceLabel="front")
    self.assertEqual(0, frontScatter.size)
    self.assertEqual(0, frontScatter.length)

Contributor:
Is this change intentional?

Contributor Author:
no
