
Photon Batching #78

Merged: 20 commits from batch-dynamic-logger merged into opencl-dev on Oct 31, 2022

Conversation

@PyMarc2 (Contributor) commented Oct 27, 2022

Photon Batching

The main advantage of the GPU implementation is reduced computation time.
Now that reasonable times are achieved, one may want to increase the number of photons propagated in the scene.
Because CPU/GPU memory is limited, this was not possible in a single batch. With batching, it is now possible to propagate photons until the device's VRAM or storage memory is full.

New Objects

  • CLParameters: a class that stores the parameters of a batch and automatically computes other significant values via properties (see the sketch below).
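
A minimal sketch of what this class could look like, assuming the CLParameters(1e8, 100, 10000) signature used in the example below and the workItemAmount / maxLoggableInteractions names quoted in the review threads; the exact derivations are assumptions:

import numpy as np

DATA_POINT_SIZE = 16  # bytes per logged interaction (see review thread below)

class CLParameters:
    def __init__(self, maxLoggerMemory, workItemAmount, photonAmount):
        self._maxLoggerMemory = maxLoggerMemory
        self._workItemAmount = workItemAmount
        self._photonAmount = photonAmount

    @property
    def workItemAmount(self):
        return np.int32(self._workItemAmount)

    @property
    def maxLoggableInteractions(self):
        # How many interaction records fit in the allocated logger memory.
        return np.int32(self._maxLoggerMemory // DATA_POINT_SIZE)

    @property
    def photonsPerWorkItem(self):
        # np.ceil here is the source of the last-batch bug fixed in one of
        # the commits below.
        return np.int32(np.ceil(self._photonAmount / self._workItemAmount))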

New logic

The main changes are in the CLPhotons.propagate() method and in the propagate kernel of propagation.c.
CLPhotons implements the batching logic in its propagate method: it resets the logger buffer, sets the number of work items, resizes the buffers sent to the device, and retrieves the data after each batch.
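
A minimal, standalone sketch of that batching loop (not the actual implementation: DataPointCL and maxLoggableInteractions are names taken from the review snippets below, and the kernel launch is stubbed out):

import numpy as np

def propagateInBatches(totalPhotons, photonsPerBatch, maxLoggableInteractions):
    photonCount = 0
    batchResults = []
    while photonCount < totalPhotons:
        batchSize = min(photonsPerBatch, totalPhotons - photonCount)
        # Stand-in for DataPointCL(size=maxLoggableInteractions): a zeroed
        # host buffer of (x, y, z, deltaWeight) interaction records.
        loggerBuffer = np.zeros((maxLoggableInteractions, 4), dtype=np.float32)
        # ... launch the propagate kernel here; it fills loggerBuffer ...
        # Keep only the rows the kernel actually wrote before storing them.
        batchResults.append(loggerBuffer[loggerBuffer[:, 3] != 0])
        photonCount += batchSize
    return batchResults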

For the developer

The batching parameters are not set automatically yet; for now, the user/developer must choose them.
Three parameters must be given when creating a CLParameters instance:

  • LoggerMemory: the memory to allocate on the device, in bytes (usually > 1 GB).
  • WorkItemAmount: the number of work items, i.e. the number of threads launched on the device. The ideal amount is believed to be the number of logical cores, but this has not always been true in our experience.
  • PhotonsPerBatch: the total number of photons sent to the device per batch.

The work item amount mostly influences speed.
The other two are coupled and must be kept in balance. In a given scene, a photon interacts a certain average number of times, so the idea is to pick a logger memory just large enough for all the photons sent to the device, but no larger. A large logger buffer minimizes the time spent transferring data between CPU and GPU, while an oversized one wastes time reading back many zeroed entries. In short, aim to almost fill the device memory so that most of the time is spent computing on the device rather than going back and forth. A rough estimator is sketched below.
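
As a rough back-of-the-envelope helper for choosing LoggerMemory (an assumption, not part of this PR: it models weight-based absorption with a roulette weight threshold and uses the 16-byte dataPointSize quoted in a review thread below):

import numpy as np

def estimateLoggerMemory(photonsPerBatch, mu_s, mu_a, weightThreshold=1e-4,
                         dataPointSize=16):
    # Each interaction multiplies the photon weight by the albedo
    # mu_s / (mu_s + mu_a), so a photon logs roughly
    # ln(threshold) / ln(albedo) interactions before termination.
    albedo = mu_s / (mu_s + mu_a)
    interactionsPerPhoton = np.log(weightThreshold) / np.log(albedo)
    return int(photonsPerBatch * interactionsPerPhoton * dataPointSize)

print(estimateLoggerMemory(10_000, 30, 0.1))  # ~4.4e8 bytes, i.e. ~0.44 GB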

Examples

Sending 50,000 photons in batches of 10,000

example.py

# imports assumed for a self-contained example; exact module paths may differ
from pytissueoptics import *

tissue = InfiniteTissue(material=ScatteringMaterial(30, 0.1, 0.8, 1.4))
source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=50000, useHardwareAcceleration=True)
logger = Logger()
source.propagate(tissue, logger=logger)
stats = Stats(logger, source, tissue)
stats.showEnergy2D(bins=1001, logScale=True, limits=[[-10, 10], [-10, 10]])

In CLPhotons.propagate, CLParameters(1e8, 100, 10000) means 0.1 GB of logger memory, 100 work items on the device, and 10k photons on the device per batch.
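
As a quick sanity check of those numbers (the 16-byte dataPointSize is quoted in a review thread below; the derivation is an assumption):

maxLoggerMemory = 1e8                # 0.1 GB of logger memory
dataPointSize = 16                   # bytes per logged interaction
photonsPerBatch = 10_000

maxLoggableInteractions = int(maxLoggerMemory // dataPointSize)  # 6_250_000
budgetPerPhoton = maxLoggableInteractions / photonsPerBatch      # 625 interactions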

Speed Tests

For an infinite medium with mu_s = 30, mu_a = 0.1, g = 0.8:

  • GPU GTX 1060 3 GB, 1.5 GHz, 1024 CUDA cores: 10 µs/photon
  • CPU Intel quad-core i7, 1.2 GHz + boost: 120 µs/photon

Limitations

  • The parameters are not set automatically yet; they are hardcoded in CLPhotons.propagate.
  • The data is not binned live, which means the raw data size grows quickly. For 1M photons at mu_s = 30, mu_a = 0.1, the total interaction data exceeds 150 GB.

Log with different index
Change propagation kernel
A problem occurred because .build always reset the hostBuffer to its default value. A resetAuto flag is now available on each CLObject and is set to False by default. Another problem: when the last batch had fewer photons than the number of work units, a work unit with no photons would try to access non-allocated memory, causing a memory access error.
Ceiling the photons per work unit led the propagation kernel to believe there were more photons than actually existed: with 12 photons to propagate on 10 cores, np.ceil would yield 20 photons, 8 of which did not exist.
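
A small illustration of that rounding issue and one way to clamp it (the clamping shown is an assumption, not the exact kernel code):

import numpy as np

totalPhotons, workUnits = 12, 10
photonsPerUnit = int(np.ceil(totalPhotons / workUnits))  # -> 2
kernelPhotons = photonsPerUnit * workUnits               # -> 20, but only 12 exist

# Each work unit must clamp its photon range to avoid touching
# the 8 photons that were never allocated:
for unitID in range(workUnits):
    start = unitID * photonsPerUnit
    end = min(start + photonsPerUnit, totalPhotons)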
@PyMarc2 PyMarc2 requested a review from JLBegin October 27, 2022 18:59
@PyMarc2 PyMarc2 self-assigned this Oct 29, 2022
@JLBegin (Contributor) left a comment

Nice. The OpenCL dev will be mostly done once we merge this with IntersectionFinder and write some form of auto batching parameters.

The ability to send more photons will put a bigger load on the logger parsing / binning. Can't wait to see how this will perform with the IntersectionFinder and the interaction key logger parsing.

Now that we have OpenCL intersections, it would be time to create a more realistic tissue for performance testing. This infinite tissue propagates photons within a 10 cm radius, which seems big for our typical applications.

pytissueoptics/rayscattering/opencl/CLObjects.py

while photonCount < self._N:
    logger = DataPointCL(size=params.maxLoggableInteractions)
Contributor:
Shouldn't we be using the autoReset behavior instead of creating a new logger object?

Contributor Author:
yes

Contributor Author:
But the size of the logger might vary.

Contributor Author:
I guess for now it isn't too bad if the last batch is bigger than necessary.

Comment on lines +6 to +9
dataPointSize = 16
photonSize = 64
seedSize = 4
materialSize = 32
Contributor:
These have changed in other branches and can change in the future, which is hard to maintain here. We should consider getting this info from the CLObject itself.

Contributor Author:
yes


@property
def workItemAmount(self):
return np.int32(self._workItemAmount)
Contributor:
These type conversions are very implicit and all over the place. I hope CLProgram can soon take this responsibility.

@@ -17,10 +17,10 @@


  tissue = InfiniteTissue(material=ScatteringMaterial(30, 0.1, 0.8, 1.4))
- source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=10000, useHardwareAcceleration=True)
+ source = PencilPointSource(position=Vector(0, 0, 0), direction=Vector(0, 0, 1), N=100000, useHardwareAcceleration=True)
Contributor:
100k is a bit too much for an example. It crashes on my PC.

@PyMarc2 PyMarc2 merged commit 62f9176 into opencl-dev Oct 31, 2022
@PyMarc2 PyMarc2 deleted the batch-dynamic-logger branch October 31, 2022 16:11
Comment on lines +38 to +39
if self._autoReset:
    self._initializeHostBuffer()
Contributor:
This won't work: _initializeHostBuffer only returns the buffer without setting it. Maybe rename it to _initialHostBuffer(), then properly set the host buffer before returning the array (similar problem on line 43).

Comment on lines 32 to 35
def testWhenGet3DScatterOfSurface_shouldReturnScatterOfAllPointsLeavingTheSurface(self):
    frontScatter = self.stats._get3DScatter(solidLabel="cube", surfaceLabel="front")
    self.assertEqual(0, frontScatter.size)
    self.assertEqual(0, frontScatter.length)

Contributor:
Is this change intentional?

Contributor Author:
no
