Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use mtgp32_kernel_params field inside of mtgp_engine for better OO design. #22

Closed
Jorghi12 opened this issue Jul 11, 2018 · 16 comments
Closed

Comments

@Jorghi12
Copy link

Jorghi12 commented Jul 11, 2018

I don't see why you're diverging from CUDA for no good reason. I noticed a redundancy with the design of the mtgp_engine class.

This is how it's already done in CUDA.

struct curandStateMtgp32 {
    unsigned int s[MTGP32_STATE_SIZE];
    int offset;
    int pIdx;
    mtgp32_kernel_params_t * k;
    int precise_double_flag;
};

Let's take a look at rocRAND.
Fields of the mtgp32_kernel_params_t class:

public:
    // State
    mtgp32_state m_state;
    // Parameters
    unsigned int pos_tbl;
    unsigned int param_tbl[MTGP_TS];
    unsigned int temper_tbl[MTGP_TS];
    unsigned int single_temper_tbl[MTGP_TS];
    unsigned int sh1_tbl;
    unsigned int sh2_tbl;
    unsigned int mask;

Fields of t mtgp32_kernel_params_t:

    unsigned int pos_tbl[MTGP_BN_MAX];
    unsigned int param_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int temper_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int single_temper_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int sh1_tbl[MTGP_BN_MAX];
    unsigned int sh2_tbl[MTGP_BN_MAX];
    unsigned int mask[1];

Why not simply define the mtgp_engine class to have a pointer to mtgp32_kernel_params_t?

Example:

public:
    // State
    mtgp32_state m_state;
    // Parameters
    mtgp32_kernel_params_t * k;

Reason One
Better object oriented design. I should be able to create a single mtgp32_kernel_params and reuse that object in other hiprandStateMtgp32_t states. Otherwise, you'll have to do bloated things such as

my_state.pos_tbl = mtgp32_kernel_params.pos_tbl;
my_state.param_tbl = mtgp32_kernel_params.param_tbl;
my_state.temper_tbl = mtgp32_kernel_params.temper_tbl;
my_state.single_temper_tbl = mtgp32_kernel_params.single_temper_tbl;
my_state.sh1_tbl = mtgp32_kernel_params.sh1_tbl;
my_state.sh2_tbl = mtgp32_kernel_params.sh2_tbl;
my_state.mask = mtgp32_kernel_params.mask;

Reason Two
Unnecessary divergence from the CUDA API leading to more work to have new frameworks run seamlessly.

@Jorghi12 Jorghi12 changed the title Better to use a mtgp32_kernel_params field inside of the mtgp_engine structs. Use mtgp32_kernel_params field inside of mtgp_engine for better OO design. Jul 11, 2018
@Jorghi12
Copy link
Author

@jszuppe @ex-rzr

@ex-rzr
Copy link
Contributor

ex-rzr commented Jul 12, 2018

@Jorghi12 But why exactly do these new frameworks need access to "private" types? (by "private" I mean anything that is not listed in https://docs.nvidia.com/cuda/curand/group__HOST.html and https://docs.nvidia.com/cuda/curand/group__DEVICE.html).

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 12, 2018

The type isn't private in any way.

The links you've presented only list the functions available in the Host & Device API.

The reason mtgp32_kernel_params_t isn't listed there is simply because it is not a function.

Take a look at the Device API. You'll see some of the functions take a mtgp32_kernel_params_t object.
https://docs.nvidia.com/cuda/curand/group__DEVICE.html
_host__ ​ curandStatus_t curandMakeMTGP32Constants ( const mtgp32_params_fast_t params[], mtgp32_kernel_params_t* p )

Surely, if the type mtgp32_kernel_params_t were private, then the function curandMakeMTGP32Constants would be unusable.

@ex-rzr
Copy link
Contributor

ex-rzr commented Jul 12, 2018

That's exactly what I'm asking about: both host and device cuRAND APIs work with *_t types, i.e. there is mtgp32_kernel_params_t but no mtgp32_kernel_params.

What does pyTorch uses such types? Does it implement something that is not possible to accomplish using public APIs?

@Jorghi12
Copy link
Author

I just didn't see any reason for the divergence away from the CUDA API and wondered why such decisions were made. This isn't urgent but it'd be nice to have.

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 12, 2018

@ex-rzr I think my original proposal may not be coming across clearly enough.

I don't want / need mtgp32_kernel_params exposed at all.

I'm simply saying that the mtgp_engine class should have a pointer to a mtgp32_kernel_params_t object.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 12, 2018

mtgp32_kernel_params_t is not private, but its fields are implementation details. The same goes for fields of curandState<> structures.

Here's an example with PyTorch & CUDA: https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCTensorRandom.cu#L64

The fact that it's used in PyTorch does not mean it's correct usage.

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 12, 2018

@jszuppe I believe we're discussing different things. From within device code, surely it would still be a nice to have to be able to pass mtgp32_kernel_params_t objects around to different mtgp_engine classes instead. You can ignore that link, since that's tangential to this conversation.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 12, 2018

Please re-read the initial message.

Yeah. We get that. We don't remember exactly why it was done that way. It might have to do something with performance, or just missed (it happens; that was not the most important part of the project and it's not like the time was unlimited). mtgp32_kernel_params_t is not the same as fields in mtgp_engine, because it's bigger. I get that pointer may solve that, and that's a valid idea. We would need to check performance. We will look at this when we have time.

I believe we're discussing different things.

Maybe. We're just saying: what's not in a public API is an implementation detail, and I think that's obvious. (And I'm not talking about struct, but it's fields.)

@Jorghi12
Copy link
Author

Jorghi12 commented Jul 12, 2018

Check out Nvidia's guide for CURAND, page 55.
https://developer.download.nvidia.com/compute/DevZone/docs/html/CUDALibraries/doc/CURAND_Library.pdf

@jszuppe Sounds great. Yes I agree, it's not part of the public API, but it would be nice to maintain the idiosyncrasies.

I wouldn't have expected this to be caught at all given how subtle it is and how unrelated it is to the actual public API.

The rocRAND API works very well as a whole and has improved on many different levels from the hcRNG package.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 12, 2018

Sounds great. Yes I agree, it's not part of the public API, but it would be nice to maintain the idiosyncrasies.

I wouldn't have expected this to be caught at all given how subtle it is and how unrelated it is to the actual public API.

Yeah. We can't predict what implementation details or implementation defined behaviours (like in here) will be used by developers that uses cuRAND. We can just hope that developers are sane ;) and it won't happen often. We will consider you request and check how it affects performance.

The rocRAND API works very well as a whole and has improved on many different levels from the hcRNG package.

Thanks! Let's not talk about hcRNG :D

@jszuppe
Copy link
Contributor

jszuppe commented Jul 18, 2018

I'm not sure if we will do that change as using pointer significantly decreases performance of MTGP32 RNG. The drops are: from 180GB/s to 102GB/s on my GTX1080, and from 263GB/s to 156 GB/s on MI25.

btw.

my_state.pos_tbl = mtgp32_kernel_params.pos_tbl;
my_state.param_tbl = mtgp32_kernel_params.param_tbl;
my_state.temper_tbl = mtgp32_kernel_params.temper_tbl;
my_state.single_temper_tbl = mtgp32_kernel_params.single_temper_tbl;
my_state.sh1_tbl = mtgp32_kernel_params.sh1_tbl;
my_state.sh2_tbl = mtgp32_kernel_params.sh2_tbl;
my_state.mask = mtgp32_kernel_params.mask;

is incorrect. For example: mtgp32_kernel_params.pos_tbl is array and my_state.pos_tbl is a single unsigned int value. If you need simple function for switching those parameters we can just add a function/method for that.

@Jorghi12
Copy link
Author

@jszuppe Yeah, adding a function for that is just fine.

@jszuppe
Copy link
Contributor

jszuppe commented Jul 23, 2018

@Jorghi12 check 23a7294 (develop branch)

@jszuppe
Copy link
Contributor

jszuppe commented Jul 30, 2018

@Jorghi12 Have you checked 23a7294? Can I close this issue?

@jszuppe
Copy link
Contributor

jszuppe commented Aug 10, 2018

I'm closing this. Please reopen if you still have some issues.

@jszuppe jszuppe closed this as completed Aug 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants