Description
Analyze the underlying types of arrays passed to the compute() method. We usually pass numpy arrays into compute() without caring about the underlying type of the array. So if we pass a numpy array read from CSV (where dtype=np.float64 by default) and feed the algorithm with it, we may incur significant performance degradation caused by internal data conversion from double to float (if the algorithm was created with fptype='float'). It may therefore be useful to develop automatic fptype detection based on the data the algorithm is fed with, or at least to notify the user with a warning on stdout that the data is about to be converted.
The same issue exists for CSR input: the user should get the fastest method by default and not be required to select the csr method manually.
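As a rough illustration of the dtype-based detection suggested above, here is a minimal sketch. The helper name `resolve_fptype` is hypothetical and not part of daal4py. It relies only on the input exposing a `.dtype` attribute (as numpy arrays and scipy sparse matrices do), it assumes 'double' is the right fallback for non-float dtypes, and it uses Python's `warnings` module rather than writing to stdout directly.

```python
import warnings

# dtype name -> fptype string; only the two floating-point types are mapped.
_FPTYPE_BY_DTYPE = {"float32": "float", "float64": "double"}


def resolve_fptype(data, requested=None):
    """Hypothetical helper: choose fptype from the input, or warn on a mismatch.

    `data` is anything with a `.dtype` attribute (duck typing).
    """
    detected = _FPTYPE_BY_DTYPE.get(str(getattr(data, "dtype", "")))

    if requested is None:
        # Automatic detection. Falling back to 'double' for unmapped dtypes
        # (e.g. integers) is an assumption of this sketch.
        return detected or "double"

    if detected is not None and detected != requested:
        # The user asked for a different precision than the data has:
        # keep the user's choice but make the hidden conversion visible.
        warnings.warn(
            f"input dtype is {data.dtype} but fptype='{requested}' was "
            "requested; the data will be converted internally, which may "
            "degrade performance",
            RuntimeWarning,
        )
    return requested
```

With a float64 array loaded from CSV, `resolve_fptype(arr)` would return 'double', while `resolve_fptype(arr, requested='float')` would return 'float' and emit the warning.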
The problem here is, of course, that we cannot create the algorithm until we know the input data. In general it should be possible to defer the creation until we have the data. There are some technical details in daal4py to work out. More importantly, this raises a few user-visible issues, such as:
- DAAL's parameter checking will be deferred as well (so the user will get an error message triggered by an 'unrelated' line of code)
- What should happen if a kernel is set up by the user with partial input data and DAAL uses it internally in other algorithms (as in the optimization solver pattern)? What if the user changes the partial input?
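To make the first issue concrete, here is a minimal sketch of deferred creation. `DeferredAlgorithm` and `StubKMeans` are hypothetical names used only for illustration; the stub stands in for a real algorithm class and is not the daal4py API. The wrapper stores the constructor arguments and builds the real object only once compute() sees the data, so any parameter validation in the constructor fires at compute() time.

```python
class DeferredAlgorithm:
    """Hypothetical wrapper: remember constructor args, build on compute()."""

    def __init__(self, factory, **params):
        self._factory = factory
        self._params = params
        self._instances = {}  # one real algorithm per fptype actually used

    def compute(self, data, *args, **kwargs):
        # Detect precision from the data unless the user fixed it explicitly.
        detected = "float" if str(getattr(data, "dtype", "")) == "float32" else "double"
        fptype = self._params.get("fptype", detected)
        if fptype not in self._instances:
            # The real constructor (and its parameter checks) runs only here.
            params = dict(self._params, fptype=fptype)
            self._instances[fptype] = self._factory(**params)
        return self._instances[fptype].compute(data, *args, **kwargs)


class StubKMeans:
    """Stand-in for a real algorithm; validates parameters on construction."""

    def __init__(self, n_clusters, fptype="double"):
        if n_clusters < 1:
            raise ValueError("n_clusters must be positive")
        self.fptype = fptype

    def compute(self, data):
        # A real algorithm would return results; the stub reports its fptype.
        return self.fptype
```

In this sketch, `DeferredAlgorithm(StubKMeans, n_clusters=0)` succeeds silently, and the ValueError only surfaces later from `compute()`, which is exactly the 'unrelated line' effect described in the first bullet. The sketch does not address the second bullet about kernels shared with other algorithms.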