Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amin always finds index 0 #359

Closed
Nyrio opened this issue May 15, 2019 · 5 comments
Closed

Amin always finds index 0 #359

Nyrio opened this issue May 15, 2019 · 5 comments

Comments

@Nyrio
Copy link

Nyrio commented May 15, 2019

I noticed that Amin was always finding the index 0, so I wrote this sample code to check that it is indeed the case:

#include <cstdio>
#include <chrono>
#include <vector>

#define CL_USE_DEPRECATED_OPENCL_1_1_APIS // to disable deprecation warnings
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS // to disable deprecation warnings

// Includes the C++ OpenCL API. If not yet available, it can be found here:
// https://www.khronos.org/registry/cl/api/1.1/cl.hpp
#include "cl.hpp"

// Includes the CLBlast library
#include <clblast.h>

// =================================================================================================

int main() {

  // OpenCL platform/device settings
  const auto platform_id = 0;
  const auto device_id = 0;

  // Initializes the OpenCL platform
  auto platforms = std::vector<cl::Platform>();
  cl::Platform::get(&platforms);
  if (platforms.size() == 0 || platform_id >= platforms.size()) { return 1; }
  auto platform = platforms[platform_id];

  // Initializes the OpenCL device
  auto devices = std::vector<cl::Device>();
  platform.getDevices(CL_DEVICE_TYPE_ALL, &devices);
  if (devices.size() == 0 || device_id >= devices.size()) { return 1; }
  auto device = devices[device_id];

  // Creates the OpenCL context, queue, and an event
  auto device_as_vector = std::vector<cl::Device>{device};
  auto context = cl::Context(device_as_vector);
  auto queue = cl::CommandQueue(context, device);
  auto event = cl_event{nullptr};

  const size_t size = 8;

  // Populate host matrices with some example data
  std::vector<float> host_vec = {2.0, 3.3, 4.2, -1.1, 13.37, -4.0, 7, -1.7};
  int host_res = -1;

  // Copy the matrices to the device
  auto device_vec = cl::Buffer(context, CL_MEM_READ_WRITE, size * sizeof(float));
  auto device_res = cl::Buffer(context, CL_MEM_READ_WRITE, sizeof(int));
  queue.enqueueWriteBuffer(device_vec, CL_TRUE, 0, size*sizeof(float), host_vec.data());
  queue.enqueueWriteBuffer(device_res, CL_TRUE, 0, sizeof(int), &host_res);

  // Start the timer
  auto start_time = std::chrono::steady_clock::now();

  // Call the ISAMIN routine.
  auto queue_plain = queue();
  auto status = clblast::Amin<float>(size, device_res(), 0, device_vec(), 0, 1,
                                     &queue_plain, &event);

  // Record the execution time
  if (status == clblast::StatusCode::kSuccess) {
    clWaitForEvents(1, &event);
    clReleaseEvent(event);
  }
  auto elapsed_time = std::chrono::steady_clock::now() - start_time;
  auto time_ms = std::chrono::duration<double,std::milli>(elapsed_time).count();

  queue.enqueueReadBuffer(device_res, CL_TRUE, 0, sizeof(int), &host_res);

  // Example completed. See "clblast.h" for status codes (0 -> success).
  printf("Completed ISAMIN in %.3lf ms with status %d\n", time_ms, static_cast<int>(status));
  printf("Result: %d\n", host_res);
  return 0;
}

Amax displays 4 as expected, but Amin displays 0 instead of 3. Whatever the content of the vector, it always says 0.

@Nyrio Nyrio changed the title Amin always returns 0 Amin always finds index 0 May 15, 2019
@Nyrio
Copy link
Author

Nyrio commented May 16, 2019

I've created a branch on my fork where I add the sample above, if you want to reproduce the issue: https://github.com/Nyrio/CLBlast/tree/issue-359

@CNugteren
Copy link
Owner

I managed to reproduce, thanks. This is one of the few routines that isn't tested because it is not part of the regular BLAS interface.

It seems adding || defined(ROUTINE_AMIN) to this line resolves the issue, I'll have a more detailed look later.

@Nyrio
Copy link
Author

Nyrio commented May 17, 2019

Yes it's an extension. It can be tested against a custom implementation though, e.g:

template<typename scalar_t>
inline int iamin(const int N, const scalar_t *X, const int incX)
{
  int best = 0;
  for(int i = incX; i < N * incX; i += incX) {
    if(std::abs(X[i]) < std::abs(X[best])) {
      best = i;
    }
  }
  return best / incX;
}

@Nyrio
Copy link
Author

Nyrio commented May 20, 2019

Fixed, thanks!

@Nyrio Nyrio closed this as completed May 20, 2019
@CNugteren
Copy link
Owner

It can be tested against a custom implementation though, e.g:

I know, thanks for the snippet. The issue is that the CLBlast test framework doesn't easily accomodate these kind of tests (integer result and also non BLAS), thus there is a bit of boilerplate work. I though since this was only about 2 lines of different kernel code from another routine it should be safe but it turned out one of those 2 lines was wrong... I'll add a test when I have more time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants