Support for composite devices #1993

Merged
merged 17 commits into from
Apr 1, 2025

Conversation

ndgrigorian
Collaborator

@ndgrigorian ndgrigorian commented Feb 14, 2025

This PR proposes supporting the oneAPI composite devices extension in dpctl. The extension exposes each tile of a multi-tile GPU as a root device, while still providing access to the composite devices that Level Zero exposed before ZE_FLAT_DEVICE_HIERARCHY=FLAT became the default. That is, with ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE the composite devices are root devices, whereas with ZE_FLAT_DEVICE_HIERARCHY=COMBINED a user gains access to these composite devices through dedicated APIs.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

github-actions bot commented Feb 14, 2025

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Array API standard conformance tests for dpctl= ran successfully.
Passed: 893
Failed: 3
Skipped: 118

@coveralls
Collaborator

coveralls commented Feb 14, 2025

Coverage Status

coverage: 86.379% (-0.4%) from 86.752%
when pulling c23cbcb on support-composite-devices
into 63f5129 on master.

@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch 2 times, most recently from 2d5942d to 76f2fc2 Compare February 21, 2025 20:55

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_544 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch from 76f2fc2 to 00cb089 Compare March 1, 2025 09:10

github-actions bot commented Mar 1, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_25 ran successfully.
Passed: 897
Failed: 0
Skipped: 125

@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch from 00cb089 to cc5ef08 Compare March 5, 2025 17:03

github-actions bot commented Mar 5, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_27 ran successfully.
Passed: 896
Failed: 1
Skipped: 126

github-actions bot commented Mar 6, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_31 ran successfully.
Passed: 895
Failed: 2
Skipped: 126

@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch from 7ea744b to b8c765f Compare March 6, 2025 02:31
@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch 2 times, most recently from 4ab4cb0 to 8327887 Compare March 6, 2025 05:44
@ndgrigorian ndgrigorian marked this pull request as ready for review March 6, 2025 05:45

github-actions bot commented Mar 6, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_34 ran successfully.
Passed: 895
Failed: 2
Skipped: 126

github-actions bot commented Mar 6, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_34 ran successfully.
Passed: 896
Failed: 1
Skipped: 126

@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch from 8327887 to d27f1d0 Compare March 28, 2025 03:02

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_82 ran successfully.
Passed: 894
Failed: 2
Skipped: 154

This leverages the oneAPI extension for composite devices to add the free function `ext_oneapi_get_composite_devices` to the main dpctl namespace.
This method is only applicable to the level_zero backend and returns an empty list for all other backend types.
…aspect_component`

Aligns with the rest of the SyclDevice properties

Adds tests for the aspects
@ndgrigorian ndgrigorian force-pushed the support-composite-devices branch from d27f1d0 to 9cb7b50 Compare March 31, 2025 16:34

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_85 ran successfully.
Passed: 894
Failed: 2
Skipped: 154

@ndgrigorian
Collaborator Author

@antonwolfy
ping

github-actions bot commented Apr 1, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_99 ran successfully.
Passed: 894
Failed: 2
Skipped: 154

Collaborator

@antonwolfy antonwolfy left a comment

Thank you @ndgrigorian, LGTM!

github-actions bot commented Apr 1, 2025

Array API standard conformance tests for dpctl=0.20.0dev0=py310h93fe807_100 ran successfully.
Passed: 894
Failed: 2
Skipped: 154

@ndgrigorian ndgrigorian merged commit 67317b0 into master Apr 1, 2025
71 of 72 checks passed
@ndgrigorian ndgrigorian deleted the support-composite-devices branch April 1, 2025 20:49
DPCTLDevice_GetComponentDevices(__dpctl_keep const DPCTLSyclDeviceRef DRef)
{
using vecTy = std::vector<DPCTLSyclDeviceRef>;
vecTy *ComponentDevicesVectorPtr = nullptr;

@AlexanderKalistratov AlexanderKalistratov Apr 2, 2025

You can use unique_ptr here to avoid calling delete by hand.

__dpctl_give DPCTLDeviceVectorRef
DPCTLDevice_GetComponentDevices(__dpctl_keep const DPCTLSyclDeviceRef DRef)
{
    using vecTy = std::vector<DPCTLSyclDeviceRef>;

    if (!DRef) {
        return nullptr;
    }

    auto D = unwrap<device>(DRef);
    try {
        auto componentDevices =
            D->get_info<sycl::ext::oneapi::experimental::info::device::
                            component_devices>();
        auto ComponentDevicesVectorPtr = std::make_unique<vecTy>();
        ComponentDevicesVectorPtr->reserve(componentDevices.size());
        for (const auto &cd : componentDevices) {
            ComponentDevicesVectorPtr->emplace_back(
                wrap<device>(new device(cd)));
        }

        return wrap<vecTy>(ComponentDevicesVectorPtr.release());
    } catch (std::exception const &e) {
        error_handler(e, __FILE__, __func__, __LINE__);
    }

    return nullptr;
}
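
The ownership hand-off suggested above can be tried in isolation. The following is a minimal sketch using only the standard library; `Widget`, `WidgetVectorRef`, `make_widget_vector`, `widget_vector_size`, and `destroy_widget_vector` are hypothetical stand-ins for illustration and are not part of libsyclinterface. The vector is owned by a `std::unique_ptr` while it is being filled, so an exception thrown midway frees it automatically, and `release()` hands the raw pointer to the caller only on success.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a device object; not a dpctl or SYCL type.
struct Widget
{
    std::size_t id;
};

// Hypothetical opaque handle, similar in spirit to a C-API vector reference.
struct OpaqueWidgetVector;
using WidgetVectorRef = OpaqueWidgetVector *;
using vecTy = std::vector<Widget>;

// Builds a vector of `count` widgets. When `fail` is true, an exception is
// thrown partway through to simulate a failing runtime query.
WidgetVectorRef make_widget_vector(std::size_t count, bool fail)
{
    try {
        auto vec = std::make_unique<vecTy>();
        vec->reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            if (fail && i == count / 2) {
                throw std::runtime_error("simulated query failure");
            }
            vec->push_back(Widget{i});
        }
        // Success: give up ownership and pass the raw pointer to the caller.
        return reinterpret_cast<WidgetVectorRef>(vec.release());
    } catch (std::exception const &e) {
        // `vec` has already been destroyed here, so no manual delete is needed.
        std::fprintf(stderr, "error: %s\n", e.what());
    }
    return nullptr;
}

std::size_t widget_vector_size(WidgetVectorRef ref)
{
    return reinterpret_cast<vecTy *>(ref)->size();
}

// The caller remains responsible for destroying a successfully returned handle.
void destroy_widget_vector(WidgetVectorRef ref)
{
    delete reinterpret_cast<vecTy *>(ref);
}
```

In this sketch the elements are plain values. In the suggestion above the elements are themselves raw handles, and their cleanup on a mid-loop failure is a separate question that `unique_ptr` on the vector alone does not address.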

Collaborator Author

This is something that might be smart to go back and do throughout the libsyclinterface codebase.

__dpctl_give DPCTLSyclDeviceRef
DPCTLDevice_GetCompositeDevice(__dpctl_keep const DPCTLSyclDeviceRef DRef)
{
auto D = unwrap<device>(DRef);

@AlexanderKalistratov AlexanderKalistratov Apr 2, 2025

This function feels a bit overcomplicated. Is this done on purpose?

__dpctl_give DPCTLSyclDeviceRef
DPCTLDevice_GetCompositeDevice(__dpctl_keep const DPCTLSyclDeviceRef DRef)
{
    if (!DRef) {
        return nullptr;
    }

    auto D = unwrap<device>(DRef);
    try {
        bool is_component = D->has(sycl::aspect::ext_oneapi_is_component);
        if (is_component) {
            const auto &compositeDevice =
                D->get_info<sycl::ext::oneapi::experimental::info::device::
                                composite_device>();

            return wrap<device>(new device(compositeDevice));
        }
    } catch (std::exception const &e) {
        error_handler(e, __FILE__, __func__, __LINE__);
    }

    return nullptr;
}

*/
__dpctl_give DPCTLDeviceVectorRef DPCTLDeviceMgr_GetCompositeDevices()
{
using vecTy = std::vector<DPCTLSyclDeviceRef>;

@AlexanderKalistratov AlexanderKalistratov Apr 2, 2025

Same here. unique_ptr + combine two try...catch into one:

__dpctl_give DPCTLDeviceVectorRef DPCTLDeviceMgr_GetCompositeDevices()
{
    using vecTy = std::vector<DPCTLSyclDeviceRef>;

    try {
        auto Devices = std::make_unique<vecTy>();

        auto composite_devices =
            ext::oneapi::experimental::get_composite_devices();
        Devices->reserve(composite_devices.size());
        for (const auto &CDev : composite_devices) {
            Devices->emplace_back(wrap<device>(new device(CDev)));
        }

        return wrap<vecTy>(Devices.release());
    } catch (std::exception const &e) {
        error_handler(e, __FILE__, __func__, __LINE__);
    }

    return nullptr;
}
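
To illustrate why merging the two `try...catch` blocks becomes straightforward once the vector is owned by a `std::unique_ptr`, here is a hedged, standalone sketch. The two-block "before" shape is an assumption about how such code is commonly written with a raw pointer, not a copy of the PR's code; `query_ids`, `collect_two_blocks`, and `collect_one_block` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

using vecTy = std::vector<int>;

// Hypothetical query that may throw, standing in for a runtime call.
std::vector<int> query_ids(bool fail)
{
    if (fail) {
        throw std::runtime_error("simulated backend failure");
    }
    return {10, 20, 30};
}

// Assumed "before" shape: raw pointer, one try block per step, manual delete.
vecTy *collect_two_blocks(bool fail)
{
    vecTy *out = nullptr;
    try {
        out = new vecTy();
    } catch (std::exception const &) {
        return nullptr;
    }
    try {
        for (int id : query_ids(fail)) {
            out->push_back(id);
        }
    } catch (std::exception const &) {
        delete out; // easy to forget on this path
        return nullptr;
    }
    return out;
}

// "After" shape: a single try block; unique_ptr frees the vector on any throw.
vecTy *collect_one_block(bool fail)
{
    try {
        auto out = std::make_unique<vecTy>();
        auto ids = query_ids(fail);
        out->reserve(ids.size());
        for (int id : ids) {
            out->push_back(id);
        }
        return out.release();
    } catch (std::exception const &) {
        // Nothing to clean up: `out` was destroyed during stack unwinding.
    }
    return nullptr;
}
```

Both functions return the same contents on success and `nullptr` on failure; the second simply has one error path to reason about instead of two.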

@@ -316,3 +316,39 @@ DPCTLPlatform_GetDevices(__dpctl_keep const DPCTLSyclPlatformRef PRef,
return nullptr;
}
}

__dpctl_give DPCTLDeviceVectorRef
DPCTLPlatform_GetCompositeDevices(__dpctl_keep const DPCTLSyclPlatformRef PRef)

Same here:

__dpctl_give DPCTLDeviceVectorRef
DPCTLPlatform_GetCompositeDevices(__dpctl_keep const DPCTLSyclPlatformRef PRef)
{
    if (!PRef) {
        error_handler("Cannot retrieve composite devices from "
                      "DPCTLSyclPlatformRef as input is a nullptr.",
                      __FILE__, __func__, __LINE__);
        return nullptr;
    }

    auto P = unwrap<platform>(PRef);

    using vecTy = std::vector<DPCTLSyclDeviceRef>;
    try {
        auto DevicesVectorPtr = std::make_unique<vecTy>();

        auto composite_devices = P->ext_oneapi_get_composite_devices();
        DevicesVectorPtr->reserve(composite_devices.size());
        for (const auto &Dev : composite_devices) {
            DevicesVectorPtr->emplace_back(
                wrap<device>(new device(Dev)));
        }

        return wrap<vecTy>(DevicesVectorPtr.release());
    } catch (std::exception const &e) {
        error_handler(e, __FILE__, __func__, __LINE__);
    }


    return nullptr;
}

{
using vecTy = std::vector<DPCTLSyclDeviceRef>;
vecTy *ComponentDevicesVectorPtr = nullptr;
if (DRef) {

In some functions we check that XRef is not nullptr (https://github.com/IntelPython/dpctl/pull/1993/files#diff-93f8a41da82dd5dd2ae6b05e019d8104c87c244e6077d5a819625a2fc5e3a2f1R858).

In others we check that the result of auto X = unwrap<...>(XRef) is not nullptr (https://github.com/IntelPython/dpctl/pull/1993/files#diff-93f8a41da82dd5dd2ae6b05e019d8104c87c244e6077d5a819625a2fc5e3a2f1R883, https://github.com/IntelPython/dpctl/pull/1993/files#diff-2cee49fe41f858d191076f530e6c6ce6ca841be844b8bca525509ed3deddbbaeR324).

Is that intentional?
Are these checks interchangeable?
Can we call unwrap<...>(XRef) if XRef is nullptr?
Can the result of unwrap<...>(XRef) be nullptr if XRef is not nullptr?

Collaborator Author

@ndgrigorian ndgrigorian Apr 2, 2025

They are treated as more or less interchangeable throughout the file and throughout libsyclinterface.

As to whether we can unwrap a nullptr: yes, a few of the libsyclinterface tests more or less do this.

As for the last question, I can't think of such a case. Given that the checks are treated as interchangeable, probably not.
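
The equivalence discussed here can be checked in a small standalone sketch, under the assumption that `unwrap` is nothing more than a `reinterpret_cast` between an opaque handle type and a pointer to the underlying object. `Thing`, `ThingRef`, `wrap_thing`, and `unwrap_thing` are hypothetical names used only for illustration. C++ guarantees that converting a null object pointer with `reinterpret_cast` yields a null pointer of the destination type, so under that assumption checking the handle and checking the unwrapped pointer give the same answer.

```cpp
#include <cassert>

// Hypothetical underlying object and its opaque handle type.
struct Thing
{
    int value;
};
struct OpaqueThing;
using ThingRef = OpaqueThing *;

// Assumed conversion helpers: plain pointer casts, no lookup or validation.
inline ThingRef wrap_thing(Thing *p)
{
    return reinterpret_cast<ThingRef>(p);
}
inline Thing *unwrap_thing(ThingRef r)
{
    return reinterpret_cast<Thing *>(r);
}

// Style 1: check the handle before unwrapping.
int value_check_handle(ThingRef ref)
{
    if (ref) {
        auto t = unwrap_thing(ref);
        return t->value;
    }
    return -1;
}

// Style 2: unwrap first, then check the resulting pointer.
int value_check_unwrapped(ThingRef ref)
{
    auto t = unwrap_thing(ref);
    if (t) {
        return t->value;
    }
    return -1;
}
```

If `unwrap` ever did more than a cast (for example, a table lookup that can fail), the two styles would no longer be interchangeable, and checking the unwrapped pointer would be the safer of the two.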

@AlexanderKalistratov AlexanderKalistratov left a comment

A bit late, but I have some comments:

  1. Try to avoid calling delete by hand and use unique_ptr instead.
  2. In most of the functions, the two try...catch blocks could be combined into one.
  3. I have a question about the equivalence of two checks:
if (XRef) {
    auto X = unwrap<...>(XRef);
    ...
}

vs

auto X = unwrap<...>(XRef);
if (X) {
    ...
}

@ndgrigorian
Collaborator Author

A bit late, but I have some comments:

  1. Try to avoid calling delete by hand and use unique_ptr instead.
  2. In most of the functions, the two try...catch blocks could be combined into one.
  3. I have a question about the equivalence of two checks:
if (XRef) {
    auto X = unwrap<...>(XRef);
    ...
}

vs

auto X = unwrap<...>(XRef);
if (X) {
    ...
}

The suggestions themselves make sense. I think it may be most sensible to open one or more follow-up PRs that refactor and clean up libsyclinterface.
