CpuMath Enhancement: Preamble for hardware intrinsics implementation #830

@briancylui

Description

Style changes needed to solve part of #823

Details

  • Add an alignment "preamble" to the SSE/AVX intrinsics implementations in src\Microsoft.ML.CpuMath\SseIntrinsics.cs and src\Microsoft.ML.CpuMath\AvxIntrinsics.cs:

Preamble:

  1. while (!aligned) { do scalar operation; } // preamble
  2. Do vectorized operation using ReadAligned
  3. while (!end) { do scalar operation; } // remainder

For large arrays, especially those that cross cache-line or page boundaries, switching to aligned loads this way should save a measurable amount of time.
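
The three steps above can be sketched as follows. This is only an illustration in C using the SSE intrinsics from `<xmmintrin.h>`, not the C# code in SseIntrinsics.cs; the function name `sum_aligned` is hypothetical, and a sum reduction (similar in spirit to SumU) is assumed as the operation.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_add_ps, ... */

/* Hypothetical helper: sums n floats with a scalar preamble,
 * an aligned vectorized body, and a scalar remainder loop. */
float sum_aligned(const float *src, size_t n)
{
    const float *p = src;
    const float *end = src + n;
    float result = 0.0f;

    /* 1. Preamble: scalar operations until p is 16-byte aligned
     *    (or the input runs out). */
    while (p < end && ((uintptr_t)p % 16) != 0) {
        result += *p++;
    }

    /* 2. Vectorized body: p is now aligned, so aligned loads are safe. */
    __m128 acc = _mm_setzero_ps();
    while ((size_t)(end - p) >= 4) {
        acc = _mm_add_ps(acc, _mm_load_ps(p));
        p += 4;
    }

    /* Horizontal add of the four accumulator lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    result += lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* 3. Remainder: scalar operations for the last 0..3 elements. */
    while (p < end) {
        result += *p++;
    }
    return result;
}
```

For functions that take both a source and a destination, the two pointers may not share the same alignment. A common choice is to align one of them and keep unaligned accesses for the other; whether that is the right choice here is left open by this issue.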

Reference: https://github.com/dotnet/machinelearning/pull/562/files/f0f81a5019a3c8cbd795a970e40d633e9e1770c1#r204061074
#1143

Currently these functions use only unaligned loads. We can make them faster by aligning the data and doing aligned loads:

  • AddScalarU
  • ScaleSrcU
  • AddScaleU
  • ScaleAddU
  • AddU
  • AddScaleCopyU
  • AddSU
  • MulElementWiseU
  • SumU
  • SumSqU
  • SumSqDiffU
  • SumAbsU
  • SumAbsDiffU
  • MaxAbsU
  • MaxAbsDiffU
  • DotU
  • DotSU
  • Dist2
  • SdcaL1UpdateU
  • SdcaL1UpdateSU

Labels

  • P2: Priority of the issue for triage purpose: Needs to be fixed at some point.
  • enhancement: New feature or request
  • up-for-grabs: A good issue to fix if you are trying to contribute to the project
