Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving least squares interpolation #946

Merged
merged 32 commits into from
Dec 21, 2023
Merged

Conversation

mrlag31
Copy link
Contributor

@mrlag31 mrlag31 commented Sep 7, 2023

This is a follow-up from #919. It properly adds the moving least square interpolation method as well as some tests.

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some feedback on the example
(Haven't looked at the rest yet)

find_package(Boost COMPONENTS program_options)
if(Boost_FOUND)
add_subdirectory(viz)
add_subdirectory(raytracing)
add_subdirectory(brute_force)
endif()
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo

Comment on lines 19 to 20
template <typename T>
KOKKOS_INLINE_FUNCTION double step(T const &p)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the argument templated but not the return type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code when I tested different point dimensions. Simply forgot to remove the template.

template <typename T>
KOKKOS_INLINE_FUNCTION double step(T const &p)
{
return Kokkos::signbit(p[0]) ? 0 : 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also return 1 - signbit(p[0]) but I think

  auto const x = p[0];
  return x >= 0 ? 1 : 0;

might be more readable.

Kokkos::ScopeGuard guard(argc, argv);
ExecutionSpace space{};

static constexpr std::size_t num_points = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider making it the default value and optionally taking it from the command line arguments?
This would be handy when studying convergence as the number of points increases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made that change. You can input the number of points using the command line arguments with --points.


auto approx_values = mls.apply(space, source_values);

double max_error = 0.;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually suggest not to initialize the value you reduce into because it may give the wrong impression to users,
that the reduced value will be contributed to an initial value or whatnot, when Kokkos will actually ignore that value and overwrite it with the result of the reduction.

Suggested change
double max_error = 0.;
double max_error;

Comment on lines 72 to 75
loc_error = Kokkos::max(
loc_error, Kokkos::abs(target_values(i) - approx_values(i)));
},
Kokkos::Max<double>(max_error));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you go for max absolute difference rather than L1 or L2 error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the first norm that came to my mind. I have changed it to use an L2 norm.

},
Kokkos::Max<double>(max_error));

std::cout << "Error: " << max_error << '\n';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a chance to study the convergence of the error as the number of points increases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit using the new L2 norm. It seems the more points, the better the error.

Comment on lines 62 to 65
ArborX::Interpolation::MovingLeastSquares<MemorySpace, double> mls(
space, source_points, target_points);

auto approx_values = mls.apply(space, source_values);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does mls.apply mean? I would prefer something more expressive like mls.interpolate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'apply' was the function name in DTK and I kept it, I will switch it

src/interpolation/ArborX_Interp.hpp Outdated Show resolved Hide resolved
Comment on lines +28 to +39
// This is done to avoid a clash with another predicates access trait
template <typename Points>
struct MLSTargetPointsPredicateWrapper
{
Points target_points;
int num_neighbors;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which would that be?
What do the possible types for Points look like? I assume these are Views or would the implementation also allow for something different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation allows any predicates access traits as long as "get" resolves to a point. However, I could not really test if Points is valid there as ArborX::Details::check_valid_access_traits and ArborX::GeometryTraits::check_valid_geometry_traits does not return a boolean.

class MovingLeastSquares
{
public:
// If num_neighbors is 0 or negative, it will instead be a default value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default value/behavior? Of course, the behavior should be documented in the Wiki.
I would move that comment to line 116. Also, did you consider using std::optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value is the size of the polynomial basis. And using optional is a better alternative than using zero or a negative.

src/geometry/ArborX_GeometryTraits.hpp Show resolved Hide resolved
int const num_targets = tgt_acc::size(target_points);
_source_size = source_points.extent(0);
// There must be enough source points
ARBORX_ASSERT(num_neighbors <= _source_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure a ArborX::SearchException makes sense here.

Suggested change
ARBORX_ASSERT(num_neighbors <= _source_size);
KOKKOS_ASSERT(num_neighbors <= _source_size);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this code is part of ArborX, it makes more sense to use ArborX's assert than Kokkos one. It is also used elsewhere for non-search errors as well.

SourcePoints const &source_points,
TargetPoints const &target_points)
: MovingLeastSquares(space, source_points, target_points,
CRBF::Wendland<2>{}, PolynomialDegree<2>{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to document default values. Why did you choose these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were the ones that yielded the best results in my firsts small examples. We could discuss on which default would actually be best.

Kokkos::view_alloc(Kokkos::WithoutInitializing,
"ArborX::MovingLeastSquares::source_view"),
num_targets, num_neighbors);
auto const values_indices = _values_indices;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KOKKOS_CLASS_LAMBDA?

Copy link
Contributor Author

@mrlag31 mrlag31 Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previously, it was to get around nvcc's lambda restrictions.

Kokkos::View<double *, MemorySpace> tgtv0("Testing::tgtv0", 3);
Kokkos::View<double **, MemorySpace> coeffs0("Testing::coeffs0", 0, 0);
Kokkos::parallel_for(
"for", Kokkos::RangePolicy<ExecutionSpace>(space, 0, 3),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"for", Kokkos::RangePolicy<ExecutionSpace>(space, 0, 3),
"Testing::mls_coefficients_case_1", Kokkos::RangePolicy<ExecutionSpace>(space, 0, 3),

etc.

Kokkos::view_alloc(Kokkos::WithoutInitializing, "Example::target_values"),
num_points);
filledBoxRandom(0.5, source_points);
filledBoxRandom(0.5, target_points);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of convergence behavior would you expect based on the number (and positions) of the source points?
IMHO, the interpretation of the error would be easier if at least the target_points were evenly distributed.

Copy link
Contributor Author

@mrlag31 mrlag31 Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, both source and target points are uniformely distributed. However, I may try to implement a way to use any regular mesh.

I would expect, depending on the number of source points, to converge as an inverse exponential. I do not really know regarding position of the source point.

<< '\n';
}

dump_file_stream.close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary. std::fstream will close in the destructor.

Suggested change
dump_file_stream.close();

int const num_targets = tgt_acc::size(target_points);
_source_size = source_points.extent(0);
// There must be enough source points
ARBORX_ASSERT(0 < num_neighbors_val && num_neighbors_val <= _source_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARBORX_ASSERT(0 < num_neighbors_val && num_neighbors_val <= _source_size);
KOKKOS_ASSERT(0 < num_neighbors_val && num_neighbors_val <= _source_size);


// Source values must be a valuation on the points so must be as big as the
// original input
ARBORX_ASSERT(_source_size == source_values.extent_int(0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ARBORX_ASSERT(_source_size == source_values.extent_int(0));
KOKKOS_ASSERT(_source_size == source_values.extent_int(0));

@mrlag31
Copy link
Contributor Author

mrlag31 commented Nov 21, 2023

nvcc 11.0.3 used to segfault on the previous CI (and probably still segfaults). Except that, it should be ready to be merged.

Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does a typical profile look like?

Comment on lines 29 to 24
// Randomly fills a 2 * half_edge box centered at 0
void filledBoxRandom(double half_edge,
Kokkos::View<Point *, MemorySpace> points)
{
int n = points.extent(0);
auto points_host = Kokkos::create_mirror_view(points);

std::uniform_real_distribution<double> dist(-half_edge, half_edge);
std::random_device rd;
std::default_random_engine gen(rd());
auto random = [&dist, &gen]() { return dist(gen); };
for (int i = 0; i < n; ++i)
for (int d = 0; d < DIM; ++d)
points_host(i)[d] = random();

Kokkos::deep_copy(points, points_host);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need multiple variants of creating points in an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is meant to show the user that they can use any set of points, either random or a regular mesh. It can be discussed if it should be kept or not.

Kokkos::deep_copy(points, points_host);
}

// Switches the set of points to the new base
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, everything is generated in a box. This lets skew and rotate the box however the user wants to. Like the previous comment, it can be removed, as well as the creation of the basis matrix.

Comment on lines 142 to 66
// Basis change
Kokkos::View<Point[DIM], MemorySpace> source_basis("Example::source_basis");
Kokkos::View<Point[DIM], MemorySpace> target_basis("Example::target_basis");
Kokkos::parallel_for(
"Example::fill_basis", Kokkos::RangePolicy<ExecutionSpace>(space, 0, 1),
KOKKOS_LAMBDA(int const) {
for (int i = 0; i < DIM; i++)
{
source_basis(i) = Point{};
target_basis(i) = Point{};
for (int j = 0; j < DIM; j++)
{
source_basis(i)[j] = double(i == j);
target_basis(i)[j] = double(i == j);
}
}
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just the identity why do we need to change the basis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly to show that the user can do it. I also used it to build a skeded/triangular mesh in my report. It can be removed with the basis change function.

num_points);

// Sets the meshes
filledBoxEven(0.5, {side_len, side_len}, source_points);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just have this function return the source_points (and similar for the other Views/functions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be done. But I don't know which of "returning the view" or "giving an uninitialized view" is the better design choice.

Comment on lines 119 to 121
(!num_neighbors)
? Details::polynomialBasisSize<dimension, PolynomialDegree::value>()
: *num_neighbors;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(!num_neighbors)
? Details::polynomialBasisSize<dimension, PolynomialDegree::value>()
: *num_neighbors;
num_neighbors
? *num_neighbors
: Details::polynomialBasisSize<dimension, PolynomialDegree::value>();

to improve readability some.

auto const source_view = fillValuesIndicesAndGetSourceView(
space, indices, offsets, num_targets, num_neighbors_val, source_points);

// Compute the Moving Least Squares
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Compute the Moving Least Squares
// Compute the Moving Least Squares coefficients

Comment on lines 158 to 159
Details::movingLeastSquaresCoefficients<CRBF, PolynomialDegree>(
space, source_view, target_points, _coeffs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason not to return the coefficients?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like previously, I do not know if it is better to return the coefficients or to give an uninitialized array to it. Querying a tree requires to give out empty views for example.

@mrlag31
Copy link
Contributor Author

mrlag31 commented Nov 22, 2023

What does a typical profile look like?

What do you mean by "profile"?

@masterleinad
Copy link
Collaborator

What do you mean by "profile"?

What does the output for space-time-stack output look like for sufficiently big problems?

@mrlag31
Copy link
Contributor Author

mrlag31 commented Nov 22, 2023

Using Cuda and my laptop's Nvidia A1000, here are the results for both 4096 points and 1000000 points (source and target).

sts4k.txt
sts1m.txt

In short, to compute the coefficients, it lasts 3ms for 4096 points and 200ms for 1000000 points. And, as I had to do the same measures in my report, the duration scales linearly with the number of points.

@masterleinad
Copy link
Collaborator

In short, to compute the coefficients, it lasts 3ms for 4096 points and 200ms for 1000000 points. And, as I had to do the same measures in my report, the duration scales linearly with the number of points.

So ArborX::SymmetricPseudoInverseSVD::computations or ArborX::BVH::query::nearest would be where to look first if we wanted to improve performance.

@Rombur
Copy link
Collaborator

Rombur commented Nov 27, 2023

nvcc 11.0.3 used to segfault on the previous CI (and probably still segfaults). Except that, it should be ready to be merged.

This PR doesn't compile with CUDA 11.0.3 https://cloud.cees.ornl.gov/jenkins-ci/job/arborx/job/PR-946/4/pipeline-console/?selected-node=44

@Rombur
Copy link
Collaborator

Rombur commented Nov 27, 2023

Retest this please.

@dalg24
Copy link
Contributor

dalg24 commented Nov 27, 2023

retest this please

@masterleinad
Copy link
Collaborator

I can compile and run the example successfully on Summit using

$ module list 

Currently Loaded Modules:
  1) xl/16.1.1-10                     3) lsf-tools/2.0   5) darshan-runtime/3.4.0-lite   7) DefApps                   9) nsight-systems/2021.3.1.54  11) cmake/3.23.2
  2) spectrum-mpi/10.4.0.3-20210112   4) hsi/5.0.2.p5    6) xalt/1.2.1                   8) nsight-compute/2021.2.1  10) cuda/11.0.3                 12) boost/1.77.0

@masterleinad
Copy link
Collaborator

masterleinad commented Nov 27, 2023

@mrlag31 Can you post the issues you are seeing here if you can find them?

@mrlag31
Copy link
Contributor Author

mrlag31 commented Nov 28, 2023

For the moment, the crash appeared after updating my code to the new callback interface. I will investigate it.

Copy link
Collaborator

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with cleaning up in a follow-up.

@aprokop
Copy link
Contributor

aprokop commented Dec 21, 2023

Couple builds did not run (HIP, SYCL, Cuda-Clang), but that's fine.

@aprokop aprokop merged commit a522958 into arborx:master Dec 21, 2023
1 of 2 checks passed
@aprokop aprokop added the enhancement New feature or request label Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants