Use tensor product evaluation for MappingQGeneric::fill_fe_face_values #9867

Open

kronbichler opened this issue Apr 10, 2020 · 2 comments
kronbichler commented Apr 10, 2020

MappingQGeneric already supports the fast tensorized evaluation of quantities on cells via the following code:

// do the actual tensorized evaluation
SelectEvaluator<dim, -1, 0, n_comp, VectorizedArray<double>>::
  evaluate(data.shape_info,
           data.values_dofs.begin(),
           data.values_quad.begin(),
           data.gradients_quad.begin(),
           data.hessians_quad.begin(),
           data.scratch.begin(),
           evaluate_values,
           evaluate_gradients,
           evaluate_hessians);

We should introduce the same for FEFaceValues (the geometry part). The reason for this is that the constructor of FEFaceValues is very expensive for higher mapping degrees and many quadrature points. More precisely, if we have a MappingQGeneric description of degree p and r quadrature points per direction, the initialization in 3D evaluates (p+1)^3 polynomials at r^2 points, storing a table of (p+1)^3 r^2 values. What makes things worse is that we store this information for all 6 faces and for all 8 possible face orientations, so a total of 48 (p+1)^3 r^2 data items need to be evaluated by TensorProductPolynomials and stored. To give a number, for p=15 and r=20 this equates to 48 * 16^3 * 20^2 * 8 bytes, i.e., around 630 MB - just for that part of FEFaceValues! Parallelization won't help because we construct a separate FEFaceValues object on every thread and every MPI rank. This level of memory consumption makes the current implementation a blocker. The PR #9656 circumvented this problem for the setup of MatrixFree, which is one of the primary users of high mapping degrees and quadrature formulas with many points, but it would of course be better if we also provided this in other contexts, rather than saying we cannot use deal.II for those degrees and quadrature formulas.
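To double-check the arithmetic, here is a minimal back-of-the-envelope sketch (assuming, as in the estimate above, one double per tabulated value):

    #include <cstddef>
    #include <iostream>

    int main()
    {
      const std::size_t p = 15; // mapping degree
      const std::size_t r = 20; // quadrature points per direction
      // 6 faces x 8 orientations = 48 tables of (p+1)^3 x r^2 values
      const std::size_t n_values = 48 * (p + 1) * (p + 1) * (p + 1) * r * r;
      std::cout << n_values * sizeof(double) / 1.0e6 << " MB\n"; // ~629 MB
    }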

This needs the following ingredients:

  • We need to extract the reordering tables that adjust for the face orientation from the matrix-free context into a static function (a sketch of how a row of the resulting table would be applied follows after this list):
        const unsigned int n = quadrature_1d.size();
        face_orientations.reinit(8, n * n);
        for (unsigned int j = 0, i = 0; j < n; ++j)
          for (unsigned int k = 0; k < n; ++k, ++i)
            {
              // face_orientation=true, face_flip=false, face_rotation=false
              face_orientations[0][i] = i;
              // face_orientation=false, face_flip=false, face_rotation=false
              face_orientations[1][i] = j + k * n;
              // face_orientation=true, face_flip=true, face_rotation=false
              face_orientations[2][i] = (n - 1 - k) + (n - 1 - j) * n;
              // face_orientation=false, face_flip=true, face_rotation=false
              face_orientations[3][i] = (n - 1 - j) + (n - 1 - k) * n;
              // face_orientation=true, face_flip=false, face_rotation=true
              face_orientations[4][i] = j + (n - 1 - k) * n;
              // face_orientation=false, face_flip=false, face_rotation=true
              face_orientations[5][i] = k + (n - 1 - j) * n;
              // face_orientation=true, face_flip=true, face_rotation=true
              face_orientations[6][i] = (n - 1 - j) + k * n;
              // face_orientation=false, face_flip=true, face_rotation=true
              face_orientations[7][i] = (n - 1 - k) + j * n;
            }
  • We need to check whether the quadrature formula is a tensor product at the right spot (see the detection sketch after this list). We currently make the check here:
        tensor_product_quadrature = q.is_tensor_product();

    but at that point we have already expanded into all quadrature variants, losing the tensor product structure, due to:
        data.initialize_face(this->requires_update_flags(update_flags),
                             QProjector<dim>::project_to_all_faces(quadrature),
                             quadrature.size());

    For background information, the expansion into the 48 cases happens here (it is even worse for the subface values):
        for (const auto &mutation : q)
          {
            // project to each face and append
            // results
            for (unsigned int face = 0; face < n_faces; ++face)
              {
                project_to_face(mutation, face, help);
                std::copy(help.begin(), help.end(), std::back_inserter(q_points));
              }

            // next copy over weights
            for (unsigned int face = 0; face < n_faces; ++face)
              std::copy(mutation.get_weights().begin(),
                        mutation.get_weights().end(),
                        std::back_inserter(weights));
          }
  • Once we have convinced ourselves that we have a tensor product quadrature formula, which needs some restructuring in the internals of MappingQGeneric, we can set up a ShapeInfo similarly to the cell case (see the sketch after this list).
  • For the evaluation part, we would make a call in analogy to this one:
        // now let the matrix-free evaluators provide us with the
        // data on faces
        FEFaceEvaluationSelector<dim, -1, 0, dim, double, VectorizedDouble>::
          evaluate(shape_info,
                   cell_points.data(),
                   face_quads.data(),
                   face_grads.data(),
                   scratch_data.data(),
                   true,
                   true,
                   face_no,
                   GeometryInfo<dim>::max_children_per_cell,
                   faces[face].face_orientation > 8 ?
                     faces[face].face_orientation - 8 :
                     0,
                   my_data.descriptor[0].face_orientations);

    Note that here we feed in the orientation table and the correct orientation index, and we can also provide an index for the subface values.
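As a sketch for the first ingredient above: once the orientation table is available through a static function, applying one of its rows is a plain permutation of the data tabulated at the n * n face quadrature points. The helper name and the direction of the permutation are assumptions for illustration:

    #include <vector>

    // Hypothetical helper: reorder data tabulated at the n * n face
    // quadrature points according to one row of the orientation table.
    std::vector<double>
    apply_face_orientation(const std::vector<double>       &values,
                           const std::vector<unsigned int> &table_row)
    {
      std::vector<double> reordered(values.size());
      for (unsigned int i = 0; i < values.size(); ++i)
        reordered[i] = values[table_row[i]];
      return reordered;
    }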
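For the second ingredient, here is a minimal detection sketch based on the existing Quadrature API (is_tensor_product() and get_tensor_basis()); the free function and its placement are assumptions:

    #include <deal.II/base/quadrature.h>

    using namespace dealii;

    // Check that the face quadrature is a tensor product with the same 1d
    // rule in every direction (which the orientation tables rely on) and,
    // if so, extract that 1d rule.
    template <int dim>
    bool
    try_get_quadrature_1d(const Quadrature<dim - 1> &face_quadrature,
                          Quadrature<1>             &quadrature_1d)
    {
      if (face_quadrature.is_tensor_product() == false)
        return false;

      const auto basis = face_quadrature.get_tensor_basis();
      for (unsigned int d = 1; d < dim - 1; ++d)
        if (!(basis[d] == basis[0]))
          return false;

      quadrature_1d = basis[0];
      return true;
    }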
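For the third ingredient, the ShapeInfo setup could mirror the cell case along these lines; representing the mapping polynomials by an FE_Q of the same degree and the exact reinit() arguments are assumptions based on the matrix-free internals:

    #include <deal.II/base/quadrature.h>
    #include <deal.II/base/vectorization.h>
    #include <deal.II/fe/fe_q.h>
    #include <deal.II/matrix_free/shape_info.h>

    using namespace dealii;

    template <int dim>
    internal::MatrixFreeFunctions::ShapeInfo<VectorizedArray<double>>
    make_mapping_shape_info(const unsigned int   mapping_degree,
                            const Quadrature<1> &quadrature_1d)
    {
      // the support points of MappingQGeneric span the same polynomial
      // space as an FE_Q of equal degree
      const FE_Q<dim> fe(mapping_degree);

      internal::MatrixFreeFunctions::ShapeInfo<VectorizedArray<double>>
        shape_info;
      shape_info.reinit(quadrature_1d, fe);
      return shape_info;
    }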
kronbichler commented:
FYI @nfehn

kronbichler commented Apr 7, 2023

Note that #15035 can address this from another angle: while slightly less efficient than the full tensor product path, that PR addresses the fundamental problem of that function at high orders. With some optimizations for the faces that @bergbauer is already planning, we might end up with an algorithm of complexity mapping_degree^2 x n_q_points, which is not much worse than the best case mapping_degree x n_q_points, but without having to check the face quadrature formula at all.

Once #15035 is addressed, we should revisit this issue and evaluate the performance and memory consumption; it looks like we can avoid the high memory consumption mentioned in the original post.
