compilation speed-up #9790

Closed
tjhei opened this issue Apr 1, 2020 · 13 comments

Comments

@tjhei
Member

tjhei commented Apr 1, 2020

@masterleinad used ClangBuildAnalyzer to generate a compile-time analysis of a current build. I am attaching the output below in the hope that we can pick off a couple of items to improve compilation speed.

Analyzing build trace from 'summarize_clang_build'...
**** Time summary:
Compilation (1273 times):
  Parsing (frontend):         4589.7 s
  Codegen & opts (backend):   6030.1 s

**** Files that took longest to parse (compiler frontend):
 22131 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_codim.cc.json
 20643 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qp.cc.json
 20232 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qpmf.cc.json
 20101 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_codim.cc.json
 19086 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 18824 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qpmf.cc.json
 18234 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_selector.cc.json
 17631 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/evaluation_selector.cc.json
 16941 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_inst3.cc.json
 16819 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_inst2.cc.json

**** Files that took longest to codegen (compiler backend):
149482 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_interpolate.cc.json
119115 ms: /source/dofs/CMakeFiles/obj_dofs_release.dir/dof_tools_sparsity.cc.json
 88797 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst3.cc.json
 73585 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/fe_field_function.cc.json
 68626 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/evaluation_selector.cc.json
 67495 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_tools_interpolate.cc.json
 63666 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst2.cc.json
 63010 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_interpolate.cc.json
 61724 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 60535 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/matrix_free.cc.json

**** Templates that took longest to instantiate:
 18824 ms: dealii::Triangulation<2, 2> (410 times, avg 45 ms)
 18329 ms: dealii::Triangulation<3, 3> (410 times, avg 44 ms)
 17674 ms: dealii::Triangulation<1, 1> (410 times, avg 43 ms)
 16625 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 1> (4 times, avg 4156 ms)
 15722 ms: boost::archive::detail::common_iarchive<boost::archive::binary_iarch... (2553 times, avg 6 ms)
 15637 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, de... (4 times, avg 3909 ms)
 15633 ms: dealii::VectorTools::internal::project_matrix_free_component<1, doub... (4 times, avg 3908 ms)
 15424 ms: dealii::SolverCG<dealii::Vector<double> >::SolverCG (74 times, avg 208 ms)
 15147 ms: dealii::hp::DoFHandler<2, 3> (334 times, avg 45 ms)
 14631 ms: dealii::Triangulation<1, 2> (410 times, avg 35 ms)
 13702 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 1> (4 times, avg 3425 ms)
 13623 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 2> (4 times, avg 3405 ms)
 13409 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, de... (4 times, avg 3352 ms)
 13406 ms: dealii::VectorTools::internal::project_matrix_free_component<2, doub... (4 times, avg 3351 ms)
 13135 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, de... (4 times, avg 3283 ms)
 13130 ms: dealii::VectorTools::internal::project_matrix_free_component<1, floa... (4 times, avg 3282 ms)
 12686 ms: dealii::Triangulation<2, 3> (410 times, avg 30 ms)
 12552 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 2> (4 times, avg 3138 ms)
 12500 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, de... (4 times, avg 3125 ms)
 12497 ms: dealii::VectorTools::internal::project_matrix_free_component<2, floa... (4 times, avg 3124 ms)
 12311 ms: boost::archive::basic_binary_iarchive<boost::archive::binary_iarchiv... (1284 times, avg 9 ms)
 12056 ms: dealii::Triangulation<1, 3> (410 times, avg 29 ms)
 12046 ms: dealii::SolverBase<dealii::Vector<double> >::SolverBase (92 times, avg 130 ms)
 11159 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 1, doub... (4 times, avg 2789 ms)
 10628 ms: dealii::Triangulation<1, 2>::Signals (410 times, avg 25 ms)
 10341 ms: boost::variant<boost::shared_ptr<void>, boost::signals2::detail::for... (418 times, avg 24 ms)
  9371 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 2, doub... (4 times, avg 2342 ms)
  9356 ms: dealii::Triangulation<2, 2>::Signals (410 times, avg 22 ms)
  9306 ms: dealii::Triangulation<3, 3>::Signals (410 times, avg 22 ms)
  9293 ms: std::vector<boost::variant<boost::weak_ptr<boost::signals2::detail::... (418 times, avg 22 ms)

**** Template sets that took longest to instantiate:
249506 ms: std::__and_<$> (176653 times, avg 1 ms)
214391 ms: std::unique_ptr<$> (16812 times, avg 12 ms)
199476 ms: std::__uniq_ptr_impl<$> (16812 times, avg 11 ms)
129558 ms: std::is_constructible<$> (107277 times, avg 1 ms)
126655 ms: std::__is_constructible_impl<$> (106365 times, avg 1 ms)
125173 ms: std::__is_direct_constructible<$> (105613 times, avg 1 ms)
122679 ms: std::__is_direct_constructible_new<$> (103919 times, avg 1 ms)
105109 ms: std::__is_direct_constructible_new_safe<$> (85175 times, avg 1 ms)
 94203 ms: dealii::Triangulation<$> (2460 times, avg 38 ms)
 73194 ms: dealii::internal::EvaluatorTensorProduct<$>::apply<$> (26786 times, avg 2 ms)
 72809 ms: dealii::VectorTools::internal::project<$> (100 times, avg 728 ms)
 70767 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<$> (79 times, avg 895 ms)
 70703 ms: dealii::VectorTools::internal::project_matrix_free_component<$> (20 times, avg 3535 ms)
 70691 ms: dealii::VectorTools::internal::project_matrix_free_degree<$> (80 times, avg 883 ms)
 70644 ms: dealii::VectorTools::internal::project_matrix_free<$> (320 times, avg 220 ms)
 67784 ms: std::is_destructible<$> (65903 times, avg 1 ms)
 62227 ms: boost::signals2::signal<$> (8458 times, avg 7 ms)
 60645 ms: std::vector<$> (46198 times, avg 1 ms)
 59188 ms: std::_TC<$>::_ConstructibleTuple<$> (17959 times, avg 3 ms)
 58212 ms: std::tuple<$> (18341 times, avg 3 ms)
 56029 ms: dealii::Triangulation<$>::Signals (2460 times, avg 22 ms)
 54698 ms: boost::signals2::detail::signal_impl<$> (8458 times, avg 6 ms)
 49330 ms: dealii::SelectEvaluator<$>::integrate (462 times, avg 106 ms)
 49172 ms: dealii::FEEvaluation<$>::integrate (424 times, avg 115 ms)
 48933 ms: dealii::SelectEvaluator<$>::evaluate (472 times, avg 103 ms)
 48107 ms: dealii::VectorTools::internal::project_parallel<$> (288 times, avg 167 ms)
 47829 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_select... (140 times, avg 341 ms)
 47799 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::inte... (140 times, avg 341 ms)
 47712 ms: std::map<$> (17729 times, avg 2 ms)
 47551 ms: std::pair<$> (17061 times, avg 2 ms)

**** Functions that took longest to compile:
  5954 ms: dealii::Triangulation<3, 3>::DistortedCellList dealii::internal::Tri... (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1615 ms: dealii::PolynomialsAdini::PolynomialsAdini() (/home/6da/dealii-clang-9/source/base/polynomials_adini.cc)
  1530 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1507 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 2... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1506 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1495 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1446 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 1... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1430 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1414 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1411 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 1... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1392 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1390 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1384 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1384 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1382 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1382 ms: dealii::FE_Nedelec<3>::initialize_restriction() (/home/6da/dealii-clang-9/source/fe/fe_nedelec.cc)
  1367 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1366 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1364 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1346 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1> ... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1344 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 1... (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1216 ms: umfdi_solve(int, int const*, int const*, double const*, double*, dou... (/home/6da/dealii-clang-9/bundled/umfpack/UMFPACK/Source/umf_solve.cc)
  1210 ms: dealii::GridOut::write_svg(dealii::Triangulation<2, 2> const&, std::... (/home/6da/dealii-clang-9/source/grid/grid_out.cc)
  1151 ms: void dealii::internal::TriangulationImplementation::Implementation::... (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1148 ms: void dealii::internal::TriangulationImplementation::Implementation::... (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1126 ms: dealii::ParameterHandler::print_parameters_section(std::ostream&, de... (/home/6da/dealii-clang-9/source/base/parameter_handler.cc)
  1049 ms: umfdl_solve(long, long const*, long const*, double const*, double*, ... (/home/6da/dealii-clang-9/bundled/umfpack/UMFPACK/Source/umf_solve.cc)
   981 ms: dealii::PolynomialsNedelec<3>::evaluate(dealii::Point<3, double> con... (/home/6da/dealii-clang-9/source/base/polynomials_nedelec.cc)
   946 ms: dealii::FE_Nedelec<3>::convert_generalized_support_point_values_to_d... (/home/6da/dealii-clang-9/source/fe/fe_nedelec.cc)
   914 ms: dealii::FE_NedelecSZ<3, 3>::get_data(dealii::UpdateFlags, dealii::Ma... (/home/6da/dealii-clang-9/source/fe/fe_nedelec_sz.cc)

**** Function sets that took longest to compile / optimize:
 71603 ms: void dealii::internal::EvaluatorTensorProduct<$>::apply<$>(dealii::V... (5351 times, avg 13 ms)
 68166 ms: dealii::internal::FEEvaluationImpl<$>::evaluate(dealii::internal::Ma... (1021 times, avg 66 ms)
 60064 ms: dealii::SolutionTransfer<$>::prepare_for_coarsening_and_refinement(s... (720 times, avg 83 ms)
 47503 ms: dealii::internal::FEEvaluationImpl<$>::integrate(dealii::internal::M... (859 times, avg 55 ms)
 45616 ms: dealii::SolutionTransfer<$>::interpolate(std::vector<$> const&, std:... (720 times, avg 63 ms)
 32107 ms: dealii::internal::FEEvaluationImplCollocation<$>::evaluate(dealii::i... (627 times, avg 51 ms)
 28567 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> ... (240 times, avg 119 ms)
 28565 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::integra... (980 times, avg 29 ms)
 28531 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> ... (240 times, avg 118 ms)
 25766 ms: void dealii::MatrixFree<$>::initialize_indices<$>(std::vector<$> con... (54 times, avg 477 ms)
 21154 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::evaluat... (1033 times, avg 20 ms)
 21072 ms: dealii::internal::FEEvaluationImplCollocation<$>::integrate(dealii::... (595 times, avg 35 ms)
 18805 ms: std::vector<$>::_M_default_append(unsigned long) (1058 times, avg 17 ms)
 18593 ms: dealii::FE_Poly<$>::get_data(dealii::UpdateFlags, dealii::Mapping<$>... (567 times, avg 32 ms)
 18387 ms: void dealii::FEValuesViews::internal::do_function_derivatives<$>(dea... (369 times, avg 49 ms)
 18373 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_bala... (444 times, avg 41 ms)
 18239 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::eval... (1711 times, avg 10 ms)
 16226 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, d... (180 times, avg 90 ms)
 15377 ms: dealii::SolutionTransfer<$>::prepare_for_pure_refinement() (241 times, avg 63 ms)
 15197 ms: void dealii::SolverCG<$>::solve<$>(dealii::MatrixFreeOperators::Mass... (432 times, avg 35 ms)
 14452 ms: void dealii::MatrixCreator::internal::mass_assembler<$>(dealii::Tria... (144 times, avg 100 ms)
 14035 ms: dealii::Functions::FEFieldFunction<$>::vector_laplacian_list(std::ve... (245 times, avg 57 ms)
 14022 ms: dealii::Functions::FEFieldFunction<$>::vector_value_list(std::vector... (244 times, avg 57 ms)
 13666 ms: dealii::Functions::FEFieldFunction<$>::vector_gradient_list(std::vec... (243 times, avg 56 ms)
 13507 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, d... (144 times, avg 93 ms)
 13488 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_... (546 times, avg 24 ms)
 13052 ms: void dealii::MGTransferMatrixFree<$>::do_prolongate_add<$>(unsigned ... (216 times, avg 60 ms)
 12801 ms: void dealii::MGTransferMatrixFree<$>::do_restrict_add<$>(unsigned in... (216 times, avg 59 ms)
 12282 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, d... (144 times, avg 85 ms)
 11421 ms: void dealii::FEEvaluationBase<$>::read_write_operation<$>(dealii::in... (129 times, avg 88 ms)

*** Expensive headers:
1670756 ms: /home/6da/dealii-clang-9/include/deal.II/base/point.h (included 532 times, avg 3140 ms), included via:
  operator.cc.json newton.templates.h parameter_handler.h patterns.h  (5396 ms)
  tria.cc.json geometry_info.h  (5074 ms)
  tensor_product_polynomials_const.cc.json tensor_product_polynomials_const.h  (4864 ms)
  tensor_polynomials_base.cc.json quadrature_lib.h quadrature.h  (4792 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h cell_data_transfer.h tria.h p4est_wrappers.h geometry_info.h  (4685 ms)
  grid_refinement.cc.json tria.h geometry_info.h  (4645 ms)
  ...

690325 ms: /home/6da/dealii-clang-9/include/deal.II/base/utilities.h (included 636 times, avg 1085 ms), included via:
  tensor_function.cc.json tensor_function.templates.h tensor.h  (1860 ms)
  grid_refinement.cc.json tria.h geometry_info.h point.h tensor.h  (1823 ms)
  grid_refinement.cc.json  (1807 ms)
  tria.cc.json geometry_info.h point.h tensor.h  (1761 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h cell_data_transfer.h tria.h p4est_wrappers.h geometry_info.h point.h tensor.h  (1677 ms)
  step-26.cc.json  (1666 ms)
  ...

583693 ms: /home/6da/dealii-clang-9/include/deal.II/base/tensor.h (included 612 times, avg 953 ms), included via:
  tensor_function.cc.json tensor_function.templates.h  (2555 ms)
  grid_refinement.cc.json tria.h geometry_info.h point.h  (1894 ms)
  tria.cc.json geometry_info.h point.h  (1855 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h cell_data_transfer.h tria.h p4est_wrappers.h geometry_info.h point.h  (1750 ms)
  vector_tools_point_value.cc.json vector_tools.templates.h derivative_form.h  (1722 ms)
  partitioner.cc.json mpi_compute_index_owner_internal.h mpi.h array_view.h symmetric_tensor.h  (1671 ms)
  ...

526382 ms: /home/6da/dealii-clang-9/include/deal.II/base/geometry_info.h (included 438 times, avg 1201 ms), included via:
  tria.cc.json  (5905 ms)
  fe_raviart_thomas_nodal.cc.json qprojector.h  (4926 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h cell_data_transfer.h tria.h p4est_wrappers.h  (4753 ms)
  grid_refinement.cc.json tria.h  (4725 ms)
  grid_out.cc.json  (4674 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h p4est_wrappers.h  (4638 ms)
  ...

432562 ms: /home/6da/dealii-clang-9/include/deal.II/base/quadrature.h (included 362 times, avg 1194 ms), included via:
  tensor_polynomials_base.cc.json quadrature_lib.h  (4878 ms)
  step-61.cc.json  (4277 ms)
  step-61.cc.json  (4121 ms)
  step-29.cc.json quadrature_lib.h  (3992 ms)
  step-46.cc.json quadrature_lib.h  (3981 ms)
  step-11.cc.json quadrature_lib.h  (3971 ms)
  ...

411775 ms: /home/6da/dealii-clang-9/include/deal.II/base/quadrature_lib.h (included 332 times, avg 1240 ms), included via:
  tensor_polynomials_base.cc.json  (5303 ms)
  quadrature_selector.cc.json  (4564 ms)
  step-46.cc.json  (4431 ms)
  step-11.cc.json  (4407 ms)
  step-29.cc.json  (4395 ms)
  step-10.cc.json  (4343 ms)
  ...

393552 ms: /home/6da/dealii-clang-9/include/deal.II/grid/tria.h (included 410 times, avg 959 ms), included via:
  step-49.cc.json  (5657 ms)
  grid_refinement.cc.json  (5489 ms)
  grid_refinement.cc.json  (5273 ms)
  step-2.cc.json  (5257 ms)
  step-4.cc.json  (5106 ms)
  step-1.cc.json  (4972 ms)
  ...

362244 ms: /home/6da/dealii-clang-9/include/deal.II/base/function.h (included 366 times, avg 989 ms), included via:
  function.cc.json function.templates.h  (5084 ms)
  matrix_creator.cc matrix_creator.templates.h  (4922 ms)
  matrix_creator.cc matrix_creator.templates.h  (4691 ms)
  matrix_creator.cc.json matrix_creator.templates.h  (4279 ms)
  step-30.cc.json  (4182 ms)
  step-44.cc.json  (4161 ms)
  ...

274415 ms: /home/6da/dealii-clang-9/include/deal.II/base/numbers.h (included 710 times, avg 386 ms), included via:
  grid_refinement.cc.json template_constraints.h config.h  (719 ms)
  timer.cc.json exceptions.h config.h  (699 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h config.h  (698 ms)
  tria.cc.json geometry_info.h config.h  (666 ms)
  thread_management.cc.json thread_management.h config.h  (650 ms)
  fe_series_fourier.cc.json config.h  (647 ms)
  ...

270899 ms: include/deal.II/base/config.h (included 710 times, avg 381 ms), included via:
  grid_refinement.cc.json template_constraints.h  (719 ms)
  timer.cc.json exceptions.h  (700 ms)
  cell_data_transfer.cc.json cell_data_transfer.templates.h  (699 ms)
  tria.cc.json geometry_info.h  (667 ms)
  thread_management.cc.json thread_management.h  (650 ms)
  fe_series_fourier.cc.json  (648 ms)
  ...

  done in 16.7s.
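
For reading the "Expensive headers" entries above: the leading number appears to be the cumulative parse time over all inclusions, and the reported average matches that total divided by the inclusion count with the remainder dropped (1670756 / 532 gives 3140 for point.h). Below is a minimal sketch that parses one such line and recomputes the average. `HeaderCost`, `parse_header_line`, and `average_ms` are hypothetical names for illustration and are not part of ClangBuildAnalyzer; the parser also assumes the path contains no spaces.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper type: one entry of the "Expensive headers" list.
struct HeaderCost
{
  long        total_ms = 0; // cumulative time over all inclusions
  long        count    = 0; // how many times the header was included
  std::string path;         // header path as printed in the report
};

// Parse a line of the form
//   "<total> ms: <path> (included <count> times, avg <avg> ms), included via:"
// Returns false if the line does not match this layout.
// Assumption: the path contains no whitespace.
bool parse_header_line(const std::string &line, HeaderCost &out)
{
  char path[512];
  long total = 0, count = 0, avg = 0;
  const int n = std::sscanf(line.c_str(),
                            " %ld ms: %511s (included %ld times, avg %ld ms)",
                            &total, path, &count, &avg);
  if (n != 4 || count <= 0)
    return false;
  out.total_ms = total;
  out.count    = count;
  out.path     = path;
  return true;
}

// Recompute the per-inclusion average with integer division, which
// reproduces the "avg" values printed in the report above.
long average_ms(const HeaderCost &h)
{
  assert(h.count > 0);
  return h.total_ms / h.count;
}
```

Sorting the parsed entries by `total_ms` rather than by the average would point at headers such as point.h, where a moderate per-inclusion cost is multiplied by several hundred inclusions.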
@masterleinad
Member

It would probably help to split vector_tools.[templates.]h up into parts that we only need for the various vector_tools*.cc files.

@kronbichler
Member

> It would probably help to split vector_tools.[templates.]h up into parts that we only need for the various vector_tools*.cc files.

I fully agree. For compatibility we will need to keep a user-visible header vector_tools.templates.h that includes all sub-functionality, whereas the .cc files include only the individual files they need.
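
A minimal sketch of that layout, collapsed into a single file so it compiles on its own. Everything here is an assumption for illustration: the namespace `VectorToolsSketch`, the two functions, and the file names in the comments are hypothetical and do not reflect how deal.II actually splits VectorTools.

```cpp
#include <cassert>
#include <vector>

// --- hypothetical file: vector_tools_mean_value.templates.h ---
// Holds only the template definitions that one .cc file needs.
namespace VectorToolsSketch
{
  template <typename Number>
  Number compute_mean_value(const std::vector<Number> &v)
  {
    assert(!v.empty());
    Number sum = Number();
    for (const Number &x : v)
      sum += x;
    return sum / static_cast<Number>(v.size());
  }
} // namespace VectorToolsSketch

// --- hypothetical file: vector_tools_norm.templates.h ---
namespace VectorToolsSketch
{
  template <typename Number>
  Number compute_sum_of_squares(const std::vector<Number> &v)
  {
    Number sum = Number();
    for (const Number &x : v)
      sum += x * x;
    return sum;
  }
} // namespace VectorToolsSketch

// --- hypothetical file: vector_tools.templates.h (umbrella header) ---
// In a real split this file would contain nothing but
//   #include "vector_tools_mean_value.templates.h"
//   #include "vector_tools_norm.templates.h"
// so user code that includes the umbrella header keeps compiling.

// --- hypothetical file: vector_tools_mean_value.cc ---
// Would include only vector_tools_mean_value.templates.h and
// explicitly instantiate what the library exports.
namespace VectorToolsSketch
{
  template double compute_mean_value<double>(const std::vector<double> &);
  template float  compute_mean_value<float>(const std::vector<float> &);
} // namespace VectorToolsSketch
```

The point of the split is that each instantiation file parses only the sub-header it needs, while the umbrella header preserves the existing include path for users.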

tjhei added a commit to tjhei/dealii that referenced this issue Apr 2, 2020
config.h includes numbers.h, which contains vectorization-related
functionality that is likely not needed by everyone. Fix this.

part of dealii#9790
@masterleinad
Member

I am looking into splitting up VectorTools.

@masterleinad
Member

masterleinad commented May 21, 2020

With 9.2, I am seeing:

Analyzing build trace from 'clang_analyze.txt'...
**** Time summary:
Compilation (1293 times):
  Parsing (frontend):         4669.0 s
  Codegen & opts (backend):   7465.4 s

**** Files that took longest to parse (compiler frontend):
 22637 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_codim.cc.json
 21873 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qpmf.cc.json
 21833 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qp.cc.json
 21829 ms: /source/grid/CMakeFiles/obj_grid_debug.dir/grid_tools_dof_handlers.cc.json
 21607 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_codim.cc.json
 19718 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 19487 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_selector.cc.json
 19032 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qpmf.cc.json
 17292 ms: /source/grid/CMakeFiles/obj_grid_debug.dir/grid_tools_2.cc.json
 16928 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_inst3.cc.json

**** Files that took longest to codegen (compiler backend):
153462 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_interpolate.cc.json
127036 ms: /source/dofs/CMakeFiles/obj_dofs_release.dir/dof_tools_sparsity.cc.json
117250 ms: /source/non_matching/CMakeFiles/obj_non_matching_release.dir/coupling.cc.json
 95353 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst3.cc.json
 82290 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/fe_field_function.cc.json
 76607 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/evaluation_selector.cc.json
 74613 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst2.cc.json
 66744 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_interpolate.cc.json
 65738 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 62298 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_tools_interpolate.cc.json

**** Templates that took longest to instantiate:
 22918 ms: dealii::Triangulation<1, 1> (448 times, avg 51 ms)
 21396 ms: dealii::Triangulation<3, 3> (448 times, avg 47 ms)
 20944 ms: dealii::Triangulation<2, 2> (448 times, avg 46 ms)
 17837 ms: boost::archive::detail::common_iarchive<boost::archive::binary_iarchive>::vload (2693 times, avg 6 ms)
 17102 ms: dealii::SolverCG<dealii::Vector<double> >::SolverCG (76 times, avg 225 ms)
 16959 ms: dealii::hp::DoFHandler<2, 3> (366 times, avg 46 ms)
 16008 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 1> (4 times, avg 4002 ms)
 15214 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<double>, 1> (4 times, avg 3803 ms)
 15209 ms: dealii::VectorTools::internal::project_matrix_free_component<1, double, 1> (4 times, avg 3802 ms)
 14642 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 2> (4 times, avg 3660 ms)
 14514 ms: dealii::Triangulation<2, 3> (448 times, avg 32 ms)
 14454 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<double>, 2> (4 times, avg 3613 ms)
 14449 ms: dealii::VectorTools::internal::project_matrix_free_component<2, double, 2> (4 times, avg 3612 ms)
 14331 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 2> (4 times, avg 3582 ms)
 14262 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<float>, 2> (4 times, avg 3565 ms)
 14257 ms: dealii::VectorTools::internal::project_matrix_free_component<2, float, 2> (4 times, avg 3564 ms)
 14189 ms: dealii::Triangulation<1, 3> (448 times, avg 31 ms)
 14111 ms: boost::archive::basic_binary_iarchive<boost::archive::binary_iarchive>::load_override (1392 times, avg 10 ms)
 14098 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 1> (4 times, avg 3524 ms)
 14091 ms: dealii::Triangulation<1, 2> (448 times, avg 31 ms)
 13559 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<float>, 1> (4 times, avg 3389 ms)
 13554 ms: dealii::VectorTools::internal::project_matrix_free_component<1, float, 1> (4 times, avg 3388 ms)
 13469 ms: dealii::SolverBase<dealii::Vector<double> >::SolverBase (96 times, avg 140 ms)
 12498 ms: boost::variant<boost::shared_ptr<void>, boost::signals2::detail::foreign_void_shared_ptr> (454 times, avg 27 ms)
 11178 ms: dealii::Triangulation<1, 1>::Signals (448 times, avg 24 ms)
 11007 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_evaluate<1, 1, dealii::VectorizedArray<double, 2> > (16 times, avg 687 ms)
 11003 ms: dealii::internal::EvaluationSelectorImplementation::Factory<1, 1, dealii::VectorizedArray<double, 2>, 0, 0, 0, void>::evaluate (16 times, avg 687 ms)
 10921 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 1, double, 1> (4 times, avg 2730 ms)
 10837 ms: std::vector<boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr>, std::allocator<boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr> > >::push_back (454 times, avg 23 ms)
 10758 ms: std::vector<boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr>, std::allocator<boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr> > >::emplace_back<boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr> > (454 times, avg 23 ms)

**** Template sets that took longest to instantiate:
295227 ms: std::__and_<$> (195256 times, avg 1 ms)
255904 ms: std::unique_ptr<$> (18316 times, avg 13 ms)
237530 ms: std::__uniq_ptr_impl<$> (18316 times, avg 12 ms)
157066 ms: std::is_constructible<$> (118050 times, avg 1 ms)
153843 ms: std::__is_constructible_impl<$> (117646 times, avg 1 ms)
152322 ms: std::__is_direct_constructible<$> (117343 times, avg 1 ms)
150056 ms: std::__is_direct_constructible_new<$> (116747 times, avg 1 ms)
133341 ms: std::__is_direct_constructible_new_safe<$> (104374 times, avg 1 ms)
108054 ms: dealii::Triangulation<$> (2688 times, avg 40 ms)
 93589 ms: dealii::internal::EvaluatorTensorProduct<$>::apply<$> (31498 times, avg 2 ms)
 82669 ms: std::is_destructible<$> (73148 times, avg 1 ms)
 76248 ms: dealii::VectorTools::internal::project<$> (100 times, avg 762 ms)
 74438 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<$> (85 times, avg 875 ms)
 74359 ms: dealii::VectorTools::internal::project_matrix_free_component<$> (20 times, avg 3717 ms)
 74346 ms: dealii::VectorTools::internal::project_matrix_free_degree<$> (80 times, avg 929 ms)
 74326 ms: dealii::SelectEvaluator<$>::evaluate (500 times, avg 148 ms)
 74288 ms: dealii::VectorTools::internal::project_matrix_free<$> (320 times, avg 232 ms)
 71992 ms: std::vector<$> (48746 times, avg 1 ms)
 70688 ms: std::_TC<$>::_ConstructibleTuple<$> (19600 times, avg 3 ms)
 70067 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_evaluate<$> (174 times, avg 402 ms)
 70021 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::evaluate (174 times, avg 402 ms)
 69721 ms: boost::signals2::signal<$> (9080 times, avg 7 ms)
 68491 ms: std::tuple<$> (20037 times, avg 3 ms)
 63234 ms: dealii::Triangulation<$>::Signals (2688 times, avg 23 ms)
 61265 ms: boost::signals2::detail::signal_impl<$> (9080 times, avg 6 ms)
 53877 ms: dealii::VectorTools::internal::project_parallel<$> (288 times, avg 187 ms)
 53375 ms: dealii::SelectEvaluator<$>::integrate (464 times, avg 115 ms)
 53012 ms: dealii::FEEvaluation<$>::integrate (425 times, avg 124 ms)
 51761 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_integrate<$> (140 times, avg 369 ms)
 51729 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::integrate (140 times, avg 369 ms)

**** Functions that took longest to compile:
  6057 ms: dealii::Triangulation<3, 3>::DistortedCellList dealii::internal::TriangulationImplementation::Implementation::execute_refinement<3>(dealii::Triangulation<3, 3>&, bool) (/home/darndt/dealii-clang-9/source/grid/tria.cc)
  1625 ms: dealii::FE_Nedelec<3>::initialize_restriction() (/home/darndt/dealii-clang-9/source/fe/fe_nedelec.cc)
  1563 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1506 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1502 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1470 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1411 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1386 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1382 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1374 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1364 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1357 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1353 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1347 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1341 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1340 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1337 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1335 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1324 ms: dealii::GridOut::write_svg(dealii::Triangulation<2, 2> const&, std::ostream&) const (/home/darndt/dealii-clang-9/source/grid/grid_out.cc)
  1308 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1296 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1265 ms: void dealii::internal::TriangulationImplementation::Implementation::create_triangulation<3>(std::vector<dealii::Point<3, double>, std::allocator<dealii::Point<3, double> > > const&, std::vector<dealii::CellData<3>, std::allocator<dealii::CellData<3> > > const&, dealii::SubCellData const&, dealii::Triangulation<3, 3>&) (/home/darndt/dealii-clang-9/source/grid/tria.cc)
  1238 ms: void dealii::internal::TriangulationImplementation::Implementation::delete_children<3>(dealii::Triangulation<3, 3>&, dealii::Triangulation<3, 3>::cell_iterator&, std::vector<unsigned int, std::allocator<unsigned int> >&, std::vector<unsigned int, std::allocator<unsigned int> >&) (/home/darndt/dealii-clang-9/source/grid/tria.cc)
  1123 ms: dealii::PolynomialsAdini<2>::PolynomialsAdini() (/home/darndt/dealii-clang-9/source/base/polynomials_adini.cc)
  1120 ms: umfdl_solve(long, long const*, long const*, double const*, double*, double const*, NumericType*, long, double*, long*, double*) (/home/darndt/dealii-clang-9/bundled/umfpack/UMFPACK/Source/umf_solve.cc)
  1079 ms: umfpack_dl_qsymbolic (/home/darndt/dealii-clang-9/bundled/umfpack/UMFPACK/Source/umfpack_qsymbolic.cc)
  1071 ms: umfpack_zl_qsymbolic (/home/darndt/dealii-clang-9/bundled/umfpack/UMFPACK/Source/umfpack_qsymbolic.cc)
  1035 ms: dealii::Triangulation<3, 3>::prepare_coarsening_and_refinement() (/home/darndt/dealii-clang-9/source/grid/tria.cc)
  1027 ms: dealii::FE_PolyTensor<3, 3>::fill_fe_face_values(dealii::TriaIterator<dealii::CellAccessor<3, 3> > const&, unsigned int, dealii::Quadrature<2> const&, dealii::Mapping<3, 3> const&, dealii::Mapping<3, 3>::InternalDataBase const&, dealii::internal::FEValuesImplementation::MappingRelatedData<3, 3> const&, dealii::FiniteElement<3, 3>::InternalDataBase const&, dealii::internal::FEValuesImplementation::FiniteElementRelatedData<3, 3>&) const (/home/darndt/dealii-clang-9/source/fe/fe_poly_tensor.cc)
  1017 ms: dealii::FE_Nedelec<3>::convert_generalized_support_point_values_to_dof_values(std::vector<dealii::Vector<double>, std::allocator<dealii::Vector<double> > > const&, std::vector<double, std::allocator<double> >&) const (/home/darndt/dealii-clang-9/source/fe/fe_nedelec.cc)

**** Function sets that took longest to compile / optimize:
 96623 ms: void dealii::internal::EvaluatorTensorProduct<$>::apply<$>(dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*) (8072 times, avg 11 ms)
 94780 ms: dealii::internal::FEEvaluationImpl<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1213 times, avg 78 ms)
 65332 ms: dealii::SolutionTransfer<$>::prepare_for_coarsening_and_refinement(std::vector<$> const&) (720 times, avg 90 ms)
 52014 ms: dealii::internal::FEEvaluationImpl<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (943 times, avg 55 ms)
 49559 ms: dealii::SolutionTransfer<$>::interpolate(std::vector<$> const&, std::vector<$>&) const (720 times, avg 68 ms)
 46153 ms: dealii::internal::FEEvaluationImplCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (888 times, avg 51 ms)
 30942 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (980 times, avg 31 ms)
 29820 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (240 times, avg 124 ms)
 29757 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::hp::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (240 times, avg 123 ms)
 27053 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1249 times, avg 21 ms)
 25289 ms: void dealii::MatrixFree<$>::initialize_indices<$>(std::vector<$> const&, std::vector<$> const&, dealii::MatrixFree<$>::AdditionalData const&) (54 times, avg 468 ms)
 24493 ms: dealii::internal::FEEvaluationImplCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (651 times, avg 37 ms)
 23132 ms: std::vector<$>::_M_default_append(unsigned long) (1256 times, avg 18 ms)
 21367 ms: dealii::FE_Poly<$>::get_data(dealii::UpdateFlags, dealii::Mapping<$> const&, dealii::Quadrature<$> const&, dealii::internal::FEValuesImplementation::FiniteElementRelatedData<$>&) const (600 times, avg 35 ms)
 19858 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_balance<$>(tbb::interface9::internal::start_for<$>&, tbb::blocked_range<$>&) (440 times, avg 45 ms)
 18960 ms: void dealii::FEValuesViews::internal::do_function_derivatives<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (366 times, avg 51 ms)
 17044 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::Vector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (180 times, avg 94 ms)
 16557 ms: dealii::SolutionTransfer<$>::prepare_for_pure_refinement() (243 times, avg 68 ms)
 15976 ms: void dealii::SolverCG<$>::solve<$>(dealii::MatrixFreeOperators::MassOperator<$> const&, dealii::LinearAlgebra::distributed::Vector<$>&, dealii::LinearAlgebra::distributed::Vector<$> const&, dealii::PreconditionJacobi<$> const&) (432 times, avg 36 ms)
 15493 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (2071 times, avg 7 ms)
 15319 ms: dealii::Functions::FEFieldFunction<$>::vector_value_list(std::vector<$> const&, std::vector<$>&) const (257 times, avg 59 ms)
 15303 ms: dealii::Functions::FEFieldFunction<$>::vector_laplacian_list(std::vector<$> const&, std::vector<$>&) const (249 times, avg 61 ms)
 14984 ms: void dealii::MatrixCreator::internal::mass_assembler<$>(dealii::TriaActiveIterator<$> const&, dealii::MatrixCreator::internal::AssemblerData::Scratch<$>&, dealii::MatrixCreator::internal::AssemblerData::CopyData<$>&) (144 times, avg 104 ms)
 14921 ms: dealii::Functions::FEFieldFunction<$>::vector_gradient_list(std::vector<$> const&, std::vector<$>&) const (281 times, avg 53 ms)
 14696 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<$>, std::pair<$> const&) (561 times, avg 26 ms)
 14141 ms: void dealii::MGTransferMatrixFree<$>::do_prolongate_add<$>(unsigned int, dealii::LinearAlgebra::distributed::Vector<$>&, dealii::LinearAlgebra::distributed::Vector<$> const&) const (216 times, avg 65 ms)
 13738 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::BlockVector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (144 times, avg 95 ms)
 13611 ms: dealii::AlignedVector<$>::reserve(unsigned long) (909 times, avg 14 ms)
 12954 ms: dealii::internal::FEEvaluationImplBasisChange<$>::do_forward(dealii::AlignedVector<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, unsigned int, unsigned int) (973 times, avg 13 ms)
 12939 ms: void dealii::NonMatching::create_coupling_sparsity_pattern<$>(double const&, dealii::GridTools::Cache<$> const&, dealii::GridTools::Cache<$> const&, dealii::DoFHandler<$> const&, dealii::DoFHandler<$> const&, dealii::Quadrature<$> const&, dealii::SparsityPattern&, dealii::AffineConstraints<$> const&, dealii::ComponentMask const&, dealii::ComponentMask const&) (168 times, avg 77 ms)

*** Expensive headers:
872130 ms: /home/darndt/dealii-clang-9/include/deal.II/base/tensor.h (included 652 times, avg 1337 ms), included via:
  data_out_base.cc.json data_out_base.h geometry_info.h point.h  (2875 ms)
  data_out_rotation.cc.json quadrature_lib.h quadrature.h point.h  (2852 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h p4est_wrappers.h geometry_info.h point.h  (2761 ms)
  data_postprocessor.cc.json data_postprocessor.h point.h  (2664 ms)
  fe_bernstein.cc.json polynomials_bernstein.h polynomial.h point.h  (2583 ms)
  fe_q_hierarchical.cc.json fe_dgq.h tensor_product_polynomials.h point.h  (2581 ms)
  ...

787416 ms: /home/darndt/dealii-clang-9/include/deal.II/base/utilities.h (included 670 times, avg 1175 ms), included via:
  grid_refinement.cc.json  (2361 ms)
  data_out_rotation.cc.json quadrature_lib.h quadrature.h point.h tensor.h  (2020 ms)
  data_out_base.cc.json data_out_base.h geometry_info.h point.h tensor.h  (1988 ms)
  fe_dgp.cc.json fe_dgp.h polynomial_space.h point.h tensor.h  (1875 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h p4est_wrappers.h geometry_info.h point.h tensor.h  (1852 ms)
  fe_bernstein.cc.json polynomials_bernstein.h polynomial.h point.h tensor.h  (1850 ms)
  ...

773227 ms: /home/darndt/dealii-clang-9/include/deal.II/grid/tria.h (included 450 times, avg 1718 ms), included via:
  step-49.cc.json  (4841 ms)
  step-3.cc.json  (4440 ms)
  step-49.cc.json  (4271 ms)
  step-2.cc.json  (4234 ms)
  step-1.cc.json  (4033 ms)
  step-4.cc.json  (4001 ms)
  ...

668826 ms: /home/darndt/dealii-clang-9/include/deal.II/base/point.h (included 570 times, avg 1173 ms), included via:
  data_out_rotation.cc.json quadrature_lib.h quadrature.h  (2951 ms)
  data_out_base.cc.json data_out_base.h geometry_info.h  (2900 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h p4est_wrappers.h geometry_info.h  (2784 ms)
  data_postprocessor.cc.json data_postprocessor.h  (2757 ms)
  fe_q_hierarchical.cc.json fe_dgq.h tensor_product_polynomials.h  (2605 ms)
  fe_bernstein.cc.json polynomials_bernstein.h polynomial.h  (2605 ms)
  ...

433844 ms: /home/darndt/dealii-clang-9/include/deal.II/distributed/tria_base.h (included 390 times, avg 1112 ms), included via:
  grid_generator.cc.json fully_distributed_tria.h  (4628 ms)
  cell_weights.cc.json cell_weights.h  (3690 ms)
  cell_weights.cc.json cell_weights.h  (3654 ms)
  grid_generator.cc.json fully_distributed_tria.h  (3557 ms)
  intergrid_map.cc.json shared_tria.h  (3374 ms)
  intergrid_map.cc.json shared_tria.h  (3202 ms)
  ...

368572 ms: /home/darndt/dealii-clang-9/include/deal.II/dofs/dof_handler.h (included 366 times, avg 1007 ms), included via:
  dof_accessor.cc.json dof_accessor.h  (5074 ms)
  dof_accessor_get.cc.json dof_accessor.h  (4834 ms)
  dof_objects.cc.json  (4639 ms)
  dof_accessor_set.cc.json dof_accessor.h  (4059 ms)
  fe_field_function.cc.json dof_accessor.h  (3763 ms)
  mapping_q1_eulerian.cc.json dof_accessor.h  (3669 ms)
  ...

346678 ms: include/deal.II/base/config.h (included 750 times, avg 462 ms), included via:
  mpi_noncontiguous_partitioner.cc.json array_view.h  (888 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h  (816 ms)
  conditional_ostream.cc.json conditional_ostream.h  (774 ms)
  block_vector.cc.json block_vector.templates.h  (769 ms)
  convergence_table.cc.json convergence_table.h  (751 ms)
  fe_dgp.cc.json memory.h  (747 ms)
  ...

338791 ms: /home/darndt/dealii-clang-9/include/deal.II/dofs/dof_accessor.h (included 366 times, avg 925 ms), included via:
  dof_accessor.cc.json  (5752 ms)
  dof_accessor_get.cc.json  (5611 ms)
  dof_accessor_set.cc.json  (4728 ms)
  fe_field_function.cc.json  (4361 ms)
  fe_field_function.cc.json  (4211 ms)
  dof_accessor_get.cc.json  (4187 ms)
  ...

324015 ms: /home/darndt/dealii-clang-9/include/deal.II/base/numbers.h (included 750 times, avg 432 ms), included via:
  mpi_noncontiguous_partitioner.cc.json array_view.h config.h  (849 ms)
  fe_tools_extrapolate.cc.json fe_tools_extrapolate.templates.h config.h  (771 ms)
  conditional_ostream.cc.json conditional_ostream.h config.h  (726 ms)
  block_vector.cc.json block_vector.templates.h config.h  (721 ms)
  convergence_table.cc.json convergence_table.h config.h  (707 ms)
  fe_dgp.cc.json memory.h config.h  (703 ms)
  ...

273949 ms: /home/darndt/dealii-clang-9/bundled/boost-1.70.0/include/boost/iostreams/filter/gzip.hpp (included 672 times, avg 407 ms), included via:
  gzip.cpp.json  (831 ms)
  gzip.cpp.json  (790 ms)
  fe_dgp.cc.json fe_dgp.h polynomial_space.h point.h tensor.h utilities.h  (704 ms)
  solution_transfer.cc tria.h p4est_wrappers.h geometry_info.h point.h tensor.h utilities.h  (702 ms)
  fe_q_hierarchical.cc.json fe_dgq.h tensor_product_polynomials.h point.h tensor.h utilities.h  (683 ms)
  data_out_base.cc.json data_out_base.h geometry_info.h point.h tensor.h utilities.h  (660 ms)
  ...

  done in 20.3s.

@kronbichler
Member

kronbichler commented May 22, 2020

Thanks for posting the numbers. I have one question: In the expensive functions list, I see a lot of things like

  1506 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul... (/home/darndt/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)

Unfortunately, the interesting part of the function name is truncated. Can you easily get hold of the full name?

From the code size, I believe it must be initialize_indices, which is a monster function of more than 1000 lines of code. I think we should aim to split it into a large part that does not need the VectorizedArrayType template parameter and a small part that does. If we manage that, we might be able to merge the three instantiation files back into a single one. I will try to take a look later today.
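A minimal sketch of the kind of split described here, not the actual deal.II code: `ToyVectorizedArray`, `compute_batch_starts` and `initialize_batches` are hypothetical names invented for illustration. The heavy logic takes the lane count as a run-time argument and is compiled once, while the templated wrapper only forwards `size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vectorized array type; only size() matters here.
template <typename Number, std::size_t width>
struct ToyVectorizedArray
{
  static constexpr std::size_t size()
  {
    return width;
  }
};

// Large part: not a template, so it is compiled exactly once.
// It receives the number of lanes as an ordinary run-time argument.
inline std::vector<std::size_t>
compute_batch_starts(const std::size_t n_cells, const std::size_t n_lanes)
{
  assert(n_lanes > 0);
  std::vector<std::size_t> starts;
  for (std::size_t cell = 0; cell < n_cells; cell += n_lanes)
    starts.push_back(cell); // index of the first cell in each batch
  return starts;
}

// Small part: the only code instantiated per vectorized array type.
template <typename VectorizedArrayType>
std::vector<std::size_t>
initialize_batches(const std::size_t n_cells)
{
  return compute_batch_starts(n_cells, VectorizedArrayType::size());
}
```

With this layout, each additional vectorization width only instantiates the thin wrapper, so the cost of the large function body is paid once per translation unit rather than once per instantiation.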

@drwells
Member

drwells commented May 22, 2020

Another view: here are the top 20 most expensive files for a debug build on my desktop:

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:15.12
Maximum resident set size (kbytes): 3163076

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_1.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:59.72
Maximum resident set size (kbytes): 2937160

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/data_out_dof_data.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:47.14
Maximum resident set size (kbytes): 3259900

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/data_out_dof_data_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:43.85
Maximum resident set size (kbytes): 3218044

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/numerics/unity_1.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:24.23
Maximum resident set size (kbytes): 3079868

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/mapping_fe_field.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:24.12
Maximum resident set size (kbytes): 3134772

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/mapping_fe_field_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:21.37
Maximum resident set size (kbytes): 3015492

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_qp.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:19.33
Maximum resident set size (kbytes): 2966276

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst3.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:18.71
Maximum resident set size (kbytes): 2873160

/home/drwells/Documents/Code/CPP/dealii-dev/source/grid/grid_tools_2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:18.64
Maximum resident set size (kbytes): 2620100

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_tools_extrapolate.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:18.32
Maximum resident set size (kbytes): 2494528

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_0.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:15.11
Maximum resident set size (kbytes): 2256016

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_qpmf.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:14.94
Maximum resident set size (kbytes): 2866892

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_interpolate.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:14.54
Maximum resident set size (kbytes): 2674776

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_inst3.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:12.24
Maximum resident set size (kbytes): 2587724

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/grid/unity_0.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:11.07
Maximum resident set size (kbytes): 2516008

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst6.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10.82
Maximum resident set size (kbytes): 2894856

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/distributed/unity_0.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10.10
Maximum resident set size (kbytes): 2345164

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/data_out_dof_data_codim.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:08.46
Maximum resident set size (kbytes): 2728508

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:06.42
Maximum resident set size (kbytes): 2628516

and release:

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst3.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:25.49
Maximum resident set size (kbytes): 3816972

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:23.85
Maximum resident set size (kbytes): 4701188

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_interpolate.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:22.80
Maximum resident set size (kbytes): 4757416

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst6.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:20.13
Maximum resident set size (kbytes): 3917800

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_qp.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:45.22
Maximum resident set size (kbytes): 3979292

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_inst3.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:42.85
Maximum resident set size (kbytes): 3889976

/home/drwells/Documents/Code/CPP/dealii-dev/source/non_matching/coupling.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:42.64
Maximum resident set size (kbytes): 3562716

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/mapping_fe_field.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:34.28
Maximum resident set size (kbytes): 4104528

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/mapping_fe_field_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:27.83
Maximum resident set size (kbytes): 4049600

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_qpmf.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:25.05
Maximum resident set size (kbytes): 3809584

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_1.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:18.95
Maximum resident set size (kbytes): 4212396

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/data_out_dof_data.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:12.12
Maximum resident set size (kbytes): 4314284

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/data_out_dof_data_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:09.43
Maximum resident set size (kbytes): 4306452

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:05.06
Maximum resident set size (kbytes): 3471288

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/numerics/unity_1.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:03.87
Maximum resident set size (kbytes): 4191984

/home/drwells/Documents/Code/CPP/dealii-dev/build-profile/source/lac/unity_0.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:03.15
Maximum resident set size (kbytes): 3080828

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_values_inst5.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:02.03
Maximum resident set size (kbytes): 3375432

/home/drwells/Documents/Code/CPP/dealii-dev/source/numerics/vector_tools_project_inst2.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:59.57
Maximum resident set size (kbytes): 3698100

/home/drwells/Documents/Code/CPP/dealii-dev/source/fe/fe_tools_extrapolate.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:58.64
Maximum resident set size (kbytes): 3600168

/home/drwells/Documents/Code/CPP/dealii-dev/source/dofs/dof_tools_sparsity.cc
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:43.64
Maximum resident set size (kbytes): 3013716

@masterleinad
Member

@kronbichler I updated the output to not truncate names. It's initialize_indices indeed.

@peterrum
Member

> I think we should aim to split it up into a large part that does not need the VectorizedArrayType template and a small part that needs it.

Mainly VectorizedArrayType::size() is used. Can we replace internal::MatrixFreeFunctions::ShapeInfo<VectorizedArrayType> by internal::MatrixFreeFunctions::ShapeInfo<Number> in that function?
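To illustrate why keying the shape data on the scalar type could help, here is a hedged sketch with toy types. `ToyVectorizedArray`, `ToyShapeInfo`, `ShapeInfoFor` and `n_shape_values` are assumptions made up for this example and are not the deal.II classes. All vectorization widths of one scalar type map to the same shape-info type, so code that only touches the shape data is instantiated once per `Number`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a vectorized array type.
template <typename Number, std::size_t width>
struct ToyVectorizedArray
{
  using value_type = Number;
  static constexpr std::size_t size()
  {
    return width;
  }
};

// Toy shape data templated on the scalar type only.
template <typename Number>
struct ToyShapeInfo
{
  std::vector<Number> shape_values;
};

// Maps any vectorized array type to the shape info of its scalar type.
template <typename VectorizedArrayType>
using ShapeInfoFor = ToyShapeInfo<typename VectorizedArrayType::value_type>;

// Widths 1 and 4 of float share one and the same shape-info type.
static_assert(std::is_same<ShapeInfoFor<ToyVectorizedArray<float, 1>>,
                           ShapeInfoFor<ToyVectorizedArray<float, 4>>>::value,
              "shape info must not depend on the vectorization width");

// Code that only needs the shape data is instantiated once per Number.
template <typename Number>
std::size_t
n_shape_values(const ToyShapeInfo<Number> &info)
{
  return info.shape_values.size();
}
```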

@kronbichler
Member

We can, but the function needs to be split up anyway because some of the face information also has size() as a template argument. I already have an idea. If I get stuck, I'll gladly take your help, @peterrum.

@masterleinad
Member

This is with a unity build:

Analyzing build trace from 'analyze_unity.txt'...
**** Time summary:
Compilation (923 times):
  Parsing (frontend):         3081.7 s
  Codegen & opts (backend):   6769.3 s

**** Files that took longest to parse (compiler frontend):
 21716 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qpmf.cc.json
 21268 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qp.cc.json
 21212 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_codim.cc.json
 20206 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_codim.cc.json
 19131 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 18709 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_selector.cc.json
 17848 ms: /source/grid/CMakeFiles/obj_grid_debug.dir/grid_tools_dof_handlers.cc.json
 17820 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qpmf.cc.json
 17142 ms: /source/grid/CMakeFiles/obj_grid_release.dir/grid_tools_2.cc.json
 16391 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_inst3.cc.json

**** Files that took longest to codegen (compiler backend):
159330 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_interpolate.cc.json
120694 ms: /source/dofs/CMakeFiles/obj_dofs_release.dir/dof_tools_sparsity.cc.json
114029 ms: /source/non_matching/CMakeFiles/obj_non_matching_release.dir/coupling.cc.json
 89681 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst3.cc.json
 86451 ms: /source/lac/CMakeFiles/obj_lac_release.dir/unity_0.cc.json
 80872 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/fe_field_function.cc.json
 74284 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst2.cc.json
 73439 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_tools_extrapolate.cc.json
 71191 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/evaluation_selector.cc.json
 69391 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_tools_interpolate.cc.json

**** Templates that took longest to instantiate:
 15897 ms: dealii::SolverCG<dealii::Vector<double> >::SolverCG (76 times, avg 209 ms)
 15644 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 1> (4 times, avg 3911 ms)
 14856 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 2> (4 times, avg 3714 ms)
 14843 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<double>, 1> (4 times, avg 3710 ms)
 14837 ms: dealii::VectorTools::internal::project_matrix_free_component<1, double, 1> (4 times, avg 3709 ms)
 14662 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<double>, 2> (4 times, avg 3665 ms)
 14657 ms: dealii::VectorTools::internal::project_matrix_free_component<2, double, 2> (4 times, avg 3664 ms)
 13879 ms: dealii::Triangulation<1, 1> (294 times, avg 47 ms)
 13647 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 1> (4 times, avg 3411 ms)
 13141 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<float>, 1> (4 times, avg 3285 ms)
 13136 ms: dealii::VectorTools::internal::project_matrix_free_component<1, float, 1> (4 times, avg 3284 ms)
 13053 ms: dealii::Triangulation<3, 3> (294 times, avg 44 ms)
 12836 ms: dealii::Triangulation<2, 2> (294 times, avg 43 ms)
 12804 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 2> (4 times, avg 3201 ms)
 12738 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<float>, 2> (4 times, avg 3184 ms)
 12735 ms: dealii::VectorTools::internal::project_matrix_free_component<2, float, 2> (4 times, avg 3183 ms)
 12681 ms: dealii::SolverBase<dealii::Vector<double> >::SolverBase (96 times, avg 132 ms)
 12084 ms: dealii::hp::DoFHandler<2, 3> (280 times, avg 43 ms)
 10689 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 1, double, 1> (4 times, avg 2672 ms)
 10514 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_evaluate<1, 1, dealii::VectorizedArray<double, 2> > (16 times, avg 657 ms)
 10510 ms: dealii::internal::EvaluationSelectorImplementation::Factory<1, 1, dealii::VectorizedArray<double, 2>, 0, 0, 0, void>::evaluate (16 times, avg 656 ms)
 10185 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 2, double, 2> (4 times, avg 2546 ms)
  9676 ms: std::_Function_base::_Base_manager<(lambda at /home/6da/dealii-clang-9/include/deal.II/base/thread_management.h:1742:7)>::_M_init_functor (532 times, avg 18 ms)
  9376 ms: dealii::FEEvaluation<2, -1, 1, 1, double, dealii::VectorizedArray<double, 2> >::integrate (12 times, avg 781 ms)
  9126 ms: dealii::VectorTools::internal::project_matrix_free_degree<1, 1, float, 1> (4 times, avg 2281 ms)
  9006 ms: dealii::internal::EvaluationSelectorImplementation::Factory<1, 1, dealii::VectorizedArray<double, 2>, 0, 1, 0, void>::evaluate (16 times, avg 562 ms)
  9000 ms: dealii::FEEvaluation<2, -1, 1, 1, float, dealii::VectorizedArray<float, 4> >::integrate (12 times, avg 750 ms)
  8886 ms: dealii::Triangulation<2, 3> (294 times, avg 30 ms)
  8793 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 3> (2 times, avg 4396 ms)
  8745 ms: dealii::Triangulation<1, 3> (294 times, avg 29 ms)

**** Template sets that took longest to instantiate:
212275 ms: std::__and_<$> (145578 times, avg 1 ms)
190602 ms: std::unique_ptr<$> (14596 times, avg 13 ms)
177073 ms: std::__uniq_ptr_impl<$> (14596 times, avg 12 ms)
117799 ms: std::is_constructible<$> (93372 times, avg 1 ms)
115306 ms: std::__is_constructible_impl<$> (92948 times, avg 1 ms)
114108 ms: std::__is_direct_constructible<$> (92593 times, avg 1 ms)
112142 ms: std::__is_direct_constructible_new<$> (91603 times, avg 1 ms)
 96579 ms: std::__is_direct_constructible_new_safe<$> (76153 times, avg 1 ms)
 90263 ms: dealii::internal::EvaluatorTensorProduct<$>::apply<$> (31498 times, avg 2 ms)
 73890 ms: dealii::VectorTools::internal::project<$> (100 times, avg 738 ms)
 72112 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<$> (77 times, avg 936 ms)
 72042 ms: dealii::VectorTools::internal::project_matrix_free_component<$> (20 times, avg 3602 ms)
 72030 ms: dealii::VectorTools::internal::project_matrix_free_degree<$> (80 times, avg 900 ms)
 71973 ms: dealii::VectorTools::internal::project_matrix_free<$> (320 times, avg 224 ms)
 71138 ms: dealii::SelectEvaluator<$>::evaluate (500 times, avg 142 ms)
 66791 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_evaluate<$> (174 times, avg 383 ms)
 66746 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::evaluate (174 times, avg 383 ms)
 65999 ms: dealii::Triangulation<$> (1764 times, avg 37 ms)
 61163 ms: std::is_destructible<$> (57952 times, avg 1 ms)
 52997 ms: std::_TC<$>::_ConstructibleTuple<$> (15849 times, avg 3 ms)
 52965 ms: dealii::VectorTools::internal::project_parallel<$> (288 times, avg 183 ms)
 52344 ms: std::tuple<$> (16303 times, avg 3 ms)
 52079 ms: dealii::SelectEvaluator<$>::integrate (464 times, avg 112 ms)
 51771 ms: dealii::FEEvaluation<$>::integrate (425 times, avg 121 ms)
 50216 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_integrate<$> (140 times, avg 358 ms)
 50185 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::integrate (140 times, avg 358 ms)
 49722 ms: std::vector<$> (36151 times, avg 1 ms)
 49356 ms: dealii::internal::EvaluatorTensorProduct<$>::values<$> (18176 times, avg 2 ms)
 46162 ms: dealii::FEEvaluation<$>::evaluate (430 times, avg 107 ms)
 44119 ms: boost::signals2::signal<$> (6152 times, avg 7 ms)

**** Functions that took longest to compile:
  6706 ms: dealii::Triangulation<3, 3>::DistortedCellList dealii::internal::TriangulationImplementation::Implementation::execute_refinement<3>(dealii::Triangulation<3, 3>&, bool) (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1483 ms: dealii::FE_Nedelec<3>::initialize_restriction() (source/fe/unity_1.cc)
  1334 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1315 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1305 ms: dealii::GridOut::write_svg(dealii::Triangulation<2, 2> const&, std::ostream&) const (/home/6da/dealii-clang-9/source/grid/grid_out.cc)
  1298 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1294 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::LinearAlgebra::distributed::BlockVector<float> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1290 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1285 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1285 ms: void dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1284 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1273 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1270 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1266 ms: void dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1266 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1264 ms: void dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<1, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1260 ms: void dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1253 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1252 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::Vector<float> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1249 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<float>(std::vector<dealii::AffineConstraints<float> const*, std::allocator<dealii::AffineConstraints<float> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1247 ms: void dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<3, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst2.cc)
  1247 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 4ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free.cc)
  1239 ms: void dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::initialize_indices<double>(std::vector<dealii::AffineConstraints<double> const*, std::allocator<dealii::AffineConstraints<double> const*> > const&, std::vector<dealii::IndexSet, std::allocator<dealii::IndexSet> > const&, dealii::MatrixFree<2, float, dealii::VectorizedArray<float, 1ul> >::AdditionalData const&) (/home/6da/dealii-clang-9/source/matrix_free/matrix_free_inst3.cc)
  1221 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1168 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::BlockVector<float> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1163 ms: void dealii::internal::TriangulationImplementation::Implementation::create_triangulation<3>(std::vector<dealii::Point<3, double>, std::allocator<dealii::Point<3, double> > > const&, std::vector<dealii::CellData<3>, std::allocator<dealii::CellData<3> > > const&, dealii::SubCellData const&, dealii::Triangulation<3, 3>&) (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1160 ms: void dealii::internal::TriangulationImplementation::Implementation::delete_children<3>(dealii::Triangulation<3, 3>&, dealii::Triangulation<3, 3>::cell_iterator&, std::vector<unsigned int, std::allocator<unsigned int> >&, std::vector<unsigned int, std::allocator<unsigned int> >&) (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1122 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::LinearAlgebra::distributed::BlockVector<double> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1078 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::LinearAlgebra::distributed::Vector<float, dealii::MemorySpace::Host> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)
  1063 ms: dealii::TimeStepping::EmbeddedExplicitRungeKutta<dealii::BlockVector<double> >::initialize(dealii::TimeStepping::runge_kutta_method) (source/base/unity_2.cc)

**** Function sets that took longest to compile / optimize:
 92225 ms: dealii::internal::FEEvaluationImpl<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1131 times, avg 81 ms)
 89058 ms: void dealii::internal::EvaluatorTensorProduct<$>::apply<$>(dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*) (6345 times, avg 14 ms)
 65149 ms: dealii::SolutionTransfer<$>::prepare_for_coarsening_and_refinement(std::vector<$> const&) (720 times, avg 90 ms)
 49994 ms: dealii::internal::FEEvaluationImpl<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (861 times, avg 58 ms)
 48898 ms: dealii::SolutionTransfer<$>::interpolate(std::vector<$> const&, std::vector<$>&) const (720 times, avg 67 ms)
 43853 ms: dealii::internal::FEEvaluationImplCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (832 times, avg 52 ms)
 29567 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (980 times, avg 30 ms)
 27128 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::hp::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (240 times, avg 113 ms)
 27114 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (240 times, avg 112 ms)
 25757 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1249 times, avg 20 ms)
 23273 ms: dealii::internal::FEEvaluationImplCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (579 times, avg 40 ms)
 23205 ms: void dealii::MatrixFree<$>::initialize_indices<$>(std::vector<$> const&, std::vector<$> const&, dealii::MatrixFree<$>::AdditionalData const&) (54 times, avg 429 ms)
 18798 ms: std::vector<$>::_M_default_append(unsigned long) (1044 times, avg 18 ms)
 17178 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::Vector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (180 times, avg 95 ms)
 17158 ms: dealii::SolutionTransfer<$>::prepare_for_pure_refinement() (241 times, avg 71 ms)
 17046 ms: void dealii::FEValuesViews::internal::do_function_derivatives<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (297 times, avg 57 ms)
 16878 ms: dealii::FE_Poly<$>::get_data(dealii::UpdateFlags, dealii::Mapping<$> const&, dealii::Quadrature<$> const&, dealii::internal::FEValuesImplementation::FiniteElementRelatedData<$>&) const (501 times, avg 33 ms)
 15246 ms: void dealii::SolverCG<$>::solve<$>(dealii::MatrixFreeOperators::MassOperator<$> const&, dealii::LinearAlgebra::distributed::Vector<$>&, dealii::LinearAlgebra::distributed::Vector<$> const&, dealii::PreconditionJacobi<$> const&) (432 times, avg 35 ms)
 14996 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (2071 times, avg 7 ms)
 14665 ms: dealii::Functions::FEFieldFunction<$>::vector_value_list(std::vector<$> const&, std::vector<$>&) const (242 times, avg 60 ms)
 14623 ms: dealii::Functions::FEFieldFunction<$>::vector_laplacian_list(std::vector<$> const&, std::vector<$>&) const (241 times, avg 60 ms)
 14206 ms: dealii::Functions::FEFieldFunction<$>::vector_gradient_list(std::vector<$> const&, std::vector<$>&) const (250 times, avg 56 ms)
 14118 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::BlockVector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (144 times, avg 98 ms)
 13927 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<$>, std::pair<$> const&) (559 times, avg 24 ms)
 13727 ms: void dealii::MatrixCreator::internal::mass_assembler<$>(dealii::TriaActiveIterator<$> const&, dealii::MatrixCreator::internal::AssemblerData::Scratch<$>&, dealii::MatrixCreator::internal::AssemblerData::CopyData<$>&) (144 times, avg 95 ms)
 13633 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_balance<$>(tbb::interface9::internal::start_for<$>&, tbb::blocked_range<$>&) (318 times, avg 42 ms)
 12934 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::LinearAlgebra::Vector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (144 times, avg 89 ms)
 12808 ms: void dealii::MGTransferMatrixFree<$>::do_prolongate_add<$>(unsigned int, dealii::LinearAlgebra::distributed::Vector<$>&, dealii::LinearAlgebra::distributed::Vector<$> const&) const (216 times, avg 59 ms)
 12392 ms: void dealii::NonMatching::create_coupling_sparsity_pattern<$>(double const&, dealii::GridTools::Cache<$> const&, dealii::GridTools::Cache<$> const&, dealii::DoFHandler<$> const&, dealii::DoFHandler<$> const&, dealii::Quadrature<$> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<$> const&, dealii::ComponentMask const&, dealii::ComponentMask const&) (168 times, avg 73 ms)
 12378 ms: void dealii::FEEvaluationBase<$>::read_write_operation<$>(dealii::internal::VectorReader<$> const&, dealii::LinearAlgebra::distributed::Vector<$>**, std::bitset<$> const&, bool) const (135 times, avg 91 ms)

*** Expensive headers:
450383 ms: /home/6da/dealii-clang-9/include/deal.II/grid/tria.h (included 298 times, avg 1511 ms), included via:
  step-49.cc.json  (4010 ms)
  step-3.cc.json  (3973 ms)
  step-4.cc.json  (3918 ms)
  step-1.cc.json  (3876 ms)
  step-2.cc.json  (3865 ms)
  step-53.cc.json  (3847 ms)
  ...

385852 ms: /home/6da/dealii-clang-9/include/deal.II/base/utilities.h (included 344 times, avg 1121 ms), included via:
  grid_refinement.cc  (2160 ms)
  step-26.cc.json  (2136 ms)
  data_out_base.cc.json data_out_base.h geometry_info.h point.h tensor.h  (1967 ms)
  mapping_q1_eulerian.cc.json dof_accessor.h dof_handler.h function.h point.h tensor.h  (1958 ms)
  step-48.cc.json  (1892 ms)
  symmetric_tensor.cc.json symmetric_tensor.h tensor.h  (1831 ms)
  ...

329077 ms: /home/6da/dealii-clang-9/include/deal.II/base/point.h (included 322 times, avg 1021 ms), included via:
  data_out_base.cc.json data_out_base.h geometry_info.h  (2767 ms)
  mapping_q1_eulerian.cc.json dof_accessor.h dof_handler.h function.h  (2591 ms)
  mapping_q_eulerian.cc.json quadrature_lib.h quadrature.h  (2325 ms)
  step-61.cc.json quadrature.h  (2311 ms)
  scalar_polynomials_base.cc quadrature_lib.h quadrature.h  (2216 ms)
  step-59.cc.json quadrature_lib.h quadrature.h  (2194 ms)
  ...

238927 ms: /home/6da/dealii-clang-9/include/deal.II/dofs/dof_handler.h (included 280 times, avg 853 ms), included via:
  mapping_q1_eulerian.cc.json dof_accessor.h  (5237 ms)
  dof_accessor.cc.json dof_accessor.h  (3588 ms)
  mapping_q1_eulerian.cc.json dof_accessor.h  (3539 ms)
  dof_accessor_set.cc.json dof_accessor.h  (3520 ms)
  dof_accessor_get.cc.json dof_accessor.h  (3520 ms)
  dof_accessor_get.cc.json dof_accessor.h  (3490 ms)
  ...

215587 ms: /home/6da/dealii-clang-9/include/deal.II/dofs/dof_accessor.h (included 280 times, avg 769 ms), included via:
  mapping_q1_eulerian.cc.json  (5388 ms)
  dof_accessor.cc.json  (4171 ms)
  dof_accessor_set.cc.json  (4133 ms)
  dof_accessor_get.cc.json  (4104 ms)
  fe_field_function.cc.json  (4060 ms)
  dof_accessor_get.cc.json  (4034 ms)
  ...

166097 ms: /home/6da/dealii-clang-9/include/deal.II/base/quadrature_lib.h (included 262 times, avg 633 ms), included via:
  mapping_q_eulerian.cc.json  (3178 ms)
  step-59.cc.json  (2770 ms)
  scalar_polynomials_base.cc  (2702 ms)
  step-7.cc.json  (2492 ms)
  step-24.cc.json  (2416 ms)
  step-46.cc.json  (2335 ms)
  ...

165257 ms: include/deal.II/base/config.h (included 380 times, avg 434 ms), included via:
  step-61.cc.json quadrature.h  (807 ms)
  symengine_utilities.cc.json  (777 ms)
  mapping_q_eulerian.cc.json quadrature_lib.h  (758 ms)
  particle.cc.json signaling_nan.h  (743 ms)
  grid_generator.cc.json fully_distributed_tria.h  (715 ms)
  step-24.cc.json quadrature_lib.h  (714 ms)
  ...

153974 ms: /home/6da/dealii-clang-9/include/deal.II/base/numbers.h (included 380 times, avg 405 ms), included via:
  step-61.cc.json quadrature.h config.h  (763 ms)
  symengine_utilities.cc.json config.h  (729 ms)
  mapping_q_eulerian.cc.json quadrature_lib.h config.h  (697 ms)
  particle.cc.json signaling_nan.h config.h  (695 ms)
  step-24.cc.json quadrature_lib.h config.h  (669 ms)
  grid_generator.cc.json fully_distributed_tria.h config.h  (667 ms)
  ...

151117 ms: /home/6da/dealii-clang-9/include/deal.II/grid/grid_tools.h (included 98 times, avg 1542 ms), included via:
  particle_handler.cc.json  (5353 ms)
  particle_handler.cc.json  (5146 ms)
  grid_tools_nontemplates.cc.json  (3937 ms)
  grid_tools_nontemplates.cc.json  (3712 ms)
  grid_tools_cache.cc.json  (3250 ms)
  grid_tools_cache.cc.json  (3000 ms)
  ...

148374 ms: /home/6da/dealii-clang-9/include/deal.II/base/quadrature.h (included 270 times, avg 549 ms), included via:
  step-61.cc.json  (3156 ms)
  mapping_q_eulerian.cc.json quadrature_lib.h  (2386 ms)
  scalar_polynomials_base.cc quadrature_lib.h  (2262 ms)
  step-59.cc.json quadrature_lib.h  (2233 ms)
  smoothness_estimator.cc.json  (2185 ms)
  smoothness_estimator.cc.json  (2082 ms)
  ...

  done in 15.8s.

@kronbichler
Member

#10328 must have helped with this issue. I compared the size of the compiled library (which is at least somewhat related to compile times) on current master against the last build of the 9.2 branch that this particular machine did before the release:

-rwxr-xr-x. 1 kronbichler users 1573302472 11. Mai 07:21 libdeal_II.g.so.9.2.0-pre*
-rwxr-xr-x. 1 kronbichler users 1485337104 16. Jun 10:28 libdeal_II.g.so.9.3.0-pre*
-rwxr-xr-x. 1 kronbichler users  242043384 11. Mai 07:22 libdeal_II.so.9.2.0-pre*
-rwxr-xr-x. 1 kronbichler users  230672424 16. Jun 10:28 libdeal_II.so.9.3.0-pre*

#10334 will bring things down by another 1% or so, so we seem to be progressing in the right direction.

@masterleinad
Member

These are my latest results for 9be00d2:

Analyzing build trace from 'clang_analyze.txt'...
**** Time summary:
Compilation (1335 times):
  Parsing (frontend):         5745.9 s
  Codegen & opts (backend):   7799.5 s

**** Files that took longest to parse (compiler frontend):
 23890 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_codim.cc.json
 21733 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qp.cc.json
 21261 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project_qpmf.cc.json
 20619 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_codim.cc.json
 20505 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_debug.dir/evaluation_selector.cc.json
 19949 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qp.cc.json
 19468 ms: /source/grid/CMakeFiles/obj_grid_release.dir/grid_tools_2.cc.json
 19415 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_qpmf.cc.json
 18529 ms: /source/numerics/CMakeFiles/obj_numerics_debug.dir/vector_tools_project.cc.json
 17879 ms: /source/matrix_free/CMakeFiles/obj_matrix_free_release.dir/evaluation_selector.cc.json

**** Files that took longest to codegen (compiler backend):
174028 ms: /source/dofs/CMakeFiles/obj_dofs_release.dir/dof_tools_sparsity.cc.json
169927 ms: /source/non_matching/CMakeFiles/obj_non_matching_release.dir/coupling.cc.json
128658 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_interpolate.cc.json
 90782 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/fe_field_function.cc.json
 84953 ms: /source/numerics/CMakeFiles/obj_numerics_release.dir/vector_tools_project_inst3.cc.json
 81231 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_values_inst3.cc.json
 79888 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_values_inst6.cc.json
 76777 ms: /source/base/CMakeFiles/obj_base_release.dir/symmetric_tensor.cc.json
 72442 ms: /source/fe/CMakeFiles/obj_fe_debug.dir/fe_values_inst3.cc.json
 71579 ms: /source/fe/CMakeFiles/obj_fe_release.dir/fe_tools_extrapolate.cc.json

**** Templates that took longest to instantiate:
 28438 ms: Teuchos::PerformanceMonitorBase<Teuchos::Time>::getNewCounter (792 times, avg 35 ms)
 26341 ms: Teuchos::StringIndexedOrderedValueObjectContainer<Teuchos::ParameterEntry>::setObj (792 times, avg 33 ms)
 17892 ms: std::map<std::__cxx11::basic_string<char>, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex, std::less<std::__cxx11::basic_string<char> >, std::allocator<std::pair<const std::__cxx11::basic_string<char>, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex> > >::operator[] (792 times, avg 22 ms)
 17645 ms: boost::archive::detail::common_iarchive<boost::archive::binary_iarchive>::vload (2868 times, avg 6 ms)
 15368 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 1> (4 times, avg 3842 ms)
 15355 ms: dealii::Triangulation<2, 2> (462 times, avg 33 ms)
 15300 ms: dealii::Triangulation<1, 1> (462 times, avg 33 ms)
 14778 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<double>, 1> (4 times, avg 3694 ms)
 14772 ms: dealii::VectorTools::internal::project_matrix_free_component<1, double, 1> (4 times, avg 3693 ms)
 14657 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 1> (4 times, avg 3664 ms)
 14442 ms: dealii::VectorTools::internal::project<dealii::Vector<double>, 2> (4 times, avg 3610 ms)
 14255 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<double>, 2> (4 times, avg 3563 ms)
 14251 ms: dealii::VectorTools::internal::project_matrix_free_component<2, double, 2> (4 times, avg 3562 ms)
 14162 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<1, dealii::Vector<float>, 1> (4 times, avg 3540 ms)
 14155 ms: dealii::VectorTools::internal::project_matrix_free_component<1, float, 1> (4 times, avg 3538 ms)
 14038 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std::__cxx11::basic_string<char>, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex>, std::_Select1st<std::pair<const std::__cxx11::basic_string<char>, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex> >, std::less<std::__cxx11::basic_string<char> >, std::allocator<std::pair<const std::__cxx11::basic_string<char>, Teuchos::StringIndexedOrderedValueObjectContainerBase::OrdinalIndex> > >::_M_emplace_hint_unique<const std::piecewise_construct_t &, std::tuple<const std::__cxx11::basic_string<char> &>, std::tuple<> > (792 times, avg 17 ms)
 14000 ms: boost::archive::basic_binary_iarchive<boost::archive::binary_iarchive>::load_override (1489 times, avg 9 ms)
 13821 ms: dealii::Triangulation<2, 3> (462 times, avg 29 ms)
 13695 ms: dealii::Triangulation<1, 3> (462 times, avg 29 ms)
 13460 ms: dealii::Triangulation<1, 2> (462 times, avg 29 ms)
 13392 ms: dealii::Triangulation<3, 3> (462 times, avg 28 ms)
 13342 ms: dealii::SolverCG<dealii::Vector<double> >::SolverCG (76 times, avg 175 ms)
 12826 ms: dealii::VectorTools::internal::project<dealii::Vector<float>, 2> (4 times, avg 3206 ms)
 12773 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<2, dealii::Vector<float>, 2> (4 times, avg 3193 ms)
 12769 ms: dealii::VectorTools::internal::project_matrix_free_component<2, float, 2> (4 times, avg 3192 ms)
 12155 ms: boost::variant<boost::shared_ptr<void>, boost::signals2::detail::foreign_void_shared_ptr> (470 times, avg 25 ms)
 11873 ms: Teuchos::basic_FancyOStream<char, std::char_traits<char> >::basic_FancyOStream (792 times, avg 14 ms)
 11524 ms: boost::apply_visitor<boost::signals2::detail::lock_weak_ptr_visitor, const boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr> &> (470 times, avg 24 ms)
 11523 ms: dealii::Triangulation<2, 2>::Signals (462 times, avg 24 ms)
 11363 ms: boost::variant<boost::weak_ptr<boost::signals2::detail::trackable_pointee>, boost::weak_ptr<void>, boost::signals2::detail::foreign_void_weak_ptr>::apply_visitor<const boost::signals2::detail::lock_weak_ptr_visitor> (470 times, avg 24 ms)

**** Template sets that took longest to instantiate:
158932 ms: std::unique_ptr<$> (21075 times, avg 7 ms)
138911 ms: std::__uniq_ptr_impl<$> (21075 times, avg 6 ms)
 87141 ms: dealii::internal::EvaluatorTensorProduct<$>::apply<$> (31738 times, avg 2 ms)
 85025 ms: dealii::Triangulation<$> (2772 times, avg 30 ms)
 82341 ms: std::__and_<$> (104613 times, avg 0 ms)
 75752 ms: std::vector<$> (53478 times, avg 1 ms)
 73604 ms: dealii::VectorTools::internal::project<$> (130 times, avg 566 ms)
 72047 ms: dealii::VectorTools::internal::project_matrix_free_copy_vector<$> (104 times, avg 692 ms)
 71943 ms: dealii::VectorTools::internal::project_matrix_free_component<$> (20 times, avg 3597 ms)
 71931 ms: dealii::VectorTools::internal::project_matrix_free_degree<$> (80 times, avg 899 ms)
 71886 ms: dealii::VectorTools::internal::project_matrix_free<$> (320 times, avg 224 ms)
 70591 ms: dealii::SelectEvaluator<$>::evaluate (508 times, avg 138 ms)
 70065 ms: boost::signals2::signal<$> (9526 times, avg 7 ms)
 65839 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_evaluate<$> (174 times, avg 378 ms)
 65791 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::evaluate (174 times, avg 378 ms)
 65644 ms: std::map<$> (25701 times, avg 2 ms)
 65592 ms: std::tuple<$> (23870 times, avg 2 ms)
 62809 ms: dealii::Triangulation<$>::Signals (2772 times, avg 22 ms)
 61569 ms: boost::signals2::detail::signal_impl<$> (9526 times, avg 6 ms)
 54850 ms: dealii::WorkStream::run<$> (1000 times, avg 54 ms)
 53843 ms: std::_Rb_tree<$>::_M_emplace_hint_unique<$> (4239 times, avg 12 ms)
 51718 ms: dealii::SelectEvaluator<$>::integrate (472 times, avg 109 ms)
 51310 ms: dealii::FEEvaluation<$>::integrate (433 times, avg 118 ms)
 49242 ms: dealii::internal::EvaluationSelectorImplementation::symmetric_selector_integrate<$> (140 times, avg 351 ms)
 49210 ms: dealii::internal::EvaluationSelectorImplementation::Factory<$>::integrate (140 times, avg 351 ms)
 49194 ms: dealii::VectorTools::internal::project_parallel<$> (384 times, avg 128 ms)
 47806 ms: dealii::internal::EvaluatorTensorProduct<$>::values<$> (18320 times, avg 2 ms)
 46707 ms: dealii::FEEvaluation<$>::evaluate (438 times, avg 106 ms)
 44628 ms: boost::signals2::detail::grouped_list<$> (9526 times, avg 4 ms)
 40810 ms: dealii::internal::FEEvaluationImpl<$>::evaluate (5026 times, avg 8 ms)

**** Functions that took longest to compile:
  6070 ms: dealii::Triangulation<3, 3>::DistortedCellList dealii::internal::TriangulationImplementation::Implementation::execute_refinement<3>(dealii::Triangulation<3, 3>&, bool) (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1876 ms: std::array<std::pair<Sacado::Fad::DFad<double>, dealii::Tensor<1, 3, Sacado::Fad::DFad<double> > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Fad::DFad<double> >(dealii::SymmetricTensor<2, 3, Sacado::Fad::DFad<double> >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1801 ms: std::array<std::pair<Sacado::Fad::DFad<Sacado::Fad::DFad<double> >, dealii::Tensor<1, 3, Sacado::Fad::DFad<Sacado::Fad::DFad<double> > > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Fad::DFad<Sacado::Fad::DFad<double> > >(dealii::SymmetricTensor<2, 3, Sacado::Fad::DFad<Sacado::Fad::DFad<double> > >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1686 ms: std::array<std::pair<Sacado::Rad::ADvar<double>, dealii::Tensor<1, 3, Sacado::Rad::ADvar<double> > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Rad::ADvar<double> >(dealii::SymmetricTensor<2, 3, Sacado::Rad::ADvar<double> >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1661 ms: std::array<std::pair<Sacado::Fad::DFad<float>, dealii::Tensor<1, 3, Sacado::Fad::DFad<float> > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Fad::DFad<float> >(dealii::SymmetricTensor<2, 3, Sacado::Fad::DFad<float> >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1612 ms: std::array<std::pair<Sacado::Fad::DFad<Sacado::Fad::DFad<float> >, dealii::Tensor<1, 3, Sacado::Fad::DFad<Sacado::Fad::DFad<float> > > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Fad::DFad<Sacado::Fad::DFad<float> > >(dealii::SymmetricTensor<2, 3, Sacado::Fad::DFad<Sacado::Fad::DFad<float> > >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1572 ms: std::array<std::pair<Sacado::Rad::ADvar<float>, dealii::Tensor<1, 3, Sacado::Rad::ADvar<float> > >, 3> dealii::internal::SymmetricTensorImplementation::jacobi<3, Sacado::Rad::ADvar<float> >(dealii::SymmetricTensor<2, 3, Sacado::Rad::ADvar<float> >) (/home/6da/dealii-clang-9/source/base/symmetric_tensor.cc)
  1412 ms: dealii::FE_Nedelec<3>::initialize_restriction() (/home/6da/dealii-clang-9/source/fe/fe_nedelec.cc)
  1394 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::TrilinosWrappers::BlockSparsityPattern, std::complex<double> >(dealii::DoFHandler<3, 3> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1394 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::TrilinosWrappers::BlockSparsityPattern, double>(dealii::DoFHandler<3, 3> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<double> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1390 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::TrilinosWrappers::BlockSparsityPattern, double>(dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<double> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1380 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::TrilinosWrappers::BlockSparsityPattern, float>(dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1369 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockDynamicSparsityPattern, std::complex<float> >(dealii::DoFHandler<3, 3> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1368 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockDynamicSparsityPattern, float>(dealii::DoFHandler<2, 2> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1357 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::TrilinosWrappers::BlockSparsityPattern, float>(dealii::DoFHandler<3, 3> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1356 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockSparsityPattern, std::complex<float> >(dealii::DoFHandler<3, 3> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1355 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockSparsityPattern, std::complex<double> >(dealii::DoFHandler<3, 3> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1355 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::TrilinosWrappers::BlockSparsityPattern, std::complex<double> >(dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1354 ms: void dealii::internal::TriangulationImplementation::Implementation::create_triangulation<3>(std::vector<dealii::Point<3, double>, std::allocator<dealii::Point<3, double> > > const&, std::vector<dealii::CellData<3>, std::allocator<dealii::CellData<3> > > const&, dealii::SubCellData const&, dealii::Triangulation<3, 3>&) (/home/6da/dealii-clang-9/source/grid/tria.cc)
  1347 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockDynamicSparsityPattern, std::complex<float> >(dealii::DoFHandler<2, 2> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1345 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::TrilinosWrappers::BlockSparsityPattern, std::complex<float> >(dealii::DoFHandler<3, 3> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1337 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockDynamicSparsityPattern, std::complex<double> >(dealii::DoFHandler<2, 2> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1334 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::TrilinosWrappers::BlockSparsityPattern, std::complex<float> >(dealii::DoFHandler<2, 2> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1331 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockDynamicSparsityPattern, std::complex<double> >(dealii::DoFHandler<3, 3> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1323 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockDynamicSparsityPattern, float>(dealii::DoFHandler<3, 3> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1319 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockSparsityPattern, double>(dealii::DoFHandler<2, 2> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<double> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1319 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<3, 3, dealii::BlockSparsityPattern, float>(dealii::DoFHandler<3, 3> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<3, 3>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1311 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockSparsityPattern, float>(dealii::DoFHandler<2, 2> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<float> const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1309 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockSparsityPattern, std::complex<double> >(dealii::DoFHandler<2, 2> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<double> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)
  1306 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<2, 2, dealii::BlockSparsityPattern, std::complex<float> >(dealii::DoFHandler<2, 2> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<std::complex<float> > const&, bool, dealii::Table<2, dealii::DoFTools::Coupling> const&, dealii::Table<2, dealii::DoFTools::Coupling> const&, unsigned int, std::function<bool (dealii::DoFHandler<2, 2>::active_cell_iterator const&, unsigned int)> const&) (/home/6da/dealii-clang-9/source/dofs/dof_tools_sparsity.cc)

**** Function sets that took longest to compile / optimize:
119597 ms: void dealii::FEValuesViews::internal::do_function_derivatives<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (1340 times, avg 89 ms)
 88929 ms: void dealii::internal::EvaluatorTensorProduct<$>::apply<$>(dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*) (6810 times, avg 13 ms)
 87092 ms: dealii::internal::FEEvaluationImpl<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1151 times, avg 75 ms)
 70220 ms: dealii::SolutionTransfer<$>::prepare_for_coarsening_and_refinement(std::vector<$> const&) (828 times, avg 84 ms)
 55016 ms: dealii::SolutionTransfer<$>::interpolate(std::vector<$> const&, std::vector<$>&) const (828 times, avg 66 ms)
 47895 ms: dealii::internal::FEEvaluationImpl<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (879 times, avg 54 ms)
 40733 ms: dealii::internal::FEEvaluationImplCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (858 times, avg 47 ms)
 34379 ms: void dealii::FEValuesViews::internal::do_function_symmetric_gradients<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (283 times, avg 121 ms)
 33198 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (276 times, avg 120 ms)
 32996 ms: void dealii::KellyErrorEstimator<$>::estimate<$>(dealii::Mapping<$> const&, dealii::hp::DoFHandler<$> const&, dealii::Quadrature<$> const&, std::map<$> const&, std::vector<$> const&, std::vector<$>&, dealii::ComponentMask const&, dealii::Function<$> const*, unsigned int, unsigned int, unsigned int, dealii::KellyErrorEstimator<$>::Strategy) (276 times, avg 119 ms)
 32678 ms: void dealii::FEValuesViews::internal::do_function_values<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (754 times, avg 43 ms)
 27871 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (983 times, avg 28 ms)
 24053 ms: dealii::internal::FEEvaluationImplTransformToCollocation<$>::evaluate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$> const*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (1256 times, avg 19 ms)
 23546 ms: std::vector<$>::_M_default_append(unsigned long) (1285 times, avg 18 ms)
 22933 ms: void dealii::FEValuesViews::internal::do_function_divergences<$>(dealii::ArrayView<$> const&, dealii::Table<$> const&, std::vector<$> const&, std::vector<$>&) (704 times, avg 32 ms)
 22159 ms: dealii::internal::FEEvaluationImplCollocation<$>::integrate(dealii::internal::MatrixFreeFunctions::ShapeInfo<$> const&, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, dealii::VectorizedArray<$>*, bool, bool, bool) (603 times, avg 36 ms)
 22070 ms: std::array<$> dealii::internal::SymmetricTensorImplementation::jacobi<$>(dealii::SymmetricTensor<$>) (60 times, avg 367 ms)
 18650 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::Vector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (180 times, avg 103 ms)
 18422 ms: void tbb::interface9::internal::dynamic_grainsize_mode<$>::work_balance<$>(tbb::interface9::internal::start_for<$>&, tbb::blocked_range<$>&) (452 times, avg 40 ms)
 17747 ms: dealii::SolutionTransfer<$>::prepare_for_pure_refinement() (277 times, avg 64 ms)
 17317 ms: Teuchos::RCPNodeTmpl<$>::throw_invalid_obj_exception(std::__cxx11::basic_string<$> const&, void const*, Teuchos::RCPNode const*, void const*) const (469 times, avg 36 ms)
 16763 ms: dealii::Functions::FEFieldFunction<$>::vector_value_list(std::vector<$> const&, std::vector<$>&) const (296 times, avg 56 ms)
 16718 ms: dealii::Functions::FEFieldFunction<$>::vector_laplacian_list(std::vector<$> const&, std::vector<$>&) const (293 times, avg 57 ms)
 16225 ms: dealii::Functions::FEFieldFunction<$>::vector_gradient_list(std::vector<$> const&, std::vector<$>&) const (297 times, avg 54 ms)
 15718 ms: dealii::FE_Poly<$>::get_data(dealii::UpdateFlags, dealii::Mapping<$> const&, dealii::Quadrature<$> const&, dealii::internal::FEValuesImplementation::FiniteElementRelatedData<$>&) const (474 times, avg 33 ms)
 15643 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_iterator<$>, std::pair<$> const&) (654 times, avg 23 ms)
 15198 ms: void dealii::FETools::interpolate<$>(dealii::DoFHandler<$> const&, dealii::BlockVector<$> const&, dealii::DoFHandler<$> const&, dealii::AffineConstraints<$> const&, dealii::AffineConstraints<$>&) (144 times, avg 105 ms)
 14854 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<$>(dealii::DoFHandler<$> const&, dealii::TrilinosWrappers::BlockSparsityPattern&, dealii::AffineConstraints<$> const&, bool, dealii::Table<$> const&, dealii::Table<$> const&, unsigned int, std::function<$> const&) (36 times, avg 412 ms)
 14420 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<$>(dealii::DoFHandler<$> const&, dealii::BlockDynamicSparsityPattern&, dealii::AffineConstraints<$> const&, bool, dealii::Table<$> const&, dealii::Table<$> const&, unsigned int, std::function<$> const&) (36 times, avg 400 ms)
 14383 ms: void dealii::DoFTools::internal::(anonymous namespace)::make_flux_sparsity_pattern<$>(dealii::DoFHandler<$> const&, dealii::BlockSparsityPattern&, dealii::AffineConstraints<$> const&, bool, dealii::Table<$> const&, dealii::Table<$> const&, unsigned int, std::function<$> const&) (36 times, avg 399 ms)

*** Expensive headers:
1709357 ms: include/deal.II/base/config.h (included 792 times, avg 2158 ms), included via:
  grid_tools.cc mpi.h  (3476 ms)
  tensor_function_parser.cc.json mu_parser_internal.h  (3125 ms)
  symengine_number_types.cc.json  (3092 ms)
  immersed_surface_quadrature.cc.json immersed_surface_quadrature.h  (2948 ms)
  event.cc.json event.h  (2939 ms)
  error_estimator.cc.json error_estimator.templates.h  (2842 ms)
  ...

1697670 ms: /home/6da/dealii-clang-9/include/deal.II/base/numbers.h (included 792 times, avg 2143 ms), included via:
  grid_tools.cc mpi.h config.h  (3433 ms)
  tensor_function_parser.cc.json mu_parser_internal.h config.h  (3080 ms)
  symengine_number_types.cc.json config.h  (3049 ms)
  event.cc.json event.h config.h  (2915 ms)
  immersed_surface_quadrature.cc.json immersed_surface_quadrature.h config.h  (2899 ms)
  subscriptor.cc.json logstream.h config.h  (2817 ms)
  ...

767906 ms: /home/6da/dealii-clang-9/include/deal.II/grid/tria.h (included 462 times, avg 1662 ms), included via:
  step-49.cc.json  (5591 ms)
  step-2.cc.json  (5417 ms)
  step-3.cc.json  (5398 ms)
  step-1.cc.json  (5377 ms)
  step-53.cc.json  (5326 ms)
  step-4.cc.json  (5299 ms)
  ...

714911 ms: /home/6da/dealii-clang-9/include/deal.II/base/tensor.h (included 700 times, avg 1021 ms), included via:
  scalar_polynomials_base.cc.json quadrature_lib.h quadrature.h point.h  (2214 ms)
  polynomials_rt_bubbles.cc.json polynomials_rt_bubbles.h point.h  (2038 ms)
  number_cache.cc.json mpi.h array_view.h symmetric_tensor.h  (2035 ms)
  polynomials_raviart_thomas.cc.json polynomials_raviart_thomas.h point.h  (1892 ms)
  coupling.cc.json point.h  (1831 ms)
  standard_tensors.cc.json symmetric_tensor.h  (1754 ms)
  ...

668046 ms: /home/6da/dealii-clang-9/include/deal.II/base/utilities.h (included 714 times, avg 935 ms), included via:
  grid_refinement.cc.json  (3685 ms)
  step-26.cc.json  (3184 ms)
  sparsity_pattern.cc.json  (3167 ms)
  fe_values_extractors.cc.json  (3160 ms)
  sparsity_pattern.cc.json  (3132 ms)
  fe_values_extractors.cc.json  (3106 ms)
  ...

537837 ms: /home/6da/dealii-clang-9/include/deal.II/base/point.h (included 588 times, avg 914 ms), included via:
  grid_tools_nontemplates.cc.json  (3490 ms)
  grid_tools_nontemplates.cc.json  (3459 ms)
  scalar_polynomials_base.cc.json quadrature_lib.h quadrature.h  (2297 ms)
  polynomials_rt_bubbles.cc.json polynomials_rt_bubbles.h  (2059 ms)
  polynomials_raviart_thomas.cc.json polynomials_raviart_thomas.h  (1912 ms)
  coupling.cc.json  (1859 ms)
  ...

412404 ms: /home/6da/dealii-clang-9/include/deal.II/distributed/tria_base.h (included 408 times, avg 1010 ms), included via:
  grid_generator.cc.json fully_distributed_tria.h  (3181 ms)
  intergrid_map.cc.json shared_tria.h  (3165 ms)
  cell_weights.cc.json cell_weights.h  (3163 ms)
  intergrid_map.cc.json shared_tria.h  (3142 ms)
  grid_generator.cc.json fully_distributed_tria.h  (3113 ms)
  cell_weights.cc.json cell_weights.h  (3089 ms)
  ...

371106 ms: /home/6da/dealii-clang-9/include/deal.II/base/quadrature_lib.h (included 378 times, avg 981 ms), included via:
  scalar_polynomials_base.cc.json  (4991 ms)
  fe_q.cc.json  (4097 ms)
  dof_tools.cc.json  (4074 ms)
  data_out_faces.cc.json  (3921 ms)
  step-12b.cc.json  (3759 ms)
  step-11.cc.json  (3692 ms)
  ...

348634 ms: /home/6da/dealii-clang-9/include/deal.II/dofs/dof_handler.h (included 386 times, avg 903 ms), included via:
  dof_handler.cc.json dof_handler.h  (5538 ms)
  dof_handler.cc.json dof_handler.h  (5428 ms)
  fe_field_function.cc.json dof_accessor.h  (3414 ms)
  dof_accessor_get.cc.json dof_accessor.h  (3390 ms)
  dof_objects.cc.json  (3389 ms)
  dof_accessor_set.cc.json dof_accessor.h  (3371 ms)
  ...

306464 ms: /home/6da/dealii-clang-9/include/deal.II/dofs/dof_accessor.h (included 386 times, avg 793 ms), included via:
  fe_field_function.cc.json  (5717 ms)
  dof_accessor_set.cc.json  (5663 ms)
  dof_accessor.cc.json  (5629 ms)
  mapping_q1_eulerian.cc.json  (5606 ms)
  dof_accessor_get.cc.json  (5588 ms)
  fe_field_function.cc.json  (5582 ms)
  ...

  done in 19.2s.
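
To rank or cross-check entries in a report like the one above, the summary lines can be parsed mechanically. The following is a minimal sketch, not part of ClangBuildAnalyzer: the names `ENTRY`, `parse_entry`, and `avg_is_consistent` are hypothetical, and the line format is inferred only from the output pasted here. It also assumes that the reported average is the total divided by the count and truncated to whole milliseconds, which matches the sampled lines but is not confirmed by the tool's documentation.

```python
import re

# Matches summary lines of the form seen in the report, e.g.
#   " 13821 ms: dealii::Triangulation<2, 3> (462 times, avg 29 ms)"
#   "1709357 ms: include/... (included 792 times, avg 2158 ms), included via:"
# The format is inferred from the pasted output, not from a specification.
ENTRY = re.compile(
    r"^\s*(?P<total>\d+) ms: (?P<name>.+) "
    r"\((?:included )?(?P<count>\d+) times, avg (?P<avg>\d+) ms\)"
    r"(?:, included via:)?\s*$"
)


def parse_entry(line):
    """Return a dict for a summary line, or None if the line does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "total_ms": int(m["total"]),
        "count": int(m["count"]),
        "avg_ms": int(m["avg"]),
    }


def avg_is_consistent(entry):
    """Check the assumed relation avg == floor(total / count)."""
    return entry["total_ms"] // entry["count"] == entry["avg_ms"]


if __name__ == "__main__":
    sample = [
        " 13821 ms: dealii::Triangulation<2, 3> (462 times, avg 29 ms)",
        "158932 ms: std::unique_ptr<$> (21075 times, avg 7 ms)",
        "1709357 ms: include/deal.II/base/config.h "
        "(included 792 times, avg 2158 ms), included via:",
    ]
    entries = [e for e in map(parse_entry, sample) if e is not None]
    # Sort by total time, largest first, to see where the build spends most.
    for e in sorted(entries, key=lambda e: e["total_ms"], reverse=True):
        print(e["total_ms"], e["count"], e["name"], avg_is_consistent(e))
```

Lines that do not follow this shape, such as the per-function entries that end in a source path, make `parse_entry` return `None` and are skipped.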

@drwells
Member

drwells commented Mar 25, 2023

This data was useful three years ago but things have changed enough (e.g., no more hp::DoFHandler) that I think we can close this now.

@drwells drwells closed this as completed Mar 25, 2023