Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 1 of the big GPU merge #45

Merged
merged 61 commits into from
Nov 14, 2016
Merged

Part 1 of the big GPU merge #45

merged 61 commits into from
Nov 14, 2016

Conversation

bcumming
Copy link
Member

@bcumming bcumming commented Oct 27, 2016

This PR is part of the gpu feature merge. The GPU implementation is not implemented here. Instead, we focus on refactoring of the original "multicore" back end so that it is ready for adding the GPU back end.

This is a big and messy change, for which I am sorry.

build System

  • A WITH_CUDA option has been added to the main CMakeLists. This finds the CUDA toolkit, and sets CUDA compiler flags, and will build unit tests for the gpu back end.
  • The CMakeLists that generates mechanisms with modcc has been updated to generate CUDA mechanisms.
  • the library is now named libnestmc instead of libcellalgo
  • merge the external libraries that are optionally linked againts (tbb, libunwind, etc) into a single
    EXTERNAL_LIBRARIES list for ease of linking

modcc

  • the cprinter and cudaprinter have had small changes to generate mechanism files that are compatible with the refactored library.

algorithms

  • the indexes into algorithm was "rangified". An algorithm index_into_iterator takes two ranges as inputs to make a range that lazily generates the index of sub into super set.

backends

  • made a new path src/backends/ for backend specific type and implementation code.
  • currently:
    • complete support for the multicore and gpu backends
    • gpu back end is not optimized or validated
  • the back end implementations are in src/backends
  • a single backend class, nest::mc::{multicore,gpu}::backend, is provides all backend specific type and implementation details fro each backend
    • storage containers
    • Hines matrix assembly for FVM method
    • Hines matrix solver
    • mechanism "factory"

lowered fvm cells

  • removed fvm_cell because this can be modelled with an fvm_multicell with one cell.
  • refactored to use backend type and implementation from fvm_policy
  • use std::vector instead of containers in nest::mc::memory:: where possible when building cells.

memory library

Refactor the "memory" library, making it much simpler and better integrated into the rest of the application. However, it is still far from perfect. The Coordinator approach needs to be improved, most likely by putting target-specific wisdom into pointers (which could obviate the need for a const_array_view type.

  1. renaming and moving

    • move from vector/ to src/memory
    • move into the nest::mc namespace, i.e. all types and functions are now in nest::mc::memory
    • change from camel case nameing scheme to NestMC style naming.
  2. simplification

    • remove the CRTP cruft that was used to make operator overloading work for operations like copying from one range into another, and filling a range with a constant value. These have been replaced with memory::fill() and memory::copy() helper functions. This simplified the code a lot, and makes code clearer in user land.
    // before
    vec(0, 5) = other;
    // now
    memory::copy(other, vec(0, 5));
    
    • add some wrappers in src/memory/wrappers.hpp that help with making views. These are particularly useful for passing std::vector through interfaces that expect a view.

debug backtraces

Added stack traces for debugging.

  • support for OSX and Linux via libunwind
  • backtraces can be generated manually nest::mc::util::backtrace().print()
    • creates a new file and dumps trace into file
    • prints message to stderr with file name and instructions on how to analyse
  • backtraces are also automatically generated when an assertion EXPECTS statement fails
  • a python script in scripts/print_backtrace pretty prints the output with file name, line number and demangled symbols

util simplification and consolidation

The src/util.hpp file was removed

  • much of its contents were dead code and just removed
  • useful components like pprintf and make_unique were moved into the src/utils path in standalone files
    There was a lot of overlap between functionality provided in src/memory/util.hpp and existing functions/types in the nest::mc::util namespace. The memory implementations were removed, and their nest::mc::util counterparts used. There is still some work remaining, namely moving the rest of the src/memory/util.hpp into src/util/...

bcumming and others added 30 commits September 8, 2016 16:10
* add CMake option and target for CUDA
* add some workarounds for CUDA compiler bugs
* created a gpu multicell target type
  * missing CUDA specific kernels
  * basic cell state (voltage, current, etc) is on gpu
* moved the vector library into main source tree
  * have not yet deleted external dependency
* more simplifications of the memory library
* clean up of policy based matrix gpu/multicore implementation
snapshot of code before attempting to merge the multicore and gpu
implementation of the FVM into a single code base.
The gpu version of the miniapp doesn't compile, but fixing this is
better done as part of merging.

* fixed bug in multicore fvm_multicell where solutions wasn't copied
  from rhs to voltage vector after linear system solution
* add check for ion channel CV indexes to unit tests
    - this fails due to CV bug that is in issue 20
* small fixes towards gpu support
halfflat and others added 7 commits October 28, 2016 11:09
* Add documentation of template parameters for `filter_iterator`.
* Document use of `uninitalized<F>` for holding functional objects
  in `filter_iterator` and `transform_iterator`
More range functionality, unit tests.
* Simplify trace analysis and reporting code in
  `trace_analysis.hpp`
* Consolidate convergence test run procedures into
  new class `convergence_test_runner`.
* Make `algorithm::sum`, `algorithm::mean` more generic,
  allowing use with array types.
* Add `div_compartment` compartment representation, that
  holds geometric information for each half of a compartment
  that will then be used in calculating control volumes.
* Add three compartmentalisation schemes/policies that
  discretize a segment into `div_compartment` objects:
    * `div_compartment_by_ends` divides based only on the
      segment end points and radii.
    * `div_compartment_sampler` forms frusta by sampling
      the segment radius at each compartment boundary
    * `div_compartment_integrator` computes the compartment
      areas and volumes exactly by summing all frustra
      in the intersection of the segment and the compartmnet
      span.
…ests

Consolidate validation tests (issue #41)
if(WITH_CUDA)
find_package(CUDA REQUIRED)
# the vector library has a compiled component when using the CUDA backend
include(ExternalProject)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer need ExternalProject here I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

target_link_libraries(miniapp.exe LINK_PUBLIC cellalgo)
target_link_libraries(miniapp.exe LINK_PUBLIC ${TBB_LIBRARIES})
if(WITH_TBB)
target_link_libraries(miniapp.exe ${TBB_LIBRARIES})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a LINK_PUBLIC in here and on line 16 in order to avoid CMP0023 nonsense.

text_.add_line("data_ = vector_type(field_size * num_fields);");
text_.add_line("data_(memory::all) = std::numeric_limits<value_type>::quiet_NaN();");
text_.add_line("data_ = array(field_size * num_fields);");
text_.add_line("memory::fill(data_, std::numeric_limits<value_type>::quiet_NaN());");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth considering an STL-like constructor for array which can take a fill value? c.f. std::vector<T>(size_type count, const T& value = T())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library already has this functionality. I don't know why I have done things this way, probably just sloppiness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it was sloppiness. I have fixed it by using just one line:

text_.add_line("data_ = array(field_size*num_fields, std::numeric_limits<value_type>::quiet_NaN());");

using sub_iterator = typename R2::const_iterator;

mutable super_iterator super_it_;
const super_iterator super_end_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, the super range may be delimited by a sentinel of different type to super_iterator. Rather than being parameterized on the two sequence types R1 and R2, I think it would be cleaner to parameterize it on the sub iterator type, the super iterator type, and the super sentinel type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

  • index_into_iterator is now templated on the 3 iterator types
  • updated unit test to check that arrays work specifically

public std::iterator<std::forward_iterator_tag, typename R1::value_type>
{
public:
using value_type = typename R1::value_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As dereference returns an index (offset into super range), it should be e.g. typename std::iterator_traits<super_iterator>::difference_type or typename util::sequence_traits<R1>::size_type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean

typename util::sequence_traits<R1>::difference_type

instead of size_type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the class is ultimately going to be defined over iterator types rather than range types, then iterator_traits<super_iterator>::difference_type will be the way to go. Analogous to indices for e.g. std::string, an unsigned type might be preferable.

} // namespace memory
} // namespace mc
} // namespace nest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm basically trusting these are just lightly renamed and moved versions of the original Vector code.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they are.

bytes, cudaMemcpyDeviceToHost
);
if(status != cudaSuccess) {
LOG_ERROR("cudaMemcpy(d2h, " + std::to_string(bytes) + ") " + cudaGetErrorString(status));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've now got multiple error, verbosity macros in use under src/; these should be consolidated in one place with consistent usage across the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! Let's put it in the code cleanup bucket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting just to test slack trigger.

@@ -12,7 +12,8 @@ class spike_detector {
public:
using cell_type = Cell;

spike_detector(const cell_type& cell, typename Cell::detector_handle h, double thresh, float t_init) :
spike_detector(
const cell_type& cell, typename Cell::detector_handle h, double thresh, float t_init) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters line up with initializers, without any intervening punctuation. I think it's a confusing compromise. We can nut this out on our coding guidelines bike shed channel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to the following... not perfect either, but at least it is clear to the reader what is going on (IMO)

      spike_detector(
~         const cell_type& cell,
+         typename Cell::detector_handle h,
+         double thresh,
+         float t_init
+     ):
          handle_(h),
          threshold_(thresh)
      {   
          reset(cell, t_init);
      } 

o << value << " ";
}
return o << "}";
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have a pretty printer wrapper for this sort of thing, as the one formatter for all ranges is a bit of a Procrustean bed. Use would be nearly as simple though, e.g.

    std::cout << range_printer(some_range) << ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. That Procrustes sounds like quite an unsavoury character, so I am sorry if my one-size-fits-all helper function caused harm to any travellers crossing our code base.

for(auto val : v) if(val>=0.) return false;
return true;
};
EXPECT_TRUE(is_neg(J.l()(1, J.size())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting of test lambdas. (Consider using std::any_of/std::all_of?)

halfflat and others added 5 commits October 30, 2016 22:31
* Use divided compartments to determine FVM coefficients.
* Pick correct control volume in FVM from sgement position (avoids
  off-by-half error.)
* Add colour override functionality to tsplot: `--colour` option.
* Add const accessor for cell soma.
* Source formatting, comments in `math.hpp`
* Fix `range_view`: was using incorrectly named type trait.
* Add unit test for `range_view`.
* Allow points of discontinuity to be omitted from L-infinity norm
  calculations.
* Add `-d, --min-dt` option to `validate.exe` to control time
  step in validation convergence tests.
* Add validation test: confirm divided compartment policy does
  not effect results on simple frustrum dendrites.
* Change default max compartments on validation tests to 100
  (ad hoc observed convergence limit at dt circa 0.001 ms;
  finder spatial division would required much finer dt.)
* Make NEURON validation data generation scripts use CVODE by
  default, and with `secondorder=2` when non-zero `dt` is given.
* Use only `div_compartment_integrator` for compartmentalization in
  `fvm_multicell`. The policy will later be moved to a backend
  policy class.
* For now, disable validation tests that test different division
  policies (see above).
* Tweak comments and remove redundant `using`, following comments
  on PR#54.
* first version of openmp threading back end

* adding openmp parallel sort implementation

* OpenMP sort working

* Support for units syntax within state block.

* Add soma-less cable cell to test cells.

Also:
* Ensure intrinsic and passive properties properly set on test cells.

* Change bulk resistivity default.

* Align defaults with values used in most of the NEURON
  validation scripts.
* Use consistent 100 Ω·m bulk resistivity across both
  NEURON test models and basic validation cells.

* OpenMP back end working

* Add Extrae+paraver support, needs to fix compilation warnings

* Reorganize validation data generation

* Move generation and data to top-level validation directory.
* Make BUILD_VALIDATION_DATA and VALIDATION_DATA_DIR cache vars.
* Add helper CMake functions for data generation.

Note `validation/ref/numeric/foo.sh` is just a placeholder.

* Bugfix: hh_soma.jl

* Use consistent scaling for y[1] scalar voltage in hh_soma.jl
* Also: add more reserved target names to CMakeLists.txt
  helper function.

* Refactor convergence tests; add numeric soma ref.

* Amend data directory path in validation tests.
* Enmodulate `hh_soma.jl`
* Add HH channel reference data generations script.
* Switch `validate_soma.cpp` to numeric reference data.
* Consolidate common code in `validate_ball_and_stick.cpp`
* Add (nearly) Rallpack1 validation test (see below).
* Gentle failure on absence of reference data in
  `validate_ball_and_stick.cpp`

Can't yet override mechanism default parameter values,
so the cable cell model added to `test_common_cells.hpp`
lets the default stand; validation script will have
to use the default membrane conductance rather than that
given by Rallpack1.

* Add Rallpack1 validation, plus bugfix, clean

* Implement Rallpack1 validation test (with a workaround
  for inability to set membrane conductance).
* Fix bug in L≠1 case in PassiveCable.jl (this may still be
  wrong).
* Fix bug in peak delta computation in trace analysis when
  both traces have no local maxima.
* Gentle failure on missing `numeric_soma.json`
* Allow multiple `-s` selection operations for `tsplot`,
  acting disjunctively.

* Remove errant test file.

* file's cleanup

* Remove tabs

* Use correct routine in numeric_rallpack1.jl x0.3

* Configure-time test for julia

* `math::infinity<>()` wrapper for infinity

* Use name `i_e` for Stim current density

* Use `math::infinity<>()` for infinite value

* Adds unit tests for the STATE block.

* Add "lib" to search prefixes for libtbb

* Fix quoting error in library search.
* Add "lib" to prefixes when system is "Linux".

* Address deprecated use of 'symbol' warning.

Julia 0.5 deprecates use of `symbol` instead of
`Symbol`. This patch just substitutes the
correct call.

* Address deprecated use of 'symbol' warning.

Julia 0.5 deprecates use of `symbol` instead of
`Symbol`. This patch just substitutes the
correct call.

* Addresses PR comments.

* Unit tests for math.hpp

* Tests for `math::pi`, `math::lerp`, `math::area_frustrum`
  and `math::volume_frustrum`
* Fix `math:pi<long double>()`.

* Extend range, view functionality.

* New `filter` view: lazily selects based on predicate.
* Generic `front` and `back` for sequences.
* New rangeutil STL wrappers `stable_sort_by`, `all_of`, `any_of`.
* Consolidate common utility unit testing structures into
  `tests/unit/common.hpp`

* Add `ball_and_squiggle` model; fix `ball_and_taper`.

* Make `test_common_cells.hpp` and `ball_and_taper.py` agree.
* Add `ball_and_squiggle` model that has a tapering undulating
  profile.

* Address PR#46 review comments.

* Add documentation of template parameters for `filter_iterator`.
* Document use of `uninitalized<F>` for holding functional objects
  in `filter_iterator` and `transform_iterator`

* Consolidate validation test code (issue #41)

* Simplify trace analysis and reporting code in
  `trace_analysis.hpp`
* Consolidate convergence test run procedures into
  new class `convergence_test_runner`.

* New compartment info structure for FVM.

* Make `algorithm::sum`, `algorithm::mean` more generic,
  allowing use with array types.
* Add `div_compartment` compartment representation, that
  holds geometric information for each half of a compartment
  that will then be used in calculating control volumes.
* Add three compartmentalisation schemes/policies that
  discretize a segment into `div_compartment` objects:
    * `div_compartment_by_ends` divides based only on the
      segment end points and radii.
    * `div_compartment_sampler` forms frusta by sampling
      the segment radius at each compartment boundary
    * `div_compartment_integrator` computes the compartment
      areas and volumes exactly by summing all frustra
      in the intersection of the segment and the compartmnet
      span.

* Extrae linked at execution time

* cleaning project

* Complex compartments

* Use divided compartments to determine FVM coefficients.
* Pick correct control volume in FVM from sgement position (avoids
  off-by-half error.)
* Add colour override functionality to tsplot: `--colour` option.
* Add const accessor for cell soma.
* Source formatting, comments in `math.hpp`
* Fix `range_view`: was using incorrectly named type trait.
* Add unit test for `range_view`.
* Allow points of discontinuity to be omitted from L-infinity norm
  calculations.
* Add `-d, --min-dt` option to `validate.exe` to control time
  step in validation convergence tests.
* Add validation test: confirm divided compartment policy does
  not effect results on simple frustrum dendrites.
* Change default max compartments on validation tests to 100
  (ad hoc observed convergence limit at dt circa 0.001 ms;
  finder spatial division would required much finer dt.)
* Make NEURON validation data generation scripts use CVODE by
  default, and with `secondorder=2` when non-zero `dt` is given.

* Remove division policy type parameter.

* Use only `div_compartment_integrator` for compartmentalization in
  `fvm_multicell`. The policy will later be moved to a backend
  policy class.
* For now, disable validation tests that test different division
  policies (see above).
* Tweak comments and remove redundant `using`, following comments
  on PR#54.

* Minor twicks and corrections
@bcumming bcumming mentioned this pull request Nov 4, 2016
* streamline CMake
  * list of external libaries to link against CMake is concatenated in
    `EXTERNAL_LIBRARIES` variable
  * UNWIND_STACK is enabled by default if the libunwind package can be found
* stack trace printer
  * optionally prints with color
  * takes the executable to perform address translation as command line argument
* added documentation to c++, in particular some comments in fvm_multicell.hpp to
  explain some steps in more detail.
Copy link
Contributor

@halfflat halfflat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor spleling fixes really, plus I'd like to see the libunwind.h include isolated.
Perhaps for another commit later, but I still believe that the indexed_view class should either be properly generic, or live in memory:: if it's tied to these storage classes.

@@ -80,6 +85,7 @@ if(WITH_CUDA)

add_definitions(-DWITH_GPU)
include_directories(SYSTEM ${CUDA_INCLUDE_DIRS})
set(EXTERNAL_LIBRARIES ${EXTERNAL_LIBRARIES} ${CUDA_LIBRARIES})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach. Might be cleaner to use

list(APPEND EXTERNAL_LIBRARIES ${CUDA_LIBRARIES})

(and likewise above) instead of set.

@@ -18,21 +18,16 @@ else()
add_executable(miniapp.exe ${MINIAPP_SOURCES} ${HEADERS})
endif()

target_link_libraries(miniapp.exe LINK_PUBLIC nestmc)
set(aaa nestmc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this debugging CMake code that snuck in?

@@ -23,27 +54,36 @@ def parse_backtrace(source):
tokens = line.split()
trace.append({'location':tokens[0], 'function':tokens[1]})
else:
print "error: unable to open file ", source
print "error: unable to back trace file ", source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing verb in error text?

@@ -401,6 +400,15 @@ void fvm_multicell<Backend>::initialize(
std::vector<value_type> tmp_cv_areas(ncomp);
std::vector<value_type> tmp_cv_capacitance(ncomp);

// Iterate over the input cells and build the indexes etc that descrbe the
// fused cell group. On completion:
// - group_paranet_index contains the full parent index for the fused cells.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spellling eror

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self paranet is not a word

@@ -51,8 +53,8 @@ std::string demangle(std::string s) {
int status;
char* demangled = abi::__cxa_demangle(s.c_str(), nullptr, nullptr, &status);

// __cxa_demangle only returns a valid string if the identifier in s was
// a mangled c++ symbol (i.e. returns an empty string for normal c symbols)
// the string returned by __cxa_demangle is on valid if it was passed a valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mroe speeling

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like your tone.

@@ -14,28 +9,31 @@ namespace mc {
namespace util {

#ifdef WITH_UNWIND
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the only reason libunwind.h needs to be included here is to get the definition of unw_word_t, which itself is an integral type, could we keep the #ifdefs and such inside unwind.cpp, hidden from the rest of the code, and use e.g.

struct source_location {
    std::string name;
    std::uintptr_t address;
};

which does not depend upon the architecture-specific int size used for unw_word_t?

This has the nice side effect of keeping the global namespace free of the libunwind-specific enums and typedefs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I was thinking about this as I was implementing this. I have assumed that it is a std::uint64_t, which is how it is defined in libunwind, with a static_assert in unwind.cpp, that will break the build if that assumption is every broken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use uintptr_t?

@halfflat halfflat merged commit 1bc18ea into arbor-sim:gpu Nov 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants