Skip to content

Commit

Permalink
Merge pull request #180 from ComputationalRadiationPhysics/release-2.…
Browse files Browse the repository at this point in the history
…4.0crp

Release 2.4.0crp
  • Loading branch information
sbastrakov committed May 28, 2020
2 parents 4fc3c52 + 0126a7f commit dce5c92
Show file tree
Hide file tree
Showing 50 changed files with 3,107 additions and 2,768 deletions.
77 changes: 77 additions & 0 deletions .clang-format
@@ -0,0 +1,77 @@
---
AccessModifierOffset: -4
AlignAfterOpenBracket: AlwaysBreak
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: false
AlignEscapedNewlines: DontAlign
AlignOperands: false
AlignTrailingComments: false
AllowAllParametersOfDeclarationOnNextLine: false
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: true
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: true
AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: false
BinPackParameters: false
BreakBeforeBraces: Custom
BraceWrapping:
AfterClass: true
AfterControlStatement: true
AfterEnum: true
AfterFunction: true
AfterNamespace: true
AfterStruct: true
AfterUnion: true
AfterExternBlock: true
BeforeCatch: true
BeforeElse: true
IndentBraces: false
SplitEmptyFunction: false
SplitEmptyRecord: false
SplitEmptyNamespace: false
BreakBeforeBinaryOperators: All
BreakBeforeTernaryOperators: true
BreakConstructorInitializers: AfterColon
BreakInheritanceList: AfterColon
BreakStringLiterals: true
ColumnLimit: 80
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 8
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
FixNamespaceComments: false
IncludeBlocks: Regroup
IndentCaseLabels: false
IndentPPDirectives: None
IndentWidth: 4
IndentWrappedFunctionNames: false
KeepEmptyLinesAtTheStartOfBlocks: false
Language: Cpp
NamespaceIndentation: All
PointerAlignment: Middle
ReflowComments: true
SortIncludes: true
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterTemplateKeyword: false
SpaceBeforeAssignmentOperators: true
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: Never
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyParentheses: false
SpacesInAngles: false
SpacesInCStyleCastParentheses: false
SpacesInContainerLiterals: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
Standard: Cpp11
UseTab: Never
...
3 changes: 3 additions & 0 deletions .clang-tidy
@@ -0,0 +1,3 @@
---
Checks: '*,-llvm-header-guard,-fuchsia-default-arguments-declarations,-cppcoreguidelines-no-malloc,-cppcoreguidelines-owning-memory,-misc-non-private-member-variables-in-classes'
HeaderFilterRegex: '.*'
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -14,3 +14,5 @@

*~
/nbproject
/.vs
/build
11 changes: 6 additions & 5 deletions .travis.yml
Expand Up @@ -2,7 +2,7 @@ language: cpp

sudo: required

dist: trusty
dist: bionic

compiler:
- gcc
Expand All @@ -14,6 +14,7 @@ env:

script:
- mkdir build_tmp && cd build_tmp
- CXX=g++-5 && CC=gcc-5
- cmake -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR $TRAVIS_BUILD_DIR
- make
- make install
Expand All @@ -28,12 +29,12 @@ before_script:
- sudo apt-get install -f -qq
- sudo dpkg --get-selections | grep hold || { echo "All packages OK."; }
- sudo apt-get install -q -y cmake-data cmake
- sudo apt-get install -qq build-essential
- gcc --version && g++ --version # 4.8
- sudo apt-get install -qq build-essential g++-5
- gcc-5 --version && g++-5 --version # 5.5.0
- apt-cache search nvidia-*
- sudo apt-get install -qq nvidia-common
- sudo apt-get install -qq nvidia-cuda-dev nvidia-cuda-toolkit # 5.5
- sudo apt-get install -qq libboost-dev # 1.54.0
- sudo apt-get install -qq nvidia-cuda-dev nvidia-cuda-toolkit # 9.1.85
- sudo apt-get install -qq libboost-dev # 1.65.1
- sudo find /usr/ -name libcuda*.so

after_script:
Expand Down
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,27 @@
Change Log / Release Log for mallocMC
================================================================

2.4.0crp
--------
**Date:** 2020-05-28

This release removes the Boost dependency and switched to C++11.

### Changes to mallocMC 2.3.1crp

**Features**
- Cleaning, remove Boost dependency & C++11 Migration #169

**Bug fixes**
- Choose the value for the -arch nvcc flag depending on CUDA version #164 #165

**Misc:**
- Travis CI: GCC 5.5.0 + CUDA 9.1.85 #170
- Adding headers to projects and applied clang-tidy #171
- clang-format #172

Thanks to Sergei Bastrakov, Bernhard Manfred Gruber and Axel Huebl for contributing to this release!

2.3.1crp
--------
**Date:** 2019-02-14
Expand Down
87 changes: 28 additions & 59 deletions CMakeLists.txt
@@ -1,7 +1,12 @@
project(mallocMC)
cmake_minimum_required(VERSION 2.8.12.2)
project(mallocMC LANGUAGES CUDA CXX)
cmake_minimum_required(VERSION 3.8)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# helper for libs and packages
set(CMAKE_CUDA_STANDARD 11)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
set(CMAKE_PREFIX_PATH "/usr/lib/x86_64-linux-gnu/"
"$ENV{CUDA_ROOT}" "$ENV{BOOST_ROOT}")

Expand All @@ -14,64 +19,37 @@ set(CMAKE_PREFIX_PATH "/usr/lib/x86_64-linux-gnu/"
################################################################################

if(POLICY CMP0074)
cmake_policy(SET CMP0074 NEW)
cmake_policy(SET CMP0074 NEW)
endif()


###############################################################################
# CUDA
###############################################################################
find_package(CUDA REQUIRED)
set(CUDA_NVCC_FLAGS "-arch=sm_20;-use_fast_math;")
set(CUDA_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR})
include_directories(${CUDA_INCLUDE_DIRS})
cuda_include_directories(${CUDA_INCLUDE_DIRS})
if(NOT DEFINED COMPUTE_CAPABILITY)
set(COMPUTE_CAPABILITY "30")
endif()
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch=sm_${COMPUTE_CAPABILITY} -use_fast_math")

OPTION(CUDA_OUTPUT_INTERMEDIATE_CODE "Output ptx code" OFF)
if(CUDA_OUTPUT_INTERMEDIATE_CODE)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xptxas;-v;--keep")
endif(CUDA_OUTPUT_INTERMEDIATE_CODE)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xptxas -v --keep")
endif()

SET(CUDA_OPTIMIZATION_TYPE "unset" CACHE STRING "CUDA Optimization")
set_property(CACHE CUDA_OPTIMIZATION_TYPE PROPERTY STRINGS "unset;-G0;-O0;-O1;-O2;-O3")
if(NOT ${CUDA_OPTIMIZATION_TYPE} STREQUAL "unset")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};${CUDA_OPTIMIZATION_TYPE}")
if(NOT ${CUDA_OPTIMIZATION_TYPE} STREQUAL "unset")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${CUDA_OPTIMIZATION_TYPE}")
endif()


###############################################################################
# Boost
###############################################################################
find_package(Boost 1.48.0 REQUIRED)
include_directories(SYSTEM ${Boost_INCLUDE_DIRS})
set(LIBS ${LIBS} ${Boost_LIBRARIES})

# nvcc + boost 1.55 work around
if(Boost_VERSION EQUAL 105500)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} \"-DBOOST_NOINLINE=__attribute__((noinline))\" ")
endif(Boost_VERSION EQUAL 105500)


################################################################################
# Warnings
################################################################################
# GNU
if(CMAKE_COMPILER_IS_GNUCXX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wshadow")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-pragmas")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-parameter")
# new warning in gcc 4.8 (flag ignored in previous version)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-local-typedefs")
# ICC
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wshadow -Wno-unknown-pragmas -Wextra -Wno-unused-parameter -Wno-unused-local-typedefs")
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wshadow")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_VARIADIC_TEMPLATES")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_CXX11_VARIADIC_TEMPLATES")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_FENV_H")
# PGI
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wshadow")
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "PGI")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Minform=inform")
endif()
Expand All @@ -87,28 +65,19 @@ INSTALL(
DESTINATION include
PATTERN ".git" EXCLUDE
PATTERN "mallocMC_config.hpp" EXCLUDE
)
)


###############################################################################
# Executables
###############################################################################
file(GLOB_RECURSE headers src/include/**)
add_custom_target(mallocMC SOURCES ${headers}) # create a target with the header files for IDE projects
source_group(TREE ${CMAKE_CURRENT_LIST_DIR}/src/include FILES ${headers})

include_directories(${CMAKE_CURRENT_LIST_DIR}/src/include)
add_executable(mallocMC_Example01 EXCLUDE_FROM_ALL examples/mallocMC_example01.cu examples/mallocMC_example01_config.hpp)
add_executable(mallocMC_Example02 EXCLUDE_FROM_ALL examples/mallocMC_example02.cu)
add_executable(mallocMC_Example03 EXCLUDE_FROM_ALL examples/mallocMC_example03.cu)
add_executable(VerifyHeap EXCLUDE_FROM_ALL tests/verify_heap.cu tests/verify_heap_config.hpp)
add_custom_target(examples DEPENDS mallocMC_Example01 mallocMC_Example02 mallocMC_Example03 VerifyHeap)

cuda_add_executable(mallocMC_Example01
EXCLUDE_FROM_ALL
examples/mallocMC_example01.cu )
cuda_add_executable(mallocMC_Example02
EXCLUDE_FROM_ALL
examples/mallocMC_example02.cu )
cuda_add_executable(mallocMC_Example03
EXCLUDE_FROM_ALL
examples/mallocMC_example03.cu )
cuda_add_executable(VerifyHeap
EXCLUDE_FROM_ALL
tests/verify_heap.cu )

target_link_libraries(mallocMC_Example01 ${LIBS})
target_link_libraries(mallocMC_Example02 ${LIBS})
target_link_libraries(mallocMC_Example03 ${LIBS})
target_link_libraries(VerifyHeap ${LIBS})
15 changes: 15 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,15 @@
# Contributing

## Formatting

Please format your code before before opening pull requests using clang-format and the .clang-format file placed in the repository root.

### Visual Studio and CLion
Suport for clang-format is built-in since Visual Studio 2017 15.7 and CLion 2019.1.
The .clang-format file in the repository will be automatically detected and formatting is done as you type, or triggered when pressing the format hotkey.

### Bash
First install clang-format. Instructions therefore can be found on the web. To format you can run this command in bash:
```
find -iname *.cu -o -iname *.hpp | xargs clang-format-10 -i
```
5 changes: 5 additions & 0 deletions README.md
Expand Up @@ -22,6 +22,11 @@ mallocMC is header-only, but requires a few other C++ libraries to be
available. Our installation notes can be found in [INSTALL.md](INSTALL.md).


Contributing
------------

Rules for contributions are found in [CONTRIBUTING.md](CONTRIBUTING.md).

On the ScatterAlloc Algorithm
-----------------------------

Expand Down
24 changes: 12 additions & 12 deletions Usage.md
Expand Up @@ -19,15 +19,15 @@ Currently, there are the following policy classes available:

|Policy | Policy Classes (implementations) | description |
|------- |----------------------------------| ----------- |
|**CreationPolicy** | Scatter`<conf1,conf2>` | A scattered allocation to tradeoff fragmentation for allocation time, as proposed in [ScatterAlloc](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6339604). `conf1` configures the heap layout, `conf2` determines the hashing parameters|
|**CreationPolicy** | Scatter`<conf1,conf2>` | A scattered allocation to tradeoff fragmentation for allocation time, as proposed in [ScatterAlloc](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6339604). `conf1` configures the heap layout, `conf2` determines the hashing parameters|
| | OldMalloc | device-side malloc/new and free/delete syscalls as implemented on NVidia CUDA graphics cards with compute capability sm_20 and higher |
|**DistributionPolicy** | XMallocSIMD`<conf>` | SIMD optimization for warp-wide allocation on NVIDIA CUDA accelerators, as proposed by [XMalloc](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5577907). `conf` is used to determine the pagesize. If used in combination with *Scatter*, the pagesizes must match |
|**DistributionPolicy** | XMallocSIMD`<conf>` | SIMD optimization for warp-wide allocation on NVIDIA CUDA accelerators, as proposed by [XMalloc](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5577907). `conf` is used to determine the pagesize. If used in combination with *Scatter*, the pagesizes must match |
| | Noop | no workload distribution at all |
|**OOMPolicy** | ReturnNull | pointers will be *NULL*, if the request could not be fulfilled |
|**OOMPolicy** | ReturnNull | pointers will be *nullptr*, if the request could not be fulfilled |
| | ~~BadAllocException~~ | will throw a `std::bad_alloc` exception. The accelerator has to support exceptions |
|**ReservePoolPolicy** | SimpleCudaMalloc | allocate a fixed heap with `CudaMalloc` |
| | CudaSetLimits | call to `CudaSetLimits` to increase the available Heap (e.g. when using *OldMalloc*) |
|**AlignmentPolicy** | Shrink`<conf>` | shrinks the pool so that the starting pointer is well aligned, applies padding to requested memory chunks. `conf` is used to determine the alignment|
|**AlignmentPolicy** | Shrink`<conf>` | shrinks the pool so that the starting pointer is well aligned, applies padding to requested memory chunks. `conf` is used to determine the alignment|
| | Noop | no alignment at all |

The user has to choose one of each policy that will form a useful allocator
Expand All @@ -45,7 +45,7 @@ to the policy class:
```c++
// configure the AlignmentPolicy "Shrink"
struct ShrinkConfig : mallocMC::AlignmentPolicies::Shrink<>::Properties {
typedef boost::mpl::int_<16> dataAlignment;
static constexpr auto dataAlignment = 16;
};
```
Expand All @@ -57,29 +57,29 @@ parameters to create the desired allocator type:
```c++
using namespace mallocMC;
typedef mallocMC::Allocator<
using Allocator1 = mallocMC::Allocator<
CreationPolicy::OldMalloc,
DistributionPolicy::Noop,
OOMPolicy::ReturnNull,
ReservePoolPolicy::CudaSetLimits,
AlignmentPolicy::Noop
> Allocator1;
>;
```

`Allocator1` will resemble the behaviour of classical device-side allocation known
from NVIDIA CUDA since compute capability sm_20. To get a more novel allocator, one
could create the following typedef instead:
could create the following alias instead:

```c++
using namespace mallocMC;

typedef mallocMC::Allocator<
using ScatterAllocator = mallocMC::Allocator<
CreationPolicies::Scatter<>,
DistributionPolicies::XMallocSIMD<>,
OOMPolicies::ReturnNull,
ReservePoolPolicies::SimpleCudaMalloc,
AlignmentPolicies::Shrink<ShrinkConfig>
> ScatterAllocator;
>;
```

Notice, how the policy classes `Scatter` and `XMallocSIMD` are instantiated without
Expand Down Expand Up @@ -122,13 +122,13 @@ A simplistic example would look like this:
namespace mallocMC = MC;
typedef MC::Allocator<
using ScatterAllocator = MC::Allocator<
MC::CreationPolicies::Scatter<>,
MC::DistributionPolicies::XMallocSIMD<>,
MC::OOMPolicies::ReturnNull,
MC::ReservePoolPolicies::SimpleCudaMalloc,
MC::AlignmentPolicies::Shrink<ShrinkConfig>
> ScatterAllocator;
>;
__global__ exampleKernel(ScatterAllocator::AllocatorHandle sah)
{
Expand Down

0 comments on commit dce5c92

Please sign in to comment.