GPU make edge scal 3d (GPU hydro PR 7/n) #82

Merged: 94 commits, Oct 30, 2019
5dc93e6
started porting the ppm routines
harpolea Jul 15, 2019
cc4a96f
started combining loops in ppm
harpolea Jul 15, 2019
12c055a
got rid of one temporary array
harpolea Jul 15, 2019
a19b0c9
got rid of dsvl and sedge temporary arrays for ppm 1 in 3d
harpolea Jul 16, 2019
173c40d
saving changes
harpolea Jul 16, 2019
1bf7307
got rid of all temporary arrays for ppm 1 in 2d
harpolea Jul 16, 2019
3e5d428
got rid of all temporary arrays for ppm 1 in 3d
harpolea Jul 16, 2019
e6574b4
got rid of temporary arrays for ppm 2 in x-direction in 2d
harpolea Jul 16, 2019
14d00b0
got rid of all temporary arrays for 2d ppm
harpolea Jul 17, 2019
9f95021
got rid of temporary arrays for ppm 2 in x-direction in 3d
harpolea Jul 17, 2019
599a1dc
got rid of temporary arrays for ppm 2 in y-direction in 3d
harpolea Jul 17, 2019
9b3a8d4
got rid of temporary arrays for ppm 2 in 3d(!)
harpolea Jul 17, 2019
23bf75b
merging diffs
harpolea Jul 18, 2019
768bdf2
offloaded(?) mkutrans 2d
harpolea Jul 18, 2019
68aff4a
Broken 3d array indices
harpolea Jul 18, 2019
96a4eae
it runs again!
harpolea Jul 18, 2019
1db4764
added BC threads stuff from Castro
harpolea Jul 19, 2019
cb09281
undid some changes to get it running again
harpolea Jul 22, 2019
a539689
moved ppm_2d call outside of mkutrans
harpolea Jul 22, 2019
c87793c
mkutrans and ppm_2d now running on the GPU (in 2d)
harpolea Jul 23, 2019
eeea55a
3d compiles and runs again
harpolea Jul 23, 2019
ab8a4c4
removed temporary arrays from mkutrans_3d
harpolea Jul 23, 2019
e60ee8f
switched indices in ppm_3d
harpolea Jul 23, 2019
b6fc1ed
moved ppm_3d outside of mkutrans_3d
harpolea Jul 23, 2019
d0290cb
mkutrans_3d and ppm_3d both on the GPU
harpolea Jul 24, 2019
a6ae36e
removed temporary arrays and combined loops in slope_2d
harpolea Jul 24, 2019
b971a0b
slope_2d operates over lo:hi only
harpolea Jul 24, 2019
28ffce2
slope_2d and ppm_type 0 offloaded to GPU
harpolea Jul 24, 2019
43a5eb7
save changes
harpolea Jul 25, 2019
b7df527
undid broken code so it runs again
harpolea Jul 25, 2019
d163481
combined loops and got rid of temporary arrays in slopez_3d
harpolea Jul 25, 2019
cf078f4
got rid of slope temporary arrays from mkutrans_3d and split off slop…
harpolea Jul 25, 2019
a1dd845
mkutrans_3d and slopez_3d on the GPU
harpolea Jul 25, 2019
9ddbf1c
fixed 2d
harpolea Jul 25, 2019
9db6862
split ppm_2d out of velpred_2d
harpolea Jul 26, 2019
f3be523
split velpred_2d into two functions as they use different stencils
harpolea Jul 26, 2019
ae896f3
velpred_2d offloaded to GPU
harpolea Jul 26, 2019
75d28dc
merge conflicts
harpolea Jul 26, 2019
2031549
merge conflicts
harpolea Jul 26, 2019
8d9b914
merge conflicts
harpolea Jul 26, 2019
27f570d
merge conflicts
harpolea Jul 26, 2019
8dcca27
3d compiles and runs again
harpolea Jul 26, 2019
2a4e09e
got rid of some of the temporary arrays in velpred_3d
harpolea Jul 26, 2019
edbe73d
split velpred into 3 separate subroutines due to differing stencils
harpolea Jul 29, 2019
e6a6f01
offloaded velpred_3d to the GPU
harpolea Jul 29, 2019
65ef2e1
got rid of temporary arrays from make_edge_scal_2d
harpolea Jul 29, 2019
c80aac1
offloaded make_edge_scal_2d to GPU
harpolea Jul 29, 2019
b741e88
merge conflicts
harpolea Jul 29, 2019
b80de32
merge conflicts
harpolea Jul 29, 2019
d78a31b
split make_edge_scal_3d into 4 separate routines and got rid of tempo…
harpolea Jul 30, 2019
ff03f6f
offloaded make_edge_scal_3d to GPU
harpolea Jul 30, 2019
2473996
added register capping
harpolea Jul 30, 2019
f055556
added CUDA launches so offloaded code actually runs on the GPU
harpolea Jul 31, 2019
abc77d9
merge conflicts
harpolea Jul 31, 2019
a1595e9
added in missing g_lo and g_hi
harpolea Jul 31, 2019
14a82f2
fixed a bug in the boundary conditions in make_edge_scal
harpolea Jul 31, 2019
60fb3b6
merge conflicts
harpolea Aug 1, 2019
6a20d7a
saving changes
harpolea Aug 1, 2019
bd3779b
merge conflicts
harpolea Aug 1, 2019
f7b6848
compiles and runs with OMP
harpolea Aug 1, 2019
3dfd63b
compiles and runs with OMP
harpolea Aug 1, 2019
8058147
compiles and runs with OMP
harpolea Aug 1, 2019
729f597
compiles and runs with OMP
harpolea Aug 1, 2019
6e405fd
compiles and runs with OMP
harpolea Aug 1, 2019
af08fb7
velpred_3d compiles and runs with OMP
harpolea Aug 1, 2019
88a740a
make_edge_scal_2d compiles and runs with OMP
harpolea Aug 1, 2019
c10cc0a
make_edge_scal_3d compiles and runs with OMP
harpolea Aug 1, 2019
0ab9df3
removed commented-out launch region
harpolea Aug 5, 2019
63a0f49
merged with development
harpolea Aug 7, 2019
9e51688
merged with development
harpolea Aug 7, 2019
fc5f24b
merged type0 with development
harpolea Aug 7, 2019
9322a6e
fixed bc index in make_edge_scal
harpolea Aug 19, 2019
06ce2f5
merge conflicts
harpolea Aug 19, 2019
294cf47
Merge branch 'development' into gpu_ppm_2d
zingale Aug 25, 2019
230af88
Merge branch 'gpu_ppm_2d' into gpu_ppm_3d
zingale Aug 25, 2019
0f6d596
simplified if statements in ppm.F90
harpolea Aug 26, 2019
37b8641
fixed inflow BCs for ppm type 1
harpolea Aug 26, 2019
d0a2890
Merge branch 'development' of github.com:AMReX-Astro/MAESTROeX into g…
harpolea Aug 26, 2019
917d9dc
Merge branch 'gpu_ppm_3d' into gpu_ppm_type0
harpolea Aug 26, 2019
e16bdac
simplified if statements in slope.F90
harpolea Aug 26, 2019
d6e1479
fixed merge conflicts
harpolea Aug 26, 2019
ec2517f
simplified if statements in velpred.F90
harpolea Aug 26, 2019
af1697d
merge conflicts
harpolea Aug 26, 2019
7a590c8
simplified if statements in velpred.F90
harpolea Aug 26, 2019
b822a96
simplified if statements in make_edge_scal.F90
harpolea Aug 26, 2019
989d729
simplified if statements in make_edge_scal.F90
harpolea Aug 26, 2019
2193ad4
merged with development and fixed conflicts
harpolea Sep 16, 2019
1dec5da
fixed merge conflicts with development
harpolea Sep 23, 2019
0834a6a
got rid of MaestroDt changes
harpolea Sep 24, 2019
394d91f
think I've got rid of the roundoff errors
harpolea Sep 24, 2019
f860a14
fixed merge conflicts
harpolea Oct 8, 2019
3b1f209
Merge branch 'development' into gpu_make_edge_scal_3d
harpolea Oct 28, 2019
cbaf396
Merge branch 'development' into gpu_make_edge_scal_3d
zingale Oct 29, 2019
74facf3
Merge branch 'gpu_make_edge_scal_3d' of github.com:amrex-astro/MAESTR…
zingale Oct 29, 2019
8 changes: 8 additions & 0 deletions Exec/Make.Maestro
@@ -261,6 +261,14 @@ $(objEXETempDir)/AMReX_buildInfo.o: .FORCE

include $(AMREX_HOME)/Tools/GNUMake/Make.rules

ifeq ($(USE_CUDA),TRUE)
ifeq ($(USE_GPU_PRAGMA), TRUE)

include $(TOP)/Exec/Make.cuda_rules

endif
endif

clean::

$(SILENT) $(RM) extern.F90 extern.f90
72 changes: 72 additions & 0 deletions Exec/Make.cuda_rules
@@ -0,0 +1,72 @@

# Specialize rules for files that need register capping for CUDA.
# This corresponds to what we do to AMReX_filcc_mod.F90 in AMReX.

$(objEXETempDir)/MaestroHydro.o: MaestroHydro.cpp
@echo Compiling $(<F)
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
ifeq ($(cap_registers),1)
$(SILENT) $(CCACHE) $(CXX) $(patsubst -maxrregcount=$(CUDA_MAXREGCOUNT), -maxrregcount=128, $(CXXFLAGS)) $(CPPFLAGS) $(includes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/MaestroHydro.o
else
$(SILENT) $(CCACHE) $(CXX) $(CXXFLAGS) $(CPPFLAGS) $(includes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/MaestroHydro.o
endif

$(objEXETempDir)/ppm.o: ppm.F90
@echo Compiling $(<F) ...
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
@if [ ! -d $(srcTempDir) ]; then mkdir -p $(srcTempDir); fi
@$(SHELL) -ec 'cp $< $(srcTempDir)'
$(AMREX_HOME)/Tools/F_scripts/gpu_fortran.py --fortran "$(srcTempDir)/$(<F)"
ifeq ($(cap_registers),1)
$(SILENT) $(F90CACHE) $(F90) $(patsubst $(cap_register_flag)$(CUDA_MAXREGCOUNT), $(cap_register_flag)128, $(F90FLAGS)) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/ppm.o
else
$(SILENT) $(F90CACHE) $(F90) $(F90FLAGS) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/ppm.o
endif

$(objEXETempDir)/slope.o: slope.F90
@echo Compiling $(<F) ...
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
@if [ ! -d $(srcTempDir) ]; then mkdir -p $(srcTempDir); fi
@$(SHELL) -ec 'cp $< $(srcTempDir)'
$(AMREX_HOME)/Tools/F_scripts/gpu_fortran.py --fortran "$(srcTempDir)/$(<F)"
ifeq ($(cap_registers),1)
$(SILENT) $(F90CACHE) $(F90) $(patsubst $(cap_register_flag)$(CUDA_MAXREGCOUNT), $(cap_register_flag)128, $(F90FLAGS)) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/slope.o
else
$(SILENT) $(F90CACHE) $(F90) $(F90FLAGS) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/slope.o
endif

$(objEXETempDir)/velpred.o: velpred.F90
@echo Compiling $(<F) ...
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
@if [ ! -d $(srcTempDir) ]; then mkdir -p $(srcTempDir); fi
@$(SHELL) -ec 'cp $< $(srcTempDir)'
$(AMREX_HOME)/Tools/F_scripts/gpu_fortran.py --fortran "$(srcTempDir)/$(<F)"
ifeq ($(cap_registers),1)
$(SILENT) $(F90CACHE) $(F90) $(patsubst $(cap_register_flag)$(CUDA_MAXREGCOUNT), $(cap_register_flag)128, $(F90FLAGS)) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/velpred.o
else
$(SILENT) $(F90CACHE) $(F90) $(F90FLAGS) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/velpred.o
endif

$(objEXETempDir)/mkutrans.o: mkutrans.F90
@echo Compiling $(<F) ...
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
@if [ ! -d $(srcTempDir) ]; then mkdir -p $(srcTempDir); fi
@$(SHELL) -ec 'cp $< $(srcTempDir)'
$(AMREX_HOME)/Tools/F_scripts/gpu_fortran.py --fortran "$(srcTempDir)/$(<F)"
ifeq ($(cap_registers),1)
$(SILENT) $(F90CACHE) $(F90) $(patsubst $(cap_register_flag)$(CUDA_MAXREGCOUNT), $(cap_register_flag)128, $(F90FLAGS)) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/mkutrans.o
else
$(SILENT) $(F90CACHE) $(F90) $(F90FLAGS) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/mkutrans.o
endif

$(objEXETempDir)/make_edge_scal.o: make_edge_scal.F90
@echo Compiling $(<F) ...
@if [ ! -d $(objEXETempDir) ]; then mkdir -p $(objEXETempDir); fi
@if [ ! -d $(srcTempDir) ]; then mkdir -p $(srcTempDir); fi
@$(SHELL) -ec 'cp $< $(srcTempDir)'
$(AMREX_HOME)/Tools/F_scripts/gpu_fortran.py --fortran "$(srcTempDir)/$(<F)"
ifeq ($(cap_registers),1)
$(SILENT) $(F90CACHE) $(F90) $(patsubst $(cap_register_flag)$(CUDA_MAXREGCOUNT), $(cap_register_flag)128, $(F90FLAGS)) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/make_edge_scal.o
else
$(SILENT) $(F90CACHE) $(F90) $(F90FLAGS) $(FMODULES) -DBL_LANG_FORT -DAMREX_LANG_FORT $(FCPPFLAGS) $(fincludes) -c $(srcTempDir)/$(<F) -o $(objEXETempDir)/make_edge_scal.o
endif
15 changes: 8 additions & 7 deletions Source/Maestro.H
@@ -15,7 +15,6 @@
#include <AMReX_PlotFileUtil.H>

#include <PhysBCFunctMaestro.H>
#include <Maestro_F.H>

#ifdef AMREX_USE_CUDA
// #include <AMReX_CudaAllocators.H>
@@ -26,8 +25,10 @@
/// this will be stored in CUDA managed memory.
#ifdef AMREX_USE_CUDA
typedef amrex::Gpu::ManagedVector<amrex::Real> RealVector;
typedef amrex::Gpu::ManagedVector<int> IntVector;
#else
typedef amrex::Vector< amrex::Real > RealVector;
typedef amrex::Vector< int > IntVector;
#endif

class Maestro
@@ -1246,9 +1247,9 @@ private:
amrex::Vector<amrex::MultiFab> normal;
amrex::Vector<amrex::MultiFab> cell_cc_to_r;

/// Stores domain boundary conditions.
/// stores domain boundary conditions.
/// These must be vectors (rather than arrays) so we can ParmParse them
amrex::Vector<int> phys_bc;
IntVector phys_bc;

/// Boundary condition objects needed for FillPatch routines.
/// This is essentially an array (over components)
@@ -1301,12 +1302,12 @@ private:
int nr_irreg;

// these provide information about the multilevel base state configuration
amrex::Vector<int> numdisjointchunks;
amrex::Vector<int> r_start_coord;
amrex::Vector<int> r_end_coord;
IntVector numdisjointchunks;
IntVector r_start_coord;
IntVector r_end_coord;

/// array of tagged boxes (planar)
amrex::Vector<int> tag_array;
IntVector tag_array;

// diag file array buffers
amrex::Vector<amrex::Real> diagfile1_data;
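
The Maestro.H hunks above replace several `amrex::Vector<int>` members with an `IntVector` alias that resolves to `amrex::Gpu::ManagedVector<int>` when `AMREX_USE_CUDA` is defined. A minimal standalone sketch of that conditional-alias pattern follows. It does not use AMReX: `DEMO_USE_MANAGED` and `DemoManagedAllocator` are hypothetical stand-ins (for `AMREX_USE_CUDA` and a managed-memory allocator respectively), and `count_matching` is an illustrative helper that is not part of this PR.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical placeholder for a managed-memory allocator. It simply
// forwards to std::allocator; a real GPU build would allocate memory
// that is visible to both host and device.
template <class T>
struct DemoManagedAllocator {
    using value_type = T;
    DemoManagedAllocator() = default;
    template <class U>
    DemoManagedAllocator(const DemoManagedAllocator<U>&) {}
    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <class T, class U>
bool operator==(const DemoManagedAllocator<T>&, const DemoManagedAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const DemoManagedAllocator<T>&, const DemoManagedAllocator<U>&) { return false; }

// DEMO_USE_MANAGED is a hypothetical stand-in for AMREX_USE_CUDA.
// The alias picks the container type once, at compile time.
#ifdef DEMO_USE_MANAGED
typedef std::vector<int, DemoManagedAllocator<int> > IntVector;
#else
typedef std::vector<int> IntVector;
#endif

// Code written against the alias compiles unchanged under either branch.
inline int count_matching(const IntVector& bc, int flag) {
    int n = 0;
    for (int v : bc) {
        if (v == flag) ++n;
    }
    return n;
}
```

Because members such as `phys_bc` are declared through the alias, switching between the host and managed containers needs no change at the use sites, only a different build flag.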
1 change: 1 addition & 0 deletions Source/Maestro.cpp
@@ -1,5 +1,6 @@

#include <Maestro.H>
#include <Maestro_F.H>

using namespace amrex;

1 change: 1 addition & 0 deletions Source/MaestroAdvance.cpp
@@ -1,5 +1,6 @@

#include <Maestro.H>
#include <Maestro_F.H>

using namespace amrex;

1 change: 1 addition & 0 deletions Source/MaestroAdvanceAvg.cpp
@@ -1,5 +1,6 @@

#include <Maestro.H>
#include <Maestro_F.H>

using namespace amrex;

1 change: 1 addition & 0 deletions Source/MaestroAdvanceIrreg.cpp
@@ -1,5 +1,6 @@

#include <Maestro.H>
#include <Maestro_F.H>

using namespace amrex;
