Two errors when compiling 19_large_depthwise_conv2d_torch_extension #34

Open
ewrfcas opened this issue Jul 7, 2022 · 4 comments


ewrfcas commented Jul 7, 2022

My environment:
Python 3.8.8
CUDA 11.1
PyTorch 1.7.1, 1.8.1, and 1.9 (all fail)

2 errors detected in the compilation of "forward_fp32.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
            argument types are: (int64_t, int64_t, int64_t, int)

forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 8>, float, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearStripminedThreadMap<cutlass::layout::PitchLinearShape<8, 64>, 128, 1>, 1, 0>, cutlass::conv::threadblock::RegularTileIteratorTransposed<cutlass::MatrixShape<64, 8>, float, cutlass::layout::ColumnMajor, 1, cutlass::conv::threadblock::DefaultMmaCore<ThreadblockShape, WarpShape, cutlass::gemm::GemmShape<1, 1, 1>, float, cutlass::layout::TensorNCHW, 1, float, cutlass::layout::TensorNCHW, 1, ElementDst, LayoutDst, cutlass::arch::OpClassSimt, 2, cutlass::arch::OpMultiplyAdd, true, cutlass::conv::ImplicitGemmMode::GEMM_TN, cutlass::arch::CacheOperation::Global, cutlass::arch::CacheOperation::Global>::TransposedPitchLinearThreadMapVec, 4>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorFpropPrecomp<cutlass::MatrixShape<8, 128>, float, cutlass::layout::TensorNCHW, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<8, 128>, float, cutlass::layout::RowMajor, 0, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 4>, ElementDst, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::MatrixShape<4, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<float, float, 4, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<float, float, 8, cutlass::FloatRoundStyle::round_to_nearest>, __nv_bool>, Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, cutlass::epilogue::warp::SimtPolicy<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>>, cutlass::epilogue::warp::TileIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>::CompactedThreadMap, ElementDst, 4>, cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 17>, false>, ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cutlass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})
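
The first error above (forward_fp32.cu(212)) is an overload ambiguity: the call passes (int64_t, int64_t, int64_t, int), so neither the four-Index constructor nor the four-LongIndex constructor is a better match for every argument. The sketch below reproduces that situation with a hypothetical stand-in type, `Coord4`, and a hypothetical helper, `make_coord`. It assumes Index is `int` and LongIndex is a 64-bit integer, and it is not taken from forward_fp32.cu. Converting all four arguments to the same type removes the ambiguity. Whether this is the right change for the extension itself is not confirmed anywhere in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a coordinate type that has both a
// four-int and a four-int64_t constructor, like the pair in the error.
struct Coord4 {
    using Index = int;
    using LongIndex = std::int64_t;

    Index n, h, w, c;

    Coord4(Index n_, Index h_, Index w_, Index c_)
        : n(n_), h(h_), w(w_), c(c_) {}

    Coord4(LongIndex n_, LongIndex h_, LongIndex w_, LongIndex c_)
        : n(static_cast<Index>(n_)), h(static_cast<Index>(h_)),
          w(static_cast<Index>(w_)), c(static_cast<Index>(c_)) {}
};

// Hypothetical helper: sizes arrive with mixed integer types, as in the
// failing call (int64_t, int64_t, int64_t, int).
Coord4 make_coord(std::int64_t n, std::int64_t h, std::int64_t w, int c) {
    // Coord4 bad(n, h, w, c);  // would not compile: ambiguous overload

    // Check that narrowing to int is safe before casting.
    const std::int64_t int_max = std::numeric_limits<int>::max();
    assert(n <= int_max && h <= int_max && w <= int_max);

    // All four arguments are now int, so the four-Index constructor
    // is an exact match and is selected unambiguously.
    return Coord4(static_cast<int>(n), static_cast<int>(h),
                  static_cast<int>(w), c);
}
```

Casting the last argument up to int64_t instead would select the LongIndex constructor; either choice works as long as all four arguments share one type.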
@sleeplessai

The same error occurs with PyTorch 1.10 on CUDA 11.3/11.0 and cuDNN 8.4.1/8.2.0.
We also got an error from CUTLASS itself:

cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from "__half" to "float" exists


sleeplessai commented Jul 10, 2022

@ewrfcas
We worked around this problem by downgrading Python to 3.7.
With that change, the build finally succeeds.

@ChuanchuanZheng

Could you please share the environment you used for the install, such as the OS version, the gcc version, and whether you compiled with C++14?
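
One way to collect the compiler details requested above is a small program built with the same host compiler and flags as the extension. This is only a sketch: `cxx_standard_name` and `build_report` are hypothetical helper names, and the macros used (`__cplusplus`, `__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__`) report the host compiler only, not nvcc or the OS.

```cpp
#include <string>

// Maps a __cplusplus value to a readable standard name.
std::string cxx_standard_name(long value) {
    if (value >= 202002L) return "C++20 or later";
    if (value >= 201703L) return "C++17";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Builds a short report of the language standard and, when available,
// the GCC-compatible compiler version (clang also defines __GNUC__).
std::string build_report() {
    std::string report = "C++ standard: " + cxx_standard_name(__cplusplus);
#if defined(__GNUC__)
    report += "\nGCC-compatible compiler: " + std::to_string(__GNUC__) + "." +
              std::to_string(__GNUC_MINOR__) + "." +
              std::to_string(__GNUC_PATCHLEVEL__);
#endif
    return report;
}
```

Printing `build_report()` from a `main` compiled with, for example, `-std=c++14` shows which standard the flags actually select.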


twmht commented Sep 27, 2022

@sleeplessai

Python 3.7.1 still does not work for me. Which exact 3.7.x release did you use?
