Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clangd crashes on delete[] operator (cuda project) #1452

Open
IlVirtuoso opened this issue Jan 4, 2023 · 8 comments
Open

clangd crashes on delete[] operator (cuda project) #1452

IlVirtuoso opened this issue Jan 4, 2023 · 8 comments

Comments

@IlVirtuoso
Copy link

IlVirtuoso commented Jan 4, 2023

Please describe the problem.

clangd crashes on this snippet of code

int md5_gpu(const std::vector<std::string> &chunk, int threads, std::string targetMd5)
{
    CheckGpuCondition();
    size_t sum = 0;

    uint32_t *sizes = new uint32_t[chunk.size()];
    for (size_t i = 0; i < chunk.size(); i++)
    {
        sizes[i] = chunk[i].size();
        sum += chunk[i].size();
    }
    uint8_t *data = new uint8_t[sum];
    uint8_t *results =
        new uint8_t[chunk.size() * sizeof(uint32_t) * 4]; // every state vector is 4 elements composed of 4 bytes
    size_t offset = 0;
    for (int i = 0; i < chunk.size(); i++)
    {
        auto str = chunk.at(i).c_str();
        auto size = sizes[i];
        memcpy(data + offset, str, sizeof(uint8_t) * size);
        offset += size;
    }
    data[sum] = '\0';
    int result = md5_gpu(data, sizes, chunk.size(), threads,digesthex(targetMd5));
    delete[] sizes;
    delete[] data;
    delete[] results;
    return result;
}

if and only if the delete[] operators are present (does not happen if i use std::free instead), this snippet come from a cuda project and this is the cpp implementation of the interface that i provide to my main project.

Logs

#0 0x000000000052eb43 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/runner/work/clangd/clangd/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:13
#1 0x000000000052cabc llvm::sys::RunSignalHandlers() /home/runner/work/clangd/clangd/llvm-project/llvm/lib/Support/Signals.cpp:104:18
#2 0x000000000052eeb6 SignalHandler(int) /home/runner/work/clangd/clangd/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
#3 0x00007f739e6bd520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#4 0x0000000000f08a01 setReferenced /home/runner/work/clangd/clangd/llvm-project/clang/include/clang/AST/DeclBase.h:595:50
#5 0x0000000000f08a01 clang::Sema::MarkFunctionReferenced(clang::SourceLocation, clang::FunctionDecl*, bool) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Sema/SemaExpr.cpp:17938:9
#6 0x0000000000f9115d clang::Sema::ActOnCXXDelete(clang::SourceLocation, bool, bool, clang::Expr*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Sema/SemaExprCXX.cpp:3635:9
#7 0x0000000001ded5fd clang::Parser::ParseCXXDeleteExpression(bool, clang::SourceLocation) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseExprCXX.cpp:0:0
#8 0x0000000001dd37cb clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseExpr.cpp:0:0
#9 0x0000000001dd0991 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, clang::Parser::TypeCastState, bool, bool*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseExpr.cpp:681:20
#10 0x0000000001dcf16c clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseExpr.cpp:176:10
#11 0x0000000001dcf0a9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseExpr.cpp:125:10
#12 0x0000000001e31f25 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseStmt.cpp:509:19
#13 0x0000000001e3058f clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::ParsedAttributes&, clang::ParsedAttributes&) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseStmt.cpp:264:12
#14 0x0000000001e2fea6 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseStmt.cpp:114:20
#15 0x0000000001e38dc1 clang::Parser::ParseCompoundStatementBody(bool) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseStmt.cpp:1157:11
#16 0x0000000001e3a2ed clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseStmt.cpp:2428:21
#17 0x0000000001d877a9 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/Parser.cpp:0:0
#18 0x0000000001da4bc0 clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::SourceLocation*, clang::Parser::ForRangeInit*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseDecl.cpp:2121:18
#19 0x0000000001d865e2 clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/Parser.cpp:1179:10
#20 0x0000000001d86020 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/Parser.cpp:1193:12
#21 0x0000000001d85210 clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsingDeclSpec*) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/Parser.cpp:1019:12
#22 0x0000000001d82d1c clang::Parser::ParseTopLevelDecl(clang::OpaquePtrclang::DeclGroupRef&, clang::Sema::ModuleImportState&) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/Parser.cpp:737:10
#23 0x0000000001d7e7ce clang::ParseAST(clang::Sema&, bool, bool) /home/runner/work/clangd/clangd/llvm-project/clang/lib/Parse/ParseAST.cpp:161:5
#24 0x0000000001b92bbb clang::FrontendAction::Execute() /home/runner/work/clangd/clangd/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1041:10
#25 0x0000000001360eae getPtr /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/Support/Error.h:274:12
#26 0x0000000001360eae operator bool /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/Support/Error.h:234:16
#27 0x0000000001360eae clang::clangd::ParsedAST::build(llvm::StringRef, clang::clangd::ParseInputs const&, std::unique_ptr<clang::CompilerInvocation, std::default_deleteclang::CompilerInvocation >, llvm::ArrayRefclang::clangd::Diag, std::shared_ptr<clang::clangd::PreambleData const>) /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/ParsedAST.cpp:623:19
#28 0x00000000013d0729 ~__shared_count /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:732:6
#29 0x00000000013d0729 ~__shared_ptr /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1183:31
#30 0x00000000013d0729 generateDiagnostics /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/TUScheduler.cpp:1185:40
#31 0x00000000013d0729 clang::clangd::(anonymous namespace)::ASTWorker::updatePreamble(std::unique_ptr<clang::CompilerInvocation, std::default_deleteclang::CompilerInvocation >, clang::clangd::ParseInputs, std::shared_ptr<clang::clangd::PreambleData const>, std::vector<clang::clangd::Diag, std::allocatorclang::clangd::Diag >, clang::clangd::WantDiagnostics)::$_2::operator()() /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/TUScheduler.cpp:1119:5
#32 0x00000000013cb8b3 operator() /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#33 0x00000000013cb8b3 clang::clangd::(anonymous namespace)::ASTWorker::runTask(llvm::StringRef, llvm::function_ref<void ()>) /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/TUScheduler.cpp:1299:3
#34 0x00000000013c9b9f run /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/TUScheduler.cpp:1433:5
#35 0x00000000013c9b9f operator() /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/TUScheduler.cpp:815:42
#36 0x00000000013c9b9f void llvm::detail::UniqueFunctionBase::CallImpl<clang::clangd::(anonymous namespace)::ASTWorker::create(llvm::StringRef, clang::clangd::GlobalCompilationDatabase const&, clang::clangd::TUScheduler::ASTCache&, clang::clangd::TUScheduler::HeaderIncluderCache&, clang::clangd::AsyncTaskRunner*, clang::clangd::Semaphore&, clang::clangd::TUScheduler::Options const&, clang::clangd::ParsingCallbacks&)::$_7>(void*) /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:222:12
#37 0x0000000001565621 PointerIntPair /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:49:12
#38 0x0000000001565621 UniqueFunctionBase /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:340:3
#39 0x0000000001565621 unique_function /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/ADT/FunctionExtras.h:369:3
#40 0x0000000001565621 operator() /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/support/Threading.cpp:102:14
#41 0x0000000001565621 Apply<(lambda at /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/support/Threading.cpp:97:15)> /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/Support/thread.h:42:5
#42 0x0000000001565621 GenericThreadProxy<std::tuple<(lambda at /home/runner/work/clangd/clangd/llvm-project/clang-tools-extra/clangd/support/Threading.cpp:97:15)> > /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/Support/thread.h:50:5
#43 0x0000000001565621 void* llvm::thread::ThreadProxy<std::tuple<clang::clangd::AsyncTaskRunner::runAsync(llvm::Twine const&, llvm::unique_function<void ()>)::$_1> >(void*) /home/runner/work/clangd/clangd/llvm-project/llvm/include/llvm/Support/thread.h:60:5
#44 0x00007f739e70fb43 start_thread ./nptl/./nptl/pthread_create.c:442:8
#45 0x00007f739e7a1a00 ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:0
Signalled during AST worker action: Build AST
Filename: /home/drfaust/Scrivania/uni/Magistrale/SCPD/Project/DistributedCrack/GPU/src/md5_gpu.cpp
Directory: /home/drfaust/Scrivania/uni/Magistrale/SCPD/Project/DistributedCrack/build
Command Line: /usr/lib/llvm-14/bin/clang++ --driver-mode=g++ -DDistributedCrack_GPU_EXPORTS -I/home/drfaust/Scrivania/uni/Magistrale/SCPD/Project/DistributedCrack/GPU/./kernelinclude -I/usr/local/cuda/targets/x86_64-linux/include -I/home/drfaust/Scrivania/uni/Magistrale/SCPD/Project/DistributedCrack/GPU/./include -g -fPIC -std=gnu++20 -o GPU/CMakeFiles/DistributedCrack.GPU.dir/src/md5_gpu.cpp.o -c -xc++ -Wall -fopenmp=libomp -xcuda -resource-dir=/home/drfaust/.config/Code/User/globalStorage/llvm-vs-code-extensions.vscode-clangd/install/15.0.6/clangd_15.0.6/lib/clang/15.0.6 -- /home/drfaust/Scrivania/uni/Magistrale/SCPD/Project/DistributedCrack/GPU/src/md5_gpu.cpp
Version: 95

System information

Output of clangd --version:
15.0.6
Editor/LSP plugin:
vscode
Operating system:
Ubuntu linux 22.04

The open source project is available at https://github.com/IlVirtuoso/DistributedCrack

@IlVirtuoso IlVirtuoso changed the title clangd crashes on delete[] operator (cuda project) clangd crashes on delete[] operator (cuda project) [bug] Jan 4, 2023
@IlVirtuoso IlVirtuoso changed the title clangd crashes on delete[] operator (cuda project) [bug] clangd crashes on delete[] operator (cuda project) Jan 4, 2023
@HighCommander4
Copy link

Would you be able to reduce the crashing code to a single, self-contained file? (Standard library includes are fine as long as you specify the standard library version you have.)

(I know you made the entire project available but you're in a better position to do this initial step of the investigation since you have the dependencies of the project installed locally and so on.)

@IlVirtuoso
Copy link
Author

#include <string>
#include <vector>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>


#ifdef _WIN32
#define DLL __declspec(dllexport)
#else
#define DLL
#endif // _WIN32



//md5cuda.cuh
#define uint uint32_t
#define uchar uint8_t

#define block_size 64

#define S11 7
#define S12 12
#define S13 17
#define S14 22
#define S21 5
#define S22 9
#define S23 14
#define S24 20
#define S31 4
#define S32 11
#define S33 16
#define S34 23
#define S41 6
#define S42 10
#define S43 15
#define S44 21

#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | (~z)))

#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

#define FF(a, b, c, d, x, s, ac)                    \
	{                                               \
		(a) += F((b), (c), (d)) + (x) + (uint)(ac); \
		(a) = ROTATE_LEFT((a), (s));                \
		(a) += (b);                                 \
	}

#define GG(a, b, c, d, x, s, ac)                    \
	{                                               \
		(a) += G((b), (c), (d)) + (x) + (uint)(ac); \
		(a) = ROTATE_LEFT((a), (s));                \
		(a) += (b);                                 \
	}

#define HH(a, b, c, d, x, s, ac)                    \
	{                                               \
		(a) += H((b), (c), (d)) + (x) + (uint)(ac); \
		(a) = ROTATE_LEFT((a), (s));                \
		(a) += (b);                                 \
	}

#define II(a, b, c, d, x, s, ac)                    \
	{                                               \
		(a) += I((b), (c), (d)) + (x) + (uint)(ac); \
		(a) = ROTATE_LEFT((a), (s));                \
		(a) += (b);                                 \
	}

__device__ __host__ uint byteswap(uint word);

__device__ __host__ void transform(uint state[4], const uchar block[block_size]);

__device__ __host__ void md5(const uchar* data, const uint size, uint result[4]);

__host__ __device__ void md5(const uint8_t *data, uint32_t size, uint8_t *result);
__host__ void md5_gpu(const uint8_t *data, const uint32_t *sizes, uint8_t *result, uint32_t size, int threads);
__host__ int md5_gpu(const uint8_t *data, const uint32_t *sizes, uint32_t size, int threads, uint8_t * targetDigest);
__host__ void CheckGpuCondition();

template <typename T> cudaError_t GpuMalloc(T **pointer, size_t size)
{
    return cudaMalloc(pointer, sizeof(T) * size);
}

template <typename T> cudaError_t GpuCopy(T *dst, const T *src, size_t size, cudaMemcpyKind kind)
{
    return cudaMemcpy(dst, src, sizeof(T) * size, kind);
}

template <typename T> cudaError_t GpuManagedMalloc(T **pointer, size_t size)
{
    return cudaMallocManaged(pointer, sizeof(T) * size);
}

//md5cuda.cu
__host__ inline void HandleError(cudaError_t cudaError)
{
    if (cudaError != cudaSuccess)
    {
        printf("Error on cuda execution: %s\n", cudaGetErrorString(cudaError));
    }
}

__device__ constexpr uchar padding[block_size] = {0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                                                  0,    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                                                  0,    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

__device__ __host__ uint byteswap(uint word)
{
    return ((word >> 24) & 0x000000FF) | ((word >> 8) & 0x0000FF00) | ((word << 8) & 0x00FF0000) |
           ((word << 24) & 0xFF000000);
}

__device__ __host__ void transform(uint state[4], const uchar block[block_size])
{
    uint a = state[0], b = state[1], c = state[2], d = state[3];
    uint x[16];

    for (uint i = 0, j = 0; j < block_size && i < 16; i++, j += 4)
    {
        x[i] = (uint)block[j] | ((uint)block[j + 1] << 8) | ((uint)block[j + 2] << 16) | ((uint)block[j + 3] << 24);
    }

    FF(a, b, c, d, x[0], S11, 0xd76aa478);
    FF(d, a, b, c, x[1], S12, 0xe8c7b756);
    FF(c, d, a, b, x[2], S13, 0x242070db);
    FF(b, c, d, a, x[3], S14, 0xc1bdceee);
    FF(a, b, c, d, x[4], S11, 0xf57c0faf);
    FF(d, a, b, c, x[5], S12, 0x4787c62a);
    FF(c, d, a, b, x[6], S13, 0xa8304613);
    FF(b, c, d, a, x[7], S14, 0xfd469501);
    FF(a, b, c, d, x[8], S11, 0x698098d8);
    FF(d, a, b, c, x[9], S12, 0x8b44f7af);
    FF(c, d, a, b, x[10], S13, 0xffff5bb1);
    FF(b, c, d, a, x[11], S14, 0x895cd7be);
    FF(a, b, c, d, x[12], S11, 0x6b901122);
    FF(d, a, b, c, x[13], S12, 0xfd987193);
    FF(c, d, a, b, x[14], S13, 0xa679438e);
    FF(b, c, d, a, x[15], S14, 0x49b40821);

    GG(a, b, c, d, x[1], S21, 0xf61e2562);
    GG(d, a, b, c, x[6], S22, 0xc040b340);
    GG(c, d, a, b, x[11], S23, 0x265e5a51);
    GG(b, c, d, a, x[0], S24, 0xe9b6c7aa);
    GG(a, b, c, d, x[5], S21, 0xd62f105d);
    GG(d, a, b, c, x[10], S22, 0x2441453);
    GG(c, d, a, b, x[15], S23, 0xd8a1e681);
    GG(b, c, d, a, x[4], S24, 0xe7d3fbc8);
    GG(a, b, c, d, x[9], S21, 0x21e1cde6);
    GG(d, a, b, c, x[14], S22, 0xc33707d6);
    GG(c, d, a, b, x[3], S23, 0xf4d50d87);
    GG(b, c, d, a, x[8], S24, 0x455a14ed);
    GG(a, b, c, d, x[13], S21, 0xa9e3e905);
    GG(d, a, b, c, x[2], S22, 0xfcefa3f8);
    GG(c, d, a, b, x[7], S23, 0x676f02d9);
    GG(b, c, d, a, x[12], S24, 0x8d2a4c8a);

    HH(a, b, c, d, x[5], S31, 0xfffa3942);
    HH(d, a, b, c, x[8], S32, 0x8771f681);
    HH(c, d, a, b, x[11], S33, 0x6d9d6122);
    HH(b, c, d, a, x[14], S34, 0xfde5380c);
    HH(a, b, c, d, x[1], S31, 0xa4beea44);
    HH(d, a, b, c, x[4], S32, 0x4bdecfa9);
    HH(c, d, a, b, x[7], S33, 0xf6bb4b60);
    HH(b, c, d, a, x[10], S34, 0xbebfbc70);
    HH(a, b, c, d, x[13], S31, 0x289b7ec6);
    HH(d, a, b, c, x[0], S32, 0xeaa127fa);
    HH(c, d, a, b, x[3], S33, 0xd4ef3085);
    HH(b, c, d, a, x[6], S34, 0x4881d05);
    HH(a, b, c, d, x[9], S31, 0xd9d4d039);
    HH(d, a, b, c, x[12], S32, 0xe6db99e5);
    HH(c, d, a, b, x[15], S33, 0x1fa27cf8);
    HH(b, c, d, a, x[2], S34, 0xc4ac5665);

    II(a, b, c, d, x[0], S41, 0xf4292244);
    II(d, a, b, c, x[7], S42, 0x432aff97);
    II(c, d, a, b, x[14], S43, 0xab9423a7);
    II(b, c, d, a, x[5], S44, 0xfc93a039);
    II(a, b, c, d, x[12], S41, 0x655b59c3);
    II(d, a, b, c, x[3], S42, 0x8f0ccc92);
    II(c, d, a, b, x[10], S43, 0xffeff47d);
    II(b, c, d, a, x[1], S44, 0x85845dd1);
    II(a, b, c, d, x[8], S41, 0x6fa87e4f);
    II(d, a, b, c, x[15], S42, 0xfe2ce6e0);
    II(c, d, a, b, x[6], S43, 0xa3014314);
    II(b, c, d, a, x[13], S44, 0x4e0811a1);
    II(a, b, c, d, x[4], S41, 0xf7537e82);
    II(d, a, b, c, x[11], S42, 0xbd3af235);
    II(c, d, a, b, x[2], S43, 0x2ad7d2bb);
    II(b, c, d, a, x[9], S44, 0xeb86d391);

    state[0] += a;
    state[1] += b;
    state[2] += c;
    state[3] += d;
}

__device__ __host__ void md5(const uint8_t *data, const uint32_t size, uint8_t *result)
{
    uint state[4] = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476}, i;
    for (i = 0; i + block_size <= size; i += block_size)
    {
        transform(state, data + i);
    }

    uint size_in_bits = size << 3;
    uchar buffer[block_size];

    memcpy(buffer, data + i, size - i);
    memcpy(buffer + size - i, padding, block_size - (size - i));
    memcpy(buffer + block_size - (2 * sizeof(uint)), &size_in_bits, sizeof(uint));

    transform(state, buffer);

    memcpy(result, state, 4 * sizeof(uint));
}

__global__ void md5_gpu_comparer(const uint8_t *digests, const uint8_t *targetDigest, uint32_t size, uint32_t *result)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < size)
    {
        uint8_t eq = 1;
        for (int j = 0; j < 16; j++)
        {
            if (digests[j + i * 16] != targetDigest[j])
            {
                eq = 0;
                break;
            }
        }
        if (eq)
            *result = i+1;
    }
}

__global__ void md5_call_gpu(const uint8_t *data, const uint32_t *sizes, uint32_t *offsets, uint8_t *result,
                             uint32_t size)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < size)
        md5(data + offsets[i], sizes[i], result + i * sizeof(uint32_t) * 4);
}

__host__ int md5_gpu(const uint8_t *data, const uint32_t *sizes, uint32_t size, int threads, uint8_t *targetDigest)
{
    uint8_t *remoteData = nullptr, *remoteResults = nullptr, *remoteTarget = nullptr;
    uint32_t *remoteSizes = nullptr, *offsets = nullptr;
    size_t grandTotal = 0;
    HandleError(GpuManagedMalloc(&offsets, size));
    for (int i = 0; i < size; i++)
    {
        offsets[i] = grandTotal;
        grandTotal += sizes[i];
    }
    HandleError(GpuMalloc(&remoteData,  grandTotal));
    HandleError(GpuMalloc(&remoteResults, size * 4));
    HandleError(GpuMalloc(&remoteSizes, size));
    HandleError(GpuCopy(remoteData, data,  grandTotal, cudaMemcpyHostToDevice));
    HandleError(GpuCopy(remoteSizes, sizes, size , cudaMemcpyHostToDevice));
    int blocks = ceil((float)size / threads);
    md5_call_gpu<<<blocks, threads>>>(remoteData, remoteSizes, offsets, remoteResults, size);
    HandleError(GpuMalloc(&remoteTarget, 16));
    HandleError(GpuCopy(remoteTarget, targetDigest, 16, cudaMemcpyHostToDevice));
    md5_gpu_comparer<<<blocks, threads>>>(remoteResults, remoteTarget, size, offsets);
    HandleError(cudaDeviceSynchronize());
    HandleError(cudaFree(remoteData));
    HandleError(cudaFree(remoteSizes));
    HandleError(cudaFree(remoteResults));
    uint32_t res = offsets[0] - 1;
    HandleError(cudaFree(offsets));
    HandleError(cudaFree(remoteTarget));
    return res;
}

__host__ void md5_gpu(const uint8_t *data, const uint32_t *sizes, uint8_t *result, uint32_t size, int threads)
{
    uint8_t *remoteData = nullptr, *remoteResults = nullptr;
    uint32_t *remoteSizes = nullptr, *offsets = nullptr;
    size_t grandTotal = 0;
    HandleError(GpuManagedMalloc(&offsets, size * sizeof(uint32_t)));
    for (int i = 0; i < size; i++)
    {
        offsets[i] = grandTotal;
        grandTotal += sizes[i];
    }
    HandleError(GpuMalloc(&remoteData,  grandTotal));
    HandleError(GpuMalloc(&remoteResults, size * sizeof(uint32_t) * 4));
    HandleError(GpuMalloc(&remoteSizes, size * sizeof(uint32_t)));
    HandleError(GpuCopy(remoteData, data, grandTotal, cudaMemcpyHostToDevice));
    HandleError(GpuCopy(remoteSizes, sizes, size * sizeof(uint32_t), cudaMemcpyHostToDevice));

    int blocks = ceil((float)size / threads);
    md5_call_gpu<<<blocks, threads>>>(remoteData, remoteSizes, offsets, remoteResults, size);

    HandleError(cudaDeviceSynchronize());
    HandleError(GpuCopy(result, remoteResults, size * sizeof(uint32_t) * 4, cudaMemcpyDeviceToHost));

    HandleError(cudaFree(remoteData));
    HandleError(cudaFree(remoteSizes));
    HandleError(cudaFree(remoteResults));
}

__host__ void CheckGpuCondition()
{
    static bool initialized = false;
    CUresult result;
    if (!initialized && (result = cuInit(0)) != CUDA_SUCCESS)
        printf("Error on gpu initialization: %s\n", cudaGetErrorString((cudaError)result));
}

//MD5gpu.hpp
DLL std::vector<std::string>& md5_gpu(const std::vector<std::string> &chunk, int threads);
DLL std::vector<std::string> &hexdigest(const std::vector<std::string> &results);

DLL int md5_gpu(const std::vector<std::string> &chunk,int threads, std::string targetMd5);

//md5gpu.cpp 
const char *hexdigest(const uint8_t *digest)
{

    char *buf = (char *)std::malloc(33);
    for (int i = 0; i < 16; i++)
        sprintf(buf + i * 2, "%02x", digest[i]);
    buf[32] = '\000';

    return buf;
}

uint8_t *digesthex(std::string md5)
{
    std::vector<unsigned char> digest;

    for (size_t i = 0; i < md5.size(); i += 2)
    {
        std::string byte_string = md5.substr(i, 2);
        unsigned char byte = (unsigned char)strtol(byte_string.c_str(), NULL, 16);
        digest.push_back(byte);
    }
    uint8_t *result = new uint8_t[4*sizeof(uint32_t)];
    for (int i = 0; i < 16; i++)
    {
        result[i] = digest[i];
    }
    return result;
}

std::vector<std::string> &hexdigest(const std::vector<std::string> &results)
{
    std::vector<std::string> &result = *new std::vector<std::string>();
    for (size_t i = 0; i < results.size(); i++)
    {
        result.push_back(hexdigest((uint8_t *)results[i].c_str()));
    }
    return result;
}

int md5_gpu(const std::vector<std::string> &chunk, int threads, std::string targetMd5)
{
    CheckGpuCondition();
    size_t sum = 0;

    uint32_t *sizes = new uint32_t[chunk.size()];
    for (size_t i = 0; i < chunk.size(); i++)
    {
        sizes[i] = chunk[i].size();
        sum += chunk[i].size();
    }
    uint8_t *data = new uint8_t[sum];
    uint8_t *results =
        new uint8_t[chunk.size() * sizeof(uint32_t) * 4]; // every state vector is 4 elements composed of 4 bytes
    size_t offset = 0;
    for (int i = 0; i < chunk.size(); i++)
    {
        auto str = chunk.at(i).c_str();
        auto size = sizes[i];
        memcpy(data + offset, str, sizeof(uint8_t) * size);
        offset += size;
    }
    data[sum] = '\0';
    int result = md5_gpu(data, sizes, chunk.size(), threads,digesthex(targetMd5));
    delete[] sizes;
    delete[] data;
    delete[] results;
    return result;
}

std::vector<std::string> &md5_gpu(const std::vector<std::string> &chunk, int threads)
{
    CheckGpuCondition();
    std::vector<std::string> &resultsVector = *new std::vector<std::string>();
    size_t sum = 0;
    uint32_t *sizes = new uint32_t[chunk.size()];
    for (size_t i = 0; i < chunk.size(); i++)
    {
        sizes[i] = chunk[i].size();
        sum += chunk[i].size();
    }
    uint8_t *data = new uint8_t[chunk.size() * sum];
    uint8_t *results =
        new uint8_t[chunk.size() * sizeof(uint32_t) * 4]; // every state vector is 4 elements composed of 4 bytes
    size_t offset = 0;
    for (int i = 0; i < chunk.size(); i++)
    {
        auto str = chunk.at(i).c_str();
        auto size = sizes[i];
        memcpy(data + offset, str, sizeof(uint8_t) * size);
        offset += size;
    }
    data[sum] = '\0';
    md5_gpu(data, sizes, results, chunk.size(), threads);
    for (int i = 0; i < chunk.size(); i++)
    {
        resultsVector.push_back(std::string((char *)results + (i * sizeof(uint32_t) * 4), sizeof(uint32_t) * 4));
    }
    delete[] sizes;
    delete[] data;
    delete[] results;
    return resultsVector;
}

@IlVirtuoso
Copy link
Author

IlVirtuoso commented Jan 8, 2023

@HighCommander4 i reported in the previous comment a self contained file. The only 2 dependencies of this file is the stdlibc++(version 12) and the CUDA runtime (mine is version 11.7 with driver 515), hope it helps, let me know if i can help in any other way.

@HighCommander4
Copy link

Thanks. Unfortunately, I don't have CUDA installed, and so far have been unsuccessful in installing it.

I went to https://developer.nvidia.com/cuda-11-7-0-download-archive, but unfortunately my Linux distro (PureOS 10) is not one of the listed ones. I tried using the package for Debian 11 which I believe is the closest, but I got package dependency errors when I tried to install the cuda package.

If you're able to reduce the code further so it doesn't depend on CUDA, that would be ideal. Alternatively, if you can help me figure out how to install CUDA, I'm happy to do that.

@HighCommander4
Copy link

HighCommander4 commented Jan 9, 2023

Update: I was able to install CUDA by manually downloading some of the dependent packages from the Debian 11 package repositories.

I now have a cuda.h at /usr/include/linux/cuda.h. However, I do not have a cuda_runtime.h anywhere.

What package do I need to install to get cuda_runtime.h? (I did try cuda-runtime-11-7, it's installed already.)

@IlVirtuoso
Copy link
Author

IlVirtuoso commented Jan 10, 2023

You can use

wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run

to download the cuda installation file from nvidia, then run it with administrative privileges, a menu will appear giving you the option to install the driver, you can uncheck that box if you want since is not needed. The installer will install the library at /usr/local/cuda, make sure to launch an ldconfig on that folder, then you have 2 ways to procede:

  1. (this is what i usually do) configure a mini project with cmake that will discover the library and output a compile_commands.json with all the dependencies included

  2. configure manually the compile_commands.json in order to inlclude /usr/loca/cuda/include.

This particular issue in fact happen only if you use delete[] operator on an array when cuda.h and cuda_runtime.h are included.
I expected more a thing like operator delete[] not defined since it is possible that in those headers a delete[] operator is defined but not implemented.

@HighCommander4
Copy link

I was able to reproduce this using the minimal delete.cu example in #1815 (comment).

The crash is preceded by this assertion failure:

clang/lib/Sema/SemaExprCXX.cpp:3225: clang::FunctionDecl *clang::Sema::FindUsualDeallocationFunction(clang::SourceLocation, bool, bool, clang::DeclarationName): Assertion `Result.FD && "operator delete missing from global scope?"' failed.

@HighCommander4
Copy link

HighCommander4 commented Feb 12, 2024

I was able to reproduce this using the minimal delete.cu example in #1815 (comment).

This example wasn't really minimal, because when compiling a .cu file, clang implicitly includes a whole bunch of library code.

The implicit inclusion can be disabled with -nocudainc, which then makes the crash go away (suggesting that it's related to something in the implicitly included library code).

I looked at the cc1 args to find the implicitly included code, and reduced across it to find this minimal testcase which triggers the crash with -nocudainc:

preamble.hpp:

void operator delete(void*) noexcept;

delete.cu:

#include "preamble.hpp"

auto main() -> int {
  delete (void *)(0); // <- this line can make clangd crash
}

(Triggering the crash still requires -std=c++20 as well.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants