Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example does not work #45

Closed
marekkokot opened this issue Apr 19, 2023 · 3 comments · Fixed by #47
Closed

Example does not work #45

marekkokot opened this issue Apr 19, 2023 · 3 comments · Fixed by #47

Comments

@marekkokot
Copy link

Hi,

this example does not work:

nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5);  // 1 hash per 5-mer
while (nth.roll()) {
    // use nth.hashes() for canonical hashes
    //     nth.get_forward_hash() for forward strand hashes
    //     nth.get_reverse_hash() for reverse strand hashes
}

more precisely this is not working:

 nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5);  // 1 hash per 5-mer
    while (nth.roll()) {
        std::cout << nth.hashes()[0] << '\n';
        // use nth.hashes() for canonical hashes
        //     nth.get_forward_hash() for forward strand hashes
        //     nth.get_reverse_hash() for reverse strand hashes
    }

I get just one hash.
I'm not using cmake build, I just get all the sources needed and use

g++ *.cpp btllib/status.cpp

When I change to set the length it works:

nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 21, 1, 5);  // 1 hash per 5-mer

Also this works:

 std::string s = std::string("TGACTGATCGAGTCGTACTAG");
 nthash::NtHash nth(s, 1, 5);  // 1 hash per 5-mer

but no this:

nthash::NtHash nth(std::string("TGACTGATCGAGTCGTACTAG"), 1, 5);  // 1 hash per 5-mer

(the equivalent of the same as in the original example).

It took me a while to understand this.
Ctor captures const std::string& ref.
C++ guarantees that the temporal value (explicit like in the last code fragment, or implicit like in the original example) will live until the function end. The problem is the delegated ctor (const char*) does not copy the string content and we have a dangling pointer at the end.
Proposed solution:
maybe use memcpy like in the case of BlindNtHash ?
Although maybe there are reasons to not do this.
Anyway would be great to fix the example.

BTW. I had some issues cloning submodules, I have changed .gitmodules to

[submodule "vendor/argparse"]
	path = vendor/argparse
	url = https://github.com/p-ranav/argparse/
[submodule "vendor/btllib"]
	path = vendor/btllib
	url = https://github.com/bcgsc/btllib/

I'm not sure if it does make any difference.

BTW2.
Let's say I have a bit packet sequence, i.e. two bits per symbol in a form of uint64_t array.
Is there a way to use ntHash without unpacking it to std::string/cstring?
It's not crucial and may potentially be even worse vs efficient unpacking, but just out of curiosity.

@parham-k
Copy link
Member

Hi and thanks for your report and suggestions.

would be great to fix the example.

Agreed, at least until we find a better solution. I don't think using memcpy in NtHash::NtHash would be an efficient thing to do. It's ok for BlindNtHash since the input string is supposed to be just the first k-mer, but NtHash will get a whole sequence, and copying that would take extra time+memory.

BTW. I had some issues cloning submodules, I have changed .gitmodules

I used ssh to initialize the submodules, using HTTPS would be more reliable here. I'm going to remove all the submodules in the next release (soon) anyway to make ntHash easier to install/use.

BTW2.
Let's say I have a bit packet sequence, i.e. two bits per symbol in a form of uint64_t array.
Is there a way to use ntHash without unpacking it to std::string/cstring?
It's not crucial and may potentially be even worse vs efficient unpacking, but just out of curiosity.

This was coincidentally suggested to me in-person very recently. One would have to modify the ntf64 function to replace the SEED_TABLE lookup with the desired uint64_t value in the current implementation. I'll implement a better way to do this for the next release.

@parham-k parham-k mentioned this issue Apr 20, 2023
10 tasks
@marekkokot
Copy link
Author

Hi,

thanks for the quick responses to all my questions!

I expected that you may want to avoid extra copying.
One suggestion that I think of is to replace const std::string& in ctor with std::string_view

Best
Marek

@gf777
Copy link

gf777 commented Jul 21, 2023

this is still not working! It took me a while to realize that too :-)
It produces garbage values

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants