Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay in execution in multiple calls #157

Closed
Leandroglez39 opened this issue Aug 7, 2020 · 9 comments
Closed

Delay in execution in multiple calls #157

Leandroglez39 opened this issue Aug 7, 2020 · 9 comments
Labels

Comments

@Leandroglez39
Copy link

Describe the bug
When executing multiple times there is a delay in execution, which in large cases becomes impossible to compute.

To Reproduce
#include
#include
#include
#include
#include <windows.h>
#include "edlib.h"

using namespace std;

int main()
{

string s1 = "AGTCAAAAGTCCCCCCAGTAGCT";
string s2 = "AGGGGGTTTTCCCCAAAAAA";


	for (size_t j = 0; j < 5000; i++)
	{

		clock_t start = clock();

		EdlibAlignResult result = edlibAlign(s1.c_str(), s1.size(), s2.c_str(), s2.size(),
			edlibNewAlignConfig(-1, EDLIB_MODE_NW, EDLIB_TASK_PATH, nullptr, 0));

		edlibFreeAlignResult(result);

		std::cout << static_cast<double>(clock() - start) / static_cast<double>(CLOCKS_PER_SEC) << " seconds." << endl;
	}

}

Expected behavior
The execution time of this operation should remain constant, but it is not.
time
.

Environment (please complete the following information):

  • OS: Windows 10
  • Visual Studio 2019 C++
@Martinsos
Copy link
Owner

Hi @Leandroglez39 !

What do you mean by saying "in large cases becomes impossible to compute"? Impossible because the delay is too small? Or too big? Or smth else? Pls provide more details on this!

With these short times I see from the image you attached, I would bet on jitter coming from other things, and not from Edlib.

However, this period of 7 runs is unusual. Only thing I can think of is if there is memory leak in edlib that accumulates and somehow gets cleaned up at that point.

I can't easily run this stuff in Windows, so I would suggest following:

  1. Could you report time in micro or nano seconds so we see more precise times?
  2. Could you somehow see memory usage through time?
  3. I am interested in what happens when you run calculation with longer sequences.

Thanks!
Martin

@Leandroglez39
Copy link
Author

Leandroglez39 commented Aug 8, 2020 via email

@Martinsos
Copy link
Owner

Thank you for sharing more information :)!
Ok hm, I would certainly like to identify the bug if there is one, and as you said, there has been this other report #155 .

Main thing is, both in this issue and in this other one, that it is not clear where is problem coming from, what is causing it, or if there even is a problem hm (although it does seem like there is one). Problem is that I don't have Windows environment set up to run Edlib and test it, and I was not able to reproduce the problem so far on Linux. I could eventually set up Win env to do this, but I did not yet find the time to do it.

I suggest following:

  1. I will try running this on Linux and seeing if there is same problem happening. However, I am currently on vacation, so it will probably take me about a week to get to it.
  2. See if you can get more information on the error, for example how is memory behaving and possibly even where is memory leak coming from.
  3. I could set up Win env and run this in it, but I can't promise when I will get to that since it requires significantly more time from my side.

p.s. I did not quite understand, is the anomaly/delay behaving the same way if you use longer sequences?

Btw, this problem you are solving -> 4000 x 4000 -> there is part of calculation, building the Peq table, that could be reused between the calculations that share the same query and probably significantly reduce the time of calculation, however there is no API currently for that. So that might be interesting feature request, that would help with the use cases like yours.

@Leandroglez39
Copy link
Author

As for your question about whether it occurs with longer sequences, the answer is YES ...
What I tried to refer to was that there is no pattern, for example:

1- Not every number of fixed interactions occurs.
2- The delay time is not constant.

On the other hand, it would be interesting to talk about the Small table ...

Enjoy your vacation, then we continue the exchange.

@Martinsos
Copy link
Owner

Martinsos commented Aug 14, 2020

@Leandroglez39 I tried running this on Linux, and this is what I got:

I tried sequences with sizes from 20 up to 10000 and times were fairly consistent -> sure, there are some oscillations, but nothing major that I would consider to be problematic or suspicious. However, times were bigger than the ones you have, they were about 0.000x seconds, not 0.00000x seconds as in your example -> I am guessing my machine is slower than yours.

I then tried running edlib with very small sequences, one of size 5, another of size 3, and I got the same pattern as you have: 0.000075 for some time, then suddenly 0.000120 a few times, and then back to 0.000075 seconds.

I run valgrind --tool=memcheck on all of these, and there are no memory leaks.

My conclusion from this is: there are no memory leaks on Linux, but I managed to reproduce the thing when times are very small. But I am pretty sure in this is case it is not something we should worry about in Edlib, times are just so short that this happens, I don't see it is a problem, and we identified it is not caused by a memory leak. It could be jitter in the clock itself, or maybe it took a little bit longer to allocate some memory, or CPU was busy with something else, who knows hm, times are just too short to measure and expect same number every time.

So, I would say problem does not exist, BUT, you said you are also having this problem with longer sequences! Could you tell me how big were they, and again display the times? If this is happening for you for bigger sequences, that have longer times, then we should certainly continue investigating it. I however could not reproduce the problem for bigger sequences.

You can see the code I used here: https://github.com/Martinsos/edlib/blob/issue-157-test/apps/hello-world/helloWorld.c .

@Martinsos
Copy link
Owner

Btw I opened this issue as feature request for 1 vs N and N vs N use case, and I also added reference to this issue there now: #118 .

@Leandroglez39
Copy link
Author

Thanks anyway, it seems that for the moment we will not be able to detect what is happening to me.

What you propose feature request for 1 vs N and N vs N would be very helpful for the type of work I am trying to do.

I will continue investigating, if I find any other clue about why it is delayed in those cases I will let you know. Thank you

@Martinsos
Copy link
Owner

@Leandroglez39 What about those longer sequences you mentioned? Could you create longer sequences so that you have longer times, so we can see then if this is still happening? This should be easy to test and I believe it will answer our question. Because these measurements so far, I am pretty sure they are just normal jitter due to measuring such short times -> that is why normally you never measure such short stuff directly, instead you measure a large amount of executions/calls and then take the average, when single execution time is so short.

I will be closing this one for now then, but if you do measurements with longer sequences, pls post the results here!

@Leandroglez39
Copy link
Author

Leandroglez39 commented Aug 18, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants