Recovery.db.mv.db size crashes Mist #339

Closed
utkarshsaraf19 opened this issue Nov 6, 2017 · 9 comments
utkarshsaraf19 commented Nov 6, 2017

I have set up a VM with the following configuration: Red Hat 7.4, 4 GB RAM.

I have observed that the size of Recovery.db.mv.db grows as I run more jobs, which is expected.

Mist crashes with a Java heap space error once the file reaches about 37 MB.

I would like to understand why. Is the whole file being loaded by the browser or by Mist itself, and what configuration changes/factors should I keep in mind while deploying it?

dos65 commented Nov 6, 2017

Does it always crash after restart?

utkarshsaraf19 commented Nov 6, 2017

Yes. Once this starts, it never recovers on its own. I have to delete Recovery.db.mv.db manually to make it work again.

dos65 commented Nov 6, 2017

As a workaround you could increase Xmx for the Mist master process.
But we should definitely reconsider the way we store job history.
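For illustration, a minimal sketch of that workaround, assuming the master is launched through a start script that picks up extra JVM options from an environment variable; the variable name and script path below are assumptions about the deployment, not Mist's documented interface:

```
# Assumed setup: give the master JVM a larger heap before starting Mist.
export JAVA_OPTS="-Xmx2g"
./bin/mist-master start
```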

dos65 added the bug label Nov 6, 2017
utkarshsaraf19 commented Nov 6, 2017

I have some suggestions for this:

a. Bucketing the recovery DB based on the number of jobs and their output size.
b. Storing the recovery DB in folders named after the endpoint.
c. Showing it in the front end with a pagination feature, so one page only uses one recovery DB at a time (see the sketch after this list).
d. A separate JVM for history logs.
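For suggestion (c), a minimal sketch of what a paginated history store could look like; `JobRecord` and `HistoryStore` are illustrative names, not Mist's actual API:

```scala
// Illustrative only: the front end requests one page of job history at a time,
// so only `limit` records are materialised in the master's heap per request.
case class JobRecord(id: String, endpoint: String, status: String)

trait HistoryStore {
  def page(offset: Int, limit: Int): Seq[JobRecord]
  def countByEndpoint(endpoint: String): Long
}
```

A SQL-backed implementation can serve `page` with LIMIT/OFFSET, so a single UI request never needs the whole recovery DB in memory.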

dos65 commented Nov 6, 2017

Yes, bucketing or limiting could help.
I think we should at least provide a way to configure history storage (e.g. use an external database).

There is also a more complicated question: should we continue storing job results inside the database? They can be very large, and keeping them in a database may be inefficient.

@spushkarev @mkf-simpson - this may be interesting for you: if we find another way to store job results, it could become possible to build pipelines over datasets with that feature.
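To make that idea concrete, here is a minimal sketch of the alternative being discussed: keep only metadata in the history database and write the potentially large result payload to external storage, storing just a reference. All names here are illustrative assumptions, not Mist's implementation:

```scala
import java.nio.file.{Files, Path}

// Only this small record would live in the history database.
case class JobResultRef(jobId: String, resultPath: String)

// The large payload goes to a file (or an object store) outside the database,
// so reading job history never pulls whole result blobs into the master's heap.
def persistResult(jobId: String, payload: Array[Byte], dir: Path): JobResultRef = {
  val target = dir.resolve(s"$jobId.result.json")
  Files.write(target, payload)
  JobResultRef(jobId, target.toString)
}
```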

mkf-simpson commented:

I don't quite understand how job history relates to pipelines.

dos65 commented Nov 6, 2017

To invoke pipeline stages on different Spark contexts, we need to store job results somewhere.

mkf-simpson commented:

Ok, MistWarehouse? :) But this discussion is for another ticket I guess.

utkarshsaraf19 commented Nov 7, 2017

@mkf-simpson @dos65 @spushkarev Whichever way you choose, one thing should be kept in mind: the job should not be redeployed every time we hit its endpoint because of this separation. As of now, it takes 25 seconds for the first run and less than 2 seconds for subsequent runs on the same endpoint, which is better for production use.

dos65 closed this as completed Jul 30, 2019