CouchDB unavailable after documents deletion #16

Closed
christophe-lejeune opened this issue Jul 5, 2012 · 9 comments

@christophe-lejeune
Member

When several documents are deleted (even as few as 10 at once), CouchDB (version 1.0.1) starts updating its views. As a consequence, the whole database/application becomes unavailable for minutes (!). Such behavior is not necessarily a bug, but the result is not really robust.

@benel
Member

benel commented Jul 5, 2012

Here are some possible workarounds:

  • accessing the old view with stale=ok,
  • distributing view computation with BigCouch.
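As a minimal sketch of the first workaround, the helper below builds a view URL carrying `stale=ok`, so that CouchDB answers from the existing index instead of blocking until it has been rebuilt. The helper name `viewUrl`, the server address, and the database, design document and view names are hypothetical placeholders.

```javascript
// Build the URL of a CouchDB view query. All names passed in below are
// hypothetical placeholders, not this project's actual ones.
function viewUrl(server, db, ddoc, view, params) {
  const path = "/" + encodeURIComponent(db) + "/_design/" + encodeURIComponent(ddoc) +
    "/_view/" + encodeURIComponent(view);
  const url = new URL(path, server);
  for (const [name, value] of Object.entries(params || {})) {
    // Values are sent as-is; JSON-valued parameters such as `key` or
    // `startkey` must be JSON-encoded by the caller.
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// stale=ok: answer from the index as it stands, without waiting for
// CouchDB to bring it up to date after the deletions.
const url = viewUrl("http://localhost:5984", "corpus", "app", "kwic", { stale: "ok", limit: 10 });
```

Rows served this way may be outdated: they can still reference the deleted documents until the index catches up.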

@benel
Member

benel commented Jul 6, 2012

Another workaround exists, but it requires a major refactoring. The idea is to have 2 different design documents:

  • one for heavy computation views (kwic, lexicometrics),
  • the other for simple views used to browse corpora.

The drawback, however, is that updates of the heavy computation views would still be triggered whenever they are queried.

@benel
Member

benel commented Jul 6, 2012

stale=update_after could also be an option.
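For reference, `stale` was deprecated in CouchDB 2.1.0 in favour of the `stable` and `update` query parameters; on CouchDB 1.x, as used in this issue, only `stale` exists. The sketch below (the helper name `staleToParams` is hypothetical) translates the legacy values, which also spells out what `update_after` does: answer from the current index immediately, then update it afterwards.

```javascript
// Hypothetical helper: translate a legacy `stale` value into the equivalent
// `stable`/`update` query parameters introduced in CouchDB 2.1.0.
function staleToParams(stale) {
  switch (stale) {
    case "ok":
      // Answer from the index as it is; do not trigger an update.
      return { stable: "true", update: "false" };
    case "update_after":
      // Answer from the index as it is, then update it after responding.
      return { stable: "true", update: "lazy" };
    case undefined:
      // Default behavior: wait for the index to be brought up to date.
      return { stable: "false", update: "true" };
    default:
      throw new Error("Unsupported stale value: " + stale);
  }
}
```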

@benel
Member

benel commented Jan 17, 2014

@christophe-lejeune Could you test it on a more recent version of CouchDB to check if the problem persists?

@benel
Member

benel commented Nov 24, 2014

I experienced the same problem on Thursday evening.

Over the weekend, I tried the refactoring strategy explained earlier. But, because I applied it only to lexicometrics (and not to KWIC), there was no real difference.

I then tried another refactoring strategy on KWIC: moving parts of the computation from the view to the list.
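A minimal sketch of that idea, under assumptions not taken from the actual code: the view emits only `{speech, match}` (as in the stripped-down view posted in the next comment), and the list, queried with `include_docs=true`, rebuilds the left and right contexts from the document. The name `kwicList` and the 30-character `CONTEXT` width are hypothetical; `start`, `getRow` and `send` are the functions CouchDB's JavaScript query server provides to list functions. In a design document, the function would be stored as its source text.

```javascript
// Hypothetical list function: the view emits only positions, and the list
// rebuilds each KWIC line. Query it with include_docs=true so that
// row.doc is the emitting document.
function kwicList(head, req) {
  var CONTEXT = 30; // hypothetical context width, in characters
  start({ headers: { "Content-Type": "text/plain; charset=utf-8" } });
  var row;
  while ((row = getRow())) {
    var text = row.doc.speeches[row.value.speech].text;
    var i = row.value.match;
    // Left context ends just before the keyword; right context starts at it.
    var left = text.substring(Math.max(0, i - CONTEXT), i);
    var right = text.substring(i, i + CONTEXT);
    send(left + " | " + right + "\n");
  }
}
```

The trade-off is that context extraction now runs at query time, on the rows actually requested, instead of at indexing time for every word of every speech.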

@christophe-lejeune Would you agree to test my development code on your data to confirm that the KWIC index computation has improved?

@benel
Member

benel commented Nov 24, 2014

The view code is just a stripped-down version of the current one:

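```javascript
function (o) {
  // Drafts are not indexed.
  if (!o.draft) {
    // Number of characters kept from each match position, used as the sort key.
    const KEYWORD_LENGTH = 25;
    // Matches a literal backslash followed by "n" or "t", or a run of
    // characters that are neither whitespace nor punctuation.
    const WORD_MATCHER = /\\[nt]|[^\s,;:\.!?…—–)(\][}{`'‘’"″“”«»&%<>€$*/+-]+/gi;
    for (var p in o.speeches) {
      var speech = o.speeches[p];
      // Skip speeches without text: exec() would otherwise coerce undefined
      // to the string "undefined" and index it.
      if (!speech || typeof speech.text !== "string") continue;
      var speech_text = speech.text;
      var match;
      // With the /g flag, exec() resumes from lastIndex and resets it to 0
      // when no match is left, so each speech is scanned from the start.
      while ((match = WORD_MATCHER.exec(speech_text))) {
        var keyword = speech_text.substr(match.index, KEYWORD_LENGTH);
        var value = {speech: p, match: match.index};
        emit([o.corpus, keyword], value);
        // Alternative key, per document instead of per corpus:
        // emit([o._id, keyword], value);
      }
    }
  }
}
```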

This should be enough to test the computation time (in a new separate design document).
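Outside CouchDB, a rough first comparison can also be made in Node.js by calling a map function on sample documents with a stub `emit`. This harness is a hypothetical sketch, not part of the project: `timeMap`, the sample document and the simplified word matcher are illustrative (paste the real view source in place of `mapSource`), and Node's engine differs from CouchDB's SpiderMonkey, so only relative timings between two versions of a view are meaningful.

```javascript
// Hypothetical harness: run a CouchDB-style map function on documents with
// a stub emit(), counting emitted rows and measuring elapsed time.
function timeMap(mapSource, docs) {
  let rows = 0;
  // Bind `emit` as a parameter so the map source can call it as a free name.
  const map = new Function("emit", "return (" + mapSource + ");")(function () {
    rows++;
  });
  const t0 = process.hrtime.bigint();
  for (const doc of docs) map(doc);
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  return { rows, ms };
}

// Hypothetical sample: a simplified copy of the view and one document.
const mapSource = `function (o) {
  if (!o.draft) {
    const WORD_MATCHER = /[^\\s,;:.!?]+/g;
    for (var p in o.speeches) {
      var text = o.speeches[p].text, match;
      while ((match = WORD_MATCHER.exec(text))) {
        emit([o.corpus, text.substr(match.index, 25)], {speech: p, match: match.index});
      }
    }
  }
}`;

const result = timeMap(mapSource, [
  { corpus: "demo", speeches: [{ text: "Hello, world. Hello again!" }] },
]);
console.log(result.rows + " rows in " + result.ms.toFixed(3) + " ms");
```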

If it performs much better, I will publish the whole code (with the updated list, rewrites and cassandre_php).

@benel
Member

benel commented Jun 23, 2019

The idea is to have 2 different design documents:

  • one for heavy computation views (kwic, lexicometrics),
  • the other for simple views used to browse corpora.

I propose to split the design documents as follows:

  • app for the Web user interface,
  • api for Hypertopic compliant services,
  • mining for lexicometrics computing.
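A possible shape for this split, as a sketch only. In CouchDB, all views of one design document share a single index that is updated as a whole, while separate design documents have separate indexes; querying a cheap `app` view therefore waits only for the `app` index, not for the heavy `mining` views, and editing `mining` does not invalidate the other indexes. Only the three design document names come from the proposal; view names, fields and map bodies are hypothetical placeholders.

```javascript
// Sketch of the proposed split into three design documents. View names,
// fields (e.g. corpus_name) and map bodies are hypothetical placeholders.
const designDocs = [
  {
    _id: "_design/app", // Web user interface: cheap browsing views
    language: "javascript",
    views: {
      corpora: { map: "function (o) { if (o.corpus_name) emit(o._id, o.corpus_name); }" },
    },
  },
  {
    _id: "_design/api", // Hypertopic-compliant services
    language: "javascript",
    views: {
      corpus: { map: "function (o) { if (o.corpus) emit([o.corpus], null); }" },
    },
  },
  {
    _id: "_design/mining", // heavy lexicometrics views, indexed separately
    language: "javascript",
    views: {
      lexicometrics: { map: "function (o) { /* heavy computation */ }" },
    },
  },
];
// Each object would be saved with PUT /{db}/{_id}.
```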

I'm still not sure where to put KWIC, since it must be accessible from both app and api.

I will probably also move the rewrite logic to the proxy.
Are you OK with this refactoring?

Do you have any ongoing unmerged developments that you would like to finish before this big refactoring?

@christophe-lejeune
Member Author

A few days ago, I was thinking precisely about strategies to avoid the kind of downtime we experience when documents are deleted or when some minor modification is made to the design document (almost any update causes this inconvenience). I was about to propose round-robin load balancing between two CouchDB nodes in a cluster, which is (of course) costlier than the strategy you propose here.

Have you run any tests suggesting that, while the indexer is busy with one design document, the others remain available?

I am currently abroad on holiday. However, I have just merged the last months' developments. Although I had intended to, I do not have enough time these days to discuss this issue further or to publish a new release.

Some developments were started before I left, namely a new graph rendering and locale-aware sorting (because of diacritics). However, since these developments are at an early stage, committing them would be premature.

@christophe-lejeune
Member Author

Now that KWIC has been moved into a separate design document, the situation should improve. However, further tests are needed to check whether the problem described above still occurs after document deletion.
