This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

_replicate vs _replication API endpoints #346

Closed
ufobat opened this issue Oct 31, 2018 · 7 comments

Comments

@ufobat
Contributor

ufobat commented Oct 31, 2018

I am currently writing a little script that helps me set up replication between two CouchDB instances. I want it to read the current replication jobs and check whether my desired job is already there. If it is not, the script should set up the replication. If there are jobs that are no longer required (they don't match my criteria), I want to remove them.

While doing so I was falling in some traps:

In the Replication section of the documentation there are both _replicator and _replicate. It took me a while to realize that these are two different names.

First I sent my "replication requests" to _replicate instead of _replicator. It took several minutes before anything showed up in _scheduler/jobs, and I was confused that there was no way to see my configuration immediately. What is the expected behaviour when you create a replication via _replicate, and what is it when you write it to _replicator?

I would like the documentation to explain:

  1. that both of these exist, and which one is meant for which use case. (I don't know this myself; could you give me enough information to write a PR, please?)

  2. the expected behaviour, in the form: when you do X, you can see Y after Z has happened.
    (For example: when you write to _replicate, it takes at least x seconds and then you can find the results in _scheduler/jobs. I am not sure that is correct.)

If you could provide me some information I will gladly write a PR. :-)

@ufobat
Contributor Author

ufobat commented Oct 31, 2018

FYI

<vatamane> ufobat: try checking  _scheduler/docs/ instead
<vatamane> _replicate endpoint is to create replications that are not backed by a document from the _replicator database
<vatamane> also note there is  the http /_replicate endpoint and a _replicator database I think you might be confusing the two
<ufobat> vatamane, what if i only use the _replicator db?
<ufobat> i can POST and GET from there?
<ufobat> i just care about the configuration, not the replication results/states
<vatamane> you can create replications by creating documents in the _replicator database
<ufobat> vatamane, i actually did :-( 
<vatamane> these replication will persist even after a server is restarted (unlike replication from the _replicate endpoint)
<vatamane> to check the status of your replication try http://docs.couchdb.org/en/stable/api/server/common.html#scheduler-docs
<ufobat> because of my confusion i wrote https://github.com/apache/couchdb-documentation/issues/346
<vatamane> thanks ufobat, I'll update the ticket with more info
<ufobat> is the scheduler-doc information about which documents in which db were replicated?
<ufobat> my db is currently empty
<vatamane> _scheduler/docs allows you to track the state of replications which are backed by a document in a _replicator database
<ufobat> ah and _scheduler/jobs is for the _replicate api?
<vatamane> _scheduler/jobs tracks replication jobs that were started both from the _replicate endpoint and by creating documents in the _replicator db
<ufobat> and does this also apply if i have a "custom_replicator" database? the docs say i could use one (which would fit my script approach perfectly, because i wouldn't see other replications that don't belong to me)
<vatamane> you could use one, yeah, any database whose name ends with the /_replicator suffix will work as a replicator db
<vatamane> when you query _scheduler/docs you'd have to pass that database name in the path explicitly like say _scheduler/docs/mycustom/_replicator
<ufobat> i think that's missing in the documentation as well
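
A minimal Python sketch of the two path forms discussed above, assuming a custom replicator database with the hypothetical name mycustom/_replicator and a hypothetical document id my-job. Our understanding is that a "/" inside a database name has to be percent-encoded as %2F in ordinary document paths, while _scheduler/docs takes the name unencoded, as in vatamane's example; treat both as assumptions to verify against your CouchDB version.

```python
from urllib.parse import quote


def replicator_doc_path(replicator_db: str, doc_id: str) -> str:
    """Path of a replication document in a (possibly custom) replicator db.

    Assumption: a '/' inside a database name must be percent-encoded in
    ordinary document paths, e.g. mycustom/_replicator -> mycustom%2F_replicator.
    """
    return "/" + quote(replicator_db, safe="") + "/" + quote(doc_id, safe="")


def scheduler_docs_path(replicator_db: str) -> str:
    """Path for _scheduler/docs, with the replicator db name passed as-is,
    following the _scheduler/docs/mycustom/_replicator form from the chat."""
    if replicator_db != "_replicator" and not replicator_db.endswith("/_replicator"):
        raise ValueError("not a replicator database name: " + replicator_db)
    return "/_scheduler/docs/" + replicator_db


print(replicator_doc_path("mycustom/_replicator", "my-job"))  # prints /mycustom%2F_replicator/my-job
print(scheduler_docs_path("mycustom/_replicator"))  # prints /_scheduler/docs/mycustom/_replicator
```

The name check only mirrors the rule stated in the chat (the default _replicator, or any name ending in /_replicator); it is not a full validation of CouchDB database names.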

@nickva
Contributor

nickva commented Oct 31, 2018

Creating Replications

There are two ways to create replications:

  1. POST-ing to the /_replicate HTTP endpoint. Let's call these transient.

  2. Creating a document in the _replicator database. Any database whose name ends in /_replicator will work as a replicator database. Let's call these persistent.

The two differ slightly. Transient replications have no document backing the replication job: if the server crashes, the job simply disappears, and once the job finishes there is no way to query its state. This was the original way of replicating.

Persistent ones are backed by a document in a _replicator db. They will persist across server restarts and it is possible to inspect their state after they finished.

There are two because transient replications were implemented first; they were kept for backwards compatibility, and they are actually useful in some cases when creating replication jobs programmatically.
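
The difference between the two creation styles can be sketched in Python by building, but not sending, the two HTTP requests. This is a minimal illustration under assumptions: the server address, document id and database URLs are hypothetical, and only the basic source, target and continuous replication fields are used.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:5984"  # hypothetical server address


def replication_spec(source: str, target: str, continuous: bool = False) -> dict:
    # Fields shared by both styles: where to copy from, where to copy to,
    # and whether to keep following changes after the initial copy.
    return {"source": source, "target": target, "continuous": continuous}


def transient_request(source: str, target: str, continuous: bool = False) -> Request:
    """POST /_replicate: no backing document, so the job is lost on restart."""
    body = json.dumps(replication_spec(source, target, continuous)).encode()
    return Request(BASE_URL + "/_replicate", data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def persistent_request(doc_id: str, source: str, target: str,
                       continuous: bool = False,
                       replicator_db: str = "_replicator") -> Request:
    """PUT a document into a replicator db: it survives restarts and its
    state can still be queried after the replication finishes."""
    body = json.dumps(replication_spec(source, target, continuous)).encode()
    # Percent-encode the db name so a custom name like mycustom/_replicator
    # stays a single path segment (an assumption, see the chat above).
    path = "/" + quote(replicator_db, safe="") + "/" + quote(doc_id, safe="")
    return Request(BASE_URL + path, data=body,
                   headers={"Content-Type": "application/json"}, method="PUT")


t = transient_request("http://a:5984/src", "http://b:5984/dst")
p = persistent_request("my-job", "http://a:5984/src", "http://b:5984/dst", continuous=True)
print(t.get_method(), t.full_url)  # prints POST http://127.0.0.1:5984/_replicate
print(p.get_method(), p.full_url)  # prints PUT http://127.0.0.1:5984/_replicator/my-job
```

Actually sending either request (for example with urllib.request.urlopen) needs a reachable server and, typically, credentials, neither of which is shown here.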

Monitoring Replications

  1. Transient replications can be monitored via the /_active_tasks endpoint or the /_scheduler/jobs endpoint. It might take a few minutes between the time a replication is created and the time it appears as a replication job in these endpoints.

http://docs.couchdb.org/en/latest/api/server/common.html#scheduler-jobs

  2. Persistent replications can be monitored via the /_scheduler/docs endpoint as well as /_scheduler/jobs and /_active_tasks. /_scheduler/docs is preferred because it shows the state of the replication document before it becomes a replication job. Some documents may be invalid and never become replication jobs; others might be delayed because they are, say, fetching filter code from a slow source database.

http://docs.couchdb.org/en/latest/api/server/common.html#get--_scheduler-docs-replicator_db
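
Since /_scheduler/jobs lists both kinds of job, a script may want to tell them apart. The Python sketch below assumes, as we understand the response format, that jobs started via /_replicate report "doc_id": null because no document backs them; the sample response is hypothetical and heavily trimmed.

```python
def split_jobs(scheduler_jobs: dict) -> tuple:
    """Split a parsed /_scheduler/jobs response into transient and persistent jobs.

    Assumption: jobs started via /_replicate have no backing document, so
    they report "doc_id": null, which json.loads turns into None.
    """
    transient, persistent = [], []
    for job in scheduler_jobs.get("jobs", []):
        if job.get("doc_id") is None:
            transient.append(job["id"])
        else:
            persistent.append((job.get("database"), job["doc_id"]))
    return transient, persistent


# Hypothetical, heavily trimmed response body for illustration.
sample = {
    "jobs": [
        {"id": "a1b2+continuous", "database": None, "doc_id": None},
        {"id": "c3d4", "database": "mycustom/_replicator", "doc_id": "my-job"},
    ],
    "offset": 0,
    "total_rows": 2,
}
print(split_jobs(sample))
# prints (['a1b2+continuous'], [('mycustom/_replicator', 'my-job')])
```

As noted above, a freshly created replication may not show up in this list for a few minutes, so an empty result right after creation does not by itself mean something went wrong.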

Replication States

Replication documents become replication jobs, and replication jobs then do all the replication work. A replication goes through a number of states, so this chart might be helpful:

http://docs.couchdb.org/en/latest/replication/replicator.html#replication-states
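
A small Python sketch of how a script might summarize the states reported by /_scheduler/docs. It assumes that each entry carries doc_id and state fields, and it treats "completed" and "failed" as terminal states based on our reading of the chart above; both points are assumptions to check, and the sample data is hypothetical.

```python
# Assumption: "completed" and "failed" are the terminal states in the chart.
TERMINAL_STATES = {"completed", "failed"}


def docs_by_state(scheduler_docs: dict) -> dict:
    """Group doc_ids from a parsed /_scheduler/docs response by state."""
    grouped = {}
    for doc in scheduler_docs.get("docs", []):
        grouped.setdefault(doc["state"], []).append(doc["doc_id"])
    return grouped


def still_in_progress(scheduler_docs: dict) -> list:
    """doc_ids whose replication has not reached a terminal state yet."""
    return [d["doc_id"] for d in scheduler_docs.get("docs", [])
            if d["state"] not in TERMINAL_STATES]


# Hypothetical, trimmed response body for illustration.
sample = {"docs": [
    {"doc_id": "job-a", "state": "running"},
    {"doc_id": "job-b", "state": "completed"},
    {"doc_id": "job-c", "state": "error"},
]}
print(docs_by_state(sample))  # prints {'running': ['job-a'], 'completed': ['job-b'], 'error': ['job-c']}
print(still_in_progress(sample))  # prints ['job-a', 'job-c']
```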

@ufobat
Contributor Author

ufobat commented Oct 31, 2018

what about
{"id":"_design/_replicator","key":"_design/_replicator","value":{"rev":"1-85a961d0d9b235b7b4f07baed1a38fda"}}, which is a default document in the _replicator db?

@ufobat
Contributor Author

ufobat commented Oct 31, 2018

<vatamane> that's expected, it's automatically created and is used to validate replication documents created in that database
<rnewson> it enforces correctness of the docs, it's an internal detail really
<ufobat> i am just asking because when i do a get on _replicator/_all_docs i need to know that this is there in order to skip over it
<vatamane> you can skip over it
<ufobat> because "remove anything thats not my desired replication configuration" would be wrong
<ufobat> ty :)
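
The reconciliation step described in the issue (create what is missing, remove what is no longer wanted, skip _design/_replicator) can be sketched in Python as a pure function over a parsed _all_docs response. The desired ids and most revisions below are hypothetical; revisions are kept for deletions because CouchDB needs the current rev to delete a document (DELETE /{db}/{docid}?rev=...).

```python
def plan_changes(all_docs: dict, desired_ids: set) -> tuple:
    """Compare a parsed GET /_replicator/_all_docs response with the wanted doc ids.

    Returns (ids_to_create, {id_to_delete: rev}). Design documents such as
    _design/_replicator are skipped: they validate replication documents and
    should be left alone.
    """
    existing = {
        row["id"]: row["value"]["rev"]
        for row in all_docs.get("rows", [])
        if not row["id"].startswith("_design/")
    }
    to_create = sorted(d for d in desired_ids if d not in existing)
    to_delete = {doc_id: rev for doc_id, rev in existing.items()
                 if doc_id not in desired_ids}
    return to_create, to_delete


all_docs = {"rows": [
    {"id": "_design/_replicator", "key": "_design/_replicator",
     "value": {"rev": "1-85a961d0d9b235b7b4f07baed1a38fda"}},
    {"id": "keep-me", "key": "keep-me", "value": {"rev": "1-aaa"}},   # hypothetical
    {"id": "stale-job", "key": "stale-job", "value": {"rev": "2-bbb"}},  # hypothetical
]}
print(plan_changes(all_docs, {"keep-me", "new-job"}))
# prints (['new-job'], {'stale-job': '2-bbb'})
```

Filtering on the "_design/" prefix rather than the exact id also protects any other design documents someone may have added to a custom replicator database.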

@wohali
Member

wohali commented Oct 31, 2018

@ufobat do you have everything you need to write a PR?

@ufobat
Contributor Author

ufobat commented Oct 31, 2018

@wohali I think I do, thank you :-)

@ufobat
Contributor Author

ufobat commented Nov 12, 2018

I am not sure if and where I should mention the _design/_replicator document. Any ideas?

ufobat added a commit to ufobat/couchdb-documentation that referenced this issue Nov 12, 2018
ufobat added a commit to ufobat/couchdb-documentation that referenced this issue Nov 12, 2018
@wohali wohali closed this as completed in 6a29484 Nov 21, 2018