Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Replication KVO doesn't seem to work after calling "restart" on the replication. #164

Closed
tleyden opened this Issue · 22 comments

3 participants

@tleyden
Owner

Here's what I'm doing:

1) Kick off replication with:

    NSArray* repls = [self.database replicateWithURL:[self couchDBURL] exclusively:YES];
    serverFilterAndViewName = @"filters/filterByAllAssociates";
    [self.pull setFilter:serverFilterAndViewName];
    [self.pull restart];  

Replication KVO seems to work well so far..

2) Update the filter and restart the replication:

    serverFilterAndViewName = @"filters/filterByCurrentRetailerPlusAllAssociates";
    [self.pull setFilter:serverFilterAndViewName];
    NSString *retailerId = [[user retailerObject] identifier];
    NSDictionary *queryParams = [NSDictionary dictionaryWithObject:retailerId forKey:@"retailer"];
    [self.pull setQuery_params:queryParams];
    [self.pull restart];

Replication KVO seems to stop working now. (just the KVO, not the replication itself..)

I captured a full log of the app run with all tracing turned on: http://pastebin.ca/2206430

The new filter is added and the filter restarted when this log statement happens:

2012-09-21 11:54:31.229 Associate[17829:c07] Using pull replication filter: filters/filterByCurrentRetailerPlusAllAssociates

After this, I expected to receive KVO notifications with replication in the "active" state (which will cause "SGAssociateDashboardViewController#CouchDB sync mode changed" to print out along with replication state)

But the actual behavior was that it received a few KVO notifications, but both were in the "idle state". However I could see replication happening while looking at it in an HTTP proxy. And looking at the logs it seems like there are two replications,.. one idle and one active, but I'm not sure if I'm reading the logs correctly.

Version being tested: CouchCocoa commit: af9732 TouchDB commit: a4da9c (stable branch)

@snej
Owner

Looks like the CouchPersistentReplication is no longer recognizing itself in the server's _activity list — did it lose its _replication_id? After the document update that restarts it, the only activity changes it posts are

CouchPersistentReplication[731E..966A]: state := (null)
CouchPersistentReplication[731E..966A] = (null)

which implies it didn't see itself in the _activity list.

@tleyden
Owner

So you are saying the second element in the activity list:

2012-09-21 11:54:39.873 Associate[17829:c07] CouchServer: activeTasks = (
        {
        continuous = 1;
        source = signature;
        .... etc
    },
        {
        continuous = 1;
        progress = 0;
        source = "https://app6748255.heroku.cloudant.com/signature";
        status = "Processed 0 / 29 changes";
        target = signature;
        task = "9D817EE8-5C68-471E-8672-4610E8745D99";
        type = Replication;
        "x_active_requests" =         (
                        {
                URL = "https://app6748255.heroku.cloudant.com/signature/_all_docs?include_docs=true";
                method = POST;
            }
            ......... etc
        );
    }
)

is the object CouchPersistentReplication[731E..966A]?

What is the difference between the _replication_id and the task id? (eg, 9D81...5D99)

@tleyden
Owner

OK I think I see the answer to my last question in CouchPersistentReplication#observeValueForKeyPath. Looks like essentially it stores the _replication_id of the replication task as the "task" field in the server.activeTasks array.

@snej
Owner

Right — what's really going on is that a TDReplicator has a UUID, and that UUID is exposed in the _active_tasks dictionary and also as the _replication_id in the replication document.

The thing I don't understand is why the CouchPersistentReplication object can't find itself in the activeTasks anymore. Its replication ID hasn't changed.

@tleyden
Owner

OK I think I found the issue .. the CouchPersistentReplication is removing itself as KVO observer of the CouchServer activeTasks array.

I added the following debug statements to CouchPersistentReplication#setState:

- (void) setState:(CouchReplicationState)state {
    // Add/remove myself as an observer of the server's activeTasks:
    CouchServer* server = self.database.server;
    if (state == kReplicationTriggered) {
        if (_state != kReplicationTriggered) {
            COUCHLOG(@"%@: addObserver forKeyPath: activeTasks", self);

            [server addObserver: self forKeyPath: @"activeTasks"
                        options:0 context: NULL];
        }
    } else {
        if (_state == kReplicationTriggered) {
            COUCHLOG(@"%@: removeObserver forKeyPath: activeTasks", self);

            [server removeObserver: self forKeyPath: @"activeTasks"];
            [self setStatusString: nil];
        }
    }
    _state = state;
}

and when I re-ran the test, saw this debugging output:

2012-09-21 15:58:24.523 Associate[61903:c07] CouchPersistentReplication[271C..66A6]: state := (null)
2012-09-21 15:58:24.523 Associate[61903:c07] CouchPersistentReplication[271C..66A6]: removeObserver forKeyPath: activeTasks
2012-09-21 15:58:24.523 Associate[61903:c07] CouchPersistentReplication[271C..66A6] = (null)

and after that, it no longer has its observeValueForKeyPath callback called, and therefore does not update itself with changes happening in activeTasks.

@snej
Owner

the CouchPersistentReplication is removing itself as KVO observer of the CouchServer activeTasks array

Oh yeah, that would do it. Duh. I just didn't see that this would happen indirectly as a side effect of the state changing. The question now is why the other side of that method, where it calls addObserver:, isn't being called. Looks like the state property isn't changing back? Could you investigate?

@tleyden
Owner

Yeah looking into it.

@tleyden
Owner

Looks like setState is never called again after this:

2012-09-21 16:28:49.539 Associate[65366:c07] CouchPersistentReplication[DDBF..6299]: state := (null)
2012-09-21 16:28:49.540 Associate[65366:c07] CouchPersistentReplication[DDBF..6299]: setState called with: 0.  current state: 1
2012-09-21 16:28:49.540 Associate[65366:c07] CouchPersistentReplication[DDBF..6299]: removeObserver forKeyPath: activeTasks
2012-09-21 16:28:49.540 Associate[65366:c07] CouchPersistentReplication[DDBF..6299] = (null)

So next.. I'll try to figure out what is calling setState and why it's not being called. Any hints?

Have to run, will try to pick this up later..

@snej
Owner

setState gets called from -didLoadFromDocument. The idea is that when there's a change to the document, it looks at the _replication_state property and maps that into a CouchReplicationState enum.

So it makes sense that deleting the _replication_state property, which is what -restart does, would trigger the state changing to idle. But it should then immediately thereafter change back to triggered because the ReplicatorManager in TouchDB will set the _replication_state again when it starts the new replication. (Or at least it should...)

@tleyden
Owner

OK makes perfect sense. Digging into it.

@tleyden
Owner

Pretty sure I've pinpointed the problem: in CouchPersistentReplication#didLoadFromDocument calling

[self getValueOfProperty: @"_replication_state"] 

seems to be giving stale values.

Evidence of that.. here is where the ReplicationManager sets the state to completed as part of the restart sequence:

10:25:58.770| Sync: <TDReplicatorManager: 0xaea7dd0> updateDoc:forReplicator called
10:25:58.770| Sync: <TDReplicatorManager: 0xaea7dd0> state is: completed
2012-09-22 10:25:58.770 Associate[18369:c07] unmonitorCouchReplicationStatus
10:25:58.770| Sync: <TDReplicatorManager: 0xaea7dd0> set _replication_state to: completed
10:25:58.771| Sync: ReplicatorManager: Updating {C723C123-86E2-49E0-A05E-842C922ACBA6 #3-ef64c4d2786f5ecbf7cd66677fe7acf9} with {
    "_replication_id" = "EDA28372-E636-49EE-996B-48ACC0A8295B";
    "_replication_state" = completed;
    "_replication_state_time" = 1348334758;
}

but when the CouchPersistentReplication's didLoadDocument method is called back, here's what happens:

2012-09-22 10:25:58.853 Associate[18369:c07] CouchPersistentReplication[C723..CBA6] External change (rev=4-a3fc64bff311d50a0f5b9358edcf9bf8)
2012-09-22 10:25:58.854 Associate[18369:c07] CouchPersistentReplication[C723..CBA6]: didLoadFromDocument
2012-09-22 10:25:58.854 Associate[18369:c07] CouchPersistentReplication[C723..CBA6]: stateStr: (null)
2012-09-22 10:25:58.854 Associate[18369:c07] CouchPersistentReplication[C723..CBA6]: state == NSNotFound, state = kReplicationIdle
2012-09-22 10:25:58.855 Associate[18369:c07] CouchPersistentReplication[C723..CBA6]: self.document: CouchDocument[C723..CBA6]
2012-09-22 10:25:58.860 Associate[18369:c07] CouchPersistentReplication[C723..CBA6]: self.document.properties: {
    "_id" = "C723C123-86E2-49E0-A05E-842C922ACBA6";
    "_replication_id" = "EDA28372-E636-49EE-996B-48ACC0A8295B";
    "_replication_state" = completed;
    "_replication_state_time" = 1348334758;
    "_rev" = "4-a3fc64bff311d50a0f5b9358edcf9bf8";
    continuous = 1;
    filter = "filters/filterByCurrentRetailerPlusAllAssociates";
    "query_params" =     {
        retailer = "043FA364-9CF6-4FEA-A11F-CB565B7D1375";
    };
    source = "https://app6748255.heroku.cloudant.com/signature";
    target = signature;
}

Even though the underlying document has _replication_state == completed, the call to:

[self getValueOfProperty: @"_replication_state"] 

is returning null, which is what is causing the issue from what I can tell. Later when the ReplicationManager updates the state to Idle, the same thing happens and so the CouchPersistentReplication object does not see the new state, and addObserver is never called.

NOTE: I added this log line to print out the stateStr: COUCHLOG(@"%@: stateStr: %@", self, stateStr); directly after the call to getValueOfProperty

The full logs for that run is here: http://filebin.ca/GbvRThD4wfS

I tried the following workaround in CouchPersistentReplication#didLoadFromDocument:

// Bug workaround attempt: https://github.com/couchbaselabs/TouchDB-iOS/issues/164
// NSString* stateStr = [self getValueOfProperty: @"_replication_state"];
NSString* stateStr = [[[self document] properties] objectForKey:@"_replication_state"];

and it seems to have fixed the original issue I was seeing. (Logs after workaround: http://filebin.ca/Gc1G5d8AHTr)

The real fix is probably related to figuring out why CouchModel#getValueOfProperty is giving stale values from its underlying document.

@snej
Owner

The real fix is probably related to figuring out why CouchModel#getValueOfProperty is giving stale values

CouchModel caches property values, but the cache is invalidated when -couchDocumentChanged: is called (by the document when it's notified of a change via the _changes feed.)

I think the problem has to do with the timing of the -observeValueForKeyPath: call that's asking for the _replication_state. It may be occurring before the model gets the -couchDocumentChanged: notification. Could you look into that?

@tleyden
Owner

Sure.

Actually that sounds likely given the following observation I neglected to mention earlier: the bug doesn't manifest itself every single time. So it looks like there might be some sort of race condition involved.

@tleyden
Owner

I looked over the logs and there is a clear pattern.

When the bug is observed, the logs happen in this order:

2012-09-22 10:25:58.853 Associate[18369:c07] CouchPersistentReplication[C723..CBA6] External change (rev=4-a3fc64bff311d50a0f5b9358edcf9bf8)

2012-09-22 10:25:58.867 Associate[18369:c07] CouchPersistentReplication[C723..CBA6] External change (rev=5-82c6e712a7863ff094903ac7bb4a72f4)

2012-09-22 10:25:58.926 Associate[18369:c07] CouchPersistentReplication[BE10..0BF4]: observeValueForKeyPath called, keyPath: activeTasks

OTOH when the bug is not observed:

2012-09-22 10:20:06.876 Associate[17260:c07] CouchPersistentReplication[A619..4D8B]: observeValueForKeyPath called, keyPath: activeTasks

2012-09-22 10:20:06.882 Associate[17260:c07] CouchPersistentReplication[CA4A..F4C4] External change (rev=4-d9e122ba96bf7154b43324b3a548ea48)

2012-09-22 10:20:06.892 Associate[17260:c07] CouchPersistentReplication[CA4A..F4C4] External change (rev=5-3111f56aec35b05c4bac57093c6f1f1c)

(I checked four log files, two of each case, and it was consistent)

I think this is the opposite of what you expected right? I double checked these, and it looks accurate.

I'm a little confused though, why does the observeValueForKeyPath even come into play? To me it seems like the bug is related to CouchPersistentReplication#didLoadFromDocument and CouchModel#couchDocumentChanged. Eg, in CouchPersistentReplication#didLoadFromDocument

[self getValueOfProperty: @"_replication_state"]

is returning a different value than

[[[self document] properties] objectForKey:@"_replication_state"];

which seems like there is a bug in the way the cache is being invalidated. Maybe somewhere in CouchModel#couchDocumentChanged, it's not invalidating the right values in the cache? Actually this code block in CouchModel#couchDocumentChanged is confusing to me..

// Remove unchanged cached values in _properties:
if (_changedNames && _properties) {
    NSMutableSet* removeKeys = [NSMutableSet setWithArray: [_properties allKeys]];
    [removeKeys minusSet: _changedNames];
    [_properties removeObjectsForKeys: removeKeys.allObjects];
} else {
    [_properties release];
    _properties = nil;
}

shouldn't it be preserving unchanged values and removing (invalidating) values that have changed?

@tleyden
Owner

Not sure if this has anything to do with anything, but I noticed that when I call:

[[self class] propertyNames]

from within CouchPersistentReplication, it does not contain _replication_state

@snej
Owner

it does not contain _replication_state

That's correct, because it's not declared as an @property. It's still available by accessing the document properties, though.

@tleyden
Owner

Just an update on this issue .. we are using the workaround on our fork which has been working out great.

I did go ahead and try to retest the latest versions of TouchDB (e0daa4a) and CouchCocoa (896a259) and was still able to reproduce the issue.

Side note: saw the issue while replicating with both Iris and Cloudant.

@tleyden
Owner

I believe I've narrowed down the issue further.

I tried creating a unit test, but didLoadFromDocument is not getting called back after the save operation, I guess maybe some of the change tracker machinery not setup properly. Planning to post to the google group with a question about that.

Here's what seems to be going on:

When the replication is restarted, the CouchPersistentReplication calls

[self setValue: nil ofProperty: @"_replication_state"];

In the logs, this happens here:

2012-10-18 07:45:57.162 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] cacheValue:ofProperty: called with property: _replication_state changed: 1
2012-10-18 07:45:57.164 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] backtrace: (
0   Associate                           0x001defd5 -[CouchModel cacheValue:ofProperty:changed:] + 229
1   Associate                           0x001df794 -[CouchModel setValue:ofProperty:] + 420
2   Associate                           0x001e2c5d -[CouchPersistentReplication restartWithRetries:] + 125

Then save is called here:

2012-10-18 07:45:57.166 Associate[47089:c07] CouchPersistentReplication[F44B..0210] Save called _changedNames: {(
"_replication_state",
filter,
"query_params"
)}

The save is then aborted for some reason:

2012-10-18 07:45:57.261 Associate[47089:c07] CouchPersistentReplication[F44B..0210] !_needsSave || (!_changedNames && !_changedAttachments, aborting save

Then save is called again, and aborted again!

2012-10-18 07:45:57.261 Associate[47089:c07] CouchPersistentReplication[F44B..0210] Save called _changedNames: {(
"_replication_state",
filter,
"query_params"
)}
2012-10-18 07:45:57.261 Associate[47089:c07] CouchPersistentReplication[F44B..0210] !_needsSave || (!_changedNames && !_changedAttachments, aborting save

Then later.. [CouchPersistentReplication didLoadFromDocument] is called back:

2012-10-18 07:45:57.451 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] backtrace: (
0   Associate                           0x001df567 -[CouchModel getValueOfProperty:] + 903
1   Associate                           0x001e2623 -[CouchPersistentReplication didLoadFromDocument] + 291
2   Associate                           0x001dd70a -[CouchModel couchDocumentChanged:] + 1322
3   Associate                           0x001bea76 -[CouchDocument notifyChanged:] + 774
4   Associate                           0x001d5d68 -[CouchDatabase changeTrackerReceivedChange:] + 568
5   Associate                           0x001d60b6 -[CouchDatabase processDeferredChanges] + 406
6   Associate                           0x001d4d83 -[CouchDatabase endDocumentOperation:] + 451
7   Associate                           0x001bf07a -[CouchDocument operation:willCompleteWithError:] + 874
8   Associate                           0x001ccd87 -[RESTOperation completedWithError:] + 919

The code in [CouchModel getValueOfProperty:] is called and returns null:

2012-10-18 07:45:57.450 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] value || [_changedNames containsObject: property].  value: (null), _changedNames: {(
"_replication_state",
filter,
"query_params"
)} _document.properties: {
"_id" = "F44BDAEF-6F3E-48B4-94C2-5E959C870210";
"_replication_id" = "E07C9630-B0B4-4183-9D67-79EF2906BE74";
"_replication_state" = completed;
"_replication_state_time" = 1350571557;
"_rev" = "4-6e8cd4063fceb2a559bf748e4f79e23e";
continuous = 1;
filter = "filters/filterByCurrentRetailerPlusAllAssociates";
"query_params" =     {
    retailer = "043FA364-9CF6-4FEA-A11F-CB565B7D1375";
};
source = "http://127.0.0.1:5984/signature";
target = signature;
}
2012-10-18 07:45:57.451 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] backtrace: [snip..]
2012-10-18 07:45:57.452 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] returning value: (null)
2012-10-18 07:45:57.452 Associate[47089:c07] CouchPersistentReplication[F44B..0210]: stateStr: (null)

My interpretation of what's happening in the call to [CouchModel getValueOfProperty:]:

  • Tries to look for the property in [self _properties], apparently does not find it there

  • Since _changedNames containsObject: _replication_state, it stops looking, rather than looking for it in the document properties via [_document propertyForKey: @"_replication_state"]

  • Returns null for that property (erroneously)

Possible Fix: since _replication_state is not even a model property, should it even be added to _changedNames? Maybe the fix is to just add a check so that before adding something to _changedNames, make sure it is a valid property of the model, otherwise don't add it. Any thoughts?

Full logs here

@tleyden
Owner

Forgot to mention a few things:

@snej
Owner

The save is then aborted for some reason:

That sounds like the bug. Could you step into that -setValue:ofProperty: call and see what's going on? It should call into -cacheValue:ofProperty:, which will initialize _changedNames, and then -markNeedsSave, which will set _needsSave. If those get set then the -save call will work.

Just had a thought: what might be happening is that after this change but before the autosave occurs, TouchDB may update the document with progress info. That would force CouchModel to reload it. CouchModel doesn't have any conflict handling yet, so I think the unsaved changes will get lost. Let me know if that's what's happening.

@tleyden
Owner

It should call into -cacheValue:ofProperty:, which will initialize _changedNames, and then -markNeedsSave, which will set _needsSave. If those get set then the -save call will work.

From the logs it is definitely seems to be calling -cacheValue:ofProperty:

2012-10-18 07:45:57.162 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] cacheValue:ofProperty: called with property: _replication_state changed: 1
2012-10-18 07:45:57.164 Associate[47089:c07] self=CouchPersistentReplication[F44B..0210] backtrace: (
0   Associate                           0x001defd5 -[CouchModel cacheValue:ofProperty:changed:] + 229
1   Associate                           0x001df794 -[CouchModel setValue:ofProperty:] + 420
2   Associate                           0x001e2c5d -[CouchPersistentReplication restartWithRetries:] + 125

so I'm going to assume _needsSave is being set.

Just had a thought: what might be happening is that after this change but before the autosave occurs, TouchDB may update the document with progress info. That would force CouchModel to reload it. CouchModel doesn't have any conflict handling yet, so I think the unsaved changes will get lost. Let me know if that's what's happening.

What's the best way to check whether this is happening or not?

@tleyden
Owner

Just had a thought: what might be happening is that after this change but before the autosave occurs, TouchDB may update the document with progress info.

Looks like there is a debug println statement whenever CouchPersistentReplication#didLoadFromDocument is called, and I checked the logs and verified that this does not appear after:

2012-10-18 07:45:57.166 Associate[47089:c07] CouchPersistentReplication[F44B..0210] Save called _changedNames: {(
"_replication_state",
filter, 
"query_params"
)}

but before the save is aborted here:

2012-10-18 07:45:57.261 Associate[47089:c07] CouchPersistentReplication[F44B..0210] !_needsSave || (!_changedNames && !_changedAttachments, aborting save

It does appear right after the save is aborted though:

2012-10-18 07:45:57.261 Associate[47089:c07] CouchPersistentReplication[F44B..0210] !_needsSave || (!_changedNames && !_changedAttachments, aborting save
2012-10-18 07:45:57.261 Associate[47089:c07] <<<<<< 0 docs being updated
2012-10-18 07:45:57.262 Associate[47089:c07] **** CHANGE #6: CouchDocument[F44B..0210]  3-b1e587845950ba67298d21da58141a21 -> 4-6e8cd4063fceb2a559bf748e4f79e23e
2012-10-18 07:45:57.262 Associate[47089:c07] CouchPersistentReplication[F44B..0210] External change (rev=4-6e8cd4063fceb2a559bf748e4f79e23e)
2012-10-18 07:45:57.263 Associate[47089:c07] CouchPersistentReplication[F44B..0210]: didLoadFromDocument

which seems a little odd.

@jchris jchris closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.