Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the user data export archive format. #6726

Merged
merged 10 commits into from Aug 10, 2017

Conversation

Projects
None yet
7 participants
@cmrd-senya
Copy link
Member

commented Feb 29, 2016

This PR introduces changes to the user data export archive format.

Names of some parameters are changed to be less diaspora-specific. This was originally proposed by @jaywink in order to implement a common data format for The Federation at whole. Also the commit adds the archive format json schema. ATM it is used in automatic tests only, but in future it will also be used to validate incoming archives.

fixes #6138
fixes #6780

Gemfile Outdated
@@ -19,6 +19,7 @@ gem "diaspora_federation-rails", "0.0.12"
gem "acts_as_api", "0.4.2"
gem "json", "1.8.3"
gem "json-schema", "2.6.0"
gem "json-schema-rspec", "0.0.4"

This comment has been minimized.

Copy link
@denschub

denschub Feb 29, 2016

Member

That gem is not needed in production.

@@ -0,0 +1,203 @@
{

This comment has been minimized.

Copy link
@denschub

denschub Feb 29, 2016

Member

I wonder if we want to have the schemas inside the assets folder, since it's technically not a frontend asset. Do you have a better idea?

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Feb 29, 2016

Author Member

How about app/schemas then?

"id": "https://the-federation.info/specs/backup-restore#archive_format/user/posts/131/public",
"type": "boolean"
},
"thefederation_uid": {

This comment has been minimized.

Copy link
@denschub

denschub Feb 29, 2016

Member

Maybe not for discussion here, but I find that name rather weird. I'd prefer something more neutral like postIdentifier or similar. What do you think?

"id": "https://the-federation.info/specs/backup-restore#archive_format/user/contacts/185/receiving",
"type": "boolean"
},
"person_thefederation_uid": {

This comment has been minimized.

Copy link
@denschub

denschub Feb 29, 2016

Member

Maybe not for discussion here, but I find that name rather weird. I'd prefer something more neutral like accountIdentifier or similar. What do you think?

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Feb 29, 2016

Author Member

I think "account identifier" is fine. But the snake_case is used in this document, so it probably should be account_identifier or account_id.

This comment has been minimized.

Copy link
@denschub

denschub Feb 29, 2016

Member

Agreed. Was just brianstorming there on the go.

@denschub

This comment has been minimized.

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch 2 times, most recently from 20a240b to c2380f6 Feb 29, 2016

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Mar 3, 2016

I would like to link #6138 which I've just noticed because it's related to the export archive.

I haven't checked it closely, but probably it's a good idea to take into account the #6138 in that PR as well. Also because the point of this PR is to make data export more platform agnostic and the issue is from @redmatrix.

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Mar 3, 2016

BTW, the tests were fixed after rebasing on the newer develop.

@denschub

This comment has been minimized.

Copy link
Member

commented Mar 3, 2016

Actually, the tests were failing because of the leap day... I'll submit a PR to fix this very edgy edge case soon and maybe even write a blog post about it, since that was pretty funny...

@cmrd-senya cmrd-senya changed the title Update the user data export archive format. [WIP] Update the user data export archive format. Mar 4, 2016

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from c2380f6 to c22d214 Mar 6, 2016

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Mar 6, 2016

c22d214 fixes #6138.

Maybe add more tests?

@cmrd-senya cmrd-senya changed the title [WIP] Update the user data export archive format. Update the user data export archive format. Mar 6, 2016

@@ -0,0 +1,222 @@
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "https://the-federation.info/specs/backup-restore#archive_format",

This comment has been minimized.

Copy link
@denschub

denschub Apr 21, 2016

Member

So, what's the status of this document? Is it even maintained? I see a lot of opened issues in the gitlab and nobody seems to care.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya May 24, 2016

Author Member

This document is likely to be discontinued and the ID should be changed. But I'm not sure how to make a new one.

This comment has been minimized.

Copy link
@jhass

jhass May 24, 2016

Member

One should host the exact document at that URL. We can host stuff on diaspora.software, perhaps a hostname pointing to some Github pages repo or just on the diasporafoundation.org server.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya May 24, 2016

Author Member

What document is expected to be at the URL? Does it have to be something official? I'm writing something like an implementation plan&details for the account migration. But that though related doesn't specify the exact format of the archive (but rather relies on the schema). Do I need to have a separate document defining the archive format? To me it seemed, that I could avoid having a document for the archive specification since there is a schema (it's kinda self-documenting).

This comment has been minimized.

Copy link
@jhass

jhass May 24, 2016

Member

The very exact schema file referencing it.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya May 24, 2016

Author Member

Ah, I see, thanks for clarification

@@ -17,8 +17,13 @@ class UserSerializer < ActiveModel::Serializer
has_many :comments, each_serializer: Export::CommentSerializer

def comments
object.person.comments
Comment

This comment has been minimized.

Copy link
@denschub

denschub Apr 21, 2016

Member

Why should we export of other users if they've been made to the export users posts?

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Apr 21, 2016

Author Member

I guess it should be discussed at #6138

This comment has been minimized.

Copy link
@denschub

denschub May 6, 2016

Member

Agreed, so please don't change it here unless we're sure what we want to do. This would export other peoples comments below limited posts as well and I don't feel comfortable with exporting them unless it's clear that everybody wants that. Maybe we should move this onto Loomio since I really feel like this could violate other users data safety.

@denschub

This comment has been minimized.

Copy link
Member

commented May 6, 2016

I'm somewhat worried about the lifetime/maintenance of the spec you're using and you did not comment on that one so far. Also, please notice my latest comment. Other than that, it looks nice!

Sorry for the long delay, but I've been incredibly busy. :)

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented May 10, 2016

Yep, I still have to update the spec to some more complete form. I guess, #6138 must be decided before, since it affects the data format as well.

@cmrd-senya cmrd-senya changed the title Update the user data export archive format. [WIP] Update the user data export archive format. May 25, 2016

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from c22d214 to bc9b1dd May 27, 2016

@@ -6,22 +6,23 @@ module Diaspora

class Exporter

SERIALIZED_VERSION = '1.0'
SERIALIZED_VERSION = "1.1"

This comment has been minimized.

Copy link
@diaspora-code-review

diaspora-code-review May 27, 2016

Freeze mutable objects assigned to constants.

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from 76eb76e to 8cb02ce Jan 7, 2017

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from 8cb02ce to bafc0e6 Apr 2, 2017

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from bafc0e6 to 72133a0 Apr 13, 2017

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch 2 times, most recently from 8453730 to 1f04adb Apr 24, 2017

cmrd-senya added some commits Apr 2, 2017

New Exporter::OthersRelayables class
This class implements methods that allow to query relayables (comments, likes, participations,
poll_participations) of other people for posts of the given person.
Refactor account deletion spec
This commit refactors account deletion spec by moving data creation to a helper
object DataGenerator.

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from 2a97a49 to a07f282 Aug 9, 2017

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2017

I squashed and rebased everything that was done before your notes. For your latest notes I added changes as fixups so you could check what I changed exactly.

@SuperTux88

This comment has been minimized.

Copy link
Member

commented Aug 9, 2017

fixups look good 👍

One little thing: I try to start the commit message with a capital letter, but at least do it the same for every commit (New Exporter::OthersRelayables class and next commit new Exporter::NonContactAuthors class). So since you need to rebase again anyway, can you please change this commit message and also write Bump with a capital letter.

I'll do a complete review now again, because I only viewed the diffs until now, but you can already rebase and squash, so I can merge it afterwards (I expect that everything will be fine ;) )

cmrd-senya added some commits Apr 2, 2017

New Exporter::NonContactAuthors class
This class is capable of quering a list of people from authors of given posts that are non-contacts of a given
user.

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from a07f282 to 9be0581 Aug 9, 2017

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2017

Done

@@ -1,12 +1,28 @@
module Export
class ContactSerializer < ActiveModel::Serializer
attributes :sharing,
:receiving,
:following,

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 9, 2017

Member

I just realized, that you renamed that, probably because it is also named following in the federation gem. But these are different things:

  • sharing: I share with that person and the person can view my private profile, send me private messages, ...
  • following: I have this persons posts in my stream (currently always true for diaspora, the plan with adding sharing and following to the contact entity in the protocol was, that you can only follow the posts, without sharing your information, or you maybe want to share with the person, but this person posts much, so you don't want that in your stream)
  • receiving: The other person shares with me (other direction of sharing)

So the diaspora DB currently only has sharing and receiving.

I think we should rename following back to receiving here. And we can add a new following that's hardcoded true here (because for data currently exported it is true).

We can also add a subscribed that is always true, for the other direction of following (so when the other person subscribed to us). We can then replace the hardcoded true with the correct value when the diaspora DB supports following and subscribed.

And we should definitely document that somewhere, what sharing, following, receiving and subscribed means.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Aug 9, 2017

Author Member

Do you think we should federate this subscribed info? What user see in his stream feels like a local setting, rather than network-wise. So sharing/receiving is a representation of exchange relationships in federation, while following is just a setting of composing the stream. We can include following since user will want to preserve this through migration, while subsribed is something meaningful only for the remote person, not for the local one. Right?

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 9, 2017

Member

No, when you can follow without sharing (for example follow a newsfeed, but don't want to share the profile to the feed), then you need the information federated. So when following in the contact message is true, you need to send all (public) posts to this user. That's why we currently federate always true for following.

So in the export we need sharing and following for the users setting of the contact, and receiving and subscribed for if we are allowed to display the profile, and send private messages, and if we need to send new posts to that contact.

So we should export sharing and receiving from the database, and export following with the same value as sharing and subscribed with the same value as receiving (instead of always true what I wrote above, because it's wrong). That represents the way how diaspora currently works and also what we currently federate with the contact message.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Aug 9, 2017

Author Member

Okay, so in terms of the federation:
sharing - I send limited posts and profile to the contact
following - I request contact to send me public posts
receiving - Contact sends me limited posts and profile (when subscribed is implemented that is only needed for the green check mark on profile view page actually)
subscribed - I send public posts to the contact

Actually this vocabulary feels quite confusing. It's not clear who is following who and who is subscribed for what out of this words. following and subscribed are synonyms and if we just swap these two names it won't look like anything changed at all.

How about using the word followed instead of subscribed? It's better because:

  1. It's obvious that it's an opposite of following this way
  2. It gives grammatical hint: "I am following" -> I read public posts. "I am followed" -> I send public posts

So then we have a simple relation:
sharing - I send private messages
receiving - I receive private messages
followed - I send public messages (I am followed)
following - I receive public message

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 9, 2017

Member

Yes, that makes sense 👍 (nevertheless we need to document that somewhere, but that's out of scope for this PR).

(for federation it's clear, because it's only what the sender does, but for export we need to export both sides, so following and followed makes it clearer :) )

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 10, 2017

Member

What just came to my mind: We shouldn't trust receiving when importing (so that is something we need to remember when adding the import functionality), because otherwise it is a privacy issue, when somebody adds contacts with receiving true to their export. But we still need it in the export (so the new format is OK), because we need to send the account_migration message to all contacts, even when only receiving is true.

For the import, the contact should resend a contact message when receiving the account_migration message, so we can recreate the contact with receiving = true. That's something that can already be added to #6750 at the end of AccountMigration.perform!: If new person is remote, then send new contact for all users who are sharing with the old person.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Aug 11, 2017

Author Member

Could you please decribe in more details what is the issue with privacy when someone sets receiving to true before import? I didn't quite get it.

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 11, 2017

Member

When one sets receiving to true, one could see the private profile of the person (when it's already on the pod for another contact), even when the person never shared the profile with the importing user. And the only way we can be sure one is allowed to see the private profile of a person is, when we ask the person, and not trust the person who says "I'm allowed to see that, trust me!"

So we need it in the export, so we can send the account_migration to them, but we should import it with receiving set to false until the person sends us a contact message again, as response to the account_migration message.

This comment has been minimized.

Copy link
@cmrd-senya

cmrd-senya Aug 11, 2017

Author Member

Yes, actually we need some mechanism to sync data between pods, since it is currenlty possible to have non-synchronized contact states accross pods. If one pod misses Contact message and then sends Contact message itself for the same users , then this pod will see only sharing=true with receiving=false, while actually it is receiving=true also. What do you think, maybe we should implement fetching for Contacts instead? So that it was possible to update Contact state on demand.

Also imagine that we import old archive, and Contact state has changed in the federation. Then it is possible that we have sharing=true, receiving=false in the archive, when the remote pod has sharing=false, receiving=false. Just setting sharing to true would mean that we make the state non-synchronized. So generally we should request all the contacts for the migrating user, not only for receiving=true. But also it means that we can't just create them as is in case of mismatch. Because if user has deleted the contact but imports the old archive it doesn't mean he wants to create the contact again. So we should make some UI, that will report the user that there are some mismatches between you and the remote contact and let the user decide what to do: add contact or remove it again (and resend Contact if the state has changed).

Also, since it is possible, that this Contact state mismatch happens during the normal operation, not only the migration, we allow that procedure to be called on demand (at least by a podmin). Therefore I think that fetching of Contacts from the remote pod for this purposes is a better way than using push model.

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 11, 2017

Member

Fetching would need access control, currently it's only possible to fetch public stuff in the protocol. So that's totally out of scope here (and for 0.7.0.0).

We can ask the user which contacts to add again on import, that would fix changed sharing states since the backup was created. We can then send new contact messages for all contacts from the archive with sharing/following to true or false, depending on what the user selected. But that's something for when the import is done, not now.

The only thing we need to do now (for 0.7.0.0) is send contact messages again after receiving a account_migration for all contacts still sharing=true with the migrated user to update the receiving=true on the imported users contact. That should be done as part of #6750.

:person_first_name,
:person_diaspora_handle
:account_id,
:serialized_public_key

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 9, 2017

Member

Oh, another little thing: You renamed serialized_private_key to private_key for the users own private key. So I think this should be renamed to just public_key.

class PersonMetadataSerializer < ActiveModel::Serializer
attributes :guid,
:account_id,
:serialized_public_key

This comment has been minimized.

Copy link
@SuperTux88

SuperTux88 Aug 9, 2017

Member

And here too: public_key

@SuperTux88

This comment has been minimized.

Copy link
Member

commented Aug 9, 2017

Looks good 👍 I found still two things that I missed when only reviewing the single commits. But they are only a naming issues with the export-format (which should be fixed before "releasing" the 2.0 format), so that should be fixable quick :) I commented it as single comments (so you don't need to wait until I completed the whole PR, but I'm finished now), so you probably already saw them.

And I promise that it's ready to merge this time after you fixed these two (three) things :)

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2017

I added a diff for your notes. I didn't add a "descirption" since I don't have time for that right now. Also I didn't run all the tests locally, so maybe I missed something.

@SuperTux88

This comment has been minimized.

Copy link
Member

commented Aug 9, 2017

Thanks 👍 The diff looks good, so I approved the PR. Let's see what travis says (travis is currently busy anyway :/ ), so if it's green, I'll merge it. (If you want, you can squash it, if not, I'll do that before merging).

And we can do the documentation later, that's not a blocker for 0.7.0.0 :)

cmrd-senya added some commits Apr 24, 2017

Update the user data export archive format.
This commit introduces changes to the user data export archive format.
This extends data set which is included in the archive. This data can be
then imported to other pods when this feature is implemented.

Also the commit adds the archive format json schema. ATM it is used in
automatic tests only, but in future it will also be used to validate
incoming archives.
Move misc_spec.rb to spec/spec
spec/spec is a new folder for "tests for tests"
Add a little cuke test for profile export
That should increase test coverage percentage
Fix Person.in_aspects scope multiple return
Fix Person.in_aspects scope to return each person only once when the
person is in multiple aspects.

@cmrd-senya cmrd-senya force-pushed the cmrd-senya:archive-schema branch from e2742b7 to 9bcdc90 Aug 10, 2017

@cmrd-senya

This comment has been minimized.

Copy link
Member Author

commented Aug 10, 2017

Squashed!

@SuperTux88 SuperTux88 merged commit 9bcdc90 into diaspora:develop Aug 10, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
coverage/coveralls Coverage increased (+0.9%) to 85.826%
Details

@SuperTux88 SuperTux88 removed this from Waiting on contributor in Pull Requests Aug 10, 2017

SuperTux88 added a commit that referenced this pull request Aug 10, 2017

Merge pull request #6726 from cmrd-senya/archive-schema
Update the user data export archive format.
@SuperTux88

This comment has been minimized.

Copy link
Member

commented Aug 10, 2017

Merged, thanks for your work! ❤️

@cmrd-senya cmrd-senya deleted the cmrd-senya:archive-schema branch Aug 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.