Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional validation on updating social fields via API v2 [PLAT-925] #8551

Merged

Conversation

erinspace
Copy link
Member

@erinspace erinspace commented Jul 17, 2018

Purpose

One could update social fields via API v2 after the social fields were added. This adds additional validation using jsonschema, and allows fields that aren't currently allowed for in the model to be sent and ignored.

This updates the format a bit, migrates old data to match that format, and makes sure that payloads to set the data conform to that spec.

Changes

  • add jsonschema document for the social fields
  • use this to validate data being sent to the API
  • add a validate method on the serializer to process the data
  • Social fields that can be lists are now stored as lists, and those that are singular are stored as strings

QA Notes

This would only require API testing, and shouldn't be very different from what exists already anyhow! Should be all good, and will be tested by Embosf when they use this feature for the social page!

Documentation

CenterForOpenScience/developer.osf.io#23

Side Effects

None anticipated

Ticket

https://openscience.atlassian.net/browse/PLAT-925

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome @erinspace

@@ -0,0 +1,108 @@
# -*- coding: utf-8 -*-

def get_user_social_jsonschema():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice schema 👍

@@ -128,20 +130,31 @@ def profile_image_url(self, user):
size = self.context['request'].query_params.get('profile_image_size')
return user.profile_image_url(size=size)

def validate_social(self, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot you could do this! .validate_<field_name>

@@ -0,0 +1,108 @@
# -*- coding: utf-8 -*-

def get_user_social_jsonschema():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erinspace, @Johnetordoff's PR adds validation schemas for employment and education information and implements a from_json method for loading the correct schema. Would you mind mimicking that approach in this PR?

@erinspace erinspace force-pushed the feature/social_api_updates branch 2 times, most recently from 1ec8719 to 4830724 Compare July 26, 2018 16:26

social_dict = {}
for key, val in value.iteritems():
# Ignore fields that are not specified in the current social fields list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by: this shouldn't be necessary if you add

"additionalProperties": false

to the json schema (although still a good idea to have tests for it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for history's sake, when I wrote this the requirement was "Ignore attempts to update social fields that are not in [the list of fields]" which is what this was doing. Requirement has changed so will now error with extra fields instead of ignoring.

Copy link
Contributor

@caseyrollins caseyrollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me; passing off to @sloria to review the data migration. Would you mind updating the docstring around the social field in users.py so devs will know the (new) correct structure?

@sloria
Copy link
Contributor

sloria commented Aug 1, 2018

@erinspace Can you get this up to date with develop and repoint the PR pls?

@erinspace erinspace changed the base branch from release/next-platform to develop August 1, 2018 21:04
@erinspace
Copy link
Member Author

@sloria, up-to-dated and re-pointed!

Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass finished.

@@ -0,0 +1,73 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schemas are now located in api/users/schemas.


users_updated = 0
for user in users_with_social:
old_social = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Isn't this "new_social"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naw, cause it's in the reset_social_fields method, so it's putting it back to the old social field!

for user in users_with_social:
old_social = {}
for key, value in user.social.iteritems():
if key in SOCIAL_LIST_FIELDS and key != 'profileWebsites':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: why include SOCIAL_LIST_FIELDS in the first place? Is it because profileWebsites is a social list field and you wanted to be complete? Alternatively, you could rename the constant to FIELDS_TO_MIGRATE and omit profileWebsites. Not a big deal, though.

old_social = {}
for key, value in user.social.iteritems():
if key in SOCIAL_LIST_FIELDS and key != 'profileWebsites':
old_social[key] = value[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration should error if it finds a value that contains more than one item, so we don't accidentally remove data.


here = os.path.split(os.path.abspath(__file__))[0]

def from_json(fname):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was moved to a directory. I'm getting an ImportError when initializing the app. I think this file just needs to be removed.

WaterbutlerLink, ShowIfCurrentUser
)
from api.base.utils import absolute_reverse, get_user_auth, waterbutler_api_url_for
from api.files.serializers import QuickFilesSerializer
from osf.exceptions import ValidationValueError, ValidationError
from osf.models import OSFUser, QuickFilesNode
from api.users.schemas import from_json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be api.users.schemas.utils.

Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass finished. Mostly looked at the migration.

There is a significant number (> 10K) of users that have the github, linkedIn, or twitter keys in social. Therefore, the migration will need to be optimized in order to prevent downtime.

Look into doing an update in order to bulk-update (using the bulk update strategy we've used in the past).

Turns out Django has built-in support for querying on JSON fields, via KeyTextTransform

from django.contrib.postgres.fields.jsonb import KeyTextTransform

(
    OSFUser.objects.filter(social__has_any_keys=['twitter'])
    .annotate(twitter=KeyTextTransform('twitter', 'social'))
    .annotate(twitter_length=Length('twitter'))...
)

This should enable you to update the users in one or a few queries.

for user in users_with_social:
new_social = {}
for key, value in user.social.iteritems():
if key in SOCIAL_LIST_FIELDS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be FIELDS_TO_MIGRATE.

def update_social_fields(state, schema):
OSFUser = state.get_model('osf', 'osfuser')
# If a user has no social info, it is stored as an empty dict
users_with_social = OSFUser.objects.exclude(social={})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use social__has_any_keys=FIELDS_TO_MIGRATE here.

new_social[key] = [value]
else:
new_social[key] = value
user.social = new_social
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment got lost...

The migration should error if it finds a value that contains more than one item, so we don't accidentally remove data.

users_updated = 0
for user in users_with_social:
new_social = {}
for key, value in user.social.iteritems():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to iterate over FIELDS_TO_MIGRATE instead of over all the items in the dictionary.

new_social = {}
for key, value in user.social.iteritems():
if key in SOCIAL_LIST_FIELDS:
new_social[key] = [value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of OSFUser records with a set social field have empty strings for their unset social fields, in which case this will be incorrect. Empty strings should be updated to empty lists, right?

]


BATCH_SIZE = 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by: Are syntax checks passing (inv syntax)? This variable appears to be unused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, perhaps uppercased constants don't count as unused variables because it's assumed that they may be imported by other modules.

In any case, I think it can be removed.

Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration runs in 13s locally. Nice! 🎉

python manage.py migrate  2.55s user 1.22s system 27% cpu 13.665 total

The frontend, however needs to be updated to account for the new format.

image

Also, this is a breaking change, which means you'll need to increment the API version. Can you also verify that old versions still return the old format?

@erinspace erinspace force-pushed the feature/social_api_updates branch 2 times, most recently from 0081b47 to cc00be0 Compare August 28, 2018 14:31
@@ -34,6 +35,27 @@ def to_representation(self, value):
return relationship_links


class SocialField(ser.DictField):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure about this approach -- it's not really a deprecated field, as the field is still there, just returns data in a slightly different format depending on the version, which I couldn't find another example of.

show_old_format = request and is_deprecated(request.version, self.min_version)
if show_old_format:
new_value = {}
for key, socialvalue in value.iteritems():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by: It would be more straight-forward to loop through old_social_string_field rather than the entire social dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make sure to make a copy of the social dictionary for the return value (rather than mutating it in-place.

@@ -741,6 +744,17 @@ $.extend(SocialViewModel.prototype, SerializeMixin.prototype, TrackedMixin.proto

SocialViewModel.prototype.serialize = function() {
var serializedData = ko.toJS(this);
// Do some extra work to store these as arrays in the DB
var arrayInDBKeys = ['twitter', 'linkedIn', 'github'];
$.each(serializedData || {}, function(key, value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: more straightforward to loop over the arrayInDBKeys than the whole input dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a nit: we no longer need to use $.each, since--as I learned yesterday--we're not supporting browsers older than IE 11, which means Array.prototype.forEach is AOK.

@@ -753,11 +767,15 @@ SocialViewModel.prototype.serialize = function() {
SocialViewModel.prototype.unserialize = function(data) {
var self = this;
var websiteValue = [];
// Do some extra work to display these as strings for now
var arrayInDBKeys = ['twitter', 'linkedIn', 'github'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than define this twice, better to define this as a constant.

@erinspace erinspace force-pushed the feature/social_api_updates branch 3 times, most recently from ead3120 to 872125e Compare August 28, 2018 19:18
def to_representation(self, value):
old_social_string_fields = ['twitter', 'github', 'linkedIn']
request = self.context.get('request')
show_old_format = request and is_deprecated(request.version, self.min_version) and request.method == 'GET'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only show the old format for GET requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So this was because I wasn't exactly sure if versioning was for other operations other than get, and by extension didn't know if we should support patches/puts for data that was in the legacy format. If we indeed are going to support a patch to the social field in the old format, as well as the new format if they're using a current version of the API, that will be a little more work as the jsonschema isn't set up to validate based on version of the API.

Lemmie know and I can make that happen!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right--non-GET requests were disallowed in older versions. So maybe the request.method == 'GET' is just redundant? Can we remove it?

Copy link
Member Author

@erinspace erinspace Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I added the request.method check was because updates actually secretly were allowed in earlier versions -- this feature was actually present already and this PR was just intended to clean it up/make it work the way that's expected.

This was added primarily to make the tests in this class pass while still not specifying the version of the API, cause they were using version 2.0. I figured since this was a stealth feature before anyhow, it'd be ok to require "proper form" for PUT/POST requests, while still returning data in the way that's expected in older versions for GET requests.

@@ -739,8 +742,19 @@ var SocialViewModel = function(urls, modes, preventUnsaved) {
SocialViewModel.prototype = Object.create(BaseViewModel.prototype);
$.extend(SocialViewModel.prototype, SerializeMixin.prototype, TrackedMixin.prototype);

const arrayInDBKeys = ['twitter', 'linkedIn', 'github'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We're not using Babel to transpile ES2015, so we've just been using var (though it's not a huge deal, since const is supported in IE11). We also use UPPER_SNAKE_CASE for constants.

Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit and a question. When those are addressed, this will be good to go to staging3.

[#PLAT-925]

- add a jsonschema validating schema
- use that along with other validator methods
…from a file

- Use this method in the validate_social method
- use DictField instead of ListDictField that coerced everything to lists
- certain fields are now strings, others, lists, see [PLAT-925] for details
- update tests to reflect change
- Bring up to date with develop
- Don't lose data on a reverse migration if more than one field is included in a new list field
…ings

This frontend will be redone eventually, but since this is just an API change and not changing the frontend just yet, keep things as they were
- Increment max allowed API version
- Make new social field that will display social information in the old format if the version requested is less than the max version
- Add test to make sure requests for older version show certain social fields in the dict as strings insread of lists
@sloria
Copy link
Contributor

sloria commented Sep 17, 2018

This is headed to staging3.

@sloria sloria merged commit b011d89 into CenterForOpenScience:develop Sep 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants