-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional validation on updating social fields via API v2 [PLAT-925] #8551
Additional validation on updating social fields via API v2 [PLAT-925] #8551
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome @erinspace
api/users/schemas.py
Outdated
@@ -0,0 +1,108 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
def get_user_social_jsonschema(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice schema 👍
api/users/serializers.py
Outdated
@@ -128,20 +130,31 @@ def profile_image_url(self, user): | |||
size = self.context['request'].query_params.get('profile_image_size') | |||
return user.profile_image_url(size=size) | |||
|
|||
def validate_social(self, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot you could do this! .validate_<field_name>
api/users/schemas.py
Outdated
@@ -0,0 +1,108 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
def get_user_social_jsonschema(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@erinspace, @Johnetordoff's PR adds validation schemas for employment and education information and implements a from_json
method for loading the correct schema. Would you mind mimicking that approach in this PR?
1ec8719
to
4830724
Compare
api/users/serializers.py
Outdated
|
||
social_dict = {} | ||
for key, val in value.iteritems(): | ||
# Ignore fields that are not specified in the current social fields list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by: this shouldn't be necessary if you add
"additionalProperties": false
to the json schema (although still a good idea to have tests for it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for history's sake, when I wrote this the requirement was "Ignore attempts to update social fields that are not in [the list of fields]" which is what this was doing. Requirement has changed so will now error with extra fields instead of ignoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me; passing off to @sloria to review the data migration. Would you mind updating the docstring around the social field in users.py so devs will know the (new) correct structure?
f75a1d9
to
a87108b
Compare
@erinspace Can you get this up to date with develop and repoint the PR pls? |
a87108b
to
f626ceb
Compare
@sloria, up-to-dated and re-pointed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass finished.
api/users/social-schema.json
Outdated
@@ -0,0 +1,73 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schemas are now located in api/users/schemas
.
|
||
users_updated = 0 | ||
for user in users_with_social: | ||
old_social = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Isn't this "new_social"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naw, cause it's in the reset_social_fields
method, so it's putting it back to the old social field!
for user in users_with_social: | ||
old_social = {} | ||
for key, value in user.social.iteritems(): | ||
if key in SOCIAL_LIST_FIELDS and key != 'profileWebsites': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: why include SOCIAL_LIST_FIELDS in the first place? Is it because profileWebsites
is a social list field and you wanted to be complete? Alternatively, you could rename the constant to FIELDS_TO_MIGRATE
and omit profileWebsites
. Not a big deal, though.
old_social = {} | ||
for key, value in user.social.iteritems(): | ||
if key in SOCIAL_LIST_FIELDS and key != 'profileWebsites': | ||
old_social[key] = value[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration should error if it finds a value that contains more than one item, so we don't accidentally remove data.
api/users/schemas.py
Outdated
|
||
here = os.path.split(os.path.abspath(__file__))[0] | ||
|
||
def from_json(fname): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was moved to a directory. I'm getting an ImportError
when initializing the app. I think this file just needs to be removed.
api/users/serializers.py
Outdated
WaterbutlerLink, ShowIfCurrentUser | ||
) | ||
from api.base.utils import absolute_reverse, get_user_auth, waterbutler_api_url_for | ||
from api.files.serializers import QuickFilesSerializer | ||
from osf.exceptions import ValidationValueError, ValidationError | ||
from osf.models import OSFUser, QuickFilesNode | ||
from api.users.schemas import from_json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be api.users.schemas.utils
.
f626ceb
to
ca827c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass finished. Mostly looked at the migration.
There is a significant number (> 10K) of users that have the github
, linkedIn
, or twitter
keys in social
. Therefore, the migration will need to be optimized in order to prevent downtime.
Look into doing an update
in order to bulk-update (using the bulk update strategy we've used in the past).
Turns out Django has built-in support for querying on JSON fields, via KeyTextTransform
from django.contrib.postgres.fields.jsonb import KeyTextTransform
(
OSFUser.objects.filter(social__has_any_keys=['twitter'])
.annotate(twitter=KeyTextTransform('twitter', 'social'))
.annotate(twitter_length=Length('twitter'))...
)
This should enable you to update the users in one or a few queries.
for user in users_with_social: | ||
new_social = {} | ||
for key, value in user.social.iteritems(): | ||
if key in SOCIAL_LIST_FIELDS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be FIELDS_TO_MIGRATE
.
def update_social_fields(state, schema): | ||
OSFUser = state.get_model('osf', 'osfuser') | ||
# If a user has no social info, it is stored as an empty dict | ||
users_with_social = OSFUser.objects.exclude(social={}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use social__has_any_keys=FIELDS_TO_MIGRATE
here.
new_social[key] = [value] | ||
else: | ||
new_social[key] = value | ||
user.social = new_social |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment got lost...
The migration should error if it finds a value that contains more than one item, so we don't accidentally remove data.
users_updated = 0 | ||
for user in users_with_social: | ||
new_social = {} | ||
for key, value in user.social.iteritems(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to iterate over FIELDS_TO_MIGRATE
instead of over all the items in the dictionary.
new_social = {} | ||
for key, value in user.social.iteritems(): | ||
if key in SOCIAL_LIST_FIELDS: | ||
new_social[key] = [value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of OSFUser records with a set social field have empty strings for their unset social fields, in which case this will be incorrect. Empty strings should be updated to empty lists, right?
a9b11e8
to
756a901
Compare
] | ||
|
||
|
||
BATCH_SIZE = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by: Are syntax checks passing (inv syntax
)? This variable appears to be unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, perhaps uppercased constants don't count as unused variables because it's assumed that they may be imported by other modules.
In any case, I think it can be removed.
756a901
to
17b5c4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migration runs in 13s locally. Nice! 🎉
python manage.py migrate 2.55s user 1.22s system 27% cpu 13.665 total
The frontend, however needs to be updated to account for the new format.
Also, this is a breaking change, which means you'll need to increment the API version. Can you also verify that old versions still return the old format?
0081b47
to
cc00be0
Compare
@@ -34,6 +35,27 @@ def to_representation(self, value): | |||
return relationship_links | |||
|
|||
|
|||
class SocialField(ser.DictField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure about this approach -- it's not really a deprecated field, as the field is still there, just returns data in a slightly different format depending on the version, which I couldn't find another example of.
api/users/serializers.py
Outdated
show_old_format = request and is_deprecated(request.version, self.min_version) | ||
if show_old_format: | ||
new_value = {} | ||
for key, socialvalue in value.iteritems(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by: It would be more straight-forward to loop through old_social_string_field
rather than the entire social
dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just make sure to make a copy of the social dictionary for the return value (rather than mutating it in-place.
website/static/js/profile.js
Outdated
@@ -741,6 +744,17 @@ $.extend(SocialViewModel.prototype, SerializeMixin.prototype, TrackedMixin.proto | |||
|
|||
SocialViewModel.prototype.serialize = function() { | |||
var serializedData = ko.toJS(this); | |||
// Do some extra work to store these as arrays in the DB | |||
var arrayInDBKeys = ['twitter', 'linkedIn', 'github']; | |||
$.each(serializedData || {}, function(key, value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: more straightforward to loop over the arrayInDBKeys than the whole input dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a nit: we no longer need to use $.each
, since--as I learned yesterday--we're not supporting browsers older than IE 11, which means Array.prototype.forEach
is AOK.
website/static/js/profile.js
Outdated
@@ -753,11 +767,15 @@ SocialViewModel.prototype.serialize = function() { | |||
SocialViewModel.prototype.unserialize = function(data) { | |||
var self = this; | |||
var websiteValue = []; | |||
// Do some extra work to display these as strings for now | |||
var arrayInDBKeys = ['twitter', 'linkedIn', 'github']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than define this twice, better to define this as a constant.
ead3120
to
872125e
Compare
def to_representation(self, value): | ||
old_social_string_fields = ['twitter', 'github', 'linkedIn'] | ||
request = self.context.get('request') | ||
show_old_format = request and is_deprecated(request.version, self.min_version) and request.method == 'GET' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only show the old format for GET
requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! So this was because I wasn't exactly sure if versioning was for other operations other than get, and by extension didn't know if we should support patches/puts for data that was in the legacy format. If we indeed are going to support a patch to the social field in the old format, as well as the new format if they're using a current version of the API, that will be a little more work as the jsonschema isn't set up to validate based on version of the API.
Lemmie know and I can make that happen!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right--non-GET requests were disallowed in older versions. So maybe the request.method == 'GET'
is just redundant? Can we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added the request.method check was because updates actually secretly were allowed in earlier versions -- this feature was actually present already and this PR was just intended to clean it up/make it work the way that's expected.
This was added primarily to make the tests in this class pass while still not specifying the version of the API, cause they were using version 2.0. I figured since this was a stealth feature before anyhow, it'd be ok to require "proper form" for PUT
/POST
requests, while still returning data in the way that's expected in older versions for GET
requests.
website/static/js/profile.js
Outdated
@@ -739,8 +742,19 @@ var SocialViewModel = function(urls, modes, preventUnsaved) { | |||
SocialViewModel.prototype = Object.create(BaseViewModel.prototype); | |||
$.extend(SocialViewModel.prototype, SerializeMixin.prototype, TrackedMixin.prototype); | |||
|
|||
const arrayInDBKeys = ['twitter', 'linkedIn', 'github']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We're not using Babel to transpile ES2015, so we've just been using var
(though it's not a huge deal, since const
is supported in IE11). We also use UPPER_SNAKE_CASE for constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit and a question. When those are addressed, this will be good to go to staging3.
[#PLAT-925] - add a jsonschema validating schema - use that along with other validator methods
…from a file - Use this method in the validate_social method
- use DictField instead of ListDictField that coerced everything to lists - certain fields are now strings, others, lists, see [PLAT-925] for details - update tests to reflect change
- Bring up to date with develop - Don't lose data on a reverse migration if more than one field is included in a new list field
…ings This frontend will be redone eventually, but since this is just an API change and not changing the frontend just yet, keep things as they were
- Increment max allowed API version - Make new social field that will display social information in the old format if the version requested is less than the max version - Add test to make sure requests for older version show certain social fields in the dict as strings insread of lists
72e9c62
to
b011d89
Compare
This is headed to staging3. |
Purpose
One could update social fields via API v2 after the social fields were added. This adds additional validation using jsonschema, and
allows fields that aren't currently allowed for in the model to be sent and ignored.This updates the format a bit, migrates old data to match that format, and makes sure that payloads to set the data conform to that spec.
Changes
QA Notes
This would only require API testing, and shouldn't be very different from what exists already anyhow! Should be all good, and will be tested by Embosf when they use this feature for the social page!
Documentation
CenterForOpenScience/developer.osf.io#23
Side Effects
None anticipated
Ticket
https://openscience.atlassian.net/browse/PLAT-925