Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLAT-926] Add employment and institution information to user serializer #8546

Merged
merged 23 commits into from
Jul 31, 2018

Conversation

Johnetordoff
Copy link
Contributor

Purpose

As part of Emberization we're exposing everything to our V2 API and that includes employment and institution information. This infomation sadly lacks self-esteem and must be constantly validated, so here we are.

Changes

  • changes field types to JSONAPIListField
  • add extensive validation to ensure the JSON structure is not violated.
  • add equally extensive tests

QA Notes

Issue different PUT requests against the user endpoint with various payloads and permissions. Make sure error messages/info sent back is correct.

Documentation

When this is approved and the behavior is confirmed I'll write up dev docs.

Side Effects

None that I know of.

Ticket

https://openscience.atlassian.net/projects/PLAT/issues/PLAT-926

@coveralls
Copy link

coveralls commented Jul 13, 2018

Pull Request Test Coverage Report for Build 30530

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at ?%

Totals Coverage Status
Change from base Build 30529: 0.0%
Covered Lines:
Relevant Lines: 0

💛 - Coveralls

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough tests! I would use jsonschema for validation, which will take care of a lot of this for you. This involves quite a bit of changes to your validation and probably tests.

)
user.social[key] = val[0]

def _validate_user_json(self, info_list, required_fields):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using jsonschema for most of this validation, instead of individually type checking each field. We use this for validating form information for draft registrations. I doubt jsonschema can validate for everything that you want, for example, end date must be greater that start date, but it works well for required fields, field types, etc. You can also define field dependencies (if endMonth is present, endYear must be present, for example).

You would do something like:

  • import jsonschema
  • define your schema, started below:
  • jsonschema.validate(validated_data, schema)

Here's a start on the schema for education data, doesn't have everything:

{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "degree": {
        "type": "string",
      },
      "startYear": {
        "type": "integer",
      },
      "startMonth": {
        "type": "integer"
      },
      "endMonth": {
        "type": "integer"
      },
      "endYear": {
        "type": "integer"
      },
      "ongoing": {
        "type": "boolean"
      },
      "department": {
        "type": "string"
      },
      "institution": {
        "type": "string"
      }
    },
    "required": [
        "startYear",
        "startMonth",
        "ongoing",
        "department",
        "institution",
        "degree"
    ],
    "dependencies": {
        "endMonth": ["endYear"],
        "endYear": ["endMonth"]
     }
  }
}

@Johnetordoff Johnetordoff force-pushed the jobs-and-schools-v2 branch 2 times, most recently from a87ef3e to 844a624 Compare July 17, 2018 16:20
Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job adding jsonschema! Just a few remaining comments, mostly around cleanup.

@@ -128,6 +134,44 @@ def profile_image_url(self, user):
size = self.context['request'].query_params.get('profile_image_size')
return user.profile_image_url(size=size)

def _update_social(self, social_dict, user):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was pulled from the first part of the update method? doesn't look like it's being used.

employment = JSONAPIListField(required=False, source='jobs')
employment_required_keys = {'department', 'institution', 'ongoing', 'startMonth', 'startYear', 'title'}
education = JSONAPIListField(required=False, source='schools')
education_required_keys = {'department', 'institution', 'ongoing', 'startMonth', 'startYear', 'degree'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove employment_required_keys and education_required_keys since we're using jsonschema.

"endMonth": {
"type": "integer",
"minimum": 1,
"maximum": 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}]

def test_user_put_schools_401(self, app, user_one, url_user_one):
# Tests to make sure institution is not empty string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have this comment in a lot of places where it's not accurate.

raise InvalidModelValueError("For '{}' the field value {}".format(e.path[-1], e.message))

raise InvalidModelValueError(e.message)
except jsonschema.SchemaError as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catching of the different schema errors.

},
}
}, expect_errors=True)
assert res.status_code == 401
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test_user_put_schools_403/test_user_put_jobs_403 test for users trying to edit someone else's profile info?

@pattisdr
Copy link
Contributor

Looks great @Johnetordoff

Copy link
Contributor

@caseyrollins caseyrollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent use of jsonschema and good test coverage here, @Johnetordoff!

It looks like we could still DRY up the tests quite a bit by creating a mixin since the only difference between most of the jobs and schools tests are the payloads. Something like:

class UserProfileMixin(object):

    @pytest.fixture()
    def request_payload(self):
        raise NotImplementedError

    # define other fixtures 

    def test_user_put_profile_info_200(...):
        ...
      
    def test_user_put_profile_info_401(...):
        ...

    def test_user_put_profile_info_403(...):
        ...

    def test_user_put_profile_info_validation(...):
        ...

    def test_user_put_profile_info_date_validation(...):
        ....


class TestUserProfileJobs(UserProfileMixin):

    @pytest.fixture()
    def request_payload(self):
        ...

class TestUserProfileSchools(UserProfileMixin):

    @pytest.fixture()
    def request_payload(self):
        ...

The advantage here is that shared tests can be added in a single place (the mixin) instead of duplicated for both jobs and schools. Also if a test needs to be changed, it only needs to be changed in one place instead of several.

"title"
],
"additionalProperties": false,
"dependencies": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be dependencies for startMonth and startYear here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are not necessary because startMonth and startYear are both always required no matter what, so they're not really dependencies of each other.

"degree"
],
"additionalProperties": false,
"dependencies": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be dependencies for startMonth and startYear here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are not necessary because startMonth and startYear are both always required no matter what, so they're not really dependencies of each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that doesn't match current functionality. I'm able to add an education entry to my profile without a start date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check these dependencies with functionality on production? Start year and end year can be entered and saved without start month and end month, but months can't be saved without years. I also think there should be a "endYear": "startYear" dependency here because an end year can't be saved unless there's a start year.

'type': 'users',
'attributes': {
'full_name': user_one.fullname,
'education': [{'degree': '',

This comment was marked as resolved.

},
}
}, auth=user_one.auth, expect_errors=True)
print res.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rogue print

'type': 'users',
'attributes': {
'full_name': user_one.fullname,
'education': [{'degree': '',

This comment was marked as resolved.

This comment was marked as resolved.

@@ -142,6 +172,13 @@ def update(self, instance, validated_data):
detail='{} only accept a list of one single value'. format(key)
)
instance.social[key] = val[0]
elif 'schools' == attr:
self._validate_user_json(value, 'education-schema.json')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to reduce the amount of logic that happens in this method (and better separate logic), you could implement field level validators for schools and jobs (each method would just call self._validate_user_json(...)).

See @erinspace's work in #8551 for an example of this.


here = os.path.split(os.path.abspath(__file__))[0]

def from_json(fname):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of the helper function. One nitpick: it's a bit of a misnomer. From the name, I'd expect from_json to take a JSON string as input, but it actually takes a filepath. Perhaps load_jsonschema would be more appropriate?

Also see my other comment about co-locating all the jsonschema-related code.

'type': 'users',
'attributes': {
'full_name': user_one.fullname,
'education': [{'degree': '',

This comment was marked as resolved.

Copy link
Contributor

@caseyrollins caseyrollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small changes, namely a few more dependencies need to be added to the jsonschemas.

"degree"
],
"additionalProperties": false,
"dependencies": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that doesn't match current functionality. I'm able to add an education entry to my profile without a start date.

@@ -142,6 +180,10 @@ def update(self, instance, validated_data):
detail='{} only accept a list of one single value'. format(key)
)
instance.social[key] = val[0]
elif 'schools' == attr:
instance.schools = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these assignment statements are necessary anymore. The fields will be set by the setattr(instance, attr, value) call below.

request_payload['data']['attributes'][request_key] = [{}]
res = app.put_json_api(user_one_url, request_payload, auth=user_one.auth, expect_errors=True)
assert res.status_code == 400
assert res.json['errors'][0]['detail'] == "u'startYear' is a required property"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned above -- this doesn't match current functionality.

 into jobs-and-schools-v2

* 'develop' of https://github.com/CenterForOpenScience/osf.io: (87 commits)
  Bump version and update changelog
  fix tests for non-dict based client attributes
  Update gitlab serializers for fangorn config js
  Add enforce_csrf waffle switch so that this can be merged earlier
  Update markdown-it-video with infinite loop fix
  Fix indentation
  Improve tests by using user cookie method and faster user creation
  Tests to ensure API requests with Session auth fail without CSRF protection
  Travis and Test Speed ups
  Rename --quick to --all
  Don't transact to avoid deadlocks
  Add script to fix poor mapping logic -h/t @caseyrollins
  Add same enforce_csrf method as drf's SessionAuthentication
  replace bad reverse migration with warning
  Use timezone.now() when checking for failed registrations
  Remove duplicate psyarxiv custom subject.
  Use fork of celery 4.2.1 with celery/celery#4839 merged
  Upgrade celery and kombu
  Set TIME_ZONE in settings
  add scheme to match UI
  ...
Copy link
Contributor

@caseyrollins caseyrollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a few more gotchas with the dependencies and validation.

Would you mind also adding a test to confirm that unexpected fields are ignored? i.e. if someone attempts to send something like this:

{
	'pizza': 'pepperoni',
	'title': '',
	'startYear': 1991,
	'startMonth': 9,
	'endYear': 1992,
	'endMonth': 9,
	'ongoing': False,
	'department': '',
	'institution': 'Fake U'
}

then we should throw out the pepperoni pizza.

jsonschema.validate(value, from_json(json_schema))
except jsonschema.ValidationError as e:
if len(e.path) > 1:
raise InvalidModelValueError("For '{}' the field value {}".format(e.path[-1], e.message))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post an example of the existing error and why reformatting is necessary/beneficial?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing message does not have the name of the field erroring in it. For example if institution field is an empty string, the error is just to '' is too short. So as you can see the new message indicates the erroring field and what is happening to it.

},
"required": [
"ongoing",
"department",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should department and degree be in this list? I'm able to add an education entry to my profile with only "Institution" set.

"degree"
],
"additionalProperties": false,
"dependencies": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check these dependencies with functionality on production? Start year and end year can be entered and saved without start month and end month, but months can't be saved without years. I also think there should be a "endYear": "startYear" dependency here because an end year can't be saved unless there's a start year.

},
"required": [
"ongoing",
"department",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above: should department and title be in this list?

def _validate_dates(self, info):
for history in info:

startDate = datetime.date(history['startYear'], history['startMonth'], 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed above: startMonth and endMonth aren't required so these lines will throw KeyErrors

Signed-off-by: John Tordoff <john@cos.io>
@@ -148,13 +148,16 @@ def _validate_user_json(self, value, json_schema):
def _validate_dates(self, info):
for history in info:

startDate = datetime.date(history['startYear'], history['startMonth'], 1)
if history.get('startYear'):
startDate = datetime.date(history['startYear'], history['startMonth'], 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be

if history.get('startYear'):
	startDate = datetime.date(history['startYear'], history.get('startMonth'), 1)

to prevent KeyErrors.

if (endDate - startDate).days <= 0:
raise InvalidModelValueError(detail='End date must be greater than or equal to the start date.')
elif history.get('endYear') or history.get('endMonth'):
raise InvalidModelValueError(detail='Ongoing positions cannot have end dates.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe adding the following to the json schema will validate this for you:

"dependencies": {
	"endMonth": ["endYear"],
	"startMonth": ["startYear"],
    "endYear": ["startYear"]
},
"anyOf": [
  {
    "not": {
      "properties": {
        "ongoing": { "enum": [true] }
      },
      "required": ["ongoing"]
    }
  },
  { "required": ["startYear"] }
],
"anyOf": [
  {
    "not": {
      "properties": {
        "ongoing": { "enum": [false] }
      },
      "required": ["ongoing"]
    }
  },
  { "required": ["endYear"] }
],
"not": {
	"properties": {
    	"ongoing": {"enum": [true]}
    },
    "required": ["endYear"]
}

The "not" will ensure that "endYear" is not included if "ongoing" is true.

The "anyOf" will ensure that "startYear" is required if "ongoing" is true and that "endYear" is included if "ongoing" is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to add this to the schema because:
A) It's hard for an outside observer to immediately know what this is for and I can't comment inside a JSON file to make this clearer.
B) The error message this produces isn't really human readable, getting back to the current message is likely going to include a lot of catching and re-throwing.

But there's definitely a benefit to always relying using the schema, so 🤷‍♂️ .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I definitely agree that the error messages are way less than ideal. My opinion is that using the schema for validation is a lot nicer than writing our own, and we can document this in a more human-readable format in the dev docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@caseyrollins caseyrollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like each schema just needs an "endYear": "startYear" dependency and this will be RTM.

# End dates, but no start dates
res = app.put_json_api(user_one_url, end_dates_no_start_dates_payload, auth=user_one.auth)
user_one.reload()
assert res.status_code == 200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frontend doesn't currently allow this:
screen shot 2018-07-30 at 2 37 44 pm

@@ -128,6 +132,40 @@ def profile_image_url(self, user):
size = self.context['request'].query_params.get('profile_image_size')
return user.profile_image_url(size=size)

def _validate_user_json(self, value, json_schema):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure these should be instance methods--there's no usage of self. How about co-locating all the jsonschema validation functions in your new api/users/schemas.py module?

@@ -0,0 +1,62 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we put these in a directory?

api/users/schemas
├── education-schema.json
├── ....schema.json
└── utils.py  # contains the functions for using the schemas -- see previous comments

@@ -38,7 +38,7 @@ def validate_year(item):
except ValueError:
raise ValidationValueError('Please enter a valid year.')
else:
if len(item) != 4:
if type(item) == str and len(item) != 4:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's multiple string types in Python (and the names differ between Python 2 and 3), it's best to do this when checking against a string:

from django.utils.six import string_types

isinstance(thing, string_types)

Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits mostly re: code organization. Once those are addressed, this is good to merge.

@sloria sloria merged commit 92c8afb into CenterForOpenScience:develop Jul 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants