Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert profile attributes to JSON #599

Merged
merged 1 commit into from Oct 26, 2016
Merged

Convert profile attributes to JSON #599

merged 1 commit into from Oct 26, 2016

Conversation

pkarman
Copy link
Contributor

@pkarman pkarman commented Oct 22, 2016

This is part 2 of #498

Please review and merge #598 first to reduce the changes here.

Why: All PII must be encrypted at rest.
Storing attributes in a single JSON blob makes this much easier.

How:

This change does not implement any actual encryption. That will follow
in a separate change.

This change includes:

  • schema modifications
  • user flow for collecting passphrase at IdV review
  • internal flow for decrypt PII at login for use in session

@pkarman pkarman mentioned this pull request Oct 22, 2016
@pkarman pkarman changed the title Convert profile attributes to encrypted JSON Convert profile attributes to JSON Oct 22, 2016
@@ -0,0 +1,4 @@
module Pii
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class will be exercised once #498 is complete

@@ -0,0 +1,21 @@
module Pii
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is not encryption per se. Just Base64 encoding. The class name is a placeholder for the concept of encrypting. Actual encryption will happen in #498

expect(FeatureManagement.use_kms?).to eq(true)
end

xit 'throws exception when attempting to fetch PII signing key' do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyMaker class is not present in this PR but will be in #498

@monfresh
Copy link
Contributor

598 has been merged. Wanna rebase?

@pkarman pkarman force-pushed the encrypt-pii-part2 branch 2 times, most recently from 851f92a to 795e9d9 Compare October 22, 2016 21:05
@pkarman
Copy link
Contributor Author

pkarman commented Oct 22, 2016

thanks @monfresh -- rebased

@pkarman
Copy link
Contributor Author

pkarman commented Oct 24, 2016

@monfresh @jessieay you reviewed this when it was part of #498. Please take another look. I addressed all the comments you left on that other PR.

#498 (comment)
#498 (comment)

self.user = user
self.service_provider = service_provider
self.authn_request = authn_request
@decrypted_pii = decrypted_pii
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed in many other places we prefix instance variables with _, so this would be @_decrypted_pii -- should we be doing this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used the leading underscore for memoized private instance variables that should not be accessed directly anywhere else in the code except for via the memoized method. That doesn't apply here, since we do want to use the instance var elsewhere. I can make it a private accessor if that would be clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private accessor sounds good to me, I didn't see the difference between memoized instance variables vs otherwise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 51dd5df

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we convert the other initialized variables into instance variables for consistency?

@@ -68,5 +76,13 @@ def submit_applicant
idv_attempter.increment
resolution
end

def password
params.require(:user)[:password]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be either .require(:password)[:password] or .require(user: :password)[:user][:password] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I tried a half dozen different combinations to get what I wanted, which is:

  • do not raise an exception if the param is missing
  • do return nil if the param is missing

We want the error to come from the valid_password? method, not the controller param fetcher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I understand.

I think we want:

params.fetch(:user, {})[:password].presence

The methods from strong parameters .require and .permit are only useful if we're passing entire hashes around (ex User.create(params)), but in this case, we just want a single key, so I think we can avoid that approach and be explicit.

attr_name_sym = method_sym.to_s.gsub(/=\z/, '').to_sym
if plain_pii.members.include?(attr_name_sym)
return plain_pii[attr_name_sym] if arguments.empty?
plain_pii[attr_name_sym] = arguments.first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is imitating writing an attribute, should we do anything to mark the record as dirty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question.

PII must always be encrypted before saving, and because encryption requires the user passphrase, we can't make it a before_save hook or equivalent. So the only attribute that really should be marked as dirty is encrypted_pii and marking it dirty would be technically incorrect until after encrypt_pii is explicitly called.

I'm starting to think that the "convenience magic" of the method_missing stuff maybe isn't worth the complexity it creates. These two do the same thing:

profile.plain_pii[:ssn] = '1234'
profile.ssn = '1234'

and while I personally love the terse-ness and error checking of the 2nd form, the non-magical-ness of the 1st form might be friendlier to the code maintainer. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very wary of method_missing and believe we should minimize Ruby dark magic, big fan of profile.plain_pii[:ssn] = '1234'!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you feel about this instead of method_missing etc:

  Pii::Attributes.new.members.each do |attr|
    delegate attr, to: :plain_pii
    setter = "#{attr}=".to_sym
    delegate setter, to: :plain_pii
  end 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'd be really happy with just writing profile.plain_pii.foobar everywhere since it makes for a really clear relation, but it may be easier to refactor the code with this delegation layer? Would love to get other folk's opinions.

(Also teeny nit, structs have a class-level .members so you could do Pii::Attributes.members without the .new)

@pkarman
Copy link
Contributor Author

pkarman commented Oct 24, 2016

@zachmargolis thanks for the review. updates in 51dd5df

@pkarman
Copy link
Contributor Author

pkarman commented Oct 24, 2016

per the codeclimate warning, I tried changing this:

if evaluator.pii
  evaluator.pii.each { |key, val| profile.plain_pii[key] = val }
end

to this:

evaluator.pii&.each { |key, val| profile.plain_pii[key] = val }

and got an error about "can't call each on a False::false value"

I'm in favor of disabling that rubocop rule. I've never seen that safe navigation syntax before.

UPDATE: I worked around the rule.

def verified?
verified_at.present?
def plain_pii
@_plain_pii ||= Pii::Attributes.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's address this later, but wanted to comment before I forget:

I'm not sure we should memoize this method in this way -- I think that this would allow pretending to read PII attributes without actually decrypting them. IMO if a profile has PII but it has not been decrypted, this should be nil or something so we get errors when try to access fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good point. I am not happy with how the PII encryption is neither automatic in a hook, nor clearly a manual process.

We could get rid of plain_pii altogether to remove confusion. E.g.

profile = Profile.new( user: user )
pii_attrs = Pii::Attributes.new_from_hash( ssn: '1234' )
profile.encrypt_pii(passphrase, pii_attrs)
profile.save!

pii_decrypted = profile.decrypt_pii(passphrase)
puts pii_decrypted == pii_attrs  # -> true

You would just never handle un-encrypted PII aside from an explicit Pii::Attributes instance. The Profile only ever holds it encrypted.

That would make the tests a little more verbose but at least there would not be unexpected behaviour.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachmargolis I have removed plain_pii method in d17b0a7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, makes sense!

I would even be in favor of something like this (very verbose and separate but very clear)

profile = Profile.new( user: user )
pii_attrs = Pii::Attributes.new_from_hash( ssn: '1234' )
profile.encrypted_pii = pii_attrs.encrypt(passphrase, pii_attrs)
profile.save!

pii_decrypted = Pii::Attributes.decrypt(passphrase, profile.encrypted_pii)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Profile.encrypt_pii also fingerprints the ssn, which must be done in sync with whatever PII is being encrypted. So having a method to handle all the encryption-related stuff seems ok to me.

@@ -3,5 +3,7 @@ class ProfileController < ApplicationController
layout 'card_wide'

def index
cacher = Pii::Cacher.new(current_user, user_session)
@active_profile = cacher.fetch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, in saml_idp_auth_concern, you are doing the same thing, but the method is called decrypted_pii, whereas here you are saying the result represents an active profile. IIUC cacher.fetch returns a hash of attributes, so it seems like decrypted_pii is the correct term, and perhaps the instance variable here should be called that as well?

def build_profile(applicant, user, password)
Profile.create_with_encrypted_pii(user, pii_from_applicant(applicant), password)
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverse the order of these 2 methods so that they read from top-down?


def build_profile(applicant, user, password)
Profile.create_with_encrypted_pii(user, pii_from_applicant(applicant), password)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_profile implies calling new and not saving, but then create_with_encrypted_pii is called, which does save to the database. Perhaps this class should be renamed to ProfileFromApplicant and the build_profile method should be renamed to create, with the contents of the create_with_encrypted_pii from the Profile class? This will keep both the Profile class clean and keep the profile creation in one place.

@pkarman
Copy link
Contributor Author

pkarman commented Oct 25, 2016

thanks @monfresh -- hit all your review feedback in 27fc889

@pkarman pkarman self-assigned this Oct 25, 2016
pii_json = decrypted_pii.to_json
encrypted_pii = encryptor.encrypt(pii_json)
user_session[:encrypted_pii] = encrypted_pii
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the purpose of this Cacher class. It seems like in order for the encrypted PII to be cached, the user must have an active profile. But if they already have an active profile, that means they already have encrypted PII in the DB, right? If so, why can't the encrypted PII be fetched directly from the DB without the session as a middleman?

From what I can tell, here are the sequence of events:

  1. User with an active profile signs in
  2. Their encrypted PII is fetched from the DB and stored in the session (after_action in users/sessions_controller)
  3. Since their profile is already active, meaning they've already proofed, their PII is not going to be modified in this session
  4. The encrypted PII is then fetched from the session and decrypted in 2 places: 1) the ProfileController in order to display to the user their verified attributes, 2) in SamlIdpController to assert attributes back to the SP

If this is correct, it looks like the caching is not necessary since the data is already available in the DB.

Copy link
Contributor Author

@pkarman pkarman Oct 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PII may only be decrypted from the db with the user's passphrase. Since we only have the passphrase available to us at log in time, we must decrypt it then, and then store it re-encrypted w/o the passphrase for later use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's hard to see that in this PR because there is no actual encryption happening. All the crypto was yanked out to simplify the review process.

In a subsequent PR, the passphrase will actually be used for encryption, and there will be different encryption methods used for storage in the db vs storage in the session.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. I keep forgetting about that. I think some comments at the top of this class explaining why it's needed would be helpful.

@@ -86,6 +92,9 @@ production:
participate_in_dap: 'false'
password_pepper: 'rake secret'
password_strength_enabled: 'true'
pii_passphrase: 'trust-but-verify'
pii_server_cek: 'rake secret 256 bits'
pii_signing_passphrase: 'trust-but-verify'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 keys don't seem to be used in this PR. Move to another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that will go into a later PR. good catch. yanked in 61f6f79

Copy link
Contributor

@jessieay jessieay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh shoot, reviewed this earlier but forgot to submit

ssn: applicant.ssn,
phone: applicant.phone
)
def self.create_with_encrypted_pii(user, plain_pii, password)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on using named args here? I find them very helpful when dealing with more than 1 arg

@@ -15,10 +15,11 @@ class AttributeAsserter
:phone
].freeze

def initialize(user, service_provider, authn_request)
def initialize(user, service_provider, authn_request, decrypted_pii)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

named args?

@@ -52,6 +55,8 @@ def complete_profile
profile.verified_at = Time.zone.now
profile.vendor = vendor
profile.activate
# move from temp slot to 'permanent' slot
user_session[:encrypted_pii] = session.delete(:encrypted_pii)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this in a method named move_from_temp_to_permanent instead of having comment?

@@ -0,0 +1,32 @@
module Pii
Attributes = Struct.new(
:first_name, :middle_name, :last_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubocop might freak out if we do this, but I find lists like this easier to read when each arg is on its own line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on both counts. I dislike fighting with rubocop more than I dislike grouped args.

decrypted_pii = profile.decrypt_pii(password)
pii_json = decrypted_pii.to_json
encrypted_pii = encryptor.encrypt(pii_json)
user_session[:encrypted_pii] = encrypted_pii
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like this method is doing more than save -- would it make sense to put parts of this in other (mayve private) methods so it's clearer what is happening?

@@ -28,7 +28,7 @@ h2.pb0.heading = t('idv.titles.session.basic')
.mt0.mb3.pb-tiny.border-bottom.border-teal = t('profile.index.dob')
= f.input :dob, label: t('idv.form.dob'), required: true,
pattern: '(0[1-9]|1[012])/(0[1-9]|1[0-9]|2[0-9]|3[01])/[0-9]{4}',
input_html: { class: 'dob', value: idv_profile_form.dob.try(:strftime, '%m/%d/%Y'),
input_html: { class: 'dob', value: idv_profile_form.dob,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no more strftime ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because dob is no longer a DateTime object. It is a String.

@pkarman
Copy link
Contributor Author

pkarman commented Oct 25, 2016

thanks @jessieay -- feedback in dcf10d8

@pkarman
Copy link
Contributor Author

pkarman commented Oct 25, 2016

@jessieay named args change in 1c88161

note that reek has a TooManyMethods rule so refactoring to increase number of methods can sometimes hit that rule.

@pkarman
Copy link
Contributor Author

pkarman commented Oct 26, 2016

@jessieay @monfresh thanks for your comments. They weren't in the form of an official GH "review" so I am not sure if you approve yet of this PR. Please comment/review as appropriate. Thanks.

Copy link
Contributor

@jessieay jessieay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some smaller comments but nothing that blocks PR approval. I will hold off on merging to give @monfresh a chance to re-look. 🌵

@@ -68,5 +76,13 @@ def submit_applicant
idv_attempter.increment
resolution
end

def password
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor detail, but I usually put methods in order of dependencies (so password below where it is used, which here is valid_password?). I find this makes diffs more readable because you read from top to bottom and see why each thing is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f6fffdb


def initialize(params, user)
@user = user
profile.attributes = params.select { |key, _val| respond_to?(key.to_sym) }
params.each do |key, value|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might consider putting this activity into a method rather than initialize -- seems like a lot of data parsing for calling .new All I usually expect to see in initialize is ivar assignment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f6fffdb

self.authn_request = authn_request
URI_PATTERN = Saml::Idp::Constants::REQUESTED_ATTRIBUTES_CLASSREF

def initialize(options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah so you'd rather use hash than ruby named args? I kind of prefer named args but not a big deal: https://robots.thoughtbot.com/ruby-2-keyword-arguments#keyword-arguments-vs-options-hash

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, am a big fan of named arguments, then we get bonuses like errors when there are extra/typoed arguments

Since we know we're on a recent version of Ruby and AFAIK are not planning on switching platforms (ex JRuby) I think there's a huge upside to named arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f6fffdb

allow(subject).to receive(:confirm_idv_steps_complete).and_return(true)
allow(subject).to receive(:confirm_idv_attempts_allowed).and_return(true)
idv_session.params = user_attrs.merge(phone_confirmed_at: Time.zone.now)
allow(subject).to receive(:idv_session).and_return(idv_session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good example of why before and let statements are tricky. When reading this diff, I have no idea what idv_session is, am just trusting it's some thing (factory? mock? stub? string?) above in the spec file ☘

If the test setup were all in each test, there'd be more duplication but I could see what was going on much more easily

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to see a refactor of this, maybe in a separate PR?

profile = Idv::ProfileFromApplicant.create(applicant, current_user, password)
self.profile_id = profile.id
cache_encrypted_pii(password)
profile
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, the return value (the profile) is not used when this method is called. Perhaps the profile_id assignment should be in its own method: set_applicant_profile_id, then instead of calling idv_session.profile_from_applicant(idv_session.applicant, password) in init_questions_and_profile, you would call both of these methods:

idv_session.set_applicant_profile_id(idv_session.applicant, password)
idv_session.cache_encrypted_pii(password)

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in f6fffdb

@pkarman
Copy link
Contributor Author

pkarman commented Oct 26, 2016

thanks for all the comments. I think I've hit everything.

@monfresh comments added to Cacher in 652f01386bffb28bda061f560f7cc9dbd4bfd406

0anypRH3z3wSpO1R/5qG1YUZOQlr46CSCUxwOv0RuPMmyAxAvvEzlsVb2CwikCGD
O2L4KQ+tOCBfjFf+OYcxV3Z3WrasR77xfgCxA3x2ePlpQSzeHKt3pXy229URMGl3
GCJlM8uk66/rwARsfIDVrDYy5QzYBBWYrBIqCKzyGQn023+sW8kejRSTqF525CBt
-----END RSA PRIVATE KEY-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this key used in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no but it will be in the next PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it here

@@ -3,5 +3,7 @@ class ProfileController < ApplicationController
layout 'card_wide'

def index
cacher = Pii::Cacher.new(current_user, user_session)
@decrypted_pii = cacher.fetch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time the profile is accessed, and every time a SAML request is made, this will decrypt the PII and create a hash from them. Instead of repeating this action on every request, would it make sense to store the decrypted PII in the user_session so that it can be reused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's never ok to store PII decrypted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsequent PRs will encrypt the entire session as well.

@monfresh
Copy link
Contributor

LGTM. Wanna squash?

**Why**: All PII must be encrypted at rest.
Storing attributes in a single JSON blob makes this much easier.

**How**:

This change does not implement any actual encryption. That will follow
in a separate change.

This change includes:

* schema modifications
* user flow for collecting passphrase at IdV review
* internal flow for decrypt PII at login for use in session
@pkarman
Copy link
Contributor Author

pkarman commented Oct 26, 2016

@monfresh all squashed.

@pkarman pkarman merged commit 94f3587 into master Oct 26, 2016
@pkarman pkarman deleted the encrypt-pii-part2 branch October 26, 2016 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants