Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tools to replicate sanitized real cases in dev/test environment #15976

Merged
merged 91 commits into from Mar 25, 2021

Conversation

yoomlam
Copy link
Contributor

@yoomlam yoomlam commented Mar 8, 2021

Innovation Idea: Create a tool to replicate sanitized real cases in dev/test environment

Slack channel: #appeals-ip-replicate-data

Details: Many engineers desire a way to test solutions on real data without affecting prod data. The approach is to create a tool that can export an appeal and associated records, sanitize them of PII, then import them into a dev/test environment.

Hypothesis: If realistic appeals can be replicated in our dev/test environment, then we can test solutions locally and be more confident in the outcome of the solution. This can also be a building block to import more realistic cases into the default dev environment.

User Story: As a developer, I want to recreate sanitized instances of prod data in my dev (or testing) environment so that I can experiment with realistic data.

  • The data can be imported to be part of our seed data in the dev DB.
  • The data can be imported into the test DB for individual Rspec tests.

Description

Tools to replicate sanitized real cases in dev/test environment

Acceptance Criteria

  • Code compiles correctly
  • Able to export and import AMA appeals with tasks, issues, hearings, and associated users and orgs

Testing Plan

  1. Copy files to prod (or ask Yoom to do it)
INSTANCE=i-06b8441957026bb62
scp lib/helpers/sanit*.rb  lib/helpers/association_wrapper.rb $INSTANCE:/tmp
  1. ssm into the instance and move the new files to the Caseflow folder and start a Rails console in prod
sudo bash
cd /opt/caseflow-certification/src
mv -vf /tmp/sanit*.rb /tmp/association_wrapper.rb lib/helpers/

source /opt/caseflow-certification/caseflow-certification_env.sh
IRBRC=/tmp/my_repl.rb bin/rails c
  1. In the Rails console, do the following
require "faker"
require "helpers/sanitized_json_configuration.rb"
require "helpers/sanitized_json_exporter.rb"

# Choose a random appeal
appeal=Appeal.find(rand(Appeal.count))
# Examine the appeal
appeal.treee
pp appeal.hearings
IntakeRenderer.patch_intake_classes
puts appeal.render_intake

# Export to a file
sje=SanitizedJsonExporter.new(appeal, verbosity: 5) # Maximize verbosity to see what it's doing
# scroll up to view the verbose output
# Save to file
sje.save('/tmp/appeal1.json', purpose: 'test1')

Note: Multiple appeals can be exported at one time: SanitizedJsonExporter.new(appeal1, appeal2)
4. Exit Rails console and examine the file for PII and sensitive data
5. Once satisfied that there is no sensitive data, copy the file locally. Back on your computer, copy the file:

scp $INSTANCE:'/tmp/appeal*.json' .
  1. Then start a local Rails console and import the file.
sji = SanitizedJsonImporter.from_file('appeal1.json', verbosity: 5) # Maximize verbosity to see what it's doing
sji.import
# scroll up to view the verbose output

# Examine results
sji.imported_records # Hash of records imported into the database
sji.reused_records # Existing ActiveRecords that are not imported because they already exist in the database

# Compare appeal to the original in prod
IntakeRenderer.patch_intake_classes
sji.imported_records["appeals"].map{|appeal|
  appeal.treee
  pp appeal.hearings
  puts appeal.render_intake
}

Code Documentation Updates

  • Add or update code comments at the top of the class, module, and/or component.

@yoomlam yoomlam self-assigned this Mar 8, 2021
@va-bot
Copy link
Collaborator

va-bot commented Mar 8, 2021

1 Warning
⚠️ This is a Big PR. Try to break this down if possible. Stacked pull requests encourage more detailed and thorough code reviews

Generated by 🚫 Danger

@codeclimate
Copy link

codeclimate bot commented Mar 8, 2021

Code Climate has analyzed commit c7af906 and detected 0 issues on this pull request.

View more on Code Climate.

Comment on lines 47 to 66
def typed_associations(excluding: nil)
belongs_to.having_type_field.except_fieldnames(excluding)
end

def typed_associations_with(assoc_class)
belongs_to.having_type_field.associated_with_type(assoc_class)
end

def untyped_associations_with(assoc_class)
belongs_to.without_type_field.associated_with_type(assoc_class)
end

def grouped_fieldnames_of_typed_associations_with(known_classes)
# Foreign keys that are not strings (e.g., Claimant.participant_id) involves
# more complex association that isn't currently handled (and may not need to be)
belongs_to.associations.group_by(&:class_name)
.slice(*known_classes)
.transform_values { |assocs| assocs.map(&:foreign_key).select { |fk| fk.is_a?(String) } }
.compact
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These last 4 methods are used by SanitizedJsonConfiguration

Comment on lines +6 to +7
# Configuration for exporting/importing data from/to Caseflow's database.
# Needed by SanitizedJsonExporter and SanitizedJsonImporter.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SanitizedJsonConfiguration is specific to the Caseflow domain, while SanitizedJsonExporter/Importer are meant to be domain-independent.

Comment on lines +119 to +131
module TaskAssignment
def assigned_to_user
select { |task| task.assigned_to_type == "User" }
end

def assigned_to_org
select { |task| task.assigned_to_type == "Organization" }
end

def with_type(task_type)
select { |task| task.type == task_type }
end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are only used by the retrieval: lines above.

end
end

USE_PROD_ORGANIZATION_IDS ||= false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be set to true later when we update Organization singletons in our seed data to have the same id as in prod to avoid problems with MailTeam.singleton returning the wrong record when a MailTeam org with a different id is imported.

lib/helpers/sanitized_json_configuration.rb Outdated Show resolved Hide resolved

return mapped_value if mapped_value

default_mapped_value(orig_value, field_name, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only happens when a new field is added for sanitizing and none of the transforms return a non-nil.

lib/helpers/sanitized_json_importer.rb Outdated Show resolved Hide resolved
lib/helpers/sanitized_json_importer.rb Outdated Show resolved Hide resolved
lib/helpers/sanitized_json_importer.rb Outdated Show resolved Hide resolved
spec/factories/cavc_remand.rb Show resolved Hide resolved
@yoomlam
Copy link
Contributor Author

yoomlam commented Mar 19, 2021

Demo plan:

  • present slide
  • show export output in prod, explaining that CAVC remand appeal is associated with a source appeal
# CAVC remand with several issues
Appeal.where(stream_type: "court_remand", docket_type: "hearing").map{|a| [a.id, a.request_issues.count]}
=> [[133016, 4] ...]
appeal=Appeal.find(133016)
appeals=[appeal]
appeals.map &:treee
appeals.map{|a| puts a.render_intake}
sje=SanitizedJsonExporter.new(appeal, verbosity: 5);
sje.save('/tmp/appeal15.json', purpose: 'Mar 19th Demo')
  • scan through appeal15.json, noting obviously fake veteran_file_number/ssn/full_name/email and record ids are retained
  • describe rspec; run import within rspec, comparing task trees and intake trees (note similar record ids and css_ids for easy referencing with prod data (and consistency across many imports); timestamps are the same); note the source appeal for the CAVC remand is also imported
      sji = SanitizedJsonImporter.from_file('appeal15.json')
      imp_appeal1 = sji.import
      IntakeRenderer.patch_intake_classes  
      sji.imported_records[Appeal.table_name].map{|appeal|
        appeal.treee
        pp appeal.hearings
        puts appeal.render_intake
      }
  • describe SanitizedJsonConfiguration: @configuration
  • describe how SanitizationTransforms is applied to sanitize field values
  • SanitizedJsonExporter (~200 lines) and SanitizedJsonImporter (~300 lines) are domain-independent, which means they can be reused for other databases.

def find_existing_record(clazz, obj_hash, importer: nil)
if clazz == User
# The index for css_id has an odd column name plus find_by_css_id is faster.
User.find_by_css_id(obj_hash["css_id"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The User class has this curious index:

ActiveRecord::Base.connection.indexes(User.table_name).select(&:unique).first.columns
=> "upper((css_id)::text)"

instead of the typical Array:

ActiveRecord::Base.connection.indexes(Organization.table_name).select(&:unique).first.columns
=> ["url"]

lib/helpers/sanitized_json_importer.rb Outdated Show resolved Hide resolved
},
Veteran => {
track_imported_ids: true,
sanitize_fields: %w[file_number first_name last_name middle_name ssn],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to update the column comments in schema.rb so that these PII fields can be automatically identified, so we don't have to specify sanitize_fields manually here.

Copy link
Contributor

@tomas-nava tomas-nava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite done reviewing, but wanted to give you something to respond to.

I followed the testing plan and successfully exported & imported a relatively complex record. Most of my comments are minor. I will continue reviewing this week.

Something that occurred to me while using: you could put a check on SanitizedJsonImporter that stops it from importing if Rails.env.production?

lib/helpers/association_wrapper.rb Show resolved Hide resolved
lib/helpers/association_wrapper.rb Outdated Show resolved Hide resolved
lib/helpers/sanitation_transforms.rb Outdated Show resolved Hide resolved
spec/lib/helpers/sanitized_json_exporter_spec.rb Outdated Show resolved Hide resolved
spec/lib/helpers/sanitized_json_exporter_spec.rb Outdated Show resolved Hide resolved
spec/lib/helpers/sanitized_json_exporter_spec.rb Outdated Show resolved Hide resolved
lib/helpers/sanitized_json_exporter.rb Outdated Show resolved Hide resolved
lib/helpers/sanitized_json_exporter.rb Show resolved Hide resolved
lib/helpers/sanitized_json_importer.rb Outdated Show resolved Hide resolved
yoomlam and others added 3 commits March 23, 2021 08:03
@yoomlam
Copy link
Contributor Author

yoomlam commented Mar 25, 2021

Copy link
Contributor

@tomas-nava tomas-nava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approve! with non-blocking questions/suggestions

lib/helpers/sanitized_json_importer.rb Show resolved Hide resolved
lib/helpers/sanitized_json_exporter.rb Show resolved Hide resolved
return unless obj_hash[field_name]

# Loop to ensure hash @value_mapping has a different value for each distinct key
10.times do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if it's still not unique after 10 tries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses the last generated value and moves on. The user can rerun the export as well.

@yoomlam yoomlam added the Ready-to-Merge This PR is ready to be merged and will be picked up by va-bot to automatically merge to master label Mar 25, 2021
@va-bot va-bot merged commit 8f971bc into master Mar 25, 2021
@va-bot va-bot deleted the yoom/export-import-from-prod branch March 25, 2021 23:00
va-bot pushed a commit that referenced this pull request May 27, 2021
### Description
This PR doesn't affect end-users. It affects only the export utility in #15976 that engineers use.

Improves #15976 by:
* exports inactive `Organization` records so we don't have to manually create them in rspecs
* eager loads Task associations
* removes the now unnecessary `TaskAssignment` module
* makes `AttorneyCaseReview` and `JudgeCaseReview` less specific to certain tasks
* re-orders `AttorneyCaseReview` and `JudgeCaseReview` since they now depend on `Tasks` record to be retrieved

### Acceptance Criteria
- [x] Tests pass

### Testing Plan
None -- covered by RSpecs that use the export/import utility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready-to-Merge This PR is ready to be merged and will be picked up by va-bot to automatically merge to master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants