Tools to replicate sanitized real cases in dev/test environment #15976
Code Climate has analyzed commit c7af906 and detected 0 issues on this pull request.
lib/helpers/association_wrapper.rb
def typed_associations(excluding: nil)
  belongs_to.having_type_field.except_fieldnames(excluding)
end

def typed_associations_with(assoc_class)
  belongs_to.having_type_field.associated_with_type(assoc_class)
end

def untyped_associations_with(assoc_class)
  belongs_to.without_type_field.associated_with_type(assoc_class)
end

def grouped_fieldnames_of_typed_associations_with(known_classes)
  # Foreign keys that are not strings (e.g., Claimant.participant_id) involve
  # a more complex association that isn't currently handled (and may not need to be)
  belongs_to.associations.group_by(&:class_name)
    .slice(*known_classes)
    .transform_values { |assocs| assocs.map(&:foreign_key).select { |fk| fk.is_a?(String) } }
    .compact
end
These last 4 methods are used by SanitizedJsonConfiguration
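For readers who want to see what the `group_by` / `slice` / `transform_values` chain in `grouped_fieldnames_of_typed_associations_with` produces, here is a minimal sketch in plain Ruby. The `Assoc` struct, the sample associations, and the `grouped_fieldnames` helper are hypothetical stand-ins for the ActiveRecord reflection objects; they are not part of the PR.

```ruby
# Hypothetical stand-in for an ActiveRecord association reflection.
Assoc = Struct.new(:class_name, :foreign_key)

# Made-up sample data for illustration only.
ASSOCIATIONS = [
  Assoc.new("User", "assigned_by_id"),
  Assoc.new("User", "assigned_to_id"),
  Assoc.new("Person", :participant_id), # non-String key, filtered out below
  Assoc.new("Appeal", "appeal_id")
].freeze

def grouped_fieldnames(associations, known_classes)
  associations.group_by(&:class_name)   # { "User" => [...], "Person" => [...], ... }
    .slice(*known_classes)              # keep only the classes we care about
    .transform_values do |assocs|
      assocs.map(&:foreign_key).select { |fk| fk.is_a?(String) }
    end
end

result = grouped_fieldnames(ASSOCIATIONS, %w[User Person])
```

In this sketch a class whose foreign keys are all filtered out still appears in the result, mapped to an empty array.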
# Configuration for exporting/importing data from/to Caseflow's database.
# Needed by SanitizedJsonExporter and SanitizedJsonImporter.
This SanitizedJsonConfiguration is specific to the Caseflow domain, while SanitizedJsonExporter/Importer are meant to be domain-independent.
module TaskAssignment
  def assigned_to_user
    select { |task| task.assigned_to_type == "User" }
  end

  def assigned_to_org
    select { |task| task.assigned_to_type == "Organization" }
  end

  def with_type(task_type)
    select { |task| task.type == task_type }
  end
end
These methods are only used by the `retrieval:` lines above.
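A minimal sketch of how the `TaskAssignment` helpers could be applied to a collection. How the PR actually mixes the module in is not shown in this excerpt, so extending a plain Array is an assumption, and `FakeTask` with its sample values is a hypothetical stand-in for a real task record.

```ruby
module TaskAssignment
  def assigned_to_user
    select { |task| task.assigned_to_type == "User" }
  end

  def assigned_to_org
    select { |task| task.assigned_to_type == "Organization" }
  end

  def with_type(task_type)
    select { |task| task.type == task_type }
  end
end

# Hypothetical stand-in for a task record.
FakeTask = Struct.new(:type, :assigned_to_type)

tasks = [
  FakeTask.new("JudgeAssignTask", "User"),
  FakeTask.new("AttorneyTask", "User"),
  FakeTask.new("HearingTask", "Organization")
].extend(TaskAssignment) # assumption: mix the helpers into this one Array

user_tasks = tasks.assigned_to_user
org_tasks = tasks.assigned_to_org
attorney_tasks = tasks.with_type("AttorneyTask")
```

With this approach `select` returns a plain Array, so the helpers do not chain unless the result is extended again.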
  end
end

USE_PROD_ORGANIZATION_IDS ||= false
This will be set to true later when we update Organization singletons in our seed data to have the same id as in prod, to avoid problems with MailTeam.singleton returning the wrong record when a MailTeam org with a different id is imported.
return mapped_value if mapped_value

default_mapped_value(orig_value, field_name, **kwargs)
This only happens when a new field is added for sanitizing and none of the transforms returns a non-nil value.
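The fallback described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the `TRANSFORMS` list, the placeholder values, and the `sanitized_value` helper are hypothetical, and `default_mapped_value` here is a simplified stand-in for the method of the same name in the diff.

```ruby
# Hypothetical transforms: each returns a replacement value,
# or nil when it does not apply to the field.
TRANSFORMS = [
  ->(field_name, _value) { "000-00-0000" if field_name == "ssn" },
  ->(_field_name, value) { "user@example.com" if value.include?("@") }
].freeze

# Simplified stand-in for the fallback in the diff.
def default_mapped_value(_orig_value, field_name)
  "sanitized_#{field_name}"
end

def sanitized_value(orig_value, field_name)
  # Use the first transform that yields a non-nil result.
  TRANSFORMS.each do |transform|
    mapped_value = transform.call(field_name, orig_value)
    return mapped_value if mapped_value
  end

  # Reached only when no transform recognizes the field.
  default_mapped_value(orig_value, field_name)
end
```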
Demo plan:
# CAVC remand with several issues
Appeal.where(stream_type: "court_remand", docket_type: "hearing").map{|a| [a.id, a.request_issues.count]}
=> [[133016, 4] ...]
appeal=Appeal.find(133016)
appeals=[appeal]
appeals.map &:treee
appeals.map{|a| puts a.render_intake}
sje=SanitizedJsonExporter.new(appeal, verbosity: 5);
sje.save('/tmp/appeal15.json', purpose: 'Mar 19th Demo')
sji = SanitizedJsonImporter.from_file('appeal15.json')
imp_appeal1 = sji.import
IntakeRenderer.patch_intake_classes
sji.imported_records[Appeal.table_name].map{|appeal|
appeal.treee
pp appeal.hearings
puts appeal.render_intake
}
def find_existing_record(clazz, obj_hash, importer: nil)
  if clazz == User
    # The index for css_id has an odd column name plus find_by_css_id is faster.
    User.find_by_css_id(obj_hash["css_id"])
The User class has this curious index:
ActiveRecord::Base.connection.indexes(User.table_name).select(&:unique).first.columns
=> "upper((css_id)::text)"
instead of the typical Array:
ActiveRecord::Base.connection.indexes(Organization.table_name).select(&:unique).first.columns
=> ["url"]
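Because `columns` can be either a String (as for the User expression index) or an Array, code that derives lookup columns from unique indexes has to handle both shapes. A minimal sketch, assuming a hypothetical `IndexInfo` struct and made-up index names in place of the objects returned by `ActiveRecord::Base.connection.indexes`:

```ruby
# Hypothetical stand-in for an index definition; index names are made up.
IndexInfo = Struct.new(:name, :columns, :unique)

# Returns the column names of the first unique index that lists plain columns,
# or nil when only expression indexes (String columns) are available.
def lookup_columns(indexes)
  index = indexes.find { |idx| idx.unique && idx.columns.is_a?(Array) }
  index&.columns
end

user_indexes = [IndexInfo.new("users_css_id_idx", "upper((css_id)::text)", true)]
org_indexes = [IndexInfo.new("organizations_url_idx", ["url"], true)]

user_columns = lookup_columns(user_indexes) # nil, so a special case is needed
org_columns = lookup_columns(org_indexes)
```

A nil result is the cue for a class-specific lookup such as the `find_by_css_id` branch in the diff.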
},
Veteran => {
  track_imported_ids: true,
  sanitize_fields: %w[file_number first_name last_name middle_name ssn],
I'd like to update the column comments in schema.rb so that these PII fields can be automatically identified, so we don't have to specify sanitize_fields manually here.
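One way this idea could look, as a rough sketch only. The convention of tagging column comments with "PII", the `Column` struct, the sample comments, and the `pii_fieldnames` helper are all assumptions made for illustration; none of them exist in the PR.

```ruby
# Hypothetical stand-in for a schema column with its comment.
Column = Struct.new(:name, :comment)

# Assumed convention: a column comment containing "PII" marks the field as sensitive.
def pii_fieldnames(columns, marker: "PII")
  columns.select { |col| col.comment.to_s.include?(marker) }.map(&:name)
end

# Made-up comments for illustration.
veteran_columns = [
  Column.new("file_number", "PII. Veteran's file number"),
  Column.new("first_name", "PII. Veteran's first name"),
  Column.new("created_at", nil)
]

sanitize_fields = pii_fieldnames(veteran_columns)
```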
Not quite done reviewing, but wanted to give you something to respond to.
I followed the testing plan and successfully exported & imported a relatively complex record. Most of my comments are minor. I will continue reviewing this week.
Something that occurred to me while using: you could put a check on SanitizedJsonImporter that stops it from importing if Rails.env.production?
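A minimal sketch of such a guard. To keep it runnable outside Rails, the environment name is passed in as an argument; inside the importer it would presumably come from `Rails.env`. The `ProductionImportError` class and `ensure_importable!` helper are hypothetical names, not part of the PR.

```ruby
# Hypothetical error class for a refused import.
class ProductionImportError < StandardError; end

# Raises when asked to import in production; returns true otherwise.
# In a Rails app the argument would presumably be Rails.env.
def ensure_importable!(env_name)
  if env_name.to_s == "production"
    raise ProductionImportError, "Refusing to import sanitized JSON in production"
  end

  true
end
```

Calling this at the start of the import would stop the run before any records are created.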
Co-authored-by: Tomas Apodaca <Thomas.Apodaca@va.gov>
approve! with non-blocking questions/suggestions
return unless obj_hash[field_name]

# Loop to ensure hash @value_mapping has a different value for each distinct key
10.times do
what happens if it's still not unique after 10 tries?
It uses the last generated value and moves on. The user can rerun the export as well.
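The behavior described in this reply can be sketched as a bounded retry loop. This is an illustration, not the PR's code: the `unique_fake_value` helper, its `max_tries` parameter, and the block-based generator are hypothetical.

```ruby
# Tries up to max_tries times to generate a value not already used in
# value_mapping. If every attempt collides, the last generated value is kept.
def unique_fake_value(value_mapping, orig_value, max_tries: 10)
  return value_mapping[orig_value] if value_mapping.key?(orig_value)

  candidate = nil
  max_tries.times do
    candidate = yield
    break unless value_mapping.value?(candidate)
  end

  value_mapping[orig_value] = candidate
end

# Usage with a deterministic generator for illustration.
mapping = { "Alice" => "Name1" }
supply = %w[Name1 Name1 Name2]
fake = unique_fake_value(mapping, "Bob") { supply.shift }
```

When all attempts collide, two distinct original values end up sharing one fake value, which is why rerunning the export is the suggested remedy.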
### Description
This PR doesn't affect end-users. It affects only the export utility in #15976 that engineers use.

Improves #15976 by:
* exports inactive `Organization` records so we don't have to manually create them in rspecs
* eager loads Task associations
* removes the now unnecessary `TaskAssignment` module
* makes `AttorneyCaseReview` and `JudgeCaseReview` less specific to certain tasks
* re-orders `AttorneyCaseReview` and `JudgeCaseReview` since they now depend on `Tasks` record to be retrieved

### Acceptance Criteria
- [x] Tests pass

### Testing Plan
None -- covered by RSpecs that use the export/import utility.
Innovation Idea: Create a tool to replicate sanitized real cases in dev/test environment
Slack channel: #appeals-ip-replicate-data
Details: Many engineers desire a way to test solutions on real data without affecting prod data. The approach is to create a tool that can export an appeal and associated records, sanitize them of PII, then import them into a dev/test environment.
Hypothesis: If realistic appeals can be replicated in our dev/test environment, then we can test solutions locally and be more confident in the outcome of the solution. This can also be a building block to import more realistic cases into the default dev environment.
User Story: As a developer, I want to recreate sanitized instances of prod data in my dev (or testing) environment so that I can experiment with realistic data.
Description
Tools to replicate sanitized real cases in dev/test environment
Acceptance Criteria
Testing Plan
Note: Multiple appeals can be exported at one time:
SanitizedJsonExporter.new(appeal1, appeal2)
4. Exit Rails console and examine the file for PII and sensitive data
5. Once satisfied that there is no sensitive data, copy the file locally. Back on your computer, copy the file:
Code Documentation Updates