New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SO-2127 snomed rf2 importer issues #122
Conversation
Clear unnecessary allocated data when after validator.doValidate(). Validate Stated Relationships file instead of validating Relationships file twice. Report errors to a files instead of console. Report taxonomy errors by effective time. Allow unpublished components to be imported from FULL RF2 files.
...instead of Strings Add writeTo(Writer) method to SnomedValidationDefect.
Bulk ID operations will load documents in bulk instead of one by one before executing the operation.
...instead of referencing variable from anonym class. This will ensure that only relevant information from the RF2 rows will be kept in memory and not the entire row.
Component importers will read RF2 rows until they reach the specified threshold (50000 atm) and then start processing them instead of processing the rows one by one. This way the importer will execute way more less index lookups than before (single index query per 50000 rows to get the current revision).
…ed-rf2-importer-issues
...for the last imported RF2 effective time
Fetch Description objects by storageKey and attach new members to lang members list directly instead of hacking with low-level revision objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the changes!
Please take a look at the review comments below; there doesn't seem to be any big stuff to move around. Some of them are already outdated because I reviewed commits in chronological order. 😄
try { | ||
for (SnomedValidationDefect defect : defects) { | ||
final String filePath = defect.getFileName(); | ||
final String defectsFile = String.format("d:/%s_%s_defects.txt", configuration.getArchiveFile().getName(), filePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coded file location above 👓
} | ||
for (String lineDefect : defect.getDefects()) { | ||
defectWriters.get(defectsFile).write(defect.getDefectType().name() + "\t" + lineDefect); | ||
defectWriters.get(defectsFile).write(lineSeparator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BufferedWriter
exposes a newLine
method, which uses the line.separator
system property as well, so lineSeparator
could be removed if it's not used elsewhere.
@@ -38,6 +38,8 @@ | |||
*/ | |||
public class Rf2FileModifier { | |||
|
|||
private static final Splitter ON = Splitter.on('\t'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename this constant so that it indicates that it's a Splitter
.
@@ -160,6 +160,7 @@ protected void doValidate(String effectiveTime, IProgressMonitor monitor) { | |||
addDefect(DefectType.REFSET_MEMBER_COMPONENT_NOT_EXIST, referencedComponentNotExist); | |||
uuidInvalid.clear(); | |||
referencedComponentNotExist.clear(); | |||
memberDataByUuid.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuidNotUnique
is not cleared here. Do all fields need to be cleared in all validators after checking an effectiveTime layer? If not, please revisit doValidate
methods and check if the appropriate fields are cleared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the field whether it needs to be cleared or not after doValidate()
. uuidNotUnique
seems to be used to report ID uniqueness issues independent of the effective time.
final Collection<String> defects = newHashSet(); | ||
final Collection<String> conceptIdsToInactivate = newHashSet(); | ||
|
||
final Collection<TaxonomyDefect> defects = newArrayListWithExpectedSize(invalidRelationships.size()); | ||
for (final String effectiveTime : invalidRelationships.keySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that ArrayListMultimap
's key set might not preserve the ordering used in processTaxonomy
. (Not sure if we are relying on it here.)
editedConcept.setEffectiveTime(currentRow.getEffectiveTime()); | ||
editedConcept.setReleased(true); | ||
protected void applyRow(Concept component, ConceptRow row, Collection<Concept> componentsToAttach) { | ||
component.setId(row.getId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as in reference set importers, core component importers could also avoid setting the SCTID twice.
|
||
return description; | ||
protected void attach(Collection<Description> componentsToAttach) { | ||
// nothing to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought: it might be quicker to store the concept IDs instead of resolving them in applyRow
, retrieve the actual concepts with a single call here, then use concept.getDescriptions().add(description)
on the appropriate concept. But we'd have to test this somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my original intention with the attach
method, use it everywhere, but I found the current impl. good enough (in terms of performance) so I did not change these. (takes ~1 sec to process 50.000
descriptions/relationships at the moment)
component.setActive(row.isActive()); | ||
component.setModule(getConceptSafe(row.getModuleId(), SnomedRf2Headers.FIELD_MODULE_ID, row.getId())); | ||
|
||
if (component.getSource() == null || !component.getSource().getId().equals(row.getSourceId())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to accept import files where the source ID gets changed? Is the SCT change processor prepared to properly index these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully, no. 😄
|
||
return relationship; | ||
protected void attach(Collection<Relationship> componentsToAttach) { | ||
// nothing to do, we already registered the relationships to the source concepts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same as with descriptions.)
@@ -406,8 +402,7 @@ private ImportAction checkHeaders(final String[] expectedHeader, final String[] | |||
} | |||
|
|||
private ImportAction handlePreImportException(final SuperCSVException e) { | |||
final String reason = null != e.getMessage() ? " Reason: '" + e.getMessage() + "'" : ""; | |||
log("Exception caught while reading release file. Continuing with next row." + reason); | |||
log("Exception caught while reading release file. Continuing with next row. {}", e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the vararg Object
s land in an SLF4J Logger
, we could propagate exceptions as an additional parameter now, to get a stack trace along with the exception message in the log output. There are other opportunities to do so below.
@@ -633,13 +634,13 @@ protected void preCommit(final InternalCDOTransaction transaction) throws Snowow | |||
} | |||
|
|||
private ImportAction checkCommitException(final SnowowlServiceException e) { | |||
log("SNOMED CT import failed. Caught exception while import, aborting. {}", e.getMessage()); | |||
log("SNOMED CT import failed. Caught exception while import, aborting. {}", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The placeholder should be removed from this format string.
importContext.getLogger().warn("Caught exception while import, aborting.", e); | ||
return ImportAction.BREAK; | ||
} | ||
|
||
private ImportAction checkCommitException(final CommitException e) { | ||
log("SNOMED CT import failed. Caught exception while import, aborting. {}", e.getMessage()); | ||
log("SNOMED CT import failed. Caught exception while import, aborting. {}", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also here)
Good to merge otherwise. ✔️ |
Thank you! |
SNOMED CT RF2 importer enhancements, bugfixes: