Sometimes you may want to check if a record is a duplicate based on a relation being the same. E.g. a `ProductSelection` is a duplicate when the `ProductID` is the same.
We don't always want relation objects to be created, so the behaviour should be configurable.
The existing behaviour does not allow duplicate checks based on a relation ID, and relation objects are always created.
## Current duplicate check / loading issues
The current loading approach is:

1. Find (duplicate check) or create a new `$dataobject`.
2. Loop over each record field to find or create and set any relations (known as the first run).
   - Dot notation or a callback can be used to reference the relation object.
   - New relation objects are written, if not already written.
3. Loop over each field and set data on the `$dataobject` using `->update()` (known as the second run).
   - When dot notation is used, a `write()` will be performed on the `$dataobject` and on new relation objects.
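The two runs described above can be sketched as follows (method names and control flow are illustrative, based on the behaviour described here, not the exact `BulkLoader` internals):

```php
// first run: find/create any relations referenced via dot notation
foreach ($record as $field => $value) {
    if (strpos($field, '.') !== false) {
        list($relationName, $relationField) = explode('.', $field);
        $relation = $obj->{$relationName}();      // e.g. Country()
        $relation->{$relationField} = $value;
        $relation->write();                       // relations are always written
        $obj->{$relationName . 'ID'} = $relation->ID;
    }
}

// second run: set plain field data, now that relations exist
foreach ($record as $field => $value) {
    if (strpos($field, '.') === false) {
        $obj->update(array($field => $value));
    }
}
```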
The reasoning for the two-phase (first/second run) approach, from a comment in the code:

```php
// find/create any relations and store them on the object
// we can't combine runs, as other columns might rely on the relation being present
```
## Cyclic dependency prevents simple solution
Relation callbacks are currently run after the duplicate checks. To introduce relation-based duplicate checks, we need to fire the developer-configured relation/dot-notation callbacks, which reside either on a subclass of `BulkLoader` or on a singleton of the relation object's class. This puts us in a chicken-and-egg situation: we want the relation to perform a duplicate check, but the duplicate check runs before the relation callbacks, because callbacks may need to be fired on an existing object.
## `update()` method is not flexible
The `->update()` method provides one fixed behaviour that can't really be manipulated; it is limiting and inflexible.
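For example, a single `update()` call with a dot notation key will create the relation, write it, and write the parent object, with no way to opt out (a simplified illustration of the behaviour described above):

```php
// one call: creates a Country object, writes it, and writes
// $dataobject - none of which is configurable
$dataobject->update(array(
    'Name' => 'joe bloggs',
    'Country.Code' => 'NZ' // dot notation triggers relation creation + write()
));
```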
## Proposed solution
A new approach could be:

1. Loop over fields in the `columnMap`, populating a placeholder DataObject with relation IDs and with fields from record data. Callbacks can transform data or retrieve relation objects.
2. Find existing objects matching the specified `duplicateCheck` fields on the placeholder. Either update an existing object, save the placeholder, or do nothing, depending on configuration. Duplicate checks could span multiple fields, e.g. `ProductID` && `Size`.
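A possible configuration shape for this, modelled loosely on the existing `duplicateChecks` option (the exact property names here are assumptions):

```php
// hypothetical: a record is a duplicate only when both the
// ProductID relation and the Size field match an existing object
$loader->duplicateChecks = array(
    array('ProductID', 'Size')
);

// hypothetical: configure relation handling per column
$loader->transforms = array(
    'Product.Code' => array(
        'link' => true,   // look up an existing relation object
        'create' => false // never create new Product objects
    )
);
```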
Because we loop over the `columnMap` rather than the record itself, we can configure the order in which fields are imported. So if importing one field relies on another, there is no need for the two-phase approach.
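Because import order follows the `columnMap`, a field whose transform depends on another can simply be listed after it. A hypothetical example (the array-with-callback shape is an assumption, not the current `columnMap` format):

```php
$loader->columnMap = array(
    'country' => 'Country.Code', // imported first, populates CountryID
    'code' => array(
        'name' => 'SKU',
        'callback' => function ($value, $obj) {
            // can rely on CountryID already being set by the previous column
            $obj->SKU = $obj->CountryID . '-' . $value;
        }
    )
);
```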
If a `columnMap` is not provided, then the mappable columns need to be scaffolded.
We would need to somehow ensure that callbacks don't try writing the placeholder object, as that could persist DataObjects that should never be persisted.
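One possible safeguard (a sketch of an assumed API, not an existing one) is to flag the placeholder and refuse writes while transforms run:

```php
// hypothetical extension applied to the class being imported
class PlaceholderGuard extends DataExtension
{
    public function onBeforeWrite()
    {
        if ($this->owner->PlaceholderFlag) {
            throw new LogicException(
                'Callbacks must not write the placeholder object'
            );
        }
    }
}
```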
To tidy up the configuration system, I think that all of the callbacks should be anonymous functions (Closures), instead of string names of callback methods on `$obj` and on subclasses of `BulkLoader`.
While the `->update()` function may continue to be used, by the time it is reached it will not contain any dot notation fields that could trigger relation creation. Relation creation will be handled separately, to make it configurable.
Here is some pseudo code demonstrating how data could go through a new process:
```php
<?php
/**
 * process:
 * raw data is extracted using BulkLoaderSource as iterable rows
 * row data is mapped into a standardised form
 * standard form is transformed into a placeholder dataobject
 */

// raw data
$rawdata = "name,age,country
joe bloggs,62,NZ
alice smith,24,AU";

// CSVBulkLoaderSource parses raw data into records
$rows = array(
    array("name" => "joe bloggs", "age" => "62", "country" => "NZ"),
    array("name" => "alice smith", "age" => "24", "country" => "AU")
);

// mapping for getting data into a standard form
// (either hard-coded, or defined by user)
$mapping = array(
    "first name" => "FirstName",
    "last name" => "Surname",
    "name" => "Name",
    "age" => "Age",
    "country" => "Country.Code",
);

// first record after mapping has been performed
$record = array(
    "Name" => "joe bloggs",
    "Age" => "62",
    "Country.Code" => "NZ"
);

// define how data will be transformed
$transforms = array(
    "Name" => array(
        "callback" => function ($value, $obj) {
            $name = explode(" ", $value); // split on space, not ""
            $obj->FirstName = $name[0];
            $obj->Surname = $name[1];
        }
    ),
    "Country.Code" => array(
        "link" => true,   // link up relations
        "create" => false // don't create new relation objects
    )
);

// dataobject record after transformation
$dataobj->record = array(
    "FirstName" => "Joe",
    "Surname" => "Bloggs",
    "CountryID" => 234
);
```