Add relation duplicate checks and creation strategies #11

jedateach · 2015-03-31T21:33:37Z

Sometimes you may want to treat a record as a duplicate when one of its relations is the same. For example, a ProductSelection is a duplicate when its ProductID matches that of an existing ProductSelection.

We don't always want relation objects to be created, so the behaviour should be configurable.

The existing behaviour does not allow duplicate checks based on a relation ID, and relation objects are always created.

Current duplicate check / loading issues

The current loading approach is:

1. Find (via duplicate check) or create a new $dataobject.
2. Loop over each record field to find or create and set any relations (known as the first run).
   - Dot notation or a callback can be used to reference a relation object.
   - Write new relation objects, if not already written.
3. Loop over each field and set data on the $dataobject using ->update() (known as the second run).
   - When dot notation is used, a write() will be performed on the $dataobject and new relation objects.
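The load order above can be modelled roughly as follows. This is only a simplified sketch using plain arrays in place of DataObjects, not the module's actual code: `loadRecord`, `$checkFields`, and `$relationIds` are hypothetical names made up for illustration. It shows why a duplicate check on a relation ID cannot match: the ID is not known until after the check has run.

```php
<?php
// Simplified model of the current load order, using arrays in place of
// DataObjects. Every name here is hypothetical.
function loadRecord(array $record, array $existing, array $checkFields, array $relationIds)
{
    // 1. Duplicate check: only plain record fields are available at this point.
    $obj = array();
    $matched = false;
    foreach ($existing as $candidate) {
        foreach ($checkFields as $field) {
            if (isset($record[$field], $candidate[$field])
                && $record[$field] == $candidate[$field]) {
                $obj = $candidate;
                $matched = true;
                break 2;
            }
        }
    }

    // 2. First run: resolve dot notation fields such as "Product.Title".
    foreach ($record as $field => $value) {
        if (strpos($field, ".") !== false) {
            list($relation) = explode(".", $field, 2);
            if (isset($relationIds[$value])) {
                $obj[$relation . "ID"] = $relationIds[$value];
            }
        }
    }

    // 3. Second run: copy the remaining plain fields onto the object.
    foreach ($record as $field => $value) {
        if (strpos($field, ".") === false) {
            $obj[$field] = $value;
        }
    }

    return array($obj, $matched);
}

$existing = array(array("ID" => 1, "ProductID" => 10, "Size" => "L"));
$record = array("Product.Title" => "Widget", "Size" => "M");

// "ProductID" is configured as a duplicate check field, but the raw record has
// no such key when step 1 runs, so the existing row is never matched.
list($obj, $matched) = loadRecord($record, $existing, array("ProductID"), array("Widget" => 10));
```

Here `$matched` ends up false even though the resolved ProductID equals that of the existing row, so a second record would be created.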

Reasoning for the two phase (first/second run) approach:

//find/create any relations and store them on the object
//we can't combine runs, as other columns might rely on the relation being present

Cyclic dependency prevents simple solution

Relation callbacks are currently run after the duplicate checks. To introduce relation-based duplicate checks, we need to fire the developer-configured relation / dot notation callbacks, which reside either on a subclass of BulkLoader or on a singleton of the relation object class. This puts us in a chicken-and-egg situation: we want the relation to be available so that it can be used in the duplicate check, but the duplicate check runs before the relation callbacks, because the callbacks may need to be fired on an existing object.

Update() method is not flexible

The ->update() method has fixed behaviour that can't really be customised, which makes it limiting and inflexible.

Proposed solution

A new approach could be:

1. Loop over the fields in columnMap, populating a placeholder DataObject with relation IDs and fields from the record data. Callbacks can transform data or retrieve relation objects.
2. Find existing objects matching the specified duplicateCheck fields on the placeholder. Then either update an existing object, save the placeholder, or do nothing, depending on configuration. Duplicate checks could be on multiple fields, e.g. ProductID && Size.
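A multi-field duplicate check on the placeholder might look something like the sketch below. It uses plain arrays rather than DataObjects, and `findDuplicate`, `resolve`, and the `"update"` / `"skip"` strategy names are all hypothetical, chosen only to illustrate the "update, save, or do nothing" choice.

```php
<?php
// Hypothetical helper: find an existing record that matches the placeholder
// on ALL of the given fields (e.g. ProductID && Size).
function findDuplicate(array $placeholder, array $existing, array $checkFields)
{
    if (count($checkFields) === 0) {
        return null; // no check fields configured, so nothing is a duplicate
    }
    foreach ($existing as $candidate) {
        $allMatch = true;
        foreach ($checkFields as $field) {
            if (!array_key_exists($field, $placeholder)
                || !array_key_exists($field, $candidate)
                || $placeholder[$field] != $candidate[$field]) {
                $allMatch = false;
                break;
            }
        }
        if ($allMatch) {
            return $candidate;
        }
    }
    return null;
}

// Hypothetical strategies: create when there is no duplicate, otherwise
// either update the existing record or skip the incoming one.
function resolve(array $placeholder, array $existing, array $checkFields, $onDuplicate)
{
    $duplicate = findDuplicate($placeholder, $existing, $checkFields);
    if ($duplicate === null) {
        return array("action" => "create", "data" => $placeholder);
    }
    if ($onDuplicate === "update") {
        // Placeholder values overwrite the existing ones; the ID is kept.
        return array("action" => "update", "data" => array_merge($duplicate, $placeholder));
    }
    return array("action" => "skip", "data" => $duplicate);
}

$existing = array(array("ID" => 1, "ProductID" => 10, "Size" => "L", "Quantity" => 1));

$same = array("ProductID" => 10, "Size" => "L", "Quantity" => 3);
$updated = resolve($same, $existing, array("ProductID", "Size"), "update");

$different = array("ProductID" => 10, "Size" => "M", "Quantity" => 2);
$created = resolve($different, $existing, array("ProductID", "Size"), "update");
```

Because the check runs on the placeholder, ProductID is already populated by the time it is compared.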

Because we loop over the columnMap, rather than the record itself, we can configure the order in which fields are imported. So if importing one field relies on another, there is no need for the two-phase approach.
If a columnMap is not provided, then the mappable columns need to be scaffolded.
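As an illustration of how the column map could drive import order, here is a minimal sketch. The `phone` and `country` columns and the prefix logic are invented for the example; only the idea of iterating over the map rather than the record comes from the proposal.

```php
<?php
// Hypothetical column map: the order of the keys decides the import order.
$columnMap = array(
    "country" => "Country",
    "phone"   => "Phone",
);

// Hypothetical transform: the "Phone" callback relies on "Country" being set.
$transforms = array(
    "Phone" => function ($value, $obj) {
        $prefix = ($obj->Country === "NZ") ? "+64" : "";
        $obj->Phone = $prefix . ltrim($value, "0");
    },
);

// The raw row lists phone first, but the loop follows the column map instead.
$row = array("phone" => "021555000", "country" => "NZ");

$placeholder = new stdClass();
foreach ($columnMap as $column => $field) {
    if (!array_key_exists($column, $row)) {
        continue; // column not present in this row
    }
    if (isset($transforms[$field])) {
        $transforms[$field]($row[$column], $placeholder);
    } else {
        $placeholder->$field = $row[$column];
    }
}
```

Since `Country` is mapped before `Phone`, the callback can read it in a single pass.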

We would need to somehow ensure that callbacks don't try to write the placeholder object. Writing it could persist DataObjects that should never be persisted.
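One possible way to guard against this, sketched below, is to hand callbacks a wrapper whose write() refuses to run. `PlaceholderRecord` is a hypothetical class, not part of the module or of SilverStripe, and a real implementation would need to work with actual DataObjects.

```php
<?php
// Hypothetical placeholder wrapper: holds field values but refuses to be written.
class PlaceholderRecord
{
    private $fields = array();

    public function __set($name, $value)
    {
        $this->fields[$name] = $value;
    }

    public function __get($name)
    {
        return isset($this->fields[$name]) ? $this->fields[$name] : null;
    }

    public function __isset($name)
    {
        return isset($this->fields[$name]);
    }

    public function toArray()
    {
        return $this->fields;
    }

    public function write()
    {
        throw new LogicException("Placeholder records must not be written by callbacks");
    }
}

$placeholder = new PlaceholderRecord();

// A misbehaving callback that tries to persist the placeholder.
$callback = function ($value, $obj) {
    $obj->Name = $value;
    $obj->write();
};

try {
    $callback("joe bloggs", $placeholder);
    $blocked = false;
} catch (LogicException $e) {
    $blocked = true;
}
```

The field assignment still succeeds, but the write attempt is stopped before anything could be persisted.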

To tidy up the configuration system, I think that all of the callbacks should be anonymous functions (Closures), instead of string names of functions on $obj and on subclasses of BulkLoader.
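To make the difference concrete, here is a rough comparison of the two styles. `ExampleLoader`, `splitName`, and `runCallback` are hypothetical names used only for this sketch; they are not the module's API.

```php
<?php
// String style: the loader has to look the method up by name at run time.
class ExampleLoader
{
    public function splitName($value, $obj)
    {
        $parts = explode(" ", $value, 2);
        $obj->FirstName = $parts[0];
        $obj->Surname = isset($parts[1]) ? $parts[1] : "";
    }
}
$stringConfig = array("Name" => array("callback" => "splitName"));

// Closure style: the behaviour lives next to the configuration that uses it.
$closureConfig = array(
    "Name" => array(
        "callback" => function ($value, $obj) {
            $parts = explode(" ", $value, 2);
            $obj->FirstName = $parts[0];
            $obj->Surname = isset($parts[1]) ? $parts[1] : "";
        },
    ),
);

// Hypothetical dispatcher showing the extra lookup that string names require.
function runCallback($callback, $value, $obj, $loader)
{
    if ($callback instanceof Closure) {
        $callback($value, $obj);
    } elseif (is_string($callback) && method_exists($loader, $callback)) {
        $loader->$callback($value, $obj);
    } else {
        throw new InvalidArgumentException("Unsupported callback");
    }
}

$fromString = new stdClass();
runCallback($stringConfig["Name"]["callback"], "joe bloggs", $fromString, new ExampleLoader());

$fromClosure = new stdClass();
runCallback($closureConfig["Name"]["callback"], "joe bloggs", $fromClosure, new ExampleLoader());
```

Both calls produce the same fields; with Closures only, the string branch and the method lookup could be dropped.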

Whilst the ->update() function may continue to be used, by the time it is reached, the data passed to it will no longer contain any dot notation fields that could trigger relation creation. Relation creation will be handled separately so that it is configurable.

Here is some pseudo code demonstrating how data could go through a new process:

<?php

/**
 * process:
 * raw data is extracted using BulkLoaderSource as iterable rows
 * row data is mapped into a standardised form
 * standard form is transformed into a placeholder dataobject
 */

//raw data
$rawdata = "name,age,country
            joe bloggs,62,NZ
            alice smith,24,AU
            ";

//CSVBulkLoaderSource parses raw into records
$rows = array(
    array("name" => "joe bloggs", "age" => "62", "country" => "NZ"),
    array("name" => "alice smith", "age" => "24", "country" => "AU")
);

//mapping for getting data into a standard form
//(either hard-coded, or defined by user)
$mapping = array(
    "first name" => "FirstName",
    "last name" => "Surname",
    "name" => "Name",
    "age" => "Age",
    "country" => "Country.Code",
);

//first record after mapping has been performed
$record = array(
    "Name" => "joe bloggs",
    "Age" => "62",
    "Country.Code" => "NZ"
);

//define how data will be transformed
$transforms = array(
    "Name" => array(
        'callback' => function($value, $obj){
           //capitalise each word, then split into first name and surname
           $name = explode(" ", ucwords($value));
           $obj->FirstName = $name[0];
           $obj->Surname = $name[1];
        }
    ),
    "Country.Code" => array(
        "link" => true, //link up relations
        "create" => false //don't create new relation objects
    )
);

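//dataobject record after transformation
//(234 is an example ID for the existing Country with Code "NZ")
$dataobj = new stdClass();
$dataobj->record = array(
    "FirstName" => "Joe",
    "Surname" => "Bloggs",
    "Age" => "62",
    "CountryID" => 234
);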
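For reference, the pseudo code above could be turned into something runnable along these lines. This is only a sketch under several assumptions: `mapRow`, `transformRecord`, and the `$relationIds` lookup are hypothetical stand-ins for the real loader and database, the `ucwords` call is added so that the output is capitalised like the example result, and the `"create" => true` branch is left out.

```php
<?php
// Hypothetical lookup standing in for existing relation records:
// relation name => (lookup value => ID). 234 is the example ID used above.
$relationIds = array("Country" => array("NZ" => 234));

$mapping = array(
    "name" => "Name",
    "age" => "Age",
    "country" => "Country.Code",
);

$transforms = array(
    "Name" => array(
        "callback" => function ($value, $obj) {
            $name = explode(" ", ucwords($value), 2);
            $obj->FirstName = $name[0];
            $obj->Surname = isset($name[1]) ? $name[1] : "";
        },
    ),
    "Country.Code" => array("link" => true, "create" => false),
);

// Rename raw column names to standard field names; unmapped columns are dropped.
function mapRow(array $row, array $mapping)
{
    $record = array();
    foreach ($row as $column => $value) {
        if (isset($mapping[$column])) {
            $record[$mapping[$column]] = $value;
        }
    }
    return $record;
}

// Apply callbacks and relation linking, returning the placeholder's fields.
function transformRecord(array $record, array $transforms, array $relationIds)
{
    $obj = new stdClass();
    foreach ($record as $field => $value) {
        $options = isset($transforms[$field]) ? $transforms[$field] : array();
        if (isset($options["callback"])) {
            $options["callback"]($value, $obj);
        } elseif (strpos($field, ".") !== false) {
            list($relation) = explode(".", $field, 2);
            if (!empty($options["link"]) && isset($relationIds[$relation][$value])) {
                $obj->{$relation . "ID"} = $relationIds[$relation][$value];
            }
            // With "create" => false, unknown relations are left unlinked.
        } else {
            $obj->$field = $value;
        }
    }
    return (array) $obj;
}

$rows = array(
    array("name" => "joe bloggs", "age" => "62", "country" => "NZ"),
    array("name" => "alice smith", "age" => "24", "country" => "AU"),
);

$results = array();
foreach ($rows as $row) {
    $results[] = transformRecord(mapRow($row, $mapping), $transforms, $relationIds);
}
```

In this sketch the first row is linked to the known "NZ" country, while the second row gets no CountryID because "AU" is not in the lookup and creation is switched off.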

jedateach · 2015-04-08T04:58:30Z

This has been added in 0.2.0

jedateach added this to the 0.1.x milestone Mar 31, 2015

jedateach added enhancement help wanted labels Apr 1, 2015

jedateach changed the title ~~Allow duplicate checks on relation ID~~ Relation duplicate checks and creation strategies Apr 1, 2015

jedateach changed the title ~~Relation duplicate checks and creation strategies~~ Add relation duplicate checks and creation strategies Apr 1, 2015

jedateach added a commit that referenced this issue Apr 2, 2015

Initial work on getting relation duplicates to work

8cef50d

Relates to #11

jedateach closed this as completed Apr 8, 2015
