Skip to content

Latest commit

 

History

History
175 lines (108 loc) · 7.93 KB

README.md

File metadata and controls

175 lines (108 loc) · 7.93 KB
title
Data Importer

Torq IT Data Importer Extensions

This extension adds a number of additional features to the Pimcore Data Importer bundle.

Path Syntax

A number of our extensions make use of the Path syntax that allows for Paths to be created based on values in the import.

Take for example an Excel File:

Year Make Model Color
2015 GMC Sierra White
2001 Chevrolet Silverado Blue

To build the Path /Products/Cars/GMC/Sierra/2015 using Path Syntax would be /Products/Cars/$[1]/$[2]/$[0]. The numerical values correspond to the indexes of the values in the Excel file (starting at 0).

For an XML file:

<Cars>
    <Car>
        <Make>GMC</Make>
        <Model>Sierra</Model>
        <Year>2015</Year>
        <Color>White</Color>
    </Car>
    <Car>
        <Make>Chevrolet</Make>
        <Model>Silverado</Model>
        <Year>2001</Year>
        <Color>Blue</Color>
    </Car>
</Cars>

the Path Syntax would use the Attribute names instead /Products/Cars/$[Make]/$[Model]/$[Year]

Data Interpreters

Data Interpreters are the supported "File Formats" that the Data Importer bundle can use. We've added a few of our own.

Advanced XLSX Interpreter

The Advanced XLSX interpreter makes a few improvements over the default XLSX interpreter.

This interpreter uses openspout as the Excel parser. Open Spout XLSX parsing uses much less memory than the default XLSX parses which makes use of PHPOffice. We've seen files that required >4GB RAM on PHPOffice use less than 50MB with openspout. We've also detected a memory leak in some cases with the PHPOffice implementation where RAM gets allocated on the server and never released.

Configuration Option Description
Unique Column Indexes Accepts a comma separated list of column indexes to treat as unique values. Used to filter the rows in an excel file. For example an excel file with the headers Brand,Model,SubModel and you want to import a unique Brand object for each new Brand you encounter. In this case, use value 0 to only take unique values from the first column in the Excel file. If you want to create a data object for each Brand and Model use 0,1 as the value.
Row Filter This accepts a Symfony Expression to be applied to the rows of the Excel file. Each row in the Excel file get's handed to the expression evaluator as a variable named row. The expression row[0] == 'Apple' would only process rows where the value of the first column is Apple.

Bulk XLSX Interpreter

This Bulk XLSX Interpreter has the same options as the Advanced XLSX Interpreter. The main difference is that the Excel file gets converted to a CSV and loaded into the Data Importer queue using LOAD LOCAL INFILE. This VERY DRASTICALLY improves the performance of loading the queue table. We've seen 200K rows loaded in <5s. Our experience with a 16GB RAM server shows that Excel files over 30K rows often are not imported successfully by the default XLSX Interpreter.

This Feature Requires the Database Server to be configured to permit local infile / infile permissions!

See MySQL Documentation regarding LOCAL INFILE.

Also in your database connection you'll need to add the Bulk option (1001:true) in example:

doctrine:
    dbal:
        connections:
            default:
                host: "%env(string:DATABASE_HOST)%"
                port: 3306
                user: "%env(string:DATABASE_USER)%"
                password: "%env(string:DATABASE_PASSWORD)%"
                dbname: "%env(string:DATABASE_NAME)%"
                mapping_types: { enum: string, bit: boolean }
                server_version: "5.5.5-10.4.22-MariaDB-1:10.4.22+maria~focal"
                options:
                    1001: true

Bulk CSV Interpreter

The Bulk CSV Interpreter has the same options as the regular CSV interpreter but like the Bulk XLSX Interpreter it uses LOAD LOCAL INFILE to queue data rows. Please see the Bulk XLSX Interpreter Section for limitions and requirments.

SQL Interpreter

This Interpreter is to be used when using the SQL Data Loader. Behind the scenes this uses the Bulk CSV Interpreter as it is very fast. If you run into errors please see the Bulk XLSX Interpreter Section for limitions and requirments.

XML Schema Based Preview Interpreter

This Interpreter is an expansion upon the default XML based interpreter that will load all fields defined by the provided Xsd file for use in the preview screen.

Data Loaders

SQL Data Loader

The SQL Data Loader uses DBAL to allow data to be loaded from a SQL source. Connections to any database supported by DBAL will work provided they are configured correctly inside of database.yaml. (Database configuration can be placed in any valid Symfony config file, provided its in the correct format as can be seen in database.yaml).

To set up a SQL loader

  1. Create a new connection in database.yaml or if using the Pimcore database skip this step. SQL Loader Configuration
  2. Select the correct connection from the Connection Name dropdown
  3. Provide a valid query using the Select, where, from, Group By, and Limit fields.
  4. Ensure to select SQL under File Format! This loader produces a CSV file as part of loading the SQL.

Data Targets

Data Targets control where data flows as its being mapped to Data Objects.

Advanced Classification Store

This is the same as the Classification Store Data Target except it adds the Overwrite options as seen on the Direct Data Target.

Image Gallery Appender

This can be used to add an image into an Image Gallery field.

Property

This is used to set a property on a Data Object.

Operators

Constants

This operator simply returns a constant string. Useful if wanting to control OBJECT_TYPE object or variant.

SafeKey

This ensure that a value is cleaned to be a valid Key value.

Import Asset Advanced

This allows two additional pieces of functionality when importing an asset:

Path Uses the Path Syntax described above to store the asset in a specified folder.

URL Property Specifies the name of the property on the asset to store the source URL the asset was captured from.

Arithmetic

This allows you to apply addition, subtraction, multiplication, or division with a defined constant and your value.

Regex Replace

This allows you to do string replaces using preg_replace

Element Loading

Advanced Path Strategy

This allows loading objects using the Path syntax described earlier in this ReadMe.

Using the example Excel file in the Path section you could load the Data Object at /Products/Cars/GMC/Sierra/2015 using Path Syntax /Products/Cars/$[1]/$[2]/$[0].

Property

This allows a data object to be loaded based on the value of a property stored on it.

This assumes that the property value is unique. If a non-unique value exists, it'll be a random object returned that matches the criteria.

Element Creation

Advanced Parent Strategy

This allows locating objects using the Path syntax described earlier in this ReadMe.

Using the example Excel file in the Path section you could create a Data Object with parent /Products/Cars/GMC/Sierra/2015 using Path Syntax /Products/Cars/$[1]/$[2]/$[0].