
Configuration

ServerlessSam edited this page Nov 10, 2022 · 56 revisions

The data file merge (dfm) config file defines a set of rules for how you'd like to merge your data files. The file itself has the following structure:

{
  "SourceFiles": [
    <SourceFile>,
    <SourceFile>,
    <SourceFile>,
    ...
  ],
  "DestinationFile": <DestinationFile>
}

The merging will occur in the order of the SourceFiles list from SourceFiles[0] to SourceFiles[n].
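As an illustration, here is a minimal Python sketch that reads a config with this structure and returns its SourceFiles in merge order. The function name `load_merge_plan` is hypothetical. The sketch only parses the JSON and checks the two top-level keys. It is not dfm's own loader.

```python
import json
from typing import Any, Dict, List


def load_merge_plan(config_text: str) -> List[Dict[str, Any]]:
    """Return the SourceFile objects in the order they would be merged (hypothetical helper)."""
    config = json.loads(config_text)
    # Both top-level keys from the structure above are expected.
    for key in ("SourceFiles", "DestinationFile"):
        if key not in config:
            raise ValueError(f"config is missing required key {key!r}")
    # SourceFiles[0] is merged first and the last entry is merged last.
    return list(config["SourceFiles"])


example = """
{
  "SourceFiles": [
    {"SourceFileLocation": {"Path": "a.json"}, "SourceFileNode": "$", "DestinationFileNode": "$.a"},
    {"SourceFileLocation": {"Path": "b.json"}, "SourceFileNode": "$", "DestinationFileNode": "$.b"}
  ],
  "DestinationFile": {"DestinationFileLocation": {"Path": "out.json"}}
}
"""

for position, source in enumerate(load_merge_plan(example)):
    print(position, source["SourceFileLocation"]["Path"])
```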

SourceFile

A source file object contains all the information needed for dfm to extract data from one or more data files at a given node:

{
  "SourceFileLocation": <FileLocation>,
  "SourceFileNode": <string>,
  "DestinationFileNode": <string>
}

SourceFile.SourceFileLocation

A FileLocation object that points to one or more local source files for the merge.

See FileLocation.

SourceFile.SourceFileNode

A string following json-path dollar-notation syntax (e.g. $.foo.bar). The json-path points to the node to treat as the root node when extracting data from the source file(s), i.e. the value of this node is the content copied to the destination file.
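To make the idea concrete, this Python sketch resolves a simple dollar-notation path against already-parsed data. It assumes the path uses only `$` and dot-separated object keys. Full json-path implementations also support array indices, wildcards and filters, and dfm may rely on such a library. The names `parse_dollar_path` and `get_node` are hypothetical.

```python
from typing import Any, List


def parse_dollar_path(path: str) -> List[str]:
    """Split a simple dollar-notation path such as "$.foo.bar" into keys."""
    if path == "$":
        return []  # "$" alone refers to the whole document
    if not path.startswith("$."):
        raise ValueError(f"expected a path starting with '$', got {path!r}")
    return path[2:].split(".")


def get_node(document: Any, path: str) -> Any:
    """Return the value at `path`, i.e. the content that would be extracted."""
    node = document
    for key in parse_dollar_path(path):
        node = node[key]  # raises KeyError if the node does not exist
    return node


source = {"foo": {"bar": {"x": 1, "y": 2}}}
print(get_node(source, "$.foo.bar"))  # {'x': 1, 'y': 2}
```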

SourceFile.DestinationFileNode

A string following json-path dollar-notation syntax. The json-path points to the node whose value will receive the content merged from the source file(s). See merging logic for details of how the merge occurs.
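The following Python sketch shows how a destination node path can be walked, creating missing parent objects along the way. It assumes simple dot-separated keys. It simply overwrites the target node, so it is a stand-in for placement only and does not reproduce dfm's merging logic. The name `set_node` is hypothetical.

```python
from typing import Any, Dict


def set_node(document: Dict[str, Any], path: str, value: Any) -> Any:
    """Place `value` at a dollar-notation path, creating missing parent objects.

    This overwrites whatever is already at `path`; it is not dfm's merge logic.
    """
    if path == "$":
        return value  # targeting the root replaces the whole document
    if not path.startswith("$."):
        raise ValueError(f"expected a path starting with '$', got {path!r}")
    *parents, last = path[2:].split(".")
    node = document
    for key in parents:
        node = node.setdefault(key, {})  # create intermediate objects as needed
    node[last] = value
    return document


destination: Dict[str, Any] = {"existing": True}
print(set_node(destination, "$.merged.content", [1, 2]))
# {'existing': True, 'merged': {'content': [1, 2]}}
```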

DestinationFile

A destination file object contains all the information needed for dfm to know what file to merge the source data into:

{
  "DestinationFileLocation": <FileLocation>
}

DestinationFile.DestinationFileLocation

A FileLocation object that points to a single destination file for the merge; merging into more than one destination file is not currently supported. If the file exists, the merge preserves the existing data within it (see merging logic). dfm also supports writing to a brand new file, so neither the destination file nor the desired node needs to exist already.
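This Python sketch covers the two cases: the destination file either already exists or does not. It assumes a JSON destination file and an empty object as the starting point for a new file. dfm may handle other data formats or defaults differently. The name `load_destination` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_destination(path: Path) -> Any:
    """Return the destination file's current data, or an empty object if it does not exist yet."""
    if path.exists():
        # Existing data is loaded so the merge can build on it rather than discard it.
        return json.loads(path.read_text())
    return {}  # brand new file: start from an empty document
```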

See FileLocation.

FileLocation

A file location object contains all the information needed to identify one or more files on your local file system:

{
  "Path": <string>,
  "PathSubs": <dict> // (optional)
}

FileLocation.Path

Path is a string containing part of a local path to one or more files, e.g. foo/bar.py. This partial path is appended to the value of the DFM_ROOT_PATH environment variable. You do not need a trailing / in the environment variable's value, nor a leading / in your Path value.
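The joining rule can be sketched in Python with pathlib. This is an assumption about the behaviour described above, not dfm's code. The function name `resolve_path` and its optional `root` argument, which exists only to make testing easy, are hypothetical. The example output is for POSIX systems.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_path(relative: str, root: Optional[str] = None) -> Path:
    """Join a FileLocation.Path onto the DFM_ROOT_PATH directory (hypothetical helper)."""
    if root is None:
        root = os.environ["DFM_ROOT_PATH"]
    # pathlib inserts the separator itself, so neither a trailing "/" on the root
    # nor a leading "/" on the relative path is needed. A leading "/" is stripped
    # here because joining an absolute path would discard the root entirely.
    return Path(root) / relative.lstrip("/")


print(resolve_path("foo/bar.py", "/home/me/data"))   # /home/me/data/foo/bar.py
print(resolve_path("foo/bar.py", "/home/me/data/"))  # /home/me/data/foo/bar.py
```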

The string supports pathlib glob syntax, so the following patterns can also be used (the examples assume DFM_ROOT_PATH=/):

  • Asterisk wildcarding, e.g.:
    • foo/*.py would find /foo/bar.py and /foo/baz.py
    • */bar.py would find /foo/bar.py and /baz/bar.py
  • Double asterisk wildcarding, e.g.:
    • **/bar.py would find /foo/bar.py and /foo/baz/bar.py
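The wildcard examples can be reproduced with Python's `pathlib.Path.glob`, assuming dfm resolves Path patterns with it. A temporary directory stands in for DFM_ROOT_PATH, and the helper name `find_files` is hypothetical. In pathlib, `**` matches zero or more directories, so `**/bar.py` would also match a bar.py directly under the root.

```python
import tempfile
from pathlib import Path
from typing import List


def find_files(root: Path, pattern: str) -> List[str]:
    """Return files under `root` matching a pathlib glob pattern, relative to `root`."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for DFM_ROOT_PATH
    for relative in ("foo/bar.py", "foo/baz.py", "baz/bar.py", "foo/baz/bar.py"):
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    print(find_files(root, "foo/*.py"))   # ['foo/bar.py', 'foo/baz.py']
    print(find_files(root, "*/bar.py"))   # ['baz/bar.py', 'foo/bar.py']
    print(find_files(root, "**/bar.py"))  # ['baz/bar.py', 'foo/bar.py', 'foo/baz/bar.py']
```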

You can also use substitution placeholders, which are replaced before searching for paths. These use the ${<key>} syntax, where <key> corresponds to a key in PathSubs.

For example, with DFM_ROOT_PATH=/, the configuration below would resolve to the path /foo/hello/world.json:

{
  "Path": "foo/${Bar}/${Baz}.json",
  "PathSubs": {
    "Bar": {
      "Type": "Literal",
      "Value": "hello"
    },
    "Baz": {
      "Type": "Literal",
      "Value": "world"
    }
  }
}

(See FileLocation.PathSubs.Type for the meaning of "Type": "Literal".)
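Python's standard `string.Template` uses the same ${key} placeholder syntax, so the substitution step can be sketched with it. The sketch assumes the substitution values have already been resolved, as they are for Literal types. dfm may implement placeholders differently. `Template` also treats a bare $key as a placeholder and $$ as an escaped $. The helper name `substitute_path` is hypothetical.

```python
from string import Template
from typing import Mapping


def substitute_path(path: str, values: Mapping[str, str]) -> str:
    """Replace each ${<key>} placeholder in `path` with values[<key>]."""
    # substitute() raises KeyError if a placeholder has no matching value.
    return Template(path).substitute(values)


print(substitute_path("foo/${Bar}/${Baz}.json", {"Bar": "hello", "Baz": "world"}))
# foo/hello/world.json
```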

FileLocation.PathSubs (Optional)

An object whose keys are the placeholder names to replace in the corresponding Path. The value substituted for each placeholder is determined in the following order:

  1. The initial substitution value is determined using the Type and Value keys.
  2. If the Regex key is present, this value is passed into a regex query and a capture group is extracted.
  3. If the NamingConvention key is present, the resulting value is converted to a new naming convention.

{
  "Sub1": {
    "Type": <string>,
    "Value": <string>,
    "Regex": <Regex>, // (optional)
    "NamingConvention": <string> // (optional)
  },
  "Sub2": ...,
  ...
  "SubN": ...
}

FileLocation.PathSubs.Type

The type of substitution to make. It can be one of the following:

  • Literal: substitutes Value into the string verbatim.
  • Parameter: substitutes the value of the parameter whose name is given by Value. See CLI Usage for more information on parameters.
  • Key (Not supported yet): TBC
  • Content (Not supported yet): TBC
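This Python sketch shows the first substitution step for the two supported types. It assumes parameters arrive as a plain name-to-value mapping. How dfm actually collects them from the CLI is not shown here. The function name `initial_value` and the parameter name `Environment` are hypothetical.

```python
from typing import Mapping


def initial_value(sub: Mapping[str, str], parameters: Mapping[str, str]) -> str:
    """Compute the first substitution step: the value implied by Type and Value."""
    sub_type = sub["Type"]
    if sub_type == "Literal":
        return sub["Value"]  # Value is used verbatim
    if sub_type == "Parameter":
        name = sub["Value"]  # Value names a parameter supplied at run time
        if name not in parameters:
            raise KeyError(f"parameter {name!r} was not supplied")
        return parameters[name]
    if sub_type in ("Key", "Content"):
        raise NotImplementedError(f"substitution type {sub_type!r} is not supported yet")
    raise ValueError(f"unknown substitution type {sub_type!r}")


params = {"Environment": "dev"}  # hypothetical parameters, e.g. collected from the CLI
print(initial_value({"Type": "Literal", "Value": "hello"}, params))          # hello
print(initial_value({"Type": "Parameter", "Value": "Environment"}, params))  # dev
```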

FileLocation.PathSubs.Value

A string whose meaning depends on Type: for Literal it is the text substituted in, and for Parameter it is the name of the parameter whose value is substituted in.

FileLocation.PathSubs.Regex (Optional)

If a Regex key is present, the regex is run against the current candidate substitution value, and the value of a particular named or unnamed capture group is selected.

{
  "Expression": <string>,
  "CaptureGroup": <string|int>
}

FileLocation.PathSubs.Regex.Expression

A regular expression to match against the candidate value. It must contain at least one named or unnamed capture group.

FileLocation.PathSubs.Regex.CaptureGroup

The capture group to extract the value from: either the name (string) of a named capture group or the index (integer) of an unnamed capture group.
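The regex step can be sketched with Python's `re` module. Two details are assumptions here: that the expression is applied with `re.search` rather than anchored matching, and that integer indices follow Python's convention, where 1 is the first unnamed group and 0 is the whole match. The helper name `extract_capture_group` and the sample expression are hypothetical.

```python
import re
from typing import Mapping, Union


def extract_capture_group(value: str, regex: Mapping[str, Union[str, int]]) -> str:
    """Run Regex.Expression against `value` and return the selected capture group."""
    match = re.search(str(regex["Expression"]), value)
    if match is None:
        raise ValueError(f"{regex['Expression']!r} did not match {value!r}")
    # re.Match.group accepts a group name (str) or a group index (int).
    group = match.group(regex["CaptureGroup"])
    if group is None:
        raise ValueError("capture group did not participate in the match")
    return group


spec = {"Expression": r"^(?P<env>\w+)-(.+)$", "CaptureGroup": "env"}
print(extract_capture_group("dev-eu-west-1", spec))  # dev
```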

FileLocation.PathSubs.NamingConvention (Optional)

Converts the value from one established naming convention to another. The value must be of the form FromXToY, where X and Y are chosen from the supported conventions:

  • camelCase
  • PascalCase
  • snake_case
  • UPPERCASE
  • lowercase

e.g. FromCamelToUpper or FromPascalToSnake.
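One way to implement such conversions is to split the value into words according to the source convention, then join those words in the target convention. The Python sketch below takes this approach. The tokens Camel, Pascal, Snake, Upper and Lower are inferred from the examples above and are assumptions, as are the helper names. Since UPPERCASE and lowercase have no word separators, converting out of them yields a single word.

```python
import re
from typing import List

# Hypothetical token names inferred from FromCamelToUpper / FromPascalToSnake.
SUPPORTED = {"Camel", "Pascal", "Snake", "Upper", "Lower"}


def split_words(value: str, convention: str) -> List[str]:
    """Break `value` into lowercase words according to its source convention."""
    if convention == "Snake":
        return [w.lower() for w in value.split("_") if w]
    if convention in ("Camel", "Pascal"):
        # A word is an optional capital followed by lowercase letters/digits,
        # or a run of capitals (an acronym) not followed by a lowercase letter.
        return [w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", value)]
    # UPPERCASE and lowercase carry no word boundaries, so the value is one word.
    return [value.lower()]


def join_words(words: List[str], convention: str) -> str:
    """Assemble lowercase words into the target convention."""
    if convention == "Snake":
        return "_".join(words)
    if convention == "Pascal":
        return "".join(w.capitalize() for w in words)
    if convention == "Camel":
        return (words[0] + "".join(w.capitalize() for w in words[1:])) if words else ""
    if convention == "Upper":
        return "".join(words).upper()
    return "".join(words)  # Lower


def convert_naming(value: str, spec: str) -> str:
    """Apply a FromXToY naming-convention conversion."""
    match = re.fullmatch(r"From([A-Z][a-z]+)To([A-Z][a-z]+)", spec)
    if match is None or not {match.group(1), match.group(2)} <= SUPPORTED:
        raise ValueError(f"unsupported NamingConvention {spec!r}")
    return join_words(split_words(value, match.group(1)), match.group(2))


print(convert_naming("helloWorld", "FromCamelToSnake"))  # hello_world
```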


Examples

See the dfm-examples repo