As described in the documentation:
When using dynamic/multi writes, one can also specify a formatting of the value returned by the field. Out of the box, elasticsearch-hadoop provides formatting for date/timestamp fields, which is useful for automatically grouping time-based data (such as logs) within a certain time range under the same index. By using the Java SimpleDateFormat syntax, one can format and parse the date in a locale-sensitive manner.
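To make the date formatting concrete, here is a minimal sketch of how a timestamp could be rendered with a SimpleDateFormat pattern and appended to a resource name. The class `DateSuffixSketch` and its `resolve` method are hypothetical names used only for illustration; they are not part of elasticsearch-hadoop, and the fixed UTC time zone and `Locale.ROOT` are assumptions made so the output is deterministic.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration; not elasticsearch-hadoop code.
final class DateSuffixSketch {

    // Formats `value` with the given SimpleDateFormat pattern and appends
    // the result to `prefix`.
    static String resolve(String prefix, String dateFormat, Date value) {
        SimpleDateFormat formatter = new SimpleDateFormat(dateFormat, Locale.ROOT);
        // Assumption: format in UTC so the result does not depend on the host.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return prefix + formatter.format(value);
    }

    public static void main(String[] args) {
        Date time = Date.from(
                LocalDate.of(2017, 5, 15).atStartOfDay(ZoneOffset.UTC).toInstant());
        // prints "index/type-2017.05.15"
        System.out.println(resolve("index/type-", "YYYY.MM.dd", time));
    }
}
```

Note that in SimpleDateFormat the letter `Y` denotes the week year while `y` denotes the calendar year; the two can differ for dates close to a year boundary.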
What this boils down to is that you can use a pattern like index/type-{time:YYYY.MM.dd} to denote that the time field should be formatted as YYYY.MM.dd and inserted into the type name. This works fine when used in the type name, but in 6.0 indices will, by default, allow only one type. Which leads us to the problem:
Cascading and Pig break when this formatting is applied in the index part of the name. There is a lot of logic in both Cascading and Pig that expects the Tap or Load function to refer to resources that can be resolved on HDFS. Generally this is not a problem, since the paths are processed and do not contain any host address, but when the pattern is specified, the colon trips up the path parsing and causes the code to throw an exception.
There's really no way to change this behavior since much of the code lives on Cascading and Pig's side, and when the input classes are queried they must respond with an absolute path to the resource. In Cascading's case this is low impact since we could just return a placeholder and be on our way, but Pig requires the resource to be real, as it will later pass the resource to the load function for loading, and modifying it in any way would cause the load function to not receive the index and type correctly.
To get around this we will need to change the separator in the parsing of formatted patterns. In 6.0 only the pipe character "|" will be accepted, but in 5.x we will continue to accept colon ":" as well as the new | character going forward, with a deprecation warning about the former when we encounter it.
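The transition described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FormatSeparatorSketch`, `split`, the `acceptColon` flag, and the `WARNINGS` list are hypothetical names, not the actual elasticsearch-hadoop parser, and throwing an exception for a colon in the strict (6.0-style) mode is an assumed behavior rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the separator handling; not elasticsearch-hadoop code.
final class FormatSeparatorSketch {

    // Stand-in for a logger so deprecation warnings can be inspected.
    static final List<String> WARNINGS = new ArrayList<>();

    // Splits the text found between '{' and '}' into { fieldName, format }.
    // The format is null when no separator is present.
    // acceptColon = true models the 5.x behavior, false models 6.0.
    static String[] split(String inner, boolean acceptColon) {
        int pipe = inner.indexOf('|');
        int colon = inner.indexOf(':');
        // Whichever separator appears first wins, so a format such as
        // "HH:mm" can still follow a pipe.
        boolean colonFirst = colon >= 0 && (pipe < 0 || colon < pipe);
        if (colonFirst) {
            if (!acceptColon) {
                // Assumption: strict mode rejects the old separator outright.
                throw new IllegalArgumentException(
                        "':' is not accepted as a format separator, use '|': " + inner);
            }
            WARNINGS.add("Deprecated ':' format separator in {" + inner + "}, use '|'");
            return new String[] { inner.substring(0, colon), inner.substring(colon + 1) };
        }
        if (pipe >= 0) {
            return new String[] { inner.substring(0, pipe), inner.substring(pipe + 1) };
        }
        return new String[] { inner, null };
    }

    public static void main(String[] args) {
        String[] parts = split("time|YYYY.MM.dd", false);
        System.out.println(parts[0] + " -> " + parts[1]); // prints "time -> YYYY.MM.dd"
    }
}
```

With `acceptColon` set to true, `split("time:YYYY.MM.dd", true)` returns the same field name and format as the pipe form while recording one deprecation warning.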
… format.
Colon character gets in the way when some frameworks attempt to fit the input resource
into a Path for HDFS (even though they eventually never use it as an HDFS path). The
break in parsing causes the jobs to fail when using this format.
Applying fix to the format separator in the index pattern.
relates #985