Custom 'path.format' of 'FieldPartitioner' #397
The reason for the `field=value` directory layout is that `FieldPartitioner` always encodes partitions Hive-style; `path.format` has no effect on it.
Yes, as the docs say, `path.format` only applies when partitioning with `TimeBasedPartitioner`; you can look in the source of that partitioner to see how the format string is used.
@Cricket007 Thanks for your reply. In my situation, a lot of downstream application code depends on the HDFS path format. Previously we collected logs through Flume, which sent them to Kafka and HDFS. It would be very helpful if this feature could be added.
I still think Flume (or Filebeat or Fluentbit) is a necessary tool here. Running a JVM to tail a log file is rather unnecessary, in my opinion. Plus, I haven't used the spooldir connector enough to say whether it's a reasonable option for tailing logs... My point is that Flume itself has an HDFS sink, which is one option; otherwise, Hortonworks supports NiFi for this use case, and Cloudera seems to partner with StreamSets. Overall, I'm saying this feature doesn't currently exist, but if you're comfortable with Java, adding JARs to the connector classpath isn't too difficult.
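Since `path.format` doesn't apply to `FieldPartitioner`, the custom-JAR route above means writing your own partitioner. A minimal sketch of the core transformation, assuming you subclass `io.confluent.connect.storage.partitioner.FieldPartitioner` and override `encodePartition` (the helper class and method below are hypothetical names, not part of the connector API):

```java
// Hypothetical helper for a custom partitioner: strips the "field=" prefixes
// that FieldPartitioner emits, turning "type=type1/time=2018-12-13" into
// "type1/2018-12-13". In a real custom partitioner you would subclass
// FieldPartitioner, call super.encodePartition(record), and pass the result
// through this method before returning it.
public class ValueOnlyEncoder {
    public static String stripFieldNames(String encodedPartition) {
        StringBuilder sb = new StringBuilder();
        for (String segment : encodedPartition.split("/")) {
            int eq = segment.indexOf('=');
            // Keep only the value part of each "field=value" segment;
            // segments without '=' are passed through unchanged.
            sb.append(eq >= 0 ? segment.substring(eq + 1) : segment).append('/');
        }
        // Drop the trailing slash.
        if (sb.length() > 0) sb.setLength(sb.length() - 1);
        return sb.toString();
    }
}
```

Package the compiled class into a JAR, drop it on the connector's classpath, and point `partitioner.class` at your subclass.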
My initial idea was to replace Flume with Kafka Connect in order to reduce the maintenance cost of the system, while minimizing the impact of the replacement on the original system. I thought at first that all I had to do was modify the configuration; it now seems that the source code has to be modified to meet this requirement, i.e. by implementing a custom partitioner. Thank you for your patient reply and explanation. 😃 I will close the issue then.
Note: This project is a sink to HDFS; it cannot read your log files, and the FileStreamSinkConnector also doesn't monitor new files or tail them. Perhaps you are looking for this source connector: https://github.com/jcustenborder/kafka-connect-spooldir. Or one of the other tools I mentioned (Filebeat and Fluentbit), which, as I pointed out, are more lightweight than Flume but still require a config file. Or you can try NiFi and MiNiFi, or StreamSets and SDC Edge, for a GUI / WYSIWYG approach to data collection, optionally into Kafka, or directly into HDFS/Hive using those tools.
I'm new to the HDFS connector. In my case, I want to use Kafka for log collection: I'm using the HDFS connector to move my data from Kafka to HDFS, with FieldPartitioner configured on field names.
For example, here is my sample data:
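A record of roughly this shape would match the paths described below (only the `type` and `time` fields follow from the question; the rest is hypothetical):

```json
{"type": "type1", "time": "2018-12-13", "message": "example log line"}
```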
I want to store the records in HDFS under a path of the form "path-to/type/time",
for example "/tmp/type1/2018-12-13".
However, they end up under "/tmp/type=type1/time=2018-12-13".
here is my configuration of HDFS connector:
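A setup along these lines reproduces the behavior described (topic name, HDFS URL, and directories are hypothetical, and the exact `partitioner.class` package varies by connector version):

```properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=logs
hdfs.url=hdfs://namenode:8020
topics.dir=/tmp
flush.size=100
partitioner.class=io.confluent.connect.hdfs.partitioner.FieldPartitioner
partition.field.name=type,time
```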
I tried to use "path.format", but it didn't work. The doc "HDFS Connector Configuration Options" says:
So, is it impossible to set the path format of "FieldPartitioner" through "path.format"?
Is there any easy way to configure the connector to achieve my goal?
Thank you!