Description
Describe the bug
When using the file connector with a directory, a file name, and keepFile = true, the connector simply re-reads the file as soon as it finishes reading it, regardless of whether the file was modified. With keepFile = false this does not happen, because the connector deletes the file after the first read.
To Reproduce
Steps to reproduce the behavior:
- First, run Pulsar standalone and create a local file source connector using the following config (any.yaml). Note that any.csv can be any file.
configs:
  inputDirectory: "/tmp/"
  recurse: false
  keepFile: true
  fileFilter: "any.csv"
- Run the connector with the following command:
pulsar-admin sources localrun \
--archive connectors/pulsar-io-file-2.8.1.nar \
--name spout \
--destination-topic-name file-raw \
--source-config-file any.yaml
- Check the msgIn value in the topic stats:
./pulsar-admin topics stats file-raw | grep msgIn
You will see that msgIn exceeds the number of lines in the CSV file. If you repeat the experiment with keepFile: false in the YAML, msgIn will not exceed the number of lines in the CSV file.
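To make the comparison above less manual, here is a minimal Python sketch that takes the JSON printed by the topic stats command and reports how many messages exceed the file's line count. The function name `duplicate_messages` is hypothetical, and the sketch assumes the stats JSON contains a cumulative `msgInCounter` field and that the connector emits one message per line; adjust the key if your stats output differs.

```python
import json


def duplicate_messages(stats_json: str, csv_path: str) -> int:
    """Return how many more messages the topic received than the file has lines."""
    stats = json.loads(stats_json)
    # Assumption: the stats JSON exposes a cumulative "msgInCounter" field.
    msg_in = int(stats["msgInCounter"])
    # Assumption: the connector publishes one message per line of the file.
    with open(csv_path, encoding="utf-8") as f:
        line_count = sum(1 for _ in f)
    # Anything above the line count points at the file being read again.
    return max(0, msg_in - line_count)
```

A result greater than zero with keepFile: true, and zero with keepFile: false, would match the behavior described in this report.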
Expected behavior
The file connector should not repeatedly read the file; it should read it once.
If the file has changed, it should either re-read it or continue where it left off, and which of the two it does should be a config setting.
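As an illustration of the behavior requested here, the following is a minimal Python sketch, not the connector's actual code (the connector itself is written in Java). `FileTracker` and the `reread_modified` flag are hypothetical names; the sketch assumes that modification time plus size is a good enough fingerprint to detect a changed file.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FileTracker:
    """Hypothetical helper: decides whether a kept file should be read again."""

    # Hypothetical config flag: when False, each file is read only once.
    reread_modified: bool = False
    # Maps path -> (mtime_ns, size) recorded when the file was last read.
    seen: dict = field(default_factory=dict)

    def should_read(self, path: str) -> bool:
        st = os.stat(path)
        fingerprint = (st.st_mtime_ns, st.st_size)
        previous = self.seen.get(path)
        if previous is None:
            # Never read before: always process it.
            self.seen[path] = fingerprint
            return True
        if self.reread_modified and previous != fingerprint:
            # Changed since the last read, and re-reading is enabled.
            self.seen[path] = fingerprint
            return True
        # Unchanged, or re-reading is disabled: skip the file.
        return False
```

Continuing where the previous read left off would additionally require storing a per-file offset, which this sketch leaves out.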
Desktop (please complete the following information):
- OS: macOS (latest), Pulsar 2.8.1 (latest release).
This issue is still in the latest build of pulsar because the source code for the file connector has not changed in years.