
File connector repeatedly reads the same file if it is not deleted #13301

@Haaroon

Description


Describe the bug

When using the file connector with an input directory, a file filter and keepFile = true, the connector re-reads the file every time it finishes reading it, regardless of whether the file was modified. With keepFile = false this does not happen, because the connector deletes the file after the first read.

To Reproduce
Steps to reproduce the behavior:

  1. First, run Pulsar standalone and create a local file source connector that reads the file, using the following config (any.yaml). Note that any.csv can be any file.
configs:
    inputDirectory: "/tmp/"
    recurse: false
    keepFile: true
    fileFilter: "any.csv"
  2. Run the connector with the following command:
pulsar-admin sources localrun \
--archive connectors/pulsar-io-file-2.8.1.nar \
--name spout \
--destination-topic-name file-raw \
--source-config-file any.yaml
  3. Check msgIn in the topic stats:
./pulsar-admin topics stats file-raw | grep msgIn

You will see that msgIn exceeds the number of lines in the CSV file. If you repeat the experiment with keepFile: false in the YAML, msgIn does not exceed the number of lines in the CSV file.

Expected behavior

The file connector should read the file once, not repeatedly.
If the file has changed, the connector should either re-read it or continue where it left off, and this behavior should be configurable.
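The behavior requested above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not code from pulsar-io-file: the class `ProcessedFileTracker` and the `reprocessModified` flag are assumptions standing in for the "config set" mentioned above. It remembers each file that has been read in full together with its last-modified time, so an unchanged file is never queued again, and a modified file is queued again only when the flag is enabled. Resuming "where it left off" would additionally need a stored read offset, which this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of pulsar-io-file: remembers every file that
// has already been read in full, together with its last-modified time.
class ProcessedFileTracker {
    // Assumed config flag: when true, a file whose modification time has
    // advanced since it was last read is queued again.
    private final boolean reprocessModified;

    // Thread-safe, because a directory-listing thread and a reading thread
    // may both access it.
    private final Map<Path, FileTime> processed = new ConcurrentHashMap<>();

    ProcessedFileTracker(boolean reprocessModified) {
        this.reprocessModified = reprocessModified;
    }

    // Returns true if the file has never been read, or if it was modified
    // since the last read and reprocessing of modified files is enabled.
    boolean shouldProcess(Path file, FileTime lastModified) {
        FileTime previous = processed.get(file);
        if (previous == null) {
            return true;
        }
        return reprocessModified && lastModified.compareTo(previous) > 0;
    }

    // Same check, reading the modification time from the file system.
    boolean shouldProcess(Path file) throws IOException {
        return shouldProcess(file, Files.getLastModifiedTime(file));
    }

    // Call once every line of the file has been emitted.
    void markProcessed(Path file, FileTime lastModified) {
        processed.put(file, lastModified);
    }
}
```

In this sketch the listing pass would call `shouldProcess` before queuing a file when keepFile = true, and the reader would call `markProcessed` after emitting the last line. The state is kept only in memory, so a restarted connector would read every file once more; persisting it is left out here.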

Desktop (please complete the following information):

  • OS: macOS (latest); Pulsar 2.8.1 (latest release).
    The issue is still present in the latest Pulsar build, because the file connector's source code has not changed in years.

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
