[pulsar-io-hdfs2] Add config to create subdirectory from current time #7771

Merged
merged 4 commits into from
Aug 11, 2020


BewareMyPower
Contributor

Motivation

Adding a subdirectory named after the current time will make it easier to process HDFS files in batches.

For example, a user can run multiple sink instances with the pattern yyyy-MM-dd-hh and stop all of them at the start of the next hour. The files in that subdirectory will then contain all messages consumed during that hour.
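A minimal sketch of how such a pattern could turn into a directory name, assuming the pattern is interpreted with `java.time.format.DateTimeFormatter` (the description does not say which formatter the sink uses). The class `SubdirectorySketch`, the method `resolve`, and the base directory `/pulsar/out` are hypothetical names used only for illustration. Under `DateTimeFormatter` rules, `hh` is the 12-hour clock and `HH` is the 24-hour clock, so the two patterns give different subdirectories in the afternoon.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not the connector's actual code.
class SubdirectorySketch {

    // Formats `now` with `pattern` and appends the result to `baseDir`.
    static String resolve(String baseDir, String pattern, LocalDateTime now) {
        String subdirectory = now.format(DateTimeFormatter.ofPattern(pattern));
        String separator = baseDir.endsWith("/") ? "" : "/";
        return baseDir + separator + subdirectory;
    }

    public static void main(String[] args) {
        // A fixed timestamp keeps the output reproducible: 11 Aug 2020, 15:30.
        LocalDateTime afternoon = LocalDateTime.of(2020, 8, 11, 15, 30);

        // 12-hour clock: 15:30 is formatted as hour "03".
        System.out.println(resolve("/pulsar/out", "yyyy-MM-dd-hh", afternoon));
        // prints /pulsar/out/2020-08-11-03

        // 24-hour clock: 15:30 is formatted as hour "15".
        System.out.println(resolve("/pulsar/out", "yyyy-MM-dd-HH", afternoon));
        // prints /pulsar/out/2020-08-11-15
    }
}
```

If this assumption holds, a pattern with `hh` would place files written at 03:00 and at 15:00 in the same subdirectory, so `HH` may be the safer choice when each hour should get its own directory.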

Modifications

  • Add a subdirectoryPattern field to HdfsSinkConfig
  • Update some simple tests for HdfsSinkConfig
  • Update the doc of HDFS2 sink
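The new field could plausibly be checked when the configuration is loaded, so that a malformed pattern fails fast instead of at the first write. The sketch below is one way to do that under the assumption that the pattern follows `java.time.format.DateTimeFormatter` syntax; `SubdirectoryPatternCheck` and `validate` are hypothetical names, and this is not taken from the actual `HdfsSinkConfig` code.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical validation helper; the real HdfsSinkConfig may differ.
class SubdirectoryPatternCheck {

    // Rejects patterns that cannot be parsed or cannot format a local date-time.
    static void validate(String subdirectoryPattern) {
        if (subdirectoryPattern == null) {
            return; // Assumption: the field is optional, so null means "no subdirectory".
        }
        try {
            // ofPattern rejects malformed patterns; format rejects fields that
            // a LocalDateTime cannot supply (for example a time-zone ID).
            LocalDateTime.of(2020, 1, 1, 12, 0)
                    .format(DateTimeFormatter.ofPattern(subdirectoryPattern));
        } catch (IllegalArgumentException | DateTimeException e) {
            throw new IllegalArgumentException(
                    "Invalid subdirectoryPattern '" + subdirectoryPattern + "': " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        validate("yyyy-MM-dd-HH"); // accepted
        try {
            validate("yyyy-{MM}"); // '{' is a reserved pattern character
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Formatting a fixed sample timestamp, rather than only parsing the pattern, also catches patterns that are syntactically valid but ask for information a local date-time does not carry.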

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

@BewareMyPower
Contributor Author

/pulsarbot run-failure-checks

@sijie sijie added this to the 2.7.0 milestone Aug 11, 2020
@sijie sijie merged commit 569b8f9 into apache:master Aug 11, 2020
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
…apache#7771)

lbenc135 pushed a commit to lbenc135/pulsar that referenced this pull request Sep 5, 2020
…apache#7771)

@BewareMyPower BewareMyPower deleted the hdfs2-dev branch May 24, 2021 10:13

3 participants