Resolved issue #15 and added true iterator for efficient reading of records #17

Merged
4 commits merged into ContinuumIO:master on Sep 26, 2015

thammegowda
Contributor

  • Resolves issue #15, "sequence_reader.slice() is skipping the first record in sequence file". slice() now returns the correct records.
  • Adds RecordIterator, a true Java iterator, for reading records efficiently. Repeated calls to slice() incur heavy disk seek overhead, because each call has to skip records up to its start position.
  • Adds the SequenceReaderTest test case.
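
To illustrate the overhead described in the second bullet, here is a minimal sketch in plain Python. It does not use the nutchpy or Hadoop APIs: `FakeSequenceFile`, `paged_by_slice`, and `paged_by_iterator` are hypothetical names, and the `reads` counter is only an assumed stand-in for disk work. It compares paging through a forward-only file with repeated slice-style calls against a single pass with one iterator.

```python
from itertools import islice


class FakeSequenceFile:
    """Hypothetical stand-in for a file that can only be read front to back."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0  # every record touched, including skipped ones

    def scan(self):
        """Start a fresh front-to-back pass over the records."""
        for record in self._records:
            self.reads += 1
            yield record


def paged_by_slice(source, page_size):
    """Fetch pages with slice-style calls; each call restarts from record 0."""
    pages, start = [], 0
    while True:
        # A new scan per call: records before `start` are read and discarded.
        page = list(islice(source.scan(), start, start + page_size))
        if not page:
            return pages
        pages.append(page)
        start += page_size


def paged_by_iterator(source, page_size):
    """Fetch pages from one iterator that remembers its position."""
    iterator = source.scan()
    pages = []
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return pages
        pages.append(page)


if __name__ == "__main__":
    by_slice = FakeSequenceFile(range(8))
    by_iterator = FakeSequenceFile(range(8))
    assert paged_by_slice(by_slice, 2) == paged_by_iterator(by_iterator, 2)
    print("records touched via repeated slices:", by_slice.reads)
    print("records touched via one iterator:", by_iterator.reads)
```

Both approaches return the same pages, but the slice-style version touches each early record once per call, while the single iterator touches each record once in total.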

@chrismattmann
Collaborator

Great work, thanks @thammegowda

chrismattmann added a commit that referenced this pull request Sep 26, 2015
Resolved issue #15 and added true iterator for efficient reading of records
chrismattmann merged commit c9fb0b9 into ContinuumIO:master on Sep 26, 2015
@@ -10,7 +10,7 @@

data = nutchpy.sequence_reader.slice(5,20,path)
# print(data)
-assert len(data) == 2
+assert len(data) == 3
Contributor

Why is a slice from 5 to 20 yielding a length of 3?

Contributor Author

The example data has 8 records, indexed 0 to 7, so slice(5, 20) stops at the end of the file and returns the last three records: 5, 6, and 7.
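
The same clamping behaviour can be sketched with an ordinary Python list. This is an analogy only and does not call nutchpy: `clamped_slice` is a hypothetical helper, and the integers 0 to 7 stand in for the eight example records.

```python
def clamped_slice(records, start, stop):
    """Return records[start:stop], with a stop past the end clamped to the end."""
    # Python list slicing already clamps out-of-range bounds instead of raising.
    return records[start:stop]


example_records = list(range(8))  # stand-in for the 8 example records, 0..7

data = clamped_slice(example_records, 5, 20)
assert data == [5, 6, 7]
assert len(data) == 3
```

Under this reading, the expected length is the number of records from the start index to the end of the file, not the width of the requested range.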

@ahmadia
Contributor

ahmadia commented Sep 26, 2015

This looks really good. Thanks @thammegowda

@thammegowda
Contributor Author

👍
