@ABScripts
Owner

Implement tail-n

* 3. In each chunk, search for newline characters (`\n`).
* 4. Accumulate the relevant text into a final buffer that will contain the last N lines.
*/
fn get_last_n_lines(filename: &String, n: u64) -> String {

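FYI: it is almost always preferable to take &str instead of &String as a parameter. A &String coerces to &str at the call site, and &str also accepts string literals and slices.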
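
To make the point above concrete, here is a minimal sketch. `describe` is a hypothetical helper used only for illustration, not the function from this PR:

```rust
// Hypothetical helper: taking `&str` lets callers pass a `&String`,
// a string literal, or any other string slice.
fn describe(filename: &str) -> String {
    format!("tailing {filename}")
}
```

Both `describe(&owned_string)` and `describe("log.txt")` compile against this signature, whereas a `&String` parameter would reject the literal.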

*/
fn get_last_n_lines(filename: &String, n: u64) -> String {
let mut file = match File::open(filename) {
Err(why) => panic!("Failed to open {}: {}", &filename, why),

FYI: formatting arguments can be inlined (just my personal preference, you can choose however you like it):

panic!("Failed to open {filename}: {why}")


file.seek(SeekFrom::End(0)).unwrap();

let mut file_size = file.stream_position().unwrap();

you can get file_size from the line above: seek returns the new position, so the separate stream_position call is not needed.
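
A small sketch of that idea, assuming any type that implements `Seek`. The name `size_via_seek` is made up for this example:

```rust
use std::io::{self, Seek, SeekFrom};

// Hypothetical helper: `seek` returns the new offset measured from the
// start of the stream, so seeking to `End(0)` yields the total size.
fn size_via_seek<S: Seek>(stream: &mut S) -> io::Result<u64> {
    stream.seek(SeekFrom::End(0))
}
```

In the PR this would collapse the two lines into something like `let mut file_size = file.seek(SeekFrom::End(0)).unwrap();`.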

// adjust buf to the desired amount of data to read (max READ_BUFFER_SIZE)
// the size of this slice automatically lets 'read_exact' know how many bytes to read
let buf_slice = &mut buf[..read_size as usize];
match file.read_exact(buf_slice) {

it is better to use if let when match is used only for a single case:

if let Err(e) = file.read_exact(buf_slice) {
    let pos = file.stream_position();
    panic!("Couldn't read data from position {pos:?}, err: {e}");
}

PS: we haven't talked about pattern matching yet, so treat this as a preview.

// Sometimes it can be less than its max size as we may have already found all sentences in this batch
let mut text_slice = &buf_slice[..];
for (pos, ch) in buf_slice.iter().rev().enumerate() {
if matches!(ch, b'\n') {

matches! is mostly used for "complex" pattern matching. you probably want to use *ch == b'\n' here.

FYI: you can also look at .iter().rposition(...)
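
A minimal sketch of the `rposition` variant. `last_newline` is a hypothetical helper name. Note that `rposition` searches from the end but returns the index counted from the front, unlike the `pos` produced by `.rev().enumerate()`:

```rust
// Hypothetical helper: index (from the start of `chunk`) of the last
// newline byte, or `None` if the chunk contains no newline.
fn last_newline(chunk: &[u8]) -> Option<usize> {
    chunk.iter().rposition(|&b| b == b'\n')
}
```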

}

// get backwards once again as file pointer was moved by previous 'read_exact' call
file_size = file.seek(SeekFrom::Current(-(read_size as i64))).unwrap();

the double seek can be avoided by offsetting from the start/end instead of the current position
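
One possible shape for this, as a sketch only: track the absolute offset yourself and do a single `SeekFrom::Start` per chunk. `read_chunk_before` and its parameters are hypothetical names, not code from the PR:

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical helper: read up to `max` bytes that end at absolute offset
// `end`. Returns the chunk's start offset, which is the next call's `end`.
fn read_chunk_before<R: Read + Seek>(
    src: &mut R,
    end: u64,
    max: u64,
) -> io::Result<(u64, Vec<u8>)> {
    let len = end.min(max);
    let start = end - len;
    // One absolute seek per chunk; no need to seek back after reading.
    src.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; len as usize];
    src.read_exact(&mut buf)?;
    Ok((start, buf))
}
```

Because the position is computed rather than derived from where the last read left the cursor, the corrective seek after `read_exact` disappears.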

// NOTE: Seems that we can also pass &text_slice here - no difference?
if let Ok(str) = str::from_utf8(text_slice) {
// TODO: It doesn't seem to be efficient...
tailed_output = format!("{}{}", str, tailed_output);

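yes, this is not efficient: format! allocates a new String and copies everything on every iteration. String::insert_str is a little better because it can reuse the existing buffer, but it still shifts the existing bytes on each call.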
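
A sketch of both options. `prepend_in_place` shows the `insert_str` suggestion; `join_backwards` is an alternative not mentioned in the review, which collects the pieces first and appends them in reverse order. Both function names are hypothetical:

```rust
// The reviewer's suggestion: prepend into the existing String.
// Still O(len) per call, because the existing bytes are shifted.
fn prepend_in_place(acc: &mut String, chunk: &str) {
    acc.insert_str(0, chunk);
}

// Alternative (an assumption of this sketch): `chunks_newest_first` holds
// the pieces in the order a backwards scan finds them, end of file first.
fn join_backwards(chunks_newest_first: &[&str]) -> String {
    let total: usize = chunks_newest_first.iter().map(|c| c.len()).sum();
    let mut out = String::with_capacity(total);
    // Walk the pieces in reverse so each one is appended, never inserted.
    for chunk in chunks_newest_first.iter().rev() {
        out.push_str(chunk);
    }
    out
}
```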


[dependencies]
clap = { version = "4.5.50", features = ["derive"] }
io = "0.0.2"

probably not needed: std::io is part of the standard library, so the io crate dependency can likely be dropped.

}

// NOTE: Seems that we can also pass &text_slice here - no difference?
if let Ok(str) = str::from_utf8(text_slice) {

this is also not correct when working with UTF-8. A chunk might start or end in the middle of a UTF-8 code point, and str::from_utf8 will return an error here.

UTF-8 code unit: 1 byte
UTF-8 code point: 1-4 bytes
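
One way to sidestep the problem, sketched under the assumption that the raw bytes are accumulated first and decoded once at the end. `decode_after_joining` is a hypothetical helper, not part of the PR:

```rust
use std::string::FromUtf8Error;

// Hypothetical helper: `chunks_newest_first` are raw byte chunks in the
// order a backwards scan reads them (end of file first).
fn decode_after_joining(chunks_newest_first: &[&[u8]]) -> Result<String, FromUtf8Error> {
    let mut bytes = Vec::new();
    // Restore file order before decoding, so a code point split across
    // two chunks is whole again when `from_utf8` sees it.
    for chunk in chunks_newest_first.iter().rev() {
        bytes.extend_from_slice(chunk);
    }
    String::from_utf8(bytes)
}
```

Splitting exactly at a `b'\n'` byte is always safe, since `0x0A` never occurs inside a multi-byte UTF-8 sequence; the failure only appears when the cut falls at an arbitrary buffer boundary.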

3 participants