Append to existing export if one exists #485

Closed
rikai opened this issue Jan 25, 2021 · 4 comments

rikai commented Jan 25, 2021

Hello,

I'm not too sure how viable this would be for certain export formats like HTML, but as someone who uses the TXT export, it would be nice if, rather than pulling the entire date range, the exporter could check the date of the existing export and append only the new messages from where it left off.

I'm not sure exactly how this would be implemented; maybe some sort of tracking of the last exported message for a channel at the top of the file?

I know that for many use cases, creating a new text file for each export would be the solution. However, in my use case, being able to avoid that would save a lot of time and would also help minimize requests.

If this feature already exists, I apologize; I was unable to locate it when poking around.

andrewkolos (Contributor) commented Jan 25, 2021

@Tyrrrz Has this been requested in the past? I feel like it has, but that might be a false memory. If it hasn't, please add your thoughts.

@rikai, hello and thanks for bringing your issue forward! Outside of the inconvenience (minor, as I currently see it), why not just perform a second export by setting the start time (--after option) to the time of the last message from the previous export? If this is something you do often and you are handy with programming, you could write a script that stitches the two files together.
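The stitching step could look something like the minimal Python sketch below. It assumes both exports are plain-text files produced in chronological order and that every export begins with the same fixed number of header lines; `stitch_exports` and its `header_lines` parameter are hypothetical names, not part of the exporter. Each file is streamed line by line, so large exports are never loaded into memory at once.

```python
def stitch_exports(paths, out_path, header_lines=0):
    """Concatenate plain-text exports into one file, in the order given.

    header_lines (hypothetical) is how many leading lines to drop from every
    file after the first, so the combined output keeps only one header.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for index, path in enumerate(paths):
            with open(path, "r", encoding="utf-8") as src:
                last = "\n"
                for line_no, line in enumerate(src):
                    if index > 0 and line_no < header_lines:
                        continue  # skip the repeated header of later files
                    out.write(line)
                    last = line
                # Ensure the next file's content starts on a fresh line.
                if not last.endswith("\n"):
                    out.write("\n")
```

For example, `stitch_exports(["old.txt", "new.txt"], "combined.txt", header_lines=5)` would keep only the first file's header; the file names and the value 5 are placeholders, so check your own exports for the actual header length.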

Tyrrrz (Owner) commented Jan 25, 2021

Yes, it has been requested before. The complexity of a feature like this is significant, as we would need to be able to parse the output in order to determine where it ended. Currently we avoid a myriad of problems by not doing that, including the freedom to change the format without worrying about it (with the exception of JSON). On top of that, if this feature is added, it will need to be supported across all formats, not just one or a few of them.

The suggested alternative is, like @andrewkolos said, to do timeboxed exports. This will result in multiple files instead of one, but when exporting large channels this is what you'll end up wanting to do anyway.
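Planning timeboxed exports mostly comes down to splitting the overall date range into consecutive windows. The helper below is a hypothetical sketch, not part of the exporter: it assumes you would pass each window's start as the --after value and its end as an upper bound, if your version of the exporter offers one (this thread only mentions --after). Whether the bounds are inclusive is up to the exporter, so it is worth checking that messages falling exactly on a boundary are neither duplicated nor skipped.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Split [start, end) into consecutive (window_start, window_end) pairs.

    Hypothetical helper; the final window is clipped so it never passes `end`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    current = start
    while current < end:
        upper = min(current + step, end)
        windows.append((current, upper))
        current = upper
    return windows


# Weekly windows for January 1-25, 2021: four windows, the last one clipped.
for lo, hi in time_windows(datetime(2021, 1, 1), datetime(2021, 1, 25), timedelta(days=7)):
    print(lo.isoformat(), "->", hi.isoformat())
```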

As it stands now, there is very little chance that this feature will be implemented, sorry.

Possibly related or duplicate issues:

#32 #295 #444 #476

rikai (Author) commented Feb 2, 2021

To explain why I was looking for a feature like this, it essentially boils down to attempting to fit it into an existing workflow. I am using the logs to generate statistics, and the software I use expects a single file as an input.

I'm unfortunately not too handy with programming, which is why I didn't write a script to do it myself, but I do understand enough to appreciate the level of complexity here. I was just hoping it might be avoided by keeping some sort of note of the last parsed message ID, or some such.
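Remembering where the previous run stopped is the easy part, and it does not even require touching the export itself. A minimal sketch, assuming a hypothetical JSON sidecar file (for example `export-state.json`) that maps each channel ID to the ID of the last exported message; the function names are illustrative. This only tells a script where to resume. It does not solve the harder problem the maintainer describes in the next reply, which is writing new content into an existing export.

```python
import json
from pathlib import Path


def load_last_ids(state_path):
    """Return {channel_id: last_message_id} from the sidecar, or {} if it doesn't exist."""
    path = Path(state_path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_last_id(state_path, channel_id, message_id):
    """Record the ID of the last message exported for a channel (hypothetical sidecar)."""
    state = load_last_ids(state_path)
    # IDs are stored as strings so large numeric IDs survive JSON round-trips unchanged.
    state[str(channel_id)] = str(message_id)
    tmp = Path(state_path).with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    # Write to a temporary file first, then swap it in, so a crash mid-write
    # doesn't leave a half-written state file behind.
    tmp.replace(state_path)
```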

That said, while disappointing, I fully understand the inability to add such a feature! Thanks for your consideration.

Tyrrrz (Owner) commented Feb 8, 2021

Yes, it's possible to include some hidden metadata for the last exported message, at least in some formats (though not in plain text or CSV). However, that's not the biggest issue.

The main issue is that we would have to parse the output to determine the point from which to continue writing. For example, in HTML we'd have to find the div that contains the last message and start inserting content right after it, inside the body; in JSON we'd have to do a similar thing to make sure we write between the right braces. Now imagine if at some point we decide to change the structure slightly. This is further complicated by the fact that the export output may be huge, and it's impractical to load it all into memory at once. Additionally, the implementations we use to write HTML and JSON don't support inserting content in the middle of an existing stream, so we'd have to apply some black magic to do it.
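To make "write between the right braces" concrete, here is a minimal sketch of appending items to a JSON array in place without loading the whole file. It assumes, hypothetically, that the array being extended is the last array in the file, so the file ends with `]` followed only by closing braces or other non-array values; it reads just the last few kilobytes to locate that `]`. The function name and `tail_bytes` parameter are illustrative only. It also shows how brittle the approach is: if a format change ever put another array after the one being extended, this code would silently append to the wrong one.

```python
import json
import os


def append_to_json_array_tail(path, new_items, tail_bytes=4096):
    """Insert new_items at the end of the last JSON array in the file, in place.

    Hypothetical sketch: assumes the last ']' within the final `tail_bytes`
    bytes closes the array to extend.
    """
    payload = b",".join(json.dumps(item).encode("utf-8") for item in new_items)
    if not payload:
        return  # nothing to append
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        start = max(0, size - tail_bytes)
        f.seek(start)
        tail = f.read()
        close = tail.rfind(b"]")
        if close == -1:
            raise ValueError("no closing ']' found near the end of the file")
        # If the character before ']' is '[', the array is empty and needs no comma.
        is_empty = tail[:close].rstrip().endswith(b"[")
        prefix = b"" if is_empty else b","
        rest = tail[close:]  # the ']' plus whatever closes the outer structure
        f.seek(start + close)
        f.write(prefix + payload + rest)
        f.truncate()
```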

I understand that this feature could be very useful for some use cases, but ultimately it looks horrifyingly difficult to implement and maintain. I think, in your case, it may be easier to modify the software to allow it to aggregate data from multiple files, instead of just one.
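On the statistics side, many aggregates can be computed per file and then summed, so timeboxed exports never need to be merged. A minimal sketch, using word frequency as a stand-in for whatever the statistics software actually computes; `aggregate_word_counts` and the `*.txt` file pattern are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path


def aggregate_word_counts(directory, pattern="*.txt"):
    """Sum word frequencies over every file in `directory` matching `pattern`.

    Files are streamed line by line, so no single export has to fit in memory,
    and since Counter totals simply add up, file order doesn't matter.
    """
    totals = Counter()
    for path in Path(directory).glob(pattern):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Case-insensitive, whitespace-separated words; a real tool would
                # likely strip timestamps and author names first.
                totals.update(word.lower() for word in line.split())
    return totals
```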
