return "pick up where you left off" feature back to search #68

Closed
skilfullycurled opened this issue May 6, 2015 · 4 comments

Comments

@skilfullycurled

It's been a few months since I used Twarc and upon returning to it, I see that the feature of "picking up where you left off" has been moved to archive.py. If I'm understanding the reasoning behind this, it's that moving it elsewhere leaves room for people to send their data where they want to.

That makes sense, but I cannot think of an instance in which I would not want this feature. If the program shuts down for any reason, you will need it. Why would I use the --search option when archive.py contains all the features of search, with the added benefit of being able to start where I left off should the program fail? It should be noted that I do not think of all things, so reasons may exist.

My limited experience aside, the only other places I think one would want to write data are either a database or a program that does something with the data first and then passes it on. While that might be preferable to a single json file, what I love about twarc is its simplicity. It writes to a json file, and you can manage the json file afterwards. Using twarc in any other capacity would take enough rewriting on the part of the user that it's not worth losing this as a default feature. I think it would be preferable to have a flag that sets the output to a different location than to assume by default that you want it to go somewhere else.
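To make "picking up where you left off" concrete, here is a minimal sketch of the general idea, not the actual archive.py implementation. It assumes the output file holds one JSON tweet per line and that each tweet has an integer `id` field; the helper name `last_tweet_id` is hypothetical.

```python
import json


def last_tweet_id(path):
    """Return the largest tweet id in a line-oriented JSON file, or None.

    Hypothetical helper: assumes one JSON object per line, each with an
    integer "id" field.
    """
    newest = None
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    tweet = json.loads(line)
                except ValueError:
                    # A crash can leave a truncated final line; skip it.
                    continue
                if not isinstance(tweet, dict):
                    continue
                tweet_id = tweet.get("id")
                if isinstance(tweet_id, int) and (newest is None or tweet_id > newest):
                    newest = tweet_id
    except FileNotFoundError:
        # No earlier run, so there is nothing to resume from.
        return None
    return newest
```

The returned id could then serve as the lower bound (Twitter's `since_id` search parameter) for the next search, so only newer tweets are fetched.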

@edsu
Member

edsu commented May 6, 2015

The problem is that the "pick up" feature didn't really work well once the --stream option was added. The code also got a bit complicated and tangled, and I needed to simplify it. I am sorry if this doesn't fit your use case. Please give the archive utility a try.

@edsu closed this as completed May 6, 2015
@skilfullycurled
Author

Ah I see. I didn't realize it was an architectural issue that was final. The archive.py file indicates otherwise.

"This functionality was initially part of twarc.py itself (not in a utility). If it proves useful perhaps it can go back in. But for now twarc.py writes to stdout to let you manage your data the way you want to."

I am happily using the archive.py function. That fits my use case perfectly, but my larger point was that I can't think of a use case where I would use --search given that archive.py is search plus. Again, I do not think of all things so my assertion is suspect.

Thanks, Ed!

@remagio

remagio commented May 7, 2015

Related to this issue, though not about bringing back past features: I would suggest adding a tricky feature, probably useful for --search, --stream and archive.py alike. Exit the process when twarc outputs "no new tweets…", and catch Python errors and do the same. I found cases where it doesn't exit to the shell.
This way supervisor (or other Linux system tools for running daemons) could be used. There is still some trouble catching errors coming from Twitter, you know.
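A rough sketch of what this could look like, assuming a hypothetical `fetch` callable that runs one collection pass and returns the number of new tweets. Neither `fetch` nor `run_once` is part of twarc. The wrapper always returns an exit status, so the process ends cleanly and a tool like supervisor can decide whether to restart it.

```python
import sys


def run_once(fetch):
    """Run one collection pass and map the outcome to an exit status.

    `fetch` is a hypothetical callable returning the number of new tweets.
    Returns 0 on success (including "no new tweets") and 1 on any error.
    """
    try:
        count = fetch()
    except Exception as exc:
        # Report the failure and signal it to the process supervisor.
        print("error: %s" % exc, file=sys.stderr)
        return 1
    if count == 0:
        print("no new tweets", file=sys.stderr)
    return 0
```

A script would finish with `sys.exit(run_once(fetch))`, so the shell or supervisor sees the status code.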

@edsu
Copy link
Member

edsu commented May 7, 2015

I believe this problem of not exiting was fixed already. Update to the latest version, and please file a bug report if you notice a specific problem.
