Unbounded Log File Growth #9

Closed
cwilliams5 opened this issue Feb 6, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@cwilliams5
Contributor

Describe the bug
Logging file grows unbounded.

To Reproduce
Steps to reproduce the behavior:

  1. Run the script.
  2. Script loops forever.
  3. Logfile grows forever.

Expected behavior
I actually like the script looping (more on that in a future feature request), but I don't expect my disk to fill up forever with a log file. Not sure what the best solution is, but a simple one would be to cap the log file at a certain size, or only keep one loop's worth of data. Just something to stop infinite growth.
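One way to get the "cap the log file at a certain size" behavior is Python's standard `RotatingFileHandler`. This is only a minimal sketch, not the script's actual logging setup: the function name `build_capped_logger`, the logger name, and the size defaults are all assumptions made for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_capped_logger(path, max_bytes=1_000_000, backups=1):
    """Return a logger whose log file is rotated before it reaches max_bytes.

    Hypothetical helper: name, defaults, and format are illustrative only.
    """
    logger = logging.getLogger("capped_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger

    # Drop handlers from earlier calls so lines are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    # The current file is rolled over to "<path>.1" before it would exceed
    # max_bytes; only `backups` old files are kept, so total disk use is
    # bounded by roughly (backups + 1) * max_bytes.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `backups=1` the script would keep the current file plus one rotated file, which is close to the "one loop's worth of data" idea without tying rotation to the loop itself.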

Screenshots
After just 5 minutes or so of looping. That would be 4GB of data a day if left alone.
image

Desktop (please complete the following information):

  • Windows
  • Latest .py direct

Additional context
N/A

@cwilliams5 cwilliams5 added the bug Something isn't working label Feb 6, 2022
@cwilliams5
Contributor Author

You're definitely hitting the disk hard too. One pass through the script is:

image
33k+ IO operations.

My latest comment in #5 suggests an improvement to the way you check against activated_packages to decrease disk access and improve user experience.

But in this bug's context, from a logging perspective, there is just a lot of useless information in the log:
image
Thoughts:

  • Perhaps a setting, defaulted on, to not log debug level messages?
  • If you're going to log activation skips like this, with the other suggested method you could have it be one line item (i.e. 5,432 packages found in activated_packages and skipped)
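The single summary line suggested above could look roughly like this. It is a sketch under assumptions: `split_pending` and the logger name are hypothetical, and the two arguments stand in for whatever lists the script already holds in memory.

```python
import logging

logger = logging.getLogger("skip_summary_example")  # hypothetical logger name


def split_pending(all_packages, activated_packages):
    """Return (pending, skipped_count) after one up-front comparison."""
    activated = set(activated_packages)
    pending = [p for p in all_packages if p not in activated]
    skipped = len(all_packages) - len(pending)
    # One INFO line replaces one DEBUG line per skipped package.
    logger.info(
        "%s packages found in activated_packages and skipped",
        f"{skipped:,}",
    )
    return pending, skipped
```

The per-package skip messages then disappear from the log entirely, and the count is still available for anyone who wants to sanity-check a run.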

@shakeyourbunny

shakeyourbunny commented Feb 6, 2022

Well, I hacked the script for my purposes and added even more (descriptive) stuff in the DEBUG channel.

If you really wanna see less stuff in the log, set the log level in the script to INFO (logging.basicConfig).
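For reference, the log-level change described here is a one-argument edit to `logging.basicConfig`. The sketch below is not the script's own call: the wrapper name `configure_logging`, the file path argument, and the format string are assumptions.

```python
import logging


def configure_logging(path, level=logging.INFO):
    """Send root-logger output to `path`, dropping anything below `level`.

    Hypothetical wrapper; the script's real basicConfig call may differ.
    """
    logging.basicConfig(
        filename=path,
        level=level,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # Python 3.8+: replace any handlers already on the root logger
    )
```

After `configure_logging("script.log")`, a call such as `logging.debug(...)` is discarded while `logging.info(...)` and above are still written, so the per-package DEBUG lines no longer reach the file.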

Why are you dancing around the number of disk accesses? This only makes sense if the script is running on some sort of cheapo flash disk or similar.

Granted, the script can use some more enhancements (like attaching steam tags to each appid to activate only wanted stuff) and shows that the author has not much Python knowledge (has used the example code from the ASF python module as a base), but if you have such a long running script (run time is ~ 330 days on my side ... to activate everything ), you ought to do some dumping to disk and other stuff once in a while.

@shakeyourbunny

If I'm inclined, I'd do a complete rewrite of the script, but currently, I've other things to do.

@cwilliams5
Contributor Author

If you really wanna see less stuff in the log, set the log level in the script to INFO (logging.basicConfig).

Done and works great, TY.

Why are you dancing around the number of disk accesses? This only makes sense if the script is running on some sort of cheapo flash disk or similar.

Because of exactly that: I want to throw it on a Pi and not burn out the SD card. Either way, why work the drive harder than you need to? Opening a file for every item in the array, even those you're skipping, is inefficient and easily improved by comparing the arrays up front. Combine that with not needing debug logging for normal operations and you've made a massive improvement.
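A rough sketch of "comparing the arrays up front": read activated_packages.txt once into a set, filter in memory, and only touch the disk when something is actually activated. The function names are hypothetical, and the one-ID-per-line file layout is an assumption, not something confirmed in this thread.

```python
from pathlib import Path


def load_activated(path):
    """Read the activated list once and return it as a set of package IDs.

    Assumes one ID per line (an assumption about the file format).
    """
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def pending_packages(package_ids, activated):
    """Keep input order, drop anything already activated (no file access)."""
    return [pid for pid in package_ids if pid not in activated]


def record_activation(path, activated, package_id):
    """Append one ID to disk and mirror it in the in-memory set."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{package_id}\n")
    activated.add(package_id)
```

Skipped packages then cost a set lookup instead of a file open, while the file on disk still stays current after every successful activation.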

Granted, the script can use some more enhancements (like attaching steam tags to each appid to activate only wanted stuff) and shows that the author has not much Python knowledge (has used the example code from the ASF python module as a base)

We all start somewhere. I also know little real Python, but my coding days are 10 years behind me. He's listening, improving it, and sharing it publicly so that's good. I couldn't find another working script out there and I wanted to automate this process and not rely on manual steamdb website interactions.

(run time is ~ 330 days on my side ... to activate everything )

That's part of the problem. The current package list @ 50 attempts an hour should be 14 days. Giving what speedup advice I can in #4 and #5.
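The arithmetic behind those two run-time figures can be checked quickly. The package counts below are back-calculated from the 50-attempts-per-hour rate and the day counts quoted in this thread; they are not taken from the actual package list, and the helper names are hypothetical.

```python
ATTEMPTS_PER_HOUR = 50  # rate quoted in this thread


def days_to_attempt(package_count, per_hour=ATTEMPTS_PER_HOUR):
    """Days needed to attempt every package once at a fixed hourly rate."""
    return package_count / (per_hour * 24)


def packages_in(days, per_hour=ATTEMPTS_PER_HOUR):
    """Number of attempts that fit into `days` at a fixed hourly rate."""
    return days * 24 * per_hour


print(packages_in(14))   # 16800
print(packages_in(330))  # 396000
```

So a 14-day estimate corresponds to roughly 16,800 packages, while 330 days at the same rate would be about 396,000 attempts, which suggests the two estimates assume very different list sizes or effective rates.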

but if you have such a long running script (run time is ~ 330 days on my side ... to activate everything ), you ought to do some dumping to disk and other stuff once in a while.

I'm not saying never write to disk, I'm just saying don't write to the disk up to 60,000 times a minute. There is space between the two positions. Keeping activated_packages.txt up to date is obviously good.

If I'm inclined, I'd do a complete rewrite of the script, but currently, I've other things to do.

Understood. Ty for the feedback either way.

@shakeyourbunny

Granted, the script can use some more enhancements (like attaching steam tags to each appid to activate only wanted stuff) and shows that the author has not much Python knowledge (has used the example code from the ASF python module as a base)

We all start somewhere. I also know little real Python, but my coding days are 10 years behind me. He's listening, improving it, and sharing it publicly so that's good. I couldn't find another working script out there and I wanted to automate this process and not rely on manual steamdb website interactions.

I don't blame @Luois45 for that, I also started Python programming after being very upset and disgruntled about a problem, but he shows in his code that he is learning, so no biggie.

(run time is ~ 330 days on my side ... to activate everything )

That's part of the problem. The current package list @ 50 attempts an hour should be 14 days. Giving what speedup advice I can in #4 and #5.

I already put my active steam licenses in activated_packages.txt tho.

Best way to slim that down would be having a correlation to the Steam tags or similar to really cut the unwanted stuff out (like all this f2p, mmo, student projects, video player, VR stuff), but this would need loading (at least the raw) HTML store page and scraping the tags off it.

No, SteamDB is off limits for that and I don't want to get IP banned for that :)

@Luois45
Owner

Luois45 commented Feb 6, 2022

Granted, the script can use some more enhancements (like attaching steam tags to each appid to activate only wanted stuff) and shows that the author has not much Python knowledge (has used the example code from the ASF python module as a base)

We all start somewhere. I also know little real Python, but my coding days are 10 years behind me. He's listening, improving it, and sharing it publicly so that's good. I couldn't find another working script out there and I wanted to automate this process and not rely on manual steamdb website interactions.

I don't blame @Luois45 for that, I also started Python programming after being very upset and disgruntled about a problem, but he shows in his code that he is learning, so no biggie.

I'm not at all upset or disgruntled about any problem at all, but just wanted to create this project to help other people out who do want to use such a script, have fun and learn more along the way.

(run time is ~ 330 days on my side ... to activate everything )

That's part of the problem. The current package list @ 50 attempts an hour should be 14 days. Giving what speedup advice I can in #4 and #5.

I already put my active steam licenses in activated_packages.txt tho.

Best way to slim that down would be having a correlation to the Steam tags or similar to really cut the unwanted stuff out (like all this f2p, mmo, student projects, video player, VR stuff), but this would need loading (at least the raw) HTML store page and scraping the tags off it.

No, SteamDB is off limits for that and I don't want to get IP banned for that :)

You wouldn't have to scrape the raw HTML steam page.
You do receive it in the requests, which are used to create the package_list.txt.
This is an example of the answer received: https://pastebin.com/AUuw0v2Z
@shakeyourbunny If you do want to improve the script to use categories you can do it for sure, but I don't think that feature would be important for most people.

I actually like the script looping (more on that in a future feature request), but I don't expect my disk to fill up forever with a log file. Not sure what the best solution is, but a simple one would be to cap the log file at a certain size, or only keep one loop's worth of data. Just something to stop infinite growth.

I've set the log level so that only messages at level WARNING or higher are logged.

EDIT: corrected typo

@cwilliams5
Contributor Author

Confirming disk access is now reasonable, issue resolved, ty.


3 participants