What happened?
I am running a Selenium instance with a Firefox driver to scrape and download many files from a website. This runs as a Python Flask web service inside a Docker container.
I discovered that my container would scrape a few pages before it began to hit its memory limits and needed a restart. I used Python's default profiler to investigate where the memory allocation was growing and discovered that the process's handle to the log file continued to grow with each execution. This was especially surprising given that I was pointing the logging service to /dev/null.
I was able to resolve this in my code by manually closing the file handler prior to calling driver.quit(). I think it would be best if driver.quit() closed this handler internally.
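As an aside, one way to confirm this kind of per-execution growth is the standard-library tracemalloc module rather than the profiler. The sketch below is illustrative only: scrape_once is a hypothetical stand-in for the real scraping code, not part of this report.

```python
import tracemalloc


def scrape_once():
    # Hypothetical placeholder for one scrape-and-download cycle.
    pass


tracemalloc.start()
for i in range(3):
    scrape_once()
    current, peak = tracemalloc.get_traced_memory()
    # If a handle is leaking, "current" keeps climbing across iterations.
    print(f"iteration {i}: current={current} bytes, peak={peak} bytes")
tracemalloc.stop()
```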
How can we reproduce the issue?
# This was a wrapper I created to ensure that the log file handler closes when
# I am finished with the driver. If you remove the line that closes the handler
# and run this driver instance against a site multiple times, you'll observe
# that the handler eats up more and more space. If you don't have a lot of
# memory, you may also observe that each execution gets slower.

import os

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service


class DriverManager:
    """Wraps a Selenium driver instance.

    Attrs:
        download_path (str): the folder that driver downloads will be placed inside
        firefox_exe_path (str): the path to the Firefox executable
        gecko_driver_exe_path (str): the path to the GeckoDriver executable
    """

    def __init__(
        self,
        download_path: str,
        firefox_exe_path: str,
        gecko_driver_exe_path: str,
    ):
        self.download_path = download_path

        # Set up the Firefox webdriver
        service = Service(executable_path=gecko_driver_exe_path, log_path=os.devnull)
        options = Options()
        options.headless = True
        options.binary = firefox_exe_path
        options.set_preference("browser.download.folderList", 2)
        options.set_preference("browser.download.manager.showWhenStarting", False)
        options.set_preference("browser.download.dir", download_path)
        options.set_preference("download.prompt_for_download", False)
        options.set_preference(
            "browser.helperApps.neverAsk.saveToDisk", "application/pdf"
        )
        options.set_preference("pdfjs.disabled", True)
        options.set_capability("marionette", True)
        self.driver = Firefox(options=options, service=service)

    def __enter__(self) -> Firefox:
        return self.driver

    def __exit__(self, exception_type, exception_val, trace):
        # Close the log file handler manually to fix the memory leak
        self.driver.binary._log_file.close()
        self.driver.quit()
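The workaround generalizes: any resource the library leaves open can be wrapped so that __exit__ always closes it. Since running the wrapper above requires a real Firefox install, here is a minimal self-contained sketch of the same pattern, with the driver's log handle stood in for by an ordinary file handle (the class name and paths are illustrative, not from Selenium):

```python
import os


class LogHandleManager:
    """Opens a log handle on entry and guarantees it is closed on exit,
    mirroring the manual close in DriverManager.__exit__."""

    def __init__(self, path: str = os.devnull):
        # Stand-in for the handle Selenium keeps in driver.binary._log_file.
        self._log_file = open(path, "w")

    def __enter__(self):
        return self._log_file

    def __exit__(self, exception_type, exception_val, trace):
        # Close the handle explicitly, just as the workaround closes
        # driver.binary._log_file before driver.quit().
        self._log_file.close()


with LogHandleManager() as log:
    log.write("discarded\n")
# After the with-block the handle is closed, even if the body raised.
```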
Relevant log output
I wish I'd kept the profiler output, but I don't have it anymore.
Operating System
Debian Buster
Selenium version
Python 4.1.3
What are the browser(s) and version(s) where you see this issue?
Firefox 102.0.1
What are the browser driver(s) and version(s) where you see this issue?
GeckoDriver v0.31.0
Are you using Selenium Grid?
No response
@acbilson, thank you for creating this issue. We will troubleshoot it as soon as we can.
It was a few months back that I ran into this, so my memory is a little fuzzy. While looking at the configuration I just remembered that I had added harakiri mode to my uWSGI config. It's possible that the open log file handler was actually keeping the main process alive, and that the problem was not only a leaking handle to the log file.
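For context, harakiri mode is enabled with a single setting in the uWSGI config; the timeout value below is illustrative, not the one from my deployment.

```ini
[uwsgi]
; Recycle any worker whose request runs longer than 30 seconds.
harakiri = 30
```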