Skip to content

Conversation

bonigarcia
Copy link
Member

@bonigarcia bonigarcia commented Jul 12, 2023

Description

This PR implements the first feature for automated browser management in Selenium Manager, concretely for the latest version of Chrome for Testing (CfT).

The general procedure of this feature is as follows. When the user (e.g., the bindings) invokes Selenium Manager, currently, only the driver is managed (i.e., downloaded and cached). With this PR, when managing Chrome, the browser binary will also be managed (if it is not present in the system). Moreover, this PR includes the flag --force-browser-download to force downloading of the browser (even in the -usual- case that Chrome is already installed). I created a Gist document to showcase the most representative cases.

Different things have changed with PR and deserve attention and discussion. Comments are welcome. Maybe the most relevant change is the output of Selenium Manager. So far, when using the JSON output, the field result.message contained the driver path. With this PR, this is no longer convenient since Selenium Manager can return two things: the driver path and (optionally) the browser path. Therefore, with this PR, there are two new JSON fields: result.driver_path and result.browser_path. These values should be read in the bindings. See the previous Gist snippets for examples.

A second change has to do with the browser TTL. So far, the browser TTL was used to cache the browser versions discovered by running shell command(s). But by default, the browser_TTL was 0, so in practice, this caching mechanism (stored in the local metadata file, i.e., ~/.cache/selenium/selenium-manager.json) is not used (although the logic is present in the Rust code).

With this PR, network calls to the CfT endpoints are used to discover the latest version of Chrome (very similar to what happens with chromedriver and other drivers). I believe these discovered versions deserve better to be cached in the local metadata. Therefore, I made two changes related to this:

  1. Stop caching the discovered local browser versions (by running shell commands). Therefore, shell commands are always used to discover the browser version.
  2. Use the existing mechanism to cache the discovered browser versions to be downloaded (e.g., CfT releases). This way, the second time a browser needs to be downloaded, the version can be already cached in the metadata.
  3. Change the default value of browser_TTL, from 0 to 3600 seconds (i.e., 1 hour, in the same way than driver_TTL. As a possible idea for the future, we can unify both TTLs in a single.

Finally, the overall timeout for network requests has been increased from 180 (in seconds) to 300 since browser download takes longer than driver download.

Motivation and Context

This is the first PR related to milestone 4 in the original Selenium Manager roadmap, i.e., "M4: Browser management: Chrome/Chromium".

I have tested this feature both in Windows and Linux. Moreover, the tests are executed in Windows, Linux, and macOS on GitHub actions. Unfortunately, I no longer have a macOS, so I did have the chance to test it locally on a macOS machine. It would be great if someone could try this PR on macOS.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@bonigarcia bonigarcia added the C-rust Rust code is mostly Selenium Manager label Jul 12, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jul 12, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (592d8e8) 58.09% compared to head (8ff5f88) 58.09%.

❗ Current head 8ff5f88 differs from pull request most recent head 1def632. Consider uploading reports for the commit 1def632 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@           Coverage Diff           @@
##            trunk   #12353   +/-   ##
=======================================
  Coverage   58.09%   58.09%           
=======================================
  Files          86       86           
  Lines        5348     5348           
  Branches      207      207           
=======================================
  Hits         3107     3107           
  Misses       2034     2034           
  Partials      207      207           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@titusfortner
Copy link
Member

I'm not sure I understand the --force-browser-download approach.
I only want to download it if I need it...

  • If the browser isn't found in expected directory, on PATH and user didn't specify, then presumably the code would fail, and Selenium Manager should always download it.
  • If the browser is found but it isn't the version specified by the user, Selenium Manager should download the right version.

And it doesn't look like we always get the browser location back. I think we should always report the browser location for what we are using to match the driver.

When running command line without --output json is it just outputting driver location?

What is the difference now between using --browser and --driver?

@diemol
Copy link
Member

diemol commented Jul 13, 2023

A second change has to do with the browser TTL. So far, the browser TTL was used to cache the browser versions discovered by running shell command(s). But by default, the browser_TTL was 0, so in practice, this caching mechanism (stored in the local metadata file, i.e., ~/.cache/selenium/selenium-manager.json) is not used (although the logic is present in the Rust code).

Do we really need this on this first iteration? In short, we want to download the browser if the user does not have it. And starting to cache versions from CfT would mean that we want to update the browser as well?

@bonigarcia
Copy link
Member Author

bonigarcia commented Jul 13, 2023

I'm not sure I understand the --force-browser-download approach. I only want to download it if I need it...

From the bindings perspective, you can safely ignore this flag. I added it for two reasons. First, I need a way to test this feature in CI (CfT is only downloaded when Chrome is unavailable, and in GH Actions, Chrome is already installed). Second, it can potentially be used by someone to download CfT even when a complete Chrome release is already in the system.

  • If the browser isn't found in expected directory, on PATH and user didn't specify, then presumably the code would fail, and Selenium Manager should always download it.

That is what this feature does. When Chrome is not found, the latest version of CfT is downloaded.

  • If the browser is found but it isn't the version specified by the user, Selenium Manager should download the right version.

That is not part of this PR. It will be implemented for #11678.

And it doesn't look like we always get the browser location back. I think we should always report the browser location for what we are using to match the driver.

Currently, the browser path is only returned when CfT is downloaded. But indeed, Selenium Manager can always return the browser path (the found one or the downloaded one). Do you want to implement it that way?

When running command line without --output json is it just outputting driver location?

See "Resolving chrome (forced) for the first time (regular output)" on the Gist document.

What is the difference now between using --browser and --driver?

Selenium Manager works in the same way for both flags. It is a way to select a "manager" (i.e., a module for managing driver/browser for different vendors, such as Chrome, Firefox, or Edge) by browser name (chrome, firefox, edge) or driver name (chromedriver, geckodriver, msedgedriver). But once the manager is defined, the behavior is the same.

@bonigarcia
Copy link
Member Author

Do we really need this on this first iteration? In short, we want to download the browser if the user does not have it. And starting to cache versions from CfT would mean that we want to update the browser as well?

The TTL mechanism is a way to improve the overall performance of the browser/driver management. So strictly, we don't need it for this iteration. But since this is already implemented for drivers, I decided to use it for browser version discovery like it is already used for driver version discovery (which, in both cases, it requires network requests to remote endpoints). And as usual, we can put it to 0 to turn it off.

@diemol
Copy link
Member

diemol commented Jul 13, 2023

Do we really need this on this first iteration? In short, we want to download the browser if the user does not have it. And starting to cache versions from CfT would mean that we want to update the browser as well?

The TTL mechanism is a way to improve the overall performance of the browser/driver management. So strictly, we don't need it for this iteration. But since this is already implemented for drivers, I decided to use it for browser version discovery like it is already used for driver version discovery (which, in both cases, it requires network requests to remote endpoints). And as usual, we can put it to 0 to turn it off.

That is the question, do we need to do browser version discovery always or only when we do not find the Chrome browser in the system?

@bonigarcia
Copy link
Member Author

That is the question, do we need to do browser version discovery always or only when we do not find the Chrome browser in the system?

Browser version discovery using shell commands is always done. And now, when the browser is not discovered in the system, remote (for download) browser version discovery (using CfT endpoints) is used. And that found version is written in the metadata (considered fresh using the TTL_browsers).

@titusfortner
Copy link
Member

Currently, the browser path is only returned when CfT is downloaded. But indeed, Selenium Manager can always return the browser path (the found one or the downloaded one). Do you want to implement it that way?

Yes, please. Right now the bindings can't even make use of the feature you implemented to reference beta/dev versions because we have no way to tell the driver to use them.

Currently we can get the browser binary path from the options class, and we next need to implement setting it. If we always have the browser value returned we can always just set it.

@bonigarcia
Copy link
Member Author

I managed to run this PR in a macOS VM and found a relevant difference. In macOS, the CtF binary package is called Google Chrome for Testing.app, which is something that the Selenium Manager logic needs to consider. I have just committed a second patch to this PR to fix this difference.

Also, I have updated the Gist document, including the Selenium Manager output in macOS (see the two last snippets).

@bonigarcia
Copy link
Member Author

Currently, the browser path is only returned when CfT is downloaded. But indeed, Selenium Manager can always return the browser path (the found one or the downloaded one). Do you want to implement it that way?

Yes, please. Right now the bindings can't even make use of the feature you implemented to reference beta/dev versions because we have no way to tell the driver to use them.

Currently we can get the browser binary path from the options class, and we next need to implement setting it. If we always have the browser value returned we can always just set it.

No problem, I will do that. I inspected what needs to be done in the code, and it requires different changes. So I prefer to do it in a separate PR since this one is already quite big.

@bonigarcia bonigarcia force-pushed the se_mgr_chrome branch 5 times, most recently from 909685f to ee422c4 Compare July 13, 2023 15:53
@bonigarcia
Copy link
Member Author

I have implemented the requested feature, i.e., always returning the browser path (the found one or the downloaded one). For that, I have a local commit on top of the commits of this PR.

Should I commit these new changes as part of this PR, or should I wait until this PR is accepted and merged?

@nir-tal-talkspace
Copy link
Contributor

this PR is important -
this is the log when trying to run SM with Chrome v115 currently SM downgrades chromedriver to v114 instead of using v115

https://www.selenium.dev/documentation/webdriver/drivers/service/#setting-log-output
Jul 19, 2023 3:38:25 PM org.openqa.selenium.manager.SeleniumManager lambda$runCommand$1
WARNING: Error getting version of chromedriver 115. Retrying with chromedriver 114 (attempt 1/5)

@titusfortner @diemol @bonigarcia

@bonigarcia
Copy link
Member Author

@nir-tal-talkspace In fact, the important PR for discovering chromedriver 115+ is #12208. It is already merged, but unfortunately, it is unavailable in Selenium 4.10.0.

@bonigarcia bonigarcia force-pushed the se_mgr_chrome branch 2 times, most recently from 58f4f8f to 395d144 Compare July 19, 2023 14:16
@bonigarcia
Copy link
Member Author

bonigarcia commented Jul 20, 2023

I believe we don't want to merge this PR until Selenium 4.11.0 is released, so I have just sent two new commits to this PR. With these new commits:

  1. Selenium Manager always returns the browser path: the found one (already installed) or the downloaded one (CfT).
  2. Selenium Manager tries to find the browser in the PATH when it is not found in the usual locations (e.g., /Applications/Google Chrome.app/Contents/MacOS/Google Chrome in macOS, etc.). This case (i.e., the browser is in the PATH and not in the known locations) is not usual, but it makes Selenium Manager a bit more robust in finding browser paths.

Here it is some examples of the output:

C:\Users\boni\Documents\dev\selenium\rust>cargo run -- --browser chrome --debug
DEBUG   Checking chromedriver in PATH
DEBUG   Running command: chromedriver --version
DEBUG   Output: ""
DEBUG   chromedriver not found in PATH
DEBUG   chrome detected at C:\Program Files\Google\Chrome\Application\chrome.exe
DEBUG   Using shell command to find out chrome version
DEBUG   Running command: wmic datafile where name='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe' get Version /value
DEBUG   Output: "\r\r\n\r\r\nVersion=115.0.5790.99\r\r\n\r\r\n\r\r\n\r"
DEBUG   Detected browser: chrome 115.0.5790.99
DEBUG   Reading metadata from https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json
DEBUG   Required driver: chromedriver 115.0.5790.98
DEBUG   Driver URL: https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.98/win64/chromedriver-win64.zip
INFO    Driver path: C:\Users\boni\.cache\selenium\chromedriver\win64\115.0.5790.98\chromedriver.exe
INFO    Browser path: C:\Program Files\Google\Chrome\Application\chrome.exe
C:\Users\boni\Documents\dev\selenium\rust>cargo run -- --browser chrome --output json
{
  "logs": [
    {
      "level": "INFO",
      "timestamp": 1689852905,
      "message": "Driver path: C:\\Users\\boni\\.cache\\selenium\\chromedriver\\win64\\115.0.5790.98\\chromedriver.exe"
    },
    {
      "level": "INFO",
      "timestamp": 1689852905,
      "message": "Browser path: C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
    }
  ],
  "result": {
    "code": 0,
    "message": "C:\\Users\\boni\\.cache\\selenium\\chromedriver\\win64\\115.0.5790.98\\chromedriver.exe",
    "driver_path": "C:\\Users\\boni\\.cache\\selenium\\chromedriver\\win64\\115.0.5790.98\\chromedriver.exe",
    "browser_path": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
  }
}

For the bindings, the JSON output is what matters. Notice that now, there are two new fields: result.driver_path and result.browser_path. I left the driver path in result.message, but maybe it is time to make a breaking change and stop using result.message for driver's path, and leave that field for error messages only.

@diemol @titusfortner What do you think?

@bonigarcia bonigarcia force-pushed the se_mgr_chrome branch 2 times, most recently from 729a203 to beb12d6 Compare July 21, 2023 13:42
@titusfortner
Copy link
Member

@bonigarcia can you rebase this one? If Google isn't going to fix their bug right away, maybe we should knock out the fix in Selenium after all since none of the 3rd party managers can do what we can do. 😆

@titusfortner
Copy link
Member

titusfortner commented Jul 21, 2023

Shouldn't this get v116? Or are we only working on stable right now?

2023-07-21 16:28:20 DEBUG Selenium [:selenium_manager] Executing Process ["/Users/titusfortner/code/selenium/bazel-bin/rb/bin/macos/selenium-manager", "--browser", "chrome", "--output", "json", "--browser-version", "116", "--debug"] 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Checking chromedriver in PATH 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Running command: chromedriver --version 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Output: "" 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] chromedriver not found in PATH 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Reading metadata from https://googlechromelabs.github.io/chrome-for-testing/known-good-versions-with-downloads.json 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Required driver: chromedriver 116.0.5845.42 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Driver URL: https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/116.0.5845.42/mac-x64/chromedriver-mac-x64.zip 
2023-07-21 16:28:23 DEBUG Selenium [:selenium_manager] Driver path: /Users/titusfortner/.cache/selenium/chromedriver/mac-x64/116.0.5845.42/chromedriver 

@bonigarcia bonigarcia force-pushed the se_mgr_chrome branch 2 times, most recently from a52bd11 to 8879140 Compare July 23, 2023 13:03
@bonigarcia
Copy link
Member Author

I think the non-rust stuff in CI failures is ok, but some rust unit tests are failing, please double check.

Now it's fixed.

When we pass in --browser-path can that value still be output in result?

With the latest changes, yes.

and I think there's been a regression with the escaping of things

I included a couple of new commits in this PR to make more robust the path manipulation (escaped or not).

I wrote a suite of integration tests for the behaviors I'm expecting to see in future releases.

Great, let me know if everything works as expected with the latest changes.

@titusfortner
Copy link
Member

Yes, everything works exactly as expected on MacOS and Windows!
Including the weird windows parsing bug so I can even remove that extra fix in Ruby.
Excellent.

@titusfortner
Copy link
Member

Oh, one thing... I think if "beta" "dev" "canary" "nightly" etc aren't found on the system it should error and not download the latest stable version of the browser.
Minor point, though, and we can fix it later if we agree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-rust Rust code is mostly Selenium Manager
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants