Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[grid] Support file downloads on the node #11277

Merged
merged 3 commits into from
Dec 12, 2022

Conversation

krmahadevan
Copy link
Contributor

@krmahadevan krmahadevan commented Nov 17, 2022

Fixes: #9218

On the node, set the directory to where the browser will download files to via:

a. The CLI argument --downloads-path (or)
b. via the Toml syntax of

[node]
downloads-path = "/path/to/dir/goes/here"

The GET end-points that will support file
download would be (The file will ALWAYS BE DOWNLOADED AS A ZIP FILE):

  • /session/:sessionId:/se/file?filename=

Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@krmahadevan krmahadevan force-pushed the support_download_file_9218 branch 2 times, most recently from fb02c4d to b2b6be9 Compare November 17, 2022 11:01
@krmahadevan krmahadevan changed the title Support file downloads on the node [grid] Support file downloads on the node Nov 17, 2022
Copy link
Member

@diemol diemol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @krmahadevan! I left some comments.

java/src/org/openqa/selenium/grid/node/Node.java Outdated Show resolved Hide resolved
@krmahadevan
Copy link
Contributor Author

@diemol - I have fixed all the review comments, rebased off of trunk and added the changes as an additional commit for ease of review. Please help take a look

@krmahadevan krmahadevan force-pushed the support_download_file_9218 branch 2 times, most recently from 80855c2 to c3418ff Compare December 8, 2022 03:27
Copy link
Member

@diemol diemol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out this PR to run a complete test but it did not work. Probably because the get method for downloadsPath configuration is not being used.

The client code I used to perform a test was:

package org.example;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.io.Zip;
import org.openqa.selenium.json.Json;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.remote.http.HttpClient;
import org.openqa.selenium.remote.http.HttpMethod;
import org.openqa.selenium.remote.http.HttpRequest;
import org.openqa.selenium.remote.http.HttpResponse;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

import static org.openqa.selenium.remote.http.Contents.string;

public class Main {

  public static void main(String[] args) throws InterruptedException, IOException {
    FirefoxOptions options = new FirefoxOptions();
    options.addPreference("browser.download.manager.showWhenStarting", false);
    options.addPreference("browser.helperApps.neverAsk.saveToDisk", "images/jpeg, application/pdf, application/octet-stream");
    options.addPreference("pdfjs.disabled", true);

    URL gridUrl = new URL("http://localhost:4444");
    RemoteWebDriver driver =  new RemoteWebDriver(gridUrl, options);
    driver.get("http://the-internet.herokuapp.com/download");
    WebElement element = driver.findElement(By.cssSelector(".example a"));
    element.click();

    Thread.sleep(10 * 1000);

    HttpRequest request = new HttpRequest(
      HttpMethod.GET,
      String.format("/session/%s/se/file", driver.getSessionId()));
    request.addQueryParameter("filename", "my_appointments-1.pdf");
    try (HttpClient client = HttpClient.Factory.createDefault().createClient(gridUrl)) {
      HttpResponse response = client.execute(request);
      Map<String, Object> map = new Json().toType(string(response), Json.MAP_TYPE);
      String encodedContents = map.get("contents").toString();
      Zip.unzip(encodedContents, new File("/Users/diegomolina/Downloads"));
    }

    driver.quit();
  }
}

@krmahadevan krmahadevan force-pushed the support_download_file_9218 branch 2 times, most recently from 1919501 to 3fad03f Compare December 12, 2022 13:52
@krmahadevan
Copy link
Contributor Author

@diemol - Thanks for catching that bug! Fixed the review comments, rebased off of trunk and force pushed changes back to this branch. Please check

Fixes: SeleniumHQ#9218

On the node, set the directory to where the browser 
will download files to via:

a. The CLI argument `--downloads-dir` (or)
b. via the Toml syntax of 

```
[node]
downloads-dir = "/path/to/dir/goes/here"
```

The GET end-points that will support file 
download would be:

* `/session/:sessionId:/file?filename=`
* `/session/:sessionId:/se/file?filename=`
* Renamed the CLI arg to downloads-path
* Download is available only via 
GET /session/{sessionId}/se/file
* Files downloaded would be available ONLY as a zip file.
Copy link
Member

@diemol diemol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @krmahadevan!

@diemol
Copy link
Member

diemol commented Dec 12, 2022

@krmahadevan would be great if you can help us with the docs for this new feature 💪

@diemol diemol merged commit 8e4e20b into SeleniumHQ:trunk Dec 12, 2022
@sonarcloud
Copy link

sonarcloud bot commented Dec 12, 2022

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@krmahadevan krmahadevan deleted the support_download_file_9218 branch December 13, 2022 02:26
@iammac2
Copy link

iammac2 commented Feb 10, 2023

Downloaded 4.8.0 and verified this feature. Working well👍🏼. Will the API also include an endpoint to clean up the downloaded files?

@titusfortner
Copy link
Member

imo it shouldn't have its own endpoint, it should be part of the cleanup process of ending the session. Once the session is finished, the files can't be accessed again.

@iammac2
Copy link

iammac2 commented Feb 13, 2023

Thank you @titusfortner and agree re session clean up. Have tried verifying this behaviour with a local Grid 4.8.0 config to verify what happens across sessions:

  • Grid with one Node is running locally with the new --downloads-path CLI Node option is set i.e. java -jar selenium-server-4.8.0.jar node --downloads-path C:\tmp
  • Chrome pref download.default_directory also set to same path when RemoteWebDriver initialized.
  • Run a test on Node (noting session ID in Node log) that clicks to download a file from UI at the remote end and then retrieves the file via the new GET request.
  • The test passes first time through with the file downloaded to Node download directory and GET response contains correct filename and contents.
  • Wait for Node debug to state Stopping session
  • Run the test again (i.e. new session ID in Node log) but WITHOUT the step that clicks to download the file and ONLY sends the GET request. (Would expect this scenario to fail when endpoint hit due to session cleardown).
  • In this scenario the GET response still contains the same filename and contents.
  • Physically checking the Node download path after Node log states Stopping session, confirms that the file is still present in the directory.

@titusfortner
Copy link
Member

@iammac2 I just split out the work i'd like to see us do to get downloads working. The way I want to see us support things will look like this in the bindings: #11657

@iammac2
Copy link

iammac2 commented Feb 15, 2023

I see, thank you for the extra info. Will keep a watch in this further development.
So existing implementation of GET /session/:sessionId:/se/files endpoint will need to change as download path will need to be generated and not set by user e.g --downloads-path and browser prefs download.default_directory? Grid Node will then hold internal reference by uuid to the download directories so it can clean them up (ref the DELETE)

@titusfortner
Copy link
Member

That's about right. We'll see what trouble we run into (if any) in the implementation.

@vijaysp77
Copy link

@krmahadevan Can I use the preference settings for this download path inside the code (download.default_directory) instead of passing the download path while setting-up node? Because we are using docker container provisioned by aws ECS? Thanks

@krmahadevan
Copy link
Contributor Author

@vijaysp77 This feature is going through a revamp.

The functionality will change once my PR #11702 gets merged.

So I would suggest that you follow the above PR and wait for this functionality to get streamlined.

@patilsha
Copy link

@krmahadevan I saw your PR 1702 has been merged.
Where can I find the latest Selenium Hub Image to see these changes in Selenium Grid.
Thanks for providing a solution to most awaited feature.

@diemol
Copy link
Member

diemol commented Mar 16, 2023

@patilsha it was merged yesterday, it will be part of the next release.

@xZero707
Copy link

xZero707 commented Mar 22, 2023

I'm assuming this is not yet available in Selenium docker images (4.8.1)?

Chrome node

GET /session/:sessionId:/se/files
curl -X GET "http://172.17.53.2:4444/session/98473aa8661fc256bf7c0de98861005e/se/file?filename=100MB.bin"
= Unknown command

Firefox node

GET /session/:sessionId:/se/files
curl -X GET "http://172.17.53.2:4444/session/98473aa8661fc256bf7c0de98861005e/se/file?filename=100MB.bin"
= Unsupported method - expect POST for file upload

I also tried running this using PHP driver:

$file = $driver->executeCustomCommand('/session/:sessionId/se/file?filename=100MB.bin');

Result:

Facebook\WebDriver\Exception\UnknownCommandException : unknown command: unknown command: session/41d161b598c0fd3f85a4e09b1f5f5a1f/se/file?filename=100MB.bin`

I'm in dark. This feature is documented, yet cannot be used. Am I doing something wrong?

I defined --downloads-path /usr/downloads using env SE_OPTS and nothing. Chrome is still downloading to /home/seluser/Downloads and ignoring the path I defined.

Version: Selenium 4.8.1 (revision 8ebccac)

@krmahadevan
Copy link
Contributor Author

@xZero707 - For this to be available in the docker selenium project, I think this should be released first. The feature has gone through some revamp. So please create an issue on the docker-selenium project that tracks this requirement and you can also cross link this issue there.
The documentation on this also needs to be changed. I haven't found time to get to this yet. I should hopefully be able to wrap the documentation before this weekend.

@xZero707
Copy link

@krmahadevan Thank you for clarification. It makes sense. I will do as you suggested.

@naveenmadd
Copy link

naveenmadd commented May 4, 2023

I checked out this PR to run a complete test but it did not work. Probably because the get method for downloadsPath configuration is not being used.

The client code I used to perform a test was:

package org.example;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxOptions;
import org.openqa.selenium.io.Zip;
import org.openqa.selenium.json.Json;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.remote.http.HttpClient;
import org.openqa.selenium.remote.http.HttpMethod;
import org.openqa.selenium.remote.http.HttpRequest;
import org.openqa.selenium.remote.http.HttpResponse;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

import static org.openqa.selenium.remote.http.Contents.string;

public class Main {

  public static void main(String[] args) throws InterruptedException, IOException {
    FirefoxOptions options = new FirefoxOptions();
    options.addPreference("browser.download.manager.showWhenStarting", false);
    options.addPreference("browser.helperApps.neverAsk.saveToDisk", "images/jpeg, application/pdf, application/octet-stream");
    options.addPreference("pdfjs.disabled", true);

    URL gridUrl = new URL("http://localhost:4444");
    RemoteWebDriver driver =  new RemoteWebDriver(gridUrl, options);
    driver.get("http://the-internet.herokuapp.com/download");
    WebElement element = driver.findElement(By.cssSelector(".example a"));
    element.click();

    Thread.sleep(10 * 1000);

    HttpRequest request = new HttpRequest(
      HttpMethod.GET,
      String.format("/session/%s/se/file", driver.getSessionId()));
    request.addQueryParameter("filename", "my_appointments-1.pdf");
    try (HttpClient client = HttpClient.Factory.createDefault().createClient(gridUrl)) {
      HttpResponse response = client.execute(request);
      Map<String, Object> map = new Json().toType(string(response), Json.MAP_TYPE);
      String encodedContents = map.get("contents").toString();
      Zip.unzip(encodedContents, new File("/Users/diegomolina/Downloads"));
    }

    driver.quit();
  }
}

@diemol Can you pls help me here, I'm trying to test File Downloads on Jenkins Node but
Getting [ERROR] COMPILATION ERROR : at Map<String, Object> map = new Json().toType(string(response), Json.MAP_TYPE); as
no suitable method found for string(org.openqa.selenium.remote.http.HttpResponse)
[ERROR] method org.openqa.selenium.remote.http.Contents.string(java.io.File) is not applicable
[ERROR] (argument mismatch; org.openqa.selenium.remote.http.HttpResponse cannot be converted to java.io.File)
[ERROR] method org.openqa.selenium.remote.http.Contents.string(org.openqa.selenium.remote.http.HttpMessage) is not applicable [ERROR] (argument mismatch; org.openqa.selenium.remote.http.HttpResponse cannot be converted to org.openqa.selenium.remote.http.HttpMessage)

@krmahadevan
Copy link
Contributor Author

@naveenmadd - If you wanted to build/experiment something on top of this changes, I would suggest that you do it from trunk. The feature has undergone changes after it was merged.

@naveenmadd
Copy link

@naveenmadd - If you wanted to build/experiment something on top of this changes, I would suggest that you do it from trunk. The feature has undergone changes after it was merged.

Even I verified selenium-trunk repository, I see the same implementation [ package org.openqa.selenium.grid.node > class NodeTest > @test @DisplayName("DownloadsTestCase") Map<String, Object> map = new Json().toType(string(response), Json.MAP_TYPE);]
Don't know how it's accepting in that test and not for mine? suggest if you have any insights or try from your end too once..

Thanks @krmahadevan @diemol

@diemol
Copy link
Member

diemol commented May 4, 2023

This is the test case https://github.com/SeleniumHQ/selenium/blob/trunk/java/test/org/openqa/selenium/grid/router/RemoteWebDriverDownloadTest.java#L134

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Selenium 4 dynamic grid - add support for get downloaded files to your host path
8 participants