Question: How do I get puppeteer to download a file? #299

aherriot · 2017-08-16T18:36:58Z

Question: How do I get puppeteer to download a file or make additional http requests and save the response?

Garbee · 2017-08-16T18:41:18Z

I'll look into the specifics after I work on another issue (if no one gets to it before me). But I feel like we'd need to watch for the request going out and then save the buffer of that response body. Not sure how you'd trigger the download programmatically, though. "Clicking" like normal on the right part of the page should do it.

aherriot · 2017-08-16T18:48:47Z

Let me elaborate a little bit. In my use case, I want to visit a website which contains a list of files. Then I want to trigger HTTP requests for each file and save the response to disk. The list may not already be hyperlinks to the files, just the plain-text name of each file, but from that I can derive the actual URL of the file to download.

kensoh · 2017-08-16T18:49:10Z

I saw this Chromium issue some time ago. It addresses downloads and seems to be moving along well. For security reasons, I have the impression that headless Chrome does not support downloading by clicking on the download button. But the above issue opens up that possibility for this important use case.

aherriot · 2017-08-16T18:53:41Z

It seems that a key use case of this project is to use it for web scraping and downloading content from the internet. For example CasperJS has a download method for this purpose.

kensoh · 2017-08-16T19:01:00Z

I've been using CasperJS for a couple of years; yep, that actually sends raw XMLHttpRequests to the specified URL to grab the contents of a file. However, in practice it is not foolproof. For example, try scripting a fetch of the download zip URL of a GitHub repo: it will produce a zero-byte file. I guess a method like that can be revisited to see if it can be improved to cover edge cases.

Fundamentally, though, my preference is to have the ability to click some download button and get the file directly. This seems easier to script, because for some types of downloads you won't get to see the actual URL from the DOM layer. Only after clicking and going through some JS code does the download start in a normal browser. That type of setup may not work with a download method because there is no full URL to give to the method in the first place.

Garbee · 2017-08-16T19:01:17Z

Yea, Chromium doesn't support it. But if you can trigger the request, you should be able to at least get the buffer content and write it to disk using the filesystem API in Node. Or, if the download is prevented outright, get the URL and initiate a manual request, then do the same with the buffer from that response.

Chromium may not support it, but it should be possible to work around it.

aherriot · 2017-08-16T19:08:37Z

@kensoh Ideally, it could support both downloading by clicking a link and downloading from URL.
@Garbee There may be a way for me to just use NodeJS to make the requests, but if I use a headless browser, it will send the proper session cookies with my requests.
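
One way to combine the two, sketched under assumptions: let the browser handle the session, then copy its cookies into a plain Node request. The helper below turns an array shaped like the one Puppeteer's page.cookies() returns (objects with name and value) into a Cookie header value. cookiesToHeader is a hypothetical name, not part of Puppeteer, and the cookie values are made up for illustration.

```javascript
// Hypothetical helper: build a Cookie request header from cookie objects
// shaped like the ones page.cookies() returns ({ name, value, ... }).
function cookiesToHeader(cookies) {
  return cookies.map(c => `${c.name}=${c.value}`).join('; ');
}

// Example with hand-written cookie objects standing in for page.cookies().
const header = cookiesToHeader([
  { name: 'session', value: 'abc123' },
  { name: 'lang', value: 'en-GB' },
]);
console.log(header);
```

The resulting string could then be sent as the Cookie header of an ordinary Node HTTP request for each derived file URL.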

pavelfeldman · 2017-08-17T04:28:23Z

Support for downloads is on the way. It needs changes to Chromium that are under review.

kazaff · 2017-08-17T09:10:19Z

@pavelfeldman im waiting~~~~

aslushnikov · 2017-08-23T07:56:38Z

Upstream has landed as r496577, we now need to roll.

intellix · 2017-08-30T09:33:04Z

I'm having file download issues as well, not sure if it's the same thing. I'm visiting a link that triggers a download like: somewhere.com/download/en-GB.po.

I'm creating a new page per language file I need to download so they're run in parallel and then trying to Promise.all() them before closing the browser. It seems that even after the downloads all finish, the page.goto is never resolved:

const urls = languages.map(lang => `${domain}/download/${lang}/${project}/${lang}.${extension}`);
await Promise.all(urls.map(url => browser.newPage().then(async tab => await tab.goto(url))));
browser.close();

nisrulz · 2017-09-01T02:53:10Z

@aslushnikov is the change you mentioned in your last commit, now shipped?
If yes, then I am looking for some examples to download a file in headless mode. I couldn't find any documentation on it.

aslushnikov · 2017-09-01T03:59:43Z

@nisrulz additional work is required upstream, we're working on it.

pavelfeldman · 2017-09-01T04:15:48Z

~~@aslushnikov 496577 landed a week ago, it should be few lines of code on your end.~~

Looks like aslushnikov@ bakes something more upstream to deliver an event upon download.

aslushnikov · 2017-09-01T05:45:59Z

@pavelfeldman r496577 is not enough for a complete story; I'm working on a Page.FileDownloaded event to notify about successful downloads.

mmacaula · 2017-09-01T16:01:09Z

One workaround I found (not ideal for sure): open Chrome with a profile that has a download directory set. It worked for me when I clicked a link that downloaded an audio file. Then in your puppeteer script, just wait for that file to appear and copy it to where you need it.

const browser = await puppeteer.launch({headless: false, args: ['--profile-directory=Default']});

see here for how to find your profile

nisrulz · 2017-09-01T16:52:06Z

@mmacaula I am looking for a way to download a file when Chrome is running in headless mode. If I am not running in headless mode, the file downloads perfectly into the Downloads folder which is the default location I guess.

It's a much sought-after feature, already available in projects such as CasperJS.

mmacaula · 2017-09-01T17:32:03Z

Oh yeah my mistake, It doesn't work with headless mode. :(

dagumak · 2017-09-09T05:29:28Z

Is there a way to just capture the request and have it stored in another remote location instead of local to Chrome/puppeteer?

ebidel · 2017-09-09T18:34:31Z

@dagumak couldn't you catch the responses and write the files to a location of your choice?

const puppeteer = require('puppeteer');
const fs = require('fs');
const URL = require('url').URL;

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Collect every response the page receives.
  const responses = [];
  page.on('response', resp => {
    responses.push(resp);
  });

  // Once the page has loaded, write each response body to the current directory.
  page.on('load', () => {
    responses.map(async resp => {
      const request = resp.request();
      const url = new URL(request.url);

      // Use the last path segment as the filename; default to .html if it has no extension.
      const split = url.pathname.split('/');
      let filename = split[split.length - 1];
      if (!filename.includes('.')) {
        filename += '.html';
      }

      const buffer = await resp.buffer();
      fs.writeFileSync(filename, buffer);
    });
  });

  await page.goto('https://news.ycombinator.com/', {waitUntil: 'networkidle'});
  browser.close();
})();

You may need to adjust the timing for your page. Waiting for the load event and networkidle might not be enough.
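
Separately from timing, taking only the last path segment as the filename means two responses with the same segment (or a URL ending in /) would write to the same file. A minimal sketch of one way around that, assuming an index prefix is acceptable. filenameFromUrl and the example URLs are hypothetical, for illustration only.

```javascript
const path = require('path');

// Hypothetical helper: derive a local filename from a response URL,
// prefixing an index so two responses never overwrite each other.
function filenameFromUrl(rawUrl, index) {
  const { pathname } = new URL(rawUrl);
  // basename is empty for a bare "/" path, so fall back to "index".
  let name = path.posix.basename(pathname) || 'index';
  if (!name.includes('.')) {
    name += '.html';
  }
  return `${index}-${name}`;
}

console.log(filenameFromUrl('https://news.ycombinator.com/', 0));
console.log(filenameFromUrl('https://example.com/css/news.css?v=1', 1));
```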

mickdekkers · 2017-09-09T19:59:57Z

@ebidel Sorry if I'm missing something, but where are you getting the buffer from in that code?

edit: response.buffer seems to be a function, but when I call it and await the promise it returns I get this error:

Unhandled promise rejection (rejection id: 1): Error: Protocol error (Network.getResponseBody):
No data found for resource with given identifier undefined

This seems to only happen when the file gets downloaded by the browser -- that is to say, when the file appears in the download bar in non-headless mode.

This is the code I used:

// this works
// const downloadUrl = 'https://nodejs.org/dist/v6.11.3/'
// this doesn't work
const downloadUrl = 'https://nodejs.org/dist/v6.11.3/SHASUMS256.txt.sig'

const responseHandler = async (response) => {
  if (response.url !== downloadUrl) {
    return
  }

  const buffer = await response.buffer()
  console.log('response buffer', buffer)
  browser.close()
}
page.on('response', responseHandler)
page.goto(downloadUrl)

Version info:
Windows 10 64-bit
Puppeteer 0.10.2
Chromium 62.0.3198.0 (Developer Build) (64-bit)

dagumak · 2017-09-10T01:14:29Z

@ebidel I'm going to give that a shot. Thank you!!

ebidel · 2017-09-10T02:30:31Z

@mickdekkers updated the snippet to include const buffer = await resp.buffer();. Bad copy and paste.

mickdekkers · 2017-09-10T08:29:07Z

@ebidel alright, thanks! Do you know if the issue I described in my edit is a bug or expected behavior? I couldn't find any info about it and I'd like to report it somewhere if it is, but I'm not sure what's the best place for it. I can make a new issue for it on this tracker if it's a puppeteer bug.

ebidel · 2017-09-10T15:54:30Z

For resource types that the renderer doesn't support, the default browser behavior is to download the file. That's probably what's going on here. Could you use pure node apis to fetch/write the file instead of waiting for the page response? You could also intercept requests with page.on('request') and fetch the file.

@aslushnikov would know for sure. He's been working on a download API. There may be a cleaner way to handle cases like this in the future.

NelsonScott · 2022-11-18T18:25:44Z

Not sure if above still works, I had to use the solution provided here https://stackoverflow.com/questions/74424735/puppeteer-not-actually-downloading-zip-despite-clicking-link

cole-jacobs · 2023-06-14T23:13:14Z

@JollyGrin @gregtap I've also had success with Page.setDownloadBehavior, but it seems to have broken with headless=new. Has anyone else experienced this?

jiteshdhamaniya · 2023-10-18T04:58:56Z

@cole-jacobs I'm having the same issue.

await page._client.send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath: downloadDir
  });

throws

There was an uncaught error TypeError: page._client.send is not a function

Has anybody found a solution to this? I am using "puppeteer": "^20.9.0".

mattmillen888 · 2023-10-18T05:24:11Z

You could try this.

import puppeteer from 'puppeteer';

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: './your-download-directory'
  });
  
  // Your further actions here...

  await browser.close();
}

main().catch(console.error);

Notice the use of page.target().createCDPSession() to get the client object directly and then call .send() on it. This is a more reliable way to access Chrome DevTools Protocol (CDP) methods.

also try updating puppeteer to latest version:

jiteshdhamaniya · 2023-10-18T18:28:37Z

@mattmillen888 Worked well. thanks

cipri-tom · 2023-10-24T00:15:10Z

Thanks @mattmillen888 !

Note to future readers: if docs are opening in new tabs instead of downloading, you should set the download attribute:

await link.evaluate((a, fn) => {
    a.removeAttribute('target');    // was set to '_blank'
    a.setAttribute('download', fn);  // fn is the filename under which you want to download
    console.log('setting download to', fn); // this logs _in the browser_, not in terminal
}, fileName); // pass the filename to the browser context
console.log(fileName); // this logs in the terminal

link is the ElementHandle of the a element.

sirzento · 2023-12-07T15:24:32Z

I'm running the latest version. This code still doesn't work in headless mode for me.

mattmillen888 · 2023-12-07T20:13:15Z

It's tough to know what the exact issue is without error messages. You could try this version, which explicitly sets headless mode, handles potential errors, and ensures that all resources are properly closed in case of an error.

import puppeteer from 'puppeteer';

async function main() {
  let browser;
  try {
    // Launch browser in headless mode
    browser = await puppeteer.launch({ headless: true });

    // Open a new page
    const page = await browser.newPage();

    // Set download behavior
    const client = await page.target().createCDPSession();
    await client.send('Page.setDownloadBehavior', {
      behavior: 'allow',
      downloadPath: '/absolute/path/to/your-download-directory' // Use an absolute path
    });

    // Your further actions here...

  } catch (error) {
    // Error handling
    console.error('An error occurred:', error);
  } finally {
    // Ensure browser is closed even if an error occurs
    if (browser) {
      await browser.close();
    }
  }
}

main().catch(error => console.error('Failed to run the main function:', error));

A few things to note in this revision are:

Ensure that the specified download directory exists and is writable.
Replace the placeholder /absolute/path/to/your-download-directory with the actual path where you want the downloads to be saved.
The script assumes that your further actions (commented as // Your further actions here...) are properly implemented and error-free. Any code that goes there should also follow best practices for error handling.

Let me know what errors you receive when using this and I may be able to isolate the issue.

sirzento · 2023-12-07T20:20:04Z

I use error handling but there is just no error. When I start it with headless to false to see the browser, the file will be downloaded. After that I delete the file and only change headless to true and run it again. This time no file will be downloaded after the script finishes. I even removed the browser.close() part in case there are some loading issues to give it more time but that didn't help. Just in case, i ran it again with headless: false and the file will be downloaded again.

I also tried headless: 'new', but that also didn't work.

mattmillen888 · 2023-12-07T23:02:43Z

There are a couple of options here, and then a last resort you could try.

Using a custom User-Agent.
Network monitoring: listen to network responses to capture the download URL, download the file with axios, and wait for network idle to ensure that all network activity has settled before proceeding.
[LAST RESORT] Running in headful mode in a hidden window.

I have generated this script but have not had time to test it yet.

import puppeteer from 'puppeteer';
import axios from 'axios';
import fs from 'fs';

async function downloadFile(url, path) {
  const writer = fs.createWriteStream(path);

  const response = await axios({
    url,
    method: 'GET',
    responseType: 'stream'
  });

  response.data.pipe(writer);

  return new Promise((resolve, reject) => {
    writer.on('finish', resolve);
    writer.on('error', reject);
  });
}

async function main() {
  let browser;
  try {
    // Launch browser
    browser = await puppeteer.launch({
      headless: true, // Change to false for headful mode
      args: ['--window-size=1920,1080', '--window-position=-2000,-2000'] // Only for headful mode in hidden window
    });

    const page = await browser.newPage();

    // Set custom user agent
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36');

    let downloadUrl = null;

    // Listen to network responses
    page.on('response', async response => {
      if (response.url().endsWith('.pdf')) { // Modify this condition based on your download URL pattern
        downloadUrl = response.url();
      }
    });

    // Your navigation and interaction here...

    // Wait for network to be idle
    await page.waitForNetworkIdle();

    // Close the page
    await page.close();

    // Download the file using axios if URL is captured
    if (downloadUrl) {
      await downloadFile(downloadUrl, '/path/to/download/file.pdf'); // Set your download path
    }

  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

main().catch(error => console.error('Failed to run the main function:', error));

Let me know how you get on!

sirzento · 2023-12-08T07:53:52Z

Ok, it is working now. Setting the user agent did work. I just have the problem now that I don't know when the download is finished. await page.waitForNetworkIdle(); will time out after 30s even if the download is just a few KB and finishes instantly. Do websockets maybe prevent await page.waitForNetworkIdle(); from finishing?

Even client.on('Browser.downloadProgress') won't show me any state changes. Any idea how to see if all downloads are finished/no download is active at the moment?

mattmillen888 · 2023-12-09T03:37:12Z

As you rightly said, with page.waitForNetworkIdle() it's possible that persistent connections like WebSockets are preventing the network idle state from being reached.

The method waits for a period of idle network activity, so any ongoing connections, including WebSockets, can cause it to time out.

There are a few ways you could solve this, using slightly different approaches to determine when the download is complete.

Custom Wait Function
Check for Download File Existence
Listen to Download Event
Adjust waitForNetworkIdle Options

Custom Wait Function
Instead of relying on waitForNetworkIdle, you can create a custom function that waits for the specific network request you are interested in (the file download) to finish.
Check for Download File Existence
If the download initiates promptly after a certain action, you can periodically check if the file has appeared in the download directory.
Listen to the Download Event
If you have control over or insights into the backend, you might get a download completion event through WebSocket or another method.
Adjust waitForNetworkIdle Options
You can adjust the options for waitForNetworkIdle to better suit your needs, although this might not fully resolve the issue if WebSockets are indeed the cause of the timeout.

Here’s an example of how you might implement a custom wait function based on file existence:

import fs from 'fs'; // needed for fs.existsSync below

async function waitForDownload(downloadPath, timeout = 60000) {
  const startTime = Date.now();
  while (true) {
    if (fs.existsSync(downloadPath)) {
      return true;
    } else if (Date.now() - startTime > timeout) {
      throw new Error('Download timeout');
    }
    await new Promise(resolve => setTimeout(resolve, 1000)); // Wait for 1 second before checking again
  }
}

// Usage in your main function
try {
  // Your code to initiate the download...

  // Wait for the download to complete
  await waitForDownload('/path/to/expected/download/file.pdf');
} catch (error) {
  console.error('Error:', error);
}

This function checks for the existence of the downloaded file every second and times out after a specified duration (default 60 seconds but you can change to whatever you like). You would need to replace '/path/to/expected/download/file.pdf' with the actual path where you expect the file to be downloaded.
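
If the filename is not known in advance, the same polling idea can watch the download directory for any new file instead. A minimal sketch, assuming Chrome keeps in-progress downloads under names ending in .crdownload until they finish. newCompletedFiles and waitForNewDownload are hypothetical helper names, and downloadDir and button in the usage comment are placeholders.

```javascript
const fs = require('fs');

// Names present in `after` but not in `before`, ignoring Chrome's
// in-progress ".crdownload" temp files.
function newCompletedFiles(before, after) {
  const seen = new Set(before);
  return after.filter(name => !seen.has(name) && !name.endsWith('.crdownload'));
}

// Poll `dir` until a new, completed file appears; resolves with its name.
async function waitForNewDownload(dir, before, timeout = 60000, interval = 500) {
  const start = Date.now();
  while (Date.now() - start <= timeout) {
    const done = newCompletedFiles(before, fs.readdirSync(dir));
    if (done.length > 0) {
      return done[0];
    }
    await new Promise(resolve => setTimeout(resolve, interval));
  }
  throw new Error('Download timeout');
}

// Usage (inside an async function):
//   const before = fs.readdirSync(downloadDir);
//   await button.click();            // whatever triggers the download
//   const name = await waitForNewDownload(downloadDir, before);
```

Taking the directory listing before the click is what lets the helper tell a fresh download apart from files that were already there.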

Hopefully one of these options will work for you.

Let me know how you get on

ivanalemunioz · 2024-01-27T11:38:34Z

I am going to share my solution after days trying to solve this in a remote chrome connection

const client = await page.target().createCDPSession();

await client.send('Fetch.enable', { patterns: [{ urlPattern: '*', requestStage: 'Response' }] });

// Use the Fetch.requestPaused event as a response middleware
await client.on('Fetch.requestPaused', async (reqEvent) => {
	const { requestId } = reqEvent;

	const responseHeaders = reqEvent.responseHeaders || [];
	let contentType = '';

	for (const elements of responseHeaders) {
		if (elements.name.toLowerCase() === 'content-type') {
			contentType = elements.value;
		}
	}

	// Change this to the type you are waiting for
	if (contentType.includes('application/x-x509-ca-cert')) {
		// I remove content-disposition and content-type headers to get the response inline in the browser
		for (let i = responseHeaders.length - 1; i >= 0; i--) {
			if (responseHeaders[i].name.toLowerCase() === 'content-disposition' || responseHeaders[i].name.toLowerCase() === 'content-type') {
				responseHeaders.splice(i, 1);
			}
		}

		responseHeaders.push({
			name: 'Content-Type',
			value: 'text/plain'
		});
		
		const responseObj = await client.send('Fetch.getResponseBody', {
			requestId
		});

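		// You have the response in responseObj.body (base64 if responseObj.base64Encoded is true)

		await client.send('Fetch.fulfillRequest', {
			requestId,
			responseCode: 200,
			responseHeaders,
			// Fetch.fulfillRequest expects a base64-encoded body
			body: responseObj.base64Encoded ? responseObj.body : Buffer.from(responseObj.body).toString('base64')
		});
		
		// Once you get what you want, don't forget to close the client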
		await client.send('Fetch.disable');
		await client.detach();
	}
	else {
		await client.send('Fetch.continueRequest', { requestId });
	}
});

const response = await page.waitForResponse('https://url im waiting for');
const file = await response.text();

sirzento · 2024-02-29T13:13:32Z

I tried those, but I still can't get it to work with any of them:

Custom wait function: I don't exactly know how to do that.
Check for download file existence: I could do that, but I don't know the filename of the file.
Listen to the download event: Sadly not possible.
Adjust waitForNetworkIdle options: I don't think this will work since there are many websockets on this website.

I also looked into the CDP and tried to bind the events Page.downloadProgress, Browser.downloadProgress, Page.downloadWillBegin and Browser.downloadWillBegin, but no event fired, which I think is very strange.

I did use this code:

    browser = await puppeteer.launch({ headless: true, ignoreHTTPSErrors: true });
    page = await browser.newPage();

    ...

    const client = await page.target().createCDPSession();
    await client.send('Page.setDownloadBehavior', {
      behavior: 'allow',
      downloadPath: __dirname + '\\'
    });

    client.on('Page.downloadProgress', e => {
      console.log("Page.downloadProgress:", e.state);
    });
    client.on('Browser.downloadProgress', e => {
      console.log("Browser.downloadProgress:", e.state);
    });
    client.on('Browser.downloadWillBegin', e => {
      console.log("Browser.downloadWillBegin:", e.suggestedFilename);
    });
    client.on('Page.downloadWillBegin', e => {
      console.log("Page.downloadWillBegin:", e.suggestedFilename);
    });

    browser.on('downloadProgress', (e: any) => {
      console.log("class downloadProgress:", e.state);
    })

    browser.on('downloadWillBegin', (e: any) => {
      console.log("class downloadWillBegin:", e.suggestedFilename);
    })

    page.on('downloadProgress', (e: any) => {
      console.log("class downloadProgress:", e.state);
    })

    page.on('downloadWillBegin', (e: any) => {
      console.log("class downloadWillBegin:", e.suggestedFilename);
    })
    await dlLink?.click();

The download starts and the file is downloaded successfully, but still no event fires.

Any other idea what to try?

Edit: Network.responseReceived also doesn't fire. I think there is something wrong here. It feels like all events are broken somehow. But client.send still works for setting Page.setDownloadBehavior.
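
One thing that may be worth checking here: in the CDP, Browser.setDownloadBehavior takes an eventsEnabled flag that defaults to false, and events of a domain such as Page or Network are typically only delivered after the matching enable call (Page.enable, Network.enable). Neither appears in the code above, which might explain the silence. A minimal sketch of tracking downloads, assuming the events have been switched on with client.send('Browser.setDownloadBehavior', { behavior: 'allow', downloadPath, eventsEnabled: true }). Which session (page-level or browser-level) receives them may depend on the Puppeteer and Chrome versions. trackDownloads is a hypothetical helper, and the fake client only stands in for a real CDPSession so the sketch runs without a browser.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: record downloads reported by a CDP session.
// `client` is anything with an .on(event, handler) method, such as a
// Puppeteer CDPSession.
function trackDownloads(client) {
  const active = new Map(); // guid -> suggested filename
  const finished = [];      // { filename, state } for downloads that ended
  client.on('Browser.downloadWillBegin', e => {
    active.set(e.guid, e.suggestedFilename);
  });
  client.on('Browser.downloadProgress', e => {
    if (e.state === 'completed' || e.state === 'canceled') {
      finished.push({ filename: active.get(e.guid), state: e.state });
      active.delete(e.guid);
    }
  });
  return { active, finished, isIdle: () => active.size === 0 };
}

// Stand-in for a real session: emit the events by hand.
const fakeClient = new EventEmitter();
const downloads = trackDownloads(fakeClient);
fakeClient.emit('Browser.downloadWillBegin', { guid: 'g1', suggestedFilename: 'report.pdf' });
fakeClient.emit('Browser.downloadProgress', { guid: 'g1', state: 'inProgress' });
const idleWhileDownloading = downloads.isIdle();
fakeClient.emit('Browser.downloadProgress', { guid: 'g1', state: 'completed' });
console.log(idleWhileDownloading, downloads.isIdle(), downloads.finished);
```

With a tracker like this, "no active download at the moment" becomes a check on isIdle() rather than on network idleness, so open websockets would not matter.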

ivanalemunioz · 2024-02-29T16:12:54Z

@sirzento check this #299 (comment).

Fetch.requestPaused is fired once the browser gets the http headers

mattmillen888 · 2024-03-01T06:07:07Z

@sirzento check this

Fetch.requestPaused is fired once the browser gets the http headers. This could also work well! Here is a version based on @ivanalemunioz's version:

const puppeteer = require('puppeteer');

// Configuration for easy adjustments
const CONFIG = {
    contentType: 'your-content-type-here', // Adjust this to match the specific Content-Type you're targeting
    responseCode: 200,
    downloadPath: __dirname // or any path where you'd like to save downloads
};

// Function to determine if a response should be modified based on its headers
const shouldModifyResponse = (responseHeaders) => {
    return responseHeaders.some(header => 
        header.name.toLowerCase() === 'content-type' && header.value.includes(CONFIG.contentType)
    );
};

// Function to modify response headers
const modifyHeaders = (responseHeaders) => {
    return responseHeaders.reduce((acc, header) => {
        const nameLower = header.name.toLowerCase();
        if (nameLower !== 'content-disposition' && nameLower !== 'content-type') {
            acc.push(header);
        }
        return acc;
    }, [{ name: 'Content-Type', value: 'text/plain' }]);
};

(async () => {
    const browser = await puppeteer.launch({ headless: true, ignoreHTTPSErrors: true });
    const page = await browser.newPage();
    const client = await page.target().createCDPSession();

    try {
        await client.send('Fetch.enable', { patterns: [{ urlPattern: '*', requestStage: 'Response' }] });

        client.on('Fetch.requestPaused', async (reqEvent) => {
            const { requestId, responseHeaders = [] } = reqEvent;

            try {
                if (shouldModifyResponse(responseHeaders)) {
                    const modifiedHeaders = modifyHeaders(responseHeaders);
                    const responseObj = await client.send('Fetch.getResponseBody', { requestId });
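                    await client.send('Fetch.fulfillRequest', {
                        requestId,
                        responseCode: CONFIG.responseCode,
                        responseHeaders: modifiedHeaders,
                        // Fetch.fulfillRequest expects a base64-encoded body
                        body: responseObj.base64Encoded ? responseObj.body : Buffer.from(responseObj.body).toString('base64')
                    });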
                } else {
                    await client.send('Fetch.continueRequest', { requestId });
                }
            } catch (error) {
                console.error('Error handling request:', error);
                await client.send('Fetch.continueRequest', { requestId }); // Ensure continuation in case of error
            }
        });

        // Your code to navigate to the page and initiate the download...
    } catch (error) {
        console.error('An error occurred:', error);
    } finally {
        // Clean up
        await client.send('Fetch.disable');
        await client.detach();
        await browser.close();
    }
})();

sirzento · 2024-03-07T12:44:40Z

@mattmillen888 @ivanalemunioz I still don't get how this works. Where can I wait for the download to finish? I guess it does work when using page.waitForResponse() to start the download, but I don't have a direct link. I need to start the download with a button click. The direct URL contains parameters that can be different each time.

ivanalemunioz · 2024-03-07T14:07:02Z

@sirzento in @mattmillen888's response, where it says // Your code to navigate to the page and initiate the download..., click that button and then wait using page.waitForNetworkIdle(); the downloaded content should be in page.content().

sirzento · 2024-03-07T14:51:25Z

@ivanalemunioz OK, but page.waitForNetworkIdle() won't finish for me because of the websockets on the site :/

ivanalemunioz · 2024-03-07T19:52:09Z

@sirzento in const responseObj = await client.send('Fetch.getResponseBody', { requestId }); you are getting the file content, so just save responseObj.body into a file.
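
A minimal sketch of that last step, assuming the object has the shape Fetch.getResponseBody returns ({ body, base64Encoded }). saveResponseBody, the temp-file name and the hand-made example object are hypothetical, for illustration only.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write the result of Fetch.getResponseBody ({ body, base64Encoded }) to disk.
// Returns the number of bytes written.
function saveResponseBody(responseObj, filePath) {
  const data = responseObj.base64Encoded
    ? Buffer.from(responseObj.body, 'base64') // binary payloads arrive base64-encoded
    : Buffer.from(responseObj.body, 'utf8');  // text payloads arrive as plain strings
  fs.writeFileSync(filePath, data);
  return data.length;
}

// Example with a hand-made object standing in for a real CDP result.
const target = path.join(os.tmpdir(), 'example-download.txt');
saveResponseBody({ body: Buffer.from('hello').toString('base64'), base64Encoded: true }, target);
console.log(fs.readFileSync(target, 'utf8'));
```

Called inside the Fetch.requestPaused handler, this would save the file as soon as its body is available, without needing to wait for the network to go idle.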

haixc · 2024-08-05T03:28:37Z

With page.waitForNetworkIdle(), it’s as you rightly said possible that persistent connections like WebSockets are preventing the network idle state from being reached.
The method waits for a period of idle network activity, so any ongoing connections, including WebSockets, can cause it to time out.
There are few ways you could solve this by using slightly different approaches to determine when the download is complete.
Custom Wait Function Check for Download File Existence Listen to Download Event Adjust waitForNetworkIdle Options

Custom Wait Function
Instead of relying on waitForNetworkIdle, you can create a custom function that waits for the specific network request you are interested in (the file download) to finish.

Check for Download File Existence
If the download initiates promptly after a certain action, you can periodically check if the file has appeared in the download directory.

Listen to the Download Event
If you have control over or insights into the backend, you might get a download completion event through WebSocket or another method.

Adjust waitForNetworkIdle Options
You can adjust the options for waitForNetworkIdle to better suit your needs, although this might not fully resolve the issue if WebSockets are indeed the cause of the timeout.

Here’s an example of how you might implement a custom wait function based on file existence:
async function waitForDownload(downloadPath, timeout = 60000) {
  let startTime = new Date().getTime();
  while (true) {
    if (fs.existsSync(downloadPath)) {
      return true;
    } else if (new Date().getTime() - startTime > timeout) {
      throw new Error('Download timeout');
    }
    await new Promise(resolve => setTimeout(resolve, 1000)); // Wait for 1 second before checking again
  }
}

// Usage in your main function
try {
  // Your code to initiate the download...

  // Wait for the download to complete
  await waitForDownload('/path/to/expected/download/file.pdf');
} catch (error) {
  console.error('Error:', error);
}
This function checks for the existence of the downloaded file every second and times out after a specified duration (default 60 seconds but you can change to whatever you like). You would need to replace '/path/to/expected/download/file.pdf' with the actual path where you expect the file to be downloaded.
Hopefully one of these options will work for you.
Let me know how you get on.
I tried that, but I still can't get it to work with those solutions because:

1. I don't exactly know how to do that.

2. I could do that, but I don't know the filename of the file.

3. Sadly not possible.

4. I don't think this will work since there are many websockets on this website.

I also looked into the CDP and tried to bind the events Page.downloadProgress, Browser.downloadProgress, Page.downloadWillBegin and Browser.downloadWillBegin, but no event fired, which I find very strange.

I used this code:
    browser = await puppeteer.launch({ headless: true, ignoreHTTPSErrors: true });
    page = await browser.newPage();

    ...

    const client = await page.target().createCDPSession();
    await client.send('Page.setDownloadBehavior', {
      behavior: 'allow',
      downloadPath: __dirname + '\\'
    });

    client.on('Page.downloadProgress', e => {
      console.log("Page.downloadProgress:", e.state);
    });
    client.on('Browser.downloadProgress', e => {
      console.log("Browser.downloadProgress:", e.state);
    });
    client.on('Browser.downloadWillBegin', e => {
      console.log("Browser.downloadWillBegin:", e.suggestedFilename);
    });
    client.on('Page.downloadWillBegin', e => {
      console.log("Page.downloadWillBegin:", e.suggestedFilename);
    });

    browser.on('downloadProgress', (e: any) => {
      console.log("class downloadProgress:", e.state);
    })

    browser.on('downloadWillBegin', (e: any) => {
      console.log("class downloadWillBegin:", e.suggestedFilename);
    })

    page.on('downloadProgress', (e: any) => {
      console.log("class downloadProgress:", e.state);
    })

    page.on('downloadWillBegin', (e: any) => {
      console.log("class downloadWillBegin:", e.suggestedFilename);
    })
    await dlLink?.click();
The download starts and the file is downloaded successfully, but still no event fires.

Any other ideas what to try?

Edit: Network.responseReceived also doesn't fire. I think there is something wrong here; it feels like all events are broken somehow. But client.send still works to set Page.setDownloadBehavior.

@sirzento @ivanalemunioz Hi, has this issue been resolved? I'm having the same issue, where no download event is triggered when downloading a file.

I open a page on a.com,
then download a file from file.a.com,
but no download-related events are triggered.

ivanalemunioz · 2024-08-05T16:12:27Z

@haixc I solved it with the Fetch.requestPaused event, as you can see in my answer
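The referenced answer is not reproduced here, so the following is only a sketch of what a `Fetch.requestPaused`-based approach can look like, not necessarily the code that answer used. It pauses responses through the CDP Fetch domain, saves the body of anything sent as an attachment, and then lets the request continue. `isAttachment`, `handlePaused`, `interceptDownloads`, and the `download-<requestId>` file name are hypothetical; `Fetch.enable`, `Fetch.requestPaused`, `Fetch.getResponseBody`, and `Fetch.continueRequest` are CDP names.

```javascript
const fs = require('fs');
const path = require('path');

// True when the response headers (CDP format: [{ name, value }]) mark an attachment.
function isAttachment(responseHeaders = []) {
  return responseHeaders.some(
    (h) => h.name.toLowerCase() === 'content-disposition' && /attachment/i.test(h.value)
  );
}

// Handles one Fetch.requestPaused event: saves attachment bodies to saveDir,
// then always lets the request continue. Returns the saved path or null.
async function handlePaused(client, saveDir, event) {
  const { requestId, responseHeaders } = event;
  let savedTo = null;
  try {
    if (isAttachment(responseHeaders)) {
      const { body, base64Encoded } = await client.send('Fetch.getResponseBody', { requestId });
      savedTo = path.join(saveDir, `download-${requestId}`); // placeholder file name
      fs.writeFileSync(savedTo, Buffer.from(body, base64Encoded ? 'base64' : 'utf8'));
    }
  } finally {
    await client.send('Fetch.continueRequest', { requestId });
  }
  return savedTo;
}

// Pauses every request at the response stage and routes it through handlePaused.
async function interceptDownloads(client, saveDir) {
  await client.send('Fetch.enable', {
    patterns: [{ urlPattern: '*', requestStage: 'Response' }],
  });
  client.on('Fetch.requestPaused', (event) => {
    handlePaused(client, saveDir, event).catch((err) => console.error('Intercept failed:', err));
  });
}
```

With a session created as elsewhere in this thread (`await page.target().createCDPSession()`), calling `await interceptDownloads(client, downloadDir)` before clicking the link would arm the interception. Pausing every response with `urlPattern: '*'` adds overhead, so a narrower pattern is preferable when the file URL is predictable.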

SystemDisc · 2024-09-01T12:19:09Z

  // Assumes `fs`, `path`, and `puppeteer` are imported at the top of the file.
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });

  // Ensure the download directory exists
  const downloadPath = path.join('some', 'path', 'to', 'downloads');
  fs.mkdirSync(downloadPath, { recursive: true });

  // Use a browser-level CDP session; download events are reported on the Browser domain
  const session = await browser.target().createCDPSession();
  await session.send('Browser.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
    eventsEnabled: true, // this must be set
  });

  // do your stuff

  console.log('Saving file...');
  await new Promise<void>((resolve, reject) => {
    session.on('Browser.downloadProgress', (e) => {
      if (e.state === 'completed') {
        resolve();
      } else if (e.state === 'canceled') {
        reject(new Error('Download was canceled'));
      }
    });
  });

  await browser.close();

Solution found here: #7173 (comment)
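The snippet above attaches its listener after the download has been triggered, so a very fast download could finish before anyone is listening. A possible refinement, sketched here with a hypothetical `waitForDownloadComplete` helper, creates the promise first and adds a timeout. It assumes `Browser.setDownloadBehavior` was already sent with `eventsEnabled: true` and that only one download is in flight.

```javascript
// Hypothetical helper: resolves with the download's guid once Chrome reports
// state "completed"; rejects on "canceled" or after `timeout` ms.
function waitForDownloadComplete(session, timeout = 60000) {
  return new Promise((resolve, reject) => {
    const onProgress = (e) => {
      if (e.state === 'completed') {
        finish(resolve, e.guid);
      } else if (e.state === 'canceled') {
        finish(reject, new Error('Download was canceled'));
      }
    };
    const timer = setTimeout(
      () => finish(reject, new Error('Download timeout')),
      timeout
    );
    // Clears the timer and removes the listener before settling the promise.
    function finish(settle, value) {
      clearTimeout(timer);
      session.off('Browser.downloadProgress', onProgress);
      settle(value);
    }
    session.on('Browser.downloadProgress', onProgress);
  });
}
```

A typical call would be `await Promise.all([waitForDownloadComplete(session), dlLink.click()])`, so the listener is in place before the click and a rejection is always handled.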
