getHttps not downloading HTML #19

Closed
celtic-coder opened this issue Aug 21, 2023 · 2 comments
Comments

@celtic-coder
Hi Thomas (@ThomasMertes),

The following Seed7 code is not producing any output:

[Image: Seed7-getHttps-Code-Listing]

However, the following cURL command works correctly:

curl.exe "https://example.com/" --compressed -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/116.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8" -H "Accept-Language: en-US,en;q=0.5" -H "Accept-Encoding: gzip, deflate, br" -H "DNT: 1" -H "Connection: keep-alive" -H "Upgrade-Insecure-Requests: 1" -H "Sec-Fetch-Dest: document" -H "Sec-Fetch-Mode: navigate" -H "Sec-Fetch-Site: cross-site" -H "Pragma: no-cache" -H "Cache-Control: no-cache" --output example.com.html

Note that this is the cURL from https://curl.se/. The native cURL in Windows 10 fails with an error that the installed libcurl version doesn't support the "--compressed" option.

When I compile the program with the options s7c -tf -p and run it, it produces the following trace output:

-> main
-> 7243_getHttps
-> 4165_getHttpLocation
-> 394_isDigitString
<- 394_isDigitString
-> 394_isDigitString
<- 394_isDigitString
<- 4165_getHttpLocation
-> 7220_openHttps
-> 7171_openTlsSocket
-> 2177_openInetSocket
-> 2170_openSocket
<- 2170_openSocket
<- 2177_openInetSocket
-> 7165_openTlsSocket
<- 7165_openTlsSocket
<- 7171_openTlsSocket
<- 7220_openHttps
<- 7243_getHttps
<- main

This would seem to indicate that the program goes through the correct steps, entering and leaving getHttps. The profile output gives:

usecs	calls	place	name
31997	1	Download-HTML.sd7(4)	main
26997	1	/c/seed7/lib/gethttps.s7i(123)	getHttps
24010	1	/c/seed7/lib/gethttps.s7i(33)	openHttps
20005	1	/c/seed7/lib/tls.s7i(1864)	openTlsSocket
16002	1	/c/seed7/lib/socket.s7i(182)	openInetSocket
998	1	/c/seed7/lib/gethttp.s7i(61)	getHttpLocation
0	2	/c/seed7/lib/seed7_05.s7i(792)	isDigitString
0	1	/c/seed7/lib/socket.s7i(133)	openSocket
0	1	/c/seed7/lib/tls.s7i(1828)	openTlsSocket

Given that cURL works correctly, this may point to a problem with the getHttps function on my Windows 10 laptop. Are there any further steps I could take to troubleshoot this?

Kind Regards,
Liam

@ThomasMertes
Owner

Hi Liam,

Thank you for your report. It helped me fix the incorrect documentation of getHttps.

The function getHttps actually expects a location as its parameter, not a URL. A location is a URL without the https:// or http:// prefix.
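To illustrate the distinction, here is a small Python sketch that derives the location form from a full URL. The to_location helper is hypothetical, for illustration only, and is not part of Seed7 or its library:

```python
# Hypothetical helper: derive the "location" form (URL without its
# scheme prefix) from a full URL, using only the standard library.
from urllib.parse import urlsplit

def to_location(url: str) -> str:
    """Return the URL without its https:// or http:// prefix."""
    parts = urlsplit(url)
    location = parts.netloc + parts.path
    if parts.query:
        location += "?" + parts.query
    return location

print(to_location("https://example.com/"))  # example.com/
```

Under this reading, getHttps("https://example.com/") fails because the scheme is passed along as part of the host name, while getHttps("example.com") succeeds.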

I have corrected the incorrect documentation in gethttps.s7i and gethttp.s7i.

So, if you use getHttps("example.com") instead of getHttps("https://example.com/"), it should work. At least on my computer the following succeeds:

$ include "seed7_05.s7i";
  include "gethttps.s7i";

const proc: main is func
  local
    var string: Page_HTML is "";
  begin
    Page_HTML := getHttps("example.com");
    if Page_HTML <> "" then
      writeln(Page_HTML);
    end if;
  end func;

This writes the HTML of example.com. Now I could create a DOM from the HTML string with readHtml.

When I implemented getHttp it seemed ridiculous to specify http twice (in the function name and in the URL), so I decided that http:// must be omitted from the URL. I just forgot to document this approach, and I also kept url as the parameter name. Now the parameter is named location and the documentation contains an explanation and some examples.

@celtic-coder
Author

Hi Thomas (@ThomasMertes),

Thanks for making the documentation changes! I can confirm that the HTML for example.com now downloads as expected.

Kind Regards,
Liam
