org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId. #800
Comments
Hi @ilg-ul, thanks for letting us know. The index file seems to be in sync with the specification in the documentation. There are no such elements like Cheers, |
Hi Liviu, We do not face any problem. Best regards, |
One issue could be the redirect that happens when accessing http://www.keil.com/pack/index.pidx. A file download using |
~~My first suggestion is to remove this attribute.~~
|
Lucky you! Downloading and parsing locally seems to disable the schema validation. Here are the latest tests. Parsing directly from the URL:
Copying the file locally and parsing:
I first thought that the problem was introduced by updating the JDK to OpenJDK 13, but with the old 1.8 the behaviour was the same. I have no idea how it worked before... |
The xs:noNamespaceSchemaLocation="PackIndex.xsd" was added to the index.pidx in October 2018. |
I understand that the redirection was added recently, but I do not think it causes the issue described at stackoverflow. Here is the verbose curl output:
The file is not UTF-8 but text/plain and the downloaded file has no BOM, it starts directly with ASCII chars:
|
Yes, you're right, the file has no BOM but it should be proper UTF-8 encoding nevertheless. Might it happen that the stream reader you are using fails to detect proper encoding if there is no BOM right at the start? Any chance to force the stream reader to use UTF-8? |
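For reference, a minimal sketch of how a SAX input can be told to assume UTF-8 when no BOM is present. It uses the standard `InputSource.setEncoding` call; the class name `ForcedEncodingParse` and the helper `rootElement` are hypothetical and are not taken from the plug-in code discussed in this thread.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
final class ForcedEncodingParse {

    // Parses the stream as UTF-8 and returns the name of the root element.
    static String rootElement(InputStream in) {
        final String[] root = new String[1];
        InputSource source = new InputSource(in);
        // Tell the parser which encoding to assume for the byte stream,
        // instead of relying on BOM or XML-declaration detection.
        source.setEncoding("UTF-8");
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if (root[0] == null) {
                        root[0] = qName; // remember only the first element seen
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return root[0];
    }
}
```

This only rules the encoding in or out as a cause; it does not change how the file is fetched.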
Please note that exactly the same file is parsed by exactly the same code properly when copied locally. It should have nothing to do with encoding. And parsing was ok until recently, when something changed on your side.
|
Evgueni seems right, I uploaded the
So my guess that it has something to do with validation was not confirmed. A curl session looks like:
ilg@wks ~ % curl -L -o index2.pidx https://github.com/ilg-ul/test-sax-validation/raw/master/index.pidx -v
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 140.82.118.3...
* TCP_NODELAY set
* Connected to github.com (140.82.118.3) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [224 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [108 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3085 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: businessCategory=Private Organization; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; serialNumber=5157550; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
* start date: May 8 00:00:00 2018 GMT
* expire date: Jun 3 12:00:00 2020 GMT
* subjectAltName: host "github.com" matched cert's "github.com"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
* SSL certificate verify ok.
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0> GET /ilg-ul/test-sax-validation/raw/master/index.pidx HTTP/1.1
> Host: github.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Fri, 17 Jan 2020 13:02:30 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Server: GitHub.com
< Status: 302 Found
< Vary: X-PJAX
< Access-Control-Allow-Origin: https://render.githubusercontent.com
< Location: https://raw.githubusercontent.com/ilg-ul/test-sax-validation/master/index.pidx
< Cache-Control: no-cache
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: deny
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Expect-CT: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
< Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com
< Age: 0
< Vary: Accept-Encoding
< X-GitHub-Request-Id: DDEB:F596:27E1B4C:3B5193D:5E21B065
<
* Ignoring the response-body
{ [155 bytes data]
100 144 0 144 0 0 331 0 --:--:-- --:--:-- --:--:-- 330
* Connection #0 to host github.com left intact
* Issue another request to this URL: 'https://raw.githubusercontent.com/ilg-ul/test-sax-validation/master/index.pidx'
* Trying 151.101.16.133...
* TCP_NODELAY set
* Connected to raw.githubusercontent.com (151.101.16.133) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [239 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [108 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3182 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=www.github.com
* start date: Mar 23 00:00:00 2017 GMT
* expire date: May 13 12:00:00 2020 GMT
* subjectAltName: host "raw.githubusercontent.com" matched cert's "*.githubusercontent.com"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
* SSL certificate verify ok.
> GET /ilg-ul/test-sax-validation/master/index.pidx HTTP/1.1
> Host: raw.githubusercontent.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
< Strict-Transport-Security: max-age=31536000
< X-Content-Type-Options: nosniff
< X-Frame-Options: deny
< X-XSS-Protection: 1; mode=block
< ETag: W/"8c5f775585a16c5e8f27556fa1bd47117a66f17ae056af2b72affdaec243caa0"
< Content-Type: text/plain; charset=utf-8
< Cache-Control: max-age=300
< X-Geo-Block-List:
< Via: 1.1 varnish-v4
< X-GitHub-Request-Id: 3CA4:22F3:0333:03E3:5E21AF60
< Content-Length: 75423
< Accept-Ranges: bytes
< Date: Fri, 17 Jan 2020 13:02:30 GMT
< Via: 1.1 varnish
< Connection: keep-alive
< X-Served-By: cache-lcy19264-LCY
< X-Cache: HIT
< X-Cache-Hits: 1
< X-Timer: S1579266150.389912,VS0,VE1
< Vary: Authorization,Accept-Encoding
< Access-Control-Allow-Origin: *
< X-Fastly-Request-ID: 3196e178173b2a09b9bcb0fe77ef1a58b0687a1b
< Expires: Fri, 17 Jan 2020 13:07:30 GMT
< Source-Age: 261
<
{ [1875 bytes data]
100 75423 100 75423 0 0 104k 0 --:--:-- --:--:-- --:--:-- 104k
* Connection #1 to host raw.githubusercontent.com left intact
* Closing connection 0
* Closing connection 1
ilg@wks ~ %
|
Liviu, I asked the web hosting team if we can change the reported Cheers, |
:-( Long live Microsoft! |
Based on further tests, configuring the plug-ins to use the real address (https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx) avoids the problem, so the culprit is the redirection, not the content type. Could you compare the current redirection setup with the previous one, which worked? Perhaps you can identify the problem. For completeness, the curl session looks like this:
ilg@wks tmp % curl -L -o index3.pidx https://sadevicepacksprodus.blob.core.windows.net/idxfile/index.pidx -v
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 52.190.240.132...
* TCP_NODELAY set
* Connected to sadevicepacksprodus.blob.core.windows.net (52.190.240.132) port 443 (#0)
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/cert.pem
CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [255 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [81 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [5238 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.blob.core.windows.net
* start date: May 2 00:41:38 2019 GMT
* expire date: May 2 00:41:38 2021 GMT
* subjectAltName: host "sadevicepacksprodus.blob.core.windows.net" matched cert's "*.blob.core.windows.net"
* issuer: C=US; ST=Washington; L=Redmond; O=Microsoft Corporation; OU=Microsoft IT; CN=Microsoft IT TLS CA 4
* SSL certificate verify ok.
> GET /idxfile/index.pidx HTTP/1.1
> Host: sadevicepacksprodus.blob.core.windows.net
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 76375
< Content-Type: text/plain
< Last-Modified: Sat, 18 Jan 2020 04:01:56 GMT
< ETag: 0x8D79BCB28BA6D49
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 42b49ce6-801e-007f-667d-cf7330000000
< x-ms-version: 2009-09-19
< x-ms-lease-status: unlocked
< x-ms-blob-type: AppendBlob
< x-ms-blob-committed-block-count: 1
< Date: Mon, 20 Jan 2020 10:33:14 GMT
<
{ [15980 bytes data]
100 76375 100 76375 0 0 62911 0 0:00:01 0:00:01 --:--:-- 62911
* Connection #0 to host sadevicepacksprodus.blob.core.windows.net left intact
* Closing connection 0
ilg@wks tmp % |
Any estimate of when this issue will be addressed? As a workaround, I have asked users to reconfigure their Eclipse installations to use the windows.net URL, but this is not a long-term solution. |
In your analysis above the file still gets delivered as |
First, this is not my issue; I use the XML SAX parser available in the Oracle JDK in the simplest and most obvious configuration. If I pass it the 'keil.com' URL, it fails; if I pass the windows.net URL, it passes; if I copy the file locally and pass the local URL, the parser passes again. The content type seems to have no importance.
Please compare the current redirection setup with the previous one, which worked, and fix the problem. |
Our web team is investigating the issue. But the solution we pointed out in the first place won't be enough. I cannot give you an estimate, but probably not before end of January. |
If you mean fixing the content type, yes, I guess that won't make any difference. Check the redirects. |
Whoops, ignore above, I forgot the
|
Hi @cdwilson, using
There is nothing fundamentally wrong with the redirect itself. It's just a matter of coping with these redirects properly. I am not an expert on that "XML SAX parser available in the Oracle JDK". Can you come up with a small command-line reproducer revealing that issue? E.g. a Java program I can run from the command line in a similar way to the above Cheers, |
Yup, I realized that right after I posted it... [facepalm] The original error message that @ilg-ul posted looks similar to the errors that curl is throwing when I forgot the
vs.
I wonder if there is some similar option to curl's |
I did some further tests and the problem is definitely related to the redirection. The problem is not in the SAX parser itself, but in the For reasons that I did not identify yet, in some cases this class does not follow redirections, and returns the error string issued by the server (html content). This string obviously is not a properly formed xml, and the SAX parser fails with that [Edit: The class does not follow redirections from http to https.] Can you confirm that before the move to
I'll try to identify the reason of this inconsistent behaviour, and a possible solution to avoid it. Evgueni @edriouk, any thoughts on this? |
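To illustrate why an HTML body leads to this particular parser error, here is a small sketch that feeds an HTML page to the JDK SAX parser. The HTML text is an assumed example of a typical "302 Found" page, not a capture of what the Keil server actually returned, and the class name `HtmlBodyDemo` is hypothetical. The assumed DOCTYPE line is 50 characters long, which may be why the original report pointed at column 50.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
final class HtmlBodyDemo {

    // Assumed body of a redirect page; an HTML DOCTYPE has a public
    // identifier but no system identifier, which XML does not allow.
    static final String HTML_BODY =
        "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n"
        + "<html><head><title>302 Found</title></head><body></body></html>\n";

    // Returns the parse error, or null if the text is well-formed XML.
    static SAXParseException parse(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e; // fatal errors are rethrown by DefaultHandler
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected failure", e);
        }
    }
}
```

Calling `HtmlBodyDemo.parse(HtmlBodyDemo.HTML_BODY)` returns a `SAXParseException` located on line 1, while a well-formed document such as `<index/>` returns `null`.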
Liviu, |
I checked and this property is already set to true. :-(
I already do this (actually I use an internal buffer), and the problem occurs when reading in the file via HttpURLConnection, instead of the xml I get the html error page. The only way out I can see now is to explicitly process redirects in my code, which is silly. |
you can have a look how our code in CpRepoServiceProvider.readIndexFile() works: |
Well, as far as I can recall, we have been using redirection for quite a while. I need to dig deeper to understand whether anything changed recently. To be honest, I don't know what we should do if the implementation you are using is causing the weird behavior. I cannot see that the redirect is somehow special, and it works without issues using curl. |
Hi @ilg-ul, I got some feedback from the web team. They moved from redirecting to May I ask you to update the URL from http://www.keil.com/pack/index.pidx to https://www.keil.com/pack/index.pidx, please? Does this change anything on your end? Cheers, |
Indeed.
Yes, now it no longer throws the exception. It looks like the Java classes cannot redirect from http to https. Please note that your url change is not reflected by the documentation, which still points to http. https://arm-software.github.io/CMSIS_5/Pack/html/packIndexFile.html I think that you should explicitly announce this configuration change. |
Thank you Evgueni. Yes, you are explicitly processing redirects, and do not rely on moody implementations. Good to know. |
Resolution: the problem was caused by a recent change at Keil, which added a redirect from http to https. The Java HttpURLConnection does not support this configuration, so such redirections have to be followed manually.
The error message is caused by the SAX parser trying to parse the HTML returned together with the 302 response.
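A minimal sketch of following the redirections manually, as the resolution describes. It uses only standard `java.net` calls. The class name `RedirectFollower`, its helper methods, and the five-hop limit are illustrative assumptions; this is not the code used by the plug-in or by `CpRepoServiceProvider.readIndexFile()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, for illustration only.
final class RedirectFollower {

    private static final int MAX_REDIRECTS = 5; // arbitrary cap (assumption)

    // True for the HTTP status codes that carry a Location header.
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM  // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP  // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER   // 303
            || status == 307
            || status == 308;
    }

    // Resolves a Location value, absolute or relative, against the current URI.
    static URI resolve(URI base, String location) {
        return base.resolve(location);
    }

    // Opens the resource, following redirects explicitly (including http to https).
    static InputStream open(URI start) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle every hop ourselves
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_OK) {
                return conn.getInputStream();
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (!isRedirect(status) || location == null) {
                throw new IOException("Unexpected HTTP status " + status + " from " + current);
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting at " + start);
    }
}
```

The stream returned by `RedirectFollower.open(URI.create("http://www.keil.com/pack/index.pidx"))` could then be handed to the SAX parser, so the parser never sees the HTML body of the 302 response.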
It looks like something changed recently in the index.pidx file, crashing the SAX parser:
The current file reads like:
I would suspect that the PackIndex.xsd requires a full absolute URL.