Skip to content

Option to force HTTPS in origin#1725

Closed
LotteHofstede wants to merge 6 commits intoDSpace:mainfrom
atmire:w2p-93027_option-to-force-HTTPS-in-origin_contribute
Closed

Option to force HTTPS in origin#1725
LotteHofstede wants to merge 6 commits intoDSpace:mainfrom
atmire:w2p-93027_option-to-force-HTTPS-in-origin_contribute

Conversation

@LotteHofstede
Copy link
Copy Markdown
Contributor

@LotteHofstede LotteHofstede commented Jul 15, 2022

References

Description

This PR adds a new configuration parameter forceHTTPSInOrigin to force to return the origin URL with the https:// protocol. This PR also uses the location.replace() function to redirect and make sure the browser can keep track of the browse history.

Instructions for Reviewers

In the environment variables, in the list of ui configurations set forceHTTPSInOrigin to true.
Test that the citation_pdf_url on an item page and make sure it always returns a URL that starts with https://.

To test the redirect change, download a file, and use the back button of your browser. You should end up back on the item page

Checklist

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes TSLint validation using yarn run lint
  • My PR doesn't introduce circular dependencies
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.

@artlowel artlowel added bug component: SEO Search Engine Optimization high priority e/2 Estimate in hours labels Jul 15, 2022
@artlowel artlowel added this to the 7.4 milestone Jul 15, 2022
@tdonohue tdonohue self-requested a review July 21, 2022 14:49
@tdonohue tdonohue added the 1 APPROVAL pull request only requires a single approval to merge label Jul 21, 2022
@tdonohue tdonohue requested a review from abollini July 21, 2022 14:50
@abollini
Copy link
Copy Markdown
Member

is this something that we should really manage on the dspace side? to force a redirect to https (that is a best practice that we always enforce with an HSTS header) there are specific directive on the web server / balancer / gateway that will be put in front of node in any production deployment. I'm just not sure that we should implement our own solution for area that are under the responsibility of other component. I would like to hear other opinions on that

@artlowel
Copy link
Copy Markdown
Member

artlowel commented Jul 25, 2022

The goal is not to force a redirect, the goal is to solve the problem that if there is something in front of the node server that redirects from a public https url to an internal http url, that the code we use to determine the origin server side has no way of knowing that. As a result, things that use that getCurrentOrigin method, such as the citation_pdf_url meta-tag get the wrong protocol.

We added this forceHTTPSInOrigin boolean rather than a whole separate "public" UI section with the entire URL, because I think two UI sections would be confusing, and we are able to automatically determine the host.

I considered defaulting it to true as using HTTPS is indeed best practice, but that would mean it wouldn't be automatically backwards compatible.

Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LotteHofstede and @artlowel : Overall, this looks great to me, though I have a few minor suggestions.

First though, I wanted to note that I think @abollini misunderstood the purpose of this PR. It looks to me like it's two separate small "features" in one PR:

  1. The feature which ensures getCurrentOrigin() always returns an HTTPS URL whenever forceHTTPSInOrigin: true. This fixes a bug with SEO, as <meta> field URLs like citation_pdf_url return HTTP instead of HTTPS, even when the UI is running via HTTPS.
  2. A separate feature which changes the current redirect behavior to use location.replace() in order to allow the redirect to appear in your browse history of your browser.

I think the misunderstanding here is that these are not directly related features. There is no code in this PR which forces a redirect to HTTPS. In fact, getCurrentOrigin() is not used by the current redirect code...it's only used to populate <meta> tags (as far as I can see). But, it's confusing because getCurrentOrigin() is a method in the hard-redirect services.

That's a long explanation of why I feel this is perfectly OK to implement in the Angular UI.

All that said, I do feel it'd be useful to automate this setting more...as I fear it'll be easy to misunderstand or misconfigure. In my opinion, we might consider setting it to true by default when production mode is disabled....it only should be false by default if in dev mode. See my inline comment for more details.

Beyond that, I'm basically +1 this change.

Comment thread src/config/ui-server-config.interface.ts
@artlowel artlowel requested a review from tdonohue August 26, 2022 09:29
Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @LotteHofstede ! This looks good to me. I've also tested in both production & dev mode. As documented, prod mode defaults to true (I see HTTPS in citation_pdf_url) and dev mode defaults to false (I see HTTP in citation_pdf_url). I've also verified I can switch the value in either mode.

@abollini : I'll give you a day or so to give this one last check, if you want. As I noted in my previous comment, I think your prior review was a misunderstanding of the code in this PR...but let me know if I've misunderstood your prior concerns.

@tdonohue tdonohue requested a review from atarix83 September 1, 2022 14:49
@tdonohue
Copy link
Copy Markdown
Member

tdonohue commented Sep 7, 2022

@atarix83 or @abollini : Have either of you had a chance to glance at this again? I'd like to get this merged soon if possible, as it's already at +1.

@abollini
Copy link
Copy Markdown
Member

The goal is not to force a redirect, the goal is to solve the problem that if there is something in front of the node server that redirects from a public https url to an internal http url, that the code we use to determine the origin server side has no way of knowing that. As a result, things that use that getCurrentOrigin method, such as the citation_pdf_url meta-tag get the wrong protocol.

We are still unable to reproduce that. We have exactly this kind of setup on many of the installations that we manage, a reverse proxy that deal with the HTTPS talking with an internal Angular/Node server over plain HTTP. Checking the metatag we see that in all the 7.2, 7.3 instances both the citation_abstract_html_url than the citation_pdf_url in HTTPS.

On an 7.1 installation we instead see that the citation_pdf_url is in HTTPS but the citation_abstract_html_url is in HTTP, we need to check deeper if this was a bug in the old version or some misconfiguration in the deployment.

@tdonohue
Copy link
Copy Markdown
Member

tdonohue commented Sep 15, 2022

@abollini : This issue is reproducible on the demo site right now. Here's a page where the citation_pdf_url shows HTTP on the first load: https://demo7.dspace.org/entities/publication/1974d7d3-9648-4dbe-b083-56374f5314e4

The issue is specific to when the page is rendered on the server side. If you look at client side generated content, this bug will never appear. However, if the page is loaded via SSR (which is highly likely when search engines are indexing), then this bug will appear.

@abollini
Copy link
Copy Markdown
Member

@artlowel @tdonohue we check further and indeed the issue comes when the SSR kick-in. I was initially fooled by the reload that happen automatically in the browser, so it is much better test via the command line.
That said, I'm still not convinced by the fix. The X-FORWARDED-FOR headers are used exactly to manage these scenarios and there is also a proto header here.
I have done a quick search and found this npm library that is used to abstract the access to the request to get into consideration these forward headers if available. Can you take a look to that? could this provide a general solution that would not require extra configuration and will work with a proper configuration of the reverse proxy?
https://www.npmjs.com/package/forwarded-for

@tdonohue
Copy link
Copy Markdown
Member

tdonohue commented Sep 15, 2022

@artlowel : @abollini 's latest comment reminded me about the X-FORWARDED-PROTO: https header, which is also required to be set on the backend (REST API) whenever it is run behind a proxy. See this question in our Install docs

We may want to double check whether that header has any impact on the Angular/Node side as well. As it might be that, if we require the proxy to set X-FORWARDED-PROTO: https (just like on the backend), then our Angular code could just access that header to determine if it needs to use HTTPS or HTTP URLs.

@artlowel
Copy link
Copy Markdown
Member

artlowel commented Sep 22, 2022

@abollini @tdonohue These x-forwarded headers are already present. I logged the request headers in the node server with the current httpd config on demo7.dspace.org and got the following:

{
  host: 'demo7.dspace.org',
  accept: 'text/html, application/rss+xml, application/atom+xml, text/xml, text/rss+xml, application/xhtml+xml',
  'accept-encoding': 'gzip,deflate',
  'user-agent': 'REDACTED',
  'x-forwarded-proto': 'https',
  'x-forwarded-for': 'REDACTED',
  'x-forwarded-host': 'demo7.dspace.org',
  'x-forwarded-server': 'dspace7-demo.atmire.com',
  connection: 'Keep-Alive'
}

The fact that they're present means they are not automatically taken into account when simply retrieving the protocol of the request.

@abollini The package you suggested wouldn't be a solution, it will give you the IP of the original requester, but says nothing about the origin or protocol of the public url

Something that could work is to test for the presence of those x-forwarded headers, and use their values instead of the default request properties for host and protocol if they exist. I've created #1850 which takes that approach. We should discuss which one to go ahead with in the meeting today.

Copy link
Copy Markdown
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in my review of #1850 (#1850 (review)), I found that PR works just as well as this one and I prefer the approach in #1850.

Therefore, I'd recommend we close this PR as "won't fix" as soon as #1850 has a second approval.

@tdonohue tdonohue self-requested a review September 23, 2022 18:50
@tdonohue
Copy link
Copy Markdown
Member

Closing, as this is replaced by #1850 (see discussion in that ticket)

@tdonohue tdonohue closed this Sep 27, 2022
@tdonohue tdonohue removed this from the 7.4 milestone Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1 APPROVAL pull request only requires a single approval to merge bug component: SEO Search Engine Optimization e/2 Estimate in hours high priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The "citation_pdf_url" meta tag download link doesn't use https

4 participants