New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DS-1089 Feedback form breaks when hostname is short #195
Conversation
Fix for a string index out of bounds error
looks good, +1 |
If I understand this, the original code assumes that the FQDN is hostname.domain.domain, while the revised code assumes that it is either hostname.domain.domain or hostname.domain. Instead of clipping domains off the right end, shouldn't we just take the leftmost longest substring not containing a period? That is: int firstDot = host.indexOf('.'); basicHost = (firstDot < 0) ? host : host.substring(0, firstDot); |
Correct me if I'm wrong, but wouldn't your code leave "www" as the hostname if you would feed it www.myrepo.edu? |
If the FQDN is "www.myrepo.edu", then "www" is the hostname. What do you think it should be? I may have completely misunderstood the intent here. |
The reason why this is confusing is because this code doesn't intend to find out the hostname, it wants to make sure that all of example.com, www.example.com, www.sub.example.com etc. are allowed as referrers when only example.com is listed as mail.allowed.referrers. |
The original comment before the functionality states: // cut off all but the hostname, to cover cases where more than one URL So in this context I think it was the original author's intent to keep the top level domain+extension. e.g. www.atmire.com should all end up as atmire.com The original code fails because it assumes there are always at least two "." symbols. Which is not the case for short hostnames like atmire.com (without www) |
Ah, I was reading "cut off" to mean "cut off and throw away". That's what I get for looking only at the diff. So, if the desire is that "any.old.thing.example.com" should match "example.com", rather than making assumptions about how many name elements there are, we should simply compare the rightmost portion of the name with the name from mail.allowed.referrers. That is, in RE terms: if the allowed referrer is "example.com", then the presented name must match "example.com$" Otherwise we not only have problems with short allowed names, but with long ones such as "library.kcl.ac.uk". Looking at the surronding code, I think we have other problems. "String.indexOf(String) != -1" does not mean "occurs at the end"; it means "occurs anywhere". So "example.com" would match "badbot.example.com.roguesite.porn". |
Yes, Mark, while writing my response I also realized that is what we want. With one difference, we don't want to allow xxxexample.com, so the comparison should be separated by a dot (".example.com$") unless it equals to mail.allowed.referrers ("example.com$"). |
in an attempt to revive this: Instead of doing all of this cutting ourselves, maybe we can use URI.getHost() ? http://docs.oracle.com/javase/7/docs/api/java/net/URI.html#getHost() http://stackoverflow.com/questions/9607903/get-domain-name-from-given-url |
Hi Bram, thanks for the tip, but the problem wasn't extracting the domain name from the URL, it was subdomain matching (see my first comment above). |
The more I look at the code, the more confused I get. The patch doesn't seem to be related to mail.allowed.referers code at all (although that code needs work too). The code to be patched seems to be trying to match the referer to the DSpace host -- that is, ${dspace.hostname} is always implicitly a member of mail.allowed.referers -- or to a stem of it, to allow for things like example.org CNAME www.example.org. We have to be careful about this stemming. Not only might the offered name be too short to remove two labels ("example.org"); stemming "example.org" produces "org" which is too short for a sensible match. If stemming makes sense at all, I think it should be: remove one leftmost label from the domain IFF the result would contain at least two labels. ("foo.example.org" -> "example.org" and "foo.bar.example.org" -> "bar.example.org" but "example.org" -> "example.org".) The patched code does seem to do that. However I wonder if the intended behavior is too clever. Shouldn't we just look for an exact match to ${dspace.hostname} or to any element of ${mail.allowed.referer} and document that if you want stemming then you must specify it explicitly in mail.allowed.referer? I think we have assumed too much. Also, I can't find where this stemming behavior is documented. |
In #duraspace on 17-Jul-2013, helix84 commented that it would be good to have unit tests for this behavior. |
Hi Mark. It's true that may be trying to be too clever here. I was only trying to fix a common bug by implementing the intended behaviour correctly and now it turns out to be a little complex. The reason why the intended behaviour exists is that you might have a dozen subdomains, e.g. dspace.university.edu, blog.university.edu, library.university.edu etc. and you want to allow linking from all of them to the feedback page. The subdomains may change over time and this way you wouldn't need to bother the dspace admin to add each one of them. So we may decide to
I now see what you mean - that stemming is implemented only for the dspace hostname and not in the allowedReferrersSplit loop. It's an open question whether we want it there, too. My example above suggests it would have a use. |
Atmire CUA changes from early 2016
Closing as outdated/obsolete (no activity in several years). Please resubmit if you are interested in helping move this forward. Thanks! |
[D4CRIS-1033] use transfer manager to support multipart upload and to avoid reaching max number of connections on S3
Feature relationship versioning
Fix for a string index out of bounds error
https://jira.duraspace.org/browse/DS-1089
Rebased #193 on the correct branch (master)