New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow using special characters in creator name and tags #3793
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really surprised the API handles this so well, but there is a big edge case... ?
!
http://0.0.0.0:50280/v1/images/source/flickr/creator////me////?you-and-them/\////
Leads to the following query:
openverse-web-1 | {'from': 0,
openverse-web-1 | 'query': {'bool': {'filter': [{'term': {'source': 'flickr'}},
openverse-web-1 | {'term': {'creator.keyword': '///me///'}}],
openverse-web-1 | 'must_not': [{'term': {'mature': True}}]}},
openverse-web-1 | 'size': 40,
openverse-web-1 | 'sort': [{'created_on': {'order': 'desc'}}]}
I wonder if we should just percent encode the creator name? Or maybe conditionally percent encode, at least ?
, and any other significant things we could find?
Not requesting changes because it could be considered a separate issue. Let me know!
const removeSlashes = (path: string): string => { | ||
return path.replace(/^\/|\/$/g, "") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this only remove leading and trailing slashes from the path? What about trailing or leading slashes in creator names? e.g., image/source/flickr//edgycreator//
where the creator name is /edgycreator/
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@obulat have you seen this question? Just want to make sure that path
gets passed in with the /
s from the path, and not the raw creator name characters on both ends of the string and always.
If that is the case, then can we just do path.substring(1, path.length - 1)
and avoid the regex altogether? It would communicate things a bit more clearly, especially if the function is renamed to trimLeadingAndTrailingSlashes
(or some less verbose version 😁 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realising that potentially my question doesn't make sense anymore, because all slashes that are elements of the creator's name should be URL encoded, right? I am still waiting to approve just to make sure, because http://192.168.20.75:8443/audio/source/jamendo/creator/%2FThe.madpix.project%3F/, with a leading slash in the creator name URL encoded does work, but http://192.168.20.75:8443/audio/source/jamendo/creator//The.madpix.project%3F/ does not.
However, the PR description says that we should be able to visit http://localhost:8443/audio/source/wikimedia_audio/creator/wiki/Jeuwre/, but that doesn't work either, it 404s. The URL encoded version does work.
Except, they only work when navigate to client side. Navigating to them SSR (i.e., reloading the page) does not work. Is it an issue with preferences, perhaps? I also don't see the additional search view links on the single result page, so I think it may be an issue outside this PR, but I want to make absolutely sure it is working the right way, because what I am experiencing differs from the PR description.
8080a7a
to
8be847a
Compare
This was exactly my thought for approaching this too! I'm not sure how much it matters, but leaving the |
I added encoding to the link that we create for the creator button in 9a535b9. If you want to access a page for a creator that has special symbols like |
8be847a
to
fa57822
Compare
I'm not sure if it's just my machine, but I'm having a really hard time testing this 😞 lots of the pages aren't loading, and I can't navigate to any of the links you posted (even after trying |
This is weird. I tried checking out the branch again, and it works well for me locally 🙃 Just in case, here are the first steps I used for testing:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More a request for clarification than a change, but I do have an additional fix to the sample audio data.
const removeSlashes = (path: string): string => { | ||
return path.replace(/^\/|\/$/g, "") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@obulat have you seen this question? Just want to make sure that path
gets passed in with the /
s from the path, and not the raw creator name characters on both ends of the string and always.
If that is the case, then can we just do path.substring(1, path.length - 1)
and avoid the regex altogether? It would communicate things a bit more clearly, especially if the function is renamed to trimLeadingAndTrailingSlashes
(or some less verbose version 😁 )
const removeSlashes = (path: string): string => { | ||
return path.replace(/^\/|\/$/g, "") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realising that potentially my question doesn't make sense anymore, because all slashes that are elements of the creator's name should be URL encoded, right? I am still waiting to approve just to make sure, because http://192.168.20.75:8443/audio/source/jamendo/creator/%2FThe.madpix.project%3F/, with a leading slash in the creator name URL encoded does work, but http://192.168.20.75:8443/audio/source/jamendo/creator//The.madpix.project%3F/ does not.
However, the PR description says that we should be able to visit http://localhost:8443/audio/source/wikimedia_audio/creator/wiki/Jeuwre/, but that doesn't work either, it 404s. The URL encoded version does work.
Except, they only work when navigate to client side. Navigating to them SSR (i.e., reloading the page) does not work. Is it an issue with preferences, perhaps? I also don't see the additional search view links on the single result page, so I think it may be an issue outside this PR, but I want to make absolutely sure it is working the right way, because what I am experiencing differs from the PR description.
@AetherUnbound I am not having the same issue as you. We can pair this week to try to discover why your local is behaving differently, if you would like. |
5bd37c8
to
22722b5
Compare
I was having problems testing the creator names that start with So, I changed the code a little bit to only allow for URL-encoded creator names. Now, if you use a non-encoded slash I also moved the path collectionParams validation to the middleware because throwing an error in the setup function shows the Nuxt generic error page instead of Openverse yellow error page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM. I'm unable to test this locally at the moment, but will do so later.
Something I didn't realise about the implementation until now, is the use of _.vue
, rather than typical path parameters. Was that originally to allow for /
in the creator names? If we are now disallowing it, could we switch to using regular Nuxt path params by changing the file names to audio/source/[source].vue
, audio/source/[source]/creator/[creator].vue
, and so forth? Maybe it's even possible to use [mediaType]/source/[source]/creator/[creator].vue
, if that would simplify the logic altogether (I'm not sure if this is the case).
Would this overall simplify the implementation of these pages or is it more complex for some reason?
Also, do we need to handle URL encoding tags? I don't know how much normalisation we do to them on the presentation layer.
22722b5
to
af9b908
Compare
I tried this at first, but had a problem where the nested route ( I went back to the code, and made some more changes to make sure that the tags and creators with special characters work both on SSR and on the client-side navigation. I also added e2e tests (with the special characters set in tapes for easier testing), so I'm confident that parsing works correctly now. Note: previously, when I was doing some testing using the local API, I couldn't understand why the pages were not rendered. Then I found that Jamendo results were filtered as dead links... Parsing of the path was moved to a single middleware. The source and the tags are saved to the media store |
frontend/src/components/VCollectionHeader/VCollectionHeader.vue
Outdated
Show resolved
Hide resolved
e78c0a3
to
789c679
Compare
Unfortunately SSR is still broken for creator names, for me :( I think that this might be actually an issue with trying to use the path at all, potentially with the browser decoding the percent-encoded slash into a plaintext slash upon page load. Then reloading, or copying the URL, results in an usuable URL, because it gives back the URL with the plain-text slash ( ssrbroken.mp4This worries me a bit, because I think it invalidates a core supposition that we had about how to approach this from a technical perspective. Specifically, I think it means we should actually use query parameters instead of trying to treat the creator (and tags) as a RESTful resource. Source is (theoretically) safe from this, because we control the slugs that are used for them, and can ensure they are safe for URL path parameters (do not have Unfortunately, I think that means taking a big step back and rethinking at least the creator and tag endpoints, on both the frontend and API side, to ensure they're able to handle the full spectrum of possible values. |
I'm having the same difficulties as Sara - even hitting refresh on a page that's client-side rendered will then cause the loading to fail from SSR 😞 And the links in the PR description don't work for accessing media locally. Is moving those potential user-generated values to query parameters the only way? That is disappointing that we seem to be fighting the browser on that front 😕 |
I think it's more that we're fighting the HTTP standard, rather than the browser 🤷 Trying to keep a / in the path parameter was always going to be iffy, that's one of the few significant characters in a URI. For what it's worth, I don't think the query parameter is all bad, and the URL can still look nice, especially because we control source:
In fact, it'll be better, because at least in Firefox, you can type whatever cleartext you want in a query parameter, and the browser will URL encode it at request time on your behalf. We were never going to get that with path parameters. Unfortunately, because we haven't been careful about adding new features to the API before verifying the work for all our cases, we'll need to break the API version and remove the creator and tag routes. |
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
6771f31
to
e85b6a8
Compare
Yes, having a slash in the URL is definitely iffy. But I think I found a solution: if we encode the slash in creator and tag twice, we can make it work without having to fight with the browser's behavior with I added the test that handles reloading of the client-rendered page to make sure this case is not missed. You can test the cases of
|
I found a way of allowing for typing creator name as cleartext in the path parameter: in the However, this would not allow for a clear text The easiest solution would be not to add clear text handling to the path, i.e, accept only encoded values. |
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
I think the issue with this is that browser (or at least Firefox) automatically converts URL encoded values in the path to clear text, so if you copy/paste that value, it ends up as cleartext. It does this with query parameters as well, but browsers know to URL encode query parameters before sending them (because that's standard), whereas they won't know to do that for the path (as it is unusual). You can see this in my video above. When I navigate to the creator page, despite the link in the HTML actually being url encoded, Firefox even displays it as cleartext in the link preview in the bottom left corner. After navigating to the page, it is cleartext in the URL bar, and when I copy it, it is copied as cleartext. I think we are doing something fundamentally incompatible with the way browsers expect paths to work. We need to use an approach that will not cause issues when the browser converts the URL to plain text. The only two options I can think of are: using a normalised version of creator names that encodes Because the first option there, using a custom encoding, is really just another version of fighting the browser and has implications for our API's usability and documentation, my strong preference is for the second, query params. Unless there is some other way we haven't mentioned here yet of handling them that will not fight the browser's behaviour and is inline with standard behaviours of HTTP spec. The bottom line is that no one is going to think that they need to URL encode a path parameter, as it just is not common (I don't know if I've ever seen it). Keep in mind this is not just a frontend issue, the API has the same problems with |
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @AetherUnbound Excluding weekend1 days, this PR was ready for review 7 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
I've removed Dhruv as a requested reviewer, I stepped in for him instead. Also drafted this so that we stop getting review pings while we continue to discuss this and find a solution that will work long term and for all cases. |
Closing this PR as we've decided to change the approach and use query parameters for the additional search views. |
Fixes
Fixes #3791 by @AetherUnbound
Description
This PR changes the way the path is parsed to extract the creator name to allow for URL-encoded special symbols like
/
and?
within it. The creator name is also encoded when creating the app creator page link.To enable testing this change locally, I added special symbols to the creator names sample data.
Testing Instructions
Run the local API.
Then, run the Nuxt app using the local API by running the app using
API_URL=http://localhost:50270 just frontend/run dev:only
Enable
additional_search_views
on the/preferences
page.Go tolocalhost:8443/audio/source/wikimedia_audio
and navigate to one of the results that have a slash in the creator name. From that single result page, click on the creator link to open the creator view - there should be no errors here.Search for "Wish You Were Here" in audio. You should see an item with a question mark in the creator name. If you click on it, you should see that the name is encoded in the URL: http://localhost:8443/audio/source/jamendo/creator/The.madpix.project%3F/
If you search for "German term Vulkan" in audio, you should see an item with a
/
in the creator name. If you click on it, you should see the slash as is in the URL, however, this slash is actually encoded in the app code. You can get to this page using this URL: http://localhost:8443/audio/source/wikimedia_audio/creator/wiki%2FJeuwre/Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin