feat: new helper functions to replace regexes #807 #1260
src/util/HeaderUtil.ts (outdated)
const sortedLowerCaseSchemes = [ scheme, ...schemes ]
  .sort((s1, s2): number => s1.localeCompare(s2))
  .map((item): string => item.toLowerCase());
if (!urlSchemeRegexCache.has(sortedLowerCaseSchemes)) {
This condition will always succeed, given that we're creating a new array every time. Map checks references, not contents. As a result, the map will blow up in size.
I think the simplest way here might be to just chop off the scheme bit from the URL and check if it occurs in the array.
Oops, I'm used to Kotlin having meaningful equals/hashCode implementations for collections and arrays. I will pay better attention in the future! Your alternative implementation suggestion removes the need for a regex cache, so it reduces complexity (always a plus).
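To make the reviewer's point concrete: a JavaScript Map compares keys with SameValueZero, so an array key only matches that exact array object. The sketch below (all names are illustrative, not taken from the PR) shows the cache miss, plus one possible workaround that turns the schemes into a primitive string key. The reviewer's own suggestion goes further and removes the cache altogether.

```typescript
// Map keys are compared by identity (SameValueZero), not by contents.
const arrayKeyed = new Map<string[], RegExp>();
arrayKeyed.set([ 'http', 'https' ], /^https?:/u);

// A freshly built array with identical contents is a different key, so this misses.
const arrayHit = arrayKeyed.has([ 'http', 'https' ]);

// Hypothetical workaround: derive a primitive key (lowercased, sorted, joined).
function cacheKey(schemes: string[]): string {
  return schemes.map((scheme): string => scheme.toLowerCase()).sort().join(',');
}

const stringKeyed = new Map<string, RegExp>();
stringKeyed.set(cacheKey([ 'https', 'HTTP' ]), /^https?:/u);
// Strings compare by value, so an equivalent scheme list now hits the cache.
const stringHit = stringKeyed.has(cacheKey([ 'http', 'https' ]));

console.log(arrayHit, stringHit); // false true
```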
src/util/HeaderUtil.ts (outdated)
export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
  const lowerCaseScheme = scheme.toLowerCase();
  if (!authSchemeRegexCache.has(lowerCaseScheme)) {
    authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
Needs escaping.
As in escaping any special regex characters that are being sneaked in via the scheme argument?
Indeed; we have escapeStringRegexp for that already somewhere.
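A minimal sketch of why the scheme needs escaping before it is interpolated into a RegExp. Here escapeRegExp is a hypothetical local stand-in for the existing escapeStringRegexp helper (the real one may differ in detail): it backslash-escapes the RegExp syntax characters and encodes "-" as \x2d, so the result stays valid under the u flag.

```typescript
// Hypothetical stand-in for escapeStringRegexp: escape every RegExp syntax character.
function escapeRegExp(input: string): string {
  return input.replace(/[|\\{}()[\]^$+*?.]/gu, '\\$&').replace(/-/gu, '\\x2d');
}

// "Scheme followed by a space" prefix check, built from the escaped scheme.
function schemePrefixRegex(scheme: string): RegExp {
  return new RegExp(`^${escapeRegExp(scheme)} `, 'ui');
}

// Unescaped, "." matches any character, so "axb" wrongly passes for scheme "a.b".
const unescaped = new RegExp('^a.b ', 'ui').test('axb token');
// Escaped, the scheme only matches literally (still case insensitive).
const escaped = schemePrefixRegex('a.b').test('axb token');
const literal = schemePrefixRegex('a.b').test('A.B token');

console.log(unescaped, escaped, literal); // true false true
```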
Some minor comments mostly.
src/util/HeaderUtil.ts (outdated)
 * @param scheme - Name of the authorization scheme (case insensitive).
 * @returns True if the Authorization header uses the specified scheme, false otherwise.
 */
export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
Suggested change
- export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
+ export function matchesAuthorizationScheme(authorization?: string, scheme: string): boolean {
And then TypeScript is going to complain that optional parameters must come last, so you'll have to swap those two.
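With the two parameters swapped, a regex-free variant could look like the sketch below. It assumes "matches" means the header starts with the scheme followed by a space, as in the regex under review, and it uses startsWith instead of a cached RegExp, which would also sidestep the escaping question. This is one possible shape, not the PR's implementation.

```typescript
// Required `scheme` first, optional `authorization` last, so a possibly
// missing header value can be passed straight through.
function matchesAuthorizationScheme(scheme: string, authorization?: string): boolean {
  // Case-insensitive comparison; the scheme must be followed by a space.
  const prefix = `${scheme.toLowerCase()} `;
  // Optional chaining yields undefined for a missing header, which becomes false.
  return authorization?.toLowerCase().startsWith(prefix) ?? false;
}

const headers: Record<string, string | undefined> = { authorization: 'DPoP abc123' };
console.log(matchesAuthorizationScheme('dpop', headers.authorization)); // true
console.log(matchesAuthorizationScheme('Bearer', headers.authorization)); // false
console.log(matchesAuthorizationScheme('Bearer')); // false
```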
src/util/HeaderUtil.ts (outdated)
    authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
  }
  // Support authorization being undefined (for the sake of usability).
  return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);
Suggested change
- return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);
+ return typeof authorization !== 'undefined' && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization);
This is slightly safer since in theory someone could redefine or shadow undefined (although that is probably not really relevant). Alternatively, Boolean(authorization && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization)) is also valid.
Only one ! is needed to assert that the result is not undefined (see also something similar below).
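Applied to the cached lookup, the two suggestions (a single non-null assertion and the Boolean(...) form) might look like this sketch. It keeps the signature and cache from the snippet under review and, like that snippet, does not yet escape the scheme.

```typescript
// Cache of compiled "scheme followed by a space" patterns, keyed by lowercased scheme.
const authSchemeRegexCache = new Map<string, RegExp>();

function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
  const lowerCaseScheme = scheme.toLowerCase();
  if (!authSchemeRegexCache.has(lowerCaseScheme)) {
    // Assumes the scheme contains no RegExp special characters (see the escaping discussion).
    authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
  }
  // A single `!` tells the compiler get() cannot return undefined here;
  // `!!` would only repeat the same assertion.
  // Boolean(...) maps an undefined or empty header to false.
  return Boolean(authorization && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization));
}

console.log(matchesAuthorizationScheme('Bearer abc', 'bearer')); // true
console.log(matchesAuthorizationScheme(undefined, 'bearer')); // false
```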
src/util/HeaderUtil.ts (outdated)
export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {
  // Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
  const sortedLowerCaseSchemes = [ scheme, ...schemes ]
Suggested change
- export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {
-   // Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
-   const sortedLowerCaseSchemes = [ scheme, ...schemes ]
+ export function hasScheme(url: string, ...schemes: string[]): boolean {
+   // Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
+   const sortedLowerCaseSchemes = schemes
This does make it possible to call the function with no schemes, but I feel it makes it clearer. Otherwise it seems there's a difference between the first scheme and the rest.
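Combining this signature with the earlier suggestion to chop off the scheme and check membership gives something like the sketch below: no regex and no cache. It is an illustration under the assumption that the scheme is everything before the first ":", not necessarily the version that was merged.

```typescript
// Returns true if the URL's scheme (text before the first ":") is one of `schemes`.
function hasScheme(url: string, ...schemes: string[]): boolean {
  const colon = url.indexOf(':');
  if (colon < 1) {
    // No colon, or an empty scheme such as ":foo".
    return false;
  }
  const urlScheme = url.slice(0, colon).toLowerCase();
  return schemes.some((scheme): boolean => scheme.toLowerCase() === urlScheme);
}

console.log(hasScheme('https://example.org/profile#me', 'http', 'https')); // true
console.log(hasScheme('HTTP://example.org/', 'http')); // true
console.log(hasScheme('ftp://example.org/', 'http', 'https')); // false
// Allowed by the variadic signature, and always false.
console.log(hasScheme('https://example.org/')); // false
```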
src/util/HeaderUtil.ts (outdated)
    .sort((s1, s2): number => s1.localeCompare(s2))
    .map((item): string => item.toLowerCase());
Would swap these 2 so casing has no effect on the sort.
Edit: but I see below that this will be removed anyway so not relevant I guess 😄
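For reference, the swapped order would look like the sketch below (the function name is hypothetical). Lowercasing first means two lists that differ only in casing always normalize to the same sorted result.

```typescript
// Lowercase first, then sort, so input casing cannot affect the order.
// map() returns a new array, so the caller's array is not mutated by sort().
function normalizeSchemes(schemes: string[]): string[] {
  return schemes
    .map((item): string => item.toLowerCase())
    .sort((s1, s2): number => s1.localeCompare(s2));
}

console.log(normalizeSchemes([ 'HTTPS', 'http' ])); // [ 'http', 'https' ]
console.log(normalizeSchemes([ 'https', 'HTTP' ])); // [ 'http', 'https' ]
```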
 * @param urlPart - The URL part to sanitize.
 * @returns The sanitized output.
 */
export function sanitizeUrlPart(urlPart: string): string {
Suggested change
- export function sanitizeUrlPart(urlPart: string): string {
+ export function sanitizeName(urlPart: string): string {
I can see us also using something like this to generate file names for example.
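The sanitizing rule itself is not shown in this thread. Purely as an assumption, the sketch below replaces every non-word character with "-", and then reuses the result as a file name, the use case the comment has in mind.

```typescript
// Hypothetical rule: every character outside [A-Za-z0-9_] becomes "-".
function sanitizeUrlPart(urlPart: string): string {
  return urlPart.replace(/\W/gu, '-');
}

console.log(sanitizeUrlPart('my pod/alice')); // my-pod-alice
// Reused to derive a file name from an identifier.
const fileName = `${sanitizeUrlPart('https://alice.example/')}.json`;
console.log(fileName); // https---alice-example-.json
```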
 * @param name - The name of the file to validate.
 * @returns True if the filename is valid, false otherwise.
 */
export function isValidFileName(name: string): boolean {
Suggested change
- export function isValidFileName(name: string): boolean {
+ export function isSimpleName(name: string): boolean {
Perhaps a more debatable rename. @RubenVerborgh should come up with better alternatives if he's not happy since he always says I shouldn't name things 😉
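The validation rule is likewise not visible here. A hypothetical sketch of what "valid" or "simple" could mean: only letters, digits, "_", "." and "-" (so no path separators), and not one of the special entries "." or "..".

```typescript
// Hypothetical rule: word characters, "." and "-" only, excluding "." and "..".
function isValidFileName(name: string): boolean {
  return /^[\w.-]+$/u.test(name) && name !== '.' && name !== '..';
}

console.log(isValidFileName('profile.ttl')); // true
console.log(isValidFileName('../etc/passwd')); // false: contains "/"
console.log(isValidFileName('..')); // false
```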
@@ -58,6 +59,7 @@ export class BasicConditionsParser extends ConditionsParser {
    * Undefined if there is no value for the given header name.
    */
   private parseTagHeader(request: HttpRequest, header: 'if-match' | 'if-none-match'): string[] | undefined {
-    return request.headers[header]?.trim().split(/\s*,\s*/u);
+    const headerValue = request.headers[header];
+    return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;
Suggested change
- return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;
+ if (headerValue) {
+   return splitCommaSeparated(headerValue.trim());
+ }
It's a short function so I think we can fully write out the ternary operator.
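A self-contained sketch of the written-out version. splitCommaSeparated is assumed to wrap the regex it replaces (split on commas, dropping surrounding whitespace), and TagHeaders is a simplified, hypothetical stand-in for the request headers. An explicit return undefined is added so the function also compiles under noImplicitReturns.

```typescript
// Assumed behaviour, taken from the replaced regex: split on commas and strip surrounding whitespace.
function splitCommaSeparated(input: string): string[] {
  return input.split(/\s*,\s*/u);
}

// Simplified stand-in for HttpRequest: only the two relevant headers.
type TagHeaders = Partial<Record<'if-match' | 'if-none-match', string>>;

function parseTagHeader(headers: TagHeaders, header: 'if-match' | 'if-none-match'): string[] | undefined {
  const headerValue = headers[header];
  if (headerValue) {
    return splitCommaSeparated(headerValue.trim());
  }
  // Missing or empty header.
  return undefined;
}

console.log(parseTagHeader({ 'if-match': ' "a" , "b",W/"c" ' }, 'if-match')); // [ '"a"', '"b"', 'W/"c"' ]
```

One subtle difference from the original ?. chain: an empty header value now yields undefined, whereas ''.trim().split(...) returned [ '' ].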
@@ -139,7 +140,7 @@ export class RegistrationManager {
     // Parse WebID
     if (!validated.createWebId) {
       const trimmedWebId = this.trimString(webId);
-      assert(trimmedWebId && /^https?:\/\/[^/]+/u.test(trimmedWebId), 'Please enter a valid WebID.');
+      assert(trimmedWebId && hasScheme(trimmedWebId, 'http', 'https'), 'Please enter a valid WebID.');
Makes me realize this can also be done here:
if (!payload && this.name === 'Client' && /^https?:\/\/.+/u.test(id)) {
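If hasScheme replaces that regex too, note that the two checks are not strictly equivalent. The sketch uses a hypothetical regex-free hasScheme (scheme = text before the first ":"), not necessarily the PR's version: the regex also requires "//" plus at least one character and is case sensitive, while the helper only compares the scheme, ignoring case.

```typescript
// Hypothetical regex-free helper: compares the text before the first ":" with the allowed schemes.
function hasScheme(url: string, ...schemes: string[]): boolean {
  const colon = url.indexOf(':');
  if (colon < 1) {
    return false;
  }
  const urlScheme = url.slice(0, colon).toLowerCase();
  return schemes.some((scheme): boolean => scheme.toLowerCase() === urlScheme);
}

const clientIdPattern = /^https?:\/\/.+/u;

// "http:" has the right scheme but nothing after it.
const regexBare = clientIdPattern.test('http:');
const helperBare = hasScheme('http:', 'http', 'https');
// Uppercase scheme: the regex is case sensitive, the helper is not.
const regexUpper = clientIdPattern.test('HTTPS://app.example/id');
const helperUpper = hasScheme('HTTPS://app.example/id', 'http', 'https');

console.log(regexBare, helperBare, regexUpper, helperUpper); // false true false true
```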
Thanks for the feedback!
All feedback has been integrated, except for renaming
Regarding naming, we shouldn't forget that
Sure, let's go with the original names then. We can always look into new names should it ever be relevant.
Implemented new StringUtil helper functions: splitCommaSeparated, sanitizeUrlPart, isValidFileName. Added helper functions to HeaderUtil: matchesAuthorizationScheme, hasScheme. Added unit tests for the new helper functions. Refactored codebase to use helper functions instead of regexes if applicable.
I reverted the name change!
Looks good, thanks.
📁 Related issues
#807
✍️ Description
The goal of these changes is to reduce the number of (repeated) regex occurrences in the codebase, by wrapping these into helper functions to promote reuse and increase readability.
A new set of helper functions is introduced in StringUtil.ts: splitCommaSeparated, sanitizeUrlPart and isValidFileName. HeaderUtil.ts has been extended with the helper functions matchesAuthorizationScheme and hasScheme. As both of these functions make use of dynamic regexes, a simple cache mechanism using two Map instances was implemented.
✅ PR check list
Before this pull request can be merged, a core maintainer will check whether