Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new helper functions to replace regexes #807 #1260

Merged
merged 1 commit into from
Apr 13, 2022

Conversation

wkerckho
Copy link
Contributor

📁 Related issues

#807

✍️ Description

The goal of these changes is to reduce the number of (repeated) regex occurrences in the codebase, by wrapping these into helper functions to promote reuse and increase readability.

A new set of helper function is introduced in StringUtil.ts: splitCommaSeparated, sanitizeUrlPart and isValidFileName.
HeaderUtil.ts has been extended with helper functions matchesAuthorizationScheme and hasScheme. As both these functions make use of dynamic regexes, a simple cache mechanism using two Map instances was implemented.

✅ PR check list

Before this pull request can be merged, a core maintainer will check whether

  • this PR is labeled with the correct semver label
    • semver.patch: Backwards compatible bug fixes.
    • semver.minor: Backwards compatible feature additions.
    • semver.major: Breaking changes. This includes changing interfaces or configuration behaviour.
  • the correct branch is targeted. Patch updates can target main, other changes should target the latest versions/* branch.
  • the RELEASE_NOTES.md document in case of relevant feature or config changes.
  • any relevant documentation was updated to reflect the changes in this PR.

@RubenVerborgh RubenVerborgh requested review from joachimvh and removed request for RubenVerborgh April 11, 2022 14:50
@RubenVerborgh RubenVerborgh added the semver.minor Requires a minor version bump label Apr 11, 2022
const sortedLowerCaseSchemes = [ scheme, ...schemes ]
.sort((s1, s2): number => s1.localeCompare(s2))
.map((item): string => item.toLowerCase());
if (!urlSchemeRegexCache.has(sortedLowerCaseSchemes)) {
Copy link
Member

@RubenVerborgh RubenVerborgh Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will always succeed given that we're creating a new array every time. Map checks references, not contents. As a result, the map will blow up in size.

I think the simplest way here might be to just chop off the scheme bit from the URL and check if it occurs in the array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I'm used to Kotlin having meaningful equals/hashCode implementations for collections and arrays. I will pay better attention in the future! Your alternative implementation suggestion removes the need for a regex cache, so it reduces complexity (always a plus).

export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
const lowerCaseScheme = scheme.toLowerCase();
if (!authSchemeRegexCache.has(lowerCaseScheme)) {
authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs escaping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in escaping any special regex characters that are being sneaked in via the scheme argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed; we have escapeStringRegexp for that already somewhere.

Copy link
Member

@joachimvh joachimvh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments mostly.

* @param scheme - Name of the authorization scheme (case insensitive).
* @returns True if the Authorization header uses the specified scheme, false otherwise.
*/
export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
export function matchesAuthorizationScheme(authorization?: string, scheme: string): boolean {

And then the linter is going to complain that optional arguments should come last so you'll have to switch those two.

authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
}
// Support authorization being undefined (for the sake of usability).
return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);
return typeof authorization !== 'undefined' && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization);

This is slightly safer since in theory it is possible that someone modifies undefined (although probably not really relevant). Or you can do Boolean(authorization && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization)) which is also valid.

Only 1 ! is needed to say that we guarantee the result is not undefined (also see something similar below).

Comment on lines 580 to 582
export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {
// Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
const sortedLowerCaseSchemes = [ scheme, ...schemes ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {
// Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
const sortedLowerCaseSchemes = [ scheme, ...schemes ]
export function hasScheme(url: string, ...schemes: string[]): boolean {
// Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
const sortedLowerCaseSchemes = schemes

This does make it so you can call the function with no schemes but I feel it makes it clearer. Otherwise it seems there's a difference between the first and the rest.

Comment on lines 583 to 584
.sort((s1, s2): number => s1.localeCompare(s2))
.map((item): string => item.toLowerCase());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would swap these 2 so casing has no effect on the sort.

Edit: but I see below that this will be removed anyway so not relevant I guess 😄

* @param urlPart - The URL part to sanitize.
* @returns The sanitized output.
*/
export function sanitizeUrlPart(urlPart: string): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function sanitizeUrlPart(urlPart: string): string {
export function sanitizeName(urlPart: string): string {

I can see us also using something like this to generate file names for example.

* @param name - The name of the file to validate.
* @returns True if the filename is valid, false otherwise.
*/
export function isValidFileName(name: string): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export function isValidFileName(name: string): boolean {
export function isSimpleName(name: string): boolean {

Perhaps a more debatable rename. @RubenVerborgh should come up with better alternatives if he's not happy since he always says I shouldn't name things 😉

@@ -58,6 +59,7 @@ export class BasicConditionsParser extends ConditionsParser {
* Undefined if there is no value for the given header name.
*/
private parseTagHeader(request: HttpRequest, header: 'if-match' | 'if-none-match'): string[] | undefined {
return request.headers[header]?.trim().split(/\s*,\s*/u);
const headerValue = request.headers[header];
return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;
if (headerValue) {
return splitCommaSeparated(headerValue.trim());
}

It's a short function so I think we can fully write out the ternary operator.

@@ -139,7 +140,7 @@ export class RegistrationManager {
// Parse WebID
if (!validated.createWebId) {
const trimmedWebId = this.trimString(webId);
assert(trimmedWebId && /^https?:\/\/[^/]+/u.test(trimmedWebId), 'Please enter a valid WebID.');
assert(trimmedWebId && hasScheme(trimmedWebId, 'http', 'https'), 'Please enter a valid WebID.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me realize this can also be done here:

if (!payload && this.name === 'Client' && /^https?:\/\/.+/u.test(id)) {

@wkerckho
Copy link
Contributor Author

Thanks for the feedback!
I've already updated my changes to reflect Ruben's comments and I will be integrating Joachim's suggestions asap!

@wkerckho
Copy link
Contributor Author

All feedback has been integrated, except for renaming isValidFileName (waiting on @RubenVerborgh's input on an appropriate name).

@RubenVerborgh
Copy link
Member

Regarding naming, we shouldn't forget that sanitizeName and isSimpleName are both ending up in the global list of exports. I.e., import { isSimpleName } from '@solid/community-server'. So I don't think they are sufficiently descriptive and I'd be inclined to keep the names as they are.

@joachimvh
Copy link
Member

So I don't think they are sufficiently descriptive and I'd be inclined to keep the names as they are.

Sure, let's go with the original names then. We can always look into new names should it ever be relevant.

Implemented new StringUtil helper functions: splitCommaSeparated, sanitizeUrlPart, isValidFileName.
Added helper functions to HeaderUtil: matchesAuthorizationScheme, hasScheme.
Added unit tests for the new helper functions.
Refactored codebase to use helper functions instead of regexes if applicable.
@wkerckho
Copy link
Contributor Author

I reverted the name change!

Copy link
Member

@joachimvh joachimvh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

@joachimvh joachimvh merged commit 283c301 into versions/4.0.0 Apr 13, 2022
@joachimvh joachimvh deleted the fix/issue-807 branch April 13, 2022 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver.minor Requires a minor version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants