feat: new helper functions to replace regexes #807 #1260

wkerckho · 2022-04-11T14:49:44Z

📁 Related issues

✍️ Description

The goal of these changes is to reduce the number of (repeated) regex occurrences in the codebase, by wrapping these into helper functions to promote reuse and increase readability.

A new set of helper function is introduced in StringUtil.ts: splitCommaSeparated, sanitizeUrlPart and isValidFileName.
HeaderUtil.ts has been extended with helper functions matchesAuthorizationScheme and hasScheme. As both these functions make use of dynamic regexes, a simple cache mechanism using two Map instances was implemented.

✅ PR check list

Before this pull request can be merged, a core maintainer will check whether

this PR is labeled with the correct semver label
- semver.patch: Backwards compatible bug fixes.
- semver.minor: Backwards compatible feature additions.
- semver.major: Breaking changes. This includes changing interfaces or configuration behaviour.
the correct branch is targeted. Patch updates can target main, other changes should target the latest versions/* branch.
the RELEASE_NOTES.md document in case of relevant feature or config changes.
any relevant documentation was updated to reflect the changes in this PR.

RubenVerborgh · 2022-04-11T14:53:55Z

src/util/HeaderUtil.ts

+  const sortedLowerCaseSchemes = [ scheme, ...schemes ]
+    .sort((s1, s2): number => s1.localeCompare(s2))
+    .map((item): string => item.toLowerCase());
+  if (!urlSchemeRegexCache.has(sortedLowerCaseSchemes)) {


This condition will always succeed given that we're creating a new array every time. Map checks references, not contents. As a result, the map will blow up in size.

I think the simplest way here might be to just chop off the scheme bit from the URL and check if it occurs in the array.

Oops, I'm used to Kotlin having meaningful equals/hashCode implementations for collections and arrays. I will pay better attention in the future! Your alternative implementation suggestion removes the need for a regex cache, so it reduces complexity (always a plus).

RubenVerborgh · 2022-04-11T14:55:09Z

src/util/HeaderUtil.ts

+export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {
+  const lowerCaseScheme = scheme.toLowerCase();
+  if (!authSchemeRegexCache.has(lowerCaseScheme)) {
+    authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));


Needs escaping.

As in escaping any special regex characters that are being sneaked in via the scheme argument?

Indeed; we have escapeStringRegexp for that already somewhere.

joachimvh

Some minor comments mostly.

joachimvh · 2022-04-12T07:28:34Z

src/util/HeaderUtil.ts

+ * @param scheme - Name of the authorization scheme (case insensitive).
+ * @returns True if the Authorization header uses the specified scheme, false otherwise.
+ */
+export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {


Suggested change

export function matchesAuthorizationScheme(authorization: string | undefined, scheme: string): boolean {

export function matchesAuthorizationScheme(authorization?: string, scheme: string): boolean {

And then the linter is going to complain that optional arguments should come last so you'll have to switch those two.

joachimvh · 2022-04-12T07:35:16Z

src/util/HeaderUtil.ts

+    authSchemeRegexCache.set(lowerCaseScheme, new RegExp(`^${lowerCaseScheme} `, 'ui'));
+  }
+  // Support authorization being undefined (for the sake of usability).
+  return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);


Suggested change

return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);

return typeof authorization !== 'undefined' && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization);

This is slightly safer since in theory it is possible that someone modifies undefined (although probably not really relevant). Or you can do Boolean(authorization && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization)) which is also valid.

Only 1 ! is needed to say that we guarantee the result is not undefined (also see something similar below).

joachimvh · 2022-04-12T07:40:07Z

src/util/HeaderUtil.ts

+export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {
+  // Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.
+  const sortedLowerCaseSchemes = [ scheme, ...schemes ]


Suggested change

export function hasScheme(url: string, scheme: string, ...schemes: string[]): boolean {

// Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.

const sortedLowerCaseSchemes = [ scheme, ...schemes ]

export function hasScheme(url: string, ...schemes: string[]): boolean {

// Generate the cache key for the scheme options: sort the items to avoid multiple entries for the same 'check'.

const sortedLowerCaseSchemes = schemes

This does make it so you can call the function with no schemes but I feel it makes it clearer. Otherwise it seems there's a difference between the first and the rest.

joachimvh · 2022-04-12T07:41:53Z

src/util/HeaderUtil.ts

+    .sort((s1, s2): number => s1.localeCompare(s2))
+    .map((item): string => item.toLowerCase());


Would swap these 2 so casing has no effect on the sort.

Edit: but I see below that this will be removed anyway so not relevant I guess 😄

joachimvh · 2022-04-12T07:47:37Z

src/util/StringUtil.ts

+ * @param urlPart - The URL part to sanitize.
+ * @returns The sanitized output.
+ */
+export function sanitizeUrlPart(urlPart: string): string {


Suggested change

export function sanitizeUrlPart(urlPart: string): string {

export function sanitizeName(urlPart: string): string {

I can see us also using something like this to generate file names for example.

joachimvh · 2022-04-12T07:48:50Z

src/util/StringUtil.ts

+ * @param name - The name of the file to validate.
+ * @returns True if the filename is valid, false otherwise.
+ */
+export function isValidFileName(name: string): boolean {


Suggested change

export function isValidFileName(name: string): boolean {

export function isSimpleName(name: string): boolean {

Perhaps a more debatable rename. @RubenVerborgh should come up with better alternatives if he's not happy since he always says I shouldn't name things 😉

joachimvh · 2022-04-12T07:51:36Z

src/http/input/conditions/BasicConditionsParser.ts

@@ -58,6 +59,7 @@ export class BasicConditionsParser extends ConditionsParser {
   * Undefined if there is no value for the given header name.
   */
  private parseTagHeader(request: HttpRequest, header: 'if-match' | 'if-none-match'): string[] | undefined {
-    return request.headers[header]?.trim().split(/\s*,\s*/u);
+    const headerValue = request.headers[header];
+    return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;


Suggested change

return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;

if (headerValue) {

return splitCommaSeparated(headerValue.trim());

}

It's a short function so I think we can fully write out the ternary operator.

joachimvh · 2022-04-12T07:53:34Z

src/identity/interaction/email-password/util/RegistrationManager.ts

@@ -139,7 +140,7 @@ export class RegistrationManager {
    // Parse WebID
    if (!validated.createWebId) {
      const trimmedWebId = this.trimString(webId);
-      assert(trimmedWebId && /^https?:\/\/[^/]+/u.test(trimmedWebId), 'Please enter a valid WebID.');
+      assert(trimmedWebId && hasScheme(trimmedWebId, 'http', 'https'), 'Please enter a valid WebID.');


Makes me realize this can also be done here:

CommunitySolidServer/src/identity/storage/WebIdAdapterFactory.ts

Line 45 in dd56886

if (!payload && this.name === 'Client' && /^https?:\/\/.+/u.test(id)) {

wkerckho · 2022-04-12T09:02:27Z

Thanks for the feedback!
I've already updated my changes to reflect Ruben's comments and I will be integrating Joachim's suggestions asap!

wkerckho · 2022-04-12T09:35:08Z

All feedback has been integrated, except for renaming isValidFileName (waiting on @RubenVerborgh's input on an appropriate name).

RubenVerborgh · 2022-04-12T10:43:45Z

Regarding naming, we shouldn't forget that sanitizeName and isSimpleName are both ending up in the global list of exports. I.e., import { isSimpleName } from '@solid/community-server'. So I don't think they are sufficiently descriptive and I'd be inclined to keep the names as they are.

joachimvh · 2022-04-12T13:01:14Z

So I don't think they are sufficiently descriptive and I'd be inclined to keep the names as they are.

Sure, let's go with the original names then. We can always look into new names should it ever be relevant.

Implemented new StringUtil helper functions: splitCommaSeparated, sanitizeUrlPart, isValidFileName. Added helper functions to HeaderUtil: matchesAuthorizationScheme, hasScheme. Added unit tests for the new helper functions. Refactored codebase to use helper functions instead of regexes if applicable.

wkerckho · 2022-04-13T08:20:23Z

I reverted the name change!

joachimvh

Looks good, thanks.

wkerckho requested a review from RubenVerborgh April 11, 2022 14:49

RubenVerborgh requested review from joachimvh and removed request for RubenVerborgh April 11, 2022 14:50

RubenVerborgh added the semver.minor Requires a minor version bump label Apr 11, 2022

RubenVerborgh reviewed Apr 11, 2022

View reviewed changes

joachimvh requested changes Apr 12, 2022

View reviewed changes

wkerckho force-pushed the fix/issue-807 branch from d0818f3 to 57ed01d Compare April 12, 2022 08:57

wkerckho force-pushed the fix/issue-807 branch from 57ed01d to 26dc698 Compare April 12, 2022 09:29

wkerckho force-pushed the fix/issue-807 branch from 26dc698 to f3e1016 Compare April 13, 2022 08:18

joachimvh approved these changes Apr 13, 2022

View reviewed changes

joachimvh merged commit 283c301 into versions/4.0.0 Apr 13, 2022

joachimvh deleted the fix/issue-807 branch April 13, 2022 09:26

joachimvh mentioned this pull request Apr 13, 2022

Collect (repeated) regular expressions into a utility #807

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: new helper functions to replace regexes #807 #1260

feat: new helper functions to replace regexes #807 #1260

wkerckho commented Apr 11, 2022

RubenVerborgh Apr 11, 2022 •

edited

Loading

wkerckho Apr 11, 2022

RubenVerborgh Apr 11, 2022

wkerckho Apr 11, 2022

RubenVerborgh Apr 11, 2022

joachimvh left a comment

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

joachimvh Apr 12, 2022

wkerckho commented Apr 12, 2022

wkerckho commented Apr 12, 2022

RubenVerborgh commented Apr 12, 2022

joachimvh commented Apr 12, 2022

wkerckho commented Apr 13, 2022

joachimvh left a comment

	export function matchesAuthorizationScheme(authorization: string \| undefined, scheme: string): boolean {
	export function matchesAuthorizationScheme(authorization?: string, scheme: string): boolean {

	return authorization !== undefined && authSchemeRegexCache.get(lowerCaseScheme)!!.test(authorization);
	return typeof authorization !== 'undefined' && authSchemeRegexCache.get(lowerCaseScheme)!.test(authorization);

		.sort((s1, s2): number => s1.localeCompare(s2))
		.map((item): string => item.toLowerCase());

	export function sanitizeUrlPart(urlPart: string): string {
	export function sanitizeName(urlPart: string): string {

	export function isValidFileName(name: string): boolean {
	export function isSimpleName(name: string): boolean {

-    return headerValue ? splitCommaSeparated(headerValue.trim()) : undefined;
+    if (headerValue) {
+      return splitCommaSeparated(headerValue.trim());
+    }

feat: new helper functions to replace regexes #807 #1260

feat: new helper functions to replace regexes #807 #1260

Conversation

wkerckho commented Apr 11, 2022

📁 Related issues

✍️ Description

✅ PR check list

RubenVerborgh Apr 11, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joachimvh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wkerckho commented Apr 12, 2022

wkerckho commented Apr 12, 2022

RubenVerborgh commented Apr 12, 2022

joachimvh commented Apr 12, 2022

wkerckho commented Apr 13, 2022

joachimvh left a comment

Choose a reason for hiding this comment

RubenVerborgh Apr 11, 2022 •

edited

Loading