
Allow for wildcard in heatmap & user paths URLs #4364

Closed
joesaunderson opened this issue May 15, 2021 · 23 comments
Labels
enhancement New feature or request feature/paths Feature Tag: Paths feature/toolbar Feature Tag: Toolbar

Comments

@joesaunderson
Contributor

Is your feature request related to a problem?

We have lots of URLs that are identical, other than an ID, for example:

/merchant/1234/dashboard
/merchant/4321/dashboard

We would like to view heatmaps for our portal, but currently the heatmap tool has little value without being able to 'group' identical pages like the above.

Describe the solution you'd like

The ability to somehow use wildcard expressions in the heatmap tool to allow grouping of pages.

Ideally, we’d like to see a heatmap of the ‘dashboard’ overall, with a numerical wildcard in the middle where the ID sits.

Describe alternatives you've considered

None

Additional context

https://posthogusers.slack.com/archives/C01FKPGG5U6/p1621012156136400

Thank you for your feature request – we love each and every one!

@joesaunderson joesaunderson added the enhancement New feature or request label May 15, 2021
@joesaunderson
Contributor Author

Thinking about this, it would be good to have this option available beyond heatmaps.

User paths would also be a good candidate for this, and I am sure there are more!

@mariusandra
Collaborator

The wildcard should also be able to take the query string into account.

@paolodamico paolodamico added core-experience feature/paths Feature Tag: Paths feature/toolbar Feature Tag: Toolbar labels Jul 19, 2021
@paolodamico paolodamico changed the title Allow for wildcard in heatmap URLs Allow for wildcard in heatmap & user paths URLs Jul 19, 2021
@paolodamico
Contributor

Another focus customer mentioned being able to do this in the user paths view. This particular customer has a bunch of different domains running the same app, so being able to see path drop-offs centralized would be quite useful.

@marcushyett-ph
Contributor

@EDsCODE @neilkakkar We spoke about something similar regarding urls and auto-capture events being hard to distinguish - would what we're considering building also solve for this?

@neilkakkar
Collaborator

It's definitely a similar concept! We're trying to group path items using some criteria (with autocapture, it's the tag + text inside the tag), and it makes sense to allow such a grouping for URLs as well.

To confirm I understand correctly, what this means is:

On all paths, /merchant/****/dashboard/ would appear everywhere instead of /merchant/1234/dashboard, /merchant/1235/dashboard, etc., and counts would be aggregated as well if they happen at the same step.
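A minimal sketch of the aggregation described above, assuming URLs are mapped to a group label by hypothetical (regex, label) rules. `GROUPS`, `group_url` and `step_counts` are illustrative names, not PostHog's implementation. Counts for different URLs that land in the same group at the same step are merged.

```python
import re
from collections import Counter

# Hypothetical grouping rules: (compiled regex, label shown in the paths view).
GROUPS = [
    (re.compile(r"/merchant/[^/]+/dashboard/?"), "/merchant/****/dashboard/"),
]


def group_url(url: str) -> str:
    """Replace a URL with its group label if a rule matches the whole URL."""
    for pattern, label in GROUPS:
        if pattern.fullmatch(url):
            return label
    return url


def step_counts(paths):
    """Count (step index, grouped URL) pairs across all user paths."""
    counts = Counter()
    for path in paths:
        for step, url in enumerate(path):
            counts[(step, group_url(url))] += 1
    return counts


paths = [
    ["/login", "/merchant/1234/dashboard"],
    ["/login", "/merchant/1235/dashboard"],
]
print(step_counts(paths))
```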

@EDsCODE
Member

EDsCODE commented Aug 27, 2021

I think this would be powerful. I'm trying to figure out how you would define these wildcards. It would make sense when providing a start or end point, but not so much for the paths in between, unless we provide fields that just mean "group together any occurrences matching these wildcards".

@joesaunderson
Contributor Author

Our particular use case comes from being a multi-tenanted platform: all our URLs contain the ID of what we call the merchant.

We'd love to be able to see the flow from page to page (grouped), rather than what it currently does (show the paths and heatmaps on a merchant-by-merchant basis).

I'd love to see some sort of grouping mechanism that could be used across the site, maybe using pageview events? We currently use your matching options to group the URLs here. So clicking 'view heatmaps' for pages that match this event could be another logical step... not sure about paths though 🤔

@neilkakkar
Collaborator

neilkakkar commented Sep 20, 2021

Thinking about how to represent this grouping. Given that

(1) This feature is useful only when it allows users to group by what's meaningful to them
(2) Users can have all sorts of "weird" URLs & use cases to group on

and my untested hypothesis that: Most users would just need a simple wildcard matching: group by this third URL parameter (like: /insights/merchant/****/payment ) or things like: URL starts with (/insights/**** ).

Then, to implement this feature, we should allow all sorts of regex matching capabilities.

And, to keep things simple for most users who might not be familiar with regexes, tell them about .*, which covers all cases of the untested hypothesis. (You group /insights/merchant/1234/payment by creating the grouping /insights/merchant/.*/payment, and /insights/1234 by creating the grouping /insights/.* )
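A small sketch of the regex groupings from the previous paragraph, assuming each grouping must match the whole URL (`re.fullmatch`) and that the first matching grouping wins. `groupings` and `first_grouping` are hypothetical names used only for illustration. Order matters here: /insights/merchant/1234/payment also matches /insights/.*, so the more specific grouping is listed first.

```python
import re
from typing import Optional

# Grouping patterns written as plain regexes, in priority order.
groupings = {
    "/insights/merchant/.*/payment": re.compile(r"/insights/merchant/.*/payment"),
    "/insights/.*": re.compile(r"/insights/.*"),
}


def first_grouping(url: str) -> Optional[str]:
    """Return the first grouping (in definition order) that matches the whole URL."""
    for name, pattern in groupings.items():
        if pattern.fullmatch(url):
            return name
    return None


print(first_grouping("/insights/merchant/1234/payment"))  # /insights/merchant/.*/payment
print(first_grouping("/insights/1234"))                   # /insights/.*
```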

Thoughts?

I prefer this over defining our own custom operators for grouping because:

(1) Users don't have to whack their brains learning yet another way to operate on strings.
(2) It's simple enough for the basic cases.
(3) Users who want to do very complex grouping are supported without extra effort from our side.
(4) Implementation-wise, ClickHouse supports regex matching out of the box.

Edit: Further, since this grouping is not just for URLs but for custom events as well, regexes are good for capturing how users name or autogenerate their events. For example, you might want to group insight [viewed|loaded|mounted] but not insight analyzed; insight **** isn't good enough for this.
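One detail worth noting when the event-name example is written as an actual regex: a choice between whole words needs parentheses and `|`, while square brackets form a character class that matches a single character. The event names below are the ones from the comment; the patterns are a sketch, not an existing PostHog rule.

```python
import re

# Alternation: matches exactly these three event names.
pattern = re.compile(r"insight (viewed|loaded|mounted)")

# Square brackets form a character class: exactly ONE character from the set
# {v, i, e, w, d, |, l, o, a, m, u, n, t}, not a choice between words.
char_class = re.compile(r"insight [viewed|loaded|mounted]")

events = ["insight viewed", "insight loaded", "insight mounted", "insight analyzed"]
grouped = [e for e in events if pattern.fullmatch(e)]

print(grouped)  # ['insight viewed', 'insight loaded', 'insight mounted']
print(bool(char_class.fullmatch("insight viewed")))  # False
```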

@clarkus
Contributor

clarkus commented Sep 20, 2021

I think regex is a great starting point for this. Longer term we might want to build in an interface that makes this more approachable to a broader group of users, but this seems like a great starting point for testing improvements. We might consider some of the work from #4680 when exposing this.

and my untested hypothesis that: Most users would just need a simple wildcard matching: group by this third URL parameter (like: /insights/merchant/****/payment ) or things like: URL starts with (/insights/**** ).

@neilkakkar where should wildcarding happen? Based on the context in this issue, it seems like something applicable across multiple insight types. We're working on paths now and we could expose in the query options there, but is that too targeted? Are wildcards going to change often? It seems like something you'd want to define once and use consistently everywhere, but I'm sure there are cases for being targeted. Thoughts?

@joesaunderson
Contributor Author

Wildcarding would definitely solve our issue with using paths / heat-maps.

Out of interest, is it possible to group using the wildcard, then drill down into that group to see the different values?

I.e

foo/1/bar
foo/1/bar
foo/2/bar

Would show 3 users/page views, but you could drill into detail and see 2 x ID1, and 1 x ID2... that could be useful too.
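A minimal sketch of the drill-down idea, assuming the wildcard segment is kept as a regex capture group so the original values remain available after grouping. The pattern and data mirror the foo/1/bar example; nothing here is an existing PostHog feature.

```python
import re
from collections import Counter

# Hypothetical grouping for "foo/*/bar"; the middle segment is captured.
GROUP = re.compile(r"foo/([^/]+)/bar")

page_views = ["foo/1/bar", "foo/1/bar", "foo/2/bar"]

matches = [m for m in (GROUP.fullmatch(url) for url in page_views) if m]
total = len(matches)                           # grouped view
by_id = Counter(m.group(1) for m in matches)   # drill-down per captured ID

print(total, dict(by_id))  # 3 {'1': 2, '2': 1}
```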

@neilkakkar
Collaborator

@joesaunderson - isn't this similar to not grouping? I.e. if you get rid of this one specific wildcard, you'll see the values for foo/1/bar and foo/2/bar ? Or put differently: What use case does drilling down inside a grouping solve for you?


@neilkakkar where should wildcarding happen? Based on the context in this issue, it seems like something applicable across multiple insight types. We're working on paths now and we could expose in the query options there, but is that too targeted?

Good question! I think this is & should be specific to Paths. Our filter properties already have a regex include/exclude match, but that's global and doesn't quite solve the use case here.

Funnels, Trends, Heatmap, Paths all have different semantics for non-global grouping (which this issue aims for): For example, in funnels, consider $Pageview with foo/.*/bar to be one step, and foo/.*/rab to be second step. This might be slightly useful, especially when you can connect it with Paths, but each wildcarding is specific to the insight.

So I'd say we do this for Paths only for now, and then see if we get requests for implementing this wildcarding elsewhere too?

@mariusandra
Collaborator

Given:

  • The problem with using regex and .* is that it's very catch-all. For example, /insights/merchant/.*/payment will also match /insights/merchant/123/somethingelse/payment, capturing 123/somethingelse. Probably not what's expected.
  • We'd have to tell users to write /insights/merchant/([^/]+)/payment instead and teach them about match groups.
  • I've also seen regular expressions referred to in multiple security vulnerabilities, probably around some unsafe escapes. There may be some unknown unknowns here if we'd just blatantly allow all regexps.
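The over-matching problem in the first bullet, and the `[^/]+` fix from the second, can be checked directly with Python's `re` module. The URLs are the ones from the bullets above.

```python
import re

greedy = re.compile(r"/insights/merchant/(.*)/payment")      # .* can cross "/"
segment = re.compile(r"/insights/merchant/([^/]+)/payment")  # one path segment only

url = "/insights/merchant/123/somethingelse/payment"

m = greedy.fullmatch(url)
print(m.group(1) if m else None)   # 123/somethingelse
m = segment.fullmatch(url)
print(m.group(1) if m else None)   # None: [^/]+ cannot span a slash
```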

I propose:

  • Let's just use * as a wildcard for /[^=&?/]+/ (i.e. anything within a single URL segment), and ** for /.*/
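A minimal sketch of how this proposal could be implemented, assuming `*` maps to `[^=&?/]+`, `**` maps to `.*`, and every other character is escaped so users cannot inject arbitrary regex syntax. `wildcard_to_regex` and `matches` are hypothetical helpers, not PostHog functions. Because `*` stops at `?`, `=` and `&`, it can also stand in for a single query-string value, which touches on the earlier point about taking the query string into account.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard pattern into a compiled regex.

    '**' matches anything (including '/'); '*' matches one run of characters
    other than '=', '&', '?' and '/'. All other characters are escaped.
    """
    parts = []
    for chunk in pattern.split("**"):
        parts.append("[^=&?/]+".join(re.escape(piece) for piece in chunk.split("*")))
    return re.compile(".*".join(parts))


def matches(pattern: str, url: str) -> bool:
    """True if the whole URL matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(url) is not None


print(matches("/merchant/*/dashboard", "/merchant/1234/dashboard"))     # True
print(matches("/merchant/*/dashboard", "/merchant/1234/x/dashboard"))   # False
print(matches("/merchant/**/dashboard", "/merchant/1234/x/dashboard"))  # True
print(matches("/page?id=*", "/page?id=42"))                             # True
```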

@joesaunderson
Contributor Author

@joesaunderson - isn't this similar to not grouping? I.e. if you get rid of this one specific wildcard, you'll see the values for foo/1/bar and foo/2/bar ? Or put differently: What use case does drilling down inside a grouping solve for you?

We are a multi tenanted platform, and we would be using the wildcard in the path to 'group' by the ID of our 'tenant'.

Without the grouping, paths is not very useful, we can't see the general pages people go to, so the wildcard solves that.

The use case for drilling down is that, at a group level, it shows us the spread of 'tenants' using the page, i.e. a page viewed by 100 users at 100 tenants is more valuable than one viewed by 100 users at 1 tenant... if that makes sense?
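A minimal sketch of the "tenant spread" statistic, assuming page-view records of the form (user_id, url) and that the tenant ID is the wildcarded URL segment. The records and pattern are hypothetical examples, not PostHog data structures.

```python
import re

# Hypothetical grouping; the tenant (merchant) ID is the captured segment.
DASHBOARD = re.compile(r"/merchant/([^/]+)/dashboard")

page_views = [
    ("u1", "/merchant/1234/dashboard"),
    ("u2", "/merchant/1234/dashboard"),
    ("u3", "/merchant/4321/dashboard"),
]

users = set()
tenants = set()
for user_id, url in page_views:
    m = DASHBOARD.fullmatch(url)
    if m:
        users.add(user_id)
        tenants.add(m.group(1))

print(f"{len(users)} users across {len(tenants)} tenants")  # 3 users across 2 tenants
```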

@neilkakkar
Collaborator

There's a trade-off here: would we rather teach everyone about * and **, or teach some users about .* or [^=&?/]+ to get what they want?

I think in both cases, we'd have a tooltip explaining how to use grouping. Either telling what * and ** means, or giving example regexes.

(3) is a valid point, and the strongest point in favour of * and **. I checked some basic ReDoS attacks and ClickHouse seems to be resilient to those.

In light of unknown unknowns though, happy to just use * and **. The remaining problematic bit with (3) is that: our filters already allow any random regexes, so not allowing those here doesn't really protect us from unknown unknowns with regex vulnerabilities.

@neilkakkar
Collaborator

Ah I see Joe! We're going to allow drilling down into Persons, so you could see the people related to the 'group'.

For drilling down on 'group statistics', that's a good point, there's no nice way to see this. If you ungroup that specific path item, you could see all the different tenant counts, but I agree, it's not the greatest viewing experience. I'll think over this.

@paolodamico
Contributor

From a UX standpoint, I love the approach of * & **, I have a feeling it will solve the vast majority of the use cases and it's easy enough to understand. Given this is a feature that can be very well used by non-technical people (as it's quite exploratory), it's better to keep it friendlier. Once we start gathering user feedback from this we can consider implementing a more robust wildcarding solution.

@neilkakkar
Collaborator

Makes sense. Taking this idea a bit further, if we want to keep things simple, do we need two operators at all?

The case you mentioned about /merchant/1234/payment and /merchant/1234/wowzer/woohoo/payment, both matching /merchant/**/payment seems niche enough that I don't think we need to cater to it?

This also helps reduce the copy to something like: "Just use a * for wildcards anywhere". It will hopefully do what you want it to do, and if not, tell us about your use cases.

@paolodamico
Contributor

If we don't have strong concerns about regex vulnerabilities, we could simply "advertise" a single simple wildcard operator (*) and then mention somewhere that we also support full regular expressions. Otherwise, I'm definitely on board with @neilkakkar's suggestion.

@neilkakkar
Collaborator

About regex vulnerabilities: The exact same vulnerabilities and unknown unknowns exist already with regex matches on filter properties.

The question I'm having a bit of trouble answering: Do I want to introduce one more place where these kinds of vulnerabilities can occur? On one side, it's hard to exploit & worst case it triggers a container restart on cloud. On the other side, path wildcards seem a little more well exposed. I'm leaning a bit towards not exposing regexes in a new place.

Since it's easy to revert this decision later, I'd just go for * wildcard for now?

@clarkus
Contributor

clarkus commented Sep 22, 2021

Since it's easy to revert this decision later, I'd just go for * wildcard for now?

Agreed. Let's keep it simple, not reference regex, and try to make it approachable for a broad audience of users.

@nauxliu

nauxliu commented Sep 27, 2021

How would a simple * cover this case:

/user/123 (a user's detail page)
/user/login (the login page)

/user/* will match both paths in one group, but with regex, we can use /user/\d+ to match only the user detail page.
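The difference can be checked with a short sketch, assuming `*` means `[^=&?/]+` as proposed earlier in the thread; `\d+` is the regex suggested in this comment.

```python
import re

star = re.compile(r"/user/[^=&?/]+")  # '*' as proposed earlier in the thread
digits = re.compile(r"/user/\d+")     # regex suggested in this comment

for url in ["/user/123", "/user/login"]:
    print(url, bool(star.fullmatch(url)), bool(digits.fullmatch(url)))
# /user/123 True True
# /user/login True False
```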

@paolodamico
Contributor

Hey @nauxliu! The way we're thinking this for now is to support simple wildcarding so indeed it would match both, but this is helpful context as we keep improving on this.

@neilkakkar
Collaborator

Think with the advanced cleaning rules, we're sorted here! (and there's a new issue for path cleaning everywhere) - #7288

8 participants