Don't see infrequent edges. Is it possible? #30

ffalcolini · 2020-09-01T15:04:11Z

Hi!

I'd like to simplify the process map.

I can reduce the number of activities displayed: to do this I use the edeaR::filter_activity_frequency function.

I'd also like to reduce the number of edges displayed. I tried using the "layout" argument of the process_map function, specifically the "edge_cutoff" parameter, but unfortunately I don't get any results, as if the parameter has no effect.

Is it possible to use the "edge_cutoff" parameter to display only the edges that have a frequency higher than a certain limit?
Can you give me an example of use?

Thanks so much!
You did a great job!

ffalcolini · 2020-09-19T16:51:12Z

Can anyone answer my question?
In my opinion, hiding the most infrequent edges is very important. I would like to understand if it is possible to do this with bupaR.

Thanks so much for your attention

gertjanssenswillen · 2020-09-21T07:06:05Z

Thanks for your patience.

The edge_cutoff parameter only impacts the layout of the map, not the edges it shows. And it is still a somewhat experimental feature.
There is no way to filter out edges in a process map in isolation. The philosophy is that you either filter whole cases which are infrequent (filter_trace_frequency), or you filter events (e.g. filter_activity_frequency, filter_activity, etc.). In principle we do not want to remove individual edges, because this makes the process map inaccurate and can be misleading.

What we could do is implement a filter that removes cases in a way that prioritizes making the process map less complex. I'll have a look into that.

ffalcolini · 2020-09-21T10:29:21Z

Thank you so much! You are really kind.
Congratulations on your work: it is wonderful to perform process mining analysis with R!

gertjanssenswillen · 2020-09-21T15:27:54Z

I have added a new function to edeaR called filter_infrequent_flows. (You can install it from GitHub.) This will NOT remove individual flows (i.e. arcs) from the process map. However, it will remove cases that contain a flow which is infrequent.

For example:

sepsis %>% filter_infrequent_flows(min_n = 50) %>% process_map()

This will consider all flows in the sepsis data with fewer than 50 occurrences. All cases related to these flows will be removed.

The difference between filter_infrequent_flows and filter_trace_frequency is subtle. The latter looks at the end-to-end frequency of a sequence, while the former looks at the frequency of each step. The reasoning is that infrequent traces can share many flows and therefore do not necessarily lead to infrequent arcs in the process map. Removing such infrequent traces will not simplify the map.

The infrequent flows filter starts from the flows that are infrequent, so it has a direct effect on the number of arcs drawn in the process map. But, importantly, it always filters complete cases, i.e. all of their events, not just individual arcs (which have no direct equivalent in the data, but are the result of the events).

The current function expects an absolute frequency (min_n), which should be 2 or higher (2 meaning: remove all cases in which one or more flows occur only once). Of course, other options are possible, such as a percentage of cases to keep.

It might not be exactly what you are looking for, but I hope this helps. Any feedback is welcome. (Documentation on the new function is still to be elaborated upon, but I hope my description here is clear.)
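
To make the difference between the two filters concrete, here is a minimal base-R sketch on a made-up five-case log. It is not the edeaR implementation: the data and all variable names are invented for illustration, and the artificial start and end nodes that process_map draws are ignored.

```r
# Toy event log (hypothetical data): one row per event, ordered within each case.
log <- data.frame(
  case     = c(1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3,  4, 4, 4,  5, 5, 5),
  activity = c("A", "B", "C", "D",  "A", "B", "C", "D",
               "A", "B", "D",  "B", "C", "D",  "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Directly-follows pairs (flows) for every case.
flows <- do.call(rbind, lapply(split(log, log$case), function(ev) {
  n <- nrow(ev)
  data.frame(case = ev$case[-n],
             flow = paste(ev$activity[-n], ev$activity[-1], sep = " -> "),
             stringsAsFactors = FALSE)
}))

# Frequency of each step (flow) versus frequency of each end-to-end trace.
flow_n  <- table(flows$flow)
traces  <- tapply(log$activity, log$case, paste, collapse = ",")
trace_n <- table(traces)

min_n <- 2

# Flow-based view: drop every case that contains a flow seen fewer than min_n times.
rare_flows            <- names(flow_n)[flow_n < min_n]
cases_dropped_by_flow <- unique(flows$case[flows$flow %in% rare_flows])

# Trace-based view: drop every case whose complete trace is seen fewer than min_n times.
rare_traces            <- names(trace_n)[trace_n < min_n]
cases_dropped_by_trace <- as.integer(names(traces)[traces %in% rare_traces])

print(flow_n)
print(cases_dropped_by_flow)
print(cases_dropped_by_trace)
```

In this toy log, cases 4 and 5 have traces that occur only once, yet every flow they contain is shared with other cases, so only the trace-based view drops them. Case 3 is the only one containing a flow (B -> D) that occurs once, so it is the only case the flow-based view drops.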

fmannhardt · 2020-09-21T16:12:41Z

Just an idea, if you really want to just hide the edges. You could take a look at the DiagrammeR object returned (so don't render the process map) and then filter edges based on the label. A bit of a hack and the warning by Gert that the result may be very hard to interpret applies, but it could give you exactly what you want.

Treat it more as a visualisation aid rather than a 'model discovery'.

ffalcolini · 2020-09-22T16:14:56Z

Thank you so much for your help!
The new filter_infrequent_flows function created by @gertjanssenswillen works well and helps me quickly get the most "typical" process model.
Now I'm also trying the route suggested by @fmannhardt, but I'm running into some difficulties. Can you point me to any examples of how to filter the DiagrammeR object based on the thickness of the arc (i.e. on the number of cases that cross the path)?

Thanks again!

vpanfilov · 2020-10-02T08:34:25Z

@ffalcolini Here is an example of @fmannhardt's approach for removing edges in the DiagrammeR object:

library(tidyverse)
library(bupaR)
library(processmapR)
library(eventdataR)
library(DiagrammeR)

# Build the process map but do not render it, so the DiagrammeR graph object is returned
hospital_billing_process_map <- eventdataR::hospital_billing %>%
  process_map(
    type_nodes = frequency("relative_case"),
    type_edges = frequency("relative_case"),
    sec_edges = performance(median, "secs", flow_time = "inter_start_time"),
    render = FALSE,
    rankdir = "TB"
  )

# Render the unfiltered map for comparison
DiagrammeR::render_graph(hospital_billing_process_map)

# Flows whose edge value (relative case frequency, as requested above) is below 0.05
edges_to_remove <- hospital_billing_process_map %>%
  processmapR::get_flows() %>%
  dplyr::filter(value < 0.05) %>%
  transmute(from = from_id, to = to_id)

# Look up the DiagrammeR edge ids that match those from/to pairs
edges_ids_to_remove <- hospital_billing_process_map %>%
  DiagrammeR::get_edge_df() %>%
  as_tibble() %>%
  semi_join(edges_to_remove, by = c("from", "to")) %>%
  pull(id)

# Delete the selected edges, then delete nodes left without any edge
filtered_graph <- hospital_billing_process_map %>%
  DiagrammeR::select_edges_by_edge_id(edges = edges_ids_to_remove) %>%
  DiagrammeR::delete_edges_ws() %>%
  DiagrammeR::select_nodes_by_degree("deg == 0") %>%
  DiagrammeR::delete_nodes_ws()

# Render the filtered map
DiagrammeR::render_graph(filtered_graph)

Initial process map:

Filtered process map:

fmannhardt · 2020-10-05T14:41:51Z

Very cool. Thanks for the contribution. Maybe we can add it as a vignette.

@gertjanssenswillen, actually I just watched Sander Leemans's presentation on the Directly-follows Miner at ICPM 2020. Does the added filtering method implement something similar to his proposal, or could we implement it for processmapR?
http://leemans.ch/publications/papers/icpmdemo2019leemans.pdf
https://ieeexplore.ieee.org/abstract/document/8786057

gertjanssenswillen · 2020-10-05T15:15:02Z

I'll check this out!

ffalcolini · 2020-10-05T16:14:15Z

@vpanfilov thanks a lot for your help!
I will try applying this solution to my use case right away.

I saw the presentation about the Directly-follows Miner and tried the ProM tool based on it.
Very interesting!

testereng · 2022-06-24T06:37:18Z

Hello!

Following up on the discussion above, and assuming that the directly-follows graph approach is acceptable, I found that we could explore methods from network filtering that work on directed edges. I have played a little with the disparity filter (package skynet), and also with the methods for filtering 'insignificant' edges described in:

Tumminello M, Miccichè S, Lillo F, Piilo J, Mantegna RN (2011) Statistically Validated Networks in Bipartite Complex Systems. PLoS ONE 6(3): e17994. doi:10.1371/journal.pone.0017994

and

Hatzopoulos V, Iori G, Mantegna RN, Miccichè S, Tumminello M (2015) Quantifying preferential trading in the e-MID interbank market. Quantitative Finance 15(4): 693-710. doi:10.1080/14697688.2014.969889

It appears that if the process is 'noisy' (meaning that any case might contain some nuisance step that is not shared with the other cases, which can happen with multiple rare events), removing the cases in which those edges occur removes a lot of cases, while removing only the 'noisy' edges keeps the backbone(s).
Hope this helps.
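
The effect described above can be illustrated with a small base-R sketch. It uses a plain frequency threshold on edges rather than the disparity filter or the statistically validated network methods cited above, and the traces, activity names and threshold are all made up for illustration.

```r
# Hypothetical noisy log: backbone A -> B -> C, with a different nuisance step
# (X1, X2, X3) inserted in most cases.
traces <- list(
  c1 = c("A", "B", "C"),
  c2 = c("A", "X1", "B", "C"),
  c3 = c("A", "B", "X2", "C"),
  c4 = c("A", "B", "C", "X3")
)

# Directly-follows edges per case, and their overall frequency.
case_flows <- lapply(traces, function(a) {
  paste(head(a, -1), tail(a, -1), sep = " -> ")
})
flow_n <- table(unlist(case_flows))

min_n <- 2
rare  <- names(flow_n)[flow_n < min_n]

# Case-level filtering: drop every case that contains at least one rare edge.
has_rare   <- vapply(case_flows, function(f) any(f %in% rare), logical(1))
kept_cases <- names(case_flows)[!has_rare]

# Edge-level filtering: keep all cases, hide only the rare edges.
kept_edges <- names(flow_n)[flow_n >= min_n]

print(kept_cases)
print(kept_edges)
```

Here case-level filtering keeps only one of the four cases, while edge-level filtering retains the A -> B and B -> C backbone. The earlier caveat still applies: a map with hidden edges no longer matches the underlying events exactly.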

fmannhardt added the enhancement New feature or request label Oct 5, 2020
