Don't see infrequent edges. Is it possible? #30

Open
ffalcolini opened this issue Sep 1, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@ffalcolini

Hi!

I'd like to simplify the process map.

I can reduce the number of activities displayed: to do this I use the edeaR::filter_activity_frequency function.

I'd also like to reduce the number of edges displayed. I tried using the "layout" argument of the process_map function, specifically the "edge_cutoff" parameter, but unfortunately I don't get any results: the parameter seems to have no effect.

Is it possible to use "edge_cutoff" parameter to display only the edges that have a frequency higher than a certain limit?
Can you give me an example of use?

Thanks so much!
You did a great job!

@ffalcolini
Author

ffalcolini commented Sep 19, 2020

Can anyone answer my question?
In my opinion, hiding the most infrequent edges is very important. I would like to understand whether it is possible to do this with bupaR.

Thanks so much for your attention

@gertjanssenswillen
Member

Thanks for your patience.

  1. The edge_cutoff parameter only impacts the layout of the map, not the edges it shows. It is also still a somewhat experimental feature.
  2. There is no way to filter out edges in a process map in isolation. The philosophy is that you either filter out whole cases which are infrequent (filter_trace_frequency), or you filter events (e.g. filter_activity_frequency, filter_activity, etc). In principle we do not want to remove individual edges, because this makes the process map inaccurate and can be misleading.

What we could do is implement a filter that removes cases with the goal of making the process map less complex. I'll have a look into that.
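For reference, the two existing routes mentioned above (filtering whole cases or filtering events) might look like the sketch below. It assumes the bupaR, edeaR, eventdataR and processmapR packages are installed and uses the sepsis log from eventdataR; the 0.8 thresholds are arbitrary illustration values, not recommendations.

```r
library(bupaR)
library(edeaR)
library(eventdataR)
library(processmapR)

# Route 1: filter whole cases. Keep the most frequent traces
# until roughly 80% of the cases are covered.
frequent_traces <- sepsis %>%
  filter_trace_frequency(percentage = 0.8)

# Route 2: filter events. Keep the most frequent activities
# until roughly 80% of the activity instances are covered.
frequent_activities <- sepsis %>%
  filter_activity_frequency(percentage = 0.8)

# Both results are still event logs, so either can be piped
# into process_map(), e.g. frequent_traces %>% process_map()
```

Neither filter targets edges directly, so how much the map simplifies depends on the log.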

@ffalcolini
Author

Thank you so much! You are really kind.
Congratulations on your work: it is wonderful to perform process mining analysis with R!

@gertjanssenswillen
Member

I have added a new function to edeaR called filter_infrequent_flows (you can install it from GitHub). This will NOT remove individual flows (i.e. arcs) from the process map. However, it will remove cases that contain a flow which is infrequent.

For example

sepsis %>% filter_infrequent_flows(min_n = 50) %>% process_map()

This will consider all flows in the sepsis data with fewer than 50 occurrences. All cases containing any of these flows will be removed.

The difference between filter_infrequent_flows and filter_trace_frequency is subtle. The latter looks at the end-to-end frequency of a sequence, while the former looks at the frequency of each step. The reasoning is that you can have infrequent traces that share a lot of flows, and thus do not lead to infrequent arcs in the process map. Removing these infrequent traces will not improve the map.
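The distinction can be illustrated on a made-up toy log. This is only a base R sketch of the counting idea, not how edeaR computes it; the data and the column names case and activity are invented for the example.

```r
# Toy log: one row per event, already ordered within each case.
# Cases 1-3 follow A>B>C, cases 4-6 follow B>C>D, case 7 follows A>B>C>D.
log <- data.frame(
  case = rep(1:7, times = c(3, 3, 3, 3, 3, 3, 4)),
  activity = c(rep(c("A", "B", "C"), 3),
               rep(c("B", "C", "D"), 3),
               c("A", "B", "C", "D")),
  stringsAsFactors = FALSE
)

# End-to-end view: frequency of each complete trace.
traces <- tapply(log$activity, log$case, paste, collapse = ">")
trace_freq <- table(traces)

# Step view: frequency of each directly-follows flow.
flows_per_case <- lapply(split(log$activity, log$case), function(a) {
  paste(head(a, -1), tail(a, -1), sep = ">")
})
flow_freq <- table(unlist(flows_per_case))

print(trace_freq)
print(flow_freq)
```

The trace A>B>C>D occurs only once, yet each of its flows (A>B, B>C, C>D) occurs at least four times, so removing that case would not remove a single arc from the map.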

The approach of the infrequent flows filter is to start from flows that are infrequent, so this filter has a direct effect on the number of arcs drawn in the process map. But, importantly, it always filters out cases completely, i.e. all of their events, not just individual arcs (which do not have a direct equivalent in the data, but are the result of the events).

The current function expects an absolute frequency (min_n), which should be 2 or higher (2 meaning: remove all cases in which one or more flows occur only once). Of course, other options are possible, such as a percentage of cases to keep.
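To make the min_n semantics concrete, here is a minimal base R sketch of the idea on an invented toy log. The helper drop_cases_with_rare_flows is hypothetical and is not the edeaR implementation; in particular it assumes a single pass over the flow counts, which may differ from what filter_infrequent_flows does.

```r
# Toy log: cases 1-3 follow A>B>C, case 4 follows A>B>X>C.
log <- data.frame(
  case = rep(1:4, times = c(3, 3, 3, 4)),
  activity = c(rep(c("A", "B", "C"), 3), c("A", "B", "X", "C")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every case containing a flow seen fewer than min_n times.
drop_cases_with_rare_flows <- function(log, min_n) {
  flows <- lapply(split(log$activity, log$case), function(a) {
    paste(head(a, -1), tail(a, -1), sep = ">")
  })
  flow_freq <- table(unlist(flows))
  rare <- names(flow_freq)[flow_freq < min_n]
  has_rare <- vapply(flows, function(f) any(f %in% rare), logical(1))
  keep <- names(flows)[!has_rare]
  # Whole cases are removed, including their events on frequent flows.
  log[as.character(log$case) %in% keep, , drop = FALSE]
}

filtered <- drop_cases_with_rare_flows(log, min_n = 2)
```

With min_n = 2 the flows B>X and X>C (one occurrence each) are rare, so case 4 is dropped in full, including its A>B step, which on its own is frequent.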

It might not be exactly what you are looking for, but I hope this helps. Any feedback is welcome. (Documentation on the new function still needs to be written, but I hope my description here is clear.)

@fmannhardt
Member

fmannhardt commented Sep 21, 2020

Just an idea, if you really want to just hide the edges: you could take a look at the DiagrammeR object returned (so don't render the process map) and then filter edges based on the label. It's a bit of a hack, and Gert's warning that the result may be very hard to interpret applies, but it could give you exactly what you want.

Treat it more as a visualisation aid rather than a 'model discovery'.

@ffalcolini
Author

ffalcolini commented Sep 22, 2020

Thank you so much for your help!
The new filter_infrequent_flows function created by @gertjanssenswillen works well and helps me quickly get the most "typical" process model.
Now I'm also trying the route suggested by @fmannhardt, but I'm running into some difficulties. Are there any examples you could point me to on how to filter the DiagrammeR object based on the thickness of the arc (i.e. on the number of cases that follow the path)?

Thanks again!

@vpanfilov

@ffalcolini Here is an example of @fmannhardt's approach for removing edges from the DiagrammeR object:

library(tidyverse)
library(bupaR)
library(processmapR)
library(eventdataR)
library(DiagrammeR)

# Build the process map as a DiagrammeR graph without rendering it
hospital_billing_process_map <- eventdataR::hospital_billing %>%
  process_map(
    type_nodes = frequency("relative_case"),
    type_edges = frequency("relative_case"),
    sec_edges = performance(median, "secs", flow_time = "inter_start_time"),
    render = FALSE,
    rankdir = "TB"
  )

# Render the unfiltered map for comparison
DiagrammeR::render_graph(hospital_billing_process_map)

# Flows whose relative case frequency is below 5%
edges_to_remove <- hospital_billing_process_map %>%
  processmapR::get_flows() %>%
  dplyr::filter(value < 0.05) %>%
  transmute(from = from_id, to = to_id)

# Look up the graph edge ids that correspond to those flows
edges_ids_to_remove <- hospital_billing_process_map %>%
  DiagrammeR::get_edge_df() %>%
  as_tibble() %>%
  semi_join(edges_to_remove, by = c("from", "to")) %>%
  pull(id)

# Delete the selected edges, then any nodes left without edges
filtered_graph <- hospital_billing_process_map %>%
  DiagrammeR::select_edges_by_edge_id(edges = edges_ids_to_remove) %>%
  DiagrammeR::delete_edges_ws() %>%
  DiagrammeR::select_nodes_by_degree("deg == 0") %>%
  DiagrammeR::delete_nodes_ws()

DiagrammeR::render_graph(filtered_graph)

Initial process map:
[image]

Filtered process map:
[image]

@fmannhardt
Member

Very cool. Thanks for the contribution. Maybe we can add it as a vignette.

@gertjanssenswillen, actually I just watched Sander Leemans's presentation on the Directly-follows Miner at ICPM 2020. Is the added filtering method actually implementing something similar to his proposal, or could we implement it for processmapR?
http://leemans.ch/publications/papers/icpmdemo2019leemans.pdf
https://ieeexplore.ieee.org/abstract/document/8786057

fmannhardt added the enhancement (New feature or request) label Oct 5, 2020

@gertjanssenswillen
Member

I'll check this out!

@ffalcolini
Author

@vpanfilov thanks a lot for your help!
I'll try applying this solution to my use case right away.

I saw the presentation about the directly-follows miner and tried the ProM tool based on it.
Very interesting!

@testereng

Hello!

Following up on the discussion above: if we accept the directly-follows graph approach, we could explore network-filtering methods for directed edges. I have experimented a little with the disparity filter (package skynet) and with the following methods for filtering out 'insignificant' edges:


Tumminello M, Miccichè S, Lillo F, Piilo J, Mantegna RN (2011) Statistically Validated Networks in Bipartite Complex Systems. PLoS ONE 6(3): e17994. doi:10.1371/journal.pone.0017994

and

Hatzopoulos V, Iori G, Mantegna RN, Miccichè S, Tumminello M (2015) Quantifying preferential trading in the e-MID interbank market. Quantitative Finance 15(4): 693-710. doi:10.1080/14697688.2014.969889

It appears that if the process is 'noisy' (meaning any case may contain some nuisance step that it does not share with the other cases, which can happen when there are many distinct rare events), removing the cases that contain those edges removes a lot of cases, whereas removing only the 'noisy' edges keeps the backbone(s).
Hope that helps.
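
As a rough illustration of edge-level filtering on a directly-follows graph, the sketch below applies the commonly cited form of the disparity filter to outgoing edges only. It is plain base R on invented counts and does not use the skynet package or any bupaR API. The function name disparity_filter_out, the edge table, the alpha = 0.05 threshold and the choice to always keep a node's only outgoing edge are all assumptions made for the example.

```r
# Invented directly-follows counts: one row per edge.
edges <- data.frame(
  from = c("A", "A", "A", "B", "B", "C"),
  to   = c("B", "C", "D", "D", "C", "D"),
  n    = c(90, 8, 2, 97, 3, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: disparity filter on outgoing edges.
disparity_filter_out <- function(edges, alpha = 0.05) {
  strength <- ave(edges$n, edges$from, FUN = sum)     # total outgoing weight per node
  degree   <- ave(edges$n, edges$from, FUN = length)  # number of outgoing edges per node
  p <- edges$n / strength                             # edge share of the node's weight
  # Probability of a share at least this large under a uniform random split.
  edges$p_value <- (1 - p)^(degree - 1)
  # Assumption: a node's only outgoing edge is always kept.
  edges[degree == 1 | edges$p_value < alpha, , drop = FALSE]
}

backbone <- disparity_filter_out(edges)
print(backbone)
```

On this toy input only A>B, B>D and C>D survive. As noted earlier in the thread, a map filtered this way is a visualisation aid rather than an accurate process model.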

5 participants