Process mining is an emerging discipline and set of techniques for exploring and analyzing event logs. In a nutshell, process mining uses event logs to build models that describe how those events unfold within a specific process or pattern of use. Applications have included analyzing event logs from CRM systems and other information systems that record the individual events of a multi-step process. These models can be used for exploratory data analysis or to compare actual event data against an ideal model of a process. Please see Wil M. P. van der Aalst's book, Process Mining (2011), for more details.
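To make the idea of "building a model from an event log" concrete, here is a minimal sketch of one of the simplest process models, a directly-follows graph, which counts how often one activity is immediately followed by another within the same case. It assumes each event is a hypothetical `(case_id, activity, timestamp)` tuple; the function name `directly_follows` and the toy data are illustrative only and not taken from any particular process mining tool. Many discovery algorithms start from relations like these before deriving richer models such as Petri nets.

```python
from collections import Counter, defaultdict
from datetime import datetime


def directly_follows(events):
    """Count how often activity a is directly followed by activity b in the same case.

    events: iterable of (case_id, activity, timestamp) tuples (a hypothetical format).
    Returns a Counter mapping (a, b) -> frequency.
    """
    # Group events into traces, one per case.
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((ts, activity))

    dfg = Counter()
    for trace in traces.values():
        # Order each trace by timestamp before looking at consecutive pairs.
        trace.sort(key=lambda item: item[0])
        activities = [activity for _, activity in trace]
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg


# Toy log with two cases (illustrative data, not from the project's dataset).
log = [
    ("c1", "login", datetime(2013, 1, 1, 9, 0)),
    ("c1", "create_doc", datetime(2013, 1, 1, 9, 5)),
    ("c1", "share_doc", datetime(2013, 1, 1, 9, 7)),
    ("c2", "login", datetime(2013, 1, 2, 10, 0)),
    ("c2", "create_doc", datetime(2013, 1, 2, 10, 1)),
    ("c2", "logout", datetime(2013, 1, 2, 10, 30)),
]

for (a, b), n in sorted(directly_follows(log).items()):
    print(f"{a} -> {b}: {n}")
# create_doc -> logout: 1
# create_doc -> share_doc: 1
# login -> create_doc: 2
```

Sorting each trace by timestamp means the input log does not need to arrive in order, which matters for logs collected from distributed systems.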
Our project seeks to add to the work done in process mining by focusing on process mining as a means to better understand the use of digital products like Software as a Service (SaaS). Specifically, we are interested in using process mining as a tool and methodology to understand the value co-creation processes between SaaS providers and their users. To this end, we hope to make several distinctive contributions to both practice and academic research in this area.
-Our technical approach is based on current Big Data technologies, which allow for parallel and distributed processing in the cloud, unlike current process mining tools, which are desktop applications. We hope our work will enable process mining techniques to be applied to "web scale" datasets.
-Our research dataset was generated by a SaaS product using the JSON Activity Streams specification, an increasingly common format for logging SaaS user events. We hope to provide insight into how such general-purpose formats can easily be used as inputs to a process mining analysis workflow.
-We plan to build our analysis pipeline and tools using the Python programming language. Python and its SciPy ecosystem of mathematics, science, and engineering libraries are increasingly being used in both practice and research when doing data mining and analysis on large and diverse datasets.
-We plan to contribute any libraries, frameworks, or approaches we develop to the open source community under a BSD license.
-Our research dataset includes tens of millions of events generated by tens of thousands of users nested within multiple organizations. The dataset includes both user-generated and system-generated events. We hope that the challenges of this dataset will enable us to better handle user logs from SaaS products that have millions of users and rapidly growing numbers of events.
-Our dataset spans 2.5 years, during which several major product enhancements were deployed. We hope this long time span and large number of end users will allow us to explore value co-creation processes between users and a SaaS offering.
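As a rough illustration of how an Activity Streams log might feed a distributed process mining pipeline, the sketch below flattens JSON activities into `(case_id, activity, timestamp)` rows, partitions them by case, counts directly-follows pairs per partition (a "map" step), and sums the partial counts (a "reduce" step). The field names follow the JSON Activity Streams 1.0 layout (`actor.id`, `verb`, `object.objectType`, `published`); whether the project's dataset uses exactly these fields, and the choice of the actor as the case identifier, are assumptions. All function names are hypothetical. Partitions are processed sequentially here; a distributed engine would run the map step in parallel, and the merge is only correct because every case is kept whole within a single partition.

```python
import json
import zlib
from collections import Counter, defaultdict
from datetime import datetime


def to_event(activity):
    """Flatten one Activity Streams 1.0-style activity into (case_id, activity_name, timestamp).

    Assumes fields actor.id, verb, object.objectType and an ISO 8601 'published' value.
    The actor is used as the case identifier (an assumption for illustration).
    """
    case_id = activity["actor"]["id"]
    object_type = activity.get("object", {}).get("objectType", "unknown")
    name = f'{activity["verb"]}:{object_type}'
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11, so normalize it.
    ts = datetime.fromisoformat(activity["published"].replace("Z", "+00:00"))
    return case_id, name, ts


def partition_of(case_id, n_partitions):
    # Stable hash (unlike built-in hash()) so a case maps to the same partition on any machine.
    return zlib.crc32(case_id.encode("utf-8")) % n_partitions


def count_partition(events):
    """Map step: directly-follows counts for the cases in one partition."""
    traces = defaultdict(list)
    for case_id, name, ts in events:
        traces[case_id].append((ts, name))
    counts = Counter()
    for trace in traces.values():
        trace.sort(key=lambda item: item[0])  # order each trace by time
        names = [name for _, name in trace]
        counts.update(zip(names, names[1:]))  # count consecutive (a, b) pairs
    return counts


def directly_follows_partitioned(raw_lines, n_partitions=4):
    """Parse JSON lines, partition by case, then merge per-partition counts."""
    partitions = defaultdict(list)
    for line in raw_lines:
        event = to_event(json.loads(line))
        partitions[partition_of(event[0], n_partitions)].append(event)
    # Reduce step: partial Counters can be summed because no case spans two partitions.
    total = Counter()
    for events in partitions.values():
        total += count_partition(events)
    return total


# Illustrative JSON lines (not from the project's dataset).
raw = [
    '{"published": "2012-03-01T09:00:00Z", "actor": {"id": "user:1"}, "verb": "post", "object": {"objectType": "note"}}',
    '{"published": "2012-03-01T09:02:00Z", "actor": {"id": "user:1"}, "verb": "share", "object": {"objectType": "note"}}',
    '{"published": "2012-03-01T08:59:00Z", "actor": {"id": "user:2"}, "verb": "post", "object": {"objectType": "note"}}',
    '{"published": "2012-03-01T09:10:00Z", "actor": {"id": "user:2"}, "verb": "share", "object": {"objectType": "note"}}',
]
print(directly_follows_partitioned(raw))
# Counter({('post:note', 'share:note'): 2})
```

Partitioning by case rather than by time keeps each trace intact; choosing a different case notion (for example, a session or an organization instead of a single user) would only change the key returned by `to_event`.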
Raghu Santanam is a professor at Arizona State University in the Department of Information Systems and the principal investigator for this project.