# Experiment to check discoverability and conformance of different models

We run the following experiment for several initial models $M$:

1. Load models $M$ (from [Javert: Fully Automatic Mining of General Temporal Properties from Dynamic Traces](https://dl.acm.org/citation.cfm?id=1453150))
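2. Simulate a log $L$ of 100k traces from model $M$
3. For $i$ in $[1,3,10,30,100,300,...,100000]$, use a growing subset $L'_i$ of $L$ with $|L'_i| = i$ (the exact sizes are set in the experiment parameters below)
4. Discover a model $M_d$ from $L'_{i}$ with a discovery algorithm of your choice (this notebook uses the Inductive Miner, infrequent variant)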
5. Report precision and recall for:
  * $M$ vs. $L'_i$
  * $L'_i$ vs. $M_d$
  * $M$ vs. $M_d$
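
The sizes in step 3 alternate between $1 \cdot 10^k$ and $3 \cdot 10^k$. A minimal sketch of how such a sequence could be generated is shown here; the names `SublogSizes` and `geometricSizes` are hypothetical and not part of the experiment code, which hard-codes its sizes in the parameters section.

```java
import java.util.ArrayList;
import java.util.List;

class SublogSizes {
    /** Returns 1, 3, 10, 30, ... (alternating 1*10^k and 3*10^k), never exceeding max. */
    static List<Integer> geometricSizes(int max) {
        List<Integer> sizes = new ArrayList<>();
        // long avoids overflow when base approaches Integer.MAX_VALUE
        for (long base = 1; base <= max; base *= 10) {
            sizes.add((int) base);
            if (3 * base <= max) {
                sizes.add((int) (3 * base));
            }
        }
        return sizes;
    }
}
```

Calling `SublogSizes.geometricSizes(100000)` yields the eleven sizes from 1 to 100000 listed in step 3.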

## Prepare the classpath with the Maven repository, a local Maven repository, and additional jars

In [1]:
%maven dk.brics:automaton:1.12-1
%maven commons-logging:commons-logging:1.2
%maven org.apache.commons:commons-collections4:4.1
%maven org.apache.commons:commons-lang3:3.7
%maven org.apache.commons:commons-math3:3.6.1
%maven colt:colt:1.2.0
%maven jgraph:jgraph:5.13.0.0
%maven net.sf.trove4j:trove4j:3.0.3
%maven org.simpleframework:simple-xml:2.7.1
%maven io.github.andreas-solti.matrix-toolkits-java:mtj:1.0.8
%maven net.sourceforge.f2j:arpack_combined_all:0.1
%maven com.github.fommil.netlib:all:1.1.2

In [2]:
%%loadFromPOM
<repository>
    <id>openxes-repo</id>
    <url>file:////home/prom/openxes</url>
</repository>

<!-- Not available on Maven, local copy -->
<dependency>
    <groupId>org.deckfour</groupId>
    <artifactId>openxes</artifactId>
    <version>2.16</version>
</dependency>

<dependency>
    <groupId>io.github.andreas-solti.xeslite</groupId>
    <artifactId>xeslite</artifactId>
    <version>0.0.1</version>
</dependency>

In [3]:
List<String> addedJars = %jars /home/prom/lib/plugins/*.jar
List<String> addedJars2 = %jars /home/prom/lib/*.jar

## Handle imports 

In [4]:
import java.util.stream.IntStream;
import org.deckfour.xes.info.XLogInfo;
import org.deckfour.xes.info.impl.XLogInfoImpl;
import org.deckfour.xes.info.XLogInfoFactory;
import org.deckfour.xes.model.XLog;
import org.deckfour.xes.classification.XEventClassifier;
import org.deckfour.xes.classification.XEventClasses;

import org.processmining.acceptingpetrinet.models.AcceptingPetriNet;
import org.processmining.acceptingpetrinet.models.impl.AcceptingPetriNetImpl;
import org.processmining.eigenvalue.Utils;
import org.processmining.eigenvalue.automata.PrecisionRecallComputer;
import org.processmining.eigenvalue.data.EntropyPrecisionRecall;
import org.processmining.eigenvalue.generator.GenerateLogAndModel;
import org.processmining.eigenvalue.generator.NAryTreeGenerator;
import org.processmining.eigenvalue.tree.TreeUtils;
import org.apache.commons.lang3.tuple.MutablePair;
import org.apache.commons.lang3.tuple.Pair;
import org.processmining.plugins.etm.model.narytree.NAryTree;
import org.processmining.plugins.stochasticpetrinet.StochasticNetUtils;

import org.processmining.projectedrecallandprecision.helperclasses.ProjectPetriNetOntoActivities;
import org.processmining.projectedrecallandprecision.helperclasses.AcceptingPetriNet2automaton;
import org.processmining.projectedrecallandprecision.helperclasses.AutomatonFailedException;
import org.processmining.projectedrecallandprecision.helperclasses.EfficientLog;
import com.google.common.base.Stopwatch;

import org.processmining.eigenvalue.test.TestUtils;

import dk.brics.automaton2.Automaton;
import org.processmining.plugins.etm.model.narytree.conversion.NAryTreeToProcessTree;
import org.processmining.processtree.ProcessTree;
import org.processmining.ptconversions.pn.ProcessTree2Petrinet;
import org.processmining.ptconversions.pn.ProcessTree2Petrinet.NotYetImplementedException;
import org.processmining.ptconversions.pn.ProcessTree2Petrinet.InvalidProcessTreeException;

import org.processmining.plugins.InductiveMiner.efficienttree.EfficientTree;
import org.processmining.plugins.InductiveMiner.efficienttree.EfficientTree2processTree;
import org.processmining.plugins.InductiveMiner.mining.MiningParameters;
import org.processmining.plugins.inductiveminer2.mining.InductiveMiner;
import org.processmining.plugins.inductiveminer2.variants.MiningParametersIMInfrequent;
import org.processmining.plugins.InductiveMiner.mining.logs.LifeCycleClassifier;
import org.processmining.framework.packages.PackageManager;

import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.core.Persister;
import org.processmining.plugins.pnml.simple.PNMLRoot;
import org.processmining.plugins.pnml.importing.StochasticNetDeserializer;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import org.processmining.models.graphbased.directed.petrinet.StochasticNet;
import org.processmining.models.semantics.petrinet.Marking;

# Set Experiment PARAMETERS

In [5]:
String INPUT_FOLDER = "data/javert"; // the pnml models are loaded from here
String OUTPUT_FOLDER = "output"; // the results will be put here
int NUM_ACTIVITIES = 15; // how big shall the model be?

int MAX_TRACES = 100000;

int[] sublogSizes = new int[]{1,2,3,4,5,6,7,8,9,10,20,30,40,50,75,100}; // gradual increments in log size
int[] sublogSizes_higher = new int[]{300,1000,3000,10000,30000,100000}; // gradual increments in log size

float INDUCTIVE_MINER_THRESHOLD = 0.2f; // the default parameter for the inductive miner (infrequent)

In [6]:
public static final XEventClassifier CLASSIFIER = XLogInfoImpl.NAME_CLASSIFIER;
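
The two size arrays are later concatenated into one schedule. A small sketch of that merge in plain Java, together with a sanity check that the schedule is strictly increasing and never exceeds the number of simulated traces; `SizeSchedule`, `merge`, and `isValid` are hypothetical helper names used only for illustration.

```java
import java.util.stream.IntStream;

class SizeSchedule {
    /** Concatenates both schedules, keeping the order low-then-high. */
    static int[] merge(int[] low, int[] high) {
        return IntStream.concat(IntStream.of(low), IntStream.of(high)).toArray();
    }

    /** True if all sizes lie in [1, maxTraces] and are strictly increasing. */
    static boolean isValid(int[] sizes, int maxTraces) {
        for (int k = 0; k < sizes.length; k++) {
            if (sizes[k] < 1 || sizes[k] > maxTraces) return false;
            if (k > 0 && sizes[k] <= sizes[k - 1]) return false;
        }
        return true;
    }
}
```

A check such as `SizeSchedule.isValid(SizeSchedule.merge(sublogSizes, sublogSizes_higher), MAX_TRACES)` before the run would catch a sublog size larger than the simulated log.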

# 1. Load models

In [10]:
/**
 * Reads a PNML file and wraps the contained net as an AcceptingPetriNet.
 * The final marking is derived with StochasticNetUtils.getFinalMarking.
 */
public AcceptingPetriNet openNetFromFile(String filename) throws Exception{
    File file = new File(filename);
    Serializer serializer = new Persister();
    PNMLRoot pnml;
    // close the input stream once the PNML has been parsed
    try (FileInputStream in = new FileInputStream(file)) {
        pnml = serializer.read(PNMLRoot.class, in);
    }

    StochasticNetDeserializer converter = new StochasticNetDeserializer();
    // result[0] holds the net, result[1] its initial marking
    Object[] result = converter.convertToNet(null, pnml, filename, false);
    
    StochasticNet sNet = (StochasticNet) result[0];
    Marking initMarking = (Marking) result[1];
    return new AcceptingPetriNetImpl(sNet, initMarking, StochasticNetUtils.getFinalMarking(null, sNet));
}

In [12]:
public List<AcceptingPetriNet> loadModels(String input_folder, String extension) throws Exception {
    List<AcceptingPetriNet> listOfModels = new ArrayList<>();
    File folder = new File(input_folder);
    String[] files = folder.list();
    if (files == null){
        // File.list() returns null if the folder does not exist or cannot be read
        throw new FileNotFoundException("Cannot list folder " + input_folder);
    }
    Arrays.sort(files); // load the models in a deterministic order
    for (String filename : files){
        if (filename.endsWith(extension)){
            System.out.println("Loading model "+filename);
            listOfModels.add(openNetFromFile(folder.toPath()+File.separator+filename));
        }
    }
    System.out.println("Loaded "+listOfModels.size()+" models.");
    return listOfModels;
}
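
As an alternative to `File.list()`, the same folder scan can be sketched with `java.nio.file`, which reports a missing folder as an exception instead of returning `null`. The class name `ModelFiles` is hypothetical; the sketch only lists the files and does not load any net.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ModelFiles {
    /** Lists regular files in folder whose name ends with extension, sorted by path. */
    static List<Path> list(Path folder, String extension) throws IOException {
        // Files.list keeps a directory handle open, so close the stream when done
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(extension))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```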

In [13]:
List<AcceptingPetriNet> listOfModels = loadModels(INPUT_FOLDER,".pnml");

Loading model Figure 01.pnml
Assuming race enabling memory for net noID imported from (data/javert/Figure 01.pnml)
Assuming 'minutes' as the time unit in net noID imported from (data/javert/Figure 01.pnml)
Loading model Figure 05.pnml
Assuming race enabling memory for net noID imported from (data/javert/Figure 05.pnml)
Assuming 'minutes' as the time unit in net noID imported from (data/javert/Figure 05.pnml)
Loading model Figure 06.pnml
Assuming race enabling memory for net noID imported from (data/javert/Figure 06.pnml)
Assuming 'minutes' as the time unit in net noID imported from (data/javert/Figure 06.pnml)
Loading model Figure 09.pnml
Assuming race enabling memory for net noID imported from (data/javert/Figure 09.pnml)
Assuming 'minutes' as the time unit in net noID imported from (data/javert/Figure 09.pnml)
Loading model Figure 10.pnml
Assuming race enabling memory for net noID imported from (data/javert/Figure 10.pnml)
Assuming 'minutes' as the time unit in net noID imported from

### Create pictures for the loaded automata 
Images will be stored in **output/automata/&lt;filename&gt;.png**

In [14]:
%maven guru.nidi:graphviz-java:0.11.0

In [15]:
import guru.nidi.graphviz.model.MutableGraph;
import guru.nidi.graphviz.parse.Parser;
import guru.nidi.graphviz.engine.Graphviz;
import guru.nidi.graphviz.engine.Format;

In [29]:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import org.apache.commons.lang3.ArrayUtils;

In [17]:
public String getOriginalFilename(AcceptingPetriNet net){
    Pattern p = Pattern.compile("(Figure .*\\.pnml)");    
    String label = net.getNet().getLabel();
    Matcher m = p.matcher(label);
    String name = "model";
    if (m.find()){
        name = m.group();
    }
    return name;
}
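
To illustrate what the pattern in `getOriginalFilename` extracts, here is a standalone sketch of the same regular expression. The sample label is an assumption modeled on the import messages printed above; the actual label format of the net is not confirmed here, and `LabelNames` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelNames {
    // same pattern as in getOriginalFilename
    private static final Pattern FIGURE = Pattern.compile("(Figure .*\\.pnml)");

    /** Returns the "Figure ....pnml" part of the label, or "model" if there is none. */
    static String extract(String label) {
        Matcher m = FIGURE.matcher(label);
        return m.find() ? m.group() : "model";
    }
}
```

For the assumed label `noID imported from (data/javert/Figure 01.pnml)`, `extract` returns `Figure 01.pnml`. Since the extension is part of the match, a name built as `origName + ".png"` ends in `.pnml.png`.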

In [21]:
for (AcceptingPetriNet net : listOfModels){
    MutableGraph graph = Parser.read(PrecisionRecallComputer.getAutomaton(net).toDot());
    String origName = getOriginalFilename(net);
    Graphviz.fromGraph(graph).width(900).render(Format.PNG).toFile(new File(OUTPUT_FOLDER+File.separator+ "automata"+ File.separator + origName+".png"));
}

[Ljava.lang.String;@6a699d72
[Ljava.lang.String;@52af5232
[Ljava.lang.String;@86e39cc
[Ljava.lang.String;@3411be9f
[Ljava.lang.String;@571b74d2
[Ljava.lang.String;@24f9fe36
[Ljava.lang.String;@1378fb70
[Ljava.lang.String;@3524472c
[Ljava.lang.String;@4656c86
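
The `[Ljava.lang.String;@...` lines in this output are Java's default rendering of a `String[]` (a type tag plus an identity hash), presumably produced by printing an array of transition names directly. A minimal sketch of the difference between that form and a readable one; `ArrayPrinting` is a hypothetical class name.

```java
import java.util.Arrays;

class ArrayPrinting {
    /** What string concatenation yields for an array: type tag and identity hash. */
    static String identityForm(String[] names) {
        return "" + names;
    }

    /** Element-wise rendering, e.g. "[a, b]". */
    static String readableForm(String[] names) {
        return Arrays.toString(names);
    }
}
```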


# 2. Simulate log $L$
The random seed of the log generation is set to 1 by default.  
This way, the log is identical whenever GenerateLogAndModel is used twice with the same model tree / AcceptingPetriNet.
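
Why a fixed seed makes the log reproducible can be sketched with a toy generator based on `java.util.Random`. This is not the ProM simulator; the class `SeededSimulation`, the activity names, and the trace length bounds are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SeededSimulation {
    /** Draws numberOfTraces toy traces over {a, b, c}; each trace has 1 to 5 events. */
    static List<String> simulate(long seed, int numberOfTraces) {
        Random random = new Random(seed); // same seed, same pseudo-random sequence
        String[] activities = {"a", "b", "c"};
        List<String> log = new ArrayList<>();
        for (int t = 0; t < numberOfTraces; t++) {
            int length = 1 + random.nextInt(5);
            StringBuilder trace = new StringBuilder();
            for (int e = 0; e < length; e++) {
                trace.append(activities[random.nextInt(activities.length)]);
            }
            log.add(trace.toString());
        }
        return log;
    }
}
```

Two calls with the same seed and trace count return equal lists, so every sublog drawn from them is identical across runs as well.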

In [22]:
%maven org.uncommons.maths:uncommons-maths:1.2.2a

In [23]:
import org.processmining.plugins.stochasticpetrinet.simulator.PNSimulator;
import org.processmining.plugins.stochasticpetrinet.simulator.PNSimulatorConfig;
import org.processmining.models.graphbased.directed.petrinet.StochasticNet.ExecutionPolicy;
import org.processmining.models.graphbased.directed.petrinet.StochasticNet.TimeUnit;
import org.processmining.models.semantics.petrinet.impl.EfficientPetrinetSemanticsImpl;

In [24]:
/**
 * Simulates a log from the given net and returns it.
 * @param net AcceptingPetriNet the net to simulate.
 * @param numberOfTraces int the number of traces to generate from the net.
 * Note: by default, the max number of events per trace is limited to 1000 to avoid running an endless loop in the model.
 */
public XLog simulateLog(AcceptingPetriNet net, int numberOfTraces){
    PNSimulatorConfig config = new PNSimulatorConfig(numberOfTraces,TimeUnit.MINUTES,0,1,1000,ExecutionPolicy.GLOBAL_PRESELECTION);
    PNSimulator simulator = new PNSimulator();
    return simulator.simulate(null,net.getNet(), StochasticNetUtils.getSemantics(net.getNet()),config, net.getInitialMarking());
}

/**
 * Helper method to compute the entropy-based precision/recall measures between two models
 */
public EntropyPrecisionRecall getPrecisionAndRecall(AcceptingPetriNet firstNet, AcceptingPetriNet secondNet){
    String name1 = Utils.getName(firstNet.getNet(),"Md");
    String name2 = Utils.getName(secondNet.getNet(),"M");

    String[] names = PrecisionRecallComputer.getTransitionNames(firstNet, new String[]{});
    names = PrecisionRecallComputer.getTransitionNames(secondNet, names);

    Automaton a1 = getAutomaton(firstNet, names);
    Automaton a2 = getAutomaton(secondNet, names);

    Automaton a12 = a1.intersection(a2, Utils.NOT_CANCELLER);

    return PrecisionRecallComputer.getPrecisionAndRecall(a1, name1, a2, name2, a12, "MdM", a12.getNumberOfStates() / (double)a1.getNumberOfStates(), Utils.NOT_CANCELLER);
}

/**
 * Converts an {@link AcceptingPetriNet} to an {@link Automaton}.
 * @param net {@link AcceptingPetriNet} to convert.
 * @param activities {@link String}[] array that captures the names in the other part, if names should be converted.
 * @return Automaton the automaton of the model projected onto the names collected from the net and {@code activities}
 */
public Automaton getAutomaton(AcceptingPetriNet net, String[] activities){
    String[] names = PrecisionRecallComputer.getTransitionNames(net, activities);
    // note: this prints the array reference (e.g. [Ljava.lang.String;@...), not the names themselves
    System.out.println(""+names);
    AcceptingPetriNet projectedNet = ProjectPetriNetOntoActivities.project(net, Utils.NOT_CANCELLER, names);
    Automaton a = null;
    try {
        a = AcceptingPetriNet2automaton.convert(projectedNet, Integer.MAX_VALUE, Utils.NOT_CANCELLER);
    } catch (AutomatonFailedException e){
        e.printStackTrace();
        System.out.println("Error getting Automaton!");
    }
    return a;
}

public Automaton getAutomaton(AcceptingPetriNet net){
    return getAutomaton(net, new String[]{});
}


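/**
 * Discovers a process tree from the log with the Inductive Miner (infrequent variant).
 * Note: noiseThreshold is not applied to miningParameters here, so the miner runs with its own default threshold.
 */
public ProcessTree mineTree(XLog xLog, float noiseThreshold){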
    XEventClassifier classifier = MiningParameters.getDefaultClassifier();
    org.processmining.plugins.inductiveminer2.logs.IMLog log = new org.processmining.plugins.inductiveminer2.logs.IMLogImpl(xLog, classifier, new LifeCycleClassifier());
    MiningParametersIMInfrequent miningParameters = new MiningParametersIMInfrequent();
    miningParameters.setDebug(false);
    EfficientTree eTree = InductiveMiner.mineEfficientTree(log, miningParameters, new PackageManager.Canceller() {
        @Override
        public boolean isCancelled() {
            return false;
        }
    });

    return EfficientTree2processTree.convert(eTree);
}

public AcceptingPetriNet convertProcessTreeToNet(ProcessTree processTree, int numActivities) {
    try{
        XEventClasses eventClasses = TestUtils.getxEventClasses(CLASSIFIER, numActivities);
        
        ProcessTree2Petrinet.PetrinetWithMarkings petrinetWithMarkings = ProcessTree2Petrinet.convert(processTree, true);
        AcceptingPetriNet acceptingPetriNet = new AcceptingPetriNetImpl(petrinetWithMarkings.petrinet, petrinetWithMarkings.initialMarking, petrinetWithMarkings.finalMarking);
        return acceptingPetriNet;
    } catch (NotYetImplementedException | InvalidProcessTreeException e){
        e.printStackTrace();
        System.err.println("Error!");
        return null;
    }
}

public AcceptingPetriNet convertToNet(NAryTree tree){
    int numActivities = tree.numLeafs();
    XEventClasses eventClasses = TestUtils.getxEventClasses(CLASSIFIER, numActivities);
    ProcessTree processTree = NAryTreeToProcessTree.convert(tree, eventClasses);
    return convertProcessTreeToNet(processTree, numActivities);
}
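
The helper `getPrecisionAndRecall` above compares two automata through their intersection. As background, here is a toy sketch of the idea behind entropy-based measures, under two assumptions that the code in this notebook does not confirm: the entropy of a language is taken as the logarithm of the largest eigenvalue of its automaton's transition-count matrix, and the measure is the ratio entropy(A ∩ B) / entropy(A). The names `EntropySketch`, `spectralRadius`, `entropy`, and `entropyRatio` are hypothetical; `PrecisionRecallComputer` may differ in important details.

```java
import java.util.Arrays;

class EntropySketch {
    /**
     * Power iteration: approximates the largest eigenvalue of a non-negative
     * square matrix. Convergence is only expected for primitive matrices.
     */
    static double spectralRadius(double[][] a, int iterations) {
        int n = a.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0);
        double lambda = 0.0;
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    w[i] += a[i][j] * v[j];
                }
            }
            double norm = 0.0;
            for (double x : w) norm = Math.max(norm, Math.abs(x));
            if (norm == 0.0) return 0.0;
            for (int i = 0; i < n; i++) v[i] = w[i] / norm; // rescale to max-norm 1
            lambda = norm; // growth factor of this step
        }
        return lambda;
    }

    /** a[i][j] = number of transitions from state i to state j. */
    static double entropy(double[][] a) {
        return Math.log(spectralRadius(a, 200));
    }

    static double entropyRatio(double[][] intersection, double[][] reference) {
        return entropy(intersection) / entropy(reference);
    }
}
```

As a usage example, take A as all words over {a, b} (one state with two self-loops, matrix `{{2}}`) and B as the words without two consecutive b's (matrix `{{1, 1}, {1, 0}}`). Since B is contained in A, the intersection is B, and `entropyRatio` gives roughly 0.694: the more restrictive language carries less entropy than the permissive one.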

## Experiment code:

In [25]:
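/**
 * Runs steps 3 to 5 for one model. For each sublog size, three result rows are written to
 * results_<size>.csv: model vs. sublog, sublog vs. discovered model, and model vs. discovered model.
 */
public static void runExperiment(String filename, XLog log, AcceptingPetriNet acceptingPetriNet, String outputFolder,int[] sublogSizes){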
    File outFolder = new File(outputFolder + File.separator + filename);
    if (!outFolder.exists()){
        outFolder.mkdirs();
    }
    
    // 3. Select growing number of traces from the log
    for (int i : sublogSizes){
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(new File(outFolder, "results_"+i+".csv")))) {
            writer.write(EntropyPrecisionRecall.getHeader()+"\n");

            XLog subLog = Utils.cloneLog(log, i);
            
            System.out.println("Running with log size: "+subLog.size());

            Stopwatch timer = Stopwatch.createStarted();
            EntropyPrecisionRecall resModelLog = PrecisionRecallComputer.getPrecisionAndRecall(null, Utils.NOT_CANCELLER, subLog,  acceptingPetriNet);
            writer.write(resModelLog.getCSVString()+"\n");
            writer.flush();
            
            System.out.println("Computing recall/precision of sublog/model took: " + timer.stop()); timer.reset(); timer.start();
            
            
            ProcessTree modelDiscovered = mineTree(subLog, INDUCTIVE_MINER_THRESHOLD); 
            System.out.println("Discovery of m_discov from sublog took: " + timer.stop()); timer.reset(); timer.start();
            
            AcceptingPetriNet petriNetDiscovered = convertProcessTreeToNet(modelDiscovered, modelDiscovered.size());

            
            EntropyPrecisionRecall resLogDiscModel = PrecisionRecallComputer.getPrecisionAndRecall(null, Utils.NOT_CANCELLER, subLog,  petriNetDiscovered);
            writer.write(resLogDiscModel.getCSVString()+"\n");
            
            System.out.println("Computing recall/precision of sublog/m_discov: " + timer.stop()); timer.reset(); timer.start();

            EntropyPrecisionRecall resModelDiscModel = getPrecisionAndRecall(acceptingPetriNet, petriNetDiscovered);   
            writer.write(resModelDiscModel.getCSVString()+"\n");
            System.out.println("Computing recall/precision of m_discov/model: " + timer.stop()); 

            writer.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
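
Step 3 relies on growing subsets of the log. Assuming that `Utils.cloneLog(log, i)` copies the first `i` traces (its implementation is not shown in this notebook), the sublogs are nested prefixes of $L$. A minimal sketch of that selection on a plain list; `PrefixSublog` and `firstTraces` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixSublog {
    /** Returns a copy of the first size traces, or of the whole log if it is shorter. */
    static <T> List<T> firstTraces(List<T> log, int size) {
        return new ArrayList<>(log.subList(0, Math.min(size, log.size())));
    }
}
```

Because every smaller sublog is contained in every larger one, differences between consecutive result rows can be attributed to the added traces rather than to a different sample.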

# Perform the experiment
* Load a model
* Simulate a large log (`MAX_TRACES` = 100,000 traces)
* ... and run the rest of the experiment as explained above

In [30]:
for (AcceptingPetriNet net : listOfModels){
    String name = getOriginalFilename(net);
    System.out.println("**************\nRunning Experiment with model "+name+"\n**************\n");
    XLog log = simulateLog(net, MAX_TRACES);
    
    runExperiment(name,log, net, OUTPUT_FOLDER+File.separator+"results", ArrayUtils.addAll(sublogSizes, sublogSizes_higher));
}
System.out.println("DONE!");

**************
Running Experiment with model Figure 01.pnml
**************

Running with log size: 1
Computing recall/precision of sublog/model took: 53.72 ms
Discovery of m_discov from sublog took: 74.61 ms
Computing recall/precision of sublog/m_discov: 7.289 ms
[Ljava.lang.String;@735ad34a
[Ljava.lang.String;@29a0934b
Computing recall/precision of m_discov/model: 22.86 ms
Running with log size: 2
Computing recall/precision of sublog/model took: 7.701 ms
Discovery of m_discov from sublog took: 4.580 ms
Computing recall/precision of sublog/m_discov: 8.326 ms
[Ljava.lang.String;@70ab8c4f
[Ljava.lang.String;@66c9310e
Computing recall/precision of m_discov/model: 6.088 ms
Running with log size: 3
Computing recall/precision of sublog/model took: 6.048 ms
Discovery of m_discov from sublog took: 2.186 ms
Computing recall/precision of sublog/m_discov: 6.273 ms
[Ljava.lang.String;@450c885
[Ljava.lang.String;@3e9df2db
Computing recall/precision of m_discov/model: 6.066 ms
Running with log size:

# Done! Now head over to 01_Experiment_Evaluation-selected-models.ipynb
There, we have prepared some Python code to visualize the resulting precision/recall graphs.