# IDS Cheatsheet for Exam Prep

## Statistical Preprocessing

|Mean|Variance (sample, $n-1$)|Standard Deviation|Standard Score|Covariance|Correlation|
|:---|:---|:---|:---|:---|:---|
|$mean(X) = \frac{sum(X)}{len(X)}$|$var(a)=\frac{\sum_{a_i \in a}(a_i - \bar{a})^2}{len(a)-1}$|$std(a) = \sqrt{var(a)}$|$stsc(x) = \frac{x-mean(X)}{std(X)}$|$cov(a,b) = \frac{1}{n-1}\sum_{i=1}^n (a_i-\bar{a}) \cdot (b_i-\bar{b})$|$corr(a,b) = \frac{cov(a,b)}{std(a) \cdot std(b)}$|

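The formulas above can be checked with a minimal plain-Python sketch. It uses the sample ($n-1$) denominator for variance and covariance, as in the table; the function names mirror the table's notation and are illustrative, not from any library.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # Sample variance: divide by n - 1, as in the table
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std(xs):
    return math.sqrt(var(xs))

def stsc(x, xs):
    # Standard score (z-score) of a single value x relative to the data xs
    return (x - mean(xs)) / std(xs)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / (std(a) * std(b))

a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 2.0, 5.0]
print(mean(a), var(a), cov(a, b), round(corr(a, b), 3))
```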
|Entropy|
|:---|
|$H(t) = - \sum_{i=1}^{I} P(t=i) \cdot \log_2 P(t=i)$, where $t$ takes $I$ possible values|

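A minimal sketch of the entropy formula, estimating each $P(t=i)$ as the relative frequency of value $i$ in a list of observations (the function name `entropy` is illustrative):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    # Values with probability 0 never appear in counts, so log2(0) is never evaluated
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["yes", "yes", "no", "no"]))  # → 1.0
print(entropy(["a", "b", "c", "d"]))        # → 2.0
```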
## Activation Functions (Neural Networks)

|Identity|Logistic/Sigmoid|Step|
|---|---|---|
|$f(x)=x$|$f(x) = \frac{1}{1+e^{-x}}$|$f(x) = 1$ if $x \geq 0$, else $0$|

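The three activation functions as a minimal Python sketch, assuming the step function outputs 0 for negative inputs. `math.exp` can overflow for very negative `x` (below roughly -709), so this is only meant for moderate inputs.

```python
import math

def identity(x):
    return x

def sigmoid(x):
    # Logistic function; squashes any real x into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    # Threshold at 0: 1 for x >= 0, else 0 (assumed convention)
    return 1 if x >= 0 else 0

for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), round(sigmoid(x), 3), step(x))
```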
## Confusion Matrix
<table><tr><td colspan="2" rowspan="2"></td><td colspan="2">predicted</td></tr><tr><td>True</td><td>False</td></tr><tr><td rowspan="2">target</td><td>True</td><td>True Positive</td><td>False Negative</td></tr><tr><td>False</td><td>False Positive</td><td>True Negative</td></tr></table>
 
 -----------------------------

|Recall|Specificity|Precision|
|---|---|---|
|$\frac{TP}{TP+FN}$|$\frac{TN}{TN+FP}$|$\frac{TP}{TP+FP}$|
|sensitivity, recall, hit rate, or true positive rate|specificity, selectivity or true negative rate|precision or positive predictive value|

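A minimal sketch that fills the confusion matrix from boolean target/prediction lists (True = positive class) and applies the three formulas; the function names and the toy data are hypothetical.

```python
def confusion_counts(targets, predictions):
    """Count TP, FN, FP, TN for boolean targets and predictions."""
    tp = fn = fp = tn = 0
    for t, p in zip(targets, predictions):
        if t and p:
            tp += 1
        elif t and not p:
            fn += 1
        elif not t and p:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def recall(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def precision(tp, fp):
    return tp / (tp + fp)

targets     = [True, True, True, False, False, False, False, True]
predictions = [True, False, True, True, False, False, False, True]
tp, fn, fp, tn = confusion_counts(targets, predictions)
print(tp, fn, fp, tn)  # → 3 1 1 3
print(recall(tp, fn), specificity(tn, fp), precision(tp, fp))  # → 0.75 0.75 0.75
```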
## Distance Metrics

|Euclidean|Manhattan|Cosine Similarity|Dot Product|Magnitude|
|---|---|---|---|---|
|$d(x,y) = \sqrt{\sum_{i=1}^n (x_i-y_i)^2}$|$d(x,y) = \sum_{i=1}^n \lvert x_i-y_i \rvert$|$sim(x,y) = \frac{x \cdot y}{\lVert x \rVert \cdot \lVert y \rVert}$; cosine distance $= 1 - sim(x,y)$|$x \cdot y = \sum_{i=1}^n x_i \cdot y_i$|$\lVert x \rVert = \sqrt{\sum_{i=1}^n x_i^2}$|

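A minimal Python sketch of the metrics on plain lists. It defines cosine distance as $1 - $ cosine similarity, which is one common convention and an assumption here; the function names are illustrative.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def magnitude(x):
    return math.sqrt(dot(x, x))

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def cosine_similarity(x, y):
    return dot(x, y) / (magnitude(x) * magnitude(y))

def cosine_distance(x, y):
    # Assumed convention: distance = 1 - similarity
    return 1 - cosine_similarity(x, y)

x, y = [0.0, 3.0], [4.0, 0.0]
print(euclidean(x, y), manhattan(x, y), cosine_similarity(x, y))  # → 5.0 7.0 0.0
```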
## Clustering

|-|K-Means|DBSCAN|Hierarchical|
|:---|---|---|---|
|**Initial**|$k$ randomly picked centroids|Points with at least minPts points in their $\epsilon$-neighborhood are core points|Each point is its own cluster|
|**Iteration**|Assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points|-|Merge the two closest clusters|
|**Termination**|Centroids no longer change (or change only minimally)|Core points within each other's $\epsilon$-neighborhood (directly or via a chain of core points) form a cluster; non-core points with a core point in their $\epsilon$-neighborhood are assigned to the closest such cluster; all remaining points are noise|The target number of clusters is reached, or no clusters can be merged without exceeding the distance threshold|

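A minimal plain-Python sketch of the K-Means and DBSCAN columns, assuming points are tuples of floats and Euclidean distance. Two simplifications are assumptions of this sketch: K-Means stops only when the centroids are exactly unchanged (no tolerance), and DBSCAN counts a point as part of its own $\epsilon$-neighborhood and gives a border point to the first cluster that reaches it rather than strictly the closest one.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Initial: k randomly picked points become centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # New centroid = mean of the points assigned to it
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid of an empty cluster
        # Termination: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns a cluster id per point, or -1 for noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Grow a new cluster from core point i
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for q in neighbors[j]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border point joins the cluster
                    if core[q]:
                        stack.append(q)  # only core points expand the cluster further
        cluster += 1
    return labels

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
centroids, labels = kmeans(pts[:6], k=2)
print(centroids, labels)
print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```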
## Itemsets

### Metrics
|Support|Confidence|Lift|Conviction|
|---|---|---|---|
|$supp(X) = \frac{count(X)}{N}$|$conf(X \rightarrow Y) = \frac{supp(X \cup Y)}{supp(X)}$|$lift(X \rightarrow Y) = \frac{conf(X \rightarrow Y)}{supp(Y)} = \frac{supp(X \cup Y)}{supp(X)*supp(Y)}$|$conv(X \rightarrow Y) = \frac{1-supp(Y)}{1-conf(X \rightarrow Y)}$|
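|Support shows how frequent the itemset is (fraction of transactions containing all of $X$)|Confidence shows how likely $Y$ is bought if $X$ is bought|Lift shows how likely $Y$ is bought if $X$ is bought, relative to how popular $Y$ is ($>1$: positive correlation, $=1$: independent, $<1$: negative correlation)|Conviction compares how often $X$ would occur without $Y$ if $X$ and $Y$ were independent with how often it actually does; higher means the rule fails less often than by chance ($\infty$ if $conf = 1$)|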
### Properties
|Frequent|Closed|Maximal Frequent|
|---|---|---|
|Set has at least min support|All proper supersets have strictly lower support|Set is frequent and no superset is frequent (every maximal frequent set is also closed)|

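A minimal sketch of the itemset metrics and properties on a hypothetical four-transaction database with a hypothetical threshold `MIN_SUPP = 0.5`. Closedness and maximality are checked by brute force over all itemsets, which is fine for a toy example but not how Apriori-style miners scale. Conviction returns infinity when confidence is 1.

```python
from itertools import combinations

# Hypothetical toy transaction database
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
N = len(transactions)
MIN_SUPP = 0.5  # hypothetical minimum support

def supp(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / N

def conf(x, y):
    return supp(set(x) | set(y)) / supp(x)

def lift(x, y):
    return conf(x, y) / supp(y)

def conv(x, y):
    c = conf(x, y)
    return float("inf") if c == 1 else (1 - supp(y)) / (1 - c)

items = sorted(set().union(*transactions))
all_sets = [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]

def is_frequent(s):
    return supp(s) >= MIN_SUPP

def is_closed(s):
    # Every proper superset has strictly lower support
    return all(supp(t) < supp(s) for t in all_sets if s < t)

def is_maximal(s):
    return is_frequent(s) and not any(is_frequent(t) for t in all_sets if s < t)

for s in all_sets:
    print(sorted(s), supp(s), is_frequent(s), is_closed(s), is_maximal(s))
print(conf({"bread"}, {"milk"}), lift({"bread"}, {"milk"}), conv({"bread"}, {"milk"}))
```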
## Petri Nets

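|Situation|Behavior|
|---|---|
|**Transition fires: output places**||
|All output places are empty|Every output place gets +1 token; produced += number of output places|
|Some output places already hold a token|Same as above: every output place gets +1 token (a place can hold several tokens); produced += number of output places|
|**Transition fires: input places**||
|All input places hold a token|Every input place loses 1 token; consumed += number of input places|
|Some input places are empty|Create a missing token on each empty input place, then every input place loses 1 token; consumed += number of input places, missing += number of created tokens; produced is not affected|
|**Trace boundaries**||
|Start of trace|Every start place gets +1 token; produced += number of start places|
|End of trace|Every end place loses 1 token; consumed += number of end places, missing += number of end places that were empty|
|Tokens remaining after the trace|remaining += number of tokens left in the net|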

### Fitness
$fitness = \frac{1}{2}(1-\frac{missing}{consumed})+\frac{1}{2}(1-\frac{remaining}{produced})$
### Rules
$remaining = produced + missing - consumed$

$produced + missing \geq consumed \geq missing$ 

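A minimal token-based replay sketch of the rules above, assuming a single start place and a single end place, unique transition labels, and no silent transitions. The net (an AND-split/AND-join) and the trace are hypothetical; the net maps each transition label to its input and output places.

```python
from collections import Counter

def token_replay(net, start, end, trace):
    """Replay one trace; returns (produced, consumed, missing, remaining)."""
    marking = Counter({start: 1})  # start of trace: +1 token on the start place
    p, c, m = 1, 0, 0
    for label in trace:
        inputs, outputs = net[label]
        for place in inputs:
            if marking[place] == 0:
                marking[place] += 1  # create the missing token
                m += 1
            marking[place] -= 1
            c += 1
        for place in outputs:
            marking[place] += 1
            p += 1
    # End of trace: consume one token from the end place (creating it if missing)
    if marking[end] == 0:
        marking[end] += 1
        m += 1
    marking[end] -= 1
    c += 1
    r = sum(marking.values())  # tokens left anywhere in the net
    return p, c, m, r

def fitness(p, c, m, r):
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)

net = {
    "a": (["start"], ["p1", "p2"]),  # AND-split
    "b": (["p1"], ["p3"]),
    "c": (["p2"], ["p4"]),
    "d": (["p3", "p4"], ["end"]),    # AND-join
}
p, c, m, r = token_replay(net, "start", "end", ["a", "b", "d"])  # "c" is skipped
print(p, c, m, r, fitness(p, c, m, r))
```

Skipping `c` leaves one token behind on `p2` (remaining) and forces one missing token on `p4`, so both penalty terms of the fitness formula are non-zero.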
## Inductive Miner

Always choose the cut with the maximal number of partitions.
### Exclusive Choice Cut $\times$
No edges between any two partitions.

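A minimal sketch: with no edges allowed between partitions and the number of partitions maximized, the exclusive-choice partitions are the connected components of the directly-follows graph with edge directions ignored. The helper names `dfg` and `xor_cut` and the log are hypothetical.

```python
from collections import defaultdict

def dfg(log):
    """Directly-follows edges (a, b) from a list of traces."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def xor_cut(activities, edges):
    """Connected components of the DFG, ignoring edge direction."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for start in sorted(activities):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        parts.append(comp)
    return parts  # a valid exclusive-choice cut only if there are 2+ parts

log = [["a", "b"], ["c", "d"], ["a", "b"]]
acts = {x for t in log for x in t}
print(xor_cut(acts, dfg(log)))
```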
### Sequence Cut $\rightarrow$
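Partitions can be ordered so that every node in an earlier partition can reach every node in each later partition, and no node in a later partition can reach a node in an earlier one (no cycles between partitions).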

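A minimal sketch that checks the sequence-cut condition for partitions that are already given in order (finding the partitions is not shown). Reachability means a directly-follows path of length at least 1; the helper names and the example graph are hypothetical.

```python
def reachable(edges):
    """Map each node to the set of nodes reachable via a path of length >= 1."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    reach = {}
    for n in {x for e in edges for x in e}:
        seen, stack = set(), list(succ.get(n, ()))
        while stack:
            m = stack.pop()
            if m not in seen:
                seen.add(m)
                stack.extend(succ.get(m, ()))
        reach[n] = seen
    return reach

def is_sequence_cut(partitions, edges):
    reach = reachable(edges)
    for i, earlier in enumerate(partitions):
        for later in partitions[i + 1:]:
            for a in earlier:
                for b in later:
                    # a must reach b, and b must never reach back to a
                    if b not in reach.get(a, set()) or a in reach.get(b, set()):
                        return False
    return True

edges = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
print(is_sequence_cut([{"a"}, {"b", "c"}, {"d"}], edges))  # → True
print(is_sequence_cut([{"a"}, {"d"}, {"b", "c"}], edges))  # → False
```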
### Parallel Cut $\land$
Each partition contains at least one start and one end activity. Every node in partition A has a direct edge to every node in partition B and vice versa.

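A minimal sketch that checks the parallel-cut condition for given partitions of a directly-follows graph; `starts` and `ends` are the start and end activities of the log. The function name and the example log ([a, b] and [b, a]) are hypothetical.

```python
def is_parallel_cut(partitions, edges, starts, ends):
    """Check the parallel-cut conditions for given partitions of a DFG."""
    for part in partitions:
        # Each partition needs at least one start and one end activity
        if not (part & starts) or not (part & ends):
            return False
    for i, p in enumerate(partitions):
        for q in partitions[i + 1:]:
            for a in p:
                for b in q:
                    # Every cross-partition pair must be directly connected both ways
                    if (a, b) not in edges or (b, a) not in edges:
                        return False
    return True

# Log [a, b] and [b, a]: a and b occur in parallel
edges = {("a", "b"), ("b", "a")}
starts, ends = {"a", "b"}, {"a", "b"}
print(is_parallel_cut([{"a"}, {"b"}], edges, starts, ends))  # → True
```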
### Loop Cut $\circlearrowleft$
All start and end nodes are in the same partition (the loop body). Every outgoing edge from that partition comes from an end node, every incoming edge goes to a start node, and there are no edges between the other (redo) partitions.
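
A minimal sketch that checks the loop-cut condition, assuming `partitions[0]` is the loop body, the remaining partitions are redo parts, and the partitions cover every activity in the graph. The function name and the example log ([a, b] and [a, b, c, a, b]) are hypothetical.

```python
def is_loop_cut(partitions, edges, starts, ends):
    """partitions[0] is the loop body; the others are redo partitions."""
    body = partitions[0]
    if not (starts | ends) <= body:
        return False  # all start and end activities must be in the body
    part_of = {n: i for i, part in enumerate(partitions) for n in part}
    for a, b in edges:
        pa, pb = part_of[a], part_of[b]
        if pa == pb:
            continue
        if pa == 0 and a not in ends:
            return False  # edges leaving the body must start at an end activity
        if pb == 0 and b not in starts:
            return False  # edges entering the body must end at a start activity
        if pa != 0 and pb != 0:
            return False  # no edges between different redo partitions
    return True

# c is executed between two iterations of a -> b
edges = {("a", "b"), ("b", "c"), ("c", "a")}
print(is_loop_cut([{"a", "b"}, {"c"}], edges, starts={"a"}, ends={"b"}))  # → True
```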