# Anomaly Detection with Isolation Forest - Manual Exploration

This notebook demonstrates anomaly detection with Isolation Forest on static code analysis data that was gathered with jQAssistant and is queried from Neo4j. The goal is to detect anomalous code units, which can point to potential issues or areas for improvement in the codebase.

<br>  

### References
- [jQAssistant](https://jqassistant.org)
- [Neo4j Python Driver](https://neo4j.com/docs/api/python-driver/current)

## Features overview

| **Feature**                      | **Type**           | **What it Measures**                        | **Why It’s Useful**                         |
| -------------------------------- | ------------------ | ------------------------------------------- | ------------------------------------------- |
| `PageRank`                       | Centrality         | Popularity / referenced code                | High = many dependents                      |
| `ArticleRank`                    | Centrality         | How much the code depends on others         | High = high dependency                      |
| `PageRank - ArticleRank`         | Relative Rank      | Role inversion / architectural layering     | Highlights mismatches                       |
| `Betweenness Centrality`         | Centrality         | Bridge or control nodes                     | High = structural chokepoints               |
| `Local Clustering Coefficient`   | Structural         | Local cohesion / modularity                 | Low = isolated node in a clique-like region |
| `Degree` (Total and In/Out)      | Structural         | Connectivity                                | Raw values may dominate                     |
| `Node Embedding` (PCA reduced)   | Latent             | Structural and semantic similarity          | Captures latent position in graph           |
| `Normalized Cluster Distance`    | Geometric          | Relative to cluster radius                  | Adds context to position                    |
| `1.0 - HDBSCAN membership probability` | Cluster Confidence | Inverted confidence (1 - p) of the HDBSCAN cluster assignment | High score = likely anomaly |
| `Average Cluster Radius`         | Cluster Context    | How tight or spread out the cluster is      | Widely spread clusters may be less meaningful |
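
The two derived features in the table (`PageRank - ArticleRank` and `1.0 - HDBSCAN membership probability`) are plain arithmetic on other columns. The following is a minimal sketch with made-up values; the column name `hdbscanProbability` and all numbers are assumptions for illustration only and do not come from a real graph.

```python
import pandas as pd

# Hypothetical values for three code units (illustration only).
toy_features = pd.DataFrame({
    "codeUnitName": ["core", "api", "util"],
    "pageRank": [2.0, 0.5, 1.0],
    "articleRank": [0.5, 1.5, 1.0],
    "hdbscanProbability": [1.0, 0.25, 0.75],
})

# Relative rank: the sign shows which of the two centrality scores dominates.
toy_features["pageToArticleRankDifference"] = (
    toy_features["pageRank"] - toy_features["articleRank"]
)

# Invert the membership probability so that higher values mean "more likely an outlier".
toy_features["clusterApproximateOutlierScore"] = 1.0 - toy_features["hdbscanProbability"]

print(toy_features)
```

In the notebook itself both values are computed directly in the Cypher query, so no extra pandas step is needed there.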



In [None]:
import typing
import numpy.typing as numpy_typing

import os
from IPython.display import display

import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest, RandomForestClassifier

import matplotlib.pyplot as plot

In [None]:
# The following cell uses the built-in %%html "magic" to override the CSS style for tables to a much smaller font size.
# This is especially needed for the PDF export of tables with many columns.

In [None]:
%%html
<style>
/* CSS style for smaller dataframe tables. */
.dataframe th {
    font-size: 8px;
}
.dataframe td {
    font-size: 8px;
}
</style>

In [None]:
# Main Colormap
# main_color_map = 'nipy_spectral'
main_color_map = 'viridis'

In [None]:
from sys import version as python_version
print('Python version: {}'.format(python_version))

from numpy import __version__ as numpy_version
print('numpy version: {}'.format(numpy_version))

from pandas import __version__ as pandas_version
print('pandas version: {}'.format(pandas_version))

from sklearn import __version__ as sklearn_version
print('sklearn version: {}'.format(sklearn_version))

from matplotlib import __version__ as matplotlib_version
print('matplotlib version: {}'.format(matplotlib_version))

from neo4j import __version__ as neo4j_version
print('neo4j version: {}'.format(neo4j_version))

In [None]:
# Please set the environment variable "NEO4J_INITIAL_PASSWORD" in your shell 
# before starting jupyter notebook to provide the password for the user "neo4j". 
# It is not recommended to hardcode the password into jupyter notebook for security reasons.
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    uri="bolt://localhost:7687", 
    auth=("neo4j", os.environ.get("NEO4J_INITIAL_PASSWORD"))
)
driver.verify_connectivity()

In [None]:
def query_cypher_to_data_frame(query: typing.LiteralString, parameters: typing.Optional[typing.Dict[str, typing.Any]] = None):
    records, summary, keys = driver.execute_query(query, parameters_=parameters)
    return pd.DataFrame([record.values() for record in records], columns=keys)

In [None]:
plot_annotation_style: dict = {
    'textcoords': 'offset points',
    'arrowprops': dict(arrowstyle='->', color='black', alpha=0.3),
    'fontsize': 6,
    'backgroundcolor': 'white',
    'bbox': dict(boxstyle='round,pad=0.4',
                    edgecolor='silver',
                    facecolor='whitesmoke',
                    alpha=1
                )
}

## 1. Java Packages

### 1.1 Query Features

Query all features that are relevant for anomaly detection. They come from precalculated clustering (HDBSCAN), node embeddings (Fast Random Projection), community detection algorithms (Leiden, Local Clustering Coefficient), centrality algorithms (PageRank, ArticleRank, Betweenness) and classical metrics like the in-/out-degree.

In [None]:
java_package_anomaly_detection_features_query = """
    MATCH (artifact:Java:Artifact)-[:CONTAINS]->(codeUnit:Java:Package)
    WHERE codeUnit.incomingDependencies                              IS NOT NULL
      AND codeUnit.outgoingDependencies                              IS NOT NULL
      AND codeUnit.embeddingsFastRandomProjectionTunedForClustering  IS NOT NULL
      AND codeUnit.centralityPageRank                                IS NOT NULL
      AND codeUnit.centralityArticleRank                             IS NOT NULL
      AND codeUnit.centralityBetweenness                             IS NOT NULL
      AND codeUnit.communityLocalClusteringCoefficient               IS NOT NULL
      AND codeUnit.clusteringHDBSCANProbability                      IS NOT NULL
      AND codeUnit.clusteringHDBSCANNoise                            IS NOT NULL
      AND codeUnit.clusteringHDBSCANMedoid                           IS NOT NULL
      AND codeUnit.clusteringHDBSCANRadiusAverage                    IS NOT NULL
      AND codeUnit.clusteringHDBSCANNormalizedDistanceToMedoid       IS NOT NULL
      AND codeUnit.clusteringHDBSCANSize                             IS NOT NULL
      AND codeUnit.clusteringHDBSCANLabel                            IS NOT NULL
      AND codeUnit.embeddingFastRandomProjectionVisualizationX       IS NOT NULL
      AND codeUnit.embeddingFastRandomProjectionVisualizationY       IS NOT NULL
   RETURN DISTINCT 
         codeUnit.fqn                                                  AS codeUnitName
        ,codeUnit.name                                                 AS shortCodeUnitName
        ,artifact.name                                                 AS projectName
        ,codeUnit.incomingDependencies                                 AS incomingDependencies
        ,codeUnit.outgoingDependencies                                 AS outgoingDependencies
        ,codeUnit.incomingDependencies + codeUnit.outgoingDependencies AS degree
        ,codeUnit.embeddingsFastRandomProjectionTunedForClustering     AS embedding
        ,codeUnit.centralityPageRank                                   AS pageRank
        ,codeUnit.centralityArticleRank                                AS articleRank
        ,codeUnit.centralityPageRank - codeUnit.centralityArticleRank  AS pageToArticleRankDifference
        ,codeUnit.centralityBetweenness                                AS betweenness
        ,codeUnit.communityLocalClusteringCoefficient                  AS localClusteringCoefficient
        ,1.0 - codeUnit.clusteringHDBSCANProbability                   AS clusterApproximateOutlierScore
        ,codeUnit.clusteringHDBSCANNoise                               AS clusterNoise
        ,codeUnit.clusteringHDBSCANRadiusAverage                       AS clusterRadiusAverage
        ,codeUnit.clusteringHDBSCANNormalizedDistanceToMedoid          AS clusterDistanceToMedoid
        ,codeUnit.clusteringHDBSCANSize                                AS clusterSize
        ,codeUnit.clusteringHDBSCANLabel                               AS clusterLabel
        ,codeUnit.clusteringHDBSCANMedoid                              AS clusterMedoid
        ,codeUnit.embeddingFastRandomProjectionVisualizationX          AS embeddingVisualizationX
        ,codeUnit.embeddingFastRandomProjectionVisualizationY          AS embeddingVisualizationY
"""

java_package_anomaly_detection_features = query_cypher_to_data_frame(java_package_anomaly_detection_features_query)
java_package_features_to_standardize = java_package_anomaly_detection_features.columns.drop(['codeUnitName', 'shortCodeUnitName', 'projectName', 'embedding', 'clusterLabel', 'clusterSize', 'clusterMedoid', 'embeddingVisualizationX', 'embeddingVisualizationY']).to_list()

display(java_package_anomaly_detection_features.head(5))

### 1.2 Data preparation

Prepare the data by standardizing the numeric fields and by reducing the dimensionality of the node embeddings so that they do not dominate the results.
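
Both preparation steps can be tried in isolation on synthetic data. The sketch below is only an illustration under stated assumptions: the metric values and the 32-dimensional "embeddings" are randomly generated, and the 0.90 variance target mirrors the default used later in this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

random_generator = np.random.default_rng(seed=0)

# Hypothetical metrics on very different scales (e.g. a degree and a centrality score).
raw_metrics = np.column_stack([
    random_generator.normal(loc=1000.0, scale=300.0, size=200),
    random_generator.normal(loc=0.01, scale=0.005, size=200),
])

# After standardization every column has mean 0 and standard deviation 1,
# so no metric dominates just because of its unit or scale.
standardized_metrics = StandardScaler().fit_transform(raw_metrics)

# Hypothetical embeddings: 32 dimensions driven by only 3 latent factors plus a little noise.
latent_factors = random_generator.normal(size=(200, 3))
mixing_matrix = random_generator.normal(size=(3, 32))
toy_embeddings = latent_factors @ mixing_matrix + 0.01 * random_generator.normal(size=(200, 32))

# Smallest number of principal components whose cumulative explained variance reaches 90%.
cumulative_variance = np.cumsum(PCA().fit(toy_embeddings).explained_variance_ratio_)
components_for_target = int(np.searchsorted(cumulative_variance, 0.90)) + 1

reduced_embeddings = PCA(n_components=components_for_target).fit_transform(toy_embeddings)
print(f"Components needed: {components_for_target}, reduced shape: {reduced_embeddings.shape}")
```

Because the synthetic embeddings only contain three meaningful directions, at most three components are needed here. Real node embeddings usually need more, which is why the notebook additionally enforces a minimum and a maximum number of dimensions.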

In [None]:
def validate_data(features: pd.DataFrame) -> None:
    if features.empty:
        print("Data Validation Info: No data")

    if features.isnull().values.any():
        raise RuntimeError("Data Validation Error: Some values are null. Fix the wrong values or filter them out.")

In [None]:
validate_data(java_package_anomaly_detection_features)

In [None]:
def standardize_features(features: pd.DataFrame, feature_list: list[str]) -> numpy_typing.NDArray:
    features_to_scale = features[feature_list]
    scaler = StandardScaler()
    return scaler.fit_transform(features_to_scale)

In [None]:
java_package_anomaly_detection_features_standardized = standardize_features(java_package_anomaly_detection_features, java_package_features_to_standardize)

In [None]:
def reduce_dimensionality_of_node_embeddings(
        features: pd.DataFrame, 
        min_dimensions: int = 20, 
        max_dimensions: int = 40, 
        target_variance: float = 0.90,
        embedding_column_name: str = 'embedding'
) -> numpy_typing.NDArray:
    """
    Automatically reduce the dimensionality of node embeddings using Principal Component Analysis (PCA)
    to reach a target explained variance ratio with the lowest possible number of components (output dimensions).

    Parameters:
    - features (pd.DataFrame) with a column 'embedding', where every value contains a float array with original dimensions.
    - min_dimensions: Even if possible with the given variance, don't go below this number of dimensions for the output
    - max_dimensions: Return at most the max number of dimensions, even if that means, that the target variance can't be met.
    - target_variance (float): Cumulative variance threshold (default: 0.90)
    - embedding_column_name (string): Defaults to 'embedding'

    Returns: Reduced embeddings as a NumPy array
    """

    # Convert the input and get the original dimension
    embeddings = np.stack(features[embedding_column_name].apply(np.array).tolist())
    original_dimension = embeddings.shape[1]

    # Fit PCA without dimensionality reduction to get explained variance
    full_principal_component_analysis_without_reduction = PCA()
    full_principal_component_analysis_without_reduction.fit(embeddings)

    # Find smallest number of components to reach target variance
    cumulative_variance = np.cumsum(full_principal_component_analysis_without_reduction.explained_variance_ratio_)
    best_n_components = int(np.searchsorted(cumulative_variance, target_variance)) + 1
    best_n_components = max(best_n_components, min_dimensions) # Use at least min_dimensions
    best_n_components = min(best_n_components, max_dimensions) # Use at most max_dimensions
    best_n_components = min(best_n_components, *embeddings.shape) # PCA can't return more components than samples or original dimensions

    # Apply PCA with optimal number of components
    principal_component_analysis = PCA(n_components=best_n_components)
    reduced_node_embeddings = principal_component_analysis.fit_transform(embeddings)

    explained_variance_ratio_sum = sum(principal_component_analysis.explained_variance_ratio_)
    print(f"Dimensionality reduction from {original_dimension} to {best_n_components} (min {min_dimensions}) of node embeddings using Principal Component Analysis (PCA): Explained variance is {explained_variance_ratio_sum:.4f}.")

    return reduced_node_embeddings
    

In [None]:
java_package_anomaly_detection_node_embeddings_reduced = reduce_dimensionality_of_node_embeddings(java_package_anomaly_detection_features)

In [None]:
java_package_anomaly_detection_features_prepared = np.hstack([java_package_anomaly_detection_features_standardized, java_package_anomaly_detection_node_embeddings_reduced])
java_package_anomaly_detection_feature_names = list(java_package_features_to_standardize) + [f'pca_{i}' for i in range(java_package_anomaly_detection_node_embeddings_reduced.shape[1])]

### 1.3 List the top 10 anomalies found using Isolation Forest

> The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.
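
The sign conventions of scikit-learn's `IsolationForest` are easy to mix up: `fit_predict` returns `-1` for outliers and `1` for inliers, and `decision_function` returns lower values for more abnormal samples. The following sketch shows both on synthetic data; the points are made up for illustration, and the hyperparameters simply mirror the ones used in this notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

random_generator = np.random.default_rng(seed=42)

# 200 hypothetical points around the origin plus one point far away from all others.
inliers = random_generator.normal(loc=0.0, scale=1.0, size=(200, 2))
far_away_point = np.array([[10.0, 10.0]])
toy_points = np.vstack([inliers, far_away_point])

isolation_forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)

# fit_predict: -1 = outlier, 1 = inlier
predictions = isolation_forest.fit_predict(toy_points)

# decision_function: the lower the value, the more abnormal the sample.
decision_scores = isolation_forest.decision_function(toy_points)

# Flip the sign so that a higher score means "more anomalous".
anomaly_scores = -decision_scores
most_anomalous_index = int(np.argmax(anomaly_scores))

print(f"Label of the far away point: {predictions[-1]}")
print(f"Index of the most anomalous point: {most_anomalous_index} (far away point has index {len(toy_points) - 1})")
```

Note that `contamination=0.05` makes the model flag roughly 5% of the samples regardless of how unusual they are, so the label alone says little without the score.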

In [None]:
def detect_anomalies(
        prepared_features: numpy_typing.NDArray, 
        original_features: pd.DataFrame,
        anomaly_label_column: str = 'anomalyLabel',
        anomaly_score_column: str = 'anomalyScore',
) -> pd.DataFrame:
    isolation_forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
    anomaly_prediction = isolation_forest.fit_predict(prepared_features) # -1 = anomaly, 1 = no anomaly

    original_features[anomaly_label_column] = anomaly_prediction * -1 # Flip the sign: 1 = anomaly, -1 = no anomaly
    original_features[anomaly_score_column] = isolation_forest.decision_function(prepared_features) * -1  # higher = more anomalous
    return original_features

In [None]:
java_package_anomaly_detection_features = detect_anomalies(java_package_anomaly_detection_features_prepared, java_package_anomaly_detection_features)

In [None]:
def get_top_10_anomalies(
        anomaly_detected_features: pd.DataFrame, 
        anomaly_label_column: str = "anomalyLabel",
        anomaly_score_column: str = "anomalyScore"
) -> pd.DataFrame:
    # The label column contains 1 for anomalies (the sign is flipped in detect_anomalies)
    anomalies = anomaly_detected_features[anomaly_detected_features[anomaly_label_column] == 1]
    return anomalies.sort_values(by=anomaly_score_column, ascending=False).reset_index(drop=True).head(10)

In [None]:
display(get_top_10_anomalies(java_package_anomaly_detection_features))

### 1.4 Plot the 20 most influential features

Use Random Forest as a proxy to estimate the importance of each feature contributing to the anomaly score.
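
The proxy idea can be checked on synthetic data where the anomalous feature is known in advance. In the sketch below, only the first feature carries extreme values, so a Random Forest trained on the Isolation Forest labels is expected to attribute most of the importance to it. The data and the name `toy_features` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

random_generator = np.random.default_rng(seed=7)

# Hypothetical data: three noise features, where the first 15 rows
# get extreme, widely spread values in feature 0 only.
toy_features = random_generator.normal(size=(300, 3))
toy_features[:15, 0] = random_generator.uniform(20.0, 200.0, size=15)

# Step 1: unsupervised anomaly labels (-1 = anomaly, 1 = no anomaly)
isolation_forest = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
labels = isolation_forest.fit_predict(toy_features)

# Step 2: use the labels as "pseudo ground truth" (1 = anomaly, 0 = no anomaly)
y_pseudo = (labels == -1).astype(int)

# Step 3: fit a supervised proxy model and read its feature importances
proxy_random_forest = RandomForestClassifier(n_estimators=100, random_state=42)
proxy_random_forest.fit(toy_features, y_pseudo)
importances = proxy_random_forest.feature_importances_

for feature_index, importance in enumerate(importances):
    print(f"feature_{feature_index}: {importance:.3f}")
```

The importances describe how well each feature separates the labeled anomalies from the rest. They approximate the Isolation Forest's behavior but do not explain it exactly.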

In [None]:
def get_feature_importances(
        anomaly_detected_features: pd.DataFrame, 
        prepared_features: numpy_typing.NDArray,
        anomaly_label_column: str = "anomalyLabel",
) -> numpy_typing.NDArray:
    """
    Use Random Forest as a proxy model to find out which are the most important features for the anomaly detection model (Isolation Forest).
    This helps to see if embedding components dominate (top 10 filled with them), and then tune accordingly.
    """
    # Use IsolationForest labels as a "pseudo ground truth" (1 = anomaly after the sign flip in detect_anomalies)
    y_pseudo = (anomaly_detected_features[anomaly_label_column] == 1).astype(int)

    # Fit classifier to match the IF model
    proxy_random_forest = RandomForestClassifier(n_estimators=100, random_state=42)
    proxy_random_forest.fit(prepared_features, y_pseudo)

    return proxy_random_forest.feature_importances_

In [None]:
java_package_anomaly_detection_importances = get_feature_importances(java_package_anomaly_detection_features, java_package_anomaly_detection_features_prepared)
java_package_anomaly_detection_importances_series = pd.Series(java_package_anomaly_detection_importances, index=java_package_anomaly_detection_feature_names).sort_values(ascending=False)
#display(java_package_anomaly_detection_importances_series.head(10))

In [None]:
def plot_feature_importances(feature_importances_series: pd.Series, title_prefix: str) -> None:
    feature_importances_series.head(20).plot(
        kind='barh',
        figsize=(10, 6),
        color='skyblue',
        title=f"{title_prefix}: Top 20 Feature Importances (Random Forest Proxy)",
        xlabel="Importance"
    )
    plot.gca().invert_yaxis() # Most important feature at the top
    plot.tight_layout()
    plot.show()

In [None]:
plot_feature_importances(java_package_anomaly_detection_importances_series, title_prefix='Java Packages')

### 1.5 Plot anomalies

Plots clustered nodes and highlights anomalies.

In [None]:
def plot_anomalies(
    clustering_visualization_dataframe: pd.DataFrame,
    title_prefix: str,
    code_unit_column: str = "shortCodeUnitName",
    cluster_label_column: str = "clusterLabel",
    cluster_medoid_column: str = "clusterMedoid",
    cluster_size_column: str = "clusterSize",
    anomaly_label_column: str = "anomalyLabel",
    anomaly_score_column: str = "anomalyScore",
    page_rank_column: str = "pageRank",
    x_position_column: str = 'embeddingVisualizationX',
    y_position_column: str = 'embeddingVisualizationY',
) -> None:
    
    if clustering_visualization_dataframe.empty:
        print("No projected data available to plot")
        return
    
    def truncate(text: str, max_length: int):
        if len(text) <= max_length:
            return text
        return text[:max_length - 3] + "..."
    
    cluster_anomalies = clustering_visualization_dataframe[clustering_visualization_dataframe[anomaly_label_column] == 1]
    cluster_without_anomalies = clustering_visualization_dataframe[clustering_visualization_dataframe[anomaly_label_column] != 1]
    cluster_noise = cluster_without_anomalies[cluster_without_anomalies[cluster_label_column] == -1]
    cluster_non_noise = cluster_without_anomalies[cluster_without_anomalies[cluster_label_column] != -1]

    plot.figure(figsize=(10, 10))
    plot.title(title_prefix + ' (size=PageRank, color=ClusterLabel, red=Anomaly)')

    # Plot noise
    plot.scatter(
        x=cluster_noise[x_position_column],
        y=cluster_noise[y_position_column],
        s=cluster_noise[page_rank_column] * 200 + 4,
        color='lightgrey',
        alpha=0.5,
        label='Noise'
    )

    # Plot clusters
    plot.scatter(
        x=cluster_non_noise[x_position_column],
        y=cluster_non_noise[y_position_column],
        s=cluster_non_noise[page_rank_column] * 200 + 4,
        c=cluster_non_noise[cluster_label_column],
        cmap='tab20',
        alpha=0.7,
        label='Clusters'
    )

    # Plot anomalies
    plot.scatter(
        x=cluster_anomalies[x_position_column],
        y=cluster_anomalies[y_position_column],
        s=cluster_anomalies[page_rank_column] * 200 + 4,
        c=cluster_anomalies[anomaly_score_column],
        cmap="Reds",
        alpha=0.9,
        label='Anomaly'
    )

    # Annotate medoids of the cluster
    cluster_medoids = cluster_non_noise[cluster_non_noise[cluster_medoid_column] == 1].sort_values(by=cluster_size_column, ascending=False).head(20)
    for index, row in cluster_medoids.iterrows():
        plot.annotate(
            text=f"{row[cluster_label_column]}:{truncate(row[code_unit_column], 20)} ({row[anomaly_score_column]:.4f})",
            xy=(row[x_position_column], row[y_position_column]),
            xytext=(5, 5),
            alpha=0.4,
            **plot_annotation_style
        )

    anomalies = cluster_anomalies.sort_values(by=anomaly_score_column, ascending=False).reset_index(drop=True).head(6)
    for dataframe_index, row in anomalies.iterrows():
        index = typing.cast(int, dataframe_index)
        plot.annotate(
            text=f"{row[cluster_label_column]}:{truncate(row[code_unit_column], 20)} ({row[anomaly_score_column]:.4f})",
            xy=(row[x_position_column], row[y_position_column]),
            xytext=(5, 5 + (index % 5) * 10),
            color='red',
            **plot_annotation_style
        )

    plot.show()

In [None]:
plot_anomalies(java_package_anomaly_detection_features, title_prefix="Java Package Anomalies")

## 2. Java Types

### 2.1 Query Features

Query all features that are relevant for anomaly detection. They come from precalculated clustering (HDBSCAN), node embeddings (Fast Random Projection), community detection algorithms (Leiden, Local Clustering Coefficient), centrality algorithms (PageRank, ArticleRank, Betweenness) and classical metrics like the in-/out-degree.


In [None]:
java_type_anomaly_detection_features_query = """
    MATCH (artifact:Java:Artifact)-[:CONTAINS]->(codeUnit:Java:Type)
    WHERE codeUnit.incomingDependencies                              IS NOT NULL
      AND codeUnit.outgoingDependencies                              IS NOT NULL
      AND codeUnit.embeddingsFastRandomProjectionTunedForClustering  IS NOT NULL
      AND codeUnit.centralityPageRank                                IS NOT NULL
      AND codeUnit.centralityArticleRank                             IS NOT NULL
      AND codeUnit.centralityBetweenness                             IS NOT NULL
      AND codeUnit.communityLocalClusteringCoefficient               IS NOT NULL
      AND codeUnit.clusteringHDBSCANProbability                      IS NOT NULL
      AND codeUnit.clusteringHDBSCANNoise                            IS NOT NULL
      AND codeUnit.clusteringHDBSCANMedoid                           IS NOT NULL
      AND codeUnit.clusteringHDBSCANRadiusAverage                    IS NOT NULL
      AND codeUnit.clusteringHDBSCANNormalizedDistanceToMedoid       IS NOT NULL
      AND codeUnit.clusteringHDBSCANLabel                            IS NOT NULL
      AND codeUnit.clusteringHDBSCANSize                             IS NOT NULL
      AND codeUnit.embeddingFastRandomProjectionVisualizationX       IS NOT NULL
      AND codeUnit.embeddingFastRandomProjectionVisualizationY       IS NOT NULL
   RETURN DISTINCT 
         codeUnit.fqn                                                  AS codeUnitName
        ,codeUnit.name                                                 AS shortCodeUnitName
        ,artifact.name                                                 AS projectName
        ,codeUnit.incomingDependencies                                 AS incomingDependencies
        ,codeUnit.outgoingDependencies                                 AS outgoingDependencies
        ,codeUnit.incomingDependencies + codeUnit.outgoingDependencies AS degree
        ,codeUnit.embeddingsFastRandomProjectionTunedForClustering     AS embedding
        ,codeUnit.centralityPageRank                                   AS pageRank
        ,codeUnit.centralityArticleRank                                AS articleRank
        ,codeUnit.centralityPageRank - codeUnit.centralityArticleRank  AS pageToArticleRankDifference
        ,codeUnit.centralityBetweenness                                AS betweenness
        ,codeUnit.communityLocalClusteringCoefficient                  AS localClusteringCoefficient
        ,1.0 - codeUnit.clusteringHDBSCANProbability                   AS clusterApproximateOutlierScore
        ,codeUnit.clusteringHDBSCANNoise                               AS clusterNoise
        ,codeUnit.clusteringHDBSCANRadiusAverage                       AS clusterRadiusAverage
        ,codeUnit.clusteringHDBSCANNormalizedDistanceToMedoid          AS clusterDistanceToMedoid
        ,codeUnit.clusteringHDBSCANLabel                               AS clusterLabel
        ,codeUnit.clusteringHDBSCANSize                                AS clusterSize
        ,codeUnit.clusteringHDBSCANMedoid                              AS clusterMedoid
        ,codeUnit.embeddingFastRandomProjectionVisualizationX          AS embeddingVisualizationX
        ,codeUnit.embeddingFastRandomProjectionVisualizationY          AS embeddingVisualizationY
"""

java_type_anomaly_detection_features = query_cypher_to_data_frame(java_type_anomaly_detection_features_query)
java_type_features_to_standardize = java_type_anomaly_detection_features.columns.drop(['codeUnitName', 'shortCodeUnitName', 'projectName', 'embedding', 'clusterLabel', 'clusterSize', 'clusterMedoid', 'embeddingVisualizationX', 'embeddingVisualizationY']).to_list()

display(java_type_anomaly_detection_features.head(5))

### 2.2 Data preparation

Prepare the data by standardizing the numeric fields and by reducing the dimensionality of the node embeddings so that they do not dominate the results.

In [None]:
validate_data(java_type_anomaly_detection_features)
java_type_anomaly_detection_features_standardized = standardize_features(java_type_anomaly_detection_features, java_type_features_to_standardize)
java_type_anomaly_detection_node_embeddings_reduced = reduce_dimensionality_of_node_embeddings(java_type_anomaly_detection_features)

java_type_anomaly_detection_features_prepared = np.hstack([java_type_anomaly_detection_features_standardized, java_type_anomaly_detection_node_embeddings_reduced])
java_type_anomaly_detection_feature_names = list(java_type_features_to_standardize) + [f'pca_{i}' for i in range(java_type_anomaly_detection_node_embeddings_reduced.shape[1])]

### 2.3 List the top 10 anomalies found using Isolation Forest

> The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

In [None]:
java_type_anomaly_detection_features = detect_anomalies(java_type_anomaly_detection_features_prepared, java_type_anomaly_detection_features)
display(get_top_10_anomalies(java_type_anomaly_detection_features))

### 2.4 Plot the 20 most influential features

Use Random Forest as a proxy to estimate the importance of each feature contributing to the anomaly score.

In [None]:
java_type_anomaly_detection_importances = get_feature_importances(java_type_anomaly_detection_features, java_type_anomaly_detection_features_prepared)
java_type_anomaly_detection_importances_series = pd.Series(java_type_anomaly_detection_importances, index=java_type_anomaly_detection_feature_names).sort_values(ascending=False)
#display(java_type_anomaly_detection_importances_series.head(10))

plot_feature_importances(java_type_anomaly_detection_importances_series, title_prefix='Java Types')

### 2.5 Plot anomalies

Plots clustered nodes and highlights anomalies.

In [None]:
plot_anomalies(java_type_anomaly_detection_features, title_prefix="Java Type Anomalies")