A1. Feature selection is the process of selecting a subset of relevant features from a given data set, in order to reduce the dimensionality, improve the performance, and simplify the interpretation of anomaly detection algorithms. Feature selection can help to remove noisy, redundant, or irrelevant features that might obscure the detection of anomalies, and focus on the most informative features that can better distinguish between normal and abnormal data points. Feature selection can also reduce the computational cost and complexity of anomaly detection algorithms, especially for high-dimensional data sets. Some common methods for feature selection include filter methods, wrapper methods, and embedded methods¹.
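The filter approach can be sketched with scikit-learn's VarianceThreshold plus a correlation check. This is a minimal illustration, not a recommended pipeline. The toy matrix, the 0.95 correlation cut-off and the helper drop_correlated are all hypothetical. The sketch drops a constant (uninformative) column first, then a column that duplicates an earlier one (redundant).

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

def drop_correlated(X, threshold=0.95):
    """Hypothetical greedy redundancy filter: keep a column only if its absolute
    Pearson correlation with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep

# Toy data: column 0 is constant, column 2 is column 1 multiplied by 2
X = np.array([
    [1.0, 0.5, 1.0, 3.0],
    [1.0, 1.5, 3.0, 1.0],
    [1.0, 2.0, 4.0, 4.0],
    [1.0, 3.0, 6.0, 2.0],
    [1.0, 4.0, 8.0, 5.0],
])

# Step 1: remove zero-variance features (they cannot separate anything)
vt = VarianceThreshold(threshold=0.0)
X_var = vt.fit_transform(X)
remaining = vt.get_support(indices=True)  # original indices that survived step 1

# Step 2: remove features that are redundant with an earlier kept feature
kept = drop_correlated(X_var)
selected = remaining[kept].tolist()
print(selected)  # → [1, 3]
```

A filter like this runs once, before any detector is trained. Wrapper and embedded methods instead use the detector itself to judge feature subsets.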

A2. Some common evaluation metrics for anomaly detection algorithms are:

- Precision: the proportion of detected anomalies that are true anomalies, calculated as TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.
- Recall: the proportion of true anomalies that are detected, calculated as TP / (TP + FN), where FN is the number of false negatives.
- F1-score: the harmonic mean of precision and recall, calculated as 2 * (precision * recall) / (precision + recall).
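- ROC curve: a plot of the true positive rate (recall) versus the false positive rate, FP / (FP + TN), as the detection threshold varies. The false positive rate equals 1 - specificity, where specificity is the proportion of normal data points that are correctly classified, calculated as TN / (TN + FP), and TN is the number of true negatives.
- AUC: the area under the ROC curve, which summarizes the performance of an anomaly detection algorithm across all thresholds. It equals the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. A higher AUC indicates better performance.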
- PR curve: a plot of precision versus recall, which shows the trade-off between these two metrics for different thresholds.
- AP: the average precision, which summarizes the PR curve as the mean of the precision values at each threshold, weighted by the increase in recall from the previous threshold. It approximates the area under the PR curve. A higher AP indicates better performance.

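These metrics can be computed with the sklearn.metrics module in Python, for example precision_score, recall_score, f1_score, roc_curve, roc_auc_score, precision_recall_curve, and average_precision_score².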
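The formulas above can be checked with a small worked example. The labels, the scores and the 0.5 cut-off are hypothetical. Each comment gives the value worked out by hand from the definitions. Precision, recall, F1 and specificity need hard predictions. ROC AUC and AP use the raw scores directly.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth (1 = anomaly, 0 = normal) and detector scores
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.05, 0.7, 0.9, 0.6, 0.2, 0.8])

# Threshold-dependent metrics need hard labels; 0.5 is an arbitrary cut-off
y_pred = (scores >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # 6, 1, 0, 3

precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 3 / 3
f1 = f1_score(y_true, y_pred)                 # 2PR / (P + R) = 6 / 7
specificity = tn / (tn + fp)                  # TN / (TN + FP) = 6 / 7

# Threshold-free metrics use the raw scores
auc = roc_auc_score(y_true, scores)           # 20 of 21 anomaly/normal pairs ranked correctly
ap = average_precision_score(y_true, scores)  # 1/3*1 + 1/3*1 + 1/3*(3/4) = 11 / 12

print(precision, recall, f1, specificity, auc, ap)
```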

A3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups data points based on their density, i.e., the number of points in their neighborhood. DBSCAN can find clusters of arbitrary shapes and sizes, and can also identify outliers that do not belong to any cluster. DBSCAN works as follows³:

- For each data point, DBSCAN defines a neighborhood of radius epsilon (eps), and counts the number of points within this neighborhood, including itself. This number is called the local density of the point.
- DBSCAN classifies each point as one of the following types:
    - Core point: a point that has at least a minimum number of points (min_samples) in its neighborhood. Core points are at the center of dense regions and form clusters.
    - Border point: a point that has fewer than min_samples points in its neighborhood but lies within eps of at least one core point. Border points are at the edge of dense regions and belong to the cluster of a core point that reaches them.
    - Noise point: a point that is neither a core point nor a border point. Noise points are isolated from dense regions and do not belong to any cluster.
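- DBSCAN forms clusters by linking core points that lie within eps of each other. Every chain of linked core points becomes one cluster. Each border point joins the cluster of a core point within eps of it. If core points from several clusters qualify, the assignment depends on the order in which points are processed. Noise points are left unassigned and reported as outliers.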
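The steps above can be sketched as a minimal O(n²) DBSCAN in NumPy. The function name dbscan and the toy points are illustrative assumptions. This is not scikit-learn's implementation, which uses spatial indexes. The sketch counts the point itself in its own neighborhood and uses distance <= eps. Only core points extend a cluster, and a border point keeps the first cluster that reaches it. The final lines cross-check core points, noise points and the number of clusters against sklearn.cluster.DBSCAN. Those three outputs do not depend on processing order.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

def dbscan(X, eps, min_samples):
    """Minimal O(n^2) DBSCAN. Returns (labels, is_core); label -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; each neighbourhood includes the point itself
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # every point starts as noise
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue  # only an unassigned core point can seed a new cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster  # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)  # only core points extend the cluster
        cluster += 1
    return labels, is_core

# Toy example: two tight groups and one isolated point
toy = [[0, 0], [0, 0.1], [0.1, 0], [0.1, 0.1], [5, 5], [5, 5.1], [5.1, 5], [10, 10]]
toy_labels, _ = dbscan(toy, eps=0.3, min_samples=3)
print(toy_labels.tolist())  # → [0, 0, 0, 0, 1, 1, 1, -1]

# Cross-check against scikit-learn on synthetic blobs
Xb, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)
labels, is_core = dbscan(Xb, eps=0.5, min_samples=5)
ref = DBSCAN(eps=0.5, min_samples=5).fit(Xb)
```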


A4. The epsilon parameter (eps) affects the performance of DBSCAN in detecting anomalies by determining the size of the neighborhood for each data point. A larger eps value means a larger neighborhood, which can result in more points being classified as core or border points, and fewer points being classified as noise points. This can lead to larger and fewer clusters, and lower sensitivity to outliers. A smaller eps value means a smaller neighborhood, which can result in more points being classified as noise points, and fewer points being classified as core or border points. This can lead to smaller and more clusters, and higher sensitivity to outliers. Therefore, choosing an appropriate eps value is crucial for the performance of DBSCAN, and it depends on the scale and distribution of the data.
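The effect of eps can be observed by sweeping it while min_samples stays fixed. The blob data and eps values below are arbitrary choices for illustration. With min_samples fixed, a larger eps can only add neighbors. The set of noise points therefore never grows as eps increases. The last part computes each point's distance to its min_samples-th neighbor. This is a common heuristic for choosing eps, not something prescribed by DBSCAN itself: eps is often picked near the "knee" of the sorted distances.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
min_samples = 5

noise_counts = []
for eps in (0.2, 0.5, 1.0, 2.0):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_noise = int(np.sum(labels == -1))
    n_clusters = len(set(labels.tolist()) - {-1})
    noise_counts.append(n_noise)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")

# Heuristic: sorted distance of each point to its min_samples-th neighbour.
# Querying the training set returns the point itself first (distance 0),
# which matches DBSCAN counting the point in its own neighbourhood.
dist, _ = NearestNeighbors(n_neighbors=min_samples).fit(X).kneighbors(X)
k_dist = np.sort(dist[:, -1])
print("50th/90th/99th percentile:", np.percentile(k_dist, [50, 90, 99]))
```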

A5. The differences between the core, border, and noise points in DBSCAN are:

- Core points are points that have at least min_samples points in their neighborhood of radius epsilon (eps). Core points are at the center of dense regions and form clusters. DBSCAN does not treat core points as anomalies, since they represent the normal behavior of the data.
- Border points are points that have fewer than min_samples points in their neighborhood but lie within eps of a core point. Border points are at the edge of dense regions and belong to the same cluster as that core point. DBSCAN does not treat border points as anomalies either. They are still part of the normal behavior of the data, even though they lie in sparser regions than core points.
- Noise points are points that are neither core points nor border points. Noise points are isolated from dense regions and do not belong to any cluster. DBSCAN treats noise points as anomalies, since they deviate from the normal behavior of the data.
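The three point types can be read directly off a fitted scikit-learn DBSCAN model. Core points are listed in core_sample_indices_, and noise points have label -1. Every other clustered point is a border point. The helper point_types and the 1-D toy data are hypothetical. The values eps=0.15 and min_samples=3 were chosen so that each type appears.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def point_types(X, eps, min_samples):
    """Hypothetical helper: label each point 'core', 'border' or 'noise'."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    types = np.full(len(X), "border", dtype=object)  # default: clustered but not core
    types[db.core_sample_indices_] = "core"          # >= min_samples points within eps
    types[db.labels_ == -1] = "noise"                # outside every cluster
    return types

# 1-D toy data: a short dense run of points plus one far-away point
X = np.array([[0.0], [0.1], [0.2], [0.3], [3.0]])
types = point_types(X, eps=0.15, min_samples=3)
print(types.tolist())  # → ['border', 'core', 'core', 'border', 'noise']
```

Here 0.0 and 0.3 have only one other point within eps, so neither is core. Each still lies within eps of a core point (0.1 or 0.2), which makes them border points. The point at 3.0 is reachable from nothing and is reported as noise.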

A6. DBSCAN detects anomalies as noise points, which are points that are neither core points nor border points. Noise points are isolated from dense regions and do not belong to any cluster. The key parameters involved in the process are:

- Epsilon (eps): the radius of the neighborhood for each data point. A larger eps value means a larger neighborhood, which can result in fewer noise points and lower sensitivity to outliers. A smaller eps value means a smaller neighborhood, which can result in more noise points and higher sensitivity to outliers.
- Minimum samples (min_samples): the minimum number of points required in the neighborhood of a data point to be classified as a core point. A larger min_samples value means a higher density requirement, which can result in fewer core points and more noise points. A smaller min_samples value means a lower density requirement, which can result in more core points and fewer noise points.
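Anomaly detection with DBSCAN then amounts to returning the noise points. The sketch below wraps that in a hypothetical helper, dbscan_anomalies, and keeps eps fixed while varying min_samples. Raising min_samples can only shrink the set of core points, so the number of reported anomalies never decreases. The blob data and the two injected points are illustrative assumptions. The injected points are placed far outside make_blobs' default center box of (-10, 10), so they should be flagged at every setting.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

def dbscan_anomalies(X, eps, min_samples):
    """Hypothetical helper: indices of points DBSCAN labels as noise (-1)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.flatnonzero(labels == -1)

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = np.vstack([X, [[25.0, 25.0], [-25.0, 25.0]]])  # injected anomalies at indices 300, 301

results = {}
for m in (3, 5, 10, 20):
    results[m] = dbscan_anomalies(X, eps=0.5, min_samples=m)
    print(f"min_samples={m}: {len(results[m])} anomalies")
```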


A7. make_circles is a function in scikit-learn's sklearn.datasets module that generates synthetic data sets for clustering and classification tasks. It creates a large circle containing a smaller circle in 2D and returns two arrays. The first, X, has shape (n_samples, 2) and contains the coordinates of the generated samples. The second, y, contains the binary labels (0 or 1) indicating which circle each sample belongs to. The user can specify the number of samples (n_samples), the standard deviation of the Gaussian noise added to the data (noise), the scale factor between the inner and outer circle (factor), and the random seed for reproducibility (random_state).
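A short sketch of make_circles follows. With noise=0.0, the generated points lie exactly on the two circles: the outer circle has radius 1, and the inner circle has radius equal to factor. In scikit-learn's implementation, label 1 marks the inner circle. The second part adds noise and runs DBSCAN. The values eps=0.2 and min_samples=5 are guesses that are often reasonable for this setting, not tuned results. The number of clusters found should be checked rather than assumed.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Noise-free rings: outer radius 1.0, inner radius = factor = 0.5
X_clean, y_clean = make_circles(n_samples=200, noise=0.0, factor=0.5, random_state=0)
radii = np.linalg.norm(X_clean, axis=1)
print("inner radius:", radii[y_clean == 1].mean(), "outer radius:", radii[y_clean == 0].mean())

# Noisy rings: a shape where centroid-based clustering struggles but DBSCAN can cope
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("clusters:", len(set(labels.tolist()) - {-1}), "noise points:", int(np.sum(labels == -1)))
```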

A8. Local outliers and global outliers are two types of outliers that differ in their definition and detection methods. Local outliers are data points that deviate from their local neighborhood, even though they may not stand out in the global distribution of the data. Global outliers are data points that deviate from the global distribution of the data, regardless of their local neighborhood. For example, in a data set of human heights, a person who is 2.5 meters tall would be a global outlier, since they are far from the average height of the population. A person who is 1.8 meters tall would not be a global outlier, since they are within the normal range of heights. However, they could be a local outlier if they live in a region where the average height is much lower.
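The distinction can be illustrated on synthetic data with two clusters: a tight one around (0, 0) and a loose one around (10, 10). A point p is placed just outside the tight cluster. All of these choices are hypothetical. A global per-feature z-score sees nothing unusual about p, because its coordinates lie well inside the overall range. The Local Outlier Factor, covered in the next answer, compares p with its own neighborhood and ranks it as the most outlying point.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 2))    # tight cluster at (0, 0)
sparse = rng.normal(loc=10.0, scale=2.0, size=(100, 2))  # loose cluster at (10, 10)
p = np.array([[0.8, 0.8]])                               # near the tight cluster, not in it
X = np.vstack([dense, sparse, p])
p_idx = len(X) - 1

# Global view: per-feature z-scores against the whole data set
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
print("max |z| of p:", z[p_idx].max(), "| max |z| overall:", z.max())

# Local view: LOF compares each point with its own neighbourhood
lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_scores = -lof.negative_outlier_factor_  # larger = more outlying
print("most outlying by LOF:", lof_scores.argmax(), "| index of p:", p_idx)
```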

A9. Local outliers can be detected using the Local Outlier Factor (LOF) algorithm, which is a density-based method that measures the local deviation of a data point from its neighbors. The LOF algorithm computes the anomaly score for each data point as follows:

- For each data point, it calculates the k-distance, which is the distance to its k-th nearest neighbor, and the k-distance neighborhood, which is the set of data points that are within the k-distance from the data point.
- For each data point, it calculates the reachability distance to each neighbor. This is defined as the maximum of the neighbor's k-distance and the actual distance between the two points. It then calculates the local reachability density, which is the inverse of the average reachability distance from the data point to the points in its k-distance neighborhood.
- For each data point, it calculates the local outlier factor. This is the ratio of the average local reachability density of the points in its k-distance neighborhood to the local reachability density of the data point itself. A value close to 1 means the point is about as dense as its neighbors. A value well above 1 indicates that the point is far from its neighbors compared to how close the neighbors are to each other, which implies that it is a local outlier.
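The three steps map almost line by line onto NumPy. The sketch below is a minimal from-scratch LOF, and the function name lof_scores is hypothetical. It assumes Euclidean distance and no ties or duplicate points, so every k-distance neighborhood contains exactly k points. The last lines compare the result with scikit-learn's LocalOutlierFactor, which stores the negated LOF in negative_outlier_factor_.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_scores(X, k):
    """LOF_k for every point of X (Euclidean distance; assumes no ties/duplicates)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]    # k nearest neighbours of each point
    k_dist = D[np.arange(n), nbrs[:, -1]]  # k-distance: distance to the k-th neighbour
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[nbrs], D[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)         # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd    # mean neighbour lrd divided by own lrd

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])  # 50 inliers + 1 injected outlier
scores = lof_scores(X, k=5)
print("most outlying index:", scores.argmax())           # the injected point, index 50

ref = -LocalOutlierFactor(n_neighbors=5).fit(X).negative_outlier_factor_
print("matches scikit-learn:", np.allclose(scores, ref, rtol=1e-6))
```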

A10. Global outliers can be detected using the Isolation Forest algorithm, which is an ensemble method that uses binary trees to isolate data points. The Isolation Forest algorithm assumes that anomalies are more likely to be isolated than normal data points, since they are far away or different from the majority of the data. The algorithm works as follows⁵:

- Each tree is built on a random subsample of the data. At each node, the algorithm randomly selects a feature and a split value between the minimum and maximum values of that feature among the points at the node. It then partitions those points into two subsets depending on whether they fall below or above the split value. This process is repeated recursively until each point is isolated in its own leaf or a maximum tree depth is reached.
- The algorithm measures the path length of each data point, which is the number of splits needed to isolate it. The path length is averaged over all trees and normalized by the expected path length for a sample of that size to obtain the anomaly score. A shorter average path length gives a higher anomaly score, since it means that the data point is easier to isolate and more likely to be an outlier.
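In practice, the procedure above is usually run through an existing implementation such as scikit-learn's IsolationForest. Its score_samples method returns the negative of the anomaly score defined in the original Isolation Forest paper, where lower means more abnormal. Negating it recovers a score in (0, 1] in which higher means more anomalous. The data and the two injected far-away points below are illustrative assumptions. With 100 trees they should receive the highest scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 points from a standard normal plus two injected global outliers (indices 200, 201)
X = np.vstack([rng.normal(size=(200, 2)), [[6.0, 6.0], [-6.0, 5.0]]])

iso = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = -iso.score_samples(X)  # paper-style score: higher = shorter paths = more anomalous
labels = iso.predict(X)         # -1 = outlier, 1 = inlier (default threshold)
print("two most anomalous indices:", np.argsort(scores)[-2:])
```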

A11. Some real-world applications where local outlier detection is more appropriate than global outlier detection are:

- Fraud detection: Local outlier detection can help to identify fraudulent transactions or activities that deviate from the normal behavior of a specific user or group, rather than the entire population. For example, a sudden increase in the amount or frequency of purchases by a credit card holder may indicate fraud, even if the purchases are within the normal range for the whole dataset.
- Network intrusion detection: Local outlier detection can help to detect malicious attacks or unauthorized access to a network or system that deviate from the normal behavior of a specific host or service, rather than the entire network. For example, a sudden spike in the traffic or requests from a specific IP address may indicate an intrusion, even if the traffic or requests are within the normal range for the whole network.
- Medical diagnosis: Local outlier detection can help to diagnose diseases or conditions that deviate from the normal behavior of a specific patient or group, rather than the entire population. For example, a sudden change in the blood pressure or heart rate of a patient may indicate a health problem, even if the blood pressure or heart rate is within the normal range for the whole dataset.

Some real-world applications where global outlier detection is more appropriate than local outlier detection are:

- Sensor data analysis: Global outlier detection can help to identify faulty or malfunctioning sensors that produce abnormal readings that deviate from the global distribution of the data. For example, a sensor that reports a temperature of 100 degrees Celsius in a room may indicate a defect. This holds even if the faulty sensor keeps reporting similar values, so that its readings look normal relative to one another.
- Data cleaning: Global outlier detection can help to remove noisy or erroneous data points that deviate from the global distribution of the data. For example, a record with an implausible value, such as a negative age, may indicate a data entry error, even if its other features look ordinary.
- Novelty detection: Global outlier detection can help to discover new or emerging patterns or trends that deviate from the global distribution of the data. For example, a data point that represents a new product or service unlike anything seen before may indicate a novelty, even if it resembles the few other new data points near it.

