# Automatic alphas and betas calculator
This script automatically calculates alphas and betas for the SMART alerts.
It was written both for research (prior to writing production code) and for POCs (running versions of the product without automatic alphas and betas calculation in the production code).

### Assumptions:
This script assumes that there's an accessible mongo with the following populated collections:
* `entity_event_normalized_username_daily`
* `entity_event_normalized_username_hourly`

The collections are used for two purposes:
* Get the distribution of Fs and Ps. The distribution is used in the first part of the algorithm, in order to give each F and P a penalty for its noisiness.
* Get all of the entity events. These are used in the second part of the algorithm, in order to iteratively decrease alphas and betas until the top entity events have a good ratio of participating Fs and Ps.
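
The second part of the algorithm can be pictured with the minimal sketch below. It is not the script's actual implementation: the function name `rebalance_weights`, the parameters `top_k`, `max_share`, `decay` and `max_iters`, the weighted-sum scoring and the definition of a "good ratio" (no single F or P contributing more than `max_share` of the top events' total value) are all assumptions made for illustration.

```python
def rebalance_weights(events, weights, top_k=3, max_share=0.6,
                      decay=0.9, max_iters=100):
    """Shrink the weights of features that dominate the top entity events.

    events:  list of dicts mapping feature name -> score (hypothetical layout).
    weights: dict mapping feature name -> alpha / beta; every feature that
             appears in an event must have an entry here.
    """
    weights = dict(weights)  # work on a copy, leave the caller's dict intact

    for _ in range(max_iters):
        def value(event):
            # Assumed scoring: weighted sum of the event's feature scores.
            return sum(weights[name] * score for name, score in event.items())

        top = sorted(events, key=value, reverse=True)[:top_k]
        total = sum(value(event) for event in top)
        if total == 0:
            break

        # Share of the top events' total value contributed by each feature.
        share = {name: 0.0 for name in weights}
        for event in top:
            for name, score in event.items():
                share[name] += weights[name] * score / total

        dominant = [name for name, s in share.items() if s > max_share]
        if not dominant:
            break  # the top events are balanced enough under this criterion
        for name in dominant:
            weights[name] *= decay

    return weights


events = [
    {"F1": 100, "P1": 10},
    {"F1": 90, "P1": 20},
    {"F1": 5, "P1": 30},
]
print(rebalance_weights(events, {"F1": 0.1, "P1": 0.1}, top_k=2))
```

In this toy input `F1` dominates the top events, so only its weight is decreased. `max_iters` bounds the loop in case no assignment of weights satisfies the chosen threshold.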

### Configuration:
* `mongo_ip` should be configured with the right ip.
* `verbose` can be set to `True` in order to print more detailed output.
* `show_graphs` should be set to `True` only when you want to display graphs (typically in research environment).
* `START_TIME` and `END_TIME` can be used in order to limit the entities that will be retrieved. If they are used and there's a file `entities.txt` with previous query results (for a different time interval), the results will be combined and saved to the file. If one of them is `None`, it is automatically changed to the time of the first / last entity in mongo.
* `NUM_OF_ALERTS_PER_DAY` can be configured (which affects the `low-values-score-reduction` applied to the SMART score).
* `BASE_ALPHA`, `BASE_BETA` - the $\alpha$ / $\beta$ value set for Fs / Ps which are not noisy at all. These typically shouldn't be changed.
* `REDUCERS` controls the configuration of the `low-values-score-reduction`s applied to the Fs. If some F in mongo has many high scores and you decide to reduce them using the `low-values-score-reduction`, the scores already stored in mongo are "wrong", because they don't reflect that reduction. Set these parameters in order to simulate the behaviour of the production Java code associated with the `low-values-score-reduction`. In order to automatically calculate `MIN_VALUE_FOR_NOT_REDUCE`, you can run `low values reduction.py`.
* `FIXED_W_DAILY` and `FIXED_W_HOURLY` can be used in order to decide in advance what some $\alpha$ / $\beta$ should be.
* `aggregated_feature_event_prevalance_stats_path` is the path to the version of the configuration installed for the customer. It is needed because some Fs have already been reduced, so the original reduction must be undone before applying the reduction using `MIN_VALUE_FOR_NOT_REDUCE`.
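
The `None` handling described for `START_TIME` and `END_TIME` can be sketched as follows. The helper name `resolve_time_range` and the representation of times as epoch seconds are assumptions for illustration; the script's own code may differ.

```python
def resolve_time_range(start_time, end_time, entity_times):
    """Replace a missing bound with the first / last entity time.

    entity_times: times of the entities found in mongo, assumed here to be
    epoch seconds (any mutually comparable values would work).
    """
    if not entity_times:
        raise ValueError("no entities to derive a time range from")
    start = min(entity_times) if start_time is None else start_time
    end = max(entity_times) if end_time is None else end_time
    return start, end


# END_TIME is given explicitly, START_TIME falls back to the first entity.
print(resolve_time_range(None, 50, [10, 20, 30]))  # (10, 50)
```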

### Output:
The weights (alphas and betas) are printed first (referred to as `w`): the daily weights, followed by the hourly ones.

Then, the output for the low-values-score-reducer configs is printed (daily and then hourly). These are the parameters that should be used in `aggregated-feature_event-prevalance-stats.properties`.

### Re-runs:
The first run might take a while (because of the mongo queries). Re-runs will be much faster, because the query outputs are saved to local files (`fs_and_ps.json` and `entities.json`) and re-used.
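
The re-use of saved query output follows a common load-or-query pattern, sketched below under assumptions: the helper `load_or_query` and the `run_query` callable are hypothetical names, and the script's real caching code may be organised differently.

```python
import json
import os


def load_or_query(path, run_query):
    """Return cached results from `path`, querying only when no cache exists.

    run_query: zero-argument callable standing in for the (slow) mongo query;
    its result must be JSON-serialisable.
    """
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)

    result = run_query()
    with open(path, "w") as f:
        json.dump(result, f)
    return result
```

In this sketch, deleting the cache file forces a fresh query on the next run.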

In [None]:
import sys
sys.path.append('..')
from common import config
from alphas_and_betas import main
if config.show_graphs:
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
%%javascript
//IPython.load_extensions('usability\\execute_time\\ExecuteTime');

In [None]:
main.main(mongo_ip='192.168.45.44', path='entities.txt')