## SpockFlow Scorecard Component

The SpockFlow Scorecard component lets you define scoring rules for numerical and categorical data. Each rule assigns a score, a bin, and a descriptive label to an input value based on the condition it matches, which makes the component well suited to evaluating data against predefined thresholds or patterns.

### Usage

To begin using the Scorecard component, import the necessary packages and instantiate a `ScoreCard` object. The three prefixes control the names of the output columns (for example, `SCORE_VALUE_var_1`):

```python
import pandas as pd
from spockflow.components import scorecard

var_1 = "var_1"
var_2 = "var_2"

sc = scorecard.ScoreCard(
    bin_prefix='SCORE_BIN_',
    score_prefix='SCORE_VALUE_',
    description_prefix='SCORE_DESC_'
)
```


### Adding Criteria

Criteria can be added to the `ScoreCard` object using the `add_criteria` method. Each criterion defines how to evaluate a specific variable and assign scores based on conditions. There are two types of criteria: numerical and categorical.

#### Numerical Criteria

Numerical criteria evaluate numeric variables and can define score ranges, discrete values, and default behaviors:

```python
sc.add_criteria(
    scorecard.ScoreCriteria(var_1, "numerical")
    .add_range_score(0, 1, 10, "First bound var_1")
    .add_range_score(1, 2, 30, "2nd bound var_1")
    .add_discrete_score([None], 73, "missing")
    .set_other_score(73, "default")
)
```

```
ScoreCardModel(bin_prefix='SCORE_BIN_', score_prefix='SCORE_VALUE_', description_prefix='SCORE_DESC_', variable_params=[ScoreCriteriaNumerical(variable='var_1', other_score=DefaultScorePattern(group_id=3, score=73.0, description='default'), discrete_scores=[NumericalDiscreteScorePattern(values=[nan], group_id=2, score=73.0, description='missing')], range_scores=[RangeScorePattern(range=MatchRange(start=0.0, end=1.0), group_id=0, score=10.0, description='First bound var_1'), RangeScorePattern(range=MatchRange(start=1.0, end=2.0), group_id=1, score=30.0, description='2nd bound var_1')], type='numerical', included_bounds=(<Bounds.LOWER: 0>,))], version='2.2.0', score_scaling_params=None)
```

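- **Range Scores**: Assign scores based on numeric ranges. For example, `add_range_score(0, 1, 10, "First bound var_1")` assigns a score of 10 to `SCORE_VALUE_var_1` when `0 <= var_1 < 1`. The lower bound is inclusive and the upper bound exclusive, as indicated by `included_bounds` in the output above.
- **Discrete Scores**: Assign scores for specific values. `add_discrete_score([None], 73, "missing")` assigns a score of 73 to `SCORE_VALUE_var_1` when `var_1` is missing (`None` is stored as `nan` in the model).
- **Default Score**: Set a default score and description for unmatched values. `set_other_score(73, "default")` assigns a score of 73 to `SCORE_VALUE_var_1` for any value that matches neither a range nor a discrete score. In this example the default happens to equal the missing-value score; the two are still reported under different bins and descriptions.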
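
The matching rules in the list above can be sketched in plain Python. This illustrates the semantics only and is not SpockFlow's implementation: the helper `score_var_1` and the constants are hypothetical, and the half-open intervals follow the `included_bounds=(<Bounds.LOWER: 0>,)` setting shown in the printed model.

```python
import math

# Hypothetical stand-in for the var_1 criterion; not part of SpockFlow.
RANGES = [
    (0, 1, 10, "First bound var_1"),  # matches 0 <= x < 1
    (1, 2, 30, "2nd bound var_1"),    # matches 1 <= x < 2
]
MISSING = (73, "missing")
OTHER = (73, "default")


def score_var_1(value):
    """Return (score, description) for a single var_1 value."""
    # Treat both None and NaN as missing.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    for start, end, score, description in RANGES:
        if start <= value < end:  # lower bound inclusive, upper exclusive
            return (score, description)
    return OTHER  # nothing matched: fall back to the default score


for v in [0, 1, 2, None]:
    print(v, score_var_1(v))
```

Note that `2` falls outside both ranges because the upper bound is exclusive, so it receives the default score.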

#### Categorical Criteria

Categorical criteria evaluate text variables and can define exact matches or patterns (regex):


```python
sc.add_criteria(
    scorecard.ScoreCriteria(var_2, "categorical", default_behavior="regex")
    .add_discrete_score(['a', 'b', 'c'], 10, "First pattern var_2")
    .add_discrete_score(['[b-z]'], 20, "Second pattern var_2")
)
```

```
ScoreCardModel(bin_prefix='SCORE_BIN_', score_prefix='SCORE_VALUE_', description_prefix='SCORE_DESC_', variable_params=[ScoreCriteriaNumerical(variable='var_1', other_score=DefaultScorePattern(group_id=3, score=73.0, description='default'), discrete_scores=[NumericalDiscreteScorePattern(values=[nan], group_id=2, score=73.0, description='missing')], range_scores=[RangeScorePattern(range=MatchRange(start=0.0, end=1.0), group_id=0, score=10.0, description='First bound var_1'), RangeScorePattern(range=MatchRange(start=1.0, end=2.0), group_id=1, score=30.0, description='2nd bound var_1')], type='numerical', included_bounds=(<Bounds.LOWER: 0>,)), ScoreCriteriaCategorical(variable='var_2', other_score=None, discrete_scores=[CategoricalDiscreteScorePattern(values=['a', 'b', 'c'], group_id=0, score=10.0, description='First pattern var_2'), CategoricalDiscreteScorePattern(values=['[b-z]'], group_id=1, score=20.0, description='Second pattern var_2')], type='categorical', default_behavior='regex')
```


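- **Exact Matches**: Assign scores based on specific text values. `add_discrete_score(['a', 'b', 'c'], 10, "First pattern var_2")` assigns a score of 10 to `SCORE_VALUE_var_2` when `var_2` is 'a', 'b', or 'c', unless a later pattern also matches.
- **Regex Matches**: Evaluate variables using regex patterns. `add_discrete_score(['[b-z]'], 20, "Second pattern var_2")` assigns a score of 20 to `SCORE_VALUE_var_2` when `var_2` matches the regex pattern `[b-z]`. Because 'b' matches both patterns, the overlap matters: in the results under Execution and Results, rows with `var_2 = 'b'` receive the second pattern's score of 20, so in this example the pattern added later takes precedence.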
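
The overlap between the two patterns can be made concrete with a small sketch built on Python's `re` module. It is an illustration under two stated assumptions, not SpockFlow's code: that each value must match a pattern in full (`re.fullmatch`), and that a later pattern overrides an earlier one. Both assumptions were chosen because they reproduce the results shown under Execution and Results; the helper `score_var_2` is hypothetical.

```python
import re

# Hypothetical stand-in for the var_2 criterion; not part of SpockFlow.
# Each entry: (regex list, bin index, score, description).
PATTERNS = [
    (["a", "b", "c"], 0, 10, "First pattern var_2"),
    (["[b-z]"], 1, 20, "Second pattern var_2"),
]


def score_var_2(value):
    """Return (bin, score, description); (-1, -1, None) when nothing matches."""
    result = (-1, -1, None)
    for regexes, bin_id, score, description in PATTERNS:
        # Assumption: the whole value must match one of the regexes.
        if any(re.fullmatch(rx, value) for rx in regexes):
            # Assumption: a later matching pattern overwrites an earlier one.
            result = (bin_id, score, description)
    return result


for v in ["a", "b", "z", "9"]:
    print(v, score_var_2(v))
```

Under these assumptions, 'a' lands in bin 0, 'b' and 'z' in bin 1, and '9' matches nothing.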

### Automatic Binning

By default, the Scorecard component assigns bin indices automatically and writes one bin column per criterion (`SCORE_BIN_var_1`, `SCORE_BIN_var_2`, and so forth). Within a criterion, each score pattern receives the next index in the order it was added: for `var_1`, the two ranges get bins 0 and 1, the missing-value pattern gets bin 2, and the default gets bin 3, matching the `group_id` values in the printed model. The bin column therefore records which pattern each input value matched.
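
Judging by the `group_id` values in the printed model, bin indices follow the order of the builder calls on each criterion. A minimal sketch of that numbering, using a hypothetical list of descriptions rather than SpockFlow itself:

```python
# Descriptions in the order the var_1 patterns were added (from the example above).
calls = ["First bound var_1", "2nd bound var_1", "missing", "default"]

# Assumption: each pattern gets the next free index, starting at 0.
bins = {description: index for index, description in enumerate(calls)}
print(bins)
```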

#### Overriding Bins

Bins can be overridden to customize or reorder bin categories. The `override_idx` parameter of the `add_range_score`, `add_discrete_score`, and `set_other_score` methods specifies the bin index to use instead of the automatically assigned one:

```python
sc.add_criteria(
    scorecard.ScoreCriteria(var_1, "numerical")
    .add_range_score(0, 1, 10, "First bound var_1", override_idx=5)
)
```


### Execution and Results

To execute the Scorecard on a dataset, use the `execute` method:

```python
test_data = pd.DataFrame({
    "var_1": [   0,   1,   2, None,   0,   1,   2, None],
    "var_2": [ 'a', 'b', 'z',  'a', 'a', 'b', 'z',  '9'],
})
test_data
```

|   | var_1 | var_2 |
|---|-------|-------|
| 0 | 0.0 | a |
| 1 | 1.0 | b |
| 2 | 2.0 | z |
| 3 | NaN | a |
| 4 | 0.0 | a |
| 5 | 1.0 | b |
| 6 | 2.0 | z |
| 7 | NaN | 9 |


```python
result_df = sc.execute(inputs=test_data)
result_df
```

|   | SCORE_BIN_var_1 | SCORE_VALUE_var_1 | SCORE_DESC_var_1 | SCORE_BIN_var_2 | SCORE_VALUE_var_2 | SCORE_DESC_var_2 | SCORE_VALUE_SUM |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 10.0 | First bound var_1 | 0 | 10.0 | First pattern var_2 | 20.0 |
| 1 | 1 | 30.0 | 2nd bound var_1 | 1 | 20.0 | Second pattern var_2 | 50.0 |
| 2 | 3 | 73.0 | default | 1 | 20.0 | Second pattern var_2 | 93.0 |
| 3 | 2 | 73.0 | missing | 0 | 10.0 | First pattern var_2 | 83.0 |
| 4 | 0 | 10.0 | First bound var_1 | 0 | 10.0 | First pattern var_2 | 20.0 |
| 5 | 1 | 30.0 | 2nd bound var_1 | 1 | 20.0 | Second pattern var_2 | 50.0 |
| 6 | 3 | 73.0 | default | 1 | 20.0 | Second pattern var_2 | 93.0 |
| 7 | 2 | 73.0 | missing | -1 | -1.0 |  | 72.0 |


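The resulting DataFrame (`result_df`) contains a bin, score, and description column for each variable, plus `SCORE_VALUE_SUM`, the row-wise total of the score columns. In the last row, `var_2 = '9'` matches no pattern and `var_2` has no default score, so its bin and score are both -1 and its description is empty. The -1 still counts toward the total (73 - 1 = 72), so consider setting a default score with `set_other_score` when unmatched values should not reduce the sum.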
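
The totals in the output above can be checked by hand. The sketch below copies the score columns from that output into plain tuples (no SpockFlow or pandas required) and confirms that `SCORE_VALUE_SUM` is the sum of the per-variable scores, including the -1 assigned to the unmatched value. The variable names are illustrative.

```python
# (SCORE_VALUE_var_1, SCORE_VALUE_var_2, SCORE_VALUE_SUM), copied from the output.
rows = [
    (10.0, 10.0, 20.0),
    (30.0, 20.0, 50.0),
    (73.0, 20.0, 93.0),
    (73.0, 10.0, 83.0),
    (10.0, 10.0, 20.0),
    (30.0, 20.0, 50.0),
    (73.0, 20.0, 93.0),
    (73.0, -1.0, 72.0),
]

# Rows whose reported total differs from the sum of the two scores.
mismatches = [row for row in rows if row[0] + row[1] != row[2]]

# Row indices where var_2 matched no pattern (score of -1).
unmatched = [i for i, row in enumerate(rows) if row[1] == -1.0]

print(mismatches, unmatched)
```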

### Saving Configurations

To save the configuration of the Scorecard for future use or deployment, use a configuration manager such as `YamlConfigManager`:


```python
from spockflow.inference.config.loader.yamlmanager import YamlConfigManager

conf_manager = YamlConfigManager()
conf_manager.save_to_config(
    model_name="demo_spock_model",
    model_version="1.0.0",
    namespace="scorecard_config",
    config=sc.model_dump(mode='json')
)
```

### Loading and Using Configurations

Load a saved Scorecard configuration from a YAML file and instantiate the `ScoreCard` object using `from_config`:

```python
config = conf_manager.get_config("demo_spock_model", "1.0.0")['scorecard_config']
sc_loaded = scorecard.ScoreCard.from_config("").load(config)

# Retrieve view model and display widget
vm = sc_loaded.get_view_model()
widget = vm.get_widget()
```
