# Introduction:

This material has been used in the past to teach colleagues in our group how to use persistable.

The `persistable` package provides users an interface to create a parameterized persistable payload that automatically persists and loads based on the parameters provided.  That means, the payload parameters and are used to define unique artifacts that can be reloaded so complex calculations never have to be repeated.

All you need to do is define:
1. How the payload is generated
2. The parameters
3. A working directory (can be local or cloud) for where artifacts should be persisted


For more details, read the [docs](https://github.com/DataReply/persistable).

# Examples
## Gaussian Distributed Dataset
### Define `Persistable` class
In this example, let's say that our persisted object is generated from a random number generator

In [1]:
from dataclasses import dataclass
from persistable import PersistableParams


@dataclass
class GaussianDistributedPointsParams(PersistableParams):
    """ Params for GaussianDistributedPoints.
    
    Parameters:
        n (int): number of gaussian distributed points.
        random_state (int): random_state for generator.
    """
    
    n: int
    random_state: int = 100


from persistable import Persistable
from numpy.typing import NDArray
import numpy as np
from typing import Any


class GaussianDistributedPoints(Persistable[NDArray[np.float64], GaussianDistributedPointsParams]):
    """ Persistable payload of Gaussian distributed points.
    
    """

    def _generate_payload(self, **untracked_payload_params: Any) -> NDArray[np.float64]:
        np.random.seed(self.params.random_state)
        return np.random.random(self.params.n)


### Instantiate and Generate+Persist Payload

In [2]:
from pathlib import Path

data_dir = Path('.').absolute() / "example-data"
params = GaussianDistributedPointsParams(n=100, random_state=10)

p_gaussian_distributed_points = GaussianDistributedPoints(
    data_dir=data_dir,
    params=params,
    verbose=True
)
p_gaussian_distributed_points.generate(persist=True)
p_gaussian_distributed_points.payload[0]

2022-07-13 19:19:50,208 - gaussian_distributed_points - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/example-data)
2022-07-13 19:19:50,210 - gaussian_distributed_points - __init__ - INFO - Payload named gaussian_distributed_points; Parameters set to GaussianDistributedPointsParams(n=100, random_state=10)
2022-07-13 19:19:50,212 - gaussian_distributed_points - generate - INFO - Now generating gaussian_distributed_points payload...


0.771320643266746

### Check payload was persisted

In [3]:
list(p_gaussian_distributed_points.persist_filepath.parent.glob("*"))

[PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/252716002f49672d2d04557fa94c2804.persistable'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/7b82e45b8774903fdd7d63c36e8b67c9.params.json'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/gauss_dist.log'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/7b82e45b8774903fdd7d63c36e8b67c9.persistable'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/252716002f49672d2d04557fa94c2804.params.json'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/252716002f49672d2d04557fa94c2804.pkl'),
 PosixPath('/Users/aloosley/Alex/Repos/persistable/examples/example-data/gaussian_distributed_points.log')]

### Load payload

In [4]:
p_gaussian_distributed_points_2 = GaussianDistributedPoints(
    data_dir=data_dir,
    params=params,
    verbose=True
)
p_gaussian_distributed_points_2.load()
p_gaussian_distributed_points_2.payload[:3]

2022-07-13 19:19:51,849 - gaussian_distributed_points - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/example-data)
2022-07-13 19:19:51,851 - gaussian_distributed_points - __init__ - INFO - Payload named gaussian_distributed_points; Parameters set to GaussianDistributedPointsParams(n=100, random_state=10)
2022-07-13 19:19:51,855 - gaussian_distributed_points - load - INFO - Now loading gaussian_distributed_points payload...


array([0.77132064, 0.02075195, 0.63364823])

## Outlier Detection Model

### Define `Persistable` class

In [19]:
@dataclass
class OutlierEstimatorParams(PersistableParams):
    """ Params for OutlierEstimator.
    
    Parameters:
        z_threshold (float): number of standard deviations from the mean for which to consider a point an outlier.
    """
    
    z_threshold: int

        

from typing import Optional, Any, List


class OutlierEstimator:
    def __init__(self, z_threshold: float) -> None:
        self.z_threshold = z_threshold
        
        self._mean = Optional[float]
        self._stdev = Optional[float]
    
    def fit(self, data: NDArray[np.float64]) -> None:
        self._mean = np.mean(data)
        self._stdev = np.std(data)
        
    def transform(self, data: NDArray[np.float64]) -> NDArray[np.float64]:
        return np.abs((data - self._mean) / self._stdev) > self.z_threshold
        
        
        
class OutlierEstimatorPersistable(Persistable[OutlierEstimator, OutlierEstimatorParams]):
    """ Persistable payload of Gaussian distributed points.
    
    """
    def __init__(
        self,
        data_dir: Path,
        params: OutlierEstimatorParams,
        *,
        fit_data_persistable: GaussianDistributedPoints,
    ) -> None:
        super().__init__(data_dir, params, verbose=True)
        self.fit_data_persistable = fit_data_persistable
        
    @property
    def from_persistble_objs(self) -> List[Persistable[Any, Any]]:
        return [self.fit_data_persistable]

    def _generate_payload(self, **untracked_payload_params: Any) -> OutlierEstimator:
        outlier_estimator = OutlierEstimator(z_threshold = self.params.z_threshold)
        outlier_estimator.fit(self.fit_data_persistable.payload)

In [20]:
outlier_estimator_params = OutlierEstimatorParams(z_threshold=1)
outlier_estimator_p = OutlierEstimatorPersistable(
    data_dir=data_dir, 
    params=outlier_estimator_params, 
    fit_data_persistable=p_gaussian_distributed_points,
)
outlier_estimator_p.payload

2022-07-13 19:22:33,479 - outlier_estimator_persistable - __init__ - INFO - ---- NEW PERSISTABLE SESSION ---- (/Users/aloosley/Alex/Repos/persistable/examples/example-data)
2022-07-13 19:22:33,485 - outlier_estimator_persistable - __init__ - INFO - Payload named outlier_estimator_persistable; Parameters set to OutlierEstimatorParams(z_threshold=1)
2022-07-13 19:22:33,488 - outlier_estimator_persistable - load - INFO - Now loading outlier_estimator_persistable payload...
2022-07-13 19:22:33,489 - outlier_estimator_persistable - load_generate - INFO - Loading payload failed, continuing to generate payload...
2022-07-13 19:22:33,491 - outlier_estimator_persistable - generate - INFO - Now generating outlier_estimator_persistable payload...


ValueError: Payload has not been generated.