Functional API #91

Open
sarahmish opened this issue Apr 20, 2021 · 0 comments · May be fixed by #92
Labels
enhancement (New feature or request), new feature (New feature)

Comments

@sarahmish
Collaborator

Since we have the Cardea class, it would also be beneficial to add a layer of functional interfaces that allows using Cardea in as few steps as possible. The design of the functional API would be problem-centric: there will be one function for each prediction problem.
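A minimal sketch of what "one function per problem" could look like. Everything here is an assumption for illustration: `fit_for_problem`, `make_problem_function`, and the problem names `mortality` and `readmission` are hypothetical placeholders, and the shared routine returns a plain dict instead of a fitted Cardea instance.

```python
from typing import Any, Callable, Dict


def fit_for_problem(problem: str, data_path: str, **options: Any) -> Dict[str, Any]:
    """Hypothetical shared routine standing in for load, label, featurize, fit."""
    # A real implementation would return a fitted Cardea instance; this
    # placeholder only records what it was asked to do.
    return {"problem": problem, "data_path": data_path, "options": options}


def make_problem_function(problem: str) -> Callable[..., Dict[str, Any]]:
    """Build a one-call entry point bound to a single prediction problem."""
    def problem_function(data_path: str, **options: Any) -> Dict[str, Any]:
        return fit_for_problem(problem, data_path, **options)

    problem_function.__name__ = problem
    return problem_function


# Placeholder problem names, chosen only for illustration.
mortality = make_problem_function("mortality")
readmission = make_problem_function("readmission")

result = mortality("path/to/csv_dir", test_size=0.2)
print(result["problem"])  # mortality
```

Binding the problem once keeps every entry point on the same signature, so the options in the Design section would only need to be documented in one place.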

The functional API hides the details of composing a Cardea pipeline; it is designed to return a pipeline fitted on a given raw dataset. The user can then use the Cardea instance to:

  1. make predictions on new source data (not necessarily future).
  2. make predictions on future data.
  3. save/load the cardea instance.
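The save/load step in the list above can be sketched with the standard `pickle` module, which the `save_path` argument in the Design section also mentions. This is only an illustration under stated assumptions: `save_instance` and `load_instance` are hypothetical helpers, and a `SimpleNamespace` stands in for a fitted Cardea instance.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_instance(instance, path):
    """Serialize a fitted object to ``path`` using pickle."""
    with open(path, "wb") as handle:
        pickle.dump(instance, handle)


def load_instance(path):
    """Load a previously pickled object from ``path``."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted Cardea instance (assumption, not the real class).
fitted = SimpleNamespace(fitted=True, problem="example")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cardea.pkl")
    save_instance(fitted, path)
    restored = load_instance(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.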

Design

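from typing import List, Union

from cardea import Cardea
from mlblocks import MLPipeline

# DEFAULT_PIPELINE and DEFAULT_METRICS are assumed to be module-level constants.


def model_pred_prob(data_path: str,
                    fhir: bool = True,
                    pipeline: Union[str, dict, MLPipeline] = DEFAULT_PIPELINE,
                    hyperparameters: Union[str, dict] = None,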
                    max_depth: int = 1,
                    max_features: int = -1, 
                    n_jobs: int = 1, 
                    test_size: float = 0.2,
                    shuffle: bool = True, 
                    tune: bool = False, 
                    max_evals: int = 10,
                    scoring: str = None, 
                    evaluate: bool = False,
                    metrics: List[str] = DEFAULT_METRICS, 
                    return_lt: bool = False,
                    return_fm: bool = False, 
                    return_pred: bool = False, 
                    verbose: bool = False,
                    save_path: str = None) -> Cardea:
    """Create and train a cardea instance on a specific prediction problem.

    Return a cardea class object that has been trained on the given
    dataset. The function loads the data, extracts label times, generates
    features, then trains the pipeline all in one command.

    Args:
        data_path (str):
            Path to the directory containing the .csv files to load.
        fhir (bool):
            Whether the data follows the FHIR schema (``True``) or the
            MIMIC schema (``False``).
        pipeline (str or MLPipeline or dict):
            Pipeline to use. It can be passed as:
                * An ``str`` with a path to a JSON file.
                * An ``str`` with the name of a registered pipeline.
                * An ``str`` with the path to a pickle file.
                * An ``MLPipeline`` instance.
                * A ``dict`` with an ``MLPipeline`` specification.
        hyperparameters (str or dict):
            Hyperparameters to set to the pipeline. It can be passed as
            a hyperparameters ``dict`` in the ``mlblocks`` format or as
            a path to the corresponding JSON file. Defaults to ``None``.
        max_depth (int):
            Maximum allowed depth of features.
        max_features (int):
            Cap to the number of generated features. If -1, no limit.
        n_jobs (int):
            Number of parallel processes to use when calculating the
            feature matrix.
        test_size (float):
            The proportion of the dataset to include in the test dataset.
        shuffle (bool):
            Whether or not to shuffle the data before splitting.
        tune (bool):
            Whether to optimize hyper-parameters of the pipelines.
        max_evals (int):
            Maximum number of hyper-parameter optimization iterations.
        scoring (str):
            The name of the scoring function used in the hyper-parameter
            optimization.
        evaluate (bool):
            Whether to evaluate the performance of the pipeline. If True,
            evaluate on the test data; if no test data is available,
            evaluate on the training data.
        metrics (list):
            A list of scoring function names. The scoring functions should
            be consistent with the problem type.
        return_lt (bool):
            Whether to return ``label_times``.
        return_fm (bool):
            Whether to return the calculated feature matrix.
        return_pred (bool):
            Whether to return the predictions of the pipeline.
        verbose (bool):
            Whether to show information during processing.
        save_path (str):
            Path to the file where the fitted pipeline will be stored
            using ``pickle``.

    Returns:
        Cardea, dict:
            * A fitted Cardea instance.
            * Intermediary outputs when indicated.
    """

    pass
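
The docstring says the function returns a fitted instance plus "intermediary outputs when indicated", while the signature is annotated `-> Cardea`. One possible reading, sketched below, is that the instance is returned alone unless a `return_*` flag is set. The helper `collect_outputs` and the dict keys are assumptions, not part of the proposal.

```python
from typing import Any, Dict, Tuple, Union


def collect_outputs(instance: Any,
                    label_times: Any = None,
                    feature_matrix: Any = None,
                    predictions: Any = None,
                    return_lt: bool = False,
                    return_fm: bool = False,
                    return_pred: bool = False) -> Union[Any, Tuple[Any, Dict[str, Any]]]:
    """Return the instance alone, or with a dict of the requested extras."""
    extras: Dict[str, Any] = {}
    if return_lt:
        extras["label_times"] = label_times
    if return_fm:
        extras["feature_matrix"] = feature_matrix
    if return_pred:
        extras["predictions"] = predictions

    # No flag set: keep the simple single-object return.
    if not extras:
        return instance

    return instance, extras


only_instance = collect_outputs("fitted-instance")
with_extras = collect_outputs("fitted-instance", label_times=[1, 2], return_lt=True)
```

Always returning a `(instance, dict)` pair would be an alternative that keeps the return type stable across flag combinations.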

sarahmish added the enhancement and new feature labels on Apr 20, 2021
sarahmish linked a pull request on Apr 20, 2021 that will close this issue