Python API #13

Closed
BauerLab opened this issue Aug 10, 2017 · 2 comments
@BauerLab (Contributor)

A Python API needs to be added.

@piotrszul (Collaborator) commented Nov 21, 2017

Python API for the generic variant-spark interface.

It should be implemented in the variants module in the python directory.

The main access point should be VariantsContext, which can be used to load features and labels.
The API should be based on the non-deprecated Scala API (from au.csiro.variantspark.api).

The general usage pattern should be like this:

from variants import VariantsContext

vc = VariantsContext(spark)
features = vc.import_vcf(input_path="path-to-a-vcf-file")
label = vc.load_label(input_path="path-to-a-csv-file", col_name="column-name")
imp_analysis = features.importance_analysis(label, n_trees=1000, m_try_fraction=0.1, ...)
oob_error = imp_analysis.oob_error
imp_vars_df = imp_analysis.important_variables()
# imp_vars_df is a spark.sql.DataFrame

Some guidance on the implementation can be found in the variants.hail module.
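
For reference, the sketch below shows one possible shape for such a wrapper; it is not part of the original proposal. It assumes a PySpark 2.x session and that a Scala entry point from au.csiro.variantspark.api is reachable through the py4j JVM gateway; all JVM-side class and method names used here (VSContext, importVCF, loadLabel, importanceAnalysis, oobError, importantVariables) are placeholders for illustration only.

# Minimal sketch, assuming a PySpark 2.x session and a hypothetical JVM-side
# entry point; none of the JVM class/method names below are confirmed.
from pyspark.sql import DataFrame


class VariantsContext(object):
    """Hypothetical Python wrapper around the variant-spark Scala API."""

    def __init__(self, spark):
        self.spark = spark
        self.sc = spark.sparkContext
        # Reach the Scala side through the py4j gateway (class name assumed).
        self._jvsc = self.sc._jvm.au.csiro.variantspark.api.VSContext(spark._jsparkSession)

    def import_vcf(self, input_path):
        # Load a VCF file as a feature source (JVM method name assumed).
        return FeatureSource(self, self._jvsc.importVCF(input_path))

    def load_label(self, input_path, col_name):
        # Load the label column from a CSV file (JVM method name assumed).
        return self._jvsc.loadLabel(input_path, col_name)


class FeatureSource(object):
    """Hypothetical wrapper for a JVM-side feature source."""

    def __init__(self, vc, jfs):
        self._vc = vc
        self._jfs = jfs

    def importance_analysis(self, label, n_trees=1000, m_try_fraction=0.1):
        # Run the random-forest importance analysis on the JVM side.
        jia = self._jfs.importanceAnalysis(label, n_trees, m_try_fraction)
        return ImportanceAnalysis(self._vc, jia)


class ImportanceAnalysis(object):
    """Hypothetical wrapper exposing the results of an importance analysis."""

    def __init__(self, vc, jia):
        self._vc = vc
        self._jia = jia

    @property
    def oob_error(self):
        return self._jia.oobError()

    def important_variables(self):
        # Wrap the JVM DataFrame as a pyspark.sql.DataFrame
        # (spark._wrapped is the SQLContext in PySpark 2.x).
        return DataFrame(self._jia.importantVariables(), self._vc.spark._wrapped)

Keeping the Python classes as thin py4j wrappers mirrors how pyspark itself wraps its own Scala counterparts, so the Python API stays in lockstep with the Scala one.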

piotrszul self-assigned this Dec 23, 2017
@piotrszul (Collaborator)

API implemented.
Remaining technical debt has been reported as new issues.
