Implement a similar interface to sklearn's make_pipeline #17

Closed
bangxiangyong opened this issue Nov 14, 2019 · 3 comments

@bangxiangyong
Member

bangxiangyong commented Nov 14, 2019

I have been thinking about treating data analytics/ML as a workflow, that is, a sequence of functions applied one after the other. The current implementation for ZEMA EMC with Haris Lulic already more or less adopts this view.

sklearn reference

Example:
group_of_agents = make_pipeline(FFT(), PCA(pca_parameters), LDA(lda_parameters))

  • The arguments are any number of objects that implement the fit and transform methods, following the way sklearn structures its classes
  • The line above should instantiate 3 agents connected from left to right (FFT -> PCA -> LDA)
  • For this to work, a generic ML agent class that accepts these data analytics methods needs to be developed
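The three points above could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not agentMET4FOF code: the Agent class, its connect method, and the toy Scale method are hypothetical stand-ins, and make_pipeline here is a plain function that shadows the name used in the example.

```python
class Agent:
    """Hypothetical generic ML agent wrapping one fit/transform method."""

    def __init__(self, method):
        self.method = method
        self.outputs = []  # downstream agents this agent sends data to

    def connect(self, other):
        self.outputs.append(other)

    def __repr__(self):
        return f"Agent({type(self.method).__name__})"


def make_pipeline(*methods):
    """Wrap each method in an agent and connect the agents left to right."""
    for method in methods:
        if not (hasattr(method, "fit") and hasattr(method, "transform")):
            raise TypeError(f"{type(method).__name__} lacks fit/transform")
    agents = [Agent(method) for method in methods]
    for upstream, downstream in zip(agents, agents[1:]):
        upstream.connect(downstream)
    return agents


class Scale:
    """Toy method with an sklearn-style interface (illustration only)."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x * self.factor for x in X]


# Three agents, connected first -> second -> third.
group_of_agents = make_pipeline(Scale(2.0), Scale(0.5), Scale(3.0))
```

Checking for fit and transform up front mirrors the requirement in the first bullet; anything else about how agents exchange data is left open here.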

Advantages of the agent network architecture:

  1. Pipelines that share components/agents only need to process the shared steps once if the agent connections are made dynamically. The code below should create 4 agents (FFT -> PCA -> LDA & ANN) instead of 6 agents (the sum of the agents in both pipelines).
    Example:
group_of_agents_1 = make_pipeline(FFT(), PCA(pca_parameters), LDA())
group_of_agents_2 = make_pipeline(FFT(), PCA(pca_parameters), ANN())
  2. Entire data processing pipelines can be visualized in the dashboard immediately
  3. Immediate compatibility with sklearn and smoother integration with data science workflows
  4. Promotes incremental development of mathematical components. For example, when investigating the effect of added noise/bias, we can compare two pipelines:
group_of_agents_1 = make_pipeline(AddNoise(), FFT(), LDA())
group_of_agents_2 = make_pipeline(FFT(), LDA())
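One way the agent sharing from the first advantage might work is to key each agent by the full chain of steps leading up to it, so that two pipelines reuse an agent only when everything upstream is identical. The sketch below is an assumption about how this could be done, not the project's implementation; Step and AgentRegistry are hypothetical names, and agents are represented by integer ids only.

```python
class Step:
    """Hypothetical description of a method: a name plus its parameters."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params

    def key(self):
        # Hashable identity of the step: same name and parameters -> same key.
        return (self.name, tuple(sorted(self.params.items())))


class AgentRegistry:
    """Creates agents on demand and reuses them for identical pipeline prefixes."""

    def __init__(self):
        self.agents = {}    # pipeline prefix -> agent id
        self.edges = set()  # (upstream agent id, downstream agent id)

    def make_pipeline(self, *steps):
        prefix = ()
        upstream = None
        ids = []
        for step in steps:
            prefix = prefix + (step.key(),)
            if prefix not in self.agents:
                self.agents[prefix] = len(self.agents)
            agent_id = self.agents[prefix]
            if upstream is not None:
                self.edges.add((upstream, agent_id))
            upstream = agent_id
            ids.append(agent_id)
        return ids


registry = AgentRegistry()
group_1 = registry.make_pipeline(Step("FFT"), Step("PCA", n_components=2), Step("LDA"))
group_2 = registry.make_pipeline(Step("FFT"), Step("PCA", n_components=2), Step("ANN"))
print(len(registry.agents))  # 4 agents: FFT, PCA, LDA, ANN
```

Because the key is the whole prefix, the noise comparison in the last point would not share its FFT agent: the FFT that follows AddNoise sees different input from the one that starts the second pipeline.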
@bangxiangyong
Member Author

More information here: https://www.slideshare.net/yongbangxiang/use-cases-agentmet4fof

@bangxiangyong
Member Author

bangxiangyong commented Nov 26, 2019

This is how the interface should look.

Specify the pipelines:

# Option 1: group of pipelines with 3 levels
ML_pipelines_A = make_agent_pipelines([PCA(), KNN()],
                                      [StandardScaler(), RobustScaler()],
                                      [LinearRegression(), ANN()], parameters)

# Option 2: multiple pipelines with a single level
ML_pipelines_B = make_agent_pipelines([CNN(), BCNN(), ANN()], parameters)
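One possible reading of option 1 is that every method at one level is combined with every method at the next level. That reading is an assumption on my part and is not stated above. Under it, the three levels of two methods each would describe 2 x 2 x 2 = 8 pipelines, which can be enumerated with itertools.product. Method names are plain strings here to keep the sketch self-contained.

```python
from itertools import product

# Levels as in option 1, with strings standing in for the method objects.
levels = [
    ["PCA", "KNN"],
    ["StandardScaler", "RobustScaler"],
    ["LinearRegression", "ANN"],
]

# Assumed semantics: one pipeline per combination of one method from each level.
pipelines = list(product(*levels))

for pipeline in pipelines:
    print(" -> ".join(pipeline))
```

Under the same reading, option 2 with its single level of three methods gives three one-step pipelines.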

Their candidate parameters, to be grid-searched, are given per level and per method:

# Example of parameters for the pipelines
parameters = ([{"n_components":[1,2,3,4,5]}, {"n":[4]}],
              [],
              [0,{"dimensions":[120,233,345,666]}])
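To make the grid search concrete, each dictionary of candidate values can be expanded into the individual settings to try. The helper below is a minimal sketch with a hypothetical name (expand_grid); how the proposed interface actually consumes the parameters tuple is not specified here.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the candidate values in `grid`.

    An empty grid yields a single empty setting, i.e. "use the defaults".
    """
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


pca_candidates = expand_grid({"n_components": [1, 2, 3, 4, 5]})
ann_candidates = expand_grid({"dimensions": [120, 233, 345, 666]})

print(len(pca_candidates))  # 5 settings for PCA
print(len(ann_candidates))  # 4 settings for ANN
```

sklearn.model_selection.ParameterGrid does the same expansion for readers who prefer to rely on sklearn directly.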

The connections to the data, evaluator and monitor agents then look like this:

#connect data
data_agent.connect(ML_pipelines)

#connect evaluator such as F1 SCORE, PICP, etc
ML_pipelines.connect(evaluator)

#collect experiment statistics with Monitor Agent
evaluator.connect(monitor_agent)

Lastly, execute the pipelines and log them as "Experiments" (#25):

#loop through the parameters and log on MLFLOW
aggregated_results = run_experiment([data_agent_1, data_agent_2], [ML_pipelines_A,ML_pipelines_B,ML_pipeline_C])

@bangxiangyong bangxiangyong moved this from In progress to Review in progress in agentMET4FOF's progress Dec 9, 2019
@bangxiangyong bangxiangyong moved this from Review in progress to Done in agentMET4FOF's progress Dec 11, 2019
@BjoernLudwigPTB
Member

That appears to be resolved by #33 long ago.
