
IGNITE-9283 #6735

Open
ilantukh wants to merge 6 commits into apache:master from ilantukh:IGNITE-9283


Conversation

@ilantukh (Contributor)

No description provided.

@avplatonov (Contributor) left a comment


@ilantukh Thanks for your contribution!
Please review my comments.

private final Preprocessor<K, V> basePreprocessor;

/** DCT type, default is 2 */
private final int type;

Why did you use an int as an old C-style enum? I think we should replace the int with an enum.
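A minimal sketch of the suggested change, assuming a hypothetical `DctType` enum and a hypothetical `DctSettings` holder (neither is part of the PR or of Ignite). DCT-II stays the default, as the original field comment says. With an enum, a meaningless value such as 5 can no longer be passed, and null is the only invalid input left to check:

```java
import java.util.Objects;

/** Hypothetical enum replacing the int "type" field: the four common DCT variants. */
enum DctType {
    I, II, III, IV;

    /** Default variant, matching the "default is 2" comment on the original field. */
    static final DctType DEFAULT = II;
}

/** Hypothetical holder showing how the preprocessor field could use the enum. */
final class DctSettings {
    /** DCT type; never null. */
    private final DctType type;

    /** Uses the default type (DCT-II). */
    DctSettings() {
        this(DctType.DEFAULT);
    }

    DctSettings(DctType type) {
        // Null is the only invalid value once the int is replaced by an enum.
        this.type = Objects.requireNonNull(type, "type");
    }

    DctType type() {
        return type;
    }
}
```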


IgniteCache<Integer, Vector> persons = ignite.createCache(cacheConfiguration);

persons.put(1, new DenseVector(new Serializable[]{"Mike", 10, 20, 42}));

This example looks a little weird because you apply DCT to irrelevant data (or data of unknown nature). I think it would be better to use time-series vectors here. For instance, the data could come from fake sensors, and the task would be to classify which of them are broken based on their readings. In this case, each vector would be a fixed-size array of doubles (one sensor value per hour), and the label would be 0 or 1, where 0 denotes a "good sensor" and 1 denotes a "broken sensor".
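A rough sketch of such a dataset in plain Java, without Ignite dependencies. `SensorSample`, `FakeSensors`, the 24-hour window, and the signal shapes are all hypothetical choices, not taken from the PR. Good sensors follow a smooth daily cycle with small Gaussian noise, while broken ones emit patternless readings, so broken sensors should carry much more energy in the high-frequency DCT coefficients:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical labeled sample: hourly readings plus a 0/1 label. */
final class SensorSample {
    /** Hypothetical window size: one reading per hour over a day. */
    static final int HOURS = 24;

    final double[] readings;

    /** 0 = good sensor, 1 = broken sensor. */
    final double label;

    SensorSample(double[] readings, double label) {
        this.readings = readings;
        this.label = label;
    }
}

/** Hypothetical generator of fake sensor data. */
final class FakeSensors {
    private FakeSensors() {
    }

    /** Generates count samples; each one is broken with probability brokenShare. */
    static List<SensorSample> generate(int count, double brokenShare, long seed) {
        Random rnd = new Random(seed);
        List<SensorSample> samples = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            boolean broken = rnd.nextDouble() < brokenShare;
            double[] r = new double[SensorSample.HOURS];
            for (int h = 0; h < SensorSample.HOURS; h++) {
                if (broken) {
                    // Broken sensor: uniform readings in [0, 100) with no daily pattern.
                    r[h] = 100.0 * rnd.nextDouble();
                } else {
                    // Good sensor: smooth daily cycle around 50 plus small Gaussian noise.
                    r[h] = 50.0 + 10.0 * Math.sin(2 * Math.PI * h / SensorSample.HOURS)
                        + rnd.nextGaussian();
                }
            }
            samples.add(new SensorSample(r, broken ? 1.0 : 0.0));
        }
        return samples;
    }
}
```

Each `readings` array could then be wrapped in a `DenseVector` (which has a `double[]` constructor) and stored in the cache in place of the `persons` entries; how the label is attached depends on the vectorizer the example uses, which this sketch does not cover.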

* @param <K> Type of a key in {@code upstream} data.
* @param <V> Type of a value in {@code upstream} data.
*/
public class DiscreteCosinePreprocessor<K, V> implements Preprocessor<K, V> {

Because the DCT preprocessor should be applied to sequential data, and feature vectors can contain data of different nature, I think it would be great if you provided the ability to pass a feature filter to this preprocessor. In that case, the preprocessor would process only the filtered features and replace their old values with the new preprocessed values.
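A minimal sketch of the feature filter, assuming the filter is simply an array of feature indices (a hypothetical choice; a predicate or a named-feature selector would also work) and using the orthonormal DCT-II. `FilteredDct` is a hypothetical helper, not part of the PR or of Ignite. Only the selected features are transformed, the resulting coefficients overwrite them in place, and all other features pass through untouched:

```java
import java.util.Arrays;

/** Hypothetical helper: orthonormal DCT-II over a filtered subset of features. */
final class FilteredDct {
    private FilteredDct() {
    }

    /**
     * Returns a copy of features where the positions listed in indices are replaced
     * by their DCT-II coefficients. Indices are assumed distinct and in bounds;
     * their order defines the sequence the transform sees.
     */
    static double[] apply(double[] features, int[] indices) {
        int n = indices.length;
        double[] result = Arrays.copyOf(features, features.length);
        if (n == 0)
            return result;

        // Gather the selected (sequential) features into a separate buffer.
        double[] x = new double[n];
        for (int i = 0; i < n; i++)
            x[i] = features[indices[i]];

        // Orthonormal DCT-II: X_k = s_k * sum_j x_j * cos(pi / n * (j + 0.5) * k),
        // with s_0 = sqrt(1 / n) and s_k = sqrt(2 / n) for k > 0.
        for (int k = 0; k < n; k++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += x[j] * Math.cos(Math.PI / n * (j + 0.5) * k);
            double scale = k == 0 ? Math.sqrt(1.0 / n) : Math.sqrt(2.0 / n);
            // The k-th coefficient replaces the k-th selected feature.
            result[indices[k]] = scale * sum;
        }
        return result;
    }
}
```

For example, with `features = {7.0, 1.0, 1.0, 1.0, 1.0, 9.0}` and `indices = {1, 2, 3, 4}`, the constant run maps to a DC coefficient of 2.0 followed by three coefficients that are zero up to rounding, while features 0 and 5 keep their values. Because the transform is orthonormal, the sum of squares of the selected features is preserved.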


Labels: None yet

Projects: None yet


2 participants