CODE IMPROVEMENTS
=================
- Rewrite the FeatureSelection package:
    - Improve the API of Feature Selection and how we handle different data types.
    - Refactor AbstractCategoricalFeatureSelector and simplify the method calls.
    - Refactor PCA to use Vectors and Matrices instead of arrays.
    - In AbstractCategoricalFeatureSelector and TFIDF, keep track of the kept columns rather than the removedColumns.
- Consider dropping all the common.dataobjects and using their internalData directly instead.
- Refactor the statistics package and replace all the static methods with proper inheritance.
- Write generic optimizers instead of having optimization methods in the algorithms. Add the optimizers and regularization packages under mathematics.
- Consider moving storages into a separate module that is inherited by common.
- Consider moving all tests into a separate module.
- Run the tests with different configurations (one for each storage engine). Create a profile that runs them all & connect it with CI.
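
A minimal sketch of what a generic optimizer could look like once it is decoupled from the algorithms. The Optimizer interface, the GradientDescent class and their signatures are hypothetical names used for illustration only; none of them exist in the code base, and plain arrays are used just to keep the example self-contained.

```java
import java.util.function.Function;

/** Hypothetical sketch: an optimizer that knows nothing about the model it serves. */
public class OptimizerSketch {

    /** Hypothetical interface; an algorithm would only supply its gradient. */
    interface Optimizer {
        double[] minimize(Function<double[], double[]> gradient, double[] start);
    }

    /** Plain batch gradient descent with a fixed learning rate. */
    static class GradientDescent implements Optimizer {
        private final double learningRate;
        private final int iterations;

        GradientDescent(double learningRate, int iterations) {
            this.learningRate = learningRate;
            this.iterations = iterations;
        }

        @Override
        public double[] minimize(Function<double[], double[]> gradient, double[] start) {
            double[] x = start.clone(); // never mutate the caller's array
            for (int i = 0; i < iterations; i++) {
                double[] g = gradient.apply(x);
                for (int j = 0; j < x.length; j++) {
                    x[j] -= learningRate * g[j]; // step against the gradient
                }
            }
            return x;
        }
    }

    public static void main(String[] args) {
        // f(x, y) = (x - 3)^2 + (y + 1)^2, so the gradient is (2(x - 3), 2(y + 1)).
        Optimizer optimizer = new GradientDescent(0.1, 200);
        double[] min = optimizer.minimize(
                p -> new double[] {2 * (p[0] - 3), 2 * (p[1] + 1)},
                new double[] {0, 0});
        System.out.printf("%.4f %.4f%n", min[0], min[1]);
    }
}
```

With this shape an algorithm would pass its gradient to whichever Optimizer it is configured with, and a regularization term could be added to that gradient before the call.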

NEW FEATURES
============
- Create the following Numerical Scalers: PercentileScaler.
- Create a storage engine for MapDB 3 once caching & asynchronous writing are supported. Remove the HOTFIX for MapDB bug #664.
- Create a storage engine for BerkeleyDB.
- Add the ability to call Machine Learning algorithms from command line or Python:
    - https://pypi.python.org/pypi/javabridge
    - https://github.com/LeeKamentsky/python-javabridge/
    - https://github.com/fracpete/python-weka-wrapper
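
A rough sketch of one possible reading of the PercentileScaler item: rescale a numeric column so that a lower percentile maps to 0 and an upper percentile maps to 1, which makes the scaling less sensitive to outliers than min-max. This interpretation, the class name, the method names and the nearest-rank percentile definition are all assumptions; the TODO item itself does not specify the behaviour.

```java
import java.util.Arrays;

/** Hypothetical sketch; not an existing class of the framework. */
public class PercentileScalerSketch {

    /** Nearest-rank percentile of the values, for p in (0, 100]. */
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Scales values so the lower percentile maps to 0 and the upper one to 1. */
    static double[] scale(double[] values, double lowerP, double upperP) {
        double lo = percentile(values, lowerP);
        double hi = percentile(values, upperP);
        double range = hi - lo;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A column with no spread between the percentiles is mapped to 0
            // to avoid dividing by zero.
            scaled[i] = range == 0.0 ? 0.0 : (values[i] - lo) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] column = {1, 2, 3, 4, 100}; // 100 is an outlier
        System.out.println(Arrays.toString(scale(column, 25, 75)));
    }
}
```

Values outside the chosen percentiles fall outside [0, 1]; whether they should be clipped is a design decision left open here.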

DOCUMENTATION
=============
- Improve the code documentation.
- Write How-to blog posts on building Text Classification models.
- Update the website and link directly to the latest and previous versions of the documentation.

NEW ALGORITHMS
==============
- Speed up LDA: http://www.cs.ucsb.edu/~mingjia/cs240/doc/273811.pdf
- Factorization Machines: http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
- Develop FunkSVD and PLSI as probabilistic versions of SVD.
- Collaborative Filtering for Implicit Feedback Datasets: http://yifanhu.net/PUB/cf.pdf
- Write a Mixture of Gaussians clustering method.
- Include an anomaly detection algorithm.
- Provide a wrapper for DBSCANClusterer and NeuralNet implementations of Maths.
- Add the ability to search through the configuration space and find the best performing algorithmic configuration.
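
The configuration search in the last item could start as an exhaustive grid search, sketched below. The class name, the method names and the use of a plain Map of parameter names to values are assumptions made for illustration; the scorer is a stand-in for training and validating a real model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of an exhaustive search over a parameter grid. */
public class GridSearchSketch {

    /** Expands the grid into every combination of values (Cartesian product). */
    static List<Map<String, Double>> expand(Map<String, double[]> grid) {
        List<Map<String, Double>> configs = new ArrayList<>();
        configs.add(new LinkedHashMap<>()); // start from one empty configuration
        for (Map.Entry<String, double[]> param : grid.entrySet()) {
            List<Map<String, Double>> next = new ArrayList<>();
            for (Map<String, Double> partial : configs) {
                for (double value : param.getValue()) {
                    Map<String, Double> extended = new LinkedHashMap<>(partial);
                    extended.put(param.getKey(), value);
                    next.add(extended);
                }
            }
            configs = next;
        }
        return configs;
    }

    /** Returns the configuration with the highest score. */
    static Map<String, Double> best(Map<String, double[]> grid,
                                    ToDoubleFunction<Map<String, Double>> scorer) {
        Map<String, Double> bestConfig = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map<String, Double> config : expand(grid)) {
            double score = scorer.applyAsDouble(config);
            if (score > bestScore) {
                bestScore = score;
                bestConfig = config;
            }
        }
        return bestConfig;
    }

    public static void main(String[] args) {
        Map<String, double[]> grid = new LinkedHashMap<>();
        grid.put("learningRate", new double[] {0.01, 0.1, 1.0});
        grid.put("l2", new double[] {0.0, 0.5});
        // Stand-in scorer; a real one would train and cross-validate a model.
        Map<String, Double> winner = best(grid,
                c -> -Math.abs(c.get("learningRate") - 0.1) - c.get("l2"));
        System.out.println(winner);
    }
}
```

Grid search grows exponentially with the number of parameters, so a random or adaptive search may be preferable for larger configuration spaces.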

TO CHECK OUT
============
Linear Algebra
--------------
- JBLAS - Linear Algebra for Java:
    https://github.com/mikiobraun/jblas
    http://jblas.org/

Huge Collection libs, DBs and Storage
-------------------------------------
- Vanilla-java - HugeCollections:
    https://code.google.com/p/vanilla-java/wiki/HugeCollections
- Fastutil:
    http://fastutil.di.unimi.it/#install
- Joafip:
    http://joafip.sourceforge.net/javadoc/net/sf/joafip/java/util/PHashMap.html
- Chronicle Map:
    https://github.com/OpenHFT/Chronicle-Map/
- H2 Database:
    http://www.h2database.com/html/main.html