GSoC Milestone 1: OLS functionality complete #22
Closed
BenNguyenn wants to merge 52 commits into apache:develop from
Conversation
This is probably the wrong way to do it, but I just copied and pasted the distribution module and used Ctrl-F to change "distribution" to "regression" (except in README.md). This will probably have to be properly regenerated later by an admin, but it is enough to get started and see if my setup is working as expected....
This reverts commit c30941c.
…ayout
StatisticsMatrix is modified, and will be extended significantly, from the example: https://ejml.org/wiki/index.php?title=Example_Customizing_SimpleMatrix
Changes include:
- bug fixes with SimpleBase's ops
- conforming to abstract methods from SimpleBase
Planned changes include:
- customizing the mean and stdev to use Java 8 Streams?
- benchmarking performance to see if streams are needed
- creating new methods for mean and stdev to calculate for specific columns (i.e. specific estimators)
…BenNguyenn/commons-statistics into STATISTICS-8_Regression_Module
Includes:
- support for double arrays as data import in conjunction with EJML DMatrixRMaj
- starting class setup for OLS
- first real unit test on loading in test data
This reverts commit c92f452.
…into apache-develop
- Completed almost all data loading functionality: ported split and single array input into the inputData object, including validateSampleData functionality and sample-data changing logic, along with complete preliminary unit tests (preliminary meaning they can probably be improved/expanded for currently unseen edge cases).
- Started directly porting the interface and abstract class of OLS as a starting point to finalize architecture designs.
- Started the first phase of directly porting core OLS functionality and converting from CM linear algebra to EJML (learning how both work in the process), currently things such as QR and LU decomposition.
…BenNguyenn/commons-statistics into STATISTICS-8_Regression_Module
108 errors left, down from 400+
…n RegressionDataLoader
- Removed parent package and moved contents under stored
- Moved data input classes in data_input package
RegressionDataLoader:
- changed name "newXmatrix" to "createXmatrix"
OLSRegression:
- constructor accepts RegressionData interface instead of RegressionDataLoader obj
- Currently fixing OLS standard error calculation
- Discovered issue with AbstractResiduals usage of betasMatrix
… or Bridge pattern
- OLS functionality complete
Known code smell: math4.stat dependency
Dependency usage: StatUtil, SumOfSquares, Variance, SecondMoment
Explanation: This dependency is temporary until Statistics Descriptive completes array-input methods for the functionality of the classes above, which is said to be coming soon.
Next objectives, in order:
- Create RegressionResults interface for OLS and other regressions to output
- Fix data loading strategy
- Port GLS (expected to be much easier than OLS)
- Start LogisticRegression design
OLS functionality complete:
-> GLS and logistic regression should be easier now, given the experience gained while porting OLS.
-> Note: preliminary usage is shown in testSwissFertilityInterfaceFormat() in OLSRegressionTest; more to be added.
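For readers unfamiliar with what an OLS fit computes, here is a minimal plain-Java sketch. It is not the code in this PR: the ported implementation is described as using EJML decompositions, whereas this sketch solves the normal equations (X'X)b = X'y by Gaussian elimination, which is simpler to read but numerically less robust than QR. The class name `OlsSketch` and the method `fit` are hypothetical.

```java
/** Hypothetical illustration only; not the OLSRegression class from this PR. */
final class OlsSketch {
    private OlsSketch() {}

    /** Returns [intercept, slope_1, ..., slope_k] minimizing the squared residuals. */
    static double[] fit(double[][] x, double[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("x and y must be non-empty and of equal length");
        }
        int n = x.length;
        int p = x[0].length + 1;             // +1 for the intercept column
        double[][] a = new double[p][p + 1]; // augmented matrix [X'X | X'y]
        for (int i = 0; i < n; i++) {
            double[] row = new double[p];
            row[0] = 1.0;                    // intercept term
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += row[r] * row[c];
                }
                a[r][p] += row[r] * y[i];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("design matrix is (nearly) singular");
            }
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * beta[c];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }
}
```

For data lying exactly on y = 1 + 2x, `OlsSketch.fit` recovers an intercept of 1 and a slope of 2 up to rounding.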
Known code smell: math4.stat dependency
Dependency usage: StatUtil, SumOfSquares, Variance, SecondMoment
Explanation:
This dependency is temporary until Statistics Descriptive adds array-input methods for the functionality of the classes above, which is said to be coming soon.
I considered helping Virendra with it so that this milestone would be free of old dependencies, but I don't think I should interfere before my own component is complete: I would have to learn how to use streams properly as well, and it sounds like Virendra will be done soon anyway.
Once Virendra is done, the switch will be quick, since only about 3 methods in total use that functionality.
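To make "array as input" concrete, here is a rough sketch of array-based sum of squares and sample variance written with Java 8 streams. This is an assumption about the shape such helpers could take, not the Statistics Descriptive API and not the math4.stat classes; `ArrayStatsSketch` and its methods are hypothetical names, and the two-pass variance is chosen for readability.

```java
import java.util.stream.DoubleStream;

/** Hypothetical helpers; not the math4.stat or Statistics Descriptive API. */
final class ArrayStatsSketch {
    private ArrayStatsSketch() {}

    /** Sum of the squared values. */
    static double sumOfSquares(double[] values) {
        return DoubleStream.of(values).map(d -> d * d).sum();
    }

    /** Unbiased sample variance (divides by n - 1), computed in two passes. */
    static double variance(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = DoubleStream.of(values).average().getAsDouble();
        double squaredDeviations = DoubleStream.of(values)
                .map(d -> (d - mean) * (d - mean))
                .sum();
        return squaredDeviations / (values.length - 1);
    }
}
```

With only a handful of call sites, swapping helpers like these for the eventual library methods should be a local change.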
Known code smell: Data loading is perhaps not ideal
Explanation:
The current RegressionDataLoader stores the input data in a RegressionRawData object and hands it out through an interface with a getter.
This will be improved by using one of the strategies suggested on the ML.
I will get to this later this week, or maybe after the GLS port.
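The arrangement described above can be sketched roughly as follows. All names carry a `Sketch` suffix because they are hypothetical: the real RegressionDataLoader, RegressionRawData, and RegressionData signatures are not shown in this description, and the defensive copying is an assumption added for illustration.

```java
/** Hypothetical read-only view handed to the regression; not the PR's RegressionData. */
interface RegressionDataSketch {
    double[] getY();
    double[][] getX();
}

/** Hypothetical holder for the raw input arrays. */
final class RawDataSketch implements RegressionDataSketch {
    private final double[] y;
    private final double[][] x;

    RawDataSketch(double[] y, double[][] x) {
        if (y.length != x.length) {
            throw new IllegalArgumentException("y and x must have the same number of rows");
        }
        this.y = y.clone();
        this.x = copy(x);
    }

    @Override
    public double[] getY() {
        return y.clone();
    }

    @Override
    public double[][] getX() {
        return copy(x);
    }

    // Row-by-row copy so callers cannot mutate the stored data.
    private static double[][] copy(double[][] m) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            out[i] = m[i].clone();
        }
        return out;
    }
}

/** Hypothetical loader that stores the data and exposes it only via the interface. */
final class LoaderSketch {
    private RawDataSketch data;

    void load(double[] y, double[][] x) {
        data = new RawDataSketch(y, x);
    }

    RegressionDataSketch getData() {
        if (data == null) {
            throw new IllegalStateException("no data loaded");
        }
        return data;
    }
}
```

A regression class would then take a `RegressionDataSketch` in its constructor and never see the loader itself.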
Next Objectives:
-> as suggested, a proper Factory pattern model
-> Summary statistics printout method?
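As a rough idea of what a Factory pattern model could look like here, the sketch below creates regression objects from an enum. It is only one possible reading of the suggestion; the interface, the stub classes, and `RegressionFactorySketch` are all hypothetical and do not reflect any design agreed on the ML.

```java
/** Hypothetical common type for all regressions. */
interface RegressionSketch {
    String name();
}

/** Stub standing in for an OLS implementation. */
final class OlsStub implements RegressionSketch {
    @Override
    public String name() {
        return "OLS";
    }
}

/** Stub standing in for a GLS implementation. */
final class GlsStub implements RegressionSketch {
    @Override
    public String name() {
        return "GLS";
    }
}

/** Hypothetical factory: callers ask for a type and never call constructors directly. */
final class RegressionFactorySketch {
    enum Type { OLS, GLS }

    private RegressionFactorySketch() {}

    static RegressionSketch create(Type type) {
        switch (type) {
            case OLS:
                return new OlsStub();
            case GLS:
                return new GlsStub();
            default:
                throw new IllegalArgumentException("unsupported type: " + type);
        }
    }
}
```

Adding logistic regression later would then mean one new enum constant and one new case, with no change for existing callers.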
PS:
Thank you for your review,
-Ben Nguyen