GSoC Milestone 1: OLS functionality complete #22

Closed
BBenNguyenn wants to merge 52 commits into apache:develop from BBenNguyenn:gsoc-milestone-1

@BBenNguyenn (Contributor)

OLS functionality complete:

  • converted all math4.linear dependencies to EJML (more complicated and time-consuming than expected)
    -> this should make GLS and logistic regression easier, given the experience gained while porting OLS
  • ensured full unit test coverage by porting all the old OLS tests and adding some new ones
  • created a preliminary RegressionResults interface, which holds the calculated results of a regression (calculated once, accessed multiple times)
    -> Note: preliminary usage is in testSwissFertilityInterfaceFormat() in OLSRegressionTest; more to be added
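The "calculated once, accessed multiple times" idea can be sketched with lazy caching. This is a minimal sketch, not the interface in this PR: `LazyRegressionResults`, `getBetas` and `getRSquared` are hypothetical names, and each statistic is supplied as a `java.util.function.Supplier` that runs on first access only.

```java
import java.util.function.Supplier;

/** Hypothetical read-only view of a fitted regression's results. */
interface RegressionResults {
    double[] getBetas();
    double getRSquared();
}

/** Hypothetical implementation: each statistic is computed on first access, then cached. */
final class LazyRegressionResults implements RegressionResults {
    private final Supplier<double[]> betasSupplier;
    private final Supplier<Double> rSquaredSupplier;
    private double[] betas;  // null until first requested
    private Double rSquared; // null until first requested

    LazyRegressionResults(Supplier<double[]> betasSupplier, Supplier<Double> rSquaredSupplier) {
        this.betasSupplier = betasSupplier;
        this.rSquaredSupplier = rSquaredSupplier;
    }

    @Override
    public synchronized double[] getBetas() {
        if (betas == null) {
            betas = betasSupplier.get(); // the expensive computation runs only once
        }
        return betas.clone(); // defensive copy so callers cannot mutate the cache
    }

    @Override
    public synchronized double getRSquared() {
        if (rSquared == null) {
            rSquared = rSquaredSupplier.get();
        }
        return rSquared;
    }
}
```

The `synchronized` getters keep the cache consistent if results are shared across threads; a results object that is always built eagerly would not need them.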

Known code smell: math4.stat dependency
Dependency usage: StatUtils, SumOfSquares, Variance, SecondMoment
Explanation:
This dependency is temporary until Statistics Descriptive provides array-input methods for the functionality of the classes above, which is said to be coming soon.
I considered helping Virendra with it so that all old dependencies would be gone for this milestone, but I don't think I should interfere before my own component is complete, since I would also have to learn how to use streams properly, and it sounds like Virendra will be done soon anyway.
Once Virendra is done, the switch will be quick, since only about 3 methods in total use this functionality.
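Because so few methods rely on math4.stat, plain-array stand-ins are small. A minimal sketch, assuming the Commons Math semantics: `SumOfSquares` is the sum of squared values, `SecondMoment` is the sum of squared deviations from the mean, and `Variance` is bias-corrected (divides by n - 1) by default. `ArrayStats` is a hypothetical helper, not the Statistics Descriptive API.

```java
import java.util.Arrays;

/** Hypothetical array-input stand-ins for the math4.stat calls used by OLS. */
final class ArrayStats {
    private ArrayStats() {}

    /** Sum of squared values, like SumOfSquares. */
    static double sumOfSquares(double[] values) {
        return Arrays.stream(values).map(v -> v * v).sum();
    }

    /** Arithmetic mean; NaN for an empty array. */
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Sum of squared deviations from the mean, like SecondMoment (two-pass for accuracy). */
    static double secondMoment(double[] values) {
        double m = mean(values);
        return Arrays.stream(values).map(v -> (v - m) * (v - m)).sum();
    }

    /** Bias-corrected sample variance: 0 for one value, NaN for none. */
    static double variance(double[] values) {
        if (values.length < 2) {
            return values.length == 1 ? 0.0 : Double.NaN;
        }
        return secondMoment(values) / (values.length - 1);
    }
}
```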

Known code smell: data loading is perhaps not ideal
Explanation:
The current RegressionDataLoader stores the input data in a RegressionRawData object and passes it on through an interface with a getter.
This will be improved using one of the strategies suggested on the mailing list (ML).
I will get to this this week, or perhaps after porting GLS.
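A minimal sketch of the arrangement described above, to make the smell concrete. Only the class names come from this PR; every field and method signature here is an assumption. One likely problem it shows: the `RegressionData` view aliases the loader's mutable holder, so reloading data after a regression has been constructed silently changes that regression's input.

```java
/** Hypothetical read-only view handed to a regression (signatures assumed). */
interface RegressionData {
    double[] getYData();
    double[][] getXData();
}

/** Hypothetical mutable holder that the loader fills in. */
final class RegressionRawData implements RegressionData {
    private double[] yData;
    private double[][] xData;

    void setData(double[] y, double[][] x) {
        this.yData = y;
        this.xData = x;
    }

    @Override
    public double[] getYData() { return yData; }

    @Override
    public double[][] getXData() { return xData; }
}

/** Hypothetical loader: stores data in one RegressionRawData and hands out that same object. */
final class RegressionDataLoader {
    private final RegressionRawData inputData = new RegressionRawData();

    void newSampleData(double[] y, double[][] x) {
        inputData.setData(y, x); // overwrites the shared holder in place
    }

    RegressionData getInputData() {
        return inputData; // the caller keeps a live reference, not a snapshot
    }
}
```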

Next Objectives:

  • Improve the data loading strategy
    -> as suggested, using a proper Factory pattern
  • Finalize the RegressionResults interface for OLS and other regressions to output
    -> add a summary statistics printout method?
  • Port GLS (expected to take less time than OLS)
  • Start the LogisticRegression implementation design
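One possible reading of "a proper Factory pattern" for data loading is a static factory method that validates and defensively copies the input, so a regression never sees data that is changed after construction. A minimal sketch; `SampleData` and `of` are hypothetical names, and the strategy actually suggested on the mailing list may differ (for example, one factory per input format).

```java
import java.util.Arrays;

/** Hypothetical immutable regression input created only through a factory method. */
final class SampleData {
    private final double[] y;
    private final double[][] x;

    private SampleData(double[] y, double[][] x) {
        this.y = y;
        this.x = x;
    }

    /** Validates, then copies, so later changes to the caller's arrays have no effect. */
    static SampleData of(double[] y, double[][] x) {
        if (y == null || x == null) {
            throw new IllegalArgumentException("sample data must not be null");
        }
        if (y.length != x.length) {
            throw new IllegalArgumentException("y has " + y.length + " rows but x has " + x.length);
        }
        return new SampleData(y.clone(), deepCopy(x));
    }

    double[] getY() { return y.clone(); }

    double[][] getX() { return deepCopy(x); }

    private static double[][] deepCopy(double[][] a) {
        return Arrays.stream(a).map(double[]::clone).toArray(double[][]::new);
    }
}
```

The private constructor forces every instance through `of`, so validation cannot be skipped.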

PS:

  • I've created a UML diagram, "UML_current.png", in the README directory in case a visual is helpful.

Thank you for your review,
-Ben Nguyen

BBenNguyenn and others added 30 commits May 7, 2019 10:38
This is probably the wrong way to do it, but I copy-pasted the distribution module and used Ctrl-F to change "distribution" to "regression" (except in README.md).
This will probably have to be regenerated properly later by an admin, but it lets me get started and check that my setup works as expected.
…ayout

StatisticsMatrix is adapted from, and will be extended significantly beyond, the example:
https://ejml.org/wiki/index.php?title=Example_Customizing_SimpleMatrix

Changes include:
- bug fixes in SimpleBase's ops
- conforming to the abstract methods of SimpleBase

Planned changes:
- customizing mean and stdev to use Java 8 streams?
- benchmarking performance to see whether streams are needed
- creating new methods to calculate the mean and stdev of specific columns (i.e., specific estimators)
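The planned per-column mean and stdev could look roughly like this with Java 8 streams. It is a sketch over a plain row-major array (the layout EJML's DMatrixRMaj uses for its `data` field), so it runs without EJML; `ColumnStats` and its methods are hypothetical, and the stdev is the bias-corrected sample version.

```java
import java.util.stream.IntStream;

/** Hypothetical per-column statistics over a row-major numRows x numCols array. */
final class ColumnStats {
    private ColumnStats() {}

    /** Mean of one column; element (r, c) lives at index r * numCols + c. */
    static double columnMean(double[] data, int numRows, int numCols, int col) {
        return IntStream.range(0, numRows)
                .mapToDouble(r -> data[r * numCols + col])
                .average()
                .orElse(Double.NaN);
    }

    /** Bias-corrected sample standard deviation of one column (two-pass). */
    static double columnStdev(double[] data, int numRows, int numCols, int col) {
        if (numRows < 2) {
            return Double.NaN;
        }
        double mean = columnMean(data, numRows, numCols, col);
        double sumSq = IntStream.range(0, numRows)
                .mapToDouble(r -> {
                    double d = data[r * numCols + col] - mean;
                    return d * d;
                })
                .sum();
        return Math.sqrt(sumSq / (numRows - 1));
    }
}
```

A plain indexed loop computes the same values; that loop-versus-stream pair is what the planned benchmark would compare.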
Includes:
- support for double arrays as data import, in conjunction with EJML's DMatrixRMaj
- initial class setup for OLS
- first real unit test for loading test data
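Importing `double[][]` data into EJML mostly comes down to laying the rows out contiguously. A minimal sketch that flattens predictors into a row-major array and optionally prepends an intercept column of ones; the result could then be wrapped with EJML's `DMatrixRMaj(numRows, numCols, rowMajor, data)` constructor. `DesignMatrix.toRowMajor` is a hypothetical helper, not this PR's `createXmatrix`.

```java
/** Hypothetical helper that builds a row-major design matrix from double[][] predictors. */
final class DesignMatrix {
    private DesignMatrix() {}

    /** Flattens x row by row; if addIntercept is true, each row starts with 1.0. */
    static double[] toRowMajor(double[][] x, boolean addIntercept) {
        if (x.length == 0) {
            throw new IllegalArgumentException("no observations");
        }
        int rows = x.length;
        int predictors = x[0].length;
        int cols = predictors + (addIntercept ? 1 : 0);
        double[] out = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            if (x[i].length != predictors) {
                throw new IllegalArgumentException("ragged x at row " + i);
            }
            int offset = i * cols;     // start of row i in the flat array
            int firstPredictor = 0;
            if (addIntercept) {
                out[offset] = 1.0;     // intercept column
                firstPredictor = 1;
            }
            System.arraycopy(x[i], 0, out, offset + firstPredictor, predictors);
        }
        return out;
    }
}
```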
This reverts commit c92f452.
- Completed almost all data loading functionality: ported the split-array and single-array inputs into the inputData object, including validateSampleData and the logic for changing sample data, along with complete preliminary unit tests (preliminary meaning they can probably be improved or expanded for edge cases not yet seen).
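The single-array input follows the Commons Math convention of storing each observation as its y value followed by its predictor values (y0, x00, x01, ..., y1, x10, ...). A minimal sketch of the split plus checks in the spirit of `validateSampleData` (matching row counts, and more observations than predictors plus an intercept); `SampleSplitter` is hypothetical, and it throws `IllegalArgumentException` where Commons Math uses its own exception types.

```java
/** Hypothetical splitter for single-array sample data. */
final class SampleSplitter {
    private SampleSplitter() {}

    /** Holder for the split response vector and predictor rows. */
    static final class Split {
        final double[] y;
        final double[][] x;

        Split(double[] y, double[][] x) {
            this.y = y;
            this.x = x;
        }
    }

    /** data holds nobs records of (y, x_1, ..., x_nvars). */
    static Split split(double[] data, int nobs, int nvars) {
        if (data == null) {
            throw new IllegalArgumentException("data is null");
        }
        if (data.length != nobs * (nvars + 1)) {
            throw new IllegalArgumentException(
                    "expected " + nobs * (nvars + 1) + " values, got " + data.length);
        }
        double[] y = new double[nobs];
        double[][] x = new double[nobs][nvars];
        int pointer = 0;
        for (int i = 0; i < nobs; i++) {
            y[i] = data[pointer++];
            for (int j = 0; j < nvars; j++) {
                x[i][j] = data[pointer++];
            }
        }
        validateSampleData(x, y);
        return new Split(y, x);
    }

    /** Assumes an intercept will be added, so it needs more rows than predictors + 1. */
    static void validateSampleData(double[][] x, double[] y) {
        if (x == null || y == null) {
            throw new IllegalArgumentException("x and y must not be null");
        }
        if (x.length != y.length) {
            throw new IllegalArgumentException("x has " + x.length + " rows, y has " + y.length);
        }
        if (x.length == 0) {
            throw new IllegalArgumentException("no data");
        }
        if (x[0].length + 1 > x.length) {
            throw new IllegalArgumentException("not enough data for " + x[0].length + " predictors");
        }
    }
}
```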

- Started directly porting the OLS interface and abstract class as a starting point for finalizing the architecture design.

- Started the first phase of directly porting core OLS functionality and converting it from CM linear algebra to EJML (learning how both work in the process); currently things such as QR and LU decomposition.
108 errors left, down from 400+
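The QR route works because, with X = QR (Q with orthonormal columns, R upper triangular), the least-squares normal equations XᵀXb = Xᵀy reduce to Rb = Qᵀy, which is solved by back-substitution. A minimal plain-array sketch using modified Gram-Schmidt; Commons Math's QRDecomposition uses Householder reflections instead, and EJML has its own decompositions, so this only illustrates the math. `QrOls` is a hypothetical name.

```java
/** Hypothetical OLS solver via a modified Gram-Schmidt QR decomposition. */
final class QrOls {
    private QrOls() {}

    /** Least-squares betas for an n x p design matrix x (rows are observations). */
    static double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        // v[j] starts as column j of x and is orthonormalized in place into column j of Q.
        double[][] v = new double[p][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                v[j][i] = x[i][j];
            }
        }
        double[][] r = new double[p][p];
        for (int j = 0; j < p; j++) {
            r[j][j] = Math.sqrt(dot(v[j], v[j]));
            if (r[j][j] == 0.0) { // exact-zero check only; a real solver would use a tolerance
                throw new ArithmeticException("design matrix is rank deficient");
            }
            for (int i = 0; i < n; i++) {
                v[j][i] /= r[j][j];
            }
            // Remove the q_j component from every later column.
            for (int k = j + 1; k < p; k++) {
                r[j][k] = dot(v[j], v[k]);
                for (int i = 0; i < n; i++) {
                    v[k][i] -= r[j][k] * v[j][i];
                }
            }
        }
        // Back-substitution on R b = Q^T y.
        double[] b = new double[p];
        for (int j = p - 1; j >= 0; j--) {
            double s = dot(v[j], y);
            for (int k = j + 1; k < p; k++) {
                s -= r[j][k] * b[k];
            }
            b[j] = s / r[j][j];
        }
        return b;
    }

    private static double dot(double[] a, double[] c) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * c[i];
        }
        return s;
    }
}
```

For the design rows (1, 1), (1, 2), (1, 3), (1, 4) and y = (2, 3, 5, 6), this returns an intercept of about 0.5 and a slope of about 1.4.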
…n RegressionDataLoader

- Removed the parent package and moved its contents under stored
- Moved the data input classes into the data_input package

RegressionDataLoader:
  - renamed "newXmatrix" to "createXmatrix"

OLSRegression:
  - constructor accepts the RegressionData interface instead of a RegressionDataLoader object
- Currently fixing the OLS standard error calculation
- Discovered an issue with AbstractResiduals' usage of betasMatrix
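A reference computation that could help cross-check the standard errors: SE(b_j) = sqrt(s² [(XᵀX)⁻¹]_jj), with s² = RSS / (n - p), where RSS is the sum of squared residuals y - Xb. A minimal sketch assuming the design matrix already includes the intercept column and has full column rank; it inverts XᵀX with Gauss-Jordan elimination for readability, which is less numerically stable than reusing the R factor from a QR decomposition. `OlsStandardErrors` is a hypothetical name.

```java
/** Hypothetical reference calculation of OLS coefficient standard errors. */
final class OlsStandardErrors {
    private OlsStandardErrors() {}

    static double[] standardErrors(double[][] x, double[] y, double[] betas) {
        int n = x.length;
        int p = betas.length;
        // Residual sum of squares, with residual e_i = y_i - x_i . b
        double rss = 0.0;
        for (int i = 0; i < n; i++) {
            double fitted = 0.0;
            for (int j = 0; j < p; j++) {
                fitted += x[i][j] * betas[j];
            }
            double e = y[i] - fitted;
            rss += e * e;
        }
        double sigma2 = rss / (n - p); // unbiased estimate of the error variance
        double[][] inv = invert(xtx(x, p));
        double[] se = new double[p];
        for (int j = 0; j < p; j++) {
            se[j] = Math.sqrt(sigma2 * inv[j][j]);
        }
        return se;
    }

    private static double[][] xtx(double[][] x, int p) {
        double[][] a = new double[p][p];
        for (double[] row : x) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    a[j][k] += row[j] * row[k];
                }
            }
        }
        return a;
    }

    /** Gauss-Jordan inversion with partial pivoting on the augmented matrix [A | I]. */
    private static double[][] invert(double[][] a) {
        int p = a.length;
        double[][] m = new double[p][2 * p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(a[i], 0, m[i], 0, p);
            m[i][p + i] = 1.0;
        }
        for (int c = 0; c < p; c++) {
            int pivot = c;
            for (int r = c + 1; r < p; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[pivot][c])) {
                    pivot = r;
                }
            }
            if (m[pivot][c] == 0.0) {
                throw new ArithmeticException("X'X is singular");
            }
            double[] tmp = m[c];
            m[c] = m[pivot];
            m[pivot] = tmp;
            double d = m[c][c];
            for (int k = 0; k < 2 * p; k++) {
                m[c][k] /= d;
            }
            for (int r = 0; r < p; r++) {
                if (r != c) {
                    double f = m[r][c];
                    for (int k = 0; k < 2 * p; k++) {
                        m[r][k] -= f * m[c][k];
                    }
                }
            }
        }
        double[][] inv = new double[p][p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(m[i], p, inv[i], 0, p);
        }
        return inv;
    }
}
```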
- OLS functionality complete

Known code smell: math4.stat dependency
Dependency usage: StatUtils, SumOfSquares, Variance, SecondMoment
Explanation:
This dependency is temporary until Statistics Descriptive provides array-input methods for the functionality of the classes above, which is said to be coming soon.

Next Objectives in order:
- Create RegressionResults interface for OLS and other regressions to output
- Fix data loading strategy
- Port GLS (expected to be much easier than OLS)
- Start LogisticRegression design
@BBenNguyenn BBenNguyenn deleted the gsoc-milestone-1 branch July 22, 2019 21:15