Extension #47

Merged
merged 1 commit into from
Nov 4, 2018

HongyuJerryWang
Contributor

Hi Jed. It's my first time using this, so here's a list of changes:

  1. Output as a vector: The extended API defines the output as a list of values, and can be applied to problems with multiple outputs, including classification problems.

For example,
a regression problem has an output of one value, [regression_result],
a full adder problem has an output of two values, [carry_over_bit, sum_bit].
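To make the vector-shaped output concrete, here is a minimal sketch in C of the expected output for one row of the full adder truth table. The function name `full_adder` and the use of `double` for each output value are assumptions for illustration, not part of the API in this PR.

```c
#include <assert.h>

/* Number of outputs for the full adder: [carry_over_bit, sum_bit]. */
#define FULL_ADDER_OUTPUTS 2

/* Hypothetical helper: writes the expected output vector for one row of
 * the full adder truth table. Each input must be 0 or 1. */
void full_adder(int a, int b, int carry_in, double out[FULL_ADDER_OUTPUTS])
{
    assert((a == 0 || a == 1) && (b == 0 || b == 1) &&
           (carry_in == 0 || carry_in == 1));

    int total = a + b + carry_in;   /* ranges from 0 to 3 */
    out[0] = (double)(total / 2);   /* carry_over_bit */
    out[1] = (double)(total % 2);   /* sum_bit */
}
```

A regression problem would use the same shape with a single element in the output array.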

  2. Ability to deal with categorical data: Columns in the dataset can be marked as categorical in the configuration file. The API encodes categorical data as one-hot vectors, and the C program generated at the end of LGP training encodes its input and decodes its output in the same way.

For example, in the Iris Classification problem, the species
Iris-setosa is encoded to be [1.0, 0.0, 0.0],
Iris-versicolor is encoded to be [0.0, 1.0, 0.0],
Iris-virginica is encoded to be [0.0, 0.0, 1.0].
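The encoding above can be sketched in C roughly as follows. The names `one_hot_encode` and `one_hot_decode` are hypothetical, and decoding by picking the largest value (argmax) is an assumption about how a noisy program output could be mapped back to a category. The code actually generated by the API may differ.

```c
#include <assert.h>
#include <string.h>

#define NUM_SPECIES 3

/* Category order matches the encoding listed above. */
static const char *const SPECIES[NUM_SPECIES] = {
    "Iris-setosa", "Iris-versicolor", "Iris-virginica"
};

/* Hypothetical encoder: fills `out` with the one-hot vector for `label`.
 * Returns 0 on success, or -1 if the label is unknown (in which case
 * `out` is left as all zeros). */
int one_hot_encode(const char *label, double out[NUM_SPECIES])
{
    assert(label != NULL);

    int found = -1;
    for (int i = 0; i < NUM_SPECIES; i++) {
        out[i] = 0.0;
        if (strcmp(label, SPECIES[i]) == 0) {
            found = i;
        }
    }
    if (found < 0) {
        return -1;
    }
    out[found] = 1.0;
    return 0;
}

/* Hypothetical decoder (assumption: argmax). Returns the category whose
 * position holds the largest value, so outputs that are not exactly
 * 0.0 or 1.0 still map to a species. */
const char *one_hot_decode(const double values[NUM_SPECIES])
{
    int best = 0;
    for (int i = 1; i < NUM_SPECIES; i++) {
        if (values[i] > values[best]) {
            best = i;
        }
    }
    return SPECIES[best];
}
```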

  3. Extended translation: Now all the operations included in the API can be translated into runnable C, and the user has full control over how a custom operation should be translated into C.

  4. Added operations: An Xor bitwise operator and an Identity operator (for moving data) are now included in the API.

  5. Fixes: Logical fixes in IslandMigration evolutionary algorithm, Or bitwise operator, etc.
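As a rough illustration of what translated Xor and Identity instructions could look like in C, here is a sketch that assumes registers are stored as an array of `double` and that bitwise operators truncate their operands to 64-bit integers first. The function names and the register layout are assumptions; the C actually emitted by the API is not shown in this PR description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translation of an Xor instruction: r[dest] = r[a] ^ r[b].
 * Assumption: operands are truncated to 64-bit integers before the
 * bitwise operation, and the result is stored back as a double. */
void op_xor(double r[], int dest, int a, int b)
{
    assert(dest >= 0 && a >= 0 && b >= 0);
    r[dest] = (double)((int64_t)r[a] ^ (int64_t)r[b]);
}

/* Hypothetical translation of an Identity instruction: copies the value
 * of one register into another without changing it. */
void op_identity(double r[], int dest, int src)
{
    assert(dest >= 0 && src >= 0);
    r[dest] = r[src];
}
```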

I also removed the example folders to make compiling the changes easier. If you want the examples back, please let me know.

Thank you.

@JedS6391
Owner

JedS6391 commented Nov 2, 2018

Hi Hongyu,

As it looks like some of these will be breaking changes, I will spend some time investigating an approach that causes the least API disruption.

My current approach would be something like:

  • Modify the Dataset constructor to accept a List<Output<TData>>, where Output is an interface with different implementations (e.g. single, multiple)
  • Modify the DatasetLoader implementation to reflect changes to the Dataset API when parsing
  • Either:
    a) Extend BaseProgram to handle multiple program outputs by default (maybe based on a configurable flag)
    b) Provide an entirely new Program implementation that can handle multiple program outputs, with the ProgramGenerator determining which type of program is being used
  • Provide a MultipleOutputFitnessContext which can be used for programs of the type mentioned above
  • Provide a set of fitness functions for multiple program outputs
  • Possibly other things as I come across them

With this approach, we basically have two types of programs in the system -- single output and multiple output, and the implementations of the different components used for each will differ slightly. That way we maintain flexibility and keep things simple (as opposed to having a multiple output program with one output).

I will try to make the necessary changes this weekend and push them to your remote (and update this PR). It would be great if you could provide an example data set with multiple outputs that I can use to test that everything works as required :)

@HongyuJerryWang
Contributor Author

Hi Jed,

Thank you. You can find some datasets in my tutorials; tutorial 5 in particular has a full adder dataset with 3 inputs and 2 outputs. If you don't mind me mentioning it, I have already changed things like the fitness functions to be compatible with multiple outputs. Whether they are the most reasonable implementation for multiple outputs is debatable, but for single outputs they behave the same way as in the previous version. When I changed the output to be multiple, I felt that a single output is just a special case of multiple outputs, and it turns out the API handles a single output all the same. The only slight change would probably be a few elements in the configuration JSON file. Please let me know what you think.
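The "single is a special case of multiple" point can be illustrated with a fitness measure written over output vectors. The sketch below is not the implementation from this PR: the name `mean_squared_error`, the row-major layout, and the choice to average over both cases and outputs are all assumptions. With `n_outputs` set to 1 it reduces to the ordinary single-output mean squared error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical multi-output fitness: mean squared error over every output
 * of every fitness case. `predicted` and `expected` are row-major arrays
 * holding n_cases rows of n_outputs values each. */
double mean_squared_error(const double *predicted, const double *expected,
                          size_t n_cases, size_t n_outputs)
{
    assert(n_cases > 0 && n_outputs > 0);

    double total = 0.0;
    for (size_t i = 0; i < n_cases * n_outputs; i++) {
        double diff = predicted[i] - expected[i];
        total += diff * diff;
    }
    return total / (double)(n_cases * n_outputs);
}
```

For the full adder dataset this would be called with `n_outputs` equal to 2, and for a regression dataset with `n_outputs` equal to 1, without any other change to the function.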

Cheers
Hongyu

@JedS6391 JedS6391 merged commit b0c0991 into JedS6391:develop Nov 4, 2018