Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GIO] New Implementation of IOGEN #1672

Closed
wants to merge 86 commits into from
Closed

Conversation

fathollahzadeh
Copy link
Contributor

This PR is a new implementation of the GIO (generating readers for custom datasets). In the new implementation, removed all hard-coded implementations for flat datasets and replaced them with code gen. One of the primary goals of GIO is to support single and multi-row representations of tuples in source datasets. This PR is for supporting both of them.
An outline of the PR is:

It supports Matrix and Frame
It supports nested and flat datasets
It supports non-standard datasets, i.e., an incomplete number of cols in a CSV file

Copy link
Contributor

@Baunsgaard Baunsgaard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fathollahzadeh

I have been through about half of the files (it is a very big PR)
Some comments, (mainly on syntax) i have not gone into technical depth of what the parts do.
If i find the time i will look through the rest.

@@ -535,6 +535,16 @@ public String toString() {
}
}

public enum OpOpGenerateReader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this enum if it only has one enum value?

if(!_ioGenRead)
l = new Data(_op, null, inputLops, getName(), null, getDataType(), getValueType(), getFileFormat());
else
l = new DataIOGen(_op, null, inputLops, getName(), null, getDataType(), getValueType(), getIOGenFormat());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Transient Reads, should not be affected by DataIOGen.
Transient Reads, only read a matrix from a previous block of code, it should not be connected to IO.
(Correct me if i am wrong)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are identifying the format and generating corresponding readers to that and then reusing it multiple times, I think it can be Transient.

l.getOutputParameters().setDimensions(getDim1(), getDim2(), _inBlocksize, getNnz(), getUpdateType());
}
else
l = new DataIOGen(_op, null, inputLops, getName(), null, getDataType(), getValueType(), getIOGenFormat());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good if you set the output parameters(dimensions) if known. (like in the not IOGen above).

The previous (before your additions) is bad design setting the variables after the call to the method, but it could be you do it inside your constructor? if you do not i suggest to use the method on line 328.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our design for IOGEN is like this:
read(sample=x, sample_raw=$3, format=$4, data_type="matrix")
this read doesn't have any output like the current SystemsDS regular reading. We are saving the JAVA source code in "format=$4" and due the next readers call that format, it is compiled and used.

src/main/java/org/apache/sysds/lops/DataIOGen.java Outdated Show resolved Hide resolved
*
* @return operation type
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting


private void runIdentification() {

/* Index properties:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is documentation, move it to the method definition, or( even better in my opinion) move it to the class definition.

return matrixReader;
}

@Override public String getReaderString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

override should be on line above.

return frameReader;
}

@Override public String getReaderString() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format


for(int i = index; i < index + value.length(); i++)
if(reservedPositions.get(i)) {
flag = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one place you set flag. does this not mean you can put your return logic in here. and then remove the flag variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
2 participants