
[#1075] Minimal csv importer #1099

Merged
merged 17 commits into from Jan 9, 2019

4 participants
@cmoesler
Contributor

cmoesler commented Nov 26, 2018

Importer that creates a DataSet of ImportVertex, one for each row of an external CSV file that is not already in EPGM format. It reads the first line of the file as the header and uses its entries as property names. It creates a unique ID and sets the default vertex label for each vertex.
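To make the described behaviour concrete, here is a minimal, dependency-free Java sketch of the header-to-properties mapping. It is not the code from this PR: SketchVertex and MinimalCsvSketch are hypothetical names, a random UUID stands in for Gradoop's own ID type, and the label "V" is only a placeholder for the default vertex label. Values are kept as strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.regex.Pattern;

/** Hypothetical stand-in for an EPGM vertex: id, label and string-valued properties. */
final class SketchVertex {
  final String id;
  final String label;
  final Map<String, String> properties;

  SketchVertex(String id, String label, Map<String, String> properties) {
    this.id = id;
    this.label = label;
    this.properties = properties;
  }
}

final class MinimalCsvSketch {
  /** Placeholder only; the actual default vertex label is defined by Gradoop. */
  static final String DEFAULT_LABEL = "V";

  /** Turns CSV lines (first line = header) into one vertex per data row. */
  static List<SketchVertex> importLines(List<String> lines, String delimiter) {
    List<SketchVertex> vertices = new ArrayList<>();
    if (lines.isEmpty()) {
      return vertices;
    }
    // limit -1 keeps trailing empty columns, so fields stay aligned with the header
    String[] header = lines.get(0).split(Pattern.quote(delimiter), -1);
    for (String line : lines.subList(1, lines.size())) {
      String[] values = line.split(Pattern.quote(delimiter), -1);
      Map<String, String> properties = new LinkedHashMap<>();
      for (int i = 0; i < header.length && i < values.length; i++) {
        properties.put(header[i], values[i]); // values are not type-converted
      }
      // a random UUID stands in for Gradoop's own unique vertex id
      vertices.add(new SketchVertex(UUID.randomUUID().toString(), DEFAULT_LABEL, properties));
    }
    return vertices;
  }
}
```

Using the header as the single source of property names keeps every row's keys consistent, which is what the checkReoccurringHeader option further down relies on.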

@merando

Contributor

merando commented Nov 26, 2018

Please also fix the Travis issues. You can even check them on your own machine by running mvn -DskipTests clean verify in the shell.

cmoesler added some commits Nov 27, 2018

@ChrizZz110
Contributor

ChrizZz110 left a comment

There are some changes needed. Please update the branch with develop. Sorry, but let me guess... you are using Eclipse? :)

cmoesler added some commits Dec 4, 2018

@ChrizZz110
Contributor

ChrizZz110 left a comment

Please add tests:

  1. to check the reoccurring header flag
  2. to check empty lines in the CSV
  3. to check that no property is created if no value is given
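The three requested test cases can be pinned down against a small sketch like the following. The semantics are assumptions taken from the wording above: with the flag set, a row identical to the header is skipped; empty lines are skipped; and an empty field produces no property. CsvRowFilterSketch is a hypothetical helper, not part of the importer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CsvRowFilterSketch {

  /**
   * Parses CSV lines (first line = header) into one property map per data row.
   * The numbered comments refer to the three requested test cases (assumed semantics).
   */
  static List<Map<String, String>> parse(List<String> lines, String delimiter,
      boolean checkReoccurringHeader) {
    List<Map<String, String>> rows = new ArrayList<>();
    if (lines.isEmpty()) {
      return rows;
    }
    String headerLine = lines.get(0);
    String[] header = headerLine.split(Pattern.quote(delimiter), -1);
    for (String line : lines.subList(1, lines.size())) {
      // 2. an empty line produces no row at all
      if (line.trim().isEmpty()) {
        continue;
      }
      // 1. with the flag set, a row that repeats the header is skipped
      if (checkReoccurringHeader && line.equals(headerLine)) {
        continue;
      }
      String[] values = line.split(Pattern.quote(delimiter), -1);
      Map<String, String> properties = new HashMap<>();
      for (int i = 0; i < header.length && i < values.length; i++) {
        // 3. an empty field creates no property
        if (!values[i].isEmpty()) {
          properties.put(header[i], values[i]);
        }
      }
      rows.add(properties);
    }
    return rows;
  }
}
```

A test for case 1 would then feed a file that repeats its header line and compare the number of resulting rows with the flag enabled and disabled; without the flag, the repeated header simply becomes one more row.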
Resolved (outdated) review comment on ...va/org/gradoop/dataintegration/importer/impl/csv/MinimalCSVImporter.java:
* @param checkReoccurringHeader if each row of the file should be checked for reoccurring of
* the column property names.
* @return the imported vertices
* @throws IOException if an error occurred while open the stream


@ChrizZz110

ChrizZz110 Dec 4, 2018

Contributor

Indentation

Two more resolved (outdated) review comments on ...va/org/gradoop/dataintegration/importer/impl/csv/MinimalCSVImporter.java
@merando
Contributor

merando left a comment

There is one thing fundamentally missing: the result of an import process should always be a LogicalGraph.
Values should also be interpreted as strings. Thanks to @ChrizZz110 for pointing this out.

You can use the LogicalGraph factory provided by the GradoopFlinkConfig to create one. Here is a simple example from the simple JsonImport, which is not yet part of Gradoop but will be:

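```java
// read the file line by line and map each line directly to a Vertex
DataSet<Vertex> jsonVertices = config.getExecutionEnvironment()
    .readTextFile(pathToJsonFile)
    .map(new SimpleJsonToVertex(config.getVertexFactory()));

// wrap the vertices in a LogicalGraph using the factory from the config
return config.getLogicalGraphFactory().fromDataSets(jsonVertices);
```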

The 'magic' is done in the map function. Please change your code accordingly. Reach out if anything is unclear. :)

@ChrizZz110

Contributor

ChrizZz110 commented Dec 6, 2018

@cmoesler Sorry for all the back-and-forth here. Next time we will clarify the concept in advance so that no misunderstandings arise.

  1. Creating ImportVertices only makes sense if the GraphDataSource is used. If this source is not used (as in your case), use the Vertex and Edge factories from the config inside your map function to create a DataSet<Vertex>. This can be passed to the LogicalGraphFactory to create a LogicalGraph.
    At the moment you are doing CSV -> ImportVertex -> Vertex -> Graph, where the ImportVertex step is not necessary.
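A rough sketch of the suggested single-step mapping, assuming plain Java in place of Flink and Gradoop: SketchVertexFactory is a hypothetical stand-in for the vertex factory from the config, and java.util.function.Function stands in for Flink's MapFunction. Each CSV line goes straight to the final vertex type, so no ImportVertex is created.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the vertex factory obtained from the config. */
interface SketchVertexFactory<V> {
  V createVertex(String label, Map<String, String> properties);
}

/** Maps one CSV line straight to a final vertex, with no intermediate ImportVertex. */
final class CsvLineToVertex<V> implements Function<String, V> {
  private final String[] header;
  private final String delimiter;
  private final String label;
  private final SketchVertexFactory<V> factory;

  CsvLineToVertex(String[] header, String delimiter, String label,
      SketchVertexFactory<V> factory) {
    this.header = header.clone();
    this.delimiter = delimiter;
    this.label = label;
    this.factory = factory;
  }

  @Override
  public V apply(String line) {
    String[] values = line.split(Pattern.quote(delimiter), -1);
    // header entries become property keys; values stay strings
    Map<String, String> properties = new LinkedHashMap<>();
    for (int i = 0; i < header.length && i < values.length; i++) {
      properties.put(header[i], values[i]);
    }
    // the factory builds the final vertex type directly
    return factory.createVertex(label, properties);
  }
}
```

In the real importer the mapper would have to implement Flink's MapFunction (and therefore be serializable), be applied with map on the lines read via readTextFile, and the resulting DataSet<Vertex> would be passed to config.getLogicalGraphFactory().fromDataSets(...), as in the JsonImport example above. The header still has to be read once before the mapping starts.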

Let's talk about the main concept on Tuesday.

@ChrizZz110
Contributor

ChrizZz110 left a comment

Looks good to me now. @galpha or @cmoesler, please test the implementation with a fairly large CSV file to check that everything works properly.

@ChrizZz110

Contributor

ChrizZz110 commented Jan 8, 2019

@merando Please approve or dismiss your review.

@merando

merando approved these changes Jan 8, 2019

@merando

Contributor

merando commented Jan 8, 2019

lgtm

@ChrizZz110 ChrizZz110 self-assigned this Jan 8, 2019

@galpha

galpha approved these changes Jan 9, 2019

Contributor

galpha left a comment

LGTM

@galpha galpha merged commit c986299 into dbs-leipzig:develop Jan 9, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed