
Java Heap Problem with Tap/Table with over 40 million entries #21

Open
fbrubacher opened this issue Aug 12, 2014 · 9 comments
@fbrubacher

I am using Cascading-Postgres with a table of 40 million entries and I am getting a Java heap error. Any suggestions on how to solve or work around this?
Best

@fs111
Contributor

fs111 commented Aug 12, 2014

Can you post a stacktrace? Is this when reading from the table or when writing to it? Which version of Cascading and Cascading-jdbc are you using?

@locked-fg

We had something similar here with MySQL; the trick was to put the statement into streaming mode (which may be DB-specific). I'm using v2.5.1, so I had to replace DBInputFormat.java and uncomment
statement.setFetchSize(Integer.MIN_VALUE);

When I have some time left, I'll update to the latest version and maybe submit a patch.
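As a sketch of what that uncommented line does: with the MySQL Connector/J driver, a statement streams rows one at a time (instead of buffering the whole result set in heap) only when it is forward-only, read-only, and its fetch size is set to the sentinel value Integer.MIN_VALUE. The helper class below is hypothetical, not part of cascading-jdbc; it just illustrates the driver-specific configuration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper illustrating MySQL Connector/J's streaming mode.
// A fetch size of Integer.MIN_VALUE on a forward-only, read-only statement
// tells the MySQL driver to stream rows row-by-row rather than materialize
// the entire result set in memory (the cause of the heap error above).
public class StreamingStatements {

    /** Fetch-size sentinel that enables streaming in MySQL Connector/J. */
    public static int mysqlStreamingFetchSize() {
        return Integer.MIN_VALUE;
    }

    /** Configures a statement for streaming reads (conn must be a live MySQL connection). */
    public static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(mysqlStreamingFetchSize());
        return stmt;
    }
}
```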

@fs111
Contributor

fs111 commented Aug 28, 2014

@locked-fg those kinds of patches are always welcome. If you need database-specific changes, you can subclass DBInputFormat in cascading-jdbc-mysql, like we do for Oracle.

Before we can use your patch, you will have to sign the contributors agreement: http://files.concurrentinc.com/agreements/Concurrent_Contributor_Agreement.doc

@locked-fg

I finally forked and checked out the project, made the corresponding changes, and ran "gradle build" in the cascading-jdbc-mysql folder (it succeeded).
Should the rest (deciding which JDBC factory is used) happen by "magic", or is there a way to force/hint Scalding/Cascading to use the MySQL scheme? Currently it seems my changes are not applied to the statement...

@fs111
Contributor

fs111 commented Sep 3, 2014

There is currently no magic; you have to instantiate the MySqlScheme yourself if you want to use it. We only do a bit of magic in the JDBCFactory, which we use for Lingual.

@locked-fg

I implemented the change and tested it in Scalding using:

abstract class StreamingJDBCSource extends JDBCSource {
  override val maxConcurrentReads = 1

  override protected def getJDBCScheme = new MySqlScheme(
    classOf[MySqlDBInputFormat[DBWritable]],  // inputFormatClass
    columnNames.toArray,
    null,  // orderBy
    filterCondition.getOrElse(null),
    updateBy.toArray,
    false // replace on Insert
  )
}

The contributors agreement is sent to concurrent already.

@fs111
Contributor

fs111 commented Sep 8, 2014

@fbrubacher Can you please try again with cascading-jdbc-mysql version 2.5.5-wip-93? This has the fix for mysql streaming from @locked-fg in it.

@fbrubacher
Author

Will try to port the fix to Postgres and update the ticket!
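For anyone attempting that port: the PostgreSQL JDBC driver enables streaming differently from MySQL. Instead of the Integer.MIN_VALUE sentinel, it fetches through a server-side cursor when autocommit is disabled and a positive fetch size is set on a forward-only statement. The sketch below is a hypothetical illustration of that driver behavior, not code from cascading-jdbc; the batch size of 1000 is an arbitrary choice.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the PostgreSQL equivalent of MySQL streaming mode.
// The PostgreSQL driver uses a server-side cursor (fetching rows in batches
// instead of loading them all into heap) when autocommit is off and a
// positive fetch size is set on a forward-only statement.
public class PgStreaming {

    /** Arbitrary positive batch size; any value > 0 enables cursor-based fetching. */
    public static final int PG_FETCH_SIZE = 1000;

    public static Statement createStreamingStatement(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // required for cursor-based fetching in PostgreSQL
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(PG_FETCH_SIZE);
        return stmt;
    }
}
```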

@fs111
Contributor

fs111 commented Mar 9, 2015

Any update on this? Can this be closed?
