
Resolve #9; Update Hadoop and Spark versions. #33

Merged
ianmilligan1 merged 1 commit into master from issue-9 on Aug 30, 2017

Conversation

@ruebot (Member) commented Aug 29, 2017

  • Set Hadoop and Spark versions to Altiscale versions
  • Set Hadoop to 2.7.3
  • Set Spark to 1.6.2

Let's hold off on merging until I can fully test this on Altiscale. Waiting on sorting out the space issue on /home there.
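For reference, a version bump like the one described above would typically be made in the project's Maven build file. A hypothetical sketch only; the property names are assumed, not taken from the actual pom.xml:

```xml
<!-- Hypothetical pom.xml properties pinning the Altiscale-matching versions -->
<properties>
  <hadoop.version>2.7.3</hadoop.version>
  <spark.version>1.6.2</spark.version>
</properties>
```

Dependencies such as spark-core and hadoop-client would then reference `${spark.version}` and `${hadoop.version}` so a single edit updates every artifact consistently.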

codecov bot commented Aug 29, 2017

Codecov Report

Merging #33 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #33   +/-   ##
=======================================
  Coverage   43.41%   43.41%           
=======================================
  Files          42       42           
  Lines         850      850           
  Branches      148      148           
=======================================
  Hits          369      369           
  Misses        437      437           
  Partials       44       44

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bb3ff2...92fe25e.

@ruebot (Member Author) commented Aug 29, 2017

Superseded by #34

@ruebot ruebot closed this Aug 29, 2017
@ruebot ruebot reopened this Aug 30, 2017
@ruebot (Member Author) commented Aug 30, 2017

We're good to go on Altiscale.

$ /opt/spark/bin/alti-spark-shell --jars /mnt/ephemeral0/aut_test/aut-0.9.1-SNAPSHOT-fatjar-issue-9.jar --conf spark.local.dir=/mnt/ephemeral0/aut_test/tmp  --master yarn --deploy-mode client --num-executors 50 --executor-cores 5 --executor-memory 20G --driver-memory 10G
/tmp/ruebot-hive-1.2.1-lib.zip: OK
ok - no need to re-generate the same /tmp/ruebot-hive-1.2.1-lib.zip, continuing
mkdir: `/user/ruebot/apps': File exists
put: `/user/ruebot/apps/hive-1.2.1-lib.zip': File exists
/opt/alti-spark-1.6.1 /mnt/ephemeral0/aut_test
2017-08-30 14:25:15,310 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing view acls to: ruebot
2017-08-30 14:25:15,314 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing modify acls to: ruebot
2017-08-30 14:25:15,315 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ruebot); users with modify permissions: Set(ruebot)
2017-08-30 14:25:15,774 INFO  org.apache.spark.HttpServer (Logging.scala:logInfo(58)) - Starting HTTP Server
2017-08-30 14:25:15,874 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'HTTP class server' on port 45070.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_102)
Type in expressions to have them evaluated.
Type :help for more information.
2017-08-30 14:25:22,552 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Running Spark version 1.6.1
2017-08-30 14:25:22,571 WARN  org.apache.spark.SparkConf (Logging.scala:logWarning(70)) - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
2017-08-30 14:25:22,583 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Spark configuration:
spark.app.name=Spark shell
spark.blockManager.port=45300
spark.broadcast.port=45200
spark.driver.extraClassPath=/etc/alti-spark-1.6.1/hive-site.xml:/etc/alti-spark-1.6.1/yarnclient-driver-log4j.properties
spark.driver.extraJavaOptions=-Dlog4j.configuration=yarnclient-driver-log4j.properties -Djava.library.path=/opt/hadoop/lib/native/
spark.driver.memory=10G
spark.driver.port=45055
spark.eventLog.dir=hdfs:///logs/spark-history/ruebot
spark.eventLog.enabled=true
spark.executor.cores=5
spark.executor.extraClassPath=spark-hive_2.10-1.6.1.jar:spark-hive-thriftserver_2.10-1.6.1.jar
spark.executor.extraJavaOptions=-Djava.library.path=/opt/hadoop/lib/native/
spark.executor.instances=50
spark.executor.memory=20G
spark.executor.port=45250
spark.fileserver.port=45090
spark.history.fs.logDirectory=hdfs:///logs/spark-history
spark.history.retainedApplications=9999999
spark.history.ui.port=18080
spark.jars=file:/mnt/ephemeral0/aut_test/aut-0.9.1-SNAPSHOT-fatjar-issue-9.jar
spark.local.dir=/mnt/ephemeral0/aut_test/tmp
spark.logConf=true
spark.master=yarn-client
spark.port.maxRetries=999
spark.repl.class.uri=http://10.252.18.87:45070
spark.replClassServer.port=45070
spark.shuffle.consolidateFiles=true
spark.submit.deployMode=client
spark.ui.port=45100
spark.yarn.am.extraJavaOptions=-Djava.library.path=/opt/hadoop/lib/native/
spark.yarn.dist.archives=hdfs:///user/ruebot/apps/hive-1.2.1-lib.zip#hive
spark.yarn.queue=research
2017-08-30 14:25:22,610 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing view acls to: ruebot
2017-08-30 14:25:22,610 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing modify acls to: ruebot
2017-08-30 14:25:22,610 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ruebot); users with modify permissions: Set(ruebot)
2017-08-30 14:25:23,065 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'sparkDriver' on port 45055.
2017-08-30 14:25:23,553 INFO  akka.event.slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2017-08-30 14:25:23,653 INFO  Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2017-08-30 14:25:23,855 INFO  Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.252.18.87:45056]
2017-08-30 14:25:23,859 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'sparkDriverActorSystem' on port 45056.
2017-08-30 14:25:23,888 INFO  org.apache.spark.SparkEnv (Logging.scala:logInfo(58)) - Registering MapOutputTracker
2017-08-30 14:25:23,922 INFO  org.apache.spark.SparkEnv (Logging.scala:logInfo(58)) - Registering BlockManagerMaster
2017-08-30 14:25:23,942 INFO  org.apache.spark.storage.DiskBlockManager (Logging.scala:logInfo(58)) - Created local directory at /mnt/ephemeral0/aut_test/tmp/blockmgr-958d3a62-f52f-408f-b69c-e88b5cee1162
2017-08-30 14:25:23,953 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore started with capacity 7.0 GB
2017-08-30 14:25:24,351 INFO  org.apache.spark.SparkEnv (Logging.scala:logInfo(58)) - Registering OutputCommitCoordinator
2017-08-30 14:25:24,566 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'SparkUI' on port 45100.
2017-08-30 14:25:24,572 INFO  org.apache.spark.ui.SparkUI (Logging.scala:logInfo(58)) - Started SparkUI at http://10.252.18.87:45100
2017-08-30 14:25:24,641 INFO  org.apache.spark.HttpFileServer (Logging.scala:logInfo(58)) - HTTP File server directory is /mnt/ephemeral0/aut_test/tmp/spark-e2a0da6c-8134-47e1-ba4e-a20e7810e930/httpd-028aab69-990b-4a9a-943d-17322eedb822
2017-08-30 14:25:24,642 INFO  org.apache.spark.HttpServer (Logging.scala:logInfo(58)) - Starting HTTP Server
2017-08-30 14:25:24,650 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'HTTP file server' on port 45090.
2017-08-30 14:25:25,578 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Added JAR file:/mnt/ephemeral0/aut_test/aut-0.9.1-SNAPSHOT-fatjar-issue-9.jar at http://10.252.18.87:45090/jars/aut-0.9.1-SNAPSHOT-fatjar-issue-9.jar with timestamp 1504103125578
2017-08-30 14:25:26,563 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(297)) - Timeline service address: http://rm-ia.s3s.altiscale.com:8188/ws/v1/timeline/
2017-08-30 14:25:26,752 INFO  org.apache.hadoop.yarn.client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at rm-ia.s3s.altiscale.com/10.251.255.108:8032
2017-08-30 14:25:27,743 INFO  org.apache.hadoop.yarn.client.AHSProxy (AHSProxy.java:createAHSProxy(42)) - Connecting to Application History server at rm-ia.s3s.altiscale.com/10.251.255.108:10200
2017-08-30 14:25:27,839 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Requesting a new application from cluster with 4 NodeManagers
2017-08-30 14:25:27,862 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Verifying our application has not requested more than the maximum memory capability of the cluster (40960 MB per container)
2017-08-30 14:25:27,863 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Will allocate AM container, with 896 MB memory including 384 MB overhead
2017-08-30 14:25:27,864 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Setting up container launch context for our AM
2017-08-30 14:25:27,867 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Setting up the launch environment for our AM container
2017-08-30 14:25:27,878 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Preparing resources for our AM container
2017-08-30 14:25:28,796 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Uploading resource file:/opt/alti-spark-1.6.1/assembly/target/scala-2.10/spark-assembly-1.6.1-hadoop2.7.1.jar -> hdfs://nn-ia.s3s.altiscale.com:8020/user/ruebot/.sparkStaging/application_1503012128389_0031/spark-assembly-1.6.1-hadoop2.7.1.jar
2017-08-30 14:25:32,349 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Source and destination file systems are the same. Not copying hdfs:/user/ruebot/apps/hive-1.2.1-lib.zip#hive
2017-08-30 14:25:32,445 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Uploading resource file:/mnt/ephemeral0/aut_test/tmp/spark-e2a0da6c-8134-47e1-ba4e-a20e7810e930/__spark_conf__2462245065137756904.zip -> hdfs://nn-ia.s3s.altiscale.com:8020/user/ruebot/.sparkStaging/application_1503012128389_0031/__spark_conf__2462245065137756904.zip
2017-08-30 14:25:32,937 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing view acls to: ruebot
2017-08-30 14:25:32,938 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - Changing modify acls to: ruebot
2017-08-30 14:25:32,939 INFO  org.apache.spark.SecurityManager (Logging.scala:logInfo(58)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ruebot); users with modify permissions: Set(ruebot)
2017-08-30 14:25:32,956 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Submitting application 31 to ResourceManager
2017-08-30 14:25:33,004 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1503012128389_0031
2017-08-30 14:25:34,012 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:34,015 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: research
	 start time: 1504103132979
	 final status: UNDEFINED
	 tracking URL: http://rm-ia.s3s.altiscale.com:8088/proxy/application_1503012128389_0031/
	 user: ruebot
2017-08-30 14:25:35,019 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:36,023 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:37,027 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:38,031 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:39,035 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:40,039 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: ACCEPTED)
2017-08-30 14:25:40,485 INFO  org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint (Logging.scala:logInfo(58)) - ApplicationMaster registered as NettyRpcEndpointRef(null)
2017-08-30 14:25:40,498 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> rm-ia.s3s.altiscale.com, PROXY_URI_BASES -> http://rm-ia.s3s.altiscale.com:8088/proxy/application_1503012128389_0031), /proxy/application_1503012128389_0031
2017-08-30 14:25:40,500 INFO  org.apache.spark.ui.JettyUtils (Logging.scala:logInfo(58)) - Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2017-08-30 14:25:41,043 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - Application report for application_1503012128389_0031 (state: RUNNING)
2017-08-30 14:25:41,044 INFO  org.apache.spark.deploy.yarn.Client (Logging.scala:logInfo(58)) - 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.251.240.185
	 ApplicationMaster RPC port: 0
	 queue: research
	 start time: 1504103132979
	 final status: UNDEFINED
	 tracking URL: http://rm-ia.s3s.altiscale.com:8088/proxy/application_1503012128389_0031/
	 user: ruebot
2017-08-30 14:25:41,045 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Application application_1503012128389_0031 has started running.
2017-08-30 14:25:41,052 INFO  org.apache.spark.util.Utils (Logging.scala:logInfo(58)) - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45300.
2017-08-30 14:25:41,052 INFO  org.apache.spark.network.netty.NettyBlockTransferService (Logging.scala:logInfo(58)) - Server created on 45300
2017-08-30 14:25:41,054 INFO  org.apache.spark.storage.BlockManagerMaster (Logging.scala:logInfo(58)) - Trying to register BlockManager
2017-08-30 14:25:41,057 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 10.252.18.87:45300 with 7.0 GB RAM, BlockManagerId(driver, 10.252.18.87, 45300)
2017-08-30 14:25:41,060 INFO  org.apache.spark.storage.BlockManagerMaster (Logging.scala:logInfo(58)) - Registered BlockManager
2017-08-30 14:25:41,480 INFO  org.apache.spark.scheduler.EventLoggingListener (Logging.scala:logInfo(58)) - Logging events to hdfs:///logs/spark-history/ruebot/application_1503012128389_0031
2017-08-30 14:25:44,899 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (205-08-c02.sc1.altiscale.com:47278) with ID 3
2017-08-30 14:25:44,944 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 205-08-c02.sc1.altiscale.com:45300 with 14.2 GB RAM, BlockManagerId(3, 205-08-c02.sc1.altiscale.com, 45300)
2017-08-30 14:25:45,619 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (205-08-c02.sc1.altiscale.com:47280) with ID 7
2017-08-30 14:25:45,722 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 205-08-c02.sc1.altiscale.com:45301 with 14.2 GB RAM, BlockManagerId(7, 205-08-c02.sc1.altiscale.com, 45301)
2017-08-30 14:25:47,341 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (205-08-c02.sc1.altiscale.com:47287) with ID 11
2017-08-30 14:25:47,384 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 205-08-c02.sc1.altiscale.com:45302 with 14.2 GB RAM, BlockManagerId(11, 205-08-c02.sc1.altiscale.com, 45302)
2017-08-30 14:25:47,760 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (107-17-c02.sc1.altiscale.com:51613) with ID 13
2017-08-30 14:25:47,766 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-05-c02.sc1.altiscale.com:36727) with ID 10
2017-08-30 14:25:47,796 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 107-17-c02.sc1.altiscale.com:45300 with 14.2 GB RAM, BlockManagerId(13, 107-17-c02.sc1.altiscale.com, 45300)
2017-08-30 14:25:47,804 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-05-c02.sc1.altiscale.com:45300 with 14.2 GB RAM, BlockManagerId(10, 108-05-c02.sc1.altiscale.com, 45300)
2017-08-30 14:25:47,831 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-05-c02.sc1.altiscale.com:36728) with ID 2
2017-08-30 14:25:47,872 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-05-c02.sc1.altiscale.com:45301 with 14.2 GB RAM, BlockManagerId(2, 108-05-c02.sc1.altiscale.com, 45301)
2017-08-30 14:25:47,891 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (107-17-c02.sc1.altiscale.com:51614) with ID 5
2017-08-30 14:25:47,899 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-47-c02.sc1.altiscale.com:39331) with ID 12
2017-08-30 14:25:47,929 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 107-17-c02.sc1.altiscale.com:45301 with 14.2 GB RAM, BlockManagerId(5, 107-17-c02.sc1.altiscale.com, 45301)
2017-08-30 14:25:47,930 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-05-c02.sc1.altiscale.com:36729) with ID 6
2017-08-30 14:25:47,933 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-47-c02.sc1.altiscale.com:45300 with 14.2 GB RAM, BlockManagerId(12, 108-47-c02.sc1.altiscale.com, 45300)
2017-08-30 14:25:47,970 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-05-c02.sc1.altiscale.com:45302 with 14.2 GB RAM, BlockManagerId(6, 108-05-c02.sc1.altiscale.com, 45302)
2017-08-30 14:25:47,988 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (107-17-c02.sc1.altiscale.com:51615) with ID 9
2017-08-30 14:25:47,991 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (107-17-c02.sc1.altiscale.com:51616) with ID 1
2017-08-30 14:25:48,011 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-47-c02.sc1.altiscale.com:39333) with ID 4
2017-08-30 14:25:48,028 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 107-17-c02.sc1.altiscale.com:45302 with 14.2 GB RAM, BlockManagerId(9, 107-17-c02.sc1.altiscale.com, 45302)
2017-08-30 14:25:48,032 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 107-17-c02.sc1.altiscale.com:45303 with 14.2 GB RAM, BlockManagerId(1, 107-17-c02.sc1.altiscale.com, 45303)
2017-08-30 14:25:48,055 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-47-c02.sc1.altiscale.com:45301 with 14.2 GB RAM, BlockManagerId(4, 108-47-c02.sc1.altiscale.com, 45301)
2017-08-30 14:25:48,131 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-47-c02.sc1.altiscale.com:39335) with ID 8
2017-08-30 14:25:48,170 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-47-c02.sc1.altiscale.com:45302 with 14.2 GB RAM, BlockManagerId(8, 108-47-c02.sc1.altiscale.com, 45302)
2017-08-30 14:25:50,292 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (107-17-c02.sc1.altiscale.com:51618) with ID 17
2017-08-30 14:25:50,300 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-05-c02.sc1.altiscale.com:36732) with ID 18
2017-08-30 14:25:50,336 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 107-17-c02.sc1.altiscale.com:45304 with 14.2 GB RAM, BlockManagerId(17, 107-17-c02.sc1.altiscale.com, 45304)
2017-08-30 14:25:50,340 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-05-c02.sc1.altiscale.com:45303 with 14.2 GB RAM, BlockManagerId(18, 108-05-c02.sc1.altiscale.com, 45303)
2017-08-30 14:25:50,394 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-05-c02.sc1.altiscale.com:36733) with ID 14
2017-08-30 14:25:50,419 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (108-47-c02.sc1.altiscale.com:39355) with ID 16
2017-08-30 14:25:50,442 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-05-c02.sc1.altiscale.com:45304 with 14.2 GB RAM, BlockManagerId(14, 108-05-c02.sc1.altiscale.com, 45304)
2017-08-30 14:25:50,463 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 108-47-c02.sc1.altiscale.com:45303 with 14.2 GB RAM, BlockManagerId(16, 108-47-c02.sc1.altiscale.com, 45303)
2017-08-30 14:25:50,762 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - Registered executor NettyRpcEndpointRef(null) (205-08-c02.sc1.altiscale.com:47289) with ID 15
2017-08-30 14:25:50,828 INFO  org.apache.spark.storage.BlockManagerMasterEndpoint (Logging.scala:logInfo(58)) - Registering block manager 205-08-c02.sc1.altiscale.com:45303 with 14.2 GB RAM, BlockManagerId(15, 205-08-c02.sc1.altiscale.com, 45303)
2017-08-30 14:25:55,717 INFO  org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend (Logging.scala:logInfo(58)) - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
2017-08-30 14:25:55,721 INFO  org.apache.spark.repl.SparkILoop (Logging.scala:logInfo(58)) - Created spark context..
Spark context available as sc.
2017-08-30 14:25:57,233 INFO  org.apache.spark.sql.hive.HiveContext (Logging.scala:logInfo(58)) - Initializing execution hive, version 1.2.1
2017-08-30 14:25:57,421 INFO  org.apache.spark.sql.hive.client.ClientWrapper (Logging.scala:logInfo(58)) - Inspected Hadoop version: 2.7.3-altiscale
2017-08-30 14:25:57,427 INFO  org.apache.spark.sql.hive.client.ClientWrapper (Logging.scala:logInfo(58)) - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3-altiscale
Hive history file=/tmp/ruebot/hive_job_log_e060010e-b170-4ee7-a5d3-44561cf9b033_459411700.txt
2017-08-30 14:25:58,456 INFO  hive.ql.exec.HiveHistoryImpl (SessionState.java:printInfo(951)) - Hive history file=/tmp/ruebot/hive_job_log_e060010e-b170-4ee7-a5d3-44561cf9b033_459411700.txt
2017-08-30 14:25:58,761 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(589)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2017-08-30 14:25:58,838 INFO  org.apache.hadoop.hive.metastore.ObjectStore (ObjectStore.java:initialize(289)) - ObjectStore, initialize called
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.dataDictionary in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.timerJ6 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.lockManagerJ1 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.timerJ1 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.classes.dvfJ2 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.javaCompiler in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.replication.slave in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.jdk.rawStore.transactionJ6 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.ef in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.jdk.rawStore.transactionJ1 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.database in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.NoneAuthentication in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.netServer.autoStart in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.dvfJ2 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.mgmt.null in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.nativeAuthentication in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.jdk.lockManagerJ6 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.replication.master in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.dvfCDC in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.access.btree in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.jdk.lockManagerJ1 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.uuidJ1 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.module.cryptographyJ2 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] Ignored duplicate property derby.env.jdk.rawStore.data.genericJ4 in jar:file:/opt/hive-1.2.1/lib/hive-jdbc-1.2.1-standalone.jar!/org/apache/derby/modules.properties
[… ~55 further "Ignored duplicate property derby.*" warnings from hive-jdbc-1.2.1-standalone.jar elided …]
Wed Aug 30 14:26:01 UTC 2017 Thread[main,5,main] 
----------------------------------------------------------------
Wed Aug 30 14:26:01 UTC 2017:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.10.2.0 - (1582446): instance a816c00e-015e-3388-dd4c-000217ef65a0 
on database directory /tmp/spark-a1e955ad-d5e9-4c3e-bb73-4339167a68ec/metastore with class loader sun.misc.Launcher$AppClassLoader@5c647e05 
Loaded from file:/opt/hive-1.2.1/lib/derby-10.10.2.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.8.0_102-b14
user.dir=/opt/alti-spark-1.6.1
os.name=Linux
os.arch=amd64
os.version=4.9.34-301.alti6.x86_64
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
2017-08-30 14:26:04,731 INFO  org.apache.hadoop.hive.metastore.ObjectStore (ObjectStore.java:getPMF(370)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2017-08-30 14:26:12,373 INFO  org.apache.hadoop.hive.metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(139)) - Using direct SQL, underlying DB is DERBY
2017-08-30 14:26:12,387 INFO  org.apache.hadoop.hive.metastore.ObjectStore (ObjectStore.java:setConf(272)) - Initialized ObjectStore
2017-08-30 14:26:12,837 WARN  org.apache.hadoop.hive.metastore.ObjectStore (ObjectStore.java:checkSchema(6666)) - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
2017-08-30 14:26:13,268 WARN  org.apache.hadoop.hive.metastore.ObjectStore (ObjectStore.java:getDatabase(568)) - Failed to get database default, returning NoSuchObjectException
2017-08-30 14:26:13,654 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(663)) - Added admin role in metastore
2017-08-30 14:26:13,664 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(672)) - Added public role in metastore
2017-08-30 14:26:13,826 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers_core(712)) - No user is added in admin role, since config is empty
2017-08-30 14:26:14,212 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_all_databases
2017-08-30 14:26:14,218 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=ruebot	ip=unknown-ip-addr	cmd=get_all_databases	
2017-08-30 14:26:14,265 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_functions: db=default pat=*
2017-08-30 14:26:14,267 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=ruebot	ip=unknown-ip-addr	cmd=get_functions: db=default pat=*	
2017-08-30 14:26:14,734 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created local directory: /tmp/e060010e-b170-4ee7-a5d3-44561cf9b033_resources
2017-08-30 14:26:14,854 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created HDFS directory: /tmp/hive/ruebot/e060010e-b170-4ee7-a5d3-44561cf9b033
2017-08-30 14:26:14,859 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created local directory: /tmp/ruebot/e060010e-b170-4ee7-a5d3-44561cf9b033
2017-08-30 14:26:14,946 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created HDFS directory: /tmp/hive/ruebot/e060010e-b170-4ee7-a5d3-44561cf9b033/_tmp_space.db
2017-08-30 14:26:15,135 INFO  org.apache.spark.sql.hive.HiveContext (Logging.scala:logInfo(58)) - default warehouse location is /hive
2017-08-30 14:26:15,148 INFO  org.apache.spark.sql.hive.HiveContext (Logging.scala:logInfo(58)) - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2017-08-30 14:26:15,174 INFO  org.apache.spark.sql.hive.client.ClientWrapper (Logging.scala:logInfo(58)) - Inspected Hadoop version: 2.7.3-altiscale
2017-08-30 14:26:15,264 INFO  org.apache.spark.sql.hive.client.ClientWrapper (Logging.scala:logInfo(58)) - Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3-altiscale
Hive history file=/tmp/ruebot/hive_job_log_ff101c4c-4915-4904-9344-cfadb998c6ca_69186235.txt
2017-08-30 14:26:16,321 INFO  hive.ql.exec.HiveHistoryImpl (SessionState.java:printInfo(951)) - Hive history file=/tmp/ruebot/hive_job_log_ff101c4c-4915-4904-9344-cfadb998c6ca_69186235.txt
2017-08-30 14:26:16,370 INFO  hive.metastore (HiveMetaStoreClient.java:open(392)) - Trying to connect to metastore with URI thrift://hive-ia.s3s.altiscale.com:9083
2017-08-30 14:26:16,423 INFO  hive.metastore (HiveMetaStoreClient.java:open(502)) - Connected to metastore.
2017-08-30 14:26:16,650 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created local directory: /tmp/ff101c4c-4915-4904-9344-cfadb998c6ca_resources
2017-08-30 14:26:16,756 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created HDFS directory: /tmp/hive/ruebot/ff101c4c-4915-4904-9344-cfadb998c6ca
2017-08-30 14:26:16,761 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created local directory: /tmp/ruebot/ff101c4c-4915-4904-9344-cfadb998c6ca
2017-08-30 14:26:16,840 INFO  org.apache.hadoop.hive.ql.session.SessionState (SessionState.java:createPath(641)) - Created HDFS directory: /tmp/hive/ruebot/ff101c4c-4915-4904-9344-cfadb998c6ca/_tmp_space.db
2017-08-30 14:26:16,867 INFO  org.apache.spark.repl.SparkILoop (Logging.scala:logInfo(58)) - Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._

val r = RecordLoader.loadArchives("/user/ruebot/aut/ARCHIVEIT-2014-SEMIANNUAL-7285-20150424172507144-00000-wbgrp-crawl061.us.archive.org-6444.warc.gz", sc)
  .keepValidPages()
  .map(r => ExtractDomain(r.getUrl))
  .countItems()
  .take(10)

// Exiting paste mode, now interpreting.

2017-08-30 14:29:40,674 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_0 stored as values in memory (estimated size 317.1 KB, free 317.1 KB)
2017-08-30 14:29:40,796 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.9 KB, free 342.0 KB)
2017-08-30 14:29:40,798 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_0_piece0 in memory on 10.252.18.87:45300 (size: 24.9 KB, free: 7.0 GB)
2017-08-30 14:29:40,801 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Created broadcast 0 from newAPIHadoopFile at RecordLoader.scala:42
2017-08-30 14:29:40,930 INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2017-08-30 14:29:41,032 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Starting job: take at <console>:34
2017-08-30 14:29:41,055 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Registering RDD 5 (map at RecordRDD.scala:38)
2017-08-30 14:29:41,056 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Registering RDD 7 (sortBy at RecordRDD.scala:40)
2017-08-30 14:29:41,059 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Got job 0 (take at <console>:34) with 1 output partitions
2017-08-30 14:29:41,060 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Final stage: ResultStage 2 (take at <console>:34)
2017-08-30 14:29:41,061 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Parents of final stage: List(ShuffleMapStage 1)
2017-08-30 14:29:41,062 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Missing parents: List(ShuffleMapStage 1)
2017-08-30 14:29:41,072 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at map at RecordRDD.scala:38), which has no missing parents
2017-08-30 14:29:41,110 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 346.0 KB)
2017-08-30 14:29:41,115 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 348.2 KB)
2017-08-30 14:29:41,115 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_1_piece0 in memory on 10.252.18.87:45300 (size: 2.2 KB, free: 7.0 GB)
2017-08-30 14:29:41,116 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Created broadcast 1 from broadcast at DAGScheduler.scala:1006
2017-08-30 14:29:41,127 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at map at RecordRDD.scala:38)
2017-08-30 14:29:41,129 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Adding task set 0.0 with 1 tasks
2017-08-30 14:29:41,177 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Starting task 0.0 in stage 0.0 (TID 0, 108-47-c02.sc1.altiscale.com, partition 0,RACK_LOCAL, 2354 bytes)
2017-08-30 14:29:44,037 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_1_piece0 in memory on 108-47-c02.sc1.altiscale.com:45303 (size: 2.2 KB, free: 14.2 GB)
2017-08-30 14:29:44,276 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_0_piece0 in memory on 108-47-c02.sc1.altiscale.com:45303 (size: 24.9 KB, free: 14.2 GB)
2017-08-30 14:30:38,833 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Finished task 0.0 in stage 0.0 (TID 0) in 57675 ms on 108-47-c02.sc1.altiscale.com (1/1)
2017-08-30 14:30:38,835 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Removed TaskSet 0.0, whose tasks have all completed, from pool 
2017-08-30 14:30:38,841 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - ShuffleMapStage 0 (map at RecordRDD.scala:38) finished in 57.687 s
2017-08-30 14:30:38,846 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - looking for newly runnable stages
2017-08-30 14:30:38,847 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - running: Set()
2017-08-30 14:30:38,847 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - waiting: Set(ShuffleMapStage 1, ResultStage 2)
2017-08-30 14:30:38,848 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - failed: Set()
2017-08-30 14:30:38,853 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting ShuffleMapStage 1 (MapPartitionsRDD[7] at sortBy at RecordRDD.scala:40), which has no missing parents
2017-08-30 14:30:38,860 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_2 stored as values in memory (estimated size 3.6 KB, free 351.8 KB)
2017-08-30 14:30:38,861 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.0 KB, free 353.8 KB)
2017-08-30 14:30:38,867 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_2_piece0 in memory on 10.252.18.87:45300 (size: 2.0 KB, free: 7.0 GB)
2017-08-30 14:30:38,869 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Created broadcast 2 from broadcast at DAGScheduler.scala:1006
2017-08-30 14:30:38,873 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[7] at sortBy at RecordRDD.scala:40)
2017-08-30 14:30:38,874 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Adding task set 1.0 with 1 tasks
2017-08-30 14:30:38,876 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Starting task 0.0 in stage 1.0 (TID 1, 108-47-c02.sc1.altiscale.com, partition 0,NODE_LOCAL, 1961 bytes)
2017-08-30 14:30:41,345 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_2_piece0 in memory on 108-47-c02.sc1.altiscale.com:45301 (size: 2.0 KB, free: 14.2 GB)
2017-08-30 14:30:41,925 INFO  org.apache.spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - Asked to send map output locations for shuffle 1 to 108-47-c02.sc1.altiscale.com:39333
2017-08-30 14:30:41,929 INFO  org.apache.spark.MapOutputTrackerMaster (Logging.scala:logInfo(58)) - Size of output statuses for shuffle 1 is 157 bytes
2017-08-30 14:30:42,127 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Finished task 0.0 in stage 1.0 (TID 1) in 3252 ms on 108-47-c02.sc1.altiscale.com (1/1)
2017-08-30 14:30:42,128 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Removed TaskSet 1.0, whose tasks have all completed, from pool 
2017-08-30 14:30:42,128 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - ShuffleMapStage 1 (sortBy at RecordRDD.scala:40) finished in 3.254 s
2017-08-30 14:30:42,130 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - looking for newly runnable stages
2017-08-30 14:30:42,130 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - running: Set()
2017-08-30 14:30:42,131 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - waiting: Set(ResultStage 2)
2017-08-30 14:30:42,131 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - failed: Set()
2017-08-30 14:30:42,131 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting ResultStage 2 (MapPartitionsRDD[9] at sortBy at RecordRDD.scala:40), which has no missing parents
2017-08-30 14:30:42,134 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_3 stored as values in memory (estimated size 3.3 KB, free 357.1 KB)
2017-08-30 14:30:42,139 INFO  org.apache.spark.storage.MemoryStore (Logging.scala:logInfo(58)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 1953.0 B, free 359.0 KB)
2017-08-30 14:30:42,139 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_3_piece0 in memory on 10.252.18.87:45300 (size: 1953.0 B, free: 7.0 GB)
2017-08-30 14:30:42,140 INFO  org.apache.spark.SparkContext (Logging.scala:logInfo(58)) - Created broadcast 3 from broadcast at DAGScheduler.scala:1006
2017-08-30 14:30:42,141 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at sortBy at RecordRDD.scala:40)
2017-08-30 14:30:42,141 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Adding task set 2.0 with 1 tasks
2017-08-30 14:30:42,146 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Starting task 0.0 in stage 2.0 (TID 2, 108-47-c02.sc1.altiscale.com, partition 0,NODE_LOCAL, 1972 bytes)
2017-08-30 14:30:44,654 INFO  org.apache.spark.storage.BlockManagerInfo (Logging.scala:logInfo(58)) - Added broadcast_3_piece0 in memory on 108-47-c02.sc1.altiscale.com:45302 (size: 1953.0 B, free: 14.2 GB)
2017-08-30 14:30:44,803 INFO  org.apache.spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - Asked to send map output locations for shuffle 0 to 108-47-c02.sc1.altiscale.com:39335
2017-08-30 14:30:44,804 INFO  org.apache.spark.MapOutputTrackerMaster (Logging.scala:logInfo(58)) - Size of output statuses for shuffle 0 is 156 bytes
2017-08-30 14:30:44,963 INFO  org.apache.spark.scheduler.TaskSetManager (Logging.scala:logInfo(58)) - Finished task 0.0 in stage 2.0 (TID 2) in 2821 ms on 108-47-c02.sc1.altiscale.com (1/1)
2017-08-30 14:30:44,963 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - ResultStage 2 (take at <console>:34) finished in 2.822 s
2017-08-30 14:30:44,964 INFO  org.apache.spark.scheduler.cluster.YarnScheduler (Logging.scala:logInfo(58)) - Removed TaskSet 2.0, whose tasks have all completed, from pool 
2017-08-30 14:30:44,972 INFO  org.apache.spark.scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Job 0 finished: take at <console>:34, took 63.939758 s
import io.archivesunleashed.spark.matchbox._
import io.archivesunleashed.spark.rdd.RecordRDD._
r: Array[(String, Int)] = Array((education.alberta.ca,1002), (www.youtube.com,93), (www.pdfonline.com,56), (www.easypdfcloud.com,24), (studyinalberta.ca,16), (pdftoword4.pdfonline.com,9), (m.pdfonline.com,9), (ideas.education.alberta.ca,8), (pdftoword3.pdfonline.com,7), (pdftoword2.pdfonline.com,7))

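The `countItems` pattern behind the result above — group identical items, count each group, sort by count descending, take the top n — can be sketched in plain Scala collections. This is a hypothetical, Spark-free illustration of the idea only; the real `countItems` in aut operates on RDDs:

```scala
// Minimal sketch of the group/count/sort pattern, assuming no Spark:
// the aut library's countItems() does the equivalent over an RDD.
object CountItemsSketch {
  def countItems[T](items: Seq[T]): Seq[(T, Int)] =
    items
      .groupBy(identity)                    // bucket identical items together
      .map { case (k, v) => (k, v.size) }   // count each bucket
      .toSeq
      .sortBy { case (_, n) => -n }         // most frequent first

  def main(args: Array[String]): Unit = {
    val domains = Seq(
      "education.alberta.ca", "www.youtube.com",
      "education.alberta.ca", "education.alberta.ca")
    println(countItems(domains).take(2))
  }
}
```

Applied to the extracted domains of the WARC above, this ordering is why `education.alberta.ca` (1002 pages) leads the array.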
@ruebot ruebot changed the title from "Resolve #9; Update Hadoop and Spark versions. (DO NOT MERGE YET)" to "Resolve #9; Update Hadoop and Spark versions." Aug 30, 2017
@ruebot
ruebot commented Aug 30, 2017

@ianmilligan1 or @lintool good to merge on this one now.

@ianmilligan1 ianmilligan1 merged commit 67e35b9 into master Aug 30, 2017
@ianmilligan1 ianmilligan1 deleted the issue-9 branch August 30, 2017 14:33