Running fitRandomForest with a small input data sample results in an exception (M/R terminates) #4

Closed
andrewmilkowski opened this Issue · 7 comments

3 participants

@andrewmilkowski

The second issue appears when the input data sample is reduced (the example below uses only 20 rows from the overall training set):

transactions <- read.table(file="../downloads/train.csv",
#nrows=1000,
nrows=20,

Running fitRandomForest then terminates with the following exception:

Loading required package: randomForest
randomForest 4.6-7
Type rfNews() to see new features/changes/bug fixes.
Loading required package: rmr2
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: methods
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: stringr
Loading required package: plyr
Loading required package: reshape2
Dotted pair list of 12
$ : language (function() { load("./rmr-local-envaaeb61a5a326") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[1L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(lx)
lx
int 1
Dotted pair list of 12
$ : language (function() { load("./rmr-local-envaaeb61a5a326") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[1L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(ly)
ly
int 0
Error in rmr.recycle(k, v) : Can't recycle 0-length argument
Calls: ... c.keyval -> f.single -> lapply -> FUN -> keyval -> rmr.recycle
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)

@laserson
Owner

It's not clear to me where the read.table call is coming in, as fitRandomForest.R only consumes data from Hadoop. Perhaps some map tasks are somehow calling keyval with no data?

@laserson
Owner

I would also cross-post on the rmr repo, as it appears the error is generated in an rmr function.

@andrewmilkowski

Will do. I believe you are right in this particular test case scenario.

@piccolbo

An rmr.str(v) at the beginning of the map function would clarify the issue. It seems Uri's interpretation is correct, but it raises the question of why that happens.

@andrewmilkowski

Antonio,

Let me transfer this comment and further discussion to the rmr2 ticket area (RevolutionAnalytics/rmr2#69), so as to isolate the issue to the correct component, for now...

I have added the proposed debug statement at the beginning of the mapper function:

MAP function

poisson.subsample <- function(k, input) {
rmr.str(input)
# this function is used to generate a sample from the current block of data

The following is the output in the stderr logs:

v), values(kv))
$ : language rmr.str(input)
input
'data.frame': 10 obs. of 74 variables:
$ SalePrice : num 26500 9500 19000 11500 65000 24000 38500 13500 21500 36000
$ ModelID.x : Factor w/ 9 levels "21442","2232",..: 8 1 7 3 6 8 4 9 5 2
$ datasource : Factor w/ 1 level "121": 1 1 1 1 1 1 1 1 1 1
$ auctioneerID : Factor w/ 1 level "3": 1 1 1 1 1 1 1 1 1 1
$ YearMade : num 2004 2003 1999 1991 1000 ...
$ MachineHoursCurrentMeter: num 508 0 2450 8005 20700 ...
$ UsageBand : Factor w/ 3 levels "High","Low","Medium": 2 NA 3 3 3 3 1 2 2 NA
$ saledate : Factor w/ 10 levels "2005-10-20","2005-11-17",..: 7 9 3 2 5 6 10 4 8 1
$ fiModelDesc.x : Factor w/ 9 levels "310E","310G",..: 2 5 1 6 7 2 8 3 4 9
$ fiBaseModel.x : Factor w/ 8 levels "310","334","430",..: 1 4 1 5 6 1 7 2 3 8
$ fiSecondaryDesc.x : Factor w/ 6 levels "B","E","G","HAG",..: 3 5 2 6 1 3 NA NA 4 NA
$ fiModelSeries.x : Factor w/ 2 levels "-6E","LC": NA NA NA NA NA NA 1 NA NA 2
$ fiModelDescriptor.x : int NA NA NA NA NA NA NA NA NA 6
$ ProductSize : Factor w/ 4 levels "Large","Large / Medium",..: NA 3 NA NA 1 NA 4 3 3 2
$ fiProductClassDesc.x : Factor w/ 6 levels "Backhoe Loader - 14.0 to 15.0 Ft Standard Digging Depth",..: 1 5 1 1 6 1 2 4 4 3
$ state : Factor w/ 8 levels "Arizona","Arkansas",..: 1 8 2 4 3 6 7 3 7 5
$ ProductGroup.x : Factor w/ 3 levels "BL","TEX","WL": 1 2 1 1 3 1 2 2 2 2
$ ProductGroupDesc.x : Factor w/ 3 levels "Backhoe Loaders",..: 1 2 1 1 3 1 2 2 2 2
$ Drive_System : Factor w/ 2 levels "Four Wheel Drive",..: 1 NA 2 2 NA 1 NA NA NA NA
$ Enclosure : Factor w/ 3 levels "EROPS","EROPS w AC",..: 3 1 3 1 2 3 2 1 1 1
$ Forks : logi NA NA NA NA NA NA ...
$ Pad_Type : Factor w/ 1 level "Street": NA NA NA NA NA 1 NA NA NA NA
$ Ride_Control : Factor w/ 1 level "No": 1 NA 1 1 NA 1 NA NA NA NA
$ Stick : Factor w/ 2 levels "Extended","Standard": 1 NA 2 2 NA 2 NA NA NA NA
$ Transmission : Factor w/ 2 levels "Powershuttle",..: 1 NA 2 2 NA 2 NA NA NA NA
$ Turbocharged : logi NA NA NA NA NA NA ...
$ Blade_Extension : logi NA NA NA NA NA NA ...
$ Blade_Width : logi NA NA NA NA NA NA ...
$ Enclosure_Type : logi NA NA NA NA NA NA ...
$ Engine_Horsepower : logi NA NA NA NA NA NA ...
$ Hydraulics : Factor w/ 2 levels "2 Valve","Auxiliary": NA 2 NA NA 1 NA 1 2 2 2
$ Pushblock : logi NA NA NA NA NA NA ...
$ Ripper : logi NA NA NA NA NA NA ...
$ Scarifier : logi NA NA NA NA NA NA ...
$ Tip_Control : logi NA NA NA NA NA NA ...
$ Tire_Size : logi NA NA NA NA NA NA ...
$ Coupler : Factor w/ 1 level "Manual": NA NA NA NA NA NA NA NA 1 NA
$ Coupler_System : logi NA NA NA NA NA NA ...
$ Grouser_Tracks : logi NA NA NA NA NA NA ...
$ Hydraulics_Flow : logi NA NA NA NA NA NA ...
$ Track_Type : Factor w/ 2 levels "Rubber","Steel": NA 2 NA NA NA NA NA 1 1 2
$ Undercarriage_Pad_Width : int NA 16 NA NA NA NA NA NA NA NA
$ Stick_Length : num NA NA NA NA NA NA NA NA NA 132
$ Thumb : logi NA NA NA NA NA NA ...
$ Pattern_Changer : logi NA NA NA NA NA NA ...
$ Grouser_Type : Factor w/ 1 level "Double": NA 1 NA NA NA NA NA 1 1 1
$ Backhoe_Mounting : logi NA NA NA NA NA NA ...
$ Blade_Type : logi NA NA NA NA NA NA ...
$ Travel_Controls : logi NA NA NA NA NA NA ...
$ Differential_Type : Factor w/ 1 level "Standard": NA NA NA NA 1 NA NA NA NA NA
$ Steering_Controls : Factor w/ 1 level "Conventional": NA NA NA NA 1 NA NA NA NA NA
$ saledatenumeric : num 14231 14637 13468 13104 13734 ...
$ ageAtSale : num 1539 2311 2603 5161 367746 ...
$ saleYear : num 2008 2010 2006 2005 2007 ...
$ saleMonth : Factor w/ 7 levels "August","December",..: 2 3 6 6 1 1 5 4 1 7
$ saleDay : Factor w/ 10 levels "09","14","16",..: 5 10 3 4 1 8 6 2 9 7
$ saleWeekday : Factor w/ 1 level "Thursday": 1 1 1 1 1 1 1 1 1 1
$ MedianModelPrice : int 25250 9500 19000 11500 65000 25250 38500 13500 21500 36000
$ ModelCount : num 2 1 1 1 1 2 1 1 1 1
$ ModelID.y : Factor w/ 9 levels "16705","21442",..: 8 2 7 4 6 8 1 9 5 3
$ fiModelDesc.y : Factor w/ 9 levels "310E","310G",..: 2 5 1 6 7 2 9 3 4 8
$ fiBaseModel.y : Factor w/ 8 levels "310","334","430",..: 1 4 1 5 6 1 8 2 3 7
$ fiSecondaryDesc.y : Factor w/ 6 levels "B","E","G","LC",..: 3 5 2 6 1 3 NA NA NA 4
$ fiModelSeries.y : int NA NA NA NA NA NA -6 NA NA 6
$ fiModelDescriptor.y : Factor w/ 1 level "LK": NA NA NA NA NA NA NA NA NA 1
$ fiProductClassDesc.y : Factor w/ 6 levels "Backhoe Loader - 14.0 to 15.0 Ft Standard Digging Depth",..: 1 3 1 1 6 1 5 2 2 4
$ ProductGroup.y : Factor w/ 3 levels "BL","TEX","WL": 1 2 1 1 3 1 3 2 2 2
$ ProductGroupDesc.y : Factor w/ 3 levels "Backhoe Loaders",..: 1 2 1 1 3 1 3 2 2 2
$ MfgYear : num 2004 2003 1999 1991 1987 ...
$ fiManufacturerID : Factor w/ 6 levels "103","121","25",..: 6 4 6 3 5 6 1 2 2 1
$ fiManufacturerDesc : Factor w/ 6 levels "Bobcat","Case",..: 5 4 5 2 3 5 6 1 1 6
$ PrimarySizeBasis : Factor w/ 3 levels "Horsepower","Standard Digging Depth - Ft",..: 2 3 2 2 1 2 1 3 3 3
$ PrimaryLower : int 14 4 14 14 350 14 225 3 3 40
$ PrimaryUpper : int 15 5 15 15 500 15 250 4 4 50
Dotted pair list of 12
$ : language (function() { load("./rmr-local-env9432cc02004") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[4L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(lx)
lx
int 1
Dotted pair list of 12
$ : language (function() { load("./rmr-local-env9432cc02004") ...
$ : language rmr2:::map.loop(map = map, keyval.reader = input.reader(), keyval.writer = if (is.null(reduce)) { output.writer() ...
$ : language as.keyval(map(keys(kv), values(kv)))
$ : language is.keyval(x)
$ : language map(keys(kv), values(kv))
$ : language c.keyval(lapply(1:num.models, generate.sample))
$ : language f.single(args[[1]])
$ : language lapply(kvs, recycle.keyval)
$ : language FUN(X[[4L]], ...)
$ : language keyval(rmr.recycle(k, v), rmr.recycle(v, k))
$ : language rmr.recycle(k, v)
$ : language rmr.str(ly)
ly
int 0
Error in rmr.recycle(k, v) : Can't recycle 0-length argument
Calls: ... c.keyval -> f.single -> lapply -> FUN -> keyval -> rmr.recycle
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
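
One possible reading of the two logs, offered as a guess rather than a diagnosis: the mapper receives a block of only 10 rows, and the failing call is FUN(X[[1L]], ...) in the first run but FUN(X[[4L]], ...) in the second, which would be consistent with a random subsample that sometimes selects no rows at all. The sketch below uses base R only and assumes that generate.sample draws one Poisson count per row and repeats each row that many times; the function sample.indices and the rate lambda = 0.05 are hypothetical and do not come from the thread.

```r
# Base R only; neither rmr2 nor Hadoop is needed to run this.
set.seed(1)

n.rows <- 10    # rows in the block, as in the str() output above
lambda <- 0.05  # ASSUMED per-row sampling rate; not taken from the thread

# One Poisson subsample: row i is repeated draws[i] times.
# (Hypothetical helper; the body of generate.sample is not shown in the thread.)
sample.indices <- function(n, lambda) {
  draws <- rpois(n = n, lambda = lambda)
  rep(seq_len(n), draws)
}

# Probability that every draw is zero, i.e. the sample has no rows.
# The sum of n independent Poisson(lambda) draws is Poisson(n * lambda).
p.empty <- exp(-n.rows * lambda)

# Empirical check over many simulated samples.
sizes <- replicate(10000, length(sample.indices(n.rows, lambda)))
empirical <- mean(sizes == 0)

cat("analytic  P(empty sample) =", round(p.empty, 3), "\n")
cat("simulated P(empty sample) =", round(empirical, 3), "\n")
```

Under these assumptions the chance of an empty sample shrinks exponentially with the block size, which would explain why the job only fails once the input is cut down to a handful of rows.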

@andrewmilkowski

@laserson

Sorry, I confused you a bit. The lines

transactions <- read.table(file="../downloads/train.csv",
#nrows=1000,
nrows=20,

come from joinData.R; that is how I reduced the number of samples passed to fitRandomForest.R.

The problem is inside rmr2, as seen in the trace above: Error in rmr.recycle(k, v) : Can't recycle 0-length argument
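
If the cause is indeed an empty sample reaching keyval, one workaround on the script side might be to skip such samples before a key-value pair is built. The sketch below is only an illustration in plain R: generate.sample.guarded, collect.samples and fake.emit are hypothetical names, and emit stands in for whatever builds the pair in the real job (rmr2's keyval), so nothing here depends on rmr2 itself.

```r
# Hypothetical guarded sampler. `emit` stands in for the function that builds
# a key-value pair in the real job; it is passed in so this runs in plain R.
generate.sample.guarded <- function(i, input, lambda, emit) {
  draws <- rpois(n = nrow(input), lambda = lambda)
  indices <- rep(seq_len(nrow(input)), draws)
  if (length(indices) == 0) {
    return(NULL)  # nothing sampled: skip instead of emitting zero rows
  }
  emit(i, input[indices, , drop = FALSE])
}

# Run the sampler once per model and keep only the non-empty samples.
collect.samples <- function(num.models, input, lambda, emit) {
  samples <- lapply(seq_len(num.models), generate.sample.guarded,
                    input = input, lambda = lambda, emit = emit)
  Filter(Negate(is.null), samples)
}

# Usage with a stand-in emitter and toy data (values are illustrative).
set.seed(42)
toy <- data.frame(SalePrice = c(26500, 9500, 19000),
                  YearMade  = c(2004, 2003, 1999))
fake.emit <- function(k, v) list(key = k, val = v)
kept <- collect.samples(num.models = 20, input = toy,
                        lambda = 0.1, emit = fake.emit)
cat("kept", length(kept), "of 20 samples\n")
```

Whether the combining step in rmr2 accepts the case where every sample was skipped is not established in this thread, so that case would need to be checked separately.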

@laserson
Owner

Issue moved to rmr repo.

@laserson laserson closed this