By default, Pig uses only one reduce task. As a result, if the user applies a DISTINCT followed by a GROUP BY, the parallelism of the Pig stream is funnelled into a single task.
The workaround is to use the PARALLEL keyword (though that requires user intervention), or potentially to derive the reducer count dynamically from the input file size so that this happens automatically (following Pig's naive InputSizeReducerEstimator).
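
To make the size-based option concrete, here is a minimal sketch of the kind of estimate an input-size-based approach could compute: divide the total input size by a bytes-per-reducer target, round up, and clamp the result. This is not Pig's actual code. The class name `ReducerEstimateSketch` and the method `estimateReducers` are hypothetical, and the two constants are assumptions meant to mirror the Pig properties `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max`.

```java
// Hypothetical sketch of an input-size-based reducer estimate.
// Names are illustrative only and are not part of Pig's API.
final class ReducerEstimateSketch {

    // Assumed defaults, intended to mirror pig.exec.reducers.bytes.per.reducer
    // and pig.exec.reducers.max; adjust to match the actual configuration.
    static final long BYTES_PER_REDUCER = 1_000_000_000L;
    static final int MAX_REDUCERS = 999;

    /**
     * Returns ceil(totalInputBytes / bytesPerReducer), clamped to [1, maxReducers].
     */
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Treat an unknown (negative) input size as empty input.
        long size = Math.max(0L, totalInputBytes);
        // Ceiling division without risking overflow on very large sizes.
        long needed = size / bytesPerReducer + (size % bytesPerReducer == 0 ? 0 : 1);
        // Always schedule at least one reducer, and never exceed the cap.
        return (int) Math.min((long) maxReducers, Math.max(1L, needed));
    }

    public static void main(String[] args) {
        long inputBytes = 2_500_000_000L; // e.g. 2.5 GB of input
        int reducers = estimateReducers(inputBytes, BYTES_PER_REDUCER, MAX_REDUCERS);
        System.out.println("Estimated reducers: " + reducers);
    }
}
```

With these assumed values, 2.5 GB of input yields three reducers, while empty input still gets one. An estimate based only on input size remains naive for a DISTINCT followed by a GROUP BY, since the data reaching the second stage may be far smaller than the original input.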