
Pydoop submit script fails #366

Closed
orwa-te opened this issue Feb 8, 2020 · 5 comments

Comments

@orwa-te

orwa-te commented Feb 8, 2020

I have tried to run the word count example linked here https://crs4.github.io/pydoop/tutorial/pydoop_script.html using pydoop script script.py hdfs_input hdfs_output, and it worked fine for me: I could see the results in HDFS. However, when I try to run the full-featured version of the program with "pydoop submit" as described here https://crs4.github.io/pydoop/tutorial/mapred_api.html#api-tutorial, using pydoop submit --upload-file-to-cache wc.py wc input output, it runs for a very long time without producing any response or result. The MapReduce job looks like it is stuck, and I always get something like this in the terminal:

2020-02-08 18:21:05,580 INFO mapreduce.Job: Job job_1581178676163_0001 running in uber mode : false
2020-02-08 18:21:05,583 INFO mapreduce.Job: map 0% reduce 0%
2020-02-08 18:31:34,480 INFO mapreduce.Job: Task Id : attempt_1581178676163_0001_m_000000_0, Status : FAILED
AttemptID:attempt_1581178676163_0001_m_000000_0 Timed out after 600 secs
^C[hdadmin@datanode3 pydoop]$

The MapReduce job fails when using "pydoop submit"!
What could cause the problem, and how can I solve it?
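
For context, the wc.py I am submitting is essentially the tutorial's MapReduce API word count; a minimal sketch of it (assuming the tutorial's Mapper/Reducer structure and the pydoop.mapreduce.pipes entry point, so the actual file may differ slightly) looks roughly like this:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):
    def map(self, context):
        # emit a (word, 1) pair for every token in the input line
        for word in context.value.split():
            context.emit(word, 1)

class Reducer(api.Reducer):
    def reduce(self, context):
        # sum all counts collected for this word
        context.emit(context.key, sum(context.values))

def __main__():
    # entry point invoked by the Hadoop pipes framework
    pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))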

@orwa-te orwa-te closed this as completed Feb 9, 2020
@orwa-te orwa-te reopened this Feb 9, 2020
@simleo
Member

simleo commented Feb 10, 2020

To see what went wrong you have to check the individual task logs. You can access them via the Hadoop web UI.
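
If the web UI is hard to reach, the aggregated logs for the whole application can usually also be pulled from the command line (assuming YARN log aggregation is enabled on your setup), for example:

yarn logs -applicationId application_1581178676163_0001

That dumps the stdout, stderr and syslog of every container, including the failed map attempt.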

@orwa-te
Author

orwa-te commented Feb 10, 2020

After trying multiple times, the console gives me these messages:

2020-02-10 23:22:03,628 INFO mapreduce.Job: map 0% reduce 0%
2020-02-10 23:32:34,268 INFO mapreduce.Job: Task Id : attempt_1581369620079_0001_m_000000_0, Status : FAILED
AttemptID:attempt_1581369620079_0001_m_000000_0 Timed out after 600 secs
[2020-02-10 23:32:33.784]Sent signal OUTPUT_THREAD_DUMP (SIGQUIT) to pid 24623 as user hdadmin for container container_1581369620079_0001_01_000002, result=success
[2020-02-10 23:32:33.792]Container killed by the ApplicationMaster.
[2020-02-10 23:32:33.811]Container killed on request. Exit code is 143
[2020-02-10 23:32:33.812]Container exited with a non-zero exit code 143.

I opened "sys logs" from web UI and could not find any error or even warning messages, but "stderr" data is like this:

Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Feb 10, 2020 11:22:01 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Feb 10, 2020 11:22:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"

I searched for the message "Container exited with a non-zero exit code 143" and found that it may be related to the garbage collector or other memory allocation issues. If that is the case, why does the default pydoop script version run with no problems?

@simleo
Member

simleo commented Feb 11, 2020

I see. Try tweaking the memory settings and good luck :)
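
As a rough sketch of what tweaking could look like, assuming pydoop submit accepts Hadoop-style -D property overrides (check pydoop submit --help on your version) and using purely illustrative values:

pydoop submit \
    -D mapreduce.map.memory.mb=2048 \
    -D mapreduce.map.java.opts=-Xmx1536m \
    -D mapreduce.task.timeout=1200000 \
    --upload-file-to-cache wc.py wc input output

The same properties can also be set cluster-wide in mapred-site.xml.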

@simleo simleo closed this as completed Feb 11, 2020
@orwa-te
Author

orwa-te commented Feb 12, 2020

I am running Hadoop on a single-machine VM with 10 GB of RAM and 2 processing cores, running CentOS 7.
What is wrong with the following configuration settings? Here are the properties and their values (memory values in MB):

yarn-site.xml

yarn.scheduler.minimum-allocation-mb -> 512
yarn.scheduler.minimum-allocation-vcores -> 1
yarn.scheduler.maximum-allocation-vcores -> 2
yarn.nodemanager.resource.memory-mb -> 8192
yarn.nodemanager.resource.cpu-vcores -> 2

mapred-site.xml

mapreduce.map.memory.mb -> 3072
mapreduce.reduce.memory.mb -> 3072
mapreduce.map.java.opts -> Xmx2048m
mapreduce.reduce.java.opts ->  Xmx2048m
yarn.nodemanager.vmem-pmem-ratio -> 2.1

@simleo
Member

simleo commented Feb 13, 2020

That depends on many factors, including the Hadoop version you're running. You can try asking on the Hadoop mailing lists. In the Docker images we use for testing, the configuration is rather minimal. If you want, you can check it out here.
