-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PR - for FfDL Integration With H2O.ai #88
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added some small comments. I will do a full review after I ran some tests with this PR.
Makefile
Outdated
@@ -104,7 +103,7 @@ minikube: ## Configure Minikube (local Kubernetes) | |||
@which minikube > /dev/null || (echo Please install Minikube; exit 1) | |||
@minikube ip > /dev/null 2>&1 || ( \ | |||
echo "Starting up Minikube"; \ | |||
minikube start --insecure-registry 9.0.0.0/8 --insecure-registry 10.0.0.0/8 --cpus $(MINIKUBE_CPUS) --memory $(MINIKUBE_RAM) --vm-driver=$(MINIKUBE_DRIVER) > /dev/null; \ | |||
minikube start --insecure-registry 9.0.0.0/8 --insecure-registry 10.0.0.0/8 --cpus $(MINIKUBE_CPUS) --memory $(MINIKUBE_RAM) > /dev/null; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to remove the --vm-driver
flag?
version: "latest" | ||
command: > | ||
./h2o3-cluster.sh; | ||
if [ "$LEARNER_ID" == "1" ]; then python h2o3_baseline.py --trainDataFile ${DATA_DIR}/higgs_train_10k.csv --target response --memory 1; fi; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In community/FfDL-H2Oai/README.md
, can you add some instructions on how to obtain the higgs_train_10k.csv
dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, see Examples
in readme
The Makefile changes were an accident, didn't realize I had pushed those. I was experimenting. Good catch. Also, edited the Readme to contain the s3 links to higgs dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nkpng2k, overall this PR looks very good. The only thing we need to change is the default resources for this example because I was having some memory inefficient errors when running with only 1Gb of memory per learner. After that we can pull in this PR. 👍
learners: 2 | ||
gpus: 0 | ||
cpus: 0.5 | ||
memory: 1Gb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After running few tests with this example, it's better to run with 1 cpu and 2Gb of memory.
Hi Tommy, |
Thanks @nkpng2k, Looks very good to me. 👍 |
LGTM |
Developer's Certificate of Origin 1.1
Hello Tommy and Animesh,
Dropping the PR for H2O + FfDL integration. Feel free to add any comments/changes necessary and I will make them happen.
Thanks,
Nicholas