KeyError(f"None of [{key}] are in the [{axis_name}]" #50
Comments
Thank you. load_data.py works perfectly now. The dataset is downloaded and processed. I am able to run other DLRM implementations, such as Facebook's open-source DLRM, on the GPU, so I believe the CUDA setup is correct.
@lmohit95, the Hetu main branch has been updated to enable dynamic memory allocation. Please pull the new code and try again.
Thank you. It works now. Sorry for asking a lot of questions. I am now facing this issue while training on the Criteo dataset.
I mean that this problem was solved by #47, which was merged recently. Does it still happen after you pull these changes?
I get the following error when I run To avoid this error, I deliberately made |
@lmohit95, you can also replace line 120 with the following code to avoid the first error: Lines 120 to 126 in 1684091
For the OOM error, #47 implements dynamic memory allocation, and the GPU memory peak will be halved when you run Maybe you haven't pulled the latest code yet?
Thanks a lot for everything. The tests are working perfectly now. I was accessing the forked repo mentioned in the HET paper. I pulled the latest code and downloaded the Criteo dataset by running load_data.py.
Thank you. Everything works now. I just wanted to clarify something regarding The paper mentions that the training process can take hours (Fig 6), but in my case the training runs for a total of 10 epochs with far less overall runtime. |
It seems like you are running in local execution mode rather than distributed training. That's why it's much faster. Line 190 in acae42a
Got it. Thanks for all the help!!! |
Hello, This is my configuration file
I am getting
KeyError(f"None of [{key}] are in the [{axis_name}]"
while running python models/load_data.py. The error occurs in this line.
I have set the appropriate path for criteo_dataset in load_data.py. The download, extraction, and local file creation steps all succeed. I downloaded the dataset from this website: https://www.kaggle.com/competitions/criteo-display-ad-challenge/data.
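For anyone hitting the same message: pandas raises this KeyError when a DataFrame is indexed with a list of labels and none of them exist on that axis. The sketch below reproduces it on a shortened, made-up sample in the Kaggle Criteo layout (tab-separated, no header row). The column names and the two-row sample are assumptions for illustration only; this is not the code in load_data.py, and the actual cause in this issue may be different.

```python
import io

import pandas as pd

# Two tab-separated rows in the Criteo layout, shortened to 1 label,
# 2 integer features and 2 categorical features (the real file has 13 and 26).
# The values are made up for this sketch.
raw = "0\t1\t5\t68fd1e64\t80e26c9b\n1\t2\t\t05db9164\t38a947a1\n"

# Hypothetical column names, chosen for this sketch only.
columns = ["label", "I1", "I2", "C1", "C2"]

# With header=None and no names=, pandas labels the columns 0..4, so a
# selection by string labels fails because none of those labels exist.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None)
try:
    df[["I1", "I2"]]
    error_message = None
except KeyError as exc:
    error_message = str(exc)
print(error_message)

# Passing names= gives the frame the labels that the selection expects.
named = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=columns)
dense = named[["I1", "I2"]].fillna(0)  # missing integer features become 0
print(dense)
```

Printing df.columns just before the failing line is usually the quickest way to see whether the labels being requested match what was actually read from the file.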