Error: could not allocate 0 bytes #41

Closed
hengzhe-zhang opened this issue Feb 21, 2021 · 26 comments · Fixed by #44 or #70
Labels
bug Something isn't working

Comments

@hengzhe-zhang

While using this package, I ran into the following error, even though plenty of memory still appeared to be available. What is the problem?

  File "deepforest/tree/_tree.pyx", line 123, in deepforest.tree._tree.DepthFirstTreeBuilder.build
  File "deepforest/tree/_tree.pyx", line 256, in deepforest.tree._tree.DepthFirstTreeBuilder.build
  File "deepforest/tree/_tree.pyx", line 480, in deepforest.tree._tree.Tree._resize_node_c
  File "deepforest/tree/_utils.pyx", line 34, in deepforest.tree._utils.safe_realloc
MemoryError: could not allocate 0 bytes
@xuyxu
Member

xuyxu commented Feb 21, 2021

Hi @zhenlingcn, thanks for reporting! Could you print out the data type and shape of your training data so that we can reproduce the problem?

@hengzhe-zhang
Author

Yeah, it's very easy to reproduce this problem. We just need to run the following code:

import numpy as np
from deepforest import CascadeForestRegressor

c = CascadeForestRegressor(n_jobs=1, verbose=0)
c.fit(np.random.randn(10, 5), np.zeros(10, dtype=np.float32))

@hengzhe-zhang
Author

By the way, I want to point out that other, perfectly ordinary datasets have also raised this error.

@xuyxu xuyxu added the bug Something isn't working label Feb 21, 2021
@xuyxu
Member

xuyxu commented Feb 21, 2021

Thanks @zhenlingcn, I can reproduce your problem. I will take a careful look later.

@xuyxu
Member

xuyxu commented Feb 21, 2021

This code snippet runs fine:

import numpy as np
from deepforest import CascadeForestRegressor

c = CascadeForestRegressor(n_jobs=1, verbose=2)
c.fit(np.random.randn(10, 5), np.random.randn(10,))

If you want to use DF21 right away, could you check whether the problem persists after converting y_train to np.float64?

EDIT: We will check what goes wrong when the target values are of type np.float32. The problem may also be caused by the degenerate labels np.zeros(10, dtype=np.float32), which are unusual for regression.

@xuyxu
Member

xuyxu commented Feb 21, 2021

@all-contributors please add @zhenlingcn for bug

@allcontributors
Contributor

@xuyxu

I've put up a pull request to add @zhenlingcn! 🎉

@hengzhe-zhang
Author

I don't believe I can solve the problem by simply modifying the data type. In fact, in the given case, it will still raise an error even if we change the type of the data.

@xuyxu
Member

xuyxu commented Feb 21, 2021

I agree. Also, what do you get when you run the following on your target values?

import numpy as np
from sklearn.utils.multiclass import type_of_target

print(type_of_target(y_train))

When y_train is np.zeros(10, dtype=np.float32), the result is binary, which is not compatible with CascadeForestRegressor.
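
To make the check above concrete, here is a minimal sketch of what scikit-learn's type_of_target reports for a few kinds of target arrays, including the two from this thread. The dictionary keys and the extra arrays are illustrative assumptions, not data from the issue.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets; only the first two appear in this thread.
targets = {
    "all zeros, float32": np.zeros(10, dtype=np.float32),
    "integers 1..5": np.array([1, 2, 3, 4, 5]),
    "non-integer floats": np.array([0.5, 1.25, 2.75, 3.1]),
    "2-D non-integer floats": np.array([[0.5, 1.5], [2.5, 3.5], [4.5, 5.5]]),
}

# type_of_target looks at the values, not only the dtype:
# floats that are all whole numbers are treated as discrete labels.
kinds = {name: type_of_target(y) for name, y in targets.items()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Only the last two arrays are reported as 'continuous' and 'continuous-multioutput'. The first two come back as 'binary' and 'multiclass', so whole-number targets look like class labels to this function regardless of their dtype.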

@xuyxu
Member

xuyxu commented Feb 23, 2021

Hi @zhenlingcn, I would appreciate it if you could test whether the latest PR #44 still raises the same error for your problem. The wheels are available here.

@hengzhe-zhang
Author

hengzhe-zhang commented Feb 24, 2021

I don't believe the latest PR is an appropriate solution. For example, the following is a perfectly reasonable test case, yet the latest version of Deep Forest raises an error:

import numpy as np
from deepforest import CascadeForestRegressor

c = CascadeForestRegressor(n_jobs=1, verbose=0)
c.fit(np.random.randn(10, 5), np.array([1, 2, 3, 4, 5]))
ValueError: CascadeForestRegressor is used for univariate or multi-variate regression, but the target values seem not to be one of them.

@xuyxu
Member

xuyxu commented Feb 24, 2021

Thanks for your feedback, will take a look at the Cython side when I get a moment.

@hengzhe-zhang
Author

In fact, this problem has prevented me from running several comparative experiments. I hope it can be solved as soon as possible.

@xuyxu
Member

xuyxu commented Mar 1, 2021

Sorry for your problem. Could you check if using the sklearn backend works?

c = CascadeForestRegressor(backend="sklearn")

EDIT: The sklearn backend is slower, but the performance should be the same as the default custom backend, as guaranteed by our unit tests.

@hengzhe-zhang
Author

It's great! The sklearn backend seems to work well.

@xuyxu
Member

xuyxu commented Mar 1, 2021

Glad to hear that 😄

@609347781

ValueError: CascadeForestRegressor is used for univariate or multi-variate regression, but the target values seem not to be one of them.
Hello, I ran into this problem too. Setting backend="sklearn" still raises the error. Is it because my label column only contains 0 and 1? (The classification code runs fine.)
Here is my data-loading code:
import pandas as pd
from sklearn import preprocessing

Data = pd.read_excel(r'D:\秭归-巴东段易发性基础数据\第二次实验\预测数据\整体数据.xlsx')
Feature = Data.iloc[1:578158,4:14].values
Label = Data.iloc[1:578158,1].values
print(Label)
print('Data loaded')
# Standardize the features
StandPFeature = preprocessing.StandardScaler().fit_transform(Feature)

#-------2. Build the training and test sets------#
xTrain = StandPFeature[0:8660,:]  # training-set features
xTest = StandPFeature[len(xTrain):len(StandPFeature),:]  # test-set features
yTrain = Label[:8660:].ravel()
yTest = Label[len(xTrain):len(StandPFeature):].ravel()
print('Training and test sets are ready')
I hope you can help. Thank you very much!

@609347781

rf1 = CascadeForestRegressor(backend="sklearn")
rf1.fit(xTrain,yTrain)
pred_value=rf1.predict(xTest)

@xuyxu
Member

xuyxu commented May 5, 2022

Hi, since the label column of your dataset only takes the values 0 and 1, why use CascadeForestRegressor? @609347781

@609347781

609347781 commented May 5, 2022 via email

@xuyxu
Member

xuyxu commented May 5, 2022

Please try calling the predict_proba method of CascadeForestClassifier. It returns the probability that each sample belongs to the positive class.
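
The suggestion above can be sketched as follows: fit a classifier on 0/1 labels and read the positive-class column of predict_proba as a continuous score. This sketch uses scikit-learn's RandomForestClassifier as a stand-in so that it runs without deepforest installed. It assumes that CascadeForestClassifier returns the same (n_samples, n_classes) layout, which should be checked against the Deep Forest documentation. The synthetic data and variable names are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data, standing in for a dataset with 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Stand-in model; the thread's suggestion is CascadeForestClassifier.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X[:150], y[:150])

# One column per class, ordered as in model.classes_ (here [0, 1]).
proba = model.predict_proba(X[150:])

# Column 1 is the estimated probability of the positive class (label 1).
positive_score = proba[:, 1]
print(positive_score[:5])
```

The resulting score lies in [0, 1] for each sample, which may be what is wanted when a "regression" on 0/1 labels is really a request for a degree of class membership.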

@609347781

609347781 commented May 5, 2022 via email

@609347781

thanks!

@GOD-TEN

GOD-TEN commented Mar 11, 2024

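Hello, I have this problem too. I work on association prediction, and my data contains only 0 and 1, where 1 means "associated". I want to predict the degree of association, so this is a regression problem, but I still get the error "CascadeForestRegressor is used for univariate or multi-variate regression, but the target values seem not to be one of them". I looked at the source code and found that the labels are checked with type_of_target(y); when the result is 'binary', the regressor cannot be used. Is there a way around this? (An ordinary random forest can do regression on such data.)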

@xuyxu
Member

xuyxu commented Mar 11, 2024

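You could first try converting the labels to a float data type and see whether that bypasses the type_of_target check. If that does not work, follow the traceback of the error to the source file and comment out the type_of_target check. @GOD-TEN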

@GOD-TEN

GOD-TEN commented Mar 12, 2024

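Thank you very much for the suggestions. The first one, converting the labels to a float data type, did not work. The second one did work, though. Thanks a lot!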
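
A short sketch of why the dtype conversion reported above has no effect: scikit-learn's type_of_target classifies by value, so 0/1 labels stay 'binary' whichever numeric dtype holds them. The example labels are illustrative, not the reporter's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative 0/1 labels, as in an association-prediction task.
y_int = np.array([0, 1, 1, 0, 1, 0])

# Cast the same labels to several dtypes and classify each version.
kinds_by_dtype = {
    str(np.dtype(dt)): type_of_target(y_int.astype(dt))
    for dt in (np.int64, np.float32, np.float64)
}
print(kinds_by_dtype)
```

All three entries come out as 'binary', which is consistent with the report that only removing the check in the source changed the behavior.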

4 participants