import csv for regression #257

Closed
axiqia opened this issue Oct 20, 2018 · 6 comments

axiqia commented Oct 20, 2018

I want to use the Random Forest algorithm to solve a regression problem, but the only tutorial available is a classification example. So I tried the code below as a test:

 RegressionDataset data;  
 importCSV(data, "/data/C.csv", LAST_COLUMN, ' '); 

When I ran it, I got the following error:

terminate called after throwing an instance of 'shark::Exception'
  what():  [importCSVReaderSingleValues] problems parsing file (2)
[1]    19082 abort (core dumped)  ./ExampleProject

I have read the other regression algorithm tutorials, and I found that all of them use the importCSV below to load the labels and the data separately.

void importCSV(
	Data<T>& data,
	std::string fn,
	char separator = ',',
	char comment = '#',
	std::size_t maximumBatchSize = Data<T>::DefaultBatchSize,
	std::size_t titleLines = 0
)

What should I do to solve the problem? Is there something I missed?

Shark version 3.1.0
Thank you.

Member

Ulfgard commented Oct 20, 2018 via email

Author

axiqia commented Oct 21, 2018

Thank you for your quick reply, and I am sorry for not giving the details. I am reading the sample data set C.csv as a test. Below are the first few lines.

14.1 0.7 7.4 5.4 5.4 4 1.3 5.4 8.7 6.7 3.4 1.3 4 1.3 3.4 8.7 6.7 8.1 1.3 2.7 0
12.5 0.7 8.3 2.1 4.2 6.9 1.4 5.6 10.4 9 2.1 6.9 1.4 4.2 2.8 4.2 4.2 9 1.4 2.8 0
19 0.7 4.9 5.6 7 11.3 1.4 1.4 7 6.3 3.5 4.2 2.1 3.5 0.7 8.5 4.2 4.9 2.8 0.7 0
20.4 0.7 4.1 2 2.7 13.6 3.4 5.4 8.2 7.5 3.4 2 2 4.8 2 6.8 0.7 6.8 1.4 2 0
5 11 0 3.9 9.1 3.9 7.1 7.1 5.8 12.3 12.3 1.9 1.3 2.6 3.2 2.6 3.9 3.2 5.2 1.3 1.9 0

As you can see, the entries are separated by spaces, and the separator I passed is ' ', so the error confused me.

Author

axiqia commented Oct 21, 2018

The error means that the file cannot be parsed using the options you supplied (e.g. you specify that entries are separated by spaces, not ','). Without seeing the actual file I have no way to tell you what is wrong.

Thank you for your hint. I found out where the mistake was. I stepped through each line of my code and into the importCSV function, and the separator parameter was always set to ','. Then I found that importCSV has three overloads:

/// \brief Import a Dataset from a csv file
void importCSV(
	Data<T>& data,
	std::string fn,
	char separator = ',',
	char comment = '#',
	std::size_t maximumBatchSize = Data<T>::DefaultBatchSize,
	std::size_t titleLines = 0
)

/// \brief Import a labeled Dataset from a csv file
template<class T>
void importCSV(
	LabeledData<blas::vector<T>, unsigned int>& data,
	std::string fn,
	LabelPosition lp,
	char separator = ',',
	char comment = '#',
	std::size_t maximumBatchSize = LabeledData<RealVector, unsigned int>::DefaultBatchSize
)

/// \brief Import a labeled Dataset from a csv file
template<class T>
void importCSV(
	LabeledData<blas::vector<T>, blas::vector<T> >& data,
	std::string fn,
	LabelPosition lp,
	std::size_t numberOfOutputs = 1,
	char separator = ',',
	char comment = '#',
	std::size_t maximumBatchSize = LabeledData<RealVector, RealVector>::DefaultBatchSize
)

I realized I had to specify the numberOfOutputs parameter. The brief descriptions do not explain the difference between the last two functions at all.
Why not design a unified interface? Also, a regression example in the documentation would help new users like me a lot.
Thank you again.
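
For later readers, the mix-up described above can be reproduced without Shark. The following is only a minimal sketch: importCSVArgs, ImportArgs and demonstratePitfall are hypothetical stand-ins, and only the parameter order is copied from the regression overload quoted above. It suggests why the separator stayed ',': the char ' ' is implicitly converted to std::size_t and fills the numberOfOutputs slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins; only the parameter order mirrors the
// regression overload of importCSV quoted in this thread.
enum LabelPosition { FIRST_COLUMN, LAST_COLUMN };

struct ImportArgs {
    std::size_t numberOfOutputs;
    char separator;
};

// Returns the arguments as the callee sees them instead of reading a file.
ImportArgs importCSVArgs(
    const std::string& fn,
    LabelPosition lp,
    std::size_t numberOfOutputs = 1,
    char separator = ',',
    char comment = '#'
) {
    (void)fn; (void)lp; (void)comment;  // unused in this sketch
    return ImportArgs{numberOfOutputs, separator};
}

// Documents the behaviour with assertions; call it from main() to check.
void demonstratePitfall() {
    // Call shape from the original post: ' ' binds to numberOfOutputs.
    ImportArgs wrong = importCSVArgs("/data/C.csv", LAST_COLUMN, ' ');
    assert(wrong.numberOfOutputs == static_cast<std::size_t>(' '));
    assert(wrong.separator == ',');  // the default is still in effect

    // Passing numberOfOutputs explicitly lets ' ' reach the separator.
    ImportArgs right = importCSVArgs("/data/C.csv", LAST_COLUMN, 1, ' ');
    assert(right.numberOfOutputs == 1);
    assert(right.separator == ' ');
}
```

With the real library, the corresponding call would presumably be importCSV(data, "/data/C.csv", LAST_COLUMN, 1, ' '), which matches the third signature listed above.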

Member

Ulfgard commented Oct 21, 2018

A unified interface does not make sense.

The first version does not have a label, so having to specify a label position would be confusing.
The second version is for class labels. There can only be one column for those, so there is no need for a number of outputs.
The third version is for regression, where we can have vectorial labels.

We are still working on making the tutorials better. I will try to include that in a future Data section.
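
To make "vectorial labels" concrete, here is a small sketch of how one parsed row could be split into inputs and labels when numberOfOutputs columns hold the label. The splitRow helper and the enum are hypothetical and written for illustration only; this is not Shark's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical enum for this sketch, named after the lp parameter above.
enum LabelPosition { FIRST_COLUMN, LAST_COLUMN };

// Splits one row into (inputs, labels). With numberOfOutputs == 1 the
// label is a single column; larger values give a vector-valued label.
std::pair<std::vector<double>, std::vector<double>> splitRow(
    const std::vector<double>& row,
    LabelPosition lp,
    std::size_t numberOfOutputs = 1
) {
    assert(numberOfOutputs <= row.size());
    const auto numLabels = static_cast<std::ptrdiff_t>(numberOfOutputs);
    const auto numInputs = static_cast<std::ptrdiff_t>(row.size()) - numLabels;

    std::vector<double> inputs;
    std::vector<double> labels;
    if (lp == LAST_COLUMN) {
        // Labels are the trailing columns.
        inputs.assign(row.begin(), row.begin() + numInputs);
        labels.assign(row.begin() + numInputs, row.end());
    } else {
        // Labels are the leading columns.
        labels.assign(row.begin(), row.begin() + numLabels);
        inputs.assign(row.begin() + numLabels, row.end());
    }
    return {inputs, labels};
}
```

For a row of 21 values with LAST_COLUMN and numberOfOutputs = 1, this would give 20 inputs and a one-element label.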

Author

axiqia commented Oct 21, 2018

Yes, I now understand the difference among the three versions :). Maybe the comments in the code should be as clear as your explanation.
Also, an error message like the one below

'shark::Exception'
  what():  [importCSVReaderSingleValues] problems parsing file (2)

is not really helpful to me. Is there a document where users can look up possible causes?
Thank you very much.

axiqia closed this as completed Oct 22, 2018
Member

Ulfgard commented Oct 22, 2018

Hi,

Unfortunately, there is no such document. Our parser is based on Boost.Spirit, and it is a bit tough to get the exact reason for a failure out of it. We just check whether the parser could read everything (and that it succeeded with what it read). It is possible to add better diagnostics, and we would be happy to take a pull request (based on the current 4.1 branch), but we have no time to do it ourselves.
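
Until such diagnostics exist, a file can be checked before it is handed to the importer. The sketch below is an assumption-laden stand-in written with the standard library only: it does not use Boost.Spirit or any Shark code, and checkNumericCsv and ParseReport are hypothetical names. It reports the first line that contains a non-numeric field or a different column count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

struct ParseReport {
    bool ok;
    std::size_t line;    // 1-based line of the first problem, 0 if ok
    std::string reason;  // short description, empty if ok
};

// Checks that every non-comment line holds the same number of numeric
// fields separated by `separator`. Empty fields count as errors.
ParseReport checkNumericCsv(std::istream& in, char separator, char comment = '#') {
    assert(separator != '\n');  // lines are split on '\n' already
    std::string line;
    std::size_t lineNo = 0;
    std::size_t expectedColumns = 0;  // set by the first data line
    while (std::getline(in, line)) {
        ++lineNo;
        if (line.empty() || line[0] == comment) continue;

        std::size_t columns = 0;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, separator)) {
            if (field.empty()) return {false, lineNo, "empty field"};
            char* end = nullptr;
            std::strtod(field.c_str(), &end);
            // The whole field must be consumed for it to be a number.
            if (end == field.c_str() || *end != '\0')
                return {false, lineNo, "field is not a number: " + field};
            ++columns;
        }
        if (expectedColumns == 0) expectedColumns = columns;
        else if (columns != expectedColumns)
            return {false, lineNo, "unexpected number of columns"};
    }
    return {true, 0, ""};
}
```

Running this on a space-separated file with separator ',' flags line 1, because the whole line is then treated as a single field that is not a number.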
