Store the training set internally so the user doesn't have to repeatedly pass it #9

rhiever · 2015-11-12T02:13:11Z

Since the training set is required to properly run the pipeline each time, this information should be stored internally so the user doesn't have to repeatedly pass it to TPOT (e.g., with the score() function).

rhiever · 2015-11-12T02:17:30Z

Actually, this is a bad idea. It can be disastrous to have multiple copies of a very large data set.

rasbt · 2015-11-12T02:30:29Z

Agreed! Plus, the training set may already be copied a bunch of times (unless you disable the multiprocessing in certain pipeline objects).

rhiever · 2015-11-12T02:34:41Z

I actually need to work through the pipeline operator code and probably remove all the copy() operations. I think they're unnecessary, but I'm actually not sure if the DataFrames are being passed by copy or reference.
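
For reference, a minimal sketch of how Python hands a pandas DataFrame to a function: the callee receives a reference to the same object, and nothing is copied unless `copy()` is called explicitly. The helper names `add_flag_in_place` and `add_flag_on_copy` are hypothetical and are not taken from the TPOT code.

```python
import pandas as pd


def add_flag_in_place(df):
    # Hypothetical helper: mutates the very object the caller passed in.
    df["flag"] = 1
    return df


def add_flag_on_copy(df):
    # Hypothetical helper: works on an explicit copy, so the caller's data is untouched.
    df = df.copy()
    df["flag"] = 1
    return df


first = pd.DataFrame({"x": [1, 2, 3]})
second = pd.DataFrame({"x": [1, 2, 3]})

result_in_place = add_flag_in_place(first)
result_on_copy = add_flag_on_copy(second)

print(result_in_place is first)   # True: the function received the same object
print("flag" in first.columns)    # True: the caller's DataFrame was modified
print(result_on_copy is second)   # False: copy() produced a new object
print("flag" in second.columns)   # False: the caller's DataFrame is unchanged
```

If this holds for the pipeline operators, a `copy()` would only be needed where an operator modifies its input in place and the caller still relies on the original; whether that is the case for each operator is something the code itself would have to confirm.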

rhiever added the enhancement label Nov 12, 2015

rhiever self-assigned this Nov 12, 2015

rhiever closed this as completed Nov 12, 2015

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed

saddy001 mentioned this issue Mar 20, 2018

Segfault on optimization process #676

Closed
