Examples: provide an example based on a real dataset #125
Comments
Currently, TPOT only supports supervised classification tasks. Is what you're describing a supervised classification task, i.e., do you have a set of features with labels that you're attempting to model?
OK, thanks. Let's suppose that I have manually tagged the features (in this case, topics), labelling each one as belonging to a more generic (new) topic. So let's say we have a test set with an initial classification into N categories/topics.
What features would the algorithm have to classify the topic with?
So, since I have no description of a
Gotcha. If you can feed TPOT features such as those distances (or even the vector values from word2vec) and provide it with labels for those features, then it can try to optimize a pipeline that maximizes classification accuracy for that data set.
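To make the "distances as features" idea concrete, here is a minimal sketch of turning topic vectors into one feature row per topic. Everything in it is an assumption for illustration: the 3-dimensional vectors are invented stand-ins for word2vec embeddings, and `reference_vectors` is a hypothetical set of anchor vectors, not something taken from the dataset in this issue.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Invented 3-d vectors standing in for real word2vec embeddings.
topic_vectors = {
    "dragons": [0.9, 0.1, 0.0],
    "legendarycreatures": [0.8, 0.2, 0.1],
    "nightlife": [0.0, 0.2, 0.9],
}

# Hypothetical anchor vectors to measure distances against.
reference_vectors = {
    "fantasy": [1.0, 0.0, 0.0],
    "socializing": [0.0, 0.0, 1.0],
}


def feature_row(vector):
    """One feature per reference vector: the cosine distance to it."""
    return [cosine_distance(vector, ref) for ref in reference_vectors.values()]


# Each topic becomes one row of numeric features.
features = {topic: feature_row(vec) for topic, vec in topic_vectors.items()}
```

Each row in `features` is a fixed-length list of numbers, which is the kind of input a classifier expects; the labels still have to come from the manual tagging described above.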
@rhiever OK, what is the input data format? Let's say, a typical row I would have from the
Check these example notebooks:
TPOT requires the "standard" supervised data set format. Check this tutorial to see what that entails: https://github.com/amueller/scipy_2015_sklearn_tutorial/blob/master/notebooks/01.3%20Data%20Representation%20for%20Machine%20Learning.ipynb
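As a rough illustration of that "standard" format, the sketch below builds a feature matrix `X` (one row per sample, one column per feature) and a label list `y` of the same length, then makes a shuffled train/test split. The sample values and labels are invented, and plain Python lists are used only to keep the example dependency-free; in practice these would usually be NumPy arrays.

```python
import random

# Invented samples: (feature values, label). Every row has the same number
# of features, and every row has exactly one label.
samples = [
    ([0.01, 1.00], "fantasy"),
    ([0.05, 0.88], "fantasy"),
    ([1.00, 0.02], "socializing"),
    ([0.93, 0.10], "socializing"),
]

X = [feature_values for feature_values, _ in samples]  # shape: (n_samples, n_features)
y = [label for _, label in samples]                     # shape: (n_samples,)

# Basic shape checks for the supervised format.
n_features = len(X[0])
assert all(len(row) == n_features for row in X)
assert len(X) == len(y)

# Shuffled 75/25 split, seeded so the result is repeatable.
indices = list(range(len(X)))
random.Random(42).shuffle(indices)
split = int(0.75 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]
```

An estimator with a scikit-learn-style interface would then be trained on `X_train`, `y_train` and scored on `X_test`, `y_test`; the exact TPOT call is not shown in this thread, so check its documentation for the details.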
I'm trying to figure out a real-world example with `tpot` to learn how it works, step by step.
So, I will take this dataset as an example. Each item in this JSON is a topic like `socialnetwork` that belongs to a category like `career-business`. Several topics may belong to the same category, e.g. `wine-tasting` and `nightlife` both belong to `socializing`, so it is a 1-to-N relationship.
Now, there are some topics that may refer to the same topic. Topics like `dragons` and `legendarycreatures` could be considered part of a new topic like `fantasy`.
So, assuming that `tpot` can analyze a dataset automatically, how should this dataset be described so that our objective function is to collect/aggregate topics into new topics, i.e. new categories that are not defined in this dataset?
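For reference, the topic/category relationship described above can be sketched as follows. The JSON layout is an assumption: the field names `topic` and `category` are hypothetical, since the real records are not reproduced in this issue.

```python
import json
from collections import defaultdict

# Hypothetical records; the field names "topic" and "category" are assumed.
raw = """
[
  {"topic": "socialnetwork", "category": "career-business"},
  {"topic": "wine-tasting", "category": "socializing"},
  {"topic": "nightlife", "category": "socializing"}
]
"""

records = json.loads(raw)

# One category maps to many topics (the 1-to-N relationship).
topics_by_category = defaultdict(list)
for record in records:
    topics_by_category[record["category"]].append(record["topic"])

# For supervised learning the direction is reversed: each topic is a sample
# and its existing category is the label to predict.
labels = {record["topic"]: record["category"] for record in records}
```

Note that a label such as `fantasy` that never appears among the existing categories cannot be predicted by a supervised classifier; it would first have to be added by manual tagging, as discussed in the comments.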