Module for classifiers and interaction analysis #6

Open
adiah80 opened this issue May 12, 2020 · 10 comments

Comments

@adiah80
Member

adiah80 commented May 12, 2020

Building upon issues #4 and #5, we need a general module that explores several classification models.

Some libraries worth exploring for classification models are scikit-learn, xgboost, and catboost, among others.

Other NLP-based methods can also be explored for analyzing the user's preferences. These would use the tweet text and the user's interaction metrics.
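As one simple NLP baseline, here is a minimal, stdlib-only sketch of a multinomial Naive Bayes classifier over tweet tokens, trained on interaction-derived 0/1 labels. This is a swapped-in illustration, not a method the project has chosen. The tokenizer regex, the class name `NaiveBayesTweetClassifier`, and the smoothing value are hypothetical. In practice, scikit-learn's `MultinomialNB` or a gradient-boosted model from xgboost or catboost could take its place.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep word-like tokens, including @mentions and #hashtags.
    return re.findall(r"[@#]?\w+", text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # label -> number of tweets
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text, label):
        # Unnormalized log P(label | text) = log prior + sum of token log-likelihoods.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.token_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


clf = NaiveBayesTweetClassifier().fit(
    ["new python release", "python tips and tricks", "football match tonight", "great football goal"],
    [1, 1, 0, 0],  # 1 = the user interacted with the tweet
)
print(clf.predict("python release notes"))  # 1
```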

@shashank-m

I can take this issue up. Can someone tell me what the inputs and outputs of the model are?

@ajaysub110
Member

Inputs to the model are the tweet text, the username of the tweeter, and the number of existing retweets and likes. The output label is binary (1 or 0), indicating whether or not the tweet is interesting to our user. These labels have been assigned in the train set based on whether the user has interacted with (liked or retweeted) a tweet or not.
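Here is a minimal sketch of the input/output contract described above, built around a hypothetical `TweetExample` record. The field names and the `build_dataset` helper are illustrative, not the project's actual schema. The label rule (1 if the user liked or retweeted the tweet, otherwise 0) follows the comment above.

```python
from dataclasses import dataclass


@dataclass
class TweetExample:
    # Model inputs mentioned in the thread (field names are hypothetical).
    text: str
    author_username: str
    retweet_count: int
    like_count: int
    # Our user's interactions: used only to derive the label, never as inputs.
    user_liked: bool = False
    user_retweeted: bool = False

    @property
    def label(self):
        # 1 = interesting (liked or retweeted), 0 = not interesting.
        return int(self.user_liked or self.user_retweeted)


def build_dataset(examples):
    """Split examples into model inputs X and binary labels y."""
    X = [(e.text, e.author_username, e.retweet_count, e.like_count) for e in examples]
    y = [e.label for e in examples]
    return X, y


X, y = build_dataset([
    TweetExample("New release is out", "devteam", 12, 40, user_retweeted=True),
    TweetExample("Lunch pics", "someone", 0, 3),
])
print(y)  # [1, 0]
```

Keeping the interaction flags out of `X` matters: they define the label, so feeding them to the model would leak the answer.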

@ajaysub110
Member

Of course, we could also generate some of our own features from these and/or introduce more inputs (if available) to the model. That's up to you.
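To illustrate derived features, here is a hypothetical helper that turns the raw inputs into numeric features. The specific features (hashtag, mention, and URL counts, log-scaled engagement, and a retweet-to-like ratio) are assumptions chosen for the example. The thread did not agree on them.

```python
import math
import re


def engineer_features(text, retweet_count, like_count):
    """Hypothetical hand-crafted features derived from the raw model inputs."""
    return {
        "n_chars": float(len(text)),
        "n_words": float(len(text.split())),
        "n_hashtags": float(len(re.findall(r"#\w+", text))),
        "n_mentions": float(len(re.findall(r"@\w+", text))),
        "has_url": float(bool(re.search(r"https?://\S+", text))),
        # log1p compresses heavy-tailed engagement counts and handles zeros.
        "log_retweets": math.log1p(retweet_count),
        "log_likes": math.log1p(like_count),
        # Adding 1 avoids division by zero for tweets with no likes.
        "retweet_like_ratio": retweet_count / (like_count + 1),
    }


features = engineer_features("Check #ml and #nlp with @bob https://x.co", 9, 2)
print(features["n_hashtags"], features["retweet_like_ratio"])  # 2.0 3.0
```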

@ajaysub110
Member

@adiah80 had suggested collaborative ranking/clustering, which is used in recommendation systems. This paper will probably be useful to check out first:
https://homes.cs.washington.edu/~tqchen/data/pdf/sigir12-p661-chen.pdf
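Here is a rough sketch of the pairwise idea behind collaborative ranking. It learns latent factors for users and tweets so that, for each user, tweets they interacted with score higher than tweets they did not. The technique is a plain BPR-style matrix factorization trained by SGD over every (user, interacted tweet, non-interacted tweet) triple. This is a simpler, swapped-in technique, not the model from the linked paper. All function names and hyperparameters here are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_mf(interactions, n_users, n_items, dim=4, lr=0.1, reg=0.01, epochs=300, seed=0):
    """Fit user/tweet factors so interacted tweets outrank non-interacted ones.

    interactions: dict mapping user id -> set of tweet ids the user liked/retweeted.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]  # tweet factors
    for _ in range(epochs):
        for u, positives in interactions.items():
            negatives = [j for j in range(n_items) if j not in positives]
            for i in positives:
                for j in negatives:
                    pu, qi, qj = P[u], Q[i], Q[j]
                    x_uij = dot(pu, qi) - dot(pu, qj)  # preference margin of i over j
                    g = sigmoid(-x_uij)  # gradient weight of -log sigmoid(x_uij)
                    for k in range(dim):
                        pu_k = pu[k]  # use the pre-update user value for the tweet updates
                        pu[k] += lr * (g * (qi[k] - qj[k]) - reg * pu_k)
                        qi[k] += lr * (g * pu_k - reg * qi[k])
                        qj[k] += lr * (-g * pu_k - reg * qj[k])
    return P, Q


def score(P, Q, u, i):
    return dot(P[u], Q[i])


# Toy data: user 0 interacted with tweets 0 and 1, user 1 with tweets 2 and 3.
P, Q = train_pairwise_mf({0: {0, 1}, 1: {2, 3}}, n_users=2, n_items=4)
ranking_user0 = sorted(range(4), key=lambda i: score(P, Q, 0, i), reverse=True)
```

The pairwise objective only cares about relative order within a user's feed. That matches the "rank tweets for this user" framing better than predicting each 0/1 label independently.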

@shashank-m

Yeah, cool. But I was thinking the model output would be the topic the user is interested in, like a multiclass classification problem.

@ajaysub110
Member

What do you think can be good outputs for our model?

@shashank-m

Hmm, but if the output is a topic, then labelling will be hard, I guess.

@ajaysub110
Member

Oh, you meant "topic interested in" as an output? Yeah, only trends are assigned topics on Twitter, I think.

@shashank-m

But I think topics can be a feature in the train set. For example, we could cluster the tweets in the train set and use each tweet's cluster number as a feature. How does this sound?
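Here is a minimal sketch of the clustering idea. It uses bag-of-words vectors and spherical k-means (cosine similarity), initialized deterministically from the first k tweets. The function names and the choice of k-means are illustrative assumptions. TF-IDF features or embeddings clustered with scikit-learn's `KMeans` would be natural alternatives. The cluster id returned for each tweet would be fed to the classifier as a feature, and unseen tweets get the id of their nearest centroid.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"\w+", text.lower()))


def normalize(vec):
    # Scale a sparse vector to unit L2 norm (empty vectors stay empty).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both inputs are already unit-length, so the dot product is the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def nearest_centroid(vec, centroids):
    return max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))


def kmeans_cluster_ids(texts, k, iterations=10):
    """Spherical k-means over bag-of-words; returns (cluster id per tweet, centroids)."""
    vectors = [normalize(bag_of_words(t)) for t in texts]
    centroids = [dict(v) for v in vectors[:k]]  # deterministic init: first k tweets
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        assignments = [nearest_centroid(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if not members:
                continue  # keep the previous centroid if a cluster empties out
            summed = Counter()
            for v in members:
                summed.update(v)
            centroids[c] = normalize(summed)
    return assignments, centroids


def assign_cluster(text, centroids):
    """Cluster-id feature for an unseen tweet: its nearest centroid."""
    return nearest_centroid(normalize(bag_of_words(text)), centroids)


train = ["python code release", "football match goal", "new python code tips", "football goal tonight"]
ids, centroids = kmeans_cluster_ids(train, k=2)
print(ids)  # [0, 1, 0, 1]
```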

@ajaysub110
Member

Hmm yeah we could try that

3 participants