Module to convert raw scraped data into a standardised format #5

Open
adiah80 opened this issue May 12, 2020 · 2 comments

Comments

@adiah80
Member

adiah80 commented May 12, 2020

The raw scraped data from Issue #4 needs to be processed before it can be used to train the models. We need a module that aggregates the raw data into a single dataset (a .csv file) containing the training features and labels.

Each tweet posted by an account the user follows should be treated as one data point. Every tweet the user interacted with (liked, retweeted, or commented on) should be labelled as a positive instance.

Features should include the tweet text, the tweet's author, the global interaction metrics (counts of likes, retweets, and comments), and the time the tweet was posted.

More complex features can be proposed and added later.
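
To make the target format concrete, here is a rough sketch of what the aggregation step could look like. It is only an illustration: the raw format produced by #4 is not fixed yet, so the input keys (`id`, `text`, `author`, `likes`, `retweets`, `comments`, `time`), the column names, and the function names `build_rows` and `write_csv` are all hypothetical. It also assumes that any tweet the user did not interact with is labelled 0, which is not stated above.

```python
import csv
import io

# Hypothetical column names; the raw format from #4 is not specified yet.
FIELDS = ["text", "author", "likes", "retweets", "comments", "time", "label"]


def build_rows(tweets, interacted_ids):
    """Turn raw tweet dicts into feature rows with a binary label.

    tweets: iterable of dicts, one per tweet from a followed account.
    interacted_ids: set of tweet ids the user liked, retweeted, or commented on.
    """
    rows = []
    for t in tweets:
        rows.append({
            "text": t["text"],
            "author": t["author"],
            # Missing interaction counts default to 0.
            "likes": int(t.get("likes", 0)),
            "retweets": int(t.get("retweets", 0)),
            "comments": int(t.get("comments", 0)),
            "time": t["time"],
            # Positive instance if the user interacted with the tweet.
            # Assumption: everything else is treated as a negative (0).
            "label": 1 if t["id"] in interacted_ids else 0,
        })
    return rows


def write_csv(rows, fileobj):
    """Write the rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data, only to show the call sequence.
    sample = [
        {"id": "1", "text": "hello", "author": "alice",
         "likes": 3, "retweets": 1, "comments": 0,
         "time": "2020-05-12T10:00:00"},
        {"id": "2", "text": "a, b", "author": "bob",
         "time": "2020-05-12T11:00:00"},
    ]
    buf = io.StringIO()
    write_csv(build_rows(sample, {"1"}), buf)
    print(buf.getvalue())
```

Keeping row construction separate from the CSV writing should make it easier to add more complex features later without touching the output code.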

@Akshat2430
Contributor

Dibs!

@ajaysub110
Member

Can you please take up #9 first so that we have the scraping module ready?
