Tweet text data parsing/cleaning for nlp #25

Open
wwymak opened this issue Feb 2, 2017 · 1 comment
Comments

@wwymak
Contributor

wwymak commented Feb 2, 2017

Some of the tasks we might do are:

  • Tag POS (for further sentiment analysis)

Depending on what you want to achieve, you might not need all of the above (e.g. for training word2vec, you might not need to do any of that, but you might want to convert emojis).
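For the emoji conversion mentioned above, here is a minimal sketch of one way it could work, using only Python's standard library. The function name `demojize` and the `:name:` token format are assumptions made for illustration, not something this project has settled on. It treats any character in Unicode category "So" as an emoji, which is only an approximation.

```python
import unicodedata


def demojize(text):
    """Replace emoji-like symbols with a :lowercase_name: token.

    Hypothetical helper: approximates "emoji" as Unicode category "So"
    (Symbol, other). Variation selectors and multi-character emoji
    sequences are not handled.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "GRINNING FACE" becomes " :grinning_face: "
                out.append(" :" + name.lower().replace(" ", "_") + ": ")
                continue
        out.append(ch)
    # Collapse the extra whitespace introduced around the tokens.
    return " ".join("".join(out).split())


print(demojize("great day \U0001F600"))
```

Turning each emoji into a word-like token means it survives later tokenizing and can get its own vector when training something like word2vec.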

Useful libraries:
spaCy
NLTK
sklearn
TextBlob
gensim
Mallet

I'm exploring what is possible/needed at the moment with @divya -- but feel free to chip in with opinions and ideas, especially if you're an NLP expert :)

@jss367
Contributor

jss367 commented Feb 25, 2017

I saw the "help wanted" tag on this so I built a notebook that inputs tweets, then tokenizes, removes stop words, and stems the tweets. It's called CleanText.ipynb if you want to take a look at it. I'd be happy to make changes or additions if you have any suggestions.
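For anyone who wants a feel for those three steps without opening the notebook, here is a rough standard-library sketch. It is not the code in CleanText.ipynb: the regex, the tiny `STOP_WORDS` set and the `crude_stem` suffix stripper are all made-up stand-ins. A real pipeline would more likely use something like NLTK's `TweetTokenizer`, its stop word corpus and `PorterStemmer`.

```python
import re

# Hypothetical, deliberately tiny stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in",
              "on", "for", "at", "it", "this", "with", "i"}

# URLs first, then @mentions/#hashtags, then plain words (with optional apostrophe).
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?")

SPECIAL_PREFIXES = ("@", "#", "http://", "https://")


def tokenize(tweet):
    """Split a tweet into tokens, lowercasing everything except URLs."""
    tokens = TOKEN_RE.findall(tweet)
    return [t if t.startswith(("http://", "https://")) else t.lower()
            for t in tokens]


def crude_stem(token):
    """Strip a few common suffixes. A crude stand-in, NOT the Porter algorithm."""
    if token.startswith(SPECIAL_PREFIXES):
        return token  # leave mentions, hashtags and URLs untouched
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_tweet(tweet):
    """Tokenize, drop stop words, then stem what is left."""
    kept = [t for t in tokenize(tweet) if t not in STOP_WORDS]
    return [crude_stem(t) for t in kept]


print(clean_tweet("Walking to the shops with @bob #fun http://t.co/x"))
```

Keeping mentions, hashtags and URLs as whole tokens is one possible choice. Depending on the task, it may be better to drop them or replace them with placeholder tokens.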
