MUSA-620-Week-9

Text mining / natural language processing

R data visualizers to follow on Twitter:

CoreNLP

To use Stanford's CoreNLP software, you must have Java installed. If you successfully got Selenium to work, then Java is already installed.
Download the files coreNLP.R and myProps.properties.
The steps for setup and usage are given in the coreNLP.R script.
If you want to experiment with CoreNLP's other features, you can find the full list of "annotators" here. To use them, open your .properties file and add them manually.

Assignment

Use the Twitter streaming API and sentiment analysis to examine the use of emojis on Twitter.

This assignment is required. Please turn it in by email to me (galkamaxd at gmail) and Evan (ecernea at sas dot upenn dot edu).

Due: Wednesday, 28-March by 9am

Description

The specifics of this assignment are open ended, subject to the following requirements:

The analysis must involve at least 10,000 tweets, collected via the Twitter streaming API. Search parameters are up to you.
The tweets must be segmented by some characteristic(s) and compared on the basis of sentiment.
Emoji use must factor into the analysis.
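
As a rough illustration of the segment-and-compare requirement, the sketch below groups a toy set of tweets by whether they contain an emoji and compares mean sentiment per group. The data frame, its column names (`has_emoji`, `sentiment`), and the scores are all made up for illustration; real scores would come from whichever sentiment method you choose.

```r
# Toy stand-in for parsed tweets. Column names and values are hypothetical.
tweets <- data.frame(
  id        = c("1", "2", "3", "4"),
  has_emoji = c(TRUE, FALSE, TRUE, FALSE),
  sentiment = c(3, 1, 4, 2),
  stringsAsFactors = FALSE
)

# Mean sentiment per segment (here: tweets with vs. without an emoji).
by_segment <- aggregate(sentiment ~ has_emoji, data = tweets, FUN = mean)
names(by_segment)[2] <- "mean_sentiment"

# Attach the number of tweets in each segment so small groups are visible.
segment_sizes <- table(tweets$has_emoji)
by_segment$n_tweets <- as.vector(segment_sizes[as.character(by_segment$has_emoji)])

print(by_segment)
```

The same pattern should work for any other segmenting characteristic: swap `has_emoji` for a column describing the segment you care about.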

Examples:

Are tweets with emojis happier than those without?
Which emojis are the most positive and which are the most negative?
Compare the emoji use and sentiment scores of tweets mentioning @realdonaldtrump with those of a random sample of tweets.
Examine the sentiment and emoji use of Twitter bots vs real users.
Choose a few emojis whose sentiment is obvious (e.g. angry face, grinning face) and test how well their use correlates with sentiment scores.
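
For the last example, one possible way to test how well an emoji tracks sentiment is a simple correlation between an emoji indicator and the sentiment score. This is only a sketch: the `has_grin` flag, the scores, and the 0 to 4 scale are assumptions made for illustration, not data from the course.

```r
# Hypothetical data: one row per tweet, a flag for whether the tweet
# contains the grinning-face emoji, and a made-up sentiment score (0 to 4).
tweets <- data.frame(
  has_grin  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  sentiment = c(4, 3, 1, 2, 4, 0)
)

# Correlation between a 0/1 indicator and a numeric score
# (equivalent to a point-biserial correlation).
r <- cor(as.numeric(tweets$has_grin), tweets$sentiment)

# Group means give a more readable view of the same relationship.
group_means <- tapply(tweets$sentiment, tweets$has_grin, mean)

print(r)
print(group_means)
```

With 10,000 or more real tweets, `cor.test()` could be used in place of `cor()` to get a confidence interval as well.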

Emoji encodings

Emojis in streamed tweets can be identified by their byte code.

For example, <ed><a0><bd><ed><b1><8d> is the byte code for the thumbs up emoji. When you stream in a tweet containing this emoji, such as this one, its text will appear like this: <ed><a0><bd><ed><b1><8d>nothing is impossible https://t.co/AckwyTxVjR.

You can find a full list of emojis and their associated encodings in emoji-encodings.csv, in the "streamApiEncodings" column.

Using these codes, you can identify any tweet containing a particular emoji by matching the text of the tweet against that emoji's byte code. See the emoji-template.r script for R code to get you started.
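
The matching step described above can be sketched with base R's `grepl()`. This is a minimal example, not the contents of emoji-template.r, and it assumes the tweet text holds the byte code as the literal characters shown in this README; how the text actually appears may depend on your platform and how the tweets were parsed.

```r
# Byte code for the thumbs up emoji, as given in this README.
thumbs_up <- "<ed><a0><bd><ed><b1><8d>"

# Example tweet texts: the first is the sample from this README.
texts <- c(
  "<ed><a0><bd><ed><b1><8d>nothing is impossible https://t.co/AckwyTxVjR",
  "no emoji here"
)

# fixed = TRUE matches the code as a literal string rather than a regex.
has_thumbs_up <- grepl(thumbs_up, texts, fixed = TRUE)

# Keep only the tweets that contain the emoji.
thumbs_up_tweets <- texts[has_thumbs_up]

print(has_thumbs_up)
```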

Note: These emoji encodings work only for the streaming API. If you want to incorporate the REST API in your analysis, the encodings will be different. REST API encodings are also included in emoji-encodings.csv in the "restApiEncodings" column.
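
Because the two APIs use different encodings, it may help to make the encoding column an explicit parameter. The sketch below counts emoji occurrences using a lookup table shaped like emoji-encodings.csv. The inline table is a stand-in for the real file: only the thumbs up code comes from this README, and the `name` column and the `count_emojis()` helper are hypothetical.

```r
# Stand-in for: read.csv("emoji-encodings.csv", stringsAsFactors = FALSE)
# The `name` column is an assumption about the file's layout.
encodings <- data.frame(
  name = "thumbs up",
  streamApiEncodings = "<ed><a0><bd><ed><b1><8d>",
  stringsAsFactors = FALSE
)

# Count how often each emoji appears across all texts.
# `column` selects which encoding to match: "streamApiEncodings" for
# streamed tweets, "restApiEncodings" for tweets from the REST API.
count_emojis <- function(texts, encodings, column = "streamApiEncodings") {
  stopifnot(column %in% names(encodings))
  codes <- encodings[[column]]
  counts <- vapply(codes, function(code) {
    hits <- gregexpr(code, texts, fixed = TRUE)
    # gregexpr returns -1 when there is no match, so count positions > 0.
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }, integer(1))
  names(counts) <- encodings$name
  counts
}

texts <- c(
  "<ed><a0><bd><ed><b1><8d><ed><a0><bd><ed><b1><8d> yes",
  "<ed><a0><bd><ed><b1><8d>nothing is impossible",
  "no emoji here"
)
emoji_counts <- count_emojis(texts, encodings)
print(emoji_counts)
```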

Deliverable

A series of charts displaying your results
The streaming tweet data you used in your analysis
All R scripts used in scraping, analyzing, and visualizing the data, and anything else I would need to replicate your analysis (without having to collect the tweets myself)
A written explanation of the steps you took to create your analysis, any challenges you encountered along the way, and the reasons for your design choices

Notes

Consider whether you want to include retweets and/or replies in your analysis. Unless you have a reason for including retweets, the default should be to filter them out since they will, in effect, create duplicates in your data.
You should examine your data manually to make sure the sample you are using is appropriate. For example, should you filter out foreign-language tweets, or highly active users that are skewing the sample?
You do not have to collect all your data in a single streaming session into one large file. It may be easier to break the task into chunks and combine the data later in R.
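
The notes above could be combined into a cleaning step along these lines. This is a sketch under several assumptions: the chunks are saved as CSV files, the column names (`id_str`, `screen_name`, `text`, `lang`) are hypothetical stand-ins for whatever your parsed tweets contain, and retweets are detected with the common "RT @" prefix heuristic rather than an official flag.

```r
# Write two small CSV chunks to a temporary folder so the sketch runs
# on its own. In practice these would be your saved streaming sessions.
chunk_dir <- tempfile("tweet-chunks")
dir.create(chunk_dir)
chunk1 <- data.frame(
  id_str = c("1", "2"), screen_name = c("alice", "bob"),
  text = c("hello world", "RT @alice: hello world"),
  lang = c("en", "en"), stringsAsFactors = FALSE
)
chunk2 <- data.frame(
  id_str = c("2", "3"), screen_name = c("bob", "carol"),
  text = c("RT @alice: hello world", "bonjour"),
  lang = c("en", "fr"), stringsAsFactors = FALSE
)
write.csv(chunk1, file.path(chunk_dir, "chunk1.csv"), row.names = FALSE)
write.csv(chunk2, file.path(chunk_dir, "chunk2.csv"), row.names = FALSE)

# Combine all chunks. Reading every column as character keeps long
# tweet IDs from losing precision as numbers.
files <- list.files(chunk_dir, pattern = "\\.csv$", full.names = TRUE)
tweets <- do.call(rbind, lapply(files, read.csv, colClasses = "character"))

# Drop tweets captured in more than one session.
tweets <- tweets[!duplicated(tweets$id_str), ]

# Heuristic: retweets conventionally start with "RT @".
is_retweet <- grepl("^RT @", tweets$text)
originals <- tweets[!is_retweet & tweets$lang == "en", ]

# Tweets per user, most active first, to spot accounts skewing the sample.
tweets_per_user <- sort(table(tweets$screen_name), decreasing = TRUE)

print(originals)
```

Whether to drop non-English tweets or very active users is a judgment call for your particular question; the filter here only shows where such a decision would go.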
