Twitter Research

2015 Research about happiness and Twitter with Dr. Philip Cannata by Nick Sundin and Brandon Olivier.

Getting the Tweets

We have a list of Twitter users in pieces (for faster gathering) in the training_subset[1234].txt files. Those files need to be passed to timelines_new.py. That file is called like this: python3 timelines-new.py training_subset[1234].txt <account index>. That file does call print for showing progress, so python3 is required.

The result of that function is to create a file with the same name as the input file (training_subset[1234].txt in the above example) with ".results.txt" concatenated to the end.

Flatten Tweets

The tweets even at this point have an unusual format, and need to be flattened. We have a file called the_flattener.py and it's run with python3 the_flattener.py grabbed.json > flattened.txt. Python 3 is required because it calls print as a function.

Associate Zip codes

To associate geolocation coordinates to a zip code, we have to grab shape files of the zip codes from the US government and then convert those into a geoJSON file then upload that into a MongoDB instance. From that, it is trivial to create queries for Mongo to query whether abstracted locations in a shape.

Get Shapefiles

We got a library of shapefiles from https://github.com/mbostock/us-atlas which is cloned with git clone https://github.com/mbostock/us-atlas.git. Then cd into that directory, and run make shp/us/zipcodes-unmerged.shp. That will download the full shape files from the US census website.

That file is too large. To compress the file run ogr2ogr simplify .001 simple.shp shp/us/zipcodes-unmerged.shp.

Since MongoDB is written in Javascript, we need to convert that .shp file to a json file. To convert to geoJSON ogr2ogr -f GeoJSON zipcodes.json simple.shp, where simple.shp is a file that was created in the last step.

Load Shapefiles into Mongo

To import the json file into mongo use: mongoimport zipcodes.json --db research --collection zipcodes. Sometimes that doesn't work exactly right and we need to include the --jsonArray as a flag to get the representation working completely right.

Query Mongo with Tweets

Mongo queries about zip codes take the form of

                    db.collection.findOne({ geometry:
                      { $geoIntersects:
                        { $geometry :
                          { type: "Point",
                            coordinates: tweet.coordinates
                          }
                        }
                      }}, callbackFunction);

UserFiler

To associate each tweet with a zip code, run the file userFiler.js. In line with unix practices, that command simply outputs the text that it returns (simplified json). So that command would be run like node userFiler.js tweets.txt > {{ filename for simplified tweets}}.

Aggregate with Spark

After installing Spark from the website and formatting the data with the required information, we can convert the individual tweets into aggregates. The command to do that is spark-submit twitter-aggregator.py tweeets_all.flattened.zip.txt > tweets_with_counts.aggregate.txt

Convert Aggregates into CSV

The txt file, so that it can be imported into databases, needs to be converted into a csv file. The python code to do that is in output_converter.py. That script doesn't output any text, so it should work with python 2 or 3. We used 3. It's run with python output_converter.py tweets_with_counts.aggregate.txt aggregates.csv which will produce a csv file.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
paper		paper
.gitignore		.gitignore
output_converter.py		output_converter.py
readme.md		readme.md
research.twb		research.twb
the_flattener.py		the_flattener.py
timelines-new.py		timelines-new.py
twitter_aggregator3.py		twitter_aggregator3.py
userFiler.js		userFiler.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Research

Getting the Tweets

Flatten Tweets

Associate Zip codes

Get Shapefiles

Load Shapefiles into Mongo

Query Mongo with Tweets

UserFiler

Aggregate with Spark

Convert Aggregates into CSV

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Twitter Research

Getting the Tweets

Flatten Tweets

Associate Zip codes

Get Shapefiles

Load Shapefiles into Mongo

Query Mongo with Tweets

UserFiler

Aggregate with Spark

Convert Aggregates into CSV

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages