PostgreSQL create + import script for researching the NBC Russian troll tweet dump.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
LICENSE
README.md
democracyCreate.sql

README.md

democracy_sql

PostgreSQL create + import script for researching the NBC Russian troll tweet dump.

Background

NBC has published a database of the deleted Russian troll tweets. There's even a nifty graph database for exploring them, Neo4j.

But what if you want to explore the data in your own PostgreSQL database? Here is some SQL to create the database and tables, import the CSV files, and convert text data to more convenient formats.

Tested on PostgreSQL 9.3.21, may require adaptation to work with other RDBMSs or versions.

Usage

Prerequisite: install and configure PostgreSQL.

Download tweets.csv and users.csv.

As a user with superuser status, run:

psql -f democracyCreate.sql -v users="'/path/to/users.csv'" -v tweets="'/path/to/tweets.csv'"

Note: this will name the database "democracy"; if you already have a database with that name, edit the first line of the script to call it something else instead.

The schema that results will look like:

                     Table "public.twitter_user"
      Column      |           Type           | Modifiers 
------------------+--------------------------+-----------
 id               | character varying(18)    | 
 location         | character varying(29)    | 
 name             | character varying(20)    | 
 followers_count  | integer                  | 
 statuses_count   | integer                  | 
 time_zone        | character varying(27)    | 
 verified         | boolean                  | 
 lang             | character varying(2)     | 
 screen_name      | character varying(15)    | 
 description      | character varying(160)   | 
 created_at       | timestamp with time zone | 
 favourites_count | integer                  | 
 friends_count    | integer                  | 
 listed_count     | integer                  | 
Indexes:
    "twitter_user_id_key" UNIQUE CONSTRAINT, btree (id)
Referenced by:
    TABLE "tweet" CONSTRAINT "tweet_twitter_user_fkey" FOREIGN KEY (user_id) REFERENCES twitter_user(id)

                     Table "public.tweet"
        Column         |           Type           | Modifiers 
-----------------------+--------------------------+-----------
 user_id               | character varying(18)    | 
 user_key              | character varying(15)    | 
 created_at            | timestamp with time zone | 
 created_str           | character varying(19)    | 
 retweet_count         | integer                  | 
 retweeted             | boolean                  | 
 favorite_count        | integer                  | 
 text                  | character varying(191)   | 
 tweet_id              | character varying(18)    | 
 source                | character varying(93)    | 
 hashtags              | json                     | 
 expanded_urls         | json                     | 
 posted                | character varying(6)     | 
 mentions              | json                     | 
 retweeted_status_id   | character varying(19)    | 
 in_reply_to_status_id | character varying(21)    | 
Foreign-key constraints:
    "tweet_twitter_user_fkey" FOREIGN KEY (user_id) REFERENCES twitter_user(id)

Example Queries

What tweets were most popular?

SELECT twitter_user.name, tweet.text, (tweet.favorite_count + tweet.retweet_count) AS activity_count
FROM twitter_user
JOIN tweet ON twitter_user.id = tweet.user_id
WHERE (tweet.favorite_count IS NOT NULL OR tweet.retweet_count IS NOT NULL)
ORDER BY (tweet.favorite_count + tweet.retweet_count) DESC
LIMIT 10;

What Twitter clients do trolls prefer?

SELECT source, COUNT(*) FROM tweet
GROUP BY source
ORDER BY COUNT(*) DESC;