*Note: this notebook is meant to accompany a forthcoming book chapter by __Deen Freelon__ titled "Partition-specific network analysis of digital trace data: Research questions and tools." It may or may not stand on its own. After running the code in this notebook, you may want to proceed to the data analysis notebook.*

The purpose of this notebook is to prepare your computer to run TSM, the Twitter Subgraph Manipulator, on a Twitter dataset.

# Task 1: Software installation

1) Download and install Python 3.4 from here: https://www.python.org/downloads/

2) Install NetworkX, python-louvain, and Twarc using the following shell commands:

`pip install networkx`

`pip install python-louvain`

`pip install twarc`

OR use the following links, respectively:

http://networkx.github.io/download.html

https://bitbucket.org/taynaud/python-louvain

https://github.com/edsu/twarc

3) Download the tweet ID file from here: [link] and save it in your Python directory.

4) If you want to run the code in this notebook directly from the notebook, install IPython using the following shell command:

`pip install ipython`

and run IPython Notebooks using the following shell command:

`ipython notebook`

5) In Python, run the following command:

In [None]:
import tsm

6) If you don't see an error, you did everything correctly.
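
If the import in step 5 does fail, it can help to find out which package is missing. The snippet below is a minimal sketch of such a check, not part of TSM itself. The helper name `missing_modules` is made up for illustration, and the list assumes the usual import names: python-louvain is imported as `community`, and `tsm` is assumed to be importable from your Python directory as in step 5.

```python
import importlib.util

# Import names can differ from pip package names:
# python-louvain is imported as "community".
REQUIRED_MODULES = ["networkx", "community", "twarc", "tsm"]


def missing_modules(names):
    """Return the names in `names` that Python cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Anything listed after "Not found:" is a module Python could not locate, so revisit the matching installation step above.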

# Task 2: Hydrating the data

The tweet ID file linked above lists 564,318 tweet IDs for retweets containing the hashtag "#wiunion" posted between February 17 and March 23, 2011. This dataset was collected through the Twitter search API and thus is probably incomplete. The code below will convert ("hydrate") the tweet IDs into the data you'll need for the analysis notebook. But before you run it, complete the following steps:

1) Create a new Twitter app here: https://apps.twitter.com/app/new

2) Copy your new app's token, token secret, consumer key, and consumer secret into the respective single-quoted fields (`token`, `token_secret`, `consumer_key`, and `consumer_secret`) near the top of the code below. You can find these on the "Keys and Access Tokens" tab of your app page.

3) Run the code below. Remember, the code file or notebook must be in the same folder as the data file for it to work.

4) The hydration process may take up to eight hours of continuous broadband connectivity to complete, or longer on a slower connection. When it has finished, you will find a file called "wiunion_rts_hydrated.csv" in the same folder as your tweet ID file. It should contain close to 564,318 data rows, allowing for a small amount of data rot.

In [None]:
from tsm import save_csv
from datetime import date
from time import strptime
from twarc import Twarc

# Paste your Twitter app credentials between the quotes.
token = ''
token_secret = ''
consumer_key = ''
consumer_secret = ''

t = Twarc(consumer_key, consumer_secret, token, token_secret)
wiunion_dataset = []

# Hydrate each tweet ID and keep the screen name, text, and posting date.
with open('wiunion_rt_ids_20110217-20110323.csv') as id_file:
    for tweet in t.hydrate(id_file):
        # created_at has the form 'Thu Feb 17 20:15:03 +0000 2011'
        rawtime = tweet['created_at']
        # Rebuild 'YYYY Mon DD' from the year, month, and day slices
        time_prep = rawtime[-4:] + ' ' + rawtime[4:7] + ' ' + rawtime[8:10]
        st = strptime(time_prep, '%Y %b %d')
        finaldate = date(st.tm_year, st.tm_mon, st.tm_mday).isoformat()
        wiunion_dataset.append([tweet['user']['screen_name'], tweet['text'], finaldate])

save_csv('wiunion_rts_hydrated.csv', wiunion_dataset, 'USE_QUOTES')
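
Once hydration has finished, you may want to confirm that the output holds close to the 564,318 rows mentioned in step 4. The following is a rough sketch of one way to do that with Python's standard `csv` module. The helper names `count_csv_rows` and `fraction_recovered` are invented for this example, and it assumes `save_csv` writes an ordinary UTF-8 CSV file with no header row, which this notebook does not confirm.

```python
import csv
import os

EXPECTED_ROWS = 564318  # number of tweet IDs stated in Task 2


def count_csv_rows(path, encoding="utf-8"):
    """Count CSV records; a quoted line break inside a tweet stays within one row."""
    with open(path, newline="", encoding=encoding) as handle:
        return sum(1 for _ in csv.reader(handle))


def fraction_recovered(row_count, expected=EXPECTED_ROWS):
    """Share of the original tweet IDs that produced a data row."""
    return row_count / expected


# Only runs if the hydrated file is already in the current folder.
if os.path.exists("wiunion_rts_hydrated.csv"):
    rows = count_csv_rows("wiunion_rts_hydrated.csv")
    print(rows, "rows,", round(100 * fraction_recovered(rows), 1), "% of the ID list")
```

A percentage somewhat below 100 is expected, since deleted or protected tweets can no longer be hydrated. If the file does turn out to have a header row, subtract one from the count.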