# Project Notebook

The purpose of this notebook is to act as an index to the other notebooks used in this project.

* <a href="wiki_scrape.ipynb" target="_blank">Wiki Scrape</a>: Scrape voting history from http://survivor.wikia.com/wiki/Main_Page. Output saved in wiki_scrape.p
* <a href="process_votes.ipynb" target="_blank">Process Votes</a>: Extract features from voting history. Output saved in process_votes.p
* <a href="make_graphs.ipynb" target="_blank">Make Graphs</a>: Create graph objects from seasons. Output saved in make_graphs.p
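* <a href="network.ipynb" target="_blank">Network</a>: Explore relationships in the vote graphs by computing centrality scores. Output saved in network.p
* <a href="episode_scores.ipynb" target="_blank">Episode Scores</a>: Calculate episode scores for each season. Output saved in episode_scores.p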

## Work Flow

Details of the following sections can be found in the hyperlinked notebooks.

In [1]:
# Reload all modules automatically before executing code ...
%load_ext autoreload
%autoreload 2

# ... except pickle: import it here and exclude it from reloading.
import pickle
%aimport -pickle

In [2]:
# Force everything to be run ...
run_all_override = True

### Wiki Scrape
<a href="wiki_scrape.ipynb" target="_blank">Wiki Scrape</a>: Scrape voting history from http://survivor.wikia.com/wiki/Main_Page.

In [3]:
use_wiki_scrape_from_disk = True and not run_all_override

if use_wiki_scrape_from_disk:
    try:
        print "Loading seasons from disk."
        seasons = pickle.load( open( "wiki_scrape.p", "rb" ) )
    except IOError:
        print "Error loading from disk."
        use_wiki_scrape_from_disk = False

if not use_wiki_scrape_from_disk:
    import wiki_scrape
    url = "http://survivor.wikia.com/wiki/Main_Page"
    print "Scraping " + url
    seasons = wiki_scrape.scrape(url, save_to_disk=True)

Scraping http://survivor.wikia.com/wiki/Main_Page
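Each stage in this notebook repeats the same logic: try the pickle on disk, and otherwise recompute. Copying that logic by hand makes it easy to reset the wrong flag. Below is a minimal sketch of a shared helper. It assumes each stage can be wrapped in a zero-argument callable. `load_or_compute` is a hypothetical name and is not part of the project's modules.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute, use_disk=True, save=True):
    """Return the object pickled at `path`, or build it with `compute()`.

    Hypothetical helper. `compute` is any zero-argument callable. When
    `use_disk` is False or the cache file cannot be read, the result is
    recomputed and, if `save` is True, pickled back to `path`.
    """
    if use_disk:
        try:
            with open(path, "rb") as f:
                print("Loading " + path + " from disk.")
                return pickle.load(f)
        except IOError:
            print("Error loading " + path + " from disk.")
    result = compute()
    if save:
        with open(path, "wb") as f:
            pickle.dump(result, f)
    return result


# Demo: a throwaway cache file plus a counter that records each compute() call.
calls = []


def fake_compute():
    calls.append(1)
    return {"Borneo": ["vote1", "vote2"]}


cache = os.path.join(tempfile.mkdtemp(), "demo.p")
first = load_or_compute(cache, fake_compute)                  # no file yet: compute and save
second = load_or_compute(cache, fake_compute)                 # read back from the pickle
third = load_or_compute(cache, fake_compute, use_disk=False)  # forced recompute
```

As an example, the Wiki Scrape cell could become `seasons = load_or_compute("wiki_scrape.p", lambda: wiki_scrape.scrape(url, save_to_disk=False), use_disk=not run_all_override)`. This assumes that `save_to_disk=False` makes the module skip its own save, so the helper does the saving instead.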


### Process Votes
<a href="process_votes.ipynb" target="_blank">Process Votes</a>: Extract features from voting history. Output saved in process_votes.p

In [4]:
use_process_votes_from_disk = True and not run_all_override

if use_process_votes_from_disk:
    try:
        print "Loading voteweights from disk."
        voteweights = pickle.load( open( "process_votes.p", "rb" ) )
    except IOError:
        print "Error loading from disk."
        use_process_votes_from_disk = False
        
if not use_process_votes_from_disk:
    import process_votes
    print "Processing votes ..."
    voteweights = process_votes.get_voteweights(seasons, save_to_disk=True)

Processing votes ...
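The actual features are computed inside `process_votes`, which this notebook does not show. The sketch below is illustration only. It computes one plausible kind of vote weight: how often two contestants voted for the same person. The input layout (one voter-to-target dict per Tribal Council) and the function `co_vote_weights` are both assumptions. They are not the project's real data format or API.

```python
from collections import Counter
from itertools import combinations


def co_vote_weights(councils):
    """Count, for each pair of voters, how often they voted for the same target.

    `councils` is a list of dicts mapping voter -> person voted for, one dict
    per Tribal Council (a hypothetical layout for the scraped data).
    """
    weights = Counter()
    for votes in councils:
        # Group voters into blocs by the person they voted for.
        blocs = {}
        for voter, target in votes.items():
            blocs.setdefault(target, []).append(voter)
        # Every pair inside a bloc gains one shared vote.
        for bloc in blocs.values():
            for a, b in combinations(sorted(bloc), 2):
                weights[(a, b)] += 1
    return weights


councils = [
    {"Ann": "Dan", "Bob": "Dan", "Cat": "Dan", "Dan": "Ann"},
    {"Ann": "Cat", "Bob": "Cat", "Cat": "Ann"},
]
weights = co_vote_weights(councils)
```

Sorting each bloc before pairing means every pair is always stored under the same key, for example `("Ann", "Bob")` and never `("Bob", "Ann")`.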


### Make Graphs
<a href="make_graphs.ipynb" target="_blank">Make Graphs</a>: Create graph objects from seasons. Output saved in make_graphs.p

In [5]:
use_make_graphs_from_disk = True and not run_all_override

if use_make_graphs_from_disk:
    try:
        print "Loading graphs from disk"
        graphs = pickle.load( open( "make_graphs.p", "rb" ) )
    except IOError:
        print "Error loading from disk"
        use_make_graphs_from_disk = False

if not use_make_graphs_from_disk:
    import make_graphs
    print "Making graphs"
    graphs = make_graphs.make_all_graphs(voteweights)

Making graphs
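`make_all_graphs` is not shown here. The project may build its graphs with a graph library, for example networkx's `Graph.add_edge(u, v, weight=w)`, but that is not confirmed. The stdlib sketch below shows the underlying structure: pair weights turned into an undirected, weighted adjacency dict. `build_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_graph(pair_weights):
    """Turn {(a, b): weight} into an undirected adjacency dict.

    After the call, graph[a][b] == graph[b][a] == weight.
    Pairs with a non-positive weight are skipped.
    """
    graph = defaultdict(dict)
    for (a, b), w in pair_weights.items():
        if w > 0:
            graph[a][b] = w
            graph[b][a] = w
    return dict(graph)


graph = build_graph({("Ann", "Bob"): 2, ("Ann", "Cat"): 1, ("Bob", "Cat"): 1})
```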


### Network
<a href="network.ipynb" target="_blank">Network</a>: Explore relationships in the vote graphs by computing centrality scores. Output saved in network.p

In [6]:
use_network_from_disk = True and not run_all_override

if use_network_from_disk:
    try:
        print "Loading graph stats from disk"
        central = pickle.load( open( "network.p", "rb") )
    except IOError:
        print "Error loading from disk"
        use_network_from_disk = False
    
if not use_network_from_disk:
    import network
    print "Calculating graph statistics"
    central = network.get_all_centrality_scores(
        voteweights, graphs, save_to_disk=True
    )

Calculating graph statistics
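This notebook does not show which measures `get_all_centrality_scores` computes, and the real module may rely on a library such as networkx's `pagerank`. As a hand-rolled illustration, here are two standard measures on an adjacency dict like the one sketched above: weighted degree, and PageRank computed by power iteration. The sketch assumes every node has at least one edge, so no node is dangling.

```python
def weighted_degree(graph):
    """Sum of edge weights at each node."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank by power iteration on an undirected adjacency dict.

    Each node splits its score among its neighbours in proportion to edge
    weight. Assumes every node has at least one edge (no dangling nodes).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    strength = weighted_degree(graph)
    for _ in range(max_iter):
        # Teleport share, then add the damped share passed along each edge.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                new[v] += damping * rank[u] * w / strength[u]
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


graph = {
    "Ann": {"Bob": 2, "Cat": 1},
    "Bob": {"Ann": 2, "Cat": 1},
    "Cat": {"Ann": 1, "Bob": 1},
}
deg = weighted_degree(graph)
pr = pagerank(graph)
```

With no dangling nodes, each iteration preserves a total score of 1. Ann and Bob are symmetric in this toy graph, so they end up with equal scores, both above Cat.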




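### Episode Scores
<a href="episode_scores.ipynb" target="_blank">Episode Scores</a>: Calculate episode scores for each season. Output saved in episode_scores.p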

In [7]:
import numpy as np

use_episode_scores_from_disk = True and not run_all_override

if use_episode_scores_from_disk:
    try:
        print "Loading episode scores from disk"
        seasons = pickle.load( open( "episode_scores.p", "rb") )
    except IOError:
        print "Error loading from disk"
        use_episode_scores_from_disk = False

if not use_episode_scores_from_disk:
    import episode_scores
    print "Calculating episode scores"
    n = 8
    time_line_prct = [i/n for i in np.arange(n) + 1.]
    episode_scores.process_all_seasons(seasons, time_line_prct)


Calculating episode scores
Palau
Tocantins
Borneo
Panama
Cambodia
Blood_vs._Water
Marquesas
Pearl_Islands
Vanuatu
The_Australian_Outback
Heroes_vs_Villains
Guatemala
China
Worlds_Apart
Thailand
The_Amazon
Cagayan
South_Pacific
One_World
Philippines
Caramoan
Gabon
Micronesia
Samoa
All-Stars
Fiji
Africa
San_Juan_del_Sur
Cook_Islands
Redemption_Island
Nicaragua
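The Episode Scores cell builds `time_line_prct`, a list of `n` evenly spaced checkpoints along a season: 1/n, 2/n, ..., 1. The `+ 1.` turns the elements into floats, so `i/n` is true division even under Python 2. The sketch below computes the same fractions without numpy. It also shows one hypothetical way to map a fraction to an episode number. Both `episode_at` and the 14-episode season are assumptions, not taken from `episode_scores`.

```python
import math


def timeline_fractions(n):
    """n evenly spaced checkpoints 1/n, 2/n, ..., 1.0 (same values as time_line_prct)."""
    return [(i + 1) / float(n) for i in range(n)]


def episode_at(fraction, num_episodes):
    """Hypothetical mapping from a timeline fraction to a 1-based episode number."""
    return max(1, int(math.ceil(fraction * num_episodes)))


fractions = timeline_fractions(8)
episodes = [episode_at(f, 14) for f in fractions]
```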


## Appendix

In [8]:
!ls *.ipynb
print
!ls *.p

episode_scores.ipynb
episode_scores_bc.ipynb
make_graphs.ipynb
network.ipynb
process_votes.ipynb
project_notebook.ipynb
wiki_scrape.ipynb

episode_scores.p
make_graphs.p
network.p
process_votes.p
wiki_scrape.p


In [9]:
# %load https://gist.github.com/ajp619/7dd388315fc824208654/raw/81be07b0e793208641182032e074dbe39bbfa08e/pyprint
def pyprint(myfile):
    """Display the source of `myfile` as syntax-highlighted HTML in the notebook."""
    from pygments import highlight
    from pygments.lexers import PythonLexer
    from pygments.formatters import HtmlFormatter
    from IPython.display import HTML

    with open(myfile) as f:
        code = f.read()

    # Inline the stylesheet so the highlighting renders without external CSS.
    formatter = HtmlFormatter()
    return HTML('<style type="text/css">{}</style>{}'.format(
        formatter.get_style_defs('.highlight'),
        highlight(code, PythonLexer(), formatter)))

In [10]:
# %load https://gist.github.com/ajp619/ddaa0f35627b066ef528/raw/cbbd6c6c1cad286ba5a358b93fd94eddede7c4ba/qtutil.py
# silly utility to launch a qtconsole if one doesn't exist

consoleFlag = True
consoleFlag = False  # Turn on/off by commenting/uncommenting this line

import psutil

def returnPyIDs():
    pyids = set()
    for pid in psutil.pids():
        try:
            if "python" in psutil.Process(pid).name():
                pyids.add(pid)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            # Skip processes that exited or cannot be inspected.
            pass
    return pyids

def launchConsole():
    before_pyids = returnPyIDs()
    %qtconsole
    after_pyids = returnPyIDs()
    newid = after_pyids.difference(before_pyids)
    assert len(newid) == 1
    return list(newid)[0]

try:
    print qtid
except NameError:
    if consoleFlag:
        qtid = launchConsole()
        print qtid
    
if consoleFlag and (qtid not in returnPyIDs()):
    qtid = launchConsole()
    print qtid

4112
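`launchConsole()` finds the new console's PID by comparing two snapshots of Python process IDs, one taken before `%qtconsole` runs and one after. That core check can be separated from psutil and tried on plain sets. `new_process_id` is a hypothetical helper, and the PID snapshots below are made-up values.

```python
def new_process_id(before, after):
    """Return the single PID present in `after` but not in `before`.

    Same check as launchConsole(), but raises a descriptive error instead
    of a bare assert when zero or several new processes appear.
    """
    new = set(after) - set(before)
    if len(new) != 1:
        raise RuntimeError("expected exactly one new process, found %d" % len(new))
    return new.pop()


qt_pid = new_process_id({101, 202}, {101, 202, 303})
```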
