<img src="http://hilpisch.com/tpq_logo.png" width="36%" align="right" style="vertical-align: top;">

# Dow Jones DNA NLP Case Study

_Based on news articles related to Hurricane Harvey._

**Network Graph Analysis**

Dr Yves J Hilpisch | Michael Schwed

The Python Quants GmbH

## The Imports

In [1]:
import os
import sys
sys.path.append('../../modules')

In [2]:
import pandas as pd
import ng_functions as ng
import nlp_functions as nlp

In [3]:
project = 'harvey_250'

In [4]:
abs_path = os.path.abspath('../../')

In [5]:
data_path = os.path.join(abs_path, 'data_harvey')

In [6]:
results_path = os.path.join(data_path, 'results')

## Reading the Data

In [7]:
fn = os.path.join(results_path, 'relations_{}.h5'.format(project))

In [8]:
!ls ../../data_harvey/results

relations_harvey_250.h5


In [9]:
data = pd.read_hdf(fn, 'data')

In [10]:
data.head()

| | Node1 | Relation | Node2 |
|---|---|---|---|
| 0 | hurricane irma | has strengthened to | category |
| 1 | people | is in | leeward islands of caribbean |
| 2 | home | is in | affected areas |
| 6 | significant event | director of | caribbean disaster emergency management agency |
| 14 | forecaster | expect | storm |


## Network Graph

### Full Graph

In [11]:
g = ng.create_graph(data.iloc[:1000])
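
The internals of `ng.create_graph` are not shown in this notebook. As a rough illustration of the idea only, the sketch below turns `(Node1, Relation, Node2)` triples into a labeled, directed edge structure using plain Python. The helper name `build_edges` and the dictionary layout are assumptions made for this example, not the API of `ng_functions`. The triples are taken from the `data.head()` output above.

```python
from collections import defaultdict

# Triples in the (Node1, Relation, Node2) layout of the relations table.
triples = [
    ("hurricane irma", "has strengthened to", "category"),
    ("people", "is in", "leeward islands of caribbean"),
    ("home", "is in", "affected areas"),
    ("forecaster", "expect", "storm"),
]


def build_edges(rows):
    """Hypothetical stand-in for a graph builder: map each source node
    to a list of (target node, relation label) pairs."""
    graph = defaultdict(list)
    for node1, relation, node2 in rows:
        graph[node1].append((node2, relation))
    return dict(graph)


edges = build_edges(triples)

# Every distinct node, whether it appears as source or as target.
nodes = sorted({n for a, _, b in triples for n in (a, b)})
```

Each relation becomes one directed edge whose label is the relation text, which is presumably what `with_edge_label=True` displays in the plot.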

In [12]:
G = ng.plot_graph(g, central_gravity=0.01,
                  with_edge_label=True,
                  height='600px', width='80%',
                  filter_=['physics'])

In [13]:
# G.show('ng_harvey_01.html')

### Focused Graph

In [14]:
entities = ['hurricane', 'houston', 'government', 'trump']

In [15]:
# rows whose source node is one of the entities
sel_1 = data[data['Node1'].isin(entities)].copy()

In [16]:
# rows whose target node is one of the entities
sel_2 = data[data['Node2'].isin(entities)].copy()

In [17]:
# stack both selections into one table
sel = pd.concat((sel_1, sel_2), ignore_index=True)

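In [18]:
# re-join the tokens of every cell (applymap is deprecated from pandas 2.1; use DataFrame.map there)
sel = sel.applymap(lambda s: ' '.join(nlp.tokenize(s)))

In [19]:
# keep only rows in which every cell is at most one token
sel = sel[sel.applymap(lambda s: len(s.split()) <= 1)].dropna()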
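
The selection and single-token filter can be tried on a small table without the project data. The sketch below is a minimal illustration under stated assumptions: the rows of `toy` are invented for the example, the tokenization step is skipped because `nlp.tokenize` is not shown here, and the mask is built column by column with `Series.str` methods in place of `applymap`.

```python
import pandas as pd

# Toy relations table; the rows are invented for illustration only.
toy = pd.DataFrame({
    "Node1": ["hurricane", "people", "trump", "storm"],
    "Relation": ["hit", "leave", "visit", "reach"],
    "Node2": ["texas", "houston", "gulf coast", "louisiana"],
})
entities = ["hurricane", "houston", "government", "trump"]

# Rows where an entity is the source node or the target node.
sel_1 = toy[toy["Node1"].isin(entities)]
sel_2 = toy[toy["Node2"].isin(entities)]
sel = pd.concat((sel_1, sel_2), ignore_index=True)

# Boolean frame: True where a cell holds at most one token.
single = sel.apply(lambda col: col.str.split().str.len() <= 1)

# Indexing with the mask turns multi-token cells into NaN,
# and dropna() then removes every row that contains one.
focused = sel[single].dropna().reset_index(drop=True)
```

Here the row with `gulf coast` is dropped because one of its cells has two tokens. Note that a row whose `Node1` and `Node2` are both in `entities` would appear twice after `pd.concat`; `drop_duplicates()` could be applied if that matters.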

In [20]:
g = ng.create_graph(sel)

In [21]:
G = ng.plot_graph(g, central_gravity=0.01,
                  with_edge_label=True,
                  height='600px', width='80%',
                  filter_=['physics'])

In [22]:
G.show('ng_harvey_02.html')
