# Show me everyone from \$X country that has visited \$Y extremist forum

_Inspired by The Guardian's release of the [XKeyscore presentation](http://www.theguardian.com/world/interactive/2013/jul/31/nsa-xkeyscore-program-full-presentation)._

## Tutorial Code

Let's re-import scapy:

In [2]:
from scapy.all import *



And load the pre-sniffed HTTP traffic that I grabbed from [Wireshark's Samples](http://wiki.wireshark.org/SampleCaptures):

In [3]:
sniffed = "data/http.cap"
pkts = sniff(offline=sniffed)

We can see that we have 41 TCP packets and 2 UDP packets:

In [4]:
pkts

<Sniffed: TCP:41 UDP:2 ICMP:0 Other:0>

Let's see that nice summary like last time using `nsummary`:

In [5]:
pkts.nsummary()

0000 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http S
0001 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 SA
0002 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
0003 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http PA / Raw
0004 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A
0005 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / Raw
0006 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
0007 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / Raw
0008 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
0009 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / Raw
0010 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 PA / Raw
0011 Ether / IP / TCP 145.254.160.237:tip2 > 65.208.228.223:http A
0012 Ether / IP / UDP / DNS Qry "pagead2.googlesyndication.com." 
0013 Ether / IP / TCP 65.208.228.223:http > 145.254.160.237:tip2 A / Raw
0014 Ether / IP / TCP 14

And let's pick off the packet at index 3 (the fourth packet, since indexing starts at 0):

In [4]:
pkts[3].show()

###[ Ethernet ]###
  dst       = fe:ff:20:00:01:00
  src       = 00:00:01:00:00:00
  type      = 0x800
###[ IP ]###
     version   = 4L
     ihl       = 5L
     tos       = 0x0
     len       = 519
     id        = 3909
     flags     = DF
     frag      = 0L
     ttl       = 128
     proto     = tcp
     chksum    = 0x9010
     src       = 145.254.160.237
     dst       = 65.208.228.223
     \options   \
###[ TCP ]###
        sport     = tip2
        dport     = http
        seq       = 951057940
        ack       = 290218380
        dataofs   = 5L
        reserved  = 0L
        flags     = PA
        window    = 9660
        chksum    = 0xa958
        urgptr    = 0
        options   = []
###[ Raw ]###
           load      = 'GET /download.html HTTP/1.1\r\nHost: www.ethereal.com\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q

And parse out its actual payload:

In [5]:
load = pkts[3].getlayer(Raw).fields.get("load")
print(load)

GET /download.html HTTP/1.1
Host: www.ethereal.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.ethereal.com/development.html




Whoa - Windows NT! This packet must be ancient ;)


Parsing out the HTTP header information is straightforward. From there, all we need is a simple check for whether the "extremist" forum we're interested in appears in these packets:

In [6]:
'GET /download' in load

True

In [8]:
'www.ethereal.com' in load

True
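The substring checks above match anywhere in the payload, so a hostname that only appears in the `Referer` header would also count as a hit. A slightly stricter sketch, assuming the payload is a plain-text HTTP/1.x request, splits out the headers and compares only the `Host` value. The helper names `parse_http_request` and `visited_host` are hypothetical and not part of scapy:

```python
def parse_http_request(load):
    """Split a raw HTTP/1.x request into (request_line, headers dict)."""
    if isinstance(load, bytes):
        # Raw payloads are byte strings; latin-1 can decode any byte value
        load = load.decode("latin-1")
    # Headers end at the first blank line; anything after it is the body
    head = load.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Header names are case-insensitive, so store them lowercased
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def visited_host(load, host):
    """Return True if the request's Host header equals `host` (case-insensitive)."""
    _, headers = parse_http_request(load)
    return headers.get("host", "").lower() == host.lower()


# Shortened copy of the request shown above
sample = (
    "GET /download.html HTTP/1.1\r\n"
    "Host: www.ethereal.com\r\n"
    "Referer: http://www.ethereal.com/development.html\r\n"
    "\r\n"
)
request_line, headers = parse_http_request(sample)
print(request_line)
print(visited_host(sample, "www.ethereal.com"))
```

With the packet from earlier, the same check would be `visited_host(load, 'www.ethereal.com')`.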

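Now we can play around and traceroute between the user's IP address and the server it was talking to. (The trace itself runs from the machine executing this notebook; the captured source address is simply placed at the front of the hop list.)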

In [9]:
import select as s

In [10]:
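def trace_route(pkts):
    """Traceroute to the destination of the first TCP/IP packet in pkts."""
    for pkt in pkts:
        # getlayer() returns None (it does not raise) when a layer is missing
        IP_layer = pkt.getlayer(IP)
        proto_layer = pkt.getlayer(TCP)
        if IP_layer is None or proto_layer is None:
            continue
        destination = IP_layer.dst
        src = IP_layer.src
        dport = proto_layer.dport
        sport = proto_layer.sport

        while True:
            try:
                res, unans = traceroute(target=destination, dport=dport, sport=sport, maxttl=20)
                # Start with the captured source address, then add each responding hop
                hops = [src]
                for trace in res.res:
                    hops.append(trace[1].src)
                return hops, sport
            except s.error:
                # Retry if the underlying select() call fails
                continue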


In [11]:
tr, sport = trace_route(pkts)


Received 29 packets, got 8 answers, remaining 12 packets
  65.208.228.223:tcp80 
1 192.168.1.1     11   
2 10.1.10.1       11   
3 68.87.57.161    11   
4 68.85.57.241    11   
5 216.218.213.101 11   
6 68.86.82.66     11   
7 184.105.250.34  11   
8 184.105.222.22  11   
Begin emission:
Finished to send 20 packets.


In [12]:
tr

['145.254.160.237',
 '192.168.1.1',
 '10.1.10.1',
 '68.87.57.161',
 '68.85.57.241',
 '216.218.213.101',
 '68.86.82.66',
 '184.105.250.34',
 '184.105.222.22']

In [13]:
!pip install pygeoip

Downloading/unpacking pygeoip
  Downloading pygeoip-0.3.2-py2.py3-none-any.whl
Installing collected packages: pygeoip
Successfully installed pygeoip
Cleaning up...


In [14]:
import pygeoip

In [15]:
def map_ip(hops):
    gip = pygeoip.GeoIP('data/GeoLiteCity.dat')
    coordinates = []
    for hop in hops:
        geo_data = gip.record_by_addr(hop)
        if geo_data:
            lat = geo_data['latitude']
            lon = geo_data['longitude']
            coordinates.append((lon, lat))
    return coordinates

In [17]:
coordinates = map_ip(tr)

In [18]:
coordinates

[(9.0, 51.0),
 (-97.0, 38.0),
 (-81.0998, 32.08349999999999),
 (-121.8962, 37.5155),
 (-97.0, 38.0),
 (-121.8962, 37.5155),
 (-121.8962, 37.5155)]
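The trace has nine hops but only seven coordinates. The likely reason is that `192.168.1.1` and `10.1.10.1` are private addresses with no GeoIP record, so `map_ip` skips them. A minimal sketch, assuming Python 3's standard `ipaddress` module (the helper `split_hops` is hypothetical), makes that split explicit:

```python
import ipaddress


def split_hops(hops):
    """Separate hops into publicly routable and private-use addresses."""
    public, private = [], []
    for hop in hops:
        if ipaddress.ip_address(hop).is_private:
            private.append(hop)
        else:
            public.append(hop)
    return public, private


# The hop list returned by trace_route above
hops = ['145.254.160.237', '192.168.1.1', '10.1.10.1', '68.87.57.161',
        '68.85.57.241', '216.218.213.101', '68.86.82.66',
        '184.105.250.34', '184.105.222.22']

public, private = split_hops(hops)
print(private)      # ['192.168.1.1', '10.1.10.1']
print(len(public))  # 7
```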

In [19]:
!pip install geojson

Downloading/unpacking geojson
  Downloading geojson-1.1.0-py2-none-any.whl
Installing collected packages: geojson
Successfully installed geojson
Cleaning up...


In [20]:
import geojson

In [38]:
def create_geojson(coordinates):
    # Build one LineString feature per pair of consecutive coordinates
    features = []
    for index in range(len(coordinates) - 1):
        hop = index + 1
        features.append({
            "type": "Feature",
            "id": hop,
            "properties": {"title": "hop %i" % hop},
            "geometry": {"type": "LineString",
                         "coordinates": [coordinates[index], coordinates[index + 1]]},
        })

    # Always include "features", even when there is nothing to draw
    d = {"type": "FeatureCollection", "features": features}
    return geojson.dumps(d)

In [39]:
data = create_geojson(coordinates)
data
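The `data` string can be sanity-checked before it goes anywhere else. The sketch below uses only the standard `json` module and a hypothetical helper, `summarize_feature_collection`, to parse a FeatureCollection string and list each segment's title and endpoints. The sample string is hand-written in the shape `create_geojson` is expected to produce, with coordinates in GeoJSON's (longitude, latitude) order:

```python
import json


def summarize_feature_collection(text):
    """Return (title, start, end) for every LineString feature in a GeoJSON string."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a FeatureCollection")
    summary = []
    for feature in collection.get("features", []):
        geometry = feature["geometry"]
        if geometry["type"] != "LineString":
            continue
        points = geometry["coordinates"]
        title = feature["properties"]["title"]
        summary.append((title, tuple(points[0]), tuple(points[-1])))
    return summary


# Hand-written sample covering the first two segments of the route above
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": 1, "properties": {"title": "hop 1"},
         "geometry": {"type": "LineString",
                      "coordinates": [[9.0, 51.0], [-97.0, 38.0]]}},
        {"type": "Feature", "id": 2, "properties": {"title": "hop 2"},
         "geometry": {"type": "LineString",
                      "coordinates": [[-97.0, 38.0], [-81.0998, 32.0835]]}},
    ],
})

for title, start, end in summarize_feature_collection(sample):
    print(title, start, end)
```

Calling `summarize_feature_collection(data)` on the real output should list one segment per pair of consecutive coordinates.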

Navigate to [geojson.io](http://geojson.io/) in your browser.  Then copy & paste the `data` output into the JSON input area.

Here's mine (copied from the "share" button on geojson.io):

In [44]:
from IPython.display import HTML
HTML('<iframe frameborder="0" width="100%" height="300" src="http://bl.ocks.org/d/97a969852376545ed452"></iframe>')

In [6]:
help(traceroute)

Help on function traceroute in module scapy.layers.inet:

traceroute(target, dport=80, minttl=1, maxttl=30, sport=<RandShort>, l4=None, filter=None, timeout=2, verbose=None, **kargs)
    Instant TCP traceroute
    traceroute(target, [maxttl=30,] [dport=80,] [sport=80,] [verbose=conf.verb]) -> None

