# Bikeshare data wrangling with bash

Working with [Capital Bikeshare trip history data](http://www.capitalbikeshare.com/trip-history-data) is a lot of fun and fascinating for a DC resident and bikeshare member like me.  So much so that in the past I've worked up a [skeletal app](https://github.com/dchud/bikestat) that makes it easy to load trip data into a database and browse it on the web.  That app could use a little attention, but right now I'm trying to do something different with the same data, and I'm finding that I can get a lot done just by sticking with command line tools. Apparently this is such a good idea that somebody really smart already had it and published [Data Science at the Command Line](http://datascienceatthecommandline.com/), which I'm reading now.  I recommend you take a look at his site and his book to learn a lot more.

For now, though, let's look at some quick and easy ways to clean up the data and process it in chunks.  When we're done, we can pull the chunks together to render a map showing the growth of bikeshare docking stations.

Let's start by getting some data.  This will grab all the data through bikeshare's first twelve quarters, through summer 2013.  It's a good couple dozen MB so it might take a little time, depending on your connection.

NB:  This whole writeup is wordy and rather inefficient.  Several steps could have been faster with combined pipelines or scripts, and it creates hundreds of redundant files.  I'm still getting used to documenting my work in notebooks, so please just chalk it up to notebook-newbie awkwardness.  As for the ```broken pipe``` messages you'll see below: they seem to show up whenever a downstream command such as ```head``` or ```csvcut -n``` stops reading before ```gzcat``` has finished writing, so they're probably harmless here, though something in the [bash_kernel](https://github.com/takluyver/bash_kernel) or in my setup may also play a part.

```bash
# download the files, -q means "quiet", used here only to streamline text
wget -q http://www.capitalbikeshare.com/assets/files/trip-history-data/2010-Q4-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2011-Q1-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2011-Q2-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2011-Q3-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2011-Q4-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2012-Q1-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2012-Q2-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2012-Q3-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2012-Q4-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2013-Q1-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2013-Q2-Trips-History-Data.zip \
     http://www.capitalbikeshare.com/assets/files/trip-history-data/2013-Q3-Trips-History-Data.zip
```

It's good to have these compressed; csv files tend to compress well, which saves space.  I prefer to keep files compressed with gzip instead of zip, though, and switching them over offers a chance to write a little bash script:

```bash
# unzip each file, then re-compress with gzip
for f in 201*.zip
do
  unzip "$f"
  gzip "$(basename "$f" .zip).csv"
  \rm "$f"
done
```

```
Archive:  2010-Q4-Trips-History-Data.zip
  inflating: 2010-Q4-Trips-History-Data.csv
Archive:  2011-Q1-Trips-History-Data.zip
  inflating: 2011-Q1-Trips-History-Data.csv
Archive:  2011-Q2-Trips-History-Data.zip
  inflating: 2011-Q2-Trips-History-Data.csv
Archive:  2011-Q3-Trips-History-Data.zip
  inflating: 2011-Q3-Trips-History-Data.csv
Archive:  2011-Q4-Trips-History-Data.zip
  inflating: 2011-Q4-Trips-History-Data.csv
Archive:  2012-Q1-Trips-History-Data.zip
  inflating: 2012-Q1-Trips-History-Data.csv
Archive:  2012-Q2-Trips-History-Data.zip
  inflating: 2012-Q2-Trips-History-Data.csv
Archive:  2012-Q3-Trips-History-Data.zip
  inflating: 2012-Q3-Trips-History-Data.csv
Archive:  2012-Q4-Trips-History-Data.zip
  inflating: 2012-Q4-Trips-History-Data.csv
```

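For comparison, here's a minimal Python sketch of the same zip-to-gzip conversion using only the standard library (```zipfile```, ```gzip```, ```shutil```). The function name ```zip_to_gzip``` is hypothetical. Unlike the bash loop, it names each ```.gz``` after the archive member rather than the zip file, drops any folder names inside the archive, and streams the bytes straight into gzip, so no uncompressed csv ever lands on disk.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip(zip_path):
    """Re-compress each file in a .zip archive as its own .gz file next to it,
    then delete the archive. Returns the list of .gz paths written."""
    zip_path = Path(zip_path)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            out_path = zip_path.parent / (Path(member).name + ".gz")
            # Stream the member's bytes straight into gzip; nothing is
            # written uncompressed.
            with zf.open(member) as src, gzip.open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(out_path)
    zip_path.unlink()  # like `\rm $f` in the bash loop
    return written
```

Calling it in a loop over ```sorted(Path('.').glob('201*.zip'))``` would mirror the bash version.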

If you're with me so far, you now should have something like this:

```bash
ls -lh 201*.gz
```

```
-rw-r--r--  1 dchud  405867214   1.8M Jan 20 17:26 2010-Q4-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   2.2M Jan 20 17:52 2011-Q1-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   5.5M Jan 20 17:51 2011-Q2-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   6.0M Jan 20 17:45 2011-Q3-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   4.7M Jan 20 17:44 2011-Q4-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   5.2M Jan 20 17:58 2012-Q1-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   8.2M Jan 20 18:03 2012-Q2-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   9.2M Jan 20 18:04 2012-Q3-Trips-History-Data.csv.gz
-rw-r--r--  1 dchud  405867214   7.0M Jan 20 18:05 2012-Q4-Trips-History-Data.csv.gz
```


Gzipped files are great because they're easy to work with on the command line.  ```zcat``` (or ```gzcat```, which I'm using here on an OS X machine) streams the uncompressed contents to standard output as if the file weren't compressed, so you can work with the data without taking up the extra disk space that uncompressing would require.  Let's look at some data using ```gzcat```:

```bash
gzcat 2010-Q4-Trips-History-Data.csv.gz | head
```

```
Duration,Start date,End date,Start station,End station,Bike#,Member Type
14h 26min. 2sec.,12/31/2010 23:49,1/1/2011 14:15,10th & U St NW (31111),10th & U St NW (31111),W00771,Casual
0h 8min. 34sec.,12/31/2010 23:37,12/31/2010 23:46,10th & U St NW (31111),14th & R St NW (31202),W01119,Registered
0h 12min. 17sec.,12/31/2010 23:27,12/31/2010 23:39,Park Rd & Holmead Pl NW (31602),14th St & Spring Rd NW (31401),W00973,Registered
0h 15min. 53sec.,12/31/2010 23:21,12/31/2010 23:37,Calvert St & Woodley Pl NW (31106),14th St & Spring Rd NW (31401),W00914,Registered
0h 36min. 19sec.,12/31/2010 23:20,12/31/2010 23:56,20th St & Florida Ave NW (31110),Columbus Circle / Union Station (31623),W00859,Casual
0h 9min. 40sec.,12/31/2010 23:18,12/31/2010 23:28,Lamont & Mt Pleasant NW (31107),10th & U St NW (31111),W01119,Registered
0h 9min. 56sec.,12/31/2010 23:18,12/31/2010 23:28,Lamont & Mt Pleasant NW (31107),10th & U St NW (31111),W00474,Registered
0h 39min. 28sec.,12/31/2010 23:17,12/
```

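If you'd rather script this step, Python's ```gzip``` module streams a compressed file the same way.  Here's a minimal sketch of ```gzcat file | head```; the helper name ```gz_head``` is hypothetical, and UTF-8 text is an assumption about the files.  Because it simply stops reading after ```n``` lines and closes the file itself, there's no pipe left to break.

```python
import gzip
from itertools import islice


def gz_head(path, n=10):
    """Return the first n lines of a gzipped text file, like `gzcat path | head`.

    gzip.open decompresses incrementally, so only as much of the file as
    needed to produce n lines is read.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return [line.rstrip("\r\n") for line in islice(fh, n)]
```
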
Gotta love csv.  And if you do love csv, you'll really love [csvkit](http://csvkit.readthedocs.org/), which offers handy tools for working with csv data, like csvlook, which renders it as an easier-to-read table:

```bash
gzcat 2010-Q4-Trips-History-Data.csv.gz | head | csvlook
```

```
gzcat: error writing to output: Broken pipe
gzcat: 2010-Q4-Trips-History-Data.csv.gz: uncompress failed
|-------------------+------------------+------------------+------------------------------------+---------------------------------------------------+--------+--------------|
|  Duration         | Start date       | End date         | Start station                      | End station                                       | Bike#  | Member Type  |
|-------------------+------------------+------------------+------------------------------------+---------------------------------------------------+--------+--------------|
|  14h 26min. 2sec. | 12/31/2010 23:49 | 1/1/2011 14:15   | 10th & U St NW (31111)             | 10th & U St NW (31111)                            | W00771 | Casual       |
|  0h 8min. 34sec.  | 12/31/2010 23:37 | 12/31/2010 23:46 | 10th & U St NW (31111)             | 14th & R St NW (31202)                            | W01119 | Registered   |
|  0h 12min. 17sec. | 12
```

Then there's csvcut, which lists the columns and lets you slice and dice them; that will come in handy.  Here are some examples, starting with the column names:

```bash
gzcat 2010-Q4-Trips-History-Data.csv.gz | csvcut -n
```

```
  1: Duration
  2: Start date
  3: End date
  4: Start station
  5: End station
  6: Bike#
  7: Member Type
gzcat: error writing to output: Broken pipe
gzcat: 2010-Q4-Trips-History-Data.csv.gz: uncompress failed
```


Here we choose just the duration, start, and end stations, referring to the columns by their index as listed above:

```bash
gzcat 2010-Q4-Trips-History-Data.csv.gz | head | csvcut -c 1,4,5 | csvlook
```

```
gzcat: error writing to output: Broken pipe
gzcat: 2010-Q4-Trips-History-Data.csv.gz: uncompress failed
|-------------------+------------------------------------+----------------------------------------------------|
|  Duration         | Start station                      | End station                                        |
|-------------------+------------------------------------+----------------------------------------------------|
|  14h 26min. 2sec. | 10th & U St NW (31111)             | 10th & U St NW (31111)                             |
|  0h 8min. 34sec.  | 10th & U St NW (31111)             | 14th & R St NW (31202)                             |
|  0h 12min. 17sec. | Park Rd & Holmead Pl NW (31602)    | 14th St & Spring Rd NW (31401)                     |
|  0h 15min. 53sec. | Calvert St & Woodley Pl NW (31106) | 14th St & Spring Rd NW (31401)                     |
|  0h 36min. 19sec. | 20th St & Florida Ave NW (31110)   | Columbus Circle / Union Station (31623)     
```

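The column selection itself is simple enough to sketch in Python with the standard ```csv``` module, which also copes with quoted fields that contain commas.  This hypothetical ```cut_columns``` helper takes 1-based indexes, like ```csvcut -c```, and returns the columns in the order given; it's an illustration, not a replacement for csvkit.

```python
import csv


def cut_columns(lines, columns):
    """Yield each CSV row restricted to the given 1-based column indexes,
    in the order listed (roughly what `csvcut -c 1,4,5` does).

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    for row in csv.reader(lines):
        yield [row[i - 1] for i in columns]
```
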
This is particularly useful because the files aren't all in the same csv pattern. Compare 2010-Q4 with 2013-Q4 (which I didn't list in the ```wget``` statement above, and am just showing for the demo):

```bash
gzcat 2013-Q4-Trips-History-Data2.csv.gz | csvcut -n
```

```
  1: Duration
  2: Start date
  3: Start Station
  4: End date
  5: End Station
  6: Bike#
  7: Subscription Type
gzcat: error writing to output: Broken pipe
gzcat: 2013-Q4-Trips-History-Data2.csv.gz: uncompress failed
```


Notice that columns 2 and 3 are now the start date and start station, and 4 and 5 are the end date and end station.  In the 2010-Q4 file, by contrast, columns 2 and 3 were the start and end dates, and 4 and 5 were the start and end stations.  You can pick out (and rearrange) columns easily with csvcut, like so:

```bash
gzcat 2013-Q4-Trips-History-Data2.csv.gz | head | csvcut -c 1,3,5 | csvlook
```

```
gzcat: error writing to output: Broken pipe
gzcat: 2013-Q4-Trips-History-Data2.csv.gz: uncompress failed
|-------------+------------------------------------------+-------------------------------------------|
|  Duration   | Start Station                            | End Station                               |
|-------------+------------------------------------------+-------------------------------------------|
|  0h 7m 54s  | New York Ave & 15th St NW                | 23rd & E St NW                            |
|  0h 26m 23s | Rosslyn Metro / Wilson Blvd & Ft Myer Dr | Rosslyn Metro / Wilson Blvd & Ft Myer Dr  |
|  0h 28m 7s  | Rosslyn Metro / Wilson Blvd & Ft Myer Dr | Rosslyn Metro / Wilson Blvd & Ft Myer Dr  |
|  0h 26m 4s  | 4th & E St SW                            | Constitution Ave & 2nd St NW/DOL          |
|  0h 26m 11s | 4th & E St SW                            | Constitution Ave & 2nd St NW/DOL          |
|  0h 26m 10s | 4th & E St SW                            | Co
```

You can use csvcut this way to rearrange the data from 2013-Q4 and later into the same column pattern as the earlier data.  It's a pretty great tool, and csvkit provides a lot more than just that; definitely take some time to [read the csvkit docs](http://csvkit.readthedocs.org/en/0.9.0/) and try the tutorial.

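Selecting by index works as long as you know each file's layout.  An alternative, sketched below in Python, is to match on header names instead: lowercase them so ```Start Station``` and ```Start station``` line up, and map the renamed ```Subscription Type``` column onto ```Member Type```.  The ```CANONICAL``` order and the ```ALIASES``` table are hypothetical and cover only the two layouts shown above; other quarters might need more aliases.

```python
import csv

# Column order of the 2010-Q4 files, as listed by `csvcut -n` above (lowercased).
CANONICAL = ["duration", "start date", "end date", "start station",
             "end station", "bike#", "member type"]

# Hypothetical alias table: the 2013-Q4 header says "Subscription Type".
ALIASES = {"subscription type": "member type"}


def normalize(lines):
    """Yield rows from a trip-history CSV in the CANONICAL column order.

    Header names are matched case-insensitively after applying ALIASES;
    a missing column raises ValueError from list.index().
    """
    reader = csv.reader(lines)
    header = [ALIASES.get(h.strip().lower(), h.strip().lower()) for h in next(reader)]
    order = [header.index(name) for name in CANONICAL]
    yield CANONICAL
    for row in reader:
        yield [row[i] for i in order]
```
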
For now, though, let's get on with making a map.  To do this, we will pull all the data together, and then split it up into equal chunks, for three reasons.  First, the per-quarter files have a variable amount of data in them, as bike usage and bike/dock capacity grew:

```bash
gzcat 201*.csv.gz | wc -l
```

```
 5399600
```


That's about 5.4 million rides (the total includes one header line from each file).  And there were a lot more rides in the later quarters than in the first:

```bash
gzcat 2010-Q4-Trips-History-Data.csv.gz | wc -l
```

```
  117693
```

```bash
gzcat 2013-Q3-Trips-History-Data.csv.gz | wc -l
```

```
  848190
```


With that in mind, if we counted up all the stations at the end of each quarter (and we only have 12 quarters) and drew a map where the docks lit up when they were installed, it wouldn't be very dynamic: they would light up in bunches 12 times and then stop.  If we instead split the data into smaller chunks of equal size (by number of rides), we can watch the new docks light up with a little more granularity.

A second reason to split the data up into multiple smaller files is that we can grab summary statistics over those smaller chunks, and use those to study the data.

The third reason to split the data up into smaller files is that we can operate on them in parallel.  If it's all just one big file with more than five million lines, or a handful of mid-sized files, each long file will take a while to process.  If you have a machine with multiple CPUs or cores, though, you can use them all at once to churn through the data a lot faster.  We'll take advantage of that soon.

For now, let's extract only the start station column, combine it all into one file, then split that up into even, small chunks.  First the extract-and-combine:

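```bash
# append column 4 of every file to one text file; this assumes every
# matching file uses the 2010-Q4 layout, where column 4 is the start station
for f in 201*.gz
do
  gzcat "$f" | csvcut -c 4 >> combined-rides.txt
done
```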



That slices out the start station column from each of the files and appends it to combined-rides.txt, which should be about the same length as what we saw before:

```bash
wc -l combined-rides.txt
```

```
 5399600 combined-rides.txt
```


Perfect - exactly the same.  But it's a big file, so let's compress it:

```bash
ls -lh combined-rides.txt
```

```
-rw-r--r--  1 dchud  405867214   153M Mar 24 23:52 combined-rides.txt
```

```bash
gzip combined-rides.txt
```

```bash
ls -lh combined-rides.txt.gz
```

```
-rw-r--r--  1 dchud  405867214    12M Mar 25 00:01 combined-rides.txt.gz
```


Now we split it with ```split``` into a few hundred small files of 20,000 rides each (5,399,600 lines works out to 270 files), using a simple naming convention of "rides20k" plus a two-letter suffix (```aa```, ```ab```, and so on) that ```split``` generates in order.

```bash
gzcat combined-rides.txt.gz | split -l 20000 - rides20k
```


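As a sanity check on the file count, here's a short Python sketch of the names ```split -l 20000 - rides20k``` should produce, assuming its default two-letter suffixes.  The helper ```split_names``` is hypothetical.  GNU and BSD ```split``` handle running out of two-letter suffixes differently, so the sketch stops at 650 names rather than guess.  The last name it prints, ```rides20kkj```, is the "last file" examined below.

```python
import math
from string import ascii_lowercase


def split_names(prefix, total_lines, lines_per_file):
    """Names that `split -l lines_per_file - prefix` would create (aa, ab, ...)."""
    n_files = math.ceil(total_lines / lines_per_file)
    # Stay within aa..yz, where GNU and BSD split agree on the suffixes.
    if n_files > 25 * 26:
        raise ValueError("too many files for this two-letter sketch")
    return [prefix + ascii_lowercase[i // 26] + ascii_lowercase[i % 26]
            for i in range(n_files)]


names = split_names("rides20k", 5399600, 20000)
print(len(names), names[0], names[-1])  # 270 rides20kaa rides20kkj
```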

Let's see what we've got, looking at the first and last files.  This is a simple check to see if the data comes out consistently.  We'll sort them each, then count the unique lines, and then reverse sort that by their counts.  Key commands here are ```sort``` and ```uniq```... easy to remember.
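
The ```sort | uniq -c | sort -rn``` pipeline is a frequency count.  For comparison, here's a Python sketch of the same idea with ```collections.Counter```; the ```top_stations``` name is hypothetical, and ties may come out in a different order than ```sort -rn``` would give them.

```python
from collections import Counter


def top_stations(lines, n=10):
    """Count identical lines and return the n most frequent as (count, name)
    pairs, similar in spirit to `sort | uniq -c | sort -rn | head`."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return [(count, name) for name, count in counts.most_common(n)]
```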

```bash
sort rides20kaa | uniq -c | sort -rn | head
```

```
 744 Massachusetts Ave & Dupont Circle NW (31200)
 714 Adams Mill & Columbia Rd NW (31104)
 639 15th & P St NW (31201)
 607 14th & V St NW (31101)
 543 20th St & Florida Ave NW (31110)
 543 17th & Corcoran St NW (31214)
 474 16th & U St NW (31229)
 469 Eastern Market Metro / Pennsylvania Ave & 7th St SE (31613)
 465 Lamont & Mt Pleasant NW (31107)
 450 Park Rd & Holmead Pl NW (31602)
```

```bash
sort rides20kkj | uniq -c | sort -rn | head
```

```
 520 Massachusetts Ave & Dupont Circle NW
 418 Columbus Circle / Union Station
 393 Lincoln Memorial
 363 15th & P St NW
 292 Jefferson Dr & 14th St SW
 276 17th & Corcoran St NW
 268 Thomas Circle
 246 Eastern Market Metro / Pennsylvania Ave & 7th St SE
 239 New Hampshire Ave & T St NW
 238 14th & V St NW
```


Oof, see the problem?  Some of the station names have an identifier in them, others don't.  We could go through all the files, figure out which ones have them, and remove them, or we could just use ```sed``` to find and remove that pattern.

```bash
# escape both parentheses so they match literally in an extended regex
sed -E 's/ \([0-9]+\)//' rides20kaa | sort | uniq -c | sort -rn | head
```

```
 744 Massachusetts Ave & Dupont Circle NW
 714 Adams Mill & Columbia Rd NW
 639 15th & P St NW
 607 14th & V St NW
 543 20th St & Florida Ave NW
 543 17th & Corcoran St NW
 474 16th & U St NW
 469 Eastern Market Metro / Pennsylvania Ave & 7th St SE
 465 Lamont & Mt Pleasant NW
 450 Park Rd & Holmead Pl NW
```


Ahh, much cleaner.  Now we don't have to worry about where in the files the convention changed (although it's possible there are other quirks like this).
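
The same cleanup in Python is a one-line regular expression.  This sketch (```strip_station_id``` is a hypothetical name) anchors the pattern at the end of the name, a slight tightening of the ```sed``` version, which removes the first parenthesized number anywhere in the line.

```python
import re

# A space plus a parenthesized terminal number at the end, e.g. " (31111)".
STATION_ID = re.compile(r" \([0-9]+\)$")


def strip_station_id(name):
    """Return the station name without its trailing terminal number, if any."""
    return STATION_ID.sub("", name)
```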

Let's re-run that split with the ```sed``` piece in the mix so these files will come out cleaner.

```bash
gzcat combined-rides.txt.gz | sed -E 's/ \([0-9]+\)//' | split -l 20000 - rides20k
```



Now we can churn through all of them really quickly using 
[parallel](http://www.gnu.org/software/parallel/), which takes a list of files and runs the 
same command over all of them, using as many processors as you can give it.

```bash
# rides20k?? matches only the split chunks, not *-counts.txt files from an earlier run
ls rides20k?? | parallel -j+0 "sort {} | uniq -c | sort -rn > {}-counts.txt"
```



(A little magic happens here.  Add the ```--eta``` option to parallel after ```-j+0``` to see progress updates on your command line as it cranks through.  For more about how it works, see the [GNU Parallel site](http://www.gnu.org/software/parallel/) or the video in [this writeup](http://unethicalblogger.com/2010/11/11/gnu-parallel-changed-my-life.html).)

...and when that finishes, we'll have a bunch of files like this:

```bash
head rides20kaa-counts.txt
```

```
 744 Massachusetts Ave & Dupont Circle NW
 714 Adams Mill & Columbia Rd NW
 639 15th & P St NW
 607 14th & V St NW
 543 20th St & Florida Ave NW
 543 17th & Corcoran St NW
 474 16th & U St NW
 469 Eastern Market Metro / Pennsylvania Ave & 7th St SE
 465 Lamont & Mt Pleasant NW
 450 Park Rd & Holmead Pl NW
```


Now that the data's all chunked up consistently, let's get it onto a map.  We can get coordinates of the docks from the live feed of station status:

```bash
wget http://www.capitalbikeshare.com/data/stations/bikeStations.xml
```

```
--2015-03-25 00:27:35--  http://www.capitalbikeshare.com/data/stations/bikeStations.xml
Resolving www.capitalbikeshare.com... 69.20.33.150
Connecting to www.capitalbikeshare.com|69.20.33.150|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 158907 (155K) [text/xml]
Saving to: 'bikeStations.xml'


2015-03-25 00:27:35 (887 KB/s) - 'bikeStations.xml' saved [158907/158907]
```



```bash
xmllint --format bikeStations.xml | head -20
```

```
<?xml version="1.0" encoding="UTF-8"?>
<stations lastUpdate="1427257615762" version="2.0">
  <station>
    <id>1</id>
    <name>20th &amp; Bell St</name>
    <terminalName>31000</terminalName>
    <lastCommWithServer>1427250282118</lastCommWithServer>
    <lat>38.8561</lat>
    <long>-77.0512</long>
    <installed>true</installed>
    <locked>false</locked>
    <installDate>0</installDate>
    <removalDate/>
    <temporary>false</temporary>
    <public>true</public>
    <nbBikes>9</nbBikes>
    <nbEmptyDocks>2</nbEmptyDocks>
    <latestUpdateTime>1427250282118</latestUpdateTime>
  </station>
  <station>
I/O error : Broken pipe
I/O error : write error
```


It stands to reason that we could use this recent list of stations and their coordinates to get most of the coordinates of lists of stations from the past.  We might miss a few because stations have been removed or moved slightly, but overall it should tell the big picture story pretty well.
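
To see how many historical names the current feed can't place, a small check helps.  Here's a hypothetical Python helper, ```match_locations```, that splits a list of station names into those with coordinates and those without, given a name-to-coordinates dict like the one ```mapify.py``` builds further down.

```python
def match_locations(station_names, locations):
    """Split station names into (found, missing).

    `locations` maps name -> (lat, lon). `found` maps each matched name to
    its coordinates; `missing` is a sorted list of names with none.
    """
    found = {name: locations[name] for name in station_names if name in locations}
    missing = sorted(set(station_names) - set(found))
    return found, missing
```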

We could parse this xml, but it's a little easier to parse json, so let's convert it over using [xml2json](https://www.npmjs.com/package/xml2json):

```bash
xml2json < bikeStations.xml > bike-stations.json
```

```bash
cat bike-stations.json
```

```
{
	"stations": {
		"lastUpdate": "1427257615762",
		"version": "2.0",
		"station": [
			{
				"id": {
					"$t": "1"
				},
				"name": {
					"$t": "20th & Bell St"
				},
				"terminalName": {
					"$t": "31000"
				},
				"lastCommWithServer": {
					"$t": "1427250282118"
				},
				"lat": {
					"$t": "38.8561"
				},
				"long": {
					"$t": "-77.0512"
				},
				"installed": {
					"$t": "true"
				},
				"locked": {
					"$t": "false"
				},
				"installDate": {
					"$t": "0"
				},
				"removalDate": {},
				"temporary": {
					"$t": "false"
				},
				"public": {
					"$t": "true"
				},
				"nbBikes": {
					"$t": "9"
				},
				"nbEmptyDocks": {
					"$t": "2"
				},
				"latestUpdateTime": {
					"$t": "1427250282118"
				}
			},
			{
				"id": {
					"$t": "2"
				},
				"name": {
					"$t": "18th & Eads St."
				},
				"terminalName": {
					"$t": "31001"
				},
				"lastCommWithServer": {
					"$t": "142725016
```

Turning this into a csv with just the station names, lats, and lons takes just a little python, here as ```convertstations.py```:

```python
#!/usr/bin/env python

import json

with open('bike-stations.json') as f:
    d = json.load(f)

# text mode ('w'), since we write str; 'wb' raises TypeError under Python 3
with open('station-locations.txt', 'w') as fp:
    for station in d['stations']['station']:
        name = station['name']['$t']
        lat = station['lat']['$t']
        lon = station['long']['$t']
        # plain comma join: assumes no station name contains a comma
        fp.write(','.join([name, lat, lon]) + '\n')
```

```bash
python convertstations.py
csvlook station-locations.txt | head
```

```
|-------------------------------------------------------------------+------------+-------------|
|  20th & Bell St                                                   | 38.8561    | -77.0512    |
|-------------------------------------------------------------------+------------+-------------|
|  18th & Eads St.                                                  | 38.85725   | -77.05332   |
|  20th & Crystal Dr                                                | 38.8564    | -77.0492    |
|  15th & Crystal Dr                                                | 38.86017   | -77.049593  |
|  Aurora Hills Community Ctr/18th & Hayes St                       | 38.857866  | -77.05949   |
|  Pentagon City Metro / 12th & S Hayes St                          | 38.862303  | -77.059936  |
|  S Joyce & Army Navy Dr                                           | 38.8637    | -77.0633    |
|  Crystal City Metro / 18th & Bell St                              | 38.8573    | -77.0511    |
```

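As an aside, the ```xml2json``` step isn't strictly needed: Python's standard ```xml.etree.ElementTree``` can read ```bikeStations.xml``` directly.  Here's a minimal sketch, assuming the ```<station>```, ```<name>```, ```<lat>```, and ```<long>``` elements shown in the ```xmllint``` output above; the ```station_locations``` name is hypothetical.

```python
import xml.etree.ElementTree as ET


def station_locations(source):
    """Return (name, lat, lon) tuples from a bikeStations.xml-style document.

    `source` is a file name or a binary file object, as accepted by ET.parse.
    Entities such as &amp; come back decoded.
    """
    root = ET.parse(source).getroot()
    return [
        (st.findtext("name"), float(st.findtext("lat")), float(st.findtext("long")))
        for st in root.iter("station")
    ]
```

Calling ```station_locations("bikeStations.xml")``` should give the same rows that ```convertstations.py``` writes, with the coordinates already converted to floats.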

Now we've got station names, counts, and lats/lons, and we just need to zip it all up.  A little more python will do, here as ```mapify.py```:

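```python
#!/usr/bin/env python

import json
import sys


# map station name -> (lat, lon), from the current station feed
stations = {}
with open('station-locations.txt') as f:
    for line in f:
        loc, lat, lon = line.strip().split(',')
        stations[loc] = (lat, lon)


if __name__ == '__main__':
    data = []
    fn = sys.argv[1]
    with open(fn) as f:
        for line in f:
            try:
                # `uniq -c` lines look like "    744 Station Name"
                count, loc = line.strip().split(' ', 1)
                lat, lon = stations[loc]
                data.append({
                    'loc': loc,
                    'lat': float(lat),
                    'lon': float(lon),
                    'count': int(count),
                    # NB: / 300 gives the percent of a 30,000-ride chunk; for
                    # the 20,000-ride chunks used here, / 200 would be the true
                    # percentage. The output shown below was produced with / 300.
                    'percent': float(count) / 300,
                })
            except (ValueError, KeyError):
                # skip malformed lines and stations missing from the current feed
                pass
    with open(fn + '.json', 'w') as out:
        json.dump(data, out)
```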

...which we can run through ```parallel``` again (after making it executable with ```chmod +x mapify.py```, since it's called as ```./mapify.py```), adding coordinates to each file and saving them back out as json for easy loading over the web.

```bash
ls rides20k??-counts.txt | time parallel -j+0 './mapify.py {}'
```

```
        5.24 real        10.33 user         4.84 sys
```

```bash
cat rides20kaa-counts.txt.json | jq '.' | head -15
```

```
[
  {
    "lat": 38.9101,
    "loc": "Massachusetts Ave & Dupont Circle NW",
    "percent": 2.48,
    "lon": -77.0444,
    "count": 744
  },
  {
    "lat": 38.922925,
    "loc": "Adams Mill & Columbia Rd NW",
    "percent": 2.38,
    "lon": -77.042581,
    "count": 714
  },
```


Which we then just need to glue back into one big file to send all at once, again with a little python, here as ```combine.py```:

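```python
#!/usr/bin/env python

import json
import os

# collect each chunk's station list in file-name order (aa, ab, ..., kj)
d = []
for fn in sorted(os.listdir('.')):
    if fn.startswith('rides20k') and fn.endswith('.txt.json'):
        with open(fn) as f:
            d.append(json.load(f))

# text mode ('w'): json.dump writes str, which 'wb' rejects under Python 3
with open('combined.json', 'w') as out:
    json.dump(d, out)
```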

```bash
python combine.py
```

```bash
ls -lht combined.json
```

```
-rw-r--r--  1 dchud  405867214   4.7M Apr 13 15:26 combined.json
```

```bash
cat combined.json | jq '.' | head -16
```

```
[
  [
    {
      "lat": 38.9101,
      "loc": "Massachusetts Ave & Dupont Circle NW",
      "percent": 2.48,
      "count": 744,
      "lon": -77.0444
    },
    {
      "lat": 38.922925,
      "loc": "Adams Mill & Columbia Rd NW",
      "percent": 2.38,
      "count": 714,
      "lon": -77.042581
    },
```


...and there you have it: counts of bikeshare rides in 20,000-ride chunks, with a latitude and longitude for each ride's origin station, suitable for feeding into a simple map viz, if rather unwieldy in size and shape.
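
To drive the "docks light up" map described earlier, each chunk only needs to be reduced to the stations seen for the first time.  Here's a minimal Python sketch, assuming the list-of-lists structure of ```combined.json``` shown above; the ```new_stations_per_chunk``` name is hypothetical, and only stations that got coordinates from ```mapify.py``` appear in that file at all.

```python
def new_stations_per_chunk(chunks):
    """Return, for each chunk, the sorted names of stations seen for the first time.

    `chunks` is a list of lists of records with a "loc" key, the shape of
    combined.json; each result entry is one animation frame's new docks.
    """
    seen = set()
    frames = []
    for chunk in chunks:
        names = {rec["loc"] for rec in chunk}
        frames.append(sorted(names - seen))
        seen |= names
    return frames


# Hypothetical usage with the file built above:
#   import json
#   with open("combined.json") as fh:
#       frames = new_stations_per_chunk(json.load(fh))
```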