# Extracting and exploring a subset of the Hathi Library

This notebook shows how to build a subset of the full SRP Hathi Corpus for a topic of interest.

It then performs some basic clustering to see what's in the set.

First, some basic imports, plus the SRP module.

In [2]:
import SRP
import pandas as pd
import os
import numpy as np
import sys

Next, I define a few variables relating to where vector files are located. This will change from application to application: since the full SRP binary files are several gigabytes, I don't include them in the repositories.

If you do this for a different set, you'll want either to build up a copy of the full Hathi set from the various parts in the Northeastern repository, **or** just to use the half-precision features on Zenodo, which at 11GB are reasonable to download as a single file.

In [3]:
full_vector_file_location = "/home/bschmidt/vector_models/ht-640d-half-precision.bin"

new_vector_file_location = "underwood.bin"

We're building a new smaller corpus based on Ted Underwood's English fiction dataset, used for his [Cultural Analytics article](http://doi.org/10.22148/16.019) (DOI: 10.22148/16.019) with David Bamman and Sabrina Lee. The same steps could be used with any CSV file that has an HTID column. This is pretty big as such sets go--about 138,000 volumes. But it's still less than 1% of all of Hathi, and so much easier to analyze than the whole thing.


In [4]:
underwood = pd.read_csv("https://raw.githubusercontent.com/tedunderwood/noveltmmeta/master/workmeta.tsv", sep="\t")



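Here's what this looks like. We're going to build an 'htid' column from the 'docid' column and match it against the identifiers in the SRP features. Sometimes htids are normalized to not have URL-unsafe characters (`:` is written as `+` and `/` as `=`), so the cell below reverses that substitution.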


In [5]:
underwood['htid'] = [volid.replace("+",":").replace("=", "/") for volid in underwood['docid']]
underwood

Unnamed: 0,docid,oldauthor,author,authordate,inferreddate,latestcomp,datetype,startdate,enddate,imprint,...,instances,allcopiesofwork,copiesin25yrs,enumcron,volnum,title,parttitle,earlyedition,shorttitle,htid
0,mdp.39015031913893,"Spencer, Louise Reid","Spencer, Louise Reid",,0,2100,|,||||,||||,Thomas Y. Crowell company|1945,...,1,1,1,,,Guerrilla wife | $c: [by] Louise Reid Spencer.,,True,Guerrilla wife,mdp.39015031913893
1,mdp.39015003936864,"Baker, Robert H","Baker, Robert H",,0,2100,n,,,"Port Washington, N.Y.|Ashley Books|197-?].",...,1,1,1,,,The suburbs : | a novel / | $c: by Robert H. B...,,True,The suburbs : a novel,mdp.39015003936864
2,mdp.39015068342305,"Dickens, Charles","Dickens, Charles",1812-1870.,0,1870,n,,,New York|The American news company|n.d.,...,1,1,1,,,Edwin Drood. | $c: By Charles Dickens. With il...,,True,Edwin Drood,mdp.39015068342305
3,mdp.39015055066586,"Stretton, Hesba","Stretton, Hesba",1832-1911.,0,1911,n,,,"New York|Dodd, Mead & co.|n.d.",...,1,1,1,,,"Carola, | $c: by Hesba Stretton.",,True,Carola,mdp.39015055066586
4,mdp.39015055066594,"Stretton,Hesba","Stretton, Hesba",1832-1911.,0,1911,n,,,"New York|Dodd, Mead & co.|n.d.",...,1,1,1,,,In prison & out. | $c: By Hesba Stretton.,,True,In prison & out,mdp.39015055066594
5,mdp.39015063543394,"Lyall, Edna","Lyall, Edna",1857-1903.,0,1903,n,,,New York|A. L. Burt|n.d.,...,1,1,1,,,"Donovan. | A novel, | $c: by Edna Lyall [pseud.]",,True,Donovan. A novel,mdp.39015063543394
6,mdp.39015059414725,"McKenna, Stephen","McKenna, Stephen",1888-1967.,0,1967,n,,,London|Hutchinson & co.|n.d.,...,1,1,1,,,"Lady Lilith, | a novel: Beong the first part o...",,True,"Lady Lilith, a novel: Beong the first part of ...",mdp.39015059414725
7,mdp.39015063920006,"Haggard, H. Rider (Henry Rider)","Haggard, H. Rider (Henry Rider)",1856-1925.,0,1925,n,,,New York|J. S. Ogilvie|n.d.,...,1,1,1,,,Beatrice | [a novel] | $c: by H. Rider Haggard.,,True,Beatrice [a novel],mdp.39015063920006
8,mdp.39015035876971,"Malet, Lucas","Malet, Lucas",1852-1931.,0,1931,n,,,New York|T. Y. Crowell & co.|n.d.,...,1,1,1,,,Little Peter: | a Christmas morality for child...,,True,Little Peter: a Christmas morality for childre...,mdp.39015035876971
9,mdp.39015010208315,"Caffyn, Mannington, Mrs","Caffyn, Mannington, Mrs",,0,2100,n,,,New York|Optimus print. co.|n.d.,...,1,1,1,,,A yellow aster : | a novel / | $c: by Iota [ps...,,True,A yellow aster : a novel,mdp.39015010208315
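As a small illustration of the substitution used in the cell above, here is a sketch of converting in both directions. The function names and the sample identifier are made up for the example; only the two character swaps come from this notebook.

```python
def to_raw_htid(clean_id):
    """Undo the normalization: '+' becomes ':' and '=' becomes '/'."""
    return clean_id.replace("+", ":").replace("=", "/")

def to_clean_htid(raw_id):
    """Apply the normalization in the other direction."""
    return raw_id.replace(":", "+").replace("/", "=")

# A made-up identifier in the normalized form, for illustration only.
example = "uc2.ark+=13960=t1zc80k1p"
raw = to_raw_htid(example)
print(raw)

# Identifiers without special characters pass through unchanged.
plain = to_raw_htid("mdp.39015031913893")
print(plain)
```

Most of the ids in the table above (the `mdp.` ones, for instance) contain neither character, so the conversion only matters for a minority of volumes.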


Now we make a list of what we're looking for: it is 138137 volumes of English language fiction.

In [6]:
looking_for = set(underwood['htid'])
len(looking_for)

138137
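The same kind of lookup set can be built from any CSV with an HTID column. Here is a minimal sketch, assuming a file whose id column is called `htid`; the inline CSV text and the variable names are stand-ins for your own data.

```python
import io
import pandas as pd

# Stand-in for a real CSV file; note the duplicated second volume.
csv_text = (
    "htid,title\n"
    "mdp.39015031913893,Guerrilla wife\n"
    "mdp.39015068342305,Edwin Drood\n"
    "mdp.39015068342305,Edwin Drood\n"
)

my_list = pd.read_csv(io.StringIO(csv_text))

# A set removes duplicates and makes the membership test fast.
my_ids = set(my_list["htid"].dropna())
print(len(my_ids))
```

Using a set rather than a list matters here: the membership check runs once for every volume in the full Hathi file.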

Now I use the SRP library to build two files: one is the existing file to read, and one is a new file to write to. I'm not going to use half-precision vectors to store the final outputs; this means we'll be wasting some space for the sake of simplicity in the final output.

In [7]:
full_hathi_set = SRP.Vector_file(full_vector_file_location, precision = 2)

fiction_set = SRP.Vector_file(new_vector_file_location, dims = full_hathi_set.dims, mode = "w")
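To get a rough sense of the space being traded away, here is a back-of-the-envelope sketch. It assumes 640 dimensions (taken from the `ht-640d` file name) and the 137,150 rows reported later in this notebook, and it ignores whatever the file format spends on identifiers and headers.

```python
import numpy as np

n_rows, dims = 137150, 640  # assumptions: see the note above

sizes = {}
for dtype in (np.float16, np.float32):
    # Bytes needed for the raw numbers alone.
    sizes[np.dtype(dtype).name] = n_rows * dims * np.dtype(dtype).itemsize

for name, n_bytes in sizes.items():
    print("{}: {:.0f} MB".format(name, n_bytes / 1e6))
```

So the full-precision extract is about twice the size of a half-precision one, which is a tolerable cost for a subset this small.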

It's pretty easy to iterate through the original file and copy the matching rows into the new one, but it may take a while: most of the code below just prints updates.

Once this is done, we've created an extract of the full Hathi set.

In [8]:
written = 0

for (id, row) in full_hathi_set:
    if id in looking_for:
        fiction_set.add_row(id, row)
        written += 1
        if written % 10000 == 0:
            print ("{} written out of {}".format(written, len(looking_for)))
# you MUST close files after writing or they will be corrupted.
fiction_set.close()
print ("{} written out of {}".format(written, len(looking_for)))

10000 written out of 138137
20000 written out of 138137
30000 written out of 138137
40000 written out of 138137
50000 written out of 138137
60000 written out of 138137
70000 written out of 138137
80000 written out of 138137
90000 written out of 138137
100000 written out of 138137
110000 written out of 138137
120000 written out of 138137
130000 written out of 138137
137150 written out of 138137


Not every file is matched--about 0.7% (987 out of 138,137) have gone missing. But this gives a good enough way to explore the set without even downloading the original EF files.
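If you want to know which volumes went missing, one option is to keep track of the ids you actually found and take a set difference. This is a sketch on toy data: the `available` list stands in for iterating over the vector file, and all of the ids are invented.

```python
# Ids we asked for (toy values).
wanted = {"a.1", "a.2", "b.3", "c.4"}

# Stand-in for the (id, row) pairs yielded by the full vector file.
available = [("a.1", None), ("b.3", None), ("z.9", None)]

# Record every wanted id that actually turns up.
found = {doc_id for doc_id, _ in available if doc_id in wanted}

# Whatever is left over was never seen in the file.
missing = sorted(wanted - found)
share_missing = len(missing) / len(wanted)

print(missing)
print("{:.2%} missing".format(share_missing))
```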

# Finding typical fiction

Let's do a funny little experiment: finding typical works of fiction. The first step is to load the fiction set into a matrix we can hold in memory. The `to_matrix` method of a vector file gives us a dict that has two keys: 'names' (which gives all the identifier codes) and 'matrix' (which represents the full set as a matrix in SRP space).

In [9]:
fiction_set = SRP.Vector_file(new_vector_file_location).to_matrix()

Now I'll use numpy to normalize the full matrix to unit length. I've gotten in the habit of using Einstein notation for this kind of matrix operation after reading [this great blog post](https://rockt.github.io/2018/04/30/einsum), but all it's really doing is normalizing each row against the L2-norm--that is, making each vector unit length. This means that book length won't affect our clustering.

In [10]:
import numpy as np
mat = fiction_set['matrix']
rownorms = 1 / np.linalg.norm(mat, axis=1)
normalized = np.einsum('ij,i->ij', mat, rownorms)
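If the Einstein notation is unfamiliar, this toy check shows that the `einsum` call above does the same thing as plain broadcasting division. The small matrix is invented for the example.

```python
import numpy as np

toy = np.array([[3.0, 4.0],
                [1.0, 0.0],
                [0.5, 0.5]])

# The einsum form: scale row i by the i-th inverse norm.
inv_norms = 1 / np.linalg.norm(toy, axis=1)
via_einsum = np.einsum('ij,i->ij', toy, inv_norms)

# The broadcasting form: divide each row by its own norm.
via_broadcast = toy / np.linalg.norm(toy, axis=1, keepdims=True)

# Every row should now have length 1.
row_lengths = np.linalg.norm(via_einsum, axis=1)
print(row_lengths)
```

One caveat for either form: a row of all zeros has norm 0, so it would produce a division-by-zero warning and non-finite values.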

In [11]:
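mean = np.mean(normalized, axis=0)
# Dot product of each unit-length row with the mean vector. The mean itself is
# not unit length, so this is proportional to cosine similarity rather than
# equal to it; the ranking of books is the same either way.
dist_from_mean = np.dot(normalized, mean)
# argpartition returns the 10 highest-scoring rows, in no particular order.
top_matches = np.argpartition(-dist_from_mean, 10)[:10]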

Now, we can look at the typical works of fiction in the Hathi Trust:

In [12]:
for m in top_matches:
    print("{} {}".format(dist_from_mean[m], fiction_set['names'][m]))
    


0.733444333076477 mdp.39015028485996
0.7314298152923584 nyp.33433075742001
0.7332363724708557 nyp.33433076065550
0.7302396297454834 nyp.33433076065568
0.7281640768051147 mdp.39015002130204
0.7286748290061951 nyp.33433074888904
0.728473424911499 wu.89098004476
0.7302240133285522 nyp.33433075873269
0.7286484241485596 uc1.b3607824
0.7281256318092346 mdp.39015068363236
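The scores above are not in descending order because `np.argpartition` only guarantees that the top entries come first, not that they are sorted among themselves. A sketch of sorting them afterwards, on an invented score array:

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.5, 0.7, 0.1, 0.8])
k = 3

# The k best indices, in arbitrary order (k must be less than len(scores)).
top_k = np.argpartition(-scores, k)[:k]

# Sort just those k indices by score, best first.
top_k_sorted = top_k[np.argsort(-scores[top_k])]
print(top_k_sorted)
```

Partitioning first and sorting only the handful of winners avoids sorting the whole array, which is the point of using `argpartition` on a large matrix.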


Hmm... IDs aren't very helpful. Here's some code to pretty-print Hathi documents.

In [231]:
from urllib.request import urlopen
import ujson as json
from IPython.display import display, HTML

# Cache of API responses keyed by htid. Only create it if it doesn't exist yet,
# so that re-running this cell doesn't throw away records already fetched.
try:
    hathi_cache
except NameError:
    hathi_cache = {}

def jsonify(id, force = False):
    # Fetch (or reuse) the brief bibliographic record for one volume.
    global hathi_cache
    if id in hathi_cache and not force:
        return hathi_cache[id]
    sons = urlopen("http://catalog.hathitrust.org/api/volumes/brief/htid/%s.json" % id.replace("+",":").replace("=","/")).read()
    hathi_cache[id] = json.loads(sons.decode())
    return hathi_cache[id]

def descend(record):
    # Parse a hathi API call response: return the first (only) record.
    a = record['records']
    return a[list(a.keys())[0]]

class Printable_Hathi():
    def __init__(self, htid, text):
        self.htid = htid
        self.desc = descend(jsonify(htid))
        self.text = text
        
    def _repr_html_(self):
        self.desc['url'] = u"https://babel.hathitrust.org/cgi/pt?id=" + self.htid
        # Drop non-ASCII characters, then decode back to str so the title
        # doesn't render as b'...'.
        title = self.desc['titles'][0].encode("ascii","ignore").decode("ascii")
        output_string = '<li><a href="{}">{} ({})</a><br>{}</li>'.format(
                self.desc['url'], title, self.desc['publishDates'][0], self.text)
        return output_string
    
    def title(self):
        return self.desc['titles'][0]
    
for m in top_matches:
    display(HTML(Printable_Hathi(fiction_set['names'][m], str(dist_from_mean[m]))._repr_html_()))
    


# K-means clustering of fiction

The overall most typical fiction isn't interesting. But it's easy to use any standard matrix operations
on this space to start to delve into things like genre.

For example: we can use kmeans clustering to create 35 groups of books, and then look at the books closest to the centers of each of them. Since we're normalizing to the unit sphere, I use spherical k-means; using regular k-means can produce strange effects.

In [None]:
from spherecluster import SphericalKMeans

skm = SphericalKMeans(n_clusters=35, random_state = 1, verbose = 1)
skm.fit(normalized)
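To make the idea concrete, here is a toy NumPy sketch of spherical k-means. It is not the `spherecluster` implementation: the function name, the deterministic farthest-first starting points, and the fixed iteration count are all simplifications I'm assuming for illustration. The two steps are to assign each unit vector to the center with the highest dot product, then move each center to the normalized sum of its members.

```python
import numpy as np

def toy_spherical_kmeans(X, n_clusters, n_iter=10):
    """Cluster unit-length rows of X by cosine similarity (toy version)."""
    # Deterministic start: take row 0, then repeatedly add the row that is
    # least similar to the centers chosen so far.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        sims = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[np.argmin(sims)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: highest cosine similarity wins.
        labels = np.argmax(X @ centers.T, axis=1)
        # Update step: project the summed members back onto the unit sphere.
        for k in range(n_clusters):
            total = X[labels == k].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # leave the center alone if the cluster is empty
                centers[k] = total / norm

    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers

# Two invented groups of points pointing in nearly orthogonal directions.
rng = np.random.default_rng(1)
a = rng.normal(loc=[5, 0, 0], scale=0.1, size=(20, 3))
b = rng.normal(loc=[0, 5, 0], scale=0.1, size=(20, 3))
toy = np.vstack([a, b])
toy = toy / np.linalg.norm(toy, axis=1, keepdims=True)

toy_labels, toy_centers = toy_spherical_kmeans(toy, n_clusters=2)
print(toy_labels)
```

The only difference from ordinary k-means is the renormalization in the update step, which keeps the centers on the same sphere as the data.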


# Demonstration clusters

Some of these clusters are nonsense. But some are pretty good! Rather than look at the middle of the cluster, I'll randomly pull five to ten books from each cluster and see if they make sense.

In [235]:
import random
def kmeans_cluster(x, n = 5):
    matches = [fiction_set['names'][i] for i in range(len(fiction_set['names'])) if skm.labels_[i]==x]
    random.seed(1)
    sample = random.sample(matches, n)
    for htid in sample:
        try:
            display(Printable_Hathi(htid, ""))
        except IndexError:
            pass
kmeans_cluster(3)
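One thing to watch with this kind of sampling: `random.sample` raises a `ValueError` if you ask for more items than the cluster contains. Here is a sketch of a more forgiving version on toy data; the labels, names, and function name are invented for the example.

```python
import random
import numpy as np

# Toy stand-ins for skm.labels_ and fiction_set['names'].
labels = np.array([0, 1, 0, 2, 0, 1])
names = ["a", "b", "c", "d", "e", "f"]

def sample_cluster(cluster, n=5, seed=1):
    """Return up to n randomly chosen names from one cluster."""
    members = [name for name, label in zip(names, labels) if label == cluster]
    # A private generator keeps the sample reproducible without reseeding
    # the global random state.
    rng = random.Random(seed)
    # Never ask for more items than the cluster holds.
    return rng.sample(members, min(n, len(members)))

small = sample_cluster(2, n=10)
print(small)
```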

## Cluster 0: The Old British Novel.

In [237]:
kmeans_cluster(0, 10)


## Cluster 2: Modern women's stories

In [239]:
kmeans_cluster(2, 10)


## Cluster 4: War stories?

In [241]:
kmeans_cluster(4, 10)


## Cluster 9: Science Fiction

In [247]:
kmeans_cluster(9, 10)


## Cluster 14: This isn't fiction!

In [249]:
kmeans_cluster(14, 10)


## Cluster 15: The 18th century tale

In [250]:
kmeans_cluster(15)


## Cluster 18: This is pretty weird.

Based on proximity to the center I thought this was straight modern fantasy--instead it seems to be something that spans the romance and some

In [252]:
kmeans_cluster(18, 10)


## Cluster 21: Stories from the post-colonial Commonwealth? 

Mostly Nigerian, Indian, and Trinidadian fiction from 1970-200

In [254]:
kmeans_cluster(21, 10)


## Cluster 25: Folk tales

In [259]:
kmeans_cluster(25)


## Cluster 26: The hip-hop noir novel

Isn't it helpful when the metadata defines the cluster! Honestly, I don't know exactly what this is; it seems to be a certain form of brisk American modern novel that bridges cops (*Under the Color of Law*), technology (*JPod*) and gay erotica (*Brother Stud*).

In [261]:
kmeans_cluster(26, 10)


## Cluster 28: The Imperial Romance.

In [264]:
kmeans_cluster(28, 10)


## Cluster 30: More Scifi/fantasy.

In [267]:
kmeans_cluster(30, 10)


# All Clusters

For reference, here are 5 random texts from each of the first 30 clusters (0 through 29), with no attempt to label them.

In [271]:
for i in range(30):
    print("~"*100)
    print ("CLUSTER {}".format(i))
    print("~"*100)
    kmeans_cluster(i)

