# Explore the use of cosine similarity measures on the money branch to improve its structure

# Context

This notebook is for identifying areas of the taxonomy where content is not well sorted and may require further curation. We're focusing on the money branch.

We're looking to explore and flag:

1. Content in the wrong place (it's semantically different to other items in that taxon)
1. Odd taxon structure (content diversity, taxon size and depth)
1. Taxons that need splitting (clusters of closely-related content exist within a taxon)
1. Taxons that need merging (there's a large overlap in content tagging between branches)

# Prepare workspace

This assumes, for now, that your working directory is `/content-similarity-models/google-universal-encoder`.

In [30]:
!pwd

/Users/matthewdray/Documents/content-similarity-models/google-universal-encoder


In [4]:
import numpy as np
import pandas as pd

from sklearn.metrics import pairwise_distances, pairwise_distances_chunked

import altair as alt
from altair import datum

# Read and prepare data

## Embedded sentences

In [5]:
embedded_sentences = np.load('../data/embedded_sentences2019-02-11.npy')

## Labelled data
We may also need to read the `labelled.csv` data to create some objects that will be used later. The labelled data is one of the inputs to the `get_homogeneity_scores_taxon.py` script that produces `taxon_homogeneity_df.csv`.

In [6]:
labelled = pd.read_csv(
    '../data/2019-02-11/labelled.csv.gz',
    compression='gzip',
    low_memory=False
)

In [13]:
labelled

Unnamed: 0,base_path,content_id,description,document_type,first_published_at,locale,primary_publishing_organisation,publishing_app,title,body,combined_text,taxon_id,taxon_base_path,taxon_name,level1taxon,level2taxon,level3taxon,level4taxon,level5taxon
0,/government/publications/list-of-psychologists...,04a0cc0d-0b9f-45ad-bf57-7c54cbab9df9,list of english speaking psychologists and psy...,guidance,2017-07-21T16:42:00.000+00:00,en,Foreign & Commonwealth Office,whitehall,chile - list of psychologists and psychiatrist...,prepared by british embassy/consulate santiago...,chile - list of psychologists and psychiatrist...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
1,/government/news/charity-commission-names-furt...,5fa49c52-7631-11e4-a3cb-005056011aef,regulator increases transparency of its work.,press_release,2014-06-04T23:00:00.000+00:00,en,The Charity Commission,whitehall,charity commission names further charities und...,the charity commission has today named further...,charity commission names further charities und...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
2,/government/publications/trust-and-confidence-...,d0341424-12a1-4b4c-9045-2e74ba17f2d5,independent research into trust and confidence...,research,2015-06-25T07:00:00.000+00:00,en,The Charity Commission,whitehall,trust and confidence in the charity commission...,the charity commission commissioned populus to...,trust and confidence in the charity commission...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
3,/government/speeches/william-shawcross-speech-...,9245dfca-4210-41d9-9ffd-7fcc35dc1642,william shawcross asks charities to pull toget...,speech,2016-02-29T12:39:07.000+00:00,en,The Charity Commission,whitehall,william shawcross speech at commission’s publi...,good morning and thank you for joining us here...,william shawcross speech at commission’s publi...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
4,/government/statistics/crime-statistics-focus-...,5fec046a-7631-11e4-a3cb-005056011aef,crime statistics from the crime survey for eng...,national_statistics,2015-03-26T09:30:00.000+00:00,en,Office for National Statistics,whitehall,public perceptions of crime and the police and...,official statistics are produced impartially a...,public perceptions of crime and the police and...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
5,/government/news/britain-honours-its-holocaust...,5b12e7a3-3db7-4710-862f-0d54ec6117b6,this holocaust memorial day the government wil...,press_release,2018-01-23T14:02:00.000+00:00,en,Foreign & Commonwealth Office,whitehall,britain honours its holocaust heroes,at an event at the foreign & commonwealth offi...,britain honours its holocaust heroes this holo...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
6,/government/publications/esf-funding-allocated...,5f5167fc-7631-11e4-a3cb-005056011aef,these documents show european social fund (esf...,transparency,2014-04-15T23:00:00.000+00:00,en,Department for Work and Pensions,whitehall,esf funding for the north east,the funding is broken down by co financing org...,esf funding for the north east these documents...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
7,/government/publications/charities-holding-mov...,5fe33d80-7631-11e4-a3cb-005056011aef,how charities can hold move and receive funds ...,guidance,2011-01-05T08:33:00.000+00:00,en,The Charity Commission,whitehall,charities: holding moving and receiving funds ...,chapter 4 of the commission’s compliance toolk...,charities: holding moving and receiving funds ...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
8,/government/statistics/english-indices-of-depr...,e38fc3a7-1b0f-46d8-b19e-69b6a3c38809,statistics on relative deprivation in small ar...,national_statistics,2015-09-30T08:30:00.000+00:00,en,"Ministry of Housing, Communities & Local Gover...",whitehall,english indices of deprivation 2015,these statistics update the english indices of...,english indices of deprivation 2015 statistics...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,
9,/government/news/dcms-improves-efficiency-and-...,5d33a69f-7631-11e4-a3cb-005056011aef,a number of the department for culture media a...,news_story,2010-07-27T00:00:00.000+00:00,en,"Department for Digital, Culture, Media & Sport",whitehall,dcms improves efficiency and cuts costs with r...,a number of the department for culture media a...,dcms improves efficiency and cuts costs with r...,668cd623-c7a8-4159-9575-90caac36d4b4,/society-and-culture/community-and-society,Community and society,Society and culture,Community and society,,,


Prepare objects for later visualisation

In [12]:
taxon_id_to_base_path = dict(zip(labelled['taxon_id'], labelled['taxon_base_path']))

#taxon_id_to_level = dict(zip(labelled['taxon_id'], labelled['level']))

taxon_id_to_level1 = dict(zip(labelled['taxon_id'], labelled['level1taxon']))

In [8]:
taxons = labelled['taxon_id'].unique()

## Branch homogeneity
Read the homogeneity data, which is a Pandas data frame output from the `get_homogeneity_scores_taxon.py` script.

In [14]:
taxon_homogeneity_df = pd.read_csv("../data/taxon_homogeneity_df.csv")

In [15]:
taxon_homogeneity_df.shape

(1265, 9)

In [16]:
taxon_homogeneity_df.head()

Unnamed: 0.1,Unnamed: 0,taxon_id,taxon_size,mean_cosine_score,taxon_base_path,taxon_level,level1taxon,fewer_than_or_equal_5items,more_than_0_5_diversity
0,0,668cd623-c7a8-4159-9575-90caac36d4b4,5166,0.59549,/society-and-culture/community-and-society,2,Society and culture,0,1
1,246,f9e476ef-654d-41ec-97d9-2b6842d4361d,786,0.589025,/society-and-culture/sports-and-leisure,2,Society and culture,0,1
2,48,495afdb6-47be-4df1-8b38-91c8adb1eefc,8136,0.57151,/business-and-industry,1,Business and industry,0,1
3,833,fc5f468f-a3ba-4fde-9c1d-ed2dd17cfd82,31,0.571205,/housing-local-and-community/housing-local-ser...,3,"Housing, local and community",0,1
4,18,b29cf14b-54c6-402c-a9f0-77218602d1af,2333,0.569644,/society-and-culture/arts-and-culture,2,Society and culture,0,1


# Explore flagging

## 1. Odd taxon structure

In [109]:
numcols = 6  # specify the number of columns you want
level1taxons = taxon_homogeneity_df['level1taxon'].unique() 


money = taxon_homogeneity_df[taxon_homogeneity_df.level1taxon == 'Money'].copy()

total_size = money['taxon_size'].sum().astype(str)

money_plot = alt.Chart(money).mark_circle(size=60).encode(
    alt.X(
        'taxon_size:Q',
        scale=alt.Scale(type='log', domain=(1, 10000)),
        axis=alt.Axis(grid=False, title='log(topic_size)')
    ),
    alt.Y(
        'mean_cosine_score:Q',
        scale=alt.Scale(domain=(0, 0.6)),
        axis=alt.Axis(grid=False, title='content diversity score')
    ), 
    #color='taxon_level:N',
    color=alt.Color('taxon_level:N', scale=alt.Scale(scheme='magma')),
    opacity=alt.value(0.8), 
    tooltip=['taxon_base_path']
).properties(
        title='Money' + ", " + total_size).interactive()

In [113]:
money_plot.save('money.html', scale_factor=2.0)

## 2. Content in the wrong place
Content may have been tagged in the wrong place. How can we identify this? One idea is to compute the cosine distance between each content item and all the others within a taxon, and then inspect the items whose mean distance is above a certain threshold (i.e. they're semantically different to everything else).
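
As a toy illustration of that idea, the sketch below uses scikit-learn's `pairwise_distances` with `metric='cosine'`, which returns cosine distance (1 minus cosine similarity), so larger values mean less alike. The 2-D vectors and the threshold of 0.5 are made up for this example and are not taken from the real embeddings.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Toy 2-D "embeddings": three items point roughly the same way, one does not.
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],  # the odd one out
])

# Cosine distance = 1 - cosine similarity, so larger means less alike.
toy_dist = pairwise_distances(toy_embeddings, metric='cosine')

# Mean distance from each item to every item in the group (itself included).
toy_mean_dist = toy_dist.mean(axis=1)

# Hypothetical threshold chosen for this toy data only.
toy_threshold = 0.5
flagged = np.where(toy_mean_dist > toy_threshold)[0]
print(flagged)
```

Only the last item has a mean distance above the toy threshold, so it is the one that would be passed on for manual inspection.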

### Example: 'business tax' taxon

Store the taxon ID as a variable.

In [17]:
btax_id = '28262ae3-599c-4259-ae30-3c83a5ec02a1'

Filter the embedded sentences (a numpy array) to the rows that match the business tax taxon ID. The rows of `embedded_sentences` and `labelled` are in the same order, so `labelled` can be used to build the filter.

In [18]:
btax_embedded = embedded_sentences[labelled['taxon_id'] == btax_id]
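
This filter relies on the array and the data frame having one row each per content item, in the same order. A minimal sketch of that pattern with an explicit length check is below; `toy_labelled` and `toy_embedded` are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `labelled` and `embedded_sentences`.
toy_labelled = pd.DataFrame({'taxon_id': ['a', 'b', 'a', 'c']})
toy_embedded = np.arange(8, dtype=float).reshape(4, 2)

# The filter is only valid if there is one embedding row per labelled row.
assert toy_embedded.shape[0] == len(toy_labelled)

# Convert the mask to a plain boolean array so selection is purely positional.
mask = (toy_labelled['taxon_id'] == 'a').to_numpy()
toy_subset = toy_embedded[mask]
print(toy_subset.shape)
```

The length check would catch a mismatch if, say, the two files came from different dates, which would otherwise fail later or silently select the wrong rows.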

Get the cosine distance (1 minus cosine similarity) for all content item pairs in the taxon, convert to a Pandas data frame and then get the mean distance for each content item.

In [19]:
btax_dist = pairwise_distances(
    btax_embedded, 
    metric = 'cosine', 
    n_jobs = -1
)

In [20]:
btax_dist_df = pd.DataFrame(btax_dist)

In [21]:
btax_dist_df['mean'] = btax_dist.mean(axis = 1)
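
One detail of this row mean: each row includes the item's distance to itself, which is zero (or very nearly zero), so the mean is pulled down slightly. The sketch below compares the two ways of averaging on a made-up 3x3 distance matrix. Dividing by `n_items - 1` is offered as an optional variation, not as what the notebook does.

```python
import numpy as np

# Hypothetical 3x3 distance matrix with the usual zero diagonal.
toy_dist = np.array([
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.6],
    [0.4, 0.6, 0.0],
])

n_items = toy_dist.shape[0]

# Mean over all columns, self-distance included (as in the cell above).
mean_with_self = toy_dist.mean(axis=1)

# Mean over the other items only: the zero diagonal adds nothing to the sum.
mean_without_self = toy_dist.sum(axis=1) / (n_items - 1)

print(mean_with_self)
print(mean_without_self)
```

The two versions differ by a factor of `n_items / (n_items - 1)`, which is negligible for a taxon with several hundred items but noticeable for very small taxons.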

In [22]:
btax_dist_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,513,514,515,516,517,518,519,520,521,mean
0,1.192093e-07,5.811087e-01,0.363355,0.395613,0.621481,6.434086e-01,0.289677,0.530680,0.207134,0.350927,...,5.854432e-01,0.505209,0.217435,0.395614,0.237765,0.321701,4.654763e-01,0.299991,1.957843e-01,0.450615
1,5.811087e-01,1.788139e-07,0.654736,0.576303,0.530271,6.266518e-01,0.597192,0.427653,0.501061,0.444977,...,6.465485e-01,0.560747,0.697612,0.502061,0.655768,0.623915,5.118585e-01,0.428566,5.778310e-01,0.580138
2,3.633550e-01,6.547358e-01,0.000000,0.426611,0.544326,5.551578e-01,0.253399,0.467847,0.262523,0.498422,...,6.806681e-01,0.598263,0.327363,0.497435,0.225597,0.265516,5.799253e-01,0.312676,2.386062e-01,0.438918
3,3.956128e-01,5.763029e-01,0.426611,0.000000,0.561196,5.895838e-01,0.290253,0.435382,0.441065,0.483954,...,7.239314e-01,0.557843,0.340588,0.306367,0.411198,0.428849,4.966263e-01,0.362502,4.144944e-01,0.443872
4,6.214815e-01,5.302707e-01,0.544326,0.561196,0.000000,4.022706e-01,0.556397,0.424755,0.601762,0.533905,...,6.275401e-01,0.514643,0.752991,0.367417,0.568614,0.582047,4.397339e-01,0.363483,6.077694e-01,0.555981
5,6.434086e-01,6.266518e-01,0.555158,0.589584,0.402271,1.192093e-07,0.619666,0.345636,0.608109,0.474413,...,6.891668e-01,0.309997,0.744982,0.420848,0.608365,0.475288,3.990849e-01,0.449534,6.333849e-01,0.505678
6,2.896773e-01,5.971916e-01,0.253399,0.290253,0.556397,6.196660e-01,0.000000,0.531329,0.284649,0.440067,...,6.574677e-01,0.515673,0.285938,0.377179,0.309599,0.314286,4.741045e-01,0.263140,2.587251e-01,0.427959
7,5.306804e-01,4.276533e-01,0.467847,0.435382,0.424755,3.456364e-01,0.531329,0.000000,0.487663,0.374371,...,6.128612e-01,0.468699,0.611151,0.319653,0.540137,0.372984,4.767542e-01,0.407424,5.223266e-01,0.441965
8,2.071338e-01,5.010610e-01,0.262523,0.441065,0.601762,6.081095e-01,0.284649,0.487663,0.000000,0.389362,...,6.607015e-01,0.563229,0.262255,0.445377,0.215301,0.299106,5.283373e-01,0.238127,1.458029e-01,0.429075
9,3.509272e-01,4.449770e-01,0.498422,0.483954,0.533905,4.744127e-01,0.440067,0.374371,0.389362,0.000000,...,5.294540e-01,0.323660,0.516627,0.433995,0.463708,0.385228,3.320318e-01,0.388419,3.998640e-01,0.452977


How many content items (rows) have a larger mean distance than the overall mean?

In [23]:
btax_dist_df[btax_dist_df['mean'] > btax_dist.mean()].shape

(212, 523)

Now we can use this information to filter the data frame of labelled content items (`labelled`), leaving us with a data frame of the problem content.

We can start by filtering the `labelled` data so we have only the content items that are in the business tax taxon.

In [24]:
btax_content = labelled[labelled['taxon_id'] == btax_id].reset_index()

In [25]:
btax_content

Unnamed: 0,index,base_path,content_id,description,document_type,first_published_at,locale,primary_publishing_organisation,publishing_app,title,body,combined_text,taxon_id,taxon_base_path,taxon_name,level1taxon,level2taxon,level3taxon,level4taxon,level5taxon
0,115137,/government/publications/hidden-economy-unders...,5fe7f08a-7631-11e4-a3cb-005056011aef,research to help hmrc understand and reduce th...,research,2012-12-07T00:00:00.000+00:00,en,HM Revenue & Customs,whitehall,hidden economy: understanding problems for sma...,research report on ways hm revenue and customs...,hidden economy: understanding problems for sma...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
1,115138,/government/publications/duty-on-high-strength...,5ab5791d-1643-4da8-8647-bff155cefe89,details of the government’s reforms to the tax...,policy_paper,2017-11-22T13:37:10.000+00:00,en,HM Treasury,whitehall,duty on high strength ciders: autumn budget 20...,following consultation earlier this year autum...,duty on high strength ciders: autumn budget 20...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
2,115139,/government/publications/corporation-tax-refun...,5d644c95-7631-11e4-a3cb-005056011aef,response to a freedom of information request o...,foi_release,2011-11-22T00:00:00.000+00:00,en,HM Revenue & Customs,whitehall,corporation tax refunds between 2006 and 2010,response to a freedom of information request f...,corporation tax refunds between 2006 and 2010 ...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
3,115140,/government/news/one-million-schemes-use-new-p...,5e2ac439-7631-11e4-a3cb-005056011aef,over one million employer paye schemes have st...,news_story,2013-05-02T12:32:46.000+00:00,en,HM Revenue & Customs,whitehall,one million schemes use new paye system,the new paye reporting system known as real ti...,one million schemes use new paye system over o...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
4,115141,/government/publications/devolution-of-landfil...,b1e9af5d-4613-40a0-9532-e8c55f0a23be,legislation will be made to amend the landfill...,policy_paper,2017-12-07T08:45:10.000+00:00,en,HM Revenue & Customs,whitehall,devolution of landfill tax to wales and the 2 ...,landfill tax will be devolved to wales from 1 ...,devolution of landfill tax to wales and the 2 ...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
5,115142,/guidance/stamp-duty-land-tax-cross-border-tra...,33a7c6f3-8c5a-4604-9bce-1b89a4dd40f7,find how to make sure you pay the right tax on...,detailed_guide,2018-03-21T16:27:02.000+00:00,en,HM Revenue & Customs,whitehall,stamp duty land tax: cross-border transactions,there’s no stamp duty land tax ( sdlt ) to pay...,stamp duty land tax: cross-border transactions...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
6,115143,/government/consultations/technical-consultati...,de4a00f7-f7a1-4305-b477-8017cd1f2e03,this technical consultation seeks comment on d...,consultation_outcome,2015-11-26T09:30:00.000+00:00,en,HM Revenue & Customs,whitehall,technical consultation on companies excluded f...,the chancellor announced at summer budget 2015...,technical consultation on companies excluded f...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
7,115144,/government/publications/diverted-profits-tax-...,627cb593-6304-453a-ad57-765b4212a583,this report sets out how hm revenue and custom...,research,2017-09-13T08:30:00.000+00:00,en,HM Revenue & Customs,whitehall,diverted profits tax yield: methodological note,diverted profits tax ( dpt ) was introduced in...,diverted profits tax yield: methodological not...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
8,115145,/government/publications/budget-2016-overview-...,c5d6b5eb-a502-49e6-b4c8-fcb201d96da5,tax policy measures announced at budget 2016.,policy_paper,2016-03-16T18:11:00.000+00:00,en,HM Revenue & Customs,whitehall,budget 2016: overview of tax legislation and r...,this document lists the tax policy measures an...,budget 2016: overview of tax legislation and r...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,
9,115146,/guidance/changes-to-commodity-codes-in-chapte...,1f461191-e932-47b2-9c9e-134b91760eca,find out about the changes to volume 2 of the ...,detailed_guide,2018-02-16T12:00:00.000+00:00,en,HM Revenue & Customs,whitehall,changes to commodity codes in chapter 40 (tari...,chapter 40 delete commodity code 40121200 00 a...,changes to commodity codes in chapter 40 (tari...,28262ae3-599c-4259-ae30-3c83a5ec02a1,/money/business-tax,Business tax,Money,Business tax,,,


Now return content items from the data frame where the mean cosine distance is above a threshold value. These are the problem content items. Simplify the output to three columns of interest.

In [26]:
btax_content[['base_path', 'title', 'description']][btax_dist_df['mean'] > 0.65]

Unnamed: 0,base_path,title,description
57,/hmrc-internal-manuals/vat-womens-sanitary-pro...,vat women’s sanitary products,guidance on the reduced rate for women's sanit...
66,/guidance/changes-to-chief-commodity-codes-tar...,changes to chief commodity codes (tariff stop ...,find out the changes to commodity codes in the...
196,/guidance/rates-and-allowances-for-air-passeng...,historic rates for air passenger duty,check which air passenger duty rates apply for...
219,/guidance/air-passenger-duty-and-connected-fli...,air passenger duty and connected flights,check which flights to treat as connected for ...
256,/guidance/rates-and-allowances-for-air-passeng...,rates for air passenger duty,check which rates of air passenger duty you ne...
278,/guidance/poultry-from-iceland-tariff-quota-no...,poultry from iceland (tariff quota notice 73),check the new tariff quota for poultry from ic...
343,/government/publications/iso-country-codes,iso country codes,find out the iso country codes.
397,/government/news/government-to-waive-vat-on-mi...,government to waive vat on military wives’ cha...,chancellor of the exchequer has today announce...
430,/guidance/laser-skin-treatment-and-hair-remova...,laser skin treatment and hair removal (tariff ...,check the tariff classification of electrical ...
478,/government/collections/gwe-rwydo-a-sgamiau,gwe-rwydo a sgamiau,cyngor ar ddiogelwch gan gyllid a thollau em i...
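
The selection above works because `reset_index()` gave `btax_content` the row labels 0 to n-1, the same labels that `btax_dist_df` has. A small sketch of why that matters follows, using hypothetical toy objects rather than the real data frames.

```python
import pandas as pd

# Hypothetical slice of labelled data that keeps its original row labels.
toy_content = pd.DataFrame(
    {'title': ['a', 'b', 'c']},
    index=[115137, 115138, 115139],
)

# Hypothetical per-item mean distances, labelled 0..n-1 like a fresh DataFrame.
toy_mean = pd.Series([0.40, 0.70, 0.50])

# After reset_index, both objects share the labels 0..n-1, so the mask lines up.
toy_content_reset = toy_content.reset_index()
selected = toy_content_reset[toy_mean > 0.65]
print(selected['title'].tolist())
```

The original row labels are kept in the new `index` column, so a flagged item can still be traced back to its row in the full labelled data.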


### Function to get odd content

In [38]:
def get_misplaced_content(
    taxon_id = '28262ae3-599c-4259-ae30-3c83a5ec02a1',
    similarity_threshold = 0.65,
    embedded_sentences_data = embedded_sentences,
    labelled_data = labelled
):
    
    """Identify content items that seem out of place in a given taxon.
    The mean cosine distance from each content item to the items in the taxon is calculated.
    Content items are extracted if their mean distance is above a particular threshold (default 0.65).
    """
    
    print('Taxon ID: ', taxon_id)
    print('Similarity threshold:', similarity_threshold)
    
    # Get embeddings for the specified taxon ID (use the arguments, not the globals)
    taxon_embedded = embedded_sentences_data[labelled_data['taxon_id'] == taxon_id]
    
    # Get cosine distances between all content item pairs
    taxon_dist = pairwise_distances(
        taxon_embedded, 
        metric = 'cosine', 
        n_jobs = -1
    )
    
    # As dataframe
    taxon_dist_df = pd.DataFrame(taxon_dist)
    
    # Calculate the mean distance for each content item
    taxon_dist_df['mean'] = taxon_dist.mean(axis = 1)
    
    # Get the rows of the labelled data (content items) that match the taxon ID
    taxon_content = labelled_data[labelled_data['taxon_id'] == taxon_id].reset_index()
    
    # Content items whose mean distance is above the threshold
    misplaced = taxon_content[['content_id', 'base_path', 'title', 'description']][taxon_dist_df['mean'] > similarity_threshold]
    
    return misplaced
    

In [39]:
get_misplaced_content()

Taxon ID:  28262ae3-599c-4259-ae30-3c83a5ec02a1
Similarity threshold: 0.65


Unnamed: 0,content_id,base_path,title,description
57,f9e12f0e-bd0d-5361-8d26-bc83bfb34729,/hmrc-internal-manuals/vat-womens-sanitary-pro...,vat women’s sanitary products,guidance on the reduced rate for women's sanit...
66,a211f181-1cc0-45c0-8bb6-0491eb67fc92,/guidance/changes-to-chief-commodity-codes-tar...,changes to chief commodity codes (tariff stop ...,find out the changes to commodity codes in the...
196,6f019571-54be-4344-aede-cebd901c1fe5,/guidance/rates-and-allowances-for-air-passeng...,historic rates for air passenger duty,check which air passenger duty rates apply for...
219,e110d285-20e0-431e-a394-39edabb2b331,/guidance/air-passenger-duty-and-connected-fli...,air passenger duty and connected flights,check which flights to treat as connected for ...
256,eb031ebb-7078-4879-a124-33753c4ca0bd,/guidance/rates-and-allowances-for-air-passeng...,rates for air passenger duty,check which rates of air passenger duty you ne...
278,5f60a446-f47c-403a-aab3-bd83db20cf4f,/guidance/poultry-from-iceland-tariff-quota-no...,poultry from iceland (tariff quota notice 73),check the new tariff quota for poultry from ic...
343,6eb3a99b-9a0b-464a-bb42-c08882c7d857,/government/publications/iso-country-codes,iso country codes,find out the iso country codes.
397,5d5afda3-7631-11e4-a3cb-005056011aef,/government/news/government-to-waive-vat-on-mi...,government to waive vat on military wives’ cha...,chancellor of the exchequer has today announce...
430,0fea02ed-c1c8-4502-a7a2-f0ebebe1ee1c,/guidance/laser-skin-treatment-and-hair-remova...,laser skin treatment and hair removal (tariff ...,check the tariff classification of electrical ...
478,e36ebdbf-b8df-4dc8-beb5-beece2f7b7de,/government/collections/gwe-rwydo-a-sgamiau,gwe-rwydo a sgamiau,cyngor ar ddiogelwch gan gyllid a thollau em i...



## 3. Taxon could be split 

## 4. Taxon could be merged