In [1]:
import pandas as pd
import requests
import json
import numpy as np

from sklearn.metrics.pairwise import cosine_similarity

pd.set_option('display.max_colwidth', 0)

# Article Similarity

This notebook demonstrates ranking articles by cosine similarity. The seed articles of interest come from the "Pilots" project. We then rank each article in the "Target 1" project by how semantically similar its title and abstract are to a representative "Pilots" embedding, computed as the mean of the embeddings of the Pilots titles and abstracts.
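
The ranking idea can be sketched on toy data before touching the real embeddings. This is a minimal illustration, not the notebook's actual pipeline: the two-dimensional vectors are made up, and `rank_by_centroid_similarity` is a hypothetical helper name. It averages the seed vectors into a centroid, scores every candidate by the cosine of its angle to that centroid, and sorts from most to least similar.

```python
import numpy as np


def rank_by_centroid_similarity(seed_vectors, candidate_vectors):
    """Rank candidates by cosine similarity to the mean of the seed vectors.

    Returns (order, scores): `order` holds candidate positions from most to
    least similar, `scores` holds the similarity of each candidate.
    """
    seeds = np.asarray(seed_vectors, dtype=float)
    candidates = np.asarray(candidate_vectors, dtype=float)

    # Representative seed embedding: element-wise mean of all seed vectors
    centroid = seeds.mean(axis=0)

    # Cosine similarity = dot product divided by the product of the L2 norms
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(centroid)
    scores = candidates @ centroid / norms

    # Negate so that argsort yields descending similarity
    order = np.argsort(-scores)
    return order, scores


# Toy vectors (illustrative only)
seed = [[1.0, 0.0], [0.8, 0.2]]
cands = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]

order, scores = rank_by_centroid_similarity(seed, cands)
print(order, scores.round(3))
```

Cosine similarity depends only on the direction of the vectors, not their length, so a candidate pointing the same way as the centroid scores close to 1 and one pointing the opposite way scores close to -1.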

In [2]:
### DATA ###
df_all = pd.read_csv("../data/target1_cleaned.csv")
df_pilot = df_all[df_all["Project"] == "Pilots"]
df_t1 = df_all[df_all["Project"] == "t1"]

### EMBEDDINGS ###
with open("../embeddings/specter_embeddings_target1.json", "r") as fp:
    specter_json = json.load(fp)
specter_pilot_embed = [i["embedding"] for i in specter_json if i["id"] in df_pilot.index]
specter_t1_embed = [i["embedding"] for i in specter_json if i["id"] in df_t1.index]
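
The list comprehensions above keep the order in which records appear in the JSON file, which matches the DataFrame row order only if the file was written in that order. A minimal sketch of an order-safe alternative follows, assuming each record has the `id` and `embedding` keys used above and that `id` values are the DataFrame index labels. `embeddings_in_frame_order` is a hypothetical helper, and the records and frame are toy data.

```python
import pandas as pd


def embeddings_in_frame_order(records, frame):
    """Return one embedding per row of `frame`, in the frame's row order."""
    # Look up embeddings by id instead of relying on file order
    by_id = {rec["id"]: rec["embedding"] for rec in records}

    # Fail loudly if any row has no embedding, rather than silently misaligning
    missing = [idx for idx in frame.index if idx not in by_id]
    if missing:
        raise KeyError(f"no embedding for index labels: {missing}")

    return [by_id[idx] for idx in frame.index]


# Toy records deliberately stored out of order (illustrative only)
records = [
    {"id": 2, "embedding": [0.0, 1.0]},
    {"id": 0, "embedding": [1.0, 0.0]},
    {"id": 1, "embedding": [0.5, 0.5]},
]
frame = pd.DataFrame({"Title": ["a", "c"]}, index=[0, 2])

aligned = embeddings_in_frame_order(records, frame)
print(aligned)
```

With the embeddings aligned this way, the i-th similarity score is guaranteed to belong to the i-th row of the frame when it is later attached as a column.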

In [3]:
# Create mean embedding for the Pilots papers
mean_embed = np.array(specter_pilot_embed).mean(axis=0).reshape(1, -1)

# Calculate the cosine similarity of each Target 1 title-and-abstract embedding to the mean Pilots embedding
cosine_sim = cosine_similarity(mean_embed, specter_t1_embed)

# Add the cosine similarity as a new column. assign() returns a new DataFrame,
# which avoids pandas' SettingWithCopyWarning on the filtered slice.
df_t1 = df_t1.assign(cosine_similarity=cosine_sim[0])


### Articles with Highest Cosine Similarity

In [4]:
df_t1[["Refid","Title","Abstract","cosine_similarity"]].sort_values(["cosine_similarity"], ascending=False).head()

Index,Refid,Title,Abstract,cosine_similarity
439,8632166,Thyroid Hormones and Derivatives: Endogenous Thyroid Hormones and Their Targets,"More than a century after the discovery of L-Thyroxine, the main thyroid hormone secreted solely by the thyroid gland, several metabolites of this iodinated, tyrosine-derived ancestral hormone have been identified. These are utilized as hormones during development, differentiation, metamorphosis, and regulation of most biochemical reactions in vertebrates and their precursor species. Among those metabolites are the thyromimetically active 3,3', 5-Triiodo-L-thyronine (T3) and 3,5-Diiodo-L-thronine, reverse-T3 (3,3', 5'-Triiodo-L-thyronine) with still unclear function, the recently re-discovered thyronamines (e.g., 3-Iodo-thyronamine), which exert in part T3-antagonistic functions, the thyroacetic acids (e.g., Tetrac and Triac), as well as various sulfated or glucuronidated metabolites of this panel of iodinated signaling compounds. In the blood most of these hydrophobic metabolites are tightly bound to the serum distributor proteins thyroxine binding globulin (TBG), transthyretin (TTR), albumin or apolipoprotein B100. Cellular import and export of these charged, highly hydrophobic amino acid derivatives requires a number of cell-membrane transporters or facilitators such as MCT8 or MCT10 and members of the OATP and LAT families of transporters. Depending on their structure, the thyroid hormone metabolites exert their cellular action by binding and thus modulating the function of various receptors systems (e.g., alpha nu beta 3 integrin receptor and transient receptor potential channels (TRPM8) of the cell membrane), in part linked to intracellular downstream kinase signaling cascades, and several isoforms of membrane-associated, mitochondrial or nuclear thyroid hormone receptors (TR), which are members of the c-erbA family of ligand-modulated transcription factors. 
Intracellular deiodinase selenoenzymes, which obligatory are membrane integrated enzymes, ornithine decarboxylase and monoamine oxidases control local availability of biologically active thyroid hormone metabolites. Inactivation of thyroid hormone metabolites occurs mainly by deiodination, sulfation or glucuronidation, reactions which favor their renal or fecal elimination.",0.931197
723,7973177,Chemistry and Biology in the Biosynthesis and Action of Thyroid Hormones,"Thyroid hormones (THs) are secreted by the thyroid gland. They control lipid, carbohydrate, and protein metabolism, heart rate, neural development, as well as cardiovascular, renal, and brain functions. The thyroid gland mainly produces l-thyroxine (T4) as a prohormone, and 5'-deiodination of T4 by iodothyronine deiodinases generates the nuclear receptor binding hormone T3. In this Review, we discuss the basic aspects of the chemistry and biology as well as recent advances in the biosynthesis of THs in the thyroid gland, plasma transport, and internalization of THs in their target organs, in addition to the deiodination and various other enzyme-mediated metabolic pathways of THs. We also discuss thyroid hormone receptors and their mechanism of action to regulate gene expression, as well as various thyroid-related disorders and the available treatments.",0.928544
922,1300760,3-Iodothyronamine is an endogenous and rapid-acting derivative of thyroid hormone,"Thyroxine (T(4)) is the predominant form of thyroid hormone (TH). Hyperthyroidism, a condition associated with excess TH, is characterized by increases in metabolic rate, core body temperature and cardiac performance. In target tissues, T(4) is enzymatically deiodinated to 3,5,3'-triiodothyronine (T(3)), a high-affinity ligand for the nuclear TH receptors TR alpha and TR beta, whose activation controls normal vertebrate development and physiology. T(3)-modulated transcription of target genes via activation of TR alpha and TR beta is a slow process, the effects of which manifest over hours and days. Although rapidly occurring effects of TH have been documented, the molecules that mediate these non-genomic effects remain obscure. Here we report the discovery of 3-iodothyronamine (T(1)AM), a naturally occurring derivative of TH that in vitro is a potent agonist of the G protein-coupled trace amine receptor TAR1. Administering T(1)AM in vivo induces profound hypothermia and bradycardia within minutes. T(1)AM treatment also rapidly reduces cardiac output in an ex vivo working heart preparation. These results suggest the existence of a new signaling pathway, stimulation of which leads to rapid physiological and behavioral consequences that are opposite those associated with excess TH.",0.921258
674,7651060,Thyroid hormone transporters in health and disease,"Cellular entry is required for conversion of thyroid hormone by the intracellular deiodinases and for binding of 3,3',5-triiodothyronine (T(3)) to its nuclear receptors. Recently, several transporters capable of thyroid hormone transport have been identified. Functional expression studies using Xenopus laevis oocytes have demonstrated that organic anion transporters (e.g., OATPs), and L-type amino acid transporters (LATs) facilitate thyroid hormone uptake. Among these, OATP1C1 has a high affinity and specificity for thyroxine (T(4)). OATP1C1 is expressed in capillaries throughout the brain, suggesting it is critical for transport of T(4) over the blood-brain barrier. We have also characterized a member of the monocarboxylate transporter family, MCT8, as a very active and specific thyroid hormone transporter. Human MCT8 shows preference for T(3) as the ligand. MCT8 is highly expressed in liver and brain but is also widely distributed in other tissues. The MCT8 gene is located on the X chromosome. Recently, mutations in MCT8 have been found to be associated with severe X-linked psychomotor retardation and strongly elevated serum T(3) levels.",0.918188
611,8627688,Novel thyroid hormones,"The field of thyroid hormone signaling has grown more complex in recent years. In particular, it has been suggested that some thyroid hormone derivatives, tentatively named ""novel thyroid hormones"" or ""active thyroid hormone metabolites"", may act as independent chemical messengers. They include 3,5-diiodothyronine (T2), 3-iodothyronamine (T1AM), and several iodothyroacetic acids, i.e., 3,5,3',5'-thyroacetic acid (TA4), 3,5,3'-thyroacetic acid (TA3), and 3-thyroacetic acid (TA1). We summarize the present knowledge on these compounds, namely their biosynthetic pathways, endogenous levels, molecular targets, and the functional effects elicited in experimental preparations or intact animals after exogenous administration. Their physiological and pathophysiological role is discussed, and potential therapeutic applications are outlined. The requirements needed to qualify these substances as chemical messengers must still be validated, although promising evidence has been collected. At present, the best candidate to the role of independent chemical messenger appears to be T1AM, and its most interesting effects concern metabolism and brain function. The responses elicited in experimental animals have suggested potential therapeutic applications. TA3 has an established role in thyroid hormone resistance syndromes, and is under investigation in Allen-Herndon-Dudley syndrome. Other potential targets are represented by obesity and dyslipidemia (for T2 and T1AM); dementia and degenerative brain disease (for T1AM and TA1); cancer (for T1AM and TA4). Another intriguing and unexplored question is the potential relevance of these metabolites in the clinical picture of hypothyroidism and in the response to replacement therapy.",0.914047


### Articles with Lowest Cosine Similarity

In [5]:
df_t1[["Refid","Title","Abstract","cosine_similarity"]].sort_values(["cosine_similarity"]).head()

Index,Refid,Title,Abstract,cosine_similarity
419,8593286,Nucleation of amyloidogenesis in infectious and noninfectious amyloidoses of brain,,0.491662
126,8584020,Implantable cardioverter-defibrillator placement in patients with cardiac amyloidosis,,0.506563
152,8583621,Malnutrition as measured by albumin and prealbumin on admission is associated with poor outcomes after severe traumatic brain injury,,0.507448
535,8617642,The fractionation of cerebrospinal fluid proteins by cellulose acetate electrophoresis in children with infectious diseases of the central nervous system (author's transl),"The increased permeability of the blood-brain barrier during acute inflammation of the central nervous system leads to changes of the cerebrospinal fluid (C.S.F.) protein pattern. Initially, in the cases of bacterial meningitis, cellulos acetate electrophoresis revealed decreased prealbumin, albumin and tau-globulin fraktion whereas alpha- and gamma-globulin fractions were found increased. In later stages of purulent inflammation a hydrocephalus occurred in five children, associated with an increased amount of albumin in the C.S.F. Cases of viral meningoencephalitis had a characteristic decrease of prealbumin and increase of gamma-globulin, the lowered prealbumin values were found more often. In three cases of congenital encephalitis pathological patterns of C.S.F. proteins were still found 1--1 1/2 years postpartum. Children with acute peripheral facial palsy and febrile convulsions had a normal C.S.F. protein profile.",0.553917
427,8600993,A case of cerebral amyloid angiopathy-type hereditary ATTR amyloidosis with Y69H (p.Y89H) variant displaying transient focal neurological episodes as the main symptom,,0.55525
