# Infer Neuro Behavioral Ontology terms from text


Two main steps:
* Use zero-shot learning with a large language model (LLM) as a named entity recognizer (NER) to extract behavioral terms from specific sections of research papers.
* Use vector databases to ground the extracted terms in specific concepts from the [Neuro Behavioral Ontology](https://www.ebi.ac.uk/ols4/ontologies/nbo?tab=classes).


![Alt text](<2023-070-27(Neuro Behavioral Ontology).svg>)



# Pipeline

In [1]:
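import pprint
from pathlib import Path
import os

# Read the API keys from the local repo_secrets module and expose them as
# environment variables for the OpenAI and Qdrant utilities used downstream
from repo_secrets import OPENAI_API_KEY
from repo_secrets import QDRANT_API_KEY

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["QDRANT_API_KEY"] = QDRANT_API_KEY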

%load_ext autoreload
%autoreload 2



## Methods text from the paper

In [2]:
# https://www.nature.com/articles/s41586-021-03814-7#Sec9

co_housing_text = (
"""Co-housing
Pup-naive C57Bl/6 virgin female mice were bred and raised at NYU School of Medicine and kept isolated from dams and pups until used for these studies when approximately eight weeks old. For experiments where viral injections were performed, we first allowed two weeks for viral expression before animals were used in experiments. Dams were initially pre-screened to ensure they behaved maternally, meaning that they retrieved pups and built nests; about 1% of dams did not retrieve pups and these animals were not used for co-housing. Naive virgins were initially pre-screened for retrieval or pup mauling before co-housing; around 5% of the naive virgins retrieved at least one pup or mauled pups during pre-screening and these mice were excluded from subsequent behavioural studies.

Co-housing of a virgin female with a mother and litter was conducted for 4–6 consecutive days in 80 × 40 × 50 cm plastic home cages. The floor was covered with abundant bedding material, food pellets and a pack of hydrogel for hydration placed in a corner of the bin and refreshed daily. Nesting material was also placed in the cage. We first placed the dam and her postnatal day 1 (P1) litter in the cage. After the dam was acclimatized for ~30 min, we introduced the virgin female with a tail mark for identification. Well-being of the adult mice and pups was monitored at least twice a day. A surveillance infrared camera system (Blackrock Microsystems) was positioned ~100 cm above the home cage to capture the entire surface. An ultrasonic microphone (Avisoft) was placed in the corner of the cage, ~10 cm above the nest. Two initial cages had a second camera placed on the side but these videos were not analysed for these experiments. For studies of spontaneous pup retrieval by dams and the influence of co-housing, some dams were singly housed with their litter but not with other adults.

In cases where co-housing was done only between a virgin female and pups (Fig. 1c), the pups were returned to the donor mother every 12 h (for at least 48 h) and immediately replaced with new pups. This was done to ensure that they stay alive and healthy despite not being fed during co-housing with the virgin. The procedure was repeated throughout the duration of the co-housing27.
"""
)

pup_retrieval_text = (
"""
Pup retrieval testing
This test was used for the initial screening of dams and virgin female mice. In addition, outside of the spontaneous home cage behaviours, we specifically monitored pup retrieval every 24 h by the virgin females. We placed the female mouse to be tested in a behavioural arena (38 × 30 × 15 cm) containing bedding and nesting material; the female was alone, without contact with other animals. Each animal was given 20 min to acclimatize before each testing session began. The entire litter (ranging from 3 to 7 P1–4 pups) were grouped in a corner of the arena and covered with nesting material, and the adult female given an additional 2 min of acclimatization (pup group size did not affect retrieval behaviour; Extended Data Fig. 2c). One pup was removed from the nest and placed in an opposite corner of the arena. The experimental female was given 2 min per trial to retrieve the displaced pup and return it back to the nest; if the displaced pup was not retrieved within 2 min, the pup was returned to the nest and the trial was scored as a failure. If the pup was successfully retrieved, the time to retrieval was recorded and the trial was scored as a success. Another pup was then taken out of the nest, placed away from the nest (varying the position of the isolated pup relative to the nest from trial to trial), and the next trial was begun. After ten trials, pups were placed back into their home cage with their dam. We used an ultrasonic microphone (Avisoft) to verify that isolated pups vocalized during testing.

We reported probability of retrieving out of ten trials. Reliable retrieval was defined as having at least two out of ten successful trials. We used two-way ANOVA and Sidak’s multiple-comparison test corrections to compare probability of retrieving in each group over days, and Student’s t-test to compare the day of retrieval onset for each group.
"""
)

video_and_audio_analysis = (
"""
Video and audio analysis
Video and audio recordings were synchronized with the neuronal recordings, and then analysed with Adobe Audition and Avisoft. For video recordings we used the BORIS suite for scoring of behavioural observations. Three separate teams of independent scorers (two scorers from the Sullivan laboratory, three scorers from the Carcea laboratory and four scorers from the Froemke laboratory) were trained in a similar way on how to identify relevant individual and social behaviours during co-housing, and then scored the videos blind to the conditions. The results from each raster were compared and compiled, and results from each lab were cross-validated. Nest entry was considered the moment when the head of the animal entered the nest. Nest exit was considered the time when the rear of the animal left the nest. We used two-way ANOVA and Sidak’s multiple-comparison test to compare pup retrieval rates and time in nest across days for each group.

Any event in which the dam chased the virgin towards the nest was identified as a shepherding event (that is, where distance from start to nest was greater than distance from end to nest). To determine the distance from nest during shepherding, we measured the distance from the bottom left corner of the cage to the position of the snout of the mouse, and to the position of the nest center. We then calculated distance from the virgin to nest. In cases of physical contact, start of shepherding was considered to be the moment when the dam made contact with the virgin, and the end of shepherding was the moment when the virgin stopped running. In some cases (especially later into co-housing), we noticed that virgins started running as soon as they noticed the dam approaching; in those cases, the start of shepherding was considered to be the moment when the virgins started running after the dam’s approach. For Fig. 1i, we used paired t-tests to compare distance from start of shepherding to nest with the distance from end of shepherding to nest. For Fig. 1j, we used one-sample Student’s t-tests to determine if the daily frequency of shepherding was higher than 0.2 events per h (which was the average rate of dam–virgin chases in absence of pups). Audio recordings were processed in Adobe Audition, and isolation or distress calls were distinguished from adult calls and wriggling calls on the basis of the characteristic statistics (bout rate of 4–8 Hz and frequencies of 40–90 kHz).
"""
)


observation_of_experienced_retrievers = (
"""
Observation of experienced retrievers
We first confirmed that virgins did not retrieve and dams retrieved at 100% at baseline. The exposures were done in standard behavioural arena (38 × 30 × 15 cm). The virgin and dam were acclimatized for 20 min, then the nest with pups was transferred to this arena. After another 5–10 min, we manually isolated one pup at a time so that the dam would retrieve the pup back into the nest. We repeated this for ten times per session. In the experiments where either a transparent or an opaque divided the cage, the two adult animals were acclimatized on opposite sides of the barriers. After exposure, the adult animals were separated and the virgins were tested for pup retrieval 30 min later, as described above. As the preparation for testing and the acclimatization to the testing cage also took 30 min, this amounted to a total 60-min interval between virgin observation and testing of responses to isolated pups. The exposure was repeated for four sessions (one per day). A virgin that retrieved at least once during the four days of observation was considered as having acquired pup retrieval behaviour. We used chi-square exact tests to compare retrieval between conditions: wild-type mice with no barrier, wild-type mice with transparent barrier, wild-type mice with opaque barrier, and OXTR-KO virgins with transparent barrier.  
"""
)

# Concatenate the four methods subsections into a single input text
methods_text = (
    f"{co_housing_text} {pup_retrieval_text} {video_and_audio_analysis} {observation_of_experienced_retrievers}"
)

# Alternatively, test the pipeline on the methods section of another paper:
# import json
# with open('./data/methods_dictionary.json') as f:
#     methods_dict = json.load(f)
# section = methods_dict['brezovec']

pprint.pprint(methods_text)

('Co-housing\n'
 'Pup-naive C57Bl/6 virgin female mice were bred and raised at NYU School of '
 'Medicine and kept isolated from dams and pups until used for these studies '
 'when approximately eight weeks old. For experiments where viral injections '
 'were performed, we first allowed two weeks for viral expression before '
 'animals were used in experiments. Dams were initially pre-screened to ensure '
 'they behaved maternally, meaning that they retrieved pups and built nests; '
 'about 1% of dams did not retrieve pups and these animals were not used for '
 'co-housing. Naive virgins were initially pre-screened for retrieval or pup '
 'mauling before co-housing; around 5% of the naive virgins retrieved at least '
 'one pup or mauled pups during pre-screening and these mice were excluded '
 'from subsequent behavioural studies.\n'
 '\n'
 'Co-housing of a virgin female with a mother and litter was conducted for 4–6 '
 'consecutive days in 80 × 40 × 50 cm plastic home cages. The floor
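The commented-out lines in the cell above load the methods section of another paper from `./data/methods_dictionary.json`. A slightly more defensive version is sketched here. It assumes the file is a flat JSON object mapping a paper key (such as `'brezovec'`) to its methods text; `load_methods_section` is a hypothetical helper, not part of `utils`.

```python
import json
import tempfile
from pathlib import Path


def load_methods_section(path: Path, key: str) -> str:
    """Return the methods text stored under `key` in a JSON dictionary file.

    Assumes a flat {paper_key: methods_text} layout (an assumption about
    ./data/methods_dictionary.json, not a documented format).
    """
    # A context manager closes the file handle as soon as it has been parsed
    with open(path, encoding="utf-8") as handle:
        methods_dict = json.load(handle)
    if key not in methods_dict:
        available = ", ".join(sorted(methods_dict))
        raise KeyError(f"No methods section named {key!r}; available keys: {available}")
    return methods_dict[key]


# Demo with a throwaway file standing in for ./data/methods_dictionary.json
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "methods_dictionary.json"
    demo_path.write_text(json.dumps({"demo": "Example methods text."}), encoding="utf-8")
    demo_section = load_methods_section(demo_path, "demo")

print(demo_section)  # Example methods text.
```

Raising a `KeyError` that lists the available keys makes a mistyped paper key easy to spot.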

## Zero-shot Named Entity Recognition with LLMs

In [3]:
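from utils.behavior_metadata_extraction import extract_behavior_metadata_from

section = methods_text
# Ask the LLM for the behaviors described in the section; each item comes back as "Term: definition"
behaviors = extract_behavior_metadata_from(section)
behaviors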

['Pup retrieval: The act of the adult female mouse retrieving a displaced pup and returning it back to the nest.',
 'Nest entry: The moment when the head of the animal enters the nest.',
 'Nest exit: The time when the rear of the animal leaves the nest.',
 'Shepherding event: Any event in which the dam chases the virgin towards the nest, with the distance from start to nest being greater than the distance from end to nest.',
 'Isolation or distress calls: Calls made by the pups that indicate isolation or distress, distinguished from adult calls and wriggling calls based on characteristic statistics (bout rate of 4-8 Hz and frequencies of 40-90 kHz).',
 'Acquired pup retrieval behavior: When a virgin female mouse retrieves at least once during the four days of observation, indicating the acquisition of pup retrieval behavior.']
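`extract_behavior_metadata_from` does not expose the prompt it sends. As a hedged illustration of what zero-shot NER with an LLM involves, the sketch below builds an instruction-only prompt (no labeled examples, which is what makes it zero-shot) and parses a reply in the `Term: definition` format seen in the output above. The prompt wording and both helper names are hypothetical, and the LLM call itself is left out.

```python
def build_ner_prompt(section: str) -> str:
    """Hypothetical zero-shot prompt: task instructions only, no labeled examples."""
    return (
        "List every animal behavior described in the methods text below.\n"
        "Return one behavior per line in the form 'Term: short definition'.\n\n"
        f"Methods text:\n{section}"
    )


def parse_behavior_lines(reply: str) -> list[tuple[str, str]]:
    """Split an LLM reply made of 'Term: definition' lines into (term, definition) pairs."""
    pairs = []
    for line in reply.splitlines():
        # Drop surrounding whitespace and any list bullet the model may have added
        line = line.strip().lstrip("-* ").strip()
        if ":" not in line:
            continue  # skip blank lines and chatter that is not a term
        term, definition = line.split(":", 1)  # split on the first colon only
        pairs.append((term.strip(), definition.strip()))
    return pairs


reply = (
    "- Nest entry: The moment when the head of the animal enters the nest.\n"
    "- Nest exit: The time when the rear of the animal leaves the nest.\n"
)
print(parse_behavior_lines(reply))
```

Splitting on the first colon only keeps definitions intact even if they contain further colons.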

## Ground the extracted terms in ontologies with vector databases (entity linking)

In [4]:
from utils.behavior_metadata_extraction import ground_metadata_in_ontologies

terms_list = behaviors
queries_response_list = ground_metadata_in_ontologies(term_list=terms_list)
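`ground_metadata_in_ontologies` is not shown here; the notebook only tells us, through `QDRANT_API_KEY`, that a Qdrant vector database is involved. Conceptually, entity linking of this kind embeds every ontology label once, embeds each extracted term, and returns the labels whose vectors are most similar. The sketch below swaps in a bag-of-words vector for the learned embedding model and a brute-force cosine search for Qdrant, purely to show the mechanics. The helper names are hypothetical; the four NBO ids and labels are taken from the outputs in this notebook.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_labels(query: str, index: dict[str, str], k: int = 3) -> list[tuple[str, str, float]]:
    """Brute-force k-nearest-neighbor search over an {ontology_id: label} index."""
    query_vec = embed(query)
    scored = [(label, term_id, cosine(query_vec, embed(label))) for term_id, label in index.items()]
    return sorted(scored, key=lambda item: item[2], reverse=True)[:k]


# A few NBO ids/labels that appear in this notebook's outputs
nbo_index = {
    "NBO:0000155": "offspring retrieval",
    "NBO:0000157": "nest building behavior",
    "NBO:0020141": "agonistic chase",
    "NBO:0020149": "distress signaling",
}

for label, term_id, score in nearest_labels("Pup retrieval: returning a displaced pup to the nest", nbo_index, k=2):
    print(f"{term_id}  {label}  {score:.3f}")
```

A real setup would replace `embed` with a sentence-embedding model and the linear scan with the vector database's nearest-neighbor search; the ranking logic stays the same.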

In [5]:
# Keep only the top-ranked match for each extracted term: (name, id, url, context)
best_results = [(q["names"][0], q["ids"][0], q["urls"][0], q["context"]) for q in queries_response_list]
best_results

[('flee',
  'NBO:0020268',
  'http://purl.obolibrary.org/obo/NBO_0020268',
  'Nest exit: The time when the rear of the animal leaves the nest.'),
 ('mouth brooding',
  'NBO:0020102',
  'http://purl.obolibrary.org/obo/NBO_0020102',
  'Nest entry: The moment when the head of the animal enters the nest.'),
 ('agonistic chase',
  'NBO:0020141',
  'http://purl.obolibrary.org/obo/NBO_0020141',
  'Shepherding event: Any event in which the dam chases the virgin towards the nest, with the distance from start to nest being greater than the distance from end to nest.'),
 ('aggressive behavior towards female mice',
  'NBO:0000111',
  'http://purl.obolibrary.org/obo/NBO_0000111',
  'Acquired pup retrieval behavior: When a virgin female mouse retrieves at least once during the four days of observation, indicating the acquisition of pup retrieval behavior.'),
 ('distress signaling',
  'NBO:0020149',
  'http://purl.obolibrary.org/obo/NBO_0020149',
  'Isolation or distress calls: Calls made by the pups
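The `ids` and `urls` in this output are two spellings of the same identifier: OBO ontologies publish each term at a PURL built from its CURIE, so `NBO:0020268` becomes `http://purl.obolibrary.org/obo/NBO_0020268`. A pair of hypothetical helpers (not part of `utils`) can convert between the two, for example to check that every id/url pair in the results agrees.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Expand an OBO CURIE such as 'NBO:0020268' into its OBO PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Collapse an OBO PURL back into its CURIE (inverse of curie_to_purl)."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"Not an OBO PURL: {purl}")
    prefix, local_id = purl[len(OBO_PURL_BASE):].split("_", 1)
    return f"{prefix}:{local_id}"


print(curie_to_purl("NBO:0020268"))  # http://purl.obolibrary.org/obo/NBO_0020268
print(purl_to_curie("http://purl.obolibrary.org/obo/NBO_0000155"))  # NBO:0000155
```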

# Post-processing (optional showcase)
## High-recall retrieval followed by high-precision reranking

A common pattern in information retrieval is to use a high-recall method to retrieve a large set of candidate terms and then re-rank them with a more precise method.

This is a proof of concept of how that could be done, building on the machinery above.

The idea is simple: instead of returning only the best result from the semantic similarity search, we return the top 10 results and then re-rank them with either an LLM or a more traditional NLP method such as BM25 to prune down the results.
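`rerank_with_bm25` is imported from `utils` and its internals are not shown. For intuition, here is a from-scratch sketch of Okapi BM25 reranking under these assumptions: each candidate label is treated as a tiny document, the extracted `Term: definition` string is the query, the IDF is the Lucene-style `log(1 + (N - n + 0.5) / (n + 0.5))` so scores stay non-negative, and `k1 = 1.5`, `b = 0.75` are common default values. The function names and the candidate list are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic tokens only (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_rerank(query: str, candidates: list[str], top: int = 3,
                k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    """Score candidate labels against the query with Okapi BM25; keep the best `top`."""
    if not candidates:
        return []
    docs = [tokenize(name) for name in candidates]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: how many candidate labels contain each token
    doc_freq = Counter(token for doc in docs for token in set(doc))
    query_tokens = tokenize(query)

    scored = []
    for name, doc in zip(candidates, docs):
        term_freq = Counter(doc)
        score = 0.0
        for token in query_tokens:  # repeated query tokens contribute repeatedly
            if token not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[token] + 0.5) / (doc_freq[token] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            tf = term_freq[token]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scored.append((name, score))
    # Stable sort: ties keep the original (semantic-similarity) order
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


context = ("Acquired pup retrieval behavior: When a virgin female mouse retrieves at least once "
           "during the four days of observation, indicating the acquisition of pup retrieval behavior.")
candidates = ["offspring retrieval", "nest building behavior", "agonistic chase", "female courtship behavior"]
for name, score in bm25_rerank(context, candidates, top=3):
    print(f"{score:.3f}  {name}")
```

In this toy run "female courtship behavior" ranks second only because it shares the word "female" with the query: BM25 rewards exact token overlap and nothing else, which is why it works best as a pruning step over candidates already retrieved by semantic similarity.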


In [6]:
# Collect the full top-10 candidate names and URLs for each extracted term, keyed by its "Term: definition" context
neuro_ontology_terms_found = dict()
urls_per_term = dict()

for query_response in queries_response_list:
    context_term = query_response["context"]
    neuro_ontology_terms_found[context_term] = query_response["names"]
    urls_per_term[context_term] = query_response["urls"]

neuro_ontology_terms_found


{'Nest exit: The time when the rear of the animal leaves the nest.': ['flee',
  'flush prey',
  'agonistic chase',
  'chase prey',
  'exclusion',
  'nest building behavior',
  'capturing prey',
  'stalk prey',
  'distress signaling',
  'offspring retrieval'],
 'Nest entry: The moment when the head of the animal enters the nest.': ['mouth brooding',
  'nest building behavior',
  'facilitating oviposition',
  'capturing prey',
  'courtship begging',
  'flush prey',
  'courtship feeding',
  'agonistic chase',
  'clutching reflex',
  'trapping behavior'],
 'Shepherding event: Any event in which the dam chases the virgin towards the nest, with the distance from start to nest being greater than the distance from end to nest.': ['agonistic chase',
  'chase prey',
  'hunting behavior',
  'mate guarding',
  'mating amplexus',
  'courtship feeding',
  'female courtship behavior',
  'sexual harassment',
  'courtship begging',
  'flush prey'],
 'Acquired pup retrieval behavior: When a virgin femal

In [7]:
from utils.behavior_metadata_extraction import rerank_with_open_ai_use_name_matching


ontology_terms = rerank_with_open_ai_use_name_matching(queries_response_list, top=3, verbose=False)
ontology_terms

[{'names': ['flee'],
  'id': ['NBO:0020268'],
  'context': 'Nest exit: The time when the rear of the animal leaves the nest.',
  'url': ['http://purl.obolibrary.org/obo/NBO_0020268']},
 {'names': ['nest building behavior'],
  'id': ['NBO:0000157'],
  'context': 'Nest entry: The moment when the head of the animal enters the nest.',
  'url': ['http://purl.obolibrary.org/obo/NBO_0000157']},
 {'names': ['agonistic chase'],
  'id': ['NBO:0020141'],
  'context': 'Shepherding event: Any event in which the dam chases the virgin towards the nest, with the distance from start to nest being greater than the distance from end to nest.',
  'url': ['http://purl.obolibrary.org/obo/NBO_0020141']},
 {'names': ['offspring retrieval'],
  'id': ['NBO:0000155'],
  'context': 'Acquired pup retrieval behavior: When a virgin female mouse retrieves at least once during the four days of observation, indicating the acquisition of pup retrieval behavior.',
  'url': ['http://purl.obolibrary.org/obo/NBO_0000155']},
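The name of `rerank_with_open_ai_use_name_matching` suggests that the LLM picks among the candidate names and its answer is then matched back to those names; that reading is an assumption, since the function body is not shown. A minimal sketch of the matching half, with a hypothetical prompt builder, uses `difflib.get_close_matches` from the standard library so that small deviations in the reply (bullets, numbering, case, hyphens) still map to a valid NBO label, while lines that match nothing are dropped. The LLM call itself is left out.

```python
import difflib


def build_rerank_prompt(context: str, candidates: list[str], top: int = 3) -> str:
    """Hypothetical prompt asking an LLM to choose the best ontology labels."""
    options = "\n".join(f"- {name}" for name in candidates)
    return (
        f"Behavior description: {context}\n"
        f"Candidate ontology terms:\n{options}\n"
        f"Reply with the {top} best-matching terms, one per line, copied exactly."
    )


def match_reply_to_candidates(reply: str, candidates: list[str], cutoff: float = 0.8) -> list[str]:
    """Map each line of an LLM reply back to a known candidate label.

    Bullets, numbering and case are stripped first; remaining small deviations
    are absorbed by fuzzy matching, and lines that match nothing are dropped.
    """
    by_lowercase = {name.lower(): name for name in candidates}
    matched: list[str] = []
    for line in reply.splitlines():
        cleaned = line.strip().lstrip("-*0123456789.) ").strip().lower()
        if not cleaned:
            continue
        hits = difflib.get_close_matches(cleaned, list(by_lowercase), n=1, cutoff=cutoff)
        if hits and by_lowercase[hits[0]] not in matched:
            matched.append(by_lowercase[hits[0]])
    return matched


candidates = ["offspring retrieval", "nest building behavior", "agonistic chase"]
reply = "1. Offspring retrieval\n2. nest-building behavior\n3. grooming\n"
print(match_reply_to_candidates(reply, candidates))
# ['offspring retrieval', 'nest building behavior']
```

Matching back to the candidate list guarantees that every returned name has a known NBO id and URL, even when the LLM paraphrases or invents a label.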

In [8]:
from utils.behavior_metadata_extraction import rerank_with_bm25

ontology_terms = rerank_with_bm25(queries_response_list, top=3)
ontology_terms

[[('offspring retrieval',
   'NBO:0000155',
   3.120781714439237,
   'http://purl.obolibrary.org/obo/NBO_0000155'),
  ('agonistic chase',
   'NBO:0020141',
   0.6602724432751526,
   'http://purl.obolibrary.org/obo/NBO_0020141'),
  ('exclusion',
   'NBO:0020220',
   0.5159854785515056,
   'http://purl.obolibrary.org/obo/NBO_0020220')],
 [('agonistic chase',
   'NBO:0020141',
   1.827029865134796,
   'http://purl.obolibrary.org/obo/NBO_0020141'),
  ('mouth brooding',
   'NBO:0020102',
   0.9986649842882978,
   'http://purl.obolibrary.org/obo/NBO_0020102'),
  ('facilitating oviposition',
   'NBO:0020048',
   0.9344650924411928,
   'http://purl.obolibrary.org/obo/NBO_0020048')],
 [('mate guarding',
   'NBO:0020054',
   7.073883412703765,
   'http://purl.obolibrary.org/obo/NBO_0020054'),
  ('flush prey',
   'NBO:0020045',
   4.1721630034123365,
   'http://purl.obolibrary.org/obo/NBO_0020045'),
  ('female courtship behavior',
   'NBO:0000638',
   3.957098296745021,
   'http://purl.obolibrary