# Search Terms

This project starts with curated collections of terms, including risk factor terms, covering psychological, social, and criminological factors, and potential association terms. Automated literature collection, using [LISC](https://lisc-tools.github.io/), then gathers information from papers that match those terms.

Current analysis takes two forms:
- `Words` analyses: collect and analyze text data from articles that discuss risk factors for violence and recidivism
    - This approach collects text and metadata from papers and builds data-driven profiles for different risk factors
- `Count` analyses: search for co-occurrences of terms between risk factors and association terms
    - This approach identifies patterns based on how often terms appear together
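As an illustration of how co-occurrence counts can be turned into a single score, the hypothetical `association_index` function below computes a Jaccard-style index: the number of articles mentioning both terms divided by the number mentioning either. This is a sketch of the general idea, not LISC's implementation, and the counts in the example are invented to show the arithmetic; they are not results from this project.

```python
def association_index(n_both: int, n_a: int, n_b: int) -> float:
    """Jaccard-style association between two search terms (hypothetical helper).

    n_both: number of articles matching term A AND term B
    n_a:    number of articles matching term A
    n_b:    number of articles matching term B
    """
    if n_both > min(n_a, n_b):
        raise ValueError("co-occurrence count cannot exceed either term's count")
    # Inclusion-exclusion: articles matching A OR B
    n_either = n_a + n_b - n_both
    if n_either == 0:
        return 0.0
    return n_both / n_either


# Invented counts, for illustration only
print(association_index(n_both=50, n_a=200, n_b=100))
# → 0.2
```

Normalizing by the union, rather than using raw co-occurrence counts, keeps very frequently studied terms from dominating the comparison.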

This notebook introduces the terms that are used in the project.


In [19]:
# Install or upgrade LISC *before* importing it (restart the kernel after any reinstall)
# %pip install --upgrade lisc

import lisc
print(lisc.__version__)



0.3.0

In [20]:
import lisc.io
print(dir(lisc.io))


['SCDB', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'create_file_structure', 'db', 'io', 'load_api_key', 'load_meta_data', 'load_object', 'load_time_results', 'load_txt_file', 'save_meta_data', 'save_object', 'save_time_results', 'utils']


In [21]:
from collections import Counter

# Import Base LISC object to load and check search terms
from lisc.objects.base import Base
from lisc.io import load_txt_file


In [22]:
import seaborn as sns
sns.set_context('talk')

In [23]:
# Import custom project code
import sys
sys.path.append('../code')
from plts import plot_latencies

In [24]:
# Set the location of the terms
term_dir = '../terms/'

In [25]:
# Initialize a Base object to hold and check the risk factor terms
risk_factors = Base()

## Risk Factor Terms

First, we can check the list of search terms used to find articles about risk factors for violence and recidivism.


In [26]:
# Load risk factor terms and their labels from file
risk_factors.add_terms('riskfactors.txt', directory=term_dir)
risk_factors.add_labels('riskfactor_labels.txt', directory=term_dir)
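The term files are plain text. Judging from the `check_terms` output in this notebook, each line of `riskfactors.txt` appears to hold one term with its synonyms separated by commas, and `riskfactor_labels.txt` one label per line in the same order; this layout is an assumption, not confirmed here. The hypothetical helper `parse_terms` below parses text in that layout and pairs each label with its synonyms, which can be a quick sanity check before loading the files with `add_terms`.

```python
from typing import Dict, List


def parse_terms(term_text: str, label_text: str) -> Dict[str, List[str]]:
    """Pair each label with its list of synonyms (hypothetical helper).

    Assumes one term per line with comma-separated synonyms, and one label
    per line in the same order. Blank lines and empty entries are skipped.
    """
    term_lines = [line for line in term_text.splitlines() if line.strip()]
    labels = [line.strip() for line in label_text.splitlines() if line.strip()]
    if len(term_lines) != len(labels):
        raise ValueError(f"{len(term_lines)} term lines but {len(labels)} labels")
    return {
        label: [syn.strip() for syn in line.split(",") if syn.strip()]
        for label, line in zip(labels, term_lines)
    }


terms = "Single-parent family, monoparental family\nTwo-parent family, nuclear family\n"
labels = "Single-parent family\nTwo-parent family\n"
print(parse_terms(terms, labels)["Two-parent family"])
# → ['Two-parent family', 'nuclear family']
```

To check the real files, pass in the contents of `riskfactors.txt` and `riskfactor_labels.txt` read from `term_dir`; a count mismatch raises an error instead of silently misaligning labels.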

In [27]:
# Check the number of risk factor terms
print('Number of risk factor terms: {}'.format(risk_factors.n_terms))

Number of risk factor terms: 62


### Search Term Formatting

In the list below, the left-most entry on each line is the label for that term (the label itself is not necessarily used as a search term), and the entries to the right of the colon are the search terms that were used. Synonyms are separated by commas and were combined in searches with an OR operator.
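To illustrate the OR rule, the hypothetical helper below turns one line of synonyms into a single clause, quoting each synonym so that multi-word phrases are matched as phrases. The exact string LISC builds internally may differ; this only mirrors the combination rule described above.

```python
from typing import List


def make_or_clause(synonyms: List[str]) -> str:
    """Quote each non-empty synonym and join them with OR inside parentheses."""
    quoted = [f'"{syn.strip()}"' for syn in synonyms if syn.strip()]
    return "(" + " OR ".join(quoted) + ")"


print(make_or_clause(["Two-parent family", "nuclear family", "dual-parent family", "intact family"]))
# → ("Two-parent family" OR "nuclear family" OR "dual-parent family" OR "intact family")
```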


In [28]:
# Check list of search terms for the risk factor categories
risk_factors.check_terms()

List of terms used: 

Single-parent family             : Single-parent family, monoparental family, lone-parent family, single-parent household
Two-parent family                : Two-parent family, nuclear family, dual-parent family, intact family
Extended family                  : Extended family, multigenerational family, joint family, kinship network
Divorced parents                 : Divorced parents, separated parents, dissolved marriage, marital dissolution
Stable family                    : Stable family, intact family, cohesive family, supportive family
Parental conflict                : Parental conflict, interparental conflict, marital conflict, parental discord
Lack of supervision              : Lack of supervision, inadequate supervision, parental neglect, insufficient monitoring
Emotional support                : Emotional support, psychological support, emotional assistance, affective support
Family abuse                     : Family abuse, domestic abuse, familial abuse,
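Because synonyms are listed by hand, the same phrase can end up under more than one label: in the output above, "intact family" appears under both "Two-parent family" and "Stable family", so an article using that phrase would match both searches. The sketch below is a hypothetical helper using `collections.Counter` (already imported in this notebook) that flags synonyms used more than once, matching case-insensitively.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_synonyms(terms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {synonym: [labels it appears under]} for synonyms used more than once.

    Matching is case-insensitive. A synonym repeated within a single label is
    also reported, with that label appearing twice in the list.
    """
    counts = Counter(syn.strip().lower() for syns in terms.values() for syn in syns)
    duplicates: Dict[str, List[str]] = {}
    for label, syns in terms.items():
        for syn in syns:
            key = syn.strip().lower()
            if counts[key] > 1:
                duplicates.setdefault(key, []).append(label)
    return duplicates


example = {
    "Two-parent family": ["Two-parent family", "nuclear family", "dual-parent family", "intact family"],
    "Stable family": ["Stable family", "intact family", "cohesive family", "supportive family"],
}
print(find_duplicate_synonyms(example))
# → {'intact family': ['Two-parent family', 'Stable family']}
```

The same check would also flag "reoffending", which is listed under both violent and general recidivism, and the repeated "sexual recidivism" entry in the recidivism-related terms.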

### Risk Factor Exclusion Terms

Some search terms also have meanings unrelated to this project (for example, 'debt' in the context of economic policy rather than criminology). To remove papers that use our terms in these unrelated senses, we define exclusion terms for each risk factor.

These terms are integrated into the overall search query using a NOT operator to filter out articles that contain them.
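A hypothetical sketch of the boolean logic: a term's synonyms are joined with OR, its exclusions are joined with OR, and the two clauses are linked with NOT. This illustrates the query structure only; it is not the exact query string LISC sends to the search database.

```python
from typing import List, Optional


def or_clause(terms: List[str]) -> str:
    """Quote each non-empty term and join them with OR inside parentheses."""
    quoted = [f'"{t.strip()}"' for t in terms if t.strip()]
    return "(" + " OR ".join(quoted) + ")"


def make_query(synonyms: List[str], exclusions: Optional[List[str]] = None) -> str:
    """Build '(syn1 OR syn2 ...) NOT (excl1 OR excl2 ...)'; omit NOT if no exclusions."""
    query = or_clause(synonyms)
    if exclusions and any(e.strip() for e in exclusions):
        query += " NOT " + or_clause(exclusions)
    return query


print(make_query(["Single-parent family", "lone-parent family"], ["demography", "welfare"]))
# → ("Single-parent family" OR "lone-parent family") NOT ("demography" OR "welfare")
```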


In [29]:
# Add exclusion words
risk_factors.add_terms('erps_exclude.txt', term_type='exclusions', directory=term_dir)

In [30]:
# Check the risk factor exclusion terms used
risk_factors.check_terms('exclusions')

List of exclusions used: 

Single-parent family             : demography, socioeconomic, income, poverty, welfare, child support, obesity, language disorders, surgical complications, asthma, diabetes, cardiovascular disease, hypertension, chronic obstructive pulmonary disease, arthritis, cancer, migraine, epilepsy, hypothyroidism, osteoporosis, gastrointestinal disorders, renal disease
Two-parent family                : demography, economic, household income, population studies, welfare, obesity, language disorders, surgical complications, asthma, diabetes, cardiovascular disease, hypertension, chronic obstructive pulmonary disease, arthritis, cancer, migraine, epilepsy, hypothyroidism, osteoporosis, gastrointestinal disorders, renal disease
Extended family                  : demography, genealogy, cultural studies, economic, population, obesity, language disorders, surgical complications, asthma, diabetes, cardiovascular disease, hypertension, chronic obstructive pulmonary disease, ar

## Association Terms

As well as search terms for risk factors, we collected lists of potential association terms. 

Groups of association terms include:
- violence-related terms
- recidivism-related terms


### Violence-Related Terms

First, we curated a list of violence-related association terms to investigate research on violence and its risk factors.


In [31]:
# Load violence-related terms from file
violence_terms = Base()
violence_terms.add_terms('violence.txt', directory=term_dir)

In [32]:
# Check the number of violence-related terms
print('Number of violence-related terms: {}'.format(violence_terms.n_terms))

Number of violence-related terms: 4


In [33]:
# Check the violence-related terms used
violence_terms.check_terms()

List of terms used: 

physical violence       : physical violence, violence, assault, battery, physical aggression
sexual violence         : sexual violence, sexual assault, sexual abuse, rape, molestation
psychological violence  : psychological violence, emotional abuse, psychological abuse, mental abuse, verbal abuse
neglect                 : neglect, abandonment, deprivation, disregard


### Recidivism-Related Terms

Finally, we curated a list of recidivism-related terms to search for research related to repeated offending and criminal relapse.


In [34]:
# Load recidivism-related terms from file
recidivism_terms = Base()
recidivism_terms.add_terms('recidivism.txt', directory=term_dir)

In [35]:
# Check the number of recidivism-related terms
print('Number of recidivism-related terms: {}'.format(recidivism_terms.n_terms))


Number of recidivism-related terms: 3


In [36]:
# Check the recidivism-related terms used
recidivism_terms.check_terms()

List of terms used: 

violent recidivism  : violent recidivism, reoffending, relapse into violence, repeat violence, violent repeat offending
general recidivism  : general recidivism, recidivism, reoffending, repeat offending, criminal relapse
sexual recidivism   : sexual recidivism, sexual recidivism, sexual reoffending, sexual repeat offending, sexual relapse
