In [1]:
# NegativeResultDetector

## SciBERT text classification model for positive and negative results prediction in scientific abstracts
## of clinical psychology and psychotherapy.

# 1. Load Data
## You can load your own data or use our example datasets.
## Your data should be a CSV file with a single column containing the scientific abstracts.
## Make sure your text column is named 'text'; otherwise replace 'text' in the preprocess function
## with the name of your text column.

## We present two options for loading your data:
from datasets import load_dataset

## Option 1: Using a GitHub CSV file
## Insert the GitHub 'raw' URL of the inference dataset
example_url = 'https://github.com/PsyCapsLock/NegativeResultDetector/blob/main/Data/Example_Data/example_df.csv?raw=true'
dataset = load_dataset('csv', data_files={'inference': example_url})

## Option 2: Using a local CSV file
## Note: this reassigns `dataset`, so run only the option you need.
dataset = load_dataset('csv', data_files={'inference': "example_folder/example_df.csv"})
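
## Before loading, it can help to confirm that the CSV header really contains a 'text' column.
## The sketch below is only an illustration: `has_text_column` is a hypothetical helper (not part of
## this repository), and the in-memory sample stands in for your own file.

```python
import csv
import io


def has_text_column(csv_file, column="text"):
    """Return True if the first row (header) of the CSV contains `column`."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows
    return column in [name.strip() for name in header]


# Hypothetical in-memory CSV standing in for a real file
sample = io.StringIO("text\nFirst abstract.\nSecond abstract.\n")
print(has_text_column(sample))
```

## For a file on disk, pass an open handle instead, e.g. open(path, newline='', encoding='utf-8').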


# 2. Preprocessing
## Load tokenizer in uncased settings with scivocab
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')

## Preprocessing function: tokenizes each abstract and truncates or pads it to 512 tokens.
## Make sure your text column is named 'text'; otherwise replace 'text' with the name of your text column.
def preprocess_function(examples):
    return tokenizer(examples["text"],
                     truncation=True,
                     max_length=512,
                     padding='max_length'
                     )

## Apply preprocess_function to the dataset in batches
tokenized_data = dataset.map(preprocess_function, batched=True)

# 3. Load Model
from transformers import Trainer, AutoModelForSequenceClassification
NegativeResultDetector = AutoModelForSequenceClassification.from_pretrained("ClinicalMetaScience/NegativeResultDetector")

## Initialize the trainer with the model and tokenizer
trainer = Trainer(
    model=NegativeResultDetector,
    tokenizer=tokenizer,
)

# 4. Prediction
## Apply NegativeResultDetector for inference
predict_test = trainer.predict(tokenized_data["inference"])

## Get the predicted class
import numpy as np
predict_test_classes = np.argmax(predict_test.predictions, axis=1)

# 5. Interpretation
## Print the predicted class
print(predict_test_classes)

## 1: Positive Results Only --> All results in the abstract are positive
## 0: Mixed and Negative Results --> At least one negative result in the abstract
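
## The class indices can be turned into readable labels and per-class probabilities.
## This is a minimal sketch, assuming predict_test.predictions holds raw logits of shape
## (n_abstracts, 2). The logits below are made up for illustration and are not real model output;
## `LABELS` and `softmax` are names introduced here, not part of the original code.

```python
import numpy as np

# Label names taken from the interpretation comments above
LABELS = {0: "Mixed and Negative Results", 1: "Positive Results Only"}


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for three abstracts (stand-in for predict_test.predictions)
example_logits = np.array([[-1.2, 2.3], [1.7, -0.4], [-0.5, 0.9]])
probs = softmax(example_logits)
classes = np.argmax(example_logits, axis=1)

for cls, p in zip(classes, probs):
    print(f"{LABELS[int(cls)]} (p={p[cls]:.2f})")
```

## With real output, replace example_logits with predict_test.predictions.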

[1 0 1]
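
## For the three example abstracts, the output [1 0 1] can be tallied per class.
## A small sketch using only the standard library; the list below is copied from the printed output.

```python
from collections import Counter

predicted = [1, 0, 1]  # classes printed for the example dataset
counts = Counter(predicted)

print(f"Positive results only: {counts[1]}")
print(f"Mixed or negative results: {counts[0]}")
```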
