This repository contains programs for detecting COVID-19 related tweets and classifying them into scientific claim categories. This work follows the annotation framework described by Hafid et al.
The repository is divided into two directories: one for Llama 2 models and one for GPT models. The Llama 2 directory contains the following prompting techniques:
- Few Shot Prompting
- Few Shot Prompting with Guidelines
- Few Shot Prompting with Guidelines and Emotional Stimuli
- Chain of Thought
- Clue and Reasoning Prompting
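As an illustration of the first two techniques above, the sketch below assembles a few-shot prompt, optionally prefixed with guidelines. The categories, guideline text, and example tweets are placeholders, not the repository's actual annotation scheme:

```python
# Sketch: building a few-shot prompt, optionally with guidelines.
# GUIDELINES and EXAMPLES are illustrative placeholders only.
GUIDELINES = (
    "Classify each tweet into exactly one category. "
    "Base the decision only on the tweet text."
)

EXAMPLES = [
    ("Masks reduce transmission, a new study finds.", "scientific claim"),
    ("Feeling tired of lockdowns today.", "not a claim"),
]

def build_few_shot_prompt(tweet: str, with_guidelines: bool = True) -> str:
    """Concatenate guidelines, labeled examples, and the target tweet."""
    parts = []
    if with_guidelines:
        parts.append(GUIDELINES)
    for text, label in EXAMPLES:
        parts.append(f"Tweet: {text}\nCategory: {label}")
    # The unlabeled target tweet goes last; the model completes the label.
    parts.append(f"Tweet: {tweet}\nCategory:")
    return "\n\n".join(parts)
```

The other techniques extend this pattern: emotional stimuli append a motivating sentence to the guidelines, while chain-of-thought examples include a short reasoning trace before each label.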
Run the following command to install the packages needed to run the standalone Python program for `GPT` models:

```shell
# first, activate your virtual environment
pip install openai pandas scikit-learn
```
Similarly, for Llama 2 models:

```shell
pip install requests together langchain pandas scikit-learn
```
- To run the Python program for GPT models, first set the environment variable named `OPENAI_API_KEY`:

```shell
setx OPENAI_API_KEY "<yourkey>"                        # for Windows
echo "export OPENAI_API_KEY='yourkey'" >> ~/.bashrc    # for Linux
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc     # for macOS
```
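Note that `setx` only affects shells opened afterwards, and the `echo` lines take effect once you open a new terminal or source the file. A small sketch of how the program can read the key and fail fast when it is missing (the helper name is illustrative, not from the repository):

```python
import os

def get_openai_key() -> str:
    """Read the key set in the previous step; fail fast if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the program."
        )
    return key
```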
- The Jupyter Notebook for Llama 2 calls TogetherAI's API, a third-party service that hosts several large language models, including Llama 2, and provides free credits. You can set the TogetherAI API key likewise, named `TOGETHERAI_API_KEY`:

```shell
setx TOGETHERAI_API_KEY "<yourkey>"                        # for Windows
echo "export TOGETHERAI_API_KEY='yourkey'" >> ~/.bashrc    # for Linux
echo "export TOGETHERAI_API_KEY='yourkey'" >> ~/.zshrc     # for mac
```
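A minimal sketch of calling TogetherAI over HTTP with the key set above. The endpoint URL and model name below are assumptions (TogetherAI exposes an OpenAI-compatible chat completions API at the time of writing); check TogetherAI's documentation for the current values:

```python
import os

# Assumed endpoint and model name; verify against TogetherAI's docs.
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"
MODEL = "meta-llama/Llama-2-7b-chat-hf"

def build_request(prompt: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a TogetherAI chat completion call."""
    headers = {"Authorization": f"Bearer {os.environ['TOGETHERAI_API_KEY']}"}
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return headers, payload

# To actually send the request (requires the `requests` package):
# headers, payload = build_request("Classify this tweet: ...")
# response = requests.post(TOGETHER_URL, headers=headers, json=payload, timeout=60)
```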
- To run the Llama 2 notebook on a GPU cluster, such as Colorado State University's Falcon HPC cluster, you need to craft a shell script with the required configuration parameters. The Falcon cluster uses the Slurm scheduler to schedule jobs. Once the job is submitted to the cluster, enable port forwarding to interact with the notebook:

```shell
# Forward the specified port of the HPC cluster to the same port
# of the machine that was used to submit the job
ssh -N -f -R $port:localhost:$port falcon

# Further forward that port from the submitting machine to your local machine
ssh -N -f -L localhost:$port:localhost:$port <username>@<machine_name>.<domain>
```
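A minimal Slurm submission script might look like the following. The partition name, resource limits, and port are assumptions and will differ on your cluster; consult Falcon's documentation for the correct values:

```shell
#!/bin/bash
#SBATCH --job-name=llama2-notebook
#SBATCH --partition=gpu          # assumed partition name; check your cluster
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --time=02:00:00
#SBATCH --output=notebook_%j.log

port=8888                        # must match $port in the ssh commands above
jupyter notebook --no-browser --port=$port
```

Submit it with `sbatch <script>.sh`, then run the two `ssh` commands above with the same port to reach the notebook from your local browser.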
A truncated version of the dataset is available in `.csv` format.