Install PyTorch by following the platform-specific instructions at https://pytorch.org/get-started/locally/, then install the remaining packages:
pip install tqdm
pip install transformers
pip install pandas
pip install numpy
pip install requests
pip install beautifulsoup4
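After installing, a quick sanity check (a minimal sketch) confirms the core packages import correctly and whether a GPU is visible to PyTorch:

```python
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```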
- Amazon Product Dataset: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
- Cosmetics Dataset: https://www.kaggle.com/datasets/kingabzpro/cosmetics-datasets
- Sephora Dataset: https://www.kaggle.com/datasets/raghadalharbi/all-products-available-on-sephora-website
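The Kaggle CSVs can be inspected quickly with pandas. A minimal sketch, where `cosmetics.csv` is a placeholder for whichever file the download actually produces:

```python
import pandas as pd

# "cosmetics.csv" is a placeholder name; substitute the file from the Kaggle download.
df = pd.read_csv("cosmetics.csv")

print(df.shape)             # rows x columns
print(df.columns.tolist())  # available fields (e.g., ingredients, labels)
print(df.head())
```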
- BERT_Domain_Training.ipynb: Domain-adaptive pretraining of BERT with masked language modeling (MLM); a minimal sketch follows this list
- BERT_Classification_Task_Training.ipynb: Fine-tuning of BERT for the classification task after transferring weights from the MLM-pretrained BERT
- BERT_Task_Training.ipynb: Downstream task fine-tuning directly on the MLM BERT (not recommended)
- CheMapBERT.ipynb: BERT fine-tuning with external knowledge incorporated
- External_Knowledge_Retrieval.ipynb: Code for retrieving external knowledge
- Label_Merging_591.xlsx: Label merging for future work (in progress)
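The MLM pretraining step can be reproduced roughly as below. This is a minimal sketch using Hugging Face's Trainer, not the notebook's exact code: the two corpus lines are placeholders for the domain text built from the datasets above, and `mlm_bert` is an assumed output folder name.

```python
import torch
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Placeholder domain corpus: in the notebook this would be built from the
# product/ingredient text in the datasets above.
texts = [
    "aqua glycerin dimethicone phenoxyethanol parfum",
    "butylene glycol niacinamide sodium hyaluronate",
]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encodings = tokenizer(texts, truncation=True, max_length=128)

class MLMDataset(torch.utils.data.Dataset):
    """Wraps tokenized text; the collator handles padding and masking."""
    def __init__(self, enc):
        self.enc = enc
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, idx):
        return {k: torch.tensor(v[idx]) for k, v in self.enc.items()}

# Randomly masks 15% of tokens per batch -- the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

args = TrainingArguments(output_dir="mlm_bert", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=MLMDataset(encodings))
trainer.train()
trainer.save_model("mlm_bert")  # store the weights locally, as in usage step 3
```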
- Download all the .ipynb files from the repository
- Download all the datasets from the links above
- Run BERT_Domain_Training.ipynb to pretrain the model, then save the weights in the same folder
- Run CheMapBERT.ipynb and BERT_Classification_Task_Training.ipynb to fine-tune the pretrained model on the downstream task with and without external knowledge incorporation, respectively (a weight-transfer sketch follows these steps)
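The weight transfer in the last step can be sketched as follows: loading an MLM checkpoint into `BertForSequenceClassification` keeps the pretrained encoder and initializes a fresh classification head. The folder name `mlm_bert` and `num_labels=10` are assumptions, not values from the notebooks.

```python
from transformers import BertForSequenceClassification, BertTokenizerFast

# "mlm_bert" is the assumed folder holding the MLM-pretrained weights from the
# previous sketch; num_labels=10 is a placeholder for the real class count.
# The MLM head is discarded and a new classification head is initialized on
# top of the transferred encoder (transformers logs a warning about newly
# initialized classifier weights, which is expected here).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("mlm_bert", num_labels=10)

# The model is then fine-tuned on labeled data, e.g. with the same Trainer API.
inputs = tokenizer("aqua, glycerin, parfum", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)
```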
- Try faster models with fewer parameters, such as DistilBERT, as well as larger models, such as RoBERTa
- Query to filter the information into various types: narrow down the retrieved external knowledge with a similarity index such as FAISS or Elasticsearch (see the sketch after this list)
- Merge labels so that each class has more samples, and/or introduce more datasets during fine-tuning
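For the FAISS idea, a minimal sketch of similarity-based filtering is shown below. It assumes `faiss-cpu` and `sentence-transformers` as extra dependencies (neither is in the install list above), and the knowledge snippets and query are illustrative placeholders.

```python
# Extra dependencies (not in the install list above):
#   pip install faiss-cpu sentence-transformers
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder external-knowledge snippets; in practice these would come from
# External_Knowledge_Retrieval.ipynb.
snippets = [
    "Glycerin is a humectant that draws moisture into the skin.",
    "Dimethicone is a silicone used as an emollient.",
    "Retinol is a vitamin A derivative used in anti-aging products.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(snippets, normalize_embeddings=True)

# Inner product on L2-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(emb.shape[1])
index.add(np.asarray(emb, dtype="float32"))

query = encoder.encode(["moisturizing ingredient"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {snippets[i]}")
```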