<a href="https://colab.research.google.com/github/SeitaroShinagawa/bibtex-abbreviator/blob/main/bibtex_abbreviator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

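This notebook converts a BibTeX reference list into a shortened form.

- Conferences and journals whose abbreviations are not yet supported must be added as needed.
- Abbreviation can occasionally fail. Always check the shortened file (the author accepts no responsibility for the results).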
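
To illustrate the first point, the sketch below shows how a venue name could be matched against a keyword table and how one more venue would be added. The table has the same shape as `conf_keywords` in the main cell, but `venues`, `match_venue`, and the `"KDD"` entry are hypothetical names introduced only for this example.

```python
# Hypothetical, trimmed-down keyword table in the same shape as conf_keywords.
venues = {
    "ICLR": ["learning representations"],
    "ICML": ["international conference on machine learning"],
}

# Supporting another venue is one more entry (illustrative example).
venues["KDD"] = ["knowledge discovery", "data mining"]


def match_venue(name, table):
    """Return the abbreviation whose keywords all occur in name, else None."""
    lowered = name.lower()
    for abbreviation, keywords in table.items():
        if all(keyword in lowered for keyword in keywords):
            return abbreviation
    return None


print(match_venue("Proceedings of the 38th International Conference on Machine Learning", venues))
print(match_venue("ACM SIGKDD Conference on Knowledge Discovery and Data Mining", venues))
print(match_venue("Nice Journal", venues))
```

A venue is only recognized when every keyword in its list occurs in the name, so short or generic keywords are more likely to produce false matches.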


In [1]:
# install library
!pip install bibtexparser

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
# Example of bibtex file

bibtex = """@ARTICLE{Cesar2013,
  author = {Jean César},
  title = {An amazing title},
  year = {2013},
  volume = {12},
  pages = {12--23},
  journal = {Nice Journal},
  abstract = {This is an abstract. This line should be long enough to test
     multilines...},
  comments = {A comment},
  keywords = {keyword1, keyword2}
}

@inproceedings{DBLP:journals/corr/KingmaW13,
  author    = {Diederik P. Kingma and
               Max Welling},
  editor    = {Yoshua Bengio and
               Yann LeCun},
  title     = {Auto-Encoding Variational Bayes},
  booktitle = {2nd International Conference on Learning Representations, {ICLR} 2014,
               Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings},
  year      = {2014},
  url       = {http://arxiv.org/abs/1312.6114},
  timestamp = {Thu, 04 Apr 2019 13:20:07 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/KingmaW13.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@InProceedings{Plummer_2017_ICCV,
author = {Plummer, Bryan A. and Mallya, Arun and Cervantes, Christopher M. and Hockenmaier, Julia and Lazebnik, Svetlana},
title = {Phrase Localization and Visual Relationship Detection With Comprehensive Image-Language Cues},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}

@article{chen2020scanrefer,
    title={ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language},
    author={Chen, Dave Zhenyu and Chang, Angel X and Nie{\ss}ner, Matthias},
    journal={16th European Conference on Computer Vision (ECCV)},
    year={2020}
}
@ARTICLE{Dancette2021-pq,
  title         = "Beyond {Question-Based} Biases: Assessing Multimodal
                   Shortcut Learning in Visual Question Answering",
  author        = "Dancette, Corentin and Cadene, Remi and Teney, Damien and
                   Cord, Matthieu",
  abstract      = "We introduce an evaluation methodology for visual question
                   answering (VQA) to better diagnose cases of shortcut
                   learning. These cases happen when a model exploits spurious
                   statistical regularities to produce correct answers but does
                   not actually deploy the desired behavior. There is a need to
                   identify possible shortcuts in a dataset and assess their
                   use before deploying a model in the real world. The research
                   community in VQA has focused exclusively on question-based
                   shortcuts, where a model might, for example, answer ``What
                   is the color of the sky'' with ``blue'' by relying mostly on
                   the question-conditional training prior and give little
                   weight to visual evidence. We go a step further and consider
                   multimodal shortcuts that involve both questions and images.
                   We first identify potential shortcuts in the popular VQA v2
                   training set by mining trivial predictive rules such as
                   co-occurrences of words and visual elements. We then create
                   VQA-CE, a new evaluation set made of CounterExamples i.e.
                   questions where the mined rules lead to incorrect answers.
                   We use this new evaluation in a large-scale study of
                   existing models. We demonstrate that even state-of-the-art
                   models perform poorly and that existing techniques to reduce
                   biases are largely ineffective in this context. Our findings
                   suggest that past work on question-based biases in VQA has
                   only addressed one facet of a complex issue. The code for
                   our method is available at
                   https://github.com/cdancette/detect-shortcuts",
  month         =  "apr",
  year          =  2021,
  archivePrefix = "arXiv",
  primaryClass  = "cs.CV",
  eprint        = "2104.03149"
}
"""

with open('bibtex.bib', 'w') as bibfile:
    bibfile.write(bibtex)
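
The main cell shortens long author lists: an entry with three or more authors keeps only the first author followed by `and others`, which standard BibTeX styles render as "et al.". A minimal standalone sketch of that rule follows. `shorten_authors` is a hypothetical helper written for this illustration, not the function used in the notebook; it assumes authors are separated by the word `and` surrounded by whitespace, which also covers author fields wrapped across lines.

```python
import re


def shorten_authors(author_field):
    """Keep only the first author plus 'and others' when there are 3+ authors."""
    # BibTeX separates authors with the word "and"; the field may span lines.
    authors = re.split(r"\s+and\s+", author_field.strip())
    if len(authors) > 2:
        return authors[0] + " and others"
    return author_field


# Three authors: shortened to the first author.
print(shorten_authors("Plummer, Bryan A. and Mallya, Arun and Cervantes, Christopher M."))
# Two authors (wrapped across lines): returned unchanged.
print(shorten_authors("Diederik P. Kingma and\n               Max Welling"))
```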

In [7]:
#@title Replace bibtex.bib with your own file and run (output: new_bibtex.bib) { display-mode: "form" }
bibtex_path = "/content/bibtex.bib" #@param {type:"string"} 

import os
from pprint import pprint

# Keywords for the conference names whose abbreviations are supported
conf_keywords = { "ECCV": ["European", "Computer Vision",],
              "ICCV": ["International Conference on Computer Vision",],
              "AAAI": ["Association for the Advancement of Artificial Intelligence",],
              "CVPR": ["Computer Vision and Pattern Recognition",],
              "ICLR": ["Learning Representations",],
              "NeurIPS": ["Neural", "Systems",],
              "WACV": ["Winter Conference on Applications of Computer Vision",],
              "ACL": ["Association for Computational Linguistics",],
              "EMNLP": ["Empirical Methods in Natural Language Processing",],
              "NAACL": ["North American", "Association for Computational Linguistics",],
              "IJCAI": ["International Joint Conferences on Artificial Intelligence Organization"],
              "ICML": ["International Conference on Machine Learning"],
              "ITCA": ["Information Technology", "Computer Application"],              
              }

# Lowercase all keywords so that matching is case-insensitive
conf_keywords = {key: [word.lower() for word in keywords] for key, keywords in conf_keywords.items()}

print("サポートしている短縮形")
pprint(conf_keywords)

# Function that abbreviates an international conference name
def abbriviate_conference(conference_name, conferences_dict):
    """
    return renamed, is_renamed
    """
    # Check longer abbreviations first so that "NAACL" is not matched as "ACL"
    for abb in sorted(conferences_dict, key=len, reverse=True):
        if abb in conference_name:
            is_renamed = True
            return abb, is_renamed

    # abbreviation not found -> exhaustive keyword search
    # Try entries with more keywords first so that NAACL wins over ACL
    lower_conference_name = conference_name.lower()
    for abb, keywords in sorted(conferences_dict.items(), key=lambda x: len(x[1]), reverse=True):
        if all(keyword in lower_conference_name for keyword in keywords):
            is_renamed = True
            return abb, is_renamed

    is_renamed = False
    return conference_name, is_renamed

#TODO: "NIPS" -> "NeurIPS"

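import re

# Function that shortens the author list (3 or more authors become "first author and others", i.e. "et al.")
def rename_to_etal(authors_name):
    # Split on the word "and" surrounded by whitespace; author fields may wrap across lines
    authors = re.split(r"\s+and\s+", authors_name)
    if len(authors) > 2:
        return authors[0]+" and others"
    else:
        return authors_name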

# Preprocessing: rewrite the month field as a quoted, capitalized abbreviation (e.g. "Apr.")
months = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
if os.path.isfile("cleaned_bib.bib"):
    os.remove("cleaned_bib.bib")
with open(bibtex_path) as bibtex_file:
    lines = []
    for line in bibtex_file:
        line = line.strip()
        # Only rewrite an actual month field, not any line that merely contains "month"
        field, sep, value = line.partition("=")
        if sep and field.strip().lower() == "month":
            month_src = value.lower()
            # "jun" also matches "june"; leave the line untouched if no month is recognized
            matched = next((m for m in months if m in month_src), None)
            if matched is not None:
                month = matched.capitalize()+"."
                line = "= ".join([field, '"%s"' % month])+","
        lines.append(line)
    with open("cleaned_bib.bib", "w") as f:
        for line in lines:
            print(line, file=f)

# Main processing
import bibtexparser
from bibtexparser.bwriter import BibTexWriter
from bibtexparser.bibdatabase import BibDatabase

with open('cleaned_bib.bib') as bibtex_file:
    bib_db = bibtexparser.load(bibtex_file)
# Remove the temporary file only after it has been closed
os.remove("cleaned_bib.bib")

keys = {}
for dickey, dic in bib_db.get_entry_dict().items():
    for key in dic.keys():
        if key not in keys:
            keys[key] = 1
        else:
            keys[key] += 1
lis = [(key,value) for key,value in keys.items()]
lis.sort(key=lambda x:x[1])


for dickey, dic in bib_db.get_entry_dict().items():
    conf_keys = ["journal", "booktitle"]
    if "journal" in dic and "booktitle" in dic:
        # An entry must not define both fields; stop with a descriptive error
        raise ValueError("Both journal and booktitle appeared in %s" % dickey)
    elif "journal" in dic:
        conf_key = "journal"
    elif "booktitle" in dic:
        conf_key = "booktitle"
    else:
        conf_key = None

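    if conf_key is not None:
        abb, is_renamed = abbriviate_conference(dic[conf_key], conf_keywords)
        if is_renamed:
            # Store the short name under booktitle, the field that @inproceedings expects,
            # even when the venue was originally given in the journal field
            del dic[conf_key]
            dic["booktitle"] = "Proc. " + abb #"Proceedings of "
            dic['ENTRYTYPE'] = 'inproceedings'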
    
    if True:
        if 'author' not in dic:
            print('The following entry has no "author" key')
            pprint(dic)
        else:
            dic['author'] = rename_to_etal(dic['author'])

        # Keep the month only for journal articles ('journal' is not a BibTeX entry type)
        if dic['ENTRYTYPE'] != 'article':
            dic['month'] = ''
        
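        # arXiv preprints: cite as "arXiv preprint arXiv:<id>" when the eprint id is available
        if ('archivePrefix' in dic or 'archiveprefix' in dic) and 'eprint' in dic:
            dic["journal"] = "arXiv preprint arXiv:{}".format(dic['eprint'])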

        if dic['ENTRYTYPE'] == 'inproceedings':
            dic['publisher'] = ""

        # Blank out fields that are not needed in a compact reference list
        dic["abstract"] = ""
        dic["keywords"] = ""
        dic["editor"] = ""
        dic["howpublished"] = ""
        dic["note"] = ""
        dic["url"] = ""
        dic["pdf"] = ""
        dic["address"] = ""

writer = BibTexWriter()
with open('new_bibtex.bib', 'w') as bibfile:
    bibfile.write(writer.write(bib_db))


サポートしている短縮形
{'AAAI': ['association for the advancement of artificial intelligence'],
 'ACL': ['association for computational linguistics'],
 'CVPR': ['computer vision and pattern recognition'],
 'ECCV': ['european', 'computer vision'],
 'EMNLP': ['empirical methods in natural language processing'],
 'ICCV': ['international conference on computer vision'],
 'ICLR': ['learning representations'],
 'ICML': ['international conference on machine learning'],
 'IJCAI': ['international joint conferences on artificial intelligence '
           'organization'],
 'ITCA': ['information technology', 'computer application'],
 'NAACL': ['north american', 'association for computational linguistics'],
 'NeurIPS': ['neural', 'systems'],
 'WACV': ['winter conference on applications of computer vision']}
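
Because abbreviation can fail without raising an error, it helps to compare the venue fields before and after a run. The sketch below does this with plain dictionaries standing in for parsed entries. `report_venue_changes` and the sample data are hypothetical and exist only for this illustration.

```python
def report_venue_changes(before, after):
    """List (entry_id, old_venue, new_venue) for entries whose venue changed."""
    changes = []
    for entry_id, old_entry in before.items():
        new_entry = after.get(entry_id, {})
        old_venue = old_entry.get("booktitle") or old_entry.get("journal")
        new_venue = new_entry.get("booktitle") or new_entry.get("journal")
        if old_venue != new_venue:
            changes.append((entry_id, old_venue, new_venue))
    return changes


# Sample data standing in for entries parsed from bibtex.bib and new_bibtex.bib.
before = {
    "Plummer_2017_ICCV": {
        "booktitle": "Proceedings of the IEEE International Conference on Computer Vision (ICCV)"
    },
    "Cesar2013": {"journal": "Nice Journal"},
}
after = {
    "Plummer_2017_ICCV": {"booktitle": "Proc. ICCV"},
    "Cesar2013": {"journal": "Nice Journal"},
}

for entry_id, old, new in report_venue_changes(before, after):
    print(f"{entry_id}: {old!r} -> {new!r}")
```

In the notebook, the two dictionaries could instead be obtained by loading each file with `bibtexparser.load` and calling `get_entry_dict()`, as the main cell does. Entries that do not appear in the report were left unchanged and may need a new keyword entry.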
