# Part 0: Bulk Loading BibTeX data

This part loads data from a `.bib` file into an SQLite database.

Based on the [documentation of the BibTexParser Python package](https://bibtexparser.readthedocs.io/en/master/tutorial.html#step-2-parse-it).

In [1]:
__author__ = "Christine Mendoza"

Before going further, install the package by running the following command in a terminal, then restart the notebook kernel if necessary:

```pip install bibtexparser```

Also edit the file paths below as necessary.

In [2]:
RAW_BIB_DATA: str = "./data/paper-selection-analysis/part-0.bib"
OUTPUT_DB: str = "./data/sqlite/part-0.db"
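
Since both paths are edited by hand, a quick sanity check before parsing can catch a wrong path early. The following is a minimal sketch and not part of the original notebook: `check_paths` is a hypothetical helper that assumes the input file must already exist and that the output directory may still need to be created.

```python
from pathlib import Path


def check_paths(raw_bib: str, output_db: str) -> None:
    """Hypothetical helper: validate the input path and prepare the output directory."""
    bib_path = Path(raw_bib)
    if not bib_path.is_file():
        raise FileNotFoundError(f"BibTeX file not found: {bib_path}")

    # sqlite3 creates the database file itself, but not missing parent directories.
    Path(output_db).parent.mkdir(parents=True, exist_ok=True)
```

Calling `check_paths(RAW_BIB_DATA, OUTPUT_DB)` right after the cell above would raise immediately if the `.bib` path is wrong.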

## Step 0: Parse BibTeX data

Depending on the size of your BibTeX file, this can take a while. (For example, a 7,700-line BibTeX file covering 300 articles took approximately 15 seconds.)

In [4]:
import bibtexparser

# assumes the .bib file is UTF-8 encoded; adjust the encoding if yours is not
with open(RAW_BIB_DATA, encoding="utf-8") as bibtex_file:
    bib_to_dict = bibtexparser.load(bibtex_file)

# Uncomment the following to see entries
# print(bib_to_dict.entries)

## Step 1: Determine columns for part 0 sqlite database

Not all BibTeX entries have the same fields. If you do not know in advance which columns you want in the database, you will have to determine them from the fields present in the BibTeX data.

In [7]:
raw_columns: set[str] = set()

for entry in bib_to_dict.entries:
    # each entry is a dict; collect its field names
    raw_columns.update(entry.keys())

# print result
print(raw_columns)

{'correspondence_address1', 'number', 'doi', 'isbn', 'year', 'abbrev_source_title', 'editor', 'art_number', 'ID', 'pubmed_id', 'volume', 'coden', 'publisher', 'author', 'page_count', 'issn', 'source', 'language', 'author_keywords', 'funding_details', 'keywords', 'url', 'note', 'affiliation', 'pages', 'ENTRYTYPE', 'abstract', 'sponsors', 'references', 'document_type', 'title', 'journal'}
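
When choosing columns, it can help to know how often each field actually occurs, since a field present in only a few entries will be mostly empty in the database. Below is a minimal sketch, assuming the entries are plain dictionaries as in `bib_to_dict.entries`; the `sample_entries` data and the `count_fields` helper are made up for illustration.

```python
from collections import Counter


def count_fields(entries: list[dict]) -> Counter:
    """Hypothetical helper: count how many entries contain each field."""
    counts: Counter = Counter()
    for entry in entries:
        counts.update(entry.keys())
    return counts


# Made-up sample standing in for bib_to_dict.entries
sample_entries = [
    {"ID": "a2020", "ENTRYTYPE": "article", "title": "First", "doi": "10.0000/a"},
    {"ID": "b2021", "ENTRYTYPE": "article", "title": "Second"},
]

field_counts = count_fields(sample_entries)
for field, n in field_counts.most_common():
    print(f"{field}: {n} of {len(sample_entries)} entries")
```

With the real data, `count_fields(bib_to_dict.entries)` would give the same kind of overview for the fields listed above.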


Decide on your columns, then edit `final_columns` below (currently set to the value of `raw_columns`) as necessary.

In [9]:
# copy so that edits to final_columns do not modify raw_columns
final_columns: set[str] = set(raw_columns)

# print result
print(final_columns)
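
Once the columns are fixed, every entry needs a value for each of them before it can be stored as a row. One way to sketch this, assuming missing fields should become `None` (which Python's `sqlite3` module stores as NULL) and fields outside the chosen columns should be dropped; `to_row`, `chosen_columns`, and the sample entry are hypothetical:

```python
def to_row(entry: dict, columns: list[str]) -> tuple:
    """Hypothetical helper: project one entry onto a fixed column order."""
    # dict.get returns None for fields this entry lacks.
    return tuple(entry.get(column) for column in columns)


# Sort once so every row uses the same column order (sets are unordered).
chosen_columns = sorted({"ID", "title", "doi"})

# Made-up entry standing in for one item of bib_to_dict.entries
sample_entry = {"ID": "b2021", "ENTRYTYPE": "article", "title": "Second"}
row = to_row(sample_entry, chosen_columns)
print(chosen_columns, row)
```

Fixing the column order in a sorted list matters because a `set` has no guaranteed iteration order, and each row must line up with the same columns.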

## Step 2: Create sqlite database