# 02-01 : Data Selection

## Data Collection

- **Training Data:** Romans and Galatians (ESV/NLT), chosen because their Pauline authorship is undisputed and their style is consistent.
- **Test Data:** 1-2 Corinthians and 1-2 Timothy (ESV/NLT).

In [1]:
from typing import List, Dict, Any
import json
from IPython.display import display, Markdown

## 1. Configuration

In [2]:
data_path = '../../data'
input_path = f'{data_path}/input/jadenzaleski/BibleTranslations'

esv_file = f'{input_path}/ESV/ESV_bible.json'
nlt_file = f'{input_path}/NLT/NLT_bible.json'

output_path = f'{data_path}/interim'
output_path_books = f'{output_path}/books'
output_path_train = f'{output_path}/train'
output_path_test = f'{output_path}/test'
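The output directories configured above may not exist yet when later cells write to them. A minimal sketch of creating them up front follows; `ensure_dirs` is a hypothetical helper that is not part of the original notebook, and the demo runs against a temporary directory rather than the real `data_path`.

```python
import tempfile
from pathlib import Path


def ensure_dirs(*paths: str) -> None:
    """Create each directory (and any missing parents) if it does not exist."""
    for path in paths:
        Path(path).mkdir(parents=True, exist_ok=True)


# Demo against a throwaway base directory (an assumption for illustration);
# in the notebook the arguments would be output_path_books, output_path_train
# and output_path_test.
base = tempfile.mkdtemp()
demo_dirs = [f'{base}/interim/{name}' for name in ('books', 'train', 'test')]
ensure_dirs(*demo_dirs)
print(all(Path(d).is_dir() for d in demo_dirs))
```

Calling the helper again is harmless because `exist_ok=True` makes directory creation idempotent.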

In [3]:
# the paths to the translations to use
bible_paths = {
    'esv': esv_file,
    'nlt': nlt_file
}

In [4]:
# the selected books for training and test data
selected_books = {
    'train': ['Romans', 'Galatians'],
    'test': ['1 Corinthians', '2 Corinthians', '1 Timothy', '2 Timothy']
}

## 2. Load the Bible Data

In [5]:
def load_bible(bible_name: str, bible_paths: Dict[str, str] = bible_paths) -> Dict[str, Any]:
    '''
    Load a bible from a json file.

    Parameters
    ----------
    bible_name : str
        The name of the bible to load (a key of bible_paths).

    bible_paths : Dict[str, str]
        Mapping from bible name to the path of its json file.

    Returns
    -------
    Dict[str, Any]
        The bible as a dictionary.
    '''
    with open(bible_paths[bible_name], encoding='utf-8') as f:
        return json.load(f)

def load_bibles(bible_paths: Dict[str, str] = bible_paths) -> Dict[str, Dict[str, Any]]:
    '''
    Load all bibles from the given paths.

    Parameters
    ----------
    bible_paths : Dict[str, str]
        Mapping from bible name to the path of its json file.

    Returns
    -------
    Dict[str, Dict[str, Any]]
        Dictionary of bibles, where the keys are the names of the bibles.
    '''
    # pass bible_paths through so that a non-default mapping is honored
    return {bible_name: load_bible(bible_name, bible_paths) for bible_name in bible_paths}


In [6]:
# load each translation of the bible
bibles = load_bibles()
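Before extracting any text, it can help to confirm that every selected book is present in each loaded translation. The sketch below assumes each translation is a dictionary keyed by book name, as the `bible[book]` lookups in this notebook imply. `missing_books` is a hypothetical helper that is not part of the original notebook, and the toy dictionaries stand in for the real JSON files.

```python
from typing import Any, Dict, List


def missing_books(bibles: Dict[str, Dict[str, Any]],
                  selected_books: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per translation, the selected books that are not present."""
    # flatten the train/test lists into one list of wanted book names
    wanted = [book for books in selected_books.values() for book in books]
    return {
        name: [book for book in wanted if book not in bible]
        for name, bible in bibles.items()
    }


# Toy stand-ins for the loaded translations (illustrative data only).
toy_bibles = {
    'esv': {'Romans': {}, 'Galatians': {}},
    'nlt': {'Romans': {}},
}
toy_selection = {'train': ['Romans', 'Galatians'], 'test': []}

report = missing_books(toy_bibles, toy_selection)
print(report)
```

In the notebook, the call would be `missing_books(bibles, selected_books)`; an empty list for every translation means all selected books were found.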

## 3. Verses to Text

In [7]:
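def verses_to_text(bible: Dict[str, Any],
                   book: str,
                   chapter: int,
                   from_verse: int = 1,
                   to_verse: int = None) -> str:
    """
    Concatenate a range of verses from one chapter into a single string.

    Line breaks inside a verse are replaced with two spaces, and each verse
    is followed by a single space, so the result ends with a trailing space.

    Parameters
    ----------
    bible : Dict[str, Any]
        A loaded translation, indexed as bible[book][chapter][verse] with
        string keys for chapter and verse.
    book : str
        The book name, e.g. 'Romans'.
    chapter : int
        The chapter number.
    from_verse : int
        The first verse to include (1-based, inclusive).
    to_verse : int, optional
        The last verse to include (inclusive). Defaults to the number of
        verses in the chapter.

    Returns
    -------
    str
        The concatenated verse text.
    """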
    result = ''

    # get the to verse
    if to_verse is None:
        to_verse = len(bible[book][str(chapter)])

    # concatenate the verses
    for verse in range(from_verse, to_verse + 1):
        result += f'{bible[book][str(chapter)][str(verse)]} '.replace('\n', '  ')

    return result


# test the function
Markdown(verses_to_text(
    bible=bibles['esv'],
    book='Romans',
    chapter=1,
    from_verse=1,
    to_verse=4))

Paul, a servant of Christ Jesus, called to be an apostle, set apart for the gospel of God, which he promised beforehand through his prophets in the holy Scriptures, concerning his Son, who was descended from David according to the flesh and was declared to be the Son of God in power according to the Spirit of holiness by his resurrection from the dead, Jesus Christ our Lord, 
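The lookups in `verses_to_text` assume a nested layout of book, then chapter, then verse, with chapter and verse numbers stored as string keys. The sketch below makes that layout explicit on toy data. `join_verses` is a hypothetical variant, not the notebook's function: it uses `str.join`, so unlike `verses_to_text` it leaves no trailing space, and it replaces line breaks with a single space rather than two.

```python
from typing import Any, Dict, Optional

# Toy data in the assumed layout:
# book -> chapter number (str) -> verse number (str) -> verse text.
toy_bible: Dict[str, Any] = {
    'Romans': {
        '1': {
            '1': 'First verse,',
            '2': 'second verse\nwith a line break,',
            '3': 'third verse.',
        }
    }
}


def join_verses(bible: Dict[str, Any],
                book: str,
                chapter: int,
                from_verse: int = 1,
                to_verse: Optional[int] = None) -> str:
    """Join verses from_verse..to_verse (inclusive) with single spaces."""
    verses = bible[book][str(chapter)]
    # assumes verses are numbered contiguously from 1, as in verses_to_text
    if to_verse is None:
        to_verse = len(verses)
    return ' '.join(
        verses[str(verse)].replace('\n', ' ')
        for verse in range(from_verse, to_verse + 1)
    )


text = join_verses(toy_bible, 'Romans', 1, to_verse=2)
print(text)
```

Building the string with `join` avoids repeated string concatenation in the loop, though for chapter-sized inputs the difference is unlikely to matter.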