# Notebook: Set Up Local-LLM for BERT Training and Inference

### Objective:

1. Locate the following files downloaded from Google Research's GitHub repository:
    - A folder containing the three BERT TensorFlow checkpoint files (e.g., `bert_model.ckpt.data-00000-of-00001`, `bert_model.ckpt.index`, `bert_model.ckpt.meta`).
    - Metadata describing the model architecture (e.g., `bert_config.json`).
    - A .txt file containing the BERT vocabulary (e.g., `vocab.txt`).
2. Convert the TensorFlow checkpoints into a PyTorch-compatible format.
3. Store a .bin file containing the BERT model parameters in that PyTorch-compatible format (e.g., `pytorch_model.bin`).
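
Before running the conversion, it can help to confirm that the files from step 1 are actually in place. The sketch below is one way to do that with the standard library only. `missing_bert_files` and `EXPECTED_FILES` are hypothetical names introduced here, not part of `local_llm`, and the file names are assumed to match the examples listed above.

```python
from pathlib import Path

# Hypothetical helper, not part of local_llm.
# Assumption: file names match the examples given in the objective list.
EXPECTED_FILES = (
    "bert_model.ckpt.data-00000-of-00001",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "bert_config.json",
    "vocab.txt",
)


def missing_bert_files(checkpoint_dir):
    """Return the expected file names that are absent from checkpoint_dir."""
    folder = Path(checkpoint_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty list means every expected file was found; otherwise the returned names show what still needs to be downloaded or moved into the folder.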

In [2]:
import torch
import local_llm as lllm

In [None]:
# OPTION 1 SETUP: convert the original TensorFlow checkpoint files
assets_dir = lllm.setup_bert(
    checkpoints=r"C:/Users/Cameron.Webster/Python/local-llm/assets/uncased_L-12_H-768_A-12",
    vocab=r"C:/Users/Cameron.Webster/Python/local-llm/assets/uncased_L-12_H-768_A-12/vocab.txt",
    config=r"C:/Users/Cameron.Webster/Python/local-llm/assets/uncased_L-12_H-768_A-12/bert_config.json",
    # optional; by default this would become ..\assets\bert-base-local
    output_dir=r"C:/Users/Cameron.Webster/Python/local-llm/assets/bert-base-local",
    overwrite=True
)

# Now assets_dir should contain:
#   pytorch_model.bin
#   config.json
#   vocab.txt


# # OPTION 2 SETUP: reuse an already-converted pytorch_model.bin
# assets_dir = lllm.setup_bert(
#     model_params=r"C:/Users/Cameron.Webster/Python/local-llm/assets/bert-base-local/pytorch_model.bin",
#     vocab=r"C:/Users/Cameron.Webster/Python/local-llm/assets/bert-base-local/vocab.txt",
#     config=r"C:/Users/Cameron.Webster/Python/local-llm/assets/bert-base-local/config.json",
#     # output_dir optional; if omitted, uses the folder containing model_params
# )


[setup] Using TF checkpoint prefix: C:\Users\Cameron.Webster\Python\local-llm\assets\uncased_L-12_H-768_A-12\bert_model.ckpt
[convert] Loaded tensors: 199 | Skipped: 7
[convert] Wrote: C:\Users\Cameron.Webster\Python\local-llm\assets\bert-base-local\pytorch_model.bin
[setup] Copied config → C:\Users\Cameron.Webster\Python\local-llm\assets\bert-base-local\config.json
[setup] Copied vocab → C:\Users\Cameron.Webster\Python\local-llm\assets\bert-base-local\vocab.txt
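
A quick sanity check on the copied assets is to compare the number of lines in `vocab.txt` (one token per line) with the `vocab_size` recorded in `config.json`. The following is a minimal sketch under stated assumptions: `check_vocab_matches_config` is a hypothetical helper, not part of `local_llm`, and it assumes the output folder uses the file names shown in the log above and that the config keeps the `vocab_size` key from Google's `bert_config.json`.

```python
import json
from pathlib import Path


def check_vocab_matches_config(assets_dir):
    """Compare the vocab.txt line count with vocab_size in config.json.

    Hypothetical helper, not part of local_llm. Returns a tuple
    (tokens_in_vocab_file, vocab_size_in_config, sizes_match).
    """
    folder = Path(assets_dir)
    with open(folder / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    with open(folder / "vocab.txt", encoding="utf-8") as f:
        # vocab.txt stores one token per line, so the line count is the vocabulary size.
        n_tokens = sum(1 for _ in f)
    vocab_size = config["vocab_size"]  # assumption: key name as in Google's bert_config.json
    return n_tokens, vocab_size, n_tokens == vocab_size
```

Calling `check_vocab_matches_config(assets_dir)` after setup returns both counts, so a mismatch (for example, a vocabulary file paired with the wrong model's config) is visible before training or inference starts.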


In [None]:
assets_dir = lllm.setup_bert(
    checkpoints=r"C:/Users/Cameron.Webster/Python/local-llm/assets/wwm_uncased_L-24_H-1024_A-16",
    vocab=r"C:/Users/Cameron.Webster/Python/local-llm/assets/wwm_uncased_L-24_H-1024_A-16/vocab.txt",
    config=r"C:/Users/Cameron.Webster/Python/local-llm/assets/wwm_uncased_L-24_H-1024_A-16/bert_config.json",
    # optional; by default this would become ..\assets\bert-base-local
    output_dir=r"C:/Users/Cameron.Webster/Python/local-llm/assets/bert-large-local",
    overwrite=True
)

In [None]:
import requests

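def check_internet_requests(url="http://www.google.com", timeout=5):
    """Return True if an HTTP GET to `url` completes within `timeout` seconds."""
    try:
        requests.get(url, timeout=timeout)
        return True
    except requests.RequestException:
        # Base class for requests errors; includes ConnectionError and Timeout.
        return False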

if check_internet_requests():
    print("Internet connection available using requests.")
else:
    print("No internet connection using requests.")