# Setup Verification Notebook

This notebook verifies that Jupyter is working correctly with our environment.

In [None]:
import sys
import pandas as pd
import numpy as np
import yaml

print(f"Python version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"Numpy version: {np.__version__}")

# Test notebook display
pd.DataFrame({'test': [1, 2, 3], 'values': [

In [None]:
# IsisCB JSON-LD Conversion Project Setup Documentation

## Overview
This document describes the development environment and project structure for the IsisCB JSON-LD conversion project. The project converts IsisCB's bibliographic and authority data to JSON-LD while preserving the bibliography's specialized focus on the history of science, technology, and medicine.

## Prerequisites
- macOS
- Anaconda installed
- Visual Studio Code
- Git
- GitHub account

## Development Environment Setup

### 1. Python Environment
The project uses a Conda virtual environment with Python 3.10:
```bash
# Create conda environment
conda create -n isiscb-env python=3.10

# Activate environment
conda activate isiscb-env

# Install required packages
conda install pandas numpy pyyaml
# Note: rdflib 6.0+ includes JSON-LD support; rdflib-jsonld is only needed with older rdflib
conda install -c conda-forge rdflib rdflib-jsonld
conda install pytest jupyter
pip install json-schema-validator
```

### 2. Project Structure
```
isiscb-jsonld-conversion/
├── config.yml              # Project configuration
├── requirements.txt        # Python dependencies
├── src/                    # Source code
│   ├── converters/         # Conversion logic
│   ├── schemas/            # JSON-LD schemas
│   ├── utils/              # Utility functions
│   └── validators/         # Data validation
├── tests/                  # Test files
│   ├── test_converters/
│   └── test_validators/
├── docs/                   # Documentation
│   └── schemas/            # Schema documentation
├── data/                   # Data files
│   ├── raw/                # Original CSV files
│   └── processed/          # Converted JSON-LD files
└── notebooks/              # Jupyter notebooks
    └── exploration/        # Data exploration notebooks
```
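
The layout above could be created in one step with a short script. This is a minimal sketch, not part of the project itself: the `scaffold` function and the `PROJECT_DIRS` list are hypothetical names, and only the directories shown in the tree are created (`config.yml` and `requirements.txt` are left to be written by hand).

```python
from pathlib import Path

# Directories taken from the project tree above.
PROJECT_DIRS = [
    "src/converters",
    "src/schemas",
    "src/utils",
    "src/validators",
    "tests/test_converters",
    "tests/test_validators",
    "docs/schemas",
    "data/raw",
    "data/processed",
    "notebooks/exploration",
]


def scaffold(root):
    """Create the project directories under root and return them as sorted POSIX paths."""
    root = Path(root)
    for rel in PROJECT_DIRS:
        # parents=True also creates src/, tests/, docs/, data/, notebooks/
        (root / rel).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )
```

Calling `scaffold("isiscb-jsonld-conversion")` is safe to repeat, since `exist_ok=True` leaves existing directories untouched.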

### 3. Configuration Files

#### config.yml
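The configuration file names the active environment and defines the data paths for each one. The `${...}` placeholders under `production` stand for environment variables; YAML does not expand them itself, so the code that loads the file has to substitute them: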
```yaml
environment:
  current: development

paths:
  development:
    raw: data/raw/samples/
    processed: data/processed/
    schemas: src/schemas/
  
  production:
    raw: ${ISISCB_RAW_DATA_PATH}
    processed: ${ISISCB_PROCESSED_DATA_PATH}
    archive: ${ISISCB_ARCHIVE_PATH}
```
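
One way to resolve the `${...}` placeholders is to expand them after the file has been parsed. The sketch below assumes the structure of `config.yml` shown above; `expand_env` and `active_paths` are hypothetical helper names, not existing project code.

```python
import os


def expand_env(node):
    """Recursively expand ${VAR} references in the strings of a parsed YAML tree."""
    if isinstance(node, dict):
        return {key: expand_env(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env(item) for item in node]
    if isinstance(node, str):
        # Unset variables are left as-is, e.g. "${ISISCB_RAW_DATA_PATH}"
        return os.path.expandvars(node)
    return node


def active_paths(config):
    """Return the path mapping for the environment named in environment.current."""
    current = config["environment"]["current"]
    return expand_env(config["paths"][current])


# Usage (assumes PyYAML and a config.yml in the working directory):
#   import yaml
#   with open("config.yml") as fh:
#       paths = active_paths(yaml.safe_load(fh))
```

Because `os.path.expandvars` leaves unset variables untouched, a path that still contains `${` after loading indicates a missing environment variable.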

#### VS Code Workspace
The workspace configuration (`isiscb-jsonld.code-workspace`) sets the Python interpreter, the type-checking mode, and format-on-save:
```json
{
    "folders": [
        {
            "path": "."
        }
    ],
    "settings": {
        "python.defaultInterpreterPath": "~/anaconda3/envs/isiscb-env/bin/python",
        "python.analysis.typeCheckingMode": "basic",
        "editor.formatOnSave": true
    }
}
```

## Data Management

### Sample Data
Development uses a subset of data stored in the repository:
- `data/raw/samples/` contains representative CSV files
- Sample data includes different record types and edge cases

### Production Data
Production data is stored externally and configured through environment variables:
```bash
export ISISCB_RAW_DATA_PATH="/path/to/your/csv/files"
export ISISCB_PROCESSED_DATA_PATH="/path/to/store/jsonld/output"
export ISISCB_ARCHIVE_PATH="/path/to/archive/original/files"
```
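
Before a production run it can help to fail fast when one of these variables is missing. A minimal sketch, using only the three variable names listed above; `missing_env_vars` is a hypothetical helper:

```python
import os

# The three variables exported above.
REQUIRED_VARS = (
    "ISISCB_RAW_DATA_PATH",
    "ISISCB_PROCESSED_DATA_PATH",
    "ISISCB_ARCHIVE_PATH",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A script could call `missing_env_vars()` at startup and exit with a message listing the returned names if the list is not empty.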



## Best Practices

1. **Version Control**
   - Commit changes regularly
   - Don't commit large data files
   - Use meaningful commit messages

2. **Data Handling**
   - Keep sample data in the repository
   - Store large datasets externally
   - Use batch processing for large files

3. **Documentation**
   - Document code with docstrings
   - Maintain up-to-date setup instructions
   - Document data schemas and formats
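
The batch-processing advice under Data Handling could look like the sketch below. It uses the standard library's `csv` module so that it has no dependencies; `pandas.read_csv` with its `chunksize` parameter offers the same pattern for DataFrames. The function name `iter_batches` and the default batch size are assumptions.

```python
import csv
from itertools import islice


def iter_batches(csv_path, batch_size=1000):
    """Yield lists of row dicts from a CSV file, at most batch_size rows each."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        while True:
            # islice pulls the next batch_size rows without loading the whole file
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch
```

Only one batch is held in memory at a time, so the file size is bounded by disk rather than RAM.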


## Next Steps

1. **Data Exploration**
   - Use Jupyter notebooks for data analysis
   - Document data patterns and edge cases
   - Plan conversion strategy

2. **Schema Development**
   - Define JSON-LD contexts
   - Document schema decisions
   - Create validation rules

3. **Conversion Implementation**
   - Develop conversion logic
   - Implement batch processing
   - Add error handling and logging
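
As a starting point for the schema and conversion steps, the sketch below maps one CSV row to a JSON-LD document and logs rows it cannot convert. Everything specific here is a placeholder: the column names (`record_id`, `title`, `date`), the use of Dublin Core terms in the context, and the `https://example.org/isiscb/` base IRI are assumptions for illustration, not decisions the project has made.

```python
import logging

logger = logging.getLogger("isiscb.convert")

# Hypothetical context; the real vocabulary is still to be defined.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "title": "dcterms:title",
    "date": "dcterms:date",
}


def record_to_jsonld(row, base_iri="https://example.org/isiscb/"):
    """Convert one row dict to a JSON-LD dict; raise ValueError if it has no ID."""
    record_id = row.get("record_id")
    if not record_id:
        raise ValueError("row has no record_id")
    doc = {"@context": CONTEXT, "@id": base_iri + record_id}
    for field in ("title", "date"):
        if row.get(field):  # skip empty or missing values
            doc[field] = row[field]
    return doc


def convert_rows(rows):
    """Convert rows, logging and counting failures instead of stopping."""
    converted, failed = [], 0
    for index, row in enumerate(rows):
        try:
            converted.append(record_to_jsonld(row))
        except ValueError as exc:
            failed += 1
            logger.warning("Skipping row %d: %s", index, exc)
    return converted, failed
```

Each returned dict can be written out with `json.dump`. Returning the failure count lets a batch run report how many rows were skipped without aborting on the first bad record.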