# Metadata Validator
This Jupyter notebook is meant to facilitate development of a Gen3 metadata validation script.

***

### General Idea
1. Load metadata into a Python object
    - A class for loading and storing metadata
    - Takes an input folder and reads the `.json`, `_*.json`, and `dataImportOrder.txt` files into an accessible object
1. Load the schema into a Python object
    - A class that loads the bundled JSON and splits it into the individual YAML schemas it contains, each accessible on its own
1. 
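
A minimal sketch of the metadata-loading step, using only the standard library. The class name `MetadataStore`, its attributes, and the choice to key each file by its stem are assumptions made for illustration, not part of an existing design.

```python
import json
from pathlib import Path


class MetadataStore:
    """Hypothetical container for the metadata files found in one input folder."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.nodes = {}         # file stem -> parsed JSON content
        self.import_order = []  # entries listed in dataImportOrder.txt

    def load(self):
        # Read every *.json file; the file stem is used as the node name.
        for path in sorted(self.folder.glob("*.json")):
            with path.open() as fh:
                self.nodes[path.stem] = json.load(fh)
        # The import-order file is treated as optional in this sketch.
        order_file = self.folder / "dataImportOrder.txt"
        if order_file.exists():
            lines = order_file.read_text().splitlines()
            self.import_order = [ln.strip() for ln in lines if ln.strip()]
        return self
```

Usage would look like `store = MetadataStore("path/to/input").load()`, after which `store.nodes` and `store.import_order` hold the loaded content.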

***

# Perplexity help
- [link to chat](https://www.perplexity.ai/search/lets-say-I-erdZUVAOQ_SgDnHh_3meWA)


To handle a scenario where your `bundled.json` file contains a `_definitions.yaml` file, and some of the YAML schemas in the `bundled.json` link to `_definitions.yaml` for common definitions, you need to ensure that your JSON Schema validator can resolve these references correctly. Here’s how you can achieve this using Python:

### Steps to Validate `metadata.json` Against a Schema with External Definitions

1. **Load and Parse YAML Files**: Load and parse the `_definitions.yaml` and other YAML schemas from the `bundled.json` file.
2. **Resolve References**: Ensure that references to definitions in `_definitions.yaml` are correctly resolved.
3. **Validate the JSON Data**: Use a JSON Schema validator to validate the `metadata.json` file against the resolved schema.
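
The first step can be sketched as a small helper that separates shared files from node schemas. The function name `split_bundle` and the rule that shared files start with an underscore are assumptions; the helper also accepts entries that are already parsed objects, since the bundle may not store raw YAML text.

```python
def split_bundle(bundle):
    """Split a bundled schema dict into (shared_files, node_schemas)."""
    parsed = {}
    for name, value in bundle.items():
        if isinstance(value, str):
            # Assumption: string entries hold raw YAML text.
            import yaml  # PyYAML, only needed for string entries

            value = yaml.safe_load(value)
        parsed[name] = value
    # Assumption: names starting with "_" (e.g. "_definitions.yaml") are shared files.
    shared = {k: v for k, v in parsed.items() if k.startswith("_")}
    nodes = {k: v for k, v in parsed.items() if not k.startswith("_")}
    return shared, nodes
```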

### Example Using Python

Here’s a step-by-step guide using Python, `jsonschema`, and `pyyaml` libraries:

1. **Install Required Libraries**:
   ```bash
   pip install jsonschema pyyaml
   ```

2. **Load and Parse YAML Files**:
   ```python
   import yaml
   import json
   from jsonschema import validate, RefResolver, ValidationError

   # Load the bundled JSON file containing multiple YAML schemas
   with open('bundled.json', 'r') as bundled_file:
       bundled_schemas = json.load(bundled_file)

   # Extract and parse the _definitions.yaml file.
   # This assumes each bundle entry is raw YAML text; if the bundle already
   # holds parsed objects, use the entry directly and skip yaml.safe_load.
   definitions_yaml = bundled_schemas['_definitions.yaml']
   definitions_schema = yaml.safe_load(definitions_yaml)

   # Extract and parse the specific schema that references _definitions.yaml
   specific_schema_yaml = bundled_schemas['specific_schema_key']  # Replace with the actual key
   specific_schema = yaml.safe_load(specific_schema_yaml)
   ```

3. **Resolve References**:
   ```python
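   # Create a resolver that serves the parsed definitions for "_definitions.yaml" refs.
   # Note: RefResolver is deprecated in jsonschema >= 4.18 in favor of the `referencing` library.
   class CustomRefResolver(RefResolver):
       def resolve_remote(self, uri):
           # The URI arrives without its "#/..." fragment, so match on the file name.
           if uri.endswith('_definitions.yaml'):
               return definitions_schema
           return super().resolve_remote(uri)

   resolver = CustomRefResolver.from_schema(specific_schema)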
   ```

4. **Validate the JSON Data**:
   ```python
   # Load the metadata JSON file
   with open('metadata.json', 'r') as json_file:
       metadata = json.load(json_file)

   # Validate the metadata against the specific schema with resolved references
   try:
       validate(instance=metadata, schema=specific_schema, resolver=resolver)
       print("Validation successful!")
   except ValidationError as e:
       print(f"Validation error: {e.message}")
   ```
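
As an alternative to a custom resolver, the references can be replaced inline before validation. The following is a rough standard-library sketch, not the `jsonschema` library's own mechanism. The helper names `resolve_pointer` and `inline_refs` are hypothetical, and the sketch assumes references look like `_definitions.yaml#/some/key`. It does not handle JSON Pointer escapes, local `#/...` references, or definitions that refer to themselves.

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a simple slash-separated fragment such as '/uuid' or '/a/b'."""
    node = document
    for part in pointer.strip("/").split("/"):
        if part:
            node = node[part]
    return node


def inline_refs(schema, definitions, filename="_definitions.yaml"):
    """Return a copy of schema with '$ref's into `filename` replaced inline."""
    if isinstance(schema, list):
        return [inline_refs(item, definitions, filename) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    if isinstance(ref, str) and ref.startswith(filename + "#"):
        target = resolve_pointer(definitions, ref.split("#", 1)[1])
        # Recurse in case the definition itself points back into the file.
        return inline_refs(copy.deepcopy(target), definitions, filename)
    return {k: inline_refs(v, definitions, filename) for k, v in schema.items()}
```

The result is a schema with no references into `_definitions.yaml`, which could then be passed to `validate` without a resolver, provided no other external references remain.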

### Detailed Explanation

1. **Load and Parse YAML Files**:
   - Load the `bundled.json` file, which contains multiple YAML schemas.
   - Extract and parse the `_definitions.yaml` file and the specific schema that references it using `yaml.safe_load`.

2. **Resolve References**:
   - Create a custom `RefResolver` class that overrides the `resolve_remote` method to return the parsed `_definitions.yaml` schema when the reference URI matches.
   - Instantiate the custom resolver with the specific schema.

3. **Validate the JSON Data**:
   - Load the `metadata.json` file.
   - Use the `validate` function from the `jsonschema` library to validate the `metadata.json` data against the specific schema, using the custom resolver to handle references.
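
A custom resolver only helps if the URI it compares against matches what the schema actually references. One quick check, sketched here with a hypothetical helper named `external_ref_files`, is to list the file part of every non-local `$ref` in a schema before wiring up the resolver.

```python
def external_ref_files(schema):
    """Collect the file part of every non-local '$ref' found in schema."""
    found = set()
    stack = [schema]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            # Local references start with "#" and need no external file.
            if isinstance(ref, str) and not ref.startswith("#"):
                found.add(ref.split("#", 1)[0])
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return found
```

If the returned set contains names other than `_definitions.yaml`, the resolver would need to serve those files as well.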

### Additional Resources

- **GitHub - Schema Validator**: Provides a utility for validating YAML/JSON files against predefined schemas, including handling nested properties and command-line usage [1].
- **JSON Schema - Getting Started**: Offers a comprehensive guide on creating and using JSON Schema, including defining properties, nesting data structures, and validating JSON data [2].
- **MuleSoft Documentation**: Describes how to use the JSON Schema validator to evaluate JSON payloads at runtime, supporting both local and external schemas [4].
- **Python JSON Schema Documentation**: Explains how to use the `jsonschema` library in Python to validate JSON documents, including handling references and custom resolvers [10][11][14].

By following these steps, you can ensure that your `metadata.json` file is validated against the correct schema, even if it relies on external definitions in `_definitions.yaml`. This approach leverages Python's `jsonschema` library and custom reference resolution to handle complex schema validation scenarios.
