Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions api-reference/how-to/generate-schema.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
---
title: Generate a JSON schema for a file
---

## Task

You want to generate a schema for a JSON file that Unstructured produces, so that you can validate, test,
and document related JSON files across your systems.

## Approach

Use a Python package such as [genson](https://pypi.org/project/genson/) to generate schemas for your
JSON files.

<Info>The `genson` package is not owned or supported by Unstructured. For questions and
requests, see the [Issues](https://github.com/wolverdude/genson/issues) tab of the
`genson` repository in GitHub.</Info>

## Generate a schema from the terminal

<Steps>
<Step title="Install genson">
Use [pip](https://pip.pypa.io/en/stable/installation/) to install the [genson](https://pypi.org/project/genson/) package.

```bash
pip install genson
```
</Step>
<Step title="Install jq">
By default, `genson` generates the JSON schema as a single string without any line breaks or indented whitespace.

To pretty-print the schema that `genson` produces, install the [jq](https://jqlang.github.io/jq/) utility.

<Info>The `jq` utility is not owned or supported by Unstructured. For questions and
requests, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
`jq` repository in GitHub.</Info>
</Step>
<Step title="Generate the schema">
1. Run the `genson` command, specifying the path to the input (source) JSON file, and the path to
the output (target) JSON schema file to be generated. Use `jq` to pretty-print the schema's content
into the file to be generated.

```bash
genson "/path/to/input/file.json" | jq '.' > "/path/to/output/schema.json"
```

2. You can find the generated JSON schema file in the output path that you specified.
</Step>
</Steps>

## Generate a schema from Python code

<Steps>
<Step title="Install dependencies">
In your Python project, install the [genson](https://pypi.org/project/genson/) package.

```bash
pip install genson
```
</Step>
<Step title="Add and run the schema generation code">
1. Set the following local environment variables:

- Set `LOCAL_FILE_INPUT_PATH` to the local path to the input (source) JSON file.
- Set `LOCAL_FILE_OUTPUT_PATH` to the local path to the output (target) JSON schema file to be generated.

2. Add the following Python code file to your project:

```python
import os, json
from genson import SchemaBuilder

def json_schema_from_file(
input_file_path: str,
output_schema_path: str
) -> None:
try:
with open(input_file_path, "r") as file:
json_data = json.load(file)

builder = SchemaBuilder()
builder.add_object(json_data)

schema = builder.to_schema()

try:
with open(output_schema_path, "w") as schema_file:
json.dump(schema, schema_file, indent=2)
except IOError as e:
raise IOError(f"Error writing to output file: {e}")

print(f"JSON schema successfully generated and saved to '{output_schema_path}'.")
except FileNotFoundError:
print(f"Error: Input file '{input_file_path}' not found.")
except IOError as e:
print(f"I/O error occurred: {e}")
except Exception as e:
print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
json_schema_from_file(
input_file_path=os.getenv("LOCAL_FILE_INPUT_PATH"),
output_schema_path=os.getenv("LOCAL_FILE_OUTPUT_PATH")
)
```

3. Run the Python code file.
4. Check the path specified by `LOCAL_FILE_OUTPUT_PATH` for the generated JSON schema file.
</Step>
</Steps>

3 changes: 2 additions & 1 deletion mint.json
Original file line number Diff line number Diff line change
Expand Up @@ -379,7 +379,8 @@
"api-reference/how-to/powerpoint",
"api-reference/how-to/use-langchain-ollama",
"api-reference/how-to/use-langchain-llama-3",
"api-reference/how-to/transform-schemas"
"api-reference/how-to/transform-schemas",
"api-reference/how-to/generate-schema"
]
},
{
Expand Down