## What is RecursiveJsonSplitter?
RecursiveJsonSplitter is a text splitter from `langchain_text_splitters` that breaks large JSON documents into smaller, self-contained chunks.

Instead of cutting the raw text at arbitrary positions, it preserves the JSON structure (keys, nested objects, arrays): every chunk is itself a valid JSON object that keeps the nested keys leading to its values, sized to fit within your model's limits.
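To make the idea concrete, here is a minimal sketch of recursive, structure-preserving splitting in plain Python. It is not the library's code: `json_size`, `set_nested`, `split_recursive`, and the `spec` sample are hypothetical names for illustration, and it assumes chunk size is measured as the length of the serialized JSON.

```python
import json


def json_size(data):
    """Size of a value, measured as the length of its serialized JSON."""
    return len(json.dumps(data))


def set_nested(target, path, value):
    """Store value in target under the nested key path, creating dicts as needed."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_recursive(data, max_size, path=(), chunks=None):
    """Depth-first split of a nested dict into chunks of roughly max_size characters."""
    if chunks is None:
        chunks = [{}]
    if not isinstance(data, dict) or not data:
        # Leaf value (string, number, list, empty dict): stored whole, never cut.
        if path:
            set_nested(chunks[-1], path, data)
        return chunks
    for key, value in data.items():
        new_path = (*path, key)
        remaining = max_size - json_size(chunks[-1])
        if json_size({key: value}) < remaining:
            # The whole subtree fits: keep it together under its full key path.
            set_nested(chunks[-1], new_path, value)
        else:
            # Too big: start a fresh chunk (if needed) and descend into the subtree.
            if chunks[-1]:
                chunks.append({})
            split_recursive(value, max_size, new_path, chunks)
    return chunks


# Hypothetical miniature spec, loosely shaped like an OpenAPI document.
spec = {
    "openapi": "3.1.0",
    "info": {"title": "Demo", "version": "0.1.0"},
    "paths": {
        "/a": {"get": {"summary": "Read A", "description": "Get item A."}},
        "/b": {"get": {"summary": "Read B", "description": "Get item B."}},
    },
}

chunks = split_recursive(spec, max_size=120)
for chunk in chunks:
    print(json_size(chunk), chunk)
```

Each chunk repeats the parent keys (`paths`, then the route) of the values it holds, which is why a chunk stays meaningful when read on its own.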



In [2]:
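import json
import requests

# Download the LangSmith OpenAPI spec and parse the response into a Python dict
json_data = requests.get('https://api.smith.langchain.com/openapi.json').json()

# json_data  # uncomment to inspect the full spec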

In [3]:
from langchain_text_splitters import RecursiveJsonSplitter

# max_chunk_size caps the size of each chunk
json_splitter = RecursiveJsonSplitter(max_chunk_size=300)

# split_json returns a list of dicts, one per chunk
json_chunks = json_splitter.split_json(json_data)

print(f"Number of chunks: {len(json_chunks)}")


Number of chunks: 2716
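A quick sanity check after splitting is to confirm that the chunks respect the limit. The sketch below assumes size is measured as the character length of `json.dumps(chunk)`; `oversized` and `sample_chunks` are hypothetical names, with `sample_chunks` standing in for `json_chunks`.

```python
import json


def oversized(chunks, max_chunk_size):
    """Return (index, size) for every chunk whose serialized length exceeds the limit."""
    sizes = (len(json.dumps(chunk)) for chunk in chunks)
    return [(i, size) for i, size in enumerate(sizes) if size > max_chunk_size]


# Hypothetical stand-in for json_chunks; in the notebook, pass json_chunks itself.
sample_chunks = [
    {"openapi": "3.1.0", "info": {"title": "LangSmith", "version": "0.1.0"}},
    {"paths": {"/x": {"get": {"description": "y" * 400}}}},
]

too_big = oversized(sample_chunks, max_chunk_size=300)
print(too_big)
```

A non-empty result is not necessarily a bug: a single long leaf value, such as a lengthy description string, cannot be divided along JSON structure, so a chunk holding it can end up larger than the limit.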


In [5]:
json_chunks[0].items()

dict_items([('openapi', '3.1.0'), ('info', {'title': 'LangSmith', 'version': '0.1.0'}), ('paths', {'/api/v1/sessions/{session_id}': {'get': {'tags': ['tracer-sessions'], 'summary': 'Read Tracer Session', 'description': 'Get a specific session.'}}})])

In [6]:
type(json_chunks[0])

dict

In [7]:
json_chunks[0].keys()

dict_keys(['openapi', 'info', 'paths'])

In [8]:
json_chunks[0].values()

dict_values(['3.1.0', {'title': 'LangSmith', 'version': '0.1.0'}, {'/api/v1/sessions/{session_id}': {'get': {'tags': ['tracer-sessions'], 'summary': 'Read Tracer Session', 'description': 'Get a specific session.'}}}])
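Because each chunk is a nested dict, `keys()` and `values()` only show the top level. A small recursive walk, sketched here with a hypothetical helper `iter_leaf_paths`, lists the full key path of every leaf value in a chunk. The `first_chunk` literal is copied from the `json_chunks[0]` output above.

```python
def iter_leaf_paths(node, path=()):
    """Yield (key_path, value) for every non-dict value nested inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaf_paths(value, (*path, key))
    else:
        yield path, node


# Copied from the json_chunks[0] output shown in this notebook.
first_chunk = {
    "openapi": "3.1.0",
    "info": {"title": "LangSmith", "version": "0.1.0"},
    "paths": {
        "/api/v1/sessions/{session_id}": {
            "get": {
                "tags": ["tracer-sessions"],
                "summary": "Read Tracer Session",
                "description": "Get a specific session.",
            }
        }
    },
}

leaves = list(iter_leaf_paths(first_chunk))
for path, value in leaves:
    print(" > ".join(path), "=", value)
```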

## Converting to documents

In [11]:
# create_documents takes a list of JSON objects and returns Document objects
docs = json_splitter.create_documents([json_data])

docs[0].page_content

'{"openapi": "3.1.0", "info": {"title": "LangSmith", "version": "0.1.0"}, "paths": {"/api/v1/sessions/{session_id}": {"get": {"tags": ["tracer-sessions"], "summary": "Read Tracer Session", "description": "Get a specific session."}}}}'
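The `page_content` of each document is a JSON string, so it can be parsed back into a dict with the standard library when you need structured access again. A minimal sketch, using the string printed above:

```python
import json

# page_content of docs[0], copied from the output above.
page_content = (
    '{"openapi": "3.1.0", "info": {"title": "LangSmith", "version": "0.1.0"}, '
    '"paths": {"/api/v1/sessions/{session_id}": {"get": {"tags": ["tracer-sessions"], '
    '"summary": "Read Tracer Session", "description": "Get a specific session."}}}}'
)

# Parse the string back into nested Python objects.
restored = json.loads(page_content)

print(type(restored).__name__)
print(restored["info"]["title"])
```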

In [13]:
type(docs[0])

langchain_core.documents.base.Document

In [14]:
len(docs)

2716

In [17]:
for doc in docs[2715:]:
    print(doc)

page_content='{"definitions": {"examples.ExamplesUpdatedResponse": {"type": "object", "properties": {"count": {"type": "integer", "example": 1}, "example_ids": {"type": "array", "items": {"type": "string"}, "example": ["[\"123e4567-e89b-12d3-a456-426614174000\"]"]}}}}}'
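Note that the `example` array in this last chunk stays whole. As far as the LangChain documentation describes it, lists are not split by default, and `split_json` accepts `convert_lists=True` to turn lists into index-keyed dicts before splitting. The sketch below illustrates that kind of preprocessing in plain Python; `lists_to_dicts` and the `schema` sample are hypothetical and not the library's own code.

```python
def lists_to_dicts(data):
    """Recursively replace every list with a dict keyed by the item's index."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(index): lists_to_dicts(item) for index, item in enumerate(data)}
    return data


# Hypothetical sample shaped like the schema fragment in the chunk above.
schema = {
    "example_ids": {
        "type": "array",
        "items": {"type": "string"},
        "example": ["a", "b"],
    }
}

converted = lists_to_dicts(schema)
print(converted["example_ids"]["example"])
```

Once a list has become a dict, its items can be distributed across chunks like any other keys, at the cost of the output no longer matching the original array shape.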


## Converting to strings

In [18]:
# split_text returns the chunks as a list of JSON strings
text = json_splitter.split_text(json_data)

print(f"Number of text chunks: {len(text)}")

Number of text chunks: 2716


In [19]:
type(text)

list

In [20]:
type(text[0])

str