**How to split JSON data**

This JSON splitter splits JSON data while allowing control over chunk sizes. It traverses the JSON data depth first and builds smaller JSON chunks.

It attempts to keep nested JSON objects whole, but will split them if needed to keep chunks between `min_chunk_size` and `max_chunk_size`.
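To make the traversal concrete, here is a simplified, hypothetical sketch of a depth-first splitter in plain Python. It is not the library's code, and the names `split_json_sketch`, `json_size`, and `set_nested` are made up for illustration. It assumes, as this section states, that size is the character length of the serialized JSON. Each key's subtree is added whole if it fits in the current chunk. Otherwise a new chunk is started (once the current one has reached `min_chunk_size`) and the subtree is split further.

```python
import json
from typing import Any


def json_size(obj: Any) -> int:
    # Size is the character length of the serialized JSON text.
    return len(json.dumps(obj))


def set_nested(target: dict, path: list, value: Any) -> None:
    # Create intermediate dicts so the chunk keeps the value's full key path.
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_json_sketch(data: dict, max_chunk_size: int, min_chunk_size: int) -> list:
    """Hypothetical depth-first JSON splitter (simplified illustration)."""
    chunks: list = [{}]

    def walk(value: Any, path: list) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                new_path = path + [key]
                current = json_size(chunks[-1])
                if json_size({key: child}) < max_chunk_size - current:
                    # The whole subtree fits: keep it intact in the current chunk.
                    set_nested(chunks[-1], new_path, child)
                else:
                    if current >= min_chunk_size:
                        # The current chunk is big enough: start a new one.
                        chunks.append({})
                    # Descend into the subtree and split it further.
                    walk(child, new_path)
        else:
            # Leaf values (strings, numbers, lists) are placed whole, never split.
            set_nested(chunks[-1], path, value)

    walk(data, [])
    return chunks


data = {"a": {"x": "1" * 50, "y": "2" * 50}, "b": {"z": "3" * 50}}
chunks = split_json_sketch(data, max_chunk_size=100, min_chunk_size=20)
print([json_size(c) for c in chunks])  # → [66, 66, 66]
```

Each chunk keeps the full key path to its values (`"a"` appears in two chunks), so every piece still makes sense on its own.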

If a value is not a nested JSON object but a very large string, the string will not be split.
If you need a hard cap on the chunk size, consider composing this splitter with a recursive text splitter, such as `RecursiveCharacterTextSplitter`, applied to those chunks.
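A minimal sketch of the hard cap, assuming the JSON chunks have already been serialized to strings. The hypothetical `hard_cap` helper uses naive fixed-width slicing as a stand-in for a real recursive text splitter, which would prefer to break on natural boundaries such as newlines and spaces.

```python
import json


def hard_cap(texts: list, max_chars: int) -> list:
    """Hypothetical helper: slice any text longer than max_chars into fixed-width pieces."""
    capped = []
    for text in texts:
        if len(text) <= max_chars:
            capped.append(text)
        else:
            # Naive fixed-width slicing; a recursive splitter would look for
            # better break points before falling back to raw characters.
            capped.extend(text[i:i + max_chars] for i in range(0, len(text), max_chars))
    return capped


chunks = [{"id": 1}, {"description": "x" * 250}]
texts = [json.dumps(c) for c in chunks]
pieces = hard_cap(texts, max_chars=100)
print([len(p) for p in pieces])  # → [9, 100, 100, 69]
```

Note the trade-off: a sliced piece is generally no longer valid JSON, so apply the cap only where a strict size limit matters more than parseable chunks.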

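There is an optional pre-processing step to split lists: it first converts each list to a JSON object (a dict keyed by list index) and then splits it like any other object. Enable it by passing `convert_lists=True` to `split_json`, `split_text`, or `create_documents`.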
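The list pre-processing can be sketched as follows. `list_to_dict` is a hypothetical helper, not the library's API. It assumes each list becomes a dict keyed by the stringified list index, applied recursively so that nested lists are converted too.

```python
from typing import Any


def list_to_dict(data: Any) -> Any:
    """Hypothetical helper: recursively replace lists with dicts keyed by index."""
    if isinstance(data, dict):
        return {key: list_to_dict(value) for key, value in data.items()}
    if isinstance(data, list):
        # Keys are strings so the result is still a valid JSON object.
        return {str(i): list_to_dict(item) for i, item in enumerate(data)}
    return data


sample = {"tags": ["beauty", "mascara"], "reviews": [{"rating": 2}, {"rating": 5}]}
print(list_to_dict(sample))
# → {'tags': {'0': 'beauty', '1': 'mascara'}, 'reviews': {'0': {'rating': 2}, '1': {'rating': 5}}}
```

After conversion every list element is an ordinary key, so the depth-first traversal can place separate elements in separate chunks instead of treating the whole list as one leaf.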

* **How the text is split:** JSON value
* **How the chunk size is measured:** by number of characters

In [1]:
import json
import requests

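response = requests.get('https://dummyjson.com/products', timeout=30)
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error body
json_data = response.json()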
json_data

{'products': [{'id': 1,
   'title': 'Essence Mascara Lash Princess',
   'description': 'The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.',
   'category': 'beauty',
   'price': 9.99,
   'discountPercentage': 7.17,
   'rating': 4.94,
   'stock': 5,
   'tags': ['beauty', 'mascara'],
   'brand': 'Essence',
   'sku': 'RCH45Q1A',
   'weight': 2,
   'dimensions': {'width': 23.17, 'height': 14.43, 'depth': 28.01},
   'warrantyInformation': '1 month warranty',
   'shippingInformation': 'Ships in 1 month',
   'availabilityStatus': 'Low Stock',
   'reviews': [{'rating': 2,
     'comment': 'Very unhappy with my purchase!',
     'date': '2024-05-23T08:56:21.618Z',
     'reviewerName': 'John Doe',
     'reviewerEmail': 'john.doe@x.dummyjson.com'},
    {'rating': 2,
     'comment': 'Not as described!',
     'date': '2024-05-23T08:56:21.618Z',
     'reviewerName': 'Nolan Gon

In [2]:
from langchain_text_splitters import RecursiveJsonSplitter
# Chunk sizes are measured in characters of serialized JSON.
json_splitter = RecursiveJsonSplitter(max_chunk_size=300)

json_chunks = json_splitter.split_json(json_data)


In [3]:
json_chunks

[{'products': [{'id': 1,
    'title': 'Essence Mascara Lash Princess',
    'description': 'The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.',
    'category': 'beauty',
    'price': 9.99,
    'discountPercentage': 7.17,
    'rating': 4.94,
    'stock': 5,
    'tags': ['beauty', 'mascara'],
    'brand': 'Essence',
    'sku': 'RCH45Q1A',
    'weight': 2,
    'dimensions': {'width': 23.17, 'height': 14.43, 'depth': 28.01},
    'warrantyInformation': '1 month warranty',
    'shippingInformation': 'Ships in 1 month',
    'availabilityStatus': 'Low Stock',
    'reviews': [{'rating': 2,
      'comment': 'Very unhappy with my purchase!',
      'date': '2024-05-23T08:56:21.618Z',
      'reviewerName': 'John Doe',
      'reviewerEmail': 'john.doe@x.dummyjson.com'},
     {'rating': 2,
      'comment': 'Not as described!',
      'date': '2024-05-23T08:56:21.618Z',
      '
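The output above shows that the first chunk already contains a complete product record, well over the 300-character `max_chunk_size`. Lists are treated as leaf values unless `convert_lists=True` is passed (for example `json_splitter.split_json(json_data, convert_lists=True)`), so a list is kept whole however large it is. A quick way to spot oversized chunks is to measure each one as the length of its JSON serialization. The `oversized_chunks` helper and the sample data below are hypothetical, standing in for `json_chunks`.

```python
import json
from typing import Any, Dict, List, Tuple


def oversized_chunks(chunks: List[Dict[str, Any]], max_chunk_size: int) -> List[Tuple[int, int]]:
    """Return (index, size) for every chunk whose serialized length exceeds max_chunk_size."""
    report = []
    for index, chunk in enumerate(chunks):
        size = len(json.dumps(chunk))  # size in characters of the JSON text
        if size > max_chunk_size:
            report.append((index, size))
    return report


# Hypothetical stand-in for json_chunks: one chunk holding a whole list, one small chunk.
sample_chunks = [{"products": [{"id": i, "title": "item"} for i in range(20)]}, {"total": 194}]
print(oversized_chunks(sample_chunks, max_chunk_size=300))  # → [(0, 584)]
```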

In [4]:
for chunk in json_chunks[:3]:
    print(chunk)

{'products': [{'id': 1, 'title': 'Essence Mascara Lash Princess', 'description': 'The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.', 'category': 'beauty', 'price': 9.99, 'discountPercentage': 7.17, 'rating': 4.94, 'stock': 5, 'tags': ['beauty', 'mascara'], 'brand': 'Essence', 'sku': 'RCH45Q1A', 'weight': 2, 'dimensions': {'width': 23.17, 'height': 14.43, 'depth': 28.01}, 'warrantyInformation': '1 month warranty', 'shippingInformation': 'Ships in 1 month', 'availabilityStatus': 'Low Stock', 'reviews': [{'rating': 2, 'comment': 'Very unhappy with my purchase!', 'date': '2024-05-23T08:56:21.618Z', 'reviewerName': 'John Doe', 'reviewerEmail': 'john.doe@x.dummyjson.com'}, {'rating': 2, 'comment': 'Not as described!', 'date': '2024-05-23T08:56:21.618Z', 'reviewerName': 'Nolan Gonzalez', 'reviewerEmail': 'nolan.gonzalez@x.dummyjson.com'}, {'rating': 5, 'comment': 'V

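The splitter can also create LangChain `Document` objects with `create_documents`. It takes a list of JSON objects (dicts) through its `texts` argument.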

In [5]:
docs = json_splitter.create_documents(texts=[json_data])
for doc in docs[:3]:
    print(doc)

page_content='{"products": [{"id": 1, "title": "Essence Mascara Lash Princess", "description": "The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.", "category": "beauty", "price": 9.99, "discountPercentage": 7.17, "rating": 4.94, "stock": 5, "tags": ["beauty", "mascara"], "brand": "Essence", "sku": "RCH45Q1A", "weight": 2, "dimensions": {"width": 23.17, "height": 14.43, "depth": 28.01}, "warrantyInformation": "1 month warranty", "shippingInformation": "Ships in 1 month", "availabilityStatus": "Low Stock", "reviews": [{"rating": 2, "comment": "Very unhappy with my purchase!", "date": "2024-05-23T08:56:21.618Z", "reviewerName": "John Doe", "reviewerEmail": "john.doe@x.dummyjson.com"}, {"rating": 2, "comment": "Not as described!", "date": "2024-05-23T08:56:21.618Z", "reviewerName": "Nolan Gonzalez", "reviewerEmail": "nolan.gonzalez@x.dummyjson.com"}, {"rating": 5,
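The double-quoted `page_content` above suggests that each chunk is serialized with `json.dumps` before being wrapped in a document. Below is a rough sketch of that step, using a hypothetical `SimpleDocument` dataclass as a stand-in for LangChain's `Document` class. The `chunks_to_documents` helper is also hypothetical, not the library's implementation.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDocument:
    # Hypothetical stand-in for LangChain's Document: text plus metadata.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunks_to_documents(
    chunks: List[Dict[str, Any]], metadata: Optional[Dict[str, Any]] = None
) -> List[SimpleDocument]:
    """Serialize each JSON chunk to a string and attach a separate copy of the metadata."""
    return [
        SimpleDocument(page_content=json.dumps(chunk), metadata=dict(metadata or {}))
        for chunk in chunks
    ]


docs = chunks_to_documents(
    [{"id": 1, "title": "Essence Mascara Lash Princess"}, {"id": 2}],
    metadata={"source": "hypothetical-example"},
)
print(docs[0].page_content)  # → {"id": 1, "title": "Essence Mascara Lash Princess"}
```

Copying the metadata per document keeps later edits to one document's metadata from leaking into the others.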

`split_text` returns the chunks as JSON strings rather than dicts.

In [6]:
texts = json_splitter.split_text(json_data)
print(texts[0])
print(texts[1])

{"products": [{"id": 1, "title": "Essence Mascara Lash Princess", "description": "The Essence Mascara Lash Princess is a popular mascara known for its volumizing and lengthening effects. Achieve dramatic lashes with this long-lasting and cruelty-free formula.", "category": "beauty", "price": 9.99, "discountPercentage": 7.17, "rating": 4.94, "stock": 5, "tags": ["beauty", "mascara"], "brand": "Essence", "sku": "RCH45Q1A", "weight": 2, "dimensions": {"width": 23.17, "height": 14.43, "depth": 28.01}, "warrantyInformation": "1 month warranty", "shippingInformation": "Ships in 1 month", "availabilityStatus": "Low Stock", "reviews": [{"rating": 2, "comment": "Very unhappy with my purchase!", "date": "2024-05-23T08:56:21.618Z", "reviewerName": "John Doe", "reviewerEmail": "john.doe@x.dummyjson.com"}, {"rating": 2, "comment": "Not as described!", "date": "2024-05-23T08:56:21.618Z", "reviewerName": "Nolan Gonzalez", "reviewerEmail": "nolan.gonzalez@x.dummyjson.com"}, {"rating": 5, "comment": "V
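Each string returned by `split_text` is a standalone JSON document, so it can be parsed back into a Python object with `json.loads`. Here is a minimal round-trip check, using a hypothetical list in place of the `texts` from the cell above.

```python
import json

# Hypothetical stand-in for the list returned by split_text.
texts = ['{"products": {"0": {"id": 1}}}', '{"meta": {"page": 1, "limit": 30}}']

# Every chunk is valid JSON on its own, so each one parses independently.
parsed = [json.loads(text) for text in texts]

for text, obj in zip(texts, parsed):
    # For these samples, re-serializing with default json.dumps settings
    # reproduces the original string exactly.
    assert json.dumps(obj) == text

print(parsed[1]["meta"]["limit"])  # → 30
```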