# JSON Example

This example demonstrates creating a small, simple dataset in JSON format. The contents are broadly the same as the [Simple Dataset](./simple_dataset.ipynb) example, so refer to it for more detail on what is being demonstrated here.

The first thing to do is import `randomdataset` from the parent of this directory:

In [1]:
import os
import sys

sys.path.append(os.path.abspath(".."))

import randomdataset

Next, the YAML schema that will be used to generate the random data is written out:

In [2]:
%%writefile paymentschema_json.yaml

- typename: randomdataset.generators.JSONGenerator
  num_lines: 10
  dataset:
    name: customers
    typename: randomdataset.Dataset
    fields:
    - name: id
      typename: randomdataset.UIDFieldGen
    - name: FirstName
      typename: randomdataset.StrFieldGen
      lmin: 6
      lmax: 14
    - name: LastName
      typename: randomdataset.StrFieldGen
      lmin: 6
      lmax: 14
        
- typename: randomdataset.generators.JSONGenerator
  num_lines: 20
  dataset:
    name: payments
    typename: randomdataset.Dataset
    fields:
    - name: date
      typename: randomdataset.DateTimeFieldGen
    - name: customer_id
      typename: randomdataset.IntFieldGen
      vmin: 0
      vmax: 10
    - name: amount
      typename: randomdataset.FloatFieldGen
      vmin: 0
      vmax: 100

Overwriting paymentschema_json.yaml


The generation is done by passing this schema to the `generate_dataset` command line utility in the library:

```bash
$ generate_dataset paymentschema_json.yaml .
```

Instead of invoking this utility from the shell, the same command can be called directly through the imported library:

In [3]:
randomdataset.application.generate_dataset.callback("paymentschema_json.yaml", ".")

Schema: 'paymentschema_json.yaml'
Output: '.'
Generating dataset 'customers'
Generating dataset 'payments'
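
Before reading the results, it can be useful to confirm that a file was written for each dataset in the schema. The following is a minimal sketch, not part of `randomdataset`: the helper `missing_outputs` is hypothetical, and it assumes each dataset is written to `<name>.json` in the output directory, which is inferred from the `customers.json` file read in this example.

```python
from pathlib import Path


def missing_outputs(dataset_names, out_dir="."):
    """Return the expected output files that are not present in out_dir.

    Assumption: each dataset is written to '<name>.json', as inferred from
    the 'customers.json' file read in this example.
    """
    expected = [Path(out_dir) / f"{name}.json" for name in dataset_names]
    return [path for path in expected if not path.is_file()]


# Dataset names taken from the schema above.
missing = missing_outputs(["customers", "payments"], ".")
print("missing files:", [str(p) for p in missing])
```

An empty list means both files were found under the assumed naming scheme.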


The output is two JSON files. We can look at `customers.json` to see the list of randomly generated customers:

In [9]:
print(open("customers.json").read())

{
    "header": ["id", "FirstName", "LastName"],
    "data": [
        [0, "cFF3vm5XGeh", "NpweyS"],
        [1, "X4xOvr", "Dx5yAslv"],
        [2, "OHjeDIqm0J", "xDNQMubp3fbpD"],
        [3, "q6JJwF", "jwKSf6kGvY3S"],
        [4, "wrwGkTtIdjg8V", "JtlEx5YqoSMs"],
        [5, "xMyhP335D", "4RuKdiD5"],
        [6, "pgusVtOScPu", "lEtjY1lg"],
        [7, "GwtQnDFBSwG", "02vaMHEe5D2"],
        [8, "vTVBAJKbi", "47HlCy"],
        [9, "XA56V36msh", "coDvP6hmNQSX"]
    ]
}



This data can be loaded with the `json` library to get a dictionary containing the `header` and `data` components:

In [7]:
import json

with open("customers.json") as o:
    data = json.load(o)

print(data)

{'header': ['id', 'FirstName', 'LastName'], 'data': [[0, 'cFF3vm5XGeh', 'NpweyS'], [1, 'X4xOvr', 'Dx5yAslv'], [2, 'OHjeDIqm0J', 'xDNQMubp3fbpD'], [3, 'q6JJwF', 'jwKSf6kGvY3S'], [4, 'wrwGkTtIdjg8V', 'JtlEx5YqoSMs'], [5, 'xMyhP335D', '4RuKdiD5'], [6, 'pgusVtOScPu', 'lEtjY1lg'], [7, 'GwtQnDFBSwG', '02vaMHEe5D2'], [8, 'vTVBAJKbi', '47HlCy'], [9, 'XA56V36msh', 'coDvP6hmNQSX']]}
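
Because the file stores the column names once in `header` and the rows as plain lists in `data`, it is sometimes handier to work with one dictionary per row. The sketch below shows one way to do that with `zip`. The helper `to_records` is hypothetical, not part of `randomdataset`, and the inline text reproduces only the first two rows shown above so that the block runs on its own.

```python
import json

# Inline sample with the same layout as customers.json (first two rows only).
text = """
{
    "header": ["id", "FirstName", "LastName"],
    "data": [
        [0, "cFF3vm5XGeh", "NpweyS"],
        [1, "X4xOvr", "Dx5yAslv"]
    ]
}
"""


def to_records(table):
    """Convert a {'header': [...], 'data': [[...], ...]} table to a list of dicts."""
    header = table["header"]
    records = []
    for row in table["data"]:
        # Guard against rows whose length does not match the header.
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not match header {header!r}")
        records.append(dict(zip(header, row)))
    return records


records = to_records(json.loads(text))
print(records[0])
# prints {'id': 0, 'FirstName': 'cFF3vm5XGeh', 'LastName': 'NpweyS'}
```

The same function can be applied to the `data` dictionary loaded from `customers.json` above.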


Using a Jupyter widget, a tree view can be created:

In [8]:
from IPython.display import JSON
JSON(data)

<IPython.core.display.JSON object>

Constructing a Pandas dataframe is straightforward from here:

In [7]:
import pandas as pd

pd.DataFrame(data["data"], columns=data["header"])

|   | id | FirstName | LastName |
|---|----|-----------|----------|
| 0 | 0 | cFF3vm5XGeh | NpweyS |
| 1 | 1 | X4xOvr | Dx5yAslv |
| 2 | 2 | OHjeDIqm0J | xDNQMubp3fbpD |
| 3 | 3 | q6JJwF | jwKSf6kGvY3S |
| 4 | 4 | wrwGkTtIdjg8V | JtlEx5YqoSMs |
| 5 | 5 | xMyhP335D | 4RuKdiD5 |
| 6 | 6 | pgusVtOScPu | lEtjY1lg |
| 7 | 7 | GwtQnDFBSwG | 02vaMHEe5D2 |
| 8 | 8 | vTVBAJKbi | 47HlCy |
| 9 | 9 | XA56V36msh | coDvP6hmNQSX |
