Learning nested JSON #1

Open
FullPint opened this issue Apr 13, 2019 · 3 comments

@FullPint

Currently, when I run the code, the only schema returned is the one for "root", even though I have over 100,000 documents with many nested attributes.

Currently in vectorizers there are the following:

        basevectorizer.py
        boolvectorizer.py
        numbervectorizer.py
        stringvectorizer.py
        timestampvectorizer.py

Is there something I'm not understanding about "learning" JSON that is nested deeper than 'root'?

@arsarabi
Owner

arsarabi commented Apr 17, 2019

The code should automatically learn the schema of nested documents. I just fixed a bug in the sample code that might have caused the issue. Use vectorizer.extend(docs) to learn the schema, where docs is a list of JSON documents, or use vectorizer.extend([doc]) to learn the schema incrementally, one document at a time.
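
The two call patterns can be illustrated with a stand-in class. ToyLearner below is hypothetical and is not this project's vectorizer; it only assumes that extend takes an iterable of JSON documents, as described above. Under that assumption, one batch call and a series of single-document calls should end up with the same learned paths:

```python
class ToyLearner:
    """Hypothetical stand-in for illustration; NOT the project's vectorizer."""

    def __init__(self):
        self.paths = set()

    def extend(self, docs):
        # Assumption: extend() receives an iterable of JSON documents.
        for doc in docs:
            self._walk(doc, "root")

    def _walk(self, node, path):
        # Record every attribute path, descending into nested objects and arrays.
        if isinstance(node, dict):
            for key, value in node.items():
                child = f"{path}.{key}"
                self.paths.add(child)
                self._walk(value, child)
        elif isinstance(node, list):
            for item in node:
                self._walk(item, path)


docs = [
    {"a": {"b": 1}},
    {"a": {"c": "x"}, "d": True},
]

# Batch: pass the whole list at once.
batch = ToyLearner()
batch.extend(docs)

# Incremental: wrap each document in a one-element list.
incremental = ToyLearner()
for doc in docs:
    incremental.extend([doc])

print(sorted(batch.paths))
print(batch.paths == incremental.paths)
```

Note that each single document is wrapped in a list; iterating over a bare dict in Python yields its top-level keys rather than the document itself.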

@jvmk

jvmk commented Apr 26, 2022

Hello arsarabi,

Thank you for making your code available.

I've also had no luck learning nested attributes. Do I need to define a vectorizer of type "object" to be able to learn nested JSON objects?

Suppose I have a set of documents that match the following schema:

{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}

...do I need to define additional vectorizers beyond those you provide in the sample code?

If I (only) use the vectorizers provided in the sample code, the only learned features are:

0: root has "booleanattr1"
1: root has "stringattr3"
2: root has "nestedobject"
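
For comparison, here is a rough sketch of the presence features one might expect if the nested object were traversed too. It uses only the standard library and does not call this project's code. The presence_features helper and the "root.nestedobject" path format are assumptions made for illustration, not the library's actual feature naming:

```python
import json

# The sample document from this comment.
SAMPLE = json.loads("""
{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}
""")


def presence_features(node, path="root"):
    """Yield one 'path has "key"' string per key, at every nesting depth."""
    if isinstance(node, dict):
        for key in sorted(node):
            yield f'{path} has "{key}"'
            # Recurse so that keys of nested objects are listed as well.
            yield from presence_features(node[key], f"{path}.{key}")


features = list(presence_features(SAMPLE))
for index, feature in enumerate(features):
    print(f"{index}: {feature}")
```

With this traversal, the three root-level features above would be joined by three more for the keys inside "nestedobject".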

Thank you in advance for answering this (very basic) usage question :).

@arsarabi
Owner

Hello,

It has been a while since I worked on this, but I believe it should work with nested JSON out of the box if you follow the usage steps. Could you provide sample code that reproduces the issue? Thanks!
