To better understand a dataset's structure, I propose making `platform_sdk.dataset_reader` accessible from aepp. It would let us unpack the entire dataset, including its nested fields, and view it in full. Currently, aepp loads data through the `queryservice` module: you specify a SQL query and the result is loaded into a pandas DataFrame. However, each column of that DataFrame only represents the top level of the schema's nested objects, unless we manually unpack a given object in the query. For example, `select web.* from table_abc` returns the fields nested one level below the "web" object.
With `platform_sdk.dataset_reader`, the data is loaded with all of its nested fields already unpacked. This gives a complete view of the dataset and makes its structure clear, because every field it contains is visible. It also makes querying, processing, and manipulating the data more efficient: we no longer need to unpack each object by hand, and each field's value is no longer nested inside its parent object.
Example of using the SDK dataset reader, which automatically unpacks all the nested fields under the "web" object.
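The unpacking the dataset reader performs can be approximated on rows that have already been loaded. The sketch below uses a hypothetical helper, `flatten_record`, which is not part of aepp or the Platform SDK. It assumes each nested object arrives as a Python dict (if a column holds JSON strings, `json.loads` them first) and turns it into one key per dotted field path.

```python
from typing import Any


def flatten_record(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Recursively flatten nested dicts into one level keyed by dotted field paths.

    Lists and scalar values are kept as-is (treated as leaves).
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Descend into the nested object, carrying the path prefix along.
            flat.update(flatten_record(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hypothetical row, shaped like what "select web from table_abc" might return.
row = {
    "web": {
        "webPageDetails": {"name": "home", "URL": "https://example.com"},
        "webReferrer": {"type": "internal"},
    }
}
print(flatten_record(row))
```

Building a DataFrame from `[flatten_record(r) for r in rows]` gives one column per nested field; `pandas.json_normalize(rows)` performs essentially the same flattening, using `.` as its default separator.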
Thanks for bringing up the idea, @yoyo6022.
We will consider it for future development.
FYI: the SDK dataset reader and this library are two different projects that run in different environments and connect to different sources.
I am not saying it is not doable, but it is not as easy as it may sound.
Hello @yoyo6022,
I am coming back to this.
Have you checked the latest version of aepp, especially the `SchemaManager` part?
It is not as efficient as the SDK reader, because it does not provide the values of the fields, but it offers a way to flatten the schema structure and work with the field paths, so you can use Query Service more efficiently.
We still need to write more documentation, but if you are familiar with Python and notebooks, you should be able to learn it by experimenting, since all of the docstrings are provided.
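Once the schema is flattened into field paths, those paths can be turned into a query that selects every nested leaf as its own column instead of relying on `web.*`. The helper below, `build_select`, is hypothetical and not an aepp function. It assumes you already have a list of dotted paths and that Query Service accepts dot notation for nested fields, as the `web.*` example suggests. Aliases replace dots with underscores so each column gets a plain name.

```python
def build_select(field_paths: list[str], table: str) -> str:
    """Build a SELECT that returns each nested leaf field as its own column.

    Hypothetical helper: assumes dot notation addresses nested fields.
    """
    if not field_paths:
        raise ValueError("field_paths must not be empty")
    columns = []
    for path in field_paths:
        # Dots are not valid in an unquoted alias, so swap them for underscores.
        alias = path.replace(".", "_")
        columns.append(f"{path} AS {alias}")
    return f"SELECT {', '.join(columns)} FROM {table}"


paths = ["web.webPageDetails.name", "web.webReferrer.type"]
print(build_select(paths, "table_abc"))
```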