# Explore JSON files

This notebook inspects the contents of the given JSON files to build a better understanding of the data.

---

## [Important] Prerequisite notebook
The setup notebook below must be run before this notebook.
- (1) Setup notebook
    - Path: aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/00_setup/setup.ipynb
    - [Setup Notebook](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/genai/aws-gen-ai-kr/00_setup/setup.ipynb)


The data is structured as follows: `sections` is a list with one entry per web page, and `content` holds the body text of that public-facing web page. A keyword list for the body text will be added in the future. The remaining key-value pairs should be self-explanatory:

![json_structure.png](img/json_structure.png)
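The layout described above can be sketched in plain Python. This is only an illustration: the values are placeholders, and the key names other than `sections` and `content` (`title`, `url`, `project`, `last_updated`) are assumptions based on the metadata shown later in this notebook.

```python
import json

# Hypothetical sample mirroring the described layout. All values are
# placeholders; keys other than "sections" and "content" are assumed.
sample = {
    "sections": [
        {
            "title": "Example page",
            "url": "https://example.com/page",
            "project": "EXAMPLE",
            "last_updated": "2023-01-01",
            "content": "Body text of the public-facing web page.",
        }
    ]
}

# Round-trip through JSON to confirm the structure serializes cleanly.
parsed = json.loads(json.dumps(sample))

for section in parsed["sections"]:
    # Each section is one web page; "content" holds its body text.
    print(section["title"], "->", len(section["content"]), "characters")
```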


### References:
- [LangChain JSON Loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/json)

In [2]:
%load_ext autoreload
%autoreload 2

import sys, os

def add_python_path(module_path):
    if os.path.abspath(module_path) not in sys.path:
        sys.path.append(os.path.abspath(module_path))
        print(f"python path: {os.path.abspath(module_path)} is added")
    else:
        print(f"python path: {os.path.abspath(module_path)} already exists")
    print("sys.path: ", sys.path)

module_path = ".."
add_python_path(module_path)


python path: /home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot is added
sys.path:  ['/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs', '/opt/conda/lib/python310.zip', '/opt/conda/lib/python3.10', '/opt/conda/lib/python3.10/lib-dynload', '', '/opt/conda/lib/python3.10/site-packages', '/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot']


# 1. Inspect the raw file

In [3]:
import json
from pathlib import Path
from pprint import pprint

In [4]:
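file_path = 'data/poc/customer_EFOTA.json'
# file_path = './example_data/facebook_chat.json'
# Read the file as UTF-8 explicitly so the result does not depend on the platform default
raw_data = json.loads(Path(file_path).read_text(encoding="utf-8"))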

In [5]:
pprint(raw_data)

{'sections': [{'content': 'How-to videos. Contains videos on how to use Knox '
                          'E-FOTA. This section contains videos on how to use '
                          'Knox E-FOTA. Getting started with Knox E-FOTA This '
                          'video walks you through the Knox E-FOTA console and '
                          'demonstrates how you can register a reseller, '
                          'approve a device, create a campaign, assign a '
                          'campaign, and monitor device status. Creating a '
                          'campaign on Knox E-FOTA The following video '
                          'provides in-depth information on how to create and '
                          'apply a Knox E-FOTA campaign to your Samsung '
                          'devices. Connecting Knox E-FOTA to VMware Workspace '
                          'ONE The following video describes the simple steps '
                          'of connecting Knox E-FOTA with VMware 
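Because printing the whole file produces very long output, a compact summary can be easier to read. The following is a minimal sketch, assuming the top-level `sections` list described earlier; `summarize_sections` and the `demo` data are hypothetical names introduced here for illustration.

```python
def summarize_sections(data, preview_chars=80):
    """Return (section count, sorted keys of the first section, content preview)."""
    sections = data.get("sections", [])
    if not sections:
        return 0, [], ""
    first = sections[0]
    # Truncate the body text so the summary stays on one screen.
    preview = str(first.get("content", ""))[:preview_chars]
    return len(sections), sorted(first.keys()), preview


# Stand-in data; in the notebook you would pass raw_data instead.
demo = {
    "sections": [
        {"title": "t", "content": "x" * 200},
        {"title": "u", "content": "y"},
    ]
}

count, keys, preview = summarize_sections(demo)
print("sections:", count)
print("keys:", keys)
print("preview length:", len(preview))
```

In the notebook, `summarize_sections(raw_data)` would report the number of sections and the keys of the first one without dumping every `content` string.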

# 2. Use the JSON Loader

In [7]:
import glob
from local_utils.proc_docs import get_load_json, show_doc_json

In [8]:
# Specify the directory and file pattern for .json files
folder_path = 'data/poc/*.json'

# List all .json files in the specified folder
json_files = glob.glob(folder_path)

# Load each .json file and show a summary of its documents
for file_path in json_files:
    doc_json = get_load_json(
        file_path,
        jq_schema=".sections[]"
    )
    show_doc_json(doc_json, file_path)
    print("\n")


.sections[]
### File name:  customer_EFOTA.json
### of document:  260
### The first doc
page_content='How-to videos. Contains videos on how to use Knox E-FOTA. This section contains videos on how to use Knox E-FOTA. Getting started with Knox E-FOTA This video walks you through the Knox E-FOTA console and demonstrates how you can register a reseller, approve a device, create a campaign, assign a campaign, and monitor device status. Creating a campaign on Knox E-FOTA The following video provides in-depth information on how to create and apply a Knox E-FOTA campaign to your Samsung devices. Connecting Knox E-FOTA to VMware Workspace ONE The following video describes the simple steps of connecting Knox E-FOTA with VMware Workspace ONE, while adding device groups from Workspace ONE.' metadata={'source': 'customer_EFOTA.json', 'seq_num': 1, 'title': 'How-to videos', 'url': 'https://docs.samsungknox.com/admin/efota-one/how-to-videos', 'project': 'EFOTA', 'last_updated': '2023-09-27'}
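The internals of `get_load_json` are not shown in this notebook, so the following is not its implementation. It is a plain-Python sketch of what the `.sections[]` schema appears to do, judging from the output above: each element of `sections` becomes one document whose `content` is the page text, and the remaining keys plus `source` and `seq_num` go into the metadata. `Doc`, `load_sections`, and the `demo` data are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Minimal stand-in for a loaded document (hypothetical, not a LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_sections(data, source):
    """Turn each item of data["sections"] into one Doc, numbering from 1."""
    docs = []
    for seq_num, section in enumerate(data.get("sections", []), start=1):
        # The output above shows only the file name as the source.
        metadata = {"source": Path(source).name, "seq_num": seq_num}
        # Everything except the body text is carried over as metadata.
        metadata.update({k: v for k, v in section.items() if k != "content"})
        docs.append(Doc(page_content=section.get("content", ""), metadata=metadata))
    return docs


# Stand-in data with placeholder values.
demo = {
    "sections": [
        {"title": "First", "url": "https://example.com/1", "content": "Page one."},
        {"title": "Second", "url": "https://example.com/2", "content": "Page two."},
    ]
}

docs = load_sections(demo, "data/poc/demo.json")
print("number of documents:", len(docs))
print(docs[0])
```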


.secti