# **3.21_read_json**

### I. Reading JSON

With Python's Pandas library, reading JSON takes very little effort.

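1. Import Pandas

   `import pandas as pd`

2. The `read_json` function

   As the name suggests, it reads JSON files.

   Pass the file path as its argument.

   The function handles the whole pipeline: reading the file, parsing the JSON, and converting the result into a DataFrame. Without Pandas, each of these steps would typically need its own piece of code.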
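To make the "three steps in one call" point concrete, here is a minimal sketch that compares the manual route with `read_json`. The sample records and the file name `sample_survey.json` are hypothetical; they are written to a temporary directory only so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data (not the course file), saved to a temporary
# location so the example is self-contained.
records = [
    {"question": "Is it unlocked?", "answerType": "Y", "answer": "yes"},
    {"question": "What is the warranty on this?", "answer": "No warranty"},
]
json_path = Path(tempfile.mkdtemp()) / "sample_survey.json"
json_path.write_text(json.dumps(records), encoding="utf-8")

# Manual route: (1) read the file, (2) parse the JSON, (3) build the DataFrame.
with open(json_path, encoding="utf-8") as f:
    raw_text = f.read()
parsed = json.loads(raw_text)
manual_df = pd.DataFrame(parsed)

# Pandas route: a single call covers all three steps.
pandas_df = pd.read_json(str(json_path))

print(manual_df)
print(pandas_df)
```

Both DataFrames should show the same rows and columns. `read_json` additionally tries to infer column types, so the dtypes of the two results are not guaranteed to match in every case.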

### II. How JSON Maps to a DataFrame

#### (1) cell_phones_survey.json

```JSON
[
  {
    "questionType": "yes/no",
    "asin": "1466736038",
    "answerTime": "Mar 8, 2014",
    "unixTime": 1394265600,
    "question": "Is there a SIM card in it?",
    "answerType": "Y",
    "answer": "Yes. The Galaxy SIII accommodates a micro SIM card."
  },
  {
    "questionType": "yes/no",
    "asin": "1466736038",
    "answerTime": "Jan 29, 2015",
    "unixTime": 1422518400,
    "question": "Is this phone new, with 1 year manufacture warranty?",
    "answerType": "?",
    "answer": "It is new but I was not able to get it activated with AT&T."
  },
  {
    "questionType": "yes/no",
    "asin": "1466736038",
    "answerTime": "Nov 30, 2014",
    "unixTime": 1417334400,
    "question": "can in it be used abroad with a different carrier?",
    "answerType": "Y",
    "answer": "Yes"
  },
  {
    "questionType": "open-ended",
    "asin": "1466736038",
    "answerTime": "Nov 3, 2014",
    "unixTime": 1415001600,
    "question": "What is the warranty on this?",
    "answer": "No warranty"
  },
  {
    "questionType": "yes/no",
    "asin": "1466736038",
    "answerTime": "Oct 2, 2014",
    "unixTime": 1412233200,
    "question": "Does this phone use the regular Sim card (the bigger Sim card)?",
    "answerType": "?",
    "answer": "it takes mini sim"
  },
  {
    "questionType": "open-ended",
    "asin": "1466736038",
    "answerTime": "Sep 11, 2014",
    "unixTime": 1410418800,
    "question": "how much time you need to send me this product to miami?",
    "answer": "If you choose expedited shipping you will have the phone in 2-3 days"
  },
  {
    "questionType": "yes/no",
    "asin": "1621911888",
    "answerTime": "Dec 13, 2013",
    "unixTime": 1386921600,
    "question": "Is it unlocked?",
    "answerType": "Y",
    "answer": "yes"
  }
]
```

```python
import pandas as pd
survey_df = pd.read_json("./3.21_data_cell_phones_survey.json")
survey_df
```

| | questionType | asin | answerTime | unixTime | question | answerType | answer |
|---|---|---|---|---|---|---|---|
| 0 | yes/no | 1466736038 | Mar 8, 2014 | 1394265600 | Is there a SIM card in it? | Y | Yes. The Galaxy SIII accommodates a micro SIM ... |
| 1 | yes/no | 1466736038 | Jan 29, 2015 | 1422518400 | Is this phone new, with 1 year manufacture war... | ? | It is new but I was not able to get it activat... |
| 2 | yes/no | 1466736038 | Nov 30, 2014 | 1417334400 | can in it be used abroad with a different carr... | Y | Yes |
| 3 | open-ended | 1466736038 | Nov 3, 2014 | 1415001600 | What is the warranty on this? | NaN | No warranty |
| 4 | yes/no | 1466736038 | Oct 2, 2014 | 1412233200 | Does this phone use the regular Sim card (the ... | ? | it takes mini sim |
| 5 | open-ended | 1466736038 | Sep 11, 2014 | 1410418800 | how much time you need to send me this product... | NaN | If you choose expedited shipping you will have... |
| 6 | yes/no | 1621911888 | Dec 13, 2013 | 1386921600 | Is it unlocked? | Y | yes |


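1. The source file is a JSON array whose elements are all JSON objects, and each JSON object corresponds to one row of the DataFrame.

   In JSON, a data instance is usually represented as an object, while in a DataFrame each row usually represents one instance, so the two map onto each other naturally.

2. The key-value pairs of a JSON object correspond to the DataFrame's column names and the values beneath them.

   A DataFrame's column names denote the attributes of a data instance, and the keys of a JSON object play the same role: each key becomes a column name, and its value becomes the cell in that row.

3. Some objects in the JSON have no `answerType` key, which corresponds to the NaN values in the DataFrame's `answerType` column.

   JSON does not require every object to have the same keys, and Pandas handles this in the DataFrame by using NaN to mark the missing data.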
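The three observations above can be checked on a small scale. The sketch below uses a hypothetical three-record sample modeled on the survey file (not the file itself) and passes it to `read_json` through `io.StringIO`, so no file on disk is needed.

```python
import io
import json

import pandas as pd

# Hypothetical sample modeled on the survey data; the second record
# deliberately has no "answerType" key.
records = [
    {"questionType": "yes/no", "answerType": "Y", "answer": "yes"},
    {"questionType": "open-ended", "answer": "No warranty"},
    {"questionType": "yes/no", "answerType": "?", "answer": "it takes mini sim"},
]

sample_df = pd.read_json(io.StringIO(json.dumps(records)))

# 1. One row per JSON object.
print(len(sample_df) == len(records))

# 2. The keys of the objects become the column names.
print(list(sample_df.columns))

# 3. The missing key shows up as NaN in that row.
print(sample_df["answerType"].isna().tolist())
```

`isna()` is used here because NaN never compares equal to itself, so a direct `==` comparison would not detect the gap.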

#### (2) github.json

```JSON
{
  "owner": "pelmers",
  "name": "text-rewriter",
  "stars": 11,
  "forks": 4,
  "watchers": 3,
  "isFork": false,
  "languages": [
    {
      "name": "JavaScript",
      "size": 21769
    },
    {
      "name": "HTML",
      "size": 2096
    },
    {
      "name": "CSS",
      "size": 2081
    }
  ],
  "description": "Webextension to rewrite phrases in pages",
  "createdAt": "2015-03-14T22:35:11Z",
  "pushedAt": "2022-02-11T14:26:00Z",
  "license": null
}
```

```python
github_df = pd.read_json("./3.21_data_github.json")
github_df
```

| | owner | name | stars | forks | watchers | isFork | languages | description | createdAt | pushedAt | license |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pelmers | text-rewriter | 11 | 4 | 3 | False | {'name': 'JavaScript', 'size': 21769} | Webextension to rewrite phrases in pages | 2015-03-14T22:35:11Z | 2022-02-11T14:26:00Z | NaN |
| 1 | pelmers | text-rewriter | 11 | 4 | 3 | False | {'name': 'HTML', 'size': 2096} | Webextension to rewrite phrases in pages | 2015-03-14T22:35:11Z | 2022-02-11T14:26:00Z | NaN |
| 2 | pelmers | text-rewriter | 11 | 4 | 3 | False | {'name': 'CSS', 'size': 2081} | Webextension to rewrite phrases in pages | 2015-03-14T22:35:11Z | 2022-02-11T14:26:00Z | NaN |


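1. In the JSON object in this file, the value of `languages` is an array of length 3, which corresponds to the three rows of the DataFrame.

   When the value in a key-value pair is an array, Pandas treats the array elements as belonging to different data instances and puts each one on its own row, so the `languages` column has 3 rows. Because every column of a table must have the same number of rows, the other attribute values are repeated three times, which yields a regularly shaped table. Note that each cell in `languages` still holds a whole object (displayed as a Python dict); the nested `name` and `size` keys are not expanded into separate columns.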
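The repetition of the scalar values can be reproduced with a trimmed-down version of the object. This is only a sketch: the `repo` dict keeps four of the original keys, and it is fed to `read_json` through `io.StringIO` instead of a file.

```python
import io
import json

import pandas as pd

# Trimmed-down copy of the github.json object (only four keys kept).
repo = {
    "owner": "pelmers",
    "name": "text-rewriter",
    "stars": 11,
    "languages": [
        {"name": "JavaScript", "size": 21769},
        {"name": "HTML", "size": 2096},
        {"name": "CSS", "size": 2081},
    ],
}

repo_df = pd.read_json(io.StringIO(json.dumps(repo)))

# One row per element of "languages"; the scalar values are repeated.
print(repo_df.shape)
print(repo_df["owner"].tolist())

# Each cell in "languages" is still a dict, so nested keys are read per cell.
language_names = [lang["name"] for lang in repo_df["languages"]]
print(language_names)
```

If the nested keys are needed as columns of their own, they have to be extracted in a separate step, for example with a list comprehension like the one above.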