In [1]:
# Left-align tables
%%html 
<style> 
table {float:left} 
</style>

## JSON
File extension: ``.json``

Supported data types: ``number``, ``boolean``, ``string``, ``null``, ``array``, ``object``

Rules: the character set must be ``UTF-8``, and strings must be enclosed in double quotes ``""``
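The double-quote rule can be checked with Python's standard ``json`` module (covered in section 3). A minimal sketch; the sample strings are made up for illustration:

```python
import json

valid = '{"name": "Tom", "age": 18}'    # double-quoted strings: valid JSON
invalid = "{'name': 'Tom', 'age': 18}"  # single-quoted strings: not valid JSON

parsed = json.loads(valid)

try:
    json.loads(invalid)
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False

print(parsed)            # {'name': 'Tom', 'age': 18}
print(single_quotes_ok)  # False
```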

### 2. Commonly used JSON conversion websites
1. json.cn: https://www.json.cn/

2. runoob (菜鸟工具) JSON tool: https://c.runoob.com/front-end/53

3. sojson: https://www.sojson.com/, a very comprehensive JSON processing site

4. kjson: https://www.kjson.com/

5. w3cschool (编程狮) JSON validation tool: https://www.w3cschool.cn/tools/index?name=jsoncheck

6. JSONViewer: http://jsonviewer.stack.hu/, an online tool for checking whether JSON is well-formed

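### 3. Converting between JSON and dict

|Function|Purpose|
|---|---|
|json.dumps()|Encode a Python object into a JSON string (dict to JSON)|
|json.loads()|Decode a JSON string into a Python object (JSON to dict)|
|json.dump()|Convert a Python object to JSON and write it to a file|
|json.load()|Read JSON from a file and convert it to a Python object|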

#### 3.1 json.dumps()
The two dump functions, ``json.dumps()`` and ``json.dump()``, convert Python data types to JSON types:

|Python | JSON|
|---|---|
|dict|object|
|list, tuple|array|
|str|string|
|int, float|number|
|True|true|
|False|false|
|None|null|
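The mapping in the table can be observed directly. The sketch below uses made-up sample data and also shows one detail the table does not cover: non-string dict keys such as ``1`` are written as strings, because JSON object keys are always strings.

```python
import json

data = {
    "point": (1, 2),       # tuple -> array
    "scores": [90, 85.5],  # list of int/float -> array of numbers
    "passed": True,        # True -> true
    "note": None,          # None -> null
    1: "int key",          # an int key is converted to the string "1"
}
encoded = json.dumps(data)
print(encoded)
# {"point": [1, 2], "scores": [90, 85.5], "passed": true, "note": null, "1": "int key"}
```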

Parameters
``` python
json.dumps(obj,
           skipkeys=False,
           ensure_ascii=True,
           check_circular=True,
           allow_nan=True,
           cls=None,
           indent=None,
           separators=None,
           # encoding="utf-8",  # Python 2 only; Python 3 rejects this argument
           default=None,
           sort_keys=False,
           **kw)
```
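``obj``: the object to convert

``skipkeys``: defaults to False. If a dict key is not a basic Python type (str, int, float, bool, None), a TypeError is raised while ``skipkeys`` is False; setting it to True skips such keys instead

``ensure_ascii``: defaults to True, which escapes every non-ASCII character; set it to False to output Chinese (and other non-ASCII) characters as-is

``check_circular``: if False, the circular-reference check for container types is skipped

``allow_nan``: if False, serializing an out-of-range float value (nan, inf, -inf) raises a ValueError, in strict compliance with the JSON specification; if True (the default), the JavaScript equivalents (NaN, Infinity, -Infinity) are written instead

``indent``: pretty-prints the output; the value is the number of spaces per indentation level

``separators``: a tuple ``(item_separator, key_separator)`` giving the separator between items and the separator between a key and its value; these replace the default ``,`` and ``:`` separators

``encoding``: character encoding (Python 2 only; ``json.dumps`` in Python 3 has no such parameter)

``default``: a function called for objects that cannot otherwise be serialized; it should return a serializable version of the object or raise a TypeError. If not given, a TypeError is raised

``sort_keys``: if False, dict keys are not sorted; if True, the output is sorted by key (a to z)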
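Three of the less obvious parameters, ``skipkeys``, ``allow_nan`` and ``default``, are easier to follow with a small run. This is a sketch with made-up sample values; using ``str`` as the ``default`` function is just one simple choice.

```python
import datetime
import json

# skipkeys=True silently drops keys that are not str, int, float, bool or None
with_tuple_key = {("a", "b"): 1, "ok": 2}
skipped = json.dumps(with_tuple_key, skipkeys=True)
print(skipped)  # {"ok": 2}

# allow_nan=False rejects nan/inf instead of writing the non-standard NaN/Infinity
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)  # True

# default is called for objects json cannot serialize; here str() handles the date
dated = json.dumps({"day": datetime.date(2020, 8, 21)}, default=str)
print(dated)  # {"day": "2020-08-21"}
```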


Examples

In [16]:
import json
information1 = {
    'name' : '小明',
    'age' : 18,
    'address' : 'shenzhen',
}
information2 = json.dumps(information1)
print(type(information1))
print(type(information2))
print(information2)

<class 'dict'>
<class 'str'>
{"name": "\u5c0f\u660e", "age": 18, "address": "shenzhen"}


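Notes:
1. The output above shows escape sequences rather than Chinese characters; to display Chinese, pass ``ensure_ascii=False``
2. All strings in the JSON output use double quotes, whereas the original dict literal used single quotes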

In [9]:
information3 = json.dumps(information1, ensure_ascii=False)
print(type(information1))
print(type(information3))
print(information3)

<class 'dict'>
<class 'str'>
{"name": "小明", "age": 18, "address": "shenzhen"}


In [10]:
# Pretty-print the JSON
information4 = json.dumps(information1, ensure_ascii=False, indent = 2) # indent by two spaces
information5 = json.dumps(information1, ensure_ascii=False, indent = 5) # indent by five spaces
print(information3)
print(information4)
print(information5)

{"name": "小明", "age": 18, "address": "shenzhen"}
{
  "name": "小明",
  "age": 18,
  "address": "shenzhen"
}
{
     "name": "小明",
     "age": 18,
     "address": "shenzhen"
}


In [11]:
# Sort the keys
information6 = json.dumps(information1, ensure_ascii=False, indent = 2, sort_keys = True) 
information7 = json.dumps(information1, ensure_ascii=False, indent = 2, sort_keys = False) 
print(information6)
print(information7)

{
  "address": "shenzhen",
  "age": 18,
  "name": "小明"
}
{
  "name": "小明",
  "age": 18,
  "address": "shenzhen"
}


In [13]:
# Control the output separators
information8 = json.dumps(information1, ensure_ascii=False, indent = 2, separators = ('+','@')) 
print(information4)
print(information8)

{
  "name": "小明",
  "age": 18,
  "address": "shenzhen"
}
{
  "name"@"小明"+
  "age"@18+
  "address"@"shenzhen"
}
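A more practical use of ``separators`` is producing compact JSON by dropping the spaces after ``,`` and ``:``. A minimal sketch that repeats the same sample dict so it runs on its own:

```python
import json

information = {"name": "小明", "age": 18, "address": "shenzhen"}

default_form = json.dumps(information, ensure_ascii=False)
compact_form = json.dumps(information, ensure_ascii=False, separators=(",", ":"))

print(default_form)  # {"name": "小明", "age": 18, "address": "shenzhen"}
print(compact_form)  # {"name":"小明","age":18,"address":"shenzhen"}
print(len(compact_form) < len(default_form))  # True
```

The compact form is still valid JSON and decodes to the same dict.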


#### 3.2 json.dump()

In [14]:
with open('information_to_json.json', 'w', encoding = 'utf-8') as f:
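    json.dump(information1, # data to write
             f, # file object
             indent = 2, # indent with two spaces over multiple lines; without it, everything is written on one line
             sort_keys = True, # sort the keys
             ensure_ascii = False) # write Chinese characters as-is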

#### 3.3 json.loads()
The two load functions, ``json.loads()`` and ``json.load()``, convert JSON to Python:

|JSON|Python|
|---|---|
|object|dict|
|array|list|
|string|str|
|number (int)|int|
|number (real)|float|
|true|True|
|false|False|
|null|None|
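The decoding table can be verified by inspecting the type of each decoded value. A small sketch with a made-up JSON string that contains one value of each kind:

```python
import json

text = '{"obj": {"k": "v"}, "arr": [1, 2], "s": "hi", "i": 3, "r": 3.5, "t": true, "n": null}'
decoded = json.loads(text)

for key, value in decoded.items():
    print(key, type(value).__name__)
# obj dict
# arr list
# s str
# i int
# r float
# t bool
# n NoneType
```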

In [18]:
information9 = json.dumps(information1, ensure_ascii = False)
information10 = json.loads(information9)
print(information10)

{'name': '小明', 'age': 18, 'address': 'shenzhen'}


#### 3.4 json.load()

In [20]:
with open('information_to_json.json', encoding = 'utf-8') as f:
    json_to_dict = json.load(f) # convert the JSON to a dict
print(json_to_dict)

{'address': 'shenzhen', 'age': 18, 'name': '小明'}
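``json.dump()`` and ``json.load()`` are inverses for data like this, which a round trip makes explicit. The sketch below writes to a temporary directory so it leaves no files behind; the file name ``record.json`` is arbitrary.

```python
import json
import os
import tempfile

record = {"name": "小明", "age": 18, "address": "shenzhen"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.json")  # arbitrary file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == record)  # True
```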


### 4 Converting between JSON and non-dict types

#### 4.1 Tuples

In [21]:
data = (1,2)
data1 = json.dumps(data)
print(type(data1))
data1

<class 'str'>


'[1, 2]'

#### 4.2 Lists

In [22]:
data = [1,2,3,4]
data1 = json.dumps(data)
print(type(data1))
data1

<class 'str'>


'[1, 2, 3, 4]'

#### 4.3 Booleans

In [23]:
data = True
data1 = json.dumps(data)
print(type(data1))
data1

<class 'str'>


'true'

#### 4.4 Numbers

In [24]:
data = 1
data1 = json.dumps(data)
print(type(data1))
data1

<class 'str'>


'1'
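The reverse direction works for these values too, with one caveat worth seeing: JSON has no tuple type, so a tuple comes back as a list. A minimal sketch:

```python
import json

original = (1, 2)
restored = json.loads(json.dumps(original))
print(restored)        # [1, 2]
print(type(restored))  # <class 'list'>

# Scalar JSON documents decode to the matching Python values
print(json.loads("true"))  # True
print(json.loads("1"))     # 1
print(json.loads("null"))  # None
```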

### 5 Handling JSON with pandas

- ``read_json``: read data from JSON
- ``to_json``: write pandas data to a JSON file
- ``json_normalize``: normalize (flatten) JSON data

#### 5.1 read_json
``read_json`` parameters
```python
pandas.read_json(
  path_or_buf=None,  
  orient=None,  
  typ='frame',   
  dtype=None, 
  convert_axes=None, 
  convert_dates=True, 
  keep_default_dates=True, 
  numpy=False, 
  precise_float=False, 
  date_unit=None, 
  encoding=None, 
  lines=False,  
  chunksize=None,
  compression='infer', 
  nrows=None, 
  storage_options=None)
```
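``path_or_buf``: path to the JSON file (a URL or file-like object also works; older pandas versions accept a JSON string directly, as in the examples below, while pandas 2.1 and later deprecate that in favor of wrapping the string in ``io.StringIO``)

``orient``: the key parameter, describing the layout of the JSON; the values covered here are "split", "records", "index", "columns", "values"

``typ``: the type of object to recover, ``'series'`` or ``'frame'``; defaults to ``'frame'``

``dtype``: bool or dict; the signature default is ``None``, which behaves like ``True`` (infer dtypes) for the orient values covered here

``lines``: bool, default False; if True, the file is read as one JSON object per line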
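The ``lines`` parameter is the least self-explanatory of these, so here is a small sketch. It assumes pandas is installed; the two records are made up, and the string is wrapped in ``io.StringIO`` so that ``read_json`` treats it like a file.

```python
import io

import pandas as pd

# Two JSON objects, one per line; the data is made up for illustration
json_lines = '{"name": "Tom", "age": 22}\n{"name": "Ann", "age": 19}\n'

df_lines = pd.read_json(io.StringIO(json_lines), lines=True)
print(df_lines.shape)          # (2, 2)
print(list(df_lines.columns))  # ['name', 'age']
```

Each line becomes one row, and the keys of the objects become the columns.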

In [26]:
import pandas as pd

In [28]:
pd.read_json('final_results.json')

Unnamed: 0,contextOSE,earthgeckoSkyline,expose,htmjava,knncad,null,numenta,numentaTM,random,randomCutForest,relativeEntropy,skyline,twitterADVec,windowedGaussian
reward_low_FN_rate,73.181673,63.800148,26.920947,70.424078,64.812321,0,74.320244,69.185068,25.876351,59.748765,58.842292,44.481017,53.501075,47.410027
reward_low_FP_rate,66.977139,46.195301,3.190931,53.25695,43.408517,0,63.116842,56.665308,5.763682,38.360772,47.598352,27.08121,33.610052,20.867986
standard,69.899569,58.200223,16.436669,65.54991,57.994343,0,70.101056,64.553464,16.831768,51.717055,54.642749,35.687043,47.061957,39.649523


The orient parameter: ``orient = 'split'``

The JSON is expected to contain exactly the three keys ``index``, ``columns``, and ``data``, no more and no fewer

``split`` : dict like {index -> [index], columns -> [columns], data -> [values]}


In [34]:
data = '{"index":[2,3,4], "columns" : ["a","b"], "data":[[1,3],[2,4],[4,5]]}'
pd.read_json(data, orient = 'split')

Unnamed: 0,a,b
2,1,3
3,2,4
4,4,5
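Writing the same frame back out with ``to_json(orient='split')`` shows where the three keys come from. A sketch assuming pandas is installed; the frame is rebuilt by hand so the block runs on its own.

```python
import json

import pandas as pd

df_split = pd.DataFrame([[1, 3], [2, 4], [4, 5]], index=[2, 3, 4], columns=["a", "b"])

# With no path given, to_json returns the JSON text
split_obj = json.loads(df_split.to_json(orient="split"))

print(sorted(split_obj))     # ['columns', 'data', 'index']
print(split_obj["index"])    # [2, 3, 4]
print(split_obj["columns"])  # ['a', 'b']
print(split_obj["data"])     # [[1, 3], [2, 4], [4, 5]]
```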


The orient parameter: ``orient = 'records'``

JSON format: ``records`` : list like [{column -> value}, … , {column -> value}]

In [36]:
data = '[{"name": "小米","age" : 22},{"name": "小李", "age":19}]'
pd.read_json(data, orient = 'records')

Unnamed: 0,name,age
0,小米,22
1,小李,19


The orient parameter: ``orient = 'index'``

JSON format: ``index`` : dict like {index -> {column -> value}}

In [37]:
data = '{"city":{"zhuhai":"100","shenzhen":200},"home":{"price":"8k","time":"2020-08-21 02:00:00"}}'
pd.read_json(data, orient = 'index')

Unnamed: 0,zhuhai,shenzhen,price,time
city,100.0,200.0,,
home,,,8k,2020-08-21 02:00:00


The orient parameter: ``orient = 'columns'``

JSON format: ``columns`` : dict like {column -> {index -> value}}

The results for orient = 'columns' and orient = 'index' are transposes of each other

In [39]:
data = '{"city":{"zhuhai":"100","shenzhen":200},"home":{"price":"8k","time":"2020-08-21 02:00:00"}}'
pd.read_json(data, orient = 'columns')

Unnamed: 0,city,home
zhuhai,100.0,
shenzhen,200.0,
price,,8k
time,,2020-08-21 02:00:00


In [40]:
pd.read_json(data, orient = 'columns').T

Unnamed: 0,zhuhai,shenzhen,price,time
city,100.0,200.0,,
home,,,8k,2020-08-21 02:00:00


The orient parameter: ``orient = 'values'``

JSON format: ``values`` : just the values array

In [43]:
data = '[["小米",20],["小李",18]]'
pd.read_json(data, orient = 'values')

Unnamed: 0,0,1
0,小米,20
1,小李,18


In [46]:
df = pd.read_json(data, orient = 'values').rename(columns = {0:'name',1:'age'})
df

Unnamed: 0,name,age
0,小米,20
1,小李,18


#### 5.2 to_json()

In [47]:
df.to_json("个人信息1.json",force_ascii=False) # without force_ascii=False, Chinese characters are written as \uXXXX escapes
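When no path is passed, ``to_json`` returns the JSON text instead of writing a file, which makes it easy to compare layouts. A sketch assuming pandas is installed; the frame is rebuilt here so the block is self-contained.

```python
import json

import pandas as pd

df_people = pd.DataFrame({"name": ["小米", "小李"], "age": [20, 18]})

as_columns = df_people.to_json(force_ascii=False)  # default layout for a DataFrame
as_records = df_people.to_json(orient="records", force_ascii=False)

print(json.loads(as_columns))
# {'name': {'0': '小米', '1': '小李'}, 'age': {'0': 20, '1': 18}}
print(json.loads(as_records))
# [{'name': '小米', 'age': 20}, {'name': '小李', 'age': 18}]
```

The ``records`` layout drops the index, so it is usually the more compact choice when the index carries no information.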

#### 5.3 json_normalize()

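The JSON data saved and read above was all in a flat, tabular form. Data in JSON files is often nested, however, so dict-structured data has to be flattened into tabular form first. This process is called normalization.

Official documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html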

https://www.jianshu.com/p/a84772b994a0

In [49]:
from pandas import json_normalize  # pandas >= 1.0; the old pandas.io.json import path is deprecated

In [50]:
# Keys of nested dicts are shown as dotted, attribute-style column names
data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}},
        {'name': {'given': 'Mose', 'family': 'Regner'}},
        {'id': 2, 'name': 'Faye Raker'}]
pd.json_normalize(data)

Unnamed: 0,id,name.first,name.last,name.given,name.family,name
0,1.0,Coleen,Volk,,,
1,,,,Mose,Regner,
2,2.0,,,,,Faye Raker
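The ``.`` that joins nested keys in the column names can be changed with the ``sep`` parameter of ``json_normalize``. A minimal sketch assuming pandas 1.0 or later, using a single record from the data above:

```python
import pandas as pd

data = [{"id": 1, "name": {"first": "Coleen", "last": "Volk"}}]

# sep replaces the default "." between the levels of a nested key
flat = pd.json_normalize(data, sep="_")
print(list(flat.columns))  # ['id', 'name_first', 'name_last']
```

An underscore separator can be convenient when the columns are later accessed as attributes, since ``flat.name_first`` works while a dotted name would not.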


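The max_level parameter

With max_level=0, a nested dict is kept whole and shown as a single value in the DataFrame

With max_level=1, the nested dict is expanded and each of its keys becomes its own column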

In [51]:
data = [{'id': 1,
         'name': "Cole Volk",
         'fitness': {'height': 130, 'weight': 60}},
        {'name': "Mose Reg",
         'fitness': {'height': 130, 'weight': 60}},
        {'id': 2, 'name': 'Faye Raker',
         'fitness': {'height': 130, 'weight': 60}}]
pd.json_normalize(data, max_level=0)

Unnamed: 0,id,name,fitness
0,1.0,Cole Volk,"{'height': 130, 'weight': 60}"
1,,Mose Reg,"{'height': 130, 'weight': 60}"
2,2.0,Faye Raker,"{'height': 130, 'weight': 60}"


In [52]:
data = [{'id': 1,
         'name': "Cole Volk",
         'fitness': {'height': 130, 'weight': 60}},
        {'name': "Mose Reg",
         'fitness': {'height': 130, 'weight': 60}},
        {'id': 2, 'name': 'Faye Raker',
         'fitness': {'height': 130, 'weight': 60}}]
pd.json_normalize(data, max_level=1)

Unnamed: 0,id,name,fitness.height,fitness.weight
0,1.0,Cole Volk,130,60
1,,Mose Reg,130,60
2,2.0,Faye Raker,130,60


Reading only part of the nested content

In [54]:
data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]
pd.json_normalize(data, 'counties')

Unnamed: 0,name,population
0,Dade,12345
1,Broward,40000
2,Palm Beach,60000
3,Summit,1234
4,Cuyahoga,1337


Reading all of the content

In [56]:
data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': {'governor': 'Rick Scott'},
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': {'governor': 'John Kasich'},
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]
pd.json_normalize(data, 'counties', ['state', 'shortname',['info', 'governor']])

Unnamed: 0,name,population,state,shortname,info.governor
0,Dade,12345,Florida,FL,Rick Scott
1,Broward,40000,Florida,FL,Rick Scott
2,Palm Beach,60000,Florida,FL,Rick Scott
3,Summit,1234,Ohio,OH,John Kasich
4,Cuyahoga,1337,Ohio,OH,John Kasich
