**Visualize Word Dependencies & Named Entities in Chinese Text**

In [29]:
# author: Dr. Sandra Isabelle Schmid

# requirements:
# 1) spaCy: $ pip install -U spacy
# 2) spaCy Chinese model: $ python -m spacy download zh_core_web_lg

# spaCy documentation relevant to this code:
# spaCy in general: https://spacy.io/usage/spacy-101
# visualization of dependencies and named entities: https://spacy.io/usage/visualizers
# Chinese models for spaCy: https://spacy.io/models/zh


# import the Chinese model package
import zh_core_web_lg
# import the displaCy visualizer for dependencies and named entities
from spacy import displacy

# load the Chinese pipeline
nlp = zh_core_web_lg.load()
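
The import above fails with a bare `ModuleNotFoundError` if the model package was never downloaded. A minimal sketch of a friendlier check, assuming the model is installed as a regular Python package (which is what `python -m spacy download` does). The helper names `model_installed` and `load_chinese_pipeline` are hypothetical and not part of spaCy:

```python
import importlib
import importlib.util

MODEL_NAME = "zh_core_web_lg"


def model_installed(name):
    """Return True if a top-level package called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def load_chinese_pipeline(name=MODEL_NAME):
    """Load the pipeline, or raise an error that says how to install it."""
    if not model_installed(name):
        raise RuntimeError(
            f"Model '{name}' is not installed; run: python -m spacy download {name}"
        )
    # spaCy model packages expose a module-level load() function
    return importlib.import_module(name).load()
```

With the model installed, `nlp = load_chinese_pipeline()` should behave like the two-step import and `load()` call above.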

In [48]:
# define or load document
# text from xinhuanet.com: http://www.xinhuanet.com/local/2020-08/04/c_1126320957.htm
doc1 = nlp('《北京市生活垃圾管理条例》正式实施已满3个月。昨天下午，市城市管理委召开新闻发布会，市城管执法局、市民政局、市妇联等部门联合通报新版条例实施3个月来本市垃圾分类开展总体情况、执法情况及社会动员情况。记者了解到，3个月来，全市家庭厨余垃圾分出量增长4倍有余。接下来，本市将在社区广泛开展“盯桶”“守桶”行动，市妇联还将组建7000支、10万人的巾帼志愿者队参与桶前值守。')

# text about catfish from Baidu Baike: https://baike.baidu.com/item/%E9%B2%B6%E9%B1%BC/1579868
doc2 = nlp('鲶鱼又名胡子鱼、塘鲺，显著特征是周身无鳞、体表多黏液、头扁口阔、上下颌各有4根条胡须，分布广泛，主要产于长江和珠江流域，仲春至仲夏（4~7月）为最佳食用季节。鲶鱼是肉食性鱼类，其肉质细嫩少刺、美味浓郁，富含蛋白质和脂肪，营养丰富，尤其适宜体质虚弱、营养不良之人食用')

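# short test text; rough English gloss: "My name is Su Lingyun, I study physics in Heidelberg.
# After research work in Beijing, I am now a data scientist."
doc3 = nlp('我叫苏灵芸，我在海德堡学习物理。经过在北京的研究工作，我现在是一名数据科学家。')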

In [49]:
def visualize_word_relations(document):
    """Render a dependency parse for each sentence of the document."""
    for sentence in document.sents:
        # non-compact alternative: options={"distance": 130}
        displacy.render(sentence, style="dep", jupyter=True, options={"distance": 130, "compact": True})


def visualize_named_entities(document):
    """Highlight the named entities in each sentence of the document."""
    for sentence in document.sents:
        # "distance" and "compact" only affect the "dep" style, so no options are passed here
        displacy.render(sentence, style="ent", jupyter=True)
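
Outside a notebook, the same dependency renderings can be written to disk. The sketch below assumes that `displacy.render(..., jupyter=False)` returns the markup as a string, as described in the visualizer documentation linked in the first cell. `save_dependency_svgs` and its `render` parameter are hypothetical names introduced here so that the file handling can be tried without a model:

```python
from pathlib import Path


def save_dependency_svgs(sentences, out_dir, render=None):
    """Write one SVG file per sentence and return the list of file paths."""
    if render is None:
        # imported lazily so that a custom `render` works without spaCy
        from spacy import displacy

        def render(sentence):
            return displacy.render(
                sentence,
                style="dep",
                jupyter=False,
                options={"distance": 130, "compact": True},
            )

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, sentence in enumerate(sentences):
        path = out_dir / f"sentence_{index:03d}.svg"
        path.write_text(render(sentence), encoding="utf-8")
        paths.append(path)
    return paths
```

A call such as `save_dependency_svgs(doc3.sents, "svg_out")` would then produce one file per sentence. The sketch is limited to the `dep` style because the `ent` style produces HTML rather than SVG.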

In [50]:
visualize_word_relations(doc3)
visualize_named_entities(doc3)
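
The entity listing in the next cell pads each entity with `:<15`, which counts characters rather than display columns. In a monospaced terminal, where Chinese characters usually occupy two columns, the label column therefore drifts. A minimal sketch of width-aware padding using only the standard library; `display_width` and `pad_to_width` are hypothetical helpers, and the sample pairs are copied from the entity output of the catfish text:

```python
import unicodedata


def display_width(text):
    """Count terminal columns, treating wide and fullwidth characters as two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_to_width(text, width):
    """Right-pad `text` with spaces until it fills `width` display columns."""
    return text + " " * max(0, width - display_width(text))


# (entity text, label) pairs as printed for the catfish text
entities = [
    ("周身无鳞", "PERSON"),
    ("4", "CARDINAL"),
    ("长江", "LOC"),
    ("珠江", "LOC"),
    ("仲春至仲夏", "DATE"),
]

for text, label in entities:
    print(pad_to_width(text, 15) + label)
```

Whether the columns line up exactly still depends on the font used to display the output, so this is an approximation rather than a guarantee.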

In [33]:
# extract information from a document
doc = doc2  # the output below is for doc2, the catfish text

# print every named entity in doc together with its label (location, date, person, ...)
for ent in doc.ents:
    # left-align the entity text and its label in fixed-width columns
    print(f"{ent.text:<15}{ent.label_:<10}")

# print each sentence of the document
for s in doc.sents:
    print(s)

# for every token, print its text, its part of speech and its dependency relation
for token in doc:
    print(token.text, token.pos_, token.dep_)

周身无鳞           PERSON    
4              CARDINAL  
长江             LOC       
珠江             LOC       
仲春至仲夏          DATE      
鲶鱼又名胡子鱼、塘鲺，显著特征是周身无鳞、体表多黏液、头扁口阔、上下颌各有4根条胡须，分布广泛，主要产于长江和珠江流域，仲春至仲夏（4~7月）
为最佳食用季节。
鲶鱼是肉食性鱼类，其肉质细嫩少刺、美味浓郁，富含蛋白质和脂肪，营养丰富，尤其适宜体质虚弱、营养不良之人食用
鲶鱼 NOUN nsubj
又 ADV advmod
名 NUM ROOT
胡子 NOUN compound:nn
鱼 NOUN conj
、 PUNCT punct
塘鲺 NOUN dobj
， PUNCT punct
显著 ADJ amod
特征 NOUN conj
是 VERB cop
周身无鳞 PROPN dep
、 PUNCT punct
体表 NOUN nsubj
多 VERB dep
黏液 NOUN conj
、 PUNCT punct
头扁 VERB compound:nn
口阔 NOUN conj
、 PUNCT punct
上下颌 NOUN nsubj
各 ADV advmod
有 VERB conj
4 NUM nummod
根 NUM mark:clf
条 NUM amod
胡须 NOUN dobj
， PUNCT punct
分布 NOUN nsubj
广泛 VERB conj
， PUNCT punct
主要 ADV advmod
产于 VERB conj
长江 PROPN conj
和 CCONJ cc
珠江 PROPN nmod:assmod
流域 NOUN dobj
， PUNCT punct
仲春 NOUN conj
至 CCONJ cc
仲夏 NOUN conj
（ PUNCT punct
4~7月 NOUN parataxis:prnmod
） PUNCT punct
为 ADP cop
最佳 ADJ amod
食用 NOUN compound:nn
季节 NOUN ROOT
。 PUNCT punct
鲶鱼 NOUN nsubj
是 VERB cop
肉食性 NOUN compound:nn
鱼类 NOU