Python word clouds with wordcloud #126

Open
Qingquan-Li opened this issue Sep 15, 2019 · 0 comments

1. The simplest way to generate a word cloud image

'''
Minimal Example
===============
Reference: generating a simple default word cloud: https://github.com/amueller/word_cloud/blob/master/examples/simple.py
'''

from wordcloud import WordCloud

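# Read the whole text (assumes the file is UTF-8 encoded).
with open('/Users/fatli/Desktop/content.txt', encoding='utf-8') as f:
    text = f.read()

# Generate a word cloud image
# Set the font with the font_path parameter; otherwise Chinese cannot be displayed (it renders as empty rectangles)
wordcloud = WordCloud(font_path="/Users/fatli/Library/Fonts/NotoSansHans-Regular.otf").generate(text)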

# The PIL way (if you don't have matplotlib)
image = wordcloud.to_image()
image.show()
# Save to a file
wordcloud.to_file('01.png')

[Image: 01.png, the generated word cloud]
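The font path in the script above only exists on the author's machine. As a minimal sketch, the snippet below picks the first font file that exists from a list of candidates. The helper name `first_existing_font` and every candidate path except the first are hypothetical and would need adjusting for the fonts actually installed.

```python
import os

# Hypothetical candidate locations; only the first mirrors the path used above.
CANDIDATE_FONTS = [
    os.path.expanduser("~/Library/Fonts/NotoSansHans-Regular.otf"),
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",
    "C:/Windows/Fonts/msyh.ttc",
]


def first_existing_font(candidates):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


font_path = first_existing_font(CANDIDATE_FONTS)
if font_path is None:
    print("No candidate font found; Chinese text would render as empty rectangles.")
else:
    print("Using font:", font_path)
```

The resulting `font_path` could then be passed to `WordCloud(font_path=font_path)` in place of the hard-coded string.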


2. Generating a simple default word cloud

'''
Minimal Example
===============
Reference: https://github.com/amueller/word_cloud#examples
Generating a simple default word cloud: https://github.com/amueller/word_cloud/blob/master/examples/simple.py
'''

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
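# matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB.
# Each pyplot function changes a figure in some way: for example, it creates a figure, creates a plotting area in a figure, draws lines in a plotting area, or decorates the plot with labels.

# Read the whole text (assumes the file is UTF-8 encoded).
with open('/Users/fatli/Desktop/content.txt', encoding='utf-8') as f:
    text = f.read()

# Set the stopwords
stopwords = set(STOPWORDS)
stopwords.add("测试") # add() accepts only one argument per call
stopwords.add("test")
stopwords.add("他妈的")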

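# Set the font with the font_path parameter; otherwise Chinese cannot be displayed (it renders as empty rectangles)
# background_color sets the background color; the default is black
# width and height set the size of the image in pixels
# generate() tokenizes the whole text automatically, but it splits on whitespace and punctuation,
# so unspaced Chinese is not segmented into words; for better Chinese segmentation, use the
# Jieba Chinese word segmentation library: github.com/fxsjy/jieba
wordcloud = WordCloud(font_path="/Users/fatli/Library/Fonts/NotoSansHans-Regular.otf",
                      stopwords=stopwords,
                      background_color="white",
                      max_font_size=200,
                      # max_words=2000,
                      width=1920,
                      height=1080
).generate(text)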

# Display the word cloud image
# plt.figure()
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

# Save to a file
wordcloud.to_file('wordcloud.png')

[Image: wordcloud.png, the generated word cloud]
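The comments in the script above say that generate() tokenizes the text and recommend Jieba for Chinese. The sketch below shows why, using only the standard library. `count_words` is a hypothetical stand-in for the tokenizing step, not the library's implementation, and its regular expression is modeled on the pattern wordcloud has used by default, which may differ between versions. `to_space_delimited` is also hypothetical: it calls `jieba.cut` by default (assuming jieba is installed) and accepts any other segmenter through the `segment` parameter.

```python
import re
from collections import Counter


def count_words(text, stopwords=()):
    """Count regex-delimited tokens, skipping stopwords (case-insensitive)."""
    # Assumption: a pattern similar to wordcloud's default; needs >= 2 characters.
    tokens = re.findall(r"\w[\w']+", text)
    stop = {word.lower() for word in stopwords}
    return Counter(tok for tok in tokens if tok.lower() not in stop)


def to_space_delimited(text, segment=None):
    """Join segmented words with spaces so a regex tokenizer can split them."""
    if segment is None:
        import jieba  # assumption: jieba is installed
        segment = jieba.cut
    return " ".join(tok for tok in segment(text) if tok.strip())


# Space-delimited text splits into words as expected.
print(count_words("this is a test, only a test", stopwords={"test"}))

# Unspaced Chinese comes out as one long token.
print(count_words("我喜欢词云"))
```

With jieba installed, `WordCloud(...).generate(to_space_delimited(text))` would hand the library text that is already segmented into words.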


3. Generating a word cloud in the shape of a given image

'''
Masked wordcloud
================
Reference: https://github.com/amueller/word_cloud#examples
Generating a word cloud in the shape of a given image: https://github.com/amueller/word_cloud/blob/master/examples/masked.py
'''

import numpy as np
import matplotlib.pyplot as plt

from PIL import Image
from wordcloud import WordCloud, STOPWORDS

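# Read the whole text (assumes the file is UTF-8 encoded).
with open('/Users/fatli/Desktop/content.txt', encoding='utf-8') as f:
    text = f.read()

# Read the mask image; the higher its resolution, the sharper the generated word cloud
rectangle_mask = np.array(Image.open("/Users/fatli/Desktop/Rectangle.png"))

# Set the stopwords
stopwords = set(STOPWORDS)
stopwords.add("测试") # add() accepts only one argument per call
stopwords.add("test")
stopwords.add("他妈的")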

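'''
Set the font with the font_path parameter; otherwise Chinese cannot be displayed (it renders as empty rectangles)
background_color sets the background color; the default is black
width and height set the size of the image, but they are ignored when a mask is given (the shape of the mask is used instead)
'''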
wc = WordCloud(font_path="/Users/fatli/Library/Fonts/NotoSansHans-Regular.otf",
               background_color="white",
               max_words=2000,
               mask=rectangle_mask,
               stopwords=stopwords
)

# generate() tokenizes the whole text automatically.
# For better Chinese segmentation, use the Jieba Chinese word segmentation library: https://github.com/fxsjy/jieba
wc.generate(text)

# Store the word cloud to a file
wc.to_file("output_wordcloud.png")

# show
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(rectangle_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()

[Image: output_wordcloud.png, the word cloud generated from the mask]
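A mask does not have to come from an image file. As a minimal sketch, the hypothetical helper `circle_mask` below builds a circular mask as plain nested lists. It assumes the convention described in the wordcloud documentation, as far as I know it: pure white entries (255) are masked out, and words may be drawn on everything else.

```python
def circle_mask(size):
    """Return size x size rows: 0 inside the circle (drawable), 255 outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


# Preview a tiny mask as text: '#' is drawable, '.' is masked out.
rows = circle_mask(9)
for row in rows:
    print("".join("#" if value == 0 else "." for value in row))
```

To use it with the script above, the rows would be converted first, for example `mask = np.array(circle_mask(400), dtype=np.uint8)`, and then passed as `mask=mask`.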


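Appendix: a pitfall I ran into

conda install wordcloud pulls in its dependencies, numpy included. In my case numpy had to be installed first and wordcloud afterwards; otherwise the following error appeared: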

ImportError:

Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

... ...

  Reason: image not found
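
After reinstalling, a quick way to confirm that the error is gone is to import the two packages in dependency order. This is a minimal sketch; the helper name `try_import` is hypothetical, and it only catches ImportError, the exception type shown in the message above.

```python
import importlib


def try_import(name):
    """Import a module by name; return None on success or the error text."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return str(exc)
    return None


# Check numpy before wordcloud, since wordcloud depends on it.
for name in ("numpy", "wordcloud"):
    error = try_import(name)
    print(name, "OK" if error is None else "FAILED: " + error)
```

If numpy fails here, the problem lies in the numpy installation itself rather than in wordcloud.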