# Word Clouds
<img src="img/wordcloud-logo.png" style="float: right; width: 240px; padding-left: 1em;"></img>

Word clouds are images composed of words used in a particular text or context, where the size of each word indicates its frequency or importance.

## Installing libraries & tools

You'll need these Python packages installed: `numpy`, `pandas`, `matplotlib`, `pillow`, and `wordcloud`.
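
Before running the notebook, it can help to check which of these packages are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration. Note that the `pillow` distribution is imported under the name `PIL`.

```python
import importlib.util

# Import names, not PyPI names: the `pillow` distribution is imported as `PIL`.
REQUIRED = ["numpy", "pandas", "matplotlib", "PIL", "wordcloud"]


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with `python3 -m pip install <name>`, using `pillow` rather than `PIL` as the package name.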

If you get a `The _imagingft C module is not installed` error when building your word cloud, your Pillow build lacks FreeType support. In that case, try installing Pillow from source:

    sudo apt install libfreetype6-dev
    python3 -m pip install -I --no-binary :all: pillow

## Setting the stage

> *The notebook, data, and images can be found [here](https://github.com/jhermann/jupyter-by-example/tree/master/charts).*

First, import all the needed libraries:

In [0]:
import os
import time
import random
import collections
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import HTML, clear_output
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

%matplotlib notebook

## Loading a word list or text

The words are in a text file, one word (or short phrase) per line.

In [0]:
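# read one word or phrase per line, skipping blank lines
with open('../data/wordcloud.txt', encoding='utf-8') as handle:
    words = [x.strip() for x in handle if x.strip()]
# show how often each entry occurs, most frequent first
collections.Counter(words).most_common()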
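
Because the list may contain short phrases, it matters whether entries are counted as whole lines or as individual words. The sketch below uses a small made-up word list (not the notebook's data) and a plain `\w+` regular expression as a simplified stand-in for a text tokenizer; it is not the `wordcloud` library's exact tokenization rule.

```python
import collections
import re

# Hypothetical sample data, standing in for the contents of the word list file.
lines = ["data science", "python", "data science", "jupyter", "python", "data science"]

# Counting whole lines keeps every phrase as a single key.
phrase_freqs = collections.Counter(lines)

# Splitting on word boundaries breaks phrases into separate words.
token_freqs = collections.Counter(re.findall(r"\w+", "\n".join(lines)))

print(phrase_freqs.most_common())
print(token_freqs.most_common())
```

As far as I can tell, `WordCloud.generate()` tokenizes the text it is given, so multi-word entries may end up split. If phrases should stay together, a mapping like `phrase_freqs` can be passed to `WordCloud.generate_from_frequencies()` instead.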

## Rendering the word cloud

The following code creates the word cloud image from the word list and saves it as a PNG file.

In [0]:
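random.shuffle(words)
wordcloud = WordCloud(
    width=720, height=320, margin=5,
    prefer_horizontal=0.6,  # try horizontal placement for about 60% of the words
    mask=np.array(Image.open('img/panda-mask.png')),  # shape of the cloud
    contour_width=3, contour_color='black',  # outline drawn around the mask
    background_color='white',
).generate('\n'.join(words))

chart_img = 'img/wordcloud.png'
plt.axis("off")  # only matters for the optional plt.imshow() preview
wordcloud.to_file(chart_img)
clear_output()  # discard any output this cell has produced so far
# the timestamp query string keeps the browser from showing a cached image
HTML('<img src="{}?{}"></img>'.format(chart_img, time.time()))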

The above code uses this image mask to give the word ‘cloud’ a specific shape.

<img src="img/panda-mask.png"></img>

As you can see, black marks the region where words may be placed, while pure white is treated as background. Colored images work as well; see the [Generating WordClouds in Python](https://www.datacamp.com/community/tutorials/wordcloud-python) article for details.
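
A mask does not have to come from an image file. As a minimal sketch of the rule above, the following builds a circular mask directly with NumPy, using 0 (black) for the drawable region and 255 (white) for the background. The function name `circle_mask` and the sizes are illustrative choices, not part of the notebook.

```python
import numpy as np


def circle_mask(size, radius):
    """Return a size x size uint8 array: 0 inside the circle, 255 outside."""
    center = size // 2
    # open grids of row and column indices that broadcast to (size, size)
    x, y = np.ogrid[:size, :size]
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return np.where(outside, 255, 0).astype(np.uint8)


mask = circle_mask(300, 130)
print(mask.shape, mask.min(), mask.max())
```

The resulting array could be passed as the `mask=` argument of `WordCloud` in place of the array loaded from `img/panda-mask.png`.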

In [0]:
# execute this cell for docs
?WordCloud

In [0]:
# uncomment to preview the word cloud inline with matplotlib
#plt.imshow(wordcloud, interpolation='bilinear')
#plt.show()