# Creating a word cloud from sticky note feedback

The first Google result for "python word cloud" is https://github.com/amueller/word_cloud. Perfect. As expected, someone has already thought about this and gone to the trouble of making a package and putting it on GitHub and the Python Package Index (PyPI) for us all to use.

According to `amueller`'s readme, the `wordcloud` package is installable via `pip` (since it has been uploaded to PyPI). This means installation is as simple as running this from the shell:

```bash
pip install wordcloud
```


## Windows troubles

If you run `pip install wordcloud` on Windows (like I did), you'll probably get an error like this:


```bash
Failed to build wordcloud
Installing collected packages: wordcloud
  Running setup.py install for wordcloud
    Complete output from command C:\Users\Pete\Anaconda3\envs\py27\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\pete\\appdata\\local\\temp\\pip-build-fakcaf\\wordcloud\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\pete\appdata\local\temp\pip-bp1q1a-record\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    running build_ext
    building 'wordcloud.query_integral_image' extension
    error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

    ----------------------------------------
Command "C:\Users\Pete\Anaconda3\envs\py27\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\pete\\appdata\\local\\temp\\pip-build-fakcaf\\wordcloud\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\pete\appdata\local\temp\pip-bp1q1a-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\pete\appdata\local\temp\pip-build-fakcaf\wordcloud
```


That error message includes a link (http://aka.ms/vcpython27) for downloading the C++ compiler that the build needs. So let's install that and run `pip install wordcloud` again. This time we should get something like this:


```bash
Building wheels for collected packages: wordcloud
  Running setup.py bdist_wheel for wordcloud
  Stored in directory: C:\Users\Pete\AppData\Local\pip\Cache\wheels\32\a9\74\58e379e5dc614bfd9dd9832d67608faac9b2bc6c194d6f6df5
Successfully built wordcloud
Installing collected packages: wordcloud
Successfully installed wordcloud-1.1.3
```

Okay, we should be good to go!


## Installing the rest of the dependencies

Looking in https://github.com/amueller/word_cloud/blob/master/requirements.txt, we see that the `Image` package is also a requirement, so we can install it with (from the shell):

    conda install PIL

and we should be good to go! Why `PIL` and not `conda install Image`? The requirements file might be outdated: looking inside the code, we see that `Image` is imported from `PIL`: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py#L16
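
Before moving on, it can help to confirm that the packages are actually importable from the environment the notebook runs in. The snippet below is a minimal sketch, not part of the original notebook: `missing_modules` is a hypothetical helper name, and the module names are the ones imported later in this notebook.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names as they are imported further down in this notebook
required = ["wordcloud", "PIL"]
print("Missing modules:", missing_modules(required) or "none")
```

If anything is listed as missing, the install step for that package did not succeed in the active environment.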

In [None]:
# We want this to be masked like the example at https://github.com/amueller/word_cloud/blob/master/examples/masked.py
# The code will be adapted from that
from __future__ import division, print_function
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud

# Read the whole text from the positive feedback, omitting blank lines and ones that begin with `#`
text = ""
with open("data/feedback_positive.txt") as f:
    for line in f:
        if not line.strip().startswith("#") and len(line.strip()) > 0:
            text += line

print(text)
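
The filtering rule above (skip blank lines and lines whose first non-whitespace character is `#`) can be checked without the data file. The following is a small sketch under that assumption: `read_feedback` is a hypothetical helper name, and the sample lines are made up for illustration, standing in for `data/feedback_positive.txt`.

```python
def read_feedback(lines):
    """Join the lines that are neither blank nor comments starting with '#'."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(line)
    return "".join(kept)


# Made-up sample lines; any iterable of strings works, including an open file
sample = [
    "# Day 1\n",
    "\n",
    "Great pace\n",
    "   # an indented comment\n",
    "Loved the Git lesson\n",
]
print(read_feedback(sample))
```

Because an open file is also an iterable of lines, the same function could be called as `read_feedback(f)` inside the `with open(...)` block.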

In [None]:
# The text contains unclear comments in square brackets that shouldn't make it into the word cloud
# We can remove everything inside square brackets using regular expressions, which are built into Python
# See http://stackoverflow.com/questions/640001/how-can-i-remove-text-within-parentheses-with-a-regex

import re

text = re.sub(r"\[.*?\]", "", text)

print(text)
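
The `?` in `\[.*?\]` makes the match non-greedy, which matters when a line contains more than one bracketed span. Here is a short illustration with a made-up sample string (not taken from the feedback data):

```python
import re

# Made-up sample with two bracketed spans on one line
sample = "Great pace [unclear] and helpful instructors [illegible word]"

# Non-greedy: each bracketed span is matched and removed on its own
non_greedy = re.sub(r"\[.*?\]", "", sample)

# Greedy: a single match runs from the first "[" to the last "]"
greedy = re.sub(r"\[.*\]", "", sample)

print(repr(non_greedy))
print(repr(greedy))
```

The greedy version also throws away the text between the two bracketed spans. Note that `.` does not match a newline by default, so neither pattern removes a bracketed span that continues onto the next line.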

In [None]:
from PIL import Image

# Read the mask image
logos_mask = np.array(Image.open("images/logos.png"))

wc = WordCloud(background_color="white", max_words=20000, mask=logos_mask)
# Generate word cloud
wc.generate(text)

# Store to file
wc.to_file("images/wordcloud_raw.png")

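# Get the size of the image as (width, height), the order matplotlib's figsize expects
# (the array shape is (height, width, channels), so take the first two entries reversed)
size_pixels = np.array(logos_mask.shape[1::-1])
dpi = 60
size_inches = size_pixels/dpi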

# Show the wordcloud with matplotlib
plt.figure(figsize=size_inches)
plt.imshow(wc)
plt.axis("off")
plt.show()
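
One detail worth spelling out: an image loaded into a NumPy array has shape `(rows, columns, channels)`, that is height first, while matplotlib's `figsize` is `(width, height)` in inches. The sketch below shows the conversion in plain Python. `figsize_inches` is a hypothetical helper, and the 600 x 1200 shape is made up rather than the real size of `images/logos.png`.

```python
def figsize_inches(shape, dpi):
    """Convert an image array shape (height, width, ...) to (width, height) in inches."""
    height_px, width_px = shape[0], shape[1]
    return (width_px / float(dpi), height_px / float(dpi))


# Made-up mask shape: 600 rows (height), 1200 columns (width), 4 channels
print(figsize_inches((600, 1200, 4), dpi=60))
```

The `dpi` value here only scales pixels to inches for the figure size; the resolution matplotlib actually renders at is controlled separately by the figure's own `dpi` setting.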

In [None]:
# The logo shapes are hard to make out in that word cloud, so let's add a semi-transparent version of the mask image

# From http://stackoverflow.com/questions/10640114/overlay-two-same-sized-images-in-python
background = Image.open("images/logos.png").convert("RGBA")
wordcloud = Image.open("images/wordcloud_raw.png").convert("RGBA")
blended = Image.blend(wordcloud, background, 0.1)
blended.save("images/unh_swc_wordcloud.png", "PNG")

# Show the new blended image with matplotlib
plt.figure(figsize=size_inches)
plt.imshow(blended)
plt.axis("off")
plt.show()
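
To see what the `0.1` passed to `Image.blend` does: Pillow documents the blend as a linear interpolation, `out = image1 * (1.0 - alpha) + image2 * alpha`, so the result here is 90% word cloud and 10% logo. The following is a per-pixel sketch of that formula in plain Python. `blend_pixel` is a hypothetical helper, the colours are made up, and Pillow's exact rounding may differ slightly.

```python
def blend_pixel(pixel1, pixel2, alpha):
    """Blend two same-length pixel tuples as pixel1 * (1 - alpha) + pixel2 * alpha."""
    return tuple(
        int(round(c1 * (1.0 - alpha) + c2 * alpha))
        for c1, c2 in zip(pixel1, pixel2)
    )


white_background = (255, 255, 255, 255)  # a white pixel from the word cloud
logo_colour = (0, 50, 150, 255)          # made-up RGBA logo pixel

# alpha = 0.1 keeps the result close to the first pixel (the word cloud)
print(blend_pixel(white_background, logo_colour, 0.1))
```

An `alpha` of 0 returns the first image unchanged and an `alpha` of 1 returns the second, so raising the value makes the logo more prominent at the expense of the words.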