![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=EnglishLanguageArts/WordClouds/word-clouds.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Word Clouds

We can use Python code to visualize word frequencies in text by generating [word clouds](https://en.wikipedia.org/wiki/Tag_cloud). We can even create these in custom shapes using [image masks](https://en.wikipedia.org/wiki/Mask_(computing)#Image_masks).
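
To make "word frequency" concrete, here is a minimal sketch that uses only Python's standard library. The function name `count_words` and the sample sentence are illustrative assumptions; the `wordcloud` library does its own splitting and counting internally, and its rules may differ from this simple version.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # find runs of letters and apostrophes, ignoring case and punctuation
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# a made-up sentence, used only for illustration
sample = "The dog saw the cat. The cat ran."
frequencies = count_words(sample)
# the most frequent words would be drawn largest in a word cloud
print(frequencies.most_common(2))
```

In a word cloud, the words with the highest counts are drawn in the largest fonts.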

## Creating Word Clouds

Find some online text from a site such as [Project Gutenberg](http://www.gutenberg.org) and a silhouette image from a site such as [Public Domain Vectors](https://publicdomainvectors.org/en/search/silhouette). Paste the links into `text_url` and `image_url` in the code below, then click the `►Run` button above.

In [None]:
%pip install -q pyodide_http plotly nbformat requests wordcloud
import pyodide_http
pyodide_http.patch_all()
text_url = 'http://www.gutenberg.org/files/1661/1661-0.txt'
image_url = 'https://publicdomainvectors.org/photos/sherlock-holmes.png'


# import required libraries
import requests
try:
    from wordcloud import WordCloud, STOPWORDS
except ImportError:
    # install wordcloud if it is missing, then retry the import
    %pip install -q wordcloud
    from wordcloud import WordCloud, STOPWORDS
from PIL import Image as pil_image
import numpy as np
from IPython.display import Image
# download the text
text = requests.get(text_url).text
# download the image file
r = requests.get(image_url)
r.raise_for_status()  # stop with an error if the download failed
with open('image.png', 'wb') as out_file:
    out_file.write(r.content)
# if there is a transparent background, replace it with white
image = pil_image.open('image.png').convert('RGBA')
white_background = pil_image.new('RGBA', image.size, (255,255,255))
composite = pil_image.alpha_composite(white_background, image)
image.close()
composite.save('image.png', 'PNG')
image = pil_image.open('image.png')
print('Ready to generate word cloud image.')
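
The step above that replaces a transparent background with white is ordinary alpha compositing. For a single pixel, the arithmetic can be sketched as follows. `composite_over_white` is a hypothetical helper written for illustration; Pillow's `alpha_composite` handles every pixel at once, and its internal rounding may differ slightly.

```python
def composite_over_white(pixel):
    """Blend one (red, green, blue, alpha) pixel over a white background."""
    red, green, blue, alpha = pixel
    opacity = alpha / 255  # 0.0 is fully transparent, 1.0 is fully opaque

    def blend(channel):
        # each channel is a weighted mix of the pixel's colour and white (255)
        return round(channel * opacity + 255 * (1 - opacity))

    return (blend(red), blend(green), blend(blue), 255)

print(composite_over_white((0, 0, 0, 255)))  # opaque black stays black
print(composite_over_white((0, 0, 0, 0)))    # fully transparent becomes white
```

This matters because `WordCloud` treats pure white areas of a mask as "masked out", so words are only drawn on the non-white silhouette.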

Now that everything is ready for us to generate a word cloud image, `►Run` the following code cell.

Each time you run the cell it will generate a new image. Try changing some of the [options](http://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html) in the code, then re-run it to see what happens.

In [None]:
# filter out common words
stopwords = set(STOPWORDS)
stopwords.add('said')
# set up some options
max_font_size = 50
background_color = 'white'
border_size = 0
border_color = 'yellow'
colormap = 'rainbow' #'spring','summer','autumn','winter','cool','hot','viridis','inferno','magma','cividis','Spectral','Greys'
scale = 1.5
# create the wordcloud
custom_mask = np.array(image)
wc = WordCloud(stopwords=stopwords, mask=custom_mask, max_font_size=max_font_size, background_color=background_color, contour_width=border_size, contour_color=border_color, colormap=colormap, scale=scale)
wc.generate(text)
# save the wordcloud as an image and display it
wc.to_file('wordcloud.png')
Image(filename='wordcloud.png')

To download the resulting image, you can right-click and copy or save the image.
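
To see why common words are filtered out before the cloud is drawn, here is a minimal standard-library sketch of stopword filtering. The tiny `stopwords` set and the word list are made up for illustration; the real `STOPWORDS` set is much longer, and `WordCloud` applies it internally.

```python
from collections import Counter

# a tiny, made-up stand-in for the much longer STOPWORDS set
stopwords = {'the', 'a', 'and', 'said'}

# a made-up list of words, as if already split out of a text
words = ['the', 'detective', 'said', 'the', 'game', 'and', 'the', 'detective']
# keep only the words that are not in the stopword set
kept = [word for word in words if word not in stopwords]
frequencies = Counter(kept)
print(frequencies)
```

Without the filter, `'the'` would be the most frequent word and would dominate the image.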

`►Run` the following code cell to list all of the possible [colormaps](https://matplotlib.org/tutorials/colors/colormaps.html) you can choose. To try one of these, change the line `colormap = 'rainbow'` to, for example, `colormap = 'cool'` or `colormap = 'viridis'`, then re-run the code cell above.

In [None]:
# pyplot is a submodule, so it must be imported explicitly
import matplotlib.pyplot as plt
plt.colormaps()
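
The list of colormaps is long partly because Matplotlib also registers a reversed version of each one, named with an `_r` suffix. As a rough sketch, the reversed names can be filtered out to give a shorter list to browse. The helper `without_reversed` and the sample list are assumptions for illustration; in the notebook you could pass it the full list of names instead.

```python
def without_reversed(names):
    """Return the colormap names that are not reversed ('_r') variants."""
    return sorted(name for name in names if not name.endswith('_r'))

# a short sample list standing in for the full list of colormap names
sample_names = ['viridis', 'viridis_r', 'cool', 'cool_r', 'rainbow']
print(without_reversed(sample_names))
```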

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)