<a href="https://colab.research.google.com/github/StickBrush/Excel2WordCloud/blob/main/WordCloud.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating Word Clouds from word-frequency Excel files

## Step 1: Install and load the needed libraries

- `matplotlib`: A Python library for making plots and figures. Once the word cloud has been computed, we use `matplotlib` to display and save the image.
- `pandas`: A library for working with tabular data. It can read data from many formats (CSV, Excel, databases, or even your clipboard) and transform it in all kinds of ways, such as filtering the rows that meet a condition, swapping rows and columns, or combining columns to compute new ones. It also offers plotting helpers built on top of `matplotlib`. Reading `.xlsx` files additionally requires the `openpyxl` package.
- `wordcloud`: A library that builds word clouds from the data we give it. See [the documentation](http://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html#wordcloud.WordCloud) of the library, as well as [some cool examples](http://amueller.github.io/word_cloud/auto_examples/index.html).
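
As a quick illustration of the pandas operations listed above, here is a minimal sketch on a small, made-up frequency table (the terms and counts are invented for the example). It filters rows by a condition, transposes rows and columns, and computes a new column from an existing one.

```python
import pandas as pd

# A small, made-up frequency table with the same columns this notebook uses
df = pd.DataFrame({"Term": ["data", "cloud", "python"], "Frequency": [12, 5, 9]})

# Filter: keep only the rows that meet a condition
frequent = df[df["Frequency"] > 6]

# Transpose: rows become columns and columns become rows
transposed = df.T

# New column computed from an existing one: each term's share of the total count
df["Share"] = df["Frequency"] / df["Frequency"].sum()

print(frequent)
print(transposed)
print(df)
```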

```python
!pip install matplotlib pandas wordcloud openpyxl
```

```python
import glob
import os

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud
```

## Step 2: Load the dataset

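We load every `.xlsx` file in the `data` folder with pandas and name its two columns `Term` and `Frequency`. Because `read_excel` treats the first row as a header by default, each sheet is assumed to start with a header row, which these names replace. The resulting DataFrames are stored in a dictionary keyed by file name (without the extension).
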
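```python
# Find every Excel workbook inside the "data" folder
all_files = glob.glob(os.path.join("data", "*.xlsx"))

# Map each file name (without extension) to its DataFrame, renaming the columns
all_dataframes = {
    os.path.splitext(os.path.basename(file_r))[0]: pd.read_excel(file_r, names=["Term", "Frequency"])
    for file_r in all_files
}
```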
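
Step 3 turns each DataFrame into a `{term: frequency}` dictionary with `set_index(...).to_dict()`. That works for clean sheets, but if a term appears twice only one of its values survives, and empty cells come through as missing values. A minimal sketch of a more defensive conversion follows, assuming duplicate terms should have their counts summed; the helper name `to_frequencies` is hypothetical and not part of the original notebook.

```python
import pandas as pd


def to_frequencies(df: pd.DataFrame) -> dict:
    """Hypothetical helper: turn a Term/Frequency table into {term: frequency}."""
    # Drop rows where either the term or its count is missing
    clean = df.dropna(subset=["Term", "Frequency"]).copy()
    # WordCloud draws the keys as text, so make sure every term is a string
    clean["Term"] = clean["Term"].astype(str).str.strip()
    # Sum the counts of terms that appear more than once
    totals = clean.groupby("Term")["Frequency"].sum()
    # Word sizes are relative to the largest count, so drop zero or negative counts
    totals = totals[totals > 0]
    return {term: float(freq) for term, freq in totals.items()}


# Example with a made-up table containing a duplicate term and an empty cell
sample = pd.DataFrame({"Term": ["data", "cloud", "data", None], "Frequency": [3, 2, 4, 7]})
print(to_frequencies(sample))  # {'cloud': 2.0, 'data': 7.0}
```

In the Step 3 loop, `wordcloud_data = to_frequencies(wordcloud_df)` could replace the `set_index` line.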

## Step 3: Create and save a word cloud for each file

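Each word cloud is rendered with the Calibri font at 1280×720 (720p) and saved as a PNG image in the `plots` folder, named after its Excel file. Calibri is a Microsoft font and does not ship with `wordcloud`, so a `calibri.ttf` file has to be available, for example uploaded next to the notebook.
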
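```python
# Make sure the output folder exists, otherwise savefig fails
os.makedirs("plots", exist_ok=True)

for og_filename, wordcloud_df in all_dataframes.items():
    # Turn the two columns into a {term: frequency} dictionary
    wordcloud_data = wordcloud_df.set_index("Term")["Frequency"].to_dict()

    # Render the word cloud at 1280x720 with a white background and the Calibri font
    wordcloud = WordCloud(
        width=1280, height=720, background_color="white", font_path="calibri.ttf"
    ).generate_from_frequencies(wordcloud_data)

    # Draw it in a 16:9 figure without axes or padding
    plt.figure(figsize=(16, 9))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.tight_layout(pad=0)

    # Save the figure as plots/<original file name>.png
    out_file = os.path.join("plots", og_filename + ".png")
    plt.savefig(out_file, format="png", bbox_inches="tight")

    print(og_filename)
    plt.show()
```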

## Step 4: Compress the plots into a ZIP file

```python
!zip -r plots.zip plots
```
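
The `!zip` command runs the system `zip` tool, which is available on Colab but not necessarily on other machines (Windows in particular). A minimal pure-Python alternative using the standard library's `shutil.make_archive` is sketched below; the helper name `zip_folder` is hypothetical.

```python
import os
import shutil


def zip_folder(folder="plots"):
    """Hypothetical helper: compress `folder` into `<folder>.zip` next to it."""
    parent = os.path.dirname(os.path.abspath(folder))
    name = os.path.basename(os.path.normpath(folder))
    # root_dir is the folder's parent and base_dir its name, so the archive
    # entries keep the "plots/" prefix, like `zip -r plots.zip plots`
    return shutil.make_archive(os.path.join(parent, name), "zip", root_dir=parent, base_dir=name)
```

Calling `zip_folder("plots")` writes `plots.zip` next to the `plots` folder and returns the archive's path.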