# EUSIPCO 2021 Papers Wordcloud

References for word clouds:
- https://towardsdatascience.com/simple-wordcloud-in-python-2ae54a9f58e5
- https://pypi.org/project/wordcloud/

## Install needed packages

In [None]:
# !pip install pandas
# !pip install wordcloud
# !pip install numpy
# !pip install matplotlib
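Before running the rest of the notebook, it may help to check which of these packages are already installed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is hypothetical. It reports each package's version, or `None` if the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment
            result[name] = None
    return result


print(installed_versions(['pandas', 'wordcloud', 'numpy', 'matplotlib']))
```

Any package that maps to `None` can be installed by uncommenting the matching `pip` line above.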

## Read the file with paper names

To load the paper information, we use the file `conference_content.txt`. I created this file by going to https://eusipco2021.org/conference-programme/ and manually copying the whole programme into a plain-text file.

In `conference_content.txt`, paper names can be told apart from the other information because each paper name always appears alone on its own line, written entirely in upper-case letters.

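The `get_paper_names` function below performs this filtering. It also skips the upper-case session labels `Q&A` and `3MT`, and removes a leading token that starts with `#` from a title line.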
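As a minimal sketch of the per-line test, the hypothetical helper `looks_like_title` keeps only letters and spaces and then checks that the result is upper case, as `get_paper_names` does. The sample lines are made up for illustration and are not taken from the real programme. Note that `Q&A` also passes the test, because stripping the `&` leaves `QA`. This is why the function has to skip such labels explicitly.

```python
def looks_like_title(line):
    """Hypothetical helper: True if the letters in `line` are all upper case."""
    # Keep only letters and spaces before testing the case
    letters = ''.join(c for c in line.strip() if c == ' ' or c.isalpha())
    # str.isupper() is False when there are no cased characters at all
    return letters.isupper()


# Made-up lines in the style of a copied programme (not real EUSIPCO data)
sample = [
    'DEEP LEARNING FOR SPEECH ENHANCEMENT',
    'Chair: Jane Doe',
    '10:30 - 12:00',
    'Q&A',
]
for line in sample:
    print(looks_like_title(line), line)
```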

In [None]:
import pandas as pd
import numpy as np

In [None]:
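def get_paper_names(file):
    """Return the paper titles found in the copied conference programme.

    A line is treated as a title when all of its letters are upper case.
    """
    with open(file, 'r', encoding='utf-8') as f:
        content = f.readlines()

    names = []

    for line in content:
        line = line.strip()
        # Upper-case session labels that are not paper titles
        if line in ('Q&A', '3MT'):
            continue
        # Keep only letters and spaces before testing the case
        letters = ''.join(c for c in line if c == ' ' or c.isalpha())
        if letters.isupper():
            # Drop a leading token that starts with '#'
            if line.startswith('#'):
                line = ' '.join(line.split(' ')[1:])
            names.append(line.strip())

    return names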

In [None]:
papers = get_paper_names('./conference_content.txt')

In [None]:
len(papers)

In [None]:
# Show 10 distinct titles picked at random (replace=False avoids repeats)
print(*np.random.choice(papers, 10, replace=False), sep='\n\n')

## Generate wordcloud

In [None]:
import matplotlib.pyplot as plt

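def plot_cloud(wordcloud, save=True):
    """Display a WordCloud image and optionally save it to ./wordcloud.jpg."""
    f = plt.figure(figsize=(40, 30))
    plt.imshow(wordcloud)
    plt.axis('off')  # hide the axes, ticks and frame
    if save:
        f.savefig('./wordcloud.jpg')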

In [None]:
from wordcloud import WordCloud, STOPWORDS

# 'based' and 'using' appear in many titles, so we add them
# to a copy of the default stopwords
stopwords = set(STOPWORDS) | {'based', 'using'}

# Join all titles into a single lower-case string
text = '\n'.join(papers).lower()

wordcloud = WordCloud(width=3000, height=2000, random_state=1,
                      background_color='salmon', colormap='Pastel1',
                      collocations=False, stopwords=stopwords).generate(text)
# Plot and save to ./wordcloud.jpg
plot_cloud(wordcloud)
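To decide which frequent but uninformative words, like `based` and `using`, belong in the stopword list, it can help to look at the raw word counts. `WordCloud` has its own `process_text` method that returns the counts it actually uses. The sketch below is a rougher standard-library stand-in. Its tokenisation (lower-case words matching `\w[\w']*`, with digit-only tokens dropped) is an approximation and not WordCloud's exact algorithm. The helper name `top_words` and the sample titles are hypothetical. In the notebook, one would pass `papers` and the same stopword set given to `WordCloud`.

```python
import re
from collections import Counter

# Approximate word pattern (an assumption, not WordCloud's exact tokeniser)
TOKEN_PATTERN = re.compile(r"\w[\w']*")


def top_words(titles, stopwords, n=10):
    """Hypothetical helper: return the n most frequent non-stopword tokens."""
    counts = Counter()
    for title in titles:
        for token in TOKEN_PATTERN.findall(title.lower()):
            if token not in stopwords and not token.isdigit():
                counts[token] += 1
    return counts.most_common(n)


# Made-up titles for illustration only
titles = [
    'SPEECH ENHANCEMENT USING DEEP LEARNING',
    'GRAPH SIGNAL PROCESSING BASED ON DEEP NETWORKS',
    'DEEP LEARNING FOR RADAR',
]
print(top_words(titles, stopwords={'on', 'for', 'using', 'based'}, n=3))
```

Words that top this list without saying much about a paper's topic are good candidates for extra stopwords.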