# Midterm

Authors: Cassie Corey, Jay Zou

Tasks:
1. Read in (parse, tokenize, ...) the text (30 points)
2. Visualize the text using different interactive Bokeh visualizations (10 points for each different type of interactive visualization, max 30).
3. Cluster the text and visualize using interactive Bokeh visualizations (10 points for each different type of interactive visualization, max 30).
4. Explain what you've seen (10 points)

Sections:

- [Gathering Data](#Gathering-Data)
- [Preliminary Analysis](#Preliminary-Analysis)
- [Wordcount and Complexity vs. Swear Ratio](#WordCount-and-Complexity/Swear-Ratio-Analysis)
- [Highest Scoring Songs](#Highest-Scoring-Songs)
- [Band Names and Complexity](#Band-Names-and-Complexity)
- [TFIDF](#TFIDF)
- [Cosine Distance](#Cosine-Distance)
- [Lyric Generation](#Lyric-Generation)
- [Summary](#Summary)

### Introduction

Our text mining and visualizations are based on the [Heavy Metal Text Mining](https://paulvanderlaken.com/2017/09/27/text-mining-pythonic-heavy-metal/) example. This example looks at multiple characteristics of the lyrics, some of which include: TFIDF, cosine distances between word distributions, emotional arcs, swearwords, and lyric generation. The characteristics were visualized using various scatter plots, graphs, and trees. A brief explanation of the algorithms used and the output observed is given separately for each visualization/technique.

### Required Libraries

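These can all be installed with `pip install <package>` (the pip package name is given in parentheses where it differs from the library name):
- BeautifulSoup (`beautifulsoup4`)
- Sklearn (`scikit-learn`)
- Textstat
- Markovify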

# Gathering Data

[Back to top](#Midterm)

Since this example does not provide a dataset of lyrics, we collected lyrics ourselves by scraping [MetroLyrics](https://www.metrolyrics.com).

First we chose a music genre: Tech Death Metal. We got our list of bands from [Wikipedia's list of Technical Death Metal Bands](https://en.wikipedia.org/wiki/List_of_technical_death_metal_bands).

In [None]:
import requests
from bs4 import BeautifulSoup

WIKI_URL = "https://en.wikipedia.org/wiki/List_of_technical_death_metal_bands"

req = requests.get(WIKI_URL)
soup = BeautifulSoup(req.content, 'lxml')
table_cells = soup.findAll("td")

artists = []
for cell in table_cells:
    link = cell.find('a',href=True)
    if link is not None:
        if '[' not in link.text:
            artists.append(link.text.replace('(band)','').strip())

# It'll be convenient to have a lowercase version for URLs and indexing.
artists_L = [a.lower() for a in artists]

The next cell contains some useful methods that we'll need for getting urls and lyrics from urls.

In [None]:
from bs4 import BeautifulSoup
from time import sleep, time
import random, requests

BASE_URL = "http://www.metrolyrics.com/"

def get_song_urls(artists):
    art_song_dict = {}
    for artist in artists:
        url = BASE_URL + artist.replace(' ','-') + "-lyrics.html"
        sleep(random.randint(0,10))
        response = requests.get(url)
        if response.status_code != 404: # Not all artists might be on MetroLyrics
            soup = BeautifulSoup(response.content, 'lxml')
            links = [a['href'] for a in soup.find_all('a',href=True)]
            song_list = []
            for link in links:
                if "lyrics-" + artist.replace(' ','-') in link:
                    song_list.append(link)
            art_song_dict[artist] = song_list
    return art_song_dict

def get_lyrics(song_url):
    sleep(random.randint(0,10))
    response = requests.get(song_url)
    soup = BeautifulSoup(response.content, 'lxml')
    verses = soup.find_all("p",{"class":"verse"})
    lyrics = ''
    for verse in verses:
        lyrics += verse.text + ' '
    return lyrics

def song_from_url(song_url):
    # Drop the site prefix, keep the part before 'lyrics', and turn dashes into spaces
    return song_url[len(BASE_URL):].split('lyrics')[0].replace('-',' ').strip()

_WARNING:_ THE FOLLOWING CELL MAY TAKE UP TO __5 MINUTES__ TO RUN

This cell fetches urls for songs from each artist. We then use these urls to fetch the lyrics for each song.

In [None]:
t0 = time()
print('Fetching song urls...',end='')
art_song_dict = get_song_urls(artists_L)
print('Done in {:.2f}s'.format(time()-t0))

In [None]:
import pandas as pd

# Initialize a dataframe to hold the lyrics
lyrics_df = pd.DataFrame(columns=['artist','song','lyrics'])

_WARNING:_ THE FOLLOWING CELL WILL RUN FOR A __REALLY LONG TIME__, like until the network connection times out.

This cell fetches lyrics from the song URLs. It is OK to interrupt this cell at any time if you have other business to do. As long as you save your work in the cell that follows, you can come back to this cell and it won't waste time on lyrics it has already gathered.

That said, you do still run the risk of interrupting it while it's in the middle of writing lyrics for a song. So you may get some partially complete lyrics. But you can manually check that if you're really concerned.

In [None]:
t0 = time()
for artist in art_song_dict:
    print("Fetching lyrics for: ",artist)
    for song_url in art_song_dict[artist]:
        song = song_from_url(song_url)
        if song not in lyrics_df.song.values:
            lyrics = get_lyrics(song_url)
            # DataFrame.append was removed in pandas 2.0, so add the row in place.
            # Values follow the column order: artist, song, lyrics.
            lyrics_df.loc[len(lyrics_df)] = [artist, song, lyrics]
print('Done in {:.2f}s'.format(time()-t0))

In [None]:
# Save the lyrics data.
lyrics_df.to_csv('lyrics_line.csv',index=False)

There were some lyrics that weren't available on MetroLyrics. The following is an attempt to get the missing lyrics from another site: Genius.com. It mostly doesn't work.

In [None]:
def get_genius_lyrics(song,artist):
    url = "http://genius.com/{}-{}-lyrics".format(artist.replace(' ','-'),song.replace(' ','-'))
    print(url)
    response = requests.get(url)
    if response.status_code != 404:
        soup = BeautifulSoup(response.content,'lxml')
        lyrics = soup.find("div",{"class":"lyrics"})
        # The page may not contain the expected div/p; treat that as "no lyrics"
        paragraph = lyrics.find("p") if lyrics is not None else None
        if paragraph is not None:
            print(paragraph.text)
            return paragraph.text
    return ''

In [None]:
missing_songs = lyrics_df[lyrics_df.lyrics==''].song
print("{} songs missing!".format(len(missing_songs)))

t0 = time()
# Iterate over the DataFrame's own index labels so each song stays paired
# with its artist and the lyrics are written back to the right row.
for idx,song in missing_songs.items():
    artist = lyrics_df.at[idx,'artist']
    lyrics = get_genius_lyrics(song,artist)
    if lyrics != '':
        lyrics_df.at[idx,'lyrics'] = lyrics
print("Done in {:.2f}s".format(time()-t0))

# Save our work.
lyrics_df.to_csv('lyrics.csv',index=False)

## Gathering Swear Words and Calculating Complexity

The original author did not provide a full dataset of lyrics (or even the code to scrape it). He did, however, provide a list of naughty words. So, we used that to explore naughty words in the lyrics we collected.

He compared lyric complexity to the number of swearwords used and found a positive correlation. We do the same experiment below.

In [None]:
# Read in the swear words
with open('swear_words.txt','r') as f:
    swear_words = f.read().splitlines()

The original author used the SMOG measure of complexity. It estimates the reading grade level of text. However, this calculation relies on counting the number of sentences in a piece of text. Since lyrics are structured a bit differently from normal text, we might need to try a few different ways of dealing with the lack of punctuation.
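To see why the sentence count matters, here is a minimal sketch of the published SMOG formula, grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291. The helper name `smog_grade` and its arguments are ours, purely for illustration; textstat's own implementation may differ in how it counts syllables and sentences.

```python
import math

def smog_grade(polysyllable_count, sentence_count):
    """SMOG grade from raw counts (hypothetical helper, not textstat's API)."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291

# The same 30 polysyllabic words, read as 30 sentences vs. one long "sentence"
punctuated = smog_grade(30, 30)
unpunctuated = smog_grade(30, 1)
print(round(punctuated, 2), round(unpunctuated, 2))
```

With the word count held fixed, fewer detected sentences push the grade up sharply, which is why unpunctuated lyrics can produce inflated scores.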

In [None]:
from textstat.textstat import textstat

# From pythonic-metal github
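def count_swear_word_ratio(text):
    counter = 0
    for swear_word in swear_words:
        # str.count matches substrings, so longer words containing a swear word count too
        counter += text.count(swear_word)
    number_of_words = textstat.lexicon_count(text)
    if number_of_words == 0:
        return 0  # no words (e.g. empty lyrics): avoid dividing by zero
    return counter/number_of_words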

In [None]:
lyrics_df['swear_words_ratio'] = 0.0
lyrics_df['complexity'] = 0.0

# Iterate with the DataFrame's own index labels so each score lands on the
# row it was computed from, even when rows with missing values are skipped.
for i,lyrics in lyrics_df.dropna().lyrics.items():
    if len(lyrics)>0:
        # Calculate complexity
        lyrics_df.at[i,'complexity'] = textstat.smog_index(lyrics)
        # Calculate swear words ratio
        lyrics_df.at[i,'swear_words_ratio'] = count_swear_word_ratio(lyrics)

lyrics_df.sample(5)

In [None]:
# Save it if you want
lyrics_df.to_csv("lyrics_complexity_swear_words.csv",index=False)

# Preliminary Analysis

[Back to top](#Midterm)

Out of the scraped lyrics data, we ended up with approximately 420 usable data points, since a significant number of entries had a complexity score of zero. This was mostly due to instrumental songs being scraped or to missing lyrics. After cleaning up the data, we began to explore its features.

In [1]:
import pandas as pd
dfd = pd.read_csv("cleaned_lyrics_data.csv")

# WordCount and Complexity/Swear Ratio Analysis

[Back to top](#Midterm)

The first subject of interest is how frequently each word occurs across all collected lyrics. To get a meaningful count of words specific to technical death metal, we had to exclude commonly occurring stop words. We did this using the stopwords list from https://algs4.cs.princeton.edu/35applications/stopwords.txt, plus a few tokens of our own that kept appearing and were clearly not informative. We then plotted both the relationship between our complexity metric and the swear words ratio and the top 10 most commonly occurring words to get an initial look at our dataset.

In [2]:
import operator
# Use a set so the membership test in word_count() is fast
with open("stopwords.txt") as stopwords_file:
    stopwords = {word.strip() for word in stopwords_file}

# A few extra uninformative tokens that kept showing up
stopwords.update(['i', '', '-'])

def word_count():
    wordcount = {}
    for lyrics in dfd["lyrics"]:
        parsed = str(lyrics).split(" ")
        for word in parsed:
            if word.lower() not in stopwords:
                count = wordcount.get(word.lower(), 0)
                wordcount[word.lower()] = count + 1
    return wordcount

wordcount = sorted(word_count().items(), key=operator.itemgetter(1), reverse = True)
wcx, wcy = zip(*wordcount[:10])

Here we make an interactive graph of our two preliminary explorations.

In [3]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.io import push_notebook
from bokeh.palettes import Spectral10, RdYlBu10, PiYG10
from ipywidgets import interact
output_notebook()

s = figure(plot_height = 800, plot_width = 800, title = "Complexity vs. Swear Words Ratio")
s.circle("complexity", "swear_words_ratio", source = dfd)
s.xaxis.axis_label = "Lyrical Complexity"
s.yaxis.axis_label = "Swear Words Ratio"

v = figure(plot_height = 800, plot_width = 800, x_range = list(wcx), title = "Top 10 Words by Appearance")
v.vbar(wcx, 0.5, wcy, color = RdYlBu10)

def update1(Graph):
    if Graph == "Complexity vs. Swear Words":
        show(s, notebook_handle = True)
    if Graph == "Most Common Words":
        show(v, notebook_handle = True)        
    push_notebook()

interact(update1, Graph=['Complexity vs. Swear Words', 'Most Common Words'])

## Visualization 1 Analysis

Based on our plots, the data looks fairly randomly distributed, and it is difficult to make a statement directly relating lyrical complexity to the ratio of swear words. There seems to be a very slight positive correlation between complexity and swear words at the far end of the data set, but the data there is also much sparser and more variable. Within the central cluster of the data set, there is no clear pattern between swear words ratio and lyrical complexity.

Out of the top 10 most commonly occurring words, there are no swear words. Most words follow themes of death and mortality, which we have colorfully illustrated with a bright color palette.

# Highest Scoring Songs

[Back to top](#Midterm)

We were particularly interested in whether there was any overlap between songs with high complexity scores and songs with a high swear ratio. We therefore plotted the top 10 most lyrically complex songs and the top 10 highest-swearing songs together in an interactive plot.

In [4]:
top10comp = dfd.nlargest(10, "complexity")
compx = top10comp["song"]
compy = top10comp["complexity"]

top10swears = dfd.nlargest(10, "swear_words_ratio")
swearsx = top10swears["song"]
swearsy = top10swears["swear_words_ratio"]

t = figure(plot_height = 800, plot_width = 1500, x_range = list(top10comp["song"]), title = "Top 10 Complexity Scores by Song Name")
pbars = t.vbar(compx, 0.5, compy, color = Spectral10)
q = figure(plot_height = 800, plot_width = 1500, x_range = list(top10swears["song"]), title = "Top 10 Swear Ratios Scores by Song Name")
qbars = q.vbar(swearsx, 0.5, swearsy, color = PiYG10)


def update2(Graph):
    if Graph == "Complexity":
        show(t, notebook_handle = True)
    if Graph == "Swear Words Ratio":
        show(q, notebook_handle = True)        
    push_notebook()

interact(update2, Graph=['Complexity', 'Swear Words Ratio'])

## Visualization 2 Analysis

Surprisingly, or unsurprisingly, the songs that scored higher on the complexity metric had much tamer, more thoughtful titles than the songs that scored higher on the swear words ratio metric. Lyrically complex songs tended to be more poetically titled, with a more philosophical predilection ("The Resonant Frequency of Flesh", for example). Songs with a high proportion of swear words were titled much more aggressively and carried a darker tone ("Scum Fuck the Weak", for example).

# Band Names and Complexity

[Back to top](#Midterm)

The last visualization we created attempted to observe the distribution in complexity between artists, which meant averaging the complexity scores of all their songs contained in the data set.

In [5]:
import numpy as np
from bokeh.models import HoverTool

# Average complexity per artist: a true mean over all of the artist's songs.
# (A running (avg + x)/2 update would weight later songs more heavily.)
dfn = (dfd.groupby('artist', sort=False)['complexity']
          .mean()
          .reset_index()
          .rename(columns={'artist': 'bandname'}))
dfn['index'] = np.arange(len(dfn.index))

r = figure(plot_height= 800, plot_width = 800, tools = ["hover"], title = "Avg Complexity by Band (Hover for Details)")
r.circle('index', "complexity", size = 20, source = dfn, color = "aquamarine")

r.select_one(HoverTool).tooltips = [
    ('Band Name', '@bandname'),
    ('Complexity', '@complexity')
]

# Average swear words ratio per artist: a true mean over all of the artist's songs.
dfm = (dfd.groupby('artist', sort=False)['swear_words_ratio']
          .mean()
          .reset_index()
          .rename(columns={'artist': 'bandname'}))
dfm['index'] = np.arange(len(dfm.index))

l = figure(plot_height= 800, plot_width = 800, tools = ["hover"], title = "Avg Swearing by Band (Hover for Details)")
l.circle('index', "swear_words_ratio", size = 20, source = dfm, color = "Navy")

l.select_one(HoverTool).tooltips = [
    ('Band Name', '@bandname'),
    ('Swear Words Ratio', '@swear_words_ratio')
]

def update3(Graph):
    if Graph == "Average Complexity By Band Name":
        show(r, notebook_handle = True)
    if Graph == "Average Swear Ratio By Band Name":
        show(l, notebook_handle = True)        
    push_notebook()

interact(update3, Graph=['Average Complexity By Band Name', 'Average Swear Ratio By Band Name'])

## Visualization 3 Analysis

From this scatter plot, we can see that bands with more literary, poetic names such as "Aeon", "Extol", and "Arsis" have relatively high lyrical complexity compared to most bands with aggressive, less poetic names such as "Dying Fetus", "Decapitated", "Monstrosity", and "Aborted". Other bands with wordy, poetic names, such as "Obscura" and "Nocturnus", have average complexity scores similar to these bands. The only band in this group with a non-poetic name is "Death", which is perhaps less aggressive than something like "Aborted". However, a high average lyrical complexity does not imply a low average swearing ratio: "Extol" has the highest swearing ratio despite also having one of the highest average complexity scores.

# TFIDF

[Back to top](#Midterm)

Term frequency inverse document frequency (TFIDF) is a good way to visualize which words are the most descriptive of a certain corpus. We can use it to get an idea of the most descriptive words in the genre as a whole. It can also be used to distinguish between bands or distinguish which songs are the most descriptive of a band.

TFIDF treats text as a Bag of Words which means that order doesn't matter and punctuation is ignored. This is good for lyrics because punctuation is sort of a free-for-all. There may be a lot of incomplete sentences or repeated words.
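As a rough illustration of the idea, the sketch below computes a plain `tf * log(N / df)` weight in pure Python on three made-up lines (the toy text and the function name `tfidf` are ours, not part of the dataset or of sklearn). Note that sklearn's `TfidfVectorizer`, used in the next cell, applies a smoothed idf and normalises each row by default, so its numbers will differ from these.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({w: (c / len(tokens)) * math.log(n_docs / df[w])
                        for w, c in tf.items()})
    return weights

# Made-up toy corpus: "death" appears in every document
toy_lyrics = [
    "blood death blood",
    "death silence",
    "death stars silence",
]
scores = tfidf(toy_lyrics)
print(scores[0])
```

A word that appears in every document ("death" here) gets a weight of zero, while a word concentrated in one document ("blood") scores highest there. Word order plays no role, which is the Bag of Words property described above.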

In [6]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

lyrics_df = pd.read_csv('cleaned_lyrics_data.csv')

tfidf_vectorizer = TfidfVectorizer(stop_words='english',max_df=0.7)

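def normalise(vec):
    # Scale to unit length: divide by the L2 norm, not the squared norm.
    # Cosine distance ignores scale, so the clustering below is unaffected.
    return vec / np.linalg.norm(vec)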

def combine_vectors(vectors):
    return normalise(np.sum(vectors,axis=0))

lyrics_df.dropna(inplace=True)
lyrics_df["vectors_unnormalised"] = list(tfidf_vectorizer.fit_transform(lyrics_df.lyrics.values).toarray())
lyrics_df["vectors"] = lyrics_df.vectors_unnormalised.apply(normalise)

band_vectors = (
    lyrics_df
    .groupby("artist")
    .vectors
    .apply(combine_vectors)
)

In [7]:
%matplotlib notebook

import matplotlib.pyplot as plt

from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

Z = linkage(np.stack(list(band_vectors.values)), method='complete', metric="cosine")

n_clusters = fcluster(Z, 0.57, criterion='distance')

plt.figure()
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
    Z,
    leaf_rotation=90.,  # rotates the x axis labels
    labels=band_vectors.index.values
)

plt.title("Clustering Metal Lyrics")
plt.xticks(rotation=90)

plt.show()

In [8]:
plt.close()

## Visualization 4 Analysis

This visualization borrows code directly from the example we based our visualizations off of. In the cells below we recreate a similar visualization using Bokeh. Both visualizations show similarity between bands. A more detailed explanation follows our Bokeh code.

# Cosine Distance

[Back to top](#Midterm)

This measure was used to recognize band similarity and how representative certain songs were for a band. It also allowed the different bands to be clustered. We borrowed some code from the original author's notebook on [GitHub](https://github.com/ijmbarr/pythonic-metal/blob/master/pythonic-metal-part-1-counting.ipynb).

Cosine distance is computed on the term frequency inverse document frequency (TFIDF) vectors built from each band's combined lyrics. Those vectors capture which words are the most descriptive of a given band.

The next cell draws a heatmap of the cosine distance between bands. The closer the cosine distance is to 1, the more different two bands are in terms of word frequencies. The closer it is to 0, the more similar the two bands are.
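For reference, cosine distance is one minus the cosine of the angle between two vectors. The sketch below is a minimal pure-Python version for illustration only (the function name `cosine_distance` and the toy vectors are ours); the notebook itself uses `scipy.spatial.distance.cosine`.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1 - dot / (norm_u * norm_v)

# Same direction, different length: distance is (numerically) 0
same = cosine_distance([1, 2, 0], [2, 4, 0])
# No shared non-zero terms: distance is 1
disjoint = cosine_distance([1, 0, 0], [0, 3, 0])
print(same, disjoint)
```

Because only the direction of the vectors matters, a band with many songs and a band with few songs can still be compared fairly. For non-negative vectors such as TFIDF weights, the distance always falls between 0 and 1.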

In [9]:
import pandas as pd

from scipy.spatial.distance import cosine

from bokeh.io import show
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LinearColorMapper,
    BasicTicker,
    PrintfTickFormatter,
    ColorBar
)

from bokeh.plotting import figure

# Build the cosine distance matrix
cos_df = pd.DataFrame(columns=band_vectors.index.values,
                      index=band_vectors.index.values)
for i in band_vectors.index.values:
    for j in band_vectors.index.values:
        cos_df.at[i,j] = cosine(band_vectors[i],band_vectors[j])

cos_df.index.name='BandA'
cos_df.columns.name='BandB'
bandsA = list(cos_df.index)
bandsB = list(cos_df.columns)
        
# Stack it because bokeh sucks for heatmaps
df = pd.DataFrame(cos_df.stack(),columns=['cos']).reset_index()
        
mapper = LinearColorMapper(palette='Spectral10',low=0,high=1)

source = ColumnDataSource(df)

TOOLS = "hover,save,pan,box_zoom,reset,wheel_zoom"
p = figure(title="Cosine Distance Between Artists",
           x_range=bandsA,y_range=bandsB,
           tools=TOOLS, toolbar_location='above')

p.grid.grid_line_color=None
p.axis.axis_line_color=None
p.axis.major_tick_line_color=None
p.axis.major_label_text_font_size='5pt'
p.axis.major_label_standoff=0
p.xaxis.major_label_orientation=45

p.rect(x="BandA",y="BandB",width=1,height=1,
       source=source,
       fill_color={'field':'cos','transform':mapper},
       line_color=None)

color_bar = ColorBar(color_mapper=mapper,major_label_text_font_size='5pt',
                     border_line_color=None, location=(0,0))
p.add_layout(color_bar, 'right')
p.select_one(HoverTool).tooltips = [
    ('pair','@BandA and @BandB'),
    ('cos','@cos')
]

show(p, notebook_handle=True)

## Visualization 5 Analysis

The purple line running diagonally up the heatmap illustrates that each band is perfectly similar to itself (cosine distance = 0). A few of the bands, such as Obscura and Aeon, stand out for being very different from all other bands. The cosine distances between these two bands and all others are always very close to 1.

Below is a sample of Obscura lyrics.

In [10]:
lyrics_df[lyrics_df.artist=='obscura'].lyrics

209    The Sermon of the Seven Suns A funeral of worl...
210    As I walk through time and space Nourished fro...
211    What sudden blaze of majesty Is that which we ...
212    A crown, created with divine will An inﬁnite l...
Name: lyrics, dtype: object

For Aeon, this is likely due to the fact that we were only able to gather one sample of real lyrics from our parsing.

In [11]:
lyrics_df[lyrics_df.artist=='aeon'].lyrics

421     I love you Satan, my father, my pride You sho...
422                       [Music: Z. Nilsson, D. Dlimi] 
Name: lyrics, dtype: object

Becoming the Archetype, Cephalic Carnage, and As They Sleep also stand out in this heatmap for being very similar. Becoming the Archetype and As They Sleep have a cosine distance of only 0.114. Unfortunately this seems to be due to the fact that our method for gathering lyrics wasn't perfect. Neither of these bands has many lyrics to begin with. The only lyrics for As They Sleep are "INSTRUMENTAL" which is also half of the lyrics for Becoming the Archetype.

In [12]:
lyrics_df[lyrics_df.artist=='becoming the archetype'].lyrics

270    There was a time When we all sang the song of ...
271    Deep within the ocean's keep* There lies a cor...
274                                        INSTRUMENTAL 
277                                      (Instrumental) 
278    It hurts to see you live you life revolving ar...
279                                      (Instrumental) 
280                                      (Instrumental) 
288    I bear these scars A constant reminder of my o...
Name: lyrics, dtype: object

In [13]:
lyrics_df[lyrics_df.artist=='as they sleep'].lyrics

46    INSTRUMENTAL 
Name: lyrics, dtype: object

# Lyric Generation

[Back to top](#Midterm)

The original author built his own Markov chain class to generate lyrics. We're just going to use the Markovify library by [jsvine](https://github.com/jsvine/markovify).

Markov models work by essentially calculating the likelihood of transitioning between every pair of words in a corpus. They are able to generate sentences by starting with a (sometimes random) seed word and using the previously calculated likelihoods to select the next word in the sentence. The generation stops when the output is the desired length.
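The idea can be sketched in a few lines of plain Python. This is a toy word-level chain on a made-up sentence, not how Markovify is implemented (Markovify's own model differs in details, for example it tracks sentence boundaries); the names `build_chain` and `generate` are ours.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain

def generate(chain, seed_word, length, rng):
    """Walk the chain from seed_word, stopping at `length` words or a dead end."""
    output = [seed_word]
    while len(output) < length:
        followers = chain.get(output[-1])
        if not followers:
            break
        # Repeated followers appear more often in the list,
        # so choice() samples them in proportion to their frequency.
        output.append(rng.choice(followers))
    return ' '.join(output)

toy_text = "the void consumes the light and the void remains"
chain = build_chain(toy_text)
line = generate(chain, "the", 6, random.Random(0))
print(line)
```

Every pair of adjacent words in the generated line occurs somewhere in the training text, which is why the output tends to sound locally plausible even when the line as a whole makes no sense.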

We trained a separate Markov chain for each artist. The goal is that someone familiar with these bands could potentially guess which band's Markov model generated which lyrics.

It's not a Bokeh visualization but it is interactive and it's fun to play with.

In [14]:
import pandas as pd

import markovify

import ipywidgets as widgets
from ipywidgets import interact

lyrics_df = pd.read_csv('lyrics_line.csv')
lyrics_df.dropna(inplace=True)

markov_models = {}

# build mini markov chains for each band
for band in set(lyrics_df.artist.values):
    text = ' '.join(lyrics_df.loc[lyrics_df.artist==band].lyrics.tolist())
    text = text.replace('\n','. ')
    model = markovify.Text(text)
    markov_models[band] = model
    
# n is number of sentences, k is number of characters
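def output_lyrics(band,n=20,k=75):
    model = markov_models[band]
    # make_short_sentence returns None when it cannot build a sentence
    # within k characters, so drop those before printing.
    sentences = [model.make_short_sentence(k) for _ in range(n)]
    sentences = [s for s in sentences if s is not None]
    if sentences:
        print('\n'.join(sentences))
    else:
        print('Technical difficulties...try adjusting the sliders or selecting a different band :)')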
    
interact(output_lyrics, band=sorted(set(lyrics_df.artist.values)), n=(3,40),k=(50,140))

# Summary

[Back to top](#Midterm)

We did our analysis on a dataset of tech death metal lyrics gathered from various websites. Because we gathered it ourselves, the dataset needed to be cleaned. We mainly used Bokeh and sklearn to visualize different properties of the lyrics and how they differ for each tech death artist. These visualizations required calculating several metrics on the dataset, including complexity, TFIDF, and others. Through our examinations we found that there is a lot of variety even in one small sub-genre of metal: lyric complexity and the use of swear words both vary widely. In addition, most of the artists' TFIDF vectors are very far from those of other artists, meaning each really does have its own unique style.

Our results were not exactly the same as the example we based our work on. However, we believe this may be due in part to the quality of the data that we were able to gather.