# Color Analysis of Bandcamp Album Cover Art

## Introduction

The internet has allowed any artist to share their work with the world and, in many cases, to be paid for it. This is a largely recent development, driven by the creation of online marketplaces and patronage services. The defining marketplace for the indie music scene on the internet is [Bandcamp](https://www.bandcamp.com). Although it is difficult to find an estimate of how large this marketplace actually is, Bandcamp does report how much money has been spent on its platform. According to its home page:
> Fans have paid artists \$468 million using Bandcamp, and \$9.5 million in the last 30 days alone.

According to Google, there are ~20 million indexed pages under the Bandcamp domain, a figure that seems to correspond roughly with the number of artists on the marketplace. These artists are spread across 10 "genres", but they are free to attach their own genre tags to their music. For the purposes of this discussion, we will not distinguish between "genre" and "subgenre" and will instead use the term "tag" to describe a group of albums. This investigation looks at how indie musicians on Bandcamp use the colors of their album covers to signal their tags.

To do this, we borrow tools usually developed for natural language processing. Each album cover is decomposed into its most prominent colors, and "topics" are then extracted from those sets of colors. The topics are generated through Latent Dirichlet Allocation (LDA), which, applied to text, extracts topics from words that tend to appear together in the same documents. In our case, we adapt that process so that colors play the role of words. Because the topics are generated in an unsupervised manner, each topic is displayed only as a set of colors and its interpretation is left to a human. Whatever we name these color-topics, we can still see how each individual album cover is composed of them.

Because this discussion centers on a cultural pattern, it is important to be explicit about what we mean by "culture". For the purposes of this discussion, we'll use a very simple definition of the word: culture, in this context, is the set of conventions for producing signs that are shared by people within a group. In our case, those conventions are the common features and qualia of album covers, and the groups are the artists and fans who listen to music within a tag. We will also consider this phenomenon within a specific framework for the metaphysics of signs (semiotics) put forth by Charles Sanders Peirce. Within this framework, a sign is a creature of three parts: the sign-vehicle, the thing that carries information; the object, the thing about which the sign communicates; and the interpretant, the message signified by the sign. In our case, we are trying to isolate sign-vehicles and interpretants, where the object of these signs is firmly the music. The interpretants form statements along the lines of "this music has a certain quality"; in the Peircean framework, such a sign is called a "dicent". Ideally, the color-topics we uncover will be 'iconic', in that they resemble the quality they are meant to show, but in many cases they will be 'indexical'. Color-topics made up of skin tones and grayscales cannot be interpreted without context, so they must be read in connection with an index: the image of the album, as well as the music itself. Within hip-hop, for example, grayscale can be used to convey seriousness, whereas in alternative rock grayscale images are used to signify melancholy. Color-topics like these must be interpreted alongside their music, and they lose meaning when isolated from their tags.

```python
# Standard library
import colorsys
import os
import pickle
import string
from copy import copy

# Third-party
import colorgram
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import Pipeline
from tqdm import tqdm

# Project-local helpers
import bc_tools as bw
```