# Category Classification
A model that classifies our events.

### TODO
- Define how we want to classify our events (categories, sub-categories)

In [120]:
# import modules
import pandas as pd
import numpy as np

## Data
The composition of our data is shown below.

In [121]:
# load data
events = pd.read_json('data/events_training_data_v2.json')
events

| Unnamed: 0 | city | detail | name | source-tags | time | time_provided | venue |
|---|---|---|---|---|---|---|---|
| 0 | Vancouver | Awe-inspiring dance, extraordinary theatre, h... | 18th Annual Chutzpah Festival | [Festivals, Performing Arts] | | False | Norman Rothstein Theatre |
| 1 | Vancouver | Our part-time 200 Hour Yoga Teacher Training ... | 200 Hour Yoga Teacher Training Vancouver | [Other] | 7:30-10 pm | True | Semperviva Yoga |
| 2 | Vancouver | Representing one of the most important donati... | A Cultivating Journey: The Herman Levy Legacy | [Galleries] | | False | Vancouver Art Gallery |
| 3 | Vancouver | The electrifying play that became the legenda... | A Few Good Men | [Theatre] | 8 pm | True | Deep Cove Shaw Theatre |
| 4 | Vancouver | Watch 'Amazon Adventure: A True Story of Scie... | Amazon Adventure in IMAX film | [Film, Kids' Stuff] | | False | Science World at Telus World of Science |
| 5 | Vancouver | Celebrating the excessive abundance of the ar... | Beginning with the Seventies: GLUT exhibition | [Galleries, Activism] | | False | Morris and Helen Belkin Art Gallery |
| 6 | Vancouver | Level 1\nDates: Tuesdays, Jan 9 - Apr 17\nTim... | Bellydance Classes with Maki | [Dance, Performing Arts] | | False | Scotiabank Dance Centre |
| 7 | Vancouver | Big Top has returned to their famed weekly re... | BigTop Tuesdays | [Performing Arts, Food & Drink] | 8:30-11 pm | True | Libra Room |
| 8 | Vancouver | We're back! The Birth Fair brings together pr... | Birth Fair | [Other] | 10 am–3 pm | True | Cloverdale Agriplex |
| 9 | Vancouver | BOMBHEAD is a thematic exhibition organized b... | Bombhead | [Galleries] | | False | Vancouver Art Gallery |


# Event Categories
Categories, or tags, group similar events together. Each event can have multiple tags.

## Places around the web
Looking at the different examples from around the web we find:

### [Facebook](https://www.facebook.com/events/)
Doesn't specify an event type.

### [Eventbrite](https://www.eventbrite.ca/d/canada--vancouver/events/)
Has the following categories: Arts, Business, Charity & Causes, Community, Film & Media, Food & Drink and Music. Here each category has sub-categories:
* Arts: Fine Art, Comedy, Painting, Theatre, Other, Musical, Dance, ...
* Business: Startups, Career, Educators, Other, Sales & Marketing, Investment, Environment & Sustainability, ...
* Charity & Causes: Other, Education, Human Rights, Poverty, Environment and Healthcare
* Community: Heritage, Other, LGBT, City & Town and Historic
* Film & Media: Film, Other, TV, Adult and Comedy
* Food & Drink: Spirits, Wine, Other and Food
* Music: Electronic & EDM, Classical, Other, Folk, Hip Hop & Rap, Indie, Spiritual & Religious, Opera, ...

### [Tourism Vancouver](https://www.tourismvancouver.com/)
Has a lot of different, sometimes arcane categories (which makes it harder to use). Here are a few of them: Aboriginal, Comedy, Concerts/Clubs, Conventions and Consumer Shows, Culinary, Dance, Dine Out Vancouver Festival, Family Friendly, Film/Video/Media, Literary, Multicultural, Museum/Gallery/History, LGBTQ2, Theatre, Visual Arts ...

### [Ticketmaster](https://www.ticketmaster.ca/)
Has the following categories: Music, Sports, Arts & Theatre, Family and Miscellaneous. Here each category has sub-categories:
* Music: Rock and Pop, Country and Folk, Rap and Hip-Hop, Jazz and Blues, World Music, Latin, New Age and Spiritual, Comedy, ...
* Sports: Basketball, Football, Hockey, Baseball, Soccer, Motorsports, Wrestling, Rodeo, Golf, Tennis, Boxing, ...
* Arts & Theatre: Ballet and Dance, Opera, Museums and Exhibits, Broadway, Off-Broadway, Plays, Comedy, Classical, ...
* Family: Family Attractions, Ice Shows, Circus, Fairs and Festivals, Magic Shows, ...

## Suggestions
Based on the examples above, we could describe our events with tags alone and derive the grouping from those tags. For example, events tagged Fine Art, Comedy, Painting, Theatre, Musical or Dance can all be grouped as Arts. This means we don't have to store the category in our data set; we only need to specify which tags make up each category. So instead of predicting the category directly, we only need to predict the tags that match the given features (detail, name, venue, time, ...).
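The tag-to-category grouping described above can be sketched with a plain dictionary. This is a minimal illustration, not a final design: the `CATEGORY_TAGS` mapping and the `categories_for` helper are hypothetical names, and the tag lists are borrowed from the Eventbrite example rather than from our own data.

```python
# Hypothetical mapping from a category to the tags that make it up.
# The tag names are taken from the Eventbrite lists above for illustration only.
CATEGORY_TAGS = {
    "Arts": {"Fine Art", "Comedy", "Painting", "Theatre", "Musical", "Dance"},
    "Food & Drink": {"Spirits", "Wine", "Food"},
}


def categories_for(tags):
    """Return the sorted categories whose tag set overlaps the event's tags."""
    tags = set(tags)
    return sorted(cat for cat, members in CATEGORY_TAGS.items() if members & tags)


# An event tagged with both a dance and a wine tag lands in two categories.
example_categories = categories_for(["Dance", "Wine"])
print(example_categories)
```

Because the categories are derived at lookup time, changing the grouping only means editing the mapping; the stored tags stay untouched.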

In [123]:
# Collect the unique tags across all events (our current category list),
# skipping rows whose 'source-tags' value is not a list
cats = {tag for tags in events['source-tags'].tolist() if isinstance(tags, list) for tag in tags}
cats

{'Activism',
 'Attractions',
 'Comedy',
 'Concerts',
 'Dance',
 'Festivals',
 'Film',
 'Food & Drink',
 'Forums & Talks',
 'Fundraisers & Charity',
 'Galleries',
 'Holiday',
 "Kids' Stuff",
 'Literary/Books',
 'Markets',
 'Museums',
 'Nightlife',
 'Other',
 'Performing Arts',
 'Sports',
 'Theatre'}
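
The set above shows which tags exist, but not how often each one is used. A minimal sketch of counting tag occurrences follows. It uses a small hand-copied sample of the `source-tags` values from the table earlier instead of the real `events` frame; the `None` entry is an assumed stand-in for an event without tags, and `count_tags` is a hypothetical helper name.

```python
from collections import Counter

# Hand-copied sample of 'source-tags' values (first rows of the table above).
# The None entry is an assumption: it stands in for an event with no tag list.
sample_tags = [
    ["Festivals", "Performing Arts"],
    ["Other"],
    ["Galleries"],
    ["Theatre"],
    ["Film", "Kids' Stuff"],
    None,
]


def count_tags(tag_lists):
    """Count tag occurrences, skipping entries that are not lists."""
    return Counter(
        tag for tags in tag_lists if isinstance(tags, list) for tag in tags
    )


tag_counts = count_tags(sample_tags)
print(tag_counts.most_common(3))
```

On the real data, the same helper could be called with `events['source-tags'].tolist()`.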
