# Project 1 Proposal 

### Understanding Concert Venues and Events

### Jaden Chin, DS3000


## Introduction
Live events are a major source of entertainment all over the world. Whether they are sports games or concerts, they draw thousands of people to venues of different sizes. I want to study the diversity of live events in major U.S. cities and find out which types of events are most popular and generate the most revenue. I am also interested in which types of venues host which events and which venues are typically more expensive. I hope to answer several related questions along the way.

## Data Collection
I plan to use the Ticketmaster API to collect data on events in major U.S. cities. Ticketmaster is one of the largest ticket-selling companies and offers a wide variety of events. I use the API to query events in a given state within a specific time range. I needed to sign up for an API key in order to make requests.

In [3]:
import os

import pandas as pd
import requests

def get_events(state, start_date, end_date):
    """Uses Ticketmaster API to query event data and create a dataframe of event information
    
    Args:
        state (str): two-letter state code, e.g. "MA"
        start_date (str): start date of search, formatted YYYY-MM-DD
        end_date (str): end date of search (inclusive), formatted YYYY-MM-DD

    Returns:
        df_events (DataFrame): dataframe containing specified events and their details
    """
    # url used to query
    url = 'https://app.ticketmaster.com/discovery/v2/events.json'

    # api key is read from an environment variable instead of being hard-coded;
    # the variable name TICKETMASTER_API_KEY is a choice made here, set it before running
    api_key = os.environ['TICKETMASTER_API_KEY']

    # formats start and end date as ISO 8601 UTC timestamps for the API;
    # the end timestamp is the last second of end_date so that day is included
    start_datetime = f'{start_date}T00:00:00Z'
    end_datetime = f'{end_date}T23:59:59Z'

    # parameters for the api request
    params = {
        'apikey': api_key,
        'stateCode': state,
        'startDateTime': start_datetime,
        'endDateTime': end_datetime,
        'size': 200, 
    }

    # queries data using url and parameters; fail loudly on HTTP errors (bad key, rate limit)
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()

    # converts the response to dictionary
    data = response.json()

    # extracts the events; '_embedded' is absent when the query matches nothing
    events = data.get("_embedded", {}).get("events", [])

    # collect one dictionary per event and build the dataframe once at the end
    rows = []

    # loops through each event
    for event in events:

        # checks for price ranges and falls back to None if they are not there
        if 'priceRanges' in event:
            min_price = event['priceRanges'][0].get('min')
            max_price = event['priceRanges'][0].get('max')
        else:
            min_price = None
            max_price = None

        # keep the relevant data we want for this event
        rows.append({
            "name": event["name"],
            "event_date": event["dates"]["start"]["localDate"],
            "status": event['dates']['status']['code'],
            "venue": event['_embedded']['venues'][0]['name'],
            "city": event['_embedded']['venues'][0]['city']['name'],
            "event_type": event['classifications'][0]['segment']['name'],
            "genre": event['classifications'][0]['genre']['name'],
            "price_range": (min_price, max_price)
        })

    # fixed column order, so an empty result still has the expected columns
    columns = ["name", "event_date", "status", "venue", "city",
               "event_type", "genre", "price_range"]
    df_events = pd.DataFrame(rows, columns=columns)

    return df_events
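
The nested lookups in `get_events`, such as `event['classifications'][0]['genre']['name']`, raise an error as soon as one event lacks a venue, classification, or genre. One possible way to return `None` instead is a small helper that walks a path of keys and indexes. The helper name `dig` and the example event below are hypothetical and only illustrate the idea; this is a sketch, not a tested part of the project.

```python
def dig(obj, *path, default=None):
    """Follow a path of dict keys / list indexes; return default if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # missing key, empty list, or a None/scalar where a container was expected
            return default
    return obj


# hypothetical event with no genre and no price information
event = {
    "name": "Example Event",
    "_embedded": {"venues": [{"name": "Example Venue", "city": {"name": "Boston"}}]},
    "classifications": [{"segment": {"name": "Music"}}],
}

venue = dig(event, "_embedded", "venues", 0, "name")       # present
genre = dig(event, "classifications", 0, "genre", "name")  # missing, falls back to None
min_price = dig(event, "priceRanges", 0, "min")            # missing, falls back to None
```

Each direct lookup in the loop could be swapped for a `dig(...)` call, so an incomplete event produces a row with `None` in the missing fields rather than stopping the whole query.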


In [4]:
events200 = get_events("MA", "2024-01-01", "2024-12-31")

In [5]:
events200.head(50)

| | name | event_date | status | venue | city | event_type | genre | price_range |
|---|---|---|---|---|---|---|---|---|
| 0 | Boston Bruins vs. Montreal Canadiens | 2024-10-10 | onsale | TD Garden | Boston | Sports | Hockey | (140.0, 1565.5) |
| 1 | Boston Bruins vs. Los Angeles Kings | 2024-10-12 | onsale | TD Garden | Boston | Sports | Hockey | (120.0, 1565.5) |
| 2 | Banner Night: Boston Celtics v. New York Knicks | 2024-10-22 | onsale | TD Garden | Boston | Sports | Basketball | (319.5, 5285.5) |
| 3 | Boston Bruins vs. Florida Panthers | 2024-10-14 | onsale | TD Garden | Boston | Sports | Hockey | (95.0, 1565.5) |
| 4 | Preseason: Boston Celtics v. Philadelphia 76ers | 2024-10-12 | onsale | TD Garden | Boston | Sports | Basketball | (62.5, 1115.5) |
| 5 | Boston Bruins vs. St. Louis Blues | 2024-11-16 | onsale | TD Garden | Boston | Sports | Hockey | (141.0, 1565.5) |
| 6 | Boston Bruins vs. Ottawa Senators | 2024-11-09 | onsale | TD Garden | Boston | Sports | Hockey | (156.0, 1565.5) |
| 7 | Boston Bruins vs. Seattle Kraken | 2024-11-03 | onsale | TD Garden | Boston | Sports | Hockey | (121.0, 1565.5) |
| 8 | Boston Bruins vs. Montreal Canadiens | 2024-12-01 | onsale | TD Garden | Boston | Sports | Hockey | (231.0, 1565.5) |
| 9 | Boston Bruins vs. Philadelphia Flyers | 2024-10-29 | onsale | TD Garden | Boston | Sports | Hockey | (81.0, 1565.5) |
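
As a rough sketch of how the `price_range` column could be used, the snippet below rebuilds three rows from the output above, splits the `(min, max)` tuple into two numeric columns, and then counts events per genre at each venue. The column names `min_price` and `max_price` are my own choice here, and three rows are far too few to draw conclusions from; the point is only the shape of the computation.

```python
import pandas as pd

# three rows copied from the sample output above
df = pd.DataFrame({
    "name": [
        "Boston Bruins vs. Montreal Canadiens",
        "Boston Bruins vs. Los Angeles Kings",
        "Banner Night: Boston Celtics v. New York Knicks",
    ],
    "venue": ["TD Garden", "TD Garden", "TD Garden"],
    "genre": ["Hockey", "Hockey", "Basketball"],
    "price_range": [(140.0, 1565.5), (120.0, 1565.5), (319.5, 5285.5)],
})

# split the (min, max) tuples into two numeric columns;
# a (None, None) tuple would become NaN in both columns
prices = pd.DataFrame(
    df["price_range"].tolist(),
    columns=["min_price", "max_price"],
    index=df.index,
)
df = df.join(prices)

# number of events of each genre hosted at each venue
genre_counts = pd.crosstab(df["venue"], df["genre"])

# average minimum ticket price per genre
mean_min_price = df.groupby("genre")["min_price"].mean()
```

Keeping the minimum and maximum as separate numeric columns makes it possible to sort, average, and later model prices, which is awkward while they are stored together as a tuple.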


## Data Collection and Remaining Issues
The data above is mostly categorical: the name, venue, genre, and status of each event. I also included the ticket price range for each event. I plan to use this data to see which events, and which genres, have the most expensive price ranges.

I am still working out how to handle missing data, such as returning None for fields an event does not provide. I am also trying to figure out how to fetch more events, since the most I could retrieve in one request is 200. I hope to do this either with pagination or, similar to hw2, by querying in month-long periods. So far this limit has restricted me to looking at one state at a time.

With the data, I hope to count how many events of each genre are hosted at each venue to see how varied each venue's events are, and to compare event prices to see which venues are most expensive. I hope to use supervised learning models to predict things like which venues will host which events. I also hope to use regression to predict future ticket prices for certain venues or genres.
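
The two ideas for getting past the 200-event limit, pagination and month-long query windows, could be sketched roughly as follows. This assumes the Discovery API accepts a `page` request parameter and reports `totalPages` inside a `page` object in the response, and that it only serves results while `size * page` stays below 1000. That is my reading of the API documentation and should be verified before relying on it. The function names and the `fetch_page` callback are hypothetical. The request itself is passed in as a function so the logic can be tried without a network call.

```python
from datetime import date, timedelta


def month_windows(year):
    """Yield (start, end) ISO 8601 timestamp strings for each calendar month of a year."""
    for month in range(1, 13):
        first = date(year, month, 1)
        # first day of the following month, then step back one day
        next_first = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        last = next_first - timedelta(days=1)
        yield f"{first}T00:00:00Z", f"{last}T23:59:59Z"


def fetch_all_pages(fetch_page, size=200, max_items=1000):
    """Collect events page by page.

    fetch_page is a caller-supplied function that takes a page number and
    returns the decoded JSON dictionary for that page.
    """
    events = []
    page = 0
    # assumed API cap: stop before size * page reaches max_items
    while page * size < max_items:
        data = fetch_page(page)
        events.extend(data.get("_embedded", {}).get("events", []))
        total_pages = data.get("page", {}).get("totalPages", 1)
        page += 1
        if page >= total_pages:
            break
    return events


# stand-in for the real API: two small pages of made-up events
fake_pages = {
    0: {"_embedded": {"events": [{"name": "a"}, {"name": "b"}]}, "page": {"totalPages": 2}},
    1: {"_embedded": {"events": [{"name": "c"}]}, "page": {"totalPages": 2}},
}
all_events = fetch_all_pages(lambda page: fake_pages[page])
windows = list(month_windows(2024))
```

In the real query, `fetch_page` could wrap `requests.get` with the same parameters plus `'page': page`. Running it once per month window would keep each query small enough to stay under the assumed 1000-item cap, which may make it feasible to cover more than one state.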