# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll scrape a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
# Load the https://www.residentadvisor.net/events page in your browser.
```

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

```python
# Open the inspect element feature in your browser.
```

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

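```python
from bs4 import BeautifulSoup
import requests

# Download the events page and parse its HTML
html_page = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(html_page.content, 'html.parser')
# prettify is a method, so call it; print only the start of the (very long) result
print(soup.prettify()[:1000])
```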

The output (truncated here) is the page's HTML; its `<title>` reads `RA: Events in Texas, United States of America`.

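```python
# The event listing lives in <div class="fl col4" id="event-listing">; the id is the more specific handle
concerts = soup.find('div', id='event-listing')
concerts
```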

```html
<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=318&amp;v=day&amp;mn=11&amp;yr=2019&amp;dy=15"><span>Fri, 15 Nov 2019 /</span></a></p></li><li class=""><article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2019-11-15T00:00" itemprop="startDate">2019-11-15T00:00</time></span><a href="/events/1310517"><img height="76" src="/images/events/flyer/2019/11/us-1115-1310517-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1310517" itemprop="url" title="Event details of Disrupt presents DJ W!ld">Disrupt presents DJ W!ld</a> <span>at <a href="/club.aspx?id=164420">Bauhaus</a>, <a href="/events.aspx?ai=63">Houston</a></span></h1><div class="grey event-lineup">DJ W!LD, Memo Sepulveda, Hector Moran, Gabo</div><p class="attending"><span>12</span> Attending</p></div></article></li><li cla
```
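Each event in this listing is an `<article>` that holds its start time, a title link, a span with venue and city links, and an attendance count. As a minimal sketch, all four fields can be read from one article in a single pass. The sample below is a trimmed copy of the first event from the output above, and the helper name `parse_event` is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first event from the listing output above
SAMPLE = (
    '<article class="event-item clearfix">'
    '<span style="display:none;"><time datetime="2019-11-15T00:00" itemprop="startDate">2019-11-15T00:00</time></span>'
    '<div class="bbox"><h1 class="event-title" itemprop="summary">'
    '<a href="/events/1310517" itemprop="url" title="Event details of Disrupt presents DJ W!ld">Disrupt presents DJ W!ld</a> '
    '<span>at <a href="/club.aspx?id=164420">Bauhaus</a>, <a href="/events.aspx?ai=63">Houston</a></span></h1>'
    '<div class="grey event-lineup">DJ W!LD, Memo Sepulveda, Hector Moran, Gabo</div>'
    '<p class="attending"><span>12</span> Attending</p></div></article>'
)


def parse_event(article):
    """Hypothetical helper: read one <article> into a dict with the four lab fields."""
    title = article.find('h1')
    attending = article.find('p', class_='attending')
    return {
        # The first link in the <h1> is the event's title link
        'Event_Name': title.find('a').get_text(),
        # First link inside the venue/city span (the venue, when it has a club page)
        'Venue': title.find('span').find('a').get_text(),
        'Event_Date': article.find('time')['datetime'],
        # Assumption: an event without an attendance paragraph has 0 attendees
        'Number_of_Attendees': int(attending.find('span').get_text()) if attending else 0,
    }


article = BeautifulSoup(SAMPLE, 'html.parser').find('article')
print(parse_event(article))
```

Reading every field from the same `<article>` keeps each row's values together, which matters when some events lack a field (the last event on this page has no attendance paragraph).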

```python
# Function to extract each event's start time from its <time datetime="..."> tag
def retrieve_time(soup):
    times = [thing.find('time').attrs['datetime'] for thing in soup.find_all('article')]
    return times
retrieve_time(concerts)
```

```
['2019-11-15T00:00',
 '2019-11-15T00:00',
 '2019-11-15T00:00',
 '2019-11-15T00:00',
 '2019-11-15T00:00',
 '2019-11-16T00:00',
 '2019-11-16T00:00',
 '2019-11-16T00:00',
 '2019-11-16T00:00',
 '2019-11-17T00:00',
 '2019-11-17T00:00']
```
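`retrieve_time` returns ISO 8601 strings. Because the final task sorts by event date, it may help to convert them into real timestamps. A minimal sketch with `pandas.to_datetime`, using a few of the values above:

```python
import pandas as pd

# A few of the ISO 8601 strings returned by retrieve_time above
times = ['2019-11-17T00:00', '2019-11-15T00:00', '2019-11-16T00:00']

# Parse the strings into pandas Timestamps so they compare and sort chronologically
dates = pd.to_datetime(pd.Series(times))
print(dates.sort_values().tolist())
```

Strings that all share this fixed format already sort correctly as text, but Timestamps also support date arithmetic and range filtering.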

```python
# Troubleshooting code for the cell above (kept commented out)

# times = [thing.find('time').attrs['datetime'] for thing in concert.findAll('article')]
# len(times)
# times1 = []
# for x in times:
#     times1.append(times[x].findAll('time'))
# times1
# times1 = [times[x].findAll('time') for x in enumerate(times)]
# times1
```

```python
# Function to extract each event's name from the title attribute of its title link
def retrieve_names(soup):
    names = [thing.find('h1').find('a').attrs['title'] for thing in soup.find_all('article')]
    # The title attribute reads "Event details of <name>"; strip that prefix
    return [x.replace('Event details of ', '') for x in names]
retrieve_names(concerts)
```

```
['Disrupt presents DJ W!ld',
 'Prok - Fitch',
 'Fleetmac Wood presents Sea of Love Disco - Austin',
 'Be Careful with Shyboi',
 'Toxic',
 'Seismic Dance Event',
 'Lokal with Suciu (Pressure Traxx, Sunrise)',
 'The Oven Afterhours: Apocolypse Meow: Denied Music Showcase',
 'Sound Sinners feat. David Gtronic',
 'Seismic Dance Event',
 'R U Down']
```
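The title link's visible text already holds the bare event name, so an alternative to stripping the `title` attribute is to read the link text with `get_text()`. A small sketch on a title link copied from the listing output; `retrieve_names_from_text` is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Title markup copied from the listing output above (trimmed to the link)
html = (
    '<article><h1 class="event-title"><a href="/events/1310517" '
    'title="Event details of Disrupt presents DJ W!ld">Disrupt presents DJ W!ld</a></h1></article>'
)


def retrieve_names_from_text(soup):
    # Hypothetical variant of retrieve_names: the link text carries no "Event details of " prefix
    return [article.find('h1').find('a').get_text(strip=True) for article in soup.find_all('article')]


soup = BeautifulSoup(html, 'html.parser')
print(retrieve_names_from_text(soup))
```

This avoids depending on the exact wording of the `title` attribute, which the site could change independently of the displayed name.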

```python
# Function to extract each event's venue: the text of the first link inside the title's <span>.
# Note: when a venue has no link of its own, this first link appears to be the city (e.g. 'Austin' below).
def retrieve_venue(soup):
    return [thing.find('h1').find('span').find('a').get_text() for thing in soup.find_all('article')]
retrieve_venue(concerts)
```

```
['Bauhaus',
 'RBC',
 'The Parish',
 'The Dive',
 'San Antonio',
 'Austin',
 'San Antonio',
 'The Oven',
 'Roof 324',
 'Austin',
 'The Dive']
```
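Entries such as `'Austin'` and `'San Antonio'` above are city names: for those events the first link in the `<span>` seems to be the city rather than the venue, presumably because the venue has no club page. One hedged alternative is to read the whole span text, which looks like `at Bauhaus, Houston`, and keep what lies between `at ` and the last comma. The first sample is copied from the listing output; the second is hypothetical markup for an unlinked venue, not taken from the site, and `retrieve_venue_from_span` is a hypothetical name.

```python
from bs4 import BeautifulSoup

html = (
    # Copied from the listing output above
    '<article><h1><a href="/events/1310517">Disrupt presents DJ W!ld</a> '
    '<span>at <a href="/club.aspx?id=164420">Bauhaus</a>, <a href="/events.aspx?ai=63">Houston</a></span></h1></article>'
    # Hypothetical: venue as plain text, only the city linked
    '<article><h1><a href="#">Example Event</a> '
    '<span>at Secret Location, <a href="#">Austin</a></span></h1></article>'
)


def retrieve_venue_from_span(soup):
    """Hypothetical variant of retrieve_venue, assuming the span reads 'at <venue>, <city>'."""
    venues = []
    for article in soup.find_all('article'):
        text = article.find('h1').find('span').get_text().strip()
        if text.startswith('at '):
            text = text[len('at '):]
        # Drop the trailing ", <city>" (assumes the city name itself has no comma)
        venues.append(text.rsplit(',', 1)[0].strip())
    return venues


print(retrieve_venue_from_span(BeautifulSoup(html, 'html.parser')))
```

Splitting on the last comma only keeps venue names that contain commas intact, as long as the city does not.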

```python
# Scratch work: inspect the attendance paragraph of each event
attendees_href = [thing.find('p') for thing in concerts.find_all('article')]
new_attendees = [x.split('</span>') for x in str(attendees_href).split('</p>')]
print(new_attendees)
print(len(new_attendees))
# Keep the number after '<span>'; the last event has no attendance <p> (None), so skip it
new_attendees_list = [x[0].split('<span>')[-1] for x in new_attendees if len(x) > 1]
new_attendees_list
```

```
[['[<p class="attending"><span>12', ' Attending'], [', <p class="attending"><span>3', ' Attending'], [', <p class="attending"><span>2', ' Attending'], [', <p class="attending"><span>1', ' Attending'], [', <p class="attending"><span>1', ' Attending'], [', <p class="attending"><span>13', ' Attending'], [', <p class="attending"><span>3', ' Attending'], [', <p class="attending"><span>2', ' Attending'], [', <p class="attending"><span>2', ' Attending'], [', <p class="attending"><span>13', ' Attending'], [', None]']]
11
['12', '3', '2', '1', '1', '13', '3', '2', '2', '13']
```

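```python
# Function to extract the number of attendees for each event
def retrieve_attendees(soup):
    # Some events have no <p class="attending"> (find returns None); these are assumed to mean 0
    ps = [thing.find('p', class_='attending') for thing in soup.find_all('article')]
    return [int(p.find('span').get_text()) if p else 0 for p in ps]
retrieve_attendees(concerts)
```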

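```python
import pandas as pd

def scrape_events(events_page_url):
    soup = BeautifulSoup(requests.get(events_page_url).content, 'html.parser')
    listing = soup.find('div', id='event-listing')  # container holding all events on the page
    rows = zip(retrieve_names(listing), retrieve_venue(listing), retrieve_time(listing), retrieve_attendees(listing))
    df = pd.DataFrame(list(rows), columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
    return df
```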

## Write a Function to Retrieve the URL for the Next Page

```python
def next_page(url):
    # Your code here
    return next_page_url
```
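The lab leaves `next_page` open, and the markup of the site's pagination link isn't shown here. As a sketch only, assuming the page contains an `<a>` element whose text includes "Next" (verify this with the inspector first), the relative `href` can be resolved against the current URL with `urllib.parse.urljoin`. The helper `find_next_url` is hypothetical, split out so it can be tried on an HTML string without a network request; the sample link below is made up.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def find_next_url(soup, current_url):
    """Hypothetical helper: absolute URL of the first link whose text contains 'next', else None."""
    for link in soup.find_all('a', href=True):
        if 'next' in link.get_text().lower():
            return urljoin(current_url, link['href'])
    return None


def next_page(url):
    # Fetch the current page, then look for its "Next" link (None when there is no further page)
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    return find_next_url(soup, url)


# Made-up pagination link, shaped like the day links in the listing output
sample = '<a href="/events.aspx?ai=318&amp;v=day&amp;mn=11&amp;yr=2019&amp;dy=16">Next day</a>'
print(find_next_url(BeautifulSoup(sample, 'html.parser'), 'https://www.residentadvisor.net/events'))
```

Returning `None` when no link matches gives the caller a clear signal to stop paging.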

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

```python
# Your code here
```
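One possible shape for this task, assuming `scrape_events` and `next_page` work as described above: keep scraping pages and concatenating DataFrames until at least 1000 events are collected (or there is no next page), keep the first 1000, then sort. The sort direction is an assumption: most attendees first, with ties broken by the earliest date. `collect_events` and `sort_events` are hypothetical names; the scraper and pager are passed in as parameters so the loop can be tried with stand-ins, and the stand-in data in the usage is made up.

```python
import pandas as pd

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def sort_events(events):
    # Assumed order: most attendees first; ties broken by earliest event date
    return events.sort_values(['Number_of_Attendees', 'Event_Date'],
                              ascending=[False, True]).reset_index(drop=True)


def collect_events(start_url, scrape, next_url, limit=1000):
    """Scrape page after page until `limit` events are collected or no next page exists."""
    frames, total, url = [], 0, start_url
    while url is not None and total < limit:
        df = scrape(url)
        frames.append(df)
        total += len(df)
        url = next_url(url)
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    # Keep the first `limit` events in scrape order, then sort them
    return sort_events(pd.concat(frames, ignore_index=True).head(limit))


# Made-up stand-ins for scrape_events and next_page: two pages of two events each
pages = {
    'page1': pd.DataFrame({'Event_Name': ['A', 'B'], 'Venue': ['X', 'Y'],
                           'Event_Date': ['2019-11-16T00:00', '2019-11-15T00:00'],
                           'Number_of_Attendees': [3, 3]}),
    'page2': pd.DataFrame({'Event_Name': ['C', 'D'], 'Venue': ['Z', 'X'],
                           'Event_Date': ['2019-11-17T00:00', '2019-11-17T00:00'],
                           'Number_of_Attendees': [12, 1]}),
}
following = {'page1': 'page2', 'page2': None}

result = collect_events('page1', pages.get, following.get, limit=1000)
print(result['Event_Name'].tolist())
```

With the real functions, the call would be along the lines of `collect_events('https://www.residentadvisor.net/events', scrape_events, next_page)`.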

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!