# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site! In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [50]:
from bs4 import BeautifulSoup
import requests
import re
import shutil
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd
from IPython.display import Image, HTML

In [59]:
# Make a GET request to retrieve the Seattle events page
html_page = requests.get('https://www.residentadvisor.net/events/us/seattle')
# Pass the page contents to Beautiful Soup for parsing
soup = BeautifulSoup(html_page.content, 'html.parser')

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [60]:
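# Open the inspect element feature in your browser.
# Note: without parentheses, this displays the bound method's repr (which embeds the raw HTML);
# call soup.prettify() to get an indented string instead.
soup.prettify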

<bound method Tag.prettify of <!DOCTYPE html>

<html lang="en,ja,es">
<head id="_x1"><title>
	RA: Events in Seattle, United States of America
</title><meta content="text/html; charset=utf-8" http-equiv="Content-Type"/><meta content="en,ja,es" http-equiv="content-language"/><meta content="RA: Resident Advisor" name="Description"/><meta content="RA, residentadvisor, resident, advisor, music, ra, events, in, seattle, united, states, america" name="Keywords"/><meta content="Resident Advisor" name="Author"/><meta content="Resident Advisor" property="og:site_name"/><meta content="712773712080127" property="fb:app_id"/><link href="/bundles/default-css?v=ATv7yC5anBBrxJoYdSr-DqUPyab_mqaaXHG0qxMzlYI1" rel="stylesheet"/>
<meta content="app-id=981952703, app-argument=ra-guide://search" name="apple-itunes-app"/><link href="/bundles/cat-listings-css?v=qgpSmyPbylOKeJFqy2yvCrTgAsw9yQYcJtLKS_vPO6s1" rel="stylesheet"/>
<link href="/favicon.ico" rel="icon" type="image/vnd.microsoft.icon"/><link color="#0

In [61]:
events_container = soup.find('div', {'id':'event-listing'})

In [62]:
events_container

<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=46&amp;v=day&amp;mn=7&amp;yr=2019&amp;dy=25"><span>Thu, 25 Jul 2019 /</span></a></p></li><li class=""><article class="event-item clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1275558#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-07-25T00:00" itemprop="startDate">2019-07-25T00:00</time></span><a href="/events/1275558"><img height="76" src="/images/events/flyer/2019/7/us-0725-1275558-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1275558" itemprop="url" title="Event details of Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jame$ Ervin, Cat Claw">Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jame$ E

In [86]:
events = events_container.findAll('article')

In [87]:
events[0]

<article class="event-item clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1275558#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-07-25T00:00" itemprop="startDate">2019-07-25T00:00</time></span><a href="/events/1275558"><img height="76" src="/images/events/flyer/2019/7/us-0725-1275558-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1275558" itemprop="url" title="Event details of Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jame$ Ervin, Cat Claw">Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jame$ Ervin, Cat Claw</a> <span>at <a href="/club.aspx?id=84493">The Monkey Loft</a></span></h1><div class="grey event-lineup">Pezzner, Riz Rolings, Jame$ Ervin, Cat Claw</div><p class="attending"><span>13</span> Attending</p></div></

In [69]:
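# First attempt: the first <a> in an article wraps an image (the tickets or flyer link),
# not the event title; the refined version below goes through the <h1> instead.
event_names = [event.find('a').contents[0] for event in events]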

In [93]:
#find name
events[0].find('h1').find('a').contents[0]

"Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jame$ Ervin, Cat Claw"

In [95]:
#find venue
events[0].find('h1').find('span').find('a').contents[0]

'The Monkey Loft'

In [96]:
#find line-up
events[0].find('div', {'class':'grey event-lineup'}).contents[0]

'Pezzner, Riz Rolings, Jame$ Ervin, Cat Claw'

In [108]:
#find date
events[0].findAll('span')[0].find('time').contents[0][:10]

'2019-07-25'

In [125]:
#find attendees
events[0].findAll('span')[-1].contents[0]

'13'

In [109]:
event_names = [event.find('h1').find('a').contents[0] for event in events]

In [112]:
event_venues = [event.find('h1').find('span').find('a').contents[0] for event in events]

In [115]:
event_line_ups = [event.find('div', {'class':'grey event-lineup'}).contents[0]
                  for event in events]

In [121]:
event_dates = [event.findAll('span')[0].find('time').contents[0][:10] for event in events]

In [166]:
events[-1].findAll('span')[0].find('a') is None

True

In [136]:
attendees = [int(event.findAll('span')[-1].contents[0])
             if len(event.findAll('span'))==3 else 0
             for event in events]

In [137]:
attendees

[13, 3, 1, 1, 10, 3, 0]
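
The `len(event.findAll('span'))==3` check above infers a missing attendee count from the number of `<span>` tags. As a hedged alternative, the sketch below looks up the `<p class="attending">` element seen in the event markup directly. The two sample articles are trimmed-down, partly hypothetical snippets (the second event, its name, and its links are made up for illustration), and `count_attendees` is a helper name introduced here, not part of the lab.

```python
from bs4 import BeautifulSoup

# Two trimmed-down event snippets modeled on the markup shown above.
# The second one is hypothetical and has no "attending" paragraph.
SAMPLE_HTML = """
<article><div class="bbox"><h1 class="event-title"><a href="/events/1275558">Event A</a>
<span>at <a href="/club.aspx?id=84493">The Monkey Loft</a></span></h1>
<p class="attending"><span>13</span> Attending</p></div></article>
<article><div class="bbox"><h1 class="event-title"><a href="/events/1">Event B</a>
<span>at <a href="/club.aspx?id=1">Some Venue</a></span></h1></div></article>
"""

def count_attendees(article):
    """Return the attendee count for one <article>, or 0 if none is listed."""
    attending = article.find('p', {'class': 'attending'})
    if attending is None or attending.find('span') is None:
        return 0
    return int(attending.find('span').get_text(strip=True))

sample_events = BeautifulSoup(SAMPLE_HTML, 'html.parser').find_all('article')
sample_attendees = [count_attendees(event) for event in sample_events]
print(sample_attendees)  # [13, 0]
```

Looking up the element by class keeps working if the number of `<span>` tags in an article changes, as long as the `attending` paragraph keeps its class name.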

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [167]:
def scrape_events(events_page_url):
    """Return a DataFrame with one row per event listed on the given events page."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    
    events_container = soup.find('div', {'id':'event-listing'})
    events_info = events_container.findAll('article')
    
    event_names = [event.find('h1').find('a').contents[0] 
                   for event in events_info]
    event_venues = [event.find('h1').find('span').find('a').contents[0] 
                    if event.find('h1').find('span').find('a') is not None 
                    else 'TBD'
                    for event in events_info]
    event_dates = [event.findAll('span')[0].find('time').contents[0][:10] 
                   for event in events_info]
    event_attendees = [int(event.findAll('span')[-1].contents[0])
                       if len(event.findAll('span'))==3 else 0
                       for event in events_info]
    
    df = pd.DataFrame([event_names, event_venues, event_dates, event_attendees]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    
    return df
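
One detail worth knowing about the construction used in `scrape_events`: building a DataFrame from a list of mixed-type lists and then transposing it leaves every column with `object` dtype, including the attendee counts. The sketch below, using two rows taken from the Seattle results and variable names chosen here for illustration, compares that approach with building the frame from a dictionary of columns, which lets pandas infer a dtype per column.

```python
import pandas as pd

# Two sample rows, split into per-column lists as in scrape_events
names = ["Field Trip 81: Fehrplay", "Secondnature x Bassiani"]
venues = ["Q Nightclub", "Kremwerk"]
dates = ["2019-07-25", "2019-07-27"]
attendee_counts = [3, 3]

# Approach used in the lab: list of lists, then transpose
transposed_df = pd.DataFrame([names, venues, dates, attendee_counts]).transpose()
transposed_df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

# Alternative: one dictionary entry per column
dict_df = pd.DataFrame({
    "Event_Name": names,
    "Venue": venues,
    "Event_Date": dates,
    "Number_of_Attendees": attendee_counts,
})

print(transposed_df["Number_of_Attendees"].dtype)  # object
print(pd.api.types.is_integer_dtype(dict_df["Number_of_Attendees"]))  # True
```

Either form holds the same values; the difference matters mainly when sorting or doing arithmetic on `Number_of_Attendees` later.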

In [143]:
event_df = scrape_events('https://www.residentadvisor.net/events/us/seattle')

In [144]:
event_df

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jam... | The Monkey Loft | 2019-07-25 | 13 |
| 1 | Field Trip 81: Fehrplay | Q Nightclub | 2019-07-25 | 3 |
| 2 | Shook: One Year Anniversary with TMSV, A.Fruit... | Kremwerk | 2019-07-26 | 1 |
| 3 | 1luv presents: Franklyn Watts: Aundrea: Rhett:... | The Monkey Loft | 2019-07-26 | 1 |
| 4 | Diggin' Deep 12hr with Doc Martin Sublevel Live | The Monkey Loft | 2019-07-27 | 10 |
| 5 | Secondnature x Bassiani | Kremwerk | 2019-07-27 | 3 |
| 6 | 27 Re-bar presents Howell St Irregulars - Theo... | Re-Bar | 2019-07-27 | 1 |


## Write a Function to Retrieve the URL for the Next Page

In [150]:
'https://www.residentadvisor.net'+soup.find('li', {'id':'liNext'}).find('a').attrs['href']

'https://www.residentadvisor.net/events/us/seattle/week/2019-07-29'

In [151]:
def next_page(url):
    """Return the absolute URL of the next events page linked from the given page."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    base_url = 'https://www.residentadvisor.net'
    href_url = soup.find('li', {'id':'liNext'}).find('a').attrs['href']
    
    next_page_url = base_url + href_url
    
    return next_page_url

In [154]:
next_page('https://www.residentadvisor.net/events/us/seattle/week/2019-07-30')

'https://www.residentadvisor.net/events/us/seattle/week/2019-08-06'
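
Note that `.attrs['href']` raises a `KeyError` whenever the anchor inside `liNext` has no `href` attribute. A minimal sketch of a more defensive variant follows. It works on an HTML string rather than a live request so that it can run offline; `extract_next_url` is a helper name introduced here, and the two sample snippets are assumptions about what the pagination markup might look like, not copies of the real page.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'https://www.residentadvisor.net'

def extract_next_url(page_html, base_url=BASE_URL):
    """Return the absolute next-page URL, or None when there is no usable link."""
    soup = BeautifulSoup(page_html, 'html.parser')
    next_item = soup.find('li', {'id': 'liNext'})
    if next_item is None or next_item.find('a') is None:
        return None
    # .get() returns None instead of raising KeyError when href is absent
    href = next_item.find('a').get('href')
    return urljoin(base_url, href) if href else None

# Hypothetical pagination snippets: one with a link, one without an href
with_link = '<ul><li id="liNext"><a href="/events/us/seattle/week/2019-07-29">Next</a></li></ul>'
without_href = '<ul><li id="liNext"><a>Next</a></li></ul>'

print(extract_next_url(with_link))
# https://www.residentadvisor.net/events/us/seattle/week/2019-07-29
print(extract_next_url(without_href))  # None
```

Returning `None` gives the caller a simple way to stop paginating once no further page is linked.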

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [169]:
seattle_events_df = pd.DataFrame(columns = ["Event_Name", "Venue", 
                                            "Event_Date", "Number_of_Attendees"])
seattle_events_df

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|


In [170]:
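events_url = 'https://www.residentadvisor.net/events/us/seattle'

# Scrape up to 20 pages, stopping early when no next-page link is available
for i in range(0, 20):
    temp_df = scrape_events(events_url)
    seattle_events_df = pd.concat([seattle_events_df, temp_df])

    try:
        events_url = next_page(events_url)
    except KeyError:
        # next_page raises KeyError: 'href' when the "next" anchor has no href
        break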

In [172]:
seattle_events_df

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Deck'd Out #6 Stayin' Alive: Pezzner, Riz, Jam... | The Monkey Loft | 2019-07-25 | 13 |
| 1 | Field Trip 81: Fehrplay | Q Nightclub | 2019-07-25 | 3 |
| 2 | Shook: One Year Anniversary with TMSV, A.Fruit... | Kremwerk | 2019-07-26 | 1 |
| 3 | 1luv presents: Franklyn Watts: Aundrea: Rhett:... | The Monkey Loft | 2019-07-26 | 1 |
| 4 | Diggin' Deep 12hr with Doc Martin Sublevel Live | The Monkey Loft | 2019-07-27 | 10 |
| 5 | Secondnature x Bassiani | Kremwerk | 2019-07-27 | 3 |
| 6 | 27 Re-bar presents Howell St Irregulars - Theo... | Re-Bar | 2019-07-27 | 1 |
| 0 | Deck'd Out #7 Cascadia NW Decompression with N... | The Monkey Loft | 2019-08-01 | 6 |
| 1 | Haüsed x Depth: B. Traits | Kremwerk | 2019-08-01 | 2 |
| 2 | Field Trip 82: Nukid | Q Nightclub | 2019-08-01 | 1 |


In [None]:
#Your code here
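
The sorting requested at the top of this section can be sketched with `DataFrame.sort_values` on two columns. The example below uses four rows taken from the results above so that it runs without any scraping. It assumes "sorted by the number of attendees" means most-attended first and that ties should be broken by the earliest date; the lab text does not specify either direction.

```python
import pandas as pd

# Four rows copied from the scraped Seattle results
sample_df = pd.DataFrame({
    "Event_Name": ["Field Trip 81: Fehrplay",
                   "Secondnature x Bassiani",
                   "Field Trip 82: Nukid",
                   "Diggin' Deep 12hr with Doc Martin Sublevel Live"],
    "Venue": ["Q Nightclub", "Kremwerk", "Q Nightclub", "The Monkey Loft"],
    "Event_Date": ["2019-07-25", "2019-07-27", "2019-08-01", "2019-07-27"],
    "Number_of_Attendees": [3, 3, 1, 10],
})

# Make sure both sort keys have proper types before sorting
sample_df["Number_of_Attendees"] = sample_df["Number_of_Attendees"].astype(int)
sample_df["Event_Date"] = pd.to_datetime(sample_df["Event_Date"])

# Assumed ordering: attendees descending, then date ascending for ties
sorted_df = (sample_df
             .sort_values(["Number_of_Attendees", "Event_Date"],
                          ascending=[False, True])
             .reset_index(drop=True))
print(sorted_df)
```

Applying the same two conversions and the same `sort_values` call to `seattle_events_df` should give the requested ordering; `reset_index(drop=True)` also replaces the repeated per-page index values with a single running index.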


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!