# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's inspect element feature to preview the HTML underlying the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [428]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

# Download the events page and parse its HTML
html_page = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(html_page.content, 'html.parser')

# Each event listing is an <article> tag with the class "event-item"
events = soup.find_all('article', class_="event-item")
events

[<article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2020-07-25T00:00" itemprop="startDate">2020-07-25T00:00</time></span><a href="/events/1399703"><img height="76" src="/images/events/flyer/2020/7/us-0725-1399703-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1399703" itemprop="url" title="Event details of FNGRS CRSSD Pres: OTR">FNGRS CRSSD Pres: OTR</a> <span>at <a href="/club.aspx?id=79291">Bang Bang</a>, <a href="/events.aspx?ai=309">San Diego</a></span></h1><div class="grey event-lineup">OTR</div><p class="attending"><span>2</span> Attending</p></div></article>,
 <article class="event-item clearfix tickets-bkg-logo" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1361257#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute

In [446]:
# event name
events[0].find('h1', class_="event-title").text.split(' at ')[0].strip()

'Void 001'

In [305]:
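# venue: drop the leading "at " prefix. Note that str.strip('at ') would
# instead remove any 'a', 't' or ' ' characters from both ends of the text.
events[6].find_all('span')[-2].text.strip().removeprefix('at ')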

'Livestream'

In [98]:
# event date
events[1].find('time').text

'2020-07-25T00:00'

In [99]:
# number of attendees
events[1].find('p', class_="attending").find('span').text

'288'
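
The four lookups above can be combined and checked offline before any requests are made. The following is a minimal sketch rather than the lab's reference solution: `sample_html` is a trimmed copy of the first `<article>` shown in the output earlier, and `parse_event` is a hypothetical helper name. It assumes the venue always sits in the `<span>` inside the event title, which holds for the sample markup but may not hold for every listing.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first <article> from the output above (assumption:
# other listings on the page follow the same structure).
sample_html = """
<article class="event-item clearfix">
  <span style="display:none;"><time datetime="2020-07-25T00:00" itemprop="startDate">2020-07-25T00:00</time></span>
  <div class="bbox">
    <h1 class="event-title" itemprop="summary"><a href="/events/1399703">FNGRS CRSSD Pres: OTR</a> <span>at <a href="/club.aspx?id=79291">Bang Bang</a>, <a href="/events.aspx?ai=309">San Diego</a></span></h1>
    <p class="attending"><span>2</span> Attending</p>
  </div>
</article>
"""


def parse_event(article):
    """Hypothetical helper: pull the four fields out of one <article> tag."""
    title = article.find('h1', class_='event-title')
    name = title.find('a').text.strip()

    # The title's <span> reads "at <venue>, <city>"; remove only the prefix.
    venue = title.find('span').text.strip()
    if venue.startswith('at '):
        venue = venue[3:]

    date = article.find('time').text

    # The attendance paragraph may be absent, so fall back to None.
    attending = article.find('p', class_='attending')
    attendees = int(attending.find('span').text) if attending else None

    return {'Event_Name': name, 'Venue': venue,
            'Event_Date': date, 'Number_of_Attendees': attendees}


article = BeautifulSoup(sample_html, 'html.parser').find('article', class_='event-item')
sample_event = parse_event(article)
print(sample_event)
```

Testing against a saved snippet like this makes it easier to tell a parsing bug apart from a network problem or a change in the live page.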

In [450]:
def scrape_events(events_page_url):
    """Return a DataFrame with one row per event listed on the given page."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    events = soup.find_all('article', class_="event-item")

    rows = []
    for event in events:
        title = event.find('h1', class_="event-title")
        name = title.text.split(' at ')[0].strip()

        # The <span> inside the title reads "at <venue>, <city>". Remove only
        # the "at " prefix; str.strip('at ') would also eat trailing 'a'/'t'.
        venue = title.find('span').text.strip()
        if venue.startswith('at '):
            venue = venue[3:]

        date = event.find('time').text

        # Some events have no attendance count. Record None so that every
        # row keeps all four columns and the values stay aligned.
        attending = event.find('p', class_="attending")
        attendees = attending.find('span').text if attending else None

        rows.append([name, venue, date, attendees])

    columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return pd.DataFrame(rows, columns=columns)

In [451]:
scrape_events('https://www.residentadvisor.net/events')

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | FNGRS CRSSD Pres: OTR | Bang Bang, San Diego | 2020-07-25T00:00 | 2 |
| 1 | [POSTPONED] Sunset Campout 2020 | Belden Town, San Francisco | 2020-07-25T00:00 | 288 |
| 2 | Set presents 'Mask.Erade' Sunday | The Midway, San Francisco | 2020-07-26T00:00 | 1 |
| 3 | [POSTPONED] Sunset Campout 2020 | Belden Town, San Francisco | 2020-07-26T00:00 | 288 |
| 4 | Sequence Feat. Gentlemen's Club & Dirty Snatcha | DNA Lounge, San Francisco | 2020-07-30T00:00 | 1 |
| 5 | Squish 1 Year Anniversary | Public Works, San Francisco | 2020-07-31T00:00 | 9 |
| 6 | Factory 93 presents Eli & Fur, Sacha Robotti, ... | Livestream | 2020-07-31T00:00 | 2 |


In [364]:
soup.find('a', attrs={'ga-event-action':"Next "}).attrs['href']

'/events/us/california/week/2020-08-01'

## Write a Function to Retrieve the URL for the Next Page

In [385]:
def next_page(url):
    """Return the absolute URL of the page behind the "Next" link."""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # The "Next" link holds a site-relative path such as
    # '/events/us/california/week/2020-08-01'
    url_ext = soup.find('a', attrs={'ga-event-action': "Next "}).attrs['href']
    next_page_url = "https://www.residentadvisor.net" + url_ext
    return next_page_url
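
`next_page` raises an `AttributeError` if a page has no "Next" link, because `soup.find` returns `None` in that case. One possible way to handle this is sketched below. `extract_next_url` is a hypothetical helper, not part of the lab, and the sample anchor markup is an assumption built only from the attribute and path shown above. The sketch works on an HTML string so it can be tried without a network request, and it uses `urllib.parse.urljoin` in place of string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_next_url(html, base_url):
    """Hypothetical helper: return the absolute "Next" URL, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.find('a', attrs={'ga-event-action': 'Next '})
    if link is None or not link.get('href'):
        return None  # no further page to visit
    # urljoin resolves the site-relative href against the current page URL.
    return urljoin(base_url, link['href'])


base = 'https://www.residentadvisor.net/events'

# Assumed markup: only the attribute name, its value and the href come from
# the lab output; the rest of the tag is illustrative.
sample_nav = '<a ga-event-action="Next " href="/events/us/california/week/2020-08-01">Next</a>'

next_url = extract_next_url(sample_nav, base)
last_page = extract_next_url('<p>No more pages</p>', base)
print(next_url)
print(last_page)
```

A caller can then stop paginating when the helper returns `None` instead of crashing partway through a long scrape.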

## Scrape the Next 40 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [457]:
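import time

events_url = 'https://www.residentadvisor.net/events/'
requested_events = []
total_rows = 0
pages_visited = 0
max_pages = 50  # arbitrary safety cap so a run of empty pages cannot loop forever

# Keep following the "Next" link until at least 40 events are collected
while total_rows < 40 and pages_visited < max_pages:
    page_df = scrape_events(events_url)
    requested_events.append(page_df)
    total_rows += len(page_df)
    pages_visited += 1
    events_url = next_page(events_url)
    time.sleep(.2)  # brief pause between requests

df = pd.concat(requested_events)
print(len(df))
df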

42


|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | FNGRS CRSSD Pres: OTR | Bang Bang, San Diego | 2020-07-25T00:00 | 2.0 |
| 1 | [POSTPONED] Sunset Campout 2020 | Belden Town, San Francisco | 2020-07-25T00:00 | 288.0 |
| 2 | Set presents 'Mask.Erade' Sunday | The Midway, San Francisco | 2020-07-26T00:00 | 1.0 |
| 3 | [POSTPONED] Sunset Campout 2020 | Belden Town, San Francisco | 2020-07-26T00:00 | 288.0 |
| 4 | Sequence Feat. Gentlemen's Club & Dirty Snatcha | DNA Lounge, San Francisco | 2020-07-30T00:00 | 1.0 |
| 5 | Squish 1 Year Anniversary | Public Works, San Francisco | 2020-07-31T00:00 | 9.0 |
| 6 | Factory 93 presents Eli & Fur, Sacha Robotti, ... | Livestream | 2020-07-31T00:00 | 2.0 |
| 0 | Sequence Feat. Tisoki & Minesweepa | DNA Lounge, San Francisco | 2020-08-13T00:00 | 1.0 |
| 0 | Sequence feat. Spag Heddy & Effin | DNA Lounge, San Francisco | 2020-08-28T00:00 | 1.0 |
| 0 | Sequence Feat. Arius | DNA Lounge, San Francisco | 2020-09-03T00:00 | 1.0 |
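
The result above is still in scrape order, while the task asks for the data sorted by attendance with ties broken by event date. A minimal sketch of that step follows, using four rows copied from the output. Two choices here are assumptions, since the lab does not state them: attendance is sorted from highest to lowest, and tied events are ordered from earliest to latest date. The names `sample` and `ranked` are illustrative.

```python
import pandas as pd

# Four rows taken from the scraped output; attendee counts start as text,
# which is how they come out of the HTML.
sample = pd.DataFrame({
    'Event_Name': ['FNGRS CRSSD Pres: OTR',
                   '[POSTPONED] Sunset Campout 2020',
                   '[POSTPONED] Sunset Campout 2020',
                   'Squish 1 Year Anniversary'],
    'Event_Date': ['2020-07-25T00:00', '2020-07-26T00:00',
                   '2020-07-25T00:00', '2020-07-31T00:00'],
    'Number_of_Attendees': ['2', '288', '288', '9'],
})

# Convert to real numbers and dates so the sort is numeric and chronological
# rather than alphabetical; unparseable counts become NaN.
sample['Number_of_Attendees'] = pd.to_numeric(sample['Number_of_Attendees'],
                                              errors='coerce')
sample['Event_Date'] = pd.to_datetime(sample['Event_Date'])

# Assumed ordering: most attendees first, earlier date first within a tie.
ranked = (sample
          .sort_values(['Number_of_Attendees', 'Event_Date'],
                       ascending=[False, True], na_position='last')
          .reset_index(drop=True))
print(ranked)
```

Applying the same two conversions and the `sort_values` call to the full `df` would order all of the scraped events, and `reset_index(drop=True)` also replaces the repeated index labels left behind by `pd.concat`.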


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!