# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
# Load the https://www.residentadvisor.net/events page in your browser.
```

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to view the HTML underlying the page.

```python
# Open the inspect element feature in your browser
```

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Event_Date, Venue, and Number_of_Attendees.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def scrape_events(events_page_url):
    """Scrape every event on a single events page into a DataFrame."""
    # Load the HTML, raising an error for 4xx/5xx responses
    html_page = requests.get(events_page_url)
    html_page.raise_for_status()
    # Convert the raw HTML into a searchable tree
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # Each event on the page is an <article class="event-item">
    events_list = soup.find_all('article', class_="event-item")
    event_details = []
    for event in events_list:
        title = event.find('h1', class_="event-title")
        event_name = title.get_text()
        event_date = event.find('time').attrs['datetime']
        # The venue is in a <span> inside the title; [3:] drops its leading "at "
        event_venue = title.find('span').get_text()[3:]
        # Not every event shows an attendance count; default to 0
        attendees = event.find('p', class_="attending")
        if attendees is not None:
            # e.g. "36 attending" -> 36 (commas stripped in case of "1,234")
            event_attendees = int(attendees.get_text().split(" ")[0].replace(",", ""))
        else:
            event_attendees = 0
        event_details.append([event_name, event_date, event_venue, event_attendees])
    # Build the DataFrame once, after the loop, with named columns
    return pd.DataFrame(event_details,
                        columns=["Event_Name", "Event_Date", "Venue", "Number_of_Attendees"])
```
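
Because `scrape_events` both downloads and parses the page, checking its parsing logic means hitting the live site every time. A minimal sketch of one alternative is to move the parsing into a function that takes an HTML string, so it can be run against a saved page or a hand-written snippet. The snippet below is hypothetical markup with placeholder events that only mirrors the tags and classes `scrape_events` looks for; the real page may be structured differently, and `parse_events` and `SAMPLE_HTML` are illustrative names, not part of the lab.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder events, mirroring only the
# selectors that scrape_events relies on (not the real page source)
SAMPLE_HTML = """
<article class="event-item">
  <time datetime="2020-03-08T00:00"></time>
  <h1 class="event-title">Example Night <span>at Example Club, San Francisco</span></h1>
  <p class="attending">36 attending</p>
</article>
<article class="event-item">
  <time datetime="2020-03-09T00:00"></time>
  <h1 class="event-title">Another Example <span>at Example Hall, Los Angeles</span></h1>
</article>
"""


def parse_events(html):
    """Parse event rows out of an HTML string, with no network access."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for event in soup.find_all('article', class_="event-item"):
        title = event.find('h1', class_="event-title")
        attending = event.find('p', class_="attending")
        rows.append({
            "Event_Name": title.get_text(),
            "Event_Date": event.find('time').attrs['datetime'],
            # Drop the leading "at " from the venue span, as scrape_events does
            "Venue": title.find('span').get_text()[3:],
            # Missing attendance block -> 0 attendees
            "Number_of_Attendees": (int(attending.get_text().split(" ")[0])
                                    if attending is not None else 0),
        })
    return rows


rows = parse_events(SAMPLE_HTML)
print(rows[0]["Venue"], rows[0]["Number_of_Attendees"])  # Example Club, San Francisco 36
print(rows[1]["Number_of_Attendees"])                     # 0
```

Once the parser behaves on a snippet, the fetching function can stay a thin wrapper that downloads the page and hands `html_page.content` to it.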

## Write a Function to Retrieve the URL for the Next Page

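```python
import re
import requests
from bs4 import BeautifulSoup


def next_page(url):
    """Return the URL of the next week's events page, or "End" if there is none."""
    html_page = requests.get(url)
    html_page.raise_for_status()
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # The "next" arrow is an <li> whose class attribute is exactly this string
    nextbutton = soup.find('li', class_="but arrow-right right")
    if nextbutton is None:
        return "End"
    # Pull the "/week/..." part out of the arrow's link
    regex = re.compile("(/week/.*)")
    buttonfind = regex.findall(nextbutton.find('a').attrs['href'])
    if not buttonfind:
        return "End"
    # Replace the current URL's "/week/..." suffix (if any) with the next one
    return url.rsplit('/week/', 1)[0] + buttonfind[0]
```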
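
`next_page` builds the next URL by cutting the current URL at `/week/` and appending the matched path. Assuming the arrow's `href` is an ordinary link (root-relative or absolute; worth confirming in the inspector), the standard library's `urllib.parse.urljoin` can resolve it against the current URL directly, with no regex. `next_url_from_href` is a hypothetical helper, and the example `href` values are made up to show the shape of the links.

```python
from urllib.parse import urljoin


def next_url_from_href(current_url, href):
    """Resolve the next-page link against the URL of the page it was found on.

    Root-relative, relative, and absolute hrefs are all resolved using
    standard URL resolution rules.
    """
    return urljoin(current_url, href)


start = 'https://www.residentadvisor.net/events/us/california'
# Hypothetical href values, shaped like the "/week/..." links next_page matches
page2 = next_url_from_href(start, '/events/us/california/week/2020-03-15')
page3 = next_url_from_href(page2, '/events/us/california/week/2020-03-22')
print(page3)  # https://www.residentadvisor.net/events/us/california/week/2020-03-22
```

The trade-off: the regex version depends on every link containing `/week/`, while `urljoin` only depends on the `href` being a valid link.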

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
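
This ordering can be produced with pandas' `DataFrame.sort_values`, passing both columns and a per-column direction. The sketch runs on toy rows rather than scraped data, and it assumes that "sorted by the number of attendees" means most-attended first, with ties going to the earlier event date.

```python
import pandas as pd

# Toy rows (illustrative only) with the same columns as the scraped data
df = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Event_Date": ["2020-03-09T00:00", "2020-03-08T00:00",
                   "2020-03-08T00:00", "2020-03-07T00:00"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Number_of_Attendees": [8, 36, 0, 8],
})

# ISO-8601 strings already sort chronologically as text; converting
# to datetimes just makes the intent explicit
df["Event_Date"] = pd.to_datetime(df["Event_Date"])

# Most attended first; ties broken by the earlier event date
sorted_df = df.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)
print(sorted_df["Event_Name"].tolist())  # ['B', 'D', 'A', 'C']
```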

```python
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re
```

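```python
events_page_url = 'https://www.residentadvisor.net/events/us/california'  # first page: California events
dfs = []
total_records = 0
# Follow the "next" links until 1000 events are collected or the pages run out
while total_records < 1000:
    page_df = scrape_events(events_page_url)
    total_records += len(page_df)
    dfs.append(page_df)
    events_page_url = next_page(events_page_url)
    if events_page_url == "End":
        break
# Combine the per-page frames, renumbering rows 0..n-1
df = pd.concat(dfs, ignore_index=True)
# Trim any overshoot from the last page
df = df.iloc[:1000]
print(len(df))
```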


475
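
The pagination loop has no error handling: one failed request (a timeout or dropped connection, say) raises out of `scrape_events` or `next_page` and stops the whole crawl. A retry wrapper with exponential backoff is one common remedy. `with_retries` is a hypothetical helper (not part of requests or this lab), and its retry count and delays are arbitrary defaults.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call func() and return its result, retrying with exponential backoff.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts, and re-raises the last error once all attempts are used up.
    attempts must be at least 1.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical wiring into the pagination loop:
#   page_df = with_retries(lambda: scrape_events(events_page_url),
#                          retry_on=(requests.exceptions.RequestException,))
#   time.sleep(1)  # brief pause between pages to go easy on the server
```

Narrowing `retry_on` to request errors keeps genuine bugs (for example, a selector that no longer matches) from being retried pointlessly.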


```python
df.head(20)
```

| | Event_Name | Event_Date | Venue | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Mioli Music 10 Year Anniversary at Hotel Via, ... | 2020-03-08T00:00 | Hotel Via, San Francisco | 36 |
| 1 | WERD. with DJ CZ / Eichef at Monarch, San Fran... | 2020-03-08T00:00 | Monarch, San Francisco | 8 |
| 2 | Sunday Sanctuary presents: Artur, Seedorf, Con... | 2020-03-08T00:00 | One666, Los Angeles | 4 |
| 3 | Crssd Festival Spring '20 presented by FNGRS C... | 2020-03-08T00:00 | Waterfront Park in San Diego, San Diego | 85 |
| 4 | Crssd After Dark: Charlotte De Witte + Jon Run... | 2020-03-08T00:00 | Spin, San Diego | 0 |
| 5 | Crssd After Dark: Patrick Topping + Archie Ham... | 2020-03-08T00:00 | Hornblower Landing, San Diego | 0 |
| 6 | Crssd After Dark: Hernan Cattaneo + Nick Warre... | 2020-03-08T00:00 | Rich's Nightclub, San Diego | 0 |
| 7 | Beyond Borders: An International Women's Day B... | 2020-03-08T00:00 | 906 World Cultural Center, San Francisco | 0 |
| 8 | Crssd After Dark: Purple Disco Machine + Never... | 2020-03-08T00:00 | Hornblower Landing, San Diego | 0 |
| 9 | The Gel Lab presents: Lifted with DJ Kerry, To... | 2020-03-08T00:00 | Ace Hotel - Downtown, Los Angeles | 0 |


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!