# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to preview the underlying HTML of the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [None]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
html_page = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(html_page.content, 'html.parser')

In [None]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    events = soup.find_all(class_='event-item')
    # Counters are collected page-wide, so rows only line up if every event shows one
    attendees = soup.find_all(class_='counter nohide')
    titles = []
    venues = []
    event_dates = []
    num_attendees = []
    for event in events:
        titles.append(event.div.div.get_text())
        # Skip the first three characters, which precede the venue name
        venues.append(event.div.span.get_text()[3:])
        # Keep the first 10 characters of the leading span as the date
        event_dates.append(event.span.get_text()[:10])
    for item in attendees:
        num_attendees.append(item.span.get_text())
    df = pd.DataFrame([titles, venues, event_dates, num_attendees]).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [None]:
scrape_events("https://www.residentadvisor.net/events")
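
Because the function gathers the attendee counters in a separate, page-wide pass, an event that shows no counter would shift every later count up by one row. A minimal sketch of a per-event alternative follows. The HTML snippet is invented for illustration and only mimics the class names used above, so the real page structure may differ, and `parse_events` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample markup: it reuses the lab's class names,
# but the real page may be laid out differently.
SAMPLE_HTML = """
<ul>
  <li class="event-item">
    <span>2020-08-01 Sat</span>
    <div><div>Warehouse Night</div><span>at Club A</span>
      <p class="counter nohide"><span>12</span> Attending</p></div>
  </li>
  <li class="event-item">
    <span>2020-08-02 Sun</span>
    <div><div>Rooftop Session</div><span>at Club B</span></div>
  </li>
</ul>
"""


def parse_events(html):
    """Return one dict per event, looking up the counter inside each event."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for event in soup.find_all(class_="event-item"):
        counter = event.find(class_="counter nohide")
        rows.append({
            "Event_Name": event.div.div.get_text(strip=True),
            "Venue": event.div.span.get_text(strip=True)[3:],
            "Event_Date": event.span.get_text(strip=True)[:10],
            # None marks an event that shows no attendee counter
            "Number_of_Attendees": counter.span.get_text(strip=True) if counter else None,
        })
    return rows


rows = parse_events(SAMPLE_HTML)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would yield the same four columns, with a missing value wherever an event has no counter.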

## Write a Function to Retrieve the URL for the Next Page

In [7]:
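from urllib.parse import urljoin

def next_page(url):
    # Parse the page at `url` itself rather than reusing the soup loaded earlier
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    next_button = soup.find_all(class_='but arrow-right right')
    # Resolve the link's href against the current URL so repeated calls stay valid
    next_page_url = urljoin(url, next_button[0].find('a')['href'])
    return next_page_url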

In [8]:
next_page('https://www.residentadvisor.net/events')

'https://www.residentadvisor.net/events/us/newyork/week/2020-08-02'
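
When pages are chained, each next-page link has to be resolved against the page it came from. One way to do that is `urllib.parse.urljoin` from the standard library. The sketch below assumes the link's `href` is a root-relative path like the one implied by the output above; the second path is made up for illustration.

```python
from urllib.parse import urljoin

current = "https://www.residentadvisor.net/events"
# Assumed href value, inferred from the URL printed above
href = "/events/us/newyork/week/2020-08-02"

first = urljoin(current, href)
# Hypothetical href for the following week, resolved against the page just reached
second = urljoin(first, "/events/us/newyork/week/2020-08-09")

print(first)
print(second)
```

Because a root-relative path replaces the whole path of the base URL, the result stays well-formed no matter how deep the current page is.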

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
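
Scraped attendee counts arrive as text, and text sorts differently from numbers ("9" ranks above "10"). The sketch below shows one way to do the two-key sort on invented toy data. It assumes the largest counts should come first, which the lab does not state, and that dates are in an ISO-like format that `pd.to_datetime` can parse.

```python
import pandas as pd

# Toy rows standing in for scraped events; all values are invented
toy = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Event_Date": ["2020-08-03", "2020-08-01", "2020-08-02", "2020-08-01"],
    "Number_of_Attendees": ["9", "10", "10", None],
})

# Convert text to numbers; anything unparseable or missing becomes NaN
toy["Number_of_Attendees"] = pd.to_numeric(toy["Number_of_Attendees"], errors="coerce")
toy["Event_Date"] = pd.to_datetime(toy["Event_Date"])

# Attendees descending, then date ascending to break ties; NaN rows sort last
ranked = toy.sort_values(
    ["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)
print(ranked)
```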

In [9]:
import time 
scraped_events = []
count = 0
base_url = 'https://www.residentadvisor.net/events'

In [None]:
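while count < 1000:
    df = scrape_events(base_url)
    if df.empty:
        # Stop if a page yields no events; otherwise the loop would never end
        break
    scraped_events.append(df)
    count += len(df)
    base_url = next_page(base_url)
    # Pause between requests to avoid hammering the server
    time.sleep(.2)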
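
A crawl that spans many pages will eventually hit a timeout or a failed request. As a rough sketch of the "dealing with errors" part of the pipeline, a small retry wrapper could look like the following. `fetch_with_retries` and `flaky_fetch` are hypothetical names for illustration and are not part of `requests`.

```python
import time


def fetch_with_retries(fetch, url, retries=3, delay=0.5):
    """Call fetch(url), retrying on any exception for up to `retries` attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # narrow this to the errors your fetcher raises
            last_error = error
            if attempt < retries:
                time.sleep(delay * attempt)  # wait a little longer each time
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


# Stand-in fetcher that fails twice before succeeding, so no network is needed
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>{url}</html>"


page = fetch_with_retries(flaky_fetch, "https://example.com/events", delay=0)
print(page, calls["n"])
```

In the lab, `fetch` could be something like `lambda u: requests.get(u, timeout=10)`, with the `except` clause narrowed to `requests.RequestException`.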

In [None]:
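df = pd.concat(scraped_events, ignore_index=True).iloc[:1000]
# Counts are scraped as text, so convert them before sorting numerically
df["Number_of_Attendees"] = pd.to_numeric(df["Number_of_Attendees"], errors="coerce")
# Most-attended first; ties fall back to Event_Date (compared as text here)
df = df.sort_values(["Number_of_Attendees", "Event_Date"], ascending=[False, True])
df.head()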

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!