# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
# Load the https://www.residentadvisor.net/events page in your browser.
```

## Open the Inspect Element Feature

Next, open your browser's Inspect Element feature (its developer tools) to view the HTML underlying the page.

```python
# Open the inspect element feature in your browser
```
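If you want to cross-check what the inspector shows from Python, one option is to list every tag's name, `id`, and `class` attributes, which are exactly what the scraping functions in this lab select on. The sketch below runs on a small hypothetical snippet (`SAMPLE_HTML`) that only imitates the class names the next section relies on; it is not copied from the live site, and the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the scraper expects.
# It is NOT taken from the live site; the trailing " /" on the date is an assumption.
SAMPLE_HTML = """
<div id="event-listing">
  <ul>
    <li><p class="eventDate">Sun, 20 Dec 2020 /</p></li>
    <li>
      <article>
        <h1 class="event-title">Sunsets at Treehouse Miami, Miami</h1>
        <p class="attending">1 Attending</p>
      </article>
    </li>
  </ul>
</div>
"""

def describe_tags(html):
    """Return (tag name, id, classes) for every element, in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    # find_all(True) matches every tag; 'class' comes back as a list of class names
    return [(tag.name, tag.get('id'), tag.get('class')) for tag in soup.find_all(True)]

for name, tag_id, classes in describe_tags(SAMPLE_HTML):
    print(name, tag_id, classes)
```

On a real page you would pass `requests.get(url).content` instead of the sample string and scan the output for the ids and classes that wrap each event.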

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
```

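```python
def scrape_events(url):
    """Scrape one events page into a DataFrame with one row per event."""
    columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 404/500 instead of parsing an error page
    soup = BeautifulSoup(response.content, 'html.parser')

    listing = soup.find('div', id='event-listing')
    if listing is None:
        # No listing on the page (or the markup changed): return an empty, well-formed frame
        return pd.DataFrame(columns=columns)

    rows = []
    current_date = None  # date items precede the events they apply to
    for li in listing.find_all('li'):
        date_tag = li.find('p', class_='eventDate')
        event = li.find('h1', class_='event-title')
        if event:
            # Titles have the form "<event name> at <venue>"
            name, _, venue = event.text.partition(' at ')
            attend = event.parent.find('p', class_='attending')
            attending = int(attend.text.split()[0]) if attend else np.nan
            rows.append([name, venue, current_date, attending])
        elif date_tag:
            current_date = date_tag.text[:-2]  # drop the two characters that trail the date
    return pd.DataFrame(rows, columns=columns)
```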

```python
scrape_events('https://www.residentadvisor.net/events/us/florida')
```

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Tripping Project presents: Chad Andrew & AMO | TBA - Miami, Miami | Sun, 20 Dec 2020 | 7.0 |
| 1 | Sundeep Sundayz | Sit On The Furniture | Sun, 20 Dec 2020 | 3.0 |
| 2 | Sunsets | Treehouse Miami, Miami | Sun, 20 Dec 2020 | 1.0 |
| 3 | undr:wtr presents Roger Sanchez | WTR | Sun, 20 Dec 2020 | 4.0 |
| 4 | Wax and Wane 15 - the Sunday Before Christmas | Grumpy's Underground Lounge, Orlando | Sun, 20 Dec 2020 | NaN |
| 5 | undr:wtr presents Roger Sanchez | WTR | Mon, 21 Dec 2020 | 4.0 |
| 6 | Sit On The Furniture ft Archila / Octa Digio/ ... | Do Not Sit On The Furniture, Miami | Wed, 23 Dec 2020 | 3.0 |
| 7 | Leilaxmarek | Sit On The Furniture | Thu, 24 Dec 2020 | 3.0 |
| 8 | SIS | Sit On The Furniture | Fri, 25 Dec 2020 | 4.0 |
| 9 | Sunsets with Dombresky | Treehouse Miami, Miami | Fri, 25 Dec 2020 | 2.0 |
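Because `scrape_events` downloads and parses in a single step, the only way to check it is against the live site. A minimal sketch of one alternative is to move the parsing into a pure function that takes an HTML string, so the date-carrying logic can be tested on fixture markup. The name `parse_events` and the fixture are hypothetical; the fixture only reuses the ids and class names that `scrape_events` selects on.

```python
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

def parse_events(html):
    """Parse the HTML of an events page (a string) into a DataFrame."""
    soup = BeautifulSoup(html, 'html.parser')
    listing = soup.find('div', id='event-listing')
    rows = []
    current_date = None  # most recent date header seen while walking the list
    if listing is not None:
        for li in listing.find_all('li'):
            date_tag = li.find('p', class_='eventDate')
            event = li.find('h1', class_='event-title')
            if event:
                name, _, venue = event.text.partition(' at ')
                attend = event.parent.find('p', class_='attending')
                attending = int(attend.text.split()[0]) if attend else np.nan
                rows.append([name, venue, current_date, attending])
            elif date_tag:
                current_date = date_tag.text[:-2]  # strip the two trailing characters
    return pd.DataFrame(rows, columns=COLUMNS)

# Hypothetical fixture: the trailing " /" after each date is an assumption
# standing in for the two characters the scraper strips.
FIXTURE = """
<div id="event-listing"><ul>
  <li><p class="eventDate">Sun, 20 Dec 2020 /</p></li>
  <li><article><h1 class="event-title">Sunsets at Treehouse Miami, Miami</h1>
      <p class="attending">1 Attending</p></article></li>
  <li><article><h1 class="event-title">Wax and Wane at Grumpy's Underground Lounge, Orlando</h1></article></li>
  <li><p class="eventDate">Mon, 21 Dec 2020 /</p></li>
  <li><article><h1 class="event-title">Late Show at WTR</h1>
      <p class="attending">4 Attending</p></article></li>
</ul></div>
"""

print(parse_events(FIXTURE))
```

With this split, the network part of `scrape_events` shrinks to fetching the page and passing `response.text` to `parse_events`.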


## Write a Function to Retrieve the URL for the Next Page

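```python
def next_page(url):
    """Return the absolute URL of the next events page, or None if there isn't one."""
    url_base = "https://www.residentadvisor.net"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    # The pagination "next" button is <li id="liNext2"> wrapping a relative link
    next_btn = soup.find('li', id="liNext2")
    link = next_btn.find('a', href=True) if next_btn else None
    if link is None:
        return None  # last page: no next link
    return url_base + link['href']
```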

```python
next_page('https://www.residentadvisor.net/events/us/florida')
```

```
'https://www.residentadvisor.net/events/us/florida/week/2020-12-27'
```
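`next_page` builds the absolute URL by concatenating a hard-coded base with the link's `href`. A sketch of an alternative, assuming the same `li#liNext2` markup, resolves the link with the standard library's `urllib.parse.urljoin`, which handles both relative and absolute `href` values, and takes the HTML as a string so it can be checked offline. The helper name `find_next_url` and the sample markup are hypothetical.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def find_next_url(html, page_url):
    """Return the absolute URL of the "next" link on a page, or None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    next_btn = soup.find('li', id='liNext2')
    # href=True only matches <a> tags that actually carry an href attribute
    link = next_btn.find('a', href=True) if next_btn else None
    if link is None:
        return None
    # urljoin resolves "/events/..." against the scheme and host of page_url
    return urljoin(page_url, link['href'])

page = 'https://www.residentadvisor.net/events/us/florida'
html = '<ul><li id="liNext2"><a href="/events/us/florida/week/2020-12-27">Next</a></li></ul>'
print(find_next_url(html, page))
# https://www.residentadvisor.net/events/us/florida/week/2020-12-27
```

Returning `None` rather than the bare base URL gives the pagination loop an unambiguous signal to stop.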

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If two events have the same number of attendees, sort them by event date.

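```python
import time

def scrape_1000_events(url, n_events=1000):
    """Follow "next page" links from `url` until at least `n_events` events are collected."""
    frames = []
    event_count = 0
    while url and event_count < n_events:
        df_page = scrape_events(url)
        frames.append(df_page)
        event_count += len(df_page)
        url = next_page(url)  # advance to the following page (None when there is none)
        time.sleep(1)  # pause between pages so we don't hammer the server
    return pd.concat(frames, ignore_index=True).iloc[:n_events]
```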
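Each page costs two requests here (one in `scrape_events`, one in `next_page`), so a single transient failure can abort a long run. One common way to deal with such errors, sketched below under the assumption that retrying a few times with backoff is acceptable for this site, is a `requests.Session` whose adapter uses urllib3's `Retry`. The helper name `make_session` and the retry settings are illustrative choices, not part of the lab; you would call `session.get(...)` wherever the lab calls `requests.get(...)`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff_factor=1.0):
    """Build a requests.Session that retries transient failures with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,  # waits grow exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # rate limiting and server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

session = make_session()
# e.g. response = session.get(url, timeout=10) in place of requests.get(url)
```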

```python
url = 'https://www.residentadvisor.net/events/us/florida'
scrape_1000_events(url).head()
```

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Tripping Project presents: Chad Andrew & AMO | TBA - Miami, Miami | Sun, 20 Dec 2020 | 7.0 |
| 1 | Sundeep Sundayz | Sit On The Furniture | Sun, 20 Dec 2020 | 3.0 |
| 2 | Sunsets | Treehouse Miami, Miami | Sun, 20 Dec 2020 | 1.0 |
| 3 | undr:wtr presents Roger Sanchez | WTR | Sun, 20 Dec 2020 | 4.0 |
| 4 | Wax and Wane 15 - the Sunday Before Christmas | Grumpy's Underground Lounge, Orlando | Sun, 20 Dec 2020 | NaN |
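The rows above are still in scrape order, while the task asks for the data sorted by attendance with ties broken by date. A minimal sketch, assuming "sorted by the number of attendees" means most-attended first and that events without an attendance count go last, parses `Event_Date` with `pd.to_datetime` so ties sort chronologically rather than alphabetically by weekday name. `sort_events` is a hypothetical helper, and the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def sort_events(df):
    """Sort by attendance (most first, missing counts last), then by event date."""
    # Event_Date is text such as "Sun, 20 Dec 2020"; parse it so ties sort chronologically
    parsed = pd.to_datetime(df["Event_Date"], format="%a, %d %b %Y")
    return (
        df.assign(_parsed_date=parsed)
          .sort_values(["Number_of_Attendees", "_parsed_date"],
                       ascending=[False, True], na_position="last")
          .drop(columns="_parsed_date")
    )

# Made-up sample rows in the same shape as the scraper's output
sample = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Event_Date": ["Mon, 21 Dec 2020", "Sun, 20 Dec 2020", "Sun, 20 Dec 2020", "Fri, 25 Dec 2020"],
    "Number_of_Attendees": [4.0, 4.0, np.nan, 7.0],
})
print(sort_events(sample))
```

Applied to the lab's data, this would be `sort_events(scrape_1000_events(url)).head()`.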


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!