# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from the music website https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [2]:
# Open the inspect element feature in your browser on 'https://www.residentadvisor.net/events/us/newyork'.

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [94]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import re
import numpy as np
import time


In [95]:
def scrape_events(events_page_url):
    # First attempt: scrape only the events inside the "strip slide small" container.
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    event_container = soup.find('div', class_="strip slide small")
    # Split each article's text into lines. The fields are read from fixed line
    # offsets, so this breaks if the page markup changes.
    articles = [p.text.split("\n") for p in event_container.find_all('article', class_="highlight-top")]
    rows = []
    for lines in articles:
        event_date = lines[1]
        number_of_attendees = lines[4].split(" ")[0]  # leading token of the attendance text
        event_name = lines[8]
        venue = lines[12]
        rows.append([event_name, venue, event_date, number_of_attendees])
    return pd.DataFrame(rows, columns=['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees'])

scrape_events('https://www.residentadvisor.net/events/us/newyork')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Friday: Theo Parrish All Night | Nowadays | Fri, 3 Jan 2020 | 278 |
| 1 | Saturday: Anthony Naples and Four Tet | Nowadays | Sat, 4 Jan 2020 | 286 |
| 2 | Julian Jeweil // Marco Faraone // Ramon Tapia | Kings Hall - Avant Gardner | Fri, 17 Jan 2020 | 146 |
| 3 | Elseworld: Leon Vynehall, Moxie, Physical Ther... | Elsewhere | Fri, 17 Jan 2020 | 96 |
| 4 | Saturday: Aurora Halal and Ben UFO | Nowadays | Sat, 18 Jan 2020 | 259 |
| 5 | Detroit Love: Luciano a2a Carl Craig | Avant Gardner | Sat, 18 Jan 2020 | 95 |
| 6 | Horse Meat Disco - New York Residency | Elsewhere | Sun, 19 Jan 2020 | 149 |
| 7 | T4T LUV NRG ~ Octo Octa b2b Eris Drew All Nigh... | Good Room | Fri, 24 Jan 2020 | 164 |


In [96]:
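def scrape_events(events_page_url):
    # Second version: walk the full event listing, one <li> at a time.
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Assumption: the listing lives in a div with id "event-listing".
    # Adjust this selector if the page markup differs.
    event_listings = soup.find('div', id="event-listing")
    entries = event_listings.find_all('li')
    rows = []
    cur_date = None  # most recent date header seen so far
    for entry in entries:
        # Date headers and events are sibling <li> items: remember the latest
        # date and attach it to every event that follows it.
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if event:
            # Titles read "<event name> at <venue>"; split on the last " at ".
            details = event.text.rsplit(' at ', 1)
            event_name = details[0].strip()
            venue = details[1].strip() if len(details) > 1 else np.nan
            try:
                n_attendees = int(re.match(r"(\d*)", entry.find('p', class_="attending").text)[0])
            except (AttributeError, ValueError):
                # No attendance element, or its text has no leading digits.
                n_attendees = np.nan
            rows.append([event_name, venue, cur_date, n_attendees])
        elif date:
            cur_date = date.text
    return pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])

scrape_events('https://www.residentadvisor.net/events/us/newyork')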

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Friday: Theo Parrish All Night | Nowadays | Fri, 03 Jan 2020 / | 278.0 |
| 1 | Francois K, Rimarkable | Elsewhere | Fri, 03 Jan 2020 / | 30.0 |
| 2 | Eli Escobar All Night Long (House of Yes Anniv... | House Of Yes | Fri, 03 Jan 2020 / | 21.0 |
| 3 | Jacques Renault, Timo Lee, Uncle Dev & Friends... | Good Room | Fri, 03 Jan 2020 / | 14.0 |
| 4 | Rendezvous with Sergio Dimoff, Arvi, CGC More | TBA Brooklyn | Fri, 03 Jan 2020 / | 11.0 |
| 5 | D'Noir AM Saturday Morning Afterhours At Polyg... | Polygon BK | Fri, 03 Jan 2020 / | 2.0 |
| 6 | Suzanne Kraft | public records | Fri, 03 Jan 2020 / | 12.0 |
| 7 | 2lanes, Kfeelz, Amelia Holt | Mood Ring | Fri, 03 Jan 2020 / | 11.0 |
| 8 | The Office presents: Red Light Secrets Music B... | TBA - Brooklyn | Fri, 03 Jan 2020 / | NaN |
| 9 | JACK DEPT. NYC / Dr. Sync / Katie Rex / John B... | Bossa Nova Civic Club | Fri, 03 Jan 2020 / | NaN |
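
The attendee count in the function above is extracted with a regular expression and falls back to NaN when nothing can be parsed. The sketch below isolates that step so it can be tried without a network request. `parse_attendees` is a hypothetical helper, and the sample strings are assumptions about what the attendance text might look like, not text captured from the site.

```python
import math
import re


def parse_attendees(text):
    """Return the leading integer in `text`, or NaN if there is none.

    Hypothetical helper: it assumes the count is the first thing in the
    attendance text, optionally preceded by whitespace.
    """
    match = re.match(r"\s*(\d+)", text or "")
    return int(match.group(1)) if match else math.nan


# Assumed sample inputs, for illustration only.
print(parse_attendees("278 Attending"))
print(parse_attendees(""))
```

Requiring at least one digit (`\d+`) lets the missing-count case be handled with a plain `if` rather than by catching the `ValueError` that `int("")` would raise.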


## Write a Function to Retrieve the URL for the Next Page

In [97]:
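def next_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # The "Next" link is located by its ga-event-action attribute
    # (the value is matched exactly, including the trailing space).
    next_link = soup.find('a', attrs={'ga-event-action': "Next "})
    if next_link is None:
        return None  # no further page to visit
    # The href is relative, so prepend the site root.
    return "https://www.residentadvisor.net" + next_link.attrs['href']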
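
One way to combine `scrape_events` and `next_page` is a loop that keeps following the next-page link until enough events have been collected. This is a minimal sketch, not the lab's reference solution: `collect_pages` and its parameters (`target`, `max_pages`, `delay`) are hypothetical names, and the two scraping functions are passed in as arguments so the loop can be exercised with stand-ins before touching the live site.

```python
import time


def collect_pages(start_url, scrape_page, get_next_url,
                  target=1000, max_pages=100, delay=1.0):
    """Follow next-page links, collecting one result per page.

    scrape_page(url) should return something with a length (a DataFrame or a
    list of rows); get_next_url(url) should return the next URL, or None.
    Stops once `target` items are collected, the links run out, or
    `max_pages` pages have been visited.
    """
    pages, total, url = [], 0, start_url
    for _ in range(max_pages):
        try:
            page = scrape_page(url)
        except Exception as err:  # one bad page should not end the whole run
            print(f"Skipping {url}: {err}")
            page = None
        if page is not None and len(page) > 0:
            pages.append(page)
            total += len(page)
        if total >= target:
            break
        try:
            url = get_next_url(url)
        except Exception:
            break  # no usable next link, so stop here
        if not url:
            break
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages


# Stand-in "site" so the loop can be tried offline (made-up data).
fake_site = {"p1": ([1, 2, 3], "p2"), "p2": ([4, 5], "p3"), "p3": ([6, 7, 8], None)}
demo = collect_pages("p1", lambda u: fake_site[u][0], lambda u: fake_site[u][1],
                     target=4, delay=0)
print(demo)
```

With the functions from this lab, a call could look like `pages = collect_pages(start_url, scrape_events, next_page)`, and the per-page DataFrames could then be combined with `pd.concat(pages, ignore_index=True)`.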

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
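
The sorting step might be sketched as follows. The sort direction is an assumption (most attended first, ties broken by earliest date), since the task does not state it, and `sort_events` is a hypothetical helper name. The dates are parsed before sorting because the raw strings, such as "Fri, 03 Jan 2020 /", would otherwise sort alphabetically by weekday.

```python
import pandas as pd


def sort_events(df):
    """Sort by attendance (highest first), then by event date (earliest first).

    Assumes the column names used in this lab and dates shaped like
    "Fri, 03 Jan 2020 /". Unparseable values become NaN/NaT and sort last.
    """
    out = df.copy()
    out["Number_of_Attendees"] = pd.to_numeric(out["Number_of_Attendees"], errors="coerce")
    # Drop the trailing " /" left on scraped dates, then parse them.
    cleaned = out["Event_Date"].astype(str).str.rstrip(" /")
    out["Event_Date"] = pd.to_datetime(cleaned, format="%a, %d %b %Y", errors="coerce")
    return out.sort_values(
        ["Number_of_Attendees", "Event_Date"],
        ascending=[False, True],
        na_position="last",
    ).reset_index(drop=True)


# Made-up rows for illustration; not scraped data.
sample = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Event_Date": ["Sat, 04 Jan 2020 /", "Fri, 03 Jan 2020 /",
                   "Fri, 03 Jan 2020 /", "Fri, 03 Jan 2020 /"],
    "Number_of_Attendees": [30.0, 30.0, 278.0, float("nan")],
})
print(sort_events(sample))
```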

In [None]:
#Your code here


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!