# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site! In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [2]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

In [3]:
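import re

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame; return None if no events are found."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    event_container = soup.find('div', id="event-listing")
    if event_container is None:
        return None
    rows = []
    for event in event_container.find_all('h1'):
        # Each title reads "<event name> at <venue>"
        specs = event.text.split(' at ')
        event_name = specs[0]
        venue = specs[1] if len(specs) > 1 else np.nan
        # The date element sits two siblings before the title's parent; drop the time after 'T'
        event_date = event.parent.previous_sibling.previous_sibling.text.split('T')[0]
        try:
            # Keep every leading digit, not just the first character of the text
            attending_text = event.next_sibling.next_sibling.text
            number_attending = int(re.match(r'\s*(\d+)', attending_text).group(1))
        except AttributeError:
            # No attendance element, or no leading number in it
            number_attending = np.nan
        rows.append([event_name, venue, event_date, number_attending])
    if not rows:
        return None
    return pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])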
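
One fragile spot in the function above is the split on `' at '`: a title whose event name itself contains " at " breaks into more than two pieces, and a title without a venue yields only one. Below is a minimal sketch of a more defensive split. It assumes titles follow the "<event name> at <venue>" pattern and that the venue is whatever follows the last " at ". `split_title` is a hypothetical helper, not part of the lab solution, and the second title is made up for illustration.

```python
def split_title(title):
    """Split '<event name> at <venue>' on the LAST ' at ' in the title.

    Hypothetical helper: returns (event_name, venue), with venue set to
    None when the title contains no ' at ' separator.
    """
    name, sep, venue = title.strip().rpartition(' at ')
    if not sep:
        # No separator found: treat the whole title as the event name
        return title.strip(), None
    return name, venue


print(split_title("Hot Mass Pres. Analog Soul at Hot Mass"))
print(split_title("Live at Five at P Town Bar"))  # made-up title with two ' at '
print(split_title("Honcho Campout 2019"))         # no venue in the title
```

Inside `scrape_events`, the pair returned by such a helper could replace the `specs[0]` and `specs[1]` lookups.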

## Write a Function to Retrieve the URL for the Next Page

In [5]:
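def next_page(url):
    """Return the URL of the next events page, or None if there is no "Next" link."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # find() returns None when the page has no "Next" link
    next_link = soup.find('a', attrs={'ga-event-action': "Next "})
    if next_link is None or not next_link.get('href'):
        return None
    return 'https://www.residentadvisor.net' + next_link['href']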
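
`next_page` builds the absolute URL by prepending the site root to the link's `href`, which works as long as the `href` is a root-relative path. The standard library's `urllib.parse.urljoin` resolves root-relative and already-absolute links alike. Here is a small sketch; the `href` values are illustrative and not taken from the live site.

```python
from urllib.parse import urljoin

current_page = 'https://www.residentadvisor.net/events'

# Root-relative href (made-up path): resolved against the site root
from_relative = urljoin(current_page, '/events/next-week')

# Already-absolute href (made-up URL): returned unchanged
from_absolute = urljoin(current_page, 'https://www.residentadvisor.net/events/later')

print(from_relative)
print(from_absolute)
```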

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [6]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

dfs = []
url = 'https://www.residentadvisor.net/events'
count = 0
# Follow "Next" links until 1000 events are collected or the pages run out
while count < 1000 and url:
    df = scrape_events(url)
    if df is not None:
        dfs.append(df)
        count += len(df)
        print(count)
    url = next_page(url)
df = pd.concat(dfs, ignore_index=True)
print(df.shape)
df.head()

4
10
11
14
18
19
20
21
(21, 4)


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Wax Cave: All Vinyl: Infrared | Warehouse on Watts | 2019-08-09 | 3 |
| 1 | Jellyfish ⇛ August Courtyard Disco | P Town Bar | 2019-08-09 | 2 |
| 2 | Shameless Techno Party | The Barbary | 2019-08-09 | 2 |
| 3 | Hot Mass Pres. Analog Soul | Hot Mass | 2019-08-10 | 1 |
| 4 | Honcho Campout 2019 | Four Quarters Interfaith Sanctuary | 2019-08-15 | 3 |


It looks like Philadelphia is not too exciting in the summertime: only 21 events were scheduled when this cell was run.
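
The lab asks for the events sorted by number of attendees, with ties broken by event date. That step can be sketched with `DataFrame.sort_values`. The example rebuilds the five rows shown in the output above by hand and assumes that "sorted by attendees" means most-attended first, with earlier dates first within a tie; the variable names `sample` and `sorted_events` are illustrative.

```python
import pandas as pd

# The five rows from the output above, retyped by hand for illustration
sample = pd.DataFrame(
    [
        ["Wax Cave: All Vinyl: Infrared", "Warehouse on Watts", "2019-08-09", "3"],
        ["Jellyfish ⇛ August Courtyard Disco", "P Town Bar", "2019-08-09", "2"],
        ["Shameless Techno Party", "The Barbary", "2019-08-09", "2"],
        ["Hot Mass Pres. Analog Soul", "Hot Mass", "2019-08-10", "1"],
        ["Honcho Campout 2019", "Four Quarters Interfaith Sanctuary", "2019-08-15", "3"],
    ],
    columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"],
)

# Counts scraped as text must become numbers so that 10 sorts after 9, not after 1
sample["Number_of_Attendees"] = pd.to_numeric(sample["Number_of_Attendees"], errors="coerce")
sample["Event_Date"] = pd.to_datetime(sample["Event_Date"])

# Assumption: most-attended first, earlier dates first within a tie, missing counts last
sorted_events = sample.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
    na_position="last",
).reset_index(drop=True)

print(sorted_events)
```

The same `sort_values` call should apply unchanged to the full scraped DataFrame; `pd.to_numeric` leaves a column that is already numeric as it is.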

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!