# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [1]:
#Open the inspect element feature in your browser
from bs4 import BeautifulSoup
import requests
import pandas as pd
import time
import re
import numpy as np
from splinter import Browser
executable_path = {'executable_path':'</path/to/chrome>'}

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [22]:
html_page = requests.get('https://www.residentadvisor.net/events/us/newyork') #Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser')

In [25]:
events = soup.findAll('article', class_='event-item')
print(events[10])

<article class="event-item clearfix tickets-bkg-logo small-item" itemscope="" itemtype="http://data-vocabulary.org/Event"><a href="/events/1284645#tickets"><img class="nohide" src="https://residentadvisor.net/images/ra-tix.png" style="height: 23px; width: 40px; right: 0px; position: absolute; top: 1px;"/></a><span style="display:none;"><time datetime="2019-07-20T00:00" itemprop="startDate">2019-07-20T00:00</time></span><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1284645" itemprop="url" title="Event details of Sole Rehab &amp; Signal &gt; Noise present: Garrett David / Vicki Powell">Sole Rehab &amp; Signal &gt; Noise present: Garrett David / Vicki Powell</a> <span>at <a href="/club.aspx?id=140928">Photo City Improv</a>, <a href="/events.aspx?ai=443">Buffalo/Rochester</a></span></h1></div></article>
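One of the lab's objectives is to follow links to individual events. The anchors in the markup above use site-relative hrefs such as `/events/1284645`, so they need to be resolved against the page URL before they can be requested. The sketch below shows one way to do that with the standard library; `absolute_event_url` and `LISTING_URL` are hypothetical names introduced only for illustration.

```python
from urllib.parse import urljoin

# Listing page used in this lab's request
LISTING_URL = "https://www.residentadvisor.net/events/us/newyork"


def absolute_event_url(href, base=LISTING_URL):
    """Resolve a site-relative href against the listing page URL (hypothetical helper)."""
    return urljoin(base, href)


# The href below is copied from the sample <article> printed above
event_url = absolute_event_url("/events/1284645")
print(event_url)
```

With BeautifulSoup, the href for a given event would typically come from `event.h1.a['href']`.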


In [26]:
#eventh1 = event.find('h1', class_='event-title')

Event_Name = events[10].h1.a.text

print(Event_Name)

Sole Rehab & Signal > Noise present: Garrett David / Vicki Powell


In [40]:
Venue = events[0].h1.span.text[3:]
print(Venue)

Brooklyn Mirage


In [28]:
Event_Date = events[10].time.text[0:-6]
print(Event_Date)

2019-07-20
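Slicing off the last six characters works as long as every timestamp has exactly the form `2019-07-20T00:00`. As a hedged alternative, the timestamp can be parsed as a date so that a change in the time portion does not silently corrupt the result. `parse_event_date` is a hypothetical helper name, and the sketch assumes the site keeps emitting ISO-style timestamps.

```python
from datetime import datetime


def parse_event_date(raw):
    """Return the calendar date (YYYY-MM-DD) from an ISO-style timestamp string."""
    # fromisoformat raises ValueError on unexpected input instead of returning a bad slice
    return datetime.fromisoformat(raw.strip()).date().isoformat()


# Timestamp copied from the <time> tag in the sample event above
print(parse_event_date("2019-07-20T00:00"))
```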


In [30]:
Number_of_Attendees = events[1].p.span.text
print(Number_of_Attendees)

320
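Not every event lists an attendee count, so the text-to-number conversion can fail. A small helper keeps that handling in one place. This is only a sketch: `parse_attendees` is a hypothetical name, and stripping thousands separators is an assumption about how larger counts might be displayed.

```python
import math


def parse_attendees(text):
    """Convert attendee text to an int, or NaN when it is missing or not numeric."""
    try:
        # Assumption: large counts may contain a thousands separator such as "1,204"
        return int(text.strip().replace(",", ""))
    except (AttributeError, ValueError):
        # AttributeError: text is None (no tag found); ValueError: not a number
        return math.nan


print(parse_attendees("320"))
print(parse_attendees(None))
```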


In [31]:
len(events)

142

In [46]:
import numpy as np
rows = []
for event in events:
    Event_Name = event.h1.a.text
    Venue = event.h1.span.text[3:]
    Event_Date = event.time.text[0:-6]
    try:
        Number_of_Attendees = int(event.p.span.text)
    except (AttributeError, ValueError):
        # Some events have no attendee tag, or its text is not a number
        Number_of_Attendees = np.nan
    row = [Event_Name, Venue, Event_Date, Number_of_Attendees]
    rows.append(row)
df = pd.DataFrame(rows)
df.head()

<bound method NDFrame.head of                                                      0  \
0                 Taste of Dust at The Brooklyn Mirage   
1                     ReSolute w Zip & Thomas Melchior   
2               Bassiani Night with Héctor Oaks / Ndrx   
3    Boat Trippin' x Hotboi Nation with Doc Martin,...   
4                  FIXED with Roman Flugel Plus Remedy   
5    KUNÁ Sunset Rooftop: Djuma Soundsystem, Bonjou...   
6              Baauer, Take a Daytrip and Trillnatured   
7        New York Trax & Distrikt1: Thomas P. Heckmann   
8                  Dope Jams 5th Annual Open-Air Party   
9    Made In Colombia Boat Party with Cristian Aran...   
10   Sole Rehab & Signal > Noise present: Garrett D...   
11                 The Spectrum presents: Daughter 2.0   
12   Warm Up: Take A Daytrip / Smino / Shenseea / B...   
13                             Mark Farina, Musclecars   
14   Saturday: Seltzer (Precolumbian and Bearcat) a...   

In [48]:
df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
df.head()

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Taste of Dust at The Brooklyn Mirage | Brooklyn Mirage | 2019-07-20 | 544.0 |
| 1 | ReSolute w Zip & Thomas Melchior | TBA - New York | 2019-07-20 | 320.0 |
| 2 | Bassiani Night with Héctor Oaks / Ndrx | BASEMENT | 2019-07-20 | 166.0 |
| 3 | Boat Trippin' x Hotboi Nation with Doc Martin,... | Circle Line Cruises | 2019-07-20 | 162.0 |
| 4 | FIXED with Roman Flugel Plus Remedy | Good Room | 2019-07-20 | 141.0 |


In [8]:
def scrape_events(events_page_url):
    # Download the events page and parse its HTML
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    events = soup.findAll('article', class_='event-item')

    rows = []
    for event in events:
        event_name = event.h1.a.text
        venue = event.h1.span.text[3:]       # drop the leading "at "
        event_date = event.time.text[0:-6]   # drop the trailing "T00:00"
        try:
            number_of_attendees = int(event.p.span.text)
        except (AttributeError, ValueError):
            # Some events have no attendee tag, or its text is not a number
            number_of_attendees = np.nan
        row = [event_name, venue, event_date, number_of_attendees]
        rows.append(row)

    df = pd.DataFrame(rows)
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [9]:
scrape_events('https://www.residentadvisor.net/events/us/newyork').head()

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Body & Soul Anniversary: Francois K, Joe Claus... | Brooklyn Mirage | 2019-07-21 | 174.0 |
| 1 | Mister Sunday: Justin Carter and Eamon Harkin | Nowadays | 2019-07-21 | 169.0 |
| 2 | Chus & Ceballos (Elsewhere Roof) | Elsewhere | 2019-07-21 | 153.0 |
| 3 | Miki Beach NYC 'Fundraiser' with Miyagi + Clarian | TBA - Brooklyn | 2019-07-21 | 90.0 |
| 4 | Deep & Sexy | Knockdown Center | 2019-07-21 | NaN |


## Write a Function to Retrieve the URL for the Next Page

In [14]:
browser = Browser('chrome', headless=False)
browser.visit('https://www.residentadvisor.net/events/us/newyork')
browser.click_link_by_partial_text('Next')
#browser.find_link_by_partial_text('Next')
url = browser.url
print(url)
browser.quit()  # close the browser window once the URL has been read

https://www.residentadvisor.net/events/us/newyork/week/2019-07-29


In [15]:
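def next_page(url):
    # Open the page in a browser, click "Next", and return the URL it leads to
    browser = Browser('chrome', headless=False)
    try:
        browser.visit(url)
        browser.click_link_by_partial_text('Next')
        next_page_url = browser.url
    finally:
        browser.quit()  # avoid leaving a new browser window open on every call
    return next_page_url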
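Launching a browser for every page is slow. The one next-page URL observed above ends in `/week/2019-07-29`, which suggests (but does not prove) that weekly listing pages follow a `/week/YYYY-MM-DD` pattern. Under that assumption, later pages could be generated directly, as in the sketch below. `week_url` and `following_weeks` are hypothetical helpers, and the URL scheme is inferred from a single example, so check it against the live site before relying on it.

```python
from datetime import date, timedelta

BASE_URL = "https://www.residentadvisor.net/events/us/newyork"


def week_url(week_start, base=BASE_URL):
    """Build a weekly listing URL (assumed pattern: <base>/week/YYYY-MM-DD)."""
    return f"{base}/week/{week_start.isoformat()}"


def following_weeks(first_week, count):
    """Return `count` weekly URLs, starting at `first_week` and stepping 7 days."""
    return [week_url(first_week + timedelta(weeks=i)) for i in range(count)]


# Start from the week seen in the browser output above
urls = following_weeks(date(2019, 7, 29), 3)
for u in urls:
    print(u)
```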

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

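In [16]:
#Your code here
dfs = []
total_rows = 0
cur_url = "https://www.residentadvisor.net/events/us/newyork"
# Keep fetching pages until at least 1000 events have been collected
while total_rows < 1000:
    df = scrape_events(cur_url)
    dfs.append(df)
    total_rows += len(df)
    cur_url = next_page(cur_url)
    time.sleep(1)  # pause between requests to avoid hammering the site
df = pd.concat(dfs, ignore_index=True)  # one continuous index across pages
df = df.iloc[:1000]
print(len(df))
df.head()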


100


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Body & Soul Anniversary: Francois K, Joe Claus... | Brooklyn Mirage | 2019-07-22 | 174.0 |
| 1 | Miki Beach NYC 'Fundraiser' with Miyagi + Clarian | TBA - Brooklyn | 2019-07-22 | 90.0 |
| 2 | Modern Times | Jupiter Disco | 2019-07-22 | 32.0 |
| 3 | Sunset Sunday: The Jazz Diaries | Le Bain | 2019-07-22 | 5.0 |
| 4 | Rollupalooza: Toribio, Devoye, Auntie Starr, M... | Bossa Nova Civic Club | 2019-07-22 | NaN |
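The task asks for the events sorted by number of attendees, with ties broken by event date. A minimal sketch of that sort on a small made-up DataFrame follows. The sample rows are invented for illustration, and descending attendance with ascending date is an assumption, since the lab does not state a direction.

```python
import numpy as np
import pandas as pd

# Made-up sample rows using the same column names as the scraped DataFrame
sample = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Event_Date": ["2019-07-22", "2019-07-23", "2019-07-22", "2019-07-21"],
    "Number_of_Attendees": [90.0, 174.0, np.nan, 174.0],
})

# Most-attended first; equal counts ordered by earliest date; missing counts last
sorted_events = sample.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
    na_position="last",
).reset_index(drop=True)

print(sorted_events)
```

Because the dates are stored as `YYYY-MM-DD` strings, sorting them as text gives the same order as sorting them chronologically.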


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!