# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to put those skills to work on a full-fledged site!
In this lab, you'll practice scraping on a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
# Request one day's New York listing (the same kind of page you opened in your browser) and parse it
from bs4 import BeautifulSoup
import requests

html_page = requests.get('https://www.residentadvisor.net/events/us/newyork/day/2019-05-10')
soup = BeautifulSoup(html_page.content, 'html.parser')
# prettify is a method, so call it; print only the start because the full page is long
print(soup.prettify()[:500])
```

The page's `<title>`, "RA: Events in New York on Friday, 10 May 2019", confirms that the request returned the listing for that day.
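Scraping your own area later on only changes the country and city parts of this URL. A minimal sketch, assuming every day listing follows the `/events/<country>/<city>/day/<YYYY-MM-DD>` pattern seen in the request above; `day_listing_url` and `BASE_URL` are hypothetical names, and slugs other than `us`/`newyork` are not checked here.

```python
from datetime import date

# Hypothetical helper: assumes listing URLs look like
# https://www.residentadvisor.net/events/<country>/<city>/day/<YYYY-MM-DD>
BASE_URL = 'https://www.residentadvisor.net/events'

def day_listing_url(country, city, day):
    """Build the day-listing URL for a country/city slug and a datetime.date."""
    return f"{BASE_URL}/{country}/{city}/day/{day.isoformat()}"

print(day_listing_url('us', 'newyork', date(2019, 5, 10)))
# https://www.residentadvisor.net/events/us/newyork/day/2019-05-10
```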

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to view the HTML behind the page and find the tags and classes that hold each event's details.

```python
# The event listings sit inside <div class="fl col4">; each event title is an <h1> containing a link
listing = soup.find('div', class_="fl col4")

def event_names(listing):
    """Return the link text of every <h1> event title in the listing."""
    names = []
    for h1 in listing.find_all('h1'):
        names.append(h1.find('a').text)
    return names

names = event_names(listing)
print(len(names), names)
```

40 ['Black Coffee - Brooklyn Mirage Opening Event - Sold Out', "Boiler Room's 4:3 x DIS - Ian Isiah, Le1f, Br0nz3_g0dd3ss, Papi Juice", 'Friday: Avalon Emerson All Night', 'Rebekah / Heidi Sabertooth / Auspex / Ne/Re/A', 'Nicola Cruz (Live/AV)', 'Broke City: Luca Lozano (US Debut) / D. Tiffany / Hank Jackson / Working Women & Broke City DJs', 'Omar S, Taraval (Live), Rissa Garcia and More', 'Sublimate: Clark Price (Honcho) & Analog Soul', "Freaky 15 || Pink Mammoth's 15 Year Anniversary Party || NY", 'Inoki Party: David Hohme, Enamour & Marsh', "3'Hi (NYC) with D Double E (Live)", 'The Nose Dive', 'Parasol Sound with Twin Primes, Ma Sha, Mira Fahrenheit', 'Noir Music presents: Noir, Ramiro Lopez and Juliet Fox Boat Party', 'Inland, Cassegrain and Lychee', 'Afterhours - Bushwick A/V: Juliet Fox / Ohm Hourani / Blu9 / Pete Bones / 34th St John / Zallo', "D'Noir AM feat. Jeff Veliz & Crossbow", 'the Party. on a Boat', '[POSTPONED] Dillon Nathaniel / Attlas', 'NYC Hip Hop vs. Reggae Yacht 
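Some scraped names carry status notes such as `[POSTPONED]` and `- Sold Out`, both visible in the output above. As one possible cleaning step, here is a sketch that splits those notes into a separate status value; `split_status` is a hypothetical helper, and it assumes these two note formats are the only ones worth handling.

```python
import re

# Assumed formats: a leading bracketed tag, e.g. "[POSTPONED] Dillon Nathaniel / Attlas",
# or a trailing " - Sold Out"
LEADING_TAG = re.compile(r'^\[([^\]]+)\]\s*')
TRAILING_SOLD_OUT = re.compile(r'\s+-\s+Sold Out$', re.IGNORECASE)

def split_status(name):
    """Return (clean_name, status); status is None when no known note is found."""
    match = LEADING_TAG.match(name)
    if match:
        return name[match.end():], match.group(1).upper()
    if TRAILING_SOLD_OUT.search(name):
        return TRAILING_SOLD_OUT.sub('', name), 'SOLD OUT'
    return name, None

print(split_status('[POSTPONED] Dillon Nathaniel / Attlas'))
# ('Dillon Nathaniel / Attlas', 'POSTPONED')
print(split_status('Black Coffee - Brooklyn Mirage Opening Event - Sold Out'))
# ('Black Coffee - Brooklyn Mirage Opening Event', 'SOLD OUT')
```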

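```python
def venue(listing):
    """Return each event's venue from the <span> inside its <h1>, whose text begins with "at "."""
    venues = []
    for h1 in listing.find_all('h1'):
        text = h1.find('span').text.strip()
        # str.strip('at ') would remove any leading/trailing 'a', 't' or space characters,
        # so drop only the literal "at " prefix instead
        if text.startswith('at '):
            text = text[len('at '):]
        venues.append(text)
    return venues
```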
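Stripping with `strip('at ')` is a common trap: `str.strip` removes any of the listed characters from both ends rather than a literal prefix. The demo below uses a made-up venue name; `clean_venue` is a hypothetical helper that removes only a literal leading "at " (on Python 3.9+, calling `str.removeprefix('at ')` after stripping whitespace does the same).

```python
# str.strip treats its argument as a set of characters, not a prefix
print('at the Basement'.strip('at '))   # he Basemen

def clean_venue(text):
    """Drop surrounding whitespace, then a literal leading "at " if present."""
    text = text.strip()
    return text[3:] if text.startswith('at ') else text

print(clean_venue('at the Basement'))   # the Basement
```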

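```python
def attendees(listing):
    """Return each event's attendee count, read from <p class="attending"><span>."""
    num_attend = []
    for p in listing.find_all('p', class_="attending"):
        # Convert to int (dropping any thousands separator) so the column sorts numerically
        num_attend.append(int(p.find('span').text.strip().replace(',', '')))
    return num_attend
```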
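Because `event_names`, `venue`, and `attendees` each build their own list, an event without a `<p class="attending">` tag would leave the attendee list one item short and shift every later count onto the wrong event. A small guard, sketched here as the hypothetical helper `check_columns`, makes such a mismatch fail loudly before the lists are combined:

```python
def check_columns(**columns):
    """Return each column's length, raising ValueError if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Scraped columns have different lengths: {lengths}")
    return lengths

# Hypothetical example: three event titles but only two attendee counts
try:
    check_columns(names=['Event A', 'Event B', 'Event C'], counts=[120, 45])
except ValueError as err:
    print(err)   # Scraped columns have different lengths: {'names': 3, 'counts': 2}
```

In the notebook you might call `check_columns(names=event_names(listing), venues=venue(listing), counts=attendees(listing))` before building the DataFrame.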

```python
import datetime
```

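```python
def event_date(listing, day=datetime.date(2019, 5, 10)):
    """Return the listing's date once per event, formatted YYYY-MM-DD.

    A day-listing page covers a single date, so every event shares it. The default is the
    day in the URL scraped above; pass a different datetime.date for other pages.
    """
    return [day.isoformat()] * len(listing.find_all('h1'))

len(event_date(listing))
```

40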
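The date above is typed in by hand, so it only fits the 2019-05-10 page. Since each day-listing URL ends with its date, a sketch like the following (hypothetical `date_from_url`, assuming the `/day/YYYY-MM-DD` ending holds for every listing) can recover it from the URL being scraped:

```python
from datetime import date

def date_from_url(url):
    """Parse the YYYY-MM-DD segment at the end of a day-listing URL."""
    return date.fromisoformat(url.rstrip('/').rsplit('/', 1)[-1])

day = date_from_url('https://www.residentadvisor.net/events/us/newyork/day/2019-05-10')
print(day)   # 2019-05-10
```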

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

```python
import pandas as pd
```

```python
def scrape_events(events_page_url):
    """Scrape one day-listing page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    listing = soup.find('div', class_="fl col4")
    # Building from a dict raises ValueError if the helpers return lists of different
    # lengths, instead of silently padding a short column with missing values
    df = pd.DataFrame({
        "Event_Name": event_names(listing),
        "Venue": venue(listing),
        "Event_Date": event_date(listing),
        "Number_of_Attendees": attendees(listing),
    })
    return df

scrape_events('https://www.residentadvisor.net/events/us/newyork/day/2019-05-10')
```

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Black Coffee - Brooklyn Mirage Opening Event -... | Brooklyn Mirage | 2019-05-10 | 493 |
| 1 | Boiler Room's 4:3 x DIS - Ian Isiah, Le1f, Br0... | The 1896 | 2019-05-10 | 341 |
| 2 | Friday: Avalon Emerson All Night | Nowadays | 2019-05-10 | 243 |
| 3 | Rebekah / Heidi Sabertooth / Auspex / Ne/Re/A | BASEMENT | 2019-05-10 | 228 |
| 4 | Nicola Cruz (Live/AV) | Knockdown Center | 2019-05-10 | 182 |
| 5 | Broke City: Luca Lozano (US Debut) / D. Tiffan... | Market Hotel | 2019-05-10 | 172 |
| 6 | Omar S, Taraval (Live), Rissa Garcia and More | Elsewhere | 2019-05-10 | 148 |
| 7 | Sublimate: Clark Price (Honcho) & Analog Soul | TBA - Brooklyn | 2019-05-10 | 86 |
| 8 | Freaky 15 \|\| Pink Mammoth's 15 Year Anniversar... | House Of Yes | 2019-05-10 | 85 |
| 9 | Inoki Party: David Hohme, Enamour & Marsh | Good Room | 2019-05-10 | 74 |
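The objectives also mention storing the data. With the DataFrame above, `df.to_csv('events.csv', index=False)` is the simplest route. The sketch below shows the same idea with only the standard library, assuming each event is a dict keyed by the DataFrame's column names; `write_events_csv` is a hypothetical helper and the sample row is taken from the table above.

```python
import csv
import io

FIELDNAMES = ['Event_Name', 'Venue', 'Event_Date', 'Number_of_Attendees']

def write_events_csv(rows, fileobj):
    """Write event dicts as CSV, header first, to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# An in-memory buffer stands in for a real file opened with open('events.csv', 'w', newline='')
buffer = io.StringIO()
write_events_csv([{'Event_Name': 'Friday: Avalon Emerson All Night', 'Venue': 'Nowadays',
                   'Event_Date': '2019-05-10', 'Number_of_Attendees': 243}], buffer)
print(buffer.getvalue())
```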


## Write a Function to Retrieve the URL for the Next Page

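```python
from datetime import date, timedelta

def next_page(url):
    # Assumption: day-listing URLs end in /day/YYYY-MM-DD, so the next page is the following day
    base, day = url.rstrip('/').rsplit('/', 1)
    return f"{base}/{(date.fromisoformat(day) + timedelta(days=1)).isoformat()}"
```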
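With a next-page function in hand, reaching 1000 events is a loop: scrape a page, stop once enough events are collected, otherwise move on to the next URL. A minimal sketch; `collect_events`, `max_pages`, and `pause` are hypothetical, and the one-second pause is an arbitrary courtesy delay rather than a documented rate limit.

```python
import time

def collect_events(start_url, scrape_page, next_url, limit=1000, max_pages=365, pause=1.0):
    """Follow pages from start_url until at least `limit` events are gathered, then trim.

    scrape_page(url) returns a list of the events on one page; next_url(url) returns the
    following page's URL. max_pages stops the loop if pages keep coming back short or
    empty, and pause (seconds) spaces out requests to the site.
    """
    events, url = [], start_url
    for _ in range(max_pages):
        events.extend(scrape_page(url))
        if len(events) >= limit:
            break
        url = next_url(url)
        time.sleep(pause)
    return events[:limit]
```

In the notebook, `scrape_page` could be `lambda u: scrape_events(u).to_dict('records')` with `next_url=next_page`, and `pd.DataFrame(events)` turns the result back into a table.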

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

```python
# Your code here
```
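The sort can be expressed as a two-part key. This sketch assumes "sorted by the number of attendees" means most-attended first and that ties go to the earlier date; `rank_events` and the sample rows are hypothetical. With the DataFrame, the equivalent is `df.sort_values(['Number_of_Attendees', 'Event_Date'], ascending=[False, True])`, which needs `Number_of_Attendees` to be numeric rather than text.

```python
def rank_events(events):
    """Sort by attendees (most first), breaking ties by the earliest Event_Date."""
    # Negating the count gives descending order; ISO 'YYYY-MM-DD' strings sort chronologically
    return sorted(events, key=lambda e: (-e['Number_of_Attendees'], e['Event_Date']))

# Hypothetical rows shaped like the DataFrame's records
events = [
    {'Event_Name': 'A', 'Event_Date': '2019-05-11', 'Number_of_Attendees': 85},
    {'Event_Name': 'B', 'Event_Date': '2019-05-10', 'Number_of_Attendees': 85},
    {'Event_Name': 'C', 'Event_Date': '2019-05-10', 'Number_of_Attendees': 493},
]
print([e['Event_Name'] for e in rank_events(events)])   # ['C', 'B', 'A']
```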

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!