# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [37]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from numpy import nan, random

In [38]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [39]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [40]:
def scrape_events(events_page_url):
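    page = requests.get(events_page_url)
    # Stop early if the server returned an HTTP error status
    page.raise_for_status()
    
    # Name the parser explicitly so Beautiful Soup does not have to guess one
    soup = BeautifulSoup(page.content, 'html.parser')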
    
    # find all event-items
    events = soup.find_all(class_='event-item')
    
    cols = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    
    names = []
    venues = []
    dates = []
    noa = []
    
    for event in events:
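        # Titles read "<name> at <venue>"; a title without ' at ' has no venue
        event_title = event.find(class_='event-title').get_text().split(sep=' at ')
        event_name = event_title[0]
        event_venue = event_title[1] if len(event_title) > 1 else nan
        attending = event.find(class_='attending')
        try:
            # Take the whole leading number, not just its first character
            attending = int(attending.get_text().split()[0])
        except (AttributeError, IndexError, ValueError):
            # Element missing, text empty, or text not numeric
            attending = nan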
        
        event_date = event.find('time').get_text()
        names.append(event_name)
        venues.append(event_venue)
        dates.append(event_date)
        noa.append(attending)
        
    df_dict = {cols[0]: names, cols[1]: venues, cols[2]: dates, cols[3]: noa}

    df = pd.DataFrame.from_dict(df_dict)
    # Parse the date strings; infer_datetime_format is deprecated in pandas 2.0
    df.Event_Date = pd.to_datetime(df.Event_Date)
    
    return df

scrape_events('https://www.residentadvisor.net/events')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | 5 Mag Welcomes Junior Sanchez / Gene Hunt / Cz... | smartbar | 2019-08-30 | 1.0 |
| 1 | Joris Voorn: Beats On The Beach | Castaways | 2019-08-30 | 2.0 |
| 2 | Paradigm Underground ft Rebekah | Secret Location TBA | 2019-08-30 | 1.0 |
| 3 | Utopia presents Hieroglyphic Being | Exit | 2019-08-30 | 6.0 |
| 4 | The Function with Maliibu Miitch - Live | Le Nocturne | 2019-08-30 | 3.0 |
| 5 | Material with Colette / Dajae / Diz / DJ Heath... | smartbar | 2019-08-31 | 2.0 |
| 6 | Matthew Dear (DJ Set) | Spybar | 2019-08-31 | 5.0 |
| 7 | Midway Hustle Record's Release Party | Gramaphone Records | 2019-08-31 | 3.0 |
| 8 | SEA of Fools with Dēacön Blü, Duke Grip & Gues... | Cerise | 2019-08-31 | 1.0 |
| 9 | House of Efunk | Space Stage Studios | 2019-09-01 | 3.0 |
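
The title splitting and attendee parsing inside `scrape_events` can also be pulled out into small helpers that are easy to check without touching the network. The following is a minimal sketch, not the lab's reference solution: `split_title` and `parse_attending` are hypothetical names, and the sample strings (such as `'12 Attending'`) are assumptions about what the page text looks like.

```python
import math
import re

def split_title(title):
    """Split 'Name at Venue' on the last ' at '; venue is None if absent."""
    name, sep, venue = title.strip().rpartition(' at ')
    if not sep:
        # No ' at ' separator: treat the whole title as the event name
        return title.strip(), None
    return name, venue

def parse_attending(text):
    """Return the first run of digits in text as an int, or NaN if none."""
    match = re.search(r'\d+', text or '')
    return int(match.group()) if match else math.nan

# Sample inputs are illustrative assumptions, not captured from the site
print(split_title('Matthew Dear (DJ Set) at Spybar'))
print(parse_attending('12 Attending'))
```

Splitting on the last `' at '` keeps event names that themselves contain the word "at" intact, and searching for a run of digits avoids keeping only the first character of a multi-digit count.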


## Write a Function to Retrieve the URL for the Next Page

In [41]:
def next_page(url):
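    page = requests.get(url)
    # Stop early if the server returned an HTTP error status
    page.raise_for_status()
    
    # Name the parser explicitly so Beautiful Soup does not have to guess one
    soup = BeautifulSoup(page.content, 'html.parser')
    
    # The "next" control has id 'liNext'; its link holds a site-relative path
    next_item = soup.find(id='liNext')
    next_page_url = next_item.a.get('href')
    
    # Prepend the site root to turn the relative path into a full URL
    next_page_url = 'https://www.residentadvisor.net' + next_page_url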
    
    return next_page_url

next_page('https://www.residentadvisor.net/events')

'https://www.residentadvisor.net/events/us/chicago/week/2019-09-06'
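
String concatenation works here because the `href` is a site-relative path. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves an `href` against the page it was found on and also copes with links that are already absolute. In this sketch, `absolute_url` and `BASE_URL` are hypothetical names introduced for illustration.

```python
from urllib.parse import urljoin

# Assumed starting page; any page URL the href was scraped from would do
BASE_URL = 'https://www.residentadvisor.net/events'

def absolute_url(href, base=BASE_URL):
    """Resolve an href from a page against the URL of that page."""
    return urljoin(base, href)

print(absolute_url('/events/us/chicago/week/2019-09-06'))
# https://www.residentadvisor.net/events/us/chicago/week/2019-09-06
```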

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [46]:
# Pause between requests so that the site doesn't block us
from time import sleep

def scrape_n_events(n):
    # Start from the main events page
    url = 'https://www.residentadvisor.net/events'
    print(url)
    df = scrape_events(url)
    # Keep following "next page" links until at least n events are collected
    while len(df.index) < n:
        print(len(df.index))
        url = next_page(url)
        print(url)
        df = pd.concat([df, scrape_events(url)])
        # Wait a random 0 to 2 seconds before the next request
        wait_time = 2*random.random()
        sleep(wait_time)

    return df

# Investigation reveals that only 77 upcoming events are listed, so we'll scrape just 50 of them.
thousand_events = scrape_n_events(50)
thousand_events

https://www.residentadvisor.net/events
18
https://www.residentadvisor.net/events/us/chicago/week/2019-09-06
30
https://www.residentadvisor.net/events/us/chicago/week/2019-09-13
38
https://www.residentadvisor.net/events/us/chicago/week/2019-09-20
44
https://www.residentadvisor.net/events/us/chicago/week/2019-09-27


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | 5 Mag Welcomes Junior Sanchez / Gene Hunt / Cz... | smartbar | 2019-08-30 | 1.0 |
| 1 | Joris Voorn: Beats On The Beach | Castaways | 2019-08-30 | 2.0 |
| 2 | Paradigm Underground ft Rebekah | Secret Location TBA | 2019-08-30 | 1.0 |
| 3 | Utopia presents Hieroglyphic Being | Exit | 2019-08-30 | 6.0 |
| 4 | The Function with Maliibu Miitch - Live | Le Nocturne | 2019-08-30 | 3.0 |
| 5 | Material with Colette / Dajae / Diz / DJ Heath... | smartbar | 2019-08-31 | 2.0 |
| 6 | Matthew Dear (DJ Set) | Spybar | 2019-08-31 | 5.0 |
| 7 | Midway Hustle Record's Release Party | Gramaphone Records | 2019-08-31 | 3.0 |
| 8 | SEA of Fools with Dēacön Blü, Duke Grip & Gues... | Cerise | 2019-08-31 | 1.0 |
| 9 | House of Efunk | Space Stage Studios | 2019-09-01 | 3.0 |
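
The pagination loop above can be rehearsed offline before pointing it at the live site. The sketch below is an illustration under stated assumptions rather than the lab's solution: `collect_events` is a hypothetical helper, `fetch_page` is an injected function assumed to return a `(rows, next_url)` pair, and `FAKE_SITE` is made-up stand-in data. It downloads each page once and pauses only when another request will follow.

```python
from random import uniform
from time import sleep

def collect_events(start_url, n, fetch_page, max_delay=2.0):
    """Follow next-page links until at least n rows are collected."""
    rows, url = [], start_url
    while url is not None and len(rows) < n:
        page_rows, url = fetch_page(url)  # one download per page
        rows.extend(page_rows)
        if url is not None and len(rows) < n:
            # Random pause so requests are not sent back to back
            sleep(uniform(0, max_delay))
    return rows

# Stand-in for the network: each "page" maps to (rows, next_url)
FAKE_SITE = {
    'page-1': (['a', 'b', 'c'], 'page-2'),
    'page-2': (['d', 'e'], 'page-3'),
    'page-3': (['f'], None),
}

events = collect_events('page-1', n=4, fetch_page=FAKE_SITE.__getitem__, max_delay=0)
print(events)  # ['a', 'b', 'c', 'd', 'e']
```

Passing the fetcher in as an argument keeps the loop logic separate from the HTTP code, so the stopping condition can be checked without any network traffic.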


In [53]:
thousand_events.sort_values(by=['Number_of_Attendees', 'Event_Date'], ascending=False)

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | J.Phlip / Paul Johnson / Duke Shin | smartbar | 2019-09-27 | 9.0 |
| 1 | Relate with DJ Slugo / Erika Glück / Professor... | smartbar | 2019-09-13 | 9.0 |
| 0 | Format x Sleepwalker present: FJAAK | TBA - Chicago | 2019-09-06 | 9.0 |
| 4 | The Function with Vladimir Ivkovic / Traxx / J... | smartbar | 2019-09-14 | 8.0 |
| 10 | Queen! Labor Day Edition (All-Building) | smartbar | 2019-09-01 | 7.0 |
| 11 | Road To Imagine Festival - Chicago | TBA - Chicago | 2019-09-01 | 7.0 |
| 5 | Mid-Week Queen! with Cerrone / Michael Serafin... | smartbar | 2019-10-02 | 6.0 |
| 10 | Spybar & Nightsweat presents: Dax J | Spybar | 2019-09-12 | 6.0 |
| 3 | Utopia presents Hieroglyphic Being | Exit | 2019-08-30 | 6.0 |
| 6 | Matthew Dear (DJ Set) | Spybar | 2019-08-31 | 5.0 |
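
The call above sorts both columns in descending order, so tied events appear latest first. If the tie-break should instead run earliest first, `sort_values` accepts one `ascending` flag per column. This is a small sketch of that variation using four rows taken from the table above; treating earliest-first as the desired tie-break is an assumption, and `reset_index(drop=True)` is added only to replace the repeated index labels with a clean 0..n-1 range.

```python
import pandas as pd

# Four rows copied from the output above
events = pd.DataFrame({
    'Event_Name': [
        'J.Phlip / Paul Johnson / Duke Shin',
        'Format x Sleepwalker present: FJAAK',
        'Utopia presents Hieroglyphic Being',
        'Spybar & Nightsweat presents: Dax J',
    ],
    'Event_Date': pd.to_datetime(
        ['2019-09-27', '2019-09-06', '2019-08-30', '2019-09-12']
    ),
    'Number_of_Attendees': [9.0, 9.0, 6.0, 6.0],
})

# Most attendees first; among ties, earliest date first
ranked = events.sort_values(
    by=['Number_of_Attendees', 'Event_Date'],
    ascending=[False, True],
).reset_index(drop=True)
print(ranked)
```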


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!