# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

Start by opening the site's [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
#Open the inspect element feature in your browser
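
The same nested structure can also be previewed from Python. The sketch below is a minimal illustration, assuming a small hand-written fragment (`sample_html`, hypothetical and not copied from the site). It uses BeautifulSoup's `prettify()` to print the tags with indentation, much as the inspector displays them.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, written by hand to mimic an event listing;
# the markup of the real page may differ.
sample_html = (
    '<div id="event-listing"><ul>'
    '<li><h1>Sample Night at Sample Venue</h1></li>'
    '</ul></div>'
)

soup = BeautifulSoup(sample_html, 'html.parser')

# prettify() returns the parsed tree as an indented string, one tag per line
pretty = soup.prettify()
print(pretty)
```

On the live page, the same call could be applied to the text of a `requests.get` response, although the output will be far longer.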

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.
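
Before touching the live site, the target shape can be sketched offline. The example below is only a sketch under stated assumptions: `sample_html` is a hypothetical, hand-written listing that mirrors the tags this lab's code looks for (`li` items containing `h1`, `p > span`, and `time`), and `parse_events` is an illustrative helper name, not part of the lab.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical listing written by hand; not real site data.
sample_html = """
<div id="event-listing">
  <ul>
    <li>
      <time>2019-05-17T00:00</time>
      <h1>Sample Night at Sample Venue</h1>
      <p><span>12</span> Attending</p>
    </li>
    <li>
      <time>2019-05-18T00:00</time>
      <h1>Live at Five at Other Venue</h1>
    </li>
  </ul>
</div>
"""

def parse_events(html):
    """Turn an event-listing fragment into a DataFrame (illustrative helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    listing = soup.find('div', {'id': 'event-listing'})
    rows = []
    for item in listing.find_all('li'):
        if item.h1 is None:
            continue  # skip list items that are not events
        title = item.h1.get_text(strip=True)
        # Split on the last ' at ' so names that contain ' at ' stay intact
        name, sep, venue = title.rpartition(' at ')
        if not sep:
            name, venue = title, None
        span = item.p.span if item.p is not None else None
        rows.append({
            "Event_Name": name,
            "Venue": venue,
            "Event_Date": item.time.get_text(strip=True) if item.time is not None else None,
            "Number_of_Attendees": span.get_text(strip=True) if span is not None else None,
        })
    df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
    df["Event_Date"] = pd.to_datetime(df["Event_Date"])
    return df

df = parse_events(sample_html)
print(df)
```

The second sample item has no attendee count, so its `Number_of_Attendees` value is missing rather than zero.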

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

In [2]:
url = "https://www.residentadvisor.net/events"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly to avoid a warning


In [3]:
# Locate the container that holds the event list items
event_listing = soup.find('div', {'id':'event-listing'})
list_items = event_listing.findAll('li')

# Inspect a single list item to work out where each field lives
i = 3
item = list_items[i]

num_attending = item.p.span.text
print(num_attending)

event_name, event_venue = item.h1.text.split(' at ')
print(event_name)
print(event_venue)

event_date = item.time.text
print(event_date)

1
Drop Dance Party - Silent Disco Edition
Afrobrazilian Cultural Center of NJ
2019-05-18T00:00


In [4]:
table = soup.findAll('div',{'class':'bbox'})
len(table)

5

In [5]:
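def scrape_events(events_page_url):
    """Return a DataFrame of the events listed on events_page_url."""
    r = requests.get(events_page_url).text
    soup = BeautifulSoup(r, 'html.parser')
    event_list = []

    event_listing = soup.find('div', {'id':'event-listing'})

    for event in event_listing.findAll('li'):
        # Not every <li> is an event; a missing tag raises AttributeError
        try:
            num_attending = event.p.span.text
        except AttributeError:
            num_attending = None
        try:
            # Split on the last ' at ' so event names containing ' at ' are kept
            event_name, event_venue = event.h1.text.rsplit(' at ', 1)
        except (AttributeError, ValueError):
            event_name, event_venue = None, None
        try:
            event_date = event.time.text
        except AttributeError:
            event_date = None
        if event_name:
            event_list.append({"Event_Name": event_name, "Venue": event_venue, "Event_Date": event_date, "Number_of_Attendees": num_attending})

    # Fixing the columns keeps the frame well-formed even when no events are found
    columns = ["Event_Date", "Event_Name", "Number_of_Attendees", "Venue"]
    df = pd.DataFrame(event_list, columns=columns)
    # astype('datetime64') without a unit fails on recent pandas; to_datetime is the portable call
    df['Event_Date'] = pd.to_datetime(df['Event_Date'])
    return df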

In [6]:
df_test = scrape_events("https://www.residentadvisor.net/events")
df_test

| | Event_Date | Event_Name | Number_of_Attendees | Venue |
|---|---|---|---|---|
| 0 | 2019-05-17 | The Spring Up | 2 | Headroom |
| 1 | 2019-05-18 | Drop Dance Party - Silent Disco Edition | 1 | Afrobrazilian Cultural Center of NJ |
| 2 | 2019-05-19 | Summer Rooftop Series | 2 | Pour Abbey's |


## Write a Function to Retrieve the URL for the Next Page

In [7]:
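def next_page(url):
    """Return the href of the 'Next' link on the events page at url."""
    # Attribute values that identify the "Next" link; matched exactly, including the trailing space
    search_dict = {"ga-on":"click", "ga-event-category":"event-listings", "ga-event-action":"Next "}
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    next_page_url = soup.find('a', search_dict)
    return next_page_url['href']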
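
The scraping loop later in this lab builds the full address by prepending the site root to the value `next_page` returns. A minimal sketch of an alternative, assuming the `href` might be either relative or absolute: `urllib.parse.urljoin` from the standard library resolves it against the current page URL in both cases. The helper name `absolute_next_url` and the `2019-03-17` path are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

def absolute_next_url(current_url, href):
    """Resolve a link's href against the page it was found on (illustrative helper)."""
    return urljoin(current_url, href)

current = "https://www.residentadvisor.net/events/us/newyork/week/2019-03-10"

# A root-relative href replaces the whole path of the current URL
relative_result = absolute_next_url(current, "/events/us/newyork/week/2019-03-17")
print(relative_result)

# An href that is already absolute is returned unchanged
absolute_result = absolute_next_url(current, "https://www.residentadvisor.net/events")
print(absolute_result)
```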

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
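
The two-level sort described above can be sketched on a small hand-made frame. The rows below are hypothetical, and the sort directions (most attendees first, earliest date first among ties) are an assumption, since the lab does not state them. The scraped attendee counts are strings, so they are converted to numbers first; otherwise `"24"` would sort after `"117"`.

```python
import pandas as pd

# Hypothetical rows for illustration only; not scraped data.
sample = pd.DataFrame({
    "Event_Name": ["A", "B", "C", "D"],
    "Event_Date": pd.to_datetime(["2019-05-18", "2019-05-17", "2019-05-17", "2019-05-19"]),
    "Number_of_Attendees": ["24", "117", None, "24"],
})

# Convert counts from text to numbers; missing or unparsable values become NaN
sample["Number_of_Attendees"] = pd.to_numeric(sample["Number_of_Attendees"], errors="coerce")

# Sort by attendees (descending), breaking ties by date (ascending);
# rows without a count go last
ranked = sample.sort_values(
    ["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
    na_position="last",
).reset_index(drop=True)

print(ranked)
```

Events A and D tie at 24 attendees, so the earlier date (A) is listed first.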

In [8]:
url = "https://www.residentadvisor.net/events/us/newyork/"

In [9]:
scrape_events(url)

| | Event_Date | Event_Name | Number_of_Attendees | Venue |
|---|---|---|---|---|
| 0 | 2019-05-17 | Innervisions New York | 711 | Knockdown Center |
| 1 | 2019-05-17 | Headless Horseman Live / Vatican Shadow / Volv... | 256 | BASEMENT |
| 2 | 2019-05-17 | Friday: PLO Man All Night | 117 | Nowadays |
| 3 | 2019-05-17 | ReSolute w Move D & Flabbergast | 103 | TBA - New York |
| 4 | 2019-05-17 | Material 17: Nico Laa | 28 | Hart bar |
| 5 | 2019-05-17 | Full Moon with Sébastien Léger | 24 | House Of Yes |
| 6 | 2019-05-17 | Museum of Love (DJ set), L&l&l Record Club Plu... | 17 | Good Room |
| 7 | 2019-05-17 | Pete Rock | 14 | Analog Bkny |
| 8 | 2019-05-17 | Just Blaze, Matt FX and Trillnatured | | Elsewhere |
| 9 | 2019-05-17 | Rendezvous with Sons of Immigrants, Arvi, CGC | | TBA Brooklyn |


In [10]:
#Your code here
main_url = "https://www.residentadvisor.net"

list_of_dfs = []
total_events = 0
url = main_url + '/events/us/newyork/week/2019-03-10'

In [11]:
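i = 0
# Stop once 1000 events are collected, or after 1000 pages as a safety limit
while total_events < 1000 and i < 1000:
    i += 1
    try:
        df_temp = scrape_events(url)
        if len(df_temp) > 0:
            list_of_dfs.append(df_temp)
            total_events += len(df_temp)
            print(i)
    except Exception as e:
        # Report the failure and still move on to the next page
        print(f"Skipping {url}: {e}")
    try:
        url = main_url + next_page(url)
    except Exception:
        # No usable "Next" link, so there is nothing further to scrape
        break

df_1000 = pd.concat(list_of_dfs)[:1000]
print(len(df_1000))
# Newest events first; reassigning avoids an in-place sort on a slice
df_1000 = df_1000.sort_values('Event_Date', ascending=False)
df_1000.head()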

1
2
3
4
5
6
7
8
9
1000


| | Event_Date | Event_Name | Number_of_Attendees | Venue |
|---|---|---|---|---|
| 68 | 2019-05-10 | Nitzer Ebb | | Elsewhere |
| 56 | 2019-05-10 | Parasol Sound with Twin Primes, Ma Sha, Mira F... | | TBA Brooklyn |
| 44 | 2019-05-10 | Black Coffee - Brooklyn Mirage Opening Event -... | 494.0 | Brooklyn Mirage |
| 45 | 2019-05-10 | Boiler Room's 4:3 x DIS - Ian Isiah, Le1f, Br0... | 341.0 | The 1896 |
| 46 | 2019-05-10 | Friday: Avalon Emerson All Night | 243.0 | Nowadays |


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!