# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from pprint import pprint as pp
from collections import defaultdict

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to view the HTML behind the page.

In [None]:
#Open the inspect element feature in your browser
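Once the inspector shows how an event listing is nested, you can check your CSS selectors offline against a small snippet before you request the live page. The sketch below uses a hypothetical HTML fragment. It is modeled only on the selectors this lab uses later (`#items li`, `p.eventDate.date`, `.event-item`, `.event-title a`, `.event-title span`, `.attending span`), so the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the structure is inferred from the selectors used in this lab,
# not copied from the live site.
SAMPLE_HTML = """
<ul id="items">
  <li><p class="eventDate date">Sat, 10 Aug 2019 /</p></li>
  <li>
    <article class="event-item">
      <h1 class="event-title"><a href="/events/1">Phreak</a> <span>at Eyeclops Studios</span></h1>
      <p class="attending"><span>1</span> Attending</p>
    </article>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Each selector should match exactly the element you highlighted in the inspector
date_text = soup.select_one("p.eventDate.date").get_text()
event = soup.select_one(".event-item")
title = event.select_one(".event-title a").get_text()
venue = event.select_one(".event-title span").get_text()
attending = event.select_one(".attending span").get_text()

print(date_text, title, venue, attending, sep=" | ")
```

If a selector returns `None` here, it will also fail on the live page. Adjust it until it matches before you write the full scraper.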

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with columns for the Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [8]:
response = requests.get('https://www.residentadvisor.net/events')
soup = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly

In [124]:
rows = []
date = None  # most recent date header; the event items that follow it inherit it
for li in soup.select('#items li'):
    date_attr = li.select_one('p.eventDate.date')
    if date_attr:
        # trim the characters around the date, leaving e.g. '10 Aug 2019'
        date = date_attr.get_text()[5:-2]
        
    event_item = li.select_one('.event-item')
    if event_item:
        title = event_item.select_one('.event-title a').get_text()
        venue = event_item.select_one('.event-title span').get_text()
        attending = event_item.select_one('.attending span').get_text()
        print(title, venue, attending, sep=' ### ')
        rows.append([date, title, venue, attending])

print(rows[0])
df = pd.DataFrame(rows)
df.columns = ['date', 'title', 'venue', 'attending']
# Strip only the leading "at " so venue names containing "at " elsewhere stay intact
df.venue = df.venue.str.replace(r'^at ', '', regex=True)
df.date = pd.to_datetime(df.date)

Trouble IN Paradise at Daddy's Grotto ### at TBA - Richmond ### 3
Phreak ### at Eyeclops Studios ### 1
Trouble IN Paradise at Daddy's Grotto ### at TBA - Richmond ### 3
['10 Aug 2019', "Trouble IN Paradise at Daddy's Grotto", 'at TBA - Richmond', '3']
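The `[5:-2]` slice depends on the exact characters around the date in the header. A sturdier option is to search for the date pattern itself. This minimal sketch assumes the header contains the date in the `DD Mon YYYY` form seen in the output above ('10 Aug 2019'). `parse_event_date` is a hypothetical helper, and the sample header text is made up.

```python
import re
from datetime import datetime

# Matches dates such as "10 Aug 2019" anywhere in a string
DATE_PATTERN = re.compile(r"\b(\d{1,2} [A-Z][a-z]{2} \d{4})\b")

def parse_event_date(header_text):
    """Return a datetime for the first 'DD Mon YYYY' date in header_text, or None."""
    match = DATE_PATTERN.search(header_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d %b %Y")

print(parse_event_date("Sat, 10 Aug 2019 /"))  # hypothetical header text
```

If the site changes the text around the date, this keeps working as long as the date itself keeps the same form.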


In [179]:
del rows

In [2]:
def scrape_events(events_page_url):
    """Return a list of [Event_Name, Venue, Event_Date, Number_of_Attendees] rows
    for every event listed on one events page."""
    response = requests.get(events_page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    rows = []
    date = None  # date header that applies to the event items after it
    for li in soup.select('#items li'):
        date_attr = li.select_one('p.eventDate.date')
        if date_attr:
            date = date_attr.get_text()[5:-2]

        event_item = li.select_one('.event-item')
        if event_item:
            title = event_item.select_one('.event-title a').get_text()
            venue = event_item.select_one('.event-title span').get_text()
            
            # Some events show no attendee count; record those as 0.
            # Converting to int makes later sorting numeric instead of alphabetical.
            try:
                attending = int(event_item.select_one('.attending span').get_text())
            except AttributeError:
                attending = 0
                
            rows.append([title, venue, date, attending])

    return rows

#rows = scrape_events('https://www.residentadvisor.net/events')
#print(rows)
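The task asks for a pandas DataFrame, but `scrape_events` returns a list of rows. Here is a minimal sketch of the conversion and cleaning step. `rows_to_dataframe` is a hypothetical helper, and the `'%d %b %Y'` format assumes dates look like '10 Aug 2019', as in the printed output. The first sample row comes from that output. The second is made up to show why only a *leading* "at " should be stripped: a plain `str.replace('at ', '')` would turn "Boat House" into "BoHouse".

```python
import pandas as pd

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

def rows_to_dataframe(rows):
    """Turn scrape_events-style rows into the DataFrame the task asks for."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Venues are prefixed with "at "; remove only that leading prefix
    df["Venue"] = df["Venue"].str.replace(r"^at ", "", regex=True)
    df["Event_Date"] = pd.to_datetime(df["Event_Date"], format="%d %b %Y")
    # Counts may arrive as strings ("3") or ints (0); make them numeric
    df["Number_of_Attendees"] = pd.to_numeric(df["Number_of_Attendees"])
    return df

sample_rows = [
    ["Trouble IN Paradise at Daddy's Grotto", "at TBA - Richmond", "10 Aug 2019", "3"],
    ["Example Night", "at Boat House", "11 Aug 2019", 0],  # hypothetical row
]
print(rows_to_dataframe(sample_rows))
```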

In [160]:
next_page('https://www.residentadvisor.net/events/us/virginia/week/2019-08-21')

'https://www.residentadvisor.net//events/us/virginia/week/2019-08-28'

## Write a Function to Retrieve the URL for the Next Page

In [3]:
def next_page(url):
    base_url = 'https://www.residentadvisor.net'
    request = requests.get(url)
    soup = BeautifulSoup(request.content)
    next_url = soup.select_one('#liNext2 a').attrs['href']
    next_page_url = base_url + next_url
    return next_page_url

In [165]:
next_page('https://www.residentadvisor.net/events')

'https://www.residentadvisor.net//events/us/newyork/week/2019-08-15'
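Because `next_page` downloads a page every time it runs, it can help to test the link-extraction step separately. The sketch below splits parsing from downloading. `next_page_from_soup` is a hypothetical helper, and it assumes the "next" link still sits inside an element with id `liNext2`, as in `next_page`. The HTML snippets are made up for the test.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.residentadvisor.net"

def next_page_from_soup(soup, base_url=BASE_URL):
    """Return the absolute next-page URL, or None when there is no next link."""
    link = soup.select_one("#liNext2 a")
    if link is None or not link.get("href"):
        return None
    # Normalize so there is exactly one slash between host and path
    return base_url.rstrip("/") + "/" + link["href"].lstrip("/")

html = '<ul><li id="liNext2"><a href="/events/us/newyork/week/2019-08-15">Next</a></li></ul>'
print(next_page_from_soup(BeautifulSoup(html, "html.parser")))
```

Returning `None` instead of raising lets the calling loop decide when to stop.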

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [13]:
max_num_of_events = 1000
url = 'https://www.residentadvisor.net/events'

events_list = []
# Keep paging until enough events are collected or there is no next page
while url and len(events_list) < max_num_of_events:
    print('Loading events from:', url)
    events_list.extend(scrape_events(url))
    url = next_page(url)

# The last page can overshoot the limit, so keep only the first max_num_of_events rows
events_list = events_list[:max_num_of_events]

df = pd.DataFrame(events_list, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
# Strip only the leading "at " that precedes each venue name
df.Venue = df.Venue.str.replace(r'^at ', '', regex=True)
df.Event_Date = pd.to_datetime(df.Event_Date)

Loading events from: https://www.residentadvisor.net/events
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-08-15
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-08-22
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-08-29
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-09-05
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-09-12
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-09-19
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-09-26
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-10-03
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-10-10
Loading events from: https://www.residentadvisor.net/events/us/newyork/week/2019-10-17


AttributeError: 'NoneType' object has no attribute 'attrs'
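The traceback shows that `next_page` reached a page where `soup.select_one('#liNext2 a')` returned `None`, so accessing `.attrs` failed before 1000 events were collected. Below is a sketch of the paging loop with both stopping conditions made explicit. `collect_events` and its parameters are hypothetical names. Passing the page functions in as arguments lets the loop be tested without network access, and the fake pages are made-up stand-ins.

```python
def collect_events(start_url, scrape_page, find_next_page, max_events=1000):
    """Follow next-page links from start_url until max_events rows are collected
    or find_next_page returns None; return at most max_events rows."""
    events = []
    url = start_url
    while url is not None and len(events) < max_events:
        events.extend(scrape_page(url))
        url = find_next_page(url)
    return events[:max_events]

# Hypothetical stand-ins for scrape_events / next_page: three pages, two events each
fake_pages = {"p1": ["a", "b"], "p2": ["c", "d"], "p3": ["e", "f"]}
fake_next = {"p1": "p2", "p2": "p3", "p3": None}
print(collect_events("p1", fake_pages.get, fake_next.get, max_events=5))
```

In the notebook, the call would be `collect_events('https://www.residentadvisor.net/events', scrape_events, next_page)`, assuming `next_page` returns `None` when a page has no next link.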

In [15]:
df.head()

|   | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Morgana [free]: HOSH, jozif, Tim Sweeney, Morgan | Brooklyn Mirage | 2019-08-08 | 386 |
| 1 | Frendzone! // Doc Scott / Jacky Sommer b2b Kel... | Good Room | 2019-08-08 | 18 |
| 2 | Deeper Love with Proper Villains + Mike Trotte... | The Sultan Room | 2019-08-08 | 4 |
| 3 | DJ Seinfeld | public records | 2019-08-08 | 25 |
| 4 | Agenda: Fabuloso Edition | Bossa Nova Civic Club | 2019-08-08 | 4 |
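The task asks for the data sorted by number of attendees, with ties broken by event date, but it does not state a direction. This sketch assumes the most-attended events come first and that ties go to the earlier date. `sort_events` is a hypothetical helper. The sample data is the five rows shown by `df.head()`, with long names truncated as displayed.

```python
import pandas as pd

def sort_events(events_df):
    """Most-attended events first; ties broken by the earlier Event_Date."""
    return events_df.sort_values(
        by=["Number_of_Attendees", "Event_Date"], ascending=[False, True]
    ).reset_index(drop=True)

# Sample data: the rows printed by df.head() in this notebook
head_df = pd.DataFrame({
    "Event_Name": [
        "Morgana [free]: HOSH, jozif, Tim Sweeney, Morgan",
        "Frendzone! // Doc Scott / Jacky Sommer b2b Kel...",
        "Deeper Love with Proper Villains + Mike Trotte...",
        "DJ Seinfeld",
        "Agenda: Fabuloso Edition",
    ],
    "Venue": ["Brooklyn Mirage", "Good Room", "The Sultan Room", "public records", "Bossa Nova Civic Club"],
    "Event_Date": pd.to_datetime(["2019-08-08"] * 5),
    "Number_of_Attendees": [386, 18, 4, 25, 4],
})

print(sort_events(head_df)[["Event_Name", "Number_of_Attendees"]])
```

Calling `sort_events(df)` would apply the same ordering to the full scraped table. It only sorts correctly if `Number_of_Attendees` is numeric; string counts would sort alphabetically ("4" after "386").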


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!