# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll scrape a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

```python
from bs4 import BeautifulSoup
import requests
import re
```

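```python
# Request the Washington, DC events listing
# (open https://www.residentadvisor.net/events in your browser to compare)
base_url = 'https://www.residentadvisor.net/events/us/washingtondc'
# A later week's listing; uncomment the next line to start from it instead
last_url = 'https://www.residentadvisor.net/events/us/washingtondc/week/2020-03-26'
# base_url = last_url
html_page = requests.get(base_url, timeout=10)  # make a GET request to retrieve the page
html_page.raise_for_status()  # raise an error on 4xx/5xx responses instead of parsing an error page
soup = BeautifulSoup(html_page.content, 'html.parser')
```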
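
Because the lab also asks you to deal with errors, one option is to wrap the request in a small helper that returns `None` instead of crashing on a malformed URL, a timeout, or an error status. This is a minimal sketch; the name `get_soup` and the 10-second timeout are illustrative choices, not part of the lab.

```python
import requests
from bs4 import BeautifulSoup


def get_soup(url, timeout=10):
    """Fetch `url` and parse it; return None instead of raising on request errors."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP error statuses
        print(f"Could not fetch {url}: {exc}")
        return None
    return BeautifulSoup(response.content, 'html.parser')
```

With it, `soup = get_soup(base_url)` followed by an `if soup is None:` check lets a long scraping run skip or stop on a failed page rather than crash.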

```python
# Each event listing on the page carries the class "event-item"
events = soup.select(".event-item")
```

```python
event = events[0]
event
```

```python
# Date
event.find("time", {"itemprop": "startDate"})['datetime']
```

```python
# Name
event.find(class_='event-title').a.text
```

```python
# Venue: the first link whose href contains "club"
event.find(href=re.compile("club")).text
```

```python
# Attending (the element is missing for some events, so guard against None)
attending = event.find(class_='attending')
int(attending.span.text) if attending else None
```

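```python
# The "next page" link lives inside the element with id="liNext"
next_li = soup.find(id='liNext')
next_link = next_li.a if next_li else None
print(next_link)
# `'href' in tag` checks the tag's children, not its attributes; use has_attr instead
next_link is not None and next_link.has_attr('href')
```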

```python
# The same lookup with a CSS selector
soup.select_one('#liNext a')['href']
```

## Open the Inspect Element Feature

Next, open your browser's inspect element feature to view the HTML underlying the page.

```python
# Open the inspect element feature in your browser
```

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns Event_Name, Venue, Event_Date, and Number_of_Attendees.

```python
def scrape_events(events_page_url):
    # Your code here
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df
```
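
One way to fill in `scrape_events` is sketched below, reusing the selectors from the exploration cells above. It assumes every event on the page matches `.event-item` and that the date, title, venue, and attendance markup looks like the first event inspected earlier; `parse_event` and `events_from_html` are hypothetical helper names introduced here. Fields missing for a given event (attendance, for example) come back as missing values rather than raising an error.

```python
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]


def parse_event(event):
    """Pull the four fields out of one `.event-item` tag; missing pieces become None."""
    time_tag = event.find("time", {"itemprop": "startDate"})
    title = event.find(class_="event-title")
    venue = event.find(href=re.compile("club"))  # first link whose href contains "club"
    attending = event.find(class_="attending")
    return {
        "Event_Name": title.a.text.strip() if title and title.a else None,
        "Venue": venue.text.strip() if venue else None,
        "Event_Date": time_tag.get("datetime") if time_tag else None,
        "Number_of_Attendees": int(attending.span.text) if attending and attending.span else None,
    }


def events_from_html(html):
    """Build a DataFrame with one row per `.event-item` found in the page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [parse_event(event) for event in soup.select(".event-item")]
    return pd.DataFrame(rows, columns=COLUMNS)


def scrape_events(events_page_url):
    """Fetch one events page and return its events as a DataFrame."""
    response = requests.get(events_page_url, timeout=10)
    response.raise_for_status()
    return events_from_html(response.content)
```

Splitting parsing (`events_from_html`) from fetching (`scrape_events`) makes the parser easy to check against a saved HTML snippet without hitting the site.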

## Write a Function to Retrieve the URL for the Next Page

```python
def next_page(url):
    # Your code here
    return next_page_url
```
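
A possible `next_page` is sketched below, assuming the pager markup seen earlier (an `<a>` inside the element with `id="liNext"`). It returns `None` on the last page so a caller knows when to stop, and it resolves the `href` against the current URL with `urllib.parse.urljoin` in case the link is relative (an assumption; an absolute link passes through unchanged). `next_url_from_html` is a hypothetical helper name.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def next_url_from_html(html, current_url):
    """Return the absolute URL behind the #liNext link, or None if there is no such link."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("#liNext a[href]")  # only match a link that actually has an href
    if link is None:
        return None
    # Resolve a possibly relative href against the page it came from
    return urljoin(current_url, link["href"])


def next_page(url):
    """Fetch `url` and return the URL of the following events page, or None."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return next_url_from_html(response.content, url)
```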

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
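
The two functions above can be combined into a loop that follows next-page links until about 1,000 events are collected. The sketch below is one possible approach, not the lab's `parseevents.EventScraper`; `scrape_many` and its `max_pages` and `delay` parameters are hypothetical. It takes the page scraper and the next-link finder as arguments, stops when there is no next page, when a URL repeats, or when a request raises an error (the `requests` exceptions subclass `OSError`), and pauses between requests to go easy on the server.

```python
import time

import pandas as pd


def scrape_many(start_url, scrape_page, find_next, max_events=1000, max_pages=100, delay=1.0):
    """Follow next-page links from `start_url` until `max_events` rows are collected.

    `scrape_page(url)` should return a DataFrame of events and `find_next(url)`
    the next page's URL or None.
    """
    frames, seen = [], set()
    url, total = start_url, 0
    while url and url not in seen and total < max_events and len(seen) < max_pages:
        seen.add(url)  # guards against pagination that loops back on itself
        try:
            frame = scrape_page(url)
            frames.append(frame)
            total += len(frame)
            url = find_next(url)
        except OSError as exc:  # requests.RequestException is a subclass of OSError
            print(f"Stopping early at {url}: {exc}")
            break
        if url and delay:
            time.sleep(delay)  # pause before requesting the next page
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True).head(max_events)
```

For example, `scrape_many(base_url, scrape_events, next_page)` would follow the listings from `base_url`. Because the two functions fetch each page separately, every page is downloaded twice; a single function returning both the events and the next URL would avoid that at the cost of a less reusable interface.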

```python
import pandas as pd
import importlib
import parseevents
# Reload parseevents so edits to the module take effect without restarting the kernel
importlib.reload(parseevents);
```

```python
# Your code here
from parseevents import EventScraper

url = 'https://www.residentadvisor.net/events/us/washingtondc'
es = EventScraper(url)
```

```python
es.__dict__
```

```python
es.offset
```

```python
es.page_count
```

```python
es.page.__dict__
```

```python
es.scrape()
```

```python
es.page.soup.find(id='liNext').a
```

```python
es.current_url
```

```python
df = pd.DataFrame(es.event_dicts)
df.head()
```

```python
df.info()
```

```python
# Convert the date strings to datetime values
df['date'] = pd.to_datetime(df['date'])
```

```python
# Count events per date, in chronological order
df['date'].value_counts().sort_index()
```
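
This section asks for the events sorted by attendance, with ties broken by date. A minimal sketch with `DataFrame.sort_values` is below. The rows are placeholder values, and the column names follow the four columns listed earlier, so adapt them to the DataFrame built from `es.event_dicts`, which uses its own names (such as `date`). It assumes the most-attended events should come first and that earlier dates should win ties.

```python
import pandas as pd

# Placeholder rows for illustration only; substitute the scraped DataFrame
sample = pd.DataFrame({
    "Event_Name": ["Night A", "Night B", "Night C", "Night D"],
    "Venue": ["Club 1", "Club 2", "Club 1", "Club 3"],
    "Event_Date": pd.to_datetime(["2020-03-07", "2020-03-01", "2020-03-03", "2020-03-02"]),
    "Number_of_Attendees": [25, 25, None, 40],
})

# Most-attended first; ties broken by the earliest date. Rows with no attendee
# count (NaN) go last because sort_values defaults to na_position="last".
ranked = sample.sort_values(
    ["Number_of_Attendees", "Event_Date"], ascending=[False, True]
).reset_index(drop=True)
print(ranked)
```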

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!