# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll scrape a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that involves traversing many pages of a website, handling errors, and storing data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
# Load the https://www.residentadvisor.net/events page in your browser.
# Using https://www.residentadvisor.net/events/us/california because there are no events in Indiana

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [None]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [58]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
from datetime import datetime, timedelta
import time

In [215]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    r = requests.get(events_page_url)
    soup = BeautifulSoup(r.text, 'html.parser')

    # Every <li> under the element with id="items" is either a date header or an event
    rows = soup.find(id="items").find_all('li')
    deets = {"Event_Name": [], "Venue": [], "Event_Date": [], "Number_of_Attendees": []}

    for row in rows:
        # Rows whose <p> has the 'eventDate' class are date headers; skip them
        if 'eventDate' not in row.p.attrs['class']:
            deets["Event_Date"].append(row.find('span').text)

            # The remaining details live inside the <div class="bbox"> block
            row = row.find('div', {'class': 'bbox'})
            spans = row.find_all('span')
            anchors = row.find_all('a')

            deets["Event_Name"].append(anchors[0].text)
            deets["Venue"].append(spans[0].findChild().text)
            deets["Number_of_Attendees"].append(row.find('p', {'class': 'attending'}).span.text)

    return pd.DataFrame(deets)
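
The objectives mention dealing with errors, while `scrape_events` assumes every request succeeds and every page has the expected layout. One possible approach, shown here as a minimal sketch, is to wrap the call in a small retry helper. `with_retries` and its parameters are hypothetical names introduced for illustration and are not part of the lab.

```python
import time


def with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper: the name and parameters are assumptions,
    not something the lab defines.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # pause before trying again
```

With the lab's function this could be called as `with_retries(lambda: scrape_events(url))`. Narrowing `exceptions` to something like `(requests.RequestException,)` and calling `r.raise_for_status()` inside the scraper would make the retries more targeted, assuming network failures are the main concern.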

In [216]:
scrape_events('https://www.residentadvisor.net/events/us/california/week/2020-11-24')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | NU Tekno Friday Night Dim Sum | Northern Ducks | 2020-11-27T00:00 | 2 |
| 1 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-28T00:00 | 2 |
| 2 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-29T00:00 | 2 |
| 3 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-29T00:00 | 2 |
| 4 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-30T00:00 | 2 |


In [217]:
scrape_events('https://www.residentadvisor.net/events/us/california/week/2020-12-04')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-04T00:00 | 25 |
| 1 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-05T00:00 | 25 |
| 2 | Technometrik 3 Days Desert Event | TBA - San Diego | 2020-12-05T00:00 | 5 |


## Write a Function to Retrieve the URL for the Next Page

In [65]:
def next_page(url):
    # The URL ends in a YYYY-MM-DD date; parse it and step forward 10 days
    next_date = datetime.strptime(url.split('/')[-1], '%Y-%m-%d') + timedelta(days=10)
    next_date = next_date.strftime("%Y-%m-%d")
    # Swap the last 10 characters (the old date) for the new date
    next_page_url = url[:-10] + next_date

    return next_page_url
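
The date arithmetic in `next_page` can be checked without touching the network. The sketch below is a variant that splits on the last `/` instead of slicing a fixed number of characters and takes the step size as a parameter. `shift_week_url` is a hypothetical name, and the default of 7 days assumes that a `week` page covers seven days, which this lab does not confirm (the function above steps by 10).

```python
from datetime import datetime, timedelta


def shift_week_url(url, days=7):
    """Return url with its trailing YYYY-MM-DD date moved forward by `days`.

    Hypothetical helper; the 7-day default is an assumption about the site.
    """
    base, date_text = url.rsplit("/", 1)
    next_date = datetime.strptime(date_text, "%Y-%m-%d") + timedelta(days=days)
    return f"{base}/{next_date:%Y-%m-%d}"


start = "https://www.residentadvisor.net/events/us/california/week/2020-11-24"
print(shift_week_url(start, days=10))  # same step as next_page
print(shift_week_url(start))           # one calendar week later
```

Because `strptime` raises `ValueError` when the last path segment is not a date, a malformed URL fails immediately rather than producing a wrong link.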

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [218]:
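# 1000 events is not realistic here. Let's try to get 8 without getting blocked.

url = 'https://www.residentadvisor.net/events/us/california/week/2020-11-24'
frames = []
total = 0
while total < 8:
    frames.append(scrape_events(url))
    total += len(frames[-1])
    print(total)
    url = next_page(url)
    print('Next URL', url)
    time.sleep(1)  # pause between requests to reduce the chance of being blocked

# DataFrame.append was removed in pandas 2.0, so combine the pages with pd.concat
df = pd.concat(frames, ignore_index=True)
df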

5
Next URL https://www.residentadvisor.net/events/us/california/week/2020-12-04
8
Next URL https://www.residentadvisor.net/events/us/california/week/2020-12-14


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | NU Tekno Friday Night Dim Sum | Northern Ducks | 2020-11-27T00:00 | 2 |
| 1 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-28T00:00 | 2 |
| 2 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-29T00:00 | 2 |
| 3 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-29T00:00 | 2 |
| 4 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-30T00:00 | 2 |
| 5 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-04T00:00 | 25 |
| 6 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-05T00:00 | 25 |
| 7 | Technometrik 3 Days Desert Event | TBA - San Diego | 2020-12-05T00:00 | 5 |
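
A loop that runs until enough rows are collected never ends if the site runs out of events, so it can help to cap the number of pages requested. The sketch below shows one way to do that with plain lists. `collect_events`, `scrape`, `advance`, `max_pages`, and `pause` are all hypothetical names standing in for the lab's pieces.

```python
import time


def collect_events(start_url, target, scrape, advance, max_pages=20, pause=1.0):
    """Gather rows page by page until `target` rows or `max_pages` pages.

    scrape(url) must return a list of rows and advance(url) the next URL.
    Every name here is an illustrative assumption, not part of the lab.
    """
    rows, url = [], start_url
    for _ in range(max_pages):
        rows.extend(scrape(url))
        if len(rows) >= target:
            break  # enough rows collected
        url = advance(url)
        time.sleep(pause)  # be polite between requests
    return rows
```

With the functions from this lab, a call might look like `collect_events(url, 8, lambda u: scrape_events(u).to_dict("records"), next_page)`, and the resulting list of dictionaries can be passed to `pd.DataFrame`.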


In [226]:
df['Event_Date'] = pd.to_datetime(df['Event_Date'])
df

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | NU Tekno Friday Night Dim Sum | Northern Ducks | 2020-11-27 | 2 |
| 1 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-28 | 2 |
| 2 | Kick Back at The Drive In - Clozee Charlesthef... | Oak Canyon Ranch | 2020-11-29 | 2 |
| 3 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-29 | 2 |
| 4 | Kick Back at the Drive-In - Lee Burridge and Hoj | Oak Canyon Ranch | 2020-11-30 | 2 |
| 5 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-04 | 25 |
| 6 | Technometrik 3 Days Desert Event | TBA - Joshua Tree | 2020-12-05 | 25 |
| 7 | Technometrik 3 Days Desert Event | TBA - San Diego | 2020-12-05 | 5 |
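
The task also asks for the data sorted by number of attendees, with ties broken by event date. Scraped attendee counts arrive as text, so they need converting to numbers before sorting. The sketch below uses a few rows copied from the output above. Sorting attendees from highest to lowest is an assumption, since the instructions do not state a direction.

```python
import pandas as pd

# A few rows taken from the scraped output above, deliberately out of order
sample = pd.DataFrame({
    "Event_Name": [
        "NU Tekno Friday Night Dim Sum",
        "Technometrik 3 Days Desert Event",
        "Technometrik 3 Days Desert Event",
        "Technometrik 3 Days Desert Event",
    ],
    "Venue": ["Northern Ducks", "TBA - Joshua Tree", "TBA - Joshua Tree", "TBA - San Diego"],
    "Event_Date": ["2020-11-27T00:00", "2020-12-05T00:00", "2020-12-04T00:00", "2020-12-05T00:00"],
    "Number_of_Attendees": ["2", "25", "25", "5"],
})

# Convert the text columns so the sort is numeric and chronological
sample["Number_of_Attendees"] = pd.to_numeric(sample["Number_of_Attendees"])
sample["Event_Date"] = pd.to_datetime(sample["Event_Date"])

# Most attended first (assumed direction); earlier date wins a tie
sorted_df = sample.sort_values(
    by=["Number_of_Attendees", "Event_Date"],
    ascending=[False, True],
).reset_index(drop=True)
print(sorted_df)
```

The same `sort_values` call can be applied to the full `df` once its `Number_of_Attendees` column has been converted in the same way.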


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!