# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site.
In this lab, you'll scrape a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [3]:
# Load the https://www.residentadvisor.net/events page in your browser.
import pandas as pd
import numpy as np
import requests 
from bs4 import BeautifulSoup
import re


In [84]:
url = 'https://www.residentadvisor.net/events/us/newyork'
html_page = requests.get(url)
soup = BeautifulSoup(html_page.content, 'html.parser')


## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

In [85]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with four columns: Event_Name, Venue, Event_Date, and Number_of_Attendees.

In [86]:
soup.find('div', id='event-listing').findAll('li')

[<li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=11&amp;yr=2020&amp;dy=25"><span>Wed, 25 Nov 2020 /</span></a></p></li>,
 <li class=""><article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2020-11-25T00:00" itemprop="startDate">2020-11-25T00:00</time></span><a href="/events/1430723"><img height="76" src="/images/events/flyer/2020/11/us-1120-1430723-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1430723" itemprop="url" title="Event details of Unter Baths: A Holiday Merch Drop">Unter Baths: A Holiday Merch Drop</a> <span>at <span class="grey" style="display:inline;">Unter Baths</span></span></h1><div class="grey event-lineup">Volvox</div><p class="attending"><span>1</span> Attending</p></div></article></li>,
 <li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=11&amp;yr=2020&amp;dy=26"><span>Thu, 26 N

In [87]:
soup.find('div', id='event-listing').findAll('li')[0]

<li><p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=11&amp;yr=2020&amp;dy=25"><span>Wed, 25 Nov 2020 /</span></a></p></li>

In [88]:
soup.find('div', id='event-listing').findAll('li')[0].text

'Wed, 25 Nov 2020 /'

In [89]:
soup.find('div', id='event-listing').findAll('li')[0].text.replace('/', '')

'Wed, 25 Nov 2020 '

In [90]:
soup.find('div', id='event-listing').findAll('li')[0].text.replace('/', '').strip()

'Wed, 25 Nov 2020'
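
The cleaned value is still plain text. If the dates need to be compared or sorted later, it may help to parse them into real date objects. The sketch below assumes every heading follows the `Wed, 25 Nov 2020 /` pattern seen above and that Python runs under an English (or C) locale, since `%a` and `%b` are locale-dependent. `parse_listing_date` is a hypothetical helper name, not part of the lab.

```python
from datetime import datetime


def parse_listing_date(raw_text):
    """Turn a heading such as 'Wed, 25 Nov 2020 /' into a datetime.date."""
    # Drop the trailing slash and surrounding whitespace, as in the cells above.
    cleaned = raw_text.replace('/', '').strip()
    # %a = abbreviated weekday, %d = day, %b = abbreviated month, %Y = year.
    return datetime.strptime(cleaned, '%a, %d %b %Y').date()


parsed = parse_listing_date('Wed, 25 Nov 2020 /')
print(parsed.isoformat())  # 2020-11-25
```

A heading in any other format raises `ValueError`, which makes unexpected markup visible early instead of silently storing a bad string.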

In [91]:
soup.find('div', id='event-listing').findAll('li')[0].find(class_='eventDate date')

<p class="eventDate date"><a href="/events.aspx?ai=8&amp;v=day&amp;mn=11&amp;yr=2020&amp;dy=25"><span>Wed, 25 Nov 2020 /</span></a></p>

In [92]:
for x in soup.find('div', id='event-listing').findAll('li'):
    if x.find(class_='eventDate date'):
        print(x.text.replace('/', '').strip())

Wed, 25 Nov 2020
Thu, 26 Nov 2020
Fri, 27 Nov 2020
Sat, 28 Nov 2020
Sun, 29 Nov 2020
Mon, 30 Nov 2020


In [93]:
soup.find('div', id='event-listing').findAll('li')[1].find(class_='event-item clearfix').find(class_='event-title').find('a')

<a href="/events/1430723" itemprop="url" title="Event details of Unter Baths: A Holiday Merch Drop">Unter Baths: A Holiday Merch Drop</a>

In [94]:
soup.find('div', id='event-listing').findAll('li')[1].find(class_='event-item clearfix').find(class_='event-title').find('a').text

'Unter Baths: A Holiday Merch Drop'

In [95]:
for x in soup.find('div', id='event-listing').findAll('li'):
    if x.find(class_='eventDate date'):
        print(x.text.replace('/', '').strip())
    if x.find(class_='event-item clearfix') or x.find(class_='event-item clearfix tickets-bkg-logo'):
        print(x.find(class_='event-title').find('a').text)

Wed, 25 Nov 2020
Unter Baths: A Holiday Merch Drop
Thu, 26 Nov 2020
Unter Baths: A Holiday Merch Drop
Fri, 27 Nov 2020
Bugs Bunny Thanksgiving Secret Speakeasy 
Unter Baths: A Holiday Merch Drop
Whatever. - Sean Cormac & Chili Davis, Ruez
Sat, 28 Nov 2020
NY Hip Hop vs Reggae® Synset Cruise Skyport Marina Cabana Yacht
White Rabbit - Desert Rabbit
Bugs Bunny Thanksgiving Secret Speakeasy 
Unter Baths: A Holiday Merch Drop
Sun, 29 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party
Mon, 30 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party


In [96]:
soup.find('div', id='event-listing').findAll('li')[1].find('h1')

<h1 class="event-title" itemprop="summary"><a href="/events/1430723" itemprop="url" title="Event details of Unter Baths: A Holiday Merch Drop">Unter Baths: A Holiday Merch Drop</a> <span>at <span class="grey" style="display:inline;">Unter Baths</span></span></h1>

In [97]:
soup.find('div', id='event-listing').findAll('li')[1].find('h1').find('span')

<span>at <span class="grey" style="display:inline;">Unter Baths</span></span>

In [98]:
soup.find('div', id='event-listing').findAll('li')[1].find('h1').find('span').text

'at Unter Baths'

In [99]:
soup.find('div', id='event-listing').findAll('li')[1].find('h1').find('span').text.replace('at ', '').strip()

'Unter Baths'
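
One caveat with `.replace('at ', '')`: it removes every occurrence of `'at '`, not only the leading one, so a venue whose name happens to contain those characters would be mangled. A minimal sketch of a stricter alternative follows. `strip_leading_at` is a hypothetical helper, and `Flat Iron Room` is a made-up venue name used only to show the difference.

```python
def strip_leading_at(venue_text):
    """Remove a single leading 'at ' from text such as 'at Unter Baths'."""
    text = venue_text.strip()
    prefix = 'at '
    # Slice the prefix off only when it is at the very start, so an 'at '
    # inside the venue name is left untouched.
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


print(strip_leading_at('at Unter Baths'))       # Unter Baths
print('at Flat Iron Room'.replace('at ', ''))   # FlIron Room
print(strip_leading_at('at Flat Iron Room'))    # Flat Iron Room
```

Since the markup shown above wraps the venue in an inner `<span class="grey">`, reading that inner span directly would probably avoid the prefix altogether.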

In [100]:
for x in soup.find('div', id='event-listing').findAll('li'):
    if x.find(class_='eventDate date'):
        print(x.text.replace('/', '').strip())
    if x.find(class_='event-item clearfix') or x.find(class_='event-item clearfix tickets-bkg-logo'):
        print(x.find(class_='event-title').find('a').text)
        print(x.find('h1').find('span').text.replace('at ', '').strip())

Wed, 25 Nov 2020
Unter Baths: A Holiday Merch Drop
Unter Baths
Thu, 26 Nov 2020
Unter Baths: A Holiday Merch Drop
Unter Baths
Fri, 27 Nov 2020
Bugs Bunny Thanksgiving Secret Speakeasy 
The Museum of Interesting Things
Unter Baths: A Holiday Merch Drop
Unter Baths
Whatever. - Sean Cormac & Chili Davis, Ruez
TBA Brooklyn
Sat, 28 Nov 2020
NY Hip Hop vs Reggae® Synset Cruise Skyport Marina Cabana Yacht
Skyport Marina
White Rabbit - Desert Rabbit
Secret Location
Bugs Bunny Thanksgiving Secret Speakeasy 
The Museum of Interesting Things
Unter Baths: A Holiday Merch Drop
Unter Baths
Sun, 29 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party
Taj Lounge
Mon, 30 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party
Taj Lounge


In [101]:
soup.find('div', id='event-listing').findAll('li')[1].find('p', class_='attending')

<p class="attending"><span>1</span> Attending</p>

In [102]:
soup.find('div', id='event-listing').findAll('li')[1].find('p', class_='attending').text

'1 Attending'

In [103]:
soup.find('div', id='event-listing').findAll('li')[1].find('p', class_='attending').text.replace('Attending', '').strip()

'1'
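
The result is the string `'1'`, not a number, which matters once the data is sorted by attendance. A small sketch of converting it with the `re` module imported at the top of the notebook is shown below. `parse_attendee_count` is a hypothetical helper, and the handling of thousands separators such as `1,234` is an assumption, since no such value appears in the output above.

```python
import re


def parse_attendee_count(attending_text):
    """Return the first integer in text such as '1 Attending', or None."""
    # Match a run of digits, allowing commas as thousands separators.
    match = re.search(r'\d[\d,]*', attending_text)
    if match is None:
        return None
    return int(match.group().replace(',', ''))


print(parse_attendee_count('1 Attending'))      # 1
print(parse_attendee_count('1,234 Attending'))  # 1234
print(parse_attendee_count('Attending'))        # None
```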

In [104]:
for x in soup.find('div', id='event-listing').findAll('li'):
    if x.find(class_='eventDate date'):
        print(x.text.replace('/', '').strip())
    if x.find(class_='event-item clearfix') or x.find(class_='event-item clearfix tickets-bkg-logo'):
        print(x.find(class_='event-title').find('a').text)
        print(x.find('h1').find('span').text.replace('at ', '').strip())
        print(x.find('p', class_='attending').text.replace('Attending', '').strip())

Wed, 25 Nov 2020
Unter Baths: A Holiday Merch Drop
Unter Baths
1
Thu, 26 Nov 2020
Unter Baths: A Holiday Merch Drop
Unter Baths
1
Fri, 27 Nov 2020
Bugs Bunny Thanksgiving Secret Speakeasy 
The Museum of Interesting Things
2
Unter Baths: A Holiday Merch Drop
Unter Baths
1
Whatever. - Sean Cormac & Chili Davis, Ruez
TBA Brooklyn
1
Sat, 28 Nov 2020
NY Hip Hop vs Reggae® Synset Cruise Skyport Marina Cabana Yacht
Skyport Marina
1
White Rabbit - Desert Rabbit
Secret Location
4
Bugs Bunny Thanksgiving Secret Speakeasy 
The Museum of Interesting Things
2
Unter Baths: A Holiday Merch Drop
Unter Baths
1
Sun, 29 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party
Taj Lounge
1
Mon, 30 Nov 2020
Taj Lounge NYC Sunday Funday Hip Hop vs. Reggae® Brunch & Day Party
Taj Lounge
1


In [105]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')

    name_list = []
    venue_list = []
    date_list = []
    attendees_list = []

    # Date headings and events are sibling <li> elements, so remember the most
    # recent heading and attach it to every event that follows it.
    date = None
    for x in soup.find('div', id='event-listing').findAll('li'):
        if x.find(class_='eventDate date'):
            date = x.text.replace('/', '').strip()
        elif x.find(class_='event-item clearfix') or x.find(class_='event-item clearfix tickets-bkg-logo'):
            name_list.append(x.find(class_='event-title').find('a').text)
            venue_list.append(x.find('h1').find('span').text.replace('at ', '').strip())
            attendees_list.append(x.find('p', class_='attending').text.replace('Attending', '').strip())
            date_list.append(date)

    df = pd.DataFrame({'Event_Name': name_list,
                       'Venue': venue_list,
                       'Event_Date': date_list,
                       'Number_of_Attendees': attendees_list})

    return df

In [147]:
scrape_events('https://www.residentadvisor.net/events/us/newyork')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Unter Baths: A Holiday Merch Drop | Unter Baths | Wed, 25 Nov 2020 | 1 |
| 1 | Unter Baths: A Holiday Merch Drop | Unter Baths | Thu, 26 Nov 2020 | 1 |
| 2 | Bugs Bunny Thanksgiving Secret Speakeasy | The Museum of Interesting Things | Fri, 27 Nov 2020 | 2 |
| 3 | Unter Baths: A Holiday Merch Drop | Unter Baths | Fri, 27 Nov 2020 | 1 |
| 4 | Whatever. - Sean Cormac & Chili Davis, Ruez | TBA Brooklyn | Fri, 27 Nov 2020 | 1 |
| 5 | NY Hip Hop vs Reggae® Synset Cruise Skyport Ma... | Skyport Marina | Sat, 28 Nov 2020 | 1 |
| 6 | White Rabbit - Desert Rabbit | Secret Location | Sat, 28 Nov 2020 | 4 |
| 7 | Bugs Bunny Thanksgiving Secret Speakeasy | The Museum of Interesting Things | Sat, 28 Nov 2020 | 2 |
| 8 | Unter Baths: A Holiday Merch Drop | Unter Baths | Sat, 28 Nov 2020 | 1 |
| 9 | Taj Lounge NYC Sunday Funday Hip Hop vs. Regga... | Taj Lounge | Sun, 29 Nov 2020 | 1 |
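
The function above assumes that every event item contains a title link, a venue span, and an attendance paragraph. If any of them is missing, a chained `.find(...).text` call fails. The following is a minimal sketch of a more tolerant extraction, run against a small inline HTML fragment modeled on the markup shown earlier so that it works offline. `SAMPLE_HTML`, `text_or_none`, and `parse_event_item` are hypothetical names, and the second event in the fragment is invented and deliberately lacks the attendance paragraph.

```python
from bs4 import BeautifulSoup

# Trimmed-down fragment modeled on the listing markup shown earlier.
# The second event is invented and has no "attending" paragraph.
SAMPLE_HTML = """
<div id="event-listing"><ul>
<li><p class="eventDate date"><a href="#"><span>Wed, 25 Nov 2020 /</span></a></p></li>
<li><article class="event-item clearfix">
  <h1 class="event-title"><a href="/events/1430723">Unter Baths: A Holiday Merch Drop</a>
  <span>at <span class="grey">Unter Baths</span></span></h1>
  <p class="attending"><span>1</span> Attending</p></article></li>
<li><article class="event-item clearfix">
  <h1 class="event-title"><a href="#">Example Event</a>
  <span>at <span class="grey">Example Venue</span></span></h1></article></li>
</ul></div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_event_item(item):
    """Extract name, venue and attendee text from one <li>, tolerating gaps."""
    return {
        'Event_Name': text_or_none(item.select_one('.event-title a')),
        'Venue': text_or_none(item.select_one('.event-title span.grey')),
        'Number_of_Attendees': text_or_none(item.select_one('p.attending span')),
    }


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_event_item(li)
        for li in soup.select('#event-listing li')
        if li.select_one('.event-item')]  # skip the date-heading items
for row in rows:
    print(row)
```

The CSS selector `.event-item` matches any element carrying that class, whatever other classes it has, which is one way to avoid listing each exact class string separately.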


## Write a Function to Retrieve the URL for the Next Page

In [148]:
soup.find('li', class_='but arrow-right right')

<li class="but arrow-right right" id="liNext">
<a ga-event-action="Next " ga-event-category="event-listings" ga-on="click" href="/events/us/newyork/week/2020-12-02">Next </a>
</li>

In [149]:
soup.find('li', class_='but arrow-right right').find('a')

<a ga-event-action="Next " ga-event-category="event-listings" ga-on="click" href="/events/us/newyork/week/2020-12-02">Next </a>

In [150]:
soup.find('li', class_='but arrow-right right').find('a')['href']

'/events/us/newyork/week/2020-12-02'

In [151]:
'https://www.residentadvisor.net' + soup.find('li', class_='but arrow-right right').find('a')['href']

'https://www.residentadvisor.net/events/us/newyork/week/2020-12-02'
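
String concatenation works here because the `href` is root-relative. As an alternative sketch, the standard library's `urllib.parse.urljoin` resolves the link against the page it came from and also copes with links that are already absolute. The variable names below are illustrative.

```python
from urllib.parse import urljoin

base_url = 'https://www.residentadvisor.net/events/us/newyork'
relative_href = '/events/us/newyork/week/2020-12-02'

# urljoin resolves the href against the page it was found on, so it works for
# root-relative paths like this one as well as for already-absolute URLs.
next_url = urljoin(base_url, relative_href)
print(next_url)
# https://www.residentadvisor.net/events/us/newyork/week/2020-12-02
```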

In [152]:
def next_page(url):
    """Return the absolute URL behind the page's "Next" link."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')

    # The link's href is root-relative, so prepend the site's domain.
    back_of_url = soup.find('li', class_='but arrow-right right').find('a')['href']
    next_page_url = 'https://www.residentadvisor.net' + back_of_url
    return next_page_url

In [153]:
next_page('https://www.residentadvisor.net/events/us/newyork')

'https://www.residentadvisor.net/events/us/newyork/week/2020-12-02'

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
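
The sorting step can be sketched on a small hand-made DataFrame, as below. The rows are made up, the column names follow the layout requested earlier in the lab, and sorting from most to fewest attendees is an assumption, since the instructions do not state a direction.

```python
import pandas as pd

# Made-up rows that follow the column layout described earlier in the lab.
events = pd.DataFrame({
    'Event_Name': ['A', 'B', 'C', 'D'],
    'Event_Date': ['Sat, 28 Nov 2020', 'Wed, 25 Nov 2020',
                   'Fri, 27 Nov 2020', 'Thu, 26 Nov 2020'],
    'Number_of_Attendees': ['1', '4', '1', '12'],
})

# Scraped values are strings, so convert before sorting: as text, '12' < '4'.
events['Number_of_Attendees'] = events['Number_of_Attendees'].astype(int)
events['Event_Date'] = pd.to_datetime(events['Event_Date'],
                                      format='%a, %d %b %Y')

# Most attended first; ties are broken by the earlier date.
ranked = events.sort_values(['Number_of_Attendees', 'Event_Date'],
                            ascending=[False, True]).reset_index(drop=True)
print(ranked)
```

Converting both columns first matters: sorted as strings, the dates would be ordered alphabetically by weekday name rather than chronologically.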

In [167]:
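import time

# Keep scraping successive pages until at least 1000 events are collected.
dfs = []
total_rows = 0
cur_url = "https://www.residentadvisor.net/events/us/newyork"
while total_rows < 1000:
    df = scrape_events(cur_url)
    dfs.append(df)
    total_rows += len(df)
    cur_url = next_page(cur_url)
    time.sleep(.2)  # pause briefly between requests

# Combine the per-page results and trim to exactly 1000 rows.
df = pd.concat(dfs, ignore_index=True)
df = df.iloc[:1000]
print(len(df))
df.head()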

AttributeError: 'NoneType' object has no attribute 'text'
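
This traceback is consistent with one of the chained `.find(...)` calls returning `None` on some page, after which `.text` fails. The output does not show which element was missing. One way to keep the rows gathered so far is to wrap the crawl in a loop that stops cleanly on such a failure. The sketch below takes the scraping and next-page functions as arguments so it can run offline with stand-ins. `collect_events`, `fake_scrape`, and `fake_next` are hypothetical names.

```python
import time


def collect_events(start_url, scrape_page, get_next_url, limit=1000, delay=0.2):
    """Follow 'next' links until `limit` rows are collected or a page fails.

    scrape_page(url) must return a list of rows; get_next_url(url) must return
    the next URL or None. Both are passed in, so any implementation can be used.
    """
    rows, url, seen = [], start_url, set()
    while url and url not in seen and len(rows) < limit:
        seen.add(url)  # guard against a "next" link that loops back
        try:
            rows.extend(scrape_page(url))
            url = get_next_url(url)
        except AttributeError as err:
            # A missing tag surfaces as an AttributeError on None.
            print(f'Stopping at {url}: {err}')
            break
        time.sleep(delay)  # be polite to the server between requests
    return rows[:limit]


# Stand-in functions so the sketch runs without network access.
fake_pages = {'page1': (['a', 'b'], 'page2'), 'page2': (['c'], 'page3')}


def fake_scrape(url):
    if url not in fake_pages:
        # Mimic the failure mode from the traceback above.
        raise AttributeError("'NoneType' object has no attribute 'text'")
    return fake_pages[url][0]


def fake_next(url):
    return fake_pages[url][1]


print(collect_events('page1', fake_scrape, fake_next, limit=10, delay=0))
# ['a', 'b', 'c']
```

Stopping at the first bad page is the simplest policy. Skipping only the broken event and continuing would need the per-item checks to live inside the scraping function instead.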

## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!