# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [None]:
#Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element feature to preview the HTML underlying the page.

In [None]:
#Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date` and `Number_of_Attendees`.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
html_page = requests.get('https://www.residentadvisor.net/events/es/ibiza') 
soup = BeautifulSoup(html_page.content, 'html.parser')

In [66]:
events_etc=soup.find('div',id='event-listing')
events=events_etc.findAll('li')
events[1].find('a',itemprop="url").text

"Resistance Ibiza Week 2 - Carl's Birthday"

In [98]:
events[2].findAll('a')[-1].text

'Hï Ibiza'

In [70]:
events[0].find('p', class_='eventDate').text[:-2]

'Tue, 30 Jul 2019'
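
The scraped date is still a plain string. As a minimal sketch (the helper name `parse_event_date` is hypothetical, and the format string assumes every date looks like the output above), it can be parsed with the standard library so that dates compare chronologically rather than alphabetically:

```python
from datetime import datetime

def parse_event_date(date_text):
    """Parse a date string such as 'Tue, 30 Jul 2019' into a date object.

    Assumes English day and month abbreviations, as in the output above.
    """
    return datetime.strptime(date_text.strip(), '%a, %d %b %Y').date()

print(parse_event_date('Tue, 30 Jul 2019'))  # 2019-07-30
```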

In [111]:
type(events[5].find('p',class_='attending'))

NoneType
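
`find` returns `None` when no matching tag exists, so calling `.text` on the result would raise an `AttributeError`. A small sketch of guarding against that, using a made-up HTML snippet (the markup and the `attendee_count` helper are illustrative assumptions, not the site's real structure):

```python
from bs4 import BeautifulSoup

# Made-up markup: the first item has an attendance tag, the second does not
html = """
<ul>
  <li><p class="attending">225 Attending</p></li>
  <li><p class="title">No attendance info</p></li>
</ul>
"""

def attendee_count(item):
    """Return the attendee count as an int, or None if the tag is absent."""
    tag = item.find('p', class_='attending')
    if tag is None:
        return None
    return int(tag.text.split()[0])

items = BeautifulSoup(html, 'html.parser').find_all('li')
print([attendee_count(item) for item in items])  # [225, None]
```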

In [140]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    event_names = []
    venues = []
    event_dates = []
    numbers_of_attendees = []
    html_page = requests.get(events_page_url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    events_etc = soup.find('div', id='event-listing')
    # The listing mixes date headers, events and an end-of-listing item
    for item in events_etc.find_all('li'):
        date_or_not = item.find_all('p', class_='eventDate')
        if len(date_or_not) == 1:
            # Date header: remember it (minus its last two characters)
            # for the events that follow
            event_date = date_or_not[0].text[:-2]
        elif item.find('div', class_='but myra mt16'):
            # End-of-listing item, not an event
            continue
        else:
            event_name = item.find('a', itemprop="url").text
            venue = item.find_all('a')[-1].text
            attending = item.find_all('p', class_='attending')
            if len(attending) == 0:
                # No attendance information for this event
                number_of_attendees = float('NaN')
            else:
                # The first token of the text is the count (still a string)
                number_of_attendees = attending[0].text.split()[0]
            event_dates.append(event_date)
            event_names.append(event_name)
            venues.append(venue)
            numbers_of_attendees.append(number_of_attendees)

    df = pd.DataFrame({"Event_Name": event_names,
                       "Venue": venues,
                       "Event_Date": event_dates,
                       "Number_of_Attendees": numbers_of_attendees})
    return df

In [141]:
scrape_events('https://www.residentadvisor.net/events/es/ibiza')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Resistance Ibiza Week 2 - Carl's Birthday | Privilege | Tue, 30 Jul 2019 | 225 |
| 1 | BODYWORKS | Hï Ibiza | Tue, 30 Jul 2019 | 14 |
| 2 | The Ritual with Anané & Louie Vega at Heart Fa... | Heart Ibiza | Tue, 30 Jul 2019 | 8 |
| 3 | Tomorrowland presents Dimitri Vegas & Like Mike | Ushuaïa Beach Hotel | Tue, 30 Jul 2019 | 7 |
| 4 | Шпагат | Ibiza Underground | Tue, 30 Jul 2019 | NaN |
| 5 | Craig David's TS5 Pool Party | Ibiza Rocks Hotel | Tue, 30 Jul 2019 | NaN |
| 6 | Kisstory | Ocean Beach | Tue, 30 Jul 2019 | NaN |
| 7 | Diplo & Anitta | Pacha Ibiza | Tue, 30 Jul 2019 | NaN |
| 8 | Together - Season 9 - Week 9 | Amnesia | Tue, 30 Jul 2019 | NaN |
| 9 | Flesh Label Showcase | Folys Cafe Ibiza | Tue, 30 Jul 2019 | NaN |


## Write a Function to Retrieve the URL for the Next Page

In [160]:
url='https://www.residentadvisor.net/events/es/ibiza'
base_url='https://www.residentadvisor.net'
more_pages=soup.find('div',id='previous-next')
next_page=more_pages.findAll('a')[-1]
next_page.attrs['href']


'/events/es/ibiza/week/2019-08-06'

In [163]:
def next_page(url):
    """Return the absolute URL of the page that follows `url`."""
    base_url = 'https://www.residentadvisor.net'
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    more_pages = soup.find('div', id='previous-next')
    # The last link in the previous/next block points to the next page
    next_link = more_pages.find_all('a')[-1]
    return base_url + next_link.attrs['href']

next_page('https://www.residentadvisor.net/events/es/ibiza')

'https://www.residentadvisor.net/events/es/ibiza/week/2019-08-06'

In [165]:
next_page('https://www.residentadvisor.net/events/es/ibiza/week/2019-08-06')

'https://www.residentadvisor.net/events/es/ibiza/week/2019-08-13'
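
Concatenating `base_url` and the `href` works here because the link is root-relative. As an alternative sketch, the standard library's `urllib.parse.urljoin` resolves a link against the current page URL, and it also copes with links that are already absolute:

```python
from urllib.parse import urljoin

current_url = 'https://www.residentadvisor.net/events/es/ibiza'
href = '/events/es/ibiza/week/2019-08-06'  # value taken from the output above

print(urljoin(current_url, href))
# https://www.residentadvisor.net/events/es/ibiza/week/2019-08-06
```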

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [182]:
current_page='https://www.residentadvisor.net/events/es/ibiza'
df=scrape_events(current_page)
number_of_events=len(df)
number_of_pages=1
while number_of_events<1000:
    current_page=next_page(current_page)
    print(current_page)
    next_df=scrape_events(current_page)
    df=pd.concat([df,next_df],ignore_index=True)
    number_of_events=len(df)
    number_of_pages+=1

https://www.residentadvisor.net/events/es/ibiza/week/2019-08-06
https://www.residentadvisor.net/events/es/ibiza/week/2019-08-13
https://www.residentadvisor.net/events/es/ibiza/week/2019-08-20
https://www.residentadvisor.net/events/es/ibiza/week/2019-08-27
https://www.residentadvisor.net/events/es/ibiza/week/2019-09-03
https://www.residentadvisor.net/events/es/ibiza/week/2019-09-10
https://www.residentadvisor.net/events/es/ibiza/week/2019-09-17
https://www.residentadvisor.net/events/es/ibiza/week/2019-09-24
https://www.residentadvisor.net/events/es/ibiza/week/2019-10-01
https://www.residentadvisor.net/events/es/ibiza/week/2019-10-08
https://www.residentadvisor.net/events/es/ibiza/week/2019-10-15
https://www.residentadvisor.net/events/es/ibiza/week/2019-10-22
https://www.residentadvisor.net/events/es/ibiza/week/2019-10-29


KeyError: 'href'
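
The loop stops with `KeyError: 'href'`, most likely because the last link on the final page has no `href` attribute, so there is no further page to follow. One way to end the loop cleanly is sketched below. `collect_events` is a hypothetical helper, and it assumes a next-page function that returns `None` when there is no further page (for example by reading the link with `.get('href')` instead of `.attrs['href']`). The scraping functions are passed in as arguments so the loop can be tried with stand-ins:

```python
import pandas as pd

def collect_events(start_url, scrape, get_next, target=1000):
    """Scrape page after page until `target` events are collected
    or `get_next` returns None (no further page)."""
    frames = []
    total = 0
    url = start_url
    while url is not None and total < target:
        page_df = scrape(url)
        frames.append(page_df)
        total += len(page_df)
        url = get_next(url)
    return pd.concat(frames, ignore_index=True)

# Stand-ins for demonstration only: three fake pages with two events each
fake_pages = {'p1': 'p2', 'p2': 'p3', 'p3': None}

def fake_scrape(url):
    return pd.DataFrame({'Event_Name': [url + '-a', url + '-b']})

df_demo = collect_events('p1', fake_scrape, fake_pages.get)
print(len(df_demo))  # 6
```

Collecting the per-page frames in a list and concatenating once at the end also avoids rebuilding the DataFrame on every iteration.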

In [190]:
df.sort_values(['Number_of_Attendees','Event_Date'],ascending=False)

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 349 | Paradise | DC-10 | Wed, 21 Aug 2019 | 92 |
| 31 | Afterlife | Hï Ibiza | Thu, 01 Aug 2019 | 92 |
| 323 | Circoloco Ibiza | DC-10 | Mon, 19 Aug 2019 | 92 |
| 352 | Boris Brejcha | Hï Ibiza | Wed, 21 Aug 2019 | 9 |
| 133 | Lost In Ibiza Boat Party Jamie Jones' Paradise... | Boat - Captain Nemo II | Wed, 07 Aug 2019 | 9 |
| 817 | House Of Silk Meets Garage Nation ( Ibiza Clos... | Eden | Thu, 26 Sep 2019 | 9 |
| 149 | Lost In Ibiza Boat Party Jamie Jones' Paradise... | Boat - Captain Nemo II | Thu, 08 Aug 2019 | 9 |
| 150 | Storytellers presents: Mayan Warrior | Cova Santa | Thu, 08 Aug 2019 | 9 |
| 565 | ABODE Thursdays Amnesia | Amnesia | Thu, 05 Sep 2019 | 9 |
| 777 | Glitterbox | Hï Ibiza | Sun, 22 Sep 2019 | 9 |
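
Note that the counts above are ordered as text: `92` is listed first even though the first page contained an event with 225 attendees, because the scraped values are strings. The dates sort alphabetically for the same reason. A minimal sketch of converting both columns before sorting, on a small made-up sample (the sample rows and the choice of earlier dates first on ties are assumptions for illustration):

```python
import pandas as pd

# Made-up sample in the same shape as the scraped data
sample = pd.DataFrame({
    'Event_Name': ['A', 'B', 'C', 'D'],
    'Event_Date': ['Wed, 21 Aug 2019', 'Tue, 30 Jul 2019',
                   'Thu, 01 Aug 2019', 'Thu, 08 Aug 2019'],
    'Number_of_Attendees': ['92', '225', '92', float('nan')],
})

# Convert text to real numbers and dates so comparisons are meaningful
sample['Number_of_Attendees'] = pd.to_numeric(sample['Number_of_Attendees'])
sample['Event_Date'] = pd.to_datetime(sample['Event_Date'],
                                      format='%a, %d %b %Y')

# Most-attended first; ties broken by the earlier date; NaN rows go last
ordered = sample.sort_values(['Number_of_Attendees', 'Event_Date'],
                             ascending=[False, True])
print(ordered['Event_Name'].tolist())  # ['B', 'C', 'A', 'D']
```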


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!