# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site!
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Create a full scraping pipeline that traverses many pages of a website, handles errors, and stores the data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [5]:
# Load the https://www.residentadvisor.net/events page in your browser.

## Open the Inspect Element Feature

Next, open your browser's Inspect Element feature to view the underlying HTML of the page.

In [6]:
# Open the inspect element feature in your browser

## Write a Function to Scrape All of the Events on a Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.

In [7]:
from bs4 import BeautifulSoup
import requests
    
html_page = requests.get('https://www.residentadvisor.net/events') # Make a get request to retrieve the page
soup = BeautifulSoup(html_page.content, 'html.parser') # Pass the page contents to beautiful soup for parsing

### Exploration for final function

In [8]:
# Find first event name
name_h1 = soup.find('h1', class_='event-title')
name = name_h1.find('a')
name.text

'Dark Desires with Lee K (Circulate Records)'

In [9]:
# Find first event venue
ven_h1 = soup.find('h1', class_='event-title')
venue = ven_h1.find('span')
venue.text

'at The North Door, Austin'

In [10]:
# Find first event date
date = soup.find('time')
date.text

'2019-11-08T00:00'

In [11]:
# Find first event attendees
att = soup.find('p', class_='attending')
att.text

'6 Attending'
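
The raw strings above still carry extra text: the venue starts with `at `, the date includes a time, and the attendee count is followed by a word. A minimal sketch of cleaning them with the standard library follows. The helper names `clean_venue`, `clean_date`, and `parse_attendees` are hypothetical, and the input formats are assumed to match the sample outputs shown above.

```python
import re


def clean_venue(text):
    """Drop the leading 'at ' from a venue string, if present."""
    text = text.strip()
    return text[3:] if text.startswith('at ') else text


def clean_date(text):
    """Keep only the YYYY-MM-DD part of a timestamp such as '2019-11-08T00:00'."""
    return text.strip().split('T')[0]


def parse_attendees(text):
    """Return the leading integer of a string such as '6 Attending', or 0 if none."""
    match = re.match(r'\s*(\d[\d,]*)', text)
    return int(match.group(1).replace(',', '')) if match else 0


print(clean_venue('at The North Door, Austin'))  # The North Door, Austin
print(clean_date('2019-11-08T00:00'))            # 2019-11-08
print(parse_attendees('6 Attending'))            # 6
print(parse_attendees('12 Attending'))           # 12
```

Matching every leading digit, rather than slicing a fixed number of characters, keeps counts of two or more digits intact.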

In [12]:
def scrape_events(events_page_url='https://www.residentadvisor.net/events'):

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd

    # Make a get request to retrieve the page
    html_page = requests.get(events_page_url)
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # Get all events from soup to iterate over
    event_list = soup.find('div', id='event-listing')
    events = event_list.find_all(
        'article', itemtype="http://data-vocabulary.org/Event")

    # Create containers for desired variables
    names_list = []
    venues_list = []
    dates_list = []
    attendees_list = []

    # Iterate through list of events at correct tags to get name, venue, date, attendees for each
    for event in events:

        name_h1 = event.find('h1', class_='event-title')
        name = name_h1.find('a')
        names_list.append(name.text)

        # The venue is the <span> inside the same <h1>; drop the leading "at "
        venue = name_h1.find('span')
        venues_list.append(venue.text[3:])

        date = event.find('time')
        dates_list.append(date.text[:10])

        # Events with no attendees have no 'attending' paragraph
        att = event.find('p', class_='attending')
        if att:
            # Text looks like '6 Attending'; keep the whole leading number
            attendees_list.append(int(att.text.split()[0].replace(',', '')))
        else:
            attendees_list.append(0)

    # Build the DataFrame column by column so each column keeps its own dtype
    df = pd.DataFrame({"Event_Name": names_list,
                       "Venue": venues_list,
                       "Event_Date": dates_list,
                       "Number_of_Attendees": attendees_list})
    return df

In [13]:
scrape_events(events_page_url='https://www.residentadvisor.net/events/us/newyork')

| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Trïbel: ✺ Alice Iguchi, Medina, Cïaga ✺ Nov 7 | Downtime | 2019-11-07 | 5 |
| 1 | Deep Root Sessions At Public Arts with OFFAIAH | Public Arts | 2019-11-07 | 7 |
| 2 | The Funky Seshwa with Cedar Sound Workshop, Wi... | Good Room | 2019-11-07 | 2 |
| 3 | Banana e Paisano with Alexi Delano | TBA Brooklyn | 2019-11-07 | 1 |
| 4 | Darker Than Wax with Paurro | Le Bain | 2019-11-07 | 7 |
| ... | ... | ... | ... | ... |
| 112 | 11-12- Salsa, Classics, Disco, Dance!-1 Year A... | Taj Lounge | 2019-11-12 | 0 |
| 113 | Quo Vadis presents: Alessandro Cortini, 51717 ... | Good Room | 2019-11-13 | 1 |
| 114 | House of Vogue with MikeQ & Qween Beat | House Of Yes | 2019-11-13 | 2 |
| 115 | Exile: Ron Jackson / A.M.D | Bossa Nova Civic Club | 2019-11-13 | 1 |


## Write a Function to Retrieve the URL for the Next Page

In [14]:
# Exploration before function writing
next_url_ext = soup.find('a', attrs = {'ga-event-action': "Next "})['href']
next_url_ext

'/events/us/texas/week/2019-11-14'
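
The `href` above is a relative path that begins with a slash, so it has to be combined with the site's base address before it can be requested. One way to do this without worrying about doubled or missing slashes is `urljoin` from Python's standard library. This is a small sketch; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# Assumed inputs: the page that was scraped and the relative href found on it
current_url = 'https://www.residentadvisor.net/events'
next_url_ext = '/events/us/texas/week/2019-11-14'

# urljoin resolves the relative path against the current page's URL
next_page_url = urljoin(current_url, next_url_ext)
print(next_page_url)
# https://www.residentadvisor.net/events/us/texas/week/2019-11-14
```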

In [15]:
def next_page(url):

    from bs4 import BeautifulSoup
    import requests

    # Make a get request to retrieve the page
    html_page = requests.get(url)
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # Find the next button tag
    next_link = soup.find('a', attrs={'ga-event-action': "Next "})
    # Return None when there is no next button or it has no link (e.g. the last page)
    if next_link is None or not next_link.get('href'):
        return None

    # The href already starts with '/', so the base url must not end with one
    next_page_url = 'https://www.residentadvisor.net' + next_link['href']

    return next_page_url
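
A next-page function is typically called in a loop that scrapes a page, then moves to the next one. The sketch below shows that control flow on its own, assuming the next-page function returns `None` when there is no further page. `collect_pages`, `fake_site`, and the `scrape` and `get_next` parameters are hypothetical names; the two functions are passed in so the loop can be tried without any network access.

```python
import time


def collect_pages(start_url, scrape, get_next, target_rows=1000, pause=0.0):
    """Follow 'next' links until target_rows rows are collected or pages run out.

    scrape(url) must return a sequence of rows; get_next(url) must return the
    next URL or None.
    """
    pages = []
    row_count = 0
    url = start_url
    while url is not None and row_count < target_rows:
        rows = scrape(url)
        pages.append(rows)
        row_count += len(rows)
        url = get_next(url)  # None ends the loop
        if url is not None and pause:
            time.sleep(pause)  # optional delay between requests
    return pages


# Hypothetical stand-ins for a three-page site with two rows per page
fake_site = {'p1': 'p2', 'p2': 'p3', 'p3': None}
pages = collect_pages('p1',
                      scrape=lambda url: [url + '-a', url + '-b'],
                      get_next=fake_site.get)
print(pages)
# [['p1-a', 'p1-b'], ['p2-a', 'p2-b'], ['p3-a', 'p3-b']]
```

Stopping on either condition means the loop ends cleanly when the site runs out of pages before the row target is reached.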

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

In [17]:
import pandas as pd

all_dfs = []
row_count = 0
url = 'https://www.residentadvisor.net/events/us/newyork'
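while row_count < 1000:
    df = scrape_events(url)
    all_dfs.append(df)
    row_count += len(df)
    print(f'Total row count is {row_count}')
    # next_page returns a URL string (never True), so move on to it directly;
    # stop if no next page is found
    next_url = next_page(url)
    if not next_url:
        break
    url = next_url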

final_df = pd.concat(all_dfs)
display(final_df.head())
print(len(final_df))

Total row count is 117
Total row count is 234
Total row count is 351
Total row count is 468
Total row count is 585
Total row count is 702
Total row count is 819
Total row count is 936
Total row count is 1053


| | Event_Name | Venue | Event_Date | Number_of_Attendees |
|---|---|---|---|---|
| 0 | Trïbel: ✺ Alice Iguchi, Medina, Cïaga ✺ Nov 7 | Downtime | 2019-11-07 | 5 |
| 1 | Deep Root Sessions At Public Arts with OFFAIAH | Public Arts | 2019-11-07 | 7 |
| 2 | The Funky Seshwa with Cedar Sound Workshop, Wi... | Good Room | 2019-11-07 | 2 |
| 3 | Banana e Paisano with Alexi Delano | TBA Brooklyn | 2019-11-07 | 1 |
| 4 | Darker Than Wax with Paurro | Le Bain | 2019-11-07 | 7 |


1053
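
The task for this section asks for the events sorted by number of attendees, with ties broken by event date. A minimal sketch of that sort on a few sample rows follows. It assumes the most-attended events should come first and that earlier dates come first among ties, since the lab does not state a direction. `sample_df` is a hypothetical stand-in for the combined DataFrame, with rows taken from the output shown earlier.

```python
import pandas as pd

# Sample rows in the shape returned by scrape_events; counts are strings here
# to mimic text scraped from the page
sample_df = pd.DataFrame({
    'Event_Name': ['Exile: Ron Jackson / A.M.D',
                   'Darker Than Wax with Paurro',
                   'Banana e Paisano with Alexi Delano',
                   'House of Vogue with MikeQ & Qween Beat'],
    'Venue': ['Bossa Nova Civic Club', 'Le Bain', 'TBA Brooklyn', 'House Of Yes'],
    'Event_Date': ['2019-11-13', '2019-11-07', '2019-11-07', '2019-11-13'],
    'Number_of_Attendees': ['1', '7', '1', '2'],
})

# Convert to numeric and datetime types so the sort is by value, not by text
sample_df['Number_of_Attendees'] = pd.to_numeric(sample_df['Number_of_Attendees'])
sample_df['Event_Date'] = pd.to_datetime(sample_df['Event_Date'])

# Assumed ordering: attendees descending, then event date ascending
sorted_df = sample_df.sort_values(
    by=['Number_of_Attendees', 'Event_Date'],
    ascending=[False, True],
).reset_index(drop=True)

print(sorted_df[['Venue', 'Event_Date', 'Number_of_Attendees']])
```

Converting the count column to a numeric type first matters because string values would sort as text, placing `'10'` before `'2'`.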


## Summary 

Congratulations! In this lab, you successfully developed a pipeline to scrape a website for concert event information!