# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills again on a full-fledged site!
In this lab, you'll practice your scraping skills on a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the events page [here](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

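```python
# Load the https://www.residentadvisor.net/events page in your browser, then import the scraping tools.
from bs4 import BeautifulSoup
import requests
import re
import numpy as np  # np.nan marks missing attendee counts later on
import pandas as pd
```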

## Open the Inspect Element Feature

Next, open the inspect element feature from your web browser in order to preview the underlying HTML associated with the page.

```python
# Open the inspect element feature in your browser
```

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a pandas DataFrame with the columns `Event_Name`, `Venue`, `Event_Date`, and `Number_of_Attendees`.

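```python
# Fetch the Seattle events page and parse its HTML.
response = requests.get("https://www.residentadvisor.net/events/us/seattle")
response.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(response.content, 'html.parser')
```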

```python
event_listings = soup.find('div', id="event-listing")
event_listings
```

```
<div class="fl col4" id="event-listing">
<ul class="list" id="items">
<li><p class="eventDate date"><a href="/events.aspx?ai=46&amp;v=day&amp;mn=6&amp;yr=2019&amp;dy=13"><span>Thu, 13 Jun 2019 /</span></a></p></li><li class=""><article class="event-item clearfix" itemscope="" itemtype="http://data-vocabulary.org/Event"><span style="display:none;"><time datetime="2019-06-13T00:00" itemprop="startDate">2019-06-13T00:00</time></span><a href="/events/1269543"><img height="76" src="/images/events/flyer/2019/6/us-0613-1269543-list.jpg" width="152"/></a><div class="bbox"><h1 class="event-title" itemprop="summary"><a href="/events/1269543" itemprop="url" title="Event details of Field Trip 76: Yolanda Be Cool">Field Trip 76: Yolanda Be Cool</a> <span>at <a href="/club.aspx?id=68203">Q Nightclub</a></span></h1><div class="grey event-lineup">Yolanda Be Cool</div><p class="attending"><span>1</span> Attending</p></div></article></li><li><p class="eventDate date"><a href="/events.aspx?ai=46&amp;v=da
```
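The output shows that the listing is a flat sequence of `<li>` items: a date item (`p.eventDate`) followed by the event items for that day. A minimal offline sketch of the single-pass "carry the current date" approach follows, assuming the class names in this 2019 snapshot (the live site may have changed). It parses a hand-written snippet modeled on that output, so it runs without a network request; the helper name `parse_event_listing` is hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hand-written snippet modeled on the 2019 output above (not live data).
SAMPLE_HTML = """
<div id="event-listing"><ul id="items">
  <li><p class="eventDate date"><a href="#"><span>Thu, 13 Jun 2019 /</span></a></p></li>
  <li><article class="event-item">
    <h1 class="event-title"><a href="/events/1269543">Field Trip 76: Yolanda Be Cool</a>
      <span>at <a href="/club.aspx?id=68203">Q Nightclub</a></span></h1>
    <p class="attending"><span>1</span> Attending</p>
  </article></li>
</ul></div>
"""

def parse_event_listing(soup):
    """Hypothetical helper: walk the <li> items once, carrying the latest date."""
    rows = []
    current_date = None
    listing = soup.find("div", id="event-listing")
    for entry in listing.find_all("li"):
        date_tag = entry.find("p", class_="eventDate")
        title_tag = entry.find("h1", class_="event-title")
        if date_tag:
            # "Thu, 13 Jun 2019 /" -> "Thu, 13 Jun 2019"
            current_date = date_tag.get_text(strip=True).rstrip(" /")
        elif title_tag:
            # First link in the heading is the event; the link inside <span> is the venue.
            name = title_tag.find("a").get_text(strip=True)
            venue_link = title_tag.select_one("span a")
            venue = venue_link.get_text(strip=True) if venue_link else None
            attending = entry.find("p", class_="attending")
            match = re.search(r"\d+", attending.get_text()) if attending else None
            count = int(match.group()) if match else None
            rows.append((name, venue, current_date, count))
    return rows

rows = parse_event_listing(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(rows)
```

Reading the venue from the `<span>` link, rather than splitting the heading text on `' at '`, avoids mis-splitting event names that themselves contain " at ".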

```python
event_listings1 = soup.find('div', class_= "strip slide small")
event_listings1
```

```
<div class="strip slide small" data-type="events" id="events-listing">
<ul class="list small clearfix popular" style="padding: 0;">
<li class="">
<article class="highlight-top">
<p>Sat, 15 Jun 2019</p>
<a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1273647"><img class="nohide" src="/images/events/flyer/2019/6/us-0615-1273647-list.jpg"/></a>
<p class="counter nohide">
<span>3</span> attending
</p>
<a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1273647">
<h1>
Depth: Truncate &amp; Shyboi with Sharlese
</h1>
</a>
<p class="copy nohide">
<a href="\club.aspx?id=88733">Kremwerk</a>
</p>
</article>
</li><li class="">
<article class="highlight-top">
<p>Sun, 16 Jun 2019</p>
<a ga-event-action="popular-events" ga-event-category="events-page" ga-on="click" href="/events/1260006"><img class="nohide" src="/images/events/flyer/2019/6/us-0616-1260006-list.jpg"/></a>
<p class="counter nohide">
<span>6</sp
```
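The "popular events" strip uses a different layout: each `article.highlight-top` card holds the date in its first `<p>`, the count in `p.counter`, the name in `<h1>`, and the venue in `p.copy`. A minimal sketch that reads one such card, assuming those class names from the 2019 output above; the snippet is hand-written from that output and `parse_popular_card` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hand-written card modeled on the 2019 output above (not live data).
CARD_HTML = """
<article class="highlight-top">
  <p>Sat, 15 Jun 2019</p>
  <a href="/events/1273647"><img class="nohide" src="flyer.jpg"/></a>
  <p class="counter nohide"><span>3</span> attending</p>
  <a href="/events/1273647"><h1>
    Depth: Truncate &amp; Shyboi with Sharlese
  </h1></a>
  <p class="copy nohide"><a href="/club.aspx?id=88733">Kremwerk</a></p>
</article>
"""

def parse_popular_card(article):
    """Hypothetical helper: pull (name, venue, date, attending) from one card."""
    date = article.find("p").get_text(strip=True)  # first <p> is the date
    count = int(article.select_one("p.counter span").get_text(strip=True))
    name = article.find("h1").get_text(strip=True)  # &amp; is decoded to "&"
    venue = article.select_one("p.copy a").get_text(strip=True)
    return name, venue, date, count

card = BeautifulSoup(CARD_HTML, "html.parser").find("article", class_="highlight-top")
print(parse_popular_card(card))
```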

```python
entries = event_listings.find_all('li')
print(len(entries), entries[0])
```

```
9 <li><p class="eventDate date"><a href="/events.aspx?ai=46&amp;v=day&amp;mn=6&amp;yr=2019&amp;dy=13"><span>Thu, 13 Jun 2019 /</span></a></p></li>
```
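Each event title links to a relative path such as `/events/1269543`. To follow those links for further information, they first need to be resolved against the site root, which `urllib.parse.urljoin` does. A minimal sketch, assuming the `h1.event-title > a` structure from the listing output; `event_links` is a hypothetical helper, and the detail pages themselves are not parsed here.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.residentadvisor.net"

# Hand-written fragment modeled on the listing output above (not live data).
LISTING_HTML = """
<h1 class="event-title"><a href="/events/1269543">Field Trip 76: Yolanda Be Cool</a>
  <span>at <a href="/club.aspx?id=68203">Q Nightclub</a></span></h1>
"""

def event_links(soup, base_url=BASE_URL):
    """Hypothetical helper: absolute URLs of every event-title link."""
    # The direct-child selector skips the venue link nested inside <span>.
    return [
        urljoin(base_url, a["href"])
        for a in soup.select("h1.event-title > a[href]")
    ]

links = event_links(BeautifulSoup(LISTING_HTML, "html.parser"))
print(links)  # ['https://www.residentadvisor.net/events/1269543']
```

Each resulting URL can then be fetched with `requests.get` and parsed with `BeautifulSoup`, exactly like the listing page.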


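```python
rows = []
cur_date = None  # set by the first date heading in the listing
for entry in entries:
    # Is it a date? If so, remember it for the events that follow.
    date = entry.find('p', class_="eventDate date")
    event = entry.find('h1', class_="event-title")
    if event:
        # Heading text reads "<event name> at <venue>"; split on the last " at ".
        details = event.text.rsplit(' at ', 1)
        event_name = details[0].strip()
        venue = details[1].strip() if len(details) > 1 else None
        attending = entry.find('p', class_="attending")
        match = re.match(r"\d+", attending.text.strip()) if attending else None
        n_attendees = int(match.group()) if match else np.nan
        rows.append([event_name, venue, cur_date, n_attendees])
    elif date:
        cur_date = date.text.strip(' /')  # "Thu, 13 Jun 2019 /" -> "Thu, 13 Jun 2019"
df = pd.DataFrame(rows, columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"])
df.head()
```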
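To clean and store the scraped data, the date strings need to become real datetimes (the later sort by date is otherwise alphabetical), and the table needs to be written somewhere. A minimal sketch, assuming the four column names from the task description; `clean_events` is a hypothetical helper, and the two sample rows reuse events from the outputs above. Storage uses `DataFrame.to_csv`, here into an in-memory buffer.

```python
import io
import pandas as pd

def clean_events(df):
    """Hypothetical helper: tidy dates and counts in a scraped events table."""
    df = df.copy()
    # "Thu, 13 Jun 2019 /" -> "Thu, 13 Jun 2019" -> Timestamp
    df["Event_Date"] = pd.to_datetime(
        df["Event_Date"].str.strip(" /"), format="%a, %d %b %Y"
    )
    # Counts may be missing; the nullable Int64 dtype keeps them as <NA>.
    df["Number_of_Attendees"] = df["Number_of_Attendees"].astype("Int64")
    return df

raw = pd.DataFrame(
    [
        ["Field Trip 76: Yolanda Be Cool", "Q Nightclub", "Thu, 13 Jun 2019 /", 1],
        ["Depth: Truncate & Shyboi with Sharlese", "Kremwerk", "Sat, 15 Jun 2019", 3],
    ],
    columns=["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"],
)
clean = clean_events(raw)

# Store the result; pass a filename such as "events.csv" instead of a buffer.
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
print(clean.dtypes)
```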

```python
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame of name, venue, date and attendance."""
    columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    response = requests.get(events_page_url)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    # Use this page's own listing, not the global event_listings from the Seattle cell.
    listing = soup.find('div', id="event-listing")
    if listing is None:
        return pd.DataFrame(columns=columns)
    rows = []
    cur_date = None
    for entry in listing.find_all('li'):
        # Is it a date? If so, remember it for the events that follow.
        date = entry.find('p', class_="eventDate date")
        event = entry.find('h1', class_="event-title")
        if event:
            details = event.text.rsplit(' at ', 1)
            event_name = details[0].strip()
            venue = details[1].strip() if len(details) > 1 else None
            attending = entry.find('p', class_="attending")
            match = re.match(r"\d+", attending.text.strip()) if attending else None
            n_attendees = int(match.group()) if match else np.nan
            rows.append([event_name, venue, cur_date, n_attendees])
        elif date:
            cur_date = date.text.strip(' /')
    return pd.DataFrame(rows, columns=columns)
```

## Write a Function to Retrieve the URL for the Next Page

```python
def next_page(url):
    # Your code here
    return next_page_url
```
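One way to approach `next_page`, sketched under assumptions: the markup of the "next" link is not visible in the outputs in this lab, so this hypothetical version looks for an `<a>` marked `rel="next"` and otherwise for one whose text contains "next", then resolves it with `urljoin`. It takes the page URL and its HTML so it can be tried offline; inspect the live page to find the real selector.

```python
import re
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def find_next_page_url(page_url, html):
    """Hypothetical helper: absolute URL of the 'next' link, or None if absent.

    ASSUMPTION: the link is marked rel="next" or its text contains "next".
    """
    soup = BeautifulSoup(html, "html.parser")
    # rel is a multi-valued attribute in BeautifulSoup, so this matches rel="next nofollow" too.
    link = soup.find("a", rel="next", href=True)
    if link is None:
        # Fall back to link text, case-insensitively.
        link = soup.find("a", string=re.compile(r"next", re.I), href=True)
    return urljoin(page_url, link["href"]) if link else None
```

To match the stub's `next_page(url)` signature, the wrapper only needs to call `requests.get(url).text` and pass the result in along with `url`.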

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.

```python
# Your code here
```
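A minimal sketch of the paging loop plus the required sort, assuming a function that scrapes one page into the four-column DataFrame (such as `scrape_events`) and one that returns the next page's URL or `None` (such as a completed `next_page`). Both are passed in as parameters so the loop can be tried with stand-in functions; `scrape_n_events` is a hypothetical name. The sort direction (most attended first, ties broken by earliest date) is an assumption, and `Event_Date` should be converted with `pd.to_datetime` first, otherwise ties are broken alphabetically.

```python
import pandas as pd

COLUMNS = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]

def scrape_n_events(start_url, scrape_page, get_next_url, n_events=1000):
    """Hypothetical helper: collect pages until n_events rows or no next page."""
    frames, total, url = [], 0, start_url
    seen = set()  # guard against a "next" link that loops back
    while url and total < n_events and url not in seen:
        seen.add(url)
        page = scrape_page(url)
        frames.append(page)
        total += len(page)
        url = get_next_url(url)
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    events = pd.concat(frames, ignore_index=True).head(n_events)
    # Most attended first; ties broken by the earliest event date.
    return events.sort_values(
        ["Number_of_Attendees", "Event_Date"], ascending=[False, True]
    ).reset_index(drop=True)
```

With both pieces finished, the call would look like `scrape_n_events("https://www.residentadvisor.net/events/us/seattle", scrape_events, next_page)`.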

## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!