# Scraping Concerts - Lab

## Introduction

Now that you've seen how to scrape a simple website, it's time to practice those skills on a full-fledged site.
In this lab, you'll scrape event listings from a music website: https://www.residentadvisor.net.

## Objectives

You will be able to:
* Scrape events from a website
* Follow links to those events to retrieve further information
* Clean and store scraped data

## View the Website

For this lab, you'll be scraping the https://www.residentadvisor.net website. Start by navigating to the [events page](https://www.residentadvisor.net/events) in your browser.

<img src="images/ra.png">

In [1]:
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
import pandas as pd

In [2]:
def prepare_soup(url):
    """Download `url` and return its parsed HTML as a BeautifulSoup object."""
    html_page = requests.get(url)
    soup = BeautifulSoup(html_page.content, "html.parser")
    return soup

In [3]:
def get_attendees(soup):
    """Return the attendee count shown on an individual event page, as text."""
    attendees = soup.find(id="MembersFavouriteCount").text.strip()
    return attendees

In [4]:
def attendence(sub_container):
    """Return an array with the number of attendees for each event in `sub_container`."""
    attendence_array = np.zeros(len(sub_container))
    for i, event in enumerate(sub_container):
        prime_attendence = event.find("p", {"class": "attending"})
        if prime_attendence:
            # The listing itself shows the attendee count
            attendence_array[i] = prime_attendence.find("span").text
        else:
            # Otherwise follow the event link and read the count from the event page
            link = event.find("a").get("href")
            temp_url = "https://www.residentadvisor.net/events" + link[7:]
            temp_soup = prepare_soup(temp_url)
            attendence_array[i] = get_attendees(temp_soup)
    return attendence_array

In [5]:
def scrape_events(events_page_url):
    """Scrape one events page into a DataFrame with one row per event."""
    # Setup: locate the individual event boxes inside the listing
    soup = prepare_soup(events_page_url)
    container = soup.find(id="event-listing")
    sub_container = container.select(".bbox")

    # Extract one list per column
    event_name = [event.find("a").text for event in sub_container]
    # Drop the 3-character prefix that precedes the venue name
    venue = [event.find("span").text[3:] for event in sub_container]
    event_date = [event.parent.find("time").text for event in sub_container]
    attendees = attendence(sub_container)
    infos = [event_name, venue, event_date, attendees]
    df = pd.DataFrame(infos).transpose()
    df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
    return df

In [None]:
#final_df = scrape_events("https://www.residentadvisor.net/events")

In [None]:
#final_df

## Open the Inspect Element Feature

Next, open your browser's Inspect Element tool to preview the HTML underlying the page.
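
The same kind of outline the inspector shows can also be produced in code. The sketch below is only an illustration: `SAMPLE_HTML` is hypothetical markup modeled on the ids and classes this lab's code looks for (`event-listing`, `bbox`, `attending`), and `OutlineParser` is a made-up helper built on the standard library's `html.parser`, not something the site or BeautifulSoup provides.

```python
from html.parser import HTMLParser

# Hypothetical markup, modeled on the selectors used in this lab's code;
# the real page will differ in detail.
SAMPLE_HTML = """
<div id="event-listing">
  <article class="bbox">
    <a href="/events/1234">Sample Night</a>
    <span>at Sample Venue</span>
    <p class="attending"><span>42</span> attending</p>
  </article>
</div>
"""


class OutlineParser(HTMLParser):
    """Collect an indented outline of tags with their id/class attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if "id" in attrs:
            label += f"#{attrs['id']}"
        if "class" in attrs:
            label += f".{attrs['class']}"
        # Indent by nesting depth, two spaces per level
        self.lines.append("  " * self.depth + label)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


outline_parser = OutlineParser()
outline_parser.feed(SAMPLE_HTML)
outline = outline_parser.lines
print("\n".join(outline))
```

Reading the outline top-down shows which container to search first and which nested tags hold the values you want.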

### Backup csv

In [None]:

#infos = [event_name,venue,event_date,attendees]
#df = pd.DataFrame(infos).transpose()
#df.columns = ["Event_Name", "Venue", "Event_Date", "Number_of_Attendees"]
#df.to_csv("Scrape1.backup")

In [None]:
#df = pd.read_csv("Scrape1.backup").drop("Unnamed: 0",axis=1)

## Write a Function to Scrape All of the Events on the Given Events Page

The function should return a Pandas DataFrame with columns for the Event_Name, Venue, Event_Date and Number_of_Attendees.
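
Scraped counts arrive as text, so they need converting before they go into the Number_of_Attendees column. The following is a minimal sketch, assuming counts are plain integers that may carry whitespace, thousands separators, or trailing words; `parse_attendees` is a hypothetical helper name, not part of the lab's starter code.

```python
import re


def parse_attendees(raw):
    """Return the attendee count in `raw` as an int, or None if it has no digits.

    Assumption: the text holds a plain integer count. Everything that is not
    a digit is discarded, so abbreviated values such as "1.5k" are NOT handled.
    """
    digits = re.sub(r"[^\d]", "", raw)
    return int(digits) if digits else None


print(parse_attendees(" 1,204 "))
print(parse_attendees("42 attending"))
print(parse_attendees(""))
```

Returning `None` for missing counts keeps bad rows visible instead of silently recording them as zero.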

## Write a Function to Retrieve the URL for the Next Page

In [None]:
#url = "https://www.residentadvisor.net/events"
#next_page("https://www.residentadvisor.net/events/us/newyork/week/2019-05-24")

In [6]:
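def next_page(url):
    """Return the URL of the page linked from the "Next" control on `url`."""
    soup = prepare_soup(url)
    # The [7:] slice drops the leading "/events" of the href, which the base URL below already includes
    link = soup.find(id="previous-next").find("a", {"ga-event-action": "Next "}).get("href")[7:]
    next_page_url = "https://www.residentadvisor.net/events" + link
    return next_page_url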
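
Slicing a fixed number of characters off the link works only while every href starts with the same prefix. As an alternative sketch, the standard library's `urllib.parse.urljoin` resolves a relative href against the site root regardless of its prefix. The href below is a hypothetical example, and `absolute_url` is an illustrative helper name.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.residentadvisor.net"


def absolute_url(href, base=BASE_URL):
    """Resolve a site-relative href such as "/events/..." against the site root."""
    return urljoin(base, href)


# Hypothetical href, shaped like the week-listing URLs used in this lab
example_href = "/events/us/newyork/week/2019-05-31"
next_url = absolute_url(example_href)
print(next_url)

# For hrefs that do start with "/events", this matches the slicing approach
sliced_url = "https://www.residentadvisor.net/events" + example_href[7:]
print(next_url == sliced_url)
```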

## Scrape the Next 1000 Events for Your Area

Display the data sorted by the number of attendees. If there is a tie for the number attending, sort by event date.
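
The ordering rule can be checked on a few made-up rows before running it on real data. This sketch uses plain Python and invented events, and it assumes that the largest attendee counts come first and that earlier dates win ties (the lab text does not state a direction for the tie-break).

```python
from datetime import date

# Invented sample rows, for illustration only
events = [
    {"Event_Name": "Warehouse Party", "Event_Date": date(2019, 5, 31), "Number_of_Attendees": 250},
    {"Event_Name": "Rooftop Set", "Event_Date": date(2019, 5, 24), "Number_of_Attendees": 250},
    {"Event_Name": "Boat Cruise", "Event_Date": date(2019, 5, 25), "Number_of_Attendees": 400},
    {"Event_Name": "Basement Night", "Event_Date": date(2019, 5, 26), "Number_of_Attendees": 90},
]

# Negating the count sorts it in descending order; the date breaks ties in ascending order
ranked = sorted(events, key=lambda e: (-e["Number_of_Attendees"], e["Event_Date"]))
ranked_names = [e["Event_Name"] for e in ranked]
print(ranked_names)
```

With a DataFrame, the same ordering can be requested with `sort_values(["Number_of_Attendees", "Event_Date"], ascending=[False, True])`, provided the date column has been parsed into real dates rather than left as text.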

In [10]:
def scraper(url, num_entries):
    """Scrape successive event pages until `num_entries` events are collected."""
    df = pd.DataFrame()
    while len(df) < num_entries:
        print(len(df))
        len1 = len(df)
        df = pd.concat([df, scrape_events(url)])
        if len(df) == len1:
            print("no more entries found")
            return df
        # Checkpoint the rows collected so far, so a failed request doesn't lose them
        name = f"Event_Scraper_{len(df)}.csv"
        df.to_csv(name)
        print(f"Stored {name}")
        url = next_page(url)
    return df

In [21]:
#scraper("https://www.residentadvisor.net/events",1000)
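
The stopping logic of a page-following loop like `scraper` can be tried out without touching the network. The sketch below is a simplified stand-in, not the lab's solution: `collect_events`, `FAKE_PAGES`, and the two callables are hypothetical names, with the callables playing the roles of `scrape_events` and `next_page`.

```python
def collect_events(start_url, num_entries, fetch_page, get_next_url):
    """Follow pages until `num_entries` rows are collected or a page adds none."""
    rows = []
    url = start_url
    while len(rows) < num_entries:
        page_rows = fetch_page(url)
        if not page_rows:
            # An empty page means there is nothing left to collect
            break
        rows.extend(page_rows)
        url = get_next_url(url)
    return rows


# Fake in-memory "site": each key is a page URL, each value is (rows, next URL)
FAKE_PAGES = {
    "page-1": (["a", "b", "c"], "page-2"),
    "page-2": (["d", "e"], "page-3"),
    "page-3": ([], "page-3"),
}


def fake_fetch(url):
    return FAKE_PAGES[url][0]


def fake_next(url):
    return FAKE_PAGES[url][1]


enough_rows = collect_events("page-1", 4, fake_fetch, fake_next)
ran_out = collect_events("page-1", 100, fake_fetch, fake_next)
print(enough_rows)
print(ran_out)
```

Passing the fetch and next-page steps in as arguments keeps the loop testable: the first call stops once the target is reached, the second stops because the last page is empty.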

In [13]:
final_df = pd.read_csv("Event_Scraper_560.csv")

In [20]:
final_df = final_df.drop("Unnamed: 0", axis=1).sort_values(["Number_of_Attendees","Event_Date"], ascending = False)

In [25]:
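# Select the numeric column before summing so text columns are not aggregated
final_df.groupby("Venue")[["Number_of_Attendees"]].sum().sort_values("Number_of_Attendees", ascending=False)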

| Venue | Number_of_Attendees |
|---|---|
| Brooklyn Mirage | 4295.0 |
| Knockdown Center | 1594.0 |
| Circle Line Cruises | 1067.0 |
| BASEMENT | 1029.0 |
| Elsewhere | 953.0 |
| Nowadays | 724.0 |
| Sugar Hill Disco | 515.0 |
| 99 Scott Ave | 453.0 |
| TBA - New York | 437.0 |
| Good Room | 346.0 |


## Summary 

Congratulations! In this lab, you successfully scraped a website for concert event information!