# Web Scraping Lab

## Questions

Resident Advisor is an events listing website for electronic music.

Go to www.residentadvisor.net/events. This is the URL we'll start with for this lab. For question 1, use this URL as is. For the next two questions, append the country and region in the format http://www.residentadvisor.net/events/country/region, e.g. us/losangeles. Be sure to explore the web pages both in the browser and in the HTML file. You'll need both to really understand what's going on.

1. Which venues are hosting events this week?
2. Make a function which returns the events this week given country and region (this will take two arguments)
    - return the event name, link, and list of artists
    - the function returns a list of entries of the form ['event name', 'www.linkaddress.com', ['artist1','artist2','artist3']]
3. Create a function which returns the number of users attending each event this week, given country and region, then plot a histogram
4. Bonus


### Question 1 - Which venues are hosting events this week?

In [1]:
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.residentadvisor.net/events')
c = r.content
soup = BeautifulSoup(c, 'html.parser')

In [2]:
events = soup.find_all('h1', {'class': 'event-title'})

In [3]:
#remove only the first 'at ' (the prefix), so venue names that contain 'at ' stay intact
venues = [a.find('span').get_text().replace('at ', '', 1) for a in events]
venues

['Valley Arts Community Gallery']

Your solution output should look like: '101bklyn', '291 Hooper St', '99 Scott Ave','Alphaville', 'Analog Bkny'...
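
The expected output above appears to be de-duplicated and sorted alphabetically, which the list comprehension alone does not do. Below is a minimal sketch of that last step, assuming a case-insensitive sort is what is wanted. `unique_sorted_venues` is a hypothetical helper name, not part of the lab, and the sample list simply reuses the venue names quoted above.

```python
def unique_sorted_venues(venues):
    """Return venue names without duplicates, sorted case-insensitively."""
    # Strip stray whitespace and drop empty strings before de-duplicating.
    cleaned = {v.strip() for v in venues if v and v.strip()}
    return sorted(cleaned, key=str.casefold)


# Sample input built from the venue names in the expected output (assumed order).
sample = ['Alphaville', '99 Scott Ave', '101bklyn', 'Analog Bkny',
          '291 Hooper St', 'Alphaville']
print(unique_sorted_venues(sample))
# ['101bklyn', '291 Hooper St', '99 Scott Ave', 'Alphaville', 'Analog Bkny']
```

Passing the `venues` list from the cell above through this helper should give output in the same shape as the expected solution.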

### Question 2 - Write a function which returns the events this week given country and region.

In [6]:
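def cook_soup(country, region):
    #normalize inputs: lowercase, no spaces (e.g. 'San Francisco' -> 'sanfrancisco')
    country = country.lower().replace(' ','')
    region = region.lower().replace(' ','')
    #build url for events in the given country and region
    url = 'https://www.residentadvisor.net/events/'+country+'/'+region
    #get html from webpage; time out rather than hang, and fail early on HTTP errors
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    c = r.content
    soup = BeautifulSoup(c, 'html.parser')
    return soup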

In [7]:
def find_events(country, region):

    soup = cook_soup(country, region)
    
    #instantiate full list.  Full list will be a list of lists.  Each event is a list [event name & venue, url, lineup].
    full_list = []
    #access each event in soup
    for event in soup.find_all(class_='bbox'):
        #instantiate event list within for loop so it will reset with each event
        event_list = []
        #keep only elements that contain an 'h1' tag: class_='bbox' matches any element whose
        #class list includes 'bbox', not only elements whose class is exactly "bbox"
        if event.find('h1'):
            #append the text from 'h1', the event title and location, to the event_list 
            event_list.append(event.find('h1').get_text())
            #append the url to event_list
            event_list.append('https://www.residentadvisor.net'+event.find('h1').find('a').get('href'))
            #some events don't have a lineup, so check for lineup 
            if event.find('div'):
                #if a lineup is listed, append the lineup to event_list
                event_list.append(event.find('div').get_text())
            else:
                #if no lineup, append "no lineup" to event_list
                event_list.append('No Lineup')
            #append event_list to full_list
            full_list.append(event_list)

    return full_list
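
The `if event.find('h1')` guard exists because of how BeautifulSoup matches `class_`: a tag matches when `'bbox'` is any one of its classes. The sketch below shows the difference on made-up HTML. The markup and the `sidebar` class are assumptions for illustration only, not Resident Advisor's real page structure.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page structure may differ.
html = """
<article class="bbox"><h1><a href="/events/1">Night One</a></h1></article>
<div class="bbox sidebar"><p>Advert</p></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# class_='bbox' matches every tag whose class list contains 'bbox'.
loose = soup.find_all(class_='bbox')

# A function filter can require that 'bbox' is the only class on the tag.
exact = soup.find_all(lambda tag: tag.get('class') == ['bbox'])

print(len(loose), len(exact))
# 2 1
```

Either approach can work here; checking for the `h1` has the advantage of not depending on which extra classes the page happens to use.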


In [8]:
# you should be able to output something like this
find_events('us','sanfrancisco')[0]

['Foals DJ Set at Halcyon',
 'https://www.residentadvisor.net/events/1230341',
 'Foals']
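
Question 2 asks for a list of artists, while the sample output above stores the lineup as a single string. One way to bridge that is sketched below, assuming the lineup text separates artists with commas. That is an assumption about the page, not something the lab confirms, and `split_lineup` is a hypothetical helper name.

```python
def split_lineup(lineup_text):
    """Turn a lineup string into a list of artist names."""
    # Treat the placeholder used by find_events as an empty lineup.
    if not lineup_text or lineup_text.strip() == 'No Lineup':
        return []
    # Assumes artists are comma-separated; drop empty pieces.
    return [name.strip() for name in lineup_text.split(',') if name.strip()]


print(split_lineup('Foals'))                         # ['Foals']
print(split_lineup('Artist A, Artist B ,Artist C'))  # ['Artist A', 'Artist B', 'Artist C']
print(split_lineup('No Lineup'))                     # []
```

If the real page wraps each artist in its own tag, collecting the text of those tags would likely be more reliable than splitting a string.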

### Question 3 - Create a function which returns the number of users attending each event this week, given country and region.  Then plot a histogram.

In [8]:
def users_attending(country, region):
    
    soup = cook_soup(country, region)

    #instantiate attendance. This will be a list of integers, one attendance count per event
    attendance = []
    
    #access each event in soup
    for event in soup.find_all(class_='bbox'):
        #keep only elements that contain an 'h1' tag: class_='bbox' matches any element whose
        #class list includes 'bbox', not only elements whose class is exactly "bbox"
        if event.find('h1'):
            #not all events have attendance listed, so find attendance
            if event.find('p'):
                #if attendance is listed, append it to the attendance list
                attendance.append(int(event.find('p').get_text().replace(' Attending','')))
    return attendance

In [11]:
# you should be able to output something like this
users_attending('us','newyork')[:10]

[8, 5, 4, 3, 49, 18, 10, 3, 2, 11]
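
The `int(...)` call in `users_attending` raises a `ValueError` if the first `<p>` in an event holds anything other than a plain number followed by ` Attending`. A more forgiving parser is sketched below, assuming counts may include thousands separators. `parse_attendance` is a hypothetical helper, and the sample strings are invented.

```python
import re


def parse_attendance(text):
    """Return the first number found in text as an int, or None if there is none."""
    # Accept digits with optional thousands separators, e.g. '1,024 Attending'.
    match = re.search(r'\d[\d,]*', text or '')
    if match is None:
        return None
    return int(match.group().replace(',', ''))


print(parse_attendance('49 Attending'))     # 49
print(parse_attendance('1,024 Attending'))  # 1024
print(parse_attendance('Attending'))        # None
```

Inside the loop, events where this returns `None` could simply be skipped instead of stopping the whole scrape.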

In [None]:
#now use the function to make a histogram
import plotly.offline as offline
import plotly.graph_objs as go

offline.init_notebook_mode()

offline.iplot([go.Histogram(x = users_attending('jp','tokyo'))])

## Bonus: Build object relations between artists, venues, and events with SQLAlchemy!
Think about what each table should include - URLs, dates, etc.
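
As a starting point, here is one possible sketch of the three tables, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The table names, column choices, and the many-to-many link between events and artists are illustrative assumptions, not a prescribed solution. Dates and attendance counts are left out for you to add.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

# Association table: an event can have many artists, and an artist many events.
event_artists = Table(
    'event_artists', Base.metadata,
    Column('event_id', ForeignKey('events.id'), primary_key=True),
    Column('artist_id', ForeignKey('artists.id'), primary_key=True),
)


class Venue(Base):
    __tablename__ = 'venues'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    events = relationship('Event', back_populates='venue')


class Artist(Base):
    __tablename__ = 'artists'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    events = relationship('Event', secondary=event_artists,
                          back_populates='artists')


class Event(Base):
    __tablename__ = 'events'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    url = Column(String)
    venue_id = Column(Integer, ForeignKey('venues.id'))
    venue = relationship('Venue', back_populates='events')
    artists = relationship('Artist', secondary=event_artists,
                           back_populates='events')


# In-memory database so the sketch leaves no file behind.
engine = create_engine('sqlite:///:memory:')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Sample row taken from the Question 2 example output.
halcyon = Venue(name='Halcyon')
event = Event(name='Foals DJ Set',
              url='https://www.residentadvisor.net/events/1230341',
              venue=halcyon, artists=[Artist(name='Foals')])
session.add(event)  # the venue and artist are saved along with the event
session.commit()

print(halcyon.events[0].name, [a.name for a in event.artists])
```

The rows returned by `find_events` could be loaded into these tables in a loop, reusing existing `Venue` and `Artist` rows when a name has already been seen.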