Web Scraping the Ryman Calendar
In this exercise, your objective is to use BeautifulSoup in order to obtain a dataset of upcoming events at the Ryman. This information is available at https://ryman.com/events/, but you will take the contents of this website and convert it into a pandas DataFrame.

The website splits the events across multiple pages, but start by just working on the first page. Later on in the exercise, you'll take what you've done for the first page and apply it across other pages.

1) Start by using either the inspector or by viewing the page source. Can you identify a tag that might be helpful for finding the names of all performers? For now, just worry about the headliner and don't worry about the opener. (Eg. For Vince Gill, featuring Wendy Moten, we only care about Vince Gill.) Make use of this to create a list containing just the names of each headliner.

In [24]:
import requests
from bs4 import BeautifulSoup as BS
from IPython.core.display import HTML
import pandas as pd

In [121]:
URL = 'https://ryman.com/events/'

response = requests.get(URL)

In [122]:
soup = BS(response.text)

In [123]:
soup.find('h2')

<h2 class="tribe-events-visuallyhidden">Events Search and Views Navigation</h2>

In [124]:
h2tag = soup.findAll('h2')
h2tag

[<h2 class="tribe-events-visuallyhidden">Events Search and Views Navigation</h2>,
 <h2 class="tribe-events-list-event-title">
 <a class="tribe-event-url" href="https://ryman.com/event/mt-joy-102922/" rel="bookmark" title="Mt. Joy">
 		Mt. Joy	</a>
 </h2>,
 <h2 class="tribe-events-list-event-title">
 <a class="tribe-event-url" href="https://ryman.com/event/sidewalk-sessions-103022/" rel="bookmark" title="Ryman Sidewalk Sessions">
 		Ryman Sidewalk Sessions	</a>
 </h2>,
 <h2 class="tribe-events-list-event-title">
 <a class="tribe-event-url" href="https://ryman.com/event/marcus-mumford/" rel="bookmark" title="Marcus Mumford">
 		Marcus Mumford	</a>
 </h2>,
 <h2 class="tribe-events-list-event-title">
 <a class="tribe-event-url" href="https://ryman.com/event/puscifer/" rel="bookmark" title="Puscifer">
 		Puscifer	</a>
 </h2>,
 <h2 class="tribe-events-list-event-title">
 <a class="tribe-event-url" href="https://ryman.com/event/gipsy-kings/" rel="bookmark" title="Gipsy Kings featuring Nicolas

In [125]:
h2tag_headliner = [x.get('title') for x in h2tag]
h2tag_headliner

[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

In [126]:
soup.findAll('h2')[1].find('a').get('title')

'Mt. Joy'

In [127]:
h2atag = [x.find('a') for x in h2tag]
h2atag

[None,
 <a class="tribe-event-url" href="https://ryman.com/event/mt-joy-102922/" rel="bookmark" title="Mt. Joy">
 		Mt. Joy	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/sidewalk-sessions-103022/" rel="bookmark" title="Ryman Sidewalk Sessions">
 		Ryman Sidewalk Sessions	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/marcus-mumford/" rel="bookmark" title="Marcus Mumford">
 		Marcus Mumford	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/puscifer/" rel="bookmark" title="Puscifer">
 		Puscifer	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/gipsy-kings/" rel="bookmark" title="Gipsy Kings featuring Nicolas Reyes">
 		Gipsy Kings featuring Nicolas Reyes	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/sidewalk-sessions-110222/" rel="bookmark" title="Ryman Sidewalk Sessions">
 		Ryman Sidewalk Sessions	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/cole-swindell-110222/" rel="bookmark" title="

In [128]:
events = soup.findAll('a', attrs={'class' : 'tribe-event-url'})
events

[<a class="tribe-event-url" href="https://ryman.com/event/mt-joy-102922/" rel="bookmark" title="Mt. Joy">
 		Mt. Joy	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/sidewalk-sessions-103022/" rel="bookmark" title="Ryman Sidewalk Sessions">
 		Ryman Sidewalk Sessions	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/marcus-mumford/" rel="bookmark" title="Marcus Mumford">
 		Marcus Mumford	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/puscifer/" rel="bookmark" title="Puscifer">
 		Puscifer	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/gipsy-kings/" rel="bookmark" title="Gipsy Kings featuring Nicolas Reyes">
 		Gipsy Kings featuring Nicolas Reyes	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/sidewalk-sessions-110222/" rel="bookmark" title="Ryman Sidewalk Sessions">
 		Ryman Sidewalk Sessions	</a>,
 <a class="tribe-event-url" href="https://ryman.com/event/cole-swindell-110222/" rel="bookmark" title="Cole Sw

In [129]:
headliners = [x.get('title') for x in events]
headliners

['Mt. Joy',
 'Ryman Sidewalk Sessions',
 'Marcus Mumford',
 'Puscifer',
 'Gipsy Kings featuring Nicolas Reyes',
 'Ryman Sidewalk Sessions',
 'Cole Swindell',
 'Ryman Sidewalk Sessions',
 'Cole Swindell',
 'The Lone Bellow',
 'We The Kingdom',
 'The Revivalists',
 'The Revivalists',
 'Dayglow',
 'Bono: Stories of Surrender',
 'Chris Renzema',
 'Craig Morgan',
 'Louder with Crowder',
 'Louder with Crowder',
 'Disney Junior Live On Tour']

2) Next, try and find a tag that could be used to find the date and time for each show. Extract these into two lists, one containing the date and the other containing the time. (Eg. THURSDAY, AUGUST 4, 2022 AT 8:00 PM CDT should be split into August 4, 2022 and 8:00 PM CDT.)

In [130]:
date_and_time = soup.findAll('time')

In [131]:
date_and_time = [x.get_text() for x in date_and_time]

In [132]:
date_and_time

['Saturday, October 29, 2022 at 8:00 PM CDT',
 'Sunday, October 30, 2022 at 6:00 PM CDT',
 'Sunday, October 30, 2022 at 8:00 PM CDT',
 'Monday, October 31, 2022 at 7:30 PM CDT',
 'Tuesday, November 1, 2022 at 7:30 PM CDT',
 'Wednesday, November 2, 2022 at 5:30 PM CDT',
 'Wednesday, November 2, 2022 at 7:30 PM CDT',
 'Thursday, November 3, 2022 at 5:30 PM CDT',
 'Thursday, November 3, 2022 at 7:30 PM CDT',
 'Friday, November 4, 2022 at 8:00 PM CDT',
 'Saturday, November 5, 2022 at 7:00 PM CDT',
 'Sunday, November 6, 2022 at 7:30 PM CST',
 'Monday, November 7, 2022 at 7:30 PM CST',
 'Tuesday, November 8, 2022 at 7:30 PM CST',
 'Wednesday, November 9, 2022 at 8:00 PM CST',
 'Thursday, November 10, 2022 at 7:30 PM CST',
 'Friday, November 11, 2022 at 8:00 PM CST',
 'Saturday, November 12, 2022 at 7:00 PM CST',
 'Saturday, November 12, 2022 at 9:30 PM CST',
 'Sunday, November 13, 2022 at 4:00 PM CST']

In [139]:
dt1 = [x.split(' at ', 1) for x in date_and_time]

In [140]:
dt1

[['Saturday, October 29, 2022', '8:00 PM CDT'],
 ['Sunday, October 30, 2022', '6:00 PM CDT'],
 ['Sunday, October 30, 2022', '8:00 PM CDT'],
 ['Monday, October 31, 2022', '7:30 PM CDT'],
 ['Tuesday, November 1, 2022', '7:30 PM CDT'],
 ['Wednesday, November 2, 2022', '5:30 PM CDT'],
 ['Wednesday, November 2, 2022', '7:30 PM CDT'],
 ['Thursday, November 3, 2022', '5:30 PM CDT'],
 ['Thursday, November 3, 2022', '7:30 PM CDT'],
 ['Friday, November 4, 2022', '8:00 PM CDT'],
 ['Saturday, November 5, 2022', '7:00 PM CDT'],
 ['Sunday, November 6, 2022', '7:30 PM CST'],
 ['Monday, November 7, 2022', '7:30 PM CST'],
 ['Tuesday, November 8, 2022', '7:30 PM CST'],
 ['Wednesday, November 9, 2022', '8:00 PM CST'],
 ['Thursday, November 10, 2022', '7:30 PM CST'],
 ['Friday, November 11, 2022', '8:00 PM CST'],
 ['Saturday, November 12, 2022', '7:00 PM CST'],
 ['Saturday, November 12, 2022', '9:30 PM CST'],
 ['Sunday, November 13, 2022', '4:00 PM CST']]

In [141]:
date = [x[0] for x in dt1]
date

['Saturday, October 29, 2022',
 'Sunday, October 30, 2022',
 'Sunday, October 30, 2022',
 'Monday, October 31, 2022',
 'Tuesday, November 1, 2022',
 'Wednesday, November 2, 2022',
 'Wednesday, November 2, 2022',
 'Thursday, November 3, 2022',
 'Thursday, November 3, 2022',
 'Friday, November 4, 2022',
 'Saturday, November 5, 2022',
 'Sunday, November 6, 2022',
 'Monday, November 7, 2022',
 'Tuesday, November 8, 2022',
 'Wednesday, November 9, 2022',
 'Thursday, November 10, 2022',
 'Friday, November 11, 2022',
 'Saturday, November 12, 2022',
 'Saturday, November 12, 2022',
 'Sunday, November 13, 2022']

In [142]:
time = [x[1] for x in dt1]
time

['8:00 PM CDT',
 '6:00 PM CDT',
 '8:00 PM CDT',
 '7:30 PM CDT',
 '7:30 PM CDT',
 '5:30 PM CDT',
 '7:30 PM CDT',
 '5:30 PM CDT',
 '7:30 PM CDT',
 '8:00 PM CDT',
 '7:00 PM CDT',
 '7:30 PM CST',
 '7:30 PM CST',
 '7:30 PM CST',
 '8:00 PM CST',
 '7:30 PM CST',
 '8:00 PM CST',
 '7:00 PM CST',
 '9:30 PM CST',
 '4:00 PM CST']

3) Take the three lists you created on parts 1 and 2 and convert them into a pandas DataFrame.


In [143]:
dict = {'headliner': headliners, 'date': date, 'time': time} 
ryman_calander = pd.DataFrame(dict)

In [144]:
ryman_calander

Unnamed: 0,headliner,date,time
0,Mt. Joy,"Saturday, October 29, 2022",8:00 PM CDT
1,Ryman Sidewalk Sessions,"Sunday, October 30, 2022",6:00 PM CDT
2,Marcus Mumford,"Sunday, October 30, 2022",8:00 PM CDT
3,Puscifer,"Monday, October 31, 2022",7:30 PM CDT
4,Gipsy Kings featuring Nicolas Reyes,"Tuesday, November 1, 2022",7:30 PM CDT
5,Ryman Sidewalk Sessions,"Wednesday, November 2, 2022",5:30 PM CDT
6,Cole Swindell,"Wednesday, November 2, 2022",7:30 PM CDT
7,Ryman Sidewalk Sessions,"Thursday, November 3, 2022",5:30 PM CDT
8,Cole Swindell,"Thursday, November 3, 2022",7:30 PM CDT
9,The Lone Bellow,"Friday, November 4, 2022",8:00 PM CDT


4) Now, you need to take what you created for the first page and apply it across multiple rest of the pages so that you can scrape all inductees. Notice how the url changes when you click the "More Events" button at the top of the page. Check that the code that you wrote for the first page still works for page 2. Once you have verified that your code will still work, write a for loop that will cycle through the first five pages of events.


In [145]:
URL2 = 'https://ryman.com/events/list/?tribe_event_display=list&tribe_paged=2'
response = requests.get(URL2)

In [146]:
soup = BS(response.text)

In [150]:
events = soup.findAll('a', attrs={'class' : 'tribe-event-url'})
headliners = [x.get('title') for x in events]
headliners

['Lynyrd Skynyrd',
 'Charley Crockett',
 'Tauren Wells',
 'Sometimes Y',
 'Trombone Shorty & Orleans Avenue',
 'Dropkick Murphys',
 'No Small Endeavor',
 'Opry NextStage Live In Concert',
 'Christmas 4 Kids',
 'Omar Apollo',
 'W.A.S.P.',
 'Brett Eldredge',
 'Brett Eldredge',
 'Brett Eldredge',
 'Natalie Grant & Danny Gokey',
 'A Day To Remember',
 'The Piano Guys',
 'Jason Bonham’s Led Zeppelin Evening',
 'The Mavericks',
 'The Mavericks']

In [151]:
date_and_time = soup.findAll('time')

In [152]:
date_and_time = [x.get_text() for x in date_and_time]
date_and_time

['Sunday, November 13, 2022 at 7:30 PM CST',
 'Monday, November 14, 2022 at 7:30 PM CST',
 'Tuesday, November 15, 2022 at 7:30 PM CST',
 'Thursday, November 17, 2022 at 7:30 PM CST',
 'Friday, November 18, 2022 at 8:00 PM CST',
 'Saturday, November 19, 2022 at 7:30 PM CST',
 'Sunday, November 20, 2022 at 7:30 PM CST',
 'Sunday, November 20, 2022 at 7:30 PM CST',
 'Monday, November 21, 2022 at 7:00 PM CST',
 'Tuesday, November 22, 2022 at 7:30 PM CST',
 'Wednesday, November 23, 2022 at 7:30 PM CST',
 'Friday, November 25, 2022 at 8:00 PM CST',
 'Saturday, November 26, 2022 at 8:00 PM CST',
 'Sunday, November 27, 2022 at 8:00 PM CST',
 'Monday, November 28, 2022 at 7:30 PM CST',
 'Tuesday, November 29, 2022 at 8:00 PM CST',
 'Wednesday, November 30, 2022 at 7:30 PM CST',
 'Wednesday, November 30, 2022 at 8:00 PM CST',
 'Thursday, December 1, 2022 at 8:00 PM CST',
 'Friday, December 2, 2022 at 8:00 PM CST']

In [153]:
dt1 = [x.split(' at ', 1) for x in date_and_time]
date = [x[0] for x in dt1]
time = [x[1] for x in dt1]

In [154]:
dict = {'headliner': headliners, 'date': date, 'time': time} 
ryman_calander2 = pd.DataFrame(dict)

In [155]:
ryman_calander2

Unnamed: 0,headliner,date,time
0,Lynyrd Skynyrd,"Sunday, November 13, 2022",7:30 PM CST
1,Charley Crockett,"Monday, November 14, 2022",7:30 PM CST
2,Tauren Wells,"Tuesday, November 15, 2022",7:30 PM CST
3,Sometimes Y,"Thursday, November 17, 2022",7:30 PM CST
4,Trombone Shorty & Orleans Avenue,"Friday, November 18, 2022",8:00 PM CST
5,Dropkick Murphys,"Saturday, November 19, 2022",7:30 PM CST
6,No Small Endeavor,"Sunday, November 20, 2022",7:30 PM CST
7,Opry NextStage Live In Concert,"Sunday, November 20, 2022",7:30 PM CST
8,Christmas 4 Kids,"Monday, November 21, 2022",7:00 PM CST
9,Omar Apollo,"Tuesday, November 22, 2022",7:30 PM CST


In [156]:
URL = 'https://ryman.com/events/list/?tribe_event_display=list&tribe_paged='
  
for page in range(1,6):
    
  
    req = requests.get(URL + str(page) + '/')
    soup = BS(req.text)
  
    events = soup.findAll('a', attrs={'class' : 'tribe-event-url'})
    headliners = [x.get('title') for x in events]
    date_and_time = soup.findAll('time')
    date_and_time = [x.get_text() for x in date_and_time]
    dt1 = [x.split(' at ', 1) for x in date_and_time]
    date = [x[0] for x in dt1]
    time = [x[1] for x in dt1]
    dict = {'headliner': headliners, 'date': date, 'time': time} 
    ryman_calander = pd.DataFrame(dict)
    print(ryman_calander)

                              headliner                         date  \
0                               Mt. Joy   Saturday, October 29, 2022   
1               Ryman Sidewalk Sessions     Sunday, October 30, 2022   
2                        Marcus Mumford     Sunday, October 30, 2022   
3                              Puscifer     Monday, October 31, 2022   
4   Gipsy Kings featuring Nicolas Reyes    Tuesday, November 1, 2022   
5               Ryman Sidewalk Sessions  Wednesday, November 2, 2022   
6                         Cole Swindell  Wednesday, November 2, 2022   
7               Ryman Sidewalk Sessions   Thursday, November 3, 2022   
8                         Cole Swindell   Thursday, November 3, 2022   
9                       The Lone Bellow     Friday, November 4, 2022   
10                       We The Kingdom   Saturday, November 5, 2022   
11                      The Revivalists     Sunday, November 6, 2022   
12                      The Revivalists     Monday, November 7, 

5) Bonus #1:: Add to your data frame the opening act for all shows that list an opener.

6) Bonus #2: If you click the "MORE INFO" button for an event, it will take you to a page which shows ticket prices. Write code that can be used to retrieve the ticket prices for each show that you have scraped. Make sure that your code can handle cases where the show is sold out (eg. https://ryman.com/event/revivalists/).

In [168]:
URL = 'https://ryman.com/events/list/?tribe_event_display=list&tribe_paged=1'

response = requests.get(URL)
soup = BS(response.text)


In [169]:
openers = soup.findAll('span', {'class' : 'opener'})
openers 

[<span class="opener">with The Brook &amp; The Bluff</span>,
 <span class="opener">with Ben Patrick</span>,
 <span class="opener">with Night Club</span>,
 <span class="opener">with Shutterdog</span>,
 <span class="opener">2nd Show Added!</span>,
 <span class="opener">with Ashley Cooke and Dylan Marlowe</span>,
 <span class="opener">with Michael Leatherman</span>,
 <span class="opener">with Ashley Cooke and Dylan Marlowe</span>,
 <span class="opener">with BAILEN</span>,
 <span class="opener">with Cory Asbury plus special guests</span>,
 <span class="opener">with Paris Jackson</span>,
 <span class="opener">2nd Show Added!</span>,
 <span class="opener">with Paris Jackson</span>,
 <span class="opener">with special guest Ritt Momney</span>,
 <span class="opener">An evening of words, music and some mischief ...</span>,
 <span class="opener">with Jervis Campbell &amp; special guests</span>,
 <span class="opener">with special guest Ray Fulcher</span>,
 <span class="opener">Dave Landau &amp; St

In [170]:
openers = [x.get_text() for x in openers]

In [171]:
openers

['with The Brook & The Bluff',
 'with Ben Patrick',
 'with Night Club',
 'with Shutterdog',
 '2nd Show Added!',
 'with Ashley Cooke and Dylan Marlowe',
 'with Michael Leatherman',
 'with Ashley Cooke and Dylan Marlowe',
 'with BAILEN',
 'with Cory Asbury plus special guests',
 'with Paris Jackson',
 '2nd Show Added!',
 'with Paris Jackson',
 'with special guest Ritt Momney',
 'An evening of words, music and some mischief ...',
 'with Jervis Campbell & special guests',
 'with special guest Ray Fulcher',
 'Dave Landau & Steven Crowder',
 '2nd Show Added',
 'Dave Landau & Steven Crowder',
 ' ',
 'Costume Palooza']

In [172]:
openers = pd.DataFrame(openers, columns=['openers'])

In [173]:
openers

Unnamed: 0,openers
0,with The Brook & The Bluff
1,with Ben Patrick
2,with Night Club
3,with Shutterdog
4,2nd Show Added!
5,with Ashley Cooke and Dylan Marlowe
6,with Michael Leatherman
7,with Ashley Cooke and Dylan Marlowe
8,with BAILEN
9,with Cory Asbury plus special guests


In [174]:
openers['openers'] = openers['openers'].str.replace('with', '')

In [175]:
openers

Unnamed: 0,openers
0,The Brook & The Bluff
1,Ben Patrick
2,Night Club
3,Shutterdog
4,2nd Show Added!
5,Ashley Cooke and Dylan Marlowe
6,Michael Leatherman
7,Ashley Cooke and Dylan Marlowe
8,BAILEN
9,Cory Asbury plus special guests


In [177]:
URL = 'https://ryman.com/events/list/?tribe_event_display=list&tribe_paged='
  
for page in range(1,6):
    
  
    req = requests.get(URL + str(page) + '/')
    soup = BS(req.text)
  
    events = soup.findAll('a', attrs={'class' : 'tribe-event-url'})
    headliners = [x.get('title') for x in events]
    date_and_time = soup.findAll('time')
    date_and_time = [x.get_text() for x in date_and_time]
    dt1 = [x.split(' at ', 1) for x in date_and_time]
    date = [x[0] for x in dt1]
    time = [x[1] for x in dt1]
    openers = soup.findAll('span', {'class' : 'opener'})
    openers = [x.get_text() for x in openers]
    dict = {'headliner': headliners, 'opener': openers, 'date': date, 'time': time} 
    ryman_calander = pd.DataFrame(dict)
    print(ryman_calander)

ValueError: arrays must all be same length