# Project Guide: Event Details Scraping



## Overview
This project guide walks through scraping event details, including date, time, venue, images, and URL links, from a website. It uses Python with the Requests and BeautifulSoup libraries for the scraping, and Pandas to collect the results into a table.

## Skills required
- Basic knowledge of Python programming

- Familiarity with HTML structure

## Tools and Libraries
- A recent version of Python 3
- BeautifulSoup4
- Requests
- Pandas

Importing the necessary libraries

In [3]:
import requests

In [4]:
from bs4 import BeautifulSoup

Selecting a website to scrape event details from, and confirming that scraping is allowed by checking the website's "robots.txt" file.
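The robots.txt check can also be done in code with the standard library's `urllib.robotparser`. The sketch below is only an illustration: the rules in `sample_rules` are hypothetical and are not the real rules of any website, and the helper name `is_allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, used only to illustrate the check;
# they are NOT the real rules of any particular website.
sample_rules = """
User-agent: *
Disallow: /admin/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(sample_rules)

def is_allowed(path, user_agent="*"):
    # True if the parsed rules permit this user agent to fetch the path
    return parser.can_fetch(user_agent, path)

print(is_allowed("/events"))
print(is_allowed("/admin/settings"))
```

Against a live site, the same parser would normally be pointed at the site's robots.txt with `set_url(...)` followed by `read()` instead of parsing a hard-coded sample.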

In [5]:
#The website's url is saved in a variable 
events_url = "https://egotickets.com/events"

Using the Requests library to send an HTTP request and get the HTML content of the webpage

In [6]:
#The contents of the webpage are saved into the variable page_contents
page_contents = requests.get(events_url)

In [None]:
#Checking that the status code of the response is 200, which means the request succeeded
page_contents.status_code

Create a BeautifulSoup object to parse the HTML content

In [7]:
#Parsing the html content and saving the parsed document in the variable doc
doc = BeautifulSoup(page_contents.text, 'html.parser')

In [6]:
len(page_contents.text)

148857

## Scraping for the event name 

In [9]:
#Inspecting the html contents to find the html tags containing the event names
h3_tags = doc.find_all('h3', {'class': "title uk-text-nowrap uk-text-truncate"})

In [63]:
len(h3_tags)

20

In [11]:
#Printing out the first event tag to verify that it contains the event name
h3_tags[0]

<h3 class="title uk-text-nowrap uk-text-truncate"><a href="/events/naturopathic-clinician-training-ghana-145367aa" title="Naturopathic Clinician Training - Ghana">Naturopathic Clinician Training - Ghana</a></h3>

In [12]:
h3_tags[0].text

'Naturopathic Clinician Training - Ghana'

In [13]:
#Saving the event names into a list named events_names
events_names = []
for h3_tag in h3_tags:
    event_name = h3_tag.text
    events_names.append(event_name)

## Scraping for the event venue

In [15]:
#Inspecting the html contents to find the html tags containing the event venues
div_tags = doc.find_all('div', {'class' : "venue"})

In [17]:
#Printing out the first event tag to verify that it contains the event venue
div_tags[0]

<div class="venue">
<span>Ghana</span>
</div>

In [18]:
div_tags[0].text.strip()

'Ghana'

In [19]:
#Saving the event venues into a list named events_venues
events_venues = []
for div_tag in div_tags:
    event_venue = div_tag.text.strip()
    events_venues.append(event_venue)

## Scraping for the event details url

In [20]:
#Inspecting the html contents to find the html tags containing the event details url
a_tags = h3_tags[0].find_all('a')

In [21]:
len(a_tags)

1

In [22]:
event_link = a_tags[0]['href']

In [23]:
base_url = 'https://egotickets.com'

In [24]:
base_url + event_link

'https://egotickets.com/events/naturopathic-clinician-training-ghana-145367aa'
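String concatenation works here because every `href` on the page starts with `/`. As an alternative sketch, the standard library's `urllib.parse.urljoin` resolves a relative link against a base URL and also copes with links that are already absolute. The values below are copied from the cells above.

```python
from urllib.parse import urljoin

base_url = 'https://egotickets.com'
event_link = '/events/naturopathic-clinician-training-ghana-145367aa'

# urljoin resolves a relative href against the base url
full_url = urljoin(base_url, event_link)
print(full_url)

# An href that is already absolute is returned unchanged
absolute = urljoin(base_url, 'https://egotickets.com/events/splash-bash')
print(absolute)
```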

In [62]:
#Using a for loop to extract all event urls
events_details_url = []
for h3_tag in h3_tags:
    a_tag = h3_tag.find_all('a')
    event_details = base_url + a_tag[0]['href']
    events_details_url.append(event_details)

In [26]:
events_details_url

['https://egotickets.com/events/naturopathic-clinician-training-ghana-145367aa',
 'https://egotickets.com/events/reflecting-re-engaging-and-re-imagining-higher-education-pedagogy',
 'https://egotickets.com/events/happy-hour-nouveau',
 'https://egotickets.com/events/apostolic-invasion',
 'https://egotickets.com/events/happy-hour-35189',
 'https://egotickets.com/events/ladies-night-bc788cdd',
 'https://egotickets.com/events/splash-bash',
 'https://egotickets.com/events/clock-out-wednesday',
 'https://egotickets.com/events/one-love-wednesday',
 'https://egotickets.com/events/y2k-anight-of-rhythms',
 'https://egotickets.com/events/games-night-36464',
 'https://egotickets.com/events/karoake-night-35473',
 'https://egotickets.com/events/naked-hennessy',
 'https://egotickets.com/events/women-of-valour-official-lunch',
 'https://egotickets.com/events/sip-and-paint-happening-every-thursday-35009',
 'https://egotickets.com/events/roots-culture-every-thursday',
 'https://egotickets.com/events/liv

In [27]:
#Creating a dictionary using the details saved so far
events_dict = {
    'Event Name' : events_names,
    'Event Venue' : events_venues,
    'Event Details Url': events_details_url
}
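A dataframe can only be built from this dictionary if every list has the same length; `pd.DataFrame` raises a `ValueError` otherwise. A quick check along the following lines could be run first. This is a minimal sketch: the helper name `check_equal_lengths` and the two-row `sample` are hypothetical and stand in for the scraped lists.

```python
def check_equal_lengths(columns):
    # Map each column name to the length of its list
    lengths = {name: len(values) for name, values in columns.items()}
    # All lists must share a single length for a well-formed table
    if len(set(lengths.values())) > 1:
        raise ValueError('Column lengths differ: {}'.format(lengths))
    return lengths

# Small illustrative sample, not the full scraped lists
sample = {
    'Event Name': ['Happy Hour', 'SPLASH BASH'],
    'Event Venue': ['Lupita Ghana', 'Offline Lounge'],
}
print(check_equal_lengths(sample))
```

A mismatch usually means that one of the `find_all` calls matched more or fewer tags than the others, which is worth catching before the lists are paired up row by row.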

In [28]:
#Importing pandas to convert the dictionary into a dataframe
import pandas as pd

In [30]:
df = pd.DataFrame(events_dict)

In [31]:
df

| | Event Name | Event Venue | Event Details Url |
|---|---|---|---|
| 0 | Naturopathic Clinician Training - Ghana | Ghana | https://egotickets.com/events/naturopathic-cli... |
| 1 | Reflecting, Re-Engaging and Re-Imagining Highe... | University of Ghana | https://egotickets.com/events/reflecting-re-en... |
| 2 | HAPPY HOUR NOUVEAU | Nouveau Accra | https://egotickets.com/events/happy-hour-nouveau |
| 3 | Apostolic Invasion | Spintex, Accra,Accra, Ghana | https://egotickets.com/events/apostolic-invasion |
| 4 | Happy Hour | Lupita Ghana | https://egotickets.com/events/happy-hour-35189 |
| 5 | LADIES NIGHT. | Mtn Ghana East Legon Branch | https://egotickets.com/events/ladies-night-bc7... |
| 6 | SPLASH BASH | Offline Lounge | https://egotickets.com/events/splash-bash |
| 7 | Clock out wednesday | Sixteen Hundred Bar & Lounge | https://egotickets.com/events/clock-out-wednesday |
| 8 | One Love Wednesday | Laboma Beach | https://egotickets.com/events/one-love-wednesday |
| 9 | Y2K (Anight of Rhythms) | 355 Restaurant and Lounge, Labone - A... | https://egotickets.com/events/y2k-anight-of-rh... |


## Functions that scrape the events website for the event names, venues and links to the individual event pages

In [32]:
import requests 
from bs4 import BeautifulSoup

def get_events():
    events_url = 'https://egotickets.com/events'
    response = requests.get(events_url)
    if response.status_code != 200:
        raise Exception('Failed to load page: {}'.format(events_url))
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

In [33]:
def get_event_names():
    #Relies on the global variable doc created earlier
    h3_tags = doc.find_all('h3', {'class': "title uk-text-nowrap uk-text-truncate"})
    events_names = []
    for h3_tag in h3_tags:
        events_names.append(h3_tag.text)
    return events_names

In [34]:
get_event_names()

['Naturopathic Clinician Training - Ghana',
 'Reflecting, Re-Engaging and Re-Imagining Higher Education  Pedagogy',
 'HAPPY HOUR NOUVEAU',
 'Apostolic Invasion',
 'Happy Hour',
 'LADIES NIGHT.',
 'SPLASH BASH',
 'Clock out wednesday',
 'One Love Wednesday',
 'Y2K (Anight of Rhythms)',
 'Games Night',
 'Karoake Night',
 'Naked Hennessy',
 'WOMEN OF VALOUR OFFICIAL LUNCH',
 'SIP AND PAINT (HAPPENING EVERY THURSDAY)',
 'Roots & Culture (Every Thursday)',
 'Live Sessions Featuring BRYMO',
 'HOT TUB THURSDAYS',
 'GAMES NIGHT',
 'Karoake Party']

In [35]:
def get_event_venues():
    #Relies on the global variable doc created earlier
    div_tags = doc.find_all('div', {'class' : "venue"})
    events_venues = []
    for div_tag in div_tags:
        events_venues.append(div_tag.text.strip())
    return events_venues

In [36]:
get_event_venues()

['Ghana',
 'University of Ghana',
 'Nouveau Accra',
 'Spintex, Accra,Accra, Ghana',
 'Lupita Ghana',
 'Mtn Ghana East Legon Branch',
 'Offline Lounge',
 'Sixteen Hundred Bar & Lounge',
 'Laboma Beach',
 '355 Restaurant and Lounge, Labone - A...',
 'Junction 5 Lounge',
 'Krom Pub',
 'Signature Lounge',
 'Accra',
 'Grind & Aroma',
 'Monte Carlo',
 'Rooftop Lekki Chinese Restaurant',
 'Hermanos Jungle',
 'BIG LEAGUES SPORTS AND GAMES LOUNGE',
 'THE WOODS']

In [37]:
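def get_event_details_url():
    #Relies on the global variable doc created earlier
    base_url = 'https://egotickets.com'
    h3_tags = doc.find_all('h3', {'class': "title uk-text-nowrap uk-text-truncate"})
    events_details_url = []
    for h3_tag in h3_tags:
        a_tag = h3_tag.find_all('a')
        event_details = base_url + a_tag[0]['href']
        events_details_url.append(event_details)
    return events_details_url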

In [38]:
get_event_details_url()

['https://egotickets.com/events/naturopathic-clinician-training-ghana-145367aa',
 'https://egotickets.com/events/reflecting-re-engaging-and-re-imagining-higher-education-pedagogy',
 'https://egotickets.com/events/happy-hour-nouveau',
 'https://egotickets.com/events/apostolic-invasion',
 'https://egotickets.com/events/happy-hour-35189',
 'https://egotickets.com/events/ladies-night-bc788cdd',
 'https://egotickets.com/events/splash-bash',
 'https://egotickets.com/events/clock-out-wednesday',
 'https://egotickets.com/events/one-love-wednesday',
 'https://egotickets.com/events/y2k-anight-of-rhythms',
 'https://egotickets.com/events/games-night-36464',
 'https://egotickets.com/events/karoake-night-35473',
 'https://egotickets.com/events/naked-hennessy',
 'https://egotickets.com/events/women-of-valour-official-lunch',
 'https://egotickets.com/events/sip-and-paint-happening-every-thursday-35009',
 'https://egotickets.com/events/roots-culture-every-thursday',
 'https://egotickets.com/events/liv

## Scraping for the event date, time and venue

In [39]:
response_event_link = requests.get('https://egotickets.com/events/white-on-jeans-5db9be66')

In [40]:
doc_event_link = BeautifulSoup(response_event_link.text, 'html.parser')

In [41]:
#The status code belongs to the response, not to the parsed document
response_event_link.status_code

In [42]:
len(doc_event_link.text)

3595

In [43]:
date_tags = doc_event_link.find_all('div', {'class': "foot-note uk-margin-small ego-white-text"})

In [44]:
#The first tag holds the date and time, the second tag holds the venue
events_datetime = [date_tags[0].text]
events_new_venue = [date_tags[1].text]

In [45]:
date_tags[0]

<div class="foot-note uk-margin-small ego-white-text"><span class="uk-margin-right uk-visible@s" uk-icon="calendar"></span>Sat, Jan 27. 12AM - 8PM</div>

In [46]:
date_tags[1]

<div class="foot-note uk-margin-small ego-white-text"><span class="uk-margin-right uk-visible@s" uk-icon="location"></span>KRUNA The Club, Accra</div>

In [47]:
events_new_venue

['KRUNA The Club, Accra']

In [48]:
events_datetime

['Sat, Jan 27. 12AM - 8PM']
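The date and time arrive as a single string. If the parts are needed separately, a regular expression is one option. The pattern below is a sketch inferred only from strings of the form shown above, so other layouts on the site may not match; the names `DATETIME_PATTERN` and `split_datetime` are hypothetical.

```python
import re

# Pattern inferred from the sample strings only; other layouts may exist
DATETIME_PATTERN = re.compile(
    r'^(?P<weekday>[A-Za-z]{3}), (?P<month>[A-Za-z]{3}) (?P<day>\d{1,2})\. '
    r'(?P<start>\d{1,2}(?::\d{2})?[AP]M) - '
    r'(?:(?P<end_weekday>[A-Za-z]{3}) )?(?P<end>\d{1,2}(?::\d{2})?[AP]M)$'
)

def split_datetime(text):
    # Return the named parts as a dict, or None if the text does not match
    match = DATETIME_PATTERN.match(text.strip())
    return match.groupdict() if match else None

parts = split_datetime('Sat, Jan 27. 12AM - 8PM')
print(parts['weekday'], parts['month'], parts['day'], parts['start'], parts['end'])
```

Returning `None` for unmatched text keeps the scraper from crashing on an unexpected layout, and leaves the decision about how to handle it to the caller.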

In [49]:
def get_event_detail_doc():
    #Downloading and parsing the details page of every event
    events_details_url = get_event_details_url()
    docs = []
    for url in events_details_url:
        response = requests.get(url)
        if response.status_code != 200:
            raise Exception('Failed to load page: {}'.format(url))
        doc = BeautifulSoup(response.text, 'html.parser')
        docs.append(doc)
    return docs
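The function above sends one request per event in quick succession. A short pause between requests is a common courtesy towards the server. The sketch below shows the idea with a hypothetical helper `fetch_all`; the `fake_fetch` function is a stand-in so the example runs without network access, and the one-second default delay is an arbitrary assumption.

```python
import time

def fetch_all(urls, fetch, delay_seconds=1.0):
    # fetch is any callable that takes a url and returns the page content
    pages = []
    for index, url in enumerate(urls):
        # Pause between requests, but not before the first one
        if index > 0:
            time.sleep(delay_seconds)
        pages.append(fetch(url))
    return pages

# Stand-in fetcher so the sketch runs without network access
def fake_fetch(url):
    return '<html>{}</html>'.format(url)

pages = fetch_all(['https://egotickets.com/events/splash-bash',
                   'https://egotickets.com/events/happy-hour-nouveau'],
                  fake_fetch, delay_seconds=0)
print(len(pages))
```

In the notebook, `fetch` could be a small function that calls `requests.get(url)` and returns the parsed BeautifulSoup document.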

In [50]:
def get_event_details(doc):
    date_tags = doc.find_all('div', {'class': "foot-note uk-margin-small ego-white-text"})
    date_time = date_tags[0].text
    new_venue = date_tags[1].text
    return date_time, new_venue

In [52]:
get_event_detail_doc()

In [53]:
def get_events_lists():
    #Downloading and parsing the details page of every event
    events_details_url = get_event_details_url()
    docs = []
    for url in events_details_url:
        response = requests.get(url)
        if response.status_code != 200:
            raise Exception('Failed to load page: {}'.format(url))
        docs.append(BeautifulSoup(response.text, 'html.parser'))

    #Extracting the date/time and the venue from every details page
    date_time = []
    new_venue = []
    for event_doc in docs:
        event_date_time, event_venue = get_event_details(event_doc)
        date_time.append(event_date_time)
        new_venue.append(event_venue)
    events_dict = {'Date and Time': date_time,
                   'Location and Venue': new_venue}
    events_df = pd.DataFrame(events_dict)
    return events_df

In [54]:
get_events_lists()

| | Date and Time | Location and Venue |
|---|---|---|
| 0 | Mon, Feb 5. 8AM - Fri 10PM | Ghana, Accra |
| 1 | Mon, Feb 5. 9:30AM - Thu 3:30PM | University of Ghana, Accra |
| 2 | Mon, Feb 5. 5PM - Thu 8PM | Nouveau Accra, Accra |
| 3 | Mon, Feb 5. 6PM - 9PM | Spintex, Accra,Accra, Ghana, Accra |
| 4 | Tue, Feb 6. 5PM - 7PM | Lupita Ghana, Accra |
| 5 | Tue, Feb 6. 7PM - Tue 10PM | Mtn Ghana East Legon Branch, Accra |
| 6 | Wed, Feb 7. 10AM - Thu 12AM | Offline Lounge, Accra |
| 7 | Wed, Feb 7. 5PM - 10PM | Sixteen Hundred Bar & Lounge, Accra |
| 8 | Wed, Feb 7. 7PM - Wed 10PM | Laboma Beach, Accra |
| 9 | Wed, Feb 7. 7PM - Wed 10PM | 355 Restaurant and Lounge, Labone - Accra, Accra |


In [55]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

base_url = 'https://egotickets.com'
events_url = base_url + '/events'

def get_doc(url):
    #Downloading a page and parsing it, raising an error if the request fails
    response = requests.get(url)
    if response.status_code != 200:
        raise Exception('Failed to load page: {}'.format(url))
    return BeautifulSoup(response.text, 'html.parser')

def get_event_names(doc):
    h3_tags = doc.find_all('h3', {'class': "title uk-text-nowrap uk-text-truncate"})
    return [h3_tag.text for h3_tag in h3_tags]

def get_event_venues(doc):
    div_tags = doc.find_all('div', {'class' : "venue"})
    return [div_tag.text.strip() for div_tag in div_tags]

def get_event_details_url(doc):
    h3_tags = doc.find_all('h3', {'class': "title uk-text-nowrap uk-text-truncate"})
    return [base_url + h3_tag.find_all('a')[0]['href'] for h3_tag in h3_tags]

def get_events():
    #Building a dataframe of names, venues and details urls from the events page
    doc = get_doc(events_url)
    events_dict = {
        'Event Name' : get_event_names(doc),
        'Event Venue' : get_event_venues(doc),
        'Event Details Url': get_event_details_url(doc)}
    return pd.DataFrame(events_dict)

def get_events_df(events_details_url):
    #Building a dataframe of date/time and venue from every event details page
    date_time = []
    new_venue = []
    for url in events_details_url:
        date_tags = get_doc(url).find_all('div', {'class': "foot-note uk-margin-small ego-white-text"})
        date_time.append(date_tags[0].text)
        new_venue.append(date_tags[1].text)
    events_dict = {'Date and Time': date_time,
                   'Location and Venue': new_venue}
    return pd.DataFrame(events_dict)

In [57]:
df = get_events()
df_2 = get_events_df(df['Event Details Url'])

In [59]:
df_events = pd.concat([df, df_2], axis = 1)

In [60]:
df_events

| | Event Name | Event Venue | Event Details Url | Date and Time | Location and Venue |
|---|---|---|---|---|---|
| 0 | Naturopathic Clinician Training - Ghana | Ghana | https://egotickets.com/events/naturopathic-cli... | Mon, Feb 5. 8AM - Fri 10PM | Ghana, Accra |
| 1 | Reflecting, Re-Engaging and Re-Imagining Highe... | University of Ghana | https://egotickets.com/events/reflecting-re-en... | Mon, Feb 5. 9:30AM - Thu 3:30PM | University of Ghana, Accra |
| 2 | HAPPY HOUR NOUVEAU | Nouveau Accra | https://egotickets.com/events/happy-hour-nouveau | Mon, Feb 5. 5PM - Thu 8PM | Nouveau Accra, Accra |
| 3 | Apostolic Invasion | Spintex, Accra,Accra, Ghana | https://egotickets.com/events/apostolic-invasion | Mon, Feb 5. 6PM - 9PM | Spintex, Accra,Accra, Ghana, Accra |
| 4 | Happy Hour | Lupita Ghana | https://egotickets.com/events/happy-hour-35189 | Tue, Feb 6. 5PM - 7PM | Lupita Ghana, Accra |
| 5 | LADIES NIGHT. | Mtn Ghana East Legon Branch | https://egotickets.com/events/ladies-night-bc7... | Tue, Feb 6. 7PM - Tue 10PM | Mtn Ghana East Legon Branch, Accra |
| 6 | SPLASH BASH | Offline Lounge | https://egotickets.com/events/splash-bash | Wed, Feb 7. 10AM - Thu 12AM | Offline Lounge, Accra |
| 7 | Clock out wednesday | Sixteen Hundred Bar & Lounge | https://egotickets.com/events/clock-out-wednesday | Wed, Feb 7. 5PM - 10PM | Sixteen Hundred Bar & Lounge, Accra |
| 8 | One Love Wednesday | Laboma Beach | https://egotickets.com/events/one-love-wednesday | Wed, Feb 7. 7PM - Wed 10PM | Laboma Beach, Accra |
| 9 | Y2K (Anight of Rhythms) | 355 Restaurant and Lounge, Labone - A... | https://egotickets.com/events/y2k-anight-of-rh... | Wed, Feb 7. 7PM - Wed 10PM | 355 Restaurant and Lounge, Labone - Accra, Accra |
