# Webscraping - lesson 7 - 20/04

- Making requests
- Working with BeautifulSoup
- Combining scraping & APIs


## Making a request

In [11]:
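import re

import requests

url = "https://en.wikipedia.org/wiki/Elephant"

response = requests.get(url, timeout=10)
if response.status_code == 200:
    html = response.text  # the raw HTML of the page as one string
    print(html)

    # count "elephant"/"Elephant" in the raw HTML; longer words such as "Elephants"
    # and matches inside tags, attributes and URLs are counted too
    regex = "elephant|Elephant"
    matches = re.findall(regex, html)
    print(len(matches))
else:
    print("Request failed with status", response.status_code)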

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Elephant - Wikipedia</title>
<script>document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-ze
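The pattern `elephant|Elephant` is applied to the raw HTML, so it also counts longer words such as "Elephants" and any text inside tags, attributes and URLs. A minimal sketch on a made-up string (`sample_html` is hypothetical, not taken from the page) that compares it with a case-insensitive pattern and a whole-word pattern:

```python
import re

# Hypothetical sample standing in for response.text
sample_html = "<p>The Elephant is large. Elephants and elephantine ELEPHANT; see Elephantidae.</p>"

# Pattern from the lesson: case-sensitive, matches inside longer words as well
substring_matches = re.findall("elephant|Elephant", sample_html)

# Case-insensitive variant: also catches "ELEPHANT"
ci_matches = re.findall("elephant", sample_html, flags=re.IGNORECASE)

# Whole-word, case-insensitive variant: \b requires a word boundary on both sides
word_matches = re.findall(r"\belephant\b", sample_html, flags=re.IGNORECASE)

print(len(substring_matches), len(ci_matches), len(word_matches))  # 4 5 2
```

Which count is more useful depends on the question: the whole-word pattern is closer to counting mentions of the animal, while the substring count also includes related words.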

## Working with BeautifulSoup

Install the beautifulsoup4 package (imported in Python as bs4):

In [9]:
! pip install beautifulsoup4





In [22]:
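import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Elephant"

response = requests.get(url, timeout=10)
html = response.text
# name the parser explicitly; without it bs4 guesses one and prints a warning
soup = BeautifulSoup(html, "html.parser")

# print(soup.prettify())  # the whole document, indented tag by tag

print(soup.title)             # the <title> tag itself
print(soup.title.get_text())  # only the text inside it

# print(soup.get_text())  # all text of the page, with the tags stripped

# str.count is case-sensitive, so both spellings are counted separately
print(soup.get_text().count("Elephant"))
print(soup.get_text().count("elephant"))

# find() returns only the first matching tag (or None)
first_h3 = soup.find("h3")
print(first_h3)

# find_all() returns a list with every matching tag
h3_tags = soup.find_all("h3")
print(h3_tags)

for element in h3_tags:
    print(element.get_text())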


<title>Elephant - Wikipedia</title>
Elephant - Wikipedia
184
362
<h3><span class="mw-headline" id="Evolution_and_extinct_relatives">Evolution and extinct relatives</span></h3>
[<h3><span class="mw-headline" id="Evolution_and_extinct_relatives">Evolution and extinct relatives</span></h3>, <h3><span class="mw-headline" id="Ears_and_eyes">Ears and eyes</span></h3>, <h3><span class="mw-headline" id="Trunk">Trunk</span></h3>, <h3><span class="mw-headline" id="Teeth">Teeth</span></h3>, <h3><span class="mw-headline" id="Skin">Skin</span></h3>, <h3><span id="Legs.2C_locomotion.2C_and_posture"></span><span class="mw-headline" id="Legs,_locomotion,_and_posture">Legs, locomotion, and posture</span></h3>, <h3><span class="mw-headline" id="Organs">Organs</span></h3>, <h3><span class="mw-headline" id="Body_temperature">Body temperature</span></h3>, <h3><span class="mw-headline" id="Social_organisation">Social organisation</span></h3>, <h3><span class="mw-headline" id="Sexual_behaviour">Sexual behavi
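The difference between `find` and `find_all`, and between counting in the raw markup and in `get_text()`, can be checked offline on a small fragment. The HTML below is hypothetical, modelled on the `<h3>` output above; `soup.select` (CSS selectors) is an alternative that the lesson itself does not use.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the <h3> output above (not fetched from Wikipedia)
html = """
<h2><span class="mw-headline" id="Description">Description</span></h2>
<h3><span class="mw-headline" id="Ears_and_eyes">Ears and eyes</span></h3>
<h3><span class="mw-headline" id="Trunk">Trunk</span></h3>
"""

soup = BeautifulSoup(html, "html.parser")

first_h3 = soup.find("h3")    # first matching Tag, or None
all_h3 = soup.find_all("h3")  # list of all matching Tags (empty list if none)
missing = soup.find("h4")     # the fragment has no <h4>, so this is None

headings = [h3.get_text(strip=True) for h3 in all_h3]

# CSS-selector alternative: the headline <span> directly inside each <h3>, reading its id
anchor_ids = [span["id"] for span in soup.select("h3 > span.mw-headline")]

# Counting in the raw markup also hits attribute values (id="Trunk");
# counting in get_text() only sees the text between the tags
raw_count = html.count("Trunk")
text_count = soup.get_text().count("Trunk")

print(first_h3.get_text())    # Ears and eyes
print(headings)               # ['Ears and eyes', 'Trunk']
print(anchor_ids)             # ['Ears_and_eyes', 'Trunk']
print(missing is None)        # True
print(raw_count, text_count)  # 2 1
```

Because `find` can return `None`, chaining `.get_text()` directly onto it raises an `AttributeError` when the tag is absent; checking for `None` first is the safer pattern on pages whose layout may change.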

In [31]:
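# get the links to all other language editions of the article

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Elephant"

response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# every language link is an <li> with the class "interlanguage-link"
li_tags = soup.find_all("li", {"class": "interlanguage-link"})

for tag in li_tags:
    # the title of the <a> holds the article title and the language name in English;
    # the visible text is the language name written in that language
    print(tag.a["title"], " --> ", tag.get_text())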


Gajah – Achinese  -->  Acèh
Olifant – Afrikaans  -->  Afrikaans
Elefant – Swiss German  -->  Alemannisch
ዝሆን – Amharic  -->  አማርኛ
Elpend – Old English  -->  Ænglisc
فيل – Arabic  -->  العربية
Elefant – Aragonese  -->  Aragonés
Elefandu – Aromanian  -->  Armãneashti
Elephantidae – Asturian  -->  Asturianu
हाथी – Awadhi  -->  अवधी
Tapi'itĩmbuku – Guarani  -->  Avañe'ẽ
فیل – South Azerbaijani  -->  تۆرکجه
Kunjara – Balinese  -->  Basa Bali
Sama – Bambara  -->  Bamanankan
হাতি – Bangla  -->  বাংলা
Gajah – Banjar  -->  Banjar
Chhiūⁿ – Min Nan Chinese  -->  Bân-lâm-gú
Фил – Bashkir  -->  Башҡортса
Слон – Belarusian  -->  Беларуская
Слон – Belarusian (Taraškievica orthography)  -->  Беларуская (тарашкевіца)
Elepante – Central Bikol  -->  Bikol Central
Слон – Bulgarian  -->  Български
གླང་ཆེན། – Tibetan  -->  བོད་ཡིག
Slon – Bosnian  -->  Bosanski
Olifant – Breton  -->  Brezhoneg
Заан – Russia Buriat  -->  Буряад
Elefants – Catalan  -->  Català
Slon – Czech  -->  Čeština
Nzou – Shona  -->  ChiS
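An `<li>` can carry several classes, and bs4 matches a class filter against each of them separately, so `interlanguage-link` still matches when other classes are present. A sketch that collects language code and link target, assuming a simplified, hypothetical fragment (the `hreflang` attribute and the extra classes are assumptions about the live page) and skipping items that contain no `<a>`:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment shaped like a Wikipedia language list
html = """
<ul>
  <li class="interlanguage-link interwiki-af mw-list-item">
    <a href="https://af.wikipedia.org/wiki/Olifant" hreflang="af">Afrikaans</a>
  </li>
  <li class="interlanguage-link interwiki-cs mw-list-item">
    <a href="https://cs.wikipedia.org/wiki/Slon" hreflang="cs">Čeština</a>
  </li>
  <li class="interlanguage-link">(no link)</li>
  <li class="mw-list-item">Not a language link</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ is bs4's keyword form of {"class": ...}; it matches if ANY class equals the value
links = {}
for li in soup.find_all("li", class_="interlanguage-link"):
    a = li.a  # first <a> inside the <li>, or None
    if a is None:
        continue  # tag.a["title"] would raise here, so skip the item instead
    links[a.get("hreflang")] = a.get("href")

print(links)
# {'af': 'https://af.wikipedia.org/wiki/Olifant', 'cs': 'https://cs.wikipedia.org/wiki/Slon'}
```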

## Combining scraping & API's

Scrape the line-up for Friday 21 July 2023 from Tomorrowland's website and print the names of the artists performing on the Mainstage: https://www.tomorrowland.com/en/festival/line-up/stages/friday-21-july-2023
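The navigation can be tried offline on hypothetical markup before running it against the live page. In this sketch the day `<div>` reuses the attributes quoted in the lesson's code comment, while the stage layout is only inferred from the `stage.div.h4` navigation and may not match the real site; `stage_artists` is a hypothetical helper that returns an empty list instead of raising when the day `<div>` is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the eventday <div> copies the attributes quoted in the lesson,
# the stage layout is an assumption inferred from stage.div.h4
html = """
<div class="eventday" data-eventday-id="132" style="display:none;" data-eventday="Friday 21 July 2023">
  <div class="stage">
    <div><h4>Mainstage</h4></div>
    <ul><li> Artist A </li><li>Artist B</li></ul>
  </div>
  <div class="stage">
    <div><h4>Another Stage</h4></div>
    <ul><li>Artist C</li></ul>
  </div>
</div>
"""


def stage_artists(markup, day, stage_name):
    """Hypothetical helper: stripped <li> texts of every stage whose <h4> contains stage_name."""
    soup = BeautifulSoup(markup, "html.parser")
    # an attrs dict works for any attribute, including hyphenated data-* attributes
    day_div = soup.find("div", {"data-eventday": day})
    if day_div is None:  # wrong day string, or the page layout changed
        return []
    artists = []
    for stage in day_div.find_all("div", {"class": "stage"}):
        heading = stage.find("h4")
        if heading is None or stage_name not in heading.get_text():
            continue
        artists.extend(li.get_text(strip=True) for li in stage.find_all("li"))
    return artists


print(stage_artists(html, "Friday 21 July 2023", "Mainstage"))    # ['Artist A', 'Artist B']
print(stage_artists(html, "Saturday 22 July 2023", "Mainstage"))  # []
```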


Extra: find the Spotify links of the artists, using a Python package or an API.
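When a dict is passed as `params`, `requests` URL-encodes the query string itself, which matters for artist names with spaces or accents. A sketch that only builds the request with `requests.Request(...).prepare()` and never sends it; the endpoint and header names are the ones from the lesson's RapidAPI call, and the key is a placeholder.

```python
import requests

# Endpoint and header names from the lesson's RapidAPI call; the key is a placeholder
api_url = "https://spotify-scraper.p.rapidapi.com/v1/track/download/soundcloud"
headers = {
    "X-RapidAPI-Key": "<your own key>",
    "X-RapidAPI-Host": "spotify-scraper.p.rapidapi.com",
}

encoded_urls = []
for name in ["Amelie Lens", "Tiësto"]:
    # build (but do not send) the request that requests.get(..., params=...) would make
    prepared = requests.Request("GET", api_url, headers=headers, params={"track": name}).prepare()
    encoded_urls.append(prepared.url)
    print(prepared.url)

# ...?track=Amelie+Lens   (space encoded as +)
# ...?track=Ti%C3%ABsto   (ë percent-encoded as UTF-8)
```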

In [51]:
import requests
from bs4 import BeautifulSoup

url = "https://www.tomorrowland.com/en/festival/line-up/stages/friday-21-july-2023"

response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# <div class="eventday" data-eventday-id="132" style="display:none;" data-eventday="Friday 21 July 2023">

# the line-up of one day sits in a <div> identified by its data-eventday attribute
div = soup.find("div", {"data-eventday": "Friday 21 July 2023"})

stages = div.find_all("div", {"class": "stage"})

api_url = "https://spotify-scraper.p.rapidapi.com/v1/track/download/soundcloud"
headers = {
    "X-RapidAPI-Key": "<your own key>",
    "X-RapidAPI-Host": "spotify-scraper.p.rapidapi.com",
}

for stage in stages:
    # the stage name is in the <h4> of the first <div> inside the stage
    if "Mainstage" in stage.div.h4.get_text():
        for li in stage.find_all("li"):
            artist = li.get_text().strip()
            print(artist)

            # search the API with the artist name as the track query
            api_response = requests.get(api_url, headers=headers, params={"track": artist}, timeout=10)
            data = api_response.json()
            # print only the artist whose name matches the line-up entry (case-insensitive)
            for item in (data.get("spotifyTrack") or {}).get("artists") or []:
                if item["name"].strip().lower() == artist.lower().strip():
                    print(item["shareUrl"])




Amelie Lens
https://open.spotify.com/artist/5Ho1vKl1Uz8bJlk4vbmvmf
Anfisa Letyago
https://open.spotify.com/artist/7icoOm5fKKPo49jVxoj1Cq
Daybreak session: Claptone
FAST BOY
https://open.spotify.com/artist/56Qz2XwGj7FxnNKrfkWjnb
Henri PFR
https://open.spotify.com/artist/6n9XmMc3mX18mrTHYOCPIq
MATTN
Mc Stretch
https://open.spotify.com/artist/6oIpax63yT9ajyekkcqv0L
Steve Angello
https://open.spotify.com/artist/4FqPRilb0Ja0TKG3RS3y4s
Sunnery James & Ryan Marciano
https://open.spotify.com/artist/7kABWMhjA5GIl9PBEasBPt
The Chainsmokers
https://open.spotify.com/artist/69GGBxA162lTqCwzJG5jLp
Tiësto
https://open.spotify.com/artist/2o5jDhtHVPhrJdv3cEQ99Z
Vini Vici
https://open.spotify.com/artist/29zsVzEH33dD5QqxeL8dvy
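Some line-up entries got no link. The output does not show why; one plausible cause for "Daybreak session: Claptone" is the prefix, which can never equal an artist name. A sketch of more forgiving matching, where `clean_lineup_name` and `find_artist_url` are hypothetical helpers and the response shape is assumed from the keys the lesson code reads (`spotifyTrack`, `artists`, `name`, `shareUrl`):

```python
import unicodedata


def normalise(name):
    """Unicode-normalised, case-insensitive form of a name."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


def clean_lineup_name(entry):
    """Hypothetical helper: drop a 'Label: ' style prefix such as 'Daybreak session: '."""
    return entry.split(":", 1)[-1].strip()


def find_artist_url(payload, artist):
    """Hypothetical helper: shareUrl of the matching artist, or None.

    Assumed payload shape: {"spotifyTrack": {"artists": [{"name": ..., "shareUrl": ...}]}}
    """
    wanted = normalise(artist)
    artists = (payload.get("spotifyTrack") or {}).get("artists") or []
    for item in artists:
        if normalise(item.get("name", "")) == wanted:
            return item.get("shareUrl")
    return None


# Made-up response; the URL is a placeholder, not a real Spotify link
fake_payload = {"spotifyTrack": {"artists": [
    {"name": "Claptone", "shareUrl": "https://open.spotify.com/artist/PLACEHOLDER"},
]}}

entry = "Daybreak session: Claptone"
print(find_artist_url(fake_payload, entry))                     # None
print(find_artist_url(fake_payload, clean_lineup_name(entry)))  # https://open.spotify.com/artist/PLACEHOLDER
print(find_artist_url({}, "MATTN"))                             # None
print(normalise("Tie\u0308sto") == normalise("TIËSTO"))         # True
```

NFKC normalisation plus `casefold()` makes "Tiësto" compare equal whether the accent is stored as one character or as "e" plus a combining mark, a small step beyond the `.strip().lower()` comparison in the lesson. Splitting on the first colon would also cut real names that contain one, so it is only a heuristic.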
