## Example 1: extracting a paragraph from a Wikipedia page

The following function parses Erik Durm's Wikipedia page into a soup object and extracts the `<p>` element that starts with "In the 2013–14 Bundesliga season...". The function returns the Beautiful Soup `Tag` object itself, not just its text.

In [291]:
%load_ext autoreload
%autoreload 2
%matplotlib inline


In [292]:
import datetime
import json
import sys
import random
import requests
import scipy

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import pandas as pd

from bs4 import BeautifulSoup
from IPython.display import HTML, display, Image
from pathlib import Path

In [295]:
def extract_bundesliga_p_tag(wiki_path):
    '''
    Extracts the <p> tag from Erik Durm's Wikipedia page that starts with
    "In the 2013–14 Bundesliga season".

    Parameters:
    -----------
    wiki_path : (str)
        Path to Erik Durm's Wikipedia page in the Data folder.

    Returns:
    --------
    p_tag : (bs4.element.Tag or None)
        Beautiful Soup Tag object containing the paragraph in question,
        or None if no paragraph matches.
    '''

    # read the HTML file and parse it with Beautiful Soup
    with open(wiki_path, "r", encoding="utf-8") as wiki_file:
        soup = BeautifulSoup(wiki_file.read(), 'lxml')

    line = 'In the 2013–14 Bundesliga season'  # target text (note the en dash)
    p_tag = None
    for paragraph in soup.find_all('p'):
        # keep the first paragraph whose text starts with the target line
        if paragraph.text.strip().startswith(line):
            p_tag = paragraph
            break
    return p_tag


# call the function
wiki_path = "./Data/web_scraping_erik_durm_wiki.html"
p_tag = extract_bundesliga_p_tag(wiki_path)

print(p_tag.text)
print(type(p_tag))

In the 2013–14 Bundesliga season, Durm was inducted into Borussia Dortmund's first team and, on 10 August 2013, debuted for BVB in the Bundesliga; coming on in the 87th minute for Robert Lewandowski as a substitute in BVB's 4–0 win over FC Augsburg.[11][12] Durm debuted in the UEFA Champions League on 1 October 2013 in a 3–0 victory over French club Olympique Marseille.
<class 'bs4.element.Tag'>
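
The same search can be tried without the saved Wikipedia file. The following is a minimal sketch, not the notebook's own code: `SAMPLE_HTML` and `find_paragraph_starting_with` are hypothetical names introduced for illustration, and the built-in `html.parser` is used instead of `lxml` so that no extra parser is needed. It shows that matching on the start of the text skips a paragraph that merely contains the phrase, and that `None` comes back when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the saved Wikipedia page.
SAMPLE_HTML = """
<html><body>
<p>Erik Durm is a German footballer.</p>
<p>In the 2013–14 Bundesliga season, Durm joined the first team.<sup>[11]</sup></p>
<p>Note: In the 2013–14 Bundesliga season is mentioned again here.</p>
</body></html>
"""


def find_paragraph_starting_with(html, prefix):
    """Return the first <p> Tag whose text starts with `prefix`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for paragraph in soup.find_all("p"):
        # get_text() concatenates the text of nested tags such as <sup>
        if paragraph.get_text().strip().startswith(prefix):
            return paragraph
    return None


match = find_paragraph_starting_with(SAMPLE_HTML, "In the 2013–14 Bundesliga season")
missing = find_paragraph_starting_with(SAMPLE_HTML, "No such paragraph")

print(type(match).__name__)
print(missing)
```

Because the function can return `None`, a caller should check the result before reading `.text` from it.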


## Example 2: extracting a song title and lyrics

The following function parses the webpage ([Pink Floyd Lyrics]) containing the lyrics to Pink Floyd's Wish You Were Here into a soup object and returns a list with the song's title and lyrics. The lyrics are formatted as a list of strings, one string per line of the song.


In [300]:
def extract_pink_floyd_lyrics(lyrics_path):
    '''
    Extracts the title and lyrics of Pink Floyd's Wish You Were Here song.

    Parameters:
    -----------
    lyrics_path : (str)
        Path to the lyrics page in the Data folder.

    Returns:
    --------
    lyrics_info : (list)
        List where the first element is the song's title (string)
        and second element is a list of strings with the lyrics
    '''
    
    lyrics_info = []
    # read the HTML file and convert it to a soup object
    with open(lyrics_path, "r", encoding="utf-8") as lyrics_file:
        soup = BeautifulSoup(lyrics_file.read(), 'lxml')

    # find the song name: the text after the last '-' in the page <title>
    title = soup.find('title')
    print(type(title))
    song_name = title.text.split('-')[-1].strip()
    print(song_name)
    lyrics_info.append(song_name)

    # find the lyrics: the <div> that follows the "ringtone" <div>
    div = soup.find('div', class_="ringtone")
    lyrics_tag = div.find_next_sibling('div')
    lyrics_list = lyrics_tag.text.strip().split('\n')
    # drop the empty strings left by blank lines
    lyrics_list = [lyric for lyric in lyrics_list if lyric != ""]
    lyrics_info.append(lyrics_list)
    return lyrics_info

lyrics_path =  './Data/pink_floyd_wish.html'
extract_pink_floyd_lyrics(lyrics_path)

<class 'bs4.element.Tag'>
Wish You Were Here


['Wish You Were Here',
 ['So, so you think you can tell Heaven from Hell, blue skies from pain.',
  'Can you tell a green field from a cold steel rail?',
  'A smile from a veil?',
  'Do you think you can tell?',
  'Did they get you to trade your heroes for ghosts?',
  'Hot ashes for trees?',
  'Hot air for a cool breeze?',
  'Cold comfort for change?',
  'Did you exchange a walk on part in the war for a lead role in a cage?',
  'How I wish, how I wish you were here.',
  "We're just two lost souls swimming in a fish bowl, year after year,",
  'Running over the same old ground.',
  'What have we found?',
  'The same old fears.',
  'Wish you were here.']]
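
The sibling lookup used above can be seen in isolation on a small document. This is only a sketch under stated assumptions: `SAMPLE_HTML` is a made-up page that imitates the layout the function relies on (a `<div class="ringtone">` immediately followed by the lyrics `<div>`), the artist and song names are placeholders, and `html.parser` is used in place of `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical page mimicking the assumed layout of the lyrics site.
SAMPLE_HTML = """
<html><head><title>Some Artist - Some Song</title></head>
<body>
<div class="ringtone">ad banner</div>
<div>
First line<br>

Second line<br>
</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The song name is taken as the text after the last '-' in the <title>.
title = soup.find("title").get_text().split("-")[-1].strip()

# find_next_sibling skips the whitespace text node between the two <div>s
# and returns the next <div> element at the same nesting level.
anchor = soup.find("div", class_="ringtone")
lyrics_div = anchor.find_next_sibling("div")

# Split the text into lines and drop the blank ones.
lines = [line.strip() for line in lyrics_div.get_text().splitlines() if line.strip()]

print(title)
print(lines)
```

Splitting the title on `'-'` assumes the song name itself contains no hyphen; a title such as "Artist - Song-Name" would be cut short.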