# Less than 7 minutes of Web Scraping using Python and Beautiful Soup

**Scraping La Liga Stats using Python and Beautiful Soup**

Web scraping, the practice of extracting data from websites programmatically, can save plenty of time and effort, and Beautiful Soup makes the job considerably smoother. In this article, we will scrape La Liga 2019–20 player stats from a website using Python.

## Importing Libraries

In [1]:
# Importing Libraries
import pandas as pd
import requests as rq
from bs4 import BeautifulSoup
print('Done')

Done


First things first: import the libraries we need. pandas stores the stats in a data frame, Requests sends the HTTP request to the web server, and Beautiful Soup parses the returned HTML so we can pull out individual elements.

### Communicating with the webpage

In [2]:
# Download the stats page
response = rq.get("https://www.msn.com/en-us/sports/soccer/la-liga/player-stats")

# Raw HTML of the response
html = response.text

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

We fetch the target URL with Requests and then parse the returned HTML with Beautiful Soup. Once the soup object is created, you are good to go.
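The cell above does not check whether the request actually succeeded. As a minimal sketch, the fetch and the parse can be wrapped in two small helpers so that an HTTP error raises immediately instead of an error page being parsed silently. The helper names `fetch_html` and `make_soup`, the 10-second timeout, and the tiny inline page are illustrative assumptions, not part of the original notebook.

```python
import requests as rq
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors.

    Hypothetical helper; the timeout value is an arbitrary choice.
    """
    response = rq.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def make_soup(html):
    """Parse an HTML string with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration on a hand-written page, so no network is needed.
sample_html = "<html><head><title>Player Stats</title></head><body></body></html>"
sample_soup = make_soup(sample_html)
print(sample_soup.title.text)
```

With these helpers, the soup for the stats page could be built as `make_soup(fetch_html(url))`, where `url` is the address used in the cell above.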

### Extracting web elements

In [3]:
# Cells identified by their CSS class
rank = [i.text for i in soup.find_all('td', {"class" : "hide1 rankvalue"})]

player_name = [i.text for i in soup.find_all('td', {"class" : "playername"})]

team_name = [i.text for i in soup.find_all('td', {"class" : "teamname"})]

team_la = [i.text for i in soup.find_all('td', {"class" : "teamtla"})]

# Numeric columns are read by position within each player row
rows = soup.find_all('tr', {"class" : "rowlink"})

games_played = [int(row.find_all('td')[4].text) for row in rows]

goals_scored = [int(row.find_all('td')[7].text) for row in rows]

assists = [int(row.find_all('td')[8].text) for row in rows]

Beautiful Soup's search methods give you access to the page's elements and their contents. I have mostly used `find_all` (the current name for the older `findAll` alias), passing the tag name and its “class” attribute to get a list of every matching element.
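Building each column from a separate search works as long as every search returns the same number of items in the same order. A possible alternative, sketched below, walks the table one row at a time so that all values for a player come from the same `tr`. The sample markup and the `parse_rows` helper are hypothetical: they only mirror the class names and cell positions used in the cell above, and the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the rows the scraping code expects:
# classed cells first, then plain cells read by position (4, 7 and 8).
SAMPLE_HTML = """
<table>
  <tr class="rowlink">
    <td class="hide1 rankvalue">1</td>
    <td class="playername">Player A</td>
    <td class="teamname">Team X</td>
    <td class="teamtla">TMX</td>
    <td>33</td><td>30</td><td>3</td><td>25</td><td>21</td>
  </tr>
  <tr class="rowlink">
    <td class="hide1 rankvalue">2</td>
    <td class="playername">Player B</td>
    <td class="teamname">Team Y</td>
    <td class="teamtla">TMY</td>
    <td>37</td><td>35</td><td>2</td><td>21</td><td>8</td>
  </tr>
</table>
"""


def cell_text(row, css_class):
    """Return the stripped text of the first <td> in `row` with this class."""
    return row.find("td", class_=css_class).get_text(strip=True)


def parse_rows(html):
    """Return one dict per player row, keeping each player's values together."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr", class_="rowlink"):
        cells = row.find_all("td")
        records.append({
            "Rank": cell_text(row, "rankvalue"),
            "Player Name": cell_text(row, "playername"),
            "Team Name": cell_text(row, "teamname"),
            "Team": cell_text(row, "teamtla"),
            "Games Played": int(cells[4].get_text(strip=True)),
            "Goals": int(cells[7].get_text(strip=True)),
            "Assists": int(cells[8].get_text(strip=True)),
        })
    return records


records = parse_rows(SAMPLE_HTML)
print(records[0])
```

A list of dicts like `records` can be passed straight to `pd.DataFrame`, which is one reason to prefer this shape over parallel lists.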

### Storing in a DataFrame

In [4]:
laliga_stats = pd.DataFrame({
    "Rank": rank,
    "Player Name": player_name,
    "Team Name": team_name,
    "Team": team_la,
    "Games Played": games_played,
    "Goals": goals_scored,
    "Assists": assists,
})
laliga_stats.set_index('Rank', inplace=True)

Finally, store the data in a data frame using the pandas library, and you are ready to start your analysis.
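`pd.DataFrame` raises a `ValueError` when the lists passed to it have different lengths, which can happen if one of the searches misses a cell. The sketch below checks the lengths first so the error message names the offending column, then converts the scraped rank strings to integers. The data is the first three rows of the table shown in this article, entered by hand. The variable names `columns`, `lengths`, and `stats` are illustrative.

```python
import pandas as pd

# First three rows of the article's result table, used as stand-in data.
# Rank is kept as text here because scraped cell text arrives as strings.
columns = {
    "Rank": ["1", "2", "3"],
    "Player Name": [
        "Lionel Andrés Messi Cuccittini",
        "Karim Benzema",
        "Gerard Moreno Balaguero",
    ],
    "Team": ["BAR", "RMA", "VIL"],
    "Games Played": [33, 37, 35],
    "Goals": [25, 21, 18],
    "Assists": [21, 8, 5],
}

# Fail early, with a readable message, if any column is short or long.
lengths = {name: len(values) for name, values in columns.items()}
if len(set(lengths.values())) != 1:
    raise ValueError(f"Column lengths differ: {lengths}")

stats = pd.DataFrame(columns)
stats["Rank"] = stats["Rank"].astype(int)  # "1" -> 1, so ranks sort numerically
stats = stats.set_index("Rank")
print(stats)
```

Converting `Rank` to an integer is a design choice: with string ranks, `"10"` would sort before `"2"`.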

Now you can get the top 10 goalscorers of the 2019–20 La Liga season with a single line of code:

In [5]:
laliga_stats.head(10)

| Rank | Player Name | Team Name | Team | Games Played | Goals | Assists |
|---|---|---|---|---|---|---|
| 1 | Lionel Andrés Messi Cuccittini | Barcelona | BAR | 33 | 25 | 21 |
| 2 | Karim Benzema | Real Madrid | RMA | 37 | 21 | 8 |
| 3 | Gerard Moreno Balaguero | Villarreal | VIL | 35 | 18 | 5 |
| 4 | Luis Suárez | Barcelona | BAR | 28 | 16 | 8 |
| 5 | Raúl García | Athletic Club | ATH | 35 | 15 | 1 |
| 6 | Iago Aspas | Celta de Vigo | CEL | 37 | 14 | 3 |
| 7 | Lucas Ariel Ocampos | Sevilla | SFC | 33 | 14 | 3 |
| 8 | Ante Budimir | Mallorca | MLL | 35 | 13 | 2 |
| 9 | Álvaro Borja Morata Martín | Atlético de Madrid | ATM | 34 | 12 | 2 |
| 10 | Sergio Ramos García | Real Madrid | RMA | 35 | 11 | 0 |
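Once the stats are in a data frame, further questions take only a line or two each. As a short sketch, the block below re-enters the ten rows from the table above by hand (so it runs without scraping) and computes goals per game, total goals per team among these ten players, and the player with the most assists. The names `top10`, `goals_by_team`, and `best_assister` are illustrative.

```python
import pandas as pd

# The ten rows from the table above, typed in so the example is self-contained.
top10 = pd.DataFrame(
    [
        (1, "Lionel Andrés Messi Cuccittini", "BAR", 33, 25, 21),
        (2, "Karim Benzema", "RMA", 37, 21, 8),
        (3, "Gerard Moreno Balaguero", "VIL", 35, 18, 5),
        (4, "Luis Suárez", "BAR", 28, 16, 8),
        (5, "Raúl García", "ATH", 35, 15, 1),
        (6, "Iago Aspas", "CEL", 37, 14, 3),
        (7, "Lucas Ariel Ocampos", "SFC", 33, 14, 3),
        (8, "Ante Budimir", "MLL", 35, 13, 2),
        (9, "Álvaro Borja Morata Martín", "ATM", 34, 12, 2),
        (10, "Sergio Ramos García", "RMA", 35, 11, 0),
    ],
    columns=["Rank", "Player Name", "Team", "Games Played", "Goals", "Assists"],
).set_index("Rank")

# Scoring rate: goals divided by games played, rounded to two decimals.
top10["Goals per Game"] = (top10["Goals"] / top10["Games Played"]).round(2)

# Goals contributed per team, counting only the players in this top 10.
goals_by_team = top10.groupby("Team")["Goals"].sum().sort_values(ascending=False)

# Index label (rank) of the player with the most assists.
best_assister = top10["Assists"].idxmax()

print(top10[["Player Name", "Goals per Game"]])
print(goals_by_team)
print(top10.loc[best_assister, "Player Name"])
```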


All it takes to scrape a website is Python and Beautiful Soup. So open Beautiful Soup’s documentation in a new tab and start scraping.

# Thanks for Reading!