# IKEA item checker

This notebook checks whether a product is available on the IKEA website.

*Most of the following steps are based on the book Web Scraping with Python, by Ryan Mitchell*.

In [9]:
# Import libraries
from urllib.request import urlopen
from urllib.error import HTTPError, URLError # Exceptions caught when fetching the page
from bs4 import BeautifulSoup
import time
import pandas as pd
import re
import numpy as np

## 1. Get website information

In [10]:
url = 'https://www.ikea.com/mx/es/p/poaeng-sillon-chapa-abedul-hillared-beige-s09306567/'

We fetch the page with `urlopen(url)`; the HTML could also be obtained with `requests.get(url).content` from the `requests` library. The call is wrapped in `try` to handle possible exceptions.

In [49]:
try:
    html = urlopen(url)
    print('Status code: ', html.getcode())

except HTTPError as e: # The server returned an error status (e.g. 404)
    print(e)

except URLError as e: # The server could not be reached
    print('The server could not be found!')

Status code:  200
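
The order of the `except` clauses above matters: in the standard library, `HTTPError` is a subclass of `URLError`, so the more specific handler has to come first. The sketch below illustrates this offline. The helper `describe_fetch_error` is hypothetical (it is not part of the original notebook), and the exceptions are built by hand for an example URL instead of being raised by a real request.

```python
from urllib.error import HTTPError, URLError


def describe_fetch_error(exc):
    """Hypothetical helper: turn a urllib exception into a short message."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(exc, HTTPError):
        return f'HTTP error {exc.code}: {exc.reason}'
    if isinstance(exc, URLError):
        return f'Server could not be reached: {exc.reason}'
    return f'Unexpected error: {exc!r}'


# Build the exceptions by hand so the example runs without network access.
not_found = HTTPError('https://example.com/missing', 404, 'Not Found', None, None)
unreachable = URLError('Name or service not known')

print(issubclass(HTTPError, URLError))
print(describe_fetch_error(not_found))
print(describe_fetch_error(unreachable))
```

If the two checks were swapped, the `URLError` branch would also catch every `HTTPError`, and the status code would never be reported.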


Let's parse the HTML into a Beautiful Soup object, which is much easier to navigate than the raw text.

In [50]:
bs = BeautifulSoup(html, 'lxml') # 'lxml' is faster than the built-in 'html.parser' (requires the lxml package)
                                 # 'html5lib' is slower but the most lenient with broken HTML

In [56]:
print(bs.prettify())

<!DOCTYPE html>
<html dir="ltr" lang="es-MX">
 <head>
  <meta charset="utf-8"/>
  <title>
   POÄNG Sillón - chapa abedul/Hillared beige - IKEA
  </title>
  <meta content="IKEA - POÄNG, Sillón, chapa abedul/Hillared beige, El respaldo alto proporciona un buen apoyo al cuello. Con nuestro surtido variado de cojines podrás renovar fácilmente el aspecto de POÄNG y de la habitación. Si lo combinas con el taburete POÄNG, el sillón será aún más cómodo." name="description"/>
  <meta content="IKEA - POÄNG, Sillón, chapa abedul/Hillared beige, El respaldo alto proporciona un buen apoyo al cuello. Con nuestro surtido variado de cojines podrás renovar fácilmente el aspecto de POÄNG y de la habitación. Si lo combinas con el taburete POÄNG, el sillón será aún más cómodo." property="og:description"/>
  <meta content="POÄNG, Sillón" name="keywords"/>
  <meta content="index, follow" name="robots"/>
  <meta content="POÄNG Sillón - chapa abedul/Hillared beige - IKEA" property="og:title"/>
  <meta content

Now let's get the **name of the article**, which is stored in the page's `<h1>` tag. The following code handles the case where the tag is missing, for example if the page layout changes in the future.

In [59]:
try:
    article = bs.body.h1.get_text()

except AttributeError as e: # <body> or <h1> is missing, so the lookup returned None
    print('Tag was not found')

else:
    print(article)

POÄNGSillón, chapa abedul/Hillared beige
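
The printed name runs two words together ("POÄNGSillón"), which suggests that the `<h1>` holds several child elements whose text `get_text()` joins with no separator. Below is a minimal sketch of one way to handle this, assuming a simplified, made-up snippet of markup (the real IKEA page may be structured differently). The helper `extract_heading` is hypothetical, and the built-in `'html.parser'` is used only so that the example needs no extra parser.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: the real IKEA page may differ.
sample_html = """
<html><body>
  <h1><span>POÄNG</span><span>Sillón, chapa abedul/Hillared beige</span></h1>
</body></html>
"""


def extract_heading(soup):
    """Hypothetical helper: return the <h1> text, or None if the tag is missing."""
    heading = soup.body.h1 if soup.body is not None else None
    if heading is None:
        return None
    # Join the text of nested tags with a space and trim surrounding whitespace.
    return heading.get_text(' ', strip=True)


soup = BeautifulSoup(sample_html, 'html.parser')
print(soup.body.h1.get_text())   # text of nested tags is concatenated directly
print(extract_heading(soup))     # nested text is joined with a space

empty_soup = BeautifulSoup('<html><body><p>No heading</p></body></html>', 'html.parser')
print(extract_heading(empty_soup))  # a missing <h1> yields None
```

Checking for `None` explicitly is an alternative to catching `AttributeError`; either approach works, and the explicit check makes the missing-tag case visible in the return value.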


Now we can save it all as a function:

In [60]:
def get_article_name(url):
    """Returns the name of an IKEA product, or None if it cannot be retrieved"""

    try:
        html = urlopen(url)
    except HTTPError as e: # The server returned an error status (e.g. 404)
        print(e)
        return None
    except URLError as e: # The server could not be reached
        print('The server could not be found!')
        return None

    try:
        bs = BeautifulSoup(html, 'lxml')
        article = bs.body.h1.get_text()
    except AttributeError as e: # <body> or <h1> is missing
        print('Tag was not found')
        return None

    return article

In [61]:
get_article_name(url)

'POÄNGSillón, chapa abedul/Hillared beige'

In [None]:
def get_article_availability(url):
    """Returns the availability of an IKEA product (not implemented yet)"""
    raise NotImplementedError