# 7. Web Scraping

Web scraping is the practice of gathering data through any means other than a program interacting with an API (or, obviously, through a human using a web browser). This is most commonly accomplished by writing an automated program that queries a web server, requests data (usually in the form of the HTML and other files that comprise web pages), and then parses that data to extract needed information.

## 7.1 Selenium
Selenium automates browsers. That's it! <br>
Selenium is a browser-automation tool with a Python library. It can drive a web browser through many tasks; one of them is web scraping, that is, extracting data that may not be available through other means. <br>
**For this course, we use Chrome.**

### 7.1 Installing Libraries
We need to install these two libraries:

In [None]:
!pip install selenium
!pip install webdriver-manager

### 7.2 Calling Libraries

In [2]:
# this library lets us control the browser
from selenium import webdriver

# webdriver-manager downloads the driver version that matches the installed browser
# here we use it for ChromeDriver
from webdriver_manager.chrome import ChromeDriverManager

### 7.3 Launch Driver
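This code opens a Chrome window controlled by ChromeDriver. We are going to use it to navigate the web. Recent Selenium 4 releases no longer accept the driver path as the first argument; there it is passed through `Service` from `selenium.webdriver.chrome.service`, as in `webdriver.Chrome(service=Service(ChromeDriverManager().install()))`.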

In [24]:
driver = webdriver.Chrome( ChromeDriverManager().install() )



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Driver [C:\Users\Anzony\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache


In [25]:
type( driver )

selenium.webdriver.chrome.webdriver.WebDriver

`driver` is a `selenium.webdriver.chrome.webdriver.WebDriver` object. This object has methods and attributes that help us navigate the web.

In [26]:
# we open a page using the link
driver.get( "https://www.convocatoriascas.com/" )

The browser window controlled by the driver now shows [this page](https://www.convocatoriascas.com/).

**Best Practices before working**

1. Maximize the browser

In [27]:
driver.maximize_window()

2. Set the Browser Zoom Level to 100 percent

In [33]:
driver.execute_script("document.body.style.zoom='100%'")

### 7.4. Identifying elements in a web page

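To identify elements of a webpage, we need to inspect the webpage. Open the browser window controlled by the driver and press `Ctrl` + `Shift` + `I`. The tables below list the locator methods used in this chapter. They belong to the older Selenium API: they were deprecated in Selenium 4.0 and removed in 4.3, where the equivalent calls are `driver.find_element(By.ID, ...)` and `driver.find_elements(By.ID, ...)`, with `By` imported from `selenium.webdriver.common.by`.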

#### One Element
|Method|Description|
|---|---|
|find_element_by_id| Use id.|
|find_element_by_name| Use name.|
|find_element_by_xpath| Use Xpath.|
|find_element_by_tag_name| Use HTML tag.|
|find_element_by_class_name| Use class name.|
|find_element_by_css_selector| Use css selector.|

#### Multiple elements
|Method|Description|
|---|---|
|find_elements_by_id| Use id.|
|find_elements_by_name| Use name.|
|find_elements_by_xpath| Use Xpath.|
|find_elements_by_tag_name| Use HTML tag.|
|find_elements_by_class_name| Use class name.|
|find_elements_by_css_selector| Use css selector.|

### 7.4.1. XPath
XPath (XML Path Language) is a syntax for locating nodes in an XML or HTML document by describing the path to them through the document tree. In Selenium, an XPath expression can be used to find any element on a web page.

The basic format of an XPath expression is shown in the screenshot below.
<img src="../_images/x_path.png">
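
To make the path syntax concrete, the sketch below evaluates a positional path against a small, made-up document using Python's built-in `xml.etree.ElementTree`. This is only an illustration, not what Selenium does internally: ElementTree implements a limited subset of XPath and does not accept absolute paths on an element, so the path is written relative to the root `<html>` element. The markup, link texts and `href` values are assumptions, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a real page
page = """
<html>
  <body>
    <div><p>Header</p></div>
    <div>
      <ul>
        <li><a href="/contadores">CONTADORES</a></li>
        <li><a href="/economistas">ECONOMISTAS</a></li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page)  # root is the <html> element

# Full XPath as a browser would report it: /html/body/div[2]/ul/li[2]/a
# ElementTree needs the same steps written relative to the root element
link = root.find("./body/div[2]/ul/li[2]/a")

print(link.text)         # ECONOMISTAS
print(link.get("href"))  # /economistas
```

Each step names a tag, and `[2]` picks the second sibling with that tag, which is why a full XPath breaks as soon as the page layout changes.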

**DO NOT COMPLICATE!**
Finding the XPath of an element:
1. Go to the element
2. Right click
3. Select Inspect (you may have to do it twice)
4. Go to the highlighted line in the inspector
5. Right click
6. Select Copy
7. Select Copy full XPath

**Example**

We are going to select the `Economistas` option and click on it. Use `find_element_by_xpath` to locate it and `click()` to click.

In [40]:
economistas_option = driver.find_element_by_xpath( '/html/body/div[2]/div/ul[7]/li[2]/a' )

In [None]:
# See the object
print(economistas_option)

In [45]:
# get the text
print( economistas_option.text )

ECONOMISTAS


In [46]:
# make a click
economistas_option.click()

### 7.4.1. HTML
HTML stands for HyperText Markup Language. It is the language used to create web pages. It is not a programming language like Python or Java, but a markup language: it describes the elements of a page through tags written between angle brackets.

1. The document begins and ends with `<html>` and `</html>`.
2. `<body></body>` contains the visible part of the HTML document.
3. The `<h1>` to `<h6>` tags define headings.

#### 7.4.1.1. HTML Headings
HTML headings are defined with the `<h1>` to `<h6>` tags.
`<h1>` defines the most important heading. `<h6>` defines the least important heading.

Markdown cells render HTML tags, so we can try them directly in a text cell.

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>

#### 7.4.1.2. HTML Paragraphs
HTML paragraphs are defined with the `<p>` tag.
The `<br>` tag inserts a line break, similar to `"\n"` in a Python string.

<br>
<p>My first paragraph.</p> <br>
<p>This is another paragraph for this text cell.</p>

#### 7.4.1.3. HTML Links
HTML links are defined with the `<a>` tag:

<a href="http://bayes.cs.ucla.edu/jp_home.html">This is a link for Judea Pearl Website</a>
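
When scraping, what usually matters about a link is its `href` attribute. The sketch below collects every `href` from a fragment of HTML using the standard library's `html.parser`; the class name `LinkCollector` and the sample fragment are made up for this illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Store the href of every <a> tag that is fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

collector = LinkCollector()
collector.feed('<p>See <a href="http://bayes.cs.ucla.edu/jp_home.html">this page</a>.</p>')
print(collector.links)  # ['http://bayes.cs.ucla.edu/jp_home.html']
```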

#### 7.4.1.3. Unordered HTML List
An unordered list starts with the `<ul>` tag. Each list item starts with the `<li>` tag.

<ul>
  <li>Coffee</li>
  <li>Tea</li>
  <li>Milk</li>
</ul>

#### 7.4.1.4. Ordered HTML List
An ordered list starts with the `<ol>` tag. Each list item starts with the `<li>` tag.

<ol>
  <li>Coffee</li>
  <li>Tea</li>
  <li>Milk</li>
</ol>

#### 7.4.1.4. HTML Tables

A table in HTML consists of table cells arranged in rows and columns. Each table row starts with a `<tr>` tag and ends with a `</tr>` tag. Inside a row, each data cell is enclosed in `<td>` and `</td>`, and each header cell in `<th>` and `</th>`.

<table>
  <tr>
    <th>Manager</th>
    <th>Club</th>
    <th>Nationality</th>
  </tr>
  <tr>
    <td>Mikel Arteta</td>
    <td>Arsenal</td>
    <td>Spain</td>
  </tr>
  <tr>
    <td>Thomas Tuchel</td>
    <td>Chelsea</td>
    <td>Germany</td>
  </tr>
</table>
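
Because the row and cell tags are so regular, a table is easy to turn into a list of rows. The sketch below does this with the standard library's `html.parser`, assuming a simple table without nested tables or merged cells; the class name `TableParser` is made up for this illustration.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # text only counts when it sits inside a cell
        if self._in_cell:
            self._row[-1] += data.strip()

html = """
<table>
  <tr><th>Manager</th><th>Club</th><th>Nationality</th></tr>
  <tr><td>Mikel Arteta</td><td>Arsenal</td><td>Spain</td></tr>
  <tr><td>Thomas Tuchel</td><td>Chelsea</td><td>Germany</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
header, *body = parser.rows
print(header)  # ['Manager', 'Club', 'Nationality']
print(body)
```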

#### 7.4.1.5. HTML Iframes

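An HTML iframe is used to display a web page within a web page. Elements inside an iframe belong to a separate document, so Selenium has to switch into the frame with `driver.switch_to.frame(...)` before it can locate them.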


<!DOCTYPE html>
<html>
  
<head>
    <title>HTML iframe src Attribute</title>
</head>
  
<body style="text-align: center">
    <h1>Diploma</h1>
    <h2>HTML iframe</h2>
    <iframe>
          
        <!DOCTYPE html>
        <html>

        <head>
            <title>New html</title>
        </head>

        <body style="text-align: center">
            <h1>Diploma2</h1>
            <h2>HTML iframe</h2>
            <iframe>

            </iframe>
        </body>

        </html>
    </iframe>
</body>
  
</html>

#### 7.4.1.6. HTML Tags - Key

|Tag|Description|
|---|---|
|`<h1>` to `<h6>`|Define HTML headings|
|`<ul>`|Defines an unordered list|
|`<ol>`|Defines an ordered list|
|`<p>`|Defines a paragraph|
|`<a>`|The anchor tag; creates a hyperlink|
|`<div>`|Defines a division or section within an HTML document|
|`<strong>`|Marks important text|
|`<table>`|Presents data in tabular form|
|`<td>`|Defines a table cell that contains data|
|`<iframe>`|Defines an inline frame|

**Recommendation** <br>
We do not recommend locating elements by tag name as a first step: most web pages nest many tags of the same type, so a tag name alone rarely identifies a single element. However, it works well for finding elements inside another element that has already been located. Let's see an example.

Select the first block using its XPath.

In [47]:
block1 = driver.find_element_by_xpath( "/html/body/section/section[1]/article[1]/div[2]" )

From this block, we select the `h2` tag and all the `p` tags using `find_element_by_tag_name` and `find_elements_by_tag_name`. <br>
We start with the only `h2` inside the already selected `block1`.

In [51]:
# select tag
h2_tag = block1.find_element_by_tag_name( 'h2' )

In [52]:
# print text
h2_tag.text

'CONVOCATORIA DEVIDA [SERVICIO CIVIL]: 1 Plaza - Economía, Administración, Derecho, Sociología, Antropología, Otros'

In [55]:
# select all p tags in block1
p_tags = block1.find_elements_by_tag_name( 'p' )
len( p_tags )
# There are three `p` tags in block1

3

In [57]:
# store all `p` tags
p_tags_text = []
for p in p_tags:
    p_tags_text.append( p.text )

In [58]:
p_tags_text

['Entidad: DEVIDA',
 'Departamentos: Cusco',
 'Remuneración: S/. 12,214.29 Soles']

## 7.5 Cleaning Results - Regex
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. We suggest using [this link](https://regex101.com/r/yzMkTg/1) to test your patterns. It can help you get the right regex pattern.

|Method|Description|
|---|---|
|findall|Returns a list containing all matches|
|search|Returns a Match object if there is a match anywhere in the string|
|split|Returns a list where the string has been split at each match|
|sub|Replaces one or many matches with a string|
|match|Returns a Match object only if the pattern matches at the beginning of the string|


|Character|Description|
|---|---|
|`[]`|A set of characters|
|`\`|Signals a special sequence (can also be used to escape special characters)|
|`.`|Any character (except newline character)|
|`^`|Starts with|
|`$`|Ends with|
|`*`|Zero or more occurrences|
|`+`|One or more occurrences|
|`?`|Zero or one occurrence|
|`{}`|Exactly the specified number of occurrences|
|`\|`|Either or|
|`()`|Capture and group|
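
As a quick tour of the five functions in the first table, the sketch below applies each of them to one string. The string has the same shape as the salary line scraped earlier; the patterns are chosen only for illustration.

```python
import re

text = "Remuneración: S/. 12,214.29 Soles"

# findall: list with every run of digits
digits = re.findall(r"[0-9]+", text)

# search: first match anywhere in the string (None if there is no match)
last_word = re.search(r"[A-Za-z]+$", text).group(0)

# match: succeeds only if the pattern matches at the start of the string
start_ok = re.match(r"Remuneración", text) is not None
start_fail = re.match(r"Soles", text) is None

# split: cut the string at each match of the pattern
parts = re.split(r":\s*", text)

# sub: replace each match with another string
no_commas = re.sub(r",", "", text)

print(digits)                # ['12', '214', '29']
print(last_word)             # Soles
print(start_ok, start_fail)  # True True
print(parts)                 # ['Remuneración', 'S/. 12,214.29 Soles']
print(no_commas)             # Remuneración: S/. 12214.29 Soles
```

Note the difference between `search` and `match`: `Soles` is in the string, but `match` still returns `None` because the string does not start with it.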

**Examples**

We want to get text before and after `:`.

In [71]:
import re

In [95]:
string1 = p_tags_text[0]

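**Text after the colon (lookbehind)**

In [96]:
# (?<=:) is a lookbehind: match what follows a ":" without including the ":"
ahead = re.search( "(?<=:).+", string1 )

In [99]:
# the matched text after ":" (note the leading space)
ahead.group(0)

' DEVIDA'

**Text before the colon (lookahead)**

In [100]:
# (?=:) is a lookahead: match what precedes a ":" without including the ":"
behind = re.search( ".+(?=:)", string1 )

In [102]:
behind.group(0)

'Entidad'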

**Split**

In [103]:
re.split( ":", string1 )

['Entidad', ' DEVIDA']

**Get all numbers**

In [104]:
string2 = p_tags_text[ -1 ]
string2

'Remuneración: S/. 12,214.29 Soles'

In [117]:
# we get all numbers, but not in the right form
re.findall( "[0-9]+", string2 )

['12', '214', '29']

**Getting salary with decimals**

In [122]:
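# raw string so that \d and \. reach the regex engine unchanged;
# joining the pieces drops the comma used as thousands separator
salary = "".join( re.findall( r"(\d+(?:\.\d+)?)" , string2 ) )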
salary

'12214.29'
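
Joining all matches works for a single amount, but some posts on the page give a salary range or use a decimal comma (for example `entre s/. 3658 y s/. 7557,32 soles`). The sketch below is one possible way to get every amount as a number. The function name `parse_amounts` and the separator rules are assumptions: a comma followed by exactly three digits is read as a thousands separator, any other comma as a decimal mark, and a dot always as a decimal mark.

```python
import re

def parse_amounts(text):
    """Return every monetary amount found in `text` as a float."""
    amounts = []
    # a number token starts and ends with a digit and may contain , or .
    for token in re.findall(r"\d[\d.,]*\d|\d", text):
        if "," in token and "." in token:
            # assumed format 12,214.29: comma = thousands, dot = decimals
            token = token.replace(",", "")
        elif "," in token:
            whole, _, tail = token.rpartition(",")
            if len(tail) == 3:
                # assumed thousands separator, e.g. 7,000
                token = whole.replace(",", "") + tail
            else:
                # assumed decimal comma, e.g. 7557,32
                token = whole.replace(",", "") + "." + tail
        amounts.append(float(token))
    return amounts

print(parse_amounts("Remuneración: S/. 12,214.29 Soles"))   # [12214.29]
print(parse_amounts("entre s/. 3658 y s/. 7557,32 soles"))  # [3658.0, 7557.32]
print(parse_amounts("no especifica"))                       # []
```

Returning a list keeps single amounts, ranges and missing salaries in one format, so the minimum and maximum can be taken later.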

**Cleaning strings**

In [186]:
# unidecode is a third-party package (pip install Unidecode)
import unidecode
string_spanish = "Ñañúüá"
unaccented_string = unidecode.unidecode(string_spanish)
unaccented_string

'Nanuua'
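
If installing an extra package is not an option, a similar result for accented Latin letters can be obtained with the standard library. The sketch below is an alternative to `unidecode`, not a replacement for it: it only removes combining accent marks and does not transliterate other scripts. The function name `strip_accents` is made up for this illustration.

```python
import unicodedata

def strip_accents(text):
    """Remove accent marks by decomposing characters and dropping the marks."""
    # NFKD splits "ñ" into "n" plus a combining tilde
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() is non-zero for accent marks, zero for base characters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Ñañúüá"))  # Nanuua
```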

## 7.6 Practice
We are going to get all job posts for economists.

### 7.6.1. First Option - Using Loop

In [150]:
from selenium.webdriver.chrome.options import Options

option = Options()
option.add_argument("--disable-infobars")

# open driver (`options` replaces the deprecated `chrome_options` argument)
driver = webdriver.Chrome( ChromeDriverManager().install(), options=option )

# we open our page
driver.get("https://www.convocatoriascas.com/")

# good practices
driver.maximize_window()
driver.execute_script("document.body.style.zoom='100%'")

# click on the economists option
driver.find_element_by_xpath('//*[@id="superior_cont"]/ul[7]/li[2]/a').click()



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Driver [C:\Users\Anzony\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache


**Dealing with advertisements**

Sometimes an advertisement inside an iframe covers the page. It does not always appear, so we wrap the steps that close it in a `try` block.

In [154]:
try:
    # locate the advertisement iframe by its id
    iframe = driver.find_element_by_id('aswift_5')
    # switch the driver's context into the iframe
    driver.switch_to.frame(iframe)
    # click on the close button
    driver.find_element_by_id( "dismiss-button" ).click()
except Exception:
    print("No advertisement")
finally:
    # always switch back to the main document
    driver.switch_to.default_content()

No advertisement


In [152]:
# click on economists
driver.find_element_by_xpath('//*[@id="superior_cont"]/ul[7]/li[2]/a').click()

In [155]:
# get the first job post
first_post = driver.find_element_by_xpath( "/html/body/section/section[1]/article[1]/div[2]" )

In [159]:
# get post information
p_tags = first_post.find_elements_by_tag_name( 'p' )


In [167]:
#Storing information
string1 =  p_tags[0].text
string1

'Entidad: DEVIDA'

In [166]:
# Getting behind :
behind = re.search( ".+(?=:)", string1 )
behind.group( 0 )

'Entidad'

In [175]:
# Getting ahead :
ahead = re.search( "(?<=:).+", string1 )
clean_ahead = ahead.group( 0 ).lstrip()
clean_ahead

'DEVIDA'

In [189]:
# Loop over all p storing in dictionary
post_info = {}
for p in p_tags:
    string1 = p.text
    
    # Getting behind :
    behind = re.search( ".+(?=:)", string1 )
    key = behind.group( 0 ).lower()
    key = unidecode.unidecode( key )
    
    # Getting ahead :
    ahead = re.search( "(?<=:).+", string1 )
    clean_ahead = ahead.group( 0 ).lstrip().lower()
    value = clean_ahead
    value = unidecode.unidecode( value )
    
    # clean salary
    
    # storing in dictionary
    post_info[ key ] = value
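
One weak point of the loop above is that `re.search` returns `None` when a line has no colon, so calling `.group(0)` on it raises an error. The sketch below shows one way to guard against that, assuming every useful line has the form `key: value` and splitting at the first colon. The function name `parse_fields` and the sample lines are made up; lowercasing is kept, while accent removal is left out to keep the example free of third-party imports.

```python
import re

# key = everything before the first colon, value = everything after it
FIELD = re.compile(r"([^:]+):(.*)")

def parse_fields(lines):
    """Turn 'Key: value' strings into a dict, skipping lines without a colon."""
    fields = {}
    for line in lines:
        found = FIELD.fullmatch(line)
        if found is None:
            continue  # no "key: value" structure in this line
        key, value = found.groups()
        fields[key.strip().lower()] = value.strip().lower()
    return fields

sample = [
    "Entidad: DEVIDA",
    "Departamentos: Cusco",
    "Sin separador",  # hypothetical line without a colon
    "Remuneración: S/. 12,214.29 Soles",
]
print(parse_fields(sample))
```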

**Loop over all job posts**

Each job post follows the same XPath pattern; only the index of `article` changes.

`/html/body/section/section[1]/article[3]/div[2]` <br>
`/html/body/section/section[1]/article[4]/div[2]` <br>
`/html/body/section/section[1]/article[5]/div[2]`

We loop over all articles. Since we do not know in advance how many articles the page has, we use a `while` loop that stops when the next article cannot be found.

In [191]:
# index of the article to read; the paths above start at article[3],
# so article[1] and article[2] are not visited by this loop
i = 3

# list to store all job post information
job_publication = []

# we do not know how many articles there are,
# so we loop until an article cannot be found
while True:
    try:
        # get job post
        job_post = driver.find_element_by_xpath( f"/html/body/section/section[1]/article[{i}]/div[2]" )

        # get post information
        p_tags = job_post.find_elements_by_tag_name( 'p' )

        # loop over all p tags, storing them in a dictionary
        post_info = {}
        for p in p_tags:
            string1 = p.text

            # text before the colon becomes the key
            behind = re.search( ".+(?=:)", string1 )
            key = behind.group( 0 ).lower()
            key = unidecode.unidecode( key )

            # text after the colon becomes the value
            ahead = re.search( "(?<=:).+", string1 )
            clean_ahead = ahead.group( 0 ).lstrip().lower()
            value = clean_ahead
            value = unidecode.unidecode( value )

            # storing in dictionary
            post_info[ key ] = value

        # store job post
        job_publication.append( post_info )
    except Exception:
        # no more articles (or a post could not be parsed): stop the loop
        break

    # go to the next article
    i = i + 1
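
The pattern used here, "keep asking for the next item until the lookup fails", can be tried without a browser. In the sketch below a dictionary stands in for the page and `KeyError` plays the role of the exception that Selenium raises when an element is missing (`NoSuchElementException`). The names `fake_articles` and `get_article` are hypothetical.

```python
from itertools import count

# Hypothetical stand-in for the page: article index -> text of the post
fake_articles = {3: "Entidad: APN", 4: "Entidad: OSCE", 5: "Entidad: ONP"}

def get_article(index):
    """Stand-in for the element lookup; raises KeyError if the article is missing."""
    return fake_articles[index]

posts = []
for i in count(start=3):   # 3, 4, 5, ... with no upper bound
    try:
        article = get_article(i)
    except KeyError:
        break              # first missing index: no more articles
    posts.append(article)

print(posts)  # ['Entidad: APN', 'Entidad: OSCE', 'Entidad: ONP']
```

Catching only the exception that signals "not found" means that an unrelated error, such as a parsing bug, is reported instead of silently ending the loop.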

In [192]:
job_publication

[{'entidad': 'autoridad portuaria(apn)',
  'departamentos': 'callao',
  'remuneracion': 's/. 7000 soles'},
 {'entidad': 'osce',
  'departamentos': 'lima',
  'remuneracion': 's/. 5500 soles'},
 {'entidad': 'onp',
  'departamentos': 'arequipa, ica, junin, la libertad, lambayeque, lima, piura',
  'remuneracion': 'entre s/. 1,885.71 y s/. 7,778.57 soles'},
 {'entidad': 'universidad nacional de frontera de sullana',
  'departamentos': 'piura',
  'remuneracion': 'entre s/. 3658 y s/. 7557,32 soles'},
 {'entidad': 'unsaac',
  'departamentos': 'cusco',
  'remuneracion': 'no especifica'},
 {'entidad': 'onp',
  'departamentos': 'lima, pasco',
  'remuneracion': 'entre s/. 3,407.14 y s/. 10,517.86 soles'},
 {'entidad': 'defensoria del pueblo',
  'departamentos': 'lima',
  'remuneracion': 's/. 7000 soles'},
 {'entidad': 'essalud',
  'departamentos': 'ancash, arequipa, ayacucho, cajamarca, lima, moquegua, pasco',
  'remuneracion': 'entre s/. 2276 y s/. 6240 soles'},
 {'entidad': 'superintendencia me

In [196]:
# checking all posts have the same information
# look at the keys of all dictionaries
expected_keys = {'departamentos', 'entidad', 'remuneracion'}

In [197]:
# compare with information collected
total_posts = len( job_publication )
for i in range( total_posts ):
    
    # dictionary with the information of this post
    dict_info = job_publication[ i ]
    
    # get boolean value
    value = expected_keys == set( dict_info.keys() )
    
    # print value
    print( f"Post { i } have the expected keys : { value }")
    

Post 0 have the expected keys : True
Post 1 have the expected keys : True
Post 2 have the expected keys : True
Post 3 have the expected keys : True
Post 4 have the expected keys : True
Post 5 have the expected keys : True
Post 6 have the expected keys : True
Post 7 have the expected keys : True
Post 8 have the expected keys : True
Post 9 have the expected keys : True
Post 10 have the expected keys : True
Post 11 have the expected keys : True
Post 12 have the expected keys : True
Post 13 have the expected keys : True
Post 14 have the expected keys : True
Post 15 have the expected keys : True
Post 16 have the expected keys : True
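
With many posts, reading one line per post is tedious. A compact alternative is to collect only the indices of the posts whose keys differ from the expected set. The sample data below is hypothetical and deliberately contains one incomplete post.

```python
expected_keys = {"departamentos", "entidad", "remuneracion"}

# Hypothetical sample; the second post lacks the salary field
posts = [
    {"entidad": "osce", "departamentos": "lima", "remuneracion": "s/. 5500 soles"},
    {"entidad": "onp", "departamentos": "lima"},
]

# indices of posts whose keys differ from the expected set
bad = [i for i, post in enumerate(posts) if set(post) != expected_keys]
all_ok = not bad

print(all_ok, bad)  # False [1]
```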


In [199]:
# we have checked that all posts have the same keys
# store the results in a pd.DataFrame
data_dict = { 
                'Department' : [],
                "Entity" : [] ,
                "Salary" : []
            }

for i in range( total_posts ):
    
    # dictionary with the information of this post
    dict_info = job_publication[ i ]
    
    dpt = dict_info['departamentos']
    entity = dict_info['entidad']
    salary = dict_info['remuneracion']
    
    # append each value to its column
    data_dict[ 'Department' ].append( dpt )
    data_dict[ 'Entity' ].append( entity )
    data_dict[ 'Salary' ].append( salary )

In [207]:
import pandas as pd

job_posts_economists = pd.DataFrame( data_dict )

job_posts_economists

| |Department|Entity|Salary|
|---|---|---|---|
|0|callao|autoridad portuaria(apn)|s/. 7000 soles|
|1|lima|osce|s/. 5500 soles|
|2|arequipa, ica, junin, la libertad, lambayeque,...|onp|entre s/. 1,885.71 y s/. 7,778.57 soles|
|3|piura|universidad nacional de frontera de sullana|entre s/. 3658 y s/. 7557,32 soles|
|4|cusco|unsaac|no especifica|
|5|lima, pasco|onp|entre s/. 3,407.14 y s/. 10,517.86 soles|
|6|lima|defensoria del pueblo|s/. 7000 soles|
|7|ancash, arequipa, ayacucho, cajamarca, lima, m...|essalud|entre s/. 2276 y s/. 6240 soles|
|8|lima|superintendencia mercado valores(smv)|s/. 6342 soles|
|9|ayacucho, huanuco, lima, loreto, moquegua, piu...|sis|entre s/. 2000 y s/. 5400 soles|


In [206]:
# save in an Excel file
job_posts_economists.to_excel( r'../_data/job_posts_result.xlsx' )
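
The loop that fills `data_dict` column by column in this section can also be written as a dictionary comprehension. The sketch below shows the idea on a small hypothetical sample with the same shape as `job_publication`; the column labels are chosen for illustration.

```python
# Hypothetical sample in the same shape as the scraped list of dictionaries
posts = [
    {"entidad": "osce", "departamentos": "lima", "remuneracion": "s/. 5500 soles"},
    {"entidad": "unsaac", "departamentos": "cusco", "remuneracion": "no especifica"},
]

# output column label -> key used in the scraped dictionaries
columns = {"Department": "departamentos", "Entity": "entidad", "Salary": "remuneracion"}

# one list per column, built by reading the same key from every post
data_dict = {
    label: [post[key] for post in posts]
    for label, key in columns.items()
}

print(data_dict)
```

The resulting dictionary of lists has the same layout as the one built by the loop, so it can be passed to `pd.DataFrame` in the same way.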

### 7.6.2. Second Option - Using Article

In [208]:
from selenium.webdriver.chrome.options import Options

option = Options()
option.add_argument("--disable-infobars")

# open driver (`options` replaces the deprecated `chrome_options` argument)
driver = webdriver.Chrome( ChromeDriverManager().install(), options=option )

# we open our page
driver.get("https://www.convocatoriascas.com/")

# good practices
driver.maximize_window()
driver.execute_script("document.body.style.zoom='100%'")

# click on the economists option
driver.find_element_by_xpath('//*[@id="superior_cont"]/ul[7]/li[2]/a').click()



Current google-chrome version is 96.0.4664
Get LATEST chromedriver version for 96.0.4664 google-chrome
Driver [C:\Users\Anzony\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache


**Dealing with advertisements**

Sometimes an advertisement inside an iframe covers the page. It does not always appear, so we wrap the steps that close it in a `try` block.

In [209]:
try:
    # locate the advertisement iframe by its id
    iframe = driver.find_element_by_id('aswift_5')
    # switch the driver's context into the iframe
    driver.switch_to.frame(iframe)
    # click on the close button
    driver.find_element_by_id( "dismiss-button" ).click()
except Exception:
    print("No advertisement")
finally:
    # always switch back to the main document
    driver.switch_to.default_content()

In [210]:
# go to section where posts are located
post_sections = driver.find_element_by_xpath( "/html/body/section/section[1]" )

In [212]:
# find all articles
articles = post_sections.find_elements_by_tag_name( 'article' )

In [214]:
job_publication = []
for art in articles:
    # get post information
    p_tags = art.find_elements_by_tag_name( 'p' )
    
    # Loop over all p storing in dictionary
    post_info = {}
    for p in p_tags:
        string1 = p.text

        # Getting behind :
        behind = re.search( ".+(?=:)", string1 )
        key = behind.group( 0 ).lower()
        key = unidecode.unidecode( key )

        # Getting ahead :
        ahead = re.search( "(?<=:).+", string1 )
        clean_ahead = ahead.group( 0 ).lstrip().lower()
        value = clean_ahead
        value = unidecode.unidecode( value )

        # clean salary

        # storing in dictionary
        post_info[ key ] = value
    
    job_publication.append( post_info )

In [215]:
job_publication

[{'entidad': 'devida',
  'departamentos': 'cusco',
  'remuneracion': 's/. 12,214.29 soles'},
 {'entidad': 'instituto investigaciones amazonia',
  'departamentos': 'loreto',
  'remuneracion': 's/. 2,058.29 soles'},
 {'entidad': 'autoridad portuaria(apn)',
  'departamentos': 'callao',
  'remuneracion': 's/. 7000 soles'},
 {'entidad': 'osce',
  'departamentos': 'lima',
  'remuneracion': 's/. 5500 soles'},
 {'entidad': 'onp',
  'departamentos': 'arequipa, ica, junin, la libertad, lambayeque, lima, piura',
  'remuneracion': 'entre s/. 1,885.71 y s/. 7,778.57 soles'},
 {'entidad': 'universidad nacional de frontera de sullana',
  'departamentos': 'piura',
  'remuneracion': 'entre s/. 3658 y s/. 7557,32 soles'},
 {'entidad': 'unsaac',
  'departamentos': 'cusco',
  'remuneracion': 'no especifica'},
 {'entidad': 'onp',
  'departamentos': 'lima, pasco',
  'remuneracion': 'entre s/. 3,407.14 y s/. 10,517.86 soles'},
 {'entidad': 'defensoria del pueblo',
  'departamentos': 'lima',
  'remuneracion':

In [216]:
# checking all posts have the same information
# look at the keys of all dictionaries
expected_keys = {'departamentos', 'entidad', 'remuneracion'}

In [217]:
# compare with information collected
total_posts = len( job_publication )
for i in range( total_posts ):
    
    # dictionary with the information of this post
    dict_info = job_publication[ i ]
    
    # get boolean value
    value = expected_keys == set( dict_info.keys() )
    
    # print value
    print( f"Post { i } have the expected keys : { value }")
    

Post 0 have the expected keys : True
Post 1 have the expected keys : True
Post 2 have the expected keys : True
Post 3 have the expected keys : True
Post 4 have the expected keys : True
Post 5 have the expected keys : True
Post 6 have the expected keys : True
Post 7 have the expected keys : True
Post 8 have the expected keys : True
Post 9 have the expected keys : True
Post 10 have the expected keys : True
Post 11 have the expected keys : True
Post 12 have the expected keys : True
Post 13 have the expected keys : True
Post 14 have the expected keys : True
Post 15 have the expected keys : True
Post 16 have the expected keys : True
Post 17 have the expected keys : True
Post 18 have the expected keys : True


In [218]:
# we have checked that all posts have the same keys
# store the results in a pd.DataFrame
data_dict = { 
                'Department' : [],
                "Entity" : [] ,
                "Salary" : []
            }

for i in range( total_posts ):
    
    # dictionary with the information of this post
    dict_info = job_publication[ i ]
    
    dpt = dict_info['departamentos']
    entity = dict_info['entidad']
    salary = dict_info['remuneracion']
    
    # append each value to its column
    data_dict[ 'Department' ].append( dpt )
    data_dict[ 'Entity' ].append( entity )
    data_dict[ 'Salary' ].append( salary )

In [219]:
import pandas as pd

job_posts_economists = pd.DataFrame( data_dict )

job_posts_economists

| |Department|Entity|Salary|
|---|---|---|---|
|0|cusco|devida|s/. 12,214.29 soles|
|1|loreto|instituto investigaciones amazonia|s/. 2,058.29 soles|
|2|callao|autoridad portuaria(apn)|s/. 7000 soles|
|3|lima|osce|s/. 5500 soles|
|4|arequipa, ica, junin, la libertad, lambayeque,...|onp|entre s/. 1,885.71 y s/. 7,778.57 soles|
|5|piura|universidad nacional de frontera de sullana|entre s/. 3658 y s/. 7557,32 soles|
|6|cusco|unsaac|no especifica|
|7|lima, pasco|onp|entre s/. 3,407.14 y s/. 10,517.86 soles|
|8|lima|defensoria del pueblo|s/. 7000 soles|
|9|ancash, arequipa, ayacucho, cajamarca, lima, m...|essalud|entre s/. 2276 y s/. 6240 soles|


In [220]:
# save in an Excel file
job_posts_economists.to_excel( r'../_data/job_posts_result.xlsx' )

Reference: https://www.amazon.com/Web-Scraping-Python-Collecting-Modern/dp/1491985577

