Scraping **LinkedIn profiles** is a useful activity, especially for public relations and marketing tasks. With Python you can automate most of this process and spend your time on the profiles that stand out.

**Disclaimer:** this tutorial (boot camp) presents practices that, if applied in some circumstances, might violate the LinkedIn Terms and Conditions. It is the reader’s responsibility to check LinkedIn’s policy before proceeding in any way.

Once again, Web Scraping is a technique used to extract data from websites.

#### **Note**

If you’ve ever copied and pasted information from a website, you’ve performed the same function as any web scraper, only on a microscopic, manual scale.

**Technically web scraping can be performed in two ways:**

**Direct HTTP requests:** best choice for static websites.

**Driving a Web Browser:** best choice for dynamic websites with content loaded asynchronously or inside IFrames (unfortunately not as common as you may think, especially in legacy systems).

### **Direct HTTP Requests**
As you may know, a website is just a rendering of the HTML and CSS code that the web server returns in response to a GET or POST request from your browser. As a result, a simple script can automatically send HTTP requests and parse the response, scraping the content.

**Example 1.1**

```bash
curl -X GET http://www.wikipedia.com
```
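
The same GET request can also be expressed in Python with the standard library. The following is a minimal sketch, not a definitive implementation: the helper names `build_get_request` and `fetch_html`, the `example-scraper/0.1` User-Agent string, and the 10-second timeout are illustrative assumptions, not part of the original tutorial.

```python
import urllib.request


def build_get_request(url, user_agent="example-scraper/0.1"):
    """Build (but do not send) a GET request for the given URL.

    The User-Agent value is a hypothetical placeholder.
    """
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="GET",
    )


def fetch_html(url, timeout=10):
    """Send the GET request and return the response body as text."""
    request = build_get_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("http://www.wikipedia.com")` would then return the page's HTML as a string, ready to be parsed.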

**Example 1.2**

```python
import urllib.request
from bs4 import BeautifulSoup as bs

# Load the HTML code from a URL
page = urllib.request.urlopen("https://docs.python.org/3/library/random.html")
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = bs(page, "html.parser")

# Find all <dt> elements, which hold the function signatures
names = soup.body.find_all('dt')
# Print the matched elements
print(names)
```
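
Printing the matched elements shows the raw tags, markup included. As a rough sketch of how the plain text of each `<dt>` element could be extracted, the example below uses only the standard library's `html.parser` so that it runs without a network connection. The `DtTextCollector` class and the `SAMPLE_HTML` snippet are hypothetical stand-ins, loosely modelled on the structure of a documentation page.

```python
from html.parser import HTMLParser


class DtTextCollector(HTMLParser):
    """Collect the text content of every <dt> element."""

    def __init__(self):
        super().__init__()
        self.entries = []    # text of each completed <dt>
        self._inside = False  # True while parsing inside a <dt>
        self._parts = []     # text fragments of the current <dt>

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._inside = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "dt" and self._inside:
            self._inside = False
            self.entries.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)


# Hypothetical HTML fragment used instead of a live page
SAMPLE_HTML = """
<dl>
  <dt><span class="sig-name">seed</span>(a=None)</dt>
  <dd>Initialize the generator.</dd>
  <dt><span class="sig-name">randint</span>(a, b)</dt>
  <dd>Return a random integer.</dd>
</dl>
"""

collector = DtTextCollector()
collector.feed(SAMPLE_HTML)
print(collector.entries)  # ['seed(a=None)', 'randint(a, b)']
```

With BeautifulSoup, calling `get_text()` on each matched tag gives a comparable result.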

### **Driving a Web Browser**

Sometimes a website loads part of its content asynchronously. This means that the information you want to scrape may not be contained in the first HTTP response, but is loaded only after the page is scrolled (as on LinkedIn or Twitter) or after a button is clicked.

To overcome this barrier, you can use a Web Browser Driver.

A Web Browser Driver lets you run a real web browser, enabling your Python script to emulate user behavior on the page, basically by executing JavaScript code through the browser console.

In this way you can, for example, emulate the click on a button — assuming this is useful to the scraping activity.

```javascript
document.getElementById('buttonID').click()
```

#### **Selenium Web Driver**
Selenium Web Driver is one of the best Web Browser Drivers available for Python. It’s part of Selenium, a portable framework for testing web applications.

#### **Loading the LinkedIn.com home page.**

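```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Create a new instance of Google Chrome.
# Selenium 4 takes the driver path through a Service object; the
# executable_path argument from Selenium 3 was deprecated and later removed.
browser = webdriver.Chrome(service=Service('path/to/chromedriver/executable'))

# Load the page in the browser
browser.get('https://www.linkedin.com/')
```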

#### **Interacting with the page: how to run JavaScript**

If you open a LinkedIn profile page, you will see that scraping the email address requires clicking the ‘Contact info’ link and waiting for a popup to load. If the user has provided an email address, it then becomes visible and can be scraped.

Therefore we must emulate this user interaction with some JavaScript:

```python
import time

# Execute JavaScript code on the webpage: click every link containing 'Contact info'
browser.execute_script(
    "(function(){try{for(i in document.getElementsByTagName('a')){let el = document.getElementsByTagName('a')[i]; "
    "if(el.innerHTML.includes('Contact info')){el.click();}}}catch(e){}})()")

# Wait 5 seconds for the popup to load
time.sleep(5)

# Scrape the email address from the 'Contact info' popup
email = browser.execute_script(
    "return (function(){try{for (i in document.getElementsByClassName('pv-contact-info__contact-type')){ let el = "
    "document.getElementsByClassName('pv-contact-info__contact-type')[i]; if(el.className.includes('ci-email')){ "
    "return el.children[2].children[0].innerText; } }} catch(e){return '';}})()")
```

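The fixed five-second sleep above either wastes time or is too short on a slow connection. As a hedged sketch of one alternative, the snippet below keeps the JavaScript in named constants, passes the link text through `arguments[0]` (Selenium runs the script as a function body and forwards extra `execute_script` arguments to it), and polls for the email until a timeout. The names `CLICK_LINK_JS`, `READ_EMAIL_JS`, `wait_for` and `scrape_email` are hypothetical, and the CSS class names are simply copied from the example above.

```python
import time

# Click every <a> element whose HTML contains the text passed as arguments[0]
CLICK_LINK_JS = """
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
    if (links[i].innerHTML.includes(arguments[0])) { links[i].click(); }
}
"""

# Return the email from the 'Contact info' popup, or '' if it is not there (yet)
READ_EMAIL_JS = """
try {
    var items = document.getElementsByClassName('pv-contact-info__contact-type');
    for (var i = 0; i < items.length; i++) {
        if (items[i].className.includes('ci-email')) {
            return items[i].children[2].children[0].innerText;
        }
    }
} catch (e) {}
return '';
"""


def wait_for(condition, timeout=5.0, poll=0.25):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scrape_email(browser, timeout=5.0):
    """Open the 'Contact info' popup and return the email, or '' if none appears."""
    browser.execute_script(CLICK_LINK_JS, 'Contact info')
    return wait_for(lambda: browser.execute_script(READ_EMAIL_JS), timeout) or ''
```

With the `browser` instance created earlier, `email = scrape_email(browser)` would replace the three steps above. Selenium also ships its own `WebDriverWait` helper, which covers the same polling pattern.
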
### **Worth noting, from Wikipedia:**

**Selenium accepts commands, sent in Selenese, or via a Client API and sends them to a browser. This is implemented through a browser-specific browser driver, which sends commands to a browser and retrieves results.
Selenium WebDriver does not need a special server to execute tests: instead, the WebDriver directly starts a browser instance and controls it.**

[Full Project]()