# Web Scraping

## Top Data Analytics Companies
To improve the effectiveness of their business processes, companies are focusing on collecting and using data. Data analytics companies enable businesses to analyze the data they acquire and put it to use as required. Data analytics services can assist in product development, identifying potential market gaps, improving operational efficiency, etc. [Goodfirms](https://www.goodfirms.co/big-data-analytics/data-analytics)

The aim of this project is to scrape the Goodfirms website for details about the top data analytics companies. These details include reviews, ratings, year founded, location, etc.

### Libraries

In [2]:
import requests as r
from bs4 import BeautifulSoup


### Download and save HTML file

In [3]:
# # URL link
# url = 'https://www.goodfirms.co/big-data-analytics/data-analytics'
# # access website
# html = r.get(url)
# with open('top_da_company.html', mode='wb') as file:
#     file.write(html.content)

In [6]:
with open("top_da_company.html", encoding='utf-8', mode='r') as file:
    bs = BeautifulSoup(file, 'lxml')
    # bs = BeautifulSoup(file, 'html5lib')

### Name Extraction

In [7]:
names = bs.find_all('span', {'itemprop': "name"})
names[:5]

[<span itemprop="name">Home</span>,
 <span itemprop="name">big data analytics</span>,
 <span itemprop="name">
 Data Analytics </span>,
 <span itemprop="name">SPEC INDIA</span>,
 <span itemprop="name">instinctools</span>]

In [8]:
names_lst = []
# Skip the first three matches: they are breadcrumb entries, not company names
for name in names[3:]:
    names_lst.append(name.text)

print(len(names_lst))
# names_lst

51
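
Some scraped strings carry stray newlines and padding, as the `Data Analytics` breadcrumb in the output above shows. The following is a minimal sketch of a whitespace-normalizing helper; `clean_text` is a hypothetical name that does not appear in the notebook, and the sample strings are copied from the output above.

```python
def clean_text(raw: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(raw.split())


# Sample values taken from the `names[:5]` output (tags removed)
sample = ["\n Data Analytics ", "SPEC INDIA", "instinctools"]
cleaned = [clean_text(s) for s in sample]
print(cleaned)
```

In the notebook, the same helper could be applied inside the loop, for example `names_lst.append(clean_text(name.text))`.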


In [12]:
# Company taglines (mottos)
com_motors = bs.find_all('p', {'class': "profile-tagline"})
com_motors[:5]

[<p class="profile-tagline">Enterprise Software, Mobility &amp; BI Solutions</p>,
 <p class="profile-tagline">Delivering the future. Now.</p>,
 <p class="profile-tagline">Blockchain | IoT | Mobility | AI | Big Data</p>,
 <p class="profile-tagline">Discover the world of Big Data with us!</p>,
 <p class="profile-tagline">The Hub Of Data Science Innovation</p>]

In [16]:
motor_lst = []
for motor in com_motors:
    motor_lst.append(motor.text)

print(len(motor_lst))
# motor_lst

51


In [15]:
com_reviews = bs.find_all('span', {'class': "listinv_review_label"})
com_reviews[:5]

[<span class="listinv_review_label">4.8 (26 Reviews)</span>,
 <span class="listinv_review_label">4.8 (8 Reviews)</span>,
 <span class="listinv_review_label">5.0 (32 Reviews)</span>,
 <span class="listinv_review_label">4.7 (5 Reviews)</span>,
 <span class="listinv_review_label">5.0 (5 Reviews)</span>]

In [18]:
review_lst = []
for review in com_reviews:
    review_lst.append(review.text)
    
print(len(review_lst))
# review_lst

51
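
Each review label combines a rating and a review count in one string, such as `4.8 (26 Reviews)`. The sketch below shows one possible way to split the two into numbers. It assumes every label follows the `rating (N Reviews)` pattern seen in the output above; `parse_review_label` and `REVIEW_RE` are hypothetical names, and accepting the singular `Review` is an assumption rather than something observed in the data.

```python
import re

# Assumed label format: "<rating> (<count> Reviews)", e.g. "4.8 (26 Reviews)"
REVIEW_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\((\d+)\s+Reviews?\)\s*$")


def parse_review_label(label):
    """Return (rating, review_count), or None if the label does not match."""
    match = REVIEW_RE.match(label)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


# Sample labels copied from the `com_reviews[:5]` output
labels = ["4.8 (26 Reviews)", "5.0 (32 Reviews)", "4.7 (5 Reviews)"]
parsed = [parse_review_label(label) for label in labels]
print(parsed)
```

Returning `None` for unexpected labels, rather than raising, keeps the result list aligned with `review_lst` so that rows can still be matched up by position.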


In [9]:
progress_value = bs.find_all('div', {'class': "circle-progress-value"})
progress_value[:5]

[<div class="circle-progress-value">20%</div>,
 <div class="circle-progress-value">15%</div>,
 <div class="circle-progress-value">5%</div>,
 <div class="circle-progress-value">10%</div>,
 <div class="circle-progress-value">15%</div>]

In [10]:
service_pct = []
platform_pct = []
# Values are split by position: even indices go to service_pct,
# odd indices go to platform_pct
for i, percent in enumerate(progress_value):
    if i % 2 == 0:
        service_pct.append(percent.text)
    else:
        platform_pct.append(percent.text)

print(len(service_pct))
print(len(platform_pct))
# service_pct
# platform_pct

51
51
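
The percentages are still strings such as `20%`, which cannot be summed or compared numerically. The sketch below shows one way to convert them to integers and to perform the same even/odd split with slicing. It assumes every value is a whole number followed by `%`, as in the output above; `pct_to_int` is a hypothetical helper name.

```python
def pct_to_int(text: str) -> int:
    """Convert a string like '20%' to the integer 20."""
    return int(text.strip().rstrip("%"))


# Sample values copied from the `progress_value[:5]` output (tags removed)
values = ["20%", "15%", "5%", "10%", "15%"]

# Even positions and odd positions, mirroring the loop in the cell above
service = [pct_to_int(v) for v in values[0::2]]
platform = [pct_to_int(v) for v in values[1::2]]
print(service, platform)
```

Slicing with a step of 2 gives the same split as the index check in the loop, provided the two kinds of values strictly alternate on the page.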


In [6]:
c = bs.find_all('div', {'class': "firm-pricing"})


In [8]:
d = bs.find_all('div', {'class': "firm-employees"})

In [10]:
e = bs.find_all('div', {'class': "firm-founded"})


In [11]:
f = bs.find_all('div', {'class': "firm-location"})
f

[<div class="firm-location">
 India, United States </div>,
 <div class="firm-location">
 United States, Germany </div>,
 <div class="firm-location">
 United States, India </div>,
 <div class="firm-location">
 United States, Australia </div>,
 <div class="firm-location">
 United States, India </div>,
 <div class="firm-location">
 India, United States </div>,
 <div class="firm-location">
 United States </div>,
 <div class="firm-location">
 United States, India </div>,
 <div class="firm-location">
 Germany </div>,
 <div class="firm-location">
 Ukraine, Estonia </div>,
 <div class="firm-location">
 Luxembourg, Poland </div>,
 <div class="firm-location">
 Ukraine, United States </div>,
 <div class="firm-location">
 Portugal, United Kingdom </div>,
 <div class="firm-location">
 Poland, Ukraine </div>,
 <div class="firm-location">
 India </div>,
 <div class="firm-location">
 Israel </div>,
 <div class="firm-location">
 India, United States </div>,
 <div class="firm-location">
 United States <