# Web-scraping-project: Scraping Customer Reviews



## Customer Reviews

Customer reviews are the impressions and opinions that customers share about a product. By monitoring them, almost any business can track its reputation, run analyses, and build trust with its customers. Reviews are also a direct source of feedback for companies that supply goods and services, helping them identify areas for improvement. For an accurate overview of customer sentiment, however, you need comprehensive data, and new reviews are posted all the time. That is where **web scraping** comes in.

## What is Web Scraping?

Web scraping is a technique for extracting large amounts of data from websites so that the data can be analyzed later. By aggregating many reviews, potentially from several third-party customer review websites, you can build a database that helps you serve your entire customer base better.



**In this notebook, I will scrape reviews of a medical alert system (Medical Guardian) from the ConsumerAffairs website using Python.**

### Steps to Follow
- Install the libraries the project needs: requests, BeautifulSoup4, and pandas.
- Download the web page using the requests library.
- Inspect the HTML source code of the web page.
- Extract the relevant parts of the page using Beautiful Soup.
- Convert the extracted parts into a CSV file.
- Inspect the CSV file using the pandas library.

In [1]:
pip install requests beautifulsoup4 pandas --upgrade --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [3]:
# URL of the reviews page to scrape (page 3 of the Medical Guardian reviews).
mg="https://www.consumeraffairs.com/medical-alert-systems/medical-guardian.html?page=3#scroll_to_reviews=true"
b=requests.get(mg)

In [4]:
# Checking if request was successful.
print(b.status_code)

200
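
Checking the status code by hand works for a single page. As a minimal sketch, the same check could be wrapped in a small helper that fails loudly on error responses. The name `fetch_html`, the 10-second default timeout, and the injectable `get` parameter are assumptions introduced for illustration; they are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its HTML text, raising on HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be exercised without network access (an illustrative choice).
    """
    # A timeout keeps the call from hanging if the server never answers.
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

With a helper like this, `page_contents = fetch_html(mg)` would stand in for the separate request and status check.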


In [5]:
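page_contents=b.text
# Write with an explicit encoding so non-ASCII characters in reviews are saved reliably.
with open('medical guardian reviews.html', 'w', encoding='utf-8') as file:
    file.write(page_contents)

In [6]:
# Read the saved page back with the same encoding and parse it.
with open('medical guardian reviews.html', 'r', encoding='utf-8') as f:
    html_source=f.read()
doc=BeautifulSoup(html_source,'html.parser')
type(doc)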

bs4.BeautifulSoup

In [8]:
div_tags=doc.find_all('div',class_='rvw-aut__inf')
len(div_tags)

30

There are 30 reviews on this page, so the length of div_tags is 30. div_tags contains information such as the author's name, rating, and review.

## Extracting the names of reviewers

In [9]:
name=doc.find_all('strong', class_='rvw-aut__inf-nm')
names=[]
for tag in name:
    author=tag.find('span').text
    names.append(author)
names    

['Donna of Richmond, IN',
 'David of Forked River, NJ',
 'Margaret of Franklin Township, NJ',
 'Leslie of Naples, FL',
 'James of Pontiac, IL',
 'Jeff of Bradenton, FL',
 'Judith of Turnersville, DE',
 'Shirley of Westminster, MD',
 'Betty of East Pittsburgh, PA',
 'P. of Ny, NY',
 'J. of Va, VA',
 'Jean of Naperville, IL',
 'Nancy of Sylvania, OH',
 'Ronald of Sarasota, FL',
 'Merle of Highland Park, IL',
 'Janis of Citrus Springs, FL',
 'R. of Mo, MO',
 'Tabitha of Waynesville, NC',
 'Gwen of Pigeon, MI',
 'Lisa of Pegram, TN',
 'Janice of Indianapolis, IN',
 'Doris of Rockaway Township, NJ',
 'Becky of Osseo, MN',
 'Lonnie of Grafton, OH',
 'L. of Fl, FL',
 'C. of Ca, CA',
 'E. of Ny, NY',
 'G. of Me, ME',
 'Gisele of Leominster, MA',
 'A. of Fl, FL']
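
Each entry follows the pattern "Name of City, ST", so the name and location can be separated for later analysis. The following is a minimal sketch under that assumption; `split_reviewer` is a hypothetical helper name, and entries that do not fit the pattern come back with empty fields rather than raising an error.

```python
def split_reviewer(label):
    """Split 'Name of City, ST' into a (name, city, state) tuple."""
    # partition() splits on the first ' of ', so the rest is the location.
    name, _, location = label.partition(" of ")
    # rpartition() splits on the last ', ', which precedes the state code.
    city, _, state = location.rpartition(", ")
    return name.strip(), city.strip(), state.strip()


sample_names = [
    "Donna of Richmond, IN",
    "Margaret of Franklin Township, NJ",
    "P. of Ny, NY",
]
parsed_names = [split_reviewer(label) for label in sample_names]
print(parsed_names)
```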

## Extracting Reviews
Let's extract the reviews from the div tags.
div_tags2 contains all the review blocks.

In [11]:
div_tags2=doc.find_all('div', class_='rvw-bd')
reviews=[]
for tag in div_tags2:
    review=tag.find('p').text
    reviews.append(review)
reviews    

["I've had several falls so my son got me a medical alert device. He looked through several medical alert device companies and device on Medical Guardian. I used their necklace unit for a while and then I shifted to their bracelet. The bracelet started to bother me so I went back to the necklace. The necklace is so long though that I cut it a little bit to make it fit. This way, I wouldn't have all that cord hanging down. The unit I have functions in my home until 1,300 feet away. While I would like for it to help me outside of home too, I'm satisfied so far.",
 "My father lives by himself and getting up in age, which was why I got him a medical alert device. I liked Medical Guardian’s rating and signing up with them was simple. My father told me he has tried the button, and his only complaint is that it is large and there's a red flashing light on it.",
 "I have fear of falling. I’ve fallen three times in a row, and each time I had to have a hip replaced. It's been tough times. It was

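A loop that calls `tag.find('p').text` raises an `AttributeError` if any review block lacks a `<p>` tag, because `find` returns `None` when nothing matches. The sketch below shows one defensive way to handle that case. The HTML snippet is hypothetical markup that only reuses the `rvw-bd` class name, and `text_or_none` is an illustrative helper; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the second block has no <p> tag.
sample_html = """
<div class="rvw-bd"><span>Original review: Aug. 31, 2020</span><p>Works well.</p></div>
<div class="rvw-bd"><span>Original review: Aug. 30, 2020</span></div>
"""


def text_or_none(parent, name):
    """Return the stripped text of the first <name> descendant, or None."""
    tag = parent.find(name)
    return tag.get_text(strip=True) if tag is not None else None


sample_doc = BeautifulSoup(sample_html, "html.parser")
# Reading both fields from the same block keeps each date paired with its review.
rows = [
    {"date": text_or_none(div, "span"), "review": text_or_none(div, "p")}
    for div in sample_doc.find_all("div", class_="rvw-bd")
]
print(rows)
```

Building one record per block also means the lists cannot drift out of step when a field is missing.
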
## Extracting the dates of reviews

In [12]:
dr=doc.find_all('div', class_='rvw-bd')
dates=[]
for pos in dr:
    date=pos.find('span').text
    dates.append(date)
dates 

['Original review: Aug. 31, 2020',
 'Original review: Aug. 30, 2020',
 'Original review: Aug. 30, 2020',
 'Original review: Aug. 30, 2020',
 'Original review: Aug. 29, 2020',
 'Original review: Aug. 29, 2020',
 'Original review: Aug. 29, 2020',
 'Original review: Aug. 28, 2020',
 'Original review: Aug. 28, 2020',
 'Original review: Aug. 28, 2020',
 'Original review: Aug. 27, 2020',
 'Original review: Aug. 27, 2020',
 'Original review: Aug. 27, 2020',
 'Original review: Aug. 26, 2020',
 'Original review: Aug. 26, 2020',
 'Original review: Aug. 26, 2020',
 'Original review: Aug. 25, 2020',
 'Original review: Aug. 25, 2020',
 'Original review: Aug. 25, 2020',
 'Original review: Aug. 25, 2020',
 'Original review: Aug. 24, 2020',
 'Original review: Aug. 24, 2020',
 'Original review: Aug. 24, 2020',
 'Original review: Aug. 24, 2020',
 'Original review: Aug. 23, 2020',
 'Original review: Aug. 23, 2020',
 'Original review: Aug. 23, 2020',
 'Original review: Aug. 23, 2020',
 'Original review: A
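
The extracted strings still carry the "Original review: " prefix, which makes sorting or filtering by date awkward. As a rough sketch, they could be converted to `datetime.date` objects. `parse_review_date` is a hypothetical helper, and the pattern assumes every date looks like "Aug. 31, 2020" (a month name with an optional period, then day and year).

```python
import re
from datetime import date

# Month lookup keyed on the first three letters, independent of locale.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}
# Matches e.g. "Aug. 31, 2020": month name, optional period, day, year.
DATE_PATTERN = re.compile(r"([A-Za-z]+)\.?\s+(\d{1,2}),\s+(\d{4})")


def parse_review_date(label):
    """Return a date parsed from a label, or None if no date is found."""
    match = DATE_PATTERN.search(label)
    if match is None:
        return None
    month_name, day, year = match.groups()
    month = MONTHS.get(month_name[:3].lower())
    if month is None:
        return None
    return date(int(year), month, int(day))


sample_dates = ["Original review: Aug. 31, 2020", "Original review: Aug. 23, 2020"]
parsed_dates = [parse_review_date(label) for label in sample_dates]
print(parsed_dates)
```

Matching on the first three letters is a design choice meant to tolerate both abbreviated and full month names; labels that do not match return `None` instead of raising.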

## Converting the extracted data to a pandas DataFrame

In [15]:
REVIEWS = pd.DataFrame({
    "NAMES OF REVIEWERS": names,
    "REVIEWS": reviews,
    "DATE OF REVIEWS": dates,
})
REVIEWS

| | NAMES OF REVIEWERS | REVIEWS | DATE OF REVIEWS |
|---|---|---|---|
| 0 | Donna of Richmond, IN | I've had several falls so my son got me a medi... | Original review: Aug. 31, 2020 |
| 1 | David of Forked River, NJ | My father lives by himself and getting up in a... | Original review: Aug. 30, 2020 |
| 2 | Margaret of Franklin Township, NJ | I have fear of falling. I’ve fallen three time... | Original review: Aug. 30, 2020 |
| 3 | Leslie of Naples, FL | My mother is under hospice care and she is get... | Original review: Aug. 30, 2020 |
| 4 | James of Pontiac, IL | I wear my Medical Guardian around my neck and ... | Original review: Aug. 29, 2020 |
| 5 | Jeff of Bradenton, FL | My mom's Medical Guardian device has been so f... | Original review: Aug. 29, 2020 |
| 6 | Judith of Turnersville, DE | I've fallen down many times. My son purchased ... | Original review: Aug. 29, 2020 |
| 7 | Shirley of Westminster, MD | I decided to get a Medical Guardian because I ... | Original review: Aug. 28, 2020 |
| 8 | Betty of East Pittsburgh, PA | My daughter decided to get me a medical alert ... | Original review: Aug. 28, 2020 |
| 9 | P. of Ny, NY | The customer service of Medical Guardian has b... | Original review: Aug. 28, 2020 |
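
The steps at the top mention converting the data to a CSV file and inspecting it with pandas, but no cell shows that step. A minimal sketch follows, using a small stand-in DataFrame and an in-memory buffer instead of a real file; with real data, a filename passed to `to_csv` and `read_csv` would take the buffer's place.

```python
import io

import pandas as pd

# Small stand-in for the scraped data, using two rows from the output above.
sample = pd.DataFrame({
    "NAMES OF REVIEWERS": ["Donna of Richmond, IN", "David of Forked River, NJ"],
    "DATE OF REVIEWS": [
        "Original review: Aug. 31, 2020",
        "Original review: Aug. 30, 2020",
    ],
})

buffer = io.StringIO()
# index=False keeps the row index out of the file, so reading it back
# does not produce an extra "Unnamed: 0" column.
sample.to_csv(buffer, index=False)

buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded.head())
```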


## Conclusions

We have successfully extracted the review data and converted it to a DataFrame. The collected data can now be analysed, visualized, and used in other projects.

I hope this helped you understand the basics of web scraping with Python. I would love to hear your feedback!

In [17]:
!pip install jovian --upgrade --quiet

In [18]:
import jovian

In [None]:
# Execute this to save new versions of the notebook
jovian.commit(project="web-scraping-project")
