This notebook is the first part of a project on sentiment analysis of customer reviews.

Objective: Determine the sentiment (positive, neutral, negative) of customer reviews.

Scope: Decide whether you'll focus on reviews from a specific industry, product, or service.

Outcome: Develop a model that can automatically classify the sentiment of reviews.

The first step is to go on amazon.com and find products that have a large number of descriptive reviews. I have chosen Beats wireless headphones. The full URL is: https://www.amazon.co.uk/s?k=beats+headphones+wireless&crid=6JIPHRNVM3HH&sprefix=beats+headp%2Caps%2C80&ref=nb_sb_ss_ts-doa-p_1_11

I open the page, right-click and choose inspect element, which opens the window that gives access to the website's HTML.


In [3]:
#import the relevant libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [4]:
'''
First I take the base URL and search query from the browser's address bar. Then I add the header
variable, which sets the User-Agent and Referer so that Amazon treats the request as coming from a
browser rather than a bot and is less likely to block it.
This helps with web scraping.

'''

base_url='https://www.amazon.com/s?k='
search_query='beats+headphones+wireless'
url=base_url+search_query
header={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0','referer':'https://www.amazon.co.uk/s?k=beats+headphones+wireless&crid=6JIPHRNVM3HH&sprefix=beats+headp%2Caps%2C80&ref=nb_sb_ss_ts-doa-p_1_11'}
response=requests.get(url,headers=header)
response.status_code

200

In [5]:
#the 200 status code means the request succeeded 
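
The search URL above is assembled by hand, with `+` standing in for spaces. As a minimal sketch, assuming the same base URL, the standard library can do that encoding so any plain-text search phrase works. `build_search_url` and `BASE_URL` are hypothetical names, not part of the notebook.

```python
from urllib.parse import quote_plus

BASE_URL = "https://www.amazon.com/s?k="  # same base URL as in the cell above


def build_search_url(phrase, base_url=BASE_URL):
    """Return the search URL for a plain-text phrase (hypothetical helper)."""
    # quote_plus turns spaces into '+' and percent-encodes other special characters
    return base_url + quote_plus(phrase)


print(build_search_url("beats headphones wireless"))
```

For the phrase used in this notebook, this produces the same URL that the cell above builds by string concatenation.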

In [6]:
#response.text

Now we create the functions to retrieve all the reviews

In [7]:
#function to get the search query page
def get_Amazon_page(search_query):
    url= base_url + search_query
    print(url)
    page=requests.get(url,headers=header)
    if page.status_code == 200:
        return page
    else:
        return 'Error'

In [8]:
#function to get the contents of each product page using its unique identification number (ASIN)
#the /dp/ segment of the URL leads to the product detail page
def get_asin(asin):
    url='https://www.amazon.com/dp/'+asin
    print(url)
    page=requests.get(url,headers=header)
    if page.status_code==200:
        return page
    else:
        return 'Error'

In [9]:
#function to extract the content from 'see all reviews' page
def get_reviews(review_link):
    url='https://www.amazon.com'+ review_link
    print(url)
    page= requests.get(url,headers=header)
    if page.status_code == 200:
        return page
    else:
        return 'Error'
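
The three functions above differ only in how the URL is built, and each returns the string 'Error' on failure. The later cells call `.content` on whatever comes back, so a failed request would surface as an AttributeError on a string. A minimal sketch of one shared helper that returns `None` instead is shown here. `fetch_page` and its `client` parameter are hypothetical names, and `client` is assumed to be anything with a requests-style `get` method, such as the `requests` module itself.

```python
def fetch_page(url, client, headers, timeout=10):
    """Return the response for `url`, or None when the status is not 200.

    Hypothetical helper: `client` is any object exposing a requests-style
    get(url, headers=..., timeout=...) method, e.g. the `requests` module.
    """
    page = client.get(url, headers=headers, timeout=timeout)
    if page.status_code == 200:
        return page
    return None


# Usage with the notebook's names (assumes `requests` and `header` exist):
# page = fetch_page('https://www.amazon.com/dp/' + asin, requests, header)
# if page is not None:
#     soup = BeautifulSoup(page.content)
```

Returning `None` lets the caller skip a failed page with a simple `is not None` check rather than comparing against a string.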

Now we begin the extraction of product reviews

In [10]:
#I create an empty list and call my first function. I then use inspect element on a product name on the page,
#find the span element in the HTML and copy its class
product_names=[]
response=get_Amazon_page(search_query)
soup=BeautifulSoup(response.content)
for n in soup.findAll('span',{'class':'a-size-medium a-color-base a-text-normal'}): #this class is shared by all the product names
    product_names.append(n.text) #adds each name to the list

https://www.amazon.com/s?k=beats+headphones+wireless


In [11]:
product_names

['Soundcore by Anker Life Q30 Hybrid Active Noise Cancelling Headphones with Multiple Modes, Hi-Res Sound, Custom EQ via App, 40H Playtime, Comfortable Fit, Bluetooth, Multipoint Connection',
 'ZIHNIC Active Noise Cancelling Headphones, 40H Playtime Wireless Bluetooth Headset with Deep Bass Hi-Fi Stereo Sound,Comfortable Earpads for Travel/Home/Office (Black)',
 'Beats Studio Pro - Wireless Bluetooth Noise Cancelling Headphones - Sandstone (Renewed)',
 'Beats Studio Pro - Wireless Bluetooth Noise Cancelling Headphones - Personalized Spatial Audio, USB-C Lossless Audio, Apple & Android Compatibility, Up to 40 Hours Battery Life - Black',
 'Beats Solo3 Wireless Headphones - Silver (Renewed)',
 'Beats Solo2 On-Ear Headphone Luxe Edition (WIRED, Not Wireless) (Renewed) - Blue',
 'beats by Dr. Dre EP On-Ear Headphones - Black (Renewed)',
 "Beats Solo3 Wireless On-Ear Headphones - Apple W1 Headphone Chip, Class 1 Bluetooth, 40 Hours Of Listening Time - Mickey's 90th Anniversary Edition - Gre

In [12]:
len(product_names)

24

In [13]:
#getting the unique ASINs uses the same function and method
asin_data=[]
response=get_Amazon_page(search_query)
soup=BeautifulSoup(response.content)
for n in soup.findAll('div',{'class':'sg-col-20-of-24 s-result-item s-asin sg-col-0-of-12 sg-col-16-of-20 sg-col s-widget-spacing-small sg-col-12-of-16'}):
    asin_data.append(n['data-asin'])

https://www.amazon.com/s?k=beats+headphones+wireless


In [14]:
asin_data

['B0CGKLFHW4',
 'B0C8PR4W22',
 'B0CQLP2431',
 'B0789YJNPP',
 'B06XH4RF7J',
 'B07HJJ94NJ',
 'B0CL3NY4JM',
 'B08QBCXZXP',
 'B09VWH36DH',
 'B0CBQKBYXR',
 'B0CG5Q83K7',
 'B0BQPNMXQV',
 'B0C6KJ3R71',
 'B0C1QNRGHC',
 'B0CCZ26B5V',
 'B08WM3LMJF']

In [15]:
len(asin_data)

16

Now we pass the ASINs to extract the 'see all reviews' link from each product page

In [16]:
link=[]
for asin in asin_data:
    response=get_asin(asin)
    soup=BeautifulSoup(response.content)
    #a separate loop variable avoids overwriting the outer one
    for a in soup.findAll('a',{'data-hook':'see-all-reviews-link-foot'}):
        link.append(a['href'])

https://www.amazon.com/dp/B0CGKLFHW4
https://www.amazon.com/dp/B0C8PR4W22
https://www.amazon.com/dp/B0CQLP2431
https://www.amazon.com/dp/B0789YJNPP
https://www.amazon.com/dp/B06XH4RF7J
https://www.amazon.com/dp/B07HJJ94NJ
https://www.amazon.com/dp/B0CL3NY4JM
https://www.amazon.com/dp/B08QBCXZXP
https://www.amazon.com/dp/B09VWH36DH
https://www.amazon.com/dp/B0CBQKBYXR
https://www.amazon.com/dp/B0CG5Q83K7
https://www.amazon.com/dp/B0BQPNMXQV
https://www.amazon.com/dp/B0C6KJ3R71
https://www.amazon.com/dp/B0C1QNRGHC
https://www.amazon.com/dp/B0CCZ26B5V
https://www.amazon.com/dp/B08WM3LMJF


In [17]:
link

['/Beats-Studio-Pro-Cancelling-Headphones/product-reviews/B0CGKLFHW4/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Solo2-Headphone-Wireless-Refurbished/product-reviews/B0789YJNPP/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Dr-Dre-Ear-Headphones/product-reviews/B06XH4RF7J/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Dr-Dre-Ear-Headphones/product-reviews/B06XH4RF7J/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Solo3-Wireless-Headphones-Anniversary/product-reviews/B07HJJ94NJ/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews',
 '/Beats-Solo3-Wireless-Headphones-Anniversary/product-reviews/B07HJJ94NJ/ref=c

In [18]:
len(link)

24
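
The list holds 24 links although only 16 ASINs were passed in, and the output above shows the same link more than once (for example the one for B0C8PR4W22). Each repeated link would be fetched again in the next step. A minimal sketch of dropping the repeats while keeping the original order follows, assuming exact string duplicates are the only kind. `unique_in_order` is a hypothetical helper name and the sample values are shortened stand-ins, not real hrefs.

```python
def unique_in_order(items):
    """Return items without repeats, keeping first occurrences in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# Shortened stand-ins for the real hrefs, for illustration only
sample_links = [
    "/product-reviews/B0CGKLFHW4",
    "/product-reviews/B0C8PR4W22",
    "/product-reviews/B0C8PR4W22",
    "/product-reviews/B0789YJNPP",
]
print(unique_in_order(sample_links))
```

With the notebook's names this could be applied as `link = unique_in_order(link)` before the reviews are requested.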

Now that we have the links to the 'see all reviews' pages, we can extract the reviews from them

In [19]:
reviews=[]
for n in range(len(link)):
    for i in range(2):
        response=get_reviews(link[n]+'&pageNumber='+str(i))
        soup=BeautifulSoup(response.content)
        for j in soup.findAll('span',{'data-hook':'review-body'}):
            reviews.append(j.text)

https://www.amazon.com/Beats-Studio-Pro-Cancelling-Headphones/product-reviews/B0CGKLFHW4/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=0
https://www.amazon.com/Beats-Studio-Pro-Cancelling-Headphones/product-reviews/B0CGKLFHW4/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=1
https://www.amazon.com/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=0
https://www.amazon.com/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=1
https://www.amazon.com/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=0
https://www.amazon.com/Beats-Studio-Pro-Personalized-Compatibility/product-reviews/B0C8PR4W22/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=1
ht

In [20]:
len(reviews)

231

In [21]:
rev={'Reviews':reviews}

In [22]:
review_data=pd.DataFrame.from_dict(rev)

In [23]:
pd.set_option('max_colwidth',10000)
review_data.tail(5)

Unnamed: 0,Reviews
226,"\nThe JBL Tune 510BT wireless on-ear headphones have exceeded my expectations, delivering an exceptional audio experience wrapped in a stylish and comfortable design.The first thing that caught my eye was the vibrant blue color – a refreshing departure from the typical headphone hues. The sleek design and medium size make them not only visually appealing but also comfortable for extended wear. The on-ear cups sit snugly without causing discomfort, and the adjustable headband ensures a perfect fit for various head sizes.Sound quality is where the JBL Tune 510BT truly shines. The Purebass technology lives up to its name, delivering deep, rich bass that adds a new dimension to my music and enhances the overall listening experience. The audio is well-balanced, providing crisp highs and clear mids, making it suitable for a variety of genres.The wireless connectivity is seamless, and pairing with my devices was a breeze. The Bluetooth range is impressive, allowing me to move around without any signal loss. The on-ear controls are intuitive and responsive, providing easy access to playback, volume adjustment, and calls.Battery life is another strong suit of these headphones. A single charge lasts for hours, making them ideal for long commutes, workouts, or extended listening sessions. The convenience of a quick charge feature ensures that even if I forget to charge them overnight, a short charging session provides enough power for a substantial listening period.The built-in microphone delivers clear and distortion-free calls, and the ambient noise reduction feature is a nice touch, ensuring crystal-clear communication even in noisy environments.What sets the JBL Tune 510BT apart is the perfect balance it strikes between style, comfort, and high-quality sound. The inclusion of the foldable design adds portability, making them an excellent choice for on-the-go use. 
Whether you're a music enthusiast or someone who appreciates a reliable pair of wireless headphones, the JBL Tune 510BT is a solid investment. It's a testament to JBL's commitment to delivering an outstanding audio experience in a stylish and affordable package. I wholeheartedly recommend these headphones to anyone in search of a top-tier wireless audio solution.\n"
227,"\nMy first impression of these headphones were that they were very lightweight. Not too heavy to wear for long periods of time. The ear muff parts are isolated so they can swivel to be stored or to fit on your head. If they are too tight, they are easy to adjust for a perfect fit. These headphones were very easy to connect to my phone via Bluetooth, and the music sounded great played through what obviously are good quality speakers. They are somewhat noise-cancelling as well.Something that I generally look for in considering headphones is being able to listen to higher volumes without disturbing the people around me. With these headphones, I could listen at my preferred volume without being heard by people seated nearby, which comes in especially handy when I'm on a flight or commuting across town. All in all, great sound - lightweight - adjustable - and a bargain. I'm very happy with my purchase.\n"
228,"\nFantastic purchase. Tried these out because I'm prone to ear infections, so in-ear buds are out of the question. Wish I would've considered these years ago, instead of going with the high dollar galaxy buds/airpods. This is the sound quality you would expect from $200-300 headphones for a fraction of the price. Not in the business of supporting big brands like this with reviews, but truly a quality product. They charge very fast and stay charged for literal hours. They're also very comfortable/flexible on my big head and ears.\n"
229,"\nI bought a set of JBL ear buds so I could work in the shop and listen to some opera as well as classic rock with impunity. “Keeping it to myself” took on a whole new meaning and I tried to work with that dictate, so bought the buds. But they kept falling out, the music was lackluster with little or no bass component, and I just felt they didn’t give me the listening experience I was looking for. It’s hard to tell until you try them. I recognize that listening tastes are personal, so I returned the buds in acknowledgement of their apparent shortcomings. Despite not being a power user I have had other, and always liked JBL products; they usually hit my listening sweet spot, regardless of the genre of music, so as a brand loyal customer I ordered a set of over-ear JBL headphones, the JBL Tune 510 BT. I’m not sure what the model numbers mean, but when I first put them on and listened to them I had a “wow” moment, and said “these are keepers”. They are comfortable, they stay put on my ears when I’m working in the shop, they block out a lot of background noise, and they reveal so much more within the music than I was used to, ever knew existed, or was even hoping for. I can’t say with certainty that the headphones hit the highest highs or the lowest lows, but the overall presentation is complex, colorful and pleasing. They leave me wanting for nothing. I’m entertained now with “active hearing”, as I can discern so much more depth of sound with these headphones. I fully enjoy them such that I even wear them while preparing meals. They would get 10 stars if they also made me a better cook. But that’s just not gonna happen under any circumstance. At least I still have the headphones to fall back on.I don’t know if I’ll ever travel with them, but they fold kinda flat, which seems like a nice feature, although not a dealbreaker for me. My quest started out about the quality of sound, and that’s where it ended when I purchased the JBL Tune 510BT headphones. 
I’m satisfied that I can cross that quest off my to-do list.\n"
230,"\nThese are a great value for what I paid, I got them during a sale and I hope they come down in price again because I would 100% buy these again. They are noise canceling and have a surround sound effect which is awesome. The noise canceling was so good for me that I couldn’t hear my kids, which isn’t necessarily a bad thing when they are fighting. The range is decent too, I am able to cook and clean in peace and able to put my phone down and still listen to music or a show. These are going to be SO nice on the ride down to Florida for vacation next year as it is an 11-13 hour drive. Battery life is decent too. The headphones themselves are comfortable to wear but they are for a more smaller/ narrow head someone who might have a larger head might not be able to wear these comfortably. Maybe something the brand can look into for the future to be more inclusive to clientele.\n"
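
Each review in the output above begins and ends with a newline (`\n`) picked up from the page markup. A minimal sketch of normalising that whitespace before saving is shown here, offered as one option rather than the notebook's method. `clean_review` is a hypothetical helper name.

```python
def clean_review(text):
    """Collapse runs of whitespace and trim both ends (hypothetical helper)."""
    # str.split() with no arguments splits on any whitespace, including newlines
    return " ".join(text.split())


raw = "\nFantastic purchase.  Tried these out.\n"
print(clean_review(raw))
```

With the notebook's names this could be applied as `reviews = [clean_review(r) for r in reviews]` before the DataFrame is built.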


Now that I have all the reviews I need, this part of the project is complete. I can now store the reviews as a CSV file for use in the second part of the project: preparing the data.

In [24]:
review_data.to_csv('Scraped_Reviews.csv',index=False)