# Author: Bakki Akhil

### This project extracts the reviews left by customers who bought the Huami Amazfit GTS smartwatch on the Flipkart website.
### Each review record consists of the customer name, the rating the customer gave the product, the title of the review, and the main review text.
### Only the top 20 customer reviews are considered in this project.
### The data is scraped from the website with the BeautifulSoup package available in Python.
### Flipkart shows only 10 reviews per page, so the reviews are read from several pages by changing the page number in the URL. The "whatismybrowser" website was used to obtain the request header intended to help load the URL as the page number changes.
### After scraping, the four columns of data are stored in a .csv file.
### For sentiment analysis we use the TextBlob library for processing textual data.
### TextBlob returns the polarity and subjectivity of a sentence.
### Polarity lies between -1 and +1, where -1 indicates a negative statement, +1 a positive statement, and 0 a neutral statement.
### Subjectivity quantifies how much personal opinion versus factual information the text contains, on a scale from 0 to 1. Higher subjectivity means the text contains personal opinion rather than factual information.
### Sentiment analysis is performed on the "Main review" text given by the users about the product.
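The polarity scale described above can be turned into coarse labels. The following is a minimal sketch, not part of the original analysis: the function name `label_polarity` and the `neutral_band` threshold of 0.05 are assumptions chosen for illustration, and TextBlob itself only returns the numeric score.

```python
def label_polarity(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    neutral_band is a hypothetical threshold: scores within
    +/- neutral_band of zero are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1 and +1")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Two polarity values taken from the report printed at the end of this
# notebook, plus 0.0 and a made-up negative score for contrast.
for score in (0.235417, 0.708333, 0.0, -0.4):
    print(score, label_polarity(score))
```

With this threshold every review scored in this notebook would be labelled positive, which matches the mostly 5-star ratings.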

In [1]:
from bs4 import BeautifulSoup as bs # Load BeautifulSoup for web scraping

In [2]:
import requests # Used to send HTTP requests and fetch the pages from the website

In [3]:
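# This header was obtained with the help of the "whatismybrowser.com" website, with the aim of fetching several pages of a website.
# Note: the standard HTTP header name is "User-Agent" (hyphen, not underscore), its value is normally the browser's user-agent string
# reported by that site, and the dict only takes effect when it is passed to the request, e.g. requests.get(url, headers=header).
header = {
    "User-Agent" : "https://developers.whatismybrowser.com/useragents/parse/?analyse-my-user-agent=yes"
}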

In [4]:
# Collect the names of the customers who reviewed the product
customer_name = []
for x in range(1,3): # x is the page number; this loads the first two pages of reviews
    # Flipkart review URL; x is substituted at the end to select the page
    page = requests.get(f"https://www.flipkart.com/huami-amazfit-gts-smartwatch/product-reviews/itmd8be178f03412?pid=SMWFNSX922XJAUAY&lid=LSTSMWFNSX922XJAUAYUNCO19&marketplace=FLIPKART&page={x}")
    soup = bs(page.content,"html.parser") # Parse the page content as HTML
    
    names = soup.find_all("p",{"class" : "_2sc7ZR _2V5EHH"}) # Find the user-name elements of the reviews on this page
    
    for i in range(0, len(names)):
        customer_name.append(names[i].get_text()) # Store each user name as plain text
        
len(customer_name)

20
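The `range(1, 3)` in the loop above follows from the 10-reviews-per-page layout mentioned in the introduction: 20 reviews need two pages. A small sketch of that arithmetic, where the helper name `review_page_urls` and the constant names are assumptions introduced only for illustration:

```python
import math

# Review URL used throughout this notebook, without the trailing page parameter
BASE_URL = (
    "https://www.flipkart.com/huami-amazfit-gts-smartwatch/product-reviews/"
    "itmd8be178f03412?pid=SMWFNSX922XJAUAY&lid=LSTSMWFNSX922XJAUAYUNCO19"
    "&marketplace=FLIPKART"
)
REVIEWS_PER_PAGE = 10  # as stated in the introduction


def review_page_urls(n_reviews: int) -> list[str]:
    """Return the URLs of the pages needed to cover n_reviews reviews."""
    pages = math.ceil(n_reviews / REVIEWS_PER_PAGE)
    # Page numbers start at 1, so the last page is `pages` itself
    return [f"{BASE_URL}&page={page}" for page in range(1, pages + 1)]


urls = review_page_urls(20)
for url in urls:
    print(url)
```

Building the URL list once also makes it easier to change the number of reviews later without editing every loop.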

In [5]:
customer_name # List of the 20 customer names

['Aarthi K',
 'Sai Kiran Kumar.KV',
 'Lokesh  Behera ',
 'kavan KB',
 'Rahul Chavan',
 'SAIF  KHAN',
 'Aakash  Kumar',
 'Dr.Md Sami khan',
 'Manish Meena',
 'Salman Shamsi',
 'MAYURESH WAGH',
 'Sanjay Barkade',
 'Aditya Jana',
 'Karun Kittur',
 'Ronny Carassco',
 'Flipkart Customer',
 'SARAVANAN.K KAMALANATHAN',
 'Kaustuv Sarkar',
 'AKHIL SHAIKH',
 'Minhaj Uddin']

In [6]:
# Collect the ratings given by the customers who reviewed the product
rating = []
for x in range(1,3):
    page = requests.get(f"https://www.flipkart.com/huami-amazfit-gts-smartwatch/product-reviews/itmd8be178f03412?pid=SMWFNSX922XJAUAY&lid=LSTSMWFNSX922XJAUAYUNCO19&marketplace=FLIPKART&page={x}")
    soup = bs(page.content,"html.parser")
    
    rate = soup.find_all("div",{"class" : "_3LWZlK _1BLPMq"})
    
    for i in range(0, len(rate)):
        rating.append(rate[i].get_text())
        
len(rating)

20

In [7]:
rating

['5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '3',
 '5',
 '3',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '5',
 '4',
 '5']

In [8]:
# Collect the review titles given by the customers who reviewed the product
review_title = []
for x in range(1,3):
    page = requests.get(f"https://www.flipkart.com/huami-amazfit-gts-smartwatch/product-reviews/itmd8be178f03412?pid=SMWFNSX922XJAUAY&lid=LSTSMWFNSX922XJAUAYUNCO19&marketplace=FLIPKART&page={x}")
    soup = bs(page.content,"html.parser")
    
    review = soup.find_all("p",{"class" : "_2-N8zT"})
    
    for i in range(0,len(review)):
        review_title.append(review[i].get_text())
        
len(review_title)

20

In [9]:
review_title

['Super!',
 'Terrific',
 'Awesome',
 'Terrific purchase',
 'Terrific',
 'Highly recommended',
 'Decent product',
 'Terrific',
 'Fair',
 'Terrific',
 'Brilliant',
 'Perfect product!',
 'Terrific purchase',
 'Mind-blowing purchase',
 'Terrific purchase',
 'Highly recommended',
 'Awesome',
 'Wonderful',
 'Good choice',
 'Worth every penny']

In [10]:
# Collect the main review text given by the customers who reviewed the product
main_review = []
for x in range(1,3):
    page = requests.get(f"https://www.flipkart.com/huami-amazfit-gts-smartwatch/product-reviews/itmd8be178f03412?pid=SMWFNSX922XJAUAY&lid=LSTSMWFNSX922XJAUAYUNCO19&marketplace=FLIPKART&page={x}")
    soup = bs(page.content,"html.parser")
    
    main = soup.find_all("div",{"class" : "t-ZTKy"})
    
    for i in range(0, len(main)):
        main_review.append(main[i].get_text())
len(main_review)

20
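The four cells above download the same two pages four times, once per field. As a possible alternative, each page could be parsed once and all four fields pulled from the same `soup`. The sketch below runs on a small hand-written HTML snippet that only imitates the class names used above; the snippet, the `SELECTORS` mapping and the `extract_fields` helper are assumptions for illustration, and the real Flipkart markup may differ or change.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one review page (NOT real Flipkart HTML);
# it reuses only the tag names and class names from the cells above.
SAMPLE_HTML = """
<div>
  <div class="_3LWZlK _1BLPMq">5</div>
  <p class="_2-N8zT">Super!</p>
  <div class="t-ZTKy">Ultra clear display!</div>
  <p class="_2sc7ZR _2V5EHH">Aarthi K</p>
</div>
<div>
  <div class="_3LWZlK _1BLPMq">3</div>
  <p class="_2-N8zT">Fair</p>
  <div class="t-ZTKy">After a couple of weeks usage</div>
  <p class="_2sc7ZR _2V5EHH">Manish Meena</p>
</div>
"""

# Column name -> (tag name, exact class attribute string)
SELECTORS = {
    "Customer Name": ("p", "_2sc7ZR _2V5EHH"),
    "Rating on a scale of 5": ("div", "_3LWZlK _1BLPMq"),
    "Review Title": ("p", "_2-N8zT"),
    "Main review": ("div", "t-ZTKy"),
}


def extract_fields(html: str) -> dict[str, list[str]]:
    """Parse the page once and return the text of every field per column."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        column: [tag.get_text() for tag in soup.find_all(name, {"class": css})]
        for column, (name, css) in SELECTORS.items()
    }


fields = extract_fields(SAMPLE_HTML)
for column, values in fields.items():
    print(column, values)
```

In the notebook, `extract_fields(page.content)` could be called inside a single loop over the page numbers, extending four lists from one request per page.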

In [11]:
main_review

['Ultra clear display! Cool features like social media and call notifications, app alerts, call can be cut or muted, compass, shows altitude, air pressure, music control, alarm, timer, uv index, idle alerts, step counter, heart rate, calories burnt, various sports modes, screen flashlight, several built-in and customizable watch faces, magnetic charging pad, etc. Overall good and premium looking!  Cannot attend calls or reply to message though but can dismiss calls or mute. Will update my revie...READ MORE',
 "Awesome Smart WatchAwesome Performance.A Product which have Rich Looks & Stunning Styling .... With Accurate Performance.... Fully Satisfied with #Aamzfit GTS... Finally it's a Worthy Buy 🤘🏼...READ MORE",
 'Till now this one is the best which I have used so far.No complaints about the screen as it is an AMOLED display which is perfect both in night and daylight.Best thing is its battery life. It sustained almost 16days including Always On display for 12hrs a day.If you want a nor

In [12]:
import pandas as pd # Import pandas to organise and save the data

In [13]:
df = pd.DataFrame() # Create an empty pandas DataFrame

In [14]:
# Load each list into the DataFrame as a named column (one row per review)
df["Customer Name"] = customer_name
df["Rating on a scale of 5"] = rating
df["Review Title"] = review_title
df["Main review"] = main_review
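The column assignments above only work when all four lists have the same length, and the scraped ratings are strings such as `'5'`. The sketch below, using three rows copied from the outputs above and a separate name `df_sample` so the notebook's `df` is untouched, shows one way to check the lengths and convert the ratings to integers. It is an optional refinement, not something the original notebook does.

```python
import pandas as pd

# Three rows copied from the scraped outputs above (main review omitted for brevity)
columns = {
    "Customer Name": ["Aarthi K", "Aakash  Kumar", "AKHIL SHAIKH"],
    "Rating on a scale of 5": ["5", "3", "4"],
    "Review Title": ["Super!", "Decent product", "Good choice"],
}

# Fail early if a selector missed some reviews and the lists are misaligned
lengths = {name: len(values) for name, values in columns.items()}
if len(set(lengths.values())) != 1:
    raise ValueError(f"column lengths differ: {lengths}")

df_sample = pd.DataFrame(columns)

# Ratings are scraped as text; cast them so numeric operations work
df_sample["Rating on a scale of 5"] = df_sample["Rating on a scale of 5"].astype(int)
mean_rating = df_sample["Rating on a scale of 5"].mean()
print(df_sample)
print("Mean rating:", mean_rating)
```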

In [16]:
df

| | Customer Name | Rating on a scale of 5 | Review Title | Main review |
|---|---|---|---|---|
| 0 | Aarthi K | 5 | Super! | Ultra clear display! Cool features like social... |
| 1 | Sai Kiran Kumar.KV | 5 | Terrific | Awesome Smart WatchAwesome Performance.A Produ... |
| 2 | Lokesh Behera | 5 | Awesome | Till now this one is the best which I have use... |
| 3 | kavan KB | 5 | Terrific purchase | Good fitness tarcker, good amoled display, goo... |
| 4 | Rahul Chavan | 5 | Terrific | Awesome product right now in best price like a... |
| 5 | SAIF KHAN | 5 | Highly recommended | Really smart watch...it's very comfortable and... |
| 6 | Aakash Kumar | 3 | Decent product | It has a GPS issue, after a replacement it rem... |
| 7 | Dr.Md Sami khan | 5 | Terrific | Best Smartwatch with a great look, amazing dis... |
| 8 | Manish Meena | 3 | Fair | After a couple of weeks usage here are some of... |
| 9 | Salman Shamsi | 5 | Terrific | This watch has premium quality body and amazin... |


In [15]:
# Save the DataFrame as a .csv file on the desktop; the location can be changed as the user wishes. index = False keeps the row index (0, 1, 2, ..., 19) from being written as the first column of the .csv file
df.to_csv(r"C:\Users\Akhil\OneDrive\Desktop\Review_for_Amazfit_GTS.csv", index = False)
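The path above exists only on the author's machine. A minimal sketch of a more portable alternative builds the path with `pathlib`; the folder name `output` and the helper `csv_output_path` are assumptions for illustration.

```python
from pathlib import Path


def csv_output_path(directory: Path,
                    filename: str = "Review_for_Amazfit_GTS.csv") -> Path:
    """Create the target directory if needed and return the full CSV path."""
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


# "output" is a hypothetical folder relative to the current working directory
out = csv_output_path(Path("output"))
print(out)
```

The resulting path can be passed directly to `df.to_csv(out, index = False)` and later to `pd.read_csv(out)`.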

# Sentiment Analysis
### The code below performs sentiment analysis on the data generated above

In [17]:
# Read the .csv file, keeping only the "Main review" column for sentiment analysis
df = pd.read_csv('C:\\Users\\Akhil\\OneDrive\\Desktop\\Review_for_Amazfit_GTS.csv', usecols = ["Main review"])

# Sentiment analysis needs TextBlob, imported below; if TextBlob is not installed in your Anaconda environment, install it before running this cell
from textblob import TextBlob

# The x in the lambda function is one row of the DataFrame; axis=1 makes apply pass rows (rather than columns) to the function
# "apply" runs the function on every row of the DataFrame
df['polarity'] = df.apply(lambda x: TextBlob(x["Main review"]).sentiment.polarity, axis=1) # Polarity of each review
df['subjectivity'] = df.apply(lambda x: TextBlob(x["Main review"]).sentiment.subjectivity, axis=1) # Subjectivity of each review
print(df) # Show the sentiment report for the main reviews

                                          Main review  polarity  subjectivity
0   Ultra clear display! Cool features like social...  0.235417      0.337500
1   Awesome Smart WatchAwesome Performance.A Produ...  0.477827      0.815774
2   Till now this one is the best which I have use...  0.627083      0.695833
3   Good fitness tarcker, good amoled display, goo...  0.528571      0.500000
4   Awesome product right now in best price like a...  0.696429      0.583929
5   Really smart watch...it's very comfortable and...  0.558571      0.785714
6   It has a GPS issue, after a replacement it rem...  0.233766      0.497078
7   Best Smartwatch with a great look, amazing dis...  0.708333      0.683333
8   After a couple of weeks usage here are some of...  0.142917      0.500000
9   This watch has premium quality body and amazin...  0.560000      0.760000
10  Received a very well packed box. The white mai...  0.370476      0.450833
11  Super watch.Showing exact all readings.(heart ...  0.337037 