<h2><b>FlipKart Sentiment Analysis</b></h2>
<h4><b>Author:</b> Data Science @ Georgia Tech</h4>
<p><b>Reference:</b> <a href="https://medium.com/coders-camp/225-machine-learning-projects-with-python-44d6ea8ace18">Medium</a></p>

<b>Welcome to the FlipKart Sentiment Analysis self-guided project!</b>

This project explores sentiment analysis in the context of e-commerce platforms.

We will analyze product reviews left on FlipKart, an e-commerce company in India that is similar to Amazon.

First, import the modules we will be using.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

Now, read the data from the CSV file and print the first five rows of it.

You can find the CSV file here: https://raw.githubusercontent.com/amankharwal/Website-data/master/flipkart_reviews.csv

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Reading the Dataset</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution
      data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/flipkart_reviews.csv")
      print(data.head())
    </code>
  </pre>
</details>
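Before moving on, it can help to check the shape and column types of what was loaded. The sketch below uses a hypothetical two-row sample in place of the real CSV so that it runs offline; only the column names `Review` and `Rating` are taken from this notebook, and the rows themselves are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file.
# Only the column names "Review" and "Rating" come from this notebook.
sample_csv = io.StringIO(
    "Review,Rating\n"
    "Great laptop for the price,5\n"
    "Battery died within a week,1\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)   # (number of rows, number of columns)
print(sample.dtypes)  # column types inferred by pandas
print(sample.head())  # first rows, as in the solution above
```

The same three calls can be run on `data` once the real file has been read.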

Now, check and see if the dataset has any missing values.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Finding Missing Values</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution Code
      print(data.isnull().sum())
    </code>
  </pre>
</details>
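If the check does report missing values, one possible follow-up is to drop the rows that have no review text. The sketch below shows this on a small made-up frame; whether dropping rows is appropriate for the real dataset is an assumption you should verify against the counts you actually see.

```python
import pandas as pd

# Toy frame with one missing review and one missing rating (made-up values).
toy = pd.DataFrame(
    {
        "Review": ["Good phone", None, "Too slow"],
        "Rating": [5, 4, None],
    }
)

# Count missing values per column, as in the solution above.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Keep only rows that have review text; rows missing a rating are kept.
cleaned = toy.dropna(subset=["Review"])
print(len(cleaned))
```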

Now we will clean the data.

Take note of what needs to be <b>removed/changed</b> in the review text so that our sentiment model will be accurate.

In [None]:
# Write your code here.

<details>
  <summary>Click for solution: <font color="sky blue"><b>Cleaning Dataset</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our data cleaning solution
        import re
        import string
        import nltk
        from nltk.corpus import stopwords
        nltk.download('stopwords')
        stemmer = nltk.SnowballStemmer("english")
        stopword = set(stopwords.words('english'))
        def clean(text):
            text = str(text).lower()
            # Raw strings keep the regex escapes from triggering Python warnings
            text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
            text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
            text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
            text = re.sub(r'\n', ' ', text)                    # newline becomes a space so words are not glued together
            text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
            # split() with no argument also drops empty tokens left by the removals
            text = [word for word in text.split() if word not in stopword]
            text = [stemmer.stem(word) for word in text]
            return " ".join(text)
        data["Review"] = data["Review"].apply(clean)
    </code>
  </pre>
</details>
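To see what the regular expressions in the cleaning solution do, here is a minimal sketch that applies only the regex steps to one made-up sentence. It leaves out stopword removal and stemming so that it runs without any NLTK downloads. The helper name `strip_noise` is hypothetical, and unlike the solution it collapses repeated whitespace at the end.

```python
import re
import string


def strip_noise(text):
    """Apply only the regex steps of the cleaning routine (no stopwords, no stemming)."""
    text = str(text).lower()
    text = re.sub(r"\[.*?\]", "", text)                # drop text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # drop URLs
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)  # drop punctuation
    text = re.sub(r"\n", " ", text)                    # newline becomes a space
    text = re.sub(r"\w*\d\w*", "", text)               # drop tokens containing digits
    return " ".join(text.split())                      # collapse repeated whitespace


example = "Bought 2 units [verified] from https://example.com!\nWorks GREAT."
print(strip_noise(example))  # bought units from works great
```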

Next, we will look at the ratings customers leave on FlipKart.

Create a pie chart that shows what percentage of reviews gave each rating.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Ratings Pie Chart</b></font></summary>
  <pre>
    <code style="display: block;">
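        # Our customer rating solution
        import plotly.express as px
        ratings = data["Rating"].value_counts()
        numbers = ratings.index    # distinct rating values
        quantity = ratings.values  # number of reviews per rating
        # names and values are passed directly, so no data frame argument is needed
        figure = px.pie(values=quantity, names=numbers, hole=0.5)
        figure.show()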
    </code>
  </pre>
</details>
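The percentages shown on the pie chart can also be computed directly, which is a quick way to sanity-check the figure. The sketch below uses a made-up list of ratings rather than the real `Rating` column.

```python
import pandas as pd

# Made-up ratings for illustration only.
toy_ratings = pd.Series([5, 5, 4, 5, 1, 4, 5, 3], name="Rating")

# normalize=True returns fractions instead of counts; scale them to percent.
share = toy_ratings.value_counts(normalize=True).mul(100).round(1)
print(share)
```

On the real data, replacing `toy_ratings` with `data["Rating"]` should give the same percentages the chart displays.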

Next, we will look at the most frequently used words in the Review column.

Create a word cloud that shows which words are used most. The bigger the word, the more often it appears.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Create Wordcloud</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our solution
        text = " ".join(i for i in data.Review)
        # Named wc_stopwords so it does not shadow nltk's stopwords module
        wc_stopwords = set(STOPWORDS)
        wordcloud = WordCloud(stopwords=wc_stopwords,
                              background_color="white").generate(text)
        plt.figure(figsize=(15, 10))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")
        plt.show()
    </code>
  </pre>
</details>
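A word cloud is a visual form of a word-frequency table. As a rough cross-check, the counts behind it can be approximated with `collections.Counter`. The reviews below are made up and written as if they had already been cleaned and stemmed; the real word cloud also applies its own stopword list, so the exact ranking may differ.

```python
from collections import Counter

# Made-up, already-cleaned reviews for illustration only.
cleaned_reviews = [
    "good phone good camera",
    "bad batteri",
    "good valu",
]

# Join all reviews into one string, split into words, and count them.
word_counts = Counter(" ".join(cleaned_reviews).split())
print(word_counts.most_common(3))
```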

Next, we will look at the polarity scores of the reviews: how positive, negative, and neutral each one is.

Create three new columns and populate them with the polarity scores from the reviews.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Find Polarity Scores</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our polarity score solution
        import nltk
        nltk.download('vader_lexicon')
        sentiments = SentimentIntensityAnalyzer()
        # Score each review once, then split the result into three columns
        scores = [sentiments.polarity_scores(i) for i in data["Review"]]
        data["Positive"] = [s["pos"] for s in scores]
        data["Negative"] = [s["neg"] for s in scores]
        data["Neutral"] = [s["neu"] for s in scores]
        data = data[["Review", "Positive", "Negative", "Neutral"]]
        print(data.head())
    </code>
  </pre>
</details>
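The scoring step can also be wrapped in a small helper that takes the scoring function as an argument. This is only a sketch: `add_polarity_columns` and `fake_scorer` are hypothetical names, and the stand-in scorer returns made-up numbers so the example runs without downloading the VADER lexicon. In the notebook you would pass `SentimentIntensityAnalyzer().polarity_scores` instead.

```python
import pandas as pd


def add_polarity_columns(df, scorer, text_col="Review"):
    """Return a copy of df with Positive, Negative and Neutral columns.

    `scorer` is any callable that maps a string to a dict with the
    keys "pos", "neg" and "neu".
    """
    # Call the scorer once per review and collect the dicts into a frame.
    scores = pd.DataFrame([scorer(t) for t in df[text_col]], index=df.index)
    out = df.copy()
    out["Positive"] = scores["pos"]
    out["Negative"] = scores["neg"]
    out["Neutral"] = scores["neu"]
    return out


def fake_scorer(text):
    """Stand-in scorer with made-up values (not VADER output)."""
    if "good" in text:
        return {"pos": 0.6, "neg": 0.0, "neu": 0.4}
    return {"pos": 0.0, "neg": 0.5, "neu": 0.5}


toy = pd.DataFrame({"Review": ["good phone", "bad batteri"]})
scored = add_polarity_columns(toy, fake_scorer)
print(scored)
```

Because the helper returns a copy, the original frame keeps its columns, which makes it easier to re-run the cell while experimenting.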

Now that we have the polarity scores, let's see how most customers feel about the products and services from FlipKart.

Create a function that determines which sentiment category (positive, negative, or neutral) dominates the customer reviews.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Sentiment Categories</b></font></summary>
  <pre>
    <code style="display: block;">
        # Our overall sentiment solution: sum each score column and compare the totals
        x = sum(data["Positive"])
        y = sum(data["Negative"])
        z = sum(data["Neutral"])
        def sentiment_score(a, b, c):
            if (a>b) and (a>c):
                print("Positive 😊 ")
            elif (b>a) and (b>c):
                print("Negative 😠 ")
            else:
                print("Neutral 🙂 ")
        sentiment_score(x, y, z)
    </code>
  </pre>
</details>
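The solution above compares column totals. A complementary view is to label each review by its largest score and then count the labels. The sketch below does this on made-up scores; note that `idxmax` resolves ties in favor of the first column listed, which differs from the solution's rule of falling back to neutral.

```python
import pandas as pd

# Made-up polarity scores for three reviews.
toy = pd.DataFrame(
    {
        "Positive": [0.6, 0.0, 0.1],
        "Negative": [0.0, 0.6, 0.1],
        "Neutral": [0.4, 0.4, 0.8],
    }
)

# For each row, pick the name of the column holding the largest score.
labels = toy[["Positive", "Negative", "Neutral"]].idxmax(axis=1)
print(labels.value_counts())
```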

# **Summary**

**Congratulations on completing the FlipKart Sentiment Analysis project!**

We hope you learned how sentiment analysis works and why it is performed.