# W266 Final Project Code
# Amazon Product Review Aspect-Based Sentiment
## Jennifer Mahle and Joanna Wang (Sections 3 and 1, respectively) 

#### Introduction
For our final project, we built a classification system for Amazon product reviews. The system categorizes each review by the product trait it focuses on (e.g., durability or quality), then determines whether the review is positive or negative with respect to that trait. Star ratings alone may not tell a shopper enough about a product, so reading the reviews is still the best way to judge whether the product fits their needs. The challenge is that a product can have hundreds of reviews, and users cannot spend the time to read them all. Our classification system aims to shorten the review-reading process and help users find what they need.


### Exploratory Data Analysis

In this section, we load, clean, and explore the data. We use Amazon product reviews for electronics from https://nijianmo.github.io/amazon/index.html.

In [1]:
# Import libraries
import warnings
warnings.filterwarnings("ignore")

import os
import re
import string
import sys

import matplotlib.pyplot as plt
%matplotlib inline
import nltk
import numpy as np
import pandas as pd
import seaborn as sns

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
#!pip3 install --user keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.layers import Dropout
# fix random seed for reproducibility
np.random.seed(7)

Using TensorFlow backend.


In [None]:
#DON'T NEED TO RUN THIS PART FOR NOW. RUN THE NEXT CELL TO LOAD DATA
#####################################################################
dataset = "Electronics_5.json"

if os.path.isfile(dataset):
    df = pd.read_json(dataset, lines=True)
else:
    url = r"http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/Electronics_5.json.gz"
    df = pd.read_json(url, compression='gzip', lines=True)

display(df.tail(10))
df.shape
print(df.info())

# Keep only the reviews for a small set of products (ASINs)
asins = ["B01HJCN1EI", "B01HJH42KU", "B01HJH40WU", "B01HJF704M", "B01HJCN5GC",
         "B01HJCN5TO", "B01HJDNL60", "B01HJDR9DQ", "B01HJFFHTC"]
df_mini = df[df.asin.isin(asins)]
df_mini.shape
df_mini.to_csv('/home/wangjia/datasci-w266-finalProject/df_mini.csv')
######################################################################

In [3]:
df = pd.read_csv("df_mini.csv") 
display(df.tail(10))
df.shape
print(df.info())

Unnamed: 0.1,Unnamed: 0,overall,vote,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,image
141,6739580,5,,True,"07 25, 2017",A1OOVLE2KZ6KGA,B01HJCN1EI,,Puddzee,These are my favorite charging cords for a few...,Worth the price.,1500940800,
142,6739581,1,,True,"04 4, 2017",A77K1B31UAQ29,B01HJCN1EI,,addictedtoreading,"Update....after 2 months of gentle use, cable ...",UPDATE...BREAKS AND SLOW CHARGING,1491264000,
143,6739582,3,,True,"07 8, 2017",A2SVXUVUAWUDK2,B01HJH42KU,,Andrew,These are okay. The connection becomes very if...,Hope this makes sense. You'd understand if you...,1499472000,
144,6739583,2,,True,"05 21, 2017",A12E1JGKV0ETAB,B01HJH42KU,,John Adams,I liked the length and the product at first bu...,Lost ability to connect.,1495324800,
145,6739584,3,,True,"06 26, 2017",A1HKXEX8BEQC2E,B01HJH40WU,,Dasha stephens,not holding up over time :(,not holding up over time :(,1498435200,
146,6739585,4,,True,"03 21, 2017",A33MAQA919J2V8,B01HJH40WU,,Kurt Wurm,"These seem like quality USB cables, time will ...",Four Stars,1490054400,
147,6739586,4,,True,"01 9, 2017",A1AKHSCPD1BHM4,B01HJH40WU,,C.L Momof3,"Works great, love the longer cord. As with any...",Nice long cord,1483920000,
148,6739587,5,2.0,True,"12 1, 2016",A2HUZO7MQAY5I2,B01HJH40WU,,michael clontz,"Ok here is an odd thing that happened to me, I...",Not the correct product as linked in the sale.,1480550400,
149,6739588,5,2.0,True,"11 29, 2016",AJJ7VX2L91X2W,B01HJH40WU,,Faith,Works well.,Five Stars,1480377600,
150,6739589,5,,True,"03 31, 2017",A1FGCIRPRNZWD5,B01HJF704M,,Brando,I have it plugged into a usb extension on my g...,Works well enough..,1490918400,


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 151 entries, 0 to 150
Data columns (total 13 columns):
Unnamed: 0        151 non-null int64
overall           151 non-null int64
vote              20 non-null float64
verified          151 non-null bool
reviewTime        151 non-null object
reviewerID        151 non-null object
asin              151 non-null object
style             15 non-null object
reviewerName      151 non-null object
reviewText        151 non-null object
summary           151 non-null object
unixReviewTime    151 non-null int64
image             5 non-null object
dtypes: bool(1), float64(1), int64(3), object(8)
memory usage: 14.4+ KB
None


In [4]:
#Remove NA review rows
df = df.dropna(subset=['reviewText'])
#Checking one of the reviews
print(df["reviewText"].iloc[100])

This works great, i needed a longer cable (this is 10') and stronger connection between the plug & wire.  This looks like it is stronger.


In [5]:
# Downloading stopwords
nltk.download('stopwords')

# Set of English stopwords; 'not' is kept in the text because negation matters for sentiment
stop = set(stopwords.words('english'))
words_to_keep = {'not'}  # a set literal; set(('not')) would yield {'n', 'o', 't'}
stop -= words_to_keep
# Initialise the Snowball stemmer
sno = nltk.stem.SnowballStemmer('english')

#function to clean the word of any html-tags
def cleanhtml(sentence):
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, ' ', sentence)
    return cleantext

#function to clean the word of any punctuation or special characters
def cleanpunc(sentence): 
    cleaned = re.sub(r'[?|!|\'|"|#]',r'',sentence)
    cleaned = re.sub(r'[.|,|)|(|\|/]',r' ',cleaned)
    return  cleaned

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/wangjia/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
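
One detail worth checking when excluding words from the stopword list: Python's `set()` iterates over its argument, so passing a bare string produces a set of characters rather than a set containing the word. The short sketch below illustrates the difference with a small hand-written stopword list (a stand-in for illustration, not NLTK's actual list).

```python
# Stand-in stopword list for illustration only (not NLTK's actual list)
stop = {"the", "is", "not", "a", "t", "o"}

# ('not') is just the string 'not', so set() splits it into characters
chars = set(('not'))

# A set literal keeps the whole word as a single element
words_to_keep = {'not'}

kept_wrong = stop - chars          # removes 't' and 'o', leaves 'not' as a stopword
kept_right = stop - words_to_keep  # removes 'not' from the stopwords

print(sorted(chars))
print('not' in kept_wrong, 'not' in kept_right)
```

With the character set, `'not'` remains a stopword and would still be filtered out of the reviews.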


In [6]:
# Remove HTML tags and punctuation, drop stopwords and tokens that are not purely
# alphabetic or are shorter than 3 characters, then lowercase and stem the rest.

i=0
str1=' '
final_string=[]
all_positive_words=[] # store words from +ve reviews here
all_negative_words=[] # store words from -ve reviews here
s=''
for sent in df.reviewText:
    filtered_sentence=[]
    sent=cleanhtml(sent) # remove HTML tags
    for w in sent.split():
        for cleaned_words in cleanpunc(w).split():
            if cleaned_words.isalpha() and len(cleaned_words)>2:
                if cleaned_words.lower() not in stop:
                    s=(sno.stem(cleaned_words.lower())).encode('utf8')
                    filtered_sentence.append(s)
                    # Assumption: 'overall' ratings of 4-5 count as positive and 1-2 as negative
                    if df['overall'].values[i] >= 4:
                        all_positive_words.append(s) # words used in positive reviews
                    elif df['overall'].values[i] <= 2:
                        all_negative_words.append(s) # words used in negative reviews

    str1 = b" ".join(filtered_sentence) # final string of cleaned words
    final_string.append(str1)
    i+=1
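
To make the cleaning steps concrete, here is a minimal, dependency-free sketch applied to a single made-up review. It mirrors the tag stripping, punctuation handling, and token filtering in the cell above, but it uses a tiny hand-written stopword set and skips stemming, so its output will differ from the NLTK-based pipeline. The function name `clean_review` and the sample text are illustrative assumptions.

```python
import re

# Tiny stand-in stopword set (assumption; the notebook uses NLTK's English list)
STOP = {"the", "and", "this", "for", "was", "are"}

def clean_review(text):
    """Strip HTML tags and punctuation, then keep lowercase alphabetic tokens
    longer than 2 characters that are not stopwords."""
    text = re.sub(r'<.*?>', ' ', text)      # drop HTML tags
    text = re.sub(r'[?!\'"#]', '', text)    # delete these characters outright
    text = re.sub(r'[.,)(|/]', ' ', text)   # turn these characters into spaces
    tokens = []
    for word in text.split():
        word = word.lower()
        if word.isalpha() and len(word) > 2 and word not in STOP:
            tokens.append(word)
    return " ".join(tokens)

sample = "<b>Great</b> cable! Works for the Kindle, and it's 10 feet long."
print(clean_review(sample))  # great cable works kindle its feet long
```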

In [7]:
#adding a column of CleanedText which displays the data after pre-processing of the review
df['CleanedText']=final_string  
df['CleanedText']=df['CleanedText'].str.decode("utf-8")
#below the processed review can be seen in the CleanedText Column 
print('Shape of final',df.shape)
df.head()

Shape of final (151, 14)


Unnamed: 0.1,Unnamed: 0,overall,vote,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,image,CleanedText
0,6099679,5,,True,"10 10, 2016",ATGTQKPUR7XIO,B01HJCN5GC,,Arthur,Great buy!,Five Stars,1476057600,,great buy
1,6099680,5,,True,"09 8, 2016",A15VV7NPTST593,B01HJCN5GC,,Randy T.,Works very well and we have lots (& lots) of e...,Extend your reach with ease,1473292800,,work well lot lot extra extens abl pass devic ...
2,6099681,5,,True,"03 15, 2017",AIM3MWK3Y7XOR,B01HJCN5GC,,Kindle Customer,This cable is very flexible. Just what I wanted.,Flexible cable,1489536000,,cabl flexibl want
3,6099682,5,,True,"02 16, 2017",A5W6EI03IKOLB,B01HJCN5GC,,P.Davidson,"These are the best charging cables, and if oth...",Best cables,1487203200,,best charg cabl famili member didnt take would...
4,6099683,4,,True,"02 14, 2017",A3QZTMHQ1XZ8PM,B01HJCN5GC,,glittergirl,I bought this in rose gold or light pink and i...,super cute cord,1487030400,,bought rose gold light pink clear bubblegum br...


In [8]:
#After processing the sample review looks like this
print(df["CleanedText"].iloc[100])

work great need longer cabl stronger connect plug wire look like stronger


In [21]:
#Sorting data according to asin in ascending order
sorted_data=df.sort_values('asin', axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

#Deduplication of entries
final=sorted_data.drop_duplicates(subset=["reviewerID","reviewerName","reviewText","summary"], keep='first', inplace=False)

#Removed not verified rows
final = final[final.verified != False]

#Drop NA 
final = final.dropna(subset=['reviewText'])
print(final.shape)

(128, 14)
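
The deduplication step keeps the first row for each unique combination of the key columns. As a dependency-free sketch of that idea (the `dedupe` helper and the sample rows are made up for illustration; the notebook itself relies on pandas `drop_duplicates`):

```python
def dedupe(rows, keys):
    """Return rows with duplicates removed, keeping the first occurrence
    of each combination of values in `keys`."""
    seen = set()
    unique = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(row)
    return unique

# Made-up rows: the second row repeats the first review under a different ASIN
rows = [
    {"asin": "A1", "reviewerID": "R1", "reviewText": "Works well.", "verified": True},
    {"asin": "A2", "reviewerID": "R1", "reviewText": "Works well.", "verified": True},
    {"asin": "A2", "reviewerID": "R2", "reviewText": "Broke fast.", "verified": False},
]

# Sort by ASIN, drop duplicate (reviewer, text) pairs, then keep verified rows only
rows = sorted(rows, key=lambda r: r["asin"])
final_rows = [r for r in dedupe(rows, ["reviewerID", "reviewText"]) if r["verified"]]
print(final_rows)
```

Sorting before deduplicating matters here because "keep the first occurrence" depends on row order.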


### Text Encoding using Universal Sentence Encoder

In the subsequent code cells, we load the Universal Sentence Encoder (USE), break the data into training and testing data, and apply the USE to the data. 
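
As a rough illustration of what sentence embeddings provide: the encoder maps each piece of text to a fixed-length vector, and texts with similar meaning tend to land close together. A common way to compare two such vectors is cosine similarity. The sketch below uses short made-up vectors rather than real USE outputs, and `cosine_similarity` is an illustrative helper, not part of TF Hub.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 4-dimensional "embeddings" for illustration only
review_a = [0.1, 0.3, -0.2, 0.4]
review_b = [0.1, 0.25, -0.1, 0.5]
review_c = [-0.4, 0.1, 0.3, -0.2]

# review_a points in nearly the same direction as review_b, but not review_c
print(cosine_similarity(review_a, review_b))
print(cosine_similarity(review_a, review_c))
```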

In [10]:
!pip3 uninstall tensorflow-gpu
!pip3 uninstall tensorflow

Found existing installation: tensorflow-gpu 2.1.0
Uninstalling tensorflow-gpu-2.1.0:
  Would remove:
    /home/wangjia/anaconda3/bin/estimator_ckpt_converter
    /home/wangjia/anaconda3/bin/saved_model_cli
    /home/wangjia/anaconda3/bin/tensorboard
    /home/wangjia/anaconda3/bin/tf_upgrade_v2
    /home/wangjia/anaconda3/bin/tflite_convert
    /home/wangjia/anaconda3/bin/toco
    /home/wangjia/anaconda3/bin/toco_from_protos
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow/*
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/*
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_gpu-2.1.0.dist-info/*
  Would not remove (might be manually added):
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/app/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/audio/__init__.py
    /ho

    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/saved_model/signature_def_utils/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/saved_model/tag_constants/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/saved_model/utils/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/sets/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/signal/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/sparse/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/spectral/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v1/compat/v1/strings/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorfl

    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/saved_model/python/saved_model/reader.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/ops/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/ops/gen_beam_search_ops.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/python/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/python/ops/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/python/ops/_beam_search_ops.so
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/python/ops/attention_wrapper.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/contrib/seq2seq/python/

Proceed (y/n)? ^C
ERROR: Operation cancelled by user
Found existing installation: tensorflow 1.15.0
Uninstalling tensorflow-1.15.0:
  Would remove:
    /home/wangjia/anaconda3/bin/estimator_ckpt_converter
    /home/wangjia/anaconda3/bin/freeze_graph
    /home/wangjia/anaconda3/bin/saved_model_cli
    /home/wangjia/anaconda3/bin/tensorboard
    /home/wangjia/anaconda3/bin/tf_upgrade_v2
    /home/wangjia/anaconda3/bin/tflite_convert
    /home/wangjia/anaconda3/bin/toco
    /home/wangjia/anaconda3/bin/toco_from_protos
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow-1.15.0.dist-info/*
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow/*
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/*
  Would not remove (might be manually added):
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v2/__init__.py
    /home/wangjia/anaconda3/lib/python3.7/site-packages/tensorflow_core/_api/v2/audio/__init__.py
    /

Proceed (y/n)? ^C
ERROR: Operation cancelled by user


In [11]:
# Remove ## from lines starting with ! and run them the first time to install necessary packages 

##%%capture
# Install the Tensorflow 2.0.0 version.
!pip3 install --user tensorflow==2.0.0
# Install TF-Hub.
!pip3 install --user tensorflow-hub
!pip3 install --user seaborn


Collecting tensorflow==2.0.0
  Downloading tensorflow-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl (86.3 MB)
Collecting tensorflow-estimator<2.1.0,>=2.0.0
  Downloading tensorflow_estimator-2.0.1-py2.py3-none-any.whl (449 kB)
Collecting tensorboard<2.1.0,>=2.0.0
  Downloading tensorboard-2.0.2-py3-none-any.whl (3.8 MB)
ERROR: tensorflow-gpu 2.1.0 has requirement tensorboard<2.2.0,>=2.1.0, but you'll have tensorboard 2.0.2 which is incompatible.
ERROR: tensorflow-gpu 2.1.0 has requirement tensorflow-estimator<2.2.0,>=2.1.0rc0, but you'll have tensorflow-estimator 2.0.1 which is incompatible.
Installing collected packages: tensorflow-estimator, tensorboard, tensorflow
Successfully installed tensorboard-2.0.2 tensorflow-2.0.0 tensorflow-estimator-2.0.1


In [12]:
#@title Load the Universal Sentence Encoder's TF Hub module
from absl import logging

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns

module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
model = hub.load(module_url)
print ("module %s loaded" % module_url)
def embed(input):
  return model(input)

module https://tfhub.dev/google/universal-sentence-encoder/4 loaded


In [22]:
# Create embeddings for the cleaned review text (all rows of `final`, before the train/test split)
logging.set_verbosity(logging.ERROR)
message_embeddings = embed(final.CleanedText)

In [23]:
print("Training X Shape", final.shape)

Training X Shape (128, 14)


In [24]:
final.head()

Unnamed: 0.1,Unnamed: 0,overall,vote,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,image,CleanedText
96,6099775,5,,True,"09 6, 2016",A2A871CAU2PP3F,B01HJCN1EI,,Joyce Mirowski,I enjoy reading on the kindle the cable i s lo...,very nice cable,1473120000,,enjoy read kindl cabl long enough keep charg d...
141,6739580,5,,True,"07 25, 2017",A1OOVLE2KZ6KGA,B01HJCN1EI,,Puddzee,These are my favorite charging cords for a few...,Worth the price.,1500940800,,favorit charg cord reason length mean reach an...
101,6099780,5,6.0,True,"08 14, 2016",A2AWUJAT7Z6L68,B01HJCN1EI,,Luanne Serrato,"<a data-hook=""product-link-linked"" class=""a-li...",Great Buy!,1471132800,,kindl usb cabl nylon usb cabl deego high speed...
100,6099779,5,,True,"08 22, 2016",A1LYZQDDYM3VKW,B01HJCN1EI,,kfb,"This works great, i needed a longer cable (thi...","This works great, i needed a longer cable (thi...",1471824000,,work great need longer cabl stronger connect p...
99,6099778,5,,True,"08 25, 2016",A3DDSH3IG02ESZ,B01HJCN1EI,,Volunteer,I actually bought this to charge my Kindle Fir...,Works Great,1472083200,,actual bought charg kindl fire work great thin...


In [25]:
message_embeddings[10]

<tf.Tensor: shape=(512,), dtype=float32, numpy=
array([-3.91825549e-02, -2.71066427e-02, -7.11323768e-02,  3.20367552e-02,
       -6.37410432e-02, -6.67169839e-02, -6.39935657e-02, -1.02262916e-02,
        5.94306365e-02, -5.20161819e-03,  5.01778834e-02, -2.73479000e-02,
        1.19817071e-02, -8.35636556e-02,  4.20056880e-02,  3.54250446e-02,
       -8.00610259e-02,  3.76052335e-02, -1.35132752e-03,  6.52171746e-02,
       -3.95172425e-02,  1.90974399e-02,  1.65000558e-02, -2.88264249e-02,
        6.26439825e-02,  1.94014236e-02, -5.75696528e-02,  5.20338975e-02,
       -4.02670652e-02,  7.51960874e-02, -3.07151545e-02,  6.17478192e-02,
        2.56837308e-02,  6.56776652e-02,  6.05690181e-02, -5.80549147e-03,
        4.36521339e-04, -5.07867672e-02, -7.83550814e-02, -2.04834770e-02,
        4.11956804e-03, -9.18720886e-02, -3.92481266e-03, -7.68377557e-02,
       -2.71673873e-02,  4.73924391e-02, -3.10623646e-02, -2.54868846e-02,
        2.80272197e-02,  2.22418383e-02,  1.88030880

## Model creation

In [27]:
from sklearn.model_selection import train_test_split
#split the data into training and testing data, using "overall" as the target variable
y=final.overall
#x=final_mini.drop('overall',axis=1)
x_train,x_test,y_train,y_test=train_test_split(final,y,test_size=0.2)
print("No. of datapoints in x_train :",len(x_train))
print("No. of datapoints in x_test :",len(x_test))
print("Shape of y_train :",y_train.shape)
print("Shape of y_test :",y_test.shape)

No. of datapoints in x_train : 102
No. of datapoints in x_test : 26
Shape of y_train : (102,)
Shape of y_test : (26,)


In [33]:
# create the model
embedding_vector_length = 32

# Initialising the model
model_1 = Sequential()

# Adding embedding
model_1.add(Embedding(len(message_embeddings) + 1, embedding_vector_length, input_length=14))

# Adding Dropout
model_1.add(Dropout(0.2))

# Adding first LSTM layer
model_1.add(LSTM(100))

# Adding Dropout
model_1.add(Dropout(0.2))

# Adding output layer
model_1.add(Dense(1, activation='sigmoid'))

# Printing the model summary
print(model_1.summary())

# Compiling the model
model_1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 14, 32)            4128      
_________________________________________________________________
dropout_9 (Dropout)          (None, 14, 32)            0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dropout_10 (Dropout)         (None, 100)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 101       
Total params: 57,429
Trainable params: 57,429
Non-trainable params: 0
_________________________________________________________________
None


In [34]:
# Fitting the data to the model
history_1 = model_1.fit(x_train, y_train, epochs=10, batch_size=512, verbose=1, validation_data=(x_test, y_test))

Train on 102 samples, validate on 26 samples
Epoch 1/10


ValueError: could not convert string to float: '03 31, 2017'
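
The `ValueError` above is raised because `x_train` holds the whole DataFrame, including string columns such as `reviewTime`, which cannot be converted to floats. One possible direction, sketched here without Keras or scikit-learn, is to split row positions instead, so that each feature row (for example, an embedding) stays paired with its label, and to map star ratings to 0/1 before using a sigmoid output with binary cross-entropy. The threshold (4 stars and up counts as positive), the helper names, and the toy data are all assumptions for illustration.

```python
import math
import random

def split_positions(n_rows, test_size=0.2, seed=7):
    """Shuffle row positions 0..n_rows-1 and split them into train/test lists."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    n_test = math.ceil(n_rows * test_size)
    return positions[n_test:], positions[:n_test]

def binarize(rating, threshold=4):
    """Map a 1-5 star rating to 1 (positive) or 0 (negative); threshold is an assumption."""
    return 1 if rating >= threshold else 0

# Toy stand-ins: one numeric feature vector and one star rating per review
features = [[0.1 * i, 0.2 * i] for i in range(10)]
ratings = [5, 1, 4, 3, 5, 2, 4, 5, 1, 3]

train_pos, test_pos = split_positions(len(features))
feat_train = [features[p] for p in train_pos]
label_train = [binarize(ratings[p]) for p in train_pos]
feat_test = [features[p] for p in test_pos]
label_test = [binarize(ratings[p]) for p in test_pos]

print(len(feat_train), len(feat_test))
```

With 128 rows, `split_positions` yields 102 training and 26 test positions, the same sizes as the split shown earlier.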

## Stanford POS Tagger to Find Product Attributes

We use the Stanford POS tagger to find the most common nouns used in product reviews for each product ID (ASIN). Then we use the most common nouns as product attributes. 
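
The counting step can be sketched independently of the tagger. The example below assumes the reviews have already been tagged and are available as (word, tag) pairs with Penn Treebank tags; the sample pairs are made up, and `collections.Counter` is swapped in for a hand-rolled dictionary.

```python
from collections import Counter

# Made-up tagger output: (word, tag) pairs with Penn Treebank tags
tagged_reviews = [
    [("cable", "NN"), ("works", "VBZ"), ("great", "JJ")],
    [("long", "JJ"), ("cable", "NN"), ("cords", "NNS"), ("charge", "VBP")],
    [("cable", "NN"), ("length", "NN"), ("is", "VBZ"), ("perfect", "JJ")],
]

noun_counts = Counter(
    word.lower()
    for review in tagged_reviews
    for word, tag in review
    if tag in ("NN", "NNS")  # singular and plural common nouns
)

# The most frequent nouns serve as candidate product attributes
print(noun_counts.most_common(1))  # [('cable', 3)]
```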

In [None]:
#!python -m pip install --upgrade pip
#!pip install torch
#!pip install stanfordnlp

In [None]:
# Java must be installed; update this path to wherever java.exe lives on your machine
import os
java_path = "C:/Program Files/Java/jre1.8.0_241/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Install the Stanford POS tagger first, following the instructions here:
# https://phitchuria.wordpress.com/2018/09/29/python-nltk-using-stanford-pos-tagger-in-nltk-on-windows/
from nltk.tag import StanfordPOSTagger
from nltk.corpus import stopwords
# Raw strings keep the Windows backslashes from being read as escape sequences
stanford_dir = r"C:\Stanford\stanford-postagger-2018-10-16"
modelfile = stanford_dir + r"\models\english-bidirectional-distsim.tagger"
jarfile = stanford_dir + r"\stanford-postagger.jar"

tagger=StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)

In [None]:
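# Count how often each noun (tags NN and NNS) appears in the training reviews.
# Iterating over the column directly avoids label-based lookups, which fail
# once train_test_split has shuffled the index.
freq_dist={}
for review in x_train.reviewText:
    tagged_POS = tagger.tag(review.split())
    for word,tag in tagged_POS:
        if tag == 'NN' or tag == 'NNS':
            if word in freq_dist:
                freq_dist[word] += 1
            else:
                freq_dist[word] = 1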


In [None]:
import operator
# Sort nouns from most to least frequent
sorted_freq_dist=sorted(freq_dist.items(),key=operator.itemgetter(1),reverse=True)
# Convert back to a dictionary (insertion order preserves the ranking)
dict_sorted_freq_dist=dict(sorted_freq_dist)

print(dict_sorted_freq_dist)

In [None]:
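# freq_dist is keyed by word, so look up the most frequent noun rather than index 0
most_common_noun = max(freq_dist, key=freq_dist.get)
print(most_common_noun, freq_dist[most_common_noun])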