In [162]:
# Pandas
import pandas as pd

import numpy as np

import warnings
warnings.filterwarnings('ignore')

## Part 2.

>Find all the mentions of world countries in the whole corpus, using the pycountry utility (HINT: remember that there will be different surface forms for the same country in the text, e.g., Switzerland, switzerland, CH, etc.) Perform sentiment analysis on every email message using the demo methods in the nltk.sentiment.util module. Aggregate the polarity information of all the emails by country, and plot a histogram (ordered and colored by polarity level) that summarizes the perception of the different countries. Repeat the aggregation + plotting steps using different demo methods from the sentiment analysis module -- can you find substantial differences?

**Data Emails**

In [163]:
#Data location
data_path = "hillary-clinton-emails/"

#Import data
aliases          = pd.read_csv(data_path+"Aliases.csv",         index_col=0)
emailsReceivers  = pd.read_csv(data_path+"EmailReceivers.csv",  index_col=0)
emails           = pd.read_csv(data_path+"Emails.csv",          index_col=0)
persons          = pd.read_csv(data_path+"Persons.csv",         index_col=0)

In [164]:
emails_sub_body = emails[['ExtractedBodyText','ExtractedSubject']]
emails_sub_body.count()

ExtractedBodyText    6742
ExtractedSubject     6260
dtype: int64

In [165]:
# Replace missing bodies/subjects with empty strings (returns a new DataFrame,
# which avoids modifying a slice of `emails` in place)
emails_sub_body = emails_sub_body.fillna('')
emails_sub_body["SubBody"] = emails_sub_body['ExtractedBodyText'] + " " + emails_sub_body['ExtractedSubject']

In [166]:
emails = emails_sub_body.drop(['ExtractedBodyText', 'ExtractedSubject'], axis=1)
emails.head()

Id,SubBody
1,FW: Wow
2,"B6\nThursday, March 3, 2011 9:45 PM\nH: Latest..."
3,Thx Re: Chris Stevens
4,FVV: Cairo Condemnation - Final
5,"H <hrod17@clintonemail.com>\nFriday, March 11,..."


In [167]:
# Note: this only updates emails_sub_body; `emails` was created by drop() and is a separate copy
emails_sub_body.SubBody = emails_sub_body.SubBody.str.replace('\n', " ")
emails.head()

Id,SubBody
1,FW: Wow
2,"B6\nThursday, March 3, 2011 9:45 PM\nH: Latest..."
3,Thx Re: Chris Stevens
4,FVV: Cairo Condemnation - Final
5,"H <hrod17@clintonemail.com>\nFriday, March 11,..."


In [168]:
test_sample = emails_sub_body['SubBody'].loc[345]
print(test_sample)

Here's a partial list of followup from our last trip and the last week: What can we do to help protect the Christians in Iraq as requested by Ken Joseph whom we saw in Baghdad? JoDee Winterhof raised questions about how the PRTs and the language DOD uses about them are problematic for NGOs like care. Pls ask one of Holbrooke's people if they ever talked to Wolfgang Danspeckgruber at Princeton about building a railroad in Aghanistan. Also Dr. Arthur Keys at International Relief + Development wanted to talk w someone from Holbrooke's team about development in Af. I asked the Spec IG for Af Recon, Arnold Fields, to alert us to problems as soon as they can. I'm not sure how to formalize this or even if it's appropriate. Let's discuss. What are the "Iran Watchers"? Followup


**Countries and cities**

We will use *pycountry* for the country names and country codes.

In [169]:
import pycountry

In [170]:
all_country = []

for c in list(pycountry.countries):
    country_entry = [c.alpha_2, c.alpha_3, c.name, c.numeric, getattr(c, 'official_name', "")]
    all_country.append(country_entry)
    
country_dict = pd.DataFrame(all_country, columns=('Alpha2', 'Alpha3', 'Name', 'Numeric', 'Official_name'))

country_dict.head()

Unnamed: 0,Alpha2,Alpha3,Name,Numeric,Official_name
0,AW,ABW,Aruba,533,
1,AF,AFG,Afghanistan,4,Islamic Republic of Afghanistan
2,AO,AGO,Angola,24,Republic of Angola
3,AI,AIA,Anguilla,660,
4,AX,ALA,Åland Islands,248,


We also add each country's capital to the *pycountry* data, since emails often mention a capital directly without naming the country.

In [171]:
capital_cities = "https://raw.githubusercontent.com/icyrockcom/country-capitals/master/data/country-list.csv"
capitals = pd.read_csv(capital_cities)

capitals.head()

Unnamed: 0,country,capital,type
0,Abkhazia,Sukhumi,countryCapital
1,Afghanistan,Kabul,countryCapital
2,Akrotiri and Dhekelia,Episkopi Cantonment,countryCapital
3,Albania,Tirana,countryCapital
4,Algeria,Algiers,countryCapital


Therefore, we merge the two country datasets.

In [172]:
country_dict['Capital'] = ""

for i, capital_entry in capitals.iterrows():
    for j, country_entry in country_dict.iterrows():
        if (capital_entry['country'] == country_entry['Name']):
            # DataFrame.set_value was removed from pandas; .at is the scalar setter
            country_dict.at[j, "Capital"] = capital_entry.capital

country_dict.head()

Unnamed: 0,Alpha2,Alpha3,Name,Numeric,Official_name,Capital
0,AW,ABW,Aruba,533,,Oranjestad
1,AF,AFG,Afghanistan,4,Islamic Republic of Afghanistan,Kabul
2,AO,AGO,Angola,24,Republic of Angola,Luanda
3,AI,AIA,Anguilla,660,,The Valley
4,AX,ALA,Åland Islands,248,,
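
The nested loops above compare every capital row with every country row. A minimal sketch of the same lookup with a dictionary is shown below. It uses a few rows typed in by hand from the tables above rather than the real data, and the names `capital_rows`, `countries` and `capital_by_country` are illustrative only.

```python
# Illustrative rows copied from the tables above; not the full dataset.
capital_rows = [
    ("Afghanistan", "Kabul"),
    ("Albania", "Tirana"),
    ("Algeria", "Algiers"),
]
countries = ["Aruba", "Afghanistan", "Åland Islands"]

# Build the lookup once, then resolve each country with a dictionary access.
capital_by_country = dict(capital_rows)

# Countries missing from the capitals list get an empty string,
# mirroring the default used for the 'Capital' column.
capitals_for_countries = [capital_by_country.get(name, "") for name in countries]
print(capitals_for_countries)
```

In pandas, `Series.map` with such a dictionary would do the same job, leaving unmatched names as NaN.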


**Country Alternative names**

People may refer to a country by something other than its name or its capital's name. Therefore, we need a way to add alternative names for a country.
Example: *'CH'* for Switzerland

In [173]:
country_dict['Alt_names'] = ""

country_dict.head()

Unnamed: 0,Alpha2,Alpha3,Name,Numeric,Official_name,Capital,Alt_names
0,AW,ABW,Aruba,533,,Oranjestad,
1,AF,AFG,Afghanistan,4,Islamic Republic of Afghanistan,Kabul,
2,AO,AGO,Angola,24,Republic of Angola,Luanda,
3,AI,AIA,Anguilla,660,,The Valley,
4,AX,ALA,Åland Islands,248,,,


In [174]:
# function to add any alternative name to a country
def add_country_alt_name(name, alt):
    # Rows yielded by iterrows() are copies, so write through .loc instead
    mask = country_dict.Name == name
    if mask.any():
        country_dict.loc[mask, "Alt_names"] += "-" + alt
        print("Added successfully")
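
Joining alternative names with "-" makes names that already contain a hyphen (such as Guinea-Bissau) ambiguous once the string is split again. One possible alternative keeps one list per country. The sketch below uses a hypothetical `alt_names` dictionary and `add_alt_name` helper, neither of which is part of the notebook.

```python
from collections import defaultdict

# Hypothetical store: country name -> list of alternative surface forms.
alt_names = defaultdict(list)

def add_alt_name(country, alt):
    """Register an alternative name, ignoring exact duplicates."""
    if alt not in alt_names[country]:
        alt_names[country].append(alt)

add_alt_name("Switzerland", "CH")
add_alt_name("Switzerland", "Swiss")
add_alt_name("Switzerland", "CH")  # duplicate, ignored
print(alt_names["Switzerland"])
```

With this layout no separator character is needed, so hyphenated names can be stored unchanged.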

**Countries names list**

Build a dictionary with all the names that refer to each country.

In [175]:
def country_city_list(n):
    """
        Returns a list of all words referring to a country.
        By words, we mean the name of the country, the capital,
        and all other alternative names, like 'CH' for Switzerland.
        
        INPUT
            n: index of the country in the 'country_dict' dataframe
            
        OUTPUT
            l: list of all words referring to the country
    """
    
    l = []
    country_entry = country_dict.loc[n]
    
    # Country Name
    l.append(country_entry.Name)
    
    # Country Capital
    if (country_entry.Capital != ""):
        l.append(country_entry.Capital)
    
    # All others alternative names, cities, ...
    if (country_entry.Alt_names != ""):
        names = country_entry.Alt_names.split("-")
        l.extend(names)
        
    # return list
    return l

In [176]:
country_names = {}

for index, row in country_dict.iterrows():
    country_names[row.Name] = country_city_list(index)

**Country in email**

In [177]:
def containsCountryInfo(content):
    """
        Returns the countries that the given string refers to.
        
        INPUT
            content: string to analyse, which may mention a country
            
        OUTPUT
            country_list: list of countries mentioned in the input 'content'
    """
    
    country_list = []
    
    for index, row in country_dict.iterrows():
        inside = False
        
        for name in country_names[row.Name]:
            if(name != "" and name in content):
                inside = True
                
        if inside:
            country_list.append(row.Name)
                
    return country_list
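
The check `name in content` is a case-sensitive substring test, so a short name can match inside a longer word (for example "Oman" inside "Woman", or "Niger" inside "Nigeria"). A minimal sketch of whole-word matching with regular expressions follows. The `names_by_country` sample and the `compile_pattern` and `find_countries` helpers are assumptions made for illustration, not the notebook's code.

```python
import re

# Small illustrative sample of surface forms per country.
names_by_country = {
    "Niger": ["Niger", "Niamey"],
    "Nigeria": ["Nigeria", "Abuja"],
    "Oman": ["Oman", "Muscat"],
}

def compile_pattern(names):
    """Build one regex that matches any of the names as a whole word."""
    alternatives = "|".join(re.escape(n) for n in names if n)
    return re.compile(r"\b(?:" + alternatives + r")\b")

patterns = {c: compile_pattern(n) for c, n in names_by_country.items()}

def find_countries(content):
    """Return the countries whose names occur as whole words in content."""
    return [c for c, p in patterns.items() if p.search(content)]

print(find_countries("Woman from Nigeria wrote to us"))
```

Matching stays case-sensitive here on purpose: two-letter codes such as 'CH' would otherwise match ordinary lowercase text. Whether that trade-off suits the corpus is a judgment call.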

In [178]:
emails["Country"] = [containsCountryInfo(email) for email in emails.SubBody]
emails.head()

Id,SubBody,Country
1,FW: Wow,[]
2,"B6\nThursday, March 3, 2011 9:45 PM\nH: Latest...",[]
3,Thx Re: Chris Stevens,[]
4,FVV: Cairo Condemnation - Final,[Egypt]
5,"H <hrod17@clintonemail.com>\nFriday, March 11,...",[]


In [179]:
emails["Nbr country"] = [len(c) for c in emails.Country]
emails.head()

Id,SubBody,Country,Nbr country
1,FW: Wow,[],0
2,"B6\nThursday, March 3, 2011 9:45 PM\nH: Latest...",[],0
3,Thx Re: Chris Stevens,[],0
4,FVV: Cairo Condemnation - Final,[Egypt],1
5,"H <hrod17@clintonemail.com>\nFriday, March 11,...",[],0


**Sentiment analysis**

In [180]:
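# Keep only the emails that mention at least one country
has_country = emails["Nbr country"] > 0
data_for_sentiment = emails[has_country].copy()  # copy so new columns can be added safely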
data_for_sentiment.head()

Id,SubBody,Country,Nbr country
4,FVV: Cairo Condemnation - Final,[Egypt],1
7,"FW: Anti-Muslim film director in hiding, foll...","[Egypt, Libya]",2
10,"B6\nWednesday, September 12, 2012 6:16 PM\nFwd...",[Libya],1
11,Fyi\nB6\n— — AbZ and Hb3 on Libya and West Ban...,[Libya],1
12,"B6\nWednesday, September 12, 2012 6:16 PM\nFwd...",[Libya],1


In [181]:
print("Total emails:", len(emails))
print("Emails with country:", len(data_for_sentiment))
print("Percentage:",len(data_for_sentiment)/len(emails)*100, "%")

Total emails: 7945
Emails with country: 1642
Percentage: 20.66708621774701 %


In [182]:
mult_countries = data_for_sentiment["Nbr country"] > 1
print("Emails mentioning more than one country", mult_countries.sum())
print("Percentage:", mult_countries.sum()/len(data_for_sentiment)*100, "%")

Emails mentioning more than one country 438
Percentage: 26.6747868453 %


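About a quarter of the emails that mention a country mention more than one, so we need to decide how to handle them. In what follows, each email's polarity is attributed to every country it mentions.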

In [183]:
from textblob import TextBlob

data_for_sentiment["Polarity"] = ""

for index, row in data_for_sentiment.iterrows():
    content = TextBlob(row.SubBody)
    # DataFrame.set_value was removed from pandas; .at is the scalar setter
    data_for_sentiment.at[index, "Polarity"] = content.sentiment.polarity
    
data_for_sentiment.sample(10)

Id,SubBody,Country,Nbr country,Polarity
5805,Fw: (AP) al-Shabab claims responsibility for ...,[Somalia],1,-0.75
5242,Word is that we're going to announce our respo...,[Israel],1,0.0
3315,Here is rasmussen statement at press conf toda...,"[Afghanistan, United States]",2,0.229132
4443,He is willing to take your call anytime betwee...,[Algeria],1,0.25
5920,8:00 am Call Iv/ Israeli PM Netanyahu\n8:10 am...,[Israel],1,-0.203846
1485,See below. FW: For the Secretary of State visi...,[Haiti],1,0.0
1848,Fyi Fw: Sri Lanka army: 3 Tamil Tiger leaders ...,[Sri Lanka],1,0.15
6339,Fw: (AP) Netanyahu wants Israeli troops at Pa...,[Israel],1,-0.275
5749,HRC: Below is an oped on the National Security...,"[Afghanistan, Brazil, China, Colombia, Iraq, J...",8,0.00897617
3519,Fw: Libya,[Libya],1,0.0


In [184]:
temp = list()

for index, row in data_for_sentiment.iterrows():
    email = row.SubBody
    polarity = row.Polarity
    for c in row.Country:
        temp.append([email, polarity, c])
        
email_polarity = pd.DataFrame(temp, columns=["Body", "Polarity", "Country"])
email_polarity.sample(10)

Unnamed: 0,Body,Polarity,Country
2076,H: I'm sure you are preoccupied with the adven...,0.059808,Spain
1094,I. Introduction\nThe next six months will be a...,0.047551,Puerto Rico
1145,"Jake, Want to make sure you saw the front page...",0.003628,Palau
729,So you see the traffic when I stepped in - scr...,-0.155556,Sri Lanka
498,AMS did not like any of this both for the subs...,0.104368,Indonesia
1165,"H: Of interest, Hague latest briefing, hostile...",0.114383,Poland
1896,"The Runaway General\nStanley McChrystal, Obama...",0.048852,Canada
2443,S Fw: (Reuters) Assad says peace chances with...,-0.75,Israel
694,Fw: U.N. asks Afghanistan to lift election me...,0.0,Afghanistan
2642,I would like to discuss. Re: S stop in Ecuador,-0.25,Ecuador
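
The loop above gives every mentioned country the full polarity of the email. A stdlib-only sketch of that flattening step is shown below. The `records` values are invented for illustration and are not taken from the corpus.

```python
# Made-up (body, polarity, countries) records for illustration only.
records = [
    ("first message", 0.5, ["Egypt", "Libya"]),
    ("second message", -0.25, ["Libya"]),
]

# One output row per (email, country) pair; the polarity is repeated.
rows = [
    (body, polarity, country)
    for body, polarity, countries in records
    for country in countries
]

for row in rows:
    print(row)
```

An email that lists many countries therefore contributes to each of them equally. Weighting each row by the inverse of the number of countries would be one alternative, though the notebook does not do this.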


In [186]:
sentiment_count = email_polarity.copy()

# Sign of the polarity: -1 (negative), 0 (neutral), 1 (positive)
sentiment_count['Sign'] = np.sign(sentiment_count.Polarity)

sentiment_count = sentiment_count.groupby('Country').Sign.value_counts().unstack()

sentiment_count.sample(10)

Country,-1.0,0.0,1.0
Costa Rica,1.0,1.0,4.0
Palau,1.0,7.0,14.0
Congo,,3.0,4.0
Spain,,1.0,14.0
Niger,,2.0,6.0
Hungary,,,1.0
Netherlands,,1.0,7.0
Kuwait,1.0,2.0,5.0
Montserrat,1.0,,
Samoa,,,1.0


In [187]:
email_polarity_groupby = email_polarity['Polarity'].groupby(email_polarity['Country'])

temp = email_polarity_groupby.mean()

email_polarity_analysis = pd.DataFrame(temp)

email_polarity_analysis = email_polarity_analysis.rename(columns = {'Polarity':'Mean'})

email_polarity_analysis['Max'] = email_polarity_groupby.max()
email_polarity_analysis['Min'] = email_polarity_groupby.min()
email_polarity_analysis['Count'] = email_polarity_groupby.count()
email_polarity_analysis['Std'] = email_polarity_groupby.std()

email_polarity_analysis.sample(10)

Country,Mean,Max,Min,Count,Std
Slovenia,0.38,0.38,0.38,1,
Argentina,0.058167,0.4,-0.2,23,0.146936
Nigeria,0.131545,0.628409,0.0,8,0.208803
Montserrat,-0.083333,-0.083333,-0.083333,1,
Gabon,0.0,0.0,0.0,1,
Samoa,0.45,0.45,0.45,1,
Iraq,0.010429,0.575,-0.75,94,0.252395
Oman,0.180138,0.5,-0.07875,14,0.197067
Egypt,-0.002394,0.5,-0.75,65,0.229433
Antarctica,0.091667,0.091667,0.091667,1,
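
The per-country statistics can also be reproduced without pandas. The sketch below uses invented polarity values and the standard `statistics` module. It also shows why the standard deviation is missing for countries with a single email: the sample standard deviation needs at least two values.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented (country, polarity) pairs for illustration only.
pairs = [("Libya", 0.5), ("Libya", -0.25), ("Egypt", 0.5), ("Samoa", 0.25)]

by_country = defaultdict(list)
for country, polarity in pairs:
    by_country[country].append(polarity)

summary = {}
for country, values in by_country.items():
    summary[country] = {
        "Mean": mean(values),
        "Max": max(values),
        "Min": min(values),
        "Count": len(values),
        # Sample standard deviation is undefined for a single value.
        "Std": stdev(values) if len(values) > 1 else None,
    }

print(summary["Libya"]["Mean"], summary["Samoa"]["Std"])
```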


In [188]:
email_sentiment_analysis = pd.concat([email_polarity_analysis, sentiment_count], axis=1)

email_sentiment_analysis = email_sentiment_analysis.rename(columns = {
    -1.0: 'Negative_count',
     0.0: 'Neutral_count',
     1.0: 'Positive_count',
})

email_sentiment_analysis.fillna(0,inplace=True)

email_sentiment_analysis.sample(10)

Country,Mean,Max,Min,Count,Std,Negative_count,Neutral_count,Positive_count
Afghanistan,0.080838,0.8,-0.75,141,0.184772,10.0,27.0,104.0
Singapore,0.127324,0.575,-0.125,13,0.197034,3.0,2.0,8.0
Ethiopia,0.036799,0.113316,0.0,6,0.045974,0.0,3.0,3.0
Chad,0.018101,0.018101,0.018101,1,0.0,0.0,0.0,1.0
Turkmenistan,0.095521,0.095521,0.095521,1,0.0,0.0,0.0,1.0
Bosnia and Herzegovina,0.106005,0.106005,0.106005,1,0.0,0.0,0.0,1.0
Samoa,0.45,0.45,0.45,1,0.0,0.0,0.0,1.0
Montserrat,-0.083333,-0.083333,-0.083333,1,0.0,1.0,0.0,0.0
Latvia,0.031977,0.056333,0.007621,2,0.034444,0.0,0.0,2.0
Italy,0.092513,0.5,-0.008894,24,0.13313,1.0,7.0,16.0


In [210]:
df_plot = email_sentiment_analysis.loc[email_sentiment_analysis['Count'] > 0]
df_plot = df_plot.sort_values('Count', ascending=True)

df_plot.head()

Country,Mean,Max,Min,Count,Std,Negative_count,Neutral_count,Positive_count
Montserrat,-0.083333,-0.083333,-0.083333,1,0.0,1.0,0.0,0.0
Chad,0.018101,0.018101,0.018101,1,0.0,0.0,0.0,1.0
Guinea-Bissau,-0.0125,-0.0125,-0.0125,1,0.0,1.0,0.0,0.0
Guam,0.107143,0.107143,0.107143,1,0.0,0.0,0.0,1.0
Malta,0.337879,0.337879,0.337879,1,0.0,0.0,0.0,1.0


In [222]:
import plotly.plotly as py  # in plotly >= 4 this module lives in the separate chart_studio package (chart_studio.plotly)

import plotly.tools as tls
tls.set_credentials_file(username='butterflyg', api_key='YOUR_API_KEY')  # placeholder: use your own key, never publish a real one

from plotly.graph_objs import *

# Template from https://plot.ly/~Dreamshot/239#code

def plot_histogram(df_plot):

    trace1 = {
      "x" : df_plot.Negative_count,
      "y" : df_plot.index,
      "marker": {"color": "rgb(255, 0, 0)"}, 
      "name": "Negative", 
      "orientation": "h", 
      "type": "bar", 
      "uid": "063b98", 
      "xsrc": "Dreamshot:4231:b631ec", 
      "ysrc": "Dreamshot:4231:b4bc0c"
    }

    trace2 = {
      "x" : df_plot.Neutral_count,
      "y" : df_plot.index,
      "marker": {"color": "rgb(41, 128, 171)"}, 
      "name": "Neutral", 
      "orientation": "h", 
      "type": "bar", 
      "uid": "063b98", 
      "xsrc": "Dreamshot:4231:b631ec", 
      "ysrc": "Dreamshot:4231:b4bc0c"
    }
    
    trace3 = {
      "x" : df_plot.Positive_count,
      "y" : df_plot.index,
      "marker": {"color": "rgb(36, 118, 23)"}, 
      "name": "Positive", 
      "orientation": "h", 
      "type": "bar", 
      "uid": "063b98", 
      "xsrc": "Dreamshot:4231:b631ec", 
      "ysrc": "Dreamshot:4231:b4bc0c"
    }



    data = Data([trace1, trace2, trace3])
    layout = {
      "autosize": False, 
      "bargap": 0.05, 
      "bargroupgap": 0.15, 
      "barmode": "stack", 
      "boxgap": 0.3, 
      "boxgroupgap": 0.3, 
      "boxmode": "overlay", 
      "dragmode": "zoom", 
      "font": {
        "color": "rgb(255, 255, 255)", 
        "family": "'Open sans', verdana, arial, sans-serif", 
        "size": 12
      }, 
      "height": 2700, 
      "hidesources": False, 
      "hovermode": "x", 
      "legend": {
        "x": 1.11153846154, 
        "y": 1.01538461538, 
        "bgcolor": "rgba(255, 255, 255, 0)", 
        "bordercolor": "rgba(0, 0, 0, 0)", 
        "borderwidth": 1, 
        "font": {
          "color": "", 
          "family": "", 
          "size": 0
        }, 
        "traceorder": "normal", 
        "xanchor": "auto", 
        "yanchor": "auto"
      }, 
      "margin": {
        "r": 80, 
        "t": 100, 
        "autoexpand": True, 
        "b": 80, 
        "l": 100, 
        "pad": 0
      }, 
      "paper_bgcolor": "rgb(67, 67, 67)", 
      "plot_bgcolor": "rgb(67, 67, 67)", 
      "separators": ".,", 
      "showlegend": True, 
      "smith": False, 
      "title": "<br> Sentiment Analysis of Emails by Country", 
      "titlefont": {
        "color": "rgb(255, 255, 255)", 
        "family": "", 
        "size": 0
      }, 
      "width": 700, 
      "xaxis": {
        "anchor": "y", 
        "autorange": True, 
        "autotick": True, 
        "domain": [0, 1], 
        "dtick": 20, 
        "exponentformat": "e", 
        "gridcolor": "#ddd", 
        "gridwidth": 1, 
        "linecolor": "#000", 
        "linewidth": 1, 
        "mirror": False, 
        "nticks": 0, 
        "overlaying": False, 
        "position": 0, 
        "range": [0, 105.368421053], 
        "rangemode": "normal", 
        "showexponent": "all", 
        "showgrid": False, 
        "showline": False, 
        "showticklabels": True, 
        "tick0": 0, 
        "tickangle": "auto", 
        "tickcolor": "#000", 
        "tickfont": {
          "color": "", 
          "family": "", 
          "size": 0
        }, 
        "ticklen": 5, 
        "ticks": "", 
        "tickwidth": 1, 
        "title": "Sorted by number of emails mentions in Dataset <br><i>Source: Hillary Clinton Leaked Emails</i>", 
        "titlefont": {
          "color": "", 
          "family": "", 
          "size": 0
        }, 
        "type": "linear", 
        "zeroline": False, 
        "zerolinecolor": "#000", 
        "zerolinewidth": 1
      }, 
      "yaxis": {
        "anchor": "x", 
        "autorange": True, 
        "autotick": True, 
        "domain": [0, 1], 
        "dtick": 1, 
        "exponentformat": "e", 
        "gridcolor": "#ddd", 
        "gridwidth": 1, 
        "linecolor": "#000", 
        "linewidth": 1, 
        "mirror": False, 
        "nticks": 0, 
        "overlaying": False, 
        "position": 0, 
        "range": [-0.5, 23.5], 
        "rangemode": "normal", 
        "showexponent": "all", 
        "showgrid": False, 
        "showline": False, 
        "showticklabels": True, 
        "tick0": 0, 
        "tickangle": "auto", 
        "tickcolor": "#000", 
        "tickfont": {
          "color": "", 
          "family": "", 
          "size": 0
        }, 
        "ticklen": 5, 
        "ticks": "", 
        "tickwidth": 1, 
        "title": "", 
        "titlefont": {
          "color": "", 
          "family": "", 
          "size": 0
        }, 
        "type": "category", 
        "zeroline": False, 
        "zerolinecolor": "#000", 
        "zerolinewidth": 1
      }
    }
    fig = Figure(data=data, layout=layout)
    return py.iplot(fig)

In [223]:
plot_histogram(df_plot)
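
The task statement asks for bars ordered and colored by polarity level, whereas the chart above is ordered by the number of mentions and colored by sign. One way this could be approached is to sort by mean polarity and interpolate a color between red and green. The sketch below uses a hypothetical `polarity_to_rgb` helper and three mean values copied from the summary tables above.

```python
def polarity_to_rgb(polarity):
    """Map a polarity in [-1, 1] to an 'rgb(r, g, b)' string.

    -1 maps to pure red, +1 to pure green; values are clamped.
    """
    p = max(-1.0, min(1.0, polarity))
    t = (p + 1.0) / 2.0          # rescale to [0, 1]
    red = round(255 * (1.0 - t))
    green = round(255 * t)
    return "rgb({}, {}, 0)".format(red, green)

# Mean polarity values taken from the summary tables above.
mean_polarity = {"Samoa": 0.45, "Montserrat": -0.083333, "Gabon": 0.0}

# Order countries from most negative to most positive.
ordered = sorted(mean_polarity, key=mean_polarity.get)
colors = [polarity_to_rgb(mean_polarity[c]) for c in ordered]
print(ordered)
```

The `colors` list could then be supplied as the marker color of a single bar trace, with `ordered` as the category axis. This is only a sketch of the idea and has not been run against the plotting code above.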