## Dashboards

In our previous lecture (lecture 8), we saw how to add some interactivity to tables and plots. Now, imagine that you are working on your project and already have some preliminary results that you want to share with colleagues working on the same project, with an institution or organisation supporting your project, or even with non-technical readers. For this, you can use a simple dashboard that sums up the essential results of your preliminary work.

In order to deploy such a dashboard, you will need to install the following utilities: 

  1. a jupyter lab server, which serves notebooks (like this one) from your own workspace:

In [3]:
#pip3 install jupyterlab

  2. an application for your dashboard -- in this example, we use the _voila_ dashboard utility because it is simple to set up and offers basic dashboard functionality:

In [4]:
#pip3 install voila

  3. to get a better layout for our voila dashboard, we also install the gridstack and vuetify templates:

In [5]:
#pip3 install voila-gridstack voila-vuetify

  4. finally, to enable the conversion of your dashboard into other formats (f.ex. PDF), you can also install the nbconvert utility:

In [6]:
#pip3 install nbconvert

  5. troubleshooting outdated packages -- as _voila_ develops at a quick pace, you may encounter errors when installing it, caused by dependencies that are too old. To update voila and its dependencies, you can run the following:

In [7]:
#pip3 install jupyter-book myst-nb nbclient voila nbinteract nbformat --upgrade

Now, you can run jupyter lab as a standalone server on your computer by typing:

In [8]:
#jupyter lab

in a terminal (or a cmd window on Windows). Your web browser will open at the start page of jupyter lab, where you can create an ipynb notebook file that will become your dashboard. Let us create an example dashboard from the data that we used in lecture 8:

  1. go to 'File', 'New' and select 'Notebook'
  2. in the pop-up window, make sure to select python3 as the kernel
  3. an 'Untitled.ipynb' notebook file is created -- in our example, we rename it to 'Dash9.ipynb'. 
  
When you open your new notebook in jupyter lab, you will see a toolbar with, on the left, an icon to save your work, then a plus sign, etc., and, at the far right of the toolbar, a square icon with dark grey squares in it: this is the gridstack editor, which we will use to create our dashboard. Click on it in your notebook; we will then place in it some of the information taken from lecture 8 to construct our dashboard (see our example 'Dash9.ipynb').

### First import the libraries

First, we import into our dashboard the libraries that we need to build our tables and plots: the libraries for the static plots and tables, and also those needed for interactivity. In 'Dash9.ipynb', we use two cells to keep these two groups of libraries apart, but you could also put them into a single cell.

The second step is to rework the data from our file 'Dominant_Topics_ENG_2.csv' so that the table has more explicit content and appropriate sort keys. Run all cells in your notebook to make sure that you have the right table. Now we have everything we need to build our dashboard.

### Table -- Show your dataframe

The first element we would like to show in our dashboard is our dataframe, and we would like to let the audience filter it by the value of the topics. In lecture 8, we wrote some code for this, so let us reuse it in our dashboard. These were the lines of code:

In [9]:
# Widget to filter table by topic value
#@interact
#def show_articles(column=['Topic_0', 'Topic_1', 'Topic_2', 'Topic_3', 'Topic_4'], value=(0, 1, 0.05)):
#    return df_eng.loc[df_eng[column] > value]

Let us paste them into our 'Dash9.ipynb' file, then take the cell with the mouse and drag and drop it into the gridstack editor. Write a title in another (Markdown) cell and place it above your table. You now see a title followed by your table, with the interactive properties that we coded in lecture 8: both you and the audience of your dashboard can change the topic and/or the threshold value to filter the dataset. Let us do the same with a plot.
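To see what the widget does behind the scenes, here is a minimal sketch without any widget. With `@interact`, ipywidgets turns the list of column names into a drop-down menu and the `(0, 1, 0.05)` tuple into a slider (minimum, maximum, step), and calls the decorated function again each time you change one of them. Below, the same filter is called directly on a small hypothetical excerpt of our dataframe (three rows and two topic columns copied from the reworked table of this lecture); the names `df_demo` and `filter_articles` are made up for illustration, so that they do not overwrite `df_eng` or `show_articles` in your notebook.

```python
import pandas as pd

# Hypothetical excerpt of df_eng: three rows and two topic columns copied from our table
df_demo = pd.DataFrame({
    "Articles": ["2002-03-1-MajAustr-1-67-61.txt",
                 "2003-07-01-NYTimes-1-7-4.txt",
                 "2004-12-22-Independent-1-9-5.txt"],
    "Topic_0": [0.734406, 0.759704, 0.344901],
    "Topic_4": [0.059674, 0.0, 0.303847],
})


def filter_articles(df, column="Topic_0", value=0.5):
    """Same filter as show_articles: keep the rows whose weight in `column` is above `value`."""
    return df.loc[df[column] > value]


# What the dashboard shows when the drop-down is on Topic_4 and the slider on 0.05
print(filter_articles(df_demo, "Topic_4", 0.05)["Articles"].tolist())
```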

### Plot -- Show first results

We want to show a plot in which the values are aggregated, letting the audience see the multiple factors of our dataset and interact with them. For this, we use the code of cell 11 from lecture 8, to which we add the following line from cell 10 so that the years are displayed correctly; we add it just after the import of the plotly express library (see our file 'Dash9.ipynb'):

In [10]:
#df_eng2["Year"] = df_eng2["Year"].astype(int)

We can now drag and drop this cell into our dashboard: we first write a title for this plot in a cell and drag and drop it to the dashboard, then drag and drop the cell with the code of our plot.

### Troubles with voila and gridstack

Depending on the version of the widgets, you can quickly run into trouble with the voila gridstack template: f.ex., at the time of writing, plotly plots are not shown at all in the dashboard. A workaround is to use the voila rendering without the gridstack template, by clicking on the blue-yellow circle icon to the right of the drop-down menu where you select the type of your cells (Markdown, Code or Raw).

### Web-app

Another way to avoid hunting bugs in the interplay between jupyter lab, the voila application, the widgets and the jupyter extensions is simply to use a public service such as mybinder.org, pointing it to the notebook in which you have written your presentation.

For that, you first need to write a file called 'requirements.txt' (you can find mine in this repository). In this file, you list all the libraries that your notebook needs in order to be properly built by Binder. If you look at mine, each line contains the name of a library, two equal signs, and the version of that library. To find your installed Python libraries with their version numbers, you can type the following in a terminal window (a cmd window on Windows):

In [11]:
#pip3 list

This command lists all packages installed in your Python environment with their version numbers. Go through the list and pick out the libraries, with their version numbers, that you use in your dashboard.
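If you prefer not to copy version numbers by hand, `pip3 freeze` prints every installed package directly in the `name==version` format used by 'requirements.txt'. Alternatively, here is a minimal sketch using Python's standard `importlib.metadata` module that prints pinned lines only for the libraries you list. The package list is a hypothetical example, and the names must be the ones you install with pip (distribution names), which are not always the names you import. The sketch only prints the lines, so that it never overwrites an existing 'requirements.txt'.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the listed packages that are installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment (or misspelled): skip it
            print(f"warning: {name} is not installed, skipped")
    return lines


# Hypothetical list: replace it with the libraries your dashboard actually uses
wanted = ["pandas", "plotly", "ipywidgets", "voila"]
print("\n".join(pinned_requirements(wanted)))
```

You can then copy the printed lines into your 'requirements.txt'.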

You also need an account on GitHub, GitLab or another hosting service supported by mybinder, so that Binder can fetch the files of your repository, or just one file of it (your dashboard, in our example). The repository must then be 'public' rather than 'private'; you can, f.ex., make it public only for the time you want to share your dashboard, and make it 'private' again afterwards.

With all these elements in place, you can go to mybinder.org, enter the path to your online repository and the name of your dashboard, and it will be published online for other people to see what you have achieved. You will be given a URL for your online dashboard that you can distribute to others.
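The link can also be built by hand. Here is a minimal sketch, assuming the dashboard lives in a public GitHub repository and that voila is listed in 'requirements.txt'; the user name 'my-user', the repository name 'my-course-repo' and the branch 'main' are hypothetical. It follows the link pattern described in the voila documentation for Binder, where the URL-encoded `urlpath` parameter asks Binder to open the notebook through voila rather than in the notebook interface.

```python
from urllib.parse import quote


def binder_voila_url(user, repo, notebook, branch="main"):
    """Build a mybinder.org link that opens `notebook` rendered by voila."""
    # The urlpath value is URL-encoded, so its slashes become %2F
    urlpath = quote(f"voila/render/{notebook}", safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{branch}?urlpath={urlpath}"


print(binder_voila_url("my-user", "my-course-repo", "Dash9.ipynb"))
```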

Such web-apps are also an interesting way to turn your dashboard into a presentation, f.ex. if you have to give a talk at a conference and don't want to carry everything with you. 

## Network of topics

To present your topic analysis in a useful way, you can also opt for a network graph, showing the relationships between your topics and your cases. Let us come back to our data 'Dominant_Topics_ENG_2.csv' and to our table in the form of the pandas dataframe 'df_eng2'.

In [12]:
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = [10, 6]
# Set up with a higher resolution screen (useful on Mac)
%config InlineBackend.figure_format = 'retina'
import pandas as pd
df_eng = pd.read_csv('Dominant_Topics_ENG_2.csv')
# First step -- rename the first six columns: article file names, then the five topic weights
new_names = ["Articles", "Topic_0", "Topic_1", "Topic_2", "Topic_3", "Topic_4"]
df_eng.rename(columns=dict(zip(df_eng.columns[:6], new_names)), inplace=True)
# Second step -- drop the column of the dominant topics
df_eng.drop('Dominant_Topic_NMF', axis=1, inplace=True)
# Third step -- create separate columns for title of newspapers and year of publication
df_eng['Year'] = df_eng['Articles']
df_eng['Newspaper'] = df_eng['Articles']
# Fourth step -- keep the first four characters (the year) and store them as int values
df_eng['Year'] = df_eng['Year'].map(lambda x: str(x)[0:4]).astype(int)
# Fifth step -- shorten the newspapers' names: characters 11 to 13 of the file name
# (fixed positions, so a one-digit day as in '2002-03-1-MajAustr...' gives 'ajA')
df_eng['Newspaper'] = df_eng['Newspaper'].map(lambda x: str(x)[11:14])
# Sixth step -- sort by year
df_eng2 = df_eng.sort_values(by='Year', ascending=True)
# Display our reworked table
df_eng2.head()

|    | Articles | Topic_0 | Topic_1 | Topic_2 | Topic_3 | Topic_4 | Year | Newspaper |
|----|----------|---------|---------|---------|---------|---------|------|-----------|
| 0  | 2002-03-1-MajAustr-1-67-61.txt | 0.734406 | 0.03105 | 0.052941 | 0.121928 | 0.059674 | 2002 | ajA |
| 76 | 2003-07-01-NYTimes-1-7-4.txt | 0.759704 | 0.001919 | 0.067554 | 0.170823 | 0.0 | 2003 | NYT |
| 75 | 2004-03-15-MajAustr-1-67-30.txt | 0.749237 | 0.000306 | 0.12506 | 0.091593 | 0.033804 | 2004 | Maj |
| 74 | 2004-10-11-TorontoS-1-5-3.txt | 0.652717 | 0.105692 | 0.149433 | 0.092158 | 0.0 | 2004 | Tor |
| 73 | 2004-12-22-Independent-1-9-5.txt | 0.344901 | 0.118249 | 0.104265 | 0.128737 | 0.303847 | 2004 | Ind |
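As the first row of this table shows, taking fixed positions ([11:14]) of the file name gives 'ajA' instead of 'Maj' when the day has only one digit ('2002-03-1'). A minimal sketch of a more robust alternative, assuming that every file name follows the pattern year-month-day-Newspaper-..., with no hyphen inside the newspaper's name: split the name at the hyphens and keep the first three letters of the fourth field. The function name `newspaper_code` is hypothetical.

```python
def newspaper_code(filename):
    """Return the first three letters of the newspaper field of an article file name."""
    # Assumed pattern: year-month-day-Newspaper-..., with no '-' inside the newspaper's name
    return filename.split("-")[3][:3]


for name in ["2002-03-1-MajAustr-1-67-61.txt", "2003-07-01-NYTimes-1-7-4.txt"]:
    # Compare the fixed-position slice with the split-based version
    print(name, name[11:14], newspaper_code(name))
```

In the notebook, you could then write `df_eng['Newspaper'] = df_eng['Articles'].map(newspaper_code)`; note that this changes the short case names used in the rest of this lecture (f.ex. '2002-ajA' would become '2002-Maj').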


We can take this data as a base for our network graph. A network graph is basically a graph of the relationships (or edges) between entities (or nodes). In our example, we want to understand the relationships between: 

  1. newspapers' articles and topics
  2. newspapers' articles among themselves
  
We can then rework our 'df_eng2' data to get what we need for the network graph, and because we will use two pieces of software to plot our data (pyvis in python, and the gephi software), we will create files that work with both of them. Let us first tailor our 'df_eng2' data with the information needed for our network graph. Here, we will:

  1. merge our columns 'Newspaper' and 'Year' into a new column 'News', to have short names for our cases
  2. drop the columns 'Year' and 'Newspaper', which we added in lecture 8 to have more keys to sort our data
  3. drop the column 'Articles', which is replaced by the column 'News'
  4. move the column 'News' to the first position
  5. save the result to a .csv file (here a file called 'gephi-data.csv', because we will use it with the gephi software afterwards)

In [13]:
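df_eng2["News"] = df_eng2["Year"].astype(str) + '-' + df_eng2["Newspaper"] # merge year and newspaper into short case names
df_eng2 = df_eng2.drop(columns=['Articles', 'Year', 'Newspaper']) # drop the columns we no longer need (keyword form, required by recent pandas)
first_column = df_eng2.pop('News') # take the 'News' column out...
df_eng2.insert(0, 'News', first_column) # ...and put it back as the first column
df_eng2.to_csv('gephi-data.csv', index=False) # save the table for pyvis and gephi
df_eng2.head()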



### Parsing the 'gephi-data.csv' file

We parse the 'gephi-data.csv' file in order to get files that our network graph applications can read. We first use the command-line tools awk and sed to turn 'gephi-data.csv' into a plain comma-separated file, 'dt.csv', with one case per line. Note that the awk step also drops the last column of the file (Topic_4), which is why Topic_4 does not appear in the output below.

We then use a python script to write to the file 'texttopic.txt' the relationships between the texts in the 'News' column and the topics, one `text,topic,weight` line per relationship.

Finally, we use command-line tools again to make an edge file and a node file out of 'texttopic.txt'. Let us go to the code and explain it.

In [14]:
!awk -F',' '{$NF=""}1' gephi-data.csv > dt.csv # blank the last field of each line; awk then joins the fields with spaces, so the last column (Topic_4) is dropped
!sed -i "s/ /,/g" dt.csv # replace the spaces with commas (GNU sed syntax; on macOS, sed -i needs an empty suffix argument)
!sed -i 's/.$//' dt.csv # remove the trailing comma left by the blanked last field

# Python script -- write one text,topic,weight line per cell of dt.csv

with open("dt.csv") as data, open("texttopic.txt", "w") as out:
    columns = data.readline().strip().split(',')[1:] # header line: the topic names after 'News'
    for line in data: # one line per text (case)
        tokens = line.strip().split(',')
        row = tokens[0] # the case name from the 'News' column
        for column, cell in zip(columns, tokens[1:]): # pair each topic with its weight
            s = '{},{},{}'.format(row, column, cell) # format the triple
            print(s) # display it in the notebook
            out.write(s + "\n") # write it to texttopic.txt, one triple per line

# Create nodes and edges from texttopic.txt

!sed '/,0\.0$/d' texttopic.txt > ttedge_list.txt # use sed to delete edges whose weight is exactly 0.0 (null entries), and save the rest as ttedge_list.txt
!awk -F',' ' { print $1 "," $1 } ' texttopic.txt | sort | uniq | sed 's/\.txt//' | awk -F',' ' {print $2 "," $1} ' > ttnode_list.txt # list each case name (first column of texttopic.txt) once, sorted, as an id,label pair, and save it as ttnode_list.txt
!echo "id,Label" | cat - ttnode_list.txt > ttnode_list.csv # write ttnode_list.csv: the header line id,Label followed by the content of ttnode_list.txt
!echo "Source,Target,Weight" | cat - ttedge_list.txt > ttedge_list.csv # same as above for ttedge_list.csv, with the header Source,Target,Weight

2002-ajA,Topic_0,0.7344061409198571
2002-ajA,Topic_1,0.0310502227922358
2002-ajA,Topic_2,0.0529413514249301
2002-ajA,Topic_3,0.121927838969488
2003-NYT,Topic_0,0.7597036666329167
2003-NYT,Topic_1,0.0019191491553409
2003-NYT,Topic_2,0.0675543861992488
2003-NYT,Topic_3,0.1708227980124935
2004-Maj,Topic_0,0.7492368149093654
2004-Maj,Topic_1,0.0003056282203393
2004-Maj,Topic_2,0.1250599243111435
2004-Maj,Topic_3,0.091593133651301
2004-Tor,Topic_0,0.6527171523621804
2004-Tor,Topic_1,0.1056924320489151
2004-Tor,Topic_2,0.1494326657312158
2004-Tor,Topic_3,0.0921577498576886
2004-Ind,Topic_0,0.3449014987246375
2004-Ind,Topic_1,0.1182492384763447
2004-Ind,Topic_2,0.1042652003722774
2004-Ind,Topic_3,0.1287369775072112
2005-Ind,Topic_0,0.0
2005-Ind,Topic_1,0.0188957965116325
2005-Ind,Topic_2,0.9811042034883676
2005-Ind,Topic_3,0.0
2005-Tor,Topic_0,0.9008021778597286
2005-Tor,Topic_1,0.0057570007156965
2005-Tor,Topic_2,0.0374773042990694
2005-Tor,Topic_3,0.0174409502742877
2006-ajA,Topic_0,0.89948
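If awk and sed are not available (f.ex. on Windows without a Unix-like shell), the same edge and node files can be written in plain Python. The following is a minimal sketch using only the standard `csv` module, assuming 'gephi-data.csv' has the 'News' column first and one column per topic; the function name `write_gephi_files` is hypothetical. Unlike the awk step above, it keeps all topic columns, including Topic_4, and it skips only edges whose weight is exactly 0. Like the shell version, the node list contains only the cases.

```python
import csv


def write_gephi_files(src="gephi-data.csv", edges="ttedge_list.csv", nodes="ttnode_list.csv"):
    """Write a Gephi edge list (Source,Target,Weight) and node list (id,Label) from src."""
    edge_rows, cases = [], set()
    with open(src, newline="") as f:
        reader = csv.reader(f)
        topics = next(reader)[1:]  # header: 'News', then one column per topic
        for row in reader:
            if not row:  # ignore blank lines
                continue
            case, weights = row[0], row[1:]
            cases.add(case)
            for topic, weight in zip(topics, weights):
                if float(weight) != 0.0:  # skip null edges
                    edge_rows.append([case, topic, weight])
    with open(edges, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["Source", "Target", "Weight"])
        writer.writerows(edge_rows)
    with open(nodes, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["id", "Label"])
        # id and label are both the case name, as in the shell version
        writer.writerows([case, case] for case in sorted(cases))
    return len(edge_rows), len(cases)
```

Call `write_gephi_files()` in the notebook after the cell that saves 'gephi-data.csv'; it returns the number of edges and of nodes written.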

## Making our network graph in python

Now that we have our node and edge files, we can draw a network graph with python within this notebook, which is useful for presenting directly from the notebook. To do this, we use pyvis, with networkx in the background, which lets us display nice graphs -- nicer, in my view, than with networkx alone. If you don't have pyvis and networkx installed, you can install them with pip, like this:

pip3 install pyvis networkx

In [15]:
import pandas as pd
from pyvis.network import Network as net
import networkx as nx

med_net = net(height='750px', width='75%', bgcolor='#222222', font_color='white', notebook=True, directed=False) # size, colours, display in the notebook, undirected graph
med_net.barnes_hut() # use the Barnes-Hut physics algorithm to lay out the graph
med_data = pd.read_csv('ttedge_list.csv') # read the edge list written above to plot the graph

# Define the variables corresponding to the columns of the ttedge_list.csv file

sources = med_data['Source']
targets = med_data['Target']
weights = med_data['Weight']

# Zip the three variables above into an edge_data variable

edge_data = zip(sources, targets, weights)

# Iterate over edge_data: add the departure node (src), the arrival node (dst) and the edge between them, weighted by w, to the graph (med_net)

for src, dst, w in edge_data:
    med_net.add_node(src, src, title=src)
    med_net.add_node(dst, dst, title=dst)
    med_net.add_edge(src, dst, value=w)

neighbor_map = med_net.get_adj_list() # map each node to the set of nodes it is connected to by an edge

# add neighbor data to node on hover -- when you move the mouse over a node, you will see the nodes related to it
for node in med_net.nodes:
    node['title'] += ' Neighbors:<br>' + '<br>'.join(neighbor_map[node['id']])
    node['value'] = len(neighbor_map[node['id']]) # size each node by its number of neighbors

#med_net.show_buttons(filter_=True) # if you want buttons to modify the parameters of the graph
med_net.show('med.html')

Now you have an interactive network graph of your topic model analysis, which you can customize and which has been saved as a standalone html file (med.html).
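To make the last loop of the pyvis cell less mysterious, here is a plain-Python sketch of the kind of neighbor map it relies on (a sketch of the idea, not pyvis's own code): in an undirected graph, each node is mapped to the set of nodes it shares an edge with, and the size of that set, the node's degree, is what `node['value']` uses to scale the nodes. The edges are a few rows of our edge list with rounded weights; the names `adjacency`, `demo_edges` and `demo_map` are hypothetical, chosen so as not to overwrite `neighbor_map`.

```python
from collections import defaultdict


def adjacency(edges):
    """Map each node to the set of nodes it shares an edge with (undirected graph)."""
    neighbors = defaultdict(set)
    for src, dst, _weight in edges:  # the weight does not matter for the neighbor map
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


# A few rows of our edge list (weights rounded)
demo_edges = [("2003-NYT", "Topic_0", 0.7597),
              ("2003-NYT", "Topic_3", 0.1708),
              ("2004-Tor", "Topic_0", 0.6527)]
demo_map = adjacency(demo_edges)
for node, nbrs in sorted(demo_map.items()):
    # degree = number of neighbors, used as node['value'] (node size) in the pyvis loop
    print(node, len(nbrs), sorted(nbrs))
```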

## Gephi -- standalone solution

At the time of writing, gephi is unfortunately no longer actively developed, although there are plans to develop it further. In the meantime, you can download the free gephi software, a standalone solution for plotting beautiful network graphs from the node and edge files that we generated from our texttopic.txt file. How do you do that? This is what I want to show you in our course, working through it with you in person rather than in this notebook.