# Data Analysis

## Building Dashboards with Plotly and Streamlit

![](https://images.unsplash.com/photo-1573512443418-c6768862dda7?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80)

When it comes to data visualization, most data scientists and data analysts are familiar with Matplotlib and Seaborn in their Jupyter notebooks. But these tools are often not enough for production needs. A static Matplotlib PNG can hardly power a dashboard, and sharing your visualizations through a Jupyter notebook is even harder.

For this reason, several libraries dedicated to creating and sharing interactive data visualizations have been developed over time. Today, we will cover:
- **Plotly.py**, a library to build flexible interactive charts: https://plotly.com/python
- **Streamlit**, a framework to create dashboards and deploy them as a dedicated web application in just a few steps

[Here is an example of an application you will be able to build with these technologies: a COVID-19 dashboard showing the situation of the virus in Senegal.](https://storage.googleapis.com/schoolofdata-images/Data-Analysis.Dashboard-Bokeh-Altair-Streamlit/demo.mp4)

# 2. What can Plotly do?

> ⚙️ To install Plotly: `pip install plotly`

Plotly has a module called **Plotly Express** that lets you use the library in a really simple manner. Let's load the iris dataset as a dataframe and plot several kinds of charts:

```python
import plotly.express as px

df = px.data.iris()
df.head()
```

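| | sepal_length | sepal_width | petal_length | petal_width | species | species_id |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 |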


## 2.1. Scatterplot
We use the `scatter` function from Plotly Express, passing the data source and the columns to use as x and y.

```python
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1mSmBMtmuF_GX-gNW9-3qWeIe4Ngk-TFd&sz=w1000">
</p>

Notice that hovering over a point displays its values in a tooltip. By default, these are the columns used for the x and y axes.

We can then improve this scatterplot by coloring the points by species, sizing them by petal length, and displaying additional data on hover:

```python
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", size="petal_length",
                 hover_data=["petal_width"])
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1NiLxur1iJr-4JFt2YiHezi0i9EJoP38q&sz=w1000">
</p>

We can further improve this chart by adding marginal plots: a histogram along the x-axis and a box plot along the y-axis.

```python
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",
                 size="petal_length", hover_data=["petal_width"],
                 marginal_y="box", marginal_x="histogram")
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1fmsC0E4h6WFgfEGAEih5_r5DEz2hEij5&sz=w1000">
</p>

Finally, you can add ordinary least squares (OLS) trendlines with a single argument, `trendline="ols"`. This option requires the `statsmodels` package:

```python
fig = px.scatter(df, x="sepal_width", y="sepal_length",
                 color="species", size="petal_length",
                 hover_data=["petal_width"], trendline="ols",
                 marginal_y="box", marginal_x="histogram")
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1JtGTski8KwCs6o5sb_BEQcvb_p4fX9do&sz=w1000">
</p>

This is how a single function call can generate a rich interactive chart that would take considerably more code to build with Matplotlib or Seaborn.

## 2.2. Histogram

You can build a simple histogram with the `px.histogram` function.

```python
fig = px.histogram(df, x="sepal_width")
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1w7qfSPGfGqtJnAm5t5X8e4-oo9AVwXNr&sz=w1000">
</p>

## 2.3. Violin plot

Alternatively, use `px.violin` to display the distribution of `sepal_width` for each species. A box plot is drawn inside each violin (`box=True`), and every observation is shown next to it (`points="all"`).

```python
fig = px.violin(df, y="sepal_width", color="species_id",
                box=True, points="all", hover_data=df.columns)
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1KSNWSv-HQJFK4CBsmYdzCBNcuHmuq6-L&sz=w1000">
</p>

## 2.4. Pie charts

This pie chart shows how the total sepal length is split across species. We first aggregate the dataframe with a `groupby` to compute these totals.

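```python
# Sum sepal_length within each species_id, then plot each group's share of the total
totals = df.groupby("species_id", as_index=False)["sepal_length"].sum()
fig = px.pie(totals, values="sepal_length", names="species_id",
             title="Total sepal length by species")
fig.show()
```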

<p align="center">
<img src="https://drive.google.com/thumbnail?id=14eqAK24xbLmVleitvCTLk6050QJo2iT7&sz=w1000">
</p> 

## 2.5. Parallel coordinates

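Visualizing high-dimensional data is not always easy. Parallel coordinates help by drawing each observation as a line that crosses one vertical axis per variable. Coloring the lines by a reference label (here `species_id`) makes trends per class quick to spot.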

```python
fig = px.parallel_coordinates(df, color="species_id",
                              color_continuous_scale=px.colors.diverging.Tealrose,
                              color_continuous_midpoint=2)
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1SLISNvdzre-umkmVB6WK6ehSB8XxK67e&sz=w1000">
</p>

## 2.6. 3D plots

You can build 3D plots in a single line of code using `px.scatter_3d` and specifying a `z` axis. Here, we also apply the `plotly_dark` template to get a dark background.

```python
fig = px.scatter_3d(df, x='sepal_length', y='sepal_width',
                    z='petal_width', color='species',
                    template='plotly_dark')  # dark background
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1ObQ-kDuWEVFWF6JMhG7-TE5G02y2MHRL&sz=w1000" style="width:90%">
</p>

## 2.7. Maps


One of the big advantages of Plotly is how easy it is to deal with maps. It relies on Mapbox, an interactive map service that offers high quality and highly customizable maps. There are free plans as well as paying ones. 
To create a Token, you just need to create an account here: https://account.mapbox.com/auth/signup/
        
An access token will be given to you. You just need to replace the token mentioned below by yours:

```python
px.set_mapbox_access_token("YOUR_TOKEN")

df3 = px.data.carshare()
# Summary statistics
df3.describe()
```

| | centroid_lat | centroid_lon | car_hours | peak_hour |
|---|---|---|---|---|
| count | 249.0 | 249.0 | 249.0 | 249.0 |
| mean | 45.523417 | -73.591834 | 1092.528782 | 8.787149 |
| std | 0.035177 | 0.033098 | 572.187677 | 7.223874 |
| min | 45.448903 | -73.738946 | 33.25 | 0.0 |
| 25% | 45.497804 | -73.618625 | 665.583333 | 3.0 |
| 50% | 45.527905 | -73.587318 | 1020.916667 | 5.0 |
| 75% | 45.546145 | -73.570955 | 1414.916667 | 15.0 |
| max | 45.610879 | -73.51246 | 3274.0 | 23.0 |


```python
# Plot each car-share zone, colored by peak hour and sized by car hours
fig = px.scatter_mapbox(df3, lat="centroid_lat", lon="centroid_lon",
                        color="peak_hour", size="car_hours",
                        color_continuous_scale=px.colors.cyclical.IceFire,
                        size_max=15, zoom=10)
fig.show()
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1gPK5ArB9D3uj8EyL_trlSQsQ-gLqfXKU&sz=w1000" style="width:90%">
</p>

---

# 3. Streamlit

## 3.1. What is Streamlit?

**Streamlit** is an open-source framework launched in late 2019. It is a lightweight web application framework meant for prototyping ML and data visualization tools in just a few lines of Python.

Before Streamlit, Flask was one of the easiest ways to get started with web applications in Python. Flask is far more flexible, but it requires more code than Streamlit. **Streamlit lets you create lightweight single-page applications with interactive widgets.**

⚙️ Install it using: `pip install streamlit`

## 3.2. Streamlit Basics

Create a `.py` file named `app.py` and import Streamlit:

```python
import streamlit as st
```

You will usually want to:
- add titles
- add text
- collect text or numeric inputs
- feed these inputs to an algorithm to make predictions
- display dataframes

To start, we will add a title to the application:

```python
st.title("Data Visualization web application")
```

To see what your app looks like, run the following from the command line:

```bash
streamlit run app.py
```

Your app will automatically open at: [http://localhost:8501/](http://localhost:8501/)

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1RQhjDbGmfonLXVr0BkBJH5aPacF0HWvN&sz=w1000">
</p>

With Streamlit, the logic is the following:
- **to add a title**, use: `st.title()`
- **to add a header**, use: `st.header()`
- **to add text**, use: `st.write()`
- **to add markdown**, use: `st.markdown()`

```python
st.header("Part 1: Data Exploration")
st.write("In this section, we will explore the Altair cars dataset.")
st.markdown("*Further resources [here](https://altair-viz.github.io/gallery/selection_histogram.html)*")
```

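To update the app, save `app.py` and reload the page. Streamlit also shows a **Rerun** button when it detects that the source file has changed.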

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1Wflo0b7rSiCvp-bGkU0nKDOXtk8rZ7yZ&sz=w1000">
</p>

The aim of Streamlit is to let end users interact with your data visualizations or ML models. Let's first see how users can type text or adjust parameters, with a slider for example:
- **to add a slider**, use: `st.slider("Slider title", 0, 100, 50)`
- **to add a checkbox**, use: `st.checkbox("Checkbox title")`
- **to select several options**, use: `st.multiselect("Multiselect title", ["Add a constant", "Add beta 1", "Add beta 2"])`
- **to add a radio button**, use: `st.radio("Radio title", ["Yes", "No"])`
- **to add a text input**, use: `st.text_input("Type here")`
- **to add a text area**, use: `st.text_area("Type here")`
- **to add a button**, use: `st.button("Button name")`

Note that you can add text, sliders and most other elements to the sidebar by prefixing them with `st.sidebar`, for example `st.sidebar.button` or `st.sidebar.markdown`.

```python
slider = st.slider("Slider title", 0, 100, 50)  # min, max, default value
check = st.checkbox("Checkbox title")  # True when ticked
options = st.multiselect("Multiselect title", ["Add a constant", "Add beta 1", "Add beta 2"])
radio = st.radio("Radio title", ["Yes", "No"])
txt = st.text_input("Type here")
txt_area = st.text_area("Type here")
button = st.button("Button name")  # True on the run triggered by a click
```

<p align="center">
<img src="https://drive.google.com/thumbnail?id=1uWJbBT4FkuU2Tsplt5i7xRC6yHfomeiE&sz=w1000">
</p> 

These values can then serve as inputs to a model, for example. `st.button` returns `True` only on the run triggered by a click, so you can use it in an `if` statement to launch a piece of code.

```python
if st.button("Click to launch"):
    # Replace this line with the code to run when the button is clicked
    st.write("Button clicked!")
```

## 3.3. Streamlit for Data Viz

So far, we have mostly presented Streamlit as a way to build simple applications, with potential uses for ML. But Streamlit also integrates with many data visualization libraries. Let's work with the cars dataset mentioned earlier, starting with Matplotlib.

```python
# Requires: pip install vega_datasets matplotlib
from vega_datasets import data
import matplotlib.pyplot as plt

source = data.cars()
```

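```python
st.header("Visualization")

st.subheader("Matplotlib")

# Pass an explicit Figure to st.pyplot rather than the pyplot module
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(source['Horsepower'], source['Miles_per_Gallon'])
ax.set_xlabel("Horsepower")
ax.set_ylabel("Miles_per_Gallon")
st.pyplot(fig)
```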

This chart is static: you cannot hover over, zoom into, or select points. We can do better using Altair.

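```python
import altair as alt

# Interval selection: click and drag on the scatter plot (Altair 5 API)
brush = alt.selection_interval()

points = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    # Selected points keep their Origin color, the others turn gray
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).add_params(
    brush
)

bars = alt.Chart(source).mark_bar().encode(
    y='Origin:N',
    color='Origin:N',
    x='count(Origin):Q'
).transform_filter(
    brush  # only count the cars inside the selection
)

# `&` stacks the two charts vertically
st.altair_chart(points & bars)
```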

Finally, we can add a simple scatter plot made with Bokeh on the same data:

```python
from bokeh.plotting import figure

p = figure(x_axis_label="Horsepower", y_axis_label="Miles_per_Gallon")
# scatter() draws one marker per car
p.scatter(source['Horsepower'], source['Miles_per_Gallon'])
st.bokeh_chart(p)
```

Alright, satisfied with our visualization? It's time to share it with your colleagues. 

# 4. Deploy your app with Streamlit Cloud

Streamlit also lets you deploy your application to its own cloud, Streamlit Cloud. It is an extremely easy-to-use service that deploys your app directly from a GitHub repository.

## 4.1. Build the app

**First, create a directory to store the `app.py` file.**

```bash
mkdir my_streamlit_app
```

Then move the `app.py` file containing your app's script into the new directory.

Finally, create a `requirements.txt` file in the directory. This file lists all the packages, and their versions, needed to run the application.

⚙️ To create the requirements.txt file, first install pipreqs: `pip install pipreqs`

Then, go to the root of your app folder, and simply run:

```bash
pipreqs
```

You now have a `requirements.txt` file in your app folder.

## 4.2. Push your app to GitHub

Streamlit Cloud deploys from GitHub, so the content of your app needs to be in a GitHub repository.

First, initialize Git at **the root of your app folder** and commit your files:

```bash 
git init -b main
git add --all
git commit -m "my first commit"
```

Then, create a repository on GitHub (https://github.com/new) and copy its SSH URL. Cloning and pushing over SSH requires an SSH key registered on your GitHub account.

![](https://itknowledgeexchange.techtarget.com/coffee-talk/files/2022/01/github-key-ssh-url-clone.jpg)


Add this URL as the `origin` remote of your local repository:

```bash
git remote add origin {remote repository ssh link}
```

Finally, push your commit.

```bash
git push origin main
```


## 4.3. Deployment with Streamlit Cloud

First, go to https://streamlit.io/cloud and **sign up with your GitHub account**. In your account, click the **New app** button, then select your repository, the branch (`main`) and the name of the app script (`app.py`) 🚀.

> ⚠️ Hint: if errors occur during deployment, read the error message. It is usually a problem with a package or its version: fix the `requirements.txt` file, commit the changes, push them to GitHub and deploy again!