# Advanced Data Visualization 

## Table of Contents

1. [**Introduction**](#Intro)
2. [**Pie Charts**](#pie)
3. [**Bubble Charts**](#bubble)
4. [**Maps**](#maps)
5. [**Contour Plots**](#conplt)
6. [**3D Plots**](#3dplt)

## 1. Introduction <a name="Intro"></a>

Earlier, we used the matplotlib, pandas, and seaborn libraries to create common types of plots and charts. Some charts take more effort to build with those libraries, or cannot be built with them at all. Here we look at two more advanced libraries, <font color='blue'>__Plotly__</font> and <font color='blue'>__Folium__</font>.

<font color='blue'>__Plotly__</font>'s Python graphing library is an interactive, open-source, and browser-based graphing library for Python. It enables Python users to create interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications. The plotly Python library is sometimes referred to as "plotly.py" to differentiate it from the JavaScript library.

We are going to use different libraries and modules from these libraries in this notebook for different types of plots. 


In [None]:
#!pip install plotly

## 2. Pie Charts <a name="pie"></a>

A pie chart is a circular statistical chart divided into sectors that illustrate numerical proportions.

Let's start with something simple:

In [None]:
import plotly.express as px  # importing plotly express

fig = px.pie(values=[10,20,15,45,80], names=['slice 1','slice 2','slice 3', 'slice 4','slice 5'])
fig.show()
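
Each slice's angle is proportional to its share of the total, which is also the percentage printed on the slice by default. As a quick sanity check, the shares for the values above can be computed by hand. This is only a minimal sketch in plain Python; the variable names are mine and are not used elsewhere in the notebook.

```python
values = [10, 20, 15, 45, 80]
names = ['slice 1', 'slice 2', 'slice 3', 'slice 4', 'slice 5']

total = sum(values)                           # sum of all slices
shares = [100 * v / total for v in values]    # percentage of the total for each slice

for name, share in zip(names, shares):
    print(f'{name}: {share:.1f}%')
```

The printed percentages should match the ones you see on the chart.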

Let's import the data available for energy generated using various renewable resources in the UK for different years in different regions. The data is stored at: https://raw.githubusercontent.com/MasoudMiM/ME_364/main/UK_Renewable_Energy/UKEnergy.csv

In [None]:
import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/MasoudMiM/ME_364/main/UK_Renewable_Energy/UKEnergy.csv'
dfUK = pd.read_csv(url)

# The dataset is now stored in a pandas DataFrame
dfUK.head()

Let's make a pie chart with the total wind energy generated in the UK every year from 2003 to 2015.

In [None]:
fig = px.pie(dfUK, values='Wind2', names='Year', title='Wind Energy Generated in the UK from 2003 to 2015 [GWh]')
fig.show()

How about the total energy generated in the UK at each region from 2003 to 2015?

In [None]:
fig = px.pie(dfUK, values='Total', names='Region', title='Total Energy Generated in the UK [GWh]')
fig.show()
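
In this chart the same region appears in many rows (one per year), and the pie trace adds up the values of rows that share a name. To check what each slice represents, the same totals can be computed with pandas. The sketch below uses a small made-up DataFrame: the column names mimic `dfUK`, but the region names and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of dfUK: two regions, two years each (numbers are invented)
toy = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Year':   [2003, 2004, 2003, 2004],
    'Total':  [10.0, 30.0, 20.0, 40.0],
})

# Sum over the years for each region: this is what one slice stands for
region_totals = toy.groupby('Region')['Total'].sum()

# Share of each region in percent
region_share = 100 * region_totals / region_totals.sum()
print(region_share)
```

Running the same `groupby` on `dfUK` is one way to confirm the numbers you read off the interactive chart.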

We can customize the pie chart, for example by using <font color='blue'>labels</font> to rename columns. For further tuning, we call <font color='blue'>fig.update_traces</font> to set other parameters of the chart.

In [None]:
fig = px.pie(dfUK, values='Wind2', names='Year',
             title='Wind Energy Generated in the UK [GWh]',
             labels={'Wind2':'WIND','Year':'YEAR'})

fig.update_traces(textposition='inside', textinfo='percent+label')   # put percentages and years on each slice
fig.show()

What if you want to save the plot as an interactive plot and not as a png file?

In [None]:
fig = px.pie(dfUK, values='Wind2', names='Year',
             title='Wind Energy Generated in the UK [GWh]',
             labels={'Wind2':'Wind'})

fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

fig.write_html("pieplot.html")          # save the plot as an html file

# only when using Colab so we can download the saved plot
from google.colab import files 
files.download('pieplot.html')

<font color='red'>__Question (1)__</font>: Using the previously imported renewable energy data for the UK, create a pie chart representing the hydro power generated for each year as a percentage of total hydro power generated from 2003 to 2015. Rename "Hydro" to "Hydro Power". Show both the year and percent on the pie chart slices. Add some lines to the code to download the chart as an html file.

In [None]:
# In-Class Assignment


## 3. Bubble Charts <a name="bubble"></a>

A bubble chart is a scatter plot in which a third dimension of the data is shown through the size of markers.

I am going to use the 3D printer data set we saw before. Remember that the parameters were: Layer Height (mm), Wall Thickness (mm), Infill Density (%), Infill Pattern, Nozzle Temperature (C), Bed Temperature (C), Print Speed (mm/s), Material, Fan Speed (mm/s), Roughness (µm), Tension (ultimate) Strength (MPa), and Elongation (%).

In [None]:
url = 'https://raw.githubusercontent.com/MasoudMiM/ME_364/main/3D_Printer_Data/3DPrinterDataset.csv'   # Link to the 3D printer data set
df3dprinter = pd.read_csv(url)

# The dataset is now stored in a pandas DataFrame
df3dprinter.head()

In [None]:
px.scatter(df3dprinter, x='roughness', y='tension_strenght', size='infill_density')

We can show another column of the dataframe by adding a fourth dimension to the plot, here the color of the markers. Let's say we want to show the pattern used for the infill, whether it was honeycomb or grid.

In [None]:
px.scatter(df3dprinter, x='roughness', y='tension_strenght', size='infill_density', color='infill_pattern')

What if I want to increase the size of the bubbles?

In [None]:
px.scatter(df3dprinter, x='roughness', y='tension_strenght', size='infill_density', 
           color='infill_pattern', size_max=35)

You can add even more information from your data set to the graph using the __hover_name__ parameter of the bubble plot. Here, we add the nozzle temperature to the plot as the first number shown when we hover the mouse over a data point.

In [None]:
px.scatter(df3dprinter, x='roughness', y='tension_strenght',
           size='infill_density', color='infill_pattern', hover_name='nozzle_temperature', size_max=25)

Finally, let's take a look at how we can modify the figure size and labels of a plot. Here are some of the available options:

<font color='green'>__labels__</font>: dictionary with string keys and string values (default `{}`). The keys of this dictionary should correspond to column names, and the values should correspond to the desired label to be displayed.

<font color='green'>__title__</font>: The figure title

<font color='green'>__height__</font> and <font color='green'>__width__</font>: The figure height and width in pixels.

In [None]:
px.scatter(df3dprinter, x='roughness', y='tension_strenght', size='infill_density',
           color='infill_pattern', hover_name='nozzle_temperature',

           labels={'roughness':'Roughness (micrometer)','tension_strenght':'Tensile Strength (MPa)',
                   'infill_density':'Infill Density','infill_pattern':'Pattern'},
           
           title='3D Printer Data',

           width=600,
           height=400,

           size_max=10)

<font color='red'>__Question (2)__</font>: Write code that generates a bubble chart of tensile strength versus elongation, with the bubble size showing the nozzle temperature. Set a maximum size for the bubbles and rename the three parameters to show their units.

In [None]:
# In-Class Assignment



## 4. Maps <a name="maps"></a>

Folium is a Python library that helps you create several types of maps. Because Folium maps are interactive, the library is very useful for building dashboards (what is a dashboard? https://www.klipfolio.com/resources/articles/what-is-data-dashboard). Generating a world map is straightforward in Folium: you simply create a Folium Map object and then display it. Since the map is interactive, you can zoom into any region of interest regardless of the initial zoom level.

In [None]:
import folium               # importing the library

# define the world map
world_map = folium.Map()

# display world map
world_map

In [None]:
latitude = 43.2994
longitude = -74.2179
# define a map centered on New York State (NYS) with a zoom level of 7
nys_map = folium.Map(location=[latitude, longitude], zoom_start=7, tiles='OpenStreetMap')   # another built-in option is 'CartoDB positron'

# display the NYS map
nys_map

To show the capabilities of this package, we are going to use a dataset with the locations of all the electric vehicle (EV) charging stations in New York State (NYS). Let's superimpose the locations of the EV charging stations on the map. You can find information about the data here: https://data.ny.gov/Energy-Environment/Electric-Vehicle-Charging-Stations-in-New-York/7rrd-248n

In [None]:
import pandas as pd

df_nysev = pd.read_csv('https://raw.githubusercontent.com/MasoudMiM/ME_364/main/EVStations_NY/EVChargingStations_NY_Sep12.csv')

print('Dataset is downloaded and read into a pandas dataframe!')

Let's take a look at the data set:

In [None]:
df_nysev.head()

Let's create a map of New York and place one of these stations on the map using the latitude and longitude values given in the data set.

In [None]:
# creating the map object
nysev_map1 = folium.Map(location=[43.2994, -74.2179], zoom_start=7, tiles='OpenStreetMap')   # recent Folium releases no longer bundle the Stamen tiles; 'CartoDB positron' is another built-in option

# let's place a station in nys with lat=40.74709 and long=-73.98667 
folium.CircleMarker(location=([40.74709, -73.98667]),
                    radius=5, 
                    color='red', 
                    fill=True, 
                    fill_color='blue',
                    fill_opacity=0.6).add_to(nysev_map1)

# show the map with the station
nysev_map1

We can add some explanation to the data point on the map using the option <font color='blue'>popup</font>:

In [None]:
# Creating the map object
nysev_map2 = folium.Map(location=[43.2994, -74.2179], zoom_start=7, tiles='OpenStreetMap')   # recent Folium releases no longer bundle the Stamen tiles; 'CartoDB positron' is another built-in option

# let's place a station in nys with lat=40.74709 and long=-73.98667 
folium.CircleMarker(location=([40.74709, -73.98667]),
                    radius=5, 
                    color='red', 
                    fill=True,
                    fill_color='blue', 
                    fill_opacity=0.6,
                    popup='Central Parking - Tower 31'
                    ).add_to(nysev_map2)

# show the map with the station
nysev_map2

You could also use <font color='blue'>Marker</font> instead of <font color='blue'>CircleMarker</font>:

In [None]:
# Creating the map object
nysev_map3 = folium.Map(location=[43.2994, -74.2179], zoom_start=7, tiles='OpenStreetMap')   # recent Folium releases no longer bundle the Stamen tiles; 'CartoDB positron' is another built-in option

# let's place a station in nys with lat=40.74709 and long=-73.98667 
folium.Marker(location=([40.74709, -73.98667]),
                    popup='Central Parking - Tower 31'
                    ).add_to(nysev_map3)

# show the map with the station
nysev_map3

We can use a for-loop to create a station map with all the EV stations using the latitudes and longitudes as well as their names as popups on the map:

In [None]:
# Creating the map object
nysev_map = folium.Map(location=[latitude, longitude], zoom_start=7, tiles='OpenStreetMap')   # recent Folium releases no longer bundle the Stamen tiles; 'CartoDB positron' is another built-in option

# loop through the stations
for lat, lng, label in zip(df_nysev['Latitude'], df_nysev['Longitude'], df_nysev['Station Name']):
        folium.CircleMarker(location=([lat, lng]),
            radius=5, # define how big you want the circle markers to be
            color='red',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6,
            popup=label
        ).add_to(nysev_map)

# show the map with all the stations
nysev_map
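
The loop above assumes that every row has usable coordinates, and a marker cannot be placed at a missing latitude or longitude. I have not checked whether the real file has gaps, so the sketch below only shows a check you might run before looping. It uses a made-up stand-in for `df_nysev` with the same column names and invented rows.

```python
import pandas as pd

# Hypothetical stand-in for df_nysev (same column names, invented rows)
toy = pd.DataFrame({
    'Station Name': ['A', 'B', 'C'],
    'Latitude':  [40.74709, None, 42.65],
    'Longitude': [-73.98667, -73.9, -73.75],
})

# Keep only the rows where both coordinates are present
clean = toy.dropna(subset=['Latitude', 'Longitude'])

# Optional sanity check: coordinates must be valid latitude/longitude values
in_range = clean['Latitude'].between(-90, 90) & clean['Longitude'].between(-180, 180)
clean = clean[in_range]

print(f'{len(toy) - len(clean)} row(s) dropped, {len(clean)} station(s) left to plot')
```

You would then loop over the cleaned dataframe instead of the original one.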

Finally, you can save the map as an HTML file using the <font color='blue'>save</font> method. Since we are coding on Colab, we also need to download the file after saving it.

In [None]:
nysev_map.save('NYSEV_map.html')

from google.colab import files # only when using Colab so we can download the map
files.download('NYSEV_map.html')

<font color='red'>__Question (3)__</font>: Using Folium, create a map of San Francisco with latitude=37.7749 and longitude=-122.4194. Use OpenStreetMap tiles and zoom level 11. Look up the latitude and longitude of San Francisco airport and show it on the map using a marker. Use "SFO" as the label for the popup.

In [None]:
# In-Class Assignment



<font color='orange'>Note 1</font>: There are other types of maps that you can create. One of the interesting ones is the choropleth map. Here is a link to an example if you are interested in knowing more: https://towardsdatascience.com/choropleth-maps-with-folium-1a5b8bcdd392.

<font color='orange'>Note 2</font>: The __plotly__ library also provides a set of tools for creating maps: https://plotly.com/python/maps/. Here is an example:

In [None]:
import plotly.express as px # making sure that plotly express is imported

fig = px.scatter_geo(df_nysev, 
                     center={'lat':43.2994, 'lon':-74.2179},
                     lat=df_nysev['Latitude'], 
                     lon=df_nysev['Longitude'],
                     title='EV Stations in NY',
                     hover_name="City",
                     scope='usa',
                     projection="albers usa")
fig.show()

## 5. Contour Plots <a name="conplt"></a>

A 2D contour plot shows the contour lines of a 2D numerical array. Familiar examples for us are heat and temperature distributions on a surface, the stress distribution on a plate, and the oscillations of a plate.

We can use <font color='blue'>contour</font> or <font color='blue'>contourf</font> in matplotlib library to plot contour plots.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.linspace(0, 10, 50)
X, Y = np.meshgrid(x, y)                 # Create a 2D mesh of 50 by 50 points
Z = np.sin(X)*np.cos(Y)

fig=plt.figure( figsize=(15,7) )           # defining the figure object
fig.add_subplot(1,2,1)                   # adding the first subplot
plt.contour(X, Y, Z, colors='black')

fig.add_subplot(1,2,2)                    # adding the second subplot
plt.contourf(X, Y, Z, cmap='RdGy')        # RdGy: short for Red-Gray colormap

<font color='orange'>Note</font>: I used the option <font color='blue'>cmap</font> to define the type of colormap for each contour. Here you can see a full list of possible options for cmap: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
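
The `np.meshgrid` call is what turns the two 1D coordinate vectors into the 2D arrays that `contour` and `contourf` expect. A tiny grid makes the layout easier to see than the 50 by 50 one. This is only an illustrative sketch; the numbers are arbitrary.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # 3 points along x
y = np.array([0.0, 10.0])       # 2 points along y
X, Y = np.meshgrid(x, y)        # default indexing: shape is (len(y), len(x))

print(X.shape)                  # rows follow y, columns follow x
print(X)                        # every row repeats x
print(Y)                        # every column repeats y

Z = np.sin(X) * np.cos(Y)       # evaluated element by element, same shape as X and Y
```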

You can add labels to the contour levels as well as a colorbar. You can also choose the number of intervals within the data range.

In [None]:
x = np.linspace(0, 2, 50)
y = np.linspace(0, 2, 50)
X, Y = np.meshgrid(x, y)                                     # Create a 2D mesh of 50 by 50 points
Z = np.sin(X)*np.cos(Y)

plt.figure(figsize=(10,8))
contourplt=plt.contourf(X, Y, Z, 10, cmap='RdGy')            # ask for about 10 intervals; matplotlib picks "nice" level values
plt.clabel(contourplt,colors='g',inline=True, fontsize=20)   # Label the contour levels on the plot
plt.colorbar()                                               # Add the colorbar
plt.xlabel('X')
plt.ylabel('Y')
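
When you pass an integer, matplotlib only treats it as a target and chooses the level values itself. If you need the boundaries to be exactly equally spaced between the minimum and maximum of the data, one option is to pass them explicitly through `levels`. A minimal sketch on the same function as above:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 50)
y = np.linspace(0, 2, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(X) * np.cos(Y)

# 11 boundaries give exactly 10 equally spaced intervals between min(Z) and max(Z)
levels = np.linspace(Z.min(), Z.max(), 11)

plt.figure(figsize=(10, 8))
cs = plt.contourf(X, Y, Z, levels=levels, cmap='RdGy')
plt.colorbar(cs)
plt.xlabel('X')
plt.ylabel('Y')
```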

If you want to save the plot, the approach is the same as a typical plot created using matplotlib and you can add the following lines to the previous code block:
```python
plt.savefig('[figure name].png',dpi=200,bbox_inches='tight')
# only when using Colab so we can download the saved plot
from google.colab import files 
files.download('[figure name].png')
```

## 6. 3D Plots <a name="3dplt"></a>

Three dimensional plots are mainly used when we are trying to look into possible relationships between three different features. However, 3D plots are less common in data analysis since they can rarely provide an easy-to-interpret visualization of the data. 

Here, we are going to take a look at two types of 3D plots: (1) 3D scatter plot and (2) 3D surface plot. We will use <font color='blue'>plotly</font> library but keep in mind that <font color='blue'>matplotlib</font> can also be used to create such plots. For 3D examples using <font color='blue'>matplotlib</font>, you can take a look at this link: https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html

<font color='green'>__(1) 3D Scatter Plot__:</font>

Let's use the 3D printer data set to see how to create such plots:

In [None]:
# just make sure that the data is imported
url = 'https://raw.githubusercontent.com/MasoudMiM/ME_364/main/3D_Printer_Data/3DPrinterDataset.csv'   # Link to the 3D printer data set
df3dprinter = pd.read_csv(url)

# The dataset is now stored in a pandas DataFrame
df3dprinter.head()

In [None]:
import plotly.express as px  # make sure plotly express is imported as px

fig = px.scatter_3d(df3dprinter, x='roughness', y='tension_strenght', z='infill_density')
fig.show()

Let's add the type of infill, as the 4$^{th}$ dimension, to the data visualization along with the data for roughness, tension strength, and infill density.

In [None]:
fig = px.scatter_3d(df3dprinter, x='roughness', y='tension_strenght', z='infill_density', color='infill_pattern')
fig.show()

You can set the title, labels of the axes, and figure height and width:

In [None]:
fig = px.scatter_3d(df3dprinter, x='roughness', y='tension_strenght', z='infill_density', color='infill_pattern',
                    title='3D printer Data',
                    labels={
                        "roughness": "Roughness (µm)",
                        "tension_strenght": "Tensile Strength (MPa)",
                        "infill_density": "Infill Density (%)",
                        "infill_pattern":"Infill Pattern"},
                    width=1000, height=800)
fig.show()

In case you want to play around with the fonts, title, and other features in the plot, here is a link to look into: https://plotly.com/python/figure-labels/

Also, you can save and download the plot in HTML format the same way as any other plot created with the plotly library. Simply add the following lines to the end of the code:
```python
fig.write_html("[figure name].html")
# only when using Colab so we can download the saved plot
from google.colab import files 
files.download('[figure name].html')
```

There are more ways to create this type of 3D scatter plot, with other possible options. You can see a few of those options at this link: https://plotly.com/python/3d-scatter-plots/

<font color='red'>__Question (4)__</font>: Write code that generates a 3D scatter plot of tensile strength, elongation, and nozzle temperature. Change the labels so they show the correct units along with the names of the variables. Set the width and height of the figure, then save and download it.

In [None]:
# In-Class Assignment


<font color='green'>__(2) 3D Surface Plot__:</font>

For this visualization, let's import the `graph_objects` module from the <font color='blue'>plotly</font> library. We also use a data set for the temperature distribution in a 2D plate, given the temperatures of its four sides.

The plate measures $1~m\times1~m$ and the measurements are taken on a 30 by 30 grid of points, equally spaced in each direction. The temperature of the top side is $50^\circ~$C, the bottom side is $10^\circ~$C, and the two other sides are at $0^\circ~$C.
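
Assuming the grid includes the edges of the plate (which is how the x and y coordinates are built with `np.linspace` further down), 30 points span 29 intervals, so the spacing is 1/29 m rather than 1/30 m. A quick check:

```python
import numpy as np

n = 30                       # grid points per direction
x = np.linspace(0, 1, n)     # includes both edges, 0 m and 1 m
dx = x[1] - x[0]             # distance between neighboring grid points

print(f'grid spacing: {dx:.4f} m')
```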



In [None]:
import plotly.graph_objects as go # importing the module from the library

In [None]:
# Importing temp distribution on a 30 by 30 grid point measurements
url = 'https://raw.githubusercontent.com/MasoudMiM/ME_364/main/2D_HeatTransfer/TempDis.csv'   # Link to the dataset
dfTempDis = pd.read_csv(url,header=None)

dfTempDis.head(10)

In [None]:
zdata = dfTempDis                 # Temp distribution
xdata = np.linspace(0,1,30)
ydata = np.linspace(0,1,30)

fig = go.Figure()
fig.add_surface( x=xdata, y=ydata, z=zdata )
fig.show()

Let's add some features and refine the plot. As with any Plotly figure, we can use `update_layout` to make the changes.

In [None]:
fig=go.Figure()                                                        # Create the figure object
fig.add_surface(z=zdata, x=xdata, y=ydata, colorbar={'title':'Temp (C)'} )

fig.update_layout(scene = dict(
                    xaxis_title='X (m)',
                    yaxis_title='Y (m)',
                    zaxis_title='Temperature (C)'),
                  width=700, height=600)                               # Dimensions of the figure

fig.show()

There is much more you can do regarding the axes and other properties of the figure. Here is a good list of some possible options: https://plotly.com/python/3d-axes/