### Introduction to Data Science – Lecture 15 – More Visualization Examples
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

Since we have time, let's try some parallel coordinates and look at a few more tools.

## Parallel Coordinates

As we saw in the slides, several libraries offer parallel coordinates plots. Let's try a couple of them. First, we'll read in our datasets.

In [4]:
import pandas as pd
import numpy as np

pd_movies = pd.read_csv('movies.csv')
pd_movies.head()

FileNotFoundError: [Errno 2] No such file or directory: 'movies.csv'

In [3]:
penguins = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv')

# Clean the dataset: drop rows with missing values so every plotted line has all of its coordinates
penguins = penguins.dropna()
penguins.head()

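|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | MALE |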


Let's try the default parallel coordinates through pandas:

In [None]:
pd.plotting.parallel_coordinates(
    penguins, 'species', cols=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g'], color=('#556270', '#4ECDC4', '#C7F464')
)

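Once again, `body_mass_g` dominates because all axes share one scale: body mass is in the thousands of grams, while the other measurements are in the tens to hundreds of millimeters. Let's even it out: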

In [None]:
# Okay, since the pandas parallel coordinates plot shares one scale across all axes (boohoo),
# let's do this transform again: express body mass in units of 100 g.
if 'body_mass_g' in penguins.columns:
    penguins['body_mass_100g'] = penguins['body_mass_g'] * 0.01
pd.plotting.parallel_coordinates(
    penguins, 'species', cols=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_100g'], color=('#556270', '#4ECDC4', '#C7F464')
)

Other parallel coordinates implementations give each axis its own range. That takes a bit more work for ticks and labeling, but it's much more powerful, because every variable can use the full height of its axis.
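If you want to stay with the pandas plot, one way to approximate per-axis ranges is to rescale every column to [0, 1] before plotting. The following is a minimal sketch, not part of the lecture code: the helper `min_max_scale` and the small stand-in frame `df` (with illustrative values) are assumptions made for the example.

```python
import pandas as pd

# Small stand-in frame with illustrative values; in the notebook you would use `penguins`.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, 39.5, 46.1, 46.5],
    "bill_depth_mm": [18.7, 17.4, 13.2, 17.9],
    "flipper_length_mm": [181.0, 186.0, 211.0, 192.0],
    "body_mass_g": [3750.0, 3800.0, 4500.0, 3500.0],
})

def min_max_scale(frame, columns):
    """Return a copy of `frame` with each listed column rescaled to [0, 1]."""
    scaled = frame.copy()
    for col in columns:
        lo, hi = frame[col].min(), frame[col].max()
        # Guard against constant columns, which would otherwise divide by zero.
        scaled[col] = 0.0 if hi == lo else (frame[col] - lo) / (hi - lo)
    return scaled

numeric_cols = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
scaled = min_max_scale(df, numeric_cols)
```

Passing `scaled` and `cols=numeric_cols` to `pd.plotting.parallel_coordinates` would then give every axis the same visual weight. The trade-off is that the tick labels no longer show the original units.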

Let's try it with Plotly instead:

In [None]:
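import plotly.express as px

# Plotly needs a numeric column for the color scale, so encode species as integers.
penguins["species_num"] = penguins["species"].map({'Adelie': 1, 'Chinstrap': 2, 'Gentoo': 3})

fig = px.parallel_coordinates(
    penguins,
    # List the axes explicitly; by default every numeric column (including helper columns) becomes an axis.
    dimensions=["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"],
    color="species_num",
    labels={"species_num": "Species",
            "bill_length_mm": "Bill Length", "bill_depth_mm": "Bill Depth",
            "flipper_length_mm": "Flipper Length", "body_mass_g": "Mass"},
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()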
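The hard-coded species dictionary above works for three known species. As a hedged alternative, `pd.factorize` can derive the integer codes from whatever labels are present. The sketch below uses a small illustrative `species` Series rather than the real data. Keep in mind that the numbers only drive the color scale and do not imply any real ordering of the species.

```python
import pandas as pd

# Illustrative labels; in the notebook this would be penguins["species"].
species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Adelie"], name="species")

# sort=True assigns codes in alphabetical order of the labels, starting at 0.
codes, labels = pd.factorize(species, sort=True)

# Shift to 1-based codes so they match the hand-written mapping used above.
species_num = pd.Series(codes + 1, index=species.index, name="species_num")
```

Here `labels` holds the unique species names in code order, which is handy if you later want to relabel the color bar.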

## PyGWalker

[PyGWalker](https://github.com/Kanaries/pygwalker) is a GUI for visual analysis, similar to [Tableau Public](https://public.tableau.com/app/discover), that runs inside the notebook.

You'll have to install it:
`pip install pygwalker`

In [None]:
import pygwalker as pyg
gwalker = pyg.walk(pd_movies)