# Data Viz Basics

This notebook generates examples that illustrate the basics of data visualization.

In [None]:
import plotly.express as px
import pandas as pd

In [None]:
# Load Plotly dataset of Gapminder data
df = px.data.gapminder()
df.head()

### Scatter Plots

Scatter plots are an excellent way to compare data that is discrete, i.e. non-continuous. If there's no obvious expectation that the data flows from one value to the next along the x-axis (as it would with time, for example), the relationship between two variables can be visualized by plotting each pair of matched values as a point.

In the plot below, there is a moderately strong positive relationship between life expectancy and GDP per capita, at least in G7 countries. The same data could be shown over time, which would add a temporal component and might support a causal interpretation, but regardless of *when* the data was collected, the relationship generally holds.
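
One way to put a number on a "moderately strong positive relationship" is the Pearson correlation coefficient. The snippet below is a minimal sketch rather than part of the original notebook: the helper name `pearson_r` and the toy values are assumptions made for illustration, and the values are not taken from the Gapminder data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values, NOT taken from the Gapminder data
gdp_toy = [10_000, 20_000, 30_000, 40_000]
life_toy = [68.0, 72.0, 75.0, 79.0]
r = pearson_r(gdp_toy, life_toy)
print(f"r = {r:.3f}")
```

A value near +1 indicates a strong positive linear relationship, a value near 0 indicates little linear relationship, and a value near -1 indicates a strong negative one. On the notebook's own data, `df['gdpPercap'].corr(df['lifeExp'])` should compute the same statistic, since pandas' `Series.corr` uses the Pearson method by default.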

In [None]:
# Scatter plot of life expectancy against GDP per capita for the G7 countries
g7 = ['Canada', 'United States', 'United Kingdom', 'Germany', 'France', 'Italy', 'Japan']
fig = px.scatter(df[df['country'].isin(g7)], x='gdpPercap', y='lifeExp', color='country',
                 width=500, height=400, title='G7 life expectancy vs. GDP per capita',
                 labels={'lifeExp': 'Life Expectancy (years)', 'gdpPercap': 'GDP per capita (USD)', 'country': 'Country'})
fig.show()

### Line Plots
Like scatter plots, line plots relate two sets of data, but with the implicit assumption that the data is *continuous*, i.e. connected from one value to the next. This is obvious when the independent variable is some measure of **time**, but less obvious when it's another factor.

Though the plotted data is ostensibly discrete, in that we only have measurements for specific years, the line plot implies that the trend is maintained *between* the data points as well, since life expectancy can be measured at any given time. The data could just as easily be drawn as a scatter plot, but a line plot is (nearly) always more appropriate when the x-axis is time.
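
To make the "between the data points" idea concrete: a straight segment drawn between two markers amounts to linear interpolation, so it suggests a value for every moment in between. The snippet below is a minimal sketch of that arithmetic, not Plotly's own code; the function name `interpolate` and the two measurements are hypothetical and are not taken from the Gapminder data.

```python
def interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate y at x on the segment from (x0, y0) to (x1, y1)."""
    if x0 == x1:
        raise ValueError("x0 and x1 must differ")
    # Fraction of the way from x0 to x1 (0 at x0, 1 at x1)
    fraction = (x - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)

# Hypothetical measurements, NOT taken from the Gapminder data
year_a, life_a = 1952, 70.0
year_b, life_b = 1957, 72.0

# Value the connecting line implies halfway between the two measurements
implied = interpolate(year_a, life_a, year_b, life_b, 1954.5)
print(implied)
```

A scatter plot of the same two points would make no such claim about the midpoint, which is why the choice between the two plot types carries meaning.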

In [None]:
# Line plot of life expectancy over time (reuses g7 from the scatter plot cell)
fig = px.line(df[df['country'].isin(g7)], x='year', y='lifeExp', color='country',
              width=500, height=400, title='G7 life expectancy over time',
              labels={'lifeExp': 'Life Expectancy (years)', 'year': 'Year', 'country': 'Country'})
fig.show()

In [None]:
# List every country name available in the dataset
df['country'].unique()