# Visualization Technique

## Bubble Plots - what are they?

Bubble plots are a more "advanced" version of the well-known scatter plot.
Here is an example of a scatter plot.

![scatter plot example](https://chartio.com/assets/2477fa/tutorials/charts/scatter-plots/848a5c96881e2e6f1387e74570e16e1fad2559ed65b83aec2a66b7bb86332275/scatter-plot-example-1.png)

As you can see, a scatter plot shows the relationship (for example, a correlation) between two variables: the independent variable on the x-axis and the dependent variable on the y-axis. Each observation is plotted as a dot at its (x, y) position, as shown. Now let's see how a bubble plot compares to the scatter plot.

![bubble plot example](https://datavizcatalogue.com/methods/images/top_images/bubble_chart.png)

Bubble plots show not only the x and y dimensions of the data but also a third dimension, z, which is the weight of each data point, represented by the size of its circle as in the example above. Like scatter plots, bubble plots use the Cartesian plane to position the circles. The bubbles can also carry labels or categories, and different colors are sometimes used to distinguish between them, which adds a fourth dimension. The overall plot can be used to identify patterns and trends in the data.
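To make the mapping of dimensions concrete, here is a minimal sketch using Seaborn's `scatterplot`. The DataFrame `toy` and its column names (`x`, `y`, `weight`, `group`) are made up for illustration only and are not part of the dataset used later.

```python
import pandas as pd
import seaborn as sns

# Made-up values purely for illustration; not taken from any real dataset.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2.0, 3.5, 3.0, 5.5, 6.0],
    "weight": [10, 40, 25, 80, 60],
    "group": ["a", "a", "b", "b", "c"],
})

ax = sns.scatterplot(
    data=toy,
    x="x",
    y="y",
    size="weight",    # third dimension, shown as bubble size
    sizes=(50, 600),  # marker size range for the smallest and largest weight
    hue="group",      # optional fourth dimension, shown as color
    alpha=0.6,        # some transparency so overlapping bubbles stay visible
)
ax.set_title("Toy bubble plot")
```

Dropping the `size` and `hue` arguments turns this back into an ordinary scatter plot, which is one way to see that a bubble plot is a scatter plot with extra encodings.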

## When to use it

Bubble plots are designed to convey three or four dimensions of data visually. They are a good choice when you want to present all of those dimensions in a single graphic. Moreover, because a bubble plot is fundamentally a scatter plot with varying marker sizes, it is also well suited to showing relationships and correlations between variables.

## When not to use it

The first mistake is choosing a bubble plot when the data would be better explained by a simpler chart, such as a pie chart. Likewise, if your data has more than four dimensions, it is not advisable to cram it into a bubble plot; the result can be confusion and incorrect analysis. Finally, a plot with many bubbles may overwhelm the audience, so bubble plots are best kept to a limited number of data points.

# Visualization Library

## Seaborn

For this task, Seaborn is the library used to draw our bubble plot. As quoted from its website:

```"Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics."```

It is built on top of matplotlib and integrates very well with the Pandas library, which will be used to store tabular data. Seaborn offers a declarative API that lets the user control the elements of a plot with minimal setup, producing attractive plots that are usually sufficient for interpreting the data. The library was developed by Michael Waskom in 2014 and is open source. It includes built-in themes for styling plots.

This library was chosen because it provides easy-to-use APIs that integrate well with Pandas and produce good-looking results with little styling effort. It also integrates with Jupyter, showing graphs inline.

## Installation

To install, run the command for your preferred package manager.

If you use pip (PyPI): ```pip install seaborn```

If you use Anaconda: ```conda install seaborn```

## Dependencies

Python 3.6+

## Required dependencies

If not already present, these libraries will be downloaded when you install seaborn.

- numpy

- scipy

- pandas

- matplotlib
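As a quick sanity check after installation, the sketch below reports which of the packages listed above cannot be found in the current environment. The helper name `missing_packages` is hypothetical; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Packages named in the dependency list above, plus seaborn itself.
required = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(required)
print("Missing packages:", missing or "none")
```

An empty result suggests the install pulled in everything; any name that is printed would need to be installed separately.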

## Importing the library

```python
import seaborn as sns
```

# Dataset

## Selecting Dataset

Before we begin, note that a bubble plot needs at least three dimensions of data to display. I have selected the following dataset for this task.

https://www.gapminder.org/data/

The site above hosts datasets on the population of each country, along with much more related information. For this demo, we want to display the following information graphically and analyze whether the variables are related. First, we would like to explore whether there is a relationship between income (GDP per capita) and time. Second, we would like to explore whether there is a trend between income and life expectancy. We will turn this data into a bubble plot to understand it visually.

Here is what we need:

1. Current year (x-data, independent variable)
2. Income of that year, inflation-adjusted (y-data, dependent variable)
3. Number of years of life expectancy (size of bubbles, secondary dependent variable)
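One way to hold the three fields listed above is a single tidy table with one row per country and year. The sketch below assumes two hypothetical long-format tables, `income_long` and `life_long`; all values in them are invented placeholders that only show the layout, not real Gapminder figures.

```python
import pandas as pd

# Placeholder rows only: the numbers below are invented to show the layout.
income_long = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "income": [1000, 1100, 5000, 5200],
})
life_long = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "life_expectancy": [55.0, 55.5, 72.0, 72.3],
})

# One row per (country, year) carrying x, y and bubble size together.
bubble_data = income_long.merge(life_long, on=["country", "year"], how="inner")
print(bubble_data)
```

With this shape, `year` can serve as the x value, `income` as the y value, and `life_expectancy` as the bubble size.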

## Retrieving the raw data

```python
import pandas as pd

# Load the income CSV file and preview its first five rows.
csv_income = pd.read_csv('income_per_person_gdppercapita_ppp_inflation_adjusted.csv')
csv_income.head()
```

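| Unnamed: 0 | country | 1800 | 1801 | 1802 | 1803 | 1804 | 1805 | 1806 | 1807 | 1808 | ... | 2031 | 2032 | 2033 | 2034 | 2035 | 2036 | 2037 | 2038 | 2039 | 2040 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 603 | 603 | 603 | 603 | 603 | 603 | 603 | 603 | 603 | ... | 2550 | 2600 | 2660 | 2710 | 2770 | 2820 | 2880 | 2940 | 3000 | 3060 |
| 1 | Albania | 667 | 667 | 667 | 667 | 667 | 668 | 668 | 668 | 668 | ... | 19400 | 19800 | 20200 | 20600 | 21000 | 21500 | 21900 | 22300 | 22800 | 23300 |
| 2 | Algeria | 715 | 716 | 717 | 718 | 719 | 720 | 721 | 722 | 723 | ... | 14300 | 14600 | 14900 | 15200 | 15500 | 15800 | 16100 | 16500 | 16800 | 17100 |
| 3 | Andorra | 1200 | 1200 | 1200 | 1200 | 1210 | 1210 | 1210 | 1210 | 1220 | ... | 73600 | 75100 | 76700 | 78300 | 79900 | 81500 | 83100 | 84800 | 86500 | 88300 |
| 4 | Angola | 618 | 620 | 623 | 626 | 628 | 631 | 634 | 637 | 640 | ... | 6110 | 6230 | 6350 | 6480 | 6610 | 6750 | 6880 | 7020 | 7170 | 7310 |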
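The preview is in wide format, with one column per year, whereas a plotting call that maps a year column to the x-axis needs one row per country and year. A possible reshaping step, sketched here on a two-country, two-year slice of the values shown above, uses `DataFrame.melt`. The names `wide`, `long`, `year`, and `income` are assumptions chosen for this illustration.

```python
import pandas as pd

# A small slice of the wide table, using values from the preview above.
wide = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "1800": [603, 667],
    "1801": [603, 667],
})

# Turn each year column into rows: one row per (country, year) pair.
long = wide.melt(id_vars="country", var_name="year", value_name="income")

# Year labels arrive as strings from the column headers; convert to integers.
long["year"] = long["year"].astype(int)
print(long)
```

The same call should apply to the full table once any extra index column is dropped, since every remaining non-`country` column is a year.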
