# Pandas Built-in Data Visualization

In this notebook, we will learn about pandas' built-in capabilities for data visualization! They are built on top of matplotlib, but baked into pandas for easier usage!


In [3]:
!pip install matplotlib



In [4]:
import pandas as pd
import matplotlib
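Because pandas plotting is built on matplotlib, each .plot() call returns an ordinary matplotlib Axes, and you can keep customizing it with plain matplotlib methods. Here is a minimal sketch using a tiny made-up DataFrame named demo, not the vaccination data. The matplotlib.use("Agg") line exists only so the sketch runs as a standalone script. Leave it out inside Jupyter, where it could stop plots from displaying inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import pandas as pd
from matplotlib.axes import Axes

# Made-up example data, for illustration only
demo = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 4]})

ax = demo.plot(title="Demo")      # pandas draws the chart through matplotlib
print(isinstance(ax, Axes))       # True: the result is a matplotlib Axes
ax.set_ylabel("value")            # so plain matplotlib calls keep working on it
fig = ax.get_figure()             # the Axes lives on a regular matplotlib Figure
```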

In [5]:
## The Data
df = pd.read_csv('country_vaccinations.csv')
df.head()

| Unnamed: 0 | country | iso_code | date | total_vaccinations | people_vaccinated | people_fully_vaccinated | daily_vaccinations_raw | daily_vaccinations | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | daily_vaccinations_per_million | vaccines | source_name | source_website |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | ARG | 2020-12-29 | 700.0 | | | | | 0.0 | | | | Sputnik V | Ministry of Health | http://datos.salud.gob.ar/dataset/vacunas-cont... |
| 1 | Argentina | ARG | 2020-12-30 | | | | | 15656.0 | | | | 346.0 | Sputnik V | Ministry of Health | http://datos.salud.gob.ar/dataset/vacunas-cont... |
| 2 | Argentina | ARG | 2020-12-31 | 32013.0 | | | | 15656.0 | 0.07 | | | 346.0 | Sputnik V | Ministry of Health | http://datos.salud.gob.ar/dataset/vacunas-cont... |
| 3 | Argentina | ARG | 2021-01-01 | | | | | 11070.0 | | | | 245.0 | Sputnik V | Ministry of Health | http://datos.salud.gob.ar/dataset/vacunas-cont... |
| 4 | Argentina | ARG | 2021-01-02 | | | | | 8776.0 | | | | 194.0 | Sputnik V | Ministry of Health | http://datos.salud.gob.ar/dataset/vacunas-cont... |


First, set up your Jupyter Notebook to display plots with the %matplotlib inline magic command, which renders each plot inside the notebook itself:

In [6]:
%matplotlib inline

By default, .plot() draws a line graph containing data from every row in the DataFrame. You can also choose the x-axis and y-axis columns manually with the x and y parameters, as in the examples further below.



If you don’t provide a parameter to .plot(), then it creates a line plot with the index on the x-axis and all the numeric columns on the y-axis. 

In [None]:
df.plot()
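To see this default behavior on something small, the sketch below uses a hypothetical three-row DataFrame with one text column and two numeric columns. Calling .plot() with no arguments should draw one line per numeric column against the index, and the text column is left out. As before, the Agg backend line is only for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import numpy as np
import pandas as pd

# Hypothetical mini dataset: one text column and two numeric columns
mini = pd.DataFrame({
    "country": ["A", "B", "C"],
    "doses": [10, 20, 15],
    "people": [5, 12, 9],
})

ax = mini.plot()  # no x/y given: index on the x-axis, one line per numeric column

labels = [line.get_label() for line in ax.get_lines()]
print(labels)                          # ['doses', 'people']: "country" is skipped
print(ax.get_lines()[0].get_xdata())   # x values come from the default index 0, 1, 2
```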

In [None]:
# Set 'date' as the index so the x-axis shows dates, then plot every numeric column
df.set_index('date').plot(title='Covid19 Vaccination', figsize=(10, 10))

Let's plot the number of people vaccinated on each date:

In [None]:
df.plot(x='date', y='people_vaccinated', title='Number of People Vaccinated')

In [None]:
df.plot(x='date', y='people_vaccinated', title='Number of People Vaccinated', figsize=(10, 10))
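When x and y are given, pandas uses the x column as the horizontal axis (and as its label) and draws only the requested y column. Here is a sketch with made-up cumulative counts that mirrors the cells above. The data is hypothetical, and the Agg line is only for script use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import pandas as pd

# Hypothetical cumulative counts for three dates (illustrative numbers only)
mini = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "people_vaccinated": [100, 250, 400],
    "total_vaccinations": [120, 300, 520],
})

# Only the y column is drawn; the x column becomes the x-axis and its label
ax = mini.plot(x="date", y="people_vaccinated", title="Number of People Vaccinated")

print(len(ax.get_lines()))   # 1: total_vaccinations is not plotted
print(ax.get_xlabel())       # date
```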

# Pop quiz

In [None]:
# Practice
# Let's look at the trend in the US

What if we want to see which vaccines are used most frequently across countries on each date?

In [None]:
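# One row per date, one column per vaccine combination; each cell is the mean over countries (pivot_table's default aggfunc)
daily_vaccinations_vaccines = pd.pivot_table(df, values='daily_vaccinations', index='date', columns='vaccines')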
daily_vaccinations_vaccines
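One detail matters when reading this table. pd.pivot_table averages values by default (aggfunc="mean"). So when several countries report the same vaccine combination on the same date, the cell holds their mean daily vaccinations rather than a total. The sketch below shows the difference on hypothetical rows. Passing aggfunc="sum" is one option if combined doses are what you want to compare.

```python
import pandas as pd

# Hypothetical rows: countries X and Y share a vaccine combination on the same date
rows = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"],
    "country": ["X", "Y", "Z", "X"],
    "vaccines": ["Pfizer", "Pfizer", "Sinovac", "Pfizer"],
    "daily_vaccinations": [100.0, 300.0, 50.0, 200.0],
})

# Default aggfunc is "mean": X and Y are averaged into one Pfizer cell
mean_table = pd.pivot_table(rows, values="daily_vaccinations",
                            index="date", columns="vaccines")

# aggfunc="sum" adds them up instead
sum_table = pd.pivot_table(rows, values="daily_vaccinations",
                           index="date", columns="vaccines", aggfunc="sum")

print(mean_table)  # Pfizer on 2021-01-01 is 200.0 (mean of 100 and 300)
print(sum_table)   # Pfizer on 2021-01-01 is 400.0 (100 + 300)
```

Date/vaccine pairs with no rows (Sinovac on 2021-01-02 here) show up as NaN in both tables.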

In [None]:
# Plot the daily-vaccination trend line for each vaccine combination across countries
daily_vaccinations_vaccines.plot(figsize=(20, 20), title='Number of daily vaccinations')

In [None]:
df[df['vaccines']=='CNBG, Sinovac']

Since there are so many countries in the DataFrame, let's focus on the US, the UK, Canada and China and explore further!

In [None]:
us_uk_china_canada=df.loc[df.country.isin(['United States','United Kingdom','China','Canada'])]
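The filter above relies on Series.isin. It returns a boolean mask that is True only where the value exactly matches one of the listed names. Here is a sketch on hypothetical rows. A listed name that never appears in the data (here "United Kingdom") is simply not matched and raises no error, so spellings have to match the dataset exactly.

```python
import pandas as pd

# Hypothetical rows standing in for the full vaccination table
rows = pd.DataFrame({
    "country": ["Canada", "France", "China", "United States", "Argentina"],
    "daily_vaccinations": [10.0, 20.0, 30.0, 40.0, 50.0],
})

focus = ["United States", "United Kingdom", "China", "Canada"]

# isin builds a boolean mask; .loc keeps only the rows where it is True
subset = rows.loc[rows["country"].isin(focus)]

print(subset["country"].tolist())  # ['Canada', 'China', 'United States']
```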


# Plot Types

.plot() has several optional parameters. Most notably, the kind parameter accepts eleven different string values and determines which kind of plot you’ll create:

- "area" is for area plots.

- "bar" is for vertical bar charts.

- "barh" is for horizontal bar charts.

- "box" is for box plots.

- "hexbin" is for hexbin plots.

- "hist" is for histograms.

- "kde" is for kernel density estimate charts.

- "density" is an alias for "kde".

- "line" is for line graphs.

- "pie" is for pie charts.

- "scatter" is for scatter plots.
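Not every kind accepts the same arguments. Here is a minimal sketch on made-up numeric data, reflecting our understanding of the pandas API. Most kinds work directly on a numeric DataFrame. "scatter" and "hexbin" need x and y, and "pie" on a DataFrame needs a y column (or subplots=True). "kde" and "density" are left out because they also require SciPy. The Agg line is only for script use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Made-up numeric data with two columns
demo = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 5, 1]})
results = {}

# These kinds work on a purely numeric DataFrame with no extra arguments
for kind in ["line", "bar", "barh", "area", "box", "hist"]:
    results[kind] = demo.plot(kind=kind, title=kind)

# "scatter" and "hexbin" need explicit x and y columns
results["scatter"] = demo.plot(kind="scatter", x="a", y="b")
results["hexbin"] = demo.plot(kind="hexbin", x="a", y="b", gridsize=5)

# "pie" needs a single y column (or subplots=True) with non-negative values
results["pie"] = demo.plot(kind="pie", y="a")

# Forgetting x and y for a scatter plot raises a ValueError
try:
    demo.plot(kind="scatter")
except ValueError as err:
    scatter_error = str(err)
print(scatter_error)

for kind, ax in results.items():
    print(kind, isinstance(ax, Axes))  # each call returns a matplotlib Axes

plt.close("all")  # free the figures created above
```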


In [None]:
us_uk_china_canada_daily_vaccinations = pd.pivot_table(us_uk_china_canada, values='daily_vaccinations', columns='country', index='date')
us_uk_china_canada_daily_vaccinations

## Bar charts


In [None]:
us_uk_china_canada_daily_vaccinations.plot(kind='bar', figsize=(20, 20), title='Number of daily vaccinations in USA, UK, Canada and China - Bar Chart')


In [None]:
# We can also create a stacked bar chart
us_uk_china_canada_daily_vaccinations.plot(kind='bar', figsize=(20, 20), title='Number of daily vaccinations in USA, UK, Canada and China - Stacked Bar Chart', stacked=True)
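Here is a sketch that makes the difference between grouped and stacked bars concrete, using two made-up countries over two dates (hypothetical numbers). With stacked=True, each column's bars start where the previous column's bars end, so the full bar height shows the combined total for that date. The Agg line is only for script use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import pandas as pd

# Made-up daily counts: rows are dates, columns are countries
counts = pd.DataFrame(
    {"Canada": [1.0, 2.0], "China": [3.0, 4.0]},
    index=["2021-01-01", "2021-01-02"],
)

side = counts.plot(kind="bar")                 # grouped: every bar starts at 0
stack = counts.plot(kind="bar", stacked=True)  # China's bars sit on top of Canada's

# Bars are added column by column: patches 0-1 are Canada, patches 2-3 are China
side_bottoms = [float(patch.get_y()) for patch in side.patches]
china_bottoms = [float(patch.get_y()) for patch in stack.patches[2:]]
china_heights = [float(patch.get_height()) for patch in stack.patches[2:]]
print(side_bottoms)    # [0.0, 0.0, 0.0, 0.0]
print(china_bottoms)   # [1.0, 2.0]: Canada's values
print(china_heights)   # [3.0, 4.0]: China's own values
```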


## Area chart

An area chart combines the line chart and bar chart to show how one or more groups' numeric values change over the progression of a second variable, typically that of time. An area chart is distinguished from a line chart by the addition of shading between lines and a baseline, like in a bar chart.

In [None]:
us_uk_china_canada_daily_vaccinations.plot(kind='area', figsize=(20, 20), title='Number of daily vaccinations in USA, UK, Canada and China - Area Chart')
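Note that pandas stacks area charts by default (stacked=True). In the chart above, each country's band therefore sits on top of the previous ones, and the top edge shows the combined total. The sketch below compares the stacked default with stacked=False on made-up counts, reading back the upper edge of the second area. The Agg line is only for script use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import numpy as np
import pandas as pd

# Made-up daily counts for two countries over three days
counts = pd.DataFrame({"Canada": [1, 2, 3], "China": [10, 20, 30]})

stacked = counts.plot(kind="area")                   # default: stacked=True
overlaid = counts.plot(kind="area", stacked=False)   # each area starts at 0

# Upper edge of the second (China) area in each chart
print(stacked.get_lines()[1].get_ydata())    # Canada + China
print(overlaid.get_lines()[1].get_ydata())   # China alone
```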

## Box plots

The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.

In [None]:
us_uk_china_canada_daily_vaccinations.plot(kind='box', figsize=(20, 20), title='Number of daily vaccinations in USA, UK, Canada and China - Box Plot')
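Here is a worked sketch of the 1.5 * IQR rule on a hypothetical series with one extreme value. It computes the quartiles with Series.quantile (linear interpolation) and derives the whisker limits. It then cross-checks the result against matplotlib.cbook.boxplot_stats, the helper matplotlib's boxplot uses to compute these statistics.

```python
import pandas as pd
from matplotlib import cbook

# Hypothetical values with one obvious outlier
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])   # linear interpolation between points
iqr = q3 - q1
low_limit, high_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # -3.5 and 14.5

# Whiskers stop at the farthest data points still inside the limits
whisker_low = s[s >= low_limit].min()
whisker_high = s[s <= high_limit].max()
outliers = s[(s < low_limit) | (s > high_limit)].tolist()

print(q1, q2, q3, iqr)              # 3.25 5.5 7.75 4.5
print(whisker_low, whisker_high)    # 1 9
print(outliers)                     # [100]

# Cross-check against matplotlib's own box plot statistics
stats = cbook.boxplot_stats(s.to_numpy(), whis=1.5)[0]
print(stats["whislo"], stats["whishi"], stats["fliers"].tolist())
```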

## Scatter Plot


Scatter plots are used when you want to show the relationship between two variables. Scatter plots are sometimes called correlation plots because they show how two variables are correlated.

Let's plot daily vaccinations against total vaccinations, which we expect to be positively correlated:

In [None]:
us_uk_china_canada.plot(kind='scatter', x='daily_vaccinations', y='total_vaccinations')
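A scatter plot shows the relationship visually, and Pearson's correlation coefficient summarizes it as a single number between -1 and 1. Here is a sketch on hypothetical paired values using Series.corr, which skips rows where either value is missing. The Agg line is only for script use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in Jupyter
import numpy as np
import pandas as pd

# Hypothetical paired observations; one total is missing
pairs = pd.DataFrame({
    "daily_vaccinations": [100.0, 200.0, 300.0, 400.0, 500.0],
    "total_vaccinations": [1000.0, 2100.0, np.nan, 4200.0, 5000.0],
})

# The scatter plot shows the shape of the relationship...
ax = pairs.plot(kind="scatter", x="daily_vaccinations", y="total_vaccinations")

# ...and Pearson's r puts a number on it; the row with NaN is ignored
r = pairs["daily_vaccinations"].corr(pairs["total_vaccinations"])
print(round(r, 3))
```

A value close to 1 supports the expected positive correlation. Correlation only captures linear association, though, so the plot is still worth inspecting.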