In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

reviews = pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)
reviews.head(3)

In [None]:
reviews['province'].value_counts().head(10).plot.bar()

**California** has far more reviewed wines in this dataset than any other province! Note that the bars count reviews, not the volume of wine produced.

In [None]:
(reviews['province'].value_counts().head(10) / len(reviews)).plot.bar()

**California** produces almost a third of wines reviewed in Wine Magazine!
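
To make the division above concrete, here is a minimal sketch in plain Python of how counts turn into shares. The province list is made up for illustration and is not taken from the dataset. One caveat for the pandas version: `value_counts()` drops missing values by default while `len(reviews)` counts every row, so if some rows have no province the bars will sum to slightly less than 1.

```python
from collections import Counter

# Made-up province labels standing in for reviews['province'].
provinces = ["California", "California", "Tuscany", "Bordeaux",
             "California", "Oregon", "Tuscany", "California"]

counts = Counter(provinces)   # absolute counts, similar to value_counts()
total = len(provinces)        # similar to len(reviews)

# Relative frequency of each province, most common first.
shares = {name: n / total for name, n in counts.most_common()}

for name, share in shares.items():
    print(f"{name:<12}{share:.1%}")
```

In pandas, `value_counts(normalize=True)` gives relative frequencies directly, computed over the values that were actually counted.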

**Bar charts are very flexible**: The height can represent anything, as long as it is a number. And each bar can represent anything, as long as it is a category.
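
As a small illustration of "the height can represent anything", the sketch below computes an average score per province instead of a count. The rows are invented for the example; each resulting number could be the height of one bar.

```python
from collections import defaultdict

# Made-up (province, points) pairs standing in for rows of `reviews`.
rows = [("California", 90), ("California", 86), ("Tuscany", 92),
        ("Oregon", 88), ("Tuscany", 88)]

totals = defaultdict(lambda: [0, 0])  # province -> [sum of points, row count]
for province, pts in rows:
    totals[province][0] += pts
    totals[province][1] += 1

# One number per category: the mean score.
mean_points = {p: s / n for p, (s, n) in totals.items()}
print(mean_points)
```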

**nominal** categories: "pure" categories that don't make a lot of sense to order.

**ordinal** categories: things that do make sense to compare, like earthquake magnitudes, housing complexes with certain numbers of apartments, and the sizes of bags of chips at your local deli.

An **interval** variable goes beyond an ordinal categorical variable: it has a meaningful order, in the sense that we can quantify the difference between two entries, and that difference is itself an interval variable.
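
The three kinds of variable can be told apart by which operations make sense on them. The following is a rough sketch with made-up values (the grape names, bag sizes, and temperatures are illustrative only).

```python
# Nominal: labels with no natural order; only equality is meaningful.
grapes = ["Merlot", "Riesling", "Malbec"]

# Ordinal: an explicit ranking makes comparison meaningful,
# but the gaps between ranks are not measured.
size_rank = {"small": 0, "medium": 1, "large": 2}
bags = ["large", "small", "medium", "small"]
bags_sorted = sorted(bags, key=size_rank.get)

# Interval: differences between values are themselves meaningful numbers.
temps_c = [12.5, 18.0, 21.5]
gaps = [b - a for a, b in zip(temps_c, temps_c[1:])]

print(bags_sorted)
print(gaps)
```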

In [None]:
reviews['points'].value_counts().sort_index().plot.bar()

**Line charts**

What would we do if the magazine rated things 0-100? We'd have 101 different categories; simply too many to fit a bar in for each one!

In that case, instead of a bar chart, we could use a line chart:

In [None]:
reviews['points'].value_counts().sort_index().plot.line()

Line charts work well for interval data. Bar charts don't: unless your ability to measure it is very limited, interval data will naturally take on a great many distinct values, far too many to give each one its own bar.
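
One thing to watch when plotting counts as a line: a value that never occurs has no entry at all, so the line runs straight from one neighbor to the other and hides the gap. The sketch below, using made-up scores, shows the difference between counting only observed values and walking the full range.

```python
from collections import Counter

# Made-up scores standing in for reviews['points']; note that 83 never occurs.
points = [80, 81, 81, 82, 82, 82, 84, 84, 85]

counts = Counter(points)

# Counting only observed values skips 83 entirely.
observed = sorted(counts.items())

# Walking the whole range keeps the gap visible as an explicit zero.
full_range = [(score, counts.get(score, 0))
              for score in range(min(points), max(points) + 1)]

print(observed)
print(full_range)
```

In pandas, one way to get the same effect would be to call `.reindex(range(lo, hi + 1), fill_value=0)` on the sorted counts before plotting, where `lo` and `hi` are placeholder names for the ends of the score range.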

In [None]:
reviews['points'].value_counts().sort_index().plot.area()

**Histograms**

In [None]:
reviews[reviews['price'] < 200]['price'].plot.hist()

A histogram looks, at first glance, like a bar plot. And it basically is! In fact, a histogram is a special kind of bar plot that splits your data into even intervals and displays how many rows are in each interval with bars. The only analytical difference is that instead of each bar representing a single value, it represents a range of values.
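
To show what "splits your data into even intervals" means, here is a minimal hand-rolled sketch of the binning step. The function name `histogram_counts` and the price list are made up for illustration; this is not how pandas implements it internally, only the same idea. The pandas `plot.hist()` call uses 10 bins by default.

```python
def histogram_counts(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum belongs to the last bin rather than a bin of its own.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


prices = [4, 10, 12, 15, 18, 20, 25, 30, 44]  # made-up prices
edges, counts = histogram_counts(prices, bins=4)
print(edges)
print(counts)
```

Each entry in `counts` is the height of one bar, and consecutive entries in `edges` mark where that bar starts and ends.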

But histograms don't deal very well with skewed data:

In [None]:
reviews['price'].plot.hist()

In [None]:
reviews[reviews['price'] > 1500]
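
Why a few expensive bottles ruin the plot: the bins are equally wide and must span from the minimum to the maximum, so one extreme value stretches every bin. The sketch below uses made-up prices and a hypothetical helper, `first_bin_share`, to measure how much of the data ends up in the lowest bin.

```python
def first_bin_share(values, bins=10):
    """Fraction of values landing in the lowest of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    in_first = sum(1 for v in values if v < lo + width)
    return in_first / len(values)


prices = [10, 15, 20, 25, 30, 35, 40, 45, 50, 60]  # made-up prices

without_outlier = first_bin_share(prices)
with_outlier = first_bin_share(prices + [2000])   # add one extreme bottle

print(without_outlier)
print(with_outlier)
```

With the outlier added, every ordinary price falls into the first bin, which is the same one-tall-bar shape the unfiltered plot shows.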

Histograms work best for interval variables without **skew**. They also work really well for ordinal categorical variables like points:

In [None]:
reviews['points'].plot.hist()

In [None]:
pd.set_option('display.max_columns', None)  # full option name; the short form can be ambiguous in newer pandas
pokemon = pd.read_csv("../input/pokemon/pokemon.csv")
pokemon.head(3)

The frequency of Pokemon by type:

In [None]:
pokemon['type1'].value_counts().plot.bar()

The frequency of Pokemon by HP stat total:

In [None]:
pokemon['hp'].value_counts().sort_index().plot.line()

The frequency of Pokemon by weight:

In [None]:
pokemon['weight_kg'].plot.hist()