# Introduction

In the last tutorial, we learned how to select relevant data out of a DataFrame or Series. Plucking the right data out of our data representation is critical to getting work done, as we demonstrated in the exercises.

However, the data does not always come out of memory in the format we want right off the bat. Sometimes we have to do some more work ourselves to reformat it for the task at hand. This tutorial will cover different operations we can apply to our data to get the input "just right".

We'll use the Wine Magazine data for demonstration.

In [1]:
import pandas as pd
import numpy as np
reviews = pd.read_csv("data/winemag-data-130k-v2.csv", index_col=0)

In [None]:
reviews

# Summary functions

Pandas provides many simple "summary functions" (not an official name) which restructure the data in some useful way. For example, consider the `describe()` method:

In [None]:
reviews.describe()

In [None]:
reviews.points.describe()

This method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input. The output above only makes sense for numerical data; for string data here's what we get:

In [None]:
reviews.taster_name.describe()

To get the summary statistics for all object (non-numeric) columns, we can use the `include` parameter of the `describe()` method as follows:

In [None]:
reviews.describe(include=['O'])
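
To see the type-aware behavior in isolation, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are hypothetical and are not taken from the wine reviews data. The string column is created with an explicit `object` dtype so that `include=['O']` selects it.

```python
import pandas as pd

# Hypothetical toy data, not the wine reviews dataset.
toy = pd.DataFrame({
    "points": [87, 90, 90, 95],
    "taster": pd.Series(["Ann", "Bob", "Ann", "Ann"], dtype=object),
})

# By default, describe() summarizes only the numeric columns.
numeric_summary = toy.describe()

# include=['O'] restricts the summary to object (here: string) columns.
object_summary = toy.describe(include=["O"])

print(numeric_summary)
print(object_summary)
```

The numeric summary reports statistics such as `mean` and `std`, while the object summary reports `count`, `unique`, `top`, and `freq` instead.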

If you want to get some particular simple summary statistic about a column in a DataFrame or a Series, there is usually a helpful pandas function that makes it happen. 

For example, to see the mean of the points allotted (that is, how well an averagely rated wine scores), we can use the `mean()` method:

In [None]:
# Try it yourself
# Done by students
reviews.points.mean()

To see a list of unique values we can use the `unique()` function:

In [None]:
# Try it yourself
# Done by students
reviews.taster_name.unique()

To count the unique values we can use the `nunique()` function:

In [None]:
reviews.taster_name.nunique()

To see a list of unique values _and_ how often they occur in the dataset, we can use the `value_counts()` method:

In [None]:
# Try it yourself
# Done by students
reviews.taster_name.value_counts()
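
The three methods above treat missing values differently, which is easier to see on a tiny example. The sketch below uses a hypothetical `tasters` Series with one missing entry; the names are made up for illustration.

```python
import pandas as pd

# Hypothetical Series with one missing value.
tasters = pd.Series(["Ann", "Bob", "Ann", None, "Ann"])

# unique() returns every distinct entry, including the missing value.
distinct = tasters.unique()

# nunique() counts distinct entries and skips missing values by default.
n_distinct = tasters.nunique()

# value_counts() also skips missing values by default and sorts
# the result from most to least frequent.
counts = tasters.value_counts()

print(distinct)
print(n_distinct)
print(counts)
```

Both `nunique()` and `value_counts()` accept `dropna=False` if the missing entries should be counted as well.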

Pandas provides many common mapping operations as built-ins. Operators work element-wise between Series of equal length, so an easy way of combining the country and region information in the dataset is:

In [None]:
reviews.country + " - " + reviews.region_1

These operators are faster than `map()` or `apply()` because they use speed-ups built into pandas. All of the standard Python operators (`>`, `<`, `==`, and so on) work in this manner.

However, they are not as flexible as `map()` or `apply()`, which can do more advanced things, like applying conditional logic, which cannot be done with addition and subtraction alone.
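
The trade-off described above can be sketched on a small hypothetical Series. The `points` values, the `label` helper, and its score thresholds are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical scores, not taken from the wine reviews dataset.
points = pd.Series([85, 92, 97])

# Built-in operators act on the whole Series at once.
centered = points - points.mean()   # subtract the mean from every value
is_high = points >= 90              # element-wise comparison, gives booleans

# map() calls an ordinary Python function on each value, so it can
# branch in ways a single arithmetic expression cannot.
def label(p):
    """Assign a made-up quality label based on illustrative thresholds."""
    if p >= 95:
        return "outstanding"
    elif p >= 90:
        return "very good"
    return "good"

labels = points.map(label)

print(centered)
print(is_high)
print(labels)
```

The operator versions are usually the better first choice; `map()` becomes worthwhile when the per-value logic needs branches like the ones in `label`.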

# Your turn