# Data Cleaning in Pandas - Recap

## Introduction

In this section, you saw how to wrangle and clean data in Pandas. This is a baseline skill that you will use consistently in your work, whether you are doing sanity checks, cleaning messy data, or transforming raw datasets into useful aggregates and views. Understanding the format of your data is essential to thinking critically about how you can manipulate and shape it into new and interesting forms.


## Lambda functions

We started out by introducing lambda functions. These are quick throwaway functions that you can write on the fly. They're very useful for transforming the values in a column. For example, you might want to extract the day from a date string.

In [None]:
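import pandas as pd
dates = pd.Series(['12-01-2017', '12-02-2017', '12-03-2017', '12-04-2017'])
# Split each string on '-' and keep the second piece
# (the day, assuming the strings are in MM-DD-YYYY order)
dates.map(lambda x: x.split('-')[1])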
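
For date strings in particular, one alternative to the lambda is to parse the column and read the day from the `.dt` accessor. The following is a minimal sketch, assuming the strings are in MM-DD-YYYY order (the format is not stated in the example above):

```python
import pandas as pd

dates = pd.Series(['12-01-2017', '12-02-2017', '12-03-2017', '12-04-2017'])

# Assumption: the strings are month-day-year; an explicit format avoids guessing.
parsed = pd.to_datetime(dates, format='%m-%d-%Y')

# .dt.day gives the day of the month as integers rather than strings.
days = parsed.dt.day
print(days.tolist())
```

Unlike the `split` approach, this yields integers, which may or may not be what you want downstream.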

## Combining DataFrames

You can combine DataFrames by merging them (joining rows on a common field) or by concatenating them (stacking one DataFrame's rows or columns onto another's).

In [None]:
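df1 = pd.DataFrame(dates)
df2 = pd.DataFrame(['12-05-2017', '12-06-2017', '12-07-2017'])
# Stack the rows of df2 below df1; each frame keeps its original index labels
# (pass ignore_index=True if you want a fresh 0..n-1 index instead)
pd.concat([df1, df2])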
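
The cell above only shows concatenation, so here is a short sketch of the merging case. The two tables, their column names, and their values are hypothetical and exist only to illustrate a join on a shared field:

```python
import pandas as pd

# Hypothetical tables that share a 'date' column (illustrative values only).
sales = pd.DataFrame({'date': ['12-01-2017', '12-02-2017', '12-03-2017'],
                      'units': [10, 7, 12]})
weather = pd.DataFrame({'date': ['12-02-2017', '12-03-2017', '12-04-2017'],
                        'temp_f': [41, 38, 45]})

# An inner join keeps only the dates that appear in both tables.
merged = pd.merge(sales, weather, on='date', how='inner')
print(merged)
```

Changing `how` to `'left'`, `'right'`, or `'outer'` controls which unmatched rows are kept.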

## Grouping and aggregating

In [None]:
df = pd.read_csv('titanic.csv')
df.head()

In [None]:
grouped = df.groupby(['Pclass', 'Sex'])['Age'].mean().reset_index()
grouped.head()
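
If you want more than one summary statistic per group, the same `groupby` pattern extends to `.agg`. This is a sketch on a small made-up frame (a hypothetical stand-in for the Titanic data, so it runs without `titanic.csv`):

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows (values are made up).
toy = pd.DataFrame({
    'Pclass': [1, 1, 3, 3, 3],
    'Sex': ['female', 'male', 'female', 'male', 'male'],
    'Age': [30.0, 40.0, 20.0, 22.0, 28.0],
})

# Named aggregation: each keyword becomes an output column.
summary = (toy.groupby(['Pclass', 'Sex'])['Age']
              .agg(mean_age='mean', n='count')
              .reset_index())
print(summary)
```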

## Pivot tables

In [None]:
pivoted = grouped.pivot(index='Pclass', columns='Sex', values='Age')
pivoted
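
`pivot` expects one row per index/column pair, which is why it is applied to the already-grouped data above. As a sketch of an alternative, `pivot_table` can aggregate and reshape raw rows in one step. The frame below is a made-up stand-in for the Titanic data:

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows (values are made up).
toy = pd.DataFrame({
    'Pclass': [1, 1, 3, 3, 3],
    'Sex': ['female', 'male', 'female', 'male', 'male'],
    'Age': [30.0, 40.0, 20.0, 22.0, 28.0],
})

# pivot_table averages duplicate (Pclass, Sex) pairs instead of raising an error.
table = toy.pivot_table(index='Pclass', columns='Sex', values='Age', aggfunc='mean')
print(table)
```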

## Graphing

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
pivoted.plot(kind='barh')

## Missing data

In [None]:
# value_counts() skips missing values by default, so the "before" proportions
# are computed over the non-missing cabins only
print('Top 5 values before:\n', df['Cabin'].value_counts(normalize=True).reset_index()[:5])
# Not a useful means of imputing in most cases, but a simple example to recap
df['Cabin'] = df['Cabin'].fillna(value='?')
print('Top 5 values after:\n', df['Cabin'].value_counts(normalize=True).reset_index()[:5])
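
Before filling anything, it usually helps to count how much is missing in each column. The sketch below does that on a small made-up frame and then fills a numeric column with its median. The data and the choice of median are illustrative assumptions, not a recommendation for the Titanic dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in a numeric and a text column.
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 30.0, np.nan, 26.0],
    'Cabin': ['C85', None, None, 'E46', None],
})

# Count missing values per column before choosing a strategy.
missing = toy.isna().sum()
print(missing)

# Fill numeric gaps with the column median (computed over non-missing values).
filled_age = toy['Age'].fillna(toy['Age'].median())
print(filled_age.tolist())
```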

## Summary

In this section, you started practicing essential ETL skills that you will use throughout your data work to transform and wrangle data into useful forms.