# Pythonic pandas
### Using a tutorial about [fast flexible pandas](https://realpython.com/fast-flexible-pandas/).

The [pandas Python package](https://pandas.pydata.org/pandas-docs/stable/) is an effective way to examine and manipulate data.  
The source code is [available on GitHub](https://github.com/pandas-dev/pandas), where you can also browse pandas' library of [extension modules](https://github.com/pandas-dev/pandas/tree/master/pandas/_libs).  
Be careful when writing code for pandas, because idiomatic [Pythonic code](https://stackoverflow.com/questions/25011078/what-does-pythonic-mean), such as an explicit `for` loop over rows, is not always the best way to use the library.  
Like [NumPy](http://www.numpy.org/), pandas is designed for vectorized operations that replace explicit loops with array expressions.  
This tutorial demonstrates Pythonic pandas: code that makes the best use of both the language and the library.
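
As a minimal sketch of the difference between an explicit loop and an array expression, the snippet below multiplies a few hourly readings by a single price. The readings and the flat `price_per_kwh` are made up for illustration and are not the tariff used in this tutorial.

```python
import pandas as pd

# Hypothetical hourly readings in kWh (illustrative values only).
energy_kwh = pd.Series([0.586, 0.580, 0.572, 0.596])
price_per_kwh = 0.12  # made-up flat rate, not the tutorial's tariff

# Explicit loop: Python touches one element at a time.
loop_cost = [kwh * price_per_kwh for kwh in energy_kwh]

# Vectorized: one array expression handles the whole column.
vector_cost = energy_kwh * price_per_kwh

print(vector_cost.sum())
```

Both versions give the same numbers; the vectorized form simply hands the element-by-element work to pandas and NumPy.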

## Our Task

The goal of this example will be to apply time-of-use energy tariffs to find the total cost of energy consumption for one year.  
Make sure that you are up to speed with basic [data selection and indexing](https://pandas.pydata.org/pandas-docs/stable/indexing.html).  
The price of electricity varies with the hour of the day, so the task is to multiply the electricity consumed in each hour by the price that applies to that hour.  
Let’s read our data from a [CSV file](https://raw.githubusercontent.com/realpython/materials/master/pandas-fast-flexible-intuitive/tutorial/demand_profile.csv) that has two columns: one for date plus time and one for electrical energy consumed in kilowatt hours (kWh):

In [1]:
import pandas as pd

pd.__version__

'0.23.3'

In [2]:
nrg = pd.read_csv('energy_consumption.csv')
nrg.describe(include='all')

| | date_time | energy_kwh |
|---|---|---|
| count | 8760 | 8760.0 |
| unique | 8760 | |
| top | 24/8/13 9:00 | |
| freq | 1 | |
| mean | | 0.6536 |
| std | | 0.453193 |
| min | | 0.0 |
| 25% | | 0.285 |
| 50% | | 0.609 |
| 75% | | 0.941 |


The 8,760 rows contain the electricity used in each hour of a one-year period (365 days × 24 hours).  
Each row gives the usage for the hour starting at the specified time, so `1/1/13 0:00` is the usage for the first hour of January 1st.

### [Working with datetime data](https://realpython.com/fast-flexible-pandas/#saving-time-with-datetime-data)

Let's take a closer look at our data:

In [3]:
nrg.head()

| | date_time | energy_kwh |
|---|---|---|
| 0 | 1/1/13 0:00 | 0.586 |
| 1 | 1/1/13 1:00 | 0.58 |
| 2 | 1/1/13 2:00 | 0.572 |
| 3 | 1/1/13 3:00 | 0.596 |
| 4 | 1/1/13 4:00 | 0.592 |


Both pandas and NumPy use the concept of `dtypes` (data types). Because no arguments other than the file name were passed to `read_csv`, the `date_time` column takes on the `object` dtype.

In [4]:
nrg.dtypes

date_time      object
energy_kwh    float64
dtype: object

In [5]:
# https://docs.python.org/3/library/functions.html#type
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html
type(nrg.iat[0,0])

str

The `object` dtype is the fallback for any column that does not fit neatly into a single data type.  
Working with dates as strings is also an inefficient use of memory and programmer time (not to mention patience).  
This exercise works with time series data, so the `date_time` column will be converted to datetime values, each of which pandas represents as a [pandas.Timestamp](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html).

In [6]:
# The file stores the day before the month (e.g. 24/8/13), so say so explicitly
nrg['date_time'] = pd.to_datetime(nrg['date_time'], dayfirst=True)
# https://stackoverflow.com/questions/29206612/difference-between-data-type-datetime64ns-and-m8ns
nrg['date_time'].dtype

dtype('<M8[ns]')
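
The most frequent value shown in the summary, `24/8/13 9:00`, can only be read as day/month/year, so the file is day-first. The sketch below shows why that matters when parsing. The string `2/1/13 0:00` is a hypothetical row chosen because it is ambiguous, and the format string is an assumption inferred from the sample values.

```python
import pandas as pd

# Hypothetical ambiguous row: 2 January 2013 in a day-first file.
ambiguous = "2/1/13 0:00"

# Default parsing assumes the month comes first (1 February 2013).
month_first = pd.to_datetime(ambiguous)

# Two ways to state that the day comes first (2 January 2013).
day_first = pd.to_datetime(ambiguous, dayfirst=True)
explicit = pd.to_datetime(ambiguous, format="%d/%m/%y %H:%M")

print(month_first.date(), day_first.date(), explicit.date())
```

An explicit `format` also spares pandas from guessing the layout of each string, which may help with larger files.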

If you're curious about alternatives to the code above, check out [pandas.PeriodIndex](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.PeriodIndex.html), which can store ordinal values indicating regular time periods.  
We now have a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) called `nrg` that contains the data from our `.csv` file.  
Notice how the time is displayed differently in the `date_time` column.

In [7]:
nrg.head()

| | date_time | energy_kwh |
|---|---|---|
| 0 | 2013-01-01 00:00:00 | 0.586 |
| 1 | 2013-01-01 01:00:00 | 0.58 |
| 2 | 2013-01-01 02:00:00 | 0.572 |
| 3 | 2013-01-01 03:00:00 | 0.596 |
| 4 | 2013-01-01 04:00:00 | 0.592 |
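
The earlier remark that dates stored as strings waste memory can be checked with a rough sketch like the one below. It builds a synthetic year of hourly timestamps rather than loading the tutorial's file, so the string layout (zero-padded here) and the exact byte counts are assumptions that will differ from the real data and between pandas versions.

```python
import pandas as pd

# Synthetic stand-in for one year of hourly readings (8,760 rows).
stamps = pd.date_range("2013-01-01", periods=8760, freq="60min")

as_text = pd.Series(stamps.strftime("%d/%m/%y %H:%M"))  # dates kept as strings
as_datetime = pd.Series(stamps)                         # datetime64[ns] values

# deep=True counts the string contents, not just the references to them.
text_bytes = as_text.memory_usage(index=False, deep=True)
datetime_bytes = as_datetime.memory_usage(index=False, deep=True)

print(text_bytes, datetime_bytes)
```

Each `datetime64[ns]` value occupies 8 bytes, while each string carries its own per-value overhead.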


### Time for a timing decorator

The code above is pretty straightforward, but how fast does it run?  
Let's find out by using a [timing decorator](https://github.com/realpython/materials/blob/master/pandas-fast-flexible-intuitive/tutorial/timer.py) called `@timeit` (an homage to [Python's timeit](https://docs.python.org/3/library/timeit.html)).  
This decorator behaves like `timeit.repeat()`, but it also allows you to return the result of the function itself as well as get the average runtime from multiple trials.  
When you create a function and place the `@timeit` decorator above it, the function will be timed every time it is called.  
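Keep in mind that the decorator runs an outer loop (the number of trials) and an inner loop (the number of function calls per trial), so the decorated function executes many times on each call.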
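
To make the outer and inner loops concrete, here is a minimal sketch of how such a decorator could be written. It is not the tutorial's `timer.py`: the name `timeit_sketch`, its `repeat` and `number` parameters, and the printed message are all hypothetical.

```python
import functools
import time


def timeit_sketch(repeat=3, number=10):
    """Hypothetical stand-in for a @timeit-style decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            best = float("inf")
            result = None
            for _ in range(repeat):        # outer loop: independent trials
                start = time.perf_counter()
                for _ in range(number):    # inner loop: calls per trial
                    result = func(*args, **kwargs)
                average = (time.perf_counter() - start) / number
                best = min(best, average)
            print(f"{func.__name__}: best of {repeat} trials, "
                  f"{best:.6f} s per call on average")
            return result                  # the function's own result is kept
        return wrapper
    return decorator


@timeit_sketch(repeat=2, number=5)
def total(values):
    return sum(values)


result = total(range(1000))
```

With `repeat=2` and `number=5`, `total` runs ten times for a single call, which is worth remembering before decorating a slow function.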

In [8]:
import timer