In [2]:
from myenv.models.candlestick import Candlestick
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


candlestick = Candlestick()

bitcoin_prices = candlestick.to_df()

bitcoin_prices.describe()


|       | open | high | close | low |
|-------|------|------|-------|-----|
| count | 1422.0 | 1422.0 | 1422.0 | 1422.0 |
| mean | 13425.309381 | 13856.75206 | 13445.480345 | 12923.115084 |
| std | 13468.444033 | 13935.359769 | 13476.179163 | 12902.342382 |
| min | 3188.01 | 3276.5 | 3189.02 | 2817.0 |
| 25% | 6532.1575 | 6666.7675 | 6542.35 | 6430.0 |
| 50% | 8775.64 | 8994.475 | 8777.845 | 8521.64 |
| 75% | 11459.8 | 11791.25 | 11463.725 | 11116.225 |
| max | 63575.0 | 64854.0 | 63575.0 | 62020.0 |


In [8]:
# Select a single column as a Series with the indexing operator [];
# attribute access (bitcoin_prices.high) returns the same Series

bitcoin_prices['high']

0        4485.39
1        4371.52
2        4184.69
3        4211.08
4        4119.62
          ...   
1417    35967.90
1418    35293.80
1419    35118.90
1420    35059.10
1421    33929.60
Name: high, Length: 1422, dtype: float64

Attribute access (bitcoin_prices.high) and the indexing operator (bitcoin_prices['high']) are the two ways of selecting a specific Series out of a DataFrame. Neither is more or less syntactically valid than the other, but the indexing operator [] has the advantage that it can handle column names containing spaces or other reserved characters.
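To check that the two forms are interchangeable, here is a small sketch on a stand-in frame built from the first three `high` values printed above. The `daily high` column name is hypothetical, chosen only to show a name that attribute access cannot express.

```python
import pandas as pd

# Stand-in frame: the first three 'high' values from the output above
prices = pd.DataFrame({"high": [4485.39, 4371.52, 4184.69]})

# Attribute access and the indexing operator return the same Series
by_attribute = prices.high
by_operator = prices["high"]
print(by_attribute.equals(by_operator))  # True

# Hypothetical column name containing a space: only [] can reach it,
# because `renamed.daily high` is not valid Python syntax
renamed = prices.rename(columns={"high": "daily high"})
print(renamed["daily high"].iloc[0])  # 4485.39
```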

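Doesn't a pandas Series look kind of like a fancy dictionary? It pretty much is, so it's no surprise that, to drill down to a specific value, we need only use the indexing operator [] once more. Passing a single label, as in bitcoin_prices['high'][1], returns the bare value; passing a slice, as in the cell below, returns a one-element Series instead: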

In [12]:
bitcoin_prices['high'][1:2]

1    4371.52
Name: high, dtype: float64
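The difference between a single label and a slice is easy to see on a stand-in Series. This sketch assumes only the first three `high` values shown earlier:

```python
import pandas as pd

# Stand-in Series: the first three 'high' values shown earlier
high = pd.Series([4485.39, 4371.52, 4184.69], name="high")

value = high[1]      # single label: returns a scalar
window = high[1:2]   # slice: returns a one-element Series

print(type(value).__name__)   # float64
print(type(window).__name__)  # Series
```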

## Indexing in pandas

The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem, which makes them easy for a novice to pick up and use. However, pandas has its own accessor operators, **loc** and **iloc**. For more advanced operations, these are the ones you should be using.

### Index-based selection

Pandas indexing works in one of two paradigms. The first is index-based selection: selecting data based on its numerical position in the data. iloc follows this paradigm.

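To select the first three rows of the first column (here, date), we may use the following. In iloc[:3, 0], the row slice :3 covers positions 0, 1 and 2 (the end position 3 is excluded), and 0 is the position of the column: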

In [16]:
bitcoin_prices.iloc[:3, 0]


0   2017-08-17
1   2017-08-18
2   2017-08-19
Name: date, dtype: datetime64[ns]
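Because iloc works purely on positions, the same accessor can also return a whole row, and a slice stops just before its end position. A small sketch, assuming a three-row stand-in frame built from the date and open values shown in this notebook:

```python
import pandas as pd

# Stand-in frame with the first three dates and opening prices from this notebook
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-17", "2017-08-18", "2017-08-19"]),
    "open": [4261.48, 4285.08, 4108.37],
})

first_row = df.iloc[0]        # the whole first row, as a Series indexed by column name
first_two = df.iloc[:2, 0]    # positions 0 and 1 of column 0; position 2 is excluded
last_row = df.iloc[-1]        # negative positions count from the end

print(first_row["open"])        # 4261.48
print(len(first_two))           # 2
print(last_row["date"].date())  # 2017-08-19
```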

In [17]:
# select the rows at positions 1, 2 and 5 of column 0
bitcoin_prices.iloc[[1,2,5], 0]


1   2017-08-18
2   2017-08-19
5   2017-08-22
Name: date, dtype: datetime64[ns]

In [18]:
# select the last 5 rows, all columns
bitcoin_prices.iloc[-5:]


|      | date | open | high | close | low |
|------|------|------|------|-------|-----|
| 1417 | 2021-07-04 | 34669.1 | 35967.9 | 35286.5 | 34357.1 |
| 1418 | 2021-07-05 | 35288.1 | 35293.8 | 33690.1 | 33125.6 |
| 1419 | 2021-07-06 | 33690.1 | 35118.9 | 34220.0 | 33532.0 |
| 1420 | 2021-07-07 | 34220.0 | 35059.1 | 33862.1 | 33777.8 |
| 1421 | 2021-07-08 | 33862.1 | 33929.6 | 32834.2 | 32077.0 |


### Label-based selection

The second paradigm for data selection is the one followed by the loc operator: label-based selection. In this paradigm, it's the data index value, not its position, which matters.

For example, to get the open price in the row whose index label is 1 (here, the second row), we would do the following:

In [20]:
bitcoin_prices.loc[1, 'open']


4285.08
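With the default 0, 1, 2, ... index, labels and positions coincide, so loc and iloc look interchangeable here. They diverge once the index carries other labels. A sketch, assuming the frame is indexed by date (the notebook itself keeps a numeric index), with values taken from the outputs in this notebook:

```python
import pandas as pd

# Stand-in frame indexed by date instead of by 0, 1, 2, ...
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-17", "2017-08-18", "2017-08-19"]),
    "open": [4261.48, 4285.08, 4108.37],
    "high": [4485.39, 4371.52, 4184.69],
}).set_index("date")

# loc looks the row up by its label (a Timestamp) and the column by name
by_label = df.loc[pd.Timestamp("2017-08-18"), "open"]

# iloc reaches the same cell by position: row 1, column 0
by_position = df.iloc[1, 0]

print(by_label, by_position)  # 4285.08 4285.08
```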

In [21]:
bitcoin_prices.loc[:, ['open', 'high']]


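|      | open | high |
|------|------|------|
| 0 | 4261.48 | 4485.39 |
| 1 | 4285.08 | 4371.52 |
| 2 | 4108.37 | 4184.69 |
| 3 | 4120.98 | 4211.08 |
| 4 | 4069.13 | 4119.62 |
| ... | ... | ... |
| 1417 | 34669.10 | 35967.90 |
| 1418 | 35288.10 | 35293.80 |
| 1419 | 33690.10 | 35118.90 |
| 1420 | 34220.00 | 35059.10 |
| 1421 | 33862.10 | 33929.60 |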
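One difference between the two accessors is worth knowing: a loc slice includes its end label, while an iloc slice excludes its end position. A sketch on a five-row stand-in built from the first five open and high values above:

```python
import pandas as pd

# Stand-in frame: the first five open/high values from the output above
df = pd.DataFrame({
    "open": [4261.48, 4285.08, 4108.37, 4120.98, 4069.13],
    "high": [4485.39, 4371.52, 4184.69, 4211.08, 4119.62],
})

by_label = df.loc[0:2, ["open", "high"]]  # labels 0, 1 and 2: end label included
by_position = df.iloc[0:2, [0, 1]]        # positions 0 and 1: end position excluded

print(len(by_label), len(by_position))    # 3 2
```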


## Conditional selection

So far we've been indexing various strides of data, using structural properties of the DataFrame itself. To do interesting things with the data, however, we often need to ask questions based on conditions.

For example, suppose that we're interested specifically in the days on which the daily high was exactly 4485.39.

We can start by checking, for each row, whether its high equals that value:

In [22]:
bitcoin_prices['high'] == 4485.39


0        True
1       False
2       False
3       False
4       False
        ...  
1417    False
1418    False
1419    False
1420    False
1421    False
Name: high, Length: 1422, dtype: bool
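A comparison like this returns one boolean per row, so summing the mask counts the matching rows, and other operators such as > build masks the same way. Exact == on floats only matches values stored identically, so a threshold comparison is often the more robust choice. A sketch on the first five high values shown earlier (the above-average condition is an illustration, not part of the original notebook):

```python
import pandas as pd

# Stand-in Series: the first five 'high' values from this notebook
high = pd.Series([4485.39, 4371.52, 4184.69, 4211.08, 4119.62], name="high")

is_match = high == 4485.39       # exact equality, as in the cell above
above_avg = high > high.mean()   # the mean of these five values is about 4274.46

print(is_match.dtype)            # bool
print(int(is_match.sum()))       # 1
print(int(above_avg.sum()))      # 2
```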

This operation produced a Series of True/False booleans based on the high value of each record. This result can then be used inside of loc to select the relevant data: