# Part V
#### We use the `yfinance` library to download Bitcoin price data for 2024. After downloading, we save the data to a CSV file.

In [11]:
import yfinance as yf

# Download daily BTC-USD prices for 2024; `end` is exclusive, so the last row is 2024-12-30
data = yf.download("BTC-USD", start="2024-01-01", end="2024-12-31")

# Save the downloaded data to a CSV file
data.to_csv("../data/btc24.csv")


[*********************100%***********************]  1 of 1 completed


#### While reading a CSV file, certain transformations can be applied to the dataset at the same time:
#### `parse_dates=["Date"]`: tells pandas to convert the column named "Date" to the datetime type
#### `index_col="Date"`: uses the dates as the row index

In [1]:
import pandas as pd
df = pd.read_csv("../data/btc24.csv", parse_dates=["Date"], index_col="Date")
df.head(10)

Date,Close,High,Low,Open,Volume
2024-01-01,44167.332031,44175.4375,42214.976562,42280.234375,18426978443
2024-01-02,44957.96875,45899.707031,44176.949219,44187.140625,39335274536
2024-01-03,42848.175781,45503.242188,40813.535156,44961.601562,46342323118
2024-01-04,44179.921875,44770.023438,42675.175781,42855.816406,30448091210
2024-01-05,44162.691406,44353.285156,42784.71875,44192.980469,32336029347
2024-01-06,43989.195312,44227.632812,43475.15625,44178.953125,16092503468
2024-01-07,43943.097656,44495.570312,43662.230469,43998.464844,19330573863
2024-01-08,46970.503906,47218.0,43244.082031,43948.707031,42746192015
2024-01-09,46139.730469,47893.699219,45244.714844,46987.640625,39821290992
2024-01-10,46627.777344,47647.222656,44483.152344,46121.539062,50114613298
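
#### The effect of the two `read_csv` options can be checked on a small in-memory CSV. This is a minimal sketch: the three-row `csv_text` and the names `plain` and `indexed` are made up for illustration and are not part of the original dataset.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real file
csv_text = """Date,Close,Volume
2024-01-01,100.0,10
2024-01-02,101.5,12
2024-01-03,99.0,9
"""

# Without the options: "Date" stays an ordinary column and the index is 0..n-1
plain = pd.read_csv(io.StringIO(csv_text))

# With the options: "Date" is parsed to datetime and becomes the row index
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(plain.index).__name__)    # RangeIndex
print(type(indexed.index).__name__)  # DatetimeIndex
```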


#### What is new here is that the dates serve as the row index. The resulting index is of type `DatetimeIndex`, as the output below shows:

In [2]:
df.index

DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10',
               ...
               '2024-12-21', '2024-12-22', '2024-12-23', '2024-12-24',
               '2024-12-25', '2024-12-26', '2024-12-27', '2024-12-28',
               '2024-12-29', '2024-12-30'],
              dtype='datetime64[ns]', name='Date', length=365, freq=None)

#### The biggest advantage of this approach is easier access to the data. We first access the column values for the index May 12 and cast the result to a dictionary. We then access the data for April, and finally show how to work with ranges (both endpoints of a date slice are included).

In [3]:
dict(df.loc['2024-05-12'])

{'Close': np.float64(61448.39453125),
 'High': np.float64(61818.15625),
 'Low': np.float64(60632.6015625),
 'Open': np.float64(60793.50390625),
 'Volume': np.float64(13800459405.0)}

In [4]:
df.loc['2024-04']

Date,Close,High,Low,Open,Volume
2024-04-01,69702.148438,71342.09375,68110.695312,71333.484375,34873527352
2024-04-02,65446.972656,69708.382812,64586.59375,69705.023438,50705240709
2024-04-03,65980.8125,66914.320312,64559.898438,65446.671875,34488018367
2024-04-04,68508.84375,69291.257812,65113.796875,65975.695312,34439527442
2024-04-05,67837.640625,68725.757812,66011.476562,68515.757812,33748230056
2024-04-06,68896.109375,69629.601562,67491.71875,67840.570312,19967785809
2024-04-07,69362.554688,70284.429688,68851.632812,68897.109375,21204930369
2024-04-08,71631.359375,72715.359375,69064.242188,69362.554688,37261432669
2024-04-09,69139.015625,71742.507812,68212.921875,71632.5,36426900409
2024-04-10,70587.882812,71093.429688,67503.5625,69140.242188,38318601774


In [5]:
df.loc['2024-03-29':'2024-04-05']

Date,Close,High,Low,Open,Volume
2024-03-29,69892.828125,70913.09375,69076.65625,70744.796875,25230851763
2024-03-30,69645.304688,70355.492188,69601.0625,69893.445312,17130241883
2024-03-31,71333.648438,71377.78125,69624.867188,69647.78125,20050941373
2024-04-01,69702.148438,71342.09375,68110.695312,71333.484375,34873527352
2024-04-02,65446.972656,69708.382812,64586.59375,69705.023438,50705240709
2024-04-03,65980.8125,66914.320312,64559.898438,65446.671875,34488018367
2024-04-04,68508.84375,69291.257812,65113.796875,65975.695312,34439527442
2024-04-05,67837.640625,68725.757812,66011.476562,68515.757812,33748230056
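
#### The three access patterns used above (exact date, partial string, slice) can be reproduced on synthetic data. This is a sketch under assumptions: the series `s` simply counts days from 2024-03-01 and is not the Bitcoin dataset.

```python
import pandas as pd

# Synthetic daily series covering March through May 2024 (values 0, 1, 2, ...)
idx = pd.date_range("2024-03-01", "2024-05-31", freq="D")
s = pd.Series(range(len(idx)), index=idx)

one_day = s.loc["2024-05-12"]             # exact date: a single value
april = s.loc["2024-04"]                  # partial string: every row in April
span = s.loc["2024-03-29":"2024-04-05"]   # slice: both endpoints are included

print(len(april))  # 30
print(len(span))   # 8
```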


#### The `resample` method groups data by time intervals. The `'ME'` parameter denotes monthly grouping (month end): each group is labeled with the last day of its month (e.g., January 31, February 28/29).

In [6]:
df['Close'].resample('ME').mean()

Date
2024-01-31    42919.612399
2024-02-29    49875.174300
2024-03-31    67702.439264
2024-04-30    65882.380599
2024-05-31    65266.317288
2024-06-30    65899.465755
2024-07-31    62804.542087
2024-08-31    59921.197581
2024-09-30    60358.515885
2024-10-31    65577.264491
2024-11-30    86570.707812
2024-12-31    98393.073438
Freq: ME, Name: Close, dtype: float64

#### A dataset may lack a date column when it is imported. In that case, we can generate a suitable sequence of dates and assign it to the DataFrame.

In [7]:
bt = pd.read_csv('../data/btc24_bez_datuma.csv')
bt.sample(5)

,Close,High,Low,Open,Volume
8,46139.730469,47893.699219,45244.714844,46987.640625,39821290992
308,67811.507812,69433.179688,66803.648438,68742.132812,41184819348
113,66407.273438,67199.242188,65864.867188,66839.890625,24310975583
347,101459.257812,101888.804688,99233.28125,100046.648438,56894751583
172,64096.199219,65007.546875,63378.894531,64837.988281,26188171739


#### Since we are looking at Bitcoin's price in 2024, we build a `DatetimeIndex` with the dates of that year. For this we use the `date_range` function, which takes a start and an end point (not necessarily a day), as well as a `freq` parameter that determines which days are included (`D`: all calendar days, `B`: business days). The range ends on 2024-12-30 so that its 365 entries match the rows of the dataset.

In [13]:
dts = pd.date_range(start='2024-01-01',end='2024-12-30',freq='D')
dts

DatetimeIndex(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
               '2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
               '2024-01-09', '2024-01-10',
               ...
               '2024-12-21', '2024-12-22', '2024-12-23', '2024-12-24',
               '2024-12-25', '2024-12-26', '2024-12-27', '2024-12-28',
               '2024-12-29', '2024-12-30'],
              dtype='datetime64[ns]', length=365, freq='D')
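
#### The difference between `freq='D'` and `freq='B'` is easiest to see on a single week. A small sketch; the variable names are illustrative only.

```python
import pandas as pd

# One calendar week starting on Monday, 2024-01-01
all_days = pd.date_range(start="2024-01-01", end="2024-01-07", freq="D")
business_days = pd.date_range(start="2024-01-01", end="2024-01-07", freq="B")

print(len(all_days))       # 7
print(len(business_days))  # 5 (Saturday and Sunday are skipped)

# 2024 is a leap year, so the full calendar year has 366 daily entries
full_year = pd.date_range(start="2024-01-01", end="2024-12-31", freq="D")
print(len(full_year))      # 366
```

#### Note that `B` only skips weekends; public holidays such as January 1 are still included.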

#### We assign the generated index to the DataFrame. Setting `inplace=True` modifies the DataFrame itself instead of returning a new one.

In [16]:
bt.set_index(dts, inplace=True)
bt.sample(5)

,Close,High,Low,Open,Volume
2024-10-29,72720.492188,73577.210938,69729.914062,69910.046875,58541874402
2024-06-20,64828.65625,66438.960938,64547.847656,64960.296875,25641109124
2024-10-20,69001.703125,69359.007812,68105.71875,68364.179688,18975847518
2024-06-02,67751.601562,68409.164062,67315.523438,67710.273438,17110588415
2024-10-23,66432.195312,67402.742188,65188.035156,67362.375,32263980353
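
#### The same result can be reached without `inplace=True`. The sketch below shows two alternatives on a hypothetical three-row frame (`frame` and `dates` are illustrative names, not from the notebook).

```python
import pandas as pd

# Hypothetical frame without a date column
frame = pd.DataFrame({"Close": [10.0, 11.0, 12.0]})
dates = pd.date_range("2024-01-01", periods=len(frame), freq="D")

# Variant 1: set_index returns a new DataFrame and leaves the original unchanged
with_dates = frame.set_index(dates)
index_type_after_variant_1 = type(frame.index).__name__  # still "RangeIndex"

# Variant 2: assign the index attribute directly, which modifies frame itself
frame.index = dates
```

#### In both variants the number of dates has to equal the number of rows; otherwise pandas raises an error.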


#### The `asfreq` method changes the frequency of a time series. For example, the frequency can be changed from daily (`D`) to monthly (`ME`), or from minutely (`min`) to hourly (`h`). As the fill method we use padding: each new row takes the values of the most recent preceding row.

In [24]:
bt = bt.loc['2024-09']
bt

,Close,High,Low,Open,Volume
2024-09-01,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-02,59112.480469,59403.070312,57136.027344,57326.96875,27036454524
2024-09-03,57431.023438,59815.058594,57425.167969,59106.191406,26666961053
2024-09-04,57971.539062,58511.570312,55673.164062,57430.347656,35627680312
2024-09-05,56160.488281,58300.582031,55712.453125,57971.703125,31030280656
2024-09-06,53948.753906,56976.109375,52598.699219,56160.191406,49361693566
2024-09-07,54139.6875,54838.144531,53740.070312,53949.085938,19061486526
2024-09-08,54841.566406,55300.859375,53653.757812,54147.933594,18268287531
2024-09-09,57019.535156,58041.125,54598.433594,54851.886719,34618096173
2024-09-10,57648.710938,58029.976562,56419.414062,57020.097656,28857630507


In [23]:
bt.asfreq(freq='W', method='pad')

,Close,High,Low,Open,Volume
2024-09-01,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-08,54841.566406,55300.859375,53653.757812,54147.933594,18268287531
2024-09-15,59182.835938,60381.917969,58696.308594,60000.726562,18120960867
2024-09-22,63648.710938,63993.421875,62440.726562,63396.804688,20183348802
2024-09-29,65635.304688,66069.34375,65450.015625,65888.898438,14788214575


In [27]:
bt.asfreq(freq='h', method='pad').iloc[:100]

,Close,High,Low,Open,Volume
2024-09-01 00:00:00,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-01 01:00:00,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-01 02:00:00,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-01 03:00:00,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
2024-09-01 04:00:00,57325.488281,59062.070312,57217.824219,58969.800781,24592449997
...,...,...,...,...,...
2024-09-04 23:00:00,57971.539062,58511.570312,55673.164062,57430.347656,35627680312
2024-09-05 00:00:00,56160.488281,58300.582031,55712.453125,57971.703125,31030280656
2024-09-05 01:00:00,56160.488281,58300.582031,55712.453125,57971.703125,31030280656
2024-09-05 02:00:00,56160.488281,58300.582031,55712.453125,57971.703125,31030280656
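
#### Unlike `resample`, `asfreq` does not aggregate: when downsampling it only picks the rows that fall on the new timestamps, and when upsampling it creates new rows that are empty unless a fill method is given. A minimal sketch on synthetic data (the series `s` holds the values 0 to 13 and is not the Bitcoin dataset):

```python
import pandas as pd

# Two weeks of synthetic daily values; 2024-09-01 is a Sunday
idx = pd.date_range("2024-09-01", periods=14, freq="D")
s = pd.Series(range(14), index=idx, dtype="float64")

# Downsampling to weekly: asfreq selects the Sunday rows, resample averages each week
weekly_pick = s.asfreq("W")
weekly_mean = s.resample("W").mean()

# Upsampling to 12-hour steps: new rows are NaN unless padded from the previous row
half_daily_gaps = s.asfreq("12h")
half_daily_pad = s.asfreq("12h", method="pad")

print(weekly_pick.tolist())  # [0.0, 7.0]
print(weekly_mean.tolist())  # [0.0, 4.0, 10.5]
```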


#### As already hinted, `date_range` does not have to generate a list of dates (days); the index entries can also be hours, minutes, and so on. In addition, instead of specifying the end point we can use `periods` to give the desired number of index entries.

In [29]:
pd.date_range('2023-10-20', periods=72, freq='h')

DatetimeIndex(['2023-10-20 00:00:00', '2023-10-20 01:00:00',
               '2023-10-20 02:00:00', '2023-10-20 03:00:00',
               '2023-10-20 04:00:00', '2023-10-20 05:00:00',
               '2023-10-20 06:00:00', '2023-10-20 07:00:00',
               '2023-10-20 08:00:00', '2023-10-20 09:00:00',
               '2023-10-20 10:00:00', '2023-10-20 11:00:00',
               '2023-10-20 12:00:00', '2023-10-20 13:00:00',
               '2023-10-20 14:00:00', '2023-10-20 15:00:00',
               '2023-10-20 16:00:00', '2023-10-20 17:00:00',
               '2023-10-20 18:00:00', '2023-10-20 19:00:00',
               '2023-10-20 20:00:00', '2023-10-20 21:00:00',
               '2023-10-20 22:00:00', '2023-10-20 23:00:00',
               '2023-10-21 00:00:00', '2023-10-21 01:00:00',
               '2023-10-21 02:00:00', '2023-10-21 03:00:00',
               '2023-10-21 04:00:00', '2023-10-21 05:00:00',
               '2023-10-21 06:00:00', '2023-10-21 07:00:00',
               '2023-10-