## Loading Data & Visualizations

Goals:
- Download price data with yfinance
- Inspect the resulting DataFrame
- Chain transformations with the .pipe method

#### Loading Data

In [1]:
!pip install matplotlib

In [2]:
!pip install yfinance


In [3]:
from matplotlib import dates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf

In [4]:
raw = yf.download('SPY AAPL', start='2010-01-01', end='2019-12-31')

[*********************100%***********************]  2 of 2 completed


In [5]:
raw

Price,Close,Close,High,High,Low,Low,Open,Open,Volume,Volume
Ticker,AAPL,SPY,AAPL,SPY,AAPL,SPY,AAPL,SPY,AAPL,SPY
Date,,,,,,,,,,
2010-01-04,6.440330,86.026466,6.455076,86.072009,6.391278,84.644942,6.422876,85.297751,493729600,118944600
2010-01-05,6.451464,86.254196,6.487877,86.292152,6.417458,85.662115,6.458084,85.973340,601904800,111579900
2010-01-06,6.348847,86.314926,6.477046,86.527467,6.342226,86.102385,6.451466,86.170699,552160000,116074400
2010-01-07,6.337110,86.679276,6.379843,86.785546,6.291067,85.912604,6.372319,86.155509,477131200,131091100
2010-01-08,6.379242,86.967720,6.379845,87.005676,6.291370,86.276961,6.328685,86.451546,447610800,126402800
...,...,...,...,...,...,...,...,...,...,...
2019-12-23,68.757668,297.810944,68.818194,298.209600,67.878827,297.662600,67.917565,298.153975,98572000,52990000
2019-12-24,68.823029,297.820251,68.973140,298.089097,68.496193,297.514284,68.924716,298.042752,48478800,20270000
2019-12-26,70.188507,299.405609,70.205456,299.414889,68.927145,298.200359,68.956196,298.209611,93121200,30911200
2019-12-27,70.161850,299.331329,71.171429,300.202828,69.755116,298.793610,70.481430,300.147203,146266000,42528800


In [6]:
raw.columns

MultiIndex([( 'Close', 'AAPL'),
            ( 'Close',  'SPY'),
            (  'High', 'AAPL'),
            (  'High',  'SPY'),
            (   'Low', 'AAPL'),
            (   'Low',  'SPY'),
            (  'Open', 'AAPL'),
            (  'Open',  'SPY'),
            ('Volume', 'AAPL'),
            ('Volume',  'SPY')],
           names=['Price', 'Ticker'])

In [7]:
# Going to use the .pipe method a lot
raw.pipe?

Signature:
raw.pipe(
    func: 'Callable[..., T] | tuple[Callable[..., T], str]',
    *args,
    **kwargs,
) -> 'T'
Docstring:
Apply chainable functions that expect Series or DataFrames.

Parameters
----------
func : function
    Function to apply to the Series/DataFrame.
    ``args``, and ``kwargs`` are passed into ``func``.
    Alternatively a ``(callable, data_keyword)`` tuple where
    ``data_keyword`` is a string indicating the keyword of
    ``callable`` that expects the Series/DataFrame.
*args : iterable, optional
    Positional arguments passed into ``func``.
**kwargs : mapping, optional
    A dictionary of keyword arguments passed into ``func``.

Returns
-------
the return type of ``func``.

See Also
--------
DataFram
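
The docstring describes two calling conventions: extra arguments are forwarded to the function, and a (callable, data_keyword) tuple names the keyword that should receive the DataFrame. A minimal sketch of both, assuming two hypothetical helpers (add_return and scale) and a made-up prices frame that are not part of this notebook:

```python
import pandas as pd

def add_return(df, column, periods=1):
    """Hypothetical helper: add a percentage-change column for `column`."""
    return df.assign(**{f"{column}_ret": df[column].pct_change(periods)})

def scale(factor, data):
    """Hypothetical helper whose DataFrame is NOT the first parameter."""
    return data * factor

prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0]})

# Extra positional and keyword arguments are forwarded to the function.
with_ret = prices.pipe(add_return, "Close", periods=1)

# Tuple form: tell pipe which keyword should receive the DataFrame.
scaled = prices.pipe((scale, "data"), 2)
```

Both calls return new objects here, so prices itself is left unchanged.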

In [8]:
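def fix_cols(df):
    """Flatten two-level column labels by keeping only the outer level."""
    # Each label is a tuple such as ('Close', 'AAPL'); col[0] is 'Close'.
    outer = [col[0] for col in df.columns]
    df.columns = outer
    return df

(raw
 .iloc[:, ::2]      # every second column: the AAPL columns
 .pipe(fix_cols)    # drop the ticker level from the labels
)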

Date,Close,High,Low,Open,Volume
2010-01-04,6.440330,6.455076,6.391278,6.422876,493729600
2010-01-05,6.451464,6.487877,6.417458,6.458084,601904800
2010-01-06,6.348847,6.477046,6.342226,6.451466,552160000
2010-01-07,6.337110,6.379843,6.291067,6.372319,477131200
2010-01-08,6.379242,6.379845,6.291370,6.328685,447610800
...,...,...,...,...,...
2019-12-23,68.757668,68.818194,67.878827,67.917565,98572000
2019-12-24,68.823029,68.973140,68.496193,68.924716,48478800
2019-12-26,70.188507,70.205456,68.927145,68.956196,93121200
2019-12-27,70.161850,71.171429,69.755116,70.481430,146266000


**Understanding the fix_cols function**
- raw has two-level (MultiIndex) columns, so each column label is a tuple such as ('Close', 'AAPL').
- fix_cols keeps the first element of each tuple ('Close', 'High', ...) and assigns the result as the new, single-level column names.

**Applying .iloc[:, ::2]**
- .iloc[:, ::2] keeps all rows and every second column (positions 0, 2, 4, ...). Because the columns alternate AAPL, SPY, this selects only the AAPL columns.
- .pipe(fix_cols) passes the result of .iloc to fix_cols and returns whatever fix_cols returns, so the steps stay in one readable chain.

**Overall Explanation**
- The chain picks the AAPL columns of raw by position and flattens their labels, leaving a DataFrame with the columns Close, High, Low, Open, and Volume.
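
The selection and renaming steps above can be checked on a small synthetic frame without downloading anything. This is a sketch, assuming a made-up frame named toy that mimics the Price/Ticker column layout of raw; its values are arbitrary placeholders, not market data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for raw: two price fields for two tickers.
columns = pd.MultiIndex.from_product(
    [["Close", "High"], ["AAPL", "SPY"]], names=["Price", "Ticker"]
)
toy = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

def fix_cols(df):
    # Same logic as the notebook's fix_cols: keep the outer label only.
    df.columns = [col[0] for col in df.columns]
    return df

# Positions 0 and 2 are ('Close', 'AAPL') and ('High', 'AAPL').
aapl = toy.iloc[:, ::2].pipe(fix_cols)
print(aapl)
```

Because .iloc returns a new DataFrame, assigning to its columns inside fix_cols leaves the two-level columns of toy untouched.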

In [9]:
import yfinance as yf

def fix_cols(df):
    columns = df.columns
    outer = [col[0] for col in columns]
    df.columns = outer
    return df

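def tweak_data():
    """Download SPY and AAPL prices and return the AAPL columns with flat names."""
    raw = yf.download('SPY AAPL', start='2010-01-01', end='2019-12-31')

    return (raw
     .iloc[:, ::2]      # AAPL columns sit at the even positions
     .pipe(fix_cols)
    )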

tweak_data()

[*********************100%***********************]  2 of 2 completed


Date,Close,High,Low,Open,Volume
2010-01-04,6.440330,6.455076,6.391278,6.422876,493729600
2010-01-05,6.451464,6.487877,6.417458,6.458084,601904800
2010-01-06,6.348847,6.477046,6.342226,6.451466,552160000
2010-01-07,6.337110,6.379843,6.291067,6.372319,477131200
2010-01-08,6.379242,6.379845,6.291370,6.328685,447610800
...,...,...,...,...,...
2019-12-23,68.757668,68.818194,67.878827,67.917565,98572000
2019-12-24,68.823029,68.973140,68.496193,68.924716,48478800
2019-12-26,70.188507,70.205456,68.927145,68.956196,93121200
2019-12-27,70.161850,71.171429,69.755116,70.481430,146266000
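
tweak_data depends on the AAPL columns sitting at the even positions, which would silently break if the column order changed. One possible alternative, shown here only as a sketch, selects by label with DataFrame.xs instead. The helper select_ticker and the frame toy are hypothetical names introduced for illustration, and the values are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up frame with the same Price/Ticker column layout as the download.
columns = pd.MultiIndex.from_product(
    [["Close", "High"], ["AAPL", "SPY"]], names=["Price", "Ticker"]
)
toy = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

def select_ticker(df, ticker):
    """Hypothetical helper: keep one ticker and drop the Ticker level."""
    return df.xs(ticker, axis=1, level="Ticker")

# The ticker is chosen by name, so column order no longer matters.
spy = toy.pipe(select_ticker, "SPY")
print(spy)
```

Since xs drops the selected level by default, the result already has single-level column names and needs no separate renaming step.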
