# A. Using Pandas and Numpy

## A.1. Portfolio and Risk

- Pandas Datareader: a Python package that loads data from various internet sources into a pandas DataFrame. It is commonly used to pull stock price datasets. [Tutorial](https://www.youtube.com/watch?v=sgndYho8RyI)
- Pandas
- Numpy
- Portfolio and Return

### A.1.1. Get Stock Price with Pandas Datareader

In [5]:
# !pip install pandas_datareader
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False

- Get the Apple Inc. stock price via the Yahoo Finance API, beginning at the defined start date. The stock ticker we use is "AAPL".
- __Note__: A ticker symbol or stock symbol (e.g. "AAPL") is an abbreviation that uniquely identifies the publicly traded shares of a particular stock on a particular stock market. In short, a ticker symbol is a short string of characters representing a specific asset or security listed on a stock exchange or traded publicly.

In [7]:
start = dt.datetime(2020, 1, 1)
data = pdr.get_data_yahoo("AAPL", start)

- Inspect the first 5 rows of our data

In [14]:
data.head()

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2019-12-31 | 73.419998 | 72.379997 | 72.482498 | 73.412498 | 100805600.0 | 72.245941 |
| 2020-01-02 | 75.150002 | 73.797501 | 74.059998 | 75.087502 | 135480400.0 | 73.894325 |
| 2020-01-03 | 75.144997 | 74.125 | 74.287498 | 74.357498 | 146322800.0 | 73.175934 |
| 2020-01-06 | 74.989998 | 73.1875 | 73.447502 | 74.949997 | 118387200.0 | 73.759003 |
| 2020-01-07 | 75.224998 | 74.370003 | 74.959999 | 74.597504 | 108872000.0 | 73.412117 |


- Inspect the data type of our data frame index

In [12]:
data.index

DatetimeIndex(['2019-12-31', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-13', '2020-01-14',
               ...
               '2022-03-22', '2022-03-23', '2022-03-24', '2022-03-25',
               '2022-03-28', '2022-03-29', '2022-03-30', '2022-03-31',
               '2022-04-01', '2022-04-01'],
              dtype='datetime64[ns]', name='Date', length=570, freq=None)
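
The index printed above lists '2022-04-01' twice at its end. A minimal sketch of how one might detect and drop repeated dates follows; the three-row frame is a hypothetical stand-in built from the last Close values shown in this notebook, and keeping the last row per date is an assumption rather than the author's method.

```python
import pandas as pd

# Hypothetical miniature frame whose index repeats its final date,
# mimicking the tail of the DatetimeIndex printed above.
idx = pd.DatetimeIndex(["2022-03-31", "2022-04-01", "2022-04-01"], name="Date")
prices = pd.DataFrame({"Close": [174.610001, 174.309998, 174.309998]}, index=idx)

# True when at least one index label occurs more than once.
has_duplicates = prices.index.has_duplicates

# Keep only the last row recorded for each date (an assumed policy).
deduped = prices[~prices.index.duplicated(keep="last")]

print(has_duplicates)
print(deduped)
```

Whether to keep the first or the last row for a repeated date depends on the data source, so it is worth inspecting the duplicated rows before dropping any.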

- Inspect the data type of our data frame columns / fields

In [15]:
data.dtypes

High         float64
Low          float64
Open         float64
Close        float64
Volume       float64
Adj Close    float64
dtype: object

Notice that every field has the float64 data type, so all columns are numerical. We can also confirm that `data` itself is a DataFrame:

In [18]:
type(data)

pandas.core.frame.DataFrame

### A.1.2. NumPy - Warming Up

- If we already have a Pandas DataFrame, why bother with NumPy?
    - A Pandas DataFrame covers most of what we need for financial analysis, but some operations are limited or awkward.
    - Luckily, a Pandas DataFrame is built on NumPy arrays, so we can drop down to NumPy for more advanced financial analysis. That's why we love it!
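
As one illustration of moving between the two libraries, the sketch below computes daily simple and log returns with NumPy. It assumes a `close` Series holding the five Close prices from `data.head()`; the variable names are illustrative, and pandas could produce the simple returns on its own with `pct_change()`.

```python
import numpy as np
import pandas as pd

# Close prices copied from the first five rows of data.head().
close = pd.Series(
    [73.412498, 75.087502, 74.357498, 74.949997, 74.597504],
    index=pd.to_datetime(
        ["2019-12-31", "2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
    ),
    name="Close",
)

prices = close.to_numpy()  # plain float64 NumPy array

# Simple return: price today divided by price on the previous trading day, minus 1.
simple_returns = prices[1:] / prices[:-1] - 1

# Log return: natural log of the same price ratio.
log_returns = np.log(prices[1:] / prices[:-1])

print(simple_returns)
print(log_returns)
```

Log returns add up across days, so their sum equals the log of the last price divided by the first price.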

a. Convert the DataFrame to a NumPy array

In [21]:
arr = data.to_numpy()
arr

array([[7.34199982e+01, 7.23799973e+01, 7.24824982e+01, 7.34124985e+01,
        1.00805600e+08, 7.22459412e+01],
       [7.51500015e+01, 7.37975006e+01, 7.40599976e+01, 7.50875015e+01,
        1.35480400e+08, 7.38943253e+01],
       [7.51449966e+01, 7.41250000e+01, 7.42874985e+01, 7.43574982e+01,
        1.46322800e+08, 7.31759338e+01],
       ...,
       [1.78029999e+02, 1.74399994e+02, 1.77839996e+02, 1.74610001e+02,
        1.03049300e+08, 1.74610001e+02],
       [1.74880005e+02, 1.71940002e+02, 1.74029999e+02, 1.74309998e+02,
        7.86998000e+07, 1.74309998e+02],
       [1.74880005e+02, 1.71940002e+02, 1.74029999e+02, 1.74309998e+02,
        7.87513280e+07, 1.74309998e+02]])

In [22]:
arr.shape

(570, 6)

In [23]:
len(arr)

570

In [36]:
arr[0]

array([7.34199982e+01, 7.23799973e+01, 7.24824982e+01, 7.34124985e+01,
       1.00805600e+08, 7.22459412e+01])
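
Since `arr` is two-dimensional, rows and columns can both be selected by position. The sketch below rebuilds the first three rows from the values printed earlier (shown to fewer digits) and assumes the column order High, Low, Open, Close, Volume, Adj Close from the DataFrame.

```python
import numpy as np

# First three rows of the price data; columns are
# High, Low, Open, Close, Volume, Adj Close.
arr = np.array([
    [73.419998, 72.379997, 72.482498, 73.412498, 100805600.0, 72.245941],
    [75.150002, 73.797501, 74.059998, 75.087502, 135480400.0, 73.894325],
    [75.144997, 74.125, 74.287498, 74.357498, 146322800.0, 73.175934],
])

first_row = arr[0]       # all six fields for the first date
close_col = arr[:, 3]    # the Close column for every date
high_low = arr[:, :2]    # the High and Low columns for every date
first_close = arr[0, 3]  # a single value: Close on the first date

print(first_row.shape, close_col.shape, high_low.shape, first_close)
```

Unlike a DataFrame, the array carries no column names, so the position of each field has to be tracked separately.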

In [51]:
data.iloc[0:1]

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2019-12-31 | 73.419998 | 72.379997 | 72.482498 | 73.412498 | 100805600.0 | 72.245941 |


In [56]:
data.loc[[dt.datetime(2019,12,31)]]

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2019-12-31 | 73.419998 | 72.379997 | 72.482498 | 73.412498 | 100805600.0 | 72.245941 |
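
The two cells above return the same row in different ways: `iloc` selects by integer position, while `loc` selects by index label. A small sketch of the difference, using a hypothetical one-column frame built from the first three Close prices:

```python
import datetime as dt
import pandas as pd

# Hypothetical miniature version of `data` with only the Close column.
data = pd.DataFrame(
    {"Close": [73.412498, 75.087502, 74.357498]},
    index=pd.DatetimeIndex(["2019-12-31", "2020-01-02", "2020-01-03"], name="Date"),
)

by_position = data.iloc[0:1]                      # position-based slice, a DataFrame
by_label = data.loc[[dt.datetime(2019, 12, 31)]]  # list of labels, a DataFrame
as_series = data.loc[dt.datetime(2019, 12, 31)]   # single label, a Series

print(by_position.equals(by_label))
print(type(as_series))
```

Passing a list of labels to `loc` keeps the result two-dimensional, whereas a single label returns the row as a Series.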


In [57]:
arr.dtype

dtype('float64')

Remember: a DataFrame can hold a different data type in each column, whereas a NumPy array has a single data type for all of its elements.
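
To see what this means in practice, the sketch below converts a frame with mixed column types. The `mixed` frame, with its text `Ticker` column and integer `Volume` column, is a hypothetical example and not part of the downloaded data, where every column is already float64.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing text, float, and integer columns.
mixed = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL"],
    "Close": [73.412498, 75.087502],
    "Volume": [100805600, 135480400],
})

per_column = mixed.dtypes  # one dtype per column

# Text and numbers share no numeric type, so NumPy falls back to generic objects.
as_array = mixed.to_numpy()

# Selecting only the numeric columns yields a float64 array;
# the integer Volume values are upcast to float.
numeric = mixed[["Close", "Volume"]].to_numpy()

print(per_column)
print(as_array.dtype, numeric.dtype)
```

An object array loses most of NumPy's speed advantages, so it usually pays to select the numeric columns before converting.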