N-dimensional arrays in NumPy hold values of a single data type; if the input mixes types, NumPy converts every value to one common type.

Arrays can organize values into matrices and vectors.

One-dimensional arrays are called vectors
Two-dimensional arrays are called matrices
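
As a small illustration of the points above, the sketch below builds a vector and a matrix and checks how many dimensions each has. It also shows what happens when integers and a float are mixed in one array. The variable names are made up for this example.

```python
import numpy as np

vector = np.array([1, 2, 3])            # one dimension
matrix = np.array([[1, 2], [3, 4]])     # two dimensions
mixed = np.array([1, 2.5, 3])           # integers and a float in one list

print(vector.ndim, matrix.ndim)   # 1 2
print(mixed.dtype)                # float64: every value was converted to float
print(mixed)
```

The ndim attribute reports the number of dimensions, and dtype reports the single data type shared by all elements.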

In [1]:
import numpy as np


In [2]:
a = np.array([[0,1,2,3],[4,5,6,7]])
print(a)
print(a.shape)
print(a[0,2])

[[0 1 2 3]
 [4 5 6 7]]
(2, 4)
2


In [3]:
a[1,3] = 8
a[1,3]

8

Random numbers are useful for simulating data.
To generate them with Python's standard library, first import the random module:

    >>> import random

    • random.random() - generates a float value in range [0,1)

    • random.randint(min, max) - generates an integer value in the closed interval [min, max] (both endpoints are included).

NumPy has its own random submodule, np.random, which is separate from the standard-library random module. Its functions can fill an entire array in a single call.

e.g.

    Create a matrix of n rows and n columns using randomized integer values from 1 to 8 (unlike random.randint, the upper bound of np.random.randint is excluded)

    >>> import numpy as np

    >>> np.random.randint(1, 9, (n,n))
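
The two randint functions treat the upper bound differently, which is easy to miss. The sketch below draws many values from each and prints the distinct results. The seeds, the sample size of 1000 and the variable names are choices made for this example only.

```python
import random
import numpy as np

random.seed(0)       # seeds the standard-library generator
np.random.seed(0)    # seeds NumPy's legacy global generator

# random.randint includes both endpoints, so 1, 2 and 3 are all possible
std_draws = [random.randint(1, 3) for _ in range(1000)]

# np.random.randint excludes the upper bound, so only 1 and 2 are possible
np_draws = np.random.randint(1, 3, size=1000)

print(sorted(set(std_draws)))
print(sorted(set(np_draws.tolist())))

# An n x n matrix of random integers from 1 to 8
n = 3
rand_matrix = np.random.randint(1, 9, (n, n))
print(rand_matrix.shape)   # (3, 3)
```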

In [4]:
import random

rand_float = random.random()

print(rand_float)

rand_int = random.randint(29, 38)

print(rand_int)

rand_matrix = np.random.randint(1,6, (4,6))

print(rand_matrix)

0.41985558437120696
29
[[3 3 2 3 2 4]
 [3 1 5 1 5 1]
 [5 5 2 1 2 3]
 [5 4 1 3 1 5]]


In [5]:
import pandas as pd

ser = pd.Series(np.random.random(5), name = "Column: 01")
print(ser)

0    0.589499
1    0.833780
2    0.200109
3    0.376938
4    0.612500
Name: Column: 01, dtype: float64


In [6]:
from pandas_datareader import data as wb

Apple = wb.DataReader('AAPL', data_source = 'yahoo', start = '1995-1-1')

TSLA = wb.DataReader('TSLA', data_source='yahoo', start='2012-1-1')

print(TSLA.head(20)['Adj Close'])


Date
2012-01-03    5.616
2012-01-04    5.542
2012-01-05    5.424
2012-01-06    5.382
2012-01-09    5.450
2012-01-10    5.524
2012-01-11    5.646
2012-01-12    5.650
2012-01-13    4.558
2012-01-17    5.320
2012-01-18    5.362
2012-01-19    5.352
2012-01-20    5.320
2012-01-23    5.354
2012-01-24    5.484
2012-01-25    5.594
2012-01-26    5.788
2012-01-27    5.866
2012-01-30    5.914
2012-01-31    5.814
Name: Adj Close, dtype: float64


In [7]:
Apple.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6919 entries, 1995-01-03 to 2022-06-24
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   High       6919 non-null   float64
 1   Low        6919 non-null   float64
 2   Open       6919 non-null   float64
 3   Close      6919 non-null   float64
 4   Volume     6919 non-null   float64
 5   Adj Close  6919 non-null   float64
dtypes: float64(6)
memory usage: 378.4 KB


In [8]:
# To see first n rows of data use obj.head(n)
# If no n argument is given, the function automatically presents the first 5 rows

Apple.head(30)

Date,High,Low,Open,Close,Volume,Adj Close
1995-01-03,0.347098,0.33817,0.347098,0.342634,103868800.0,0.289652
1995-01-04,0.353795,0.344866,0.344866,0.351563,158681600.0,0.2972
1995-01-05,0.351563,0.345982,0.350446,0.347098,73640000.0,0.293425
1995-01-06,0.385045,0.367188,0.371652,0.375,1076622000.0,0.317013
1995-01-09,0.373884,0.366071,0.371652,0.367885,274086400.0,0.310998
1995-01-10,0.392857,0.368304,0.368304,0.390067,614790400.0,0.32975
1995-01-11,0.429129,0.381138,0.390625,0.417411,873824000.0,0.352866
1995-01-12,0.414063,0.399554,0.41183,0.405134,551779200.0,0.342487
1995-01-13,0.41183,0.396205,0.41183,0.40067,351377600.0,0.338713
1995-01-16,0.404018,0.395089,0.40067,0.397321,188977600.0,0.335882


In [9]:
# To see last n rows of obj, use obj.tail(n)
# If no n argument is given, the function automatically presents the last 5 rows


Apple.tail(30)

Date,High,Low,Open,Close,Volume,Adj Close
2022-05-12,146.199997,138.800003,142.770004,142.559998,182602000.0,142.559998
2022-05-13,148.100006,143.110001,144.589996,147.110001,113990900.0,147.110001
2022-05-16,147.520004,144.179993,145.550003,145.539993,86643800.0,145.539993
2022-05-17,149.770004,146.679993,148.860001,149.240005,78336300.0,149.240005
2022-05-18,147.360001,139.899994,146.850006,140.820007,109742900.0,140.820007
2022-05-19,141.660004,136.600006,139.880005,137.350006,136095600.0,137.350006
2022-05-20,140.699997,132.610001,139.089996,137.589996,137426100.0,137.589996
2022-05-23,143.259995,137.649994,137.789993,143.110001,117726300.0,143.110001
2022-05-24,141.970001,137.330002,140.809998,140.360001,104132700.0,140.360001
2022-05-25,141.789993,138.339996,138.429993,140.520004,92482700.0,140.520004


To represent tabular data in pandas, one can use a DataFrame, created with the constructor:
    
    • pd.DataFrame() - Creates a DataFrame object, which holds data in labeled columns.

    >>> new_data = pd.DataFrame()

Note: a DataFrame works much like a dictionary, in the sense that every column name is a key whose value is that column's data. To add a column, one can use the same syntax as for regular dictionaries:

    • dataframe[key] = data - Adds a column named key to the DataFrame
    • DataReader returns a DataFrame, so a single column can be selected from its result by key,
    e.g. wb.DataReader(...)[key]


    >>> new_data['TSLA'] = wb.DataReader('TSLA', data_source='yahoo', start='2012-1-1')['Adj Close']
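
The dictionary-style assignment can be tried without downloading anything. The sketch below uses two small hand-written price series in place of DataReader results. The tickers 'AAA' and 'BBB', the dates and the prices are all invented for illustration.

```python
import pandas as pd

# Hypothetical price series standing in for downloaded 'Adj Close' columns
dates = pd.date_range("2022-01-03", periods=4, freq="D")
prices = {
    "AAA": pd.Series([10.0, 10.5, 10.2, 10.8], index=dates),
    "BBB": pd.Series([20.0, 19.5, 19.9, 20.4], index=dates),
}

new_data = pd.DataFrame()
for ticker, series in prices.items():
    new_data[ticker] = series      # same syntax as adding a key to a dict

print(new_data)
print(new_data["AAA"])             # reading a column also works by key
```

Each assignment adds one column, and the rows are aligned on the shared date index.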

In [10]:
tickers = ['PG', 'MSFT', 'T', 'F', 'GE']
new_data = pd.DataFrame()

for t in tickers:
    new_data[t] = wb.DataReader(t, data_source='yahoo', start='1995-1-1')['Adj Close']

In [11]:
new_data.tail(20)

Date,PG,MSFT,T,F,GE
2022-05-26,146.479996,265.899994,21.32,13.12,77.010002
2022-05-27,148.720001,273.23999,21.290001,13.63,78.760002
2022-05-31,147.880005,271.869995,21.290001,13.68,78.290001
2022-06-01,145.639999,272.420013,21.219999,13.55,77.519997
2022-06-02,147.210007,274.579987,21.190001,13.89,78.0
2022-06-03,145.889999,270.019989,20.9,13.5,76.970001
2022-06-06,145.320007,268.75,20.940001,13.46,77.0
2022-06-07,146.940002,272.5,21.139999,13.74,78.0
2022-06-08,145.110001,270.410004,21.049999,13.53,77.160004
2022-06-09,142.490005,264.790009,20.879999,13.28,74.779999


Using Quandl to gather data:

>>> import quandl 

    • quandl.get() - fetches the dataset identified by the code passed as its argument

e.g.
>>> quandl.get('FRED/GDP')

    Here the get method retrieves the GDP series from Federal Reserve Economic Data (FRED)

In [12]:
import quandl
data_01 = quandl.get('FRED/GDP')
data_01

Date,Value
1947-01-01,243.164
1947-04-01,245.968
1947-07-01,249.585
1947-10-01,259.745
1948-01-01,265.742
...,...
2020-10-01,21477.597
2021-01-01,22038.226
2021-04-01,22740.959
2021-07-01,23202.344


One can save data to a comma-separated values (CSV) file by calling the following method on the variable that holds the data.

    • to_csv(path) - Writes a data set to a CSV file. The argument must specify the destination (path) and the name of the file.

>>> data_01.to_csv(r'C:\Users\Python for Finance\---NAME OF YOUR FILE---')

To read the csv file, use the following method:

    • pd.read_csv(path) - Reads the CSV file into a pandas DataFrame.

Alternatively, one can write the data to an Excel file by replacing the 'csv' part of each method name with 'excel':
    
    • to_excel(path)
    • pd.read_excel(path)
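
A minimal round trip through a CSV file might look like the sketch below. It writes to a temporary directory so that it runs on any machine. The file name my_data.csv and the two sample rows are assumptions made for the example, and index=False is passed so the default integer index is not written to the file.

```python
import os
import tempfile
import pandas as pd

data = pd.DataFrame({
    "Date": ["1947-01-01", "1947-04-01"],
    "Value": [243.164, 245.968],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_data.csv")   # hypothetical file name
    data.to_csv(path, index=False)                # write the file
    restored = pd.read_csv(path)                  # read it back

print(restored)
```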

In [13]:
data_01.to_csv(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.csv')
data_02 = pd.read_csv(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.csv')

In [14]:
print(data_02.head(5))
print(data_02.tail(5))

         Date    Value
0  1947-01-01  243.164
1  1947-04-01  245.968
2  1947-07-01  249.585
3  1947-10-01  259.745
4  1948-01-01  265.742
           Date      Value
295  2020-10-01  21477.597
296  2021-01-01  22038.226
297  2021-04-01  22740.959
298  2021-07-01  23202.344
299  2021-10-01  23992.355


In [15]:
data_02.to_excel(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.xlsx', index=False)
data_03 = pd.read_excel(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.xlsx')

In [16]:
data_03.head(5)

,Date,Value
0,1947-01-01,243.164
1,1947-04-01,245.968
2,1947-07-01,249.585
3,1947-10-01,259.745
4,1948-01-01,265.742


In [17]:
data_03.tail(5)

,Date,Value
295,2020-10-01,21477.597
296,2021-01-01,22038.226
297,2021-04-01,22740.959
298,2021-07-01,23202.344
299,2021-10-01,23992.355


By default, pandas assigns an integer index to the data it reads. That is to say, the rows are not labeled by any column of the data, but simply numbered from 0 onwards.

To change this, one can include another argument in the read method:

    • pd.read_excel(path, index_col="") or pd.read_csv(path, index_col="") - To change the indexing, one must specify, within quotation marks, the name of the column from which the indices will be taken.

>>> pd.read_excel(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.xlsx', index_col='Date')

Another method for changing the index:

    • data_03.set_index('column') - once a DataFrame has been created, one can turn one of its own columns into the index. This is done by passing the name of the desired column as the argument, within quotation marks/apostrophes.

    However, there is one caveat to this method. It returns a new DataFrame and leaves the original unchanged, so the re-indexed data is only displayed unless the result is assigned to a variable.
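
Both ways of setting the index can be compared on a tiny data set. The sketch below reads CSV text from memory instead of a file. The two rows and the variable names are made up for the example.

```python
import io
import pandas as pd

csv_text = "Date,Value\n1947-01-01,243.164\n1947-04-01,245.968\n"

# Default behaviour: rows are numbered 0, 1, ...
default_df = pd.read_csv(io.StringIO(csv_text))

# Option 1: choose the index column while reading
indexed_on_read = pd.read_csv(io.StringIO(csv_text), index_col="Date")

# Option 2: set the index afterwards; the result must be kept in a variable
indexed_later = default_df.set_index("Date")

print(list(default_df.index))      # [0, 1]
print(list(default_df.columns))    # 'Date' is still a column of the original
print(indexed_later.index.name)    # Date
```

Because set_index returns a new object, default_df still has its integer index after the call.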

In [18]:
new_data03 = pd.read_excel(r'C:\Users\infin\OneDrive\Documentos\Python for Finance\my_data01.xlsx', index_col='Date')
new_data03

Date,Value
1947-01-01,243.164
1947-04-01,245.968
1947-07-01,249.585
1947-10-01,259.745
1948-01-01,265.742
...,...
2020-10-01,21477.597
2021-01-01,22038.226
2021-04-01,22740.959
2021-07-01,23202.344


In [19]:
data_02.set_index('Date')

Date,Value
1947-01-01,243.164
1947-04-01,245.968
1947-07-01,249.585
1947-10-01,259.745
1948-01-01,265.742
...,...
2020-10-01,21477.597
2021-01-01,22038.226
2021-04-01,22740.959
2021-07-01,23202.344
