# Using Stock Market Data in Python

**Abstract** This chapter demonstrates the different ways of accessing data on securities and the packages that are useful for analyzing that data.

The book emphasizes the Yahoo Finance API, but it also explains various other APIs that are available for analyzing data. The packages are explained and applied to the data.

## Keywords: API · Packages · Installation · Data

There are different sources of data that are viable in finance. Some of them can be accessed without creating a CSV (comma-separated values) file or an XLS (Microsoft Excel) file. The sources that can be accessed online through Python, without using either of the aforementioned file types, are APIs (application programming interfaces). An API is useful because it follows a specific protocol, with defined routines and data structures, so the same information can be retrieved each time we use Python.

## API sources

**(1) Google Finance:** Google developed an API for retrieving information from the financial markets. The API has been discontinued, so it is mentioned here only as a resource that may become available again in the future.

**(2) Yahoo Finance:** One of the most important APIs that the book will work with, given that it is free and that the information is accurate, since it is retrieved directly from the markets. Since Yahoo started its financial platform in 1997, it has grown into one of the most consulted platforms for financial decisions, accessible both in price and in ease of language.

**(3) Quandl:** The company's first move in financial data came in 2013, when it launched a million free datasets with its universal API. This created the possibility of analyzing data from other sources. In late 2018 it was acquired by Nasdaq (originally an acronym for National Association of Securities Dealers Automated Quotations), making it one of the most interesting data sources in the markets.

Although there are other interesting APIs in the market, in the book we will use two specific ones, Yahoo Finance and Quandl, because they are free and accurate. Other databases, such as Bloomberg, also have an API, but access to the data can be expensive.

For CSV and XLS files, the main databases used throughout the book are Yahoo Finance for statistical and portfolio analysis, and the World Bank and the International Monetary Fund for macroeconomic investment strategies.
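As a minimal sketch of the file-based route, the following reads a small CSV table with Pandas. The four rows are copied from the Tesla output shown later in this chapter. The in-memory buffer stands in for a file on disk, and the file name mentioned in the comment is hypothetical; both are assumptions made to keep the example self-contained.

```python
import io

import pandas as pd

# Four TSLA rows (Close and Volume only), copied from the output later in this chapter.
csv_text = """Date,Close,Volume
2015-01-02,14.620667,71466000
2015-01-05,14.006000,80527500
2015-01-06,14.085333,93928500
2015-01-07,14.063333,44526000
"""

# io.StringIO stands in for a path such as "prices.csv" (hypothetical file name).
prices = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# The Date column is now a DatetimeIndex, so rows can be looked up by date.
print(prices.loc[pd.Timestamp("2015-01-05"), "Close"])
print(prices)
```

Parsing the dates while reading means the table can later be sliced and compared by date in the same way as data that arrives through an API.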



## Most important libraries for using data in Python in the present book

The first API that will be used is the Yahoo Finance API. To handle the data accessed through it, it helps to know the following packages.

**(a) NumPy:** The numerical package for Python is one of the most important packages for handling data. NumPy provides arrays that serve as multi-dimensional containers, linear algebra routines and many other tools. Its capabilities will be demonstrated throughout the book.

**(b) Pandas:** The name of the library is derived from "panel data" and is also a play on "Python data analysis". It is one of the most powerful libraries for analyzing data. Pandas can be used to create DataFrames, slice and replace values, and build time series, to name just a few uses. Pandas will be used throughout the book.

**(c) Matplotlib:** Throughout the book Matplotlib will be used for plotting 2D graphs. Matplotlib is an excellent plotting library because of its quality, the variety of graphs that can be produced and its ease of use.

**(d) ffn:** One of the most interesting libraries for quantitative finance is ffn, which helps to access data and plot it easily. In this book, the library will be used to access portfolio data and measure performance. It complements Matplotlib when creating graphs.

**(e) TA-Lib:** An excellent library for developing technical analysis and backtesting. It will be used with portfolios and stocks. The installation can be tricky, so a suggestion on how to install it is appropriate.
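To make the roles of NumPy and Pandas more concrete, here is a small sketch that computes simple daily returns in both libraries. The closing prices are copied from the Tesla output later in this chapter. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# TSLA closing prices copied from the sample output later in this chapter.
closes = np.array([14.620667, 14.006000, 14.085333, 14.063333])

# NumPy: simple daily returns, r_t = P_t / P_(t-1) - 1, computed on the whole array at once.
returns = closes[1:] / closes[:-1] - 1

# Pandas: the same prices as a time series indexed by trading date.
dates = pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07"])
series = pd.Series(closes, index=dates, name="Close")

# pct_change() gives the same returns, aligned to the dates (the first value is NaN).
pd_returns = series.pct_change()

print(returns)
print(pd_returns)
```

The NumPy version works on plain arrays, while the Pandas version keeps each return attached to its date, which is usually more convenient for time series.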


## Using Python with the Yahoo Finance API

The first step for working with the Yahoo API is to establish which libraries are going to be used. As mentioned earlier, NumPy, Pandas and Matplotlib will be used for handling the data. The following is the command in Jupyter Notebooks:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


The ***%matplotlib inline*** command is important because it makes Jupyter render each plot directly in the notebook, below the cell that creates it. It helps the plots to display correctly and is often mentioned as necessary for the use of Matplotlib in notebooks.

The second step is to import a module called datetime and another module called pandas_datareader. The datetime module is useful for manipulating dates. This is important given that a request to the Yahoo API must specify the dates it covers. It is also useful when comparing datasets between different stock quotes.
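A short sketch of what the datetime module offers for this purpose is shown below. It uses the same start and end dates as the Tesla example in this chapter and only standard-library calls.

```python
import datetime

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2019, 1, 1)

# Subtracting two datetimes gives a timedelta with the length of the period.
span = end - start
print(span.days)  # 1461

# Datetimes compare chronologically, which helps to validate a date range.
if start >= end:
    raise ValueError("The start date must come before the end date.")

# strftime formats a date as text, here in the YYYY-MM-DD form.
print(start.strftime("%Y-%m-%d"))  # 2015-01-01

# weekday() numbers the days from Monday (0) to Sunday (6).
print(start.weekday())  # 3, a Thursday
```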

The pandas_datareader package allows us to access data from the web. The script below also imports yfinance, which is the package that performs the download from Yahoo Finance in the examples that follow.

The script in Jupyter Notebooks is as follows:



In [2]:
import pandas_datareader as pdr
import datetime
import pandas_datareader.data as web
import yfinance as yf



Once the Jupyter Notebook is set up, the Yahoo API can be accessed. The most important aspect is to set a start date and an end date for the values. It is also important to specify the stock that will be consulted by means of its ticker. A stock ticker symbol is a short code, typically one to five letters, representing the name of the company. For this example, the company Tesla will be used for analysis. The ticker for Tesla is TSLA. The information is going to be accessed from January 1, 2015 to January 1, 2019. The code is as follows:

In [3]:
start = datetime.datetime(2015,1,1)
end = datetime.datetime(2019,1,1)


tesla_data = yf.download('TSLA', start, end)

print(tesla_data.tail())

print(tesla_data.head())


[*********************100%%**********************]  1 of 1 completed

                 Open       High        Low      Close  Adj Close     Volume
Date                                                                        
2018-12-24  20.900000  20.966667  19.680000  19.692667  19.692667   83398500
2018-12-26  20.000000  21.798000  19.606001  21.739332  21.739332  122446500
2018-12-27  21.322666  21.478001  20.100000  21.075333  21.075333  128626500
2018-12-28  21.540001  22.416000  21.227333  22.257999  22.257999  149085000
2018-12-31  22.519333  22.614000  21.684000  22.186666  22.186666   94534500
                 Open       High        Low      Close  Adj Close    Volume
Date                                                                       
2015-01-02  14.858000  14.883333  14.217333  14.620667  14.620667  71466000
2015-01-05  14.303333  14.433333  13.810667  14.006000  14.006000  80527500
2015-01-06  14.004000  14.280000  13.614000  14.085333  14.085333  93928500
2015-01-07  14.223333  14.318667  13.985333  14.063333  14.063333  44526000
2015-




In the example above, two variables were created to define the period to be accessed. The start date is set to January 1, 2015 and the end date to January 1, 2019. Both variables are then passed to the call that accesses the Yahoo API.

The variable tesla_data uses the download function of the yfinance module (imported as yf) to retrieve the information for the ticker TSLA from the Yahoo API for the dates from January 1, 2015 to January 1, 2019.

Now the information is in the current Jupyter Notebook. It is important to recall that every time we shut down the Jupyter Notebook the data must be downloaded again, either by repeating the same procedure or by running the whole script. To visualize the information the following command can be executed:

tesla_data.tail( ) or tesla_data.head( )

The .tail( ) command shows the last five rows of the data, that is, the last five dates.

The .head( ) command shows the first five rows, starting from the beginning of the date range that was established.

From the Yahoo API, the opening price (Open), the highest price of the day (High), the lowest price of the day (Low), the closing price (Close), the adjusted close (Adj Close) and the volume (Volume) are displayed.
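As an illustration of how these columns can be combined, the sketch below rebuilds the first four rows of the head( ) output above as a DataFrame and derives two extra columns. The column names Range and Change are hypothetical names chosen for this example. They are not returned by the API.

```python
import pandas as pd

# First four TSLA rows, copied from the head() output above.
data = pd.DataFrame(
    {
        "Open": [14.858000, 14.303333, 14.004000, 14.223333],
        "High": [14.883333, 14.433333, 14.280000, 14.318667],
        "Low": [14.217333, 13.810667, 13.614000, 13.985333],
        "Close": [14.620667, 14.006000, 14.085333, 14.063333],
        "Volume": [71466000, 80527500, 93928500, 44526000],
    },
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07"]),
)
data.index.name = "Date"

# Intraday range: distance between the highest and lowest price of the day.
data["Range"] = data["High"] - data["Low"]

# Intraday change: closing price minus opening price of the same day.
data["Change"] = data["Close"] - data["Open"]

# Date with the highest traded volume in this small sample.
busiest_day = data["Volume"].idxmax()

print(data[["Open", "Close", "Range", "Change"]])
print(busiest_day.date())
```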

## Using Python with the Quandl API

Quandl is an excellent source of information because it offers both free and premium databases. The first step for using Quandl is to create a user account on the signup page at https://www.quandl.com/. This is important because Quandl provides an API key that is necessary for accessing different data.

Once the user is created in Quandl, the account settings show a field labeled Your API Key with a value that looks like XXXXXX. The key is personal and should not be shared with other users.
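One way to keep the key out of a notebook that may be shared is to read it from an environment variable. The following is only a sketch: the helper load_api_key and the variable name QUANDL_API_KEY are hypothetical choices made for this example, not something Quandl prescribes.

```python
import os


def load_api_key(var_name="QUANDL_API_KEY"):
    """Return the API key stored in an environment variable.

    The default variable name is a hypothetical choice for this example.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With the variable set in the operating system, the value returned by load_api_key( ) can be assigned to quandl.ApiConfig.api_key instead of typing the key into the script.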

With the key, the script in Jupyter is quite simple. The first step is to import Quandl, the second is to set the API key, and the third is to request a series, such as the petroleum prices reported by the Organization of the Petroleum Exporting Countries (OPEC) at the following Quandl address: https://www.quandl.com/data/OPEC/ORB-OPEC-Crude-Oil-Price.

The script is as follows:

In [5]:
import quandl
import matplotlib.pyplot as plt
%matplotlib inline

# Define the series and the API key
series = "NSE/OIL"
api_key = 'YOUR_API_KEY'  # replace with your personal key

# Configure the Quandl API key
quandl.ApiConfig.api_key = api_key

# Get the data for the desired series
oil_data = quandl.get(series)

print(oil_data.tail())


# Plot only the closing price
oil_data['Close'].plot()




QuandlError: (Status 403) Something went wrong. Please try again. If you continue to have problems, please contact us at connect@quandl.com.

## Using ffn for retrieving information

One of the most important libraries for retrieving information that we will use in this book is ffn. Certain processes throughout the book are simplified by this package; the main difficulty lies in interpreting the results. Therefore, this book will first develop a traditional approach and then apply certain packages for retrieving information.

The first step is to install ffn on the computer.

In [6]:
pip install ffn

Note: you may need to restart the kernel to use updated packages.


After the installation, it is important to import ffn together with certain packages that are used alongside it. The recommended packages are as follows:

In [7]:
import ffn
import datetime
import pandas_datareader as pdr
import matplotlib.pyplot as plt
%matplotlib inline


Now that ffn has been installed and imported, the process for retrieving information from Yahoo Finance is as follows:

In [8]:
stocks = ffn.get('aapl, spy, amzn', start='2019-1-1', end = '2021-1-1')
stocks.tail()

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


                  aapl         spy        amzn
Date
2020-12-24  129.047516  348.497711  158.634506
2020-12-28   133.66301  351.491577  164.197998
2020-12-29  131.883316  350.821045  166.100006
2020-12-30  130.758743  351.321564  164.292496
2020-12-31  129.751602  353.106628  162.846497


An important aspect is that ffn uses the adjusted close by default. If the interest is in accessing a different price, the process is as follows:

In [9]:
stocks = ffn.get('aapl:Close, spy:Close, amzn:Close', start='2019-1-1', end = '2021-1-1')
stocks.tail()

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


             aaplclose    spyclose   amznclose
Date
2020-12-24  131.970001       369.0  158.634506
2020-12-28  136.690002  372.170013  164.197998
2020-12-29  134.869995  371.459991  166.100006
2020-12-30  133.720001   371.98999  164.292496
2020-12-31  132.690002  373.880005  162.846497
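The difference between the two outputs above can be made explicit by dividing the adjusted close by the unadjusted close. The sketch below copies the values from both tables. The close columns are renamed to the plain tickers so that the two DataFrames line up, and the name factor is an illustrative choice. An adjusted close is generally the close restated for later corporate actions such as dividends, so a ratio below 1 indicates that some adjustment was applied.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2020-12-24", "2020-12-28", "2020-12-29", "2020-12-30", "2020-12-31"]
)

# Adjusted close (the ffn default), copied from the first table above.
adj_close = pd.DataFrame(
    {
        "aapl": [129.047516, 133.66301, 131.883316, 130.758743, 129.751602],
        "spy": [348.497711, 351.491577, 350.821045, 351.321564, 353.106628],
        "amzn": [158.634506, 164.197998, 166.100006, 164.292496, 162.846497],
    },
    index=dates,
)

# Unadjusted close, copied from the second table above (columns renamed to match).
close = pd.DataFrame(
    {
        "aapl": [131.970001, 136.690002, 134.869995, 133.720001, 132.690002],
        "spy": [369.0, 372.170013, 371.459991, 371.98999, 373.880005],
        "amzn": [158.634506, 164.197998, 166.100006, 164.292496, 162.846497],
    },
    index=dates,
)

# Ratio of adjusted to unadjusted close, column by column and date by date.
factor = adj_close / close
print(factor.round(4))
```

In this sample the ratio for amzn is exactly 1 because the two columns coincide, while aapl and spy show adjusted prices below the unadjusted ones.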
