This script uses IEX Cloud, an API that provides financial data (specifically stock data), to pull that data into dataframes and export it to CSV files.

In [None]:
!pip install iexfinance
!pip install praw
!pip install pandas
# datetime is part of Python's standard library, so it needs no install



The only potentially unfamiliar module here is iexfinance, a Python client built specifically for this API.

In [None]:
import iexfinance
import praw
import datetime
import pandas

In [None]:
import os
import requests

base_url = 'https://cloud.iexapis.com/v1'            # production endpoint (real data; paid)
sandbox_url = 'https://sandbox.iexapis.com/stable'   # sandbox endpoint (testing)
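Neither URL is used again: the cells below go through iexfinance, which reads the environment variables set in the next cell. For completeness, here is a minimal sketch of calling the REST API directly using only the standard library. It assumes IEX Cloud's `/stock/{symbol}/quote` path with the token passed as a `token` query parameter; `build_quote_url` and `fetch_quote` are hypothetical helpers, not part of iexfinance.

```python
import json
from urllib.parse import quote as url_quote, urlencode
from urllib.request import urlopen

BASE_URL = 'https://cloud.iexapis.com/v1'           # production (real data)
SANDBOX_URL = 'https://sandbox.iexapis.com/stable'  # testing

def build_quote_url(base, symbol, token):
    """Build the URL for one symbol's quote.

    Assumption: the endpoint is {base}/stock/{symbol}/quote and the API
    token goes in the `token` query parameter.
    """
    return f"{base}/stock/{url_quote(symbol)}/quote?{urlencode({'token': token})}"

def fetch_quote(base, symbol, token, timeout=10):
    """Send the request and decode the JSON body (this uses API messages)."""
    with urlopen(build_quote_url(base, symbol, token), timeout=timeout) as resp:
        return json.load(resp)

# Building the URL makes no network call:
print(build_quote_url(SANDBOX_URL, 'NDAQ', 'SandboxorSecretKey'))
# → https://sandbox.iexapis.com/stable/stock/NDAQ/quote?token=SandboxorSecretKey
```

Keeping URL construction separate from the request makes it easy to check which endpoint (sandbox or production) a call would hit before spending any messages.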


In [None]:
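os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'  # route iexfinance requests to the sandbox
os.environ['IEX_TOKEN'] = 'SandboxorSecretKey'      # placeholder: replace with your sandbox or secret token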

The token used here comes from the API's "sandbox," which is intended for testing and has no usage limits; for real data, I would use a secret key instead. I used only the sandbox, since I am not paying for the service and the free tier limits how many requests I can make. Because that limit counts messages per day rather than per second (as Reddit's API does), I did not add any sleep calls between requests; instead, I tried to keep the number of requests as low as possible. Even so, this code would not work outside the sandbox without payment; otherwise I would either edit the code to look at a single stock or pay for the service.
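Since the free tier counts messages, one way to stay efficient without sleeping is to never send the same request twice (the cells below, for instance, call `batch.get_balance_sheet()` more than once). Here is a minimal sketch of an in-memory cache; `call_once` is a hypothetical helper, not part of iexfinance, and wrapping each request in a function decorated with `functools.lru_cache` would be an alternative.

```python
from typing import Any, Callable, Dict, Hashable

# Results kept for the lifetime of the notebook session
_cache: Dict[Hashable, Any] = {}

def call_once(key: Hashable, fetch: Callable[[], Any]) -> Any:
    """Return the cached result for `key`, calling `fetch()` only the first time.

    Repeated requests for the same data are then served from memory instead
    of spending more of the API's message allowance.
    """
    if key not in _cache:
        _cache[key] = fetch()
    return _cache[key]

# Usage with the `batch` object defined below (keys are arbitrary labels):
# balance_sheet = call_once('balance_sheet', batch.get_balance_sheet)
# balance_sheet = call_once('balance_sheet', batch.get_balance_sheet)  # no new request
```

The cache is cleared whenever the kernel restarts, so it only saves messages within one session.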

In [None]:
from iexfinance.stocks import Stock

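# One Stock object covers all three tickers
batch = Stock(["NDAQ", "SPGI", "DOW"])
batch.get_balance_sheet()
batch.get_price()
batch.get_quote()
batch.get_income_statement()
# Only the last expression in a cell is displayed, so the output below is the cash flow
batch.get_cash_flow()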

{'DOW':             capitalExpenditures  cashChange  ...     subkey        updated
 2020-10-21           -993261774   863918080  ...  qryelruta  1645035880586
 
 [1 rows x 26 columns],
 'NDAQ':             capitalExpenditures  cashChange  ...     subkey        updated
 2020-10-28           -134302659  -160257370  ...  yltrreauq  1609345656693
 
 [1 rows x 26 columns],
 'SPGI':             capitalExpenditures  cashChange  ...     subkey        updated
 2020-10-23            -43436778   482383211  ...  aluyrqetr  1621840512252
 
 [1 rows x 26 columns]}

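I chose three tickers whose names correspond to the core market indices (NASDAQ, S&P 500, and Dow): NDAQ (Nasdaq Inc.), SPGI (S&P Global Inc.), and DOW (Dow Inc.). Note that these are individual companies' stocks, not the indices themselves, as the company names in the quote output below show. My main goal was to find major predictors of stock rises and falls.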

In [None]:
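with open('batchfile.csv','w') as batchfile:
  # These calls repeat the previous cell's requests, so they use more messages.
  # str() produces pandas' abbreviated display ("..." for hidden columns),
  # so this file is a readable snapshot rather than a true CSV.
  batch_data2 = str(batch.get_balance_sheet()) + ',"'
  batch_data2 += str(batch.get_price()) + '",'
  batch_data2 += str(batch.get_quote()) + ','
  batch_data2 += str(batch.get_income_statement()) + ','
  batch_data2 += str(batch.get_cash_flow()) + '\n'

  print(batch_data2)
  batchfile.write(batch_data2)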

{'NDAQ':             accountsPayable capitalSurplus  ...     subkey        updated
2020-10-28        150961839           None  ...  rlurqetya  1672686911527

[1 rows x 37 columns], 'SPGI':             accountsPayable capitalSurplus  ...     subkey        updated
2020-10-17        186402697           None  ...  terarqluy  1654704574862

[1 rows x 37 columns], 'DOW':             accountsPayable capitalSurplus  ...     subkey        updated
2020-10-20                0           None  ...  auletrqry  1644804222703

[1 rows x 37 columns]},"         NDAQ     SPGI    DOW
price  130.71  326.175  56.14",     symbol        companyName  ...  lastTradeTime isUSMarketOpen
NDAQ   NDAQ  Nasdaq Inc - 144A  ...  1685987625514           True
SPGI   SPGI     S&P Global Inc  ...  1614429913209           True
DOW     DOW            Dow Inc  ...  1664358764266           True

[3 rows x 55 columns],{'NDAQ':             costOfRevenue currency       ebit  ...   key     subkey        updated
2020-11-02      700

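This prints and saves the result of each call: the balance sheet, current price, quote (which includes opening, closing, and change figures), income statement, and cash flow for each of the three tickers.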
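Because `str()` renders pandas' abbreviated display (note the `...` in place of most columns above), the file written this way is not a real CSV and loses most of the data. A minimal alternative, assuming each call returns either a DataFrame or a dict mapping symbols to DataFrames (as the output above suggests), is to stack each result with `pd.concat` and write it with `DataFrame.to_csv`, one file per statement. `to_frame` and `export_sections` are hypothetical helper names, and the file names are illustrative.

```python
import pandas as pd

def to_frame(result):
    """Turn an iexfinance result into a single DataFrame.

    A dict of per-symbol DataFrames is stacked with the symbol as the
    outer index level; a DataFrame is returned unchanged.
    """
    if isinstance(result, dict):
        return pd.concat(result, names=['symbol'])
    return result

def export_sections(sections, prefix='batch'):
    """Write each named result to its own CSV file and return the paths."""
    paths = []
    for name, result in sections.items():
        path = f'{prefix}_{name}.csv'
        to_frame(result).to_csv(path)  # full data, every column
        paths.append(path)
    return paths

# Usage with the `batch` object defined earlier (each call costs API messages):
# export_sections({
#     'balance_sheet': batch.get_balance_sheet(),
#     'price': batch.get_price(),
#     'quote': batch.get_quote(),
#     'income_statement': batch.get_income_statement(),
#     'cash_flow': batch.get_cash_flow(),
# })
```

Writing one file per statement keeps each CSV rectangular, so it can be read back with `pd.read_csv`.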

In [None]:
from iexfinance.stocks import get_historical_data
from datetime import datetime
from datetime import date

start = datetime(2010, 1, 1)
end = date.today()

history1 = get_historical_data("NDAQ", start, end)
history2 = get_historical_data("SPGI", start, end)
history3 = get_historical_data("DOW", start, end)

history1
history2
history3  # only the last expression (DOW) is displayed below

,close,high,low,open,symbol,volume,id,key,subkey,updated,changeOverTime,marketChangeOverTime,uOpen,uClose,uHigh,uLow,uVolume,fOpen,fClose,fHigh,fLow,fVolume,label,change,changePercent
2019-03-20,51.8,54.3,51.7,53.14,DOW,2428611,IORALSITPRSHI_CCE,WOD,0,1684068259913,0,0,53.18,52,54.2,50.8,2364047,53.87,50.5,54.9,51.6,2400396,"Mar 20, 19",0,0
2019-03-21,51.09,52,48.5,52.23,DOW,1805234,COIAE_RSTLICHSPIR,WDO,0,1637518625125,-0.0166663,-0.0169348,52.22,49.05,51,48.5,1792355,51.24,50.66,50,49.3,1786730,"Mar 21, 19",-0.853501,-0.0165
2019-03-22,50,52.3,49.88,50.3,DOW,846318,RRISCCLEOH_PISIAT,WDO,0,1614263507569,-0.0243145,-0.024931,49.8,48.7,50.76,50.49,847538,50.8,49.9,50.51,48.4,870783,"Mar 22, 19",-0.388752,-0.0081
2019-03-25,49.79,50.5,50,49.7,DOW,459487,IERTCSH_OCASLIPIR,OWD,0,1675801729436,-0.013416,-0.0133241,50.7,51.5,51.5,50,444675,50.7,49.47,50.8,49,449868,"Mar 25, 19",0.567311,0.0114
2019-03-26,49.21,50.06,48.49,51,DOW,512734,IC_ITRLIORSHESACP,WOD,0,1660470595537,-0.0198433,-0.0199851,50,50.94,49.8,48.56,511396,51,49.39,50.86,49.9,514414,"Mar 26, 19",-0.309966,-0.0061
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-18,57.57,57.36,54.49,57.45,DOW,11663145,ALPTCESIORC_HSRII,WDO,0,1626196632919,0.123201,0.121575,56.91,57.14,56.31,56.51,11558806,56.01,56.73,58.14,54.55,11628727,"Dec 18, 20",1.91031,0.034
2020-12-21,57.01,57.5,53.93,55.47,DOW,4612865,IR_EIISHPRCACSTLO,DWO,0,1621902785128,0.109735,0.107669,54.87,57.27,56.5,55.66,4636300,56.32,55.93,56.3,56.39,4552633,"Dec 21, 20",-0.597766,-0.0107
2020-12-22,54.69,55.89,56.23,55.47,DOW,2300781,CITREHRIOPCASLS_I,ODW,0,1628013213589,0.101835,0.0997908,57.27,55.18,55.87,54.63,2317151,57.56,57.36,57.01,55.1,2287794,"Dec 22, 20",-0.417093,-0.0076
2020-12-23,56.47,57.42,55.09,57.47,DOW,2342739,ACERICHPTIR_SSOIL,DWO,0,1644817833964,0.101527,0.100603,56,57.29,56.8,57.37,2348041,57,56.04,56.32,56.91,2332575,"Dec 23, 20",0.083349,0.0016


This cell follows the same approach but retrieves historical daily data for each stock instead. I use datetime to set the range of dates to gather, here from January 1, 2010 through today.
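Because the full 2010-to-today range is already downloaded into `history1` through `history3`, narrower windows can be cut from it locally instead of being requested again. A minimal sketch, assuming the index holds the trading dates (as the rows above suggest); `slice_dates` is a hypothetical helper.

```python
from datetime import datetime
import pandas as pd

def slice_dates(history, start, end):
    """Return the rows of `history` between `start` and `end`, inclusive.

    The index is converted with pd.to_datetime first, so this also works if
    the dates are stored as strings, and it is sorted so that label-based
    slicing is valid.
    """
    frame = history.copy()
    frame.index = pd.to_datetime(frame.index)
    frame = frame.sort_index()
    return frame.loc[pd.Timestamp(start):pd.Timestamp(end)]

# e.g. only 2020 from the existing download, with no new request:
# history1_2020 = slice_dates(history1, datetime(2020, 1, 1), datetime(2020, 12, 31))
```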

In [None]:
with open('histfile.csv','w') as histfile:
  # Reuse the DataFrames fetched above instead of requesting them again;
  # str() writes pandas' abbreviated display, not a true CSV
  hist_data2 = str(history1) + ',"'
  hist_data2 += str(history2) + '",'
  hist_data2 += str(history3) + '\n'

  print(hist_data2)
  histfile.write(hist_data2)

             close     high      low  ...       label     change changePercent
2010-01-04   20.89       21     20.4  ...   Jan 4, 10   0.439792         0.022
2010-01-05      21    21.06     20.8  ...   Jan 5, 10  0.0608417         0.003
2010-01-06   20.75    20.99    20.38  ...   Jan 6, 10 -0.0104495       -0.0005
2010-01-07   20.41     20.4    20.77  ...   Jan 7, 10  -0.276392       -0.0139
2010-01-08   20.84     21.3     20.6  ...   Jan 8, 10   0.220112        0.0108
...            ...      ...      ...  ...         ...        ...           ...
2020-12-18   129.8   133.97   133.51  ...  Dec 18, 20   0.102925        0.0008
2020-12-21  131.87  133.822  125.044  ...  Dec 21, 20  -0.550994       -0.0043
2020-12-22   130.9   134.98   132.57  ...  Dec 22, 20    1.86393        0.0148
2020-12-23  131.22    132.4   132.79  ...  Dec 23, 20   -2.00129        -0.015
2020-12-24  134.39   133.52   130.34  ...  Dec 24, 20   0.670101        0.0054

[2772 rows x 25 columns],"             close    hig

In [None]:
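from datetime import date
from iexfinance.stocks import get_historical_intraday

# Use a new name so the `date` class is not overwritten. The value is not
# passed below, so each call returns data for the last day the market was open.
today = date.today()

current1 = get_historical_intraday("NDAQ", output_format='pandas')
current2 = get_historical_intraday("SPGI", output_format='pandas')
current3 = get_historical_intraday("DOW", output_format='pandas')

current1
current2
current3  # only the last expression (DOW) is displayed below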

,date,label,high,low,open,close,average,volume,notional,numberOfTrades
2020-12-28 09:30:00,2020-12-28,09:30 AM,,,,,,0,0.00,0
2020-12-28 09:31:00,2020-12-28,09:31 AM,,,,,,0,0.00,0
2020-12-28 09:32:00,2020-12-28,09:32 AM,55.480,56.360,56.810,57.470,55.730,186,10465.11,2
2020-12-28 09:33:00,2020-12-28,09:33 AM,56.998,56.860,57.232,56.115,55.600,2286,123898.00,19
2020-12-28 09:34:00,2020-12-28,09:34 AM,56.730,56.820,55.750,58.119,56.968,1202,69395.10,14
...,...,...,...,...,...,...,...,...,...,...
2020-12-28 15:06:00,2020-12-28,3:06 PM,56.128,55.893,56.388,55.571,57.104,207,11010.00,2
2020-12-28 15:07:00,2020-12-28,3:07 PM,55.651,55.482,57.114,56.702,55.252,578,32197.71,7
2020-12-28 15:08:00,2020-12-28,3:08 PM,55.111,55.570,56.460,57.159,55.576,860,47890.16,11
2020-12-28 15:09:00,2020-12-28,3:09 PM,56.205,56.000,56.367,54.580,56.795,405,22539.97,9


This cell tracks each stock minute by minute throughout the day. I could request data for a specific day using datetime, but here I am only interested in what the stocks did most recently, so I rely on the default, which returns data from the last day the market was open.
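The first minutes in the output above have no trades (zero volume and blank prices). A minimal sketch for dropping such minutes and summarizing the rest into 15-minute bars, assuming the intraday frame's index holds the timestamps and it has the `open`, `high`, `low`, `close`, `volume`, and `numberOfTrades` columns shown above; `to_bars` is a hypothetical helper and the 15-minute window is an arbitrary choice.

```python
import pandas as pd

def to_bars(intraday, freq='15min'):
    """Aggregate minute rows into coarser open/high/low/close/volume bars.

    Minutes with no trades are dropped first so their blank prices do not
    leak into the bars; windows with no trades at all are removed at the end.
    """
    frame = intraday.copy()
    frame.index = pd.to_datetime(frame.index)
    traded = frame[frame['numberOfTrades'] > 0]
    bars = traded.resample(freq).agg({
        'open': 'first',   # first traded minute's open in the window
        'high': 'max',
        'low': 'min',
        'close': 'last',   # last traded minute's close in the window
        'volume': 'sum',
    })
    return bars.dropna(subset=['close'])

# e.g. bars1 = to_bars(current1)
```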

In [None]:
with open('newfile.csv','w') as newfile:
  # Reuse the DataFrames fetched above instead of requesting them again;
  # str() writes pandas' abbreviated display, not a true CSV
  new_data2 = str(current1) + ',"'
  new_data2 += str(current2) + '",'
  new_data2 += str(current3) + '\n'

  print(new_data2)
  newfile.write(new_data2)

                           date     label  ...   notional  numberOfTrades
2020-12-28 09:30:00  2020-12-28  09:30 AM  ...   5750.490               2
2020-12-28 09:31:00  2020-12-28  09:31 AM  ...   7578.090               1
2020-12-28 09:32:00  2020-12-28  09:32 AM  ...   2684.000               1
2020-12-28 09:33:00  2020-12-28  09:33 AM  ...      0.000               0
2020-12-28 09:34:00  2020-12-28  09:34 AM  ...  16651.000               3
...                         ...       ...  ...        ...             ...
2020-12-28 15:06:00  2020-12-28   3:06 PM  ...   1037.260               3
2020-12-28 15:07:00  2020-12-28   3:07 PM  ...  98750.700              10
2020-12-28 15:08:00  2020-12-28   3:08 PM  ...      0.000               0
2020-12-28 15:09:00  2020-12-28   3:09 PM  ...  14162.562               3
2020-12-28 15:10:00  2020-12-28   3:10 PM  ...   1838.896               3

[341 rows x 10 columns],"                           date     label  ...    notional  numberOfTrades
2020-12-28 

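Note that this was written for the sandbox and was not tested with the production version of IEX Cloud, as that costs money. Sandbox responses also appear to be deliberately scrambled (for example, some rows above show a low above the high, and fields such as `key` contain shuffled letters), so the numbers shown should not be used for analysis.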