## Working with Tables I

## Pandas library

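The SEC publishes a JSON file that maps ticker symbols to company names and CIK numbers. Download it with `requests`: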

In [1]:
import requests

In [3]:
s = requests.get('https://www.sec.gov/files/company_tickers.json').text   # response body as one string

s[:200]   # show the first 200 characters

'{"0":{"cik_str":320193,"ticker":"AAPL","title":"Apple Inc."},"1":{"cik_str":789019,"ticker":"MSFT","title":"MICROSOFT CORP"},"2":{"cik_str":1018724,"ticker":"AMZN","title":"AMAZON COM INC"},"3":{"cik_'

In [13]:
d = requests.get('https://www.sec.gov/files/company_tickers.json').json()   # parse the JSON into a dictionary

d['0']   # show entry '0'

{'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'}

In [14]:
d['0']['ticker']

'AAPL'
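The difference between `.text` and `.json()` can be reproduced without a network connection. The sketch below is only an illustration: it uses a hard-coded string (a shortened copy of the response shown above) in place of a live request, and `json.loads` from the standard library as a stand-in for roughly what `.json()` does with the response body.

```python
import json

# Assumption: a shortened, hard-coded copy of the SEC response shown above.
s = ('{"0":{"cik_str":320193,"ticker":"AAPL","title":"Apple Inc."},'
     '"1":{"cik_str":789019,"ticker":"MSFT","title":"MICROSOFT CORP"}}')

d = json.loads(s)            # parse the JSON text into nested dictionaries

print(type(s).__name__)      # the raw text is a str
print(type(d).__name__)      # the parsed result is a dict
print(d['1']['ticker'])      # nested lookup: entry '1', then its ticker
```

The string can only be sliced character by character, while the dictionary can be queried by key.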

We can also read the SEC file with pandas.    
Import the library:

In [18]:
import pandas as pd

Use pandas to read the file:

In [19]:
pd.read_json('https://www.sec.gov/files/company_tickers.json')

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,11218,11219,11220,11221,11222,11223,11224,11225,11226,11227
cik_str,320193,789019,1018724,1652044,1318605,1326801,1293451,1577552,1046179,1067983,...,1348952,1354591,1355096,1353616,1357671,1326160,1326205,1333986,1335105,1338065
ticker,AAPL,MSFT,AMZN,GOOG,TSLA,FB,TCEHY,BABA,TSM,BRK-A,...,ELU,GEUR,QRTEP,FXS,CRTDW,DUKB,IGCIW,EQH-PA,LIXTW,DCP-PB
title,Apple Inc.,MICROSOFT CORP,AMAZON COM INC,Alphabet Inc.,"Tesla, Inc.",Facebook Inc,Tencent Holdings Ltd,Alibaba Group Holding Ltd,TAIWAN SEMICONDUCTOR MANUFACTURING CO LTD,BERKSHIRE HATHAWAY INC,...,"ENTERGY LOUISIANA, LLC","PetLife Pharmaceuticals, Inc.","Qurate Retail, Inc.",Invesco CurrencyShares Swedish Krona Trust,"Creatd, Inc.",Duke Energy CORP,"India Globalization Capital, Inc.","Equitable Holdings, Inc.","LIXTE BIOTECHNOLOGY HOLDINGS, INC.","DCP Midstream, LP"


The result has one column per company. Transpose it to get one row per company:

In [21]:
symbols = pd.read_json('https://www.sec.gov/files/company_tickers.json').transpose()
symbols

Unnamed: 0,cik_str,ticker,title
0,320193,AAPL,Apple Inc.
1,789019,MSFT,MICROSOFT CORP
2,1018724,AMZN,AMAZON COM INC
3,1652044,GOOG,Alphabet Inc.
4,1318605,TSLA,"Tesla, Inc."
...,...,...,...
11223,1326160,DUKB,Duke Energy CORP
11224,1326205,IGCIW,"India Globalization Capital, Inc."
11225,1333986,EQH-PA,"Equitable Holdings, Inc."
11226,1335105,LIXTW,"LIXTE BIOTECHNOLOGY HOLDINGS, INC."
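The reason a transpose is needed can be seen on a small dictionary with the same shape as the SEC file. This is a sketch with two hand-copied entries, not the full download; `pd.DataFrame.from_dict(..., orient='index')` is shown as one possible alternative to calling `.transpose()`.

```python
import pandas as pd

# Assumption: two entries copied by hand from the output above.
d = {
    '0': {'cik_str': 320193, 'ticker': 'AAPL', 'title': 'Apple Inc.'},
    '1': {'cik_str': 789019, 'ticker': 'MSFT', 'title': 'MICROSOFT CORP'},
}

wide = pd.DataFrame(d)                             # outer keys become columns
tall = pd.DataFrame.from_dict(d, orient='index')   # outer keys become rows

print(wide.shape)   # fields x companies
print(tall.shape)   # companies x fields
print(tall)
```

With `orient='index'` each company is a row from the start, so the later filters on `ticker` and `cik_str` work without an extra step.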


Find all rows where ticker equals BAC:

In [22]:
symbols.ticker

0          AAPL
1          MSFT
2          AMZN
3          GOOG
4          TSLA
          ...  
11223      DUKB
11224     IGCIW
11225    EQH-PA
11226     LIXTW
11227    DCP-PB
Name: ticker, Length: 11228, dtype: object

In [23]:
symbols.ticker == 'BAC'

0        False
1        False
2        False
3        False
4        False
         ...  
11223    False
11224    False
11225    False
11226    False
11227    False
Name: ticker, Length: 11228, dtype: bool

In [1]:
symbols[symbols.ticker == 'BAC']

Unnamed: 0,cik_str,ticker,title
22,70858,BAC,BANK OF AMERICA CORP /DE/
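Filtering works in two stages: the comparison produces a boolean Series, and indexing the table with that Series keeps the rows marked `True`. The sketch below shows both stages on a small stand-in for `symbols` (three rows copied from this notebook's outputs; the variable name `mask` is introduced here for illustration).

```python
import pandas as pd

# Assumption: a three-row stand-in for the full SEC table.
symbols = pd.DataFrame({
    'cik_str': [320193, 70858, 70858],
    'ticker': ['AAPL', 'BAC', 'BAC-PE'],
    'title': ['Apple Inc.', 'BANK OF AMERICA CORP /DE/', 'BANK OF AMERICA CORP /DE/'],
})

mask = symbols.ticker == 'BAC'   # boolean Series, one entry per row
print(mask.tolist())             # [False, True, False]

bac = symbols[mask]              # keep only the rows where mask is True
print(bac)
```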

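All securities issued by this firm, selected either by title or by CIK number (the notebook displays only the result of the last line):

In [31]:
symbols[symbols.title == 'BANK OF AMERICA CORP /DE/']   # select by company title
symbols[symbols.cik_str == 70858]                       # select by CIK number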

Unnamed: 0,cik_str,ticker,title
22,70858,BAC,BANK OF AMERICA CORP /DE/
8370,70858,BML-PL,BANK OF AMERICA CORP /DE/
8371,70858,BAC-PE,BANK OF AMERICA CORP /DE/
8372,70858,BML-PG,BANK OF AMERICA CORP /DE/
8373,70858,BML-PH,BANK OF AMERICA CORP /DE/
8374,70858,BML-PJ,BANK OF AMERICA CORP /DE/
8375,70858,BAC-PL,BANK OF AMERICA CORP /DE/
8376,70858,BAC-PB,BANK OF AMERICA CORP /DE/
8378,70858,BAC-PK,BANK OF AMERICA CORP /DE/
8379,70858,BAC-PC,BANK OF AMERICA CORP /DE/
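The CIK number 70858 above was typed in by hand. A minimal sketch of looking it up from the ticker instead, again on a small stand-in table (rows copied from the outputs above; the names `cik` and `same_firm` are introduced here for illustration):

```python
import pandas as pd

# Assumption: a three-row stand-in for the full SEC table.
symbols = pd.DataFrame({
    'cik_str': [320193, 70858, 70858],
    'ticker': ['AAPL', 'BAC', 'BAC-PE'],
    'title': ['Apple Inc.', 'BANK OF AMERICA CORP /DE/', 'BANK OF AMERICA CORP /DE/'],
})

# Step 1: find the CIK number that belongs to the ticker.
cik = symbols[symbols.ticker == 'BAC'].cik_str.iloc[0]

# Step 2: select every security that shares this CIK number.
same_firm = symbols[symbols.cik_str == cik]
print(same_firm)
```

Matching on the CIK number is less fragile than matching on the title, since the title has to be spelled exactly as the SEC writes it.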


In [34]:
type(symbols)

pandas.core.frame.DataFrame

## Creating a table from scratch

In [33]:
pd.DataFrame()

In [35]:
pd.DataFrame(index=['a','b','c'])

a
b
c


In [36]:
pd.DataFrame(index=['a','b','c'], columns=['A','B','C'])

Unnamed: 0,A,B,C
a,,,
b,,,
c,,,


In [42]:
t = pd.DataFrame()
t['A'] = [1,2,3]
t['B'] = [4,5,6]
t['C'] = [7,8,9]
t

Unnamed: 0,A,B,C
0,1,4,7
1,2,5,8
2,3,6,9


In [44]:
t.index = ['a','b','c']
t

Unnamed: 0,A,B,C
a,1,4,7
b,2,5,8
c,3,6,9
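The same table can also be built in a single call by passing a dictionary of columns together with the row labels. This is an alternative sketch, not a replacement for the step-by-step version above:

```python
import pandas as pd

# Keys become column names; index supplies the row labels.
t = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['a', 'b', 'c'],
)
print(t)
```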


## CSV files

In [46]:
t.to_csv('test.csv')   # save the table

In [47]:
pd.read_csv('test.csv')   # read the saved file

Unnamed: 0.1,Unnamed: 0,A,B,C
0,a,1,4,7
1,b,2,5,8
2,c,3,6,9


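Without `index_col`, the saved row labels come back as an ordinary column named `Unnamed: 0`. Pass `index_col=0` to use the first column as the index again:

In [49]:
pd.read_csv('test.csv', index_col=0)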

Unnamed: 0,A,B,C
a,1,4,7
b,2,5,8
c,3,6,9
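To see what actually ends up in the file, the round trip can be done in memory. The sketch below relies on `to_csv()` returning the CSV text when no path is given, and on `io.StringIO` to feed that text back to `read_csv`; the variable names are made up for this example.

```python
import io
import pandas as pd

t = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

csv_text = t.to_csv()    # no path given: the CSV is returned as a string
print(csv_text)          # the first header cell is empty (the unnamed index)

# Read it back twice: once as-is, once with the first column as the index.
plain = pd.read_csv(io.StringIO(csv_text))
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))
print(list(restored.index))
```

The empty header cell is why pandas invents the name `Unnamed: 0` when the file is read without `index_col`.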


## Accessing parts of tables

Read a file from URL:

In [52]:
data = pd.read_csv('http://www.janschneider.website/data/AAPL.csv', index_col='name')
data

name,value,segment,startdate,enddate,instant,filedate
CashAndCashEquivalentsAtCarryingValue,9.352000e+09,,,,2007-09-29,2009-07-22
Assets,3.957200e+10,,,,2008-09-27,2009-07-22
AvailableForSaleSecuritiesCurrent,1.023600e+10,,,,2008-09-27,2009-07-22
CommitmentsAndContingencies,0.000000e+00,,,,2008-09-27,2009-07-22
CommonStockSharesAuthorized,1.800000e+09,,,,2008-09-27,2009-07-22
...,...,...,...,...,...,...
EarningsPerShareDiluted,6.100000e-01,,2018-12-30,2019-03-30,,2020-10-30
RevenueFromContractWithCustomerExcludingAssessedTax,8.431000e+10,,2018-09-30,2018-12-29,,2020-10-30
GrossProfit,3.203100e+10,,2018-09-30,2018-12-29,,2020-10-30
NetIncomeLoss,1.996500e+10,,2018-09-30,2018-12-29,,2020-10-30


Save the table to a local file and read it back:

In [54]:
data.to_csv('data/apple.csv')

In [57]:
pd.read_csv('data/apple.csv', index_col = 0)

name,value,segment,startdate,enddate,instant,filedate
CashAndCashEquivalentsAtCarryingValue,9.352000e+09,,,,2007-09-29,2009-07-22
Assets,3.957200e+10,,,,2008-09-27,2009-07-22
AvailableForSaleSecuritiesCurrent,1.023600e+10,,,,2008-09-27,2009-07-22
CommitmentsAndContingencies,0.000000e+00,,,,2008-09-27,2009-07-22
CommonStockSharesAuthorized,1.800000e+09,,,,2008-09-27,2009-07-22
...,...,...,...,...,...,...
EarningsPerShareDiluted,6.100000e-01,,2018-12-30,2019-03-30,,2020-10-30
RevenueFromContractWithCustomerExcludingAssessedTax,8.431000e+10,,2018-09-30,2018-12-29,,2020-10-30
GrossProfit,3.203100e+10,,2018-09-30,2018-12-29,,2020-10-30
NetIncomeLoss,1.996500e+10,,2018-09-30,2018-12-29,,2020-10-30
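The `to_csv('data/apple.csv')` call above assumes that a `data` folder already exists; pandas does not create it, so the call fails if the folder is missing. One way to handle this, sketched here with `pathlib` and a throwaway temporary directory (the tiny table and all paths are made up for the example):

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: a two-row stand-in for the Apple data, with made-up values.
t = pd.DataFrame({'value': [1.0, 2.0]}, index=['Assets', 'GrossProfit'])
t.index.name = 'name'

base = Path(tempfile.mkdtemp())             # throwaway directory for the demo
folder = base / 'data'
folder.mkdir(parents=True, exist_ok=True)   # create the folder before saving

path = folder / 'apple.csv'
t.to_csv(path)                              # save the table
back = pd.read_csv(path, index_col=0)       # read it back with its index
print(back)
```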


Select rows and columns by label with `.loc`, or by integer position with `.iloc`:

> table.loc[ rows, columns ]

> table.iloc[ row numbers, column numbers ]

In [59]:
data.loc['Assets', 'value']

name
Assets    3.957200e+10
Assets    4.814000e+10
Assets    3.957200e+10
Assets    5.385100e+10
Assets    5.392600e+10
              ...     
Assets    3.204000e+11
Assets    3.173440e+11
Assets    3.385160e+11
Assets    3.238880e+11
Assets    3.385160e+11
Name: value, Length: 137, dtype: float64

Each selector (rows or columns) can be:
- a single value
- a list (in [brackets])
- a slice (with a colon, no extra brackets)
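The three selector forms can be tried on the small table `t` from earlier, rebuilt here so the sketch runs on its own. Note that in the Apple data a single label such as `'Assets'` returns many rows because that label occurs many times in the index; in this small table every label is unique.

```python
import pandas as pd

t = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['a', 'b', 'c'],
)

print(t.loc['a', 'B'])           # single row label, single column label
print(t.loc[['a', 'c'], 'B'])    # list of row labels, single column label
print(t.loc['a':'b', 'A':'B'])   # label slices include both endpoints
```

Unlike ordinary Python slices, label slices in `.loc` include the end label, which is why `'startdate':'instant'` returns the `instant` column as well.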

In [63]:
data.loc[['NetIncomeLoss','GrossProfit'], 'startdate':'instant']

name,startdate,enddate,instant
NetIncomeLoss,2008-03-30,2008-06-28,
NetIncomeLoss,2009-03-29,2009-06-27,
NetIncomeLoss,2007-09-30,2008-06-28,
NetIncomeLoss,2008-09-28,2009-06-27,
NetIncomeLoss,2006-10-01,2007-09-29,
...,...,...,...
GrossProfit,2019-09-29,2019-12-28,
GrossProfit,2019-06-30,2019-09-28,
GrossProfit,2019-03-31,2019-06-29,
GrossProfit,2018-12-30,2019-03-30,


In [68]:
data.iloc[-3:]

name,value,segment,startdate,enddate,instant,filedate
GrossProfit,32031000000.0,,2018-09-30,2018-12-29,,2020-10-30
NetIncomeLoss,19965000000.0,,2018-09-30,2018-12-29,,2020-10-30
EarningsPerShareBasic,1.05,,2018-09-30,2018-12-29,,2020-10-30


In [69]:
data.iloc[-3: , [0,2]]

name,value,startdate
GrossProfit,32031000000.0,2018-09-30
NetIncomeLoss,19965000000.0,2018-09-30
EarningsPerShareBasic,1.05,2018-09-30
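Position-based selection follows the usual Python rules: counting starts at 0, negative numbers count from the end, and the end of a slice is excluded. A small sketch on the table `t` from earlier (rebuilt here so the block is self-contained):

```python
import pandas as pd

t = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
    index=['a', 'b', 'c'],
)

last_two = t.iloc[-2:]              # last two rows, all columns
corner = t.iloc[-2:, [0, 2]]        # last two rows, first and third column
first_two = t.iloc[0:2]             # rows at positions 0 and 1 (2 is excluded)

print(last_two)
print(corner)
print(first_two)
```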


In [72]:
data.value       # attribute access
data['value']    # same column, bracket access

name
CashAndCashEquivalentsAtCarryingValue                  9.352000e+09
Assets                                                 3.957200e+10
AvailableForSaleSecuritiesCurrent                      1.023600e+10
CommitmentsAndContingencies                            0.000000e+00
CommonStockSharesAuthorized                            1.800000e+09
                                                           ...     
EarningsPerShareDiluted                                6.100000e-01
RevenueFromContractWithCustomerExcludingAssessedTax    8.431000e+10
GrossProfit                                            3.203100e+10
NetIncomeLoss                                          1.996500e+10
EarningsPerShareBasic                                  1.050000e+00
Name: value, Length: 32319, dtype: float64

In [73]:
data[['value', 'startdate']]

name,value,startdate
CashAndCashEquivalentsAtCarryingValue,9.352000e+09,
Assets,3.957200e+10,
AvailableForSaleSecuritiesCurrent,1.023600e+10,
CommitmentsAndContingencies,0.000000e+00,
CommonStockSharesAuthorized,1.800000e+09,
...,...,...
EarningsPerShareDiluted,6.100000e-01,2018-12-30
RevenueFromContractWithCustomerExcludingAssessedTax,8.431000e+10,2018-09-30
GrossProfit,3.203100e+10,2018-09-30
NetIncomeLoss,1.996500e+10,2018-09-30


In [80]:
data['value']   # a single column name returns a Series (compare with the list version below)

name
CashAndCashEquivalentsAtCarryingValue                  9.352000e+09
Assets                                                 3.957200e+10
AvailableForSaleSecuritiesCurrent                      1.023600e+10
CommitmentsAndContingencies                            0.000000e+00
CommonStockSharesAuthorized                            1.800000e+09
                                                           ...     
EarningsPerShareDiluted                                6.100000e-01
RevenueFromContractWithCustomerExcludingAssessedTax    8.431000e+10
GrossProfit                                            3.203100e+10
NetIncomeLoss                                          1.996500e+10
EarningsPerShareBasic                                  1.050000e+00
Name: value, Length: 32319, dtype: float64

In [76]:
type(data['value'])

pandas.core.series.Series

In [81]:
data[['value']]   # a list of column names returns a DataFrame, even with only one name in the list

name,value
CashAndCashEquivalentsAtCarryingValue,9.352000e+09
Assets,3.957200e+10
AvailableForSaleSecuritiesCurrent,1.023600e+10
CommitmentsAndContingencies,0.000000e+00
CommonStockSharesAuthorized,1.800000e+09
...,...
EarningsPerShareDiluted,6.100000e-01
RevenueFromContractWithCustomerExcludingAssessedTax,8.431000e+10
GrossProfit,3.203100e+10
NetIncomeLoss,1.996500e+10


In [79]:
type(data[ ['value']])

pandas.core.frame.DataFrame
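The Series versus DataFrame distinction shows up in the shape as well as in the type. A short sketch on a small made-up table:

```python
import pandas as pd

t = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

col = t['A']        # single name: one-dimensional Series
frame = t[['A']]    # list with one name: two-dimensional DataFrame

print(type(col).__name__, col.shape)       # Series (3,)
print(type(frame).__name__, frame.shape)   # DataFrame (3, 1)
```

This matters when the result is passed on: a function that expects a table with columns needs the double-bracket form.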

In [92]:
data.iloc[2:6]
data[2:6]                          # same rows as the line above

data[2:6][['value','filedate']]
data[['value','filedate']][2:6]    # same result as the line above

name,value,filedate
AvailableForSaleSecuritiesCurrent,10236000000.0,2009-07-22
CommitmentsAndContingencies,0.0,2009-07-22
CommonStockSharesAuthorized,1800000000.0,2009-07-22
CommonStockSharesIssued,888326000.0,2009-07-22


In [94]:
data[0]   # plain brackets with a single value look up a column, and there is no column named 0

KeyError: 0

In [95]:
data[:4]   # a slice in plain brackets, however, selects rows by position

name,value,segment,startdate,enddate,instant,filedate
CashAndCashEquivalentsAtCarryingValue,9352000000.0,,,,2007-09-29,2009-07-22
Assets,39572000000.0,,,,2008-09-27,2009-07-22
AvailableForSaleSecuritiesCurrent,10236000000.0,,,,2008-09-27,2009-07-22
CommitmentsAndContingencies,0.0,,,,2008-09-27,2009-07-22


In [96]:
data.iloc[0]   # .iloc[0] returns the first row

value         9.352e+09
segment             NaN
startdate           NaN
enddate             NaN
instant      2007-09-29
filedate     2009-07-22
Name: CashAndCashEquivalentsAtCarryingValue, dtype: object
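The three behaviours of plain brackets and `.iloc` can be compared side by side on a small made-up table. The `try`/`except` is there only so that the sketch keeps running after the expected `KeyError`.

```python
import pandas as pd

t = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

try:
    t[0]                       # single value: pandas looks for a column named 0
except KeyError as err:
    print('KeyError:', err)

first_rows = t[:2]             # slice: selects rows by position
first_row = t.iloc[0]          # .iloc with one integer: a single row as a Series

print(first_rows)
print(first_row)
```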

In [106]:
data.loc[data.value > 10**11]
data[data.value > 10**11]   # same result

name,value,segment,startdate,enddate,instant,filedate
Assets,1.067580e+11,,,,2011-06-25,2011-07-20
Assets,1.163710e+11,,,,2011-09-24,2011-10-26
Assets,1.039220e+11,us-gaap:UnallocatedAmountToSegmentMember,,,2011-09-24,2011-10-26
PaymentsToAcquireAvailableForSaleSecurities,1.023170e+11,,2010-09-26,2011-09-24,,2011-10-26
SalesRevenueNet,1.082490e+11,,2010-09-26,2011-09-24,,2011-10-26
...,...,...,...,...,...,...
RevenueFromContractWithCustomerExcludingAssessedTax,1.091970e+11,country:US,2019-09-29,2020-09-26,,2020-10-30
RevenueFromContractWithCustomerExcludingAssessedTax,1.022660e+11,country:US,2018-09-30,2019-09-28,,2020-10-30
RevenueFromContractWithCustomerExcludingAssessedTax,1.250100e+11,aapl:OtherCountriesMember,2019-09-29,2020-09-26,,2020-10-30
RevenueFromContractWithCustomerExcludingAssessedTax,1.142300e+11,aapl:OtherCountriesMember,2018-09-30,2019-09-28,,2020-10-30


In [105]:
data.loc[data.value > 10**11, 'value']
data[data.value > 10**11].value   # same result

name
Assets                                                 1.067580e+11
Assets                                                 1.163710e+11
Assets                                                 1.039220e+11
PaymentsToAcquireAvailableForSaleSecurities            1.023170e+11
SalesRevenueNet                                        1.082490e+11
                                                           ...     
RevenueFromContractWithCustomerExcludingAssessedTax    1.091970e+11
RevenueFromContractWithCustomerExcludingAssessedTax    1.022660e+11
RevenueFromContractWithCustomerExcludingAssessedTax    1.250100e+11
RevenueFromContractWithCustomerExcludingAssessedTax    1.142300e+11
RevenueFromContractWithCustomerExcludingAssessedTax    1.155920e+11
Name: value, Length: 1128, dtype: float64
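The two spellings above can be checked for equality, and conditions can be combined. The sketch below uses a three-row table with made-up values rather than the Apple data; combining masks with `&` (each condition in its own parentheses) goes slightly beyond what the cells above show and is included only as an illustration.

```python
import pandas as pd

# Assumption: made-up values, chosen so that two rows exceed 10**11.
data = pd.DataFrame(
    {'value': [5e10, 1.2e11, 3e11]},
    index=['Assets', 'SalesRevenueNet', 'Assets'],
)

mask = data.value > 10**11

a = data.loc[mask, 'value']    # filter rows and pick the column in one step
b = data[mask].value           # filter rows first, then pick the column
print(a.equals(b))             # True

# Combine two conditions: large values AND rows labelled 'Assets'.
big_assets = data[(data.value > 10**11) & (data.index == 'Assets')]
print(big_assets)
```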