Import stock listing info from the NASDAQ

In this exercise, you will read the file nasdaq-listings.csv with data on companies listed on the NASDAQ and then diagnose issues with the imported data. 
You will fix these issues in the next exercise.

In [1]:
import pandas as pd

In [3]:
# Import the data
nasdaq = pd.read_csv('nasdaq-listings.csv')
nasdaq

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry,Last Update
0,AAPL,Apple Inc.,141.05,7.400000e+11,1980,Technology,Computer Manufacturing,4/26/17
1,GOOGL,Alphabet Inc.,840.18,5.810000e+11,NAN,Technology,"Computer Software: Programming, Data Processing",4/24/17
2,GOOG,Alphabet Inc.,823.56,5.690000e+11,2004,Technology,"Computer Software: Programming, Data Processing",4/23/17
3,MSFT,Microsoft Corporation,64.95,5.020000e+11,1986,Technology,Computer Software: Prepackaged Software,4/26/17
4,AMZN,"Amazon.com, Inc.",884.67,4.220000e+11,1997,Consumer Services,Catalog/Specialty Distribution,4/24/17
...,...,...,...,...,...,...,...,...
1110,IFV,First Trust Dorsey Wright International Focus ...,18.78,5.380470e+08,NAN,NAN,NAN,4/23/17
1111,QCRH,"QCR Holdings, Inc.",40.85,5.374372e+08,NAN,Finance,Major Banks,4/25/17
1112,AUPH,Aurinia Pharmaceuticals Inc,6.83,5.364592e+08,NAN,Health Care,Major Pharmaceuticals,4/26/17
1113,ESND,Essendant Inc.,14.30,5.358635e+08,NAN,Consumer Services,Paper,4/23/17


In [4]:
# Inspect nasdaq
nasdaq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1115 entries, 0 to 1114
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           1115 non-null   object 
 1   Company Name           1115 non-null   object 
 2   Last Sale              1115 non-null   float64
 3   Market Capitalization  1115 non-null   float64
 4   IPO Year               1115 non-null   object 
 5   Sector                 1115 non-null   object 
 6   Industry               1115 non-null   object 
 7   Last Update            1115 non-null   object 
dtypes: float64(2), object(6)
memory usage: 69.8+ KB


When inspecting the output of nasdaq, did you notice that some of the fields contain the string 'NAN'? Even so, nasdaq.info() reports no missing values, because pandas does not treat 'NAN' as a missing-value marker by default and reads it as an ordinary string. To ensure missing values are imported correctly, you need to explicitly set the na_values argument to that string.
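The effect of na_values can be checked on a small in-memory sample. The sketch below assumes a made-up three-row CSV (csv_text is hypothetical, not the real nasdaq-listings.csv) and compares the default import with one that sets na_values='NAN'.

```python
import io

import pandas as pd

# Hypothetical three-row sample that mimics the layout of nasdaq-listings.csv
csv_text = """Stock Symbol,IPO Year,Sector
AAPL,1980,Technology
GOOGL,NAN,Technology
IFV,NAN,NAN
"""

# Default import: 'NAN' is kept as an ordinary string
raw = pd.read_csv(io.StringIO(csv_text))

# Explicit import: 'NAN' is converted to a missing value
fixed = pd.read_csv(io.StringIO(csv_text), na_values='NAN')

print(raw.isna().sum().sum())    # 0 missing values detected
print(fixed.isna().sum().sum())  # 3 missing values detected
print(fixed['IPO Year'].dtype)   # float64
```

With the missing values recognized, 'IPO Year' can be stored as a numeric column, which is why the years appear as 1980.0 rather than 1980 after the fix.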

The 'Last Update' column holds dates, but nasdaq.info() shows that it was imported with dtype object rather than datetime64. To ensure dates are imported correctly, pass the column name in a list to the parse_dates argument.
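A minimal sketch of the parse_dates behaviour, again assuming a made-up in-memory sample (csv_text is hypothetical) that uses the same month/day/two-digit-year layout as the 'Last Update' column. Depending on the pandas version, a warning about date format inference may be printed.

```python
import io

import pandas as pd

# Hypothetical sample using the same month/day/two-digit-year layout
csv_text = """Stock Symbol,Last Update
AAPL,4/26/17
GOOGL,4/24/17
"""

# Default import: the dates stay as plain strings
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the listed column is converted to datetime values
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['Last Update'])

print(pd.api.types.is_datetime64_any_dtype(raw['Last Update']))     # False
print(pd.api.types.is_datetime64_any_dtype(parsed['Last Update']))  # True
print(parsed['Last Update'].dt.year.tolist())                       # [2017, 2017]
```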

In [5]:
# Import the data, marking 'NAN' as missing and parsing 'Last Update' as dates
nasdaq = pd.read_csv('nasdaq-listings.csv', na_values='NAN', parse_dates=['Last Update'])
nasdaq

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry,Last Update
0,AAPL,Apple Inc.,141.05,7.400000e+11,1980.0,Technology,Computer Manufacturing,2017-04-26
1,GOOGL,Alphabet Inc.,840.18,5.810000e+11,,Technology,"Computer Software: Programming, Data Processing",2017-04-24
2,GOOG,Alphabet Inc.,823.56,5.690000e+11,2004.0,Technology,"Computer Software: Programming, Data Processing",2017-04-23
3,MSFT,Microsoft Corporation,64.95,5.020000e+11,1986.0,Technology,Computer Software: Prepackaged Software,2017-04-26
4,AMZN,"Amazon.com, Inc.",884.67,4.220000e+11,1997.0,Consumer Services,Catalog/Specialty Distribution,2017-04-24
...,...,...,...,...,...,...,...,...
1110,IFV,First Trust Dorsey Wright International Focus ...,18.78,5.380470e+08,,,,2017-04-23
1111,QCRH,"QCR Holdings, Inc.",40.85,5.374372e+08,,Finance,Major Banks,2017-04-25
1112,AUPH,Aurinia Pharmaceuticals Inc,6.83,5.364592e+08,,Health Care,Major Pharmaceuticals,2017-04-26
1113,ESND,Essendant Inc.,14.30,5.358635e+08,,Consumer Services,Paper,2017-04-23


In [6]:
# Inspect the data
nasdaq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1115 entries, 0 to 1114
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   Stock Symbol           1115 non-null   object        
 1   Company Name           1115 non-null   object        
 2   Last Sale              1115 non-null   float64       
 3   Market Capitalization  1115 non-null   float64       
 4   IPO Year               593 non-null    float64       
 5   Sector                 1036 non-null   object        
 6   Industry               1036 non-null   object        
 7   Last Update            1115 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(4)
memory usage: 69.8+ KB


In [9]:
# Load listing info from a single sheet
# Import the data
nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a')
nyse

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry
0,DDD,3D Systems Corporation,14.48,1.647165e+09,,Technology,Computer Software: Prepackaged Software
1,MMM,3M Company,188.65,1.127366e+11,,Health Care,Medical/Dental Instruments
2,WBAI,500.com Limited,13.96,5.793129e+08,2013.0,Consumer Services,Services-Misc. Amusement & Recreation
3,WUBA,58.com Inc.,36.11,5.225238e+09,2013.0,Technology,"Computer Software: Programming, Data Processing"
4,AHC,A.H. Belo Corporation,6.20,1.347351e+08,,Consumer Services,Newspapers/Magazines
...,...,...,...,...,...,...,...
3142,ZB^H,Zions Bancorporation,25.30,0.000000e+00,,,
3143,ZBK,Zions Bancorporation,28.86,0.000000e+00,,Finance,Major Banks
3144,ZOES,"Zoe&#39;s Kitchen, Inc.",17.07,3.325561e+08,2014.0,Consumer Services,Restaurants
3145,ZTS,Zoetis Inc.,53.10,2.610544e+10,2013.0,Health Care,Major Pharmaceuticals


In [10]:
# Inspect the data
nyse.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           3147 non-null   object 
 1   Company Name           3147 non-null   object 
 2   Last Sale              3079 non-null   float64
 3   Market Capitalization  3147 non-null   float64
 4   IPO Year               1361 non-null   float64
 5   Sector                 2177 non-null   object 
 6   Industry               2177 non-null   object 
dtypes: float64(3), object(4)
memory usage: 172.2+ KB


Load listing data from two sheets

Importing several sheets at once is just as straightforward as importing one: the sheet_names attribute of a pd.ExcelFile() object lists the names of all sheets in the workbook.

Passing a list as the sheet_name argument of pd.read_excel(), whether that list comes from the sheet_names attribute of a pd.ExcelFile() object or is typed out by hand, returns a dictionary. In this dictionary, the keys are the names of the sheets, and the values are the DataFrames containing the data from the corresponding sheet. You can extract a value from the dictionary by providing its key in brackets.

In this exercise, you will retrieve the list of stock exchanges from listings.xlsx and then use this list to read the data for all three exchanges into a dictionary. 
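The dictionary handling described above can be sketched without an Excel file. The example assumes a hand-built dictionary of tiny DataFrames as a stand-in for the result of pd.read_excel(); the variable names sheets, frames and combined are hypothetical.

```python
import pandas as pd

# Hand-built stand-in for the dictionary that pd.read_excel() returns
# when sheet_name is a list: sheet names as keys, DataFrames as values
sheets = {
    'amex': pd.DataFrame({'Stock Symbol': ['XXII', 'FAX']}),
    'nasdaq': pd.DataFrame({'Stock Symbol': ['AAPL', 'GOOGL', 'MSFT']}),
}

# Extract one sheet's DataFrame by providing its key in brackets
nasdaq_sheet = sheets['nasdaq']
print(len(nasdaq_sheet))  # 3

# Iterate over key-value pairs and tag each DataFrame with its sheet name
frames = []
for exchange, frame in sheets.items():
    frames.append(frame.assign(Exchange=exchange))

combined = pd.concat(frames)
print(combined['Exchange'].tolist())
```

Using .assign() returns a tagged copy, so the DataFrames stored in the dictionary stay unchanged.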

In [12]:
# Create pd.ExcelFile() object
xls = pd.ExcelFile('listings.xlsx')

# Extract sheet names and store in exchanges
exchanges = xls.sheet_names
exchanges

['amex', 'nasdaq', 'nyse']

In [14]:
# Create listings dictionary with all sheet data
listings = pd.read_excel(xls, sheet_name=exchanges, na_values='n/a')
listings.keys()


dict_keys(['amex', 'nasdaq', 'nyse'])

In [15]:
listings['nasdaq']

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry
0,AAPL,Apple Inc.,141.05,7.400245e+11,1980.0,Technology,Computer Manufacturing
1,GOOGL,Alphabet Inc.,840.18,5.809175e+11,,Technology,"Computer Software: Programming, Data Processing"
2,GOOG,Alphabet Inc.,823.56,5.694261e+11,2004.0,Technology,"Computer Software: Programming, Data Processing"
3,MSFT,Microsoft Corporation,64.95,5.019031e+11,1986.0,Technology,Computer Software: Prepackaged Software
4,AMZN,"Amazon.com, Inc.",884.67,4.221385e+11,1997.0,Consumer Services,Catalog/Specialty Distribution
...,...,...,...,...,...,...,...
3162,WSFSL,WSFS Financial Corporation,25.70,0.000000e+00,,Finance,Major Banks
3163,XGTIW,"XG Technology, Inc",6.56,0.000000e+00,2013.0,Consumer Durables,Telecommunications Equipment
3164,ZNWAA,Zion Oil & Gas Inc,,0.000000e+00,,Energy,Oil & Gas Production
3165,ZIONW,Zions Bancorporation,9.87,0.000000e+00,,Finance,Major Banks


In [16]:
# Inspect NASDAQ listings
listings['nasdaq'].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3167 entries, 0 to 3166
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           3167 non-null   object 
 1   Company Name           3167 non-null   object 
 2   Last Sale              3165 non-null   float64
 3   Market Capitalization  3167 non-null   float64
 4   IPO Year               1386 non-null   float64
 5   Sector                 2767 non-null   object 
 6   Industry               2767 non-null   object 
dtypes: float64(3), object(4)
memory usage: 173.3+ KB


In [17]:
# Import the NYSE and NASDAQ listings from their individual sheets
nyse = pd.read_excel('listings.xlsx', sheet_name='nyse', na_values='n/a')
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a')

# Inspect nyse and nasdaq
nyse.info()
nasdaq.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3147 entries, 0 to 3146
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           3147 non-null   object 
 1   Company Name           3147 non-null   object 
 2   Last Sale              3079 non-null   float64
 3   Market Capitalization  3147 non-null   float64
 4   IPO Year               1361 non-null   float64
 5   Sector                 2177 non-null   object 
 6   Industry               2177 non-null   object 
dtypes: float64(3), object(4)
memory usage: 172.2+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3167 entries, 0 to 3166
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           3167 non-null   object 
 1   Company Name           3167 non-null   object 
 2   Last Sale              3165 non-null   float64
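 3   Market Capitalization  3167 non-null   float64
 4   IPO Year               1386 non-null   float64
 5   Sector                 2767 non-null   object 
 6   Industry               2767 non-null   object 
dtypes: float64(3), object(4)
memory usage: 173.3+ KB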


In [18]:
# Add Exchange reference columns
nyse['Exchange'] = 'NYSE'
nasdaq['Exchange'] = 'NASDAQ'

# Concatenate DataFrames  
combined_listings = pd.concat([nyse, nasdaq]) 
combined_listings

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry,Exchange
0,DDD,3D Systems Corporation,14.48,1.647165e+09,,Technology,Computer Software: Prepackaged Software,NYSE
1,MMM,3M Company,188.65,1.127366e+11,,Health Care,Medical/Dental Instruments,NYSE
2,WBAI,500.com Limited,13.96,5.793129e+08,2013.0,Consumer Services,Services-Misc. Amusement & Recreation,NYSE
3,WUBA,58.com Inc.,36.11,5.225238e+09,2013.0,Technology,"Computer Software: Programming, Data Processing",NYSE
4,AHC,A.H. Belo Corporation,6.20,1.347351e+08,,Consumer Services,Newspapers/Magazines,NYSE
...,...,...,...,...,...,...,...,...
3162,WSFSL,WSFS Financial Corporation,25.70,0.000000e+00,,Finance,Major Banks,NASDAQ
3163,XGTIW,"XG Technology, Inc",6.56,0.000000e+00,2013.0,Consumer Durables,Telecommunications Equipment,NASDAQ
3164,ZNWAA,Zion Oil & Gas Inc,,0.000000e+00,,Energy,Oil & Gas Production,NASDAQ
3165,ZIONW,Zions Bancorporation,9.87,0.000000e+00,,Finance,Major Banks,NASDAQ


In [25]:
# Create an empty list: listings
listings = []

# Import the data
for exchange in exchanges:
    listing = pd.read_excel(xls, sheet_name=exchange, na_values='n/a')
    listing['Exchange'] = exchange
    listings.append(listing)

print(len(listings))
print(type(listings))
listings[0]

3
<class 'list'>


Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry,Exchange
0,XXII,"22nd Century Group, Inc",1.3300,1.206285e+08,,Consumer Non-Durables,Farming/Seeds/Milling,amex
1,FAX,Aberdeen Asia-Pacific Income Fund Inc,5.0000,1.266333e+09,1986.0,,,amex
2,IAF,Aberdeen Australia Equity Fund Inc,6.1500,1.398653e+08,,,,amex
3,CH,"Aberdeen Chile Fund, Inc.",7.2201,6.756346e+07,,,,amex
4,ABE,Aberdeen Emerging Markets Smaller Company Oppo...,13.3600,1.288430e+08,,,,amex
...,...,...,...,...,...,...,...,...
355,WYY,WidePoint Corporation,0.4350,3.602423e+07,,Technology,EDP Services,amex
356,WTT,"Wireless Telecom Group, Inc.",1.5200,3.380309e+07,,Capital Goods,Electrical Products,amex
357,XTNT,"Xtant Medical Holdings, Inc.",0.5300,9.589080e+06,,Health Care,Biotechnology: Biological Products (No Diagnos...,amex
358,YUMA,"Yuma Energy, Inc.",2.6150,3.190562e+07,,Energy,Oil & Gas Production,amex


In [23]:
# Concatenate the listings: listing_data
listing_data = pd.concat(listings)

listing_data

Unnamed: 0,Stock Symbol,Company Name,Last Sale,Market Capitalization,IPO Year,Sector,Industry,Exchange
0,XXII,"22nd Century Group, Inc",1.3300,1.206285e+08,,Consumer Non-Durables,Farming/Seeds/Milling,amex
1,FAX,Aberdeen Asia-Pacific Income Fund Inc,5.0000,1.266333e+09,1986.0,,,amex
2,IAF,Aberdeen Australia Equity Fund Inc,6.1500,1.398653e+08,,,,amex
3,CH,"Aberdeen Chile Fund, Inc.",7.2201,6.756346e+07,,,,amex
4,ABE,Aberdeen Emerging Markets Smaller Company Oppo...,13.3600,1.288430e+08,,,,amex
...,...,...,...,...,...,...,...,...
3142,ZB^H,Zions Bancorporation,25.3000,0.000000e+00,,,,nyse
3143,ZBK,Zions Bancorporation,28.8600,0.000000e+00,,Finance,Major Banks,nyse
3144,ZOES,"Zoe&#39;s Kitchen, Inc.",17.0700,3.325561e+08,2014.0,Consumer Services,Restaurants,nyse
3145,ZTS,Zoetis Inc.,53.1000,2.610544e+10,2013.0,Health Care,Major Pharmaceuticals,nyse


In [24]:
# Inspect the results
listing_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6674 entries, 0 to 3146
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Stock Symbol           6674 non-null   object 
 1   Company Name           6674 non-null   object 
 2   Last Sale              6590 non-null   float64
 3   Market Capitalization  6674 non-null   float64
 4   IPO Year               2852 non-null   float64
 5   Sector                 5182 non-null   object 
 6   Industry               5182 non-null   object 
 7   Exchange               6674 non-null   object 
dtypes: float64(3), object(5)
memory usage: 469.3+ KB
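
The Index line above reports 6674 entries labelled only 0 to 3146: pd.concat() keeps each DataFrame's original row labels by default, so the same labels repeat across exchanges. A minimal sketch, using small made-up frames rather than the real listings, of how the ignore_index argument renumbers the rows instead:

```python
import pandas as pd

# Small made-up frames standing in for two of the exchange listings
amex = pd.DataFrame({'Stock Symbol': ['XXII', 'FAX'], 'Exchange': 'amex'})
nyse = pd.DataFrame({'Stock Symbol': ['DDD', 'MMM', 'WBAI'], 'Exchange': 'nyse'})

# Default: each frame keeps its own labels, so labels repeat
kept = pd.concat([amex, nyse])

# ignore_index=True discards the old labels and numbers rows 0..n-1
renumbered = pd.concat([amex, nyse], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

Whether to renumber is a design choice: repeated labels are harmless for column-wise work, but label-based row selection on the combined frame can return several rows per label.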
