Section 1: Pandas Basics

In [53]:
# Import the pandas library
import pandas as pd

# Load the listings data from the CSV file
listings = pd.read_csv("listings.csv")

Check the listings data

In [54]:
listings.head()

Unnamed: 0,Symbol,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
0,A,NYSE,"Agilent Technologies, Inc.",81.68,25934696179.92,,1999.0,Capital Goods,Biotechnology: Laboratory Analytical Instruments
1,AA,NYSE,Alcoa Corporation,29.15,5407809589.6,,2016.0,Basic Industries,Aluminum
2,AABA,NASDAQ,Altaba Inc.,75.39,42781131315.3,,,Technology,EDP Services
3,AAC,NYSE,"AAC Holdings, Inc.",2.16,53141086.8,,2014.0,Health Care,Medical Specialities
4,AAL,NASDAQ,"American Airlines Group, Inc.",34.02,15276869742.96,,,Transportation,Air Freight/Delivery Services


Details on the columns
1. Symbol: A Stock Ticker Symbol is an abbreviation used to uniquely identify publicly traded shares of a particular stock on a particular stock market
2. Exchange: Marketplace in which shares are traded
3. Name: Legal Company Name
4. Last_Price: Last Trading Price (as of April 2019)
5. Market_Cap: Dollar Value of Outstanding Shares (as of April 2019). Computed as shares outstanding times current price.
6. ADR TSO: Additional Information on foreign stocks trading in the US
7. IPO_Year: Year of Initial Public Offering
8. Sector: Sector of main business activity
9. Industry: Industry of main business activity

Inspect the first 10 rows

In [55]:
listings.head(10)

Unnamed: 0,Symbol,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
0,A,NYSE,"Agilent Technologies, Inc.",81.68,25934696179.92,,1999.0,Capital Goods,Biotechnology: Laboratory Analytical Instruments
1,AA,NYSE,Alcoa Corporation,29.15,5407809589.6,,2016.0,Basic Industries,Aluminum
2,AABA,NASDAQ,Altaba Inc.,75.39,42781131315.3,,,Technology,EDP Services
3,AAC,NYSE,"AAC Holdings, Inc.",2.16,53141086.8,,2014.0,Health Care,Medical Specialities
4,AAL,NASDAQ,"American Airlines Group, Inc.",34.02,15276869742.96,,,Transportation,Air Freight/Delivery Services
5,AAMC,AMEX,Altisource Asset Management Corp,29.9,47381573.2,,,Finance,Real Estate
6,AAME,NASDAQ,Atlantic American Corporation,2.48,49983983.36,,,Finance,Life Insurance
7,AAN,NYSE,"Aaron's, Inc.",53.54,3628837653.64,,,Technology,Diversified Commercial Services
8,AAOI,NASDAQ,"Applied Optoelectronics, Inc.",12.3,244556139.9,,2013.0,Technology,Semiconductors
9,AAON,NASDAQ,"AAON, Inc.",44.88,2336955141.84,,,Capital Goods,Industrial Machinery/Components


Inspect the last 5 data rows

In [56]:
listings.tail()

Unnamed: 0,Symbol,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
6847,ZUMZ,NASDAQ,Zumiez Inc.,26.72,681919062.56,,2005.0,Consumer Services,Clothing/Shoe/Accessory Stores
6848,ZUO,NYSE,"Zuora, Inc.",19.79,2147266454.0,,2018.0,Technology,Computer Software: Prepackaged Software
6849,ZYME,NYSE,Zymeworks Inc.,15.74,504078206.26,,2017.0,Health Care,Major Pharmaceuticals
6850,ZYNE,NASDAQ,"Zynerba Pharmaceuticals, Inc.",7.85,165399468.6,,2015.0,Health Care,Major Pharmaceuticals
6851,ZYXI,NASDAQ,"Zynex, Inc.",5.02,161834880.48,,,Health Care,Biotechnology: Electromedical & Electrotherape...


The dataset contains 6852 rows, indexed from 0 to 6851.

Get critical information on the data

In [57]:
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6852 entries, 0 to 6851
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Symbol      6852 non-null   object 
 1   Exchange    6852 non-null   object 
 2   Name        6852 non-null   object 
 3   Last_Price  6745 non-null   float64
 4   Market_Cap  5954 non-null   float64
 5   ADR TSO     140 non-null    float64
 6   IPO_Year    3105 non-null   float64
 7   Sector      5309 non-null   object 
 8   Industry    5309 non-null   object 
dtypes: float64(4), object(5)
memory usage: 481.9+ KB


The info() output shows that several columns contain missing (NaN) values:
1. Last_Price -> 6852 - 6745 = 107 missing
2. Market_Cap -> 6852 - 5954 = 898 missing
3. ADR TSO -> 6852 - 140 = 6712 missing, by far the most of any column
4. IPO_Year -> 6852 - 3105 = 3747 missing
5. Sector -> 6852 - 5309 = 1543 missing
6. Industry -> 6852 - 5309 = 1543 missing
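The subtractions above can also be left to pandas. A minimal sketch, using a small stand-in frame (`sample`, a hypothetical name) built from the first five rows shown earlier rather than the full file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `listings`, built from rows shown in the head() output
sample = pd.DataFrame(
    {
        "Symbol": ["A", "AA", "AABA", "AAC", "AAL"],
        "IPO_Year": [1999.0, 2016.0, np.nan, 2014.0, np.nan],
        "Sector": ["Capital Goods", "Basic Industries", "Technology",
                   "Health Care", "Transportation"],
    }
)

# Missing values per column: total rows minus non-null count
missing = len(sample) - sample.count()

# Same result via isna(), plus the share of missing values per column
missing_alt = sample.isna().sum()
missing_share = sample.isna().mean()
```

`missing_share` is often easier to read than raw counts when columns differ a lot in how complete they are.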

In [58]:
listings.describe()

Unnamed: 0,Last_Price,Market_Cap,ADR TSO,IPO_Year
count,6745.0,5954.0,140.0,3105.0
mean,37.37,7174440219.09,58692415.09,2009.03
std,99.25,35254552925.59,162990861.9,9.34
min,0.0,0.0,24000.0,1972.0
25%,8.31,120376889.75,4271224.5,2003.0
50%,20.57,558517593.27,14069636.5,2013.0
75%,37.37,2888746939.46,39626126.5,2017.0
max,4299.99,945979473600.0,1485426467.0,2019.0


Interpretation Of Statistics:

Last Price:
1. 6745 stocks in the dataset have a last price
2. The average stock price is 37.37
3. Prices vary widely: the standard deviation is 99.25, more than twice the mean
4. The minimum and maximum show the range: the most expensive stock trades at 4299.99, while the cheapest penny stocks round to 0.00

Quartiles Analysis:
1. 25% of stock prices are below 8.31
2. 50% of stock prices are below 20.57 (the median)
3. 75% of stock prices are below 37.37
4. Only 25% of stocks are priced above 37.37. The mean (37.37) sits at the 75th percentile, well above the median, so a few very expensive stocks pull the average up

Market_Cap = Market Capitalization: value of outstanding shares
1. 5954 entries have a market cap; 898 are missing
2. The average market cap is 7.17 billion, pulled up by a small number of very large companies
3. The standard deviation is very large (35.25 billion), which reflects companies of very different sizes
4. The smallest market cap is 0, which may indicate a company that is no longer trading or a data error
5. The highest market cap is 945.98 billion

Quartiles Analysis:
1. 25% of companies have a market cap below 120 million
2. 50% of companies are below 558.5 million (the median)
3. 75% of companies are below 2.89 billion
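The quartile reading above can be reproduced on any numeric column. A rough sketch, using only the ten market caps visible in the head(10) output (so the numbers differ from the full-data statistics):

```python
import pandas as pd

# Market caps of the first ten rows shown in head(10) above
caps = pd.Series(
    [25934696179.92, 5407809589.6, 42781131315.3, 53141086.8, 15276869742.96,
     47381573.2, 49983983.36, 3628837653.64, 244556139.9, 2336955141.84],
    name="Market_Cap",
)

# 25th, 50th and 75th percentiles in one call
quartiles = caps.quantile([0.25, 0.5, 0.75])

# A mean far above the median signals a right-skewed distribution
skewed_right = caps.mean() > caps.median()
```

Comparing the mean with the median is a quick check for skew: when a few very large companies dominate, the mean moves up while the median barely changes.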

Diversity: The dataset includes companies of various sizes, ages, and stock price ranges, so it covers a broad market spectrum.

IPO Trends: Among the 3105 companies with a known IPO year, at least 75% went public in 2003 or later (the 25th percentile of IPO_Year is 2003).



In [59]:
listings.Name

0          Agilent Technologies, Inc.
1                   Alcoa Corporation
2                         Altaba Inc.
3                  AAC Holdings, Inc.
4       American Airlines Group, Inc.
                    ...              
6847                      Zumiez Inc.
6848                      Zuora, Inc.
6849                   Zymeworks Inc.
6850    Zynerba Pharmaceuticals, Inc.
6851                      Zynex, Inc.
Name: Name, Length: 6852, dtype: object

Show the name of the companies for each stock

Find the IPO year of MSFT and DIS

In [60]:
listings = pd.read_csv("listings.csv", index_col = "Symbol")

In [61]:
listings.loc[["MSFT", "DIS"], "IPO_Year":]

Unnamed: 0_level_0,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
MSFT,1986.0,Technology,Computer Software: Prepackaged Software
DIS,,Consumer Services,Television Services


Microsoft went public in 1986, while the IPO year for Disney is missing (NaN).
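The label-based selection used above can be tried on a small frame. A minimal sketch; `sample` is a hypothetical stand-in for listings, and the exchange shown for DIS is an assumption, since the source output does not include that column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `listings` indexed by Symbol
sample = pd.DataFrame(
    {
        "Symbol": ["AAPL", "MSFT", "DIS"],
        "Exchange": ["NASDAQ", "NASDAQ", "NYSE"],  # DIS exchange is assumed, not shown in the source
        "IPO_Year": [1980.0, 1986.0, np.nan],
        "Sector": ["Technology", "Technology", "Consumer Services"],
    }
).set_index("Symbol")

# Rows by label list, columns by label slice (label slices include both endpoints)
subset = sample.loc[["MSFT", "DIS"], "IPO_Year":]

# A single cell, and a check for a missing IPO year
msft_ipo = sample.loc["MSFT", "IPO_Year"]
dis_missing = pd.isna(sample.loc["DIS", "IPO_Year"])
```

Unlike positional slices, a label slice such as `"IPO_Year":` starts at the named column and includes everything to its right.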

# Analyzing Columns & Pandas Series

In [62]:
price = listings.Last_Price.copy()  # Last_Price = last trading price

In [63]:
price.head(10)

Symbol
A      81.68
AA     29.15
AABA   75.39
AAC     2.16
AAL    34.02
AAMC   29.90
AAME    2.48
AAN    53.54
AAOI   12.30
AAON   44.88
Name: Last_Price, dtype: float64

In [64]:
price.describe()

count   6745.00
mean      37.37
std       99.25
min        0.00
25%        8.31
50%       20.57
75%       37.37
max     4299.99
Name: Last_Price, dtype: float64

Interpretation
1. There are 6745 non-missing prices
2. The mean price is 37.37
3. The most expensive stock costs 4299.99
4. The standard deviation is 99.25, which means prices vary a lot between stocks

In [65]:
price.value_counts()

Last_Price
25.00     9
9.73      9
1.61      8
2.50      8
10.26     7
         ..
0.67      1
103.36    1
25.89     1
7.77      1
101.33    1
Name: count, Length: 4470, dtype: int64

The most frequent prices are 25.00 and 9.73, each appearing 9 times. With 4470 distinct values among 6745 prices, no single price is common.

In [66]:
price.value_counts(normalize = True)

Last_Price
25.00    0.00
9.73     0.00
1.61     0.00
2.50     0.00
10.26    0.00
         ... 
0.67     0.00
103.36   0.00
25.89    0.00
7.77     0.00
101.33   0.00
Name: proportion, Length: 4470, dtype: float64

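Interpretation:
1. The proportions display as 0.00 because the output is rounded to two decimals. The most frequent price, $25.00, accounts for 9 / 6745 ≈ 0.0013 of all prices, or about 1.33 of every 1,000 stocks.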
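One way to keep small proportions readable is to rescale them. A short sketch with made-up prices (`prices` below is illustrative, not the real column), followed by the arithmetic behind the 1.33 figure:

```python
import pandas as pd

# Hypothetical prices, chosen only to illustrate the calculation
prices = pd.Series([25.0, 25.0, 9.73, 1.61, 2.5, 10.26, 0.67, 103.36])

counts = prices.value_counts()
shares = prices.value_counts(normalize=True)

# Express the shares per 1,000 stocks so small proportions stay readable
per_thousand = shares * 1000

# The figure quoted above: 9 occurrences among 6745 non-missing prices
quoted = 9 / 6745 * 1000
```

Note that value_counts ignores missing values by default, which is why the denominator is 6745 rather than 6852.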


In [67]:
price.sort_values()

Symbol
CYTXZ      0.00
JASNW      0.00
VEACW      0.01
WHLRW      0.01
SNOAW      0.01
           ... 
TRNE.U      NaN
UUUU.WS     NaN
VST.WS.A    NaN
WSO.B       NaN
ZNWAA       NaN
Name: Last_Price, Length: 6852, dtype: float64

The lowest prices display as 0.00 (CYTXZ and JASNW). Missing prices (NaN) are placed at the end of the sorted series.
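Where the missing values end up is controlled by the `na_position` parameter of sort_values. A minimal sketch with a few prices from this notebook and one missing value:

```python
import numpy as np
import pandas as pd

# A few prices from this notebook plus one missing value
prices = pd.Series(
    [81.68, np.nan, 2.16, 4299.99],
    index=["A", "ZNWAA", "AAC", "SEB"],
    name="Last_Price",
)

ascending = prices.sort_values()                     # NaN placed last by default
nan_first = prices.sort_values(na_position="first")  # NaN placed first
cheapest = prices.idxmin()                           # label of the smallest non-missing price
```

`idxmin` skips missing values, so it is a direct way to get the symbol of the cheapest stock without sorting.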

In [68]:
price.sort_values(ascending = False, inplace = True)
price

Symbol
SEB        4299.99
NVR        2910.42
AMZN       1847.33
BKNG       1806.00
BAC^L      1316.70
             ...  
TRNE.U         NaN
UUUU.WS        NaN
VST.WS.A       NaN
WSO.B          NaN
ZNWAA          NaN
Name: Last_Price, Length: 6852, dtype: float64

The highest price is $4,299.99, for SEB (Seaboard Corp).

In [69]:
# Sort the series price by the index
price.sort_index(inplace = True)

In [70]:
price

Symbol
A      81.68
AA     29.15
AABA   75.39
AAC     2.16
AAL    34.02
        ... 
ZUMZ   26.72
ZUO    19.79
ZYME   15.74
ZYNE    7.85
ZYXI    5.02
Name: Last_Price, Length: 6852, dtype: float64

# Pandas Index Operations

In [71]:
listings.index

Index(['A', 'AA', 'AABA', 'AAC', 'AAL', 'AAMC', 'AAME', 'AAN', 'AAOI', 'AAON',
       ...
       'ZSAN', 'ZTEST', 'ZTO', 'ZTR', 'ZTS', 'ZUMZ', 'ZUO', 'ZYME', 'ZYNE',
       'ZYXI'],
      dtype='object', name='Symbol', length=6852)

In [72]:
listings.index.is_unique

False

Some symbols appear more than once, so the index is not unique. pandas allows duplicate index labels, but selecting such a label with .loc returns several rows instead of one.
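To see which labels are repeated, the index can be checked with `duplicated`. A small sketch; the repeated symbol "XYZ" and its prices are made up, since the source does not show which symbols are duplicated:

```python
import pandas as pd

# Hypothetical frame with one repeated label ("XYZ" is a made-up symbol)
sample = pd.DataFrame(
    {"Last_Price": [81.68, 10.0, 11.0]},
    index=pd.Index(["A", "XYZ", "XYZ"], name="Symbol"),
)

is_unique = sample.index.is_unique

# keep=False flags every occurrence of a repeated label, not just the later ones
repeated = sample[sample.index.duplicated(keep=False)]

# A repeated label returns a DataFrame from .loc instead of a single row
rows_for_label = sample.loc["XYZ"]
```

Inspecting `repeated` before any label-based lookup helps avoid surprises when a single symbol silently returns more than one row.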

In [73]:
listings.sort_values(by = "Market_Cap", ascending = False, inplace= True)
listings.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
AAPL,NASDAQ,Apple Inc.,200.62,945979473600.0,,1980.0,Technology,Computer Manufacturing
MSFT,NASDAQ,Microsoft Corporation,120.19,922123334074.74,,1986.0,Technology,Computer Software: Prepackaged Software
AMZN,NASDAQ,"Amazon.com, Inc.",1847.33,907413834783.7,,1997.0,Consumer Services,Catalog/Specialty Distribution


The listings are sorted by Market_Cap from high to low. Apple has the highest market capitalization in the data, followed by Microsoft and Amazon.

In [74]:
listings.sort_values(["IPO_Year", "Market_Cap"], ascending = [True, False], inplace = True)
listings.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
AMAT,NASDAQ,"Applied Materials, Inc.",42.17,40035886321.53,,1972.0,Technology,Semiconductors
COKE,NASDAQ,"Coca-Cola Consolidated, Inc.",294.7,2104584430.9,,1972.0,Consumer Non-Durables,Beverages (Production/Distribution)
WDFC,NASDAQ,WD-40 Company,163.6,2257237298.4,,1973.0,Basic Industries,Major Chemicals


The listings are sorted by IPO_Year (ascending) and, within each year, by Market_Cap (descending). The earliest IPO in the data is Applied Materials (AMAT) in 1972; it trades on NASDAQ in the Technology sector.

In [75]:
listings.nunique()

Exchange         3
Name          5766
Last_Price    4470
Market_Cap    5835
ADR TSO        140
IPO_Year        42
Sector          12
Industry       135
dtype: int64

The Exchange column has 3 unique values, so the data covers 3 stock exchanges.

In [76]:
listings['Exchange'].unique()

array(['NASDAQ', 'NYSE', 'AMEX'], dtype=object)

The marketplaces in the data are NASDAQ, NYSE, and AMEX.

In [77]:
listings.nlargest(n = 5, columns = "Market_Cap")

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
AAPL,NASDAQ,Apple Inc.,200.62,945979473600.0,,1980.0,Technology,Computer Manufacturing
MSFT,NASDAQ,Microsoft Corporation,120.19,922123334074.74,,1986.0,Technology,Computer Software: Prepackaged Software
AMZN,NASDAQ,"Amazon.com, Inc.",1847.33,907413834783.7,,1997.0,Consumer Services,Catalog/Specialty Distribution
GOOGL,NASDAQ,Alphabet Inc.,1206.45,838707627454.2,,,Technology,"Computer Software: Programming, Data Processing"
GOOG,NASDAQ,Alphabet Inc.,1202.16,835725277815.36,,2004.0,Technology,"Computer Software: Programming, Data Processing"


nlargest returns five listings but only four companies: Apple, Microsoft, Amazon, and Alphabet, which appears twice through its two share classes (GOOGL and GOOG).
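If the goal is a ranking of companies rather than listings, duplicates by name can be dropped first. A sketch under the assumption that Name identifies a company, using the five rows from the output above:

```python
import pandas as pd

# The five rows returned by nlargest above
top = pd.DataFrame(
    {
        "Name": ["Apple Inc.", "Microsoft Corporation", "Amazon.com, Inc.",
                 "Alphabet Inc.", "Alphabet Inc."],
        "Market_Cap": [945979473600.0, 922123334074.74, 907413834783.7,
                       838707627454.2, 835725277815.36],
    },
    index=pd.Index(["AAPL", "MSFT", "AMZN", "GOOGL", "GOOG"], name="Symbol"),
)

# Keep the largest listing per company, then rank the companies
top_companies = (
    top.sort_values("Market_Cap", ascending=False)
       .drop_duplicates(subset="Name")
       .nlargest(n=4, columns="Market_Cap")
)
```

Sorting before `drop_duplicates` matters here, because the first occurrence of each name is the one that is kept.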

In [78]:
listings.nsmallest(n = 3, columns = "Last_Price")

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
CYTXZ,NASDAQ,Cytori Therapeutics Inc.,0.0,,,,Health Care,Medical/Dental Instruments
JASNW,NASDAQ,"Jason Industries, Inc.",0.0,,,2013.0,Consumer Durables,Miscellaneous manufacturing industries
VEACW,NASDAQ,Vantage Energy Acquisition Corp.,0.01,,,2017.0,Finance,Business Services


The three stocks with the lowest prices are Cytori Therapeutics, Jason Industries, and Vantage Energy Acquisition.

### Create a mask to check the stocks that are traded on NASDAQ

In [79]:
# Boolean masks, one per exchange ('NASDAQ', 'NYSE', 'AMEX')
nas = listings.Exchange == "NASDAQ"
nyse = listings.Exchange == "NYSE"
amex = listings.Exchange == "AMEX"

In [80]:
# Mask for all stocks priced below $5
bucks = listings.Last_Price < 5

In [81]:
bucks

Symbol
AMAT     False
COKE     False
WDFC     False
AAPL     False
KLAC     False
         ...  
ZB^A     False
ZB^G     False
ZB^H     False
ZIONW    False
ZTEST    False
Name: Last_Price, Length: 6852, dtype: bool

Create a dataframe only for the companies that are traded on NASDAQ Marketplace

In [82]:
nasdaq = listings.loc[nas].copy()

In [83]:
nasdaq.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
AMAT,NASDAQ,"Applied Materials, Inc.",42.17,40035886321.53,,1972.0,Technology,Semiconductors
COKE,NASDAQ,"Coca-Cola Consolidated, Inc.",294.7,2104584430.9,,1972.0,Consumer Non-Durables,Beverages (Production/Distribution)
WDFC,NASDAQ,WD-40 Company,163.6,2257237298.4,,1973.0,Basic Industries,Major Chemicals


Create a separate dataframe for stocks traded in each marketplace

In [84]:
NYSE = listings.loc[nyse].copy()

In [85]:
AMEX = listings.loc[amex].copy()

Create a dataframe for stocks that trade below $5

In [86]:
cheap_stocks = listings.loc[bucks].copy()

In [87]:
cheap_stocks.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
LYTS,NASDAQ,LSI Industries Inc.,2.99,77408035.25,,1985.0,Consumer Durables,Building Products
FAX,AMEX,Aberdeen Asia-Pacific Income Fund Inc,4.14,1042993611.36,,1986.0,,
CYTR,NASDAQ,CytRx Corporation,0.66,22200750.66,,1986.0,Health Care,Biotechnology: Biological Products (No Diagnos...


In [88]:
cheap_stocks.sort_values(by = 'Last_Price', ascending = True, inplace = True  )

In [89]:
cheap_stocks.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
CYTXZ,NASDAQ,Cytori Therapeutics Inc.,0.0,,,,Health Care,Medical/Dental Instruments
JASNW,NASDAQ,"Jason Industries, Inc.",0.0,,,2013.0,Consumer Durables,Miscellaneous manufacturing industries
VEACW,NASDAQ,Vantage Energy Acquisition Corp.,0.01,,,2017.0,Finance,Business Services


Find the cheap stocks per Marketplace

In [90]:
nasdaq = listings[listings['Exchange'] == 'NASDAQ']  # Filter NASDAQ stocks
nyse = listings[listings['Exchange'] == 'NYSE']  # Filter NYSE stocks
amex = listings[listings['Exchange'] == 'AMEX']  # Filter AMEX stocks
bucks = nasdaq['Last_Price'] < 5    

In [91]:
nasdaq_cheap_stocks = nasdaq[bucks]

In [92]:
nyse.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
ORCL,NYSE,Oracle Corporation,53.97,184450786380.0,,1986.0,Technology,Computer Software: Prepackaged Software
MKL,NYSE,Markel Corporation,979.66,13583293513.24,,1986.0,Finance,Property-Casualty Insurers
BPL,NYSE,Buckeye Partners L.P.,33.61,5169119152.99,,1986.0,Energy,Natural Gas Distribution


In [93]:
nasdaq_cheap_stocks.info()

<class 'pandas.core.frame.DataFrame'>
Index: 852 entries, LYTS to WHLRW
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Exchange    852 non-null    object 
 1   Name        852 non-null    object 
 2   Last_Price  852 non-null    float64
 3   Market_Cap  712 non-null    float64
 4   ADR TSO     49 non-null     float64
 5   IPO_Year    425 non-null    float64
 6   Sector      843 non-null    object 
 7   Industry    843 non-null    object 
dtypes: float64(4), object(4)
memory usage: 59.9+ KB


There are 852 stocks traded on NASDAQ with a price below $5.
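The two filters (exchange and price) can also be combined into one mask. A minimal sketch on a small stand-in frame built from rows shown in this notebook; `sample` is a hypothetical name:

```python
import pandas as pd

# Rows taken from outputs in this notebook
sample = pd.DataFrame(
    {
        "Exchange": ["NASDAQ", "AMEX", "NASDAQ", "NYSE"],
        "Last_Price": [2.99, 4.14, 42.17, 53.97],
    },
    index=pd.Index(["LYTS", "FAX", "AMAT", "ORCL"], name="Symbol"),
)

# Combine conditions with & (and); each condition needs its own parentheses
cheap_nasdaq = sample.loc[(sample.Exchange == "NASDAQ") & (sample.Last_Price < 5)]

# Count of cheap stocks per exchange in one step
cheap_per_exchange = sample.loc[sample.Last_Price < 5].groupby("Exchange").size()
```

Building the mask from the same frame it is applied to avoids index alignment problems that can occur when a mask computed on one frame is reused on another.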

NA Values and Summary Statistics

In [94]:
pd.options.display.float_format = '{:.2f}'.format

Check how many values we have missing per column

In [95]:
listings.isna().sum(axis = 0)

Exchange         0
Name             0
Last_Price     107
Market_Cap     898
ADR TSO       6712
IPO_Year      3747
Sector        1543
Industry      1543
dtype: int64

Fill all NA values in the Last Price with 0

In [98]:
listings["Last_Price"] = listings["Last_Price"].fillna(0)

In [99]:
listings.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
AMAT,NASDAQ,"Applied Materials, Inc.",42.17,40035886321.53,,1972.0,Technology,Semiconductors
COKE,NASDAQ,"Coca-Cola Consolidated, Inc.",294.7,2104584430.9,,1972.0,Consumer Non-Durables,Beverages (Production/Distribution)
WDFC,NASDAQ,WD-40 Company,163.6,2257237298.4,,1973.0,Basic Industries,Major Chemicals


Keep only the rows that have an IPO_Year in a separate dataframe

In [100]:
exist_year = listings.loc[listings.IPO_Year.notna()]

In [102]:
exist_year.IPO_Year.min()

1972.0

The earliest IPO_Year in the data is 1972. We assume that a missing IPO_Year means the IPO took place before 1970 and fill the gaps with 1969.
#### (Note: a placeholder year is a simplification, not a sound way to handle missing data)

In [103]:
listings["IPO_Year"] = listings["IPO_Year"].fillna(1969)

Convert the IPO_Year column to integer

In [105]:
listings.IPO_Year = listings.IPO_Year.astype("int")
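An alternative that avoids the placeholder year is the nullable integer dtype of pandas, which stores whole numbers and missing values together. A sketch on the IPO years of the first five rows shown earlier, not a replacement for the cells above:

```python
import numpy as np
import pandas as pd

# IPO years from rows shown earlier, with missing values left in place
ipo_year = pd.Series(
    [1999.0, 2016.0, np.nan, 2014.0, np.nan],
    index=["A", "AA", "AABA", "AAC", "AAL"],
    name="IPO_Year",
)

# Nullable integer dtype: whole-number years without inventing a placeholder
ipo_nullable = ipo_year.astype("Int64")

# The placeholder approach used above, for comparison
ipo_filled = ipo_year.fillna(1969).astype("int")
```

With the placeholder, statistics such as the minimum or mean of IPO_Year are distorted by the 1969 values; with "Int64" the missing entries are simply skipped.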

Create the correlation Matrix

In [107]:
listings.corr(numeric_only = True)

Unnamed: 0,Last_Price,Market_Cap,ADR TSO,IPO_Year
Last_Price,1.0,0.31,0.07,-0.06
Market_Cap,0.31,1.0,0.47,-0.07
ADR TSO,0.07,0.47,1.0,-0.16
IPO_Year,-0.06,-0.07,-0.16,1.0


Interpretation - Business Insights
1. Last_Price is not strongly correlated with any other column; its largest correlation is 0.31 with Market_Cap, so a high share price says little about company size
2. Market_Cap has a moderate correlation with ADR TSO (0.47): companies with larger market caps might issue more ADR shares

#### Create Column Shares
It is the total number of shares outstanding

Formula: Shares = Market_Cap / Last_Price

In [108]:
listings["Shares"] = listings.Market_Cap.div(listings.Last_Price)
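Because missing prices were filled with 0 earlier, this division can hit a zero denominator. A hedged sketch of one way to guard against that; the second row ("XYZ" and its market cap) is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one regular stock, one whose missing price was filled with 0
sample = pd.DataFrame(
    {
        "Last_Price": [42.17, 0.0],
        "Market_Cap": [40035886321.53, 1000000.0],  # second value is made up
    },
    index=pd.Index(["AMAT", "XYZ"], name="Symbol"),
)

# Dividing by a price of 0 yields inf (or NaN when Market_Cap is also missing)
naive = sample.Market_Cap.div(sample.Last_Price)

# Treat a price of 0 as unknown so the result is NaN instead of inf
safe = sample.Market_Cap.div(sample.Last_Price.replace(0, np.nan))
```

Infinite values propagate into later statistics such as the mean, so converting them to NaN keeps summaries of the Shares column meaningful.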

In [110]:
listings.head(3)

Unnamed: 0_level_0,Exchange,Name,Last_Price,Market_Cap,ADR TSO,IPO_Year,Sector,Industry,Shares
Symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
AMAT,NASDAQ,"Applied Materials, Inc.",42.17,40035886321.53,,1972,Technology,Semiconductors,949392609.0
COKE,NASDAQ,"Coca-Cola Consolidated, Inc.",294.7,2104584430.9,,1972,Consumer Non-Durables,Beverages (Production/Distribution),7141447.0
WDFC,NASDAQ,WD-40 Company,163.6,2257237298.4,,1973,Basic Industries,Major Chemicals,13797294.0


Calculate the mean, max, min price for each exchange marketplace

In [112]:
listings.groupby("Exchange").Last_Price.mean()

Exchange
AMEX     24.24
NASDAQ   32.27
NYSE     43.04
Name: Last_Price, dtype: float64

In [113]:
listings.groupby("Exchange").Last_Price.max()

Exchange
AMEX     4299.99
NASDAQ   1847.33
NYSE     2910.42
Name: Last_Price, dtype: float64

In [114]:
listings.groupby("Exchange").Last_Price.min()

Exchange
AMEX     0.00
NASDAQ   0.00
NYSE     0.00
Name: Last_Price, dtype: float64
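The three separate calls above can be collapsed into a single aggregation. A minimal sketch on a handful of rows from this notebook (`sample` is a hypothetical stand-in for listings):

```python
import pandas as pd

# A few rows from the outputs in this notebook
sample = pd.DataFrame(
    {
        "Exchange": ["NYSE", "NYSE", "NASDAQ", "NASDAQ", "AMEX"],
        "Last_Price": [81.68, 29.15, 75.39, 34.02, 29.9],
    },
    index=pd.Index(["A", "AA", "AABA", "AAL", "AAMC"], name="Symbol"),
)

# One pass over the groups, one column per statistic
summary = sample.groupby("Exchange").Last_Price.agg(["mean", "max", "min"])
```

Since missing prices were filled with 0 earlier in the notebook, the per-exchange mean and minimum on the full data include those placeholder zeros.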