# Analyzing Stock Prices

In this project, we'll look at more than 10 years of stock market data from Yahoo Finance for a few hundred different stocks that are traded on the NASDAQ stock exchange.

## Introduction to the Data

Some of the data we'll be working with can be found in this repository under the `prices` folder. The `download_data.py` script can be used to download all of the stock price data. Each file is named after its stock symbol and contains the following columns:

* date – when the data is from
* close – closing price on the date
* open – opening price on the date
* high – highest price reached on the date
* low – lowest price reached on the date
* volume – number of shares traded on the date

We'll import the data and take a look at what the first few rows of the files look like.

In [1]:
import pandas as pd
import os

stock_prices = {}

for file_name in os.listdir('prices'):
    name = file_name.split('.')[0] # Removing the file extension from the name
    stock_prices[name] = pd.read_csv(os.path.join('prices', file_name))
    
stock_prices['fizz'].head()

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,13.689997,14.100001,14.100001,13.400004,264400
1,2007-01-04,13.460005,13.689997,13.689997,13.400004,122700
2,2007-01-05,13.170001,13.389996,13.940004,13.130005,86500
3,2007-01-08,12.720001,12.999997,13.089996,12.699997,109500
4,2007-01-09,12.759997,12.699997,12.900001,12.549997,48600
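
The `Unnamed: 0` column in the output above suggests that each file stores its row index as a first column with an empty header. That is an assumption based on the output, not something the files confirm. If it holds, passing `index_col=0` to `pd.read_csv` would keep that column out of the data. A minimal sketch, using two rows copied from the `fizz` output as an in-memory stand-in for a file:

```python
import io
import pandas as pd

# In-memory stand-in for prices/fizz.csv. The empty first header field is an
# assumption inferred from the "Unnamed: 0" column shown in the output above.
sample_csv = io.StringIO(
    ",date,close,open,high,low,volume\n"
    "0,2007-01-03,13.689997,14.100001,14.100001,13.400004,264400\n"
    "1,2007-01-04,13.460005,13.689997,13.689997,13.400004,122700\n"
)

# index_col=0 uses the first column as the row index instead of as data.
fizz_sample = pd.read_csv(sample_csv, index_col=0)

print(fizz_sample.columns.tolist())
```

The `date` column is left as plain strings here, so dictionary lookups such as `most_traded['2007-05-25']` later in the project would keep working.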


## Computing Aggregates

Now that we've taken a look at the data we'll be working with, we'll start by computing the average closing price of each stock, then list the ten stocks with the lowest and the ten with the highest average closing price in our dataset.

In [2]:
average_closing = {}

for stock in stock_prices:
    average_closing[stock] = stock_prices[stock]['close'].mean()

In [3]:
for stock in stock_prices:
    print(stock, average_closing[stock])

csco 23.628822402702724
bios 5.790710422779932
csbk 11.70866774131272
airt 12.430108102316591
arii 31.491413133590704
aeis 20.003212338996093
feim 8.712000000000025
bldp 2.3273861003861005
chco 39.566401530115876
amsc 13.049243415057907
asys 8.914054046332067
afam 29.43431277065635
adtn 23.847494206177636
avhi 22.406231679150594
drrx 2.3527799227799227
astc 1.4152123552123521
amkr 6.955822393436287
chy 12.45603860888028
cbmx 3.7302140003861
bobe 37.68301545907332
eqix 165.3847721150579
amswa 8.076181467181465
atrs 2.0350231660231692
bpop 17.295227834362944
alco 34.68778766100379
cdns 14.246918911196932
aegn 19.899729709266428
ardm 1.9280694980694946
abcb 17.990475994208477
cort 3.299548262548255
cash 32.26195366332041
byfc 3.4977644787644735
bnso 1.717254826254819
cvgw 29.564660229343623
brcd 7.668254826254833
chmg 26.42131274517373
arql 3.874424710424698
ancx 12.260374515443988
ccrn 10.0547413146718
daio 3.6515559845559764
esgr 114.2688533061775
dgii 10.49529343552125
emci 25.21568724

In [4]:
average_sorted = [(average_closing[stock], stock) for stock in stock_prices]
average_sorted.sort()

print('10 Minimum Average Closing Prices')
for i in range(10):
    print(average_sorted[i])    

10 Minimum Average Closing Prices
(0.8122763011583004, 'blfs')
(0.824100993822394, 'apdn')
(0.901011583011584, 'bmra')
(0.9969415324324327, 'bcli')
(1.1615408884169918, 'cyrx')
(1.204571143629345, 'clrb')
(1.2069536679536692, 'cpst')
(1.228244384585441, 'csbr')
(1.329351351351346, 'egt')
(1.398042471042472, 'aemd')


In [5]:
print('10 Maximum Average Closing Prices')
for i in range(10):
    print(average_sorted[-i-1])

10 Maximum Average Closing Prices
(275.1340775710431, 'amzn')
(257.17654040231656, 'aapl')
(230.2946601100388, 'cme')
(228.3897761598455, 'atri')
(200.25248278146725, 'fcnca')
(193.5319112447879, 'bidu')
(165.3847721150579, 'eqix')
(164.53822006139012, 'biib')
(114.2688533061775, 'esgr')
(113.28309655096503, 'bbh')
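
Sorting the whole list works well at this size. When only a handful of entries from either end are needed, the standard library's `heapq.nsmallest` and `heapq.nlargest` are one possible alternative that avoids a full sort. The sketch below uses a small hypothetical dictionary, `sample_average_closing`, with values rounded from the output above:

```python
import heapq

# Hypothetical sample; values are rounded from the averages printed above.
sample_average_closing = {
    'amzn': 275.13,
    'aapl': 257.18,
    'blfs': 0.81,
    'apdn': 0.82,
    'csco': 23.63,
}

# Same (average, symbol) tuple layout as average_sorted.
pairs = [(avg, stock) for stock, avg in sample_average_closing.items()]

lowest = heapq.nsmallest(2, pairs)   # ascending order
highest = heapq.nlargest(2, pairs)   # descending order

print(lowest)
print(highest)
```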


## Most Traded Stock Each Day

Next, we'll create a dictionary in which each key is a trading day and each value is a tuple containing the trade volume and stock symbol of the most traded stock on that day. This way we can look up any day and find out which stock was traded the most.

In [6]:
trades_each_day = {}

for stock in stock_prices:
    for index, row in stock_prices[stock].iterrows():
        day = row['date']
        volume = row['volume']
        pair = (volume, stock)
        if day not in trades_each_day:
            trades_each_day[day] = []
        trades_each_day[day].append(pair)
        
most_traded = {}

for day in trades_each_day:
    trades_each_day[day].sort()
    most_traded[day] = trades_each_day[day][-1]

In [7]:
print(most_traded['2007-05-25'])
print(most_traded['2007-06-01'])
print(most_traded['2009-06-08'])

(158239900, 'aapl')
(221315500, 'aapl')
(232913100, 'aapl')
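
The row-by-row loop above can be slow on a few hundred files. As a sketch of one alternative, the frames could be stacked with `pd.concat` and the top row per day picked with `groupby(...).idxmax()`. The names `sample_prices`, `combined` and `most_traded_sample` are hypothetical, and the two tiny frames stand in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for stock_prices, with only the columns used here.
sample_prices = {
    'aaa': pd.DataFrame({'date': ['2007-01-03', '2007-01-04'], 'volume': [100, 900]}),
    'bbb': pd.DataFrame({'date': ['2007-01-03', '2007-01-04'], 'volume': [500, 200]}),
}

# Stack all frames into one, tagging each row with its stock symbol.
combined = pd.concat(
    [frame.assign(stock=symbol) for symbol, frame in sample_prices.items()],
    ignore_index=True,
)

# idxmax returns, for each date, the row label holding the largest volume.
top_rows = combined.loc[combined.groupby('date')['volume'].idxmax()]

# Same {day: (volume, symbol)} layout as most_traded.
most_traded_sample = {
    row.date: (row.volume, row.stock) for row in top_rows.itertuples(index=False)
}

print(most_traded_sample)
```

One difference to keep in mind: if two stocks tie on volume, `idxmax` keeps the first row it encounters, whereas sorting `(volume, stock)` tuples keeps the symbol that sorts last alphabetically.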


## Searching for High Volume Trading Days

Now, we'll add up the trade volume of all stocks on each day and find the ten days with the highest total volume.

In [8]:
high_volume_days = []

for day in trades_each_day:
    volume_day = sum([volume for volume, _ in trades_each_day[day]])
    high_volume_days.append((volume_day, day))
    
high_volume_days.sort()
high_volume_days[-10:]

[(1533363200, '2008-01-24'),
 (1536176400, '2008-01-16'),
 (1553880500, '2007-11-08'),
 (1555072400, '2008-09-29'),
 (1559032100, '2008-02-07'),
 (1578877700, '2008-01-22'),
 (1599183500, '2008-10-08'),
 (1611272800, '2007-07-26'),
 (1770266900, '2008-10-10'),
 (1964583900, '2008-01-23')]
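
The same totals can also be expressed with pandas. The following is a minimal sketch, assuming the data has been arranged in a hypothetical long-format frame with one row per day and stock (the project itself keeps one frame per stock):

```python
import pandas as pd

# Hypothetical long-format sample: one row per (date, stock) pair.
sample = pd.DataFrame({
    'date':   ['2008-01-22', '2008-01-22', '2008-01-23', '2008-01-23', '2008-01-24'],
    'stock':  ['aaa', 'bbb', 'aaa', 'bbb', 'aaa'],
    'volume': [300, 200, 700, 400, 100],
})

# Total volume across all stocks for each day.
daily_volume = sample.groupby('date')['volume'].sum()

# The two busiest days, largest first.
top_days = daily_volume.nlargest(2)

print(top_days)
```

Note that `nlargest` returns the days in descending order, while the sorted list above ends with the busiest day.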

## Profitable Stocks

Next, let's find the stocks that would have been most profitable to buy and hold over the whole period.

In [9]:
profitable_stocks = []

for stock in stock_prices:
    prices = stock_prices[stock]
    initial = prices.loc[0, 'close']
    final = prices.loc[prices.shape[0] - 1, 'close']
    percentage = 100 * (final - initial) / initial
    profitable_stocks.append((percentage, stock))
    
profitable_stocks.sort()
profitable_stocks[-10:]

[(1330.0000666666667, 'achc'),
 (1339.2137535980346, 'bcli'),
 (1525.162516251625, 'cui'),
 (1549.6700659868027, 'apdn'),
 (1707.355447278503, 'anip'),
 (2230.7234281466817, 'amzn'),
 (2437.4365640858978, 'blfs'),
 (3898.6004898285596, 'arcw'),
 (4005.0000000000005, 'adxs'),
 (7483.8389225948395, 'admp')]
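
The percentages above come from `100 * (final - initial) / initial`, where `initial` and `final` are the first and last closing prices in each file. The sketch below wraps that formula in a hypothetical helper, `percent_change`, and uses positional indexing with `iloc` so it does not depend on the row labels starting at 0. It assumes, as the `fizz` output suggests, that rows are ordered from oldest to newest.

```python
import pandas as pd

def percent_change(prices):
    """Percentage change from the first to the last closing price."""
    initial = prices['close'].iloc[0]
    final = prices['close'].iloc[-1]
    return 100 * (final - initial) / initial

# Hypothetical sample whose index does not start at 0; .loc[0, 'close']
# would raise a KeyError here, while .iloc still works.
sample = pd.DataFrame({'close': [10.0, 12.5, 25.0]}, index=[5, 6, 7])

print(percent_change(sample))  # 100 * (25.0 - 10.0) / 10.0 = 150.0
```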

## Conclusion & Next Steps

In this project, we performed a basic analysis of a large amount of historical stock price data. This kind of analysis could be a good starting point for a deeper one, in which we could use more recent data to look for undervalued stocks that have the potential to be good investments.

If we wanted to continue the analysis, some next steps would be to:

* Look at what stocks would have been most profitable to short at the start of a period.
* List the stocks with the highest after-hours trading and with the largest differences in closing price vs. the next trading day.
* Test some common technical indicators that help forecast the market.
* Find time periods of steady increases in prices and steady declines in prices.
* Find the optimal day to buy each stock if held long-term.
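
As an illustration of how one of these steps might begin, here is a rough sketch for finding the longest period of steadily rising prices. It takes "steady increase" to mean consecutive days on which the close is strictly higher than the previous close, which is only one possible definition. The function name `longest_rising_streak` and the sample prices are hypothetical.

```python
def longest_rising_streak(closes):
    """Return (start, end) indices of the longest run of strictly rising closes."""
    best_start, best_end = 0, 0
    start = 0
    for i in range(1, len(closes)):
        if closes[i] <= closes[i - 1]:
            start = i  # the streak is broken, so a new one starts here
        if i - start > best_end - best_start:
            best_start, best_end = start, i
    return best_start, best_end

# Hypothetical closing prices, oldest first.
sample_closes = [13.69, 13.46, 13.17, 13.20, 13.50, 13.90, 13.10]

print(longest_rising_streak(sample_closes))  # (2, 5): 13.17 up to 13.90
```

Flipping the comparison to `>=` would find the longest steady decline instead.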

The idea for this project comes from the [DATAQUEST](https://app.dataquest.io/) **Algorithms and Data Structures** course.