# Analyzing Stock Prices From 2007-2017
## Introduction
In this project we analyze stock market data downloaded from [Yahoo Finance](https://finance.yahoo.com/) using the [yahoo_finance](https://pypi.org/project/yahoo-finance/) Python package. The data contains daily stock prices from `2007-1-1` to `2017-04-17` for stocks traded on the [NASDAQ](https://www.nasdaq.com/) stock exchange.

The goal of this project is to implement data structures to analyze stock prices more efficiently. To do this, we will mainly be working with dictionaries.

## Importing the Data

In [1]:
import pandas as pd
import os

stock_prices = {}

for fn in os.listdir("prices"):
    stock = fn.split(".")[0]  # Get stock name from file
    stock_prices[stock] = pd.read_csv(os.path.join("prices", fn))
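
One caveat with `fn.split(".")[0]`: if a file name contained more than one dot, everything after the first dot would be dropped. Below is a minimal sketch of a more defensive alternative. The helper `symbol_from_filename` and the file name `brk.b.csv` are hypothetical and are not part of the original project.

```python
import os


def symbol_from_filename(fn):
    """Hypothetical helper: strip only the final extension from a file name."""
    return os.path.splitext(fn)[0]


# For a simple name, both approaches agree
simple_split = "aapl.csv".split(".")[0]
simple_splitext = symbol_from_filename("aapl.csv")

# For a (made-up) name with an extra dot, they differ
dotted_split = "brk.b.csv".split(".")[0]
dotted_splitext = symbol_from_filename("brk.b.csv")

print(simple_split, simple_splitext)
print(dotted_split, dotted_splitext)
```

If every file in `prices` has exactly one dot in its name, the original loop behaves identically.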

To check that the data loaded correctly, let's display the first few rows for the `aapl` stock.

In [2]:
stock_prices["aapl"].head()

| Unnamed: 0 | date | close | open | high | low | volume |
|---|---|---|---|---|---|---|
| 0 | 2007-01-03 | 83.800002 | 86.289999 | 86.579999 | 81.899999 | 309579900 |
| 1 | 2007-01-04 | 85.659998 | 84.050001 | 85.949998 | 83.820003 | 211815100 |
| 2 | 2007-01-05 | 85.049997 | 85.77 | 86.199997 | 84.400002 | 208685400 |
| 3 | 2007-01-08 | 85.47 | 85.959998 | 86.529998 | 85.280003 | 199276700 |
| 4 | 2007-01-09 | 92.570003 | 86.450003 | 92.979999 | 85.15 | 837324600 |


## Computing Average Closing Prices


In [3]:
avg_closing_prices = {}

for stock in stock_prices:
    avg_closing_prices[stock] = stock_prices[stock]["close"].mean()
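
The loop above builds one dictionary entry per stock. As a rough illustration, the same pattern can be written as a dictionary comprehension. The sketch below uses plain Python lists instead of DataFrames so that it runs on its own. The `aapl` closes are the five rows displayed earlier, while the `toy` symbol and the names `toy_closes` and `toy_avg` are made up for this example.

```python
from statistics import fmean

# Stand-in for stock_prices: symbol -> list of closing prices.
# "aapl" holds the first five closes shown earlier; "toy" is invented.
toy_closes = {
    "aapl": [83.800002, 85.659998, 85.049997, 85.47, 92.570003],
    "toy": [1.0, 2.0, 3.0],
}

# One average per symbol, keyed by symbol
toy_avg = {sym: fmean(closes) for sym, closes in toy_closes.items()}

for sym, avg in sorted(toy_avg.items()):
    print(sym, round(avg, 2))
```

With the real data, the equivalent comprehension would call `.mean()` on each DataFrame's `close` column instead of `fmean`.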

In [4]:
# Display stocks in alphabetical order by converting the dictionary
# to a sorted list of (symbol, average) tuples
avg_closing_prices_alpha = sorted(avg_closing_prices.items())

for pair in avg_closing_prices_alpha:
    print(pair)

('aal', 22.07495366679537)
('aame', 2.779679536679537)
('aaon', 23.617386061776063)
('aapl', 257.1765404023166)
('aaww', 44.33160229034749)
('aaxn', 11.863907341698841)
('abax', 34.57868337992278)
('abcb', 17.990475994208495)
('abco', 47.64705796756756)
('abeo', 2.5932200772200775)
('abio', 2.2518008)
('abmd', 33.22242086100386)
('abtl', 6.233108108494208)
('acad', 13.823586874903475)
('acet', 12.655212363320464)
('acfc', 5.596733538610039)
('acgl', 63.32590737683399)
('achc', 24.04779533822394)
('achn', 5.941177606949807)
('aciw', 28.27269496023166)
('acls', 3.343806946718147)
('acnb', 17.3435289003861)
('acor', 27.47286873938224)
('acta', 11.320559837065638)
('actg', 15.997490346718147)
('acxm', 18.263061783783783)
('adbe', 51.199436284169884)
('adi', 42.24018144826255)
('admp', 1.7122164397683397)
('adp', 61.03234735559846)
('adra', 27.35145173976834)
('adrd', 22.51748262046332)
('adre', 39.14505407104247)
('adru', 22.37166796177606)
('adsk', 42.24759463281853)
('adtn', 23.847494206

## Minimum and Maximum Average Closing Prices

Next, we would like to know which stocks have the highest and lowest average closing prices. There are several ways to do this. We start by converting the original dictionary into a list of tuples, as we did before. This time, however, we put the average closing price first in each tuple, so that sorting orders the stocks by price rather than by name.
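
As a small illustration of sorting by value rather than by key, the sketch below uses four averages rounded from the alphabetical listing above. The name `sample_avgs` is made up for this example. It also shows that `min` and `max` accept a `key` function, which is one alternative when only the extremes are needed.

```python
# Rounded averages copied from the alphabetical listing (illustrative subset)
sample_avgs = {"aal": 22.07, "aame": 2.78, "aapl": 257.18, "abio": 2.25}

# Sort (symbol, average) pairs by the average instead of the symbol
by_price = sorted(sample_avgs.items(), key=lambda pair: pair[1])
print(by_price)

# min/max with a key function return the symbol directly, without sorting
lowest = min(sample_avgs, key=sample_avgs.get)
highest = max(sample_avgs, key=sample_avgs.get)
print(lowest, highest)
```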

## The long way
If we want to compare the average closing prices of all stocks, we can build the new list of tuples and sort it in increasing or decreasing order of price.

In [5]:
# Sorting average closing prices by increasing order
avg_closing_prices_inc = sorted(
    zip(avg_closing_prices.values(), avg_closing_prices.keys()))

for pair in avg_closing_prices_inc:
    print(pair)

(0.8122763011583011, 'blfs')
(0.8241009938223938, 'apdn')
(0.901011583011583, 'bmra')
(0.9969415324324326, 'bcli')
(1.1615408884169884, 'cyrx')
(1.2045711436293434, 'clrb')
(1.206953667953668, 'cpst')
(1.2282443845854416, 'csbr')
(1.3293513513513513, 'egt')
(1.3980424710424713, 'aemd')
(1.4005010393822395, 'dfbg')
(1.405298283011583, 'alqa')
(1.4116189448441248, 'cpah')
(1.4152123552123552, 'astc')
(1.4581224154440156, 'chci')
(1.494366311969112, 'ctic')
(1.5323436293436294, 'eltk')
(1.5382316602316604, 'dzsi')
(1.5475988922779922, 'cool')
(1.5946138996138997, 'cgnt')
(1.6028996138996137, 'creg')
(1.617906349034749, 'casi')
(1.7122164397683397, 'admp')
(1.7172548262548262, 'bnso')
(1.7391445949806952, 'aezs')
(1.822119691119691, 'dynt')
(1.8256061776061776, 'apps')
(1.863166023166023, 'dysl')
(1.8681738996139, 'apri')
(1.8903166015444013, 'crds')
(1.8903745173745172, 'dlhc')
(1.907691699604743, 'cur')
(1.928069498069498, 'ardm')
(1.9615839865149598, 'cpsh')
(1.9762007722007724, 'cprx')

In [6]:
# Sorting average closing prices by decreasing order
avg_closing_prices_dec = sorted(
    zip(avg_closing_prices.values(), avg_closing_prices.keys()), reverse=True)

for pair in avg_closing_prices_dec:
    print(pair)

(275.13407757104255, 'amzn')
(257.1765404023166, 'aapl')
(230.2946601100386, 'cme')
(228.38977615984555, 'atri')
(200.2524827814672, 'fcnca')
(193.53191124478766, 'bidu')
(165.3847721150579, 'eqix')
(164.53822006138998, 'biib')
(114.26885330617759, 'esgr')
(113.28309655096525, 'bbh')
(110.25166789845558, 'djco')
(104.54806553783784, 'dhil')
(103.10355984362937, 'csgp')
(97.93825093397682, 'anat')
(97.1099267011583, 'alxn')
(96.17006946409266, 'cost')
(95.49895756602318, 'cacc')
(92.2331003965251, 'amgn')
(89.39383399150579, 'bwld')
(86.29457917374518, 'ffiv')
(85.09483015984556, 'celg')
(83.70168345444017, 'algt')
(80.56527417181468, 'coke')
(77.7559074069498, 'cswc')
(76.63736287992279, 'cbrl')
(72.21778764864865, 'chdn')
(67.52742853513513, 'fisv')
(67.4280848891892, 'esrx')
(65.04237453166023, 'cern')
(64.74335521467181, 'alog')
(63.32590737683399, 'acgl')
(62.32520078146718, 'anss')
(61.98583785675676, 'chrw')
(61.03234735559846, 'adp')
(59.040315206563704, 'asml')
(58.576274124710

## The short way
If we don't want to look at all of the stocks at once, we can simply obtain the values we want by indexing and slicing the list of tuples that we just created. In this case, we will only use the `avg_closing_prices_inc` list, which is sorted by increasing order of average closing prices.

In [7]:
# Display the 3 lowest average closing prices
for price in avg_closing_prices_inc[:3]:
    print(price)

(0.8122763011583011, 'blfs')
(0.8241009938223938, 'apdn')
(0.901011583011583, 'bmra')


In [8]:
# Display the 3 highest average closing prices, highest first
for price in reversed(avg_closing_prices_inc[-3:]):
    print(price)

(275.13407757104255, 'amzn')
(257.1765404023166, 'aapl')
(230.2946601100386, 'cme')


Now we can see that `amzn`, `aapl`, and `cme` had the three highest average closing prices, while `blfs`, `apdn`, and `bmra` had the three lowest average closing prices during the same time period.
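
When only a few extremes are needed, sorting the whole list is not strictly necessary. The sketch below uses the standard library's `heapq.nsmallest` and `heapq.nlargest` as one possible alternative. This is a swapped-in technique, not the approach used in this project. The `pairs` list holds rounded values copied from the outputs above.

```python
import heapq

# Rounded (average, symbol) pairs copied from the outputs above
pairs = [
    (275.13, "amzn"), (0.81, "blfs"), (257.18, "aapl"), (0.82, "apdn"),
    (230.29, "cme"), (0.90, "bmra"), (61.03, "adp"),
]

# Three lowest averages, lowest first
lowest_three = heapq.nsmallest(3, pairs)

# Three highest averages, highest first
highest_three = heapq.nlargest(3, pairs)

print(lowest_three)
print(highest_three)
```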

## Grouping Trades by Day
Now we want to organize the trading data by date. We will do this by creating a dictionary where each key is a date and each value is a list of `(volume, stock)` pairs, one for every stock that was traded on that date.

In [9]:
trades_by_date = {}

for stock in stock_prices:
    # Each row holds one trading day for this stock
    for _, row in stock_prices[stock].iterrows():
        day = row["date"]
        # Volume comes first so that pairs compare by volume
        pair = (row["volume"], stock)
        if day not in trades_by_date:
            trades_by_date[day] = []
        trades_by_date[day].append(pair)
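
The same grouping pattern can be sketched with `collections.defaultdict`, which removes the explicit "is this key present yet" check. The records below are illustrative. The `aapl` volumes come from the rows displayed earlier, while the `toy` symbol and the names `records` and `grouped` are made up.

```python
from collections import defaultdict

# Illustrative (date, symbol, volume) records; "toy" is an invented symbol
records = [
    ("2007-01-03", "aapl", 309579900),
    ("2007-01-03", "toy", 1000),
    ("2007-01-04", "aapl", 211815100),
    ("2007-01-04", "toy", 2500),
]

# Missing keys start out as an empty list automatically
grouped = defaultdict(list)
for day, sym, volume in records:
    grouped[day].append((volume, sym))

for day in sorted(grouped):
    print(day, grouped[day])
```

On the real DataFrames, iterating over `zip(df["date"], df["volume"])` is usually faster than `iterrows()`, though for a one-off analysis either works.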

## Finding the Most Traded Stock Each Day

In [10]:
most_traded_by_day = {}

for day in trades_by_date:
    # Pairs are (volume, stock), so the largest pair is the stock
    # with the highest trade volume on that day
    most_traded_by_day[day] = max(trades_by_date[day])

## Verifying the results

In [11]:
print(most_traded_by_day['2007-01-03'])
print(most_traded_by_day['2007-01-04'])
print(most_traded_by_day['2007-01-05'])
print(most_traded_by_day['2007-01-08'])

(309579900, 'aapl')
(211815100, 'aapl')
(208685400, 'aapl')
(199276700, 'aapl')


Here we can see that `aapl` was the most traded stock on each of these four days, which matches the `aapl` volumes displayed when we first loaded the data.
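
The lookup works because Python compares tuples element by element, so `(volume, stock)` pairs are ordered by volume first and by symbol only when volumes tie. This holds whether the largest pair is found by sorting or with `max`. Here is a small sketch with made-up symbols `aaa` and `zzz`:

```python
# Illustrative pairs; "aaa" and "zzz" are invented symbols with equal volume
day_trades = [(1000, "zzz"), (309579900, "aapl"), (1000, "aaa")]

# Volume decides the order; the symbol only breaks the tie at 1000
ordered = sorted(day_trades)
top = max(day_trades)

print(ordered)
print(top)
```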

## Searching for High Volume Days

In [12]:
daily_volumes = []

for day in trades_by_date:
    # Total volume across all stocks traded on this day
    day_volume = sum(volume for volume, _ in trades_by_date[day])
    daily_volumes.append((day_volume, day))

daily_volumes.sort(reverse=True)

daily_volumes[:10]

[(1964583900, '2008-01-23'),
 (1770266900, '2008-10-10'),
 (1611272800, '2007-07-26'),
 (1599183500, '2008-10-08'),
 (1578877700, '2008-01-22'),
 (1559032100, '2008-02-07'),
 (1555072400, '2008-09-29'),
 (1553880500, '2007-11-08'),
 (1536176400, '2008-01-16'),
 (1533363200, '2008-01-24')]

Here we have the ten days with the highest total trade volume from `2007-1-1` to `2017-04-17`. The highest volume day in that span was `2008-01-23`, when the total trade volume was `1964583900`. Eight of the ten highest volume days fell in `2008`, and the other two in `2007`.
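
The per-year count can be checked with a short sketch using `collections.Counter`. The dates are copied from the output above. The names `top_days` and `days_per_year` are introduced here for illustration only.

```python
from collections import Counter

# The ten highest volume days, copied from the output above
top_days = [
    "2008-01-23", "2008-10-10", "2007-07-26", "2008-10-08", "2008-01-22",
    "2008-02-07", "2008-09-29", "2007-11-08", "2008-01-16", "2008-01-24",
]

# The first four characters of each ISO date string are the year
days_per_year = Counter(day[:4] for day in top_days)

print(days_per_year)
```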

## Finding Profitable Stocks
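Now we want to find the ten most profitable stocks from `2007-1-1` to `2017-04-17`, to see which stocks would have given us the best return on investment. Here we measure profitability as the percentage change between a stock's first and last closing price in the data.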

In [13]:
percentages = []

for stock_sym in stock_prices:
    prices = stock_prices[stock_sym]
    initial = prices["close"].iloc[0]   # First closing price in the data
    final = prices["close"].iloc[-1]    # Last closing price in the data
    percentage = 100 * (final - initial) / initial
    percentages.append((percentage, stock_sym))

percentages.sort()

percentages[-10:]

[(1330.0000666666667, 'achc'),
 (1339.2137535980346, 'bcli'),
 (1525.162516251625, 'cui'),
 (1549.6700659868027, 'apdn'),
 (1707.3554472785036, 'anip'),
 (2230.7234281466817, 'amzn'),
 (2437.4365640858978, 'blfs'),
 (3898.6004898285596, 'arcw'),
 (4005.0000000000005, 'adxs'),
 (7483.8389225948395, 'admp')]

The most profitable stock that we could have bought in `2007` was `admp`, whose closing price rose by roughly `7484` percent between its first and last day in the data.
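
To make the percentages easier to interpret, here is a minimal sketch of the same formula as a function, together with a helper that converts a percentage gain into a price multiple. Both `percent_change` and `growth_multiple` are hypothetical names introduced for this example, and the prices passed to `percent_change` are made up.

```python
def percent_change(initial, final):
    """Percentage change from an initial price to a final price."""
    return 100 * (final - initial) / initial


def growth_multiple(percentage):
    """How many times the initial price the final price is."""
    return 1 + percentage / 100


# Made-up prices: a rise from 10 to 25 and a fall from 20 to 10
gain = percent_change(10.0, 25.0)
loss = percent_change(20.0, 10.0)
print(gain, loss)

# The 4005 percent gain reported for adxs corresponds to this price multiple
adxs_multiple = growth_multiple(4005.0)
print(round(adxs_multiple, 2))
```

Read this way, a gain of about 7484 percent means the final closing price was roughly 76 times the initial one.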