## Stock Price Data

In this project we'll work with stock market data that was downloaded from Yahoo Finance using the <a href="https://pypi.org/project/yahoo-finance/">yahoo_finance</a> Python package. 

This data consists of the daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the NASDAQ stock exchange, stored in the `prices` folder. The `download_data.py` script, located in the same folder as the Jupyter notebook, was used to download all of the stock price data. Each file in the `prices` folder is named for a specific stock symbol and contains the following columns:


`date` -- date that the data is from.

`close` -- the closing price on that day, which is the price when the trading day ends.

`open` -- the opening price on that day, which is the price when the trading day starts.

`high` -- the highest price the stock reached during trading.

`low` -- the lowest price the stock reached during trading.

`volume` -- the number of shares that were traded during the day.
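Since `high` and `low` are the day's extremes, every row should satisfy `low <= open <= high` and `low <= close <= high`, and `volume` should never be negative. Here is a minimal sketch of that sanity check. The helper name `check_ohlc` and the sample rows (rounded values for illustration) are hypothetical, not part of the project's files.

```python
import pandas as pd


def check_ohlc(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True for rows consistent with OHLC semantics."""
    body_low = df[["open", "close"]].min(axis=1)   # lower of open/close per row
    body_high = df[["open", "close"]].max(axis=1)  # higher of open/close per row
    return (df["low"] <= body_low) & (df["high"] >= body_high) & (df["volume"] >= 0)


# Hypothetical rows shaped like one of the files in the prices folder.
sample = pd.DataFrame({
    "date": ["2007-01-03", "2007-01-04"],
    "close": [83.80, 85.66],
    "open": [86.29, 84.05],
    "high": [86.58, 85.95],
    "low": [81.90, 83.82],
    "volume": [309579900, 211815100],
})

valid = check_ohlc(sample)
print(valid.all())  # True
```

On real data, `sample[~valid]` would list any rows that break these rules.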


The prices are sorted in ascending order by date. Stock trading doesn't happen on certain days, like weekends and holidays, so there are gaps between dates -- we only have data for days on which trading happened.
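One way to see those gaps is to convert the `date` column to datetimes and take the difference between consecutive rows. The sketch below uses a hypothetical four-day sample rather than a real file. A difference of 3 days corresponds to a weekend (2007-01-05 was a Friday).

```python
import pandas as pd

# Hypothetical trading days: Friday 2007-01-05 is followed by Monday 2007-01-08.
dates = pd.to_datetime(pd.Series(["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08"]))

# Rows are expected to be sorted in ascending order by date.
assert dates.is_monotonic_increasing

# Calendar days between consecutive rows; values above 1 mark weekends or holidays.
gaps = dates.diff().dt.days
print(gaps.tolist())  # [nan, 1.0, 1.0, 3.0]
```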

To read in and store all of the data, we will use a dictionary where the keys are the stock symbols (the file name without the `.csv` extension) and the value associated with each key is a DataFrame holding the data from that CSV file.

For example, the `aapl.csv` data is stored under the key `"aapl"`, and its value is the DataFrame obtained by reading the CSV file with `pd.read_csv("prices/aapl.csv")`.



In [1]:
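import pandas as pd
import os

# Map each stock symbol (file name without the .csv extension) to its DataFrame.
stock_prices = {}

for fn in os.listdir("prices"):
    # Skip anything that isn't a CSV file (e.g. hidden files).
    if not fn.endswith(".csv"):
        continue
    name = os.path.splitext(fn)[0]
    stock_prices[name] = pd.read_csv(os.path.join("prices", fn))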

We chose a dictionary where the keys are the stock symbols and the values are DataFrames holding the data from the corresponding CSV files.

In [2]:
stock_prices["aapl"].head()

| Unnamed: 0 | date | close | open | high | low | volume |
|---|---|---|---|---|---|---|
| 0 | 2007-01-03 | 83.800002 | 86.289999 | 86.579999 | 81.899999 | 309579900 |
| 1 | 2007-01-04 | 85.659998 | 84.050001 | 85.949998 | 83.820003 | 211815100 |
| 2 | 2007-01-05 | 85.049997 | 85.77 | 86.199997 | 84.400002 | 208685400 |
| 3 | 2007-01-08 | 85.47 | 85.959998 | 86.529998 | 85.280003 | 199276700 |
| 4 | 2007-01-09 | 92.570003 | 86.450003 | 92.979999 | 85.15 | 837324600 |
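The `Unnamed: 0` column in this output usually means the CSV files were written together with a DataFrame's integer index (for example by `DataFrame.to_csv`, whose `index` parameter defaults to `True`). We haven't inspected `download_data.py`, so treat that as an assumption. If it holds, passing `index_col=0` to `pd.read_csv` turns that column back into the index. Here is a sketch on a hypothetical two-row CSV held in memory.

```python
import io

import pandas as pd

# Hypothetical CSV text laid out like the files in the prices folder,
# including the unnamed leading index column.
csv_text = """,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
"""

default = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))  # ['Unnamed: 0', 'date', 'close', 'open', 'high', 'low', 'volume']
print(list(indexed.columns))  # ['date', 'close', 'open', 'high', 'low', 'volume']
```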


### Computing Aggregates

#### Computing Average Closing Prices

In [3]:
avg_closing_price = {}

for stock in stock_prices:
    avg_closing_price[stock] = stock_prices[stock]["close"].mean()
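
The loop above can also be written as a dictionary comprehension, and wrapping the result in a `pd.Series` makes sorting by value straightforward. Here is a sketch with a hypothetical `toy_prices` dictionary standing in for `stock_prices`.

```python
import pandas as pd

# Hypothetical stand-in for stock_prices: symbol -> DataFrame with a "close" column.
toy_prices = {
    "aaa": pd.DataFrame({"close": [10.0, 12.0, 14.0]}),
    "bbb": pd.DataFrame({"close": [1.0, 3.0]}),
}

# One mean closing price per symbol, equivalent to the loop above.
avg_close = {sym: df["close"].mean() for sym, df in toy_prices.items()}

for sym, value in avg_close.items():
    print(sym, value)

# A Series indexed by symbol can be sorted by value directly.
avg_series = pd.Series(avg_close).sort_values()
print(avg_series.index.tolist())  # ['bbb', 'aaa']
```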
    



#### Displaying the Average Closing Prices

In [5]:
# To print every stock, loop over all of stock_prices instead:
# for stock in stock_prices:
#     print(stock, avg_closing_price[stock])

# Print only the first five stocks to keep the output short.
for stock in list(stock_prices)[:5]:
    print(stock, avg_closing_price[stock])


dgica 14.986583006177607
bdge 24.12035132432432
cvco 53.36543631042471
blkb 33.75537838185328
bbox 25.997579137451737


#### Minimum and Maximum Average Closing Prices

In [6]:
pairs = [(avg_closing_price[stock], stock) for stock in avg_closing_price]

pairs.sort()

print("Two minimum average closing prices:")
print(pairs[0], pairs[1])
print()
print("Two maximum average closing prices:")
print(pairs[-1], pairs[-2])

Two minimum average closing prices:
(0.8122763011583011, 'blfs') (0.8241009938223938, 'apdn')

Two maximum average closing prices:
(275.13407757104255, 'amzn') (257.1765404023166, 'aapl')


`amzn` and `aapl` have the highest average closing prices, while `blfs` and `apdn` have the lowest.
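Sorting every pair is fine at this scale. When only the extremes are needed, `heapq.nsmallest` and `heapq.nlargest` from the standard library return them directly. Here is a sketch on a hypothetical dictionary shaped like `avg_closing_price`. Unlike sorting `(value, symbol)` tuples, these functions don't break ties by symbol.

```python
import heapq

# Hypothetical averages in the same shape as avg_closing_price (symbol -> mean close).
averages = {"aaa": 12.0, "bbb": 2.0, "ccc": 250.0, "ddd": 0.8}

# Iterating a dict yields its keys; key=averages.get ranks them by their values.
lowest_two = heapq.nsmallest(2, averages, key=averages.get)
highest_two = heapq.nlargest(2, averages, key=averages.get)

print(lowest_two)   # ['ddd', 'bbb']
print(highest_two)  # ['ccc', 'aaa']
```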



### Organizing the Trades per Day

We are going to build a dictionary where the keys are the days and the values are lists of `(volume, stock_symbol)` pairs, one for each stock traded on that day.


In [8]:
trades_by_day = {}

for stock_sym in stock_prices:
    for index, row in stock_prices[stock_sym].iterrows():
        day = row["date"]
        volume = row["volume"]
        pair = (volume, stock_sym)
        if day not in trades_by_day:
            trades_by_day[day] = []
        trades_by_day[day].append(pair)
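
`iterrows()` builds a new Series for every row, which gets slow across several hundred files. Here is a sketch of the same grouping using `zip` over the two columns and `collections.defaultdict`, on hypothetical toy data rather than the real `stock_prices`.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for stock_prices with two symbols and overlapping days.
toy_prices = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 200]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [300, 50]}),
}

# defaultdict(list) creates an empty list on first access,
# replacing the explicit "if day not in ..." check.
trades = defaultdict(list)
for sym, df in toy_prices.items():
    # zip over the two columns avoids building a Series per row.
    for day, volume in zip(df["date"], df["volume"]):
        trades[day].append((volume, sym))

print(dict(trades))
```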


### Finding The Most Traded Stock Each Day

Build a dictionary where the keys are the days and the value for each day is the `(volume, stock_symbol)` pair of the most traded stock on that day.


In [9]:
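most_traded_by_day = {}

for day in trades_by_day:
    # max() compares (volume, symbol) tuples, so this picks the highest volume
    # without sorting (and mutating) the lists in trades_by_day.
    most_traded_by_day[day] = max(trades_by_day[day])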
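
For comparison, the same per-day maximum can be computed in pandas by stacking all files into one long table and using `groupby(...).idxmax()`. This is a swapped-in technique, not the approach used in this notebook, and it is shown on hypothetical toy data. One difference is that `idxmax` keeps the first row on a volume tie, whereas comparing `(volume, symbol)` tuples breaks ties by symbol.

```python
import pandas as pd

# Hypothetical stand-in for stock_prices.
toy_prices = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 200]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [300, 50]}),
}

# Stack every file into one long table with a "symbol" column.
long_df = pd.concat(
    [df.assign(symbol=sym) for sym, df in toy_prices.items()],
    ignore_index=True,
)

# idxmax returns the row label of the largest volume within each date.
top_rows = long_df.loc[long_df.groupby("date")["volume"].idxmax()]

# Rebuild the {day: (volume, symbol)} shape used above.
most_traded = dict(zip(top_rows["date"], zip(top_rows["volume"], top_rows["symbol"])))
print(most_traded)
```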

#### Verifying a Few of the Results

In [13]:
print(most_traded_by_day['2007-01-03'])
print(most_traded_by_day['2007-01-04'])
print(most_traded_by_day['2007-01-05'])
print(most_traded_by_day['2007-01-08'])

(309579900, 'aapl')
(211815100, 'aapl')
(208685400, 'aapl')
(199276700, 'aapl')


### Searching For High Volume Days

In [23]:
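total_volume_by_day = []

for day in trades_by_day:
    # Sum the volumes of every stock traded on this day.
    day_volume = sum(volume for volume, _ in trades_by_day[day])
    total_volume_by_day.append((day_volume, day))

total_volume_by_day.sort()

# The ten highest-volume days, in ascending order of total volume.
total_volume_by_day[-10:]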

[(1533363200, '2008-01-24'),
 (1536176400, '2008-01-16'),
 (1553880500, '2007-11-08'),
 (1555072400, '2008-09-29'),
 (1559032100, '2008-02-07'),
 (1578877700, '2008-01-22'),
 (1599183500, '2008-10-08'),
 (1611272800, '2007-07-26'),
 (1770266900, '2008-10-10'),
 (1964583900, '2008-01-23')]
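
With a long table of `(date, symbol, volume)` rows, pandas can produce the same ranking via `groupby` and `Series.nlargest`. Here is a sketch on a hypothetical `long_df`. Note that `nlargest` returns the busiest day first, whereas the slice `total_volume_by_day[-10:]` lists days in ascending order.

```python
import pandas as pd

# Hypothetical long table: one row per (date, symbol) with that day's volume.
long_df = pd.DataFrame({
    "date": ["2007-01-03", "2007-01-03", "2007-01-04", "2007-01-04", "2007-01-05"],
    "symbol": ["aaa", "bbb", "aaa", "bbb", "aaa"],
    "volume": [100, 300, 200, 50, 500],
})

# Total shares traded per day, then the two busiest days, largest first.
daily_volume = long_df.groupby("date")["volume"].sum()
busiest = daily_volume.nlargest(2)
print(busiest.to_dict())
```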

### Finding Profitable Stocks

In [33]:
most_profitable = []

for stock_sym in stock_prices:
    close = stock_prices[stock_sym]["close"]
    # First and last closing prices in the file (rows are sorted by date).
    initial = close.iloc[0]
    final = close.iloc[-1]
    percentage_diff = 100 * ((final - initial) / initial)
    most_profitable.append((percentage_diff, stock_sym))

most_profitable.sort()
most_profitable[-10:]

[(1330.0000666666667, 'achc'),
 (1339.2137535980346, 'bcli'),
 (1525.1625162516252, 'cui'),
 (1549.6700659868025, 'apdn'),
 (1707.3554472785033, 'anip'),
 (2230.7234281466817, 'amzn'),
 (2437.4365640858978, 'blfs'),
 (3898.60048982856, 'arcw'),
 (4005.0000000000005, 'adxs'),
 (7483.8389225948395, 'admp')]

Of these stocks, the most profitable to buy in 2007 and hold until the end of the data would have been `ADMP`: its closing price rose by about 7,484%, roughly a 76-fold increase, to a final closing price of 4.43.
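The percentage figures come from `100 * (final - initial) / initial`, so a change of p percent means the price was multiplied by `1 + p / 100`. Here is a sketch of that calculation. The helper name `percent_change` and the prices are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd


def percent_change(close: pd.Series) -> float:
    """Percentage change from the first to the last closing price."""
    initial = close.iloc[0]
    final = close.iloc[-1]
    return 100 * (final - initial) / initial


# Hypothetical closing prices: a stock that goes from 0.50 to 2.00.
closes = pd.Series([0.50, 0.75, 2.00])
change = percent_change(closes)

# A percentage change p corresponds to a growth multiple of 1 + p / 100.
multiple = 1 + change / 100
print(change, multiple)  # 300.0 4.0
```

A 300% gain therefore means the price quadrupled, not tripled. The intermediate prices don't affect the result, since only the first and last closes enter the formula.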
