In this guided project, we'll work with stock market data downloaded from Yahoo Finance using the yahoo_finance Python package. The data consists of daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the NASDAQ stock exchange, stored as one CSV file per symbol in the prices folder.

In [102]:
import pandas as pd
import os
import numpy as np

In [None]:
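df = {}
# Read every CSV file in the prices folder into a dict of DataFrames,
# keyed by stock symbol (the file name without its extension)
for fn in os.listdir("prices"):
    if not fn.endswith('.csv'):
        continue
    name = fn.split('.')[0]
    df[name] = pd.read_csv(os.path.join("prices", fn))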
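The `Unnamed: 0` column in the `df['aapl']` output appears to be a row index that was saved into each CSV file. Assuming every file in the prices folder has that layout, here is a sketch of a loader that passes `index_col=0` to `pd.read_csv` so the saved index becomes the DataFrame index instead of an extra column. `load_prices` is a hypothetical helper name, and the demo writes a throwaway CSV to a temporary directory rather than using the real data.

```python
import os
import tempfile

import pandas as pd


def load_prices(folder):
    """Hypothetical helper: read every CSV in `folder` into a dict keyed by file stem.

    index_col=0 treats the first CSV column as the saved row index,
    so it does not appear as an extra 'Unnamed: 0' column.
    """
    frames = {}
    for fn in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(fn)
        if ext.lower() != ".csv":
            continue  # skip non-CSV files
        frames[stem] = pd.read_csv(os.path.join(folder, fn), index_col=0)
    return frames


# Demo on a made-up CSV shaped like the price files (index written as first column).
with tempfile.TemporaryDirectory() as folder:
    sample = pd.DataFrame({"date": ["2007-01-03", "2007-01-04"],
                           "close": [83.800002, 85.659998]})
    sample.to_csv(os.path.join(folder, "aapl.csv"))
    with open(os.path.join(folder, "notes.txt"), "w") as fh:
        fh.write("not a price file")
    prices = load_prices(folder)

print(list(prices))                  # ['aapl']
print(list(prices["aapl"].columns))  # ['date', 'close']
```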

In [36]:
df['aapl']

| Unnamed: 0 | date | close | open | high | low | volume |
|---|---|---|---|---|---|---|
| 0 | 2007-01-03 | 83.800002 | 86.289999 | 86.579999 | 81.899999 | 309579900 |
| 1 | 2007-01-04 | 85.659998 | 84.050001 | 85.949998 | 83.820003 | 211815100 |
| 2 | 2007-01-05 | 85.049997 | 85.770000 | 86.199997 | 84.400002 | 208685400 |
| 3 | 2007-01-08 | 85.470000 | 85.959998 | 86.529998 | 85.280003 | 199276700 |
| 4 | 2007-01-09 | 92.570003 | 86.450003 | 92.979999 | 85.150000 | 837324600 |
| ... | ... | ... | ... | ... | ... | ... |
| 2585 | 2017-04-10 | 143.169998 | 143.600006 | 143.880005 | 142.899994 | 18473000 |
| 2586 | 2017-04-11 | 141.630005 | 142.940002 | 143.350006 | 140.059998 | 30275300 |
| 2587 | 2017-04-12 | 141.800003 | 141.600006 | 142.149994 | 141.009995 | 20238900 |
| 2588 | 2017-04-13 | 141.050003 | 141.910004 | 142.380005 | 141.050003 | 17652900 |


## Average price

In [37]:
# Average closing price
df_mean_close = {}
for name in df:
    df_mean_close[name] = df[name]['close'].mean()
    
df_mean_close['aapl']

257.1765404023166

In [38]:
# Minimum average closing price
stock = 'aapl'
for name in df:
    if df_mean_close[name] < df_mean_close[stock]:
        stock = name
        
print('{} has the minimum average closing price: {}'.format(
    stock, df_mean_close[stock]))

blfs has the minimum average closing price: 0.8122763011583011


In [39]:
# Maximum average closing price
stock = 'aapl'
for name in df:
    if df_mean_close[name] > df_mean_close[stock]:
        stock = name

print('{} has the maximum average closing price: {}'.format(
    stock, df_mean_close[stock]))

amzn has the maximum average closing price: 275.13407757104255
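Both loops above scan the dictionary by hand. A more compact alternative, sketched here with hypothetical averages rounded from the outputs above, uses Python's built-in `min` and `max` with `key=`, which compare the dictionary's values but return its keys.

```python
# Hypothetical stand-in for df_mean_close (values rounded from the outputs above).
mean_close = {"aapl": 257.18, "amzn": 275.13, "blfs": 0.81}

# key=mean_close.get ranks symbols by their average price, but returns the symbol.
lowest = min(mean_close, key=mean_close.get)
highest = max(mean_close, key=mean_close.get)

print(lowest, mean_close[lowest])    # blfs 0.81
print(highest, mean_close[highest])  # amzn 275.13
```

Unlike the loops, this does not need a starting symbol such as 'aapl' to exist in the data.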


## Trade volume

We'll build a dictionary whose keys are dates and whose values hold the daily trading volume of every stock symbol traded on that date. More precisely, for each day we'll store a list of (volume, stock_symbol) pairs, one per symbol.
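Calling `iterrows` builds a pandas Series for every row, which gets slow across several hundred files. Here is a minimal sketch of the same date-to-pairs structure using `pd.concat` and `groupby`, assuming each DataFrame has `date` and `volume` columns like the price files; the symbols `aaa` and `bbb` and their volumes are made up for illustration.

```python
import pandas as pd

# Hypothetical two-symbol stand-in for the df dict (symbols and volumes are made up).
frames = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 250]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [300, 200]}),
}

# Stack all symbols into one long table; the dict keys become a 'symbol' column.
long = pd.concat(frames, names=["symbol", "row"]).reset_index(level="symbol")

# Collect (volume, symbol) pairs per date; .tolist() yields plain Python ints.
daily_pairs = {
    date: list(zip(group["volume"].tolist(), group["symbol"].tolist()))
    for date, group in long.groupby("date")
}
print(daily_pairs["2007-01-03"])  # [(100, 'aaa'), (300, 'bbb')]
```

The pairs are tuples rather than lists, but they compare and sort the same way, so sorting them by volume works unchanged.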

In [48]:
# Group [volume, symbol] pairs by trading date across all stocks
daily_trans = {}
for name in df:
    for index, row in df[name].iterrows():
        date = row['date']
        volume = row['volume']
        if date not in daily_trans:
            daily_trans[date] = []
        daily_trans[date].append([volume, name])

In [66]:
most_traded_stock = {}
for date in daily_trans:
    # Sort ascending by volume (the first element of each pair),
    # so the most traded stock of the day ends up last
    daily_trans[date].sort()
    most_traded_stock[date] = daily_trans[date][-1]

In [67]:
print(most_traded_stock['2007-01-03'])
print(most_traded_stock['2007-01-04'])
print(most_traded_stock['2007-01-05'])
print(most_traded_stock['2007-01-08'])

[309579900, 'aapl']
[211815100, 'aapl']
[208685400, 'aapl']
[199276700, 'aapl']
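If the data is already in one long table with `date`, `symbol` and `volume` columns (an assumption; the notebook keeps one DataFrame per symbol), the most traded stock per day can be found without sorting every list. A sketch using `groupby(...).idxmax()`, with made-up symbols and volumes:

```python
import pandas as pd

# Hypothetical long-format table: one row per (date, symbol), made-up volumes.
long = pd.DataFrame({
    "date":   ["2007-01-03", "2007-01-03", "2007-01-04", "2007-01-04"],
    "symbol": ["aaa", "bbb", "aaa", "bbb"],
    "volume": [100, 300, 250, 200],
})

# idxmax gives, for each date, the row label holding the largest volume.
top_rows = long.loc[long.groupby("date")["volume"].idxmax()]
most_traded = {
    row.date: [row.volume, row.symbol] for row in top_rows.itertuples(index=False)
}
print(most_traded["2007-01-03"])  # [300, 'bbb']
```

One behavioral difference: on a volume tie, `idxmax` keeps the first row it sees, whereas sorting `[volume, symbol]` lists and taking the last one picks the alphabetically last symbol.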


In [76]:
# daily total traded volume
total_traded_daily = []
for date in daily_trans:
    total_volume = sum(volume for volume, _ in daily_trans[date])
    total_traded_daily.append([total_volume, date])

In [77]:
# Top 10 dates by total traded volume (sorted ascending, so the busiest is last)
total_traded_daily.sort()
total_traded_daily[-10:]

[[1533363200, '2008-01-24'],
 [1536176400, '2008-01-16'],
 [1553880500, '2007-11-08'],
 [1555072400, '2008-09-29'],
 [1559032100, '2008-02-07'],
 [1578877700, '2008-01-22'],
 [1599183500, '2008-10-08'],
 [1611272800, '2007-07-26'],
 [1770266900, '2008-10-10'],
 [1964583900, '2008-01-23']]
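The same daily totals can be sketched with pandas, assuming a long table with `date` and `volume` columns (the values here are made up): `groupby(...).sum()` adds up each date's volume and `nlargest` returns the busiest dates, largest first.

```python
import pandas as pd

# Hypothetical long-format table with made-up volumes.
long = pd.DataFrame({
    "date":   ["2008-01-22", "2008-01-22", "2008-01-23", "2008-01-23", "2008-01-24"],
    "symbol": ["aaa", "bbb", "aaa", "bbb", "aaa"],
    "volume": [100, 300, 500, 200, 50],
})

# Total volume per date, then the N busiest dates in descending order.
daily_total = long.groupby("date")["volume"].sum()
busiest = daily_total.nlargest(2)
print(busiest.index.tolist(), busiest.tolist())  # ['2008-01-23', '2008-01-22'] [700, 400]
```

Note the order: `nlargest` lists the busiest date first, while sorting the list ascending and slicing `[-10:]` puts it last.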

## Price difference

In this part, we'll find the most and least profitable stocks, measuring profit as the percentage change from a stock's first to its last closing price in the data.
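The metric is the percentage change (end - start) / start × 100, computed from the first and last closing price in each symbol's file, so it covers whatever date range that file spans. A small worked check, using the admp and bont prices that appear in this section's outputs (`percent_change` is a hypothetical helper name):

```python
def percent_change(start_price, end_price):
    """Percentage change from the first to the last closing price."""
    return (end_price - start_price) / start_price * 100


# First/last closing prices of admp and bont as listed in this section.
print(round(percent_change(0.059996, 4.55), 1))   # 7483.8
print(round(percent_change(36.619999, 0.61), 1))  # -98.3
```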

In [98]:
profit_list = []
for stock in df:
    start_price = df[stock]['close'].iloc[0]
    end_price = df[stock]['close'].iloc[-1]
    profit_perc = (end_price-start_price)/start_price*100
    profit_list.append([profit_perc, stock, start_price, end_price])

In [99]:
profit_list.sort()
profit_list[-10:]

[[1330.0000666666667, 'achc', 3.0, 42.900002],
 [1339.2137535980346, 'bcli', 0.280014, 4.03],
 [1525.1625162516252, 'cui', 0.279972, 4.55],
 [1549.6700659868025, 'apdn', 0.10002, 1.65],
 [1707.3554472785033, 'anip', 2.800224, 50.610001000000004],
 [2230.7234281466817, 'amzn', 38.700001, 901.98999],
 [2437.4365640858978, 'blfs', 0.080002, 2.03],
 [3898.60048982856, 'arcw', 0.100035, 4.0],
 [4005.0000000000005, 'adxs', 0.2, 8.21],
 [7483.8389225948395, 'admp', 0.059996, 4.55]]

Each of the 10 most profitable stocks grew by more than 1000%. admp was the most profitable of all, rising 7483.8% from a first closing price of 0.059996 to a last closing price of 4.55.

In [101]:
# 10 least profitable stocks (smallest percentage change first)
profit_list.sort()
profit_list[:10]

[[-98.33424353725407, 'bont', 36.619999, 0.61],
 [-98.25072886297376, 'dcth', 3.43, 0.06],
 [-97.52144899904671, 'cmls', 10.49, 0.26],
 [-96.17224880382774, 'falc', 8.36, 0.32],
 [-95.62602538950107, 'cetv', 73.160004, 3.2],
 [-93.2156371644067, 'atlc', 39.650002, 2.69],
 [-93.17775158217626, 'bbry', 128.549995, 8.77],
 [-93.04932704257072, 'evep', 22.299999, 1.55],
 [-91.1604938271605, 'clmt', 40.5, 3.58],
 [-91.0659114315139, 'dest', 38.84, 3.47]]

At the other end, bont lost the most value, falling 98.3% from a first closing price of 36.62 to a last closing price of 0.61. For bont, shorting at the start of the period would have been the best strategy.

## After hours trade

We'll find which stocks show the biggest absolute change between one day's closing price and the next day's opening price, i.e. the move that happens outside regular trading hours.
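A minimal sketch of the gap calculation on a single made-up price history. `np.roll(..., -1)` wraps the first day's open around to the last row, whereas `Series.shift(-1)` leaves NaN there, so the last day, which has no real next-day open, can simply be dropped. The column name `abs_gap` is hypothetical.

```python
import pandas as pd

# Hypothetical three-day price history for one symbol.
prices = pd.DataFrame({
    "date":  ["2007-01-03", "2007-01-04", "2007-01-05"],
    "open":  [10.0, 10.5, 9.0],
    "close": [10.2, 10.1, 9.4],
})

# Next day's open aligned with today's close; the last row becomes NaN.
prices["next_open"] = prices["open"].shift(-1)
prices["abs_gap"] = (prices["next_open"] - prices["close"]).abs()

# Drop the last day (no next open), then pick the largest overnight move.
gaps = prices.dropna(subset=["abs_gap"])
biggest = gaps.loc[gaps["abs_gap"].idxmax()]
print(biggest["date"], round(biggest["abs_gap"], 2))  # 2007-01-04 1.1
```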

In [125]:
after_hours_trade = {}

for stock in df:
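    # Next day's open; shift(-1) leaves NaN in the last row instead of wrapping around
    df[stock]['next_open'] = df[stock]['open'].shift(-1)
    # Absolute change between today's close and the next day's open
    df[stock]['abs_after_hours_trade'] = (df[stock]['next_open'] - df[stock]['close']).abs()

    # Skip the last row, which has no next-day open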
    for idx, row in df[stock].iloc[:-1].iterrows():
        date = row['date']
        value = row['abs_after_hours_trade']
        if date not in after_hours_trade:
            after_hours_trade[date] = []
        after_hours_trade[date].append([value, stock])

In [129]:
max_after_hours_trade_daily = []

for date in after_hours_trade:
    # Sort ascending by gap size; the last pair is that day's biggest mover
    after_hours_trade[date].sort()
    value, stock = after_hours_trade[date][-1]
    max_after_hours_trade_daily.append([value, stock])

In [131]:
max_after_hours_trade_daily[:5]

[[2.99999200000002, 'cme'],
 [2.0699960000000033, 'bbry'],
 [2.169999999999999, 'dvax'],
 [2.75, 'celg'],
 [3.75, 'fcnca']]