## Introduction: Analyzing Stock Prices Project

This project analyzes stock market data downloaded from Yahoo Finance using the yahoo_finance Python package. The data consists of daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the NASDAQ stock exchange, stored in the prices folder. The download_data.py script, located in the same folder as the Jupyter notebook, was used to download all of the stock price data. Each file in the prices folder is named for a specific stock symbol and contains the following columns:
- date -- date that the data is from.
- close -- the closing price on that day, which is the price when the trading day ends.
- open -- the opening price on that day, which is the price when the trading day starts.
- high -- the highest price the stock reached during trading.
- low -- the lowest price the stock reached during trading.
- volume -- the number of shares that were traded during the day.

In [1]:
## Importing the required libraries and storing each txt file under a separate key in a dictionary
import os
import pandas as pd

files = os.listdir("prices") # create iterable of all files in prices folder

stock_price_data = {}

for file in files:
    # Get the name of the file without extension
    name = file.split(".")[0]
    stock_price_data[name] = pd.read_csv(os.path.join("prices", file))    

# get an overview of all stock symbols stored in the dictionary
lst_files = stock_price_data.keys()
lst_files

dict_keys(['aal', 'aame', 'aaon', 'aapl', 'aaww', 'aaxn', 'abax', 'abcb', 'abco', 'abeo', 'abio', 'abmd', 'abtl', 'acad', 'acet', 'acfc', 'acgl', 'achc', 'achn', 'aciw', 'acls', 'acnb', 'acor', 'acta', 'actg', 'acxm', 'adbe', 'adi', 'admp', 'adp', 'adra', 'adrd', 'adre', 'adru', 'adsk', 'adtn', 'adxs', 'aegn', 'aehr', 'aeis', 'aemd', 'aey', 'aezs', 'afam', 'afsi', 'agen', 'agii', 'agys', 'ahgp', 'ahpi', 'aimc', 'ainv', 'aiq', 'airm', 'airt', 'akam', 'akrx', 'alco', 'algn', 'algt', 'alks', 'allt', 'alny', 'alog', 'alot', 'alqa', 'alsk', 'alxn', 'amag', 'amat', 'amd', 'amed', 'amgn', 'amkr', 'amnb', 'amot', 'amrb', 'amri', 'amrn', 'amsc', 'amsf', 'amswa', 'amtd', 'amwd', 'amzn', 'anat', 'ancx', 'ande', 'ango', 'anik', 'anip', 'anss', 'aobc', 'apdn', 'aplp', 'apog', 'apps', 'apri', 'apwc', 'aray', 'arcb', 'arcc', 'arci', 'arcw', 'ardm', 'arii', 'aris', 'arkr', 'arlp', 'arlz', 'arna', 'arow', 'arql', 'arrs', 'arry', 'artna', 'artw', 'artx', 'arwr', 'asfi', 'asml', 'asna', 'asrv', 'asrvp', 

Displaying data stored for the aapl stock symbol (to get a feel for the data):

In [2]:
stock_price_data["aapl"].head()

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
2,2007-01-05,85.049997,85.77,86.199997,84.400002,208685400
3,2007-01-08,85.47,85.959998,86.529998,85.280003,199276700
4,2007-01-09,92.570003,86.450003,92.979999,85.15,837324600
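The `Unnamed: 0` column in the output above suggests that each file stores its row index as an unnamed first column. A minimal sketch of one way to absorb that column at load time, assuming the files follow the layout shown (the inline CSV text is a stand-in for a file in the prices folder, with rows copied from the output above):

```python
import io
import pandas as pd

# Stand-in for one file in the prices folder; rows copied from the aapl output above.
sample_csv = io.StringIO(
    ",date,close,open,high,low,volume\n"
    "0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900\n"
    "1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100\n"
)

# index_col=0 treats the unnamed first column as the row index rather than as data.
sample = pd.read_csv(sample_csv, index_col=0)

print(sample.columns.tolist())
```

With `index_col=0`, only the six documented columns remain, and the dates stay as plain strings, which is how the later cells look them up.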


### Computing some aggregates
#### 1. Displaying average closing prices for each file

In [3]:
average_closing_prcs = {}

for stock_sym in stock_price_data:
    closing_average = stock_price_data[stock_sym]["close"].mean()
    average_closing_prcs[stock_sym] = closing_average 

average_closing_prcs

{'aal': 22.074953666795338,
 'aame': 2.7796795366795344,
 'aaon': 23.617386061776063,
 'aapl': 257.17654040231656,
 'aaww': 44.331602290347405,
 'aaxn': 11.863907341698845,
 'abax': 34.57868337992275,
 'abcb': 17.990475994208477,
 'abco': 47.647057967567655,
 'abeo': 2.5932200772200797,
 'abio': 2.2518008000000007,
 'abmd': 33.222420861003854,
 'abtl': 6.233108108494209,
 'acad': 13.82358687490347,
 'acet': 12.655212363320476,
 'acfc': 5.596733538610015,
 'acgl': 63.32590737683382,
 'achc': 24.047795338223956,
 'achn': 5.941177606949804,
 'aciw': 28.27269496023163,
 'acls': 3.343806946718146,
 'acnb': 17.343528900386115,
 'acor': 27.47286873938217,
 'acta': 11.32055983706564,
 'actg': 15.997490346718152,
 'acxm': 18.26306178378379,
 'adbe': 51.19943628416986,
 'adi': 42.24018144826256,
 'admp': 1.7122164397683428,
 'adp': 61.03234735559846,
 'adra': 27.3514517397683,
 'adrd': 22.51748262046331,
 'adre': 39.14505407104248,
 'adru': 22.37166796177606,
 'adsk': 42.247594632818625,
 'adtn'

#### 2. Calculating maximum and minimum average closing prices

In [4]:
pairs = [(average_closing_prcs[stock_sym], stock_sym) for stock_sym in average_closing_prcs]
pairs.sort()

print("five minimum average closing prices:")
for index, pair in enumerate(pairs[0:5]):
    print(index + 1, pair)
print("\n")

print("Five maximum average closing prices:")
# pairs is sorted in ascending order, so reversing the last five entries
# lists the top five from highest to lowest
for index, pair in enumerate(reversed(pairs[-5:])):
    print(index + 1, pair)

five minimum average closing prices:
1 (0.8122763011583004, 'blfs')
2 (0.824100993822394, 'apdn')
3 (0.901011583011584, 'bmra')
4 (0.9969415324324327, 'bcli')
5 (1.1615408884169918, 'cyrx')


Five maximum average closing prices:
1 (275.1340775710431, 'amzn')
2 (257.17654040231656, 'aapl')
3 (230.2946601100388, 'cme')
4 (228.3897761598455, 'atri')
5 (200.25248278146725, 'fcnca')


It appears that amzn and aapl have the highest average closing prices, while blfs and apdn have the lowest.
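The same ranking can also be obtained without sorting tuples by hand. A minimal sketch using `pandas.Series.nsmallest` and `nlargest`, assuming a dictionary shaped like `average_closing_prcs` (the name `average_closing_sample` is hypothetical, and its values are a rounded subset of the averages above):

```python
import pandas as pd

# Hypothetical subset of the averages computed above, rounded for readability.
average_closing_sample = {
    "amzn": 275.13,
    "aapl": 257.18,
    "cme": 230.29,
    "blfs": 0.81,
    "apdn": 0.82,
    "bmra": 0.90,
}

averages = pd.Series(average_closing_sample)

lowest = averages.nsmallest(2)   # smallest value first
highest = averages.nlargest(2)   # largest value first

print(lowest)
print(highest)
```

Both methods return a Series indexed by stock symbol, so the symbol and its average stay together without building (value, symbol) pairs.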

Organizing the trades per day
Building a dictionary where the keys are the days and the values are lists of (volume, stock_symbol) pairs, one for each stock traded on that day.

In [5]:
trades_by_day = {}

for stock_sym in stock_price_data:
    prices = stock_price_data[stock_sym]
    # iterate over only the two columns needed; this avoids the per-row overhead of iterrows()
    for day, volume in zip(prices["date"], prices["volume"]):
        # setdefault creates the day's list the first time the day is seen
        trades_by_day.setdefault(day, []).append((volume, stock_sym))

#### 3. Finding The Most Traded Stock Each Day
Building a dictionary where the keys are the days and each value is the (volume, stock_symbol) pair for the most traded stock on that day.

In [6]:
most_traded_by_day = {}

for day in trades_by_day:
    # tuples compare by their first element (volume) first,
    # so max() returns the pair with the highest volume
    most_traded_by_day[day] = max(trades_by_day[day])

Verifying a few of the results

In [7]:
print(f"{most_traded_by_day['2007-01-03'][0]:,}", most_traded_by_day['2007-01-03'][1])
print(f"{most_traded_by_day['2008-01-03'][0]:,}", most_traded_by_day['2008-01-03'][1])
print(f"{most_traded_by_day['2009-02-03'][0]:,}", most_traded_by_day['2009-02-03'][1])
print(f"{most_traded_by_day['2010-02-03'][0]:,}", most_traded_by_day['2010-02-03'][1])

309,579,900 aapl
210,516,600 aapl
149,827,300 aapl
153,832,000 aapl
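As an alternative to the dictionary-based approach above, the per-day maximum can be expressed with a pandas `groupby`. This is only a sketch under stated assumptions: `toy_data` is a made-up stand-in for `stock_price_data`, with invented symbols and volumes.

```python
import pandas as pd

# Toy stand-in for stock_price_data: two invented symbols, two days, made-up volumes.
toy_data = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 300]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [200, 50]}),
}

# Stack all frames into one long table, tagging each row with its symbol.
combined = pd.concat(
    [frame.assign(symbol=symbol) for symbol, frame in toy_data.items()],
    ignore_index=True,
)

# For each date, find the row label with the largest volume, then select those rows.
top_rows = combined.loc[combined.groupby("date")["volume"].idxmax()]
most_traded = top_rows.set_index("date")[["volume", "symbol"]]

print(most_traded)
```

On ties, `idxmax` keeps the first row it encounters, whereas comparing (volume, stock_symbol) tuples keeps the alphabetically last symbol, so the two approaches can differ when two stocks share the top volume on a day.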


#### 4. Searching For High Volume Days (top 10 days by total volume)

In [8]:
all_daily_volumes = []

for day in trades_by_day:
    day_volume = sum([volume for volume, _ in trades_by_day[day]])
    all_daily_volumes.append((day_volume, day))

all_daily_volumes.sort()
for amount, day in reversed(all_daily_volumes[-10:]):
    print("volume amount:", f"{amount:,}", " ", "day:", day)    

volume amount: 1,964,583,900   day: 2008-01-23
volume amount: 1,770,266,900   day: 2008-10-10
volume amount: 1,611,272,800   day: 2007-07-26
volume amount: 1,599,183,500   day: 2008-10-08
volume amount: 1,578,877,700   day: 2008-01-22
volume amount: 1,559,032,100   day: 2008-02-07
volume amount: 1,555,072,400   day: 2008-09-29
volume amount: 1,553,880,500   day: 2007-11-08
volume amount: 1,536,176,400   day: 2008-01-16
volume amount: 1,533,363,200   day: 2008-01-24
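The daily totals above sum the volume of every stock in the data set for each day. A minimal sketch of the same aggregation with `groupby` and `nlargest`, assuming a long-format table with one row per day and symbol (the `trades` frame and its values are made up for illustration):

```python
import pandas as pd

# Toy long-format table: one row per (date, symbol) with made-up volumes.
trades = pd.DataFrame(
    {
        "date": ["2008-01-22", "2008-01-22", "2008-01-23", "2008-01-23", "2008-01-24"],
        "symbol": ["aaa", "bbb", "aaa", "bbb", "aaa"],
        "volume": [500, 700, 900, 400, 300],
    }
)

# Sum the volume across all symbols for each day, then keep the two busiest days.
daily_totals = trades.groupby("date")["volume"].sum()
busiest = daily_totals.nlargest(2)

print(busiest)
```

As in the notebook's own calculation, the totals only cover the symbols present in the data, not the entire exchange.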


#### 5. Finding Profitable Stocks

In [9]:
percentages = []

# Calculating the percentage growth from the first to the last closing price of each stock to find the highest growth
for stock_sym in stock_price_data:
    prices = stock_price_data[stock_sym]
    initial = prices.loc[0, "close"]
    final = prices.loc[prices.shape[0] - 1, "close"]
    percentage = 100 * (final - initial) / initial
    percentages.append((percentage, stock_sym))

percentages.sort(reverse = True)

percentages[:10]

[(7483.8389225948395, 'admp'),
 (4005.0000000000005, 'adxs'),
 (3898.6004898285596, 'arcw'),
 (2437.4365640858978, 'blfs'),
 (2230.7234281466817, 'amzn'),
 (1707.355447278503, 'anip'),
 (1549.6700659868027, 'apdn'),
 (1525.162516251625, 'cui'),
 (1339.2137535980346, 'bcli'),
 (1330.0000666666667, 'achc')]

Based on the above, the most profitable stock to buy at the start of 2007 would have been ADMP, which gained roughly 7,484% over the period, rising from a few cents to a final closing price of 4.43 in the data set.
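The growth figure used in this section is the percentage change between the first and last closing price, 100 * (final - initial) / initial. A small sketch of that calculation as a reusable function, assuming the rows are in chronological order as in the aapl sample (the function name `growth_percentage` and the toy prices are hypothetical):

```python
import pandas as pd


def growth_percentage(prices: pd.DataFrame) -> float:
    """Percentage change from the first to the last closing price."""
    initial = prices["close"].iloc[0]
    final = prices["close"].iloc[-1]
    return 100 * (final - initial) / initial


# Made-up prices: a stock that goes from 2.00 to 5.00.
toy_prices = pd.DataFrame({"close": [2.0, 3.5, 5.0]})

print(growth_percentage(toy_prices))  # 150.0
```

Using `iloc` selects rows by position, so the function does not depend on the index labels starting at 0.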