# <center> Analyzing Stock Prices

This is project will process with stock market data download from [Yahoo Finance](https://finance.yahoo.com/?guccounter=1&guce_referrer=aHR0cHM6Ly9hcHAuZGF0YXF1ZXN0LmlvLw&guce_referrer_sig=AQAAAKPs15pddqDC0ArpmhoVIgtjznOl8Yg8LMAQPNy4zGRr_lJvpk1A_6yIgVm7V_9AOR5R2I8S3cS7dveRAQMr1Ps-Ntf1TXHjjm5x0pA_cpeGIuM0dx9qDN54RpHQvVjy_3V8QdnIGg2lGRyu2J2EL8sJ7mm3giAwCccQOBvOP1cF) using the [yahoo_finance](https://pypi.org/project/yahoo-finance/) Python package. This data consists of the daily stock prices from 2007-1-1 to 2017-04-17 for several hundred symbols traded on the NASDAQ stock exchange, stored in the prices folder. Each file in the prices folder has a specific stock symbol for its name, and each contains the following information:

- date : the data's date
- close : the date's closing price
- open : the date's opening price
- high : the date's highest stock price during trading 
- low : the date's lowest stock price during trading 
- volumne : the date's number of shares traded 

## Read all files in folder 

In [1]:
import pandas as pd 
import os 

stock_prices = {} 
for fn in os.listdir("prices") : 
    df = pd.read_csv(os.path.join("prices", fn))
    stock = fn.split('.')[0]
    stock_prices[stock] = df

In [3]:
# Display the data corresponding to the aapl symbol
stock_prices['aapl'].head()

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
2,2007-01-05,85.049997,85.77,86.199997,84.400002,208685400
3,2007-01-08,85.47,85.959998,86.529998,85.280003,199276700
4,2007-01-09,92.570003,86.450003,92.979999,85.15,837324600


## Calculate aggregates 

After reading all file in folder and storing them into dictionary, we can use it to compute aggregate. 

- The average closing price of each stock 
- The minimum average closing price over all stocks 
- The maximum average closing price over all stocks 

In [4]:
# Calculate the stock symbol with the closing average price

average_closing_prices = {}
for stock in stock_prices : 
    average_closing_prices[stock] = stock_prices[stock]['close'].mean()
    print(f"Stock name : {stock}, Price : {average_closing_prices[stock]}")

Stock name : dgica, Price : 14.986583006177607
Stock name : bdge, Price : 24.12035132432432
Stock name : cvco, Price : 53.36543631042471
Stock name : blkb, Price : 33.75537838185328
Stock name : bbox, Price : 25.997579137451737
Stock name : ffbc, Price : 16.002316604633204
Stock name : fbiz, Price : 22.958876448262547
Stock name : ffic, Price : 16.59364864787645
Stock name : bdsi, Price : 4.8207065644787646
Stock name : amgn, Price : 92.2331003965251
Stock name : expe, Price : 53.78315830308881
Stock name : expd, Price : 42.86821235366795
Stock name : cur, Price : 1.907691699604743
Stock name : clct, Price : 14.4366796011583
Stock name : alny, Price : 39.171486488030894
Stock name : evol, Price : 5.701853281853282
Stock name : ahgp, Price : 38.20530885868726
Stock name : dfbg, Price : 1.4005010393822395
Stock name : afsi, Price : 26.69982658918919
Stock name : chy, Price : 12.45603860888031
Stock name : bmrn, Price : 50.52171040733592
Stock name : agys, Price : 10.303613901544402
Stock

Stock name : core, Price : 44.28576448223938
Stock name : exel, Price : 6.616277998455599
Stock name : allt, Price : 9.180019302702703
Stock name : algn, Price : 36.751629348648656
Stock name : exls, Price : 26.46239382084942
Stock name : dorm, Price : 34.767818543243244
Stock name : chfc, Price : 27.265100385714288
Stock name : audc, Price : 4.375227799227799
Stock name : dcth, Price : 2.8911660231660226
Stock name : elgx, Price : 8.976440163706563
Stock name : ainv, Price : 9.949749044015444
Stock name : bkmu, Price : 7.306324324324323
Stock name : artna, Price : 20.97944401119691
Stock name : asfi, Price : 11.159220083783783
Stock name : ceco, Price : 13.657787633204634
Stock name : ctws, Price : 30.461830105405404
Stock name : cvlt, Price : 39.000996127413124
Stock name : baby, Price : 21.25315058880309
Stock name : atrc, Price : 11.84171041969112
Stock name : cetv, Price : 24.057965252509653
Stock name : banf, Price : 49.64349804169885
Stock name : esgr, Price : 114.26885330617759

In [5]:
# Calculate the stock symbol with the minimum average closing price

stock, price = min(average_closing_prices.items(), key = lambda x : x[1])
print(f"The minimum average closing price is {round(price,3)}$ of {stock}")

The minimum average closing price is 0.812$ of blfs


In [6]:
# Calculate the stock symbol with the maximum average closing price

stock, price = max(average_closing_prices.items(), key = lambda x : x[1])
print(f"The maximum average closing price is {round(price,3)}$ of {stock}")

The maximum average closing price is 275.134$ of amzn


## Organize the trades by date

Now we will calculate a dictionary where the keys are the days and the values are lists of pairs (volumne, stock_symbol) of all trades that occured on that day. 

In [20]:
import math 
from multiprocessing import Pool
import functools 

def make_chunks(data, num_chunks) : 
    chunk_size = math.ceil(len(data) / num_chunks)
    return [data[i:i+chunk_size] for i in range(0, len(data), chunk_size)]

def map_reduce(data, num_processes, mapper, reducer) : 
    chunks = make_chunks(data, num_processes)
    with Pool(num_processes) as pool : 
        chunks_results = pool.map(mapper, chunks)
    return functools.reduce(reducer, chunks_results) 

def stock_mapper(stock_chunks) : 
    result = {}
    for stock in stock_chunks : 
        for index, row in stock_prices[stock].iterrows() : 
            day = row['date']
            volume = row['volume']
            pair = (volume, stock)
            if day not in result : 
                result[day] = []
            result[day].append(pair)
    return result 

def stock_reducer(dict1, dict2) : 
    for key, value in dict2.items() : 
        if key not in dict1 : 
            dict1[key] = dict2[key]
        else : 
            dict1[key] += dict2[key]
    return dict1

In [21]:
stock_names = list(stock_prices.keys())
num_processes = os.cpu_count() 

trades_by_day = map_reduce(stock_names, num_processes, stock_mapper, stock_reducer)

## Find the most-traded stock each day

Finding the most-traded stock can help to find trends in the broader market and see which companies are "hot" at which times.

In [24]:
most_traded_by_day = {} 

for day in trades_by_day : 
    most_traded_by_day[day] = sorted(trades_by_day[day], reverse = True)[0]

In [26]:
print(most_traded_by_day['2007-01-03'])
print(most_traded_by_day['2007-01-04'])
print(most_traded_by_day['2007-01-05'])
print(most_traded_by_day['2007-01-08'])

(309579900, 'aapl')
(211815100, 'aapl')
(208685400, 'aapl')
(199276700, 'aapl')


## Searching for High Volume Days

In order to search for all transactions on days with unusually high volumne, below task is needed.

- Compute total volumne of trading for each day
- Sort and find the 10 highest volume days overall

In [30]:
total_volume_by_day = []

for day in trades_by_day : 
    total_volume = sum([volumne for volumne, _ in trades_by_day[day]])
    total_volume_by_day.append((total_volume, day))

total_volume_by_day = sorted(total_volume_by_day, reverse = True)
total_volume_by_day[:10]

[(1964583900, '2008-01-23'),
 (1770266900, '2008-10-10'),
 (1611272800, '2007-07-26'),
 (1599183500, '2008-10-08'),
 (1578877700, '2008-01-22'),
 (1559032100, '2008-02-07'),
 (1555072400, '2008-09-29'),
 (1553880500, '2007-11-08'),
 (1536176400, '2008-01-16'),
 (1533363200, '2008-01-24')]

## Finding Profitable Stocks 

Our goal is to find which stocks would have been the most profitable to buy. 

- Subtracting the initial close price (first row) from the final close price (last row), then computing a percentage relative to the initial price. This tells us how much our initial investment woulde have grown or reduced.
- Sorting all of the percentages.
- Finding the 10 stocks that grew the most in the time period.

In [34]:
profitable_stocks = [] 

for stock in stock_prices : 
    initial = stock_prices[stock].loc[0, 'close']
    final = stock_prices[stock].loc[len(stock_prices)-1, 'close']
    percentage = 100 * (final - initial) / initial 
    profitable_stocks.append((percentage, stock))
    
profitable_stocks = sorted(profitable_stocks, reverse = True)
profitable_stocks[:10]

[(2748.9998500524816, 'arcw'),
 (1523.75, 'bvsn'),
 (847.6195238095239, 'cbmx'),
 (458.8235294117647, 'axgn'),
 (387.8048780487805, 'bstc'),
 (322.64470471442286, 'cldx'),
 (174.2621015348288, 'calm'),
 (144.73684210526318, 'fhco'),
 (125.64102564102566, 'bwen'),
 (95.36163147163397, 'cpla')]