# Analyzing Stock Prices
---

In this guided project, we'll work with stock market data downloaded from [Yahoo Finance](https://finance.yahoo.com/) using the [yahoo_finance](https://pypi.python.org/pypi/yahoo-finance) Python package. The data consists of daily stock prices from `2007-01-01` to `2017-04-17` for several hundred stock symbols traded on the [NASDAQ](http://www.nasdaq.com/) stock exchange, stored in the `prices` folder. The `download_data.py` script, located in the same folder as the Jupyter notebook, was used to download all of the stock price data. Each file in the `prices` folder is named for a specific stock symbol and contains the following columns:

- `date` -- date that the data is from.
- `close` -- the closing price on that day, which is the price when the trading day ends.
- `open` -- the opening price on that day, which is the price when the trading day starts.
- `high` -- the highest price the stock reached during trading.
- `low` -- the lowest price the stock reached during trading.
- `volume` -- the number of shares that were traded during the day.

Stock trading doesn't happen on certain days, like weekends and holidays, so there are gaps between days -- we only have data for days on which trading happened.
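
As a small illustration of these gaps, the sketch below parses the first five AAPL trading dates from this dataset and flags the rows that follow a gap of more than one calendar day. The variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# The first five AAPL trading dates in this dataset (January 2007).
dates = pd.to_datetime(pd.Series(
    ["2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09"]
))

# Calendar days elapsed since the previous row (NaN for the first row).
gap_days = dates.diff().dt.days

# Rows preceded by more than one calendar day, i.e. a weekend or holiday.
after_gap = dates[gap_days > 1]
print(after_gap.dt.strftime("%Y-%m-%d").tolist())
```

Here only `2007-01-08` is flagged, because it is the Monday that follows the weekend of January 6 and 7.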

### 1. Stock Price Data
We will begin by reading all the `CSV` files from the `prices` folder and storing each one as a DataFrame in a dictionary keyed by stock symbol.

In [1]:
# Importing the libraries
import pandas as pd
import os

# Creating an empty dictionary
stock_prices = {}

# Looping through every entry in the folder
for fn in os.listdir("prices"):
    name = fn.split(".")[0]
    stock_prices[name] = pd.read_csv(os.path.join("prices", fn))
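
The loop above assumes that every entry in `prices` is a CSV file. A slightly more defensive variant is sketched below: it skips anything without a `.csv` extension and uses `os.path.splitext` to derive the symbol. The `load_prices` helper and the temporary demo folder are hypothetical and are not part of the original notebook.

```python
import os
import tempfile

import pandas as pd


def load_prices(folder):
    """Read every .csv file in `folder` into a dict keyed by stock symbol."""
    prices = {}
    for fn in sorted(os.listdir(folder)):
        symbol, ext = os.path.splitext(fn)
        if ext.lower() != ".csv":
            continue  # skip anything that is not a CSV file
        prices[symbol] = pd.read_csv(os.path.join(folder, fn))
    return prices


# Demo on a throwaway folder holding one CSV file and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"date": ["2007-01-03"], "close": [83.800002]})
    sample.to_csv(os.path.join(tmp, "aapl.csv"), index=False)
    with open(os.path.join(tmp, "notes.txt"), "w") as fh:
        fh.write("not a price file")
    demo = load_prices(tmp)

print(sorted(demo))
```

With the real data, the call would be `stock_prices = load_prices("prices")`.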

To check that the data was imported successfully, we will display the first ten rows for Apple (which is denoted by `AAPL`).

In [2]:
# Displaying Apple's stock
stock_prices['aapl'].head(10)

Unnamed: 0,date,close,open,high,low,volume
0,2007-01-03,83.800002,86.289999,86.579999,81.899999,309579900
1,2007-01-04,85.659998,84.050001,85.949998,83.820003,211815100
2,2007-01-05,85.049997,85.77,86.199997,84.400002,208685400
3,2007-01-08,85.47,85.959998,86.529998,85.280003,199276700
4,2007-01-09,92.570003,86.450003,92.979999,85.15,837324600
5,2007-01-10,96.999997,94.749999,97.800002,93.450003,738220000
6,2007-01-11,95.800003,95.94,96.779999,95.1,360063200
7,2007-01-12,94.620003,94.590002,95.059999,93.229998,328172600
8,2007-01-16,97.099999,95.68,97.250003,95.450002,311019100
9,2007-01-17,94.949997,97.560003,97.599998,94.820001,411565000


### 2. Minimum and Maximum Average Closing Prices
We will compute the average closing price for every stock symbol, then find the symbols with the minimum and maximum averages. To compute each average, we can call the `Series.mean()` method from the `pandas` library on each `close` column.

In [3]:
# Creating an empty dictionary to store the average closing prices
avg_closing_prices = {}

# Iterating through every symbol
for stock_sym in stock_prices:
    avg_closing_prices[stock_sym] = stock_prices[stock_sym]["close"].mean()

We have now stored each stock's average closing price in a dictionary keyed by symbol. Sorting that dictionary directly would order the entries by symbol (a string) rather than by price. To rank by price instead, we can flip each entry into an `(average_price, symbol)` tuple, store the tuples in a list, sort the list, and read the maximum and minimum values from its two ends.

In [4]:
# Flipping the order
pairs = [(avg_closing_prices[stock_sym], stock_sym) for stock_sym in stock_prices]

# Sorting the list
pairs.sort(reverse = True)

# Getting the 3 highest average closing prices
print("3 maximum average closing prices:")
print(pairs[0])
print(pairs[1])
print(pairs[2],'\n')

# Getting the 3 lowest average closing prices
print("3 minimum average closing prices:")
print(pairs[-1])
print(pairs[-2])
print(pairs[-3])

3 maximum average closing prices:
(275.13407757104255, 'amzn')
(257.1765404023166, 'aapl')
(230.2946601100386, 'cme') 

3 minimum average closing prices:
(0.8122763011583011, 'blfs')
(0.8241009938223938, 'apdn')
(0.901011583011583, 'bmra')


The three stocks with the highest average closing prices over the period are Amazon (`AMZN`), Apple (`AAPL`), and CME Group (`CME`). At the other end, the three lowest are Biolife Solutions (`BLFS`), Applied DNA Sciences (`APDN`), and Biomerica (`BMRA`).
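
As an alternative to flipping the pairs, the dictionary can be ranked by value with `sorted()` and a `key` function. The sketch below uses a small dictionary whose values are the averages printed above, rounded to two decimals for illustration. The names `ranked`, `top_3`, and `bottom_3` are not part of the original notebook.

```python
# Rounded averages from the output above, for illustration only.
avg_closing_prices = {
    "blfs": 0.81, "amzn": 275.13, "apdn": 0.82,
    "aapl": 257.18, "bmra": 0.90, "cme": 230.29,
}

# Sort the (symbol, average) items by the average, highest first.
ranked = sorted(avg_closing_prices.items(), key=lambda item: item[1], reverse=True)

top_3 = ranked[:3]            # three highest averages
bottom_3 = ranked[-3:][::-1]  # three lowest averages, lowest first

print(top_3)
print(bottom_3)
```

This keeps the symbol first in each tuple. When only the single extreme is needed, `max(avg_closing_prices, key=avg_closing_prices.get)` avoids sorting altogether.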

### 3. Grouping Trades per Day
We will now group trades by day. More precisely, for each day we want a list of `(volume, stock_symbol)` pairs covering every stock traded on that day.

In [5]:
# Creating an empty dictionary
trades_by_day = {}

# Looping through every symbol and row
for stock_sym in stock_prices:
    for index, row in stock_prices[stock_sym].iterrows():
        day = row["date"]
        volume = row["volume"]
        pair = (volume, stock_sym)
        if day not in trades_by_day:
            trades_by_day[day] = []
        trades_by_day[day].append(pair)
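
`iterrows()` builds a new Series for every row, which can be slow across several hundred files. A possible variant, sketched below on a tiny made-up dataset, iterates over the two needed columns with `zip` and uses `collections.defaultdict` to drop the membership check. The `xyz` symbol and its volume are hypothetical. The AAPL rows are taken from the table shown earlier.

```python
from collections import defaultdict

import pandas as pd

# Tiny stand-in for the real `stock_prices` dictionary.
stock_prices = {
    "aapl": pd.DataFrame({
        "date": ["2007-01-03", "2007-01-04"],
        "volume": [309579900, 211815100],
    }),
    "xyz": pd.DataFrame({"date": ["2007-01-04"], "volume": [1000]}),
}

# Missing days start out as an empty list automatically.
trades_by_day = defaultdict(list)

for stock_sym, frame in stock_prices.items():
    # Walk the two columns in parallel instead of materializing each row.
    for day, volume in zip(frame["date"], frame["volume"]):
        trades_by_day[day].append((int(volume), stock_sym))

print(dict(trades_by_day))
```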

### 4. Finding The Most Traded Stock Each Day
Next, we will find the most traded stock on each day. Because the trades for each day are stored as `(volume, stock_symbol)` tuples, sorting a day's list in descending order puts the pair with the highest volume first.

In [6]:
# Creating an empty dictionary
most_traded_by_day = {}

# Looping through the day by day trades
for day in trades_by_day:
    trades_by_day[day].sort(reverse = True)
    most_traded_by_day[day] = trades_by_day[day][0]

To check the result, we will pull a few entries out of the dictionary.

In [7]:
# Checking if the result is in the correct format
print(most_traded_by_day['2013-07-19'])
print(most_traded_by_day['2015-12-22'])
print(most_traded_by_day['2016-07-19'])

(151516000, 'amd')
(32789400, 'aapl')
(23779900, 'aapl')
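
Since only the largest pair per day is needed, `max()` gives the same entry as sorting in descending order and taking the first element, without reordering the lists in `trades_by_day`. A minimal sketch on made-up data follows (the `xyz` symbol and its volumes are hypothetical):

```python
# Tiny stand-in for the real `trades_by_day` dictionary.
trades_by_day = {
    "2007-01-03": [(1000, "xyz"), (309579900, "aapl")],
    "2007-01-04": [(211815100, "aapl"), (2500, "xyz")],
}

# Tuples compare element by element, so max() picks the highest volume.
most_traded_by_day = {day: max(trades) for day, trades in trades_by_day.items()}

print(most_traded_by_day)
```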


### 5. Searching For High Volume Days
We will now find the days with the highest total trading volume across all stocks, limiting the result to the top 10 days.

In [8]:
# Creating an empty list
daily_volumes = []

# Looping through the trades day by day
for day in trades_by_day:
    day_volume = sum([volume for volume, _ in trades_by_day[day]])
    daily_volumes.append((day_volume, day))

# Sorting the list
daily_volumes.sort(reverse = True)

# Slicing the 10 highest by volume
daily_volumes[:10]

[(1964583900, '2008-01-23'),
 (1770266900, '2008-10-10'),
 (1611272800, '2007-07-26'),
 (1599183500, '2008-10-08'),
 (1578877700, '2008-01-22'),
 (1559032100, '2008-02-07'),
 (1555072400, '2008-09-29'),
 (1553880500, '2007-11-08'),
 (1536176400, '2008-01-16'),
 (1533363200, '2008-01-24')]
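
The same totals could also be computed directly in `pandas`, without going through `trades_by_day`. The sketch below assumes each DataFrame has `date` and `volume` columns, as in this dataset. The sample frames are made up, and `xyz` is a hypothetical symbol.

```python
import pandas as pd

# Tiny stand-in for the real `stock_prices` dictionary.
stock_prices = {
    "aapl": pd.DataFrame({
        "date": ["2007-01-03", "2007-01-04"],
        "volume": [309579900, 211815100],
    }),
    "xyz": pd.DataFrame({
        "date": ["2007-01-03", "2007-01-04"],
        "volume": [1000, 2500],
    }),
}

# Stack every stock's rows into one DataFrame.
all_rows = pd.concat(list(stock_prices.values()), ignore_index=True)

# Total volume per day, then keep the (up to) 10 largest totals.
daily_volume = all_rows.groupby("date")["volume"].sum()
top_days = daily_volume.nlargest(10)

print(top_days)
```

The result is a Series indexed by date and sorted from the highest total downward.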

### 6. Finding Profitable Stocks
We will now try and find the most profitable stocks. To do this we will:
1. Subtract the initial close price (first row) from the final close price (last row), then compute the difference as a percentage of the initial price. This tells us how much an initial investment would have grown or shrunk.
2. Sort all of the percentages.
3. Find the ten stocks that grew the most in the time period.

In [9]:
# Creating an empty list
percentages = []

# Looping through every symbol
for sym in stock_prices:
    data = stock_prices[sym]
    initial = data.loc[0, 'close']
    final = data.loc[len(data) - 1, 'close']
    per = (final-initial)*100/initial
    percentages.append((per, sym))
    
# Sorting the list
percentages.sort(reverse = True)

# Getting the top 10
percentages[:10]

[(7483.8389225948395, 'admp'),
 (4005.0000000000005, 'adxs'),
 (3898.6004898285596, 'arcw'),
 (2437.4365640858978, 'blfs'),
 (2230.7234281466817, 'amzn'),
 (1707.3554472785036, 'anip'),
 (1549.6700659868027, 'apdn'),
 (1525.162516251625, 'cui'),
 (1339.2137535980346, 'bcli'),
 (1330.0000666666667, 'achc')]

From the result above, we can see that Adamis Pharmaceuticals Corporation (`ADMP`) grew the most over the period, with its closing price rising by roughly 7,484%.
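
To make the percentage calculation concrete, here is a worked sketch on made-up prices. The `percent_growth` helper is hypothetical. It uses positional access (`iloc`) so that it also works when the row labels do not start at 0, which is an assumption beyond what the notebook needs.

```python
import pandas as pd


def percent_growth(frame):
    """Percentage change from the first to the last closing price."""
    initial = frame["close"].iloc[0]
    final = frame["close"].iloc[-1]
    return (final - initial) * 100 / initial


# Made-up prices with row labels that deliberately do not start at 0.
demo = pd.DataFrame({"close": [10.0, 12.5, 25.0]}, index=[7, 8, 9])

# (25.0 - 10.0) * 100 / 10.0 = 150.0
print(percent_growth(demo))
```

A result of `150.0` means that an investment made at the first close would have grown by 150%, ending at 2.5 times its starting value.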