# Analysing Stock Prices

In this project, we analyze daily stock price data spanning 2007 to 2017.

**Our analysis will include:**

- Finding the minimum and maximum average closing prices.
- Grouping trades per day to assess daily trading activity.
- Identifying the most traded stock each day to understand market dynamics.
- Searching for high-volume trading days to identify periods of significant market activity.
- Evaluating the profitability of stocks to inform investment decisions.
- Identifying the best candidates for short-selling at the start of the period.
- Exploring further questions, such as the best day to buy each stock.

We'll analyze data from [Yahoo Finance](https://finance.yahoo.com/) using the [yahoo_finance](https://pypi.python.org/pypi/yahoo-finance) Python package. The data consists of daily stock prices from 2007-01-01 to 2017-04-17 for several hundred stock symbols traded on the [NASDAQ](http://www.nasdaq.com/) stock exchange.

#### The `prices` folder contains one `.csv` file per company. Each `.csv` file has the following columns:
- **date:** the trading date of the record
- **close:** the date's closing price
- **open:** the date's opening price
- **high:** the date's highest stock price during trading
- **low:** the date's lowest stock price during trading
- **volume:** the date's number of shares traded

To facilitate our analysis, we'll use pandas to read each CSV file into a DataFrame and store the DataFrames in a dictionary keyed by stock symbol, so the data for any stock can be looked up by its symbol.

#### Summary of Results:
After analyzing the data, we identified which stocks would have been the most profitable to buy, when each would have been best bought, broader market trends, and more.

## Stock Price Data 

The Python snippet below loads and organizes the data from the CSV files in the "prices" folder.

Here's what it does:
- It gathers stock data from various files in the "prices" folder.
- Each file contains data for a different company's stock.
- The code combines all this data into a single, easy-to-access structure.
- This structured approach makes it simpler to analyze and make informed decisions based on the data.

In [1]:
import pandas as pd
import os

# Initialize an empty dictionary to store DataFrames
dataframes_dict = {}

# Loop through each file in the "prices" folder
for fn in os.listdir("prices"):
    # Read the CSV file into a DataFrame
    df = pd.read_csv(os.path.join("prices", fn))
    # Extract the filename (without the .csv extension) to use as the dictionary key
    key = os.path.splitext(fn)[0]
    # Store the DataFrame in the dictionary with the filename (without the .csv extension) as the key
    dataframes_dict[key] = df

# Now, dataframes_dict contains all the DataFrames read from the CSV files,
# with filenames (without the .csv extension) as keys

# Displaying data for the "aapl" symbol
if "aapl" in dataframes_dict:
    print("Data for symbol 'aapl'")
    print(dataframes_dict["aapl"])
else:
    print("Symbol 'aapl' not found in the data")


Data for symbol 'aapl'
            date       close        open        high         low     volume
0     2007-01-03   83.800002   86.289999   86.579999   81.899999  309579900
1     2007-01-04   85.659998   84.050001   85.949998   83.820003  211815100
2     2007-01-05   85.049997   85.770000   86.199997   84.400002  208685400
3     2007-01-08   85.470000   85.959998   86.529998   85.280003  199276700
4     2007-01-09   92.570003   86.450003   92.979999   85.150000  837324600
...          ...         ...         ...         ...         ...        ...
2585  2017-04-10  143.169998  143.600006  143.880005  142.899994   18473000
2586  2017-04-11  141.630005  142.940002  143.350006  140.059998   30275300
2587  2017-04-12  141.800003  141.600006  142.149994  141.009995   20238900
2588  2017-04-13  141.050003  141.910004  142.380005  141.050003   17652900
2589  2017-04-17  141.830002  141.479996  141.880005  140.869995   16424000

[2590 rows x 6 columns]


## Finding Minimum and Maximum Average Closing Prices

Now we can compute aggregates: the average closing price of each stock, the minimum average closing price over all stocks, and the maximum average closing price over all stocks.

In [2]:
# Initialize dictionaries to store average closing prices and track minimum and maximum values
average_closing_prices = {}
min_average_price = float('inf')  # Initialize with positive infinity
max_average_price = float('-inf')  # Initialize with negative infinity
min_symbol = None
max_symbol = None

# Iterate through each key-value pair in dataframes_dict
for symbol, df in dataframes_dict.items():
    # Calculate the average closing price for the current stock symbol
    average_closing_price = df['close'].mean()
    # Store the average closing price in the average_closing_prices dictionary
    average_closing_prices[symbol] = average_closing_price
    
    # Update minimum and maximum average closing prices and corresponding symbols
    if average_closing_price < min_average_price:
        min_average_price = average_closing_price
        min_symbol = symbol
    if average_closing_price > max_average_price:
        max_average_price = average_closing_price
        max_symbol = symbol

# Display the stock symbols with the minimum and maximum average closing price
print("Stock Symbol with Minimum Average Closing Price:", min_symbol)
print("Minimum Average Closing Price:", min_average_price)
print("Stock Symbol with Maximum Average Closing Price:", max_symbol)
print("Maximum Average Closing Price:", max_average_price)
print("\n")        
# Display the dictionary of average closing prices
print("Average Closing Prices for all Stocks:")
print(average_closing_prices)


Stock Symbol with Minimum Average Closing Price: blfs
Minimum Average Closing Price: 0.8122763011583011
Stock Symbol with Maximum Average Closing Price: amzn
Maximum Average Closing Price: 275.13407757104255


Average Closing Prices for all Stocks:
{'dgica': 14.986583006177607, 'bdge': 24.12035132432432, 'cvco': 53.36543631042471, 'blkb': 33.75537838185328, 'bbox': 25.997579137451737, 'ffbc': 16.002316604633204, 'fbiz': 22.958876448262547, 'ffic': 16.59364864787645, 'bdsi': 4.8207065644787646, 'amgn': 92.2331003965251, 'expe': 53.78315830308881, 'expd': 42.86821235366795, 'cur': 1.907691699604743, 'clct': 14.4366796011583, 'alny': 39.171486488030894, 'evol': 5.701853281853282, 'ahgp': 38.20530885868726, 'dfbg': 1.4005010393822395, 'afsi': 26.69982658918919, 'chy': 12.45603860888031, 'bmrn': 50.52171040733592, 'agys': 10.303613901544402, 'adrd': 22.51748262046332, 'drrx': 2.352779922779923, 'crus': 20.50549421969112, 'brew': 8.094903474517373, 'fbms': 14.224830092664092, 'emcf': 22.2199
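
The same aggregates can also be expressed with a pandas Series instead of manual min/max tracking. The following is a minimal sketch, not the notebook's own method; `toy_frames` is a made-up stand-in for `dataframes_dict`, and its symbols and prices are invented for illustration.

```python
import pandas as pd

# Toy stand-in for dataframes_dict; symbols and prices are invented.
toy_frames = {
    "aaa": pd.DataFrame({"close": [1.0, 2.0, 3.0]}),
    "bbb": pd.DataFrame({"close": [10.0, 20.0]}),
    "ccc": pd.DataFrame({"close": [5.0, 5.0, 5.0, 5.0]}),
}

# One average closing price per symbol, held in a Series indexed by symbol.
avg_close = pd.Series(
    {symbol: df["close"].mean() for symbol, df in toy_frames.items()}
)

# idxmin/idxmax return the index label (here, the symbol) of the extreme value.
lowest_symbol = avg_close.idxmin()
highest_symbol = avg_close.idxmax()

print("Lowest average close:", lowest_symbol, avg_close[lowest_symbol])
print("Highest average close:", highest_symbol, avg_close[highest_symbol])
```

Keeping the averages in a Series also makes follow-up steps such as `avg_close.sort_values()` or `avg_close.describe()` one-liners.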

## Finding the Most Traded Stock Each Day

Finding the most-traded stock can help us find trends in the broader market and see which companies are "hot" at which times.

In [3]:
# Initialize an empty dictionary to store the most-traded stock symbol and volume for each day
most_traded_stocks = {}

# Iterate through each key-value pair in dataframes_dict
for symbol, df in dataframes_dict.items():
    # Group the trades by date and sum the volume for each date
    grouped_trades = df.groupby('date')['volume'].sum()
    # Find the stock symbol with the highest trade volume for each date
    for date, volume in grouped_trades.items():
        if date not in most_traded_stocks or volume > most_traded_stocks[date][0]:
            most_traded_stocks[date] = (volume, symbol)

# Display the dictionary of the most-traded stock symbol and volume for each day
print("Most-Traded Stocks:")
for date, (volume, symbol) in most_traded_stocks.items():
    print(f"Date: {date}, Most-Traded Stock Symbol: {symbol}, Trade Volume: {volume}")


Most-Traded Stocks:
Date: 2007-01-03, Most-Traded Stock Symbol: aapl, Trade Volume: 309579900
Date: 2007-01-04, Most-Traded Stock Symbol: aapl, Trade Volume: 211815100
Date: 2007-01-05, Most-Traded Stock Symbol: aapl, Trade Volume: 208685400
Date: 2007-01-08, Most-Traded Stock Symbol: aapl, Trade Volume: 199276700
Date: 2007-01-09, Most-Traded Stock Symbol: aapl, Trade Volume: 837324600
Date: 2007-01-10, Most-Traded Stock Symbol: aapl, Trade Volume: 738220000
Date: 2007-01-11, Most-Traded Stock Symbol: aapl, Trade Volume: 360063200
Date: 2007-01-12, Most-Traded Stock Symbol: aapl, Trade Volume: 328172600
Date: 2007-01-16, Most-Traded Stock Symbol: aapl, Trade Volume: 311019100
Date: 2007-01-17, Most-Traded Stock Symbol: aapl, Trade Volume: 411565000
Date: 2007-01-18, Most-Traded Stock Symbol: aapl, Trade Volume: 591151400
Date: 2007-01-19, Most-Traded Stock Symbol: aapl, Trade Volume: 341118400
Date: 2007-01-22, Most-Traded Stock Symbol: aapl, Trade Volume: 363506500
Date: 2007-01-23, 

Apple (`aapl`) dominates the list of most-traded stocks, most strongly from 2007 to 2016.
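
To check how often a single symbol leads, the per-day winners can be counted. The sketch below assumes all DataFrames are stacked into one long table first; `toy_frames` and its values are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy stand-in for dataframes_dict; symbols and volumes are invented.
toy_frames = {
    "aaa": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [100, 900]}),
    "bbb": pd.DataFrame({"date": ["2007-01-03", "2007-01-04"], "volume": [500, 200]}),
}

# Stack every symbol into one long table with an added "symbol" column.
all_trades = pd.concat(
    [df.assign(symbol=symbol) for symbol, df in toy_frames.items()],
    ignore_index=True,
)

# For each date, pick the row whose volume is largest.
top_rows = all_trades.loc[all_trades.groupby("date")["volume"].idxmax()]
most_traded = top_rows.set_index("date")[["symbol", "volume"]]

# Count the number of days on which each symbol was the most traded.
days_on_top = most_traded["symbol"].value_counts()

print(most_traded)
print(days_on_top)
```

On the real data, `days_on_top` would show directly how many trading days each symbol led, which is easier to read than scanning the full per-day listing.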

## Searching for High Volume Days

Let's find which 10 days had the most trade volume.

In [4]:
# Initialize a dictionary to store the total volume for each day
total_volume_per_day = {}

# Iterate through each DataFrame in dataframes_dict
for symbol, df in dataframes_dict.items():
    # Group the trades by date and sum the volume for each day
    grouped_trades = df.groupby('date')['volume'].sum()
    # Update the total volume for each day in total_volume_per_day
    for date, volume in grouped_trades.items():
        if date in total_volume_per_day:
            total_volume_per_day[date] += volume
        else:
            total_volume_per_day[date] = volume

# Sort the total volume per day dictionary by volume in descending order
sorted_total_volume_per_day = sorted(total_volume_per_day.items(), key=lambda x: x[1], reverse=True)

# Print the 10 highest volume days overall
print("Top 10 Highest Volume Days:")
print("Date\t\t\tTotal Volume")
for date, volume in sorted_total_volume_per_day[:10]:
    print(f"{date}\t{volume}")


Top 10 Highest Volume Days:
Date			Total Volume
2008-01-23	1964583900
2008-10-10	1770266900
2007-07-26	1611272800
2008-10-08	1599183500
2008-01-22	1578877700
2008-02-07	1559032100
2008-09-29	1555072400
2007-11-08	1553880500
2008-01-16	1536176400
2008-01-24	1533363200
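
The same ranking can be sketched more compactly by concatenating the DataFrames and letting pandas do the summing and sorting. This is an illustrative alternative under the assumption that every DataFrame has `date` and `volume` columns; `toy_frames` and its numbers are invented.

```python
import pandas as pd

# Toy stand-in for dataframes_dict; dates and volumes are invented.
toy_frames = {
    "aaa": pd.DataFrame(
        {"date": ["2007-01-03", "2007-01-04", "2007-01-05"], "volume": [100, 900, 300]}
    ),
    "bbb": pd.DataFrame(
        {"date": ["2007-01-03", "2007-01-04", "2007-01-05"], "volume": [500, 200, 50]}
    ),
}

# Combine all symbols, then sum the volume traded across symbols for each date.
all_trades = pd.concat(list(toy_frames.values()), ignore_index=True)
total_volume = all_trades.groupby("date")["volume"].sum()

# nlargest returns the n dates with the highest total volume, in descending order.
top_days = total_volume.nlargest(2)
print(top_days)
```

With the real data, `total_volume.nlargest(10)` would correspond to the top-10 list above.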


## Finding Profitable Stocks
Let's see which stocks would have been the most profitable to buy at the start of the period and hold until the end.

In [5]:
# Initialize an empty dictionary to store the percentage growth of each stock symbol
percentage_growth = {}

# Iterate through each stock symbol
for symbol, df in dataframes_dict.items():
    # Get the initial and final close prices
    initial_close = df['close'].iloc[0]
    final_close = df['close'].iloc[-1]
    
    # Calculate the percentage growth relative to the initial price
    growth_percentage = ((final_close - initial_close) / initial_close) * 100
    
    # Store the percentage growth in the dictionary
    percentage_growth[symbol] = growth_percentage

# Sort the percentage growths in descending order
sorted_growth = sorted(percentage_growth.items(), key=lambda x: x[1], reverse=True)

# Get the top 10 stocks with the highest growth
top_10_growth = sorted_growth[:10]

# Print the top 10 most profitable stocks to buy
print("Top 10 Most Profitable Stocks to Buy:")
for rank, (symbol, growth_percentage) in enumerate(top_10_growth, start=1):
    print(f"{rank}. Stock Symbol: {symbol}, Percentage Growth: {growth_percentage:.2f}%")


Top 10 Most Profitable Stocks to Buy:
1. Stock Symbol: admp, Percentage Growth: 7483.84%
2. Stock Symbol: adxs, Percentage Growth: 4005.00%
3. Stock Symbol: arcw, Percentage Growth: 3898.60%
4. Stock Symbol: blfs, Percentage Growth: 2437.44%
5. Stock Symbol: amzn, Percentage Growth: 2230.72%
6. Stock Symbol: anip, Percentage Growth: 1707.36%
7. Stock Symbol: apdn, Percentage Growth: 1549.67%
8. Stock Symbol: cui, Percentage Growth: 1525.16%
9. Stock Symbol: bcli, Percentage Growth: 1339.21%
10. Stock Symbol: achc, Percentage Growth: 1330.00%
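
The growth calculation relies on the first and last rows being the earliest and latest dates. A small helper that sorts by date first makes that assumption explicit. This is a sketch; the function name `percent_growth` and the toy values are hypothetical, not part of the original notebook.

```python
import pandas as pd


def percent_growth(df: pd.DataFrame) -> float:
    """Percentage change from the earliest close to the latest close."""
    # Sort by date so iloc[0] and iloc[-1] really are the first and last days.
    # ISO-formatted date strings (YYYY-MM-DD) sort correctly as plain text.
    ordered = df.sort_values("date")
    first_close = ordered["close"].iloc[0]
    last_close = ordered["close"].iloc[-1]
    return (last_close - first_close) / first_close * 100


# Toy data with rows deliberately out of order; the prices are invented.
toy = pd.DataFrame({"date": ["2017-04-17", "2007-01-03"], "close": [30.0, 10.0]})
growth = percent_growth(toy)
print(f"{growth:.2f}%")
```

Note that if a file starts later than 2007, its growth figure covers a shorter holding period than the others, so the rankings are not strictly like-for-like.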


## Further Analysis

We've done some basic analysis of the data, but there's still quite a bit more we can find out:

### Which stocks would have been the best to short-sell at the start of the period?
To identify the best short-selling candidates at the start of the period, we look for the stocks whose price fell the most from the beginning to the end of the period.

In [6]:
# Initialize an empty dictionary to store the percentage decrease in price of each stock symbol
percentage_decrease = {}

# Iterate through each stock symbol
for symbol, df in dataframes_dict.items():
    # Get the initial and final close prices
    initial_close = df['close'].iloc[0]
    final_close = df['close'].iloc[-1]
    
    # Calculate the percentage decrease in price from the beginning to the end of the period
    decrease_percentage = ((initial_close - final_close) / initial_close) * 100
    
    # Store the percentage decrease in price in the dictionary
    percentage_decrease[symbol] = decrease_percentage

# Sort the percentage decrease in price in descending order
sorted_decrease = sorted(percentage_decrease.items(), key=lambda x: x[1], reverse=True)

# Get the top stocks with the highest percentage decrease in price
top_decrease_stocks = sorted_decrease[:10]

# Print the top stocks that would have been the best to short-sell at the start of the period
print("Top Stocks to Short-Sell at the Start of the Period:")
for rank, (symbol, decrease_percentage) in enumerate(top_decrease_stocks, start=1):
    print(f"{rank}. Stock Symbol: {symbol}, Percentage Decrease: {decrease_percentage:.2f}%")


Top Stocks to Short-Sell at the Start of the Period:
1. Stock Symbol: bont, Percentage Decrease: 98.33%
2. Stock Symbol: dcth, Percentage Decrease: 98.25%
3. Stock Symbol: cmls, Percentage Decrease: 97.52%
4. Stock Symbol: falc, Percentage Decrease: 96.17%
5. Stock Symbol: cetv, Percentage Decrease: 95.63%
6. Stock Symbol: atlc, Percentage Decrease: 93.22%
7. Stock Symbol: bbry, Percentage Decrease: 93.18%
8. Stock Symbol: evep, Percentage Decrease: 93.05%
9. Stock Symbol: clmt, Percentage Decrease: 91.16%
10. Stock Symbol: dest, Percentage Decrease: 91.07%


### Which stocks have the most after-hours trading, and show the biggest changes between the closing price and the next day open?
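
The CSV columns described earlier do not include after-hours volume, so one possible proxy is the gap between each day's close and the next day's open. The sketch below ranks symbols by their average absolute overnight gap; the helper name `mean_overnight_gap`, the choice of metric, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def mean_overnight_gap(df: pd.DataFrame) -> float:
    """Average absolute % gap between a day's close and the next day's open."""
    ordered = df.sort_values("date")
    # shift(1) aligns each row's open with the previous row's close.
    prev_close = ordered["close"].shift(1)
    gap_pct = (ordered["open"] - prev_close).abs() / prev_close * 100
    # The first row has no previous close (NaN); mean() skips it.
    return gap_pct.mean()


# Toy stand-in for dataframes_dict; symbols and prices are invented.
dates = ["2007-01-03", "2007-01-04", "2007-01-05"]
toy_frames = {
    "aaa": pd.DataFrame({"date": dates, "open": [10.0, 11.0, 9.0], "close": [10.0, 10.0, 10.0]}),
    "bbb": pd.DataFrame({"date": dates, "open": [20.0, 20.0, 20.2], "close": [20.0, 20.0, 20.0]}),
}

# Rank symbols from the largest average overnight gap to the smallest.
ranked = pd.Series(
    {symbol: mean_overnight_gap(df) for symbol, df in toy_frames.items()}
).sort_values(ascending=False)
print(ranked)
```

A large average gap suggests that much of a stock's price movement happens outside regular trading hours, although it does not measure after-hours volume directly.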



### Which time periods have resulted in steady increases in prices, and which periods have resulted in steady declines?


In [8]:
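# Note: `df` is not reassigned in this cell, so it refers to the last DataFrame
# left over from an earlier cell's loop; only that one stock is analyzed here.

# Convert the date column to datetime dtype
df['date'] = pd.to_datetime(df['date'])

# Group the data by calendar month (1-12); rows from the same month in
# different years end up in the same group
df['month'] = df['date'].dt.month
monthly_data = df.groupby('month')

# Initialize variables to store trends
increasing_months = []
decreasing_months = []

# Iterate through each calendar month
for month, month_df in monthly_data:
    # Calculate the price change from the first to the last row of the group,
    # i.e. from the earliest to the latest observation in that calendar month
    first_price = month_df.iloc[0]['close']
    last_price = month_df.iloc[-1]['close']
    price_change = last_price - first_price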

    # Determine the price trend for the month
    if price_change > 0:
        increasing_months.append(month)
    elif price_change < 0:
        decreasing_months.append(month)

# Print the months with increases and decreases in prices
print("Months with Increases in Prices:", increasing_months)
print("Months with Decreases in Prices:", decreasing_months)


Months with Increases in Prices: [5, 6, 7, 8, 9, 10, 11, 12]
Months with Decreases in Prices: [1, 2, 3, 4]
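
The cell above groups by calendar month, so every January from 2007 to 2017 lands in the same group. To see trends for individual months, the rows can instead be keyed by year and month. The following is a sketch of that variation, using invented prices for a single hypothetical stock.

```python
import pandas as pd

# Toy price history for one hypothetical stock; the values are invented.
toy = pd.DataFrame(
    {
        "date": ["2007-01-03", "2007-01-31", "2007-02-01",
                 "2007-02-28", "2008-01-02", "2008-01-31"],
        "close": [10.0, 12.0, 12.0, 11.0, 11.0, 15.0],
    }
)
toy["date"] = pd.to_datetime(toy["date"])
toy = toy.sort_values("date")

# Key each row by year-month so January 2007 and January 2008 stay separate.
period = toy["date"].dt.to_period("M")
monthly = toy.groupby(period)["close"].agg(["first", "last"])
monthly["change"] = monthly["last"] - monthly["first"]

# Collect the year-month labels where the close rose or fell within the month.
rising = [str(p) for p in monthly[monthly["change"] > 0].index]
falling = [str(p) for p in monthly[monthly["change"] < 0].index]

print("Rising months:", rising)
print("Falling months:", falling)
```

Runs of consecutive entries in `rising` or `falling` would then indicate steadier trends than a single calendar-month comparison can show.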


### Based on price, what was the optimal day to buy each stock if we wanted to hold it until the end of the period?

We can analyze the historical price data for each stock and find the day with the lowest closing price.

In [9]:
# Initialize variables to store optimal buy dates
optimal_buy_dates = {}

# Iterate through each stock symbol
for symbol, df in dataframes_dict.items():
    # Find the day with the lowest closing price
    optimal_buy_date = df.loc[df['close'].idxmin()]['date']
    
    # Store the optimal buy date for the stock symbol
    optimal_buy_dates[symbol] = optimal_buy_date

# Print the optimal buy dates for each stock
for symbol, buy_date in optimal_buy_dates.items():
    print(f"Optimal buy date for {symbol}: {buy_date}")


Optimal buy date for dgica: 2010-08-24
Optimal buy date for bdge: 2009-03-12
Optimal buy date for cvco: 2009-03-20
Optimal buy date for blkb: 2009-03-09
Optimal buy date for bbox: 2016-01-20
Optimal buy date for ffbc: 2009-03-09
Optimal buy date for fbiz: 2009-08-25
Optimal buy date for ffic: 2009-03-09
Optimal buy date for bdsi: 2011-12-28
Optimal buy date for amgn: 2008-03-19
Optimal buy date for expe: 2008-11-20
Optimal buy date for expd: 2009-03-09
Optimal buy date for cur: 2016-08-12
Optimal buy date for clct: 2008-12-30
Optimal buy date for alny: 2011-10-03
Optimal buy date for evol: 2008-12-09
Optimal buy date for ahgp: 2008-11-21
Optimal buy date for dfbg: 2015-03-18
Optimal buy date for afsi: 2008-11-20
Optimal buy date for chy: 2008-11-20
Optimal buy date for bmrn: 2009-03-09
Optimal buy date for agys: 2008-11-20
Optimal buy date for adrd: 2009-03-09
Optimal buy date for drrx: 2012-04-13
Optimal buy date for crus: 2009-01-20
Optimal buy date for brew: 2009-03-11
Optimal buy d