# Version 1: Stock Pricing Data

There are currently 11 stock market sectors (as defined by the Global Industry Classification Standard, GICS):
- Energy: Companies in the oil and natural gas industry, including exploration and production.
- Materials: Companies that supply goods used in manufacturing and other applications, such as makers of chemicals, construction materials, and containers.
- Industrials: Companies across a wide range of businesses, such as transportation (airlines, railroads, etc.), aerospace, defense, construction, and engineering.
- Utilities: Companies that provide electrical power, natural gas transmission and distribution, and renewable energy.
- Healthcare: Made up of two primary components: companies that develop pharmaceuticals and treatments, and companies that create and provide healthcare equipment and services.
- Financials: Companies that handle money, such as banks, insurance companies, and brokerage houses.
- Consumer Discretionary: Companies that sell higher-priced items such as automobiles, luxury goods, and leisure products.
- Consumer Staples: Food, beverage, and tobacco companies.
- Information Technology: Computer hardware, software, cybersecurity, etc.
- Communication Services: Telecommunication services, media, and entertainment.
- Real Estate: Made up of two primary components: companies that develop new real estate projects, and REITs (real estate investment trusts).

Version 1 of this project covers only stock pricing information, but it spans all 11 sectors.

For this initial version, we will use the NASDAQ and NYSE CSVs to collect every individual ticker symbol, then pull the required pricing information for each one.

```python
import datetime
import os

import pandas as pd
import pandas_datareader.data as web
import yfinance as yf

import get_methods
```

```python
# low_memory=False parses each file in a single pass, so pandas infers one dtype
# per column instead of guessing chunk by chunk (avoids mixed-type DtypeWarnings)
nasdaq = pd.read_csv("./data/nasdaq_csv.csv", index_col=0, low_memory=False)
nyse = pd.read_csv("./data/nyse_csv.csv", index_col=0, low_memory=False)
```

```python
# unique ticker symbols listed in each exchange file
nasdaq_tickers = nasdaq['ticker'].unique().tolist()
nyse_tickers = nyse['ticker'].unique().tolist()
```

```python
len(nyse_tickers), len(nasdaq_tickers)
```

```
(1145, 1564)
```

We have ~1,100 NYSE tickers and ~1,600 NASDAQ tickers to work with. Let's move on to pandas-datareader and start writing a function that pulls the necessary pricing information for each ticker and builds a DataFrame we can host on AWS RDS.
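The two exchange lists may overlap, and a CSV column can contain blank or missing cells, so the 2,709 combined entries (1,145 + 1,564) are an upper bound rather than an exact count. A minimal sketch of merging the lists before looping over them, assuming symbols are plain strings and that missing or blank cells should be skipped; `combine_tickers` is a hypothetical helper, not something from `get_methods`:

```python
import math
from typing import Iterable, List


def combine_tickers(*ticker_lists: Iterable[object]) -> List[str]:
    """Merge ticker lists into one sorted list of unique, non-empty symbols.

    Hypothetical helper: assumes symbols are strings and that NaN/None or
    blank entries (common in CSV columns) should be dropped.
    """
    seen = set()
    for tickers in ticker_lists:
        for symbol in tickers:
            # skip missing values that pandas reads in as NaN or None
            if symbol is None or (isinstance(symbol, float) and math.isnan(symbol)):
                continue
            # normalize whitespace and case so "goog " and "GOOG" collapse together
            cleaned = str(symbol).strip().upper()
            if cleaned:
                seen.add(cleaned)
    return sorted(seen)
```

With the lists above, `combine_tickers(nyse_tickers, nasdaq_tickers)` would return a single de-duplicated list to loop over.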

---
## Extract Stock Pricing Information by Ticker using pandas-datareader and Yahoo! Finance

To begin, let's pull 10 years' worth of stock pricing data, from July 19, 2013 to July 19, 2023:

```python
start = datetime.datetime(2013, 7, 19)
end = datetime.datetime(2023, 7, 19)
```

Let's test a single ticker to understand the response and to work out how we can:
- make a reusable function to capture ticker prices
- create a PostgreSQL database to store this information
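A reusable price-capture function mostly needs to loop over tickers and keep going when a symbol returns nothing. A minimal sketch, assuming the download step is passed in as a callable (for example a wrapper around `web.get_data_yahoo` after `yf.pdr_override()` has been called) and that either an exception or an empty DataFrame means "no data" for that ticker; `fetch_all_prices` and its parameters are hypothetical names:

```python
from typing import Callable, Dict, Iterable, List, Tuple

import pandas as pd

# Assumed downloader signature: (ticker, start, end) -> DataFrame
Fetcher = Callable[[str, object, object], pd.DataFrame]


def fetch_all_prices(
    tickers: Iterable[str], fetch: Fetcher, start, end
) -> Tuple[Dict[str, pd.DataFrame], List[str]]:
    """Download prices for each ticker, collecting failures instead of stopping.

    Returns (frames keyed by ticker, list of tickers that produced no data).
    """
    frames: Dict[str, pd.DataFrame] = {}
    failed: List[str] = []
    for ticker in tickers:
        try:
            df = fetch(ticker, start, end)
        except Exception:
            # network errors, delisted symbols, etc.: record the ticker and move on
            failed.append(ticker)
            continue
        if df is None or df.empty:
            # an empty frame is treated as "no data" for this ticker
            failed.append(ticker)
        else:
            frames[ticker] = df
    return frames, failed
```

In the notebook this might be called as `fetch_all_prices(nyse_tickers, lambda t, s, e: web.get_data_yahoo(t, start=s, end=e), start, end)`. Keeping the downloader as a parameter also lets the loop be tested without network access.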

```python
# Have pandas-datareader's get_data_yahoo() fetch its data through yfinance
yf.pdr_override()
```

```python
test = web.get_data_yahoo('GOOG', start=start, end=end)
```


```python
test.head()
```

| Date       | Open      | High      | Low       | Close     | Adj Close | Volume    |
|------------|-----------|-----------|-----------|-----------|-----------|-----------|
| 2013-07-19 | 22.082479 | 22.489454 | 21.808506 | 22.331297 | 22.331297 | 295475379 |
| 2013-07-22 | 22.465794 | 22.731546 | 22.341259 | 22.68248  | 22.68248  | 116563276 |
| 2013-07-23 | 22.68248  | 22.739765 | 22.40527  | 22.510626 | 22.510626 | 82134711  |
| 2013-07-24 | 22.596802 | 22.672518 | 22.433414 | 22.488209 | 22.488209 | 83451629  |
| 2013-07-25 | 22.263302 | 22.337523 | 22.069279 | 22.109629 | 22.109629 | 120493954 |


Because the returned table does not include the ticker symbol, and because we will be pulling a considerable amount of data for ~2,700 tickers, it is probably best to store this information in a PostgreSQL database hosted on AWS, which saves local memory and improves performance.
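Since the ticker is missing from the downloaded table, each frame needs to be tagged before it goes into the database. A minimal sketch that adds the symbol, moves the `Date` index into a regular column, and renames the columns to the snake_case names planned for the Price table. It assumes the column labels match the `test.head()` output above and that all seven columns are present; `normalize_prices` and `COLUMN_MAP` are hypothetical names:

```python
import pandas as pd

# Assumed mapping from downloaded column labels to the planned Price-table columns
COLUMN_MAP = {
    "Date": "date",
    "Open": "open",
    "High": "high",
    "Low": "low",
    "Close": "close",
    "Adj Close": "adj_close",
    "Volume": "volume",
}


def normalize_prices(df: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Return a copy of a downloaded price frame tagged with its ticker symbol."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # some downloads return (field, ticker) column pairs; keep only the field name
        out.columns = out.columns.get_level_values(0)
    out = out.reset_index()  # the Date index becomes a regular column
    out = out.rename(columns=COLUMN_MAP)
    out.insert(0, "ticker_symbol", ticker)
    # fixed column order; raises KeyError if an expected column is missing
    return out[["ticker_symbol"] + list(COLUMN_MAP.values())]
```

For the single-ticker test, `normalize_prices(test, 'GOOG')` would produce rows that already carry the symbol the database needs.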

To help performance while still keeping records of multiple stocks and their associated time series data, we will set up the database with two initial tables:
- Ticker Table
    - Columns: 'ticker_id' (Primary key), 'ticker_symbol'
    - This table will store information about each stock ticker
    - Index on 'ticker_symbol' to speed up searches on specific stocks
- Price Table
    - Columns: 'price_id' (Primary key), 'ticker_id' (Foreign key referencing the Ticker table), 'date', 'open', 'high', 'low', 'close', 'adj_close', 'volume'
    - This table will store the stock price data for each ticker
    - Potentially create an index on both the 'ticker_id' and 'date' column for efficient time-based queries 
    
Other considerations:
- 'price_id' and 'ticker_id' must be unique to prevent collisions between records. To start, we will declare them as SERIAL primary keys: PostgreSQL generates each id from a sequence, and the primary-key constraint enforces uniqueness.
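The two-table design and SERIAL ids above could translate into PostgreSQL DDL roughly as follows. This is a sketch, not a final schema: the table names `ticker` and `price`, the column types, the `UNIQUE` on `ticker_symbol` (which PostgreSQL backs with an index, covering the symbol lookup), and the `UNIQUE (ticker_id, date)` constraint (which likewise creates the composite ticker/date index mentioned above and blocks duplicate daily rows) are all assumptions. The statements are kept as Python strings so they can be run through any DB-API connection, such as one from psycopg2; `SCHEMA_STATEMENTS` and `create_schema` are hypothetical names:

```python
# Hypothetical schema: table names, column types, and constraints are assumptions
SCHEMA_STATEMENTS = (
    """
    CREATE TABLE IF NOT EXISTS ticker (
        ticker_id     SERIAL PRIMARY KEY,
        ticker_symbol VARCHAR(16) NOT NULL UNIQUE  -- UNIQUE also indexes the symbol
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS price (
        price_id  SERIAL PRIMARY KEY,
        ticker_id INTEGER NOT NULL REFERENCES ticker (ticker_id),
        date      DATE NOT NULL,
        open      DOUBLE PRECISION,
        high      DOUBLE PRECISION,
        low       DOUBLE PRECISION,
        close     DOUBLE PRECISION,
        adj_close DOUBLE PRECISION,
        volume    BIGINT,
        UNIQUE (ticker_id, date)  -- one row per ticker per day; backs the composite index
    )
    """,
)


def create_schema(conn) -> None:
    """Run the DDL on a DB-API connection (e.g. psycopg2) and commit."""
    cur = conn.cursor()
    try:
        # ticker must be created first because price references it
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
    finally:
        cur.close()
    conn.commit()
```

BIGINT is chosen for `volume` as a precaution, since daily share volumes for heavily traded stocks can exceed the range of a 32-bit INTEGER.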

### Next Steps:

1. Create our PostgreSQL Database
2. Create the Ticker Table
3. Load Ticker Information
4. Create the Price Table
5. Load Price Data
6. Create an Amazon RDS Instance
7. Connect to the Amazon RDS Instance
8. Import the schema
9. Load the Data
10. Verify the data
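Loading price data and connecting to the RDS instance (steps 5, 7, and 9) can be sketched with pandas' `DataFrame.to_sql`, which appends a frame to an existing table through an SQLAlchemy engine or, for local testing, a `sqlite3` connection. The sketch assumes the connection settings live in the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`), that the psycopg2 driver is installed, and that `ticker_ids` is a symbol-to-`ticker_id` mapping read back from the Ticker table; `postgres_url` and `append_prices` are hypothetical helpers:

```python
import os
from typing import Mapping
from urllib.parse import quote_plus

import pandas as pd


def postgres_url(env: Mapping[str, str] = os.environ) -> str:
    """Build an SQLAlchemy URL for the RDS instance from libpq-style variables."""
    user = quote_plus(env["PGUSER"])
    password = quote_plus(env["PGPASSWORD"])  # escape characters such as @ or /
    host = env["PGHOST"]  # e.g. the RDS endpoint hostname
    port = env.get("PGPORT", "5432")
    dbname = env["PGDATABASE"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{dbname}"


def append_prices(frame: pd.DataFrame, ticker_ids: Mapping[str, int], con) -> int:
    """Swap ticker symbols for ticker_id values and append rows to the price table.

    Assumes `frame` has a ticker_symbol column plus Price-table column names.
    """
    rows = frame.copy()
    rows["ticker_id"] = rows["ticker_symbol"].map(ticker_ids)
    if rows["ticker_id"].isna().any():
        raise KeyError("some symbols are missing from the ticker table")
    rows = rows.drop(columns="ticker_symbol")
    # price_id is omitted so the database fills it from its sequence
    rows.to_sql("price", con, if_exists="append", index=False)
    return len(rows)
```

Against RDS this would look like `engine = sqlalchemy.create_engine(postgres_url())` followed by `append_prices(frame, ticker_ids, engine)`; a quick row count per ticker afterwards is one simple way to verify the load.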