# Enriching Stock Market Data to Make Predictions Via OpenAI's API

## Introduction
In this project, we use OpenAI's ChatGPT together with data analytics to explore the NASDAQ 100 index. By the end, we should be able to identify sector trends and candidate investment opportunities from year-to-date stock price changes and AI-generated recommendations.

## Data Files Overview
Our analysis is grounded in two primary data sources:

- NASDAQ 100 List (`nasdaq100.csv`): This file lists the companies in the NASDAQ 100 index, with fields such as ticker symbol, company name, headquarters, CIK, and founding date.

- NASDAQ 100 Price Change (`nasdaq100_price_change.csv`): Complementing our company list, this file tracks the year-to-date (YTD) price change for each company. It lets us assess performance trends and identify standout performers within the index.


## Setup and Initialization
In this section, we set up our environment. This involves two steps:

1. **Importing Necessary Libraries**: We import the Python libraries needed for environment access (`os`), data manipulation (`pandas`), and interaction with the OpenAI API (`openai`).

2. **API Key Initialization**: We initialize the OpenAI API key, which is required for authenticating our requests to the OpenAI API. This key allows us to use OpenAI's models to generate stock recommendations based on our analysis.

By completing these steps, we ensure that our environment has the tools and permissions needed for the data analysis and for integration with OpenAI's services.

- *Note*: See README.md for help setting up environment variables to use your own OpenAI API key.

In [1]:
import os
import pandas as pd
import openai

api_key = os.getenv('stock_market_project')
# print(api_key)


In [2]:
# Initialize your API key
openai.api_key = api_key
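
If the environment variable is missing, `os.getenv` returns `None` and the problem only surfaces at the first API call. A minimal sketch of failing early instead is shown below; `require_env` is a hypothetical helper, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; see README.md")
    return value


# Usage with the variable name from the cell above (left commented so the
# sketch runs without a configured key):
# api_key = require_env("stock_market_project")
```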

## Data Loading and Merging

In this section, we focus on loading and preparing our datasets for analysis:

1. **Loading Datasets**: We start by reading in two crucial datasets: nasdaq100.csv, which contains information about companies listed in the NASDAQ 100 index, and nasdaq100_price_change.csv, which details the year-to-date (YTD) price changes of these companies.

2. **Merging Data**: To enrich our analysis, we merge these datasets on the symbol column, ensuring that we have both the company information and their respective YTD price changes in a single dataset. This step is vital for a holistic analysis as it combines static company information with dynamic financial performance indicators.

3. **Data Preview**: Finally, we preview the combined dataset to ensure the merge was successful and to get an initial sense of the data structure and contents.

In [3]:
# Read in the two datasets
nasdaq100 = pd.read_csv("nasdaq100.csv")
price_change = pd.read_csv("nasdaq100_price_change.csv")

# Add the YTD price change (ytd) to nasdaq100 by merging on symbol
nasdaq100 = nasdaq100.merge(price_change[["symbol", "ytd"]], on="symbol", how="inner")

# Preview the combined dataset
nasdaq100.head()

Unnamed: 0,symbol,name,headQuarter,dateFirstAdded,cik,founded,ytd
0,AAPL,Apple Inc.,"Cupertino, CA",,320193,1976-04-01,42.99992
1,ABNB,Airbnb,"San Francisco, CA",,1559720,2008-08-01,68.66902
2,ADBE,Adobe Inc.,"San Jose, CA",,796343,1982-12-01,57.22723
3,ADI,Analog Devices,"Wilmington, MA",,6281,1965-01-01,17.02062
4,ADP,ADP,"Roseland, NJ",,8670,1949-01-01,5.53732
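
An inner merge silently drops any company that has no matching row in the price file. As a rough sketch of how one might check for that, the example below uses small stand-in frames (the `ZZZZ` row is made up for illustration) with a left merge plus the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Tiny stand-in frames; the notebook reads the real ones from the two CSV files.
companies = pd.DataFrame(
    {"symbol": ["AAPL", "ABNB", "ZZZZ"],  # "ZZZZ" is a made-up symbol
     "name": ["Apple Inc.", "Airbnb", "Hypothetical Co."]}
)
prices = pd.DataFrame({"symbol": ["AAPL", "ABNB"], "ytd": [42.99992, 68.66902]})

# validate="one_to_one" raises if a symbol is duplicated on either side;
# indicator=True adds a "_merge" column recording where each row came from.
merged = companies.merge(
    prices[["symbol", "ytd"]],
    on="symbol",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Symbols present in the company list but missing from the price data
unmatched = merged.loc[merged["_merge"] == "left_only", "symbol"].tolist()
print(unmatched)
```

If `unmatched` is empty for the real files, the inner merge used in the notebook loses no companies.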


## Sector Classification via AI
This step adds a 'Sector' column to the dataset, using an AI model for classification:

1. **Check and Add 'Sector'**: We make sure a 'Sector' column exists, adding it if it is absent.
   
2. **AI-Powered Classification**: For each company, we build a prompt from its ticker symbol and ask the model to pick one sector from a fixed list. The answer is written back to the dataset, which makes sector-based analysis possible.

In [4]:
# Add a new column for Sector if it doesn't exist
if "Sector" not in nasdaq100.columns:
    nasdaq100["Sector"] = pd.NA
    
# Loop through the NASDAQ companies
for company in nasdaq100["symbol"]:
    # Dynamically create a prompt for each company
    dynamic_prompt = f'''Classify company {company} into one of the following sectors. Answer only with the sector name: Technology, Consumer Cyclical, Industrials, Utilities, Healthcare, Communication, Energy, Consumer Defensive, Real Estate, Financial.'''

    # Create a response from ChatGPT (openai.ChatCompletion is the pre-1.0 openai SDK interface)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": dynamic_prompt}],
        temperature=0.0,
    )
    # Extract the sector from the response
    sector = response['choices'][0]['message']['content'].strip()
    
    # Add the sector to the corresponding company in the DataFrame
    nasdaq100.loc[nasdaq100["symbol"] == company, "Sector"] = sector
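
The loop above stores whatever text the model returns, even if it falls outside the list given in the prompt. One possible way to check the reply is sketched below; `ALLOWED_SECTORS` and `normalize_sector` are hypothetical names introduced for illustration, and the sector list is copied from the prompt.

```python
from typing import Optional

# Sector names exactly as listed in the classification prompt
ALLOWED_SECTORS = [
    "Technology", "Consumer Cyclical", "Industrials", "Utilities", "Healthcare",
    "Communication", "Energy", "Consumer Defensive", "Real Estate", "Financial",
]


def normalize_sector(raw: str) -> Optional[str]:
    """Map a model reply to one of the allowed sector names, or None if it does not match."""
    cleaned = raw.strip().strip(".").casefold()
    for sector in ALLOWED_SECTORS:
        if cleaned == sector.casefold():
            return sector
    return None


print(normalize_sector(" technology.\n"))
print(normalize_sector("Semiconductors"))
```

A `None` result could be stored as `pd.NA` so that unclassified companies are easy to spot and retry.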


In [5]:
# Build one sentence per company, then join them into a single string for the prompt
company_data_list = nasdaq100.apply(lambda x: f"{x['symbol']} in the {x['Sector']} sector has a YTD performance of {x['ytd']}", axis=1).tolist()
company_data = " ".join(company_data_list)
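
The sentences above are joined with spaces and carry full-precision YTD values. As an alternative sketch, the hypothetical helper below puts one company per line and rounds the values, which may make the prompt shorter and easier to read. The sector labels in the sample rows are placeholders, not model output.

```python
def format_company_lines(rows, decimals=2):
    """Render one 'SYMBOL | Sector | YTD value' line per row dictionary."""
    return "\n".join(
        f"{row['symbol']} | {row['Sector']} | YTD {row['ytd']:.{decimals}f}"
        for row in rows
    )


# Sample rows: symbols and YTD values from the preview, placeholder sectors
sample_rows = [
    {"symbol": "AAPL", "Sector": "Technology", "ytd": 42.99992},
    {"symbol": "ADP", "Sector": "Industrials", "ytd": 5.53732},
]
print(format_company_lines(sample_rows))
```

With the notebook's DataFrame, the rows could be obtained with `nasdaq100.to_dict("records")`.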


## Generating Stock Recommendations with AI
We craft a prompt to guide AI in offering stock investment insights:

1. **Prompt Creation**: We formulate a prompt with company_data for the OpenAI API, requesting the top sectors and companies based on YTD performance.

2. **Model Interaction**: The prompt is sent to OpenAI's model, fetching tailored stock recommendations.

This step links the enriched dataset directly to the model's recommendations, which are based only on the sector labels and YTD figures included in the prompt.

In [6]:
# Create a prompt for stock recommendations including the dynamically created company_data
recommendation_prompt = f'''Based on the following Nasdaq-100 company data, recommend the three best sectors and three or more companies per sector for stock performance year to date (YTD):
{company_data}
'''

# Get the model response
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": recommendation_prompt}],
    temperature=0.0,
)

# Extract and print the stock recommendations
stock_recommendations = response['choices'][0]['message']['content']
print(stock_recommendations)


Based on the YTD performance data provided, the three best sectors for stock performance are:

1. Technology sector:
   - NVDA (YTD performance: 217.26511)
   - AMD (YTD performance: 82.45861)
   - MRVL (YTD performance: 76.57683)
   - ABNB (YTD performance: 68.66902)
   - LRCX (YTD performance: 70.25344)

2. Healthcare sector:
   - ALGN (YTD performance: 70.17618)
   - MRVL (YTD performance: 76.57683)
   - SGEN (YTD performance: 50.4662)
   - IDXX (YTD performance: 25.20381)
   - ISRG (YTD performance: 16.41757)

3. Consumer Cyclical sector:
   - TSLA (YTD performance: 132.6087)
   - MELI (YTD performance: 64.48294)
   - BKNG (YTD performance: 59.58046)
   - MAR (YTD performance: 38.54484)
   - LULU (YTD performance: 18.2819)

Please note that these recommendations are based solely on the YTD performance data provided and do not take into account other factors such as financial health, market conditions, or future prospects of the companies. It is important to conduct thorough researc
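
Because the recommendation comes from a language model, it can help to compare it with figures computed directly from the data. The sketch below shows one way to do that with pandas on a made-up sample; the symbols and numbers are illustrative only, and the same calls should apply to `nasdaq100` once its 'Sector' column is filled.

```python
import pandas as pd

# Made-up sample with the same column names as the notebook's DataFrame
sample = pd.DataFrame(
    {
        "symbol": ["AAA", "BBB", "CCC", "DDD"],
        "Sector": ["Technology", "Technology", "Healthcare", "Healthcare"],
        "ytd": [10.0, 30.0, 5.0, 15.0],
    }
)

# Mean YTD change per sector, best sector first
sector_mean = sample.groupby("Sector")["ytd"].mean().sort_values(ascending=False)

# Best-performing company within each sector
top_per_sector = sample.sort_values("ytd", ascending=False).groupby("Sector").head(1)

print(sector_mean)
print(top_per_sector[["Sector", "symbol", "ytd"]])
```

The mean is only one possible ranking criterion; a median or a count of companies above some threshold would be equally reasonable choices.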

## Conclusion
The model's response names Technology, Healthcare, and Consumer Cyclical as the three best sectors by year-to-date (YTD) performance. Within these sectors, NVDA leads Technology (217.27), TSLA stands out in Consumer Cyclical (132.61), and ALGN is the strongest Healthcare name (70.18). The response also lists MRVL under both Technology and Healthcare; Marvell Technology is a semiconductor company, so its Healthcare listing is an error in the model output rather than a finding. These results highlight sector strengths and potential investment opportunities, but they rest on YTD performance and AI-generated sector labels alone, so comprehensive research beyond YTD performance remains essential for informed decision-making.