## Enriching stock market data using the OpenAI API

<p align="center">
    <img src="images/nasdaq100.png" width="450">
</p>

The Nasdaq-100 is a stock market index made up of 101 equity securities issued by 100 of the largest non-financial companies listed on the Nasdaq stock exchange. It helps investors compare stock prices with previous prices to determine market performance.

In this project you are provided with two CSV files containing Nasdaq-100 stock information:
- _**nasdaq100_CA.csv**_: contains information about companies in the index such as symbol, name, etc. For this analysis, only companies headquartered in California have been selected.
- _**nasdaq100_price_change.csv**_: contains price changes per stock across periods including (but not limited to) one day, five days, one month, six months, one year, etc.

As an AI developer, you will use the OpenAI API to classify the California-headquartered companies in the index into sectors, and then summarize this year's performance by sector and by company.

# CSV files with Nasdaq-100 stock data

Two CSV files are available for this project: `nasdaq100_CA.csv` and `nasdaq100_price_change.csv`.

## nasdaq100_CA.csv

```csv
symbol,name,headQuarter,dateFirstAdded,cik,founded
AAPL,Apple Inc.,"Cupertino, CA",,0000320193,1976-04-01
ABNB,Airbnb,"San Francisco, CA",,0001559720,2008-08-01
ADBE,Adobe Inc.,"San Jose, CA",,0000796343,1982-12-01
...
```
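
The `cik` column holds zero-padded SEC Central Index Keys. By default `pd.read_csv` infers it as an integer column, so `0000320193` becomes `320193`, and that stripped value is what gets written to `nasdaq_ytd.csv` later. The analysis never uses `cik`, but if you want to keep it intact, a minimal sketch (assuming pandas is installed, and inlining the three sample rows above instead of reading the real file) is to pass an explicit `dtype`:

```python
import io

import pandas as pd

# The three sample rows shown above, inlined so the sketch runs without the file.
SAMPLE_CA = """symbol,name,headQuarter,dateFirstAdded,cik,founded
AAPL,Apple Inc.,"Cupertino, CA",,0000320193,1976-04-01
ABNB,Airbnb,"San Francisco, CA",,0001559720,2008-08-01
ADBE,Adobe Inc.,"San Jose, CA",,0000796343,1982-12-01
"""

# Read cik as a string so its leading zeros survive, and parse founded as a date.
# The quoted headQuarter values ("Cupertino, CA") keep their embedded comma.
ca = pd.read_csv(
    io.StringIO(SAMPLE_CA),
    dtype={"cik": str},
    parse_dates=["founded"],
)

print(ca.dtypes)
print(ca[["symbol", "headQuarter", "cik"]])
```

The same `dtype={"cik": str}` argument applies when reading `nasdaq100_CA.csv` or `nasdaq_ytd.csv` from disk.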

## nasdaq100_price_change.csv

```csv
symbol,1D,5D,1M,3M,6M,ytd,1Y,3Y,5Y,10Y,max
AAPL,-1.7254,-8.30086,-6.20411,3.042,15.64824,42.99992,8.47941,60.96299,245.42031,976.99441,139245.53954
ABNB,2.1617,-2.21919,9.88336,19.43286,19.64241,68.66902,23.64013,-1.04347,-1.04347,-1.04347,-1.04347
ADBE,0.5409,-1.77817,9.16191,52.0465,38.01522,57.22723,21.96206,17.83037,109.05718,1024.69214,251030.66399
ADI,0.9291,-4.03352,2.58486,3.65887,5.01602,17.02062,8.09735,63.42847,92.81874,286.77518,26012.63736
...
```
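
Only the `ytd` column of the price file is needed. A left merge on `symbol` keeps exactly the California companies and drops symbols that appear only in the price file, such as `ADI`, which is not in the California sample. The sketch below uses trimmed copies of the sample rows above (an assumption; the real files have more rows and columns) and adds two optional `DataFrame.merge` safeguards: `validate="one_to_one"` raises a `MergeError` if a symbol is duplicated in either frame, and `indicator=True` adds a `_merge` column that flags California companies with no price data:

```python
import io

import pandas as pd

# Trimmed sample rows from the two files above (only the columns the merge needs).
CA_ROWS = """symbol,name
AAPL,Apple Inc.
ABNB,Airbnb
ADBE,Adobe Inc.
"""
PRICE_ROWS = """symbol,1D,ytd
AAPL,-1.7254,42.99992
ABNB,2.1617,68.66902
ADBE,0.5409,57.22723
ADI,0.9291,17.02062
"""

ca = pd.read_csv(io.StringIO(CA_ROWS))
prices = pd.read_csv(io.StringIO(PRICE_ROWS))

# Left join: every California company is kept; ADI has no left-side match and is dropped.
# validate= raises if a symbol is duplicated; indicator= adds a _merge column.
merged = ca.merge(
    prices[["symbol", "ytd"]],
    on="symbol",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Rows marked "left_only" are California companies missing from the price file.
missing = merged.loc[merged["_merge"] == "left_only", "symbol"].tolist()
print(merged.drop(columns="_merge"))
print("Symbols without ytd:", missing)
```

An empty `missing` list confirms that every California company received a `ytd` value.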

```python
# --- Imports and OpenAI client setup ---
import pandas as pd
from openai import OpenAI

# Instantiate an API client (it reads the OPENAI_API_KEY environment variable)
client = OpenAI()

# Read the CSV files into DataFrames
nasdaq100_ca = pd.read_csv("nasdaq100_CA.csv")
nasdaq100_price_change = pd.read_csv("nasdaq100_price_change.csv")

# Add the YTD performance to nasdaq100_ca:
# keep only the columns we need from the price change file
price_ytd = nasdaq100_price_change[["symbol", "ytd"]]

# Left merge on the stock symbol, so every California company is kept
nasdaq100_ca = nasdaq100_ca.merge(price_ytd, on="symbol", how="left")

# Check
# nasdaq100_ca.head()

# Save the result to nasdaq_ytd.csv, as the project instructions mention
nasdaq100_ca.to_csv("nasdaq_ytd.csv", index=False)
```



```python
import pandas as pd
from openai import OpenAI

# Reuse the API client from Cell 1, or create one if this cell runs on its own
try:
    client
except NameError:
    client = OpenAI()

# Read the file that already has the ytd column
# (created in Cell 1 and saved as nasdaq_ytd.csv)
nasdaq100_ca = pd.read_csv("nasdaq_ytd.csv")

# Keep a separate alias 'nasdaq' in case the project text refers to it
nasdaq = nasdaq100_ca.copy()

# Print the columns to help debug a missing headquarters column
print("Columns in nasdaq_ytd.csv:", list(nasdaq.columns))

# Detect the headquarters column case-insensitively (the file uses "headQuarter")
possible_hq_cols = [col for col in nasdaq.columns if "headquarter" in col.lower()]
if possible_hq_cols:
    hq_col = possible_hq_cols[0]
else:
    raise ValueError(
        "Could not find a headquarters column in nasdaq_ytd.csv. "
        f"Columns found: {list(nasdaq.columns)}"
    )

# ---- Sector list ----
sectors = [
    "Technology",
    "Consumer Cyclical",
    "Industrials",
    "Utilities",
    "Healthcare",
    "Communication",
    "Energy",
    "Consumer Defensive",
    "Real Estate",
    "Financial",
]

def classify_sector(company_name: str, headquarters: str) -> str:
    """
    Use OpenAI to classify a company into one of the allowed sectors.
    Returns the model's reply, which the prompt asks to be only the sector name.
    """
    prompt = f"""
You are a financial analyst. Classify the company described below
into exactly ONE of the following sectors:

{', '.join(sectors)}

Return only the sector name, nothing else.

Company name: {company_name}
Headquarters: {headquarters}
"""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Apply the classifier to every row in the DataFrame (one API call per company)
nasdaq["sector"] = nasdaq.apply(
    lambda row: classify_sector(row["name"], row[hq_col]),
    axis=1,
)

# Copy the sector column back into nasdaq100_ca so the grader finds it there
nasdaq100_ca["sector"] = nasdaq["sector"]

# Check how many companies are in each sector
# nasdaq["sector"].value_counts()
```

```
Columns in nasdaq_ytd.csv: ['symbol', 'name', 'headQuarter', 'dateFirstAdded', 'cik', 'founded', 'ytd']
```
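
`classify_sector` returns whatever text the model sends back. The prompt and `temperature=0` make a bare sector name likely, but nothing enforces it: a reply such as `Technology.` or `Communication Services` would show up as an extra category in `value_counts()`. A minimal post-processing sketch, using a hypothetical `normalize_sector` helper (not part of the project or of the OpenAI library) that maps replies onto the allowed list and falls back to `"Unknown"`:

```python
# Allowed labels, copied from the sectors list in the cell above.
SECTORS = (
    "Technology", "Consumer Cyclical", "Industrials", "Utilities", "Healthcare",
    "Communication", "Energy", "Consumer Defensive", "Real Estate", "Financial",
)


def normalize_sector(raw: str, allowed: tuple[str, ...] = SECTORS,
                     fallback: str = "Unknown") -> str:
    """Map a free-text model reply onto one of the allowed sector names.

    Hypothetical helper: matching ignores case, surrounding whitespace,
    quotes and trailing periods, and returns `fallback` unless the reply
    names exactly one allowed sector.
    """
    cleaned = raw.strip().strip("\"'`.").strip().lower()
    # Exact match first, e.g. "technology" -> "Technology".
    for sector in allowed:
        if cleaned == sector.lower():
            return sector
    # Otherwise accept a reply that mentions exactly one allowed sector,
    # e.g. "Sector: Healthcare" or "Communication Services".
    hits = [s for s in allowed if s.lower() in cleaned]
    return hits[0] if len(hits) == 1 else fallback


print(normalize_sector(" technology.\n"))
print(normalize_sector("Technology or Communication"))
```

Applied after classification, for example `nasdaq["sector"] = nasdaq["sector"].map(normalize_sector)`, any remaining `"Unknown"` rows point to companies worth checking by hand.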


```python
import pandas as pd
from openai import OpenAI

# Reuse the API client from Cell 1, or create one if this cell runs on its own
try:
    client
except NameError:
    client = OpenAI()

# Make sure nasdaq100_ca is in memory
try:
    nasdaq100_ca
except NameError:
    nasdaq100_ca = pd.read_csv("nasdaq_ytd.csv")

# The sector column only exists after Cell 2 has run; fail early with a clear message
if "sector" not in nasdaq100_ca.columns:
    raise KeyError("nasdaq100_ca has no 'sector' column; run Cell 2 first.")

# Build the performance DataFrame from nasdaq100_ca
perf_df = nasdaq100_ca[["symbol", "name", "sector", "ytd"]].copy()

# Turn the performance data into CSV text for the model
perf_text = perf_df.to_csv(index=False)

recommendation_prompt = f"""
You are an expert equity analyst.

Below is data for Nasdaq-100 companies headquartered in California.
Columns: symbol, company name, sector, and year-to-date percentage return (ytd).

Using this data:
1. Give a brief overall summary of how these stocks have performed year-to-date.
2. Identify the best-performing sectors.
3. For each of the top sectors, recommend a few (2–3) attractive companies
   based on their ytd performance.

Be concise but specific. Mention sectors and stock symbols in your answer.

Data:
{perf_text}
"""

rec_response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": recommendation_prompt}],
    temperature=0.2,
)

stock_recommendations = rec_response.choices[0].message.content.strip()

# print(stock_recommendations)
```
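
The recommendation prompt asks the model to rank sectors from raw CSV text. Computing that ranking in pandas first gives figures to check the model's summary against, or to include in the prompt. A minimal sketch with two hypothetical helpers, `sector_summary` and `top_companies`. It assumes an equal-weighted mean of `ytd` is an acceptable measure of sector performance (market-cap weighting would need data the CSV files do not contain), and the sector labels in the example input are illustrative assumptions, not model output:

```python
import pandas as pd


def sector_summary(perf: pd.DataFrame) -> pd.DataFrame:
    """Rank sectors by equal-weighted mean year-to-date return (hypothetical helper)."""
    return (
        perf.groupby("sector")["ytd"]
        .agg(mean_ytd="mean", n_companies="count", best_ytd="max")
        .sort_values("mean_ytd", ascending=False)
        .round(2)
    )


def top_companies(perf: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n highest-ytd companies within each sector (hypothetical helper)."""
    return (
        perf.sort_values("ytd", ascending=False)
        .groupby("sector", sort=False)
        .head(n)  # keeps the sorted order, at most n rows per sector
        .reset_index(drop=True)
    )


# Illustrative input: ytd values from the sample rows above,
# with sector labels assumed for the example.
perf = pd.DataFrame(
    {
        "symbol": ["AAPL", "ABNB", "ADBE"],
        "sector": ["Technology", "Consumer Cyclical", "Technology"],
        "ytd": [42.99992, 68.66902, 57.22723],
    }
)

print(sector_summary(perf))
print(top_companies(perf, n=2))
```

With the real data, call `sector_summary(perf_df)` and `top_companies(perf_df)` on the `perf_df` built in the cell above.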
