## Enriching stock market data using Open AI API 

<p align="center">
    <img src="images/nasdaq100.png" width="450">
</p>

The Nasdaq-100 is a stock market index made up of 101 equity securities issued by 100 of the largest non-financial companies listed on the Nasdaq stock exchange. It helps investors compare stock prices with previous prices to determine market performance.

In this project you are provided with two CSV files containing Nasdaq-100 stock information:
- _**nasdaq100.csv**_: contains information about companies in the index such as symbol, name, etc.
- _n**asdaq100_price_change.csv**_: contains price changes per stock across periods including (but not limited to) one day, five days, one month, six months, one year, etc.

As an AI developer, you will leverage the OpenAI API to classify companies into sectors and produce a summary of sector and company performance for this year.

# CSV with Nasdaq-100 stock data

In this project, you have available two CSV files `nasdaq100.csv` and `nasdaq100_price_change.csv`.

## nasdaq100.csv

```py
symbol,name,headQuarter,dateFirstAdded,cik,founded
AAPL,Apple Inc.,"Cupertino, CA",,0000320193,1976-04-01
ABNB,Airbnb,"San Francisco, CA",,0001559720,2008-08-01
ADBE,Adobe Inc.,"San Jose, CA",,0000796343,1982-12-01
ADI,Analog Devices,"Wilmington, MA",,0000006281,1965-01-01
...
```

## nasdaq100_price_change.csv

```py
symbol,1D,5D,1M,3M,6M,ytd,1Y,3Y,5Y,10Y,max
AAPL,-1.7254,-8.30086,-6.20411,3.042,15.64824,42.99992,8.47941,60.96299,245.42031,976.99441,139245.53954
ABNB,2.1617,-2.21919,9.88336,19.43286,19.64241,68.66902,23.64013,-1.04347,-1.04347,-1.04347,-1.04347
ADBE,0.5409,-1.77817,9.16191,52.0465,38.01522,57.22723,21.96206,17.83037,109.05718,1024.69214,251030.66399
ADI,0.9291,-4.03352,2.58486,3.65887,5.01602,17.02062,8.09735,63.42847,92.81874,286.77518,26012.63736
...
```

In [29]:
import os
import pandas as pd
from openai import OpenAI

# Define the model to use
model = "gemini-1.5-flash"

# Define the client
client = OpenAI(api_key=os.environ["GEMINI_API_KEY"],  base_url="https://generativelanguage.googleapis.com/v1beta/openai/")



In [30]:
# Read in the two datasets
nasdaq100 = pd.read_csv("nasdaq100.csv")
price_change = pd.read_csv("nasdaq100_price_change.csv")

# Add symbol into nasdaq100
nasdaq100 = nasdaq100.merge(price_change[["symbol", "ytd"]], on="symbol", how="inner")

# Preview the combined dataset
nasdaq100.head()

Unnamed: 0,symbol,name,headQuarter,dateFirstAdded,cik,founded,ytd
0,AAPL,Apple Inc.,"Cupertino, CA",,320193,1976-04-01,42.99992
1,ABNB,Airbnb,"San Francisco, CA",,1559720,2008-08-01,68.66902
2,ADBE,Adobe Inc.,"San Jose, CA",,796343,1982-12-01,57.22723
3,ADI,Analog Devices,"Wilmington, MA",,6281,1965-01-01,17.02062
4,ADP,ADP,"Roseland, NJ",,8670,1949-01-01,5.53732


In [31]:
import time

# Define a function to process a batch of companies
def process_batch(companies):
    try:
        # Create a prompt for the batch of companies
        prompt = f'''Classify the following companies into one of the following sectors. Answer only with the sector name for each company, separated by commas: Technology, Consumer Cyclical, Industrials, Utilities, Healthcare, Communication, Energy, Consumer Defensive, Real Estate, Financial.
        Companies: {", ".join(companies)}
        '''
        
        # Make the API request
        response = client.chat.completions.create(
            model=model,
            messages=[{ "role": "user", "content": prompt}],
            temperature=0.0,
        )
        
        # Extract the sectors from the response
        sectors = response.choices[0].message.content.split(", ")
        
        # Add the sectors to the DataFrame
        for company, sector in zip(companies, sectors):
            nasdaq100.loc[nasdaq100["symbol"] == company, "Sector"] = sector
        
    except Exception as e:
        print(f"Error processing batch: {e}")

# Split the companies into batches (e.g., 10 companies per batch)
batch_size = 10
for i in range(0, len(nasdaq100["symbol"]), batch_size):
    batch = nasdaq100["symbol"][i:i + batch_size]
    process_batch(batch)
    
    # Add a delay between batches (e.g., 10 seconds)
    time.sleep(5)  # Adjust the delay based on the API's rate limits

In [32]:
# Count the number of sectors
nasdaq100["Sector"].value_counts()

Technology                36
Healthcare                13
Consumer Cyclical         12
Consumer Defensive        10
Communication              7
Technology\n               5
Industrials                5
Utilities                  3
Consumer Cyclical\n        2
Financial                  2
Energy\n                   1
Industrials\n              1
Consumer Discretionary     1
Financial\n                1
Healthcare\n               1
Energy                     1
Name: Sector, dtype: int64

In [33]:
# Prompt to get stock recommendations
prompt = f'''Provide summary information about Nasdaq-100 stock performance year to date (YTD), recommending the three best sectors and three or more companies per sector.
            Company data: {nasdaq100} 
'''

# Get the model response
response = client.chat.completions.create(
        model=model,
        messages=[{ "role": "user", "content": prompt}],
        temperature=0.0,
    )


In [34]:
# Store the output as a variable and print the recommendations
stock_recommendations = response.choices[0].message.content
print(stock_recommendations)

The provided data shows a mixed performance for Nasdaq-100 stocks year-to-date (YTD).  While a complete analysis requires more data (e.g., standard deviation, correlation), based solely on the YTD percentage change, we can identify some strong performing sectors and companies.  Note that this is a simplified analysis and doesn't account for risk or other important investment factors.

**Top 3 Performing Sectors (based on available data):**

It's difficult to definitively rank sectors without more comprehensive data and statistical analysis. However, based on the provided sample,  Consumer Cyclical and Technology show strong YTD performance, with some companies in Communication Services also showing positive growth.  Utilities show negative performance in the sample.

**Top Companies (by sector, based on YTD performance within the sample):**

This selection is based solely on the provided YTD percentage change and is not a recommendation.  Thorough research is crucial before making any 