# Predicting AAPL price momentum

1. Importing data from an API

In this section, we request one-minute AAPL price bars from Polygon.io, a stock market data provider, and store the result in a pandas DataFrame. The request is authenticated with a Polygon.io API key.

In [5]:
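import os
import pandas as pd
import requests

# Read the key from the environment (POLYGON_API_KEY is an assumed name) instead of hard-coding it.
api_key = os.environ["POLYGON_API_KEY"]
url = "https://api.polygon.io/v2/aggs/ticker/AAPL/range/1/minute/2025-01-09/2025-02-10"
response = requests.get(url, params={"apiKey": api_key}, timeout=30)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()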


In the code below, we convert the raw JSON response from Polygon into a DataFrame 'df', parse the millisecond timestamps, and rename the columns to readable names.

In [7]:
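# The bars are returned under the 'results' key; stop if it is missing.
if 'results' in data:
    df = pd.DataFrame(data['results'])
else:
    raise ValueError("No data obtained")

# 't' holds Unix timestamps in milliseconds.
df['t'] = pd.to_datetime(df['t'], unit='ms')
df = df.rename(columns={'t': 'timestamp', 'o': 'open', 'h': 'high', 'l': 'low', 'c': 'close', 'v': 'volume'})
# Keep only the columns used in the rest of the notebook.
df = df[['timestamp', 'open', 'high', 'low', 'close', 'volume']]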

df

,timestamp,open,high,low,close,volume
0,2025-01-10 09:00:00,243.4400,244.00,241.7000,242.2700,4980.0
1,2025-01-10 09:01:00,242.1500,242.15,241.8800,241.8800,1030.0
2,2025-01-10 09:02:00,242.0600,242.12,242.0600,242.1200,1191.0
3,2025-01-10 09:04:00,242.1700,242.17,242.1300,242.1300,644.0
4,2025-01-10 09:07:00,242.2800,242.28,242.2800,242.2800,133.0
...,...,...,...,...,...,...
4995,2025-01-21 18:50:00,220.8200,220.91,220.7800,220.9050,78447.0
4996,2025-01-21 18:51:00,220.8931,220.91,220.8000,220.8300,63273.0
4997,2025-01-21 18:52:00,220.8300,220.84,220.7600,220.7816,58101.0
4998,2025-01-21 18:53:00,220.7820,220.79,220.6805,220.7000,75219.0
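
The conversion steps can also be tried without calling the API. The sketch below runs them on a small hand-made payload. It assumes the response has the shape used above (a 'results' list whose items carry 't', 'o', 'h', 'l', 'c' and 'v' keys). The `sample` dictionary, its extra 'n' field and the `bars_to_frame` helper are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical payload that mimics the shape of the response used above.
# The extra 'n' field is made up to show that unused keys are dropped.
sample = {
    "results": [
        {"t": 1736499600000, "o": 243.44, "h": 244.00, "l": 241.70, "c": 242.27, "v": 4980.0, "n": 12},
        {"t": 1736499660000, "o": 242.15, "h": 242.15, "l": 241.88, "c": 241.88, "v": 1030.0, "n": 7},
    ]
}


def bars_to_frame(payload):
    """Turn a payload with a 'results' list into an OHLCV DataFrame."""
    if "results" not in payload:
        raise ValueError("No data obtained")
    frame = pd.DataFrame(payload["results"])
    # 't' is a Unix timestamp in milliseconds.
    frame["t"] = pd.to_datetime(frame["t"], unit="ms")
    frame = frame.rename(
        columns={"t": "timestamp", "o": "open", "h": "high", "l": "low", "c": "close", "v": "volume"}
    )
    return frame[["timestamp", "open", "high", "low", "close", "volume"]]


sample_df = bars_to_frame(sample)
print(sample_df)
```

Wrapping the steps in a function is a design choice made here so that the same logic can be reused on any payload of this shape, including one that lacks 'results'.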


2. Adding 'Mean' and 'Target' columns

In this section, we add 'mean' and 'target' columns to our DataFrame.

The mean column is the average of the open, high, low and close prices of each one-minute bar.

The target column records whether the mean price has risen relative to the previous row: '1' means it increased and '0' means it did not. The first row has no previous row to compare with, so its target is '0'.

In [14]:
df['mean'] = df[['open', 'high', 'low', 'close']].mean(axis=1)
df['target'] = (df['mean'].diff() > 0).astype(int)
df




,timestamp,open,high,low,close,volume,mean,target
0,2025-01-10 09:00:00,243.4400,244.00,241.7000,242.2700,4980.0,242.852500,0
1,2025-01-10 09:01:00,242.1500,242.15,241.8800,241.8800,1030.0,242.015000,0
2,2025-01-10 09:02:00,242.0600,242.12,242.0600,242.1200,1191.0,242.090000,1
3,2025-01-10 09:04:00,242.1700,242.17,242.1300,242.1300,644.0,242.150000,1
4,2025-01-10 09:07:00,242.2800,242.28,242.2800,242.2800,133.0,242.280000,1
...,...,...,...,...,...,...,...,...
4995,2025-01-21 18:50:00,220.8200,220.91,220.7800,220.9050,78447.0,220.853750,1
4996,2025-01-21 18:51:00,220.8931,220.91,220.8000,220.8300,63273.0,220.858275,1
4997,2025-01-21 18:52:00,220.8300,220.84,220.7600,220.7816,58101.0,220.802900,0
4998,2025-01-21 18:53:00,220.7820,220.79,220.6805,220.7000,75219.0,220.738125,0
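
How the target is derived is easier to follow on a handful of rows. The sketch below applies the same `diff() > 0` rule to made-up mean prices. The `toy` frame and its 'change' column are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Made-up mean prices for five consecutive bars.
toy = pd.DataFrame({"mean": [100.0, 99.5, 99.5, 100.2, 100.1]})

# diff() subtracts the previous row; the first row has no predecessor, so it is NaN.
toy["change"] = toy["mean"].diff()

# NaN > 0 and 0 > 0 are both False, so the first row and unchanged prices get 0.
toy["target"] = (toy["change"] > 0).astype(int)

print(toy)
```

The targets come out as 0, 0, 0, 1, 0. Only a strict rise counts as '1'. Each label compares a row with the one before it, not with the one that follows.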


3. Training the model

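In this section, we build and train the model on the final DataFrame.

The model is a random forest classifier trained on the price, volume and mean columns of our DataFrame. The data is split in time order without shuffling: the first 80% of rows are used for training and the last 20% for testing.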

In [21]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

# Inputs are the price, volume and mean columns; the label is the 'target' column.
features = ['open', 'low', 'high', 'close', 'volume', 'mean']
X = df[features]
y = df['target']

# shuffle=False keeps time order: the first 80% of rows train, the last 20% test.
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.2)
model = RandomForestClassifier(n_estimators=100, min_samples_split=42, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))



0.534
              precision    recall  f1-score   support

           0       0.54      0.77      0.63       522
           1       0.52      0.28      0.36       478

    accuracy                           0.53      1000
   macro avg       0.53      0.52      0.50      1000
weighted avg       0.53      0.53      0.50      1000
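
The training cell passes `shuffle=False` to `train_test_split`, so the test set is the most recent part of the data and not a random sample. The sketch below shows that behaviour on ten made-up samples. `X_toy` and `y_toy` are illustrative names, and the values are chosen so that each sample's value is also its position in time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples in time order; each value doubles as its position.
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.arange(10) % 2

# Same arguments as in the notebook: no shuffling, 20% held out.
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, shuffle=False, test_size=0.2)

print(X_tr.ravel())  # positions 0 to 7
print(X_te.ravel())  # positions 8 and 9
```

For time-ordered data such as price bars, keeping the order means the model is always evaluated on rows that come after the rows it was trained on.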



As we can see, the accuracy of the momentum prediction is about 53%, only slightly better than guessing. The model also detects few of the actual increases (recall of 0.28 for class '1'), so these predictions are not very accurate and the model needs refining.
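
One way to put the accuracy in context is to compare it with a trivial baseline that always predicts the most common class. The sketch below takes the class counts from the 'support' column of the report above and the printed accuracy of 0.534. The variable names are illustrative, and the baseline is a reference point chosen here, not something the notebook computes.

```python
# Class counts in the test set, read from the 'support' column of the report above.
support = {0: 522, 1: 478}
reported_accuracy = 0.534

total = sum(support.values())

# Accuracy obtained by always predicting the most common class.
baseline_accuracy = max(support.values()) / total
margin = reported_accuracy - baseline_accuracy

print(f"baseline: {baseline_accuracy:.3f}, model: {reported_accuracy:.3f}, margin: {margin:.3f}")
```

The model beats the always-'0' baseline by only about one percentage point on this test set.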