<a href="https://colab.research.google.com/github/ZachPino67/RESUMEREAL/blob/main/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Implement a Python script that takes historical stock price data, performs feature engineering to calculate technical indicators, trains a machine learning model to predict stock movements, and generates buy/sell/hold recommendations.

## Data collection

### Subtask:
Fetch historical stock price data.


**Reasoning**:
Install the `yfinance` library to fetch stock data.



In [1]:
%pip install yfinance



**Reasoning**:
Fetch historical stock data for AAPL using yfinance and store it in a pandas DataFrame.



In [2]:
import yfinance as yf
import pandas as pd

ticker_symbol = "AAPL"
start_date = '2020-01-01'
end_date = '2023-12-31'

stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)
display(stock_data.head())

  stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)
[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2020-01-02,72.538506,72.598884,71.292296,71.545882,135480400
2020-01-03,71.833282,72.594048,71.608677,71.765659,146322800
2020-01-06,72.40567,72.444313,70.703005,70.954181,118387200
2020-01-07,72.065147,72.671341,71.845369,72.415337,108872000
2020-01-08,73.224403,73.526295,71.768079,71.768079,132079200


## Feature engineering

### Subtask:
Calculate technical indicators from the price data.


**Reasoning**:
Calculate the technical indicators as instructed: SMA, RSI, Bollinger Bands, and MACD, and then drop rows with NaN values.



In [3]:
# Calculate 14-day SMA
stock_data['SMA_14'] = stock_data['Close'].rolling(window=14).mean()

# Calculate 14-day RSI
delta = stock_data['Close'].diff()
gain = delta.where(delta > 0, 0)
loss = -delta.where(delta < 0, 0)

avg_gain = gain.ewm(com=13, adjust=False).mean()
avg_loss = loss.ewm(com=13, adjust=False).mean()

rs = avg_gain / avg_loss
stock_data['RSI_14'] = 100 - (100 / (1 + rs))

# Calculate 20-day Bollinger Bands
stock_data['20_day_SMA'] = stock_data['Close'].rolling(window=20).mean()
stock_data['20_day_std'] = stock_data['Close'].rolling(window=20).std()
stock_data['Upper_Band'] = stock_data['20_day_SMA'] + (stock_data['20_day_std'] * 2)
stock_data['Lower_Band'] = stock_data['20_day_SMA'] - (stock_data['20_day_std'] * 2)

# Calculate MACD
exp1 = stock_data['Close'].ewm(span=12, adjust=False).mean()
exp2 = stock_data['Close'].ewm(span=26, adjust=False).mean()
stock_data['MACD'] = exp1 - exp2
stock_data['Signal_Line'] = stock_data['MACD'].ewm(span=9, adjust=False).mean()

# Drop rows with NaN values
stock_data.dropna(inplace=True)

display(stock_data.head())

Price,Close,High,Low,Open,Volume,SMA_14,RSI_14,20_day_SMA,20_day_std,Upper_Band,Lower_Band,MACD,Signal_Line
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2
2020-01-30,78.218895,78.272028,76.982348,77.414659,126743200,76.455167,66.35012,75.360955,2.020404,79.401763,71.320146,1.197079,1.007314
2020-01-31,74.750755,77.93148,74.456109,77.508832,199588400,76.441021,48.053065,75.471567,1.915588,79.302743,71.640392,0.968898,0.999631
2020-02-03,74.545486,75.711992,72.99014,73.492485,173788400,76.297839,47.223014,75.607177,1.731637,79.070451,72.143904,0.762708,0.952247
2020-02-04,77.006516,77.197313,75.745816,76.151556,136616400,76.404278,56.84732,75.83722,1.583182,79.003584,72.670855,0.788792,0.919556
2020-02-05,77.634445,78.433854,77.030662,78.134372,118826800,76.578685,58.906422,76.115685,1.358666,78.833017,73.398352,0.850331,0.905711


## Model training

### Subtask:
Train a machine learning model to predict stock movements based on the features.


**Reasoning**:
Define the target variable and features, split the data, choose and train a Linear Regression model.



In [4]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Define the target variable
y = stock_data['Close'].shift(-1)

# Define the feature set
X = stock_data.drop(columns=['Close', 'Open', 'High', 'Low', 'Volume', 'Dividends', 'Stock Splits'])

# Drop the last row of X to match the length of y after shifting
X = X[:-1]
y = y[:-1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose and train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

KeyError: 'Dividends'

**Reasoning**:
The previous code failed because the columns 'Dividends' and 'Stock Splits' do not exist in the DataFrame. I will correct the code to drop only the 'Close', 'Open', 'High', 'Low', and 'Volume' columns.



In [5]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Define the target variable
y = stock_data['Close'].shift(-1)

# Define the feature set
X = stock_data.drop(columns=['Close', 'Open', 'High', 'Low', 'Volume'])

# Drop the last row of X to match the length of y after shifting
X = X[:-1]
y = y[:-1]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Choose and train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

## Prediction and recommendation

### Subtask:
Use the trained model to make predictions and generate buy/sell/hold recommendations.


**Reasoning**:
Make predictions on the test set, compare them with actual values, and generate buy/sell/hold recommendations based on a threshold.



In [6]:
import numpy as np

# Make predictions on the test set
y_pred = model.predict(X_test)

# Compare predicted values with actual values
comparison_df = pd.DataFrame({'Actual': y_test.values.flatten(), 'Predicted': y_pred.flatten()}, index=X_test.index)

# Generate recommendations based on a threshold
# Define a threshold (e.g., 0.5% change)
threshold = 0.005

comparison_df['Recommendation'] = 'Hold'
comparison_df.loc[comparison_df['Predicted'] > comparison_df['Actual'] * (1 + threshold), 'Recommendation'] = 'Buy'
comparison_df.loc[comparison_df['Predicted'] < comparison_df['Actual'] * (1 - threshold), 'Recommendation'] = 'Sell'

display(comparison_df.head())

Unnamed: 0_level_0,Actual,Predicted,Recommendation
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2022-07-07,144.554199,142.173754,Sell
2021-11-11,147.050659,145.126365,Sell
2022-12-22,130.026199,130.878328,Buy
2021-10-21,145.563675,147.141738,Buy
2021-03-04,118.487961,117.925235,Hold


## Backtesting (optional but recommended)

### Subtask:
Evaluate the performance of the model and recommendations on historical data.


**Reasoning**:
Calculate the accuracy and other performance metrics based on the comparison dataframe and display the results.



In [7]:
# Calculate accuracy of recommendations
correct_predictions = 0
for index, row in comparison_df.iterrows():
    actual = row['Actual']
    predicted = row['Predicted']
    recommendation = row['Recommendation']

    if recommendation == 'Buy' and predicted > actual * (1 + threshold):
        correct_predictions += 1
    elif recommendation == 'Sell' and predicted < actual * (1 - threshold):
        correct_predictions += 1
    elif recommendation == 'Hold' and (actual * (1 - threshold) <= predicted <= actual * (1 + threshold)):
        correct_predictions += 1

accuracy = correct_predictions / len(comparison_df)

# Calculate profitable trades and average profit/loss
comparison_df['Price_Change'] = comparison_df['Actual'].diff().shift(-1)
profitable_trades = comparison_df[comparison_df['Recommendation'] == 'Buy']['Price_Change'] > 0
profitable_trades_count = profitable_trades.sum()
total_trades = len(comparison_df[comparison_df['Recommendation'] != 'Hold'])

# Calculate average profit/loss for Buy and Sell recommendations separately
buy_trades = comparison_df[comparison_df['Recommendation'] == 'Buy']
sell_trades = comparison_df[comparison_df['Recommendation'] == 'Sell']

average_profit_buy = buy_trades[buy_trades['Price_Change'] > 0]['Price_Change'].mean()
average_loss_buy = buy_trades[buy_trades['Price_Change'] <= 0]['Price_Change'].mean()

average_profit_sell = sell_trades[sell_trades['Price_Change'] < 0]['Price_Change'].mean()
average_loss_sell = sell_trades[sell_trades['Price_Change'] >= 0]['Price_Change'].mean()


print(f"Accuracy of recommendations: {accuracy:.4f}")
print(f"Percentage of profitable Buy trades: {profitable_trades_count / len(buy_trades):.4f}")
print(f"Average profit per Buy trade: {average_profit_buy:.4f}")
print(f"Average loss per Buy trade: {average_loss_buy:.4f}")
print(f"Average profit per Sell trade: {average_profit_sell:.4f}")
print(f"Average loss per Sell trade: {average_loss_sell:.4f}")

Accuracy of recommendations: 1.0000
Percentage of profitable Buy trades: 0.5846
Average profit per Buy trade: 40.3726
Average loss per Buy trade: -31.2975
Average profit per Sell trade: -36.5384
Average loss per Sell trade: 28.0828


## Summary:

### Data Analysis Key Findings

*   Historical stock price data for Apple (AAPL) from January 1, 2020, to December 31, 2023, was successfully downloaded.
*   Several technical indicators, including 14-day Simple Moving Average (SMA), 14-day Relative Strength Index (RSI), 20-day Bollinger Bands, and MACD, were calculated and added as features to the dataset.
*   A Linear Regression model was trained to predict the next day's closing price using the engineered technical indicators as features.
*   Buy, Sell, and Hold recommendations were generated based on the model's predictions compared to the actual closing price, using a 0.5% threshold to define significant price changes.
*   Backtesting results on the test set showed a recommendation accuracy of 1.0000, meaning the recommendations correctly reflected whether the predicted price was above, below, or within the defined threshold range of the actual price.
*   For "Buy" recommendations, 58.46% were profitable. The average profit per profitable "Buy" trade was \$40.37, while the average loss per losing "Buy" trade was -\$31.30.
*   For "Sell" recommendations, the average "profit" (price decrease) per trade was -\$36.54 (meaning the price increased on average), and the average "loss" (price increase) per trade was \$28.08 (meaning the price decreased on average).

### Insights or Next Steps

*   The high recommendation accuracy of 1.0000 indicates that the recommendations perfectly align with the model's prediction relative to the actual price change threshold. However, this does not necessarily imply the model's predictions are accurate in magnitude or that the recommendations are profitable in a real trading scenario.
*   Further evaluation should include more robust backtesting metrics like Sharpe Ratio, maximum drawdown, and cumulative returns to better assess the trading strategy's performance and risk profile. Consider using a dedicated backtesting library.
