<a href="https://colab.research.google.com/github/Bborub/bk-bridge-pedestrian/blob/main/MoreStox25Aug23.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

From: https://www.analyticsvidhya.com/blog/2023/02/anomaly-detection-on-google-stock-data-2014-2022/?utm_source=related_WP&utm_medium=https://www.analyticsvidhya.com/blog/2021/07/stock-prices-analysis-with-python/

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Finding data points that have a 0.0% change from the previous month's value
data[data['Change %']==0.0]

In [None]:
data['Month Starting'] = pd.to_datetime(data['Month Starting'], errors='coerce').dt.date

In [None]:
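# Filling in the dates that failed to parse (NaT) after cross-verifying them;
# .loc avoids chained assignment, and .date() matches the column's date objects
data.loc[31, 'Month Starting'] = pd.Timestamp('2020-05-01').date()
data.loc[43, 'Month Starting'] = pd.Timestamp('2019-05-01').date()
data.loc[55, 'Month Starting'] = pd.Timestamp('2018-05-01').date()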
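With errors='coerce', pd.to_datetime quietly turns any string it cannot parse into NaT instead of raising, which is what leaves the gaps that have to be filled by hand. A minimal sketch on a made-up three-row frame (the values are illustrative, not taken from the dataset) showing how to list those rows before fixing them, and how to fill one by label with a single .loc call. Here the column stays datetime64, so a Timestamp is the matching type:

```python
import pandas as pd

# Hypothetical toy frame: the middle entry is deliberately unparseable
df = pd.DataFrame({'Month Starting': ['2020-06-01', 'not a date', '2020-04-01']})

# errors='coerce' converts unparseable strings to NaT instead of raising
df['Month Starting'] = pd.to_datetime(df['Month Starting'], errors='coerce')

# List the rows that failed to parse so they can be checked against the source
missing_rows = df.index[df['Month Starting'].isna()].tolist()
print(missing_rows)  # [1]

# Fill by label with one .loc call (no chained assignment)
df.loc[1, 'Month Starting'] = pd.Timestamp('2020-05-01')
print(df['Month Starting'].isna().sum())  # 0
```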

In [None]:
plt.figure(figsize=(25,5))
plt.plot(data['Month Starting'],data['Open'], label='Open')
plt.plot(data['Month Starting'],data['Close'], label='Close')
plt.xlabel('Year')
plt.ylabel('Price')
plt.legend()
plt.title('Change in the stock price of Google over the years')

# The stock price has risen since 2017, peaking in 2022.

In [None]:
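# The rows appear to run newest-first (index 31 is May 2020, index 55 is May 2018),
# so reverse them into chronological order before computing changes
data = data.iloc[::-1].reset_index(drop=True)

# Calculating the monthly returns (each row is one month)
data['Returns'] = data['Close'].pct_change()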

# Calculating the 30-month rolling average of the returns
data['Rolling Average'] = data['Returns'].rolling(window=30).mean()

plt.figure(figsize=(10,5))

# Line plot with 'Month Starting' on the x-axis and 'Rolling Average' on the y-axis

sns.lineplot(x='Month Starting', y='Rolling Average', data=data)
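Because each row of the dataset is one month, pct_change() here yields month-over-month returns and rolling(window=30) averages over 30 months. A small sketch with made-up prices (not from the dataset) showing what the two calls compute, and why the rows must be in chronological order first:

```python
import pandas as pd

# Hypothetical closing prices for four consecutive months, oldest first
close = pd.Series([100.0, 110.0, 99.0, 108.9])

# pct_change computes x[t] / x[t-1] - 1, i.e. each row against the row above it
returns = close.pct_change()
print(returns.round(3).tolist())  # [nan, 0.1, -0.1, 0.1]

# The same prices stored newest-first give different values, because each
# month is now compared with the month after it instead of the one before
reversed_returns = close[::-1].pct_change()
print(reversed_returns.round(3).tolist())  # [nan, -0.091, 0.111, -0.091]

# rolling(window=2).mean() averages each return with the previous one; a
# window containing the leading NaN yields NaN, so the first value is at row 2
rolling = returns.rolling(window=2).mean()
print(rolling.isna().sum())  # 2
```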


In [None]:
# numeric_only=True skips non-numeric columns such as the date column
corr = data.corr(numeric_only=True)
plt.figure(figsize=(10,10))
sns.heatmap(corr, annot=True, cmap='coolwarm')
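Since 'Month Starting' holds Python date objects (object dtype), recent pandas versions raise on a plain corr() instead of silently dropping that column, hence numeric_only=True. The same correlation matrix can also answer "which feature tracks Close most closely?" directly. A minimal sketch on a hypothetical four-row frame that mirrors the notebook's column layout (the prices and volumes are invented):

```python
import pandas as pd
from datetime import date

# Hypothetical frame: a date column of Python date objects next to numeric columns
df = pd.DataFrame({
    'Month Starting': [date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1), date(2020, 4, 1)],
    'Open': [70.0, 72.0, 68.0, 60.0],
    'Close': [71.0, 73.0, 67.0, 61.0],
    'Volume': [30.0, 60.0, 20.0, 50.0],
})

# numeric_only=True leaves the object-dtype date column out of the matrix
corr = df.corr(numeric_only=True)

# Feature with the strongest (absolute) correlation to Close, excluding Close itself
strongest = corr['Close'].drop('Close').abs().idxmax()
print(strongest)  # Open
```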

Scaling the returns using StandardScaler

To standardize the returns to zero mean and unit variance, we use the StandardScaler class from the Scikit-learn library. We import the class, create an instance, and apply its fit_transform method to the Returns column: the scaler first learns the column's mean and standard deviation and then rescales each value as (x - mean) / std. Some machine learning algorithms, particularly distance- and gradient-based ones, need features on comparable scales to work properly. Isolation Forest is tree-based and essentially unaffected by this kind of linear rescaling, so here the step mainly puts the returns on a standard scale.

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data['Returns'] = scaler.fit_transform(data['Returns'].values.reshape(-1,1))
data.head()
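The rescaling that fit_transform applies can be checked by hand. A minimal sketch on made-up returns, including a leading NaN like the one pct_change() produces, comparing StandardScaler with z = (x - mean) / std. Two details worth knowing: StandardScaler uses the population standard deviation (ddof=0), and it ignores NaNs when fitting and passes them through unchanged when transforming:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical returns, including a leading NaN like the one pct_change leaves
returns = np.array([np.nan, 0.10, -0.10, 0.05, 0.02]).reshape(-1, 1)

scaled = StandardScaler().fit_transform(returns)

# Manual standardization, ignoring the NaN and using ddof=0 (NumPy's default)
mean = np.nanmean(returns)
std = np.nanstd(returns)
manual = (returns - mean) / std

print(np.allclose(scaled, manual, equal_nan=True))  # True
print(np.isnan(scaled[0, 0]))  # True: the NaN passes through unchanged
```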

In [None]:
# Filling the NaNs left by pct_change (first row) and the 30-month rolling window (first 30 rows)

data['Returns'] = data['Returns'].fillna(data['Returns'].mean())
data['Rolling Average'] = data['Rolling Average'].fillna(data['Rolling Average'].mean())
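The missing values filled above are predictable: pct_change() has no previous month for the first row, and rolling(window=30) needs 30 non-missing returns before it produces a value. A short sketch on synthetic prices (a made-up upward trend, for illustration only) that counts them and applies the same mean fill:

```python
import numpy as np
import pandas as pd

# Hypothetical 40 months of prices
close = pd.Series(np.linspace(100, 140, 40))

returns = close.pct_change()
rolling = returns.rolling(window=30).mean()

print(returns.isna().sum())  # 1: the first row has no previous month
print(rolling.isna().sum())  # 30: rows 0-29 lack a full window of 30 returns

# Mean imputation, as in the cell above
filled = rolling.fillna(rolling.mean())
print(filled.isna().sum())  # 0
```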

Model Development
Now that the data has been preprocessed and analyzed, we are ready to develop a model for anomaly detection. We will use the Scikit-learn library in Python to construct and train a model to detect anomalous data points within the dataset.

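We will use the Isolation Forest algorithm to detect anomalies. Isolation Forest is an unsupervised machine learning algorithm that builds an ensemble of random trees: at each node it randomly selects a feature and then a random split value between that feature's minimum and maximum, and it keeps splitting until points are isolated or a depth limit is reached. Anomalies are few and far from the bulk of the data, so they tend to be isolated after only a few splits; the shorter a point's average path length across the trees, the higher its anomaly score.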
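To make the path-length intuition concrete, here is a from-scratch toy in plain Python (not Scikit-learn's implementation) that follows one point down repeated random splits on a single feature and counts how many splits it takes to isolate it. It omits the subsampling, the sample-size-based depth limit, and the path-length normalization of the full algorithm, and the returns are made-up numbers:

```python
import random

def isolation_depth(values, target, rng, depth=0, max_depth=50):
    """Count random splits needed to isolate `target` within `values`.

    Each step draws a split uniformly between the current min and max and
    keeps only the side containing `target`, like one path in one tree.
    """
    if len(values) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    if target < split:
        side = [v for v in values if v < split]
    else:
        side = [v for v in values if v >= split]
    return isolation_depth(side, target, rng, depth + 1, max_depth)

def average_depth(values, target, trials=500, seed=0):
    """Average isolation depth over many random trials (a toy 'forest')."""
    rng = random.Random(seed)
    return sum(isolation_depth(values, target, rng) for _ in range(trials)) / trials

# Hypothetical standardized returns: a tight cluster plus one extreme month
returns = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.05, -0.05, 4.0]

print(average_depth(returns, 4.0))  # outlier: isolated after few splits
print(average_depth(returns, 0.0))  # typical point: needs more splits
```

The outlier's average depth comes out noticeably smaller than the central point's, which is exactly the signal Isolation Forest turns into an anomaly score.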

The following cell constructs and trains the Isolation Forest model with Scikit-learn, then uses it to label each month as normal or anomalous.

In [None]:
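from sklearn.ensemble import IsolationForest

# Fit an Isolation Forest on the standardized returns; contamination=0.05
# tells the model to flag roughly 5% of the points as anomalies, and a fixed
# random_state makes the flagged points reproducible between runs
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(data[['Returns']])

# Predicting anomalies: predict() returns 1 for inliers and -1 for outliers,
# remapped here to 0 (normal) and 1 (anomaly)
data['Anomaly'] = model.predict(data[['Returns']])
data['Anomaly'] = data['Anomaly'].map({1: 0, -1: 1})

# Plotting the results, with anomalous points highlighted in red
anomalies = data[data['Anomaly'] == 1]
plt.figure(figsize=(13,5))
plt.plot(data.index, data['Returns'], label='Returns')
plt.scatter(anomalies.index, anomalies['Returns'], color='red', label='Anomaly')
plt.legend()
plt.show()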
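The contamination parameter does not change how the trees are grown; Scikit-learn uses it to place the score threshold so that about that fraction of the training points fall below it. A minimal sketch on synthetic data (made-up numbers, not the Google series) checking that predict returns -1 exactly where decision_function is negative, and that roughly 5% of the points end up flagged:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical standardized returns: 95 ordinary months plus 5 extreme ones
X = np.concatenate([rng.normal(0, 1, 95), [6, -6, 7, -7, 8]]).reshape(-1, 1)

model = IsolationForest(contamination=0.05, random_state=42).fit(X)
labels = model.predict(X)            # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # negative values are the flagged points

print((labels == -1).mean())                     # share flagged, close to 0.05
print(np.array_equal(labels == -1, scores < 0))  # True
print(np.sort(X[labels == -1].ravel()))          # the flagged values
```

Ranking months by decision_function, rather than only reading the 0/1 labels, shows how far past the threshold each anomaly lies.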

Conclusion
This project-based blog explored anomaly detection in Google stock data from 2014 to 2022. We used the Scikit-learn library in Python to construct and train an Isolation Forest model that detects anomalous data points within the dataset.

The exploratory analysis surfaced several patterns in the data: the stock price has increased since 2017, the rolling mean of the returns decreased in 2019, and the Open price correlates more strongly with the Close price than any other feature does. The Isolation Forest model then flagged roughly 5% of the months, as set by the contamination parameter, as anomalous returns.

Overall, the project shows how an unsupervised method such as Isolation Forest can be applied to stock market data for anomaly detection, and it opens up further possibilities for stock market analysis.
