## 1. Sign Up for the Newsletter
Before diving into the tutorial, I invite you to sign up for the Infotra.io newsletter to stay updated on the latest features, insights, and developments in data-driven trading. You'll receive exclusive content, new releases, and best practices for improving your trading strategies with Infotra.io. Your feedback is invaluable to us, and staying connected through the newsletter helps us continue enhancing the tool for traders like you.
To sign up, send an email to newsletter@infotra.io.

## 2. Extracting Data from Infotra.io
To begin, we'll use Infotra.io to extract historical data that will be used for our machine learning model. Here’s how to do it:

* Navigate to Infotra.io and choose a specific candlestick pattern (e.g., Doji).
* Configure the risk management parameters (such as profit-to-risk ratio and entry/exit points) and run the analysis.
* Once the opportunities have been identified, export the data as a CSV file by clicking the export button.

The CSV file contains both successful and failed opportunities, along with the market data needed to build the machine learning model.

In [8]:
import pandas as pd
import json
import datetime
import os

## 3. Loading and Cleaning the Data
Now that we have the exported data, let’s load the CSV file into our notebook and begin the data cleaning process. This step is crucial for ensuring the data is ready for feature extraction and model training. Here’s what we’ll do:

In [9]:
raw_data = pd.read_csv("data/opportunities.csv")

## Select the Required Columns:
We’ll first select only the columns relevant to our analysis. This might include data such as the open, high, low, close prices, as well as columns that show whether the opportunity was successful, and any other features we need for building our model.

In [10]:
required_columns = ["Opportunity_id",
                    "date",
                    "open",
                    "high",
                    "low",
                    "close",
                    "is_last_candle_of_found_pattern",
                    "is_buy_opportunity",
                    "is_sell_opportunity",
                    "succeeded"]
candle_patterns = raw_data[required_columns]
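
Before selecting the columns, it can help to confirm that the export actually contains them, so a renamed or missing column fails with a clear message. The following is a minimal sketch, not part of the original notebook: the helper name `select_required_columns`, the toy row, and the extra `volume` column are assumptions made for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "Opportunity_id", "date", "open", "high", "low", "close",
    "is_last_candle_of_found_pattern", "is_buy_opportunity",
    "is_sell_opportunity", "succeeded",
]


def select_required_columns(frame, required=REQUIRED_COLUMNS):
    """Return only the required columns, failing early if any are missing."""
    missing = [name for name in required if name not in frame.columns]
    if missing:
        raise KeyError(f"Export is missing expected columns: {missing}")
    # .copy() gives an independent DataFrame rather than a slice of the input.
    return frame[required].copy()


# Toy one-row frame standing in for the exported CSV (illustrative values only).
toy = pd.DataFrame([{
    "Opportunity_id": "Opportunity-1", "date": "2004-11-02",
    "open": 1.274, "high": 1.2755, "low": 1.2661, "close": 1.2739,
    "is_last_candle_of_found_pattern": 1, "is_buy_opportunity": 0,
    "is_sell_opportunity": 0, "succeeded": 1,
    "volume": 1000,  # hypothetical extra column that gets dropped
}])

selected = select_required_columns(toy)
print(selected.columns.tolist())
```

In the notebook, the same helper could be called on `raw_data` in place of the plain column selection.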

## Split the Buy and Sell Opportunities:
Since buy and sell opportunities can behave differently, we’ll split the dataset into two: one for buy opportunities and another for sell opportunities. This allows us to train a separate model for each, or to compare how the features influence success in each case.

The data in the CSV file is organized in a specific order:

* First, you have rows for successful buy opportunities.
* Then, rows for failed buy opportunities.
* After that, rows for successful sell opportunities.
* Finally, rows for failed sell opportunities.

To split the data into buy and sell opportunities, we look for the first row where succeeded changes from 0 to 1. Given the ordering above, that row is the first successful sell opportunity, and the row just before it is the last failed buy opportunity. Once this index is identified, the data can be split into two parts:

* Buy opportunities: All rows from the beginning of the dataset up to, but not including, the identified row.
* Sell opportunities: All rows from the identified row onward.

In [11]:
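# Scan for the first row where `succeeded` flips from 0 to 1. That row is the
# first successful sell opportunity; every row before it belongs to the buy block.
# This relies on the default 0..n-1 index that read_csv produces.
for i in range(1, candle_patterns.shape[0]):
    if candle_patterns.loc[i, "succeeded"] == 1 and candle_patterns.loc[i-1, "succeeded"] == 0:
        print(i)
        break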

240


In [12]:
# .loc slicing is label-based and inclusive, so 0:239 keeps rows 0 through 239.
buy_pattern = candle_patterns.loc[0:239,:]
buy_pattern.tail()

| | Opportunity_id | date | open | high | low | close | is_last_candle_of_found_pattern | is_buy_opportunity | is_sell_opportunity | succeeded |
|---|---|---|---|---|---|---|---|---|---|---|
| 235 | Opportunity-44 | 2024-08-25 | 1.11879 | 1.12011 | 1.11529 | 1.11879 | 1 | 0 | 0 | 0 |
| 236 | Opportunity-45 | 2024-09-08 | 1.10872 | 1.10914 | 1.1035 | 1.10872 | 1 | 0 | 0 | 0 |
| 237 | Opportunity-46 | 2024-09-15 | 1.10884 | 1.11354 | 1.10871 | 1.10884 | 1 | 0 | 0 | 0 |
| 238 | Opportunity-47 | 2024-09-20 | 1.11614 | 1.11818 | 1.1136 | 1.11601 | 1 | 0 | 0 | 0 |
| 239 | Opportunity-48 | 2024-09-22 | 1.11608 | 1.11682 | 1.10849 | 1.11608 | 1 | 0 | 0 | 0 |
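
The boundary between the buy and sell blocks can also be computed and used directly, instead of copying the printed index into the slice by hand. The sketch below is one possible way to do that, assuming the row ordering described above (successful buys, failed buys, successful sells, failed sells). The helper name `find_sell_start` and the toy data are hypothetical.

```python
import pandas as pd


def find_sell_start(succeeded):
    """Return the position of the first row where `succeeded` goes from 0 to 1."""
    # shift(1) lines each row up with the previous row's value.
    rises = (succeeded.shift(1) == 0) & (succeeded == 1)
    if not rises.any():
        raise ValueError("No 0 -> 1 transition found; cannot locate the sell block.")
    # argmax on a boolean array returns the position of the first True.
    return int(rises.to_numpy().argmax())


# Toy data: 2 successful buys, 2 failed buys, 2 successful sells, 1 failed sell.
toy = pd.DataFrame({"succeeded": [1, 1, 0, 0, 1, 1, 0]})

sell_start = find_sell_start(toy["succeeded"])

# iloc slicing is position-based and excludes the end position.
buy_rows = toy.iloc[:sell_start]
sell_rows = toy.iloc[sell_start:]
print(sell_start, len(buy_rows), len(sell_rows))
```

Note that this only works if the buy block contains at least one failed opportunity. Without one there is no 0 to 1 transition at the buy/sell boundary to detect.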


The next step is to keep only the candle that completes each detected pattern (the "pattern candle"). This is done by selecting rows where the column is_last_candle_of_found_pattern equals 1. These rows represent the final candle of the detected pattern, which is the key point for analyzing trading opportunities.

After this filter, the dataset contains only the candles that are relevant for analysis and modeling.

In [13]:
pattern_buy_opportunity = buy_pattern[buy_pattern["is_last_candle_of_found_pattern"] == 1]
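
A quick sanity check after this kind of filter is to count how many pattern candles remain per opportunity. The sketch below uses a small toy frame and assumes each opportunity should end up with exactly one pattern candle. That expectation is an assumption for illustration and is not stated by the export itself.

```python
import pandas as pd

# Toy frame: two opportunities, each with several candles (illustrative values).
toy = pd.DataFrame({
    "Opportunity_id": ["Opportunity-1"] * 3 + ["Opportunity-2"] * 2,
    "close": [1.2739, 1.2751, 1.2760, 1.2914, 1.2930],
    "is_last_candle_of_found_pattern": [1, 0, 0, 1, 0],
})

# Boolean mask filter; .copy() makes the result an independent DataFrame.
pattern_rows = toy[toy["is_last_candle_of_found_pattern"] == 1].copy()

# Count the remaining rows per opportunity and flag anything other than one.
candles_per_opportunity = pattern_rows["Opportunity_id"].value_counts()
unexpected = candles_per_opportunity[candles_per_opportunity != 1]

print(len(pattern_rows), "pattern candles kept")
print("opportunities with an unexpected count:", unexpected.index.tolist())
```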

The final step is to convert the date column from strings to proper date values. The function below parses each string, marks it as UTC, and returns the calendar date as a Python date object (the trailing .date() call drops the time and timezone). This makes the values easier to sort, filter, and use in time-based analysis in your trading model.

In [14]:
def str_to_utc_date(string_value):
    datetime_from_string = datetime.datetime.strptime(string_value, "%Y-%m-%d")
    return datetime_from_string.replace(tzinfo=datetime.timezone.utc).date()

In [15]:
# Work on an explicit copy so the assignment below does not trigger pandas' SettingWithCopyWarning.
pattern_buy_opportunity = pattern_buy_opportunity.copy()
pattern_buy_opportunity["date"] = pattern_buy_opportunity["date"].apply(str_to_utc_date)
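
As an alternative to applying a Python function row by row, pandas can parse the whole column in one call with `pd.to_datetime`. The sketch below uses a toy frame. Unlike the function above, it keeps timezone-aware timestamps rather than plain dates, so treat it as a variation on the approach rather than a drop-in replacement.

```python
import pandas as pd

# Toy frame with dates in the same YYYY-MM-DD format as the export.
toy = pd.DataFrame({"date": ["2004-11-02", "2004-11-08", "2005-03-01"]})

# Parse every string at once; utc=True yields timezone-aware UTC timestamps.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d", utc=True)

# If plain calendar dates are preferred, .dt.date converts them back.
plain_dates = toy["date"].dt.date

print(toy["date"].iloc[0])
print(plain_dates.iloc[0])
```

Keeping the column as timestamps allows the `.dt` accessor (for example `.dt.year` or `.dt.dayofweek`) to be used later when building time-based features.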


In [16]:
pattern_buy_opportunity.head()

| | Opportunity_id | date | open | high | low | close | is_last_candle_of_found_pattern | is_buy_opportunity | is_sell_opportunity | succeeded |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Opportunity-1 | 2004-11-02 | 1.274 | 1.2755 | 1.2661 | 1.2739 | 1 | 0 | 0 | 1 |
| 12 | Opportunity-2 | 2004-11-08 | 1.2913 | 1.2984 | 1.2903 | 1.2914 | 1 | 0 | 0 | 1 |
| 23 | Opportunity-3 | 2004-11-09 | 1.2896 | 1.294 | 1.2879 | 1.2897 | 1 | 0 | 0 | 1 |
| 34 | Opportunity-4 | 2005-03-01 | 1.3183 | 1.3243 | 1.3164 | 1.3183 | 1 | 0 | 0 | 1 |
| 38 | Opportunity-5 | 2005-10-04 | 1.1916 | 1.195 | 1.1897 | 1.1916 | 1 | 1 | 0 | 1 |
