### 1. collecting and cleaning data
***
This part of the code first imports the `find_ticker_by_name` function from the `ticker` module, which searches the Tehran Stock Exchange and returns the best match as a `Ticker` object. It then looks up the company "خودرو" with `find_ticker_by_name` and stores its historical stock data in the `data` variable.
Next, it adds a new column called `tomorrow` to the `data` DataFrame, which holds the stock's closing price on the next trading day. Because the rows are sorted from newest to oldest, `shift(1)` moves each closing price down one row, onto the row of the preceding day. Finally, `dropna` removes any rows with missing data, such as the newest row, which has no next-day close.

In [9]:
from ticker import find_ticker_by_name

t = find_ticker_by_name("خودرو")
data = t.history
# rows are newest-first, so shift(1) copies the next day's close onto each row
data["tomorrow"] = data["close"].shift(1)
data = data.dropna()
data

| date | low | high | yesterday | first | close | last | number | volume | value | tomorrow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023-11-01 | 2277 | 2419 | 2396 | 2350 | 2319 | 2370 | 5778 | 221306710 | 513108229893 | 2427.0 |
| 2023-10-31 | 2346 | 2451 | 2425 | 2432 | 2396 | 2353 | 4868 | 166745090 | 399532545537 | 2319.0 |
| 2023-10-30 | 2326 | 2442 | 2326 | 2350 | 2425 | 2430 | 5661 | 258164944 | 626136627443 | 2396.0 |
| 2023-10-29 | 2250 | 2376 | 2326 | 2250 | 2326 | 2348 | 6759 | 393804301 | 915867140096 | 2425.0 |
| 2023-10-28 | 2324 | 2370 | 2446 | 2366 | 2326 | 2324 | 3143 | 123630719 | 287582171746 | 2326.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2001-03-31 | 2800 | 2848 | 2849 | 2840 | 2838 | 2838 | 161 | 177362 | 500068371 | 2868.0 |
| 2001-03-28 | 2796 | 2849 | 2795 | 2800 | 2849 | 2849 | 104 | 80676 | 227566666 | 2838.0 |
| 2001-03-27 | 2795 | 2800 | 2798 | 2795 | 2795 | 2795 | 120 | 166600 | 466405131 | 2849.0 |
| 2001-03-26 | 2797 | 2810 | 2800 | 2801 | 2798 | 2798 | 98 | 96613 | 270595190 | 2795.0 |
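
To see why `shift(1)` yields the next day's close here, the following is a minimal sketch on a toy DataFrame. The name `toy` is hypothetical; the four closing prices are copied from the output above, and the sketch assumes the same newest-first row order.

```python
import pandas as pd

# Toy closing prices, newest first (values copied from the table above).
toy = pd.DataFrame(
    {"close": [2319, 2396, 2425, 2326]},
    index=pd.to_datetime(["2023-11-01", "2023-10-31", "2023-10-30", "2023-10-29"]),
)

# shift(1) moves each value down one row, so every row receives the close
# of the row above it, which is the next trading day in newest-first order.
toy["tomorrow"] = toy["close"].shift(1)

# The newest row has no following day, so its value is NaN and gets dropped.
toy = toy.dropna()
print(toy)
```

If the rows were sorted oldest-first instead, `shift(-1)` would be needed to get the same column.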


### 2. choose features and target parameters
***
Next we select the features that we believe have the **most effect on the prediction** and store them in the `X` variable. We then store the **target** values (`tomorrow`) in the `y` variable. Finally, we show a **summary** of the `X` DataFrame by calling its `describe` method.

In [6]:
features = ["low", "high", "first", "last", "close"]
X = data[features]
y = data.tomorrow
X.describe()

| | low | high | first | last | close |
| --- | --- | --- | --- | --- | --- |
| count | 5194.0 | 5194.0 | 5194.0 | 5194.0 | 5194.0 |
| mean | 2759.392183 | 2850.51752 | 2805.654794 | 2997.449365 | 3000.972276 |
| std | 1810.070042 | 1888.130278 | 1853.648313 | 1701.984515 | 1706.153392 |
| min | 0.0 | 0.0 | 0.0 | 583.0 | 587.0 |
| 25% | 1849.25 | 1894.0 | 1873.5 | 1981.0 | 1997.0 |
| 50% | 2590.0 | 2667.5 | 2620.5 | 2681.0 | 2686.0 |
| 75% | 3203.0 | 3310.0 | 3250.0 | 3314.0 | 3324.5 |
| max | 12481.0 | 13450.0 | 13037.0 | 12750.0 | 12835.0 |


### 3. splitting
***
We then split `X` and `y` into **train** and **test** parts using the `train_test_split` function. Since no `test_size` is passed, the default ratio of 75 to 25 applies.

In [15]:
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=0)

print("number of vectors in train_X:", len(train_X))
print("number of values in train_y:", len(train_y))
print("number of vectors in test_X:", len(test_X))
print("number of values in test_y:", len(test_y))

number of vectors in train_X: 3895
number of values in train_y: 3895
number of vectors in test_X: 1299
number of values in test_y: 1299
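
The sizes printed above can be checked by hand. This is a small sketch of the arithmetic, assuming that scikit-learn rounds the test size up and gives the remaining rows to the training set; the variable names are illustrative only.

```python
import math

n_samples = 5194       # row count reported by X.describe()
test_fraction = 0.25   # default share used when test_size is not given

# Assumption: the test size is rounded up and the train set takes the rest.
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 3895 1299
```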


### 4. training the model and predicting
***
The **random forest regression** model is an ensemble learning method that combines multiple decision trees to make predictions. It is commonly used for regression tasks, where the goal is to predict a continuous numerical value.

Here we build a **random forest regression** model using the `RandomForestRegressor` class from the `sklearn.ensemble` module. The `random_state` parameter is set to 0 to ensure **reproducibility** of the results. The `fit()` method **trains** the model on the training data, `train_X` and `train_y`. The `predict()` method then generates predictions for the **test** data, `test_X`, and its output is stored in the `rf_predict` variable.

In [21]:
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(random_state=0)
rf_model.fit(train_X, train_y)
rf_predict = rf_model.predict(test_X)
rf_predict

array([2954.44, 2871.11, 3076.49, ..., 3010.79, 3187.5 , 3265.91])
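
As an illustration of how the ensemble combines its trees, the sketch below fits a small forest on synthetic data and compares the forest's prediction with the average of the individual trees' predictions. The data and the names `X_demo`, `y_demo`, and `forest` are hypothetical stand-ins, not the stock data used above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical, not the stock data above).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = X_demo.sum(axis=1) + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted tree is available in forest.estimators_; averaging their
# individual predictions should reproduce the forest's own prediction.
tree_preds = np.stack([tree.predict(X_demo[:5]) for tree in forest.estimators_])
manual_mean = tree_preds.mean(axis=0)

print(np.allclose(manual_mean, forest.predict(X_demo[:5])))
```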

### 5. accuracy
***
The **mean absolute percentage error (MAPE)** is a measure of the **accuracy** of a forecasting method in statistics. It is the average, over all points, of the absolute difference between the actual and forecast values divided by the actual value, expressed as a percentage. The formula for MAPE is:
> $\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$

where $n$ is the number of fitted points, $A_t$ is the actual value, and $F_t$ is the forecast value. MAPE is a commonly used loss function in regression analysis and model evaluation, and it is popular as a quality measure for regression models because it has an intuitive interpretation as a relative error. It is also used in forecasting problems to measure the accuracy of a forecast system. Note that scikit-learn's `mean_absolute_percentage_error` returns this quantity as a fraction, without the factor of 100.

In [23]:
from sklearn.metrics import mean_absolute_percentage_error
mape = mean_absolute_percentage_error(test_y, rf_predict)
mape

0.031847742174759316
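
To make the formula concrete, here is a minimal hand computation on three made-up points. The values in `actual` and `forecast` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical actual and forecast values, chosen only for illustration.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

# Relative error |A_t - F_t| / |A_t| for each point: 0.10, 0.05, 0.00.
errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]

# Mean of the relative errors, as a fraction and as a percentage.
mape_fraction = sum(errors) / len(errors)
mape_percent = 100 * mape_fraction

print(round(mape_fraction, 4), round(mape_percent, 2))
```

Read the same way, the value of about 0.0318 printed above corresponds to an average relative error of roughly 3.18%.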