# Creating the services and testing

Now that we have defined the pipeline we want to build, the components it consists of and the inputs and outputs of each component, we can start implementing the components. As mentioned in chapter 1.5, General instructions for creating a component, the first step is to define the services. By services we mean the functions that will be available on the servers. We define the services separately from the servers so that the core functionality can be tested properly before moving forward.

## Data component
Let's start with the data collection and cleaning component, from here on referred to as the data component. Its purpose is to clean raw data from a CSV file. As mentioned previously, the CSV file will be received from the web application. When defining the service, we do not concern ourselves with how the necessary variables reach the server; we focus on the core functionality instead. The function available on the server should simply receive a CSV file, one way or another, clean and structure the data, and return the result. We already have a base for the service code in the stock_price_prediction notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_data(csv_file):
    data = pd.read_csv(csv_file)
    data['Date'] = pd.to_datetime(data['Date'])
    data['Previous_Close'] = data['Close'].shift(1)
    data = data.dropna()
    
    x = data[['Previous_Close']]
    y = data['Close']
    dates = data['Date']
    
    x_train, x_test, y_train, y_test, dates_train, dates_test = train_test_split(
        x, y, dates, test_size=0.2, shuffle=False
    )
    
    # Flatten the lists
    x_train_flat = [item for sublist in x_train.values for item in sublist]
    x_test_flat = [item for sublist in x_test.values for item in sublist]
    y_train_flat = y_train.tolist()
    y_test_flat = y_test.tolist()
    dates_train_str = dates_train.dt.strftime('%Y-%m-%d').tolist()
    dates_test_str = dates_test.dt.strftime('%Y-%m-%d').tolist()
    
    return x_train_flat, x_test_flat, y_train_flat, y_test_flat, dates_train_str, dates_test_str
```
The function takes a CSV file containing the input data. It first converts the Date column to datetime values. It assumes that the Close column is the target variable and therefore creates a feature called Previous_Close, which holds the closing value of the previous day. It then drops the rows that contain NaN values; this includes the first row, which has no previous close. The Previous_Close values are stored in a variable called x, as these are the values used to make predictions. The Close values are stored in a variable called y, as these are the values that will be predicted and are assumed to depend on x. Next, the function splits the data into training and testing sets using train_test_split(). Because shuffle=False is set, the split is chronological: roughly the first 80% of the rows form the training set and the remaining rows form the test set. Finally, the function flattens the results into plain lists before returning them.
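To make the shift and the chronological split concrete, here is a minimal sketch on a toy data set. The five prices and dates are made up purely for illustration; only the column names and the train_test_split arguments are taken from clean_data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy prices, used only for illustration
data = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "Close": [10.0, 11.0, 12.0, 13.0, 14.0],
})
data["Date"] = pd.to_datetime(data["Date"])

# Each row gets the close of the row before it; the first row gets NaN
data["Previous_Close"] = data["Close"].shift(1)

# Dropping NaN rows removes the first row, leaving four rows
data = data.dropna()

x = data[["Previous_Close"]]
y = data["Close"]

# shuffle=False keeps the rows in order, so the test set is the latest data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, shuffle=False
)

# Flatten the single-column feature frames into plain lists
x_train_flat = x_train.values.ravel().tolist()
x_test_flat = x_test.values.ravel().tolist()

print(x_train_flat, x_test_flat)
print(y_train.tolist(), y_test.tolist())
```

With four remaining rows, the first three end up in the training set and the last one in the test set, so no future prices leak into training.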

## Testing the service

We recommend testing this function by writing a simple test function that calls the previously defined clean_data function. We have written a very simple test function in the file data_test.py. It defines the CSV file used as input to clean_data, calls the function and prints the returned values. The returned values are also saved to a JSON file so that they can be used when testing the next component, as the training component requires the cleaned data. Notice also that the CSV file used for testing is a shorter version of the CSV file used in the stock price prediction pipeline. This simplifies testing each of the components, as there is less data to deal with. Below you can see how the test_data_service function has been defined:

```python
import json

# clean_data must be importable here, i.e. imported from the file where you defined it

def test_data_service():
    csv_file = 'MSFT.US.test.csv'
    returned_data = clean_data(csv_file)
    print(returned_data)

    # Save the data to a JSON file
    with open('cleaned_data.json', 'w') as f:
        # Store each returned list under its variable name
        json.dump({
            'x_train': list(returned_data[0]),
            'x_test': list(returned_data[1]),
            'y_train': list(returned_data[2]),
            'y_test': list(returned_data[3]),
            'dates_train': list(returned_data[4]),
            'dates_test': list(returned_data[5])
        }, f)
```



In the file, we then make sure to call the function, after which we can simply run python data_test.py to see the output of the clean_data function. If the service you have defined does not seem to work properly, you can use the built-in debugger in VS Code to find what is causing the error.
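Printing the returned values is fine for a first look, but a few assertions can catch problems automatically. The following is a minimal sketch, not part of the original test file: check_cleaned_data is a hypothetical helper, and the sample lists are made-up values standing in for the output of clean_data.

```python
def check_cleaned_data(x_train, x_test, y_train, y_test, dates_train, dates_test):
    """Hypothetical helper: sanity-check the six lists returned by clean_data."""
    # Features, targets and dates must line up row by row
    assert len(x_train) == len(y_train) == len(dates_train)
    assert len(x_test) == len(y_test) == len(dates_test)
    # Both sets should contain data
    assert len(x_train) > 0 and len(x_test) > 0
    # With shuffle=False every training date precedes every test date;
    # 'YYYY-MM-DD' strings sort in chronological order
    assert max(dates_train) < min(dates_test)
    return True

# Made-up sample values standing in for the output of clean_data
sample = (
    [10.0, 11.0, 12.0], [13.0],
    [11.0, 12.0, 13.0], [14.0],
    ["2024-01-02", "2024-01-03", "2024-01-04"], ["2024-01-05"],
)
ok = check_cleaned_data(*sample)
print("Cleaned data looks consistent:", ok)
```

In the real test you would pass the tuple returned by clean_data instead of the sample values.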


## Training component

Now let's repeat the same steps for the training component. 

First, let's define the service. We know it should receive the cleaned data, train a linear regression model and return the trained model. We have to take into account that the x_train list was flattened in the data component (this was done to simplify future steps once we implement gRPC), whereas scikit-learn expects the features as a 2D array with one row per sample. To achieve this we include the following line:

```python
x_train = np.array(x_train).reshape(-1, 1)
```
Besides this addition, we can use the same code as in the stock_price_prediction notebook. We also need to take into account that the model somehow has to be sent to the next component. To make this possible, we serialize the model into bytes using pickle. Here is the train_model function:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
import pickle

def train_model(x_train, y_train):
    model = LinearRegression()
    x_train = np.array(x_train).reshape(-1, 1)
    model.fit(x_train, y_train)
    model_binary = pickle.dumps(model)

    return model_binary

```
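To illustrate the two ideas used in train_model, the sketch below reshapes a flat list into a single-column 2D array and checks that a pickled model can be restored and still gives the same predictions. The numbers are made up for illustration (y is simply twice x).

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values for illustration: y is exactly 2 * x
x_flat = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

# reshape(-1, 1) turns the flat list into one column with one row per sample
x_2d = np.array(x_flat).reshape(-1, 1)
print(x_2d.shape)

model = LinearRegression().fit(x_2d, y)

# Serialize the model to bytes, then restore it as the next component would
model_binary = pickle.dumps(model)
restored = pickle.loads(model_binary)

# The restored model should predict exactly like the original one
same = np.allclose(model.predict(x_2d), restored.predict(x_2d))
print("Round trip preserved predictions:", same)
```

Keep in mind that unpickling can execute arbitrary code, so pickle.loads should only be used on data from a source you trust.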
## Testing the service

Now we need to test the function. We can load the JSON file created when we tested the data component and extract the cleaned data from it. Again, we write a test function that calls the previously defined function. We also save the result so that it can be used when testing the model testing component. Since the model has been serialized into binary, we write it to the file in binary mode. Below you can see an implementation of a function that accomplishes these things:

```python
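import json

# train_model must be importable here, i.e. imported from the file where you defined it

def test_train_model():
    # Read the JSON file
    with open('cleaned_data.json', 'r') as f:
        data = json.load(f)

    # Extract x_train and y_train values
    x_train = data['x_train']
    y_train = data['y_train']

    # Call the train_model function
    model_binary = train_model(x_train, y_train)

    # Save the serialized model for the model testing component
    with open('model.pkl', 'wb') as f:
        f.write(model_binary)

    # Print the result
    print("Model trained and serialized successfully.")
    print(f"Serialized model size: {len(model_binary)} bytes")

# Run the test function
if __name__ == "__main__":
    test_train_model()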
```
Make sure to move the json file created when cleaning the data to the directory for the training component and run the file containing the testing function to test your training service. 


## Model Testing component

For the service, we know that it should take the test sets x_test, y_test and dates_test as well as the trained model. The stock price prediction notebook gives us an outline for the service: we make predictions, calculate the RMSE value and plot the results. The difference now is that we need to return the plot in a format that can later be sent from one component to another. We have decided to do this by returning the plot as binary PNG data. We will also save the plot so that it can easily be viewed. Below you can see an implementation of the test_model function.

```python 
import numpy as np
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import io

def test_model(model, x_test, y_test, dates_test):
    x_test = np.array(x_test).reshape(-1, 1)
    y_pred = model.predict(x_test)
    print(f"x_test in testing service: {x_test}")
    print(f"y_pred in testing service: {y_pred}")
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"RMSE: {rmse}")

    # Create a BytesIO object to save the plot in-memory
    plot_stream = io.BytesIO()

    # Plot the results
    plt.figure(figsize=(14, 7))
    plt.plot(dates_test, y_test, label='Actual')
    plt.plot(dates_test, y_pred, label='Predicted')

    # Format the date on the x-axis
    plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=120))  # Set major ticks every 120 days
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

    plt.gcf().autofmt_xdate()  # Rotate and right-align the date labels

    plt.xlabel('Date')
    plt.ylabel('Close Price')
    plt.title('MSFT Stock Price Prediction')
    plt.legend()
    plt.grid(True)

    # Save the plot to the BytesIO object
    plt.savefig(plot_stream, format='png')
    plt.close()

    # Get the binary data from the BytesIO object
    plot_stream.seek(0)
    plot_binary = plot_stream.read()

    return rmse, plot_binary

```
In this code we first make sure that x_test is a 2D array, then make predictions by passing the x_test values to the model's predict function. From the predictions and the actual y_test values we calculate the RMSE, the root mean squared error, which measures the typical size of the prediction error in the same unit as the closing price; lower values mean better predictions. Finally, we plot the predicted values against the actual ones, save the plot to an in-memory buffer and return both the RMSE value and the plot binary.
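As a small worked example of the RMSE calculation, the sketch below computes it step by step with NumPy. The three actual and predicted values are made up for illustration.

```python
import numpy as np

# Made-up values for illustration
y_test = np.array([10.0, 12.0, 14.0])
y_pred = np.array([11.0, 12.0, 12.0])

# Step 1: the error of each prediction: [-1, 0, 2]
errors = y_test - y_pred

# Step 2: square the errors and take their mean: (1 + 0 + 4) / 3
mse = np.mean(errors ** 2)

# Step 3: the square root brings the result back to the unit of the prices
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.3f}")
```

This is the same quantity that np.sqrt(mean_squared_error(y_test, y_pred)) computes in test_model. Because the errors are squared before averaging, a single large miss weighs more than several small ones.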

## Testing the service

Again we want to test the service we have written, so we write a test function. This time, in addition to using the data saved in the JSON file, the test function loads the model that was saved as binary when testing the training component and passes it to the test_model function. Below you can see how this has been implemented.

```python
import pickle
import json
import numpy as np
from datetime import datetime
from test_service import test_model

def test_test_model():
    # Load the model from the pickle file
    with open('model.pkl', 'rb') as file:
        model = pickle.load(file)

    # Load the test data from the JSON file
    with open('cleaned_data.json', 'r') as file:
        test_data = json.load(file)

    # Extract and convert the test data
    x_test = np.array(test_data['x_test'])
    y_test = np.array(test_data['y_test'])
    dates_test = [datetime.strptime(date, '%Y-%m-%d') for date in test_data['dates_test']]

    # Call the test_model function
    rmse, plot_binary = test_model(model, x_test, y_test, dates_test)

    # Basic checks on the output (these can be extended as needed)
    assert rmse >= 0
    assert len(plot_binary) > 0
    print("Test completed successfully.")

# Run the test function
if __name__ == "__main__":
    test_test_model()
```

Remember to move the json file and the pickle file to the folder containing the test file before running the file. 
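Since test_model returns the plot as PNG bytes, one way to view it is to write those bytes to a file. The following is a minimal sketch: save_plot is a hypothetical helper and prediction_plot.png is an assumed file name, neither of which is part of the original code.

```python
from pathlib import Path

# The first eight bytes of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def save_plot(plot_binary, path):
    """Hypothetical helper: write the PNG bytes returned by test_model to disk."""
    # Guard against writing something that is not PNG data
    if not plot_binary.startswith(PNG_SIGNATURE):
        raise ValueError("plot_binary does not look like PNG data")
    target = Path(path)
    target.write_bytes(plot_binary)
    return target

# Intended usage inside the test function (assumed file name):
#   rmse, plot_binary = test_model(model, x_test, y_test, dates_test)
#   save_plot(plot_binary, "prediction_plot.png")
```

The signature check is only a cheap sanity test; it does not validate the rest of the image.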

## Conclusion

We have now created and tested the three services needed for our three components. Since the tests show that the functions work as intended on our test data, we can move on to the next step.


