# Configuring the components to communicate using gRPC

Now that we know a bit more about gRPC and protobuffers, we can configure the services we defined previously to use gRPC for communication. Before you can use gRPC for communication, you need to complete the following steps:

1) Define the service
2) Write the protofile
3) Generate the gRPC code
4) Implement the client and server using the generated gRPC code

We already defined the services in the previous chapter, so we can move on to writing the protofile. Start by copying over the service files from the previous chapter. You can find the service code in the subfolders of the example folder, inside the folder named "4. communication using grpc and protobuffers" in the GitHub repository. We will explain how to create the gRPC server and client only for the data component. The process is quite similar for the other components, however, once you understand the necessary steps and their purposes.

## Creating the protofile

To define the protofile, we need to consider what data will be passed between the server and the client. We want the data server to receive a CSV file, which can be sent in a few different ways, for example as a string or as binary data. For this example we have chosen binary data, so we need one message type that holds bytes. The clean_data function returns six lists, the training and testing lists for the variables x, y and dates, so we also need a message type that can hold these lists. Here is an example of how to define these messages:

```proto
syntax = "proto3";

package data;

// Request message containing CSV data as bytes.
message DataRequest {
    bytes csv_content = 1;
}


// Response message containing cleaned data.
message DataResponse {
    repeated double x_train = 1;
    repeated double x_test = 2;
    repeated double y_train = 3;
    repeated double y_test = 4;
    repeated string dates_train = 5;
    repeated string dates_test = 6;
}
```
Notice that the keyword "repeated" has been used when the datatype is a list.

Once we have defined the messages, we can use them to define the service provided by our gRPC server. When defining the service, we need to specify which RPC methods will be available on the server. In our case we only have one function to expose, so we define our data service with a single RPC method, CleanData, which takes the DataRequest message defined above as input and returns a DataResponse message. Below is an example of how to define the service.

```proto
// The DataService definition.
service DataService {
    // Sends CSV content and receives cleaned data.
    rpc CleanData (DataRequest) returns (DataResponse);
}
```
Now that we have written our protofile, we can move on to generating the gRPC code.

## Generating the gRPC code

To generate the gRPC code from the protofile, first make sure you have the gRPC tools installed:

```bash
pip install grpcio-tools
```
You can then generate the gRPC code by running the following command in the folder containing the protofile, which is named data.proto here:

```bash
python -m grpc_tools.protoc -I./ --python_out=. --grpc_python_out=. data.proto
```
This should generate two files that you can use to define the server and the client: data_pb2.py and data_pb2_grpc.py. The first file contains the message classes for the messages you defined in the protofile. Since we defined two messages, there are two message classes, which can be accessed as data_pb2.DataRequest and data_pb2.DataResponse. The second file contains the code needed to create the client and the server for the microservice, which is the next step.

## Creating the server
Let's first create the server for the microservice. For this you will use some of the code from the generated gRPC files. In the file ending with _pb2_grpc.py you should find class definitions for both a servicer and a service, which you are going to need when creating the server.

The Servicer is a base class that you subclass to handle the server-side logic of your gRPC service. This is where you define how each RPC method should behave by writing the actual business logic for the methods described in your .proto file. You need to create a subclass of the generated DataServiceServicer and implement the RPC methods defined in the .proto file. The Service class is auto-generated by protoc and contains static methods that a client can use to call the RPCs. You do not normally modify it. The same generated file also contains the add_DataServiceServicer_to_server function, which registers your servicer with a server.

To implement the server you should follow these steps:
1) Implement the Servicer: Create a subclass of the generated DataServiceServicer and implement the methods.
2) Start the Server: Use the add_DataServiceServicer_to_server function to attach your servicer to the server and start it.

Since we already wrote the clean_data function in the service file, all you have to do to implement the RPC method CleanData is import that function and add some error handling. You will also need to extract the input from the gRPC request, in this case the CSV file. Since the file arrives as raw bytes, we wrap it in an in-memory file object before passing it on. Additionally, you need to make sure that the return value of the RPC method follows the definition in the protofile. We defined DataResponse as six lists, x_train, x_test, y_train, y_test, dates_train and dates_test, so the returned DataResponse must follow this definition. Here is an example of how to create the subclass for the servicer:

```python
# data_service_server.py

from concurrent import futures
import grpc
import data_pb2_grpc
import data_pb2
import pandas as pd
from data_service import clean_data
import os
import io
import logging

class DataServiceServicer(data_pb2_grpc.DataServiceServicer):
    def CleanData(self, request, context):
        try:
            # Read CSV content from the request
            csv_content = request.csv_content
            # Wrap the raw bytes in an in-memory, file-like object
            csv_file = io.BytesIO(csv_content)
            
            logging.info("Received CSV data, cleaning...")

            # Clean data using the function we defined previously
            x_train, x_test, y_train, y_test, dates_train, dates_test = clean_data(csv_file)
            logging.info("Data cleaned successfully")

            print(x_train, x_test, y_train, y_test, dates_train, dates_test)

            # Ensure the returned DataResponse follows the definition in the protofile
            return data_pb2.DataResponse(
                x_train=x_train,
                x_test=x_test,
                y_train=y_train,
                y_test=y_test,
                dates_train=dates_train,
                dates_test=dates_test
            )

        except Exception as e:
            logging.exception("Error cleaning data")
            context.set_code(grpc.StatusCode.INTERNAL)
            context.set_details(f"Internal error: {str(e)}")
            return data_pb2.DataResponse()

```

Here you can see that the code does essentially the same things as the code for the data app at the route /get_clean_data. We first read the CSV content and wrap it in an in-memory file object. That object can then be passed to the clean_data function that we wrote earlier. The only difference is the way we receive and return the data. For the gRPC server, we make sure to return the values in the correct format, data_pb2.DataResponse.
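
To make the wrapping step more concrete, here is a minimal sketch that uses only the standard library. It shows how raw bytes, like those carried in csv_content, can be exposed through a file-like interface without writing anything to disk. The sample bytes and the rows_from_csv_bytes helper are hypothetical, and the built-in csv module stands in for pandas so that the example has no dependencies.

```python
import csv
import io


def rows_from_csv_bytes(csv_content):
    """Parse CSV bytes into a list of row dictionaries (hypothetical helper)."""
    # BytesIO exposes the bytes through a file-like interface, in memory.
    binary_file = io.BytesIO(csv_content)
    # The csv module works on text, so decode the byte stream as UTF-8.
    text_file = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
    return list(csv.DictReader(text_file))


# Made-up sample standing in for request.csv_content.
sample = b"Date,Close\n2024-01-02,370.87\n2024-01-03,370.60\n"
rows = rows_from_csv_bytes(sample)
print(rows[0]["Date"], rows[0]["Close"])
```

pandas can read from such an in-memory object directly with pd.read_csv, which is presumably how clean_data consumes the object it receives from the servicer.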

Now we need to complete the second step for creating the server: starting it. For this we can use the generated function add_DataServiceServicer_to_server. Before we can call it, we need to create a gRPC server, which can be done with the following line:

```python
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
```
futures.ThreadPoolExecutor(max_workers=10) specifies the thread pool executor for handling concurrent RPCs. ThreadPoolExecutor manages a pool of threads for executing tasks asynchronously. Here, it allows the server to handle up to 10 concurrent requests (RPC calls) in parallel.
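
The effect of max_workers can be seen without gRPC at all. The sketch below is an illustration rather than part of the server: fake_handler is a hypothetical stand-in for an RPC handler, and the pool is limited to three workers so the cap is easy to observe. It records how many handlers run at the same time.

```python
import threading
import time
from concurrent import futures

# Shared counters, protected by a lock because several threads update them.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()


def fake_handler(request_id):
    """Stand-in for an RPC handler: tracks concurrency, then 'works' briefly."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # simulate work
    with lock:
        stats["active"] -= 1
    return request_id


# Three workers and six 'requests': at most three can run at once.
with futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_handler, range(6)))

print("results:", results)
print("peak concurrency:", stats["peak"])
```

The peak never exceeds the number of workers. In this sketch, the remaining tasks simply wait until a worker becomes free.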

Now that we have a server, we can add the servicer to it using the following:

```python
data_pb2_grpc.add_DataServiceServicer_to_server(DataServiceServicer(), server)
```

Lastly we need to define a port, start the server and make sure it stays running. This can be done like this:

```python
server.add_insecure_port('[::]:8080')
server.start()
server.wait_for_termination()
``` 
Now you have all the necessary components for your server. You can take a look at the data_service_server.py file to see the final code for the server. This file can be found at "4. communication using grpc and protobuffers\example\data\data_service_server.py" in the GitHub repository.

## Creating the client

Lastly we need to create a client in order to interact with our server. Creating a gRPC client involves setting up a communication channel, creating a stub to interact with the server, making a request, and handling the response. You are going to want to use the stub that was generated from the protofile. The stub acts as an intermediary between your application code and the remote gRPC service. The stub provides methods that correspond to the RPCs defined in the .proto file, allowing your application to invoke these methods as if they were local functions, even though they are executed on a remote server. 

We have two options when creating the client: a command line client or a web application client. We have decided to make the client for the data component a command line client, because once we have defined the gRPC servers for all the components, we will need a combined client for all three. That combined client will be created as a web application. Therefore the data client will only be used for testing the data server and does not necessarily need a web interface.

To create the command line client, first we are going to create a channel, a communication path, to the server. You need to specify the server's address (in this case, localhost:8080) in order to create the channel. This can be done like so:

```python
with grpc.insecure_channel('localhost:8080') as channel:
```
The with statement ensures the channel is properly closed when the operation is done.

Next we will define the stub using the channel:

```python
stub = data_pb2_grpc.DataServiceStub(channel)
```

Now let's read and prepare the data for the request. Reading the file as bytes prepares the data to be sent in the DataRequest as defined in the protofile. This is how the reading can be done:

```python
csv_file_path = './MSFT.US.csv'

with open(csv_file_path, 'rb') as f:
    csv_content = f.read()
```

Next we need to construct a request message using the data read in the previous step. The request message format is defined in the .proto file. 

```python
request = data_pb2.DataRequest(csv_content=csv_content)
```
Now we can invoke the remote method on the stub. This sends the request to the server and waits for a response.

```python
response = stub.CleanData(request)
```
Now we can check the server’s response and process it as needed. This could involve printing the response data or using it in further computations.

```python
if response.x_train and response.x_test and response.y_train and response.y_test and response.dates_train and response.dates_test:
    print("x_train:", response.x_train)
    print("x_test:", response.x_test)
    print("y_train:", response.y_train)
    print("y_test:", response.y_test)
    print("Dates Train:", response.dates_train)
    print("Dates Test:", response.dates_test)
else:
    print("No data returned or some fields are empty.")
```
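
If the check fails, it can be useful to know which fields came back empty. Here is a minimal sketch of one way to report that. The empty_fields helper is hypothetical, and a SimpleNamespace stands in for the DataResponse so the snippet runs without a server. It relies only on the fields supporting len(), which the repeated fields of a protobuf message do.

```python
from types import SimpleNamespace

FIELDS = ("x_train", "x_test", "y_train", "y_test", "dates_train", "dates_test")


def empty_fields(response, names=FIELDS):
    """Return the names of the fields on `response` that hold no values."""
    return [name for name in names if len(getattr(response, name)) == 0]


# Stand-in for a DataResponse in which the date lists were left empty.
fake_response = SimpleNamespace(
    x_train=[1.0, 2.0], x_test=[3.0],
    y_train=[2.0, 3.0], y_test=[4.0],
    dates_train=[], dates_test=[],
)

missing = empty_fields(fake_response)
if missing:
    print("Empty fields:", ", ".join(missing))
else:
    print("All fields contain data.")
```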

These are all the necessary parts of the client. You can see the entire client file (data_client.py) in the same folder as the data server. 


## Testing
Now that you have all the necessary files, you can test the data server and client. Start the server with the command python3 data_service_server.py, then run the client the same way (python3 data_client.py) in a second terminal. You should see the cleaned data being printed on the client side. If you want to make sure that the components you have created are robust and have foolproof error handling, you can develop unit tests on the client side.
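
As a starting point for such unit tests, here is a minimal sketch using unittest and unittest.mock from the standard library. It assumes the client logic is factored into a function, here the hypothetical fetch_clean_data, that receives the stub and the request class as arguments. In real code you would pass data_pb2_grpc.DataServiceStub(channel) and data_pb2.DataRequest. In the test, both are replaced by mocks, so no server needs to be running.

```python
import unittest
from types import SimpleNamespace
from unittest import mock

FIELDS = ("x_train", "x_test", "y_train", "y_test", "dates_train", "dates_test")


def fetch_clean_data(stub, make_request, csv_content):
    """Call CleanData on `stub` and return the response fields as plain lists."""
    response = stub.CleanData(make_request(csv_content=csv_content))
    return {name: list(getattr(response, name)) for name in FIELDS}


class FetchCleanDataTest(unittest.TestCase):
    def test_sends_bytes_and_returns_lists(self):
        # Mock stub whose CleanData returns a canned, response-like object.
        stub = mock.Mock()
        stub.CleanData.return_value = SimpleNamespace(
            x_train=[1.0], x_test=[2.0], y_train=[3.0], y_test=[4.0],
            dates_train=["2024-01-02"], dates_test=["2024-01-03"],
        )
        # Mock standing in for the request class (data_pb2.DataRequest).
        make_request = mock.Mock(return_value="request")

        result = fetch_clean_data(stub, make_request, b"Date,Close\n")

        make_request.assert_called_once_with(csv_content=b"Date,Close\n")
        stub.CleanData.assert_called_once_with("request")
        self.assertEqual(result["x_train"], [1.0])
        self.assertEqual(result["dates_test"], ["2024-01-03"])


# Run the test case without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FetchCleanDataTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Passing the stub in as an argument is a design choice made for this sketch. It keeps the function testable without patching the grpc module.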

Now you could continue by creating the other services in the same way:

1) Define the service
2) Write the protofile
3) Generate the gRPC code
4) Create the server and client

To test the services individually, write individual client files. If you want to test them together, you can create a common client file which uses the stubs of all the different services to communicate. We have already implemented the gRPC code for the training and testing, and included all the clients in one file called client.py. This client is a command line client, which can be used for quick testing of the connections between the different components. However, in order to interact better with the components, it would be advantageous to have a web application from which you could access the different gRPC servers. Next, we will cover how to create such a client.

## Creating a common web application client

When creating the web application client, you can reuse quite a lot of the code we wrote for the CLI client, as the web application goes through the same steps: creating the channels and the stubs and making gRPC requests to the servers. The main difference is that you will need to implement some logic for when to make the requests, as well as a way to visualize the results.

For our web application we have used Python and Flask to limit the number of tools you need to learn, but you can of course use any language and package you feel most comfortable with. The benefit of using gRPC is that the client and the servers can all be written in different programming languages, and the communication will still work the same thanks to gRPC and protobuffers.

Let's start by looking at how to write the app.py file containing the logic of the app. This will be quite similar to the way we wrote the apps for the different components in the last chapter. We need to define different routes for the different functions we want to call. We have created three routes in addition to the main route. Each route corresponds to one step of the pipeline, so for example we have one route called "/clean_data" where we make the gRPC request to the data server.

With future use in mind, we have decided that the user should upload the CSV file they want to use for the training. This way the user can test the pipeline on different CSV files. Since we added this functionality, we also need to add some error handling in case an incorrect file is uploaded. We need to check that the uploaded file is a CSV file and that it contains the columns required by the data cleaning service.

To validate the uploaded file we have created the following helper functions: 

```python
# Function to check if a file is a CSV
def is_csv_file(filename):
    return filename.lower().endswith('.csv')

# Function to validate the CSV format
def validate_csv(file):
    try:
        data = pd.read_csv(file)
        required_columns = ['Date', 'Close']
        missing_columns = [col for col in required_columns if col not in data.columns]
        
        if missing_columns:
            return f"Missing columns: {', '.join(missing_columns)}"
        
        # Check if there is enough data
        if data.shape[0] < 2:
            return "Not enough data for training. The CSV file must contain at least 2 rows."

        return None
    except pd.errors.EmptyDataError:
        return "The CSV file is empty."
    except Exception as e:
        return f"An error occurred during validation: {str(e)}"

```

Next we have defined the clean_data route and the corresponding function. The function first uses the helper functions to validate the uploaded file. Once the file has been validated, it makes a gRPC request to the data server using the stub and lastly returns the values as JSON. Below you can see the code for the clean_data route:

```python

@app.route('/clean_data', methods=['POST'])
def clean_data():
    if 'csv_file' not in request.files:
        return jsonify({'error': 'No file part in the request'}), 400

    file = request.files['csv_file']
    
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400

    if not is_csv_file(file.filename):
        return jsonify({'error': 'Invalid file format. Only CSV files are allowed.'}), 400

    # Validate the CSV file format
    validation_error = validate_csv(file)
    if validation_error:
        return jsonify({'error': validation_error}), 400

    try:
        # Read file content for gRPC request
        file.seek(0)  # Reset file pointer to start
        csv_content = file.read()
        
        # Use gRPC to clean the data
        with grpc.insecure_channel('localhost:8080') as channel:
            stub = data_pb2_grpc.DataServiceStub(channel)
            request_data = data_pb2.DataRequest(csv_content=csv_content)
            response = stub.CleanData(request_data)
            
            # Prepare cleaned data for response
            cleaned_data = {
                'x_train': list(response.x_train),
                'x_test': list(response.x_test),
                'y_train': list(response.y_train),
                'y_test': list(response.y_test),
                'dates_train': list(response.dates_train),
                'dates_test': list(response.dates_test)
            }
            
            return jsonify(cleaned_data)
    except grpc.RpcError as e:
        return jsonify({'error': f'RPC failed: {e.code()} - {e.details()}'}), 500
    except Exception as e:
        return jsonify({'error': f'An error occurred: {str(e)}'}), 500

```
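
The file.seek(0) call in the route matters because the validation step has already read the uploaded stream to its end. The minimal sketch below illustrates the effect. An in-memory io.BytesIO with made-up content stands in for the uploaded file, which is an assumption made so that the snippet runs without Flask.

```python
import io

# Stand-in for the uploaded file stream (request.files['csv_file']).
upload = io.BytesIO(b"Date,Close\n2024-01-02,370.87\n")

first_read = upload.read()   # what a validation step would consume
second_read = upload.read()  # the stream is now at its end, so this is empty

upload.seek(0)               # rewind to the beginning
after_seek = upload.read()   # the full content is available again

print(len(first_read), len(second_read), len(after_seek))
```

Without the rewind, the gRPC request would carry an empty csv_content.
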
Once the CSV file has been validated, you can see that the process is quite similar to that of the CLI client. This also holds true for the testing and the training. We define routes for the testing and training separately, and then we simply retrieve the necessary values from the request and use the stub to call the relevant function. If you take a look at the app.py file in the "4. communication using grpc and protobuffers\example\client" directory, you can see how the entire app has been implemented. We have also created an index.html file which specifies how the UI should look. This file can be found in the templates folder. If you run the app, you can try out the UI and the pipeline. You should see the plot of the testing as output.