## Data processing steps
### 1. Remove duplicate data
If the bid and ask sides are identical at two adjacent time points, no transaction has occurred in between, so the repeated line is removed.
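
The rule above can be sketched on toy data. This is a minimal illustration, not the notebook's own code: the helper name `drop_consecutive_duplicates` and the `(time, book)` layout are assumptions made for the example, and the first snapshot of each run of identical books is the one kept.

```python
def drop_consecutive_duplicates(snapshots):
    """Keep a snapshot only if its order book differs from the previous one.

    `snapshots` is assumed to be a list of (time, book) pairs.
    """
    kept = []
    previous_book = None
    for time, book in snapshots:
        if book != previous_book:
            kept.append((time, book))
        previous_book = book
    return kept


# Toy snapshots (made up for illustration)
snapshots = [
    (0.0, {"bid": [[100, 5]], "ask": [[102, 3]]}),
    (0.1, {"bid": [[100, 5]], "ask": [[102, 3]]}),  # unchanged book, dropped
    (0.2, {"bid": [[101, 2]], "ask": [[102, 3]]}),
]

deduplicated = drop_consecutive_duplicates(snapshots)
kept_times = [time for time, _ in deduplicated]
print(kept_times)
```

Comparing each snapshot with the one immediately before it means every adjacent pair is checked, including pairs that straddle an already removed line.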
### 2. Generate new features based on price and share in 'bid' and 'ask'   
- time: The time of the observation; the interval is set to 10 s.
- bid_weighted_average: The volume-weighted average bid price across all bid price levels.  
- bid_high: The highest price a buyer is willing to pay for the stock.  
- bid_volumn: The total number of shares that buyers are willing to buy, summed over all bid price levels.  
- ask_weighted_average: The volume-weighted average ask price across all ask price levels.  
- ask_low: The lowest price a seller is willing to accept for the stock.  
- ask_volumn: The total number of shares that sellers are willing to sell, summed over all ask price levels.   
- mid_price: The midpoint between the highest bid price and the lowest ask price.   
- spread: The lowest ask price minus the highest bid price (ask_low - bid_high).   
- bid_ask_ratio: The ratio of bid volume to ask volume.   
- Trend: The direction of the price trend: 1 if the price is rising, 2 if it is falling, and 0 if it is stable.   
- rising: A binary variable indicating whether the price trend is rising (1) or not (0).   
- falling: A binary variable indicating whether the price trend is falling (1) or not (0).   
- stable: A binary variable indicating whether the price trend is stable (1) or not (0).   
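
The trend columns can be derived from the price series once the CSV is loaded. The sketch below is only one possible reading: it assumes the trend is measured on `mid_price` and compares each row with the previous one, neither of which is stated above. The toy prices are made up.

```python
import numpy as np
import pandas as pd

# Toy mid prices (made up for illustration)
df = pd.DataFrame({"mid_price": [100.0, 100.5, 100.5, 100.0]})

# Assumption: the trend compares each mid price with the previous row
change = df["mid_price"].diff()

# 1 = rising, 2 = falling, 0 = stable
# The first row has no predecessor (NaN change), so it falls through to 0
df["Trend"] = np.select([change > 0, change < 0], [1, 2], default=0)

# One binary indicator column per trend value
df["rising"] = (df["Trend"] == 1).astype(int)
df["falling"] = (df["Trend"] == 2).astype(int)
df["stable"] = (df["Trend"] == 0).astype(int)

print(df)
```

Exactly one of `rising`, `falling`, and `stable` is 1 in each row, so the three columns are a one-hot encoding of `Trend`.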

In [3]:
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

In [7]:
import ast
import csv
import os

# Set the directory paths for input and output files
input_dir = 'LOBS/'
output_dir = 'CSV/'
os.makedirs(output_dir, exist_ok=True)  # create the output directory if it is missing

# Minimum number of price levels required on each side of the book
MIN_LEVELS = 2

# Loop over each input file in the input directory
for filename in os.listdir(input_dir):
    if not filename.endswith('.txt'):
        continue

    in_path = os.path.join(input_dir, filename)
    out_path = os.path.join(output_dir, filename[:-4] + '.csv')

    # Open the input and output files
    with open(in_path, 'r') as infile, open(out_path, 'w', newline='') as outfile:
        # Create a CSV writer object
        writer = csv.writer(outfile)

        # Write the header row
        writer.writerow(['time', 'bid_weighted_average', 'bid_high', 'bid_volumn',
                         'ask_weighted_average', 'ask_low', 'ask_volumn',
                         'mid_price', 'spread', 'bid_ask_ratio'])

        # Order book of the previous line, used to detect duplicates
        previous_book = None

        # Loop over each line in the input file
        for line in infile:
            if not line.strip():
                continue

            # Drop the exchange name, then parse the line as a Python literal
            # (ast.literal_eval does not execute arbitrary code, unlike eval)
            components = ast.literal_eval(line.replace("Exch0, ", ""))
            time, book = components[0], components[1]

            # Skip the line if bid and ask are unchanged from the previous line.
            # Every adjacent pair of lines is compared, not every other pair.
            is_duplicate = book == previous_book
            previous_book = book
            if is_duplicate:
                continue

            bids = book[0][1]  # list of [price, volume] entries
            asks = book[1][1]

            # Skip the line if either side has fewer than MIN_LEVELS price levels
            if len(bids) < MIN_LEVELS or len(asks) < MIN_LEVELS:
                continue

            bid_volumn = sum(item[1] for item in bids)
            ask_volumn = sum(item[1] for item in asks)
            bid_weighted_average = round(sum(item[0] * item[1] for item in bids) / bid_volumn, 2)
            ask_weighted_average = round(sum(item[0] * item[1] for item in asks) / ask_volumn, 2)
            bid_high = max(item[0] for item in bids)
            ask_low = min(item[0] for item in asks)
            mid_price = round((bid_high + ask_low) / 2, 2)
            spread = ask_low - bid_high
            bid_ask_ratio = round(bid_volumn / ask_volumn, 2)

            # Write the data to the output file
            writer.writerow([time, bid_weighted_average, bid_high, bid_volumn,
                             ask_weighted_average, ask_low, ask_volumn,
                             mid_price, spread, bid_ask_ratio])
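
To make the feature definitions concrete, here is a small worked example on a single snapshot. The line format shown is an assumption inferred from how the code above indexes each line, the prices and volumes are made up, and `book_features` is a hypothetical helper that does not appear in the notebook.

```python
import ast


def book_features(line):
    """Parse one raw snapshot line and return its features as a dict."""
    # Assumed format:
    # [Exch0, time, [['bid', [[price, volume], ...]], ['ask', [[price, volume], ...]]]]
    components = ast.literal_eval(line.replace("Exch0, ", ""))
    time, book = components[0], components[1]
    bids, asks = book[0][1], book[1][1]

    bid_volume = sum(item[1] for item in bids)
    ask_volume = sum(item[1] for item in asks)
    bid_high = max(item[0] for item in bids)
    ask_low = min(item[0] for item in asks)

    return {
        "time": time,
        "bid_weighted_average": round(sum(p * v for p, v in bids) / bid_volume, 2),
        "bid_high": bid_high,
        "bid_volumn": bid_volume,
        "ask_weighted_average": round(sum(p * v for p, v in asks) / ask_volume, 2),
        "ask_low": ask_low,
        "ask_volumn": ask_volume,
        "mid_price": round((bid_high + ask_low) / 2, 2),
        "spread": ask_low - bid_high,
        "bid_ask_ratio": round(bid_volume / ask_volume, 2),
    }


# Toy snapshot (made up): two bid levels and two ask levels
line = "[Exch0, 10.0, [['bid', [[100, 2], [99, 3]]], ['ask', [[102, 1], [103, 4]]]]]"
features = book_features(line)
print(features)
```

For this snapshot the bid side holds 5 shares at a weighted average of (100·2 + 99·3)/5 = 99.4, the ask side holds 5 shares at (102·1 + 103·4)/5 = 102.8, the mid price is 101.0, and the spread is 2.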
