## Parsing and creating CSV files

This Jupyter notebook parses the data files collected for the **anechoic chamber**, which are located in the `ESP32/DATA/` folder. The files are named as follows:

If the filename is `channelx-ystraight.txt`, then:

* `x` denotes the channel number
* `y` denotes the distance between the ESP32 and the access point
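
As a minimal sketch of this naming convention, the channel and distance can be pulled out of a file name with a regular expression. The helper name `parse_filename` and the pattern are illustrative assumptions, not part of the original notebook, which uses string slicing instead.

```python
import re

# Hypothetical pattern built from the `channelx-ystraight.txt` convention
FILENAME_PATTERN = re.compile(r"^channel(\d+)-(\d+)straight\.txt$")

def parse_filename(filename):
    """Return (channel, distance) parsed from a data file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    channel, distance = (int(group) for group in match.groups())
    return channel, distance

print(parse_filename("channel11-250straight.txt"))  # (11, 250)
```

Raising an error on a non-matching name makes stray files in the data folder visible instead of silently producing wrong labels.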



In [1]:
import pandas as pd
import numpy as np
import os

Change this path to match where the data files are located.

In [2]:
path_to_files = "../ESP32/DATA/"

In [3]:
filelist = []

# os.walk yields (root, dirs, files) once per directory; keep only the top-level files
for r,d,f in os.walk(path_to_files):
    filelist = f
    break

The file names are now collected in `filelist`:

In [4]:
for f in filelist:
    print(f)

channel11-250straight.txt
channel11-50straight.txt
channel11-185straight.txt
channel6-185straight.txt
channel1-185straight.txt
channel1-250straight.txt
channel6-50straight.txt
channel1-50straight.txt
channel6-250straight.txt
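
The listing above comes back in whatever order the file system returns. A possible alternative, sketched here under the assumption that only `.txt` files directly inside the folder are wanted, is to filter and sort the names. The helper `list_data_files` is hypothetical, and the demo runs on a temporary folder rather than the real data.

```python
import os
import tempfile

def list_data_files(folder):
    """Return the sorted names of .txt files directly inside `folder`."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.endswith(".txt") and os.path.isfile(os.path.join(folder, name))
    )

# Demo on a throwaway folder (file names are made up for illustration)
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("channel6-50straight.txt", "channel1-50straight.txt", "notes.md"):
        open(os.path.join(demo_dir, name), "w").close()
    print(list_data_files(demo_dir))
```

A sorted list makes the processing order reproducible across machines.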


## Logic for a single file
The cells below show the parsing logic for a single file. They are commented out because the loop in the next section applies the same logic to every file.

In [5]:
#filename = "channel11-250straight.txt"


In [6]:
#int(filename[filename.find('-')+1:filename.find("straight")]) #Slicing to find distance

In [7]:
#int(filename[filename.find("l")+1:filename.find('-')]) #Slicing to find channel no.

In [8]:
#f = open(path_to_files+filename)

In [9]:
#string = f.read()

In [10]:
#string = string[string.find("<CSI>"):] #Remove information from the beginning

In [11]:
#string

In [12]:
#%pprint

Go through the file frame by frame:

In [13]:
#all_lists = []
#done = False

#while done!=True:
    
#    index1 = string.find("</len>")+6
#    index2 = string.find("\n\n")
    
#    if(index1 == -1 or index2 == -1):
#        done = True
#        break

#    csi_list = string[index1:index2].replace('\n',' ').replace('- ','-').split()
#    csi_list = [int(i) for i in csi_list]
    #print(csi_list)
#    all_lists.append(csi_list)
#    string = string[index2+2:]

#all_lists.pop()
#len(all_lists)
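
The single-file logic above can also be written as a small function. This is only a sketch: the helper `extract_frames` is hypothetical, the sample text is synthetic, and the layout it assumes (each frame starts with `<CSI>`, the header ends with `</len>`, and the values are whitespace-separated integers) is inferred from the slicing code rather than from a format specification.

```python
def extract_frames(text):
    """Split raw capture text into lists of integers, one list per CSI frame."""
    frames = []
    # Assumed layout: every frame starts with a "<CSI>" tag
    for chunk in text.split("<CSI>")[1:]:
        header_end = chunk.find("</len>")
        if header_end == -1:
            continue  # skip chunks without a complete header
        body = chunk[header_end + len("</len>"):]
        # Rejoin minus signs separated from their digits ("- 2" -> "-2")
        tokens = body.replace("\n", " ").replace("- ", "-").split()
        frames.append([int(token) for token in tokens])
    return frames

# Synthetic sample, not real measurement data
sample = (
    "boot log line\n"
    "<CSI><len>4</len>\n1 - 2 3\n4\n\n"
    "<CSI><len>4</len>\n5 6 -7 8\n\n"
)
print(extract_frames(sample))  # [[1, -2, 3, 4], [5, 6, -7, 8]]
```

Unlike the cells above, this version keeps the final chunk of the file as well, so a truncated last frame would need to be filtered out separately.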

## Looping for all files

The logic above handles a single file. The code below applies the same logic to every file in the folder and adds the parsed frames to a single dataframe.

First, we need to create a dataframe with the columns shown in the schema below:
![title](img/schema.png)
    

In [14]:
col_list = []
for i in range(1,129):#Subcarrier number
    colname1 = "csi_"+str(i)+"real"
    colname2 = "csi_"+str(i)+"imag"
    col_list.append(colname1)
    col_list.append(colname2)
        
col_list.append('channel')
col_list.append('distance')

In [15]:
df = pd.DataFrame(columns=col_list)
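
The same column list can be built more compactly with a list comprehension. This is an equivalent sketch, with `N_SUBCARRIERS` introduced here as an illustrative constant:

```python
N_SUBCARRIERS = 128  # matches range(1, 129) in the cell above

# One real and one imaginary column per subcarrier, in that order
col_list = [
    f"csi_{i}{part}"
    for i in range(1, N_SUBCARRIERS + 1)
    for part in ("real", "imag")
]
col_list += ["channel", "distance"]

print(len(col_list))  # 258
print(col_list[:4])   # ['csi_1real', 'csi_1imag', 'csi_2real', 'csi_2imag']
```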

Now, we do the following steps:

1. Iterate over each file (`filename`) in `filelist`
2. Extract the `distance` and `channel` number from the `filename` itself
3. Open the file and remove extra information from the beginning
4. Add each frame as a row of the `df` dataframe, then continue with the next file
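
Step 4 can be sketched on its own as follows. Instead of growing the dataframe one row at a time, this variant collects the rows in a plain list and builds the dataframe once at the end, which is usually faster for many frames. The helper `frame_to_row` and the synthetic frames are assumptions for illustration. Note that it trims any frame longer than 256 values and rejects shorter ones, which is stricter than the notebook's own handling.

```python
import pandas as pd

N_VALUES = 256  # 128 subcarriers x (real, imag)

def frame_to_row(csi_list, channel, distance):
    """Trim a frame to N_VALUES entries and append the two label columns."""
    if len(csi_list) < N_VALUES:
        raise ValueError(f"Frame too short: {len(csi_list)} values")
    return csi_list[:N_VALUES] + [channel, distance]

columns = [f"csi_{i}{part}" for i in range(1, 129) for part in ("real", "imag")]
columns += ["channel", "distance"]

# Synthetic frames standing in for parsed data (not real measurements)
frames = [list(range(256)), list(range(258))]
rows = [frame_to_row(frame, channel=11, distance=250) for frame in frames]

# Build the dataframe in a single call once all rows are collected
df_demo = pd.DataFrame(rows, columns=columns)
df_demo.index.name = "Frame ID"
print(df_demo.shape)  # (2, 258)
```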

In [16]:
for filename in filelist:
    print("Processing"+filename +"...")
    distance = int(filename[filename.find('-')+1:filename.find("straight")]) #Slicing to find distance
    channel = int(filename[filename.find("l")+1:filename.find('-')]) #Slicing to find channel no.
    
    #Read the whole file; the with-block closes it automatically
    with open(path_to_files+filename,'r') as f:
        string = f.read()
    
    string = string[string.find("<CSI>"):] #Remove all unnecessary information at the beginning
    all_lists_in_file = []
    done = False

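    while done!=True:
    
        index1 = string.find("</len>") #Find end of the frame header
        index2 = string.find("\n<CSI>") #Find end of CSI frame

        #No more complete frames (a final frame with no following "<CSI>" is skipped)
        if(index1 == -1 or index2 == -1):
            done = True
            break

        index1 += 6 #Skip past "</len>" to the start of the CSI data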

        #print(string[index1:index2].replace('\n',' ').replace('- ','-').split())
        #Create list of subcarrier data for a single frame(row)
        csi_list = string[index1:index2].replace('\n',' ').replace('- ','-').split()
        csi_list = [int(i) for i in csi_list]
        
        #print(len(csi_list))
        
        ## NOTE: frames with 1 to 4 extra values are trimmed to 256 values (128 subcarriers x real/imag)
        ## to work around errors in data collection (masking).
        
        if 257 <= len(csi_list) <= 260:
            #print(csi_list)
            csi_list = csi_list[:256]
        csi_list.append(channel)
        csi_list.append(distance)
        #print(len(csi_list))
        
        df.loc[len(df)] = csi_list

        #Append this list to the cumulative list collection for this file
        all_lists_in_file.append(csi_list)
        
        #Start search for next rows
        string = string[index2+2:]
        #print(string)
        

    #Remove an empty list from the end (implementation specific)
    #all_lists.pop()
    
    
    

Processingchannel11-250straight.txt...
Processingchannel11-50straight.txt...
Processingchannel11-185straight.txt...
Processingchannel6-185straight.txt...
Processingchannel1-185straight.txt...
Processingchannel1-250straight.txt...
Processingchannel6-50straight.txt...
Processingchannel1-50straight.txt...
Processingchannel6-250straight.txt...


Name the index so that each row is clearly identified as a frame.

In [17]:
df.index.names = ['Frame ID']

Now, have a look at the final data.

In [22]:
df

Frame ID,csi_1real,csi_1imag,csi_2real,csi_2imag,csi_3real,csi_3imag,csi_4real,csi_4imag,csi_5real,csi_5imag,...,csi_125real,csi_125imag,csi_126real,csi_126imag,csi_127real,csi_127imag,csi_128real,csi_128imag,channel,distance
0,124,-64,7,0,17,12,17,11,16,10,...,35,23,34,23,33,25,32,26,11,250
1,30,-32,1,0,-8,21,-7,21,-7,21,...,39,-25,38,-24,38,-22,38,-21,11,250
2,30,-32,1,0,21,11,21,10,22,9,...,39,30,39,29,39,29,39,29,11,250
3,116,64,7,0,-18,12,-18,12,-16,12,...,29,-33,30,-32,31,-30,30,-29,11,250
4,82,32,5,0,6,22,7,22,7,21,...,46,8,45,9,44,1,0,42,11,250
5,30,-32,1,0,22,-1,20,-2,20,-3,...,44,2,42,1,41,-1,40,-3,11,250
6,30,-32,1,0,23,7,22,7,22,7,...,19,45,18,45,16,44,14,43,11,250
7,124,-64,7,0,-1,-24,-1,-23,-1,-22,...,-47,18,-48,16,-49,13,-48,9,11,250
8,30,-32,1,0,-20,-8,-20,-8,-20,-7,...,-46,-11,-47,-9,-46,-8,-43,-7,11,250
9,116,64,7,0,21,9,21,8,20,7,...,22,44,21,44,19,44,18,43,11,250


Save this file as a CSV for future use.

In [23]:
df.to_csv("anechoic_chamber_data.csv")
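
When the CSV is loaded again later, the `Frame ID` column can be restored as the index with `index_col`. The sketch below uses a tiny in-memory stand-in with two subcarriers and made-up values instead of the real file. Combining the real and imaginary columns into amplitudes is shown only as one illustrative use of the saved layout.

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for anechoic_chamber_data.csv (synthetic values, 2 subcarriers)
csv_text = (
    "Frame ID,csi_1real,csi_1imag,csi_2real,csi_2imag,channel,distance\n"
    "0,3,4,0,-2,11,250\n"
    "1,-6,8,5,12,11,250\n"
)
df_loaded = pd.read_csv(io.StringIO(csv_text), index_col="Frame ID")

# Select the real and imaginary columns by name pattern
real = df_loaded.filter(regex=r"^csi_\d+real$").to_numpy()
imag = df_loaded.filter(regex=r"^csi_\d+imag$").to_numpy()

# Amplitude of each complex CSI value, one row per frame
amplitude = np.abs(real + 1j * imag)
print(amplitude)
```

For the real file, `pd.read_csv("anechoic_chamber_data.csv", index_col="Frame ID")` would replace the `io.StringIO` stand-in.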