**Table of contents**<a id='toc0_'></a>    
- 1. [Check data files for the ReArm project](#toc1_)    
  - 1.1. [Location of the data files](#toc1_1_)    
  - 1.2. [Structure of the name of the data files](#toc1_2_)    
  - 1.3. [Simple script to check the data files](#toc1_3_)    
  - 1.4. [Script using a class for the expected data by visit](#toc1_4_)    
- 2. [TODO: Make a full OOP version of the script](#toc2_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# 1. <a id='toc1_'></a>[Check data files for the ReArm project](#toc0_)

The goal is to check that all the expected data files for the ReArm project are present. 



## 1.1. <a id='toc1_1_'></a>[Location of the data files](#toc0_)

The data files are accessed via `dat/ReArm.lnk`, a symlink to the data.  
The symlink points to the directory where the data files are actually stored.   
The symlink is ignored by Git (see the `.gitignore` file).
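
Because the symlink is not tracked by Git, a fresh clone will not contain it. The sketch below shows one possible way to check that the link exists and resolves to a directory before running the checks. The function name `describe_data_link` is hypothetical and not part of the project.

```python
import os

def describe_data_link(link_path="dat/ReArm.lnk"):
    """Report whether the data link exists and where it resolves to."""
    return {
        "is_symlink": os.path.islink(link_path),      # True only for a symbolic link
        "target_exists": os.path.isdir(link_path),    # follows the symlink
        "resolved_path": os.path.realpath(link_path), # path with symlinks resolved
    }

info = describe_data_link()
if not info["target_exists"]:
    print(f"Data directory not reachable (resolved to {info['resolved_path']})")
```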

The directory structure is as follows: 

    dat
    └── ReArm.lnk 
        ├── ReArm_C1P02
        │   ├── Accelerometry
        │   ├── Armeo
        │   ├── Circle
        │   ├── Reaching
        │   └── Scan
        ├── ...
        ...



## 1.2. <a id='toc1_2_'></a>[Structure of the name of the data files](#toc0_)

The data files are named according to the following structure:

```
<project>_<participant>_<date>_<visit>_<record>_<record-specific-file>.<extension>
```

where:

- `<project>` is the name of the project (`ReArm`)
- `<participant>` is the participant ID (e.g. `C1P02`: `C1` for the center, `P02` for the participant)
- `<date>` is the date of the recording in `YYYYMMDD` format (e.g. `20190131`)
- `<visit>` is the visit number (`1`, `2`, or `3`)
- `<record>` is the code of the record (`r`, `c`, `a`, or `ac`)
- `<record-specific-file>` is a string that depends on the type of record
- `<extension>` depends on the type of record (`csv`, `easy`, `oxy3`, `oxy4`, `xdf`, or `cwa`)
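
As an illustration of this structure, the sketch below splits a file name into its components with a regular expression. The pattern is an assumption derived from the list above, and it groups `<record-specific-file>` and `<extension>` into a single `suffix`. The example file name is hypothetical, assembled from the example values given in the list.

```python
import re

# Assumed pattern, derived from the naming structure described above.
FILE_NAME_RE = re.compile(
    r"^(?P<project>ReArm)_"
    r"(?P<participant>C\d+P\d+)_"
    r"(?P<date>\d{8})_"
    r"(?P<visit>\d)_"
    r"(?P<record>ac|r|c|a)"
    r"(?P<suffix>[_.].+)$"   # record-specific part and extension together
)

def parse_file_name(name):
    """Return the name components as a dict, or None if the name does not match."""
    match = FILE_NAME_RE.match(name)
    return match.groupdict() if match else None

# Hypothetical file name built from the example values above
print(parse_file_name("ReArm_C1P02_20190131_1_r_k_m.csv"))
```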




The following table shows the different types of records and the corresponding `<record>`:

| `<record>` | directory | Record   |
| :-- | :-----------     | :------  |   
| `r` | `Reaching`       | Reaching task |
| `c` | `Circle`         | Circular Steering task  | 
| `a` | `Armeo`          | Armeo's Ladybug Game|
| `ac` | `Accelerometry` | Wrist accelerometer at home |


The following table shows, for each type of record, the expected file-name endings (listed under `<record-specific-file>`, including the leading underscore, if any, and the extension):

| `<record>` | `<record-specific-file>` | Content |
|--------|--------------------------|-------------|
| `r` | `_k.csv` | Kinect MoCap  |
| `r` | `_k_m.csv` | Kinect markers |
| `r` | `_l_m_mau_np.csv` | l markers for `mau` and `np` |
| `r` | `_l_m_mau_p.csv` | l markers for `mau` and `p` |
| `r` | `_l_m_sau_np.csv` | l markers for `sau` and `np` |
| `r` | `_l_m_sau_p.csv` | l markers for `sau` and `p` |
| `r` | `.easy` | Oxysoft csv export  |
| `r` | `.oxy4` | Oxysoft binary data |
| `r` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `c` | `_k.csv` | Kinect MoCap  |
| `c` | `_k_m.csv` | Kinect markers |
| `c` | `_l_m_np.csv` | l markers for `np` |
| `c` | `_l_m_p.csv` | l markers for `p` |
| `c` | `_l_np.csv` | l mouse mocap for `np` |
| `c` | `_l_p.csv` | l mouse mocap for `p` |
| `c` | `.easy` | Oxysoft csv export  |
| `c` | `.oxy4` | Oxysoft binary data |
| `c` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `a` | `.easy` | Oxysoft csv export  |
| `a` | `.oxy4` | Oxysoft binary data |
| `a` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `ac` | `_p.cwa` | Accelerometer data for `p` |
| `ac` | `_np.cwa` | Accelerometer data for `np` |
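
To make the table concrete, the sketch below builds the full list of expected file names for one participant, date, and visit. The dictionary copies the endings from the table (including `.xdf` for `a`, as listed there). The names `SUFFIXES` and `expected_file_names` are hypothetical, and the argument values are the example values used earlier in this section.

```python
# File-name endings copied from the table above, keyed by record code.
SUFFIXES = {
    "r": ["_k.csv", "_k_m.csv", "_l_m_mau_np.csv", "_l_m_mau_p.csv",
          "_l_m_sau_np.csv", "_l_m_sau_p.csv", ".easy", ".oxy4", ".xdf"],
    "c": ["_k.csv", "_k_m.csv", "_l_m_np.csv", "_l_m_p.csv",
          "_l_np.csv", "_l_p.csv", ".easy", ".oxy4", ".xdf"],
    "a": [".easy", ".oxy4", ".xdf"],
    "ac": ["_p.cwa", "_np.cwa"],
}

def expected_file_names(participant, date, visit, project="ReArm"):
    """Return a dict mapping each record code to its expected file names."""
    prefix = f"{project}_{participant}_{date}_{visit}_"
    return {
        record: [prefix + record + suffix for suffix in suffixes]
        for record, suffixes in SUFFIXES.items()
    }

names = expected_file_names("C1P02", "20190131", "1")
print(names["ac"])
```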



## 1.3. <a id='toc1_3_'></a>[Simple script to check the data files](#toc0_)

The following script checks the data files of one participant and reports every expected file that is missing or that is matched by more than one file.


In [None]:
# this is necessary for relative paths in the code
import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

In [None]:
# beginning of file names for each visit (search pattern for glob)
begOfFileName = "ReArm_C?P??_????????_?_"

# expected end of file names for each visit, by directory
reachingFiles = [
    "_k.csv",
    "_k_m.csv",
    "_l_m_mau_np.csv",
    "_l_m_mau_p.csv",
    "_l_m_sau_np.csv",
    "_l_m_sau_p.csv",   
    ".easy",
    ".oxy4",
    ".xdf",
]
circleFiles = [ 
    "_k.csv",
    "_k_m.csv",
    "_l_m_np.csv",
    "_l_m_p.csv",
    "_l_np.csv",
    "_l_p.csv",   
    ".easy",
    ".oxy4",
    ".xdf",
]
armeoFiles = [
    ".easy",
    ".oxy4",
    # NOTE: the table in section 1.2 also lists ".xdf" for `a`; it is not checked here
]
accelerometerFiles = [
    "_p.cwa",
    "_np.cwa",
]
expectedVisits = {
    "1" : "Visit 1",
    "2" : "Visit 2",
    "3" : "Visit 3",
}

expectedRecords = {
    "r": "Reaching",
    "c": "Circle",
    "a": "Armeo",
    "ac": "Accelerometry",
}

expectedEndOfFileNames = {
    "r": reachingFiles,
    "c": circleFiles,
    "a": armeoFiles,
    "ac": accelerometerFiles,
}

In [None]:
import glob

def checkFilesByVisit(dataDirectory, expectedRecords, expectedEndOfFileNames):
    """Check that every expected file pattern matches exactly one file.

    Returns (goodFileList, badFileList). Each entry of badFileList is a
    [pattern, message] pair. Uses the global `begOfFileName` as prefix.
    """
    goodFileList = []
    badFileList = []
    for recordLetter, recordDirectory in expectedRecords.items():
        for endOfFileName in expectedEndOfFileNames[recordLetter]:
            fnamePattern = begOfFileName + recordLetter + endOfFileName
            fullFnamePattern = os.path.join(dataDirectory, recordDirectory, fnamePattern)
            fullFnamePattern = os.path.normpath(fullFnamePattern)

            foundFiles = glob.glob(fullFnamePattern)
            if len(foundFiles) == 1:
                goodFileList.append(foundFiles[0])
            elif len(foundFiles) == 0:
                badFileList.append([fullFnamePattern, "    File not found"])
            else:
                # more than one file matches the pattern
                message = "    Multiple files found:"
                for problem in foundFiles:
                    message += "\n      " + problem
                badFileList.append([fullFnamePattern, message])

    return goodFileList, badFileList

In [None]:
# use the function to check the files
dataDirectory = "dat/ReArm.lnk/ReArm_C1P02"
goodFileList, badFileList = checkFilesByVisit(dataDirectory, expectedRecords, expectedEndOfFileNames)

# count the expected files over all record types
totalExpectedFiles = sum(
    len(expectedEndOfFileNames[recordLetter]) for recordLetter in expectedRecords
)

print(f"Found {len(goodFileList)} files out of {totalExpectedFiles} expected.")
# for goodFile in goodFileList:
#     print("  " + goodFile)

# a pattern is a problem when it matches no file or more than one file
print(f"Problems with {len(badFileList)} files out of {totalExpectedFiles} expected.")
for problem in badFileList:
    print(f"  {problem[0]}")
    print(f"{problem[1]}")
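
The glob pattern used above leaves the visit number as a `?` wildcard, so if the files of several visits are stored in the same record directory (an assumption, since the layout of the visits is not described here), one pattern can match several files. The sketch below shows one possible variant that fixes the visit number in the pattern. The function name `check_files_for_visit` and its parameters are hypothetical.

```python
import glob
import os

def check_files_for_visit(data_directory, visit, records, suffixes):
    """Return (found, problems) for one visit number, e.g. "1".

    `records` maps record codes to directories and `suffixes` maps record
    codes to expected file-name endings, as in the dictionaries above.
    """
    prefix = f"ReArm_C?P??_????????_{visit}_"
    found, problems = [], []
    for letter, directory in records.items():
        for suffix in suffixes[letter]:
            pattern = os.path.join(data_directory, directory, prefix + letter + suffix)
            matches = glob.glob(pattern)
            if len(matches) == 1:
                found.append(matches[0])
            else:
                # no match or several matches: keep the pattern and what was found
                problems.append((pattern, matches))
    return found, problems

# Usage with a reduced set of records, to keep the example short
for visit in ["1", "2", "3"]:
    found, problems = check_files_for_visit(
        "dat/ReArm.lnk/ReArm_C1P02", visit,
        {"ac": "Accelerometry"}, {"ac": ["_p.cwa", "_np.cwa"]},
    )
    print(f"Visit {visit}: {len(found)} found, {len(problems)} problems")
```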

## 1.4. <a id='toc1_4_'></a>[Script using a class for the expected data by visit](#toc0_)

The following script checks the data files and prints the missing files, using a class for the expected data by visit. 

This improves the readability of the code and makes it easier to add more checks in the future.



In [None]:
import glob
import os

class ExpectedFilesInRearmVisit:
    """ 
    a class to hold the expected file names and sub-directories for a ReArm visit
    """
    def __init__(self):
        self.begOfFileName = "ReArm_C?P??_????????_?_"
        self.reachingFiles = [
            "_k.csv",
            "_k_m.csv",
            "_l_m_mau_np.csv",
            "_l_m_mau_p.csv",
            "_l_m_sau_np.csv",
            "_l_m_sau_p.csv",   
            ".easy",
            ".oxy4",
            ".xdf",
        ]
        self.circleFiles = [ 
            "_k.csv",
            "_k_m.csv",
            "_l_m_np.csv",
            "_l_m_p.csv",
            "_l_np.csv",
            "_l_p.csv",   
            ".easy",
            ".oxy4",
            ".xdf",
        ]
        self.armeoFiles = [
            ".easy",
            ".oxy4",
            # NOTE: the table in section 1.2 also lists ".xdf" for `a`; it is not checked here
        ]
        self.accelerometerFiles = [
            "_p.cwa",
            "_np.cwa",
        ]
        self.visits = {
            "1" : "Visit 1",
            "2" : "Visit 2",
            "3" : "Visit 3",
        }
        self.records = {
            "r": "Reaching",
            "c": "Circle",
            "a": "Armeo",
            "ac": "Accelerometry",
        }
        self.endOfFileNames = {
            "r": self.reachingFiles,
            "c": self.circleFiles,
            "a": self.armeoFiles,
            "ac": self.accelerometerFiles,
        }
        self.__setTotalExpectedFiles()

    def __setTotalExpectedFiles(self):
        # total number of expected files, summed over all record types
        self.totalExpectedFiles = sum(
            len(self.endOfFileNames[recordLetter]) for recordLetter in self.records
        )

def checkFilesByVisit(dataDirectory):
    """Check that every expected file pattern matches exactly one file.

    Returns (goodFileList, badFileList). Each entry of badFileList is a
    [pattern, message] pair.
    """
    expected = ExpectedFilesInRearmVisit()
    goodFileList = []
    badFileList = []
    for recordLetter, recordDirectory in expected.records.items():
        for endOfFileName in expected.endOfFileNames[recordLetter]:
            fnamePattern = expected.begOfFileName + recordLetter + endOfFileName
            fullFnamePattern = os.path.join(dataDirectory, recordDirectory, fnamePattern)
            fullFnamePattern = os.path.normpath(fullFnamePattern)

            foundFiles = glob.glob(fullFnamePattern)
            if len(foundFiles) == 1:
                goodFileList.append(foundFiles[0])
            elif len(foundFiles) == 0:
                badFileList.append([fullFnamePattern, "    File not found"])
            else:
                # more than one file matches the pattern
                message = "    Multiple files found:"
                for problem in foundFiles:
                    message += "\n      " + problem
                badFileList.append([fullFnamePattern, message])

    return goodFileList, badFileList

def printProblems(fileList, dataDirectory):
    """Print the problems listed in fileList (as returned by checkFilesByVisit)."""
    # build the expectations locally instead of relying on a global variable
    expected = ExpectedFilesInRearmVisit()
    print(f"In {dataDirectory}:")
    if len(fileList) == 0:
        print(f"- Found all {expected.totalExpectedFiles} expected files.")
    else:
        print(f"- Problems with {len(fileList)} files out of {expected.totalExpectedFiles} expected.")
        for problem in fileList:
            print(f"  {problem[0]}")
            print(f"{problem[1]}")

###################################################################################################
dataDirectory = "dat/ReArm.lnk/ReArm_C1P02"
goodFileList, badFileList = checkFilesByVisit(dataDirectory)
printProblems(badFileList, dataDirectory)


# 2. <a id='toc2_'></a>[TODO: Make a full OOP version of the script](#toc0_)

This would make it possible to check the data files in a more flexible way, with more checks.

However, it may not be worth the effort, as the current version is already quite flexible and easy to read.
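
For reference, if this TODO is pursued, one possible shape is sketched below. It is only a sketch under assumptions: the class names `RecordSpec` and `VisitChecker` are hypothetical, and the example uses a reduced set of records to stay short.

```python
import glob
import os
from dataclasses import dataclass, field


@dataclass
class RecordSpec:
    """Expected files for one record type (hypothetical helper)."""
    letter: str      # record code used in the file name, e.g. "r"
    directory: str   # sub-directory holding the files, e.g. "Reaching"
    suffixes: list   # expected file-name endings, e.g. ["_k.csv", ".xdf"]

    def patterns(self, data_directory, prefix):
        """Yield one glob pattern per expected file."""
        for suffix in self.suffixes:
            yield os.path.join(data_directory, self.directory,
                               prefix + self.letter + suffix)


@dataclass
class VisitChecker:
    """Collects found files and problems for a list of RecordSpec objects."""
    records: list
    prefix: str = "ReArm_C?P??_????????_?_"
    found: list = field(default_factory=list)
    problems: list = field(default_factory=list)

    @property
    def total_expected(self):
        return sum(len(record.suffixes) for record in self.records)

    def check(self, data_directory):
        """Fill self.found and self.problems for one participant directory."""
        self.found, self.problems = [], []
        for record in self.records:
            for pattern in record.patterns(data_directory, self.prefix):
                matches = glob.glob(pattern)
                if len(matches) == 1:
                    self.found.append(matches[0])
                else:
                    self.problems.append((pattern, matches))
        return self

    def report(self):
        """Return a short, printable summary."""
        lines = [f"Found {len(self.found)} files out of {self.total_expected} expected."]
        for pattern, matches in self.problems:
            reason = "File not found" if not matches else "Multiple files found"
            lines.append(f"  {pattern}: {reason}")
        return "\n".join(lines)


checker = VisitChecker(records=[
    RecordSpec("a", "Armeo", [".easy", ".oxy4"]),
    RecordSpec("ac", "Accelerometry", ["_p.cwa", "_np.cwa"]),
])
print(checker.check("dat/ReArm.lnk/ReArm_C1P02").report())
```

Keeping the expectations (`RecordSpec`) separate from the checking logic (`VisitChecker`) would let further checks be added as methods without touching the file lists.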