# Check data files for the ReArm project

The goal is to check that all the expected data files for the ReArm project are present. 



## Location of the data files

The data files are accessed via `dat/ReArm.lnk`, a symlink to the data.  
The symlink points to the directory where the data files are actually stored.  
The symlink is ignored by Git (see the `.gitignore` file).

The directory structure is as follows:

    dat
    └── ReArm.lnk 
        ├── ReArm_C1P02
        │   ├── Accelerometry
        │   ├── Armeo
        │   ├── Circle
        │   ├── Reaching
        │   └── Scan
        ├── ...
        ...
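
The layout above can be verified before looking at individual files. The following is a minimal sketch, not part of the original script: the helper names `resolve_data_root` and `missing_subdirs` are hypothetical, and the list of sub-directories is copied from the tree shown above.

```python
import os

# Sub-directories shown in the tree above for each participant.
EXPECTED_SUBDIRS = ["Accelerometry", "Armeo", "Circle", "Reaching", "Scan"]


def resolve_data_root(link_path):
    """Return the real directory behind the symlink (hypothetical helper)."""
    real_path = os.path.realpath(link_path)
    if not os.path.isdir(real_path):
        raise NotADirectoryError(f"{link_path} does not resolve to a directory")
    return real_path


def missing_subdirs(participant_dir):
    """Return the expected sub-directories that are absent from participant_dir."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(participant_dir, name))
    ]
```

For example, `missing_subdirs(os.path.join(resolve_data_root("dat/ReArm.lnk"), "ReArm_C1P02"))` would return an empty list when the participant directory is complete.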



## Structure of the name of the data files

The data files are named according to the following structure:

```
<project>_<participant>_<date>_<visit>_<record>_<record-specific-file>.<extension>
```

where:

- `<project>` is the name of the project (`ReArm`)
- `<participant>` is the participant ID (e.g. `C1P02`: `C1` for the center and `P02` for the participant)
- `<date>` is the date of the recording (e.g. `20190131`, formatted as YYYYMMDD)
- `<visit>` is the visit number (`1`, `2`, `3`)
- `<record>` is the code of the record (`r`, `c`, `a`, `ac`)
- `<record-specific-file>` is a string that depends on the type of record
- `<extension>` depends on the type of record (`csv`, `easy`, `oxy3`, `oxy4`, `xdf`, `cwa`)
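
The naming structure can be expressed as a regular expression. This is only a sketch under a few assumptions: the participant ID is taken to be one digit for the center and two digits for the participant (as in `C1P02`), the end of the name (`<record-specific-file>` plus `<extension>`) is captured as a single `suffix` group, and the file names used in the example are made up from the values quoted above.

```python
import re

# One named group per field of the file name.
# "ac" is listed before "a" so that accelerometry files are not read as Armeo files.
FILENAME_RE = re.compile(
    r"^(?P<project>ReArm)"
    r"_(?P<participant>C\dP\d{2})"
    r"_(?P<date>\d{8})"
    r"_(?P<visit>[123])"
    r"_(?P<record>ac|r|c|a)"
    r"(?P<suffix>[_.].+)$"
)


def parse_filename(filename):
    """Return the fields of a ReArm file name as a dict, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None


# Hypothetical file name built from the example values above.
fields = parse_filename("ReArm_C1P02_20190131_1_r_l_m_mau_np.csv")
# fields["record"] is "r" and fields["suffix"] is "_l_m_mau_np.csv"
```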




The following table shows the different types of records and the corresponding `<record>`:

| `<record>` | Directory | Description |
| :-- | :-- | :-- |
| `r` | `Reaching` | Reaching task |
| `c` | `Circle` | Circular Steering task |
| `a` | `Armeo` | Armeo's Ladybug Game |
| `ac` | `Accelerometry` | Wrist accelerometer at home |


The following table lists, for each `<record>`, the expected endings of the file names (`<record-specific-file>` followed by `.<extension>`):

| `<record>` | End of file name | Content |
|--------|--------------------------|-------------|
| `r` | `_k.csv` | Kinect MoCap  |
| `r` | `_k_m.csv` | Kinect markers |
| `r` | `_l_m_mau_np.csv` | l markers for `mau` and `np` |
| `r` | `_l_m_mau_p.csv` | l markers for `mau` and `p` |
| `r` | `_l_m_sau_np.csv` | l markers for `sau` and `np` |
| `r` | `_l_m_sau_p.csv` | l markers for `sau` and `p` |
| `r` | `.easy` | Oxysoft csv export  |
| `r` | `.oxy4` | Oxysoft binary data |
| `r` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `c` | `_k.csv` | Kinect MoCap  |
| `c` | `_k_m.csv` | Kinect markers |
| `c` | `_l_m_np.csv` | l markers for `np` |
| `c` | `_l_m_p.csv` | l markers for `p` |
| `c` | `_l_np.csv` | l mouse mocap for `np` |
| `c` | `_l_p.csv` | l mouse mocap for `p` |
| `c` | `.easy` | Oxysoft csv export  |
| `c` | `.oxy4` | Oxysoft binary data |
| `c` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `a` | `.easy` | Oxysoft csv export  |
| `a` | `.oxy4` | Oxysoft binary data |
| `a` | `.xdf` | XDF file (it contains all the previous information) |
| | | |
| `ac` | `_p.cwa` | Accelerometer data for `p` |
| `ac` | `_np.cwa` | Accelerometer data for `np` |

The following table shows how to interpret the items that make up a `<record-specific-file>`:

| Item | Meaning |
|--------| -------------|
| `mau` |  maximal arm use   |
| `sau` |  spontaneous arm use   |
| `np` |  non-paretic arm   |
| `p` |  paretic arm   |


# Script to check the data files

The script checks for the presence of the data files and prints the missing ones.  
It takes a fully object-oriented approach to checking that all expected files are present in the directory.

I use two classes:

- `ExpectedInRearmVisit`: stores the expected file names and sub-directories for a ReArm visit
- `CheckFilesInRearmVisit`: checks whether the files are present in the directory
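
The core idea of the check can be sketched independently of the classes: a glob pattern is built for each expected file, and exactly one match is expected. The helper `check_pattern` below is hypothetical and only illustrates the principle.

```python
import glob


def check_pattern(pattern):
    """Classify a glob pattern by the number of files it matches.

    Returns a (status, files) tuple where status is "ok" for exactly one
    match, "missing" for none, and "multiple" for more than one.
    """
    found = sorted(glob.glob(pattern))
    if len(found) == 1:
        return "ok", found
    if not found:
        return "missing", found
    return "multiple", found
```

In a glob pattern each `?` matches exactly one character, so a pattern such as `ReArm_C?P??_????????_?_r_k.csv` matches any participant, date and visit. A directory that contains the files of several visits would therefore be reported as `"multiple"`.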


In [None]:
import os
import glob


class ExpectedInRearmVisit:
    """
    Holds the expected file names and sub-directories for a ReArm visit.
    """

    def __init__(self):
        self.begOfFileName = "ReArm_C?P??_????????_?_"
        self.reachingFiles = [
            "_k.csv",
            "_k_m.csv",
            "_l_m_mau_np.csv",
            "_l_m_mau_p.csv",
            "_l_m_sau_np.csv",
            "_l_m_sau_p.csv",
            ".easy",
            ".oxy4",
            ".xdf",
        ]
        self.circleFiles = [
            "_k.csv",
            "_k_m.csv",
            "_l_m_np.csv",
            "_l_m_p.csv",
            "_l_np.csv",
            "_l_p.csv",
            ".easy",
            ".oxy4",
            ".xdf",
        ]
        # Note: the table above also lists ".xdf" for Armeo records;
        # it is not checked here.
        self.armeoFiles = [
            ".easy",
            ".oxy4",
        ]
        self.accelerometerFiles = [
            "_p.cwa",
            "_np.cwa",
        ]
        self.visits = {
            "1": "Visit 1",
            "2": "Visit 2",
            "3": "Visit 3",
        }
        self.records = {
            "r": "Reaching",
            "c": "Circle",
            "a": "Armeo",
            "ac": "Accelerometry",
        }
        self.endOfFileNames = {
            "r": self.reachingFiles,
            "c": self.circleFiles,
            "a": self.armeoFiles,
            "ac": self.accelerometerFiles,
        }
        self.__setTotalExpectedFiles()

    def __setTotalExpectedFiles(self):
        # total number of expected files, summed over all record types
        self.totalExpectedFiles = 0
        for recordLetter in self.records:
            self.totalExpectedFiles += len(self.endOfFileNames[recordLetter])


class CheckFilesInRearmVisit:
    """
    a class to check that the expected files are present in a ReArm visit
    """

    def __init__(self, dataDirectory):
        self.dataDirectory = self.__checkDataDirectory(dataDirectory)
        self.goodFileList = []
        self.badFileList = []
        self.expected = ExpectedInRearmVisit()

    def __checkDataDirectory(self, dataDirectory):
        # we expect a full path, but the user may have given a relative path
        if not os.path.isdir(dataDirectory):
            # not found from the current directory:
            # try the same path relative to the parent of the current directory
            candidate = os.path.join(os.getcwd(), "..", dataDirectory)
            if not os.path.isdir(candidate):
                # report the path as the user gave it
                raise NotADirectoryError(f"{dataDirectory} is not a directory")
            dataDirectory = candidate
        # ensure we have a normalized, absolute path
        return os.path.abspath(dataDirectory)

    def checkFilesByVisit(self):
        # start from empty lists so that a second call does not duplicate entries
        self.goodFileList = []
        self.badFileList = []
        expected = self.expected
        for recordLetter, recordDirectory in expected.records.items():
            for endOfFileName in expected.endOfFileNames[recordLetter]:
                # glob pattern: each "?" matches exactly one character
                fnamePattern = expected.begOfFileName + recordLetter + endOfFileName
                fullFnamePattern = os.path.join(
                    self.dataDirectory, recordDirectory, fnamePattern
                )
                fullFnamePattern = os.path.normpath(fullFnamePattern)

                foundFiles = glob.glob(fullFnamePattern)
                if len(foundFiles) == 1:
                    self.goodFileList.append(foundFiles[0])
                else:
                    # zero or several matches: record the pattern and the reason
                    if len(foundFiles) == 0:
                        message = "    File not found"
                    else:
                        message = "    Multiple files found:"
                        for problem in foundFiles:
                            message += "\n      " + problem
                    self.badFileList.append([fullFnamePattern, message])

    def printProblems(self):
        nExpectedFiles = self.expected.totalExpectedFiles
        nBadFiles = len(self.badFileList)
        print(f"In {self.dataDirectory}:")
        if nBadFiles == 0:
            print(f"- Found {nExpectedFiles} expected files.")
        else:
            # a problem is either a missing file or several matching files
            print(f"- Problems with {nBadFiles} files out of {nExpectedFiles} expected.")
            for problem in self.badFileList:
                print(f"  {problem[0]}")
                print(f"{problem[1]}")

    def printGoodFiles(self):
        print("Good files:")
        for goodFile in self.goodFileList:
            print(f"  {goodFile}")

    def saveGoodFiles(self, saveFile):
        saveFile = os.path.join(self.dataDirectory, saveFile)
        print(f"Saving good files to {saveFile}")
        with open(saveFile, "w") as f:
            for goodFile in self.goodFileList:
                f.write(f"{goodFile}\n")

In [None]:
dataDirectory = "dat/ReArm.lnk/ReArm_C1P02"
C1P02_V1 = CheckFilesInRearmVisit(dataDirectory)
C1P02_V1.checkFilesByVisit()
C1P02_V1.printProblems()
C1P02_V1.printGoodFiles()
C1P02_V1.saveGoodFiles("goodFiles.txt")
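
To run the same check for every participant, the participant directories could first be collected from the data root. The sketch below is an assumption, not part of the original script: `find_participant_dirs` is a hypothetical helper, and the pattern `ReArm_C?P??` is inferred from the example directory `ReArm_C1P02`.

```python
import glob
import os


def find_participant_dirs(data_root):
    """Return the sorted participant directories found directly under data_root."""
    pattern = os.path.join(data_root, "ReArm_C?P??")
    # keep directories only; a stray file with a matching name is ignored
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```

Each returned path could then be passed to `CheckFilesInRearmVisit`, followed by `checkFilesByVisit()` and `printProblems()` as in the cell above.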