Skip to content

Commit

Permalink
Big cleanup after clang-tidy run and code analysis.
Browse files Browse the repository at this point in the history
There are some TODOs left which are marked with // TODO

Reenabled compilation flags for debug code.

Changed a lot of uint64 values to signed type. The signed type still
offers us the possibility to work with tremendously huge files and it
is easier in some operations to have a signed value like -1 for a file
size query (not existing, stream, whatever).

Removed unnecessary imports throughout the project.

Changed a lot of code so, that equal operations are correct. See to it:
- that compared types are both signed or unsigned
- compared types have the right sizes 32Bit < 64Bit and 64Bit<64Bit

Changed method parameter types to const ref where feasible.

Changed auto loop parameters to const ref where necessary.

Refactored Extractor::extract() and pulled out some methods to make it
more readable.

Renamed checkPremises throughout the whole project to fulfillsPremises.

Changed "old-style" casts to e.g. static_cast / reinterpret_cast over
all files.

Added more tests for the index entry storage strategies.
  • Loading branch information
heinold committed Aug 8, 2019
1 parent bcfcc57 commit 4f21d94
Show file tree
Hide file tree
Showing 84 changed files with 767 additions and 650 deletions.
11 changes: 11 additions & 0 deletions .idea/inspectionProfiles/Project_Default.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ cmake_minimum_required(VERSION 3.13)

set(CMAKE_CXX_STANDARD 17)
# -pedantic -ansi -Winit-self -Wmissing-declarations -Wextra -Winit-self -Wold-style-cast -Woverloaded-virtual -Wuninitialized
SET(CMAKE_CXX_FLAGS_DEBUG "-O0 -g -fPIC -Wreturn-type")
SET(CMAKE_C_FLAGS_DEBUG "-O0 -g -fPIC -Wreturn-type")
SET(CMAKE_C_FLAGS_DEBUG "-O0 -g -fPIC")
SET(CMAKE_CXX_FLAGS_DEBUG "-O0 -g -fPIC")

set(CMAKE_EXE_LINKER_FLAGS " -static-libstdc++ -static-libgcc")
#add_definitions(" -DCURL_STATICLIB")
Expand Down
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,23 +35,23 @@ The version 1 header is exactly 512 Byte wide and can be described like:

``` bash
| (IndexHeader) |
| (u_int32_t) | (u_int32_t) | (u_int32_t) | (u_int32_t) | (u_int64_t) | (u_int64_t) | (bool) | (u_char) | (u_int64_t)[59] |
| (u_int32_t) | (u_int32_t) | (u_int32_t) | (u_int32_t) | (int64_t) | (int64_t) | (bool) | (u_char) | (int64_t)[59] |
| indexWriterVersion | sizeOfIndexEntry | magicNumber | blockInterval | numberOfEntries | linesInIndexedFile | dictionariesAreCompressed | placeholder | reserved |
```

The version 1 index entry has an extracted width of 32800 Byte index entry can be described like:

```bash
| IndexEntry |
| (u_int64_t) | (u_int64_t) | (u_int64_t) | (u_int32_t) | (u_int32_t) | (u_char) | (u_int16_t) | (u_char)[32768] |
| (int64_t) | (int64_t) | (int64_t) | (u_int32_t) | (u_int32_t) | (u_char) | (u_int16_t) | (u_char)[32768] |
| blockID | blockOffsetInRawFile | startingLineInEntry | offsetOfFirstValidLine | bits | reserved | compressedDictionarySize | dictionary |
```

If compression is enabled, this looks a bit different (note the last field differs then and depends on compressedDictionarySize!), when it is stored in the FQI file:

```bash
| IndexEntry |
| (u_int64_t) | (u_int64_t) | (u_int64_t) | (u_int32_t) | (u_int32_t) | (u_char) | (u_int16_t) | (u_char)[compressedDictionarySize] |
| (int64_t) | (int64_t) | (int64_t) | (u_int32_t) | (u_int32_t) | (u_char) | (u_int16_t) | (u_char)[compressedDictionarySize] |
| blockID | blockOffsetInRawFile | startingLineInEntry | offsetOfFirstValidLine | bits | reserved | compressedDictionarySize | dictionary |
```

Expand Down
15 changes: 13 additions & 2 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,14 +14,23 @@ include_directories(
${CMAKE_SOURCE_DIR}/src
)

# Glob together files in src and add compilation flags for debug. This is tod isable additional compiler checks for
# external headers and sources. We can't change them so why check them.
# We still will add all files manually later. Why? Because cmake won't necessarily find additional files and folders
# automatically!
#file(GLOB_RECURSE ALL_SOURCES ${PROJECT_SOURCE_DIR} *.cpp *.h)
#message(${ALL_SOURCES})
#set_source_files_properties(${ALL_SOURCES} PROPERTIES COMPILE_FLAGS "-std=c++11 -Wreturn-type -pedantic -ansi -Winit-self -Wmissing-declarations -Wextra -Winit-self -Wold-style-cast -Woverloaded-virtual -Wuninitialized")
#set_source_files_properties(zlib.h PROPERTIES COMPILE_FLAGS -Wnoold-style-cast++)

add_library(
fastqindexlib
common/CommonStructsAndConstants.cpp common/CommonStructsAndConstants.h
common/ErrorAccumulator.cpp common/ErrorAccumulator.h
common/ErrorMessages.cpp common/ErrorMessages.h
common/IndexHeader.cpp common/IndexHeader.h
common/ErrorMessages.cpp common/ErrorMessages.h
common/IndexEntry.cpp common/IndexEntry.h
common/IndexEntryV1.cpp common/IndexEntryV1.h
common/IndexEntryV1.h
common/IOHelper.cpp common/IOHelper.h
common/StringHelper.cpp common/StringHelper.h
process/base/ZLibBasedFASTQProcessorBaseClass.cpp process/base/ZLibBasedFASTQProcessorBaseClass.h
Expand Down Expand Up @@ -73,3 +82,5 @@ target_link_libraries(
fastqindex
fastqindexlib
)

set_target_properties(fastqindex PROPERTIES COMPILE_FLAGS "-Wreturn-type -pedantic -ansi -Winit-self -Wextra -Winit-self -Wold-style-cast -Woverloaded-virtual -Wuninitialized")
10 changes: 5 additions & 5 deletions src/common/CommonStructsAndConstants.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,15 +14,15 @@ const u_char MAGIC_NUMBER_RAW[4] = {1, 2, 3, 4};
* Magic number to identify FastqIndEx index files.
* Basically (((uint) 4) << 24) + (((uint) 3) << 16) + (((uint) 2) << 8) + 1;
*/
const uint MAGIC_NUMBER = *((uint *) MAGIC_NUMBER_RAW);
const uint MAGIC_NUMBER = *(reinterpret_cast<uint *>(const_cast<u_char *>(MAGIC_NUMBER_RAW)));

const u_int64_t kB = 1024;
const int64_t kB = 1024;

const u_int64_t MB = kB * 1024;
const int64_t MB = kB * 1024;

const u_int64_t GB = MB * 1024;
const int64_t GB = MB * 1024;

const u_int64_t TB = GB * 1024;
const int64_t TB = GB * 1024;

const int DEFAULT_RECORD_SIZE = 4;

16 changes: 6 additions & 10 deletions src/common/CommonStructsAndConstants.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,22 +12,18 @@
#include <string>
#include <zconf.h>

using std::shared_ptr;

struct IndexEntryV1;

/**
* Used to identify a file as a file created by this binary.
*/
extern const uint MAGIC_NUMBER;

extern const u_int64_t kB;
extern const int64_t kB;

extern const u_int64_t MB;
extern const int64_t MB;

extern const u_int64_t GB;
extern const int64_t GB;

extern const u_int64_t TB;
extern const int64_t TB;

extern const int DEFAULT_RECORD_SIZE;

Expand All @@ -36,12 +32,12 @@ extern const int DEFAULT_RECORD_SIZE;
* I'd prefer to set the following value as constant ints, but C++ then refuses to initialize arrays with the nice {0}
* syntax. Though it works, when the constants are in a class.
*/
#define WINDOW_SIZE 32768
#define WINDOW_SIZE 32768U

/**
* Chunk size for raw data from compressed file
*/
#define CHUNK_SIZE 16384
#define CHUNK_SIZE 16384U

/**
* As we mix strings, stringstreams and cstrings, a buffer with this size is used to ensure, that the decompressed data
Expand Down
3 changes: 1 addition & 2 deletions src/common/ErrorAccumulator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@
*/

#include "ErrorAccumulator.h"
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
Expand Down Expand Up @@ -46,7 +45,7 @@ void ErrorAccumulator::severe(const string &msg) {

vector<string> ErrorAccumulator::getErrorMessages() { return errorMessages; }

const void ErrorAccumulator::addErrorMessage(_cstr s0, _cstr s1, _cstr s2, _cstr s3, _cstr s4, _cstr s5) {
void ErrorAccumulator::addErrorMessage(_cstr s0, _cstr s1, _cstr s2, _cstr s3, _cstr s4, _cstr s5) {
ErrorAccumulator::debug(s0, s1, s2, s3, s4, s5);
errorMessages.emplace_back(join(s0, s1, s2, s3, s4, s5));
}
Expand Down
2 changes: 1 addition & 1 deletion src/common/ErrorAccumulator.h
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ class ErrorAccumulator {
* This would actually be a perfect example for a variadic function but handling variadic functions is tricky as
* the 'va_...()' macros don't know about the number of passed arguments.
*/
const void addErrorMessage(_cstr s0, _cstr s1 = "", _cstr s2 = "", _cstr s3 = "", _cstr s4 = "", _cstr s5 = "");
void addErrorMessage(_cstr s0, _cstr s1 = "", _cstr s2 = "", _cstr s3 = "", _cstr s4 = "", _cstr s5 = "");

static string join(_cstr s0, _cstr s1 = "", _cstr s2 = "", _cstr s3 = "", _cstr s4 = "", _cstr s5 = "");

Expand Down
1 change: 0 additions & 1 deletion src/common/ErrorMessages.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
* Distributed under the MIT License (license terms are at https://github.com/dkfz-odcf/FastqIndEx/blob/master/LICENSE.txt).
*/

#include "ErrorMessages.h"

const char *ERR_MESSAGE_FASTQ_INVALID = "The fastq file does not exist or cannot be read.";
const char *ERR_MESSAGE_INDEX_INVALID = "The index file already exists and is either not a file or cannot be written.";
8 changes: 4 additions & 4 deletions src/common/IOHelper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ path IOHelper::getUserHomeDirectory() {
// The getuid() function shall always be successful and no return value is reserved to indicate the error.
// Therefore, we have a valid uid and getpwuid should not fail as well. No further checks applied here.
struct passwd *pw = getpwuid(getuid());
if(pw == nullptr) {
if (pw == nullptr) {
auto s = stringstream();
s << "FastqIndEx could not retrieve the user directory of the current user.";
report(s, nullptr);
Expand Down Expand Up @@ -97,14 +97,14 @@ path IOHelper::fullPath(const path &file) {
return path(string(buf));
}

shared_ptr<map<string, string>> IOHelper::loadIniFile(path file, string section) {
shared_ptr<map<string, string>> IOHelper::loadIniFile(const path &file, const string& section) {
auto resultMap = make_shared<map<string, string>>();
CSimpleIniA configuration;
configuration.SetUnicode(true);
auto result = configuration.LoadFile(file.string().c_str());
configuration.LoadFile(file.string().c_str());
CSimpleIniA::TNamesDepend keys;
configuration.GetAllKeys(section.c_str(), keys);
for (auto key : keys) {
for (const auto& key : keys) {
auto val = configuration.GetValue(section.c_str(), key.pItem);
(*resultMap)[string(key.pItem)] = string(val);
}
Expand Down
2 changes: 1 addition & 1 deletion src/common/IOHelper.h
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ class IOHelper {
*/
static path fullPath(const path &file);

static shared_ptr<map<string, string>> loadIniFile(path file, string section);
static shared_ptr<map<string, string>> loadIniFile(const path &file, const string &section);
};

#endif //FASTQINDEX_IOHELPER_H
33 changes: 0 additions & 33 deletions src/common/IndexEntryV1.cpp

This file was deleted.

31 changes: 29 additions & 2 deletions src/common/IndexEntryV1.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
#include "CommonStructsAndConstants.h"
#include "IndexEntry.h"

using namespace std;

/**
* Represents an entry in a gz index file
* The order of the entries is set so, that we have a reduced padding!
Expand Down Expand Up @@ -67,6 +69,14 @@ struct IndexEntryV1 : public VirtualIndexEntry {
*/
Bytef dictionary[WINDOW_SIZE]{0};

static shared_ptr<IndexEntryV1> from(unsigned char bits,
u_int64_t blockID,
u_int32_t offsetOfFirstValidLine,
u_int64_t offsetInRawFile,
u_int64_t startingLineInEntry) {
return make_shared<IndexEntryV1>(bits, blockID, offsetOfFirstValidLine, offsetInRawFile, startingLineInEntry);
}

IndexEntryV1(unsigned char bits,
u_int64_t blockID,
u_int32_t offsetOfFirstValidLine,
Expand All @@ -80,13 +90,30 @@ struct IndexEntryV1 : public VirtualIndexEntry {

IndexEntryV1() = default;

bool operator==(const IndexEntryV1 &rhs) const;
bool operator==(const IndexEntryV1 &rhs) const {
return bits == rhs.bits &&
blockID == rhs.blockID &&
offsetToNextLineStart == rhs.offsetToNextLineStart &&
blockOffsetInRawFile == rhs.blockOffsetInRawFile &&
startingLineInEntry == rhs.startingLineInEntry;
}

bool operator!=(const IndexEntryV1 &rhs) const {
return !(rhs == *this);
}

shared_ptr<IndexEntry> toIndexEntry();
shared_ptr<IndexEntry> toIndexEntry() {
auto indexLine = std::make_shared<IndexEntry>(
this->blockID,
this->bits,
this->offsetToNextLineStart,
this->blockOffsetInRawFile,
this->startingLineInEntry
);
memcpy(indexLine->window, this->dictionary, sizeof(this->dictionary));
indexLine->compressedDictionarySize = this->compressedDictionarySize;
return indexLine;
}
};

#endif //FASTQINDEX_INDEXENTRYV1_H
6 changes: 3 additions & 3 deletions src/common/IndexHeader.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,13 +43,13 @@ struct IndexHeader {
* This value is not available when a header is created in write mode. It will be written at the end of indexing
* process. However, the indexer and the extractor will work without it.
*/
u_int64_t numberOfEntries{0};
int64_t numberOfEntries{0};

/**
* Stores the amount of lines in the indexed file. This value is also written after the indexing is done.
* However, the indexer and the extractor will work without it.
*/
u_int64_t linesInIndexedFile{0};
int64_t linesInIndexedFile{0};

/**
* Tell the IndexReader, if the entries are compressed. (1-byte padded to 8 bytes!) The value looks weird without
Expand All @@ -62,7 +62,7 @@ struct IndexHeader {
* Reserved space for information which might be added in
* the future.
*/
u_int64_t reserved[59]{0};
int64_t reserved[59]{0};

explicit IndexHeader(u_int32_t binaryVersion, u_int32_t sizeOfIndexEntry, u_int32_t blockInterval, bool dictionariesAreCompressed) {
this->indexWriterVersion = binaryVersion;
Expand Down
19 changes: 8 additions & 11 deletions src/process/base/ZLibBasedFASTQProcessorBaseClass.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,9 @@
*/

#include "process/io/PathSource.h"
#include "process/extract/Extractor.h"
#include "ZLibBasedFASTQProcessorBaseClass.h"
#include <cstdio>
#include <experimental/filesystem>
#include <iostream>
#include <unistd.h>
#include <zlib.h>

ZLibBasedFASTQProcessorBaseClass::ZLibBasedFASTQProcessorBaseClass(
Expand Down Expand Up @@ -54,16 +51,16 @@ bool ZLibBasedFASTQProcessorBaseClass::initializeZStream(int mode) {

bool ZLibBasedFASTQProcessorBaseClass::readCompressedDataFromSource() {
/* get some compressed data from input file */
int result = this->fastqFile->read(input, CHUNK_SIZE);
int64_t result = this->fastqFile->read(input, CHUNK_SIZE);

if (result == -1) {
this->addErrorMessage("There was an error while trying to read the FASTQ file '", fastqFile->toString(), "'.");
return false;
} else
zStream.avail_in = (uInt) result;
zStream.avail_in = static_cast<uInt>(result);

if (zStream.avail_in == 0) {
this->addErrorMessage("There was no data available in the compressed stream");
this->addErrorMessage("There was no data available in the compressed stream.");
return false;
}
zStream.next_in = input;
Expand Down Expand Up @@ -114,13 +111,13 @@ void ZLibBasedFASTQProcessorBaseClass::resetSlidingWindowIfNecessary() {
bool ZLibBasedFASTQProcessorBaseClass::decompressNextChunkOfData(bool checkForStreamEnd, int flushMode) {
// Inflate until out of input, output, or at end of block
// Update the total input and output counters
u_int64_t availableInBeforeInflate = zStream.avail_in;
u_int64_t availableOutBeforeInflate = zStream.avail_out;
int64_t availableInBeforeInflate = zStream.avail_in;
int64_t availableOutBeforeInflate = zStream.avail_out;
u_int32_t windowPositionBeforeInflate = WINDOW_SIZE - zStream.avail_out;

zlibResult = inflate(&zStream, flushMode);
u_int64_t readBytes = availableInBeforeInflate - zStream.avail_in;
u_int64_t writtenBytes = availableOutBeforeInflate - zStream.avail_out;
int64_t readBytes = availableInBeforeInflate - zStream.avail_in;
int64_t writtenBytes = availableOutBeforeInflate - zStream.avail_out;

totalBytesIn += readBytes;
totalBytesOut += writtenBytes;
Expand All @@ -129,7 +126,7 @@ bool ZLibBasedFASTQProcessorBaseClass::decompressNextChunkOfData(bool checkForSt
// as we work with a string append method, we need to copy the read data to a fresh buffer first.
Bytef cleansedWindowForCout[CLEAN_WINDOW_SIZE]{
0}; // +1 more Byte than WINDOW_SIZE for a definitely 0-terminated c-string!
std::memcpy(cleansedWindowForCout, window + windowPositionBeforeInflate, writtenBytes);
std::memcpy(cleansedWindowForCout, window + windowPositionBeforeInflate, writtenBytes);// TODO

if (zlibResult == Z_NEED_DICT) {
zlibResult = Z_DATA_ERROR;
Expand Down

0 comments on commit 4f21d94

Please sign in to comment.