Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TsFile #736

Open
wants to merge 14 commits into
base: master
from
Open

Refactor TsFile #736

wants to merge 14 commits into from

Conversation

@HTHou
Copy link
Contributor

HTHou commented Jan 14, 2020

HTHou added 9 commits Jan 6, 2020
@LeiRui

This comment has been minimized.

Copy link
Contributor

LeiRui commented Jan 14, 2020

The travis tests fail

@HTHou

This comment has been minimized.

Copy link
Contributor Author

HTHou commented Jan 14, 2020

The travis tests fail

Yes, there are still lots of works to complete. So it's only a draft.

return new SnappyCompressor();
default:
throw new CompressionTypeNotSupportedException(name.toString());
case UNCOMPRESSED:

This comment has been minimized.

Copy link
@Ring-k

Ring-k Jan 14, 2020

Contributor

Please check the indention.

*/
TSDataType getDataType(String measurement) throws NoMeasurementException;
TSDataType getDataType(Path path) throws NoMeasurementException, IOException;

This comment has been minimized.

Copy link
@Ring-k

Ring-k Jan 14, 2020

Contributor

The comment should be updated too.

HTHou and others added 5 commits Jan 22, 2020
* add time and timestamp keyword.
…to new_tsFile_v2
@HTHou HTHou marked this pull request as ready for review Feb 10, 2020
@HTHou

This comment has been minimized.

Copy link
Contributor Author

HTHou commented Feb 11, 2020

The new structure of TsFileMetadata is shown below.
Screen Shot 2020-02-11 at 9 31 17 AM

More detailed doc to be added soon.

@liutaohua

This comment has been minimized.

Copy link
Contributor

liutaohua commented Feb 11, 2020

I think TsOffsetArray is a bad design, because it is hard to understanding it, and each DeviceIndexArray takes 8 byte to store the index.
If you put the contents of TsOffsetArray directly into the DeviceIndexArray, it would only take 4 byte to record the number, and it would be easier to understand

protected List<List<ChunkMetaData>> chunkGroupMetaDataList = new ArrayList<>();
protected List<String> deviceList = new ArrayList<>();
protected List<ChunkMetaData> chunkMetaDataList = new ArrayList<>();
private static Map<Path, List<ChunkMetaData>> timeseriesMetadataMap = new TreeMap<>();

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Why add static? I don't think all TsFileIOWriter instances should share this variable, which will lead to serious bugs. Please remove it.

private ChunkMetaData currentChunkMetaData;
private long markedPosition;
private static Map<String, int[]> deviceOffsetsMap = new HashMap<>();

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please remove static. The reason is the same as above.

for (i = 0; i < tsMetadataList.size(); i++) {
tsOffsets[i] = out.getPosition();
int size = tsMetadataList.get(i).serializeTo(out.wrapAsStream());
tsOffsets[i+1] = tsOffsets[i] + size;
}
Comment on lines +301 to +305

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please remove useless assignment and addition operations.

if(i > 0){
      tsOffsets[0] = out.getPosition();
}
for (i = 0; i < tsMetadataList.size(); i++) {
      int size = tsMetadataList.get(i).serializeTo(out.wrapAsStream());
      tsOffsets[i+1] = out.getPosition();
 }
*
* @param chunkGroupMetaDataList all chunk group metadata in memory
* @return TsDeviceMetadataIndex in TsFileMetaData
* @return tsOffsets in TsFileMetaData

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please add the java doc.

if (!deviceOffsetsMap.containsKey(deviceId)) {
deviceOffsetsMap.put(deviceId, new int[2]);
deviceOffsetsMap.get(deviceId)[0] = i;
}
Comment on lines +282 to +285

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please use the following codes instead to reduce the operation and improve the efficiency.

deviceOffsetsMap.computeIfAbsent(deviceId, k -> new int[2])[0] = i;
List<ChunkMetaData> chunkMetaDataListOfOnePath = timeseriesMetadataMap.getOrDefault(path,
new ArrayList<ChunkMetaData>());
chunkMetaDataListOfOnePath.add(currentChunkMetaData);
timeseriesMetadataMap.put(path, chunkMetaDataListOfOnePath);
Comment on lines +209 to +212

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please use the following codes instead to reduce the operation and improve the efficiency.

timeseriesMetadataMap.computeIfAbsent(path, k -> new ArrayList<>()).add(currentChunkMetaData);
logger.debug("start chunk group:{}, file position {}", deviceId, out.getPosition());
currentChunkGroupMetaData = new ChunkGroupMetaData(deviceId, new ArrayList<>(),
out.getPosition());
chunkMetaDataList = new ArrayList<>();
}

/**
* end chunk and write some log.
* If there is no data in the chunk group, nothing will be flushed.
*/
public void endChunkGroup(long version) throws IOException {

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

If version is useless, please remove it.

@@ -31,7 +31,7 @@
import org.apache.iotdb.tsfile.utils.Binary;
import org.apache.iotdb.tsfile.write.record.RowBatch;
import org.apache.iotdb.tsfile.write.record.datapoint.DataPoint;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
import org.apache.iotdb.tsfile.write.schema.TimeseriesSchema;

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please remove useless import.

@@ -25,7 +25,7 @@
import org.apache.iotdb.tsfile.file.footer.ChunkGroupFooter;
import org.apache.iotdb.tsfile.write.record.RowBatch;
import org.apache.iotdb.tsfile.write.record.datapoint.DataPoint;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
import org.apache.iotdb.tsfile.write.schema.TimeseriesSchema;

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

Please remove useless import.

public TimeseriesMetaData(String measurementId, List<ChunkMetaData> chunkMetaDataList) {
this.measurementId = measurementId;
this.chunkMetaDataList = chunkMetaDataList;
this.numOfChunkMetaDatas = chunkMetaDataList.size();
}
Comment on lines +42 to +46

This comment has been minimized.

Copy link
@fanhualta

fanhualta Feb 11, 2020

Contributor

If the constructor is useless, please remove it.

Copy link
Member

qiaojialin left a comment

please format the code first

private int numOfChunkMetaDatas;

private String measurementId;
private List<ChunkMetaData> chunkMetaDataList = new ArrayList<>();

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

No need to store this in TimeseriesMetaData. I expect TimeseriesMetadata be used in this way:

List TsFileSequenceReader.getAllChunkMetadata(TimeseriesMetaDataIndex metadataIndex)


import org.apache.iotdb.tsfile.utils.ReadWriteIOUtils;

public class TimeseriesMetaData {

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member
Suggested change
public class TimeseriesMetaData {
public class TimeseriesMetadataIndex {

Actually this is an index to ChunkMetadatas of one series. Similar to the TsDeviceMetadataIndex


private long startOffsetOfChunkMetaDataList;
private int chunkMetaDataListDataSize;
private int numOfChunkMetaDatas;

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

Do we need this field?

We could read ChunkMetadata from a ByteBuffer like this:

List metadatas = new ArrayList();
for (bytebuffer.remaining > 0) {
metadatas.add(bytebuffer.readChunkMetadata());
}

return 5;
default:
return -1;
case BOOLEAN:

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

Do not change this, our google-style contains the blanks.

return PLA;
default:
return UNCOMPRESSED;
case 0:

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

change back

return IRREGULAR_FREQ;
default:
return IRREGULAR_FREQ;
case 0:

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

change back

return new FloatStatistics();
default:
throw new UnknownColumnTypeException(type.toString());
case INT32:

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

change back

logger.error(
"Failed to get HDFSInput in Hadoop file system. Please check your dependency of Hadoop module.",
e);
logger.error("Failed to get HDFSInput in Hadoop file system. Please check your dependency of Hadoop module.", e);

This comment has been minimized.

Copy link
@qiaojialin

qiaojialin Feb 12, 2020

Member

I think you need to use idea to format the codes......... Maybe this line exceeds the max length.

@HTHou

This comment has been minimized.

Copy link
Contributor Author

HTHou commented Feb 12, 2020

The updated structure is here.

Screen Shot 2020-02-12 at 4 25 20 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.