reading MAT files #805

Closed
StefanKarpinski opened this issue May 5, 2012 · 18 comments
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@StefanKarpinski
Member

This ability would be tremendously useful for MATLAB users who already have data in MAT format.

@pao
Member

pao commented May 5, 2012

v7.3 files are in an HDF5 format. We might also look into cribbing notes from SciPy.

@StefanKarpinski
Member Author

I was not aware it was HDF5. That's nice, actually, although it is a massively over-engineered format.

@pao
Member

pao commented May 5, 2012

Well, that only buys you compatibility with v7.3 files, which I'm still not sure they've made the default. It's much faster as you save larger files, though, and it's your only option for 2+ GB files.

I will also point out that the ability to read/save MATLAB files is important for dealing with customers who use MATLAB, or who might require it as an output format. We've dealt with both.

@ViralBShah
Member

We can use libmatio, the library that Octave uses.

@ViralBShah
Member

@pao
Member

pao commented May 6, 2012

That looks like it can handle both file types as long as HDF5 support is available elsewhere and has a pretty straightforward C interface.

@ViralBShah
Member

Right - we can quickly get matio support. The need had not come up before, but I always figured we could do it at a moment's notice if necessary. I used it a few years back, and it seems like development has continued.

-viral


@JeffBezanson
Member

@simonster
Member

For older MAT files, https://github.com/simonster/MAT.jl

@JeffBezanson
Member

Cool, thanks! Would be good to list it here: https://github.com/JuliaLang/METADATA.jl

@timholy
Member

timholy commented Dec 2, 2012

Very nice, Simon!

@ViralBShah
Member

This is awesome!

@staticfloat
Member

Probably the module I'm going to use the most in the foreseeable future. Thank you very much, @simonster. :)

@timholy
Member

timholy commented Dec 2, 2012

@simonster, do you have any interest in synchronizing with the interface for the HDF5 MatIO module? That way we could provide a single interface supporting multiple versions of *.mat files, and users wouldn't need to know which version a given file is saved in.

There are really two ways we could go about doing that: either you could extend your functions to support some of the operations in the HDF5 module (e.g., providing separate open, close, and read(mfile, "myvariable") functions), or I could provide a wrapper for the HDF5 functionality that mimics your interface. As you know well, the code in the HDF5 module checks to see what type of *.mat file it is; we could use that as the basis for an overall function that detects the type and then reads it either as a v7.3 file or an older file type. I guess I have a slight preference for asking you to move closer to the MatIO-style interface, for the particular reason that it will be more efficient if you just want to extract a single variable from the file (imagine you have a file with one huge variable and a few teeny variables, and you're only interested in the teeny ones). But I can go either way.
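
A minimal sketch of the detect-then-dispatch idea described above, written in current Julia syntax. Every name in it (`AbstractMatFile`, `MatV5File`, `MatHDF5File`, `detect_format`, `open_mat`) is hypothetical and is not the interface of MAT.jl or the HDF5 module. It also assumes that a v7.3 file starts with the same 128-byte text header as a level 5 file, inside a 512-byte user block, with the HDF5 signature immediately after it; a real detector would need to confirm that against the format documentation.

```julia
# Hypothetical names throughout; this is not the MAT.jl or HDF5 module API.

# The 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
const HDF5_SIGNATURE = UInt8[0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a]

abstract type AbstractMatFile end

struct MatV5File <: AbstractMatFile
    path::String
end

struct MatHDF5File <: AbstractMatFile
    path::String
end

# Classify a stream as :v5, :hdf5, or :unknown from its first 520 bytes.
# Assumption: bytes 127-128 hold the endian indicator ("MI" or "IM"), and
# v7.3 files carry the HDF5 signature at byte offset 512.
function detect_format(io::IO)
    buf = read(io, 520)                      # returns fewer bytes if the stream is short
    length(buf) >= 128 || return :unknown
    String(buf[127:128]) in ("MI", "IM") || return :unknown
    if length(buf) >= 520 && buf[513:520] == HDF5_SIGNATURE
        return :hdf5
    end
    return :v5
end

# Single entry point: the caller never needs to know which version was saved.
function open_mat(path::AbstractString)
    fmt = open(detect_format, path)
    fmt == :hdf5 && return MatHDF5File(path)
    fmt == :v5 && return MatV5File(path)
    error("unrecognized MAT-file format: $path")
end
```

With a layout like this, reading a single variable or the whole file could be written as separate methods on `MatV5File` and `MatHDF5File`, so both reading styles discussed here share one call signature.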

Out of curiosity, what does version == 0x0100 correspond to? Does this support -v4, -v6, and -v7? Also, are you intending to provide write support? Or is that not really necessary since it's in MatIO?

@simonster
Member

@timholy, Yes, I think we should definitely standardize on one interface for this. I'm fine with adding a function to read a single variable (or array of variables) out of the file, although I think we do actually want a function to read the entire file at once, for convenience and because it's more efficient if you want all the variables.

We should also agree on conventions for how data gets returned from these modules and make sure they are consistent between HDF5 and level 5 files. In level 5 format, the number of dimensions of an array is flexible, but MATLAB always saves at least two, so scalars are 1x1 arrays, empty vectors are 0x0 arrays, column vectors are nx1 arrays, and row vectors are 1xn arrays. I load 1x1 arrays as scalars and 0x0 arrays as vectors that are empty in a single dimension, but I leave nx1 and 1xn arrays as-is. I'm not sure how these get handled in HDF5.
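
The loading convention in the previous paragraph can be sketched as a small helper, written in current Julia syntax. The name `normalize_dims` is hypothetical and is not a function from MAT.jl; it only illustrates the four cases listed above.

```julia
# Hypothetical helper, not part of MAT.jl: apply the convention described above
# to a matrix as stored in a level 5 file.
function normalize_dims(a::AbstractMatrix)
    if size(a) == (1, 1)
        return a[1, 1]        # 1x1 array becomes a scalar
    elseif size(a) == (0, 0)
        return vec(a)         # 0x0 array becomes a 0-element vector
    end
    return a                  # nx1 and 1xn arrays are returned unchanged
end

normalize_dims(fill(3.0, 1, 1))        # a Float64 scalar
size(normalize_dims(zeros(0, 0)))      # (0,)
size(normalize_dims(zeros(3, 1)))      # (3, 1)
size(normalize_dims(zeros(1, 3)))      # (1, 3)
```

Whichever convention is chosen, applying the same helper to data read from both file types would be one way to keep the two readers consistent.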

I check version == 0x0100 because it's what the MAT file format spec says to use. The manual calls it "level 5" format and says it is compatible with MATLAB v5+, and I can read both v6 and v7 format from current MATLAB with the check, so I assume nothing has changed. My code can't currently read v4 format, but it looks simpler than v5/v6/v7 (which are themselves much simpler than v7.3/HDF5) so it would probably not take much work to support it as well.
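
For illustration, a version check of this kind might look like the sketch below, written in current Julia syntax. The function name `level5_version` is hypothetical, and the header layout it assumes (a 128-byte header with the 16-bit version field at bytes 125-126 and the two-character endian indicator at bytes 127-128, where "IM" on disk means the file was written little-endian) is my reading of the level 5 format and should be checked against the spec.

```julia
# Hypothetical helper, not taken from MAT.jl. Assumes a 128-byte level 5 header:
# bytes 125-126 hold the version, bytes 127-128 hold the endian indicator.
function level5_version(io::IO)
    header = read(io, 128)
    length(header) == 128 || error("file too short for a level 5 header")
    indicator = String(header[127:128])
    b1, b2 = UInt16(header[125]), UInt16(header[126])
    if indicator == "IM"          # little-endian file: low byte is stored first
        return (b2 << 8) | b1
    elseif indicator == "MI"      # big-endian file: high byte is stored first
        return (b1 << 8) | b2
    end
    error("unrecognized endian indicator")
end

# Build a fake little-endian header in memory and check it.
header = vcat(fill(0x20, 124), UInt8[0x00, 0x01, 0x49, 0x4d])
level5_version(IOBuffer(header)) == 0x0100 || error("not a level 5 MAT-file")
```

Reading the indicator before interpreting the version field means the same check works for files written on either kind of machine.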

@nolta
Member

nolta commented Dec 2, 2012

We should probably move matio.jl out of julia_hdf5 and into MAT.jl.

@timholy
Member

timholy commented Dec 2, 2012

@simonster, sure, reading all the variables would be easy to add. And I think the dimensions may already work out as you say, but indeed we should check.

@nolta, I was beginning to wonder about the same thing; right now the ability to read & write .mat files, while a "side-effect" of HDF5 support, might be hard to find.

Of course, MAT.jl will require HDF5.jl (I will rename it as such shortly), and so no matter where the file lives users will get both.

@timholy
Member

timholy commented Dec 4, 2012

julia> load("src/MAT.jl")

julia> load("matio.jl"); using MatIO

julia> fidh5 = matopen("/home/tim/src/julia-modules/julia_hdf5/matfile.mat", "r")
MatlabHDF5File(16777216,"/home/tim/src/julia-modules/julia_hdf5/matfile.mat",true,false)

julia> fid = matopen("test/array.mat", "r")
Matlabv5File(IOStream(<file test/array.mat>),false,nothing)

:-)

8 participants