Error Downloading MNIST #143

Closed
keepforever opened this issue Dec 13, 2017 · 12 comments

Comments

@keepforever

keepforever commented Dec 13, 2017

After using this code from the book/github:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
mnist

I get the following error

--------------------------------------------------------------
OSError                      Traceback (most recent call last)
<ipython-input-5-13fc10446cca> in <module>()
      1 
      2 from sklearn.datasets import fetch_mldata
----> 3 mnist = fetch_mldata('MNIST original')
      4 mnist

E:\brian\Anaconda3\lib\site-packages\sklearn\datasets\mldata.py in fetch_mldata(dataname, target_name, data_name, transpose_data, data_home)
    156     # load dataset matlab file
    157     with open(filename, 'rb') as matlab_file:
--> 158         matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
    159 
    160     # -- extract data from matlab_dict

E:\brian\Anaconda3\lib\site-packages\scipy\io\matlab\mio.py in loadmat(file_name, mdict, appendmat, **kwargs)
    134     variable_names = kwargs.pop('variable_names', None)
    135     MR = mat_reader_factory(file_name, appendmat, **kwargs)
--> 136     matfile_dict = MR.get_variables(variable_names)
    137     if mdict is not None:
    138         mdict.update(matfile_dict)

E:\brian\Anaconda3\lib\site-packages\scipy\io\matlab\mio5.py in get_variables(self, variable_names)
    290                 continue
    291             try:
--> 292                 res = self.read_var_array(hdr, process)
    293             except MatReadError as err:
    294                 warnings.warn(

E:\brian\Anaconda3\lib\site-packages\scipy\io\matlab\mio5.py in read_var_array(self, header, process)
    250            `process`.
    251         '''
--> 252         return self._matrix_reader.array_from_header(header, process)
    253 
    254     def get_variables(self, variable_names=None):

scipy/io/matlab/mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:7087)()

scipy/io/matlab/mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:6206)()

scipy/io/matlab/mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:7549)()

scipy/io/matlab/mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:4252)()

scipy/io/matlab/mio5_utils.pyx in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3836)()

scipy/io/matlab/streams.pyx in scipy.io.matlab.streams.GenericStream.read_string (scipy\io\matlab\streams.c:2119)()

scipy/io/matlab/streams.pyx in scipy.io.matlab.streams.GenericStream.read_into (scipy\io\matlab\streams.c:1911)()

OSError: could not read bytes
@ageron
Owner

ageron commented Dec 19, 2017

I'm not entirely sure, but I'm guessing this is because Python 3.6 on macOS does not come with SSL certificates. If so, you must install the certifi package manually, as explained in this StackOverflow answer.
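
One way to test this guess is to ask Python where it looks for CA certificates. The sketch below uses only the standard library's ssl module. The helper name ca_paths_report is made up for illustration, and a missing path only hints at a certificate problem rather than proving one.

```python
import os
import ssl


def ca_paths_report():
    """Return the default CA locations Python's ssl module would use."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for name in ("cafile", "capath", "openssl_cafile", "openssl_capath"):
        value = getattr(paths, name)
        # Record both the configured path and whether it exists on disk.
        report[name] = (value, bool(value) and os.path.exists(value))
    return report


if __name__ == "__main__":
    for name, (value, exists) in ca_paths_report().items():
        print(f"{name}: {value!r} (exists: {exists})")
```

If none of the reported paths exist, a missing certificate bundle is at least plausible.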

@ageron
Owner

ageron commented Dec 19, 2017

Alternatively, this might be due to a corrupted download. Check out this StackOverflow answer.

@adamrmelnyk

I ran into this as well. The data was corrupted. I just removed the mldata, as suggested in the Stack Overflow post. Thanks!

@ageron
Owner

ageron commented Jan 14, 2018

Thanks for your feedback, @adamrmelnyk. Good to know that the fix still works.

@Doc-Scott

Doc-Scott commented Mar 9, 2018

This happened to me too. I think the data got corrupted the first time because of a bad internet connection. I removed it from the shell using:

rm ~/scikit_learn_data/mldata/mnist-original.mat

This is described in the StackOverflow post above, minus the rm. Frustratingly, it took me several different approaches to realise that. I'm using Linux. Note that rm is a shell command, not Python: run it in a terminal, or prefix it with ! (as in !rm) in an IPython or Jupyter session.
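
For anyone not on a Unix shell, the same cleanup can be sketched in plain Python. The function names below are hypothetical. The default location assumes scikit-learn's usual data home (the SCIKIT_LEARN_DATA environment variable if set, otherwise ~/scikit_learn_data), which matches the path in the rm command above.

```python
import os


def mnist_cache_path(data_home=None):
    """Build the path where fetch_mldata cached MNIST."""
    if data_home is None:
        # Assumed default: $SCIKIT_LEARN_DATA or ~/scikit_learn_data.
        data_home = os.environ.get("SCIKIT_LEARN_DATA",
                                   os.path.join("~", "scikit_learn_data"))
    return os.path.join(os.path.expanduser(data_home),
                        "mldata", "mnist-original.mat")


def remove_cached_mnist(data_home=None):
    """Delete the cached file; return True if something was removed."""
    path = mnist_cache_path(data_home)
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```

Calling remove_cached_mnist() with no arguments targets the default location. Pass a directory explicitly if your data home is somewhere else.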

@andsec

andsec commented Apr 27, 2018

Hello @ageron,

How do you re-download the corrupted mnist-original.mat file after you have deleted it?

@nil12285

nil12285 commented Apr 27, 2018

Download https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat and copy the file into the mldata folder of your scikit-learn data home directory (the same location as the file removed above).

How do you find your scikit-learn data home directory?

from sklearn.datasets import get_data_home
print(get_data_home())
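
The copy step can also be scripted. This is a minimal sketch with a hypothetical helper name. It assumes the file has already been downloaded and that fetch_mldata looks for it under an mldata subfolder of the data home, as the rm path earlier in the thread suggests.

```python
import os
import shutil


def install_mnist_file(downloaded_path, data_home):
    """Copy a manually downloaded mnist-original.mat into <data_home>/mldata."""
    target_dir = os.path.join(os.path.expanduser(data_home), "mldata")
    os.makedirs(target_dir, exist_ok=True)  # create mldata/ if it is missing
    target = os.path.join(target_dir, "mnist-original.mat")
    shutil.copyfile(downloaded_path, target)
    return target
```

A typical call would pass the downloaded file and the value returned by get_data_home().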

@andsec

andsec commented Apr 27, 2018

Thank you @nil12285! It worked perfectly.

@ttwstnow

Thank you @nil12285, it works for me.

@rohith28

rohith28 commented Jul 9, 2018

I used the following code, which gets the data from a GitHub repo. It worked for me.


from six.moves import urllib  # urllib.request on both Python 2 and 3
from scipy.io import loadmat

# Download the .mat file from a GitHub mirror instead of mldata.org.
mnist_alternative_url = "https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat"
mnist_path = "./mnist-original.mat"
response = urllib.request.urlopen(mnist_alternative_url)
with open(mnist_path, "wb") as f:
    content = response.read()
    f.write(content)

# Rebuild the same dict layout that fetch_mldata() returned.
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,
    "target": mnist_raw["label"][0],
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original",
}
print("Success!")
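
Since the original error came from a file that could not be read to the end, a more defensive variant of the download step might compare the number of bytes written with the Content-Length header, when the server sends one. This is only a sketch (Python 3, standard library only). The function name download_checked and the ".part" suffix are illustrative choices, not part of any library.

```python
import os
import shutil
import urllib.request


def download_checked(url, dest):
    """Download url to dest, failing if fewer bytes arrive than announced."""
    tmp = dest + ".part"
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as f:
        expected = response.headers.get("Content-Length")
        shutil.copyfileobj(response, f)
    written = os.path.getsize(tmp)
    if expected is not None and int(expected) != written:
        os.remove(tmp)
        raise IOError(f"incomplete download: got {written} of {expected} bytes")
    os.replace(tmp, dest)  # the final name is assigned only after the size check
    return written
```

When no Content-Length header is available the size check is skipped, so a truncated file can still slip through in that case.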

@ageron
Owner

ageron commented Dec 20, 2018

The fetch_mldata() function has been deprecated since Scikit-Learn 0.20. You should use fetch_openml() instead. I added more details in #301, so I'll close this issue.
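
A minimal sketch of the replacement call, assuming a Scikit-Learn version that supports the as_frame argument (0.22 or later) and a working network connection. The wrapper name load_mnist_openml is hypothetical. The cast is there because OpenML returns the labels as strings.

```python
def load_mnist_openml():
    """Fetch MNIST from OpenML and return (X, y) as NumPy arrays."""
    # Imported inside the function so the sketch can be pasted anywhere;
    # requires scikit-learn (0.22+ assumed for as_frame) and network access.
    from sklearn.datasets import fetch_openml

    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X = mnist.data                    # shape (70000, 784)
    y = mnist.target.astype("uint8")  # labels arrive as strings such as "5"
    return X, y
```

See #301 for the other differences from fetch_mldata().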

@ageron ageron closed this as completed Dec 20, 2018
@shobhit1893

Use this; it will work:
#301 (comment)

9 participants