FileNotFoundError: in Chapter 2 loading the data #448

FerCalatayud · 2019-06-16T00:26:08Z

This is my exact code. This is the error I get:

FileNotFoundError: [Errno 2] File b'datasets/housing\housing.csv' does not exist: b'datasets/housing\housing.csv'

I've tried every PATH combination and nothing works. I really don't know what the problem is.
The book is awesome, but I can't continue because of this small error.

ageron · 2019-06-16T10:39:50Z

Hi @FerCalatayud,
Thanks for your question. I think you need to fix the definition of HOUSING_PATH. This is what it should be (see the notebook):

HOUSING_PATH = os.path.join("datasets", "housing")

You are probably using Windows, which separates file paths with \ (while Linux and MacOSX use /). To make the code portable, I used os.path.join("datasets", "housing"), which will return datasets\housing on Windows, and datasets/housing on Linux and MacOSX.
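One way to reproduce the mixed separators from the error message is shown below. The sketch calls `ntpath` and `posixpath`, the Windows and POSIX modules that `os.path` points to, so it gives the same output on any machine. Writing `HOUSING_PATH` as `"datasets/housing"` is an assumption about the original code, which isn't shown in the thread.

```python
import ntpath     # what os.path is on Windows
import posixpath  # what os.path is on Linux and macOS

# Joining separate components gives a clean path for each OS
win_dir = ntpath.join("datasets", "housing")
posix_dir = posixpath.join("datasets", "housing")
print(win_dir)    # datasets\housing
print(posix_dir)  # datasets/housing

# Assumed mistake: a hard-coded "/" inside one component, then a join on Windows
mixed = ntpath.join("datasets/housing", "housing.csv")
print(mixed)      # datasets/housing\housing.csv (same mix as in the error)
```

Windows generally accepts forward slashes as well, so the mix alone may not explain the error; a `housing.csv` that was never downloaded raises the same `FileNotFoundError`.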

However, URLs use / no matter which system you are using. This is why the DOWNLOAD_ROOT and HOUSING_URL use /.
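A short sketch of the difference: string concatenation (or `urllib.parse.urljoin`) keeps the forward slashes a URL needs, while joining URL parts with `os.path.join` on Windows (simulated here with `ntpath`) would insert backslashes.

```python
import ntpath
from urllib.parse import urljoin

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"

# Correct: URLs are built with "/" regardless of the operating system
housing_url = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"
joined_url = urljoin(DOWNLOAD_ROOT, "datasets/housing/housing.tgz")

# Wrong on Windows: os.path.join (ntpath) puts backslashes into the URL
broken_url = ntpath.join(DOWNLOAD_ROOT, "datasets", "housing", "housing.tgz")

print(housing_url == joined_url)  # True
print(broken_url)  # ends with master/datasets\housing\housing.tgz
```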

If it still fails after fixing this, make sure you are using Python 3, and that the file housing.csv is located in the datasets\housing directory in the same project as the Jupyter notebook.
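To check where the notebook actually looks for `housing.csv`, a small diagnostic like the one below may help. `describe_csv_location` is a hypothetical helper name; it uses only `os` and assumes the `datasets/housing` layout from the book.

```python
import os

def describe_csv_location(housing_path=os.path.join("datasets", "housing")):
    """Hypothetical helper: report where housing.csv is expected and whether it exists."""
    csv_path = os.path.join(housing_path, "housing.csv")
    # Relative paths are resolved against the current working directory,
    # which in Jupyter is normally the folder containing the notebook
    return {
        "cwd": os.getcwd(),
        "csv_path": os.path.abspath(csv_path),
        "exists": os.path.isfile(csv_path),
    }

info = describe_csv_location()
print(info["csv_path"], "exists" if info["exists"] else "is missing")
```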

Hope this helps.

FerCalatayud · 2019-06-16T17:41:30Z

Hi, thanks for the fast response, Ageron.

It still doesn't work. You were right: I'm using Windows with Python 3 in a Jupyter notebook.

With this URL I'm downloading the data from your repository on GitHub, right? So housing.csv ends up in datasets\housing, not in the same directory as my Jupyter notebook.

Now that I think about it, I could load the housing.csv and housing.tgz from my computer, right?
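A local copy can be loaded without any URL. As a minimal stdlib-only sketch, the CSV can be read straight out of a local `housing.tgz`. `read_csv_rows_from_tgz` is a hypothetical name, and it assumes `housing.csv` sits at the top level of the archive (which is where `load_housing_data()` expects it after extraction).

```python
import csv
import io
import tarfile

def read_csv_rows_from_tgz(tgz_path, member="housing.csv"):
    """Hypothetical helper: read housing.csv directly from a local housing.tgz."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        raw = archive.extractfile(member)  # raises KeyError if the member is missing
        if raw is None:  # member exists but is not a regular file
            raise FileNotFoundError(f"{member} is not a regular file in {tgz_path}")
        text = io.TextIOWrapper(raw, encoding="utf-8")
        return list(csv.reader(text))
```

With pandas, the file object returned by `extractfile()` can be passed to `pd.read_csv()` instead of `csv.reader`.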

FerCalatayud · 2019-06-16T20:39:17Z

I changed the URL to the path where I have the dataset on my PC, and it worked.

But the question remains: why doesn't the URL work? I've tried everything, and I get either the error I posted before or an "ERROR 404".

I even printed the path before passing it to ".read_csv()". When I click it, it takes me to the right web page, but the code outputs an "ERROR 404".

ageron · 2019-06-17T20:16:14Z

Hi @FerCalatayud,
There are two steps:

1. fetch_housing_data() downloads the data from the URL on my GitHub repo (the URL must contain only slashes /).
2. load_housing_data() loads the CSV file that was just downloaded (the path must contain either slashes or backslashes depending on the OS).
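The ordering of the two steps can be enforced with a small wrapper. This is a sketch under the assumption that both callables accept a `housing_path` keyword, like the book's `fetch_housing_data()` and `load_housing_data()`; `ensure_then_load` is a hypothetical name.

```python
import os

HOUSING_PATH = os.path.join("datasets", "housing")

def ensure_then_load(fetch, load, housing_path=HOUSING_PATH):
    """Hypothetical wrapper: run step 1 (fetch) only if the CSV is missing, then step 2 (load)."""
    csv_path = os.path.join(housing_path, "housing.csv")
    if not os.path.isfile(csv_path):
        fetch(housing_path=housing_path)  # e.g. fetch_housing_data: download + extract
    return load(housing_path=housing_path)  # e.g. load_housing_data: read the CSV
```

Usage: `housing = ensure_then_load(fetch_housing_data, load_housing_data)`. Skipping the download when the CSV already exists is a design choice; delete the file to force a fresh download.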

I believe there were two problems in your initial code:

1. Your code defines the fetch_housing_data() function but never calls it. You should call fetch_housing_data() before you call load_housing_data().
2. Your code used a mix of slashes and backslashes in the file path used by load_housing_data().
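An alternative to `os.path.join` (not what the book uses) is the standard `pathlib` module, which builds paths from components so separators never get mixed. `PureWindowsPath` appears below only to show the Windows behaviour on any OS.

```python
from pathlib import Path, PureWindowsPath

# Build the path from components; Path uses the right separator for the current OS
csv_path = Path("datasets") / "housing" / "housing.csv"

# A Windows path object accepts both separators and normalizes them to "\"
mixed = PureWindowsPath("datasets/housing\\housing.csv")
print(mixed)  # datasets\housing\housing.csv
```

`pd.read_csv()` accepts a `Path` object directly, so `csv_path` can be passed as-is.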

Hope this helps.

PythonUser1981 · 2020-01-22T00:52:11Z (edited by ageron)

Basically the correct code goes like this:

Make sure you indent the functions properly.

import os
import tarfile
import urllib.request  # Python 3 standard library

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")  # OS-specific separators
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"  # URLs always use /

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create datasets/housing (datasets\housing on Windows) if it doesn't exist
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    # Download the archive, then extract housing.csv into housing_path
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)

fetch_housing_data()  # step 1: must run before load_housing_data()


import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()  # step 2: read the downloaded CSV
housing.head()
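The code above calls `extractall()` without an extraction filter. Python 3.12 added a `filter` argument (also backported to some later security releases of 3.8 to 3.11), and 3.12 and 3.13 emit a `DeprecationWarning` when it is omitted. A hedged variant that uses the `"data"` filter only when the running Python supports it might look like this; `extract_housing_tgz` is a hypothetical helper name.

```python
import tarfile

def extract_housing_tgz(tgz_path, housing_path):
    """Hypothetical helper: extract housing.tgz, using the 'data' filter when available."""
    with tarfile.open(tgz_path) as housing_tgz:
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths, ".." entries and other unsafe members
            housing_tgz.extractall(path=housing_path, filter="data")
        else:
            # Older Pythons without extraction filters
            housing_tgz.extractall(path=housing_path)
```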

akirpach · 2021-02-16T16:33:53Z

This helped me. The book only mentions calling fetch_housing_data() in the text, so it's easy to miss while your eyes follow the code example in the book.

ageron · 2021-03-10T23:19:54Z

Thanks for your feedback @keyntaur . I've added an explicit call to fetch_housing_data() in the code example, so there will be less confusion for future readers. 👍

MahsumKocabey · 2021-07-02T15:55:19Z

If the download fails with an SSL certificate error on macOS, the solution is:
Go to the terminal, press Control+C to shut down the Jupyter server, and run:

$ sudo /Applications/Python\ 3.9/Install\ Certificates.command

Then restart it by running "jupyter notebook". That's it, guys!
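Since `fetch_housing_data()` downloads over HTTPS, a certificate problem makes step 1 fail before `housing.csv` exists. To tell whether a failed download is really a certificate issue (rather than, say, a 404), a check like the following might help. `is_certificate_error` and `try_download` are hypothetical names; the sketch relies on `urllib` wrapping SSL failures in `URLError`, with the original exception in `.reason`.

```python
import ssl
import urllib.error
import urllib.request

def is_certificate_error(exc):
    """True if a urllib failure was caused by SSL certificate verification."""
    reason = getattr(exc, "reason", None)
    return isinstance(reason, ssl.SSLCertVerificationError)

def try_download(url, dest):
    """Hypothetical helper: download url to dest and explain certificate failures."""
    try:
        urllib.request.urlretrieve(url, dest)
    except urllib.error.URLError as exc:  # HTTPError (e.g. 404) is a subclass
        if is_certificate_error(exc):
            raise RuntimeError(
                "SSL certificate verification failed; on a python.org macOS install, "
                "running 'Install Certificates.command' usually fixes this"
            ) from exc
        raise
```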

ageron closed this as completed Mar 10, 2021
