FileNotFoundError: in Chapter 2 loading the data #448

Closed
FerCalatayud opened this issue Jun 16, 2019 · 8 comments

@FerCalatayud

image

This is my exact code. This is the error I get:

FileNotFoundError: [Errno 2] File b'datasets/housing\housing.csv' does not exist: b'datasets/housing\housing.csv'

I've tried every PATH combination and nothing works. I really don't know what the problem is.
The book is awesome, but I can't continue because of this small error.

@ageron
Owner

ageron commented Jun 16, 2019

Hi @FerCalatayud ,
Thanks for your question. I think you need to fix the definition of HOUSING_PATH. This is what it should be (see the notebook):

HOUSING_PATH = os.path.join("datasets", "housing")

You are probably using Windows, which separates file paths with \ (while Linux and MacOSX use /). To make the code portable, I used os.path.join("datasets", "housing"), which will return datasets\housing on Windows, and datasets/housing on Linux and MacOSX.

However, URLs use / no matter which system you are using. This is why the DOWNLOAD_ROOT and HOUSING_URL use /.
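
To make the separator difference concrete on any machine, here is a minimal sketch that uses the standard library's `ntpath` and `posixpath` modules, which are the Windows and POSIX flavours behind `os.path`. The variable names `windows_path`, `posix_path` and `housing_url` are only illustrative and are not part of the notebook.

```python
import ntpath      # the implementation os.path uses on Windows
import posixpath   # the implementation os.path uses on Linux and macOS

# Same components, different separator depending on the platform flavour.
windows_path = ntpath.join("datasets", "housing")
posix_path = posixpath.join("datasets", "housing")

# URLs are plain strings and always use forward slashes.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
housing_url = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

print(windows_path)  # datasets\housing
print(posix_path)    # datasets/housing
print(housing_url)
```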

If it still fails after fixing this, make sure you are using Python 3, and that the file housing.csv is located in the datasets\housing directory in the same project as the Jupyter notebook.
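
A quick way to check the last point is to ask Python where it is actually looking. The sketch below is only a diagnostic aid, and the helper name `describe_csv_location` is hypothetical (it is not in the book or the notebook). It reports the current working directory, the absolute path that the relative path resolves to, and whether the file is there.

```python
import os

def describe_csv_location(housing_path=os.path.join("datasets", "housing")):
    """Report where a relative housing path resolves and whether the CSV exists."""
    csv_path = os.path.join(housing_path, "housing.csv")
    return {
        "cwd": os.getcwd(),                       # directory the notebook kernel runs in
        "csv_path": os.path.abspath(csv_path),    # where the relative path points
        "exists": os.path.isfile(csv_path),       # False means read_csv would fail
    }

print(describe_csv_location())
```

If `exists` is `False`, either the data has not been downloaded yet or the notebook is running from a different directory than expected.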

Hope this helps.

@FerCalatayud
Author

Hi, thanks for the fast response Ageron.

It still doesn't work. You were correct, I'm using Windows with Python 3 in a Jupyter notebook.

With this URL I'm downloading the data from your repository on GitHub, right? So the housing.csv is located at datasets\housing, and not in the same directory as my Jupyter notebook.

Now that I think about it, I could load the housing.csv and housing.tgz from my computer, right?

@FerCalatayud
Author

I changed the URL to my path where I have the data set on my PC, and it worked.

But the question remains: why doesn't the URL work? I've tried everything, and it gives me either the error that I posted before or an "ERROR 404".

I even print the path before accessing it with ".read_csv()" and click it, and it directs me to the right web page, but the code outputs an "ERROR 404".

@ageron
Owner

ageron commented Jun 17, 2019

Hi @FerCalatayud ,
There are two steps:

  1. fetch_housing_data() downloads the data from the URL on my github repo (the URL must contain only slashes /)
  2. load_housing_data() loads the CSV file that was just downloaded (the path must contain either slashes or backslashes depending on the O.S.).

I believe there were two problems in your initial code:

  1. your code defines the fetch_housing_data() function but it never calls it. You should call fetch_housing_data() before you call load_housing_data().
  2. your code used a mix of slashes and backslashes for the file path used by load_housing_data().
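
As an illustration of the second problem: the mixed path in the reported error is what you would get on Windows if HOUSING_PATH were written as the literal string "datasets/housing". That is an assumption, since the original code was posted only as a screenshot. The sketch uses `ntpath` so that it behaves the same on any OS, and the variable names are illustrative.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

# Joining a hard-coded forward-slash directory with a file name on Windows
# keeps the "/" and appends a "\" before the file name.
mixed = ntpath.join("datasets/housing", "housing.csv")

# Building the directory from components keeps the separators consistent.
clean = ntpath.join(ntpath.join("datasets", "housing"), "housing.csv")

# normpath rewrites forward slashes to backslashes in the Windows flavour.
normalized = ntpath.normpath(mixed)

print(mixed)       # datasets/housing\housing.csv
print(clean)       # datasets\housing\housing.csv
print(normalized)  # datasets\housing\housing.csv
```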

Hope this helps.

@PythonUser1981

PythonUser1981 commented Jan 22, 2020

Basically the correct code goes like this:

Make sure you indent the functions properly

import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()
    
fetch_housing_data()


import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)



housing = load_housing_data()
housing.head()
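
For anyone on Python 3 only, here is a hedged variant of the download step that drops the `six` dependency and uses the standard library's `urllib.request` directly. The name `fetch_housing_data_py3` and the returned CSV path are additions for illustration and do not come from the book. It assumes the archive contains a file named housing.csv, as in the code above.

```python
import os
import tarfile
import urllib.request

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data_py3(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download the archive, extract it, and return the expected CSV path."""
    # exist_ok=True replaces the explicit os.path.isdir() check.
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
    return os.path.join(housing_path, "housing.csv")

# Usage (requires network access), to be run before load_housing_data():
# csv_path = fetch_housing_data_py3()
```

Returning the path is a small convenience: the caller can check `os.path.isfile(csv_path)` before handing it to `pd.read_csv`.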

@akirpach

This helped me. The book only implicitly tells you, in the text, to call fetch_housing_data(), while your eyes follow the code example in the book.

@ageron
Owner

ageron commented Mar 10, 2021

Thanks for your feedback @keyntaur . I've added an explicit call to fetch_housing_data() in the code example, so there will be less confusion for future readers. 👍

ageron closed this as completed Mar 10, 2021

@MahsumKocabey

The solution is:
In the terminal, press Control+C to shut down the kernel, then run:

$ sudo /Applications/Python\ 3.9/Install\ Certificates.command

Then restart the notebook by running "jupyter notebook". That's it, guys!
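
The command above is the certificate installer that ships with the python.org macOS builds, so it presumably addresses an SSL certificate verification failure during the download (the comment does not show the error, so this is an assumption). A minimal sketch for inspecting where Python's `ssl` module looks for CA certificates by default, which can help confirm whether certificates are the issue:

```python
import ssl

# Locations the ssl module checks for trusted CA certificates by default.
paths = ssl.get_default_verify_paths()
print("cafile:", paths.cafile)   # None if no default CA bundle file was found
print("capath:", paths.capath)   # None if no default CA directory was found
print("openssl_cafile:", paths.openssl_cafile)
```

If both `cafile` and `capath` print `None`, HTTPS downloads through `urllib` may fail certificate verification until a CA bundle is installed.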
