# Boat Rental Data Retriever

This notebook implements a `DataRetriever` class that retrieves boat rental listings from the "clickandboat" website for several cities in France. The sections below describe the components of the class and show how to use it.

### Class Structure

1. **VALID_CITIES**: A list of valid cities for which data can be retrieved.
2. **__init__()**: The initializer, which validates the city and stores the start and end page numbers for retrieval.
3. **fetch_data()**: A method that fetches data from the website's API one page at a time, pausing 3 seconds between requests, and appends each page's products to `data_list`.
4. **to_dataframe()**: A method to convert the data list to a pandas DataFrame.
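
As a small illustration of how the per-page URLs are produced, the sketch below reuses the URL pattern from the class's `base_url`. The page range `1..3` is an arbitrary choice for the example, and no request is sent.

```python
city = "marseille"

# Doubled braces inside an f-string yield a literal "{}", which is left
# in place so that .format(page) can fill in the page number later.
base_url = f"https://www.clickandboat.com/api/v3/search?url=/location-bateau/france/{city}&page={{}}&sentFrom=SSR"

# range() excludes its stop value, so this builds URLs for pages 1, 2 and 3.
urls = [base_url.format(page) for page in range(1, 4)]

for url in urls:
    print(url)
```

The city is substituted once when the object is created, while the page number is substituted on every loop iteration.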

In [7]:
import requests
import pandas as pd
from time import sleep
from tqdm import tqdm
import os

In [8]:
class DataRetriever:
    """
    A class to retrieve boat rental data from the "clickandboat" website for various cities in France.
    
    Attributes:
        VALID_CITIES (list): List of valid cities for data retrieval.
    """
    
    VALID_CITIES = ["marseille", "cassis", "hyères", "cannes", "corse", "la-rochelle"]

    def __init__(self, city, start_page=1, end_page=1):
        """
        Constructor for the DataRetriever class.
        
        Parameters:
            city (str): The city for which to retrieve data. Must be one of the valid cities listed in VALID_CITIES.
            start_page (int, optional): The starting page number for data retrieval. Defaults to 1.
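            end_page (int, optional): The page number at which to stop (exclusive, because fetch_data iterates over range(start_page, end_page)). Defaults to 1, which fetches no pages when start_page is also 1.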
        
        Raises:
            ValueError: If the provided city is not in the list of VALID_CITIES.
        """
        if city not in self.VALID_CITIES:
            raise ValueError(f"Invalid city: {city}. Valid options are: {', '.join(self.VALID_CITIES)}")
        
        self.data_list = []
        self.base_url = f"https://www.clickandboat.com/api/v3/search?url=/location-bateau/france/{city}&page={{}}&sentFrom=SSR"
        self.start_page = start_page
        self.end_page = end_page

    def fetch_data(self):
        """
        Retrieves data from the API page by page and adds it to a data list.
        
        Note:
            Errors are caught and printed rather than raised. Because the try block wraps the whole loop, an exception stops retrieval of the remaining pages.
        """
        try:
            for page in tqdm(range(self.start_page, self.end_page)): 
                url = self.base_url.format(page)
                response = requests.get(url, timeout=30)  # time out rather than hang on a stalled connection

                if response.status_code == 200:
                    data = response.json()
                    products = data['data']['products']
                    self.data_list.extend(products)
                else:
                    print(f"Failed to retrieve data for page {page}. Status code: {response.status_code}")
                
                sleep(3)
        except Exception as e:
            print(f"An error occurred: {e}")

    def to_dataframe(self):
        """
        Converts the data list to a pandas DataFrame.
        
        Returns:
            pd.DataFrame: A pandas DataFrame containing the retrieved data.
        """
        return pd.DataFrame(self.data_list)
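
In `fetch_data`, a single `try` block wraps the whole loop, so one malformed response ends the run. A possible alternative, sketched below under stated assumptions, handles errors per page and records which pages failed. The names `fetch_pages`, `fetch_page`, and `flaky_fetch` are hypothetical and not part of the notebook. `fetch_page` stands in for a function that would call the API and return the parsed JSON.

```python
from time import sleep


def fetch_pages(fetch_page, start_page, end_page, delay=0.0):
    """Collect products page by page, skipping pages that fail.

    fetch_page is a hypothetical callable taking a page number and
    returning the parsed JSON payload as a dict.
    """
    products, failed = [], []
    for page in range(start_page, end_page):  # end_page is exclusive, as in fetch_data
        try:
            payload = fetch_page(page)
            products.extend(payload["data"]["products"])
        except Exception as exc:  # broad on purpose, mirroring the original method
            print(f"Page {page} failed: {exc!r}")
            failed.append(page)
        sleep(delay)  # the notebook pauses 3 seconds here; 0 keeps the demo fast
    return products, failed


def flaky_fetch(page):
    # Made-up stub: page 2 returns a payload without the expected keys.
    if page == 2:
        return {"error": "unexpected shape"}
    return {"data": {"products": [{"id": page}]}}


products, failed = fetch_pages(flaky_fetch, 1, 4)
print(products)  # [{'id': 1}, {'id': 3}]
print(failed)    # [2]
```

Returning the failed page numbers makes it possible to retry only those pages later instead of repeating the whole run.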


In [9]:
if __name__ == "__main__":
    city = "marseille"  # You can replace this with any other city from VALID_CITIES
    start_page, end_page = 1, 30  # end_page is exclusive, so this fetches pages 1 to 29
    retriever = DataRetriever(city, start_page, end_page)
    retriever.fetch_data()
    df = retriever.to_dataframe()
    
    # Create the directory if it does not exist
    os.makedirs('../data/raw', exist_ok=True)
    
    # Save the dataframe to a CSV file in the data/raw directory
    df.to_csv(f'../data/raw/{city}_data.csv', index=False)

  0%|          | 0/29 [00:00<?, ?it/s]

100%|██████████| 29/29 [01:58<00:00,  4.10s/it]
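
As a rough sanity check on the progress bar above: `range(1, 30)` yields 29 pages, and at the reported rate of 4.10 s per iteration (3 s of which is the deliberate `sleep`) the run should take just under two minutes. The arithmetic below uses only figures from that output.

```python
pages = len(range(1, 30))   # 29 iterations, because the stop value is excluded
seconds_per_page = 4.10     # rate reported by tqdm in the output above
total_seconds = pages * seconds_per_page

minutes, seconds = divmod(int(total_seconds), 60)
print(f"{pages} pages, about {minutes}m{seconds:02d}s")  # 29 pages, about 1m58s
```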

