# Download Sensor.Community Data
This notebook is a quick way to automatically download the monthly sensor.community archives from https://archive.sensor.community/csv_per_month/, newest month first.
Run it until it has downloaded enough data for you, then interrupt it; zip files that already downloaded completely are skipped on the next run.
It sometimes throws an `IncompleteRead` exception, which I haven't figured out how to fix yet. If that happens, just rerun it.
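
Until the cause of the `IncompleteRead` is understood, one option is to retry the failing call automatically instead of rerunning by hand. The following is only a minimal sketch: `with_retries` and its parameters (`attempts`, `delay`, `retry_on`) are hypothetical names that are not part of this notebook, and it assumes the failure is transient.

```python
import time
from http.client import IncompleteRead


def with_retries(func, *args, attempts=3, delay=1.0, retry_on=(IncompleteRead,), **kwargs):
    """Call func(*args, **kwargs), retrying when an exception in retry_on is raised.

    Hypothetical helper: the names and defaults are assumptions for illustration.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            time.sleep(delay)
```

In the download loop this could wrap the existing call, for example `with_retries(download_file_stream, session, month_file.url, p)`. With `requests`, a stream that breaks inside `iter_content` is usually reported as `requests.exceptions.ChunkedEncodingError` rather than a bare `IncompleteRead`, so that class may need to be added to `retry_on`. Each retry restarts the file from the beginning; it does not resume a partial download.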

In [2]:
!pip install tqdm bs4 lxml requests

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1256 sha256=05e963368a2382bdb790d7c11db996b94b28517f95d0e239d2bc7b2b35238580
  Stored in directory: /Users/mauf/Library/Caches/pip/wheels/25/42/45/b773edc52acb16cd2db4cf1a0b47117e2f69bb4eb300ed0e70
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


In [4]:
import dataclasses
import re
import shutil
from datetime import datetime
from pathlib import Path
from zipfile import ZipFile, is_zipfile

import requests
from bs4 import BeautifulSoup
from tqdm.auto import tqdm

session = requests.session()

@dataclasses.dataclass
class MonthPage:
    dt: datetime
    url: str

@dataclasses.dataclass
class MonthFile:
    dt: str  # month as a "YYYY-MM" string, used as-is in the local path
    sensor_type: str
    filetype: str
    url: str

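def download_file_copy_file(session, url, local_filename):
    """Download url to local_filename by copying the raw response stream."""
    with session.get(url, stream=True) as r:
        r.raise_for_status()
        size = r.headers.get("Content-Length")
        # The header may be missing, so only format the size when it is present.
        size_text = f"{int(size) / 1e6:.2f}MB" if size else "unknown"
        print(f"Downloading {url} to {local_filename} size: {size_text}")
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename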


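def download_file_stream(session, url, local_filename):
    """Download url to local_filename in chunks, showing a progress bar."""
    with session.get(url, stream=True) as r:
        r.raise_for_status()
        size = r.headers.get("Content-Length")
        size = int(size) if size else None  # the header may be missing
        size_text = f"{size / 1e6:.2f}MB" if size is not None else "unknown"
        print(f"Downloading {url} to {local_filename} size: {size_text}")

        # Using tqdm as a context manager closes the bar even if the download fails.
        with tqdm(total=size, unit="B", unit_scale=True) as pbar, open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=100_000):
                f.write(chunk)
                pbar.update(len(chunk))
    return local_filename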

data_dir = Path("./data/").expanduser()
base = "https://archive.sensor.community/csv_per_month/"
soup = BeautifulSoup(session.get(base).content, "lxml")
# Month directories appear in the index as links like "2023-08/".
links = soup.find_all("a", dict(href=re.compile(r"^\d\d\d\d-\d\d/$")))
months = [
    MonthPage(url=m["href"], dt=datetime.strptime(m["href"][:-1], "%Y-%m"))
    for m in links
]

file_regex = re.compile(r"^(\d\d\d\d-\d\d)_([^\.]+)\.(.+)$")
for month in months[::-1]:  # newest month first
    soup = BeautifulSoup(session.get(base + month.url).content, "lxml")
    links = soup.find_all("a", dict(href=file_regex))
    for link in links:
        date, sensor_type, filetype = file_regex.match(link["href"]).groups()
        month_file = MonthFile(date, sensor_type, filetype, base + month.url + link["href"])

        p = data_dir / f"inputs/sensor_community/{month_file.dt}/{month_file.dt}_{sensor_type}.{filetype}"
        p.parent.mkdir(exist_ok=True, parents=True)
        # Download if the file is missing, or if it is a zip that fails validation
        # (for example, one left truncated by an interrupted run).
        if not p.exists() or (p.suffix == ".zip" and not is_zipfile(p)):
            download_file_stream(session, month_file.url, p)

        if p.suffix == ".zip":
            # Assumes the archive holds a single CSV named after the zip file.
            unzipped_filename = p.parent / f"{p.stem}.csv"
            if not unzipped_filename.exists():
                print("Unzipping data")
                with ZipFile(p) as zf:
                    zf.extractall(path=p.parent)

print("Done!")

Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_bme280.zip to data/inputs/sensor_community/2023-08/2023-08_bme280.zip size: 2332.02MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_bmp180.zip to data/inputs/sensor_community/2023-08/2023-08_bmp180.zip size: 25.67MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_bmp280.zip to data/inputs/sensor_community/2023-08/2023-08_bmp280.zip size: 52.94MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_dht22.zip to data/inputs/sensor_community/2023-08/2023-08_dht22.zip size: 1651.87MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_ds18b20.zip to data/inputs/sensor_community/2023-08/2023-08_ds18b20.zip size: 3.26MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_hpm.zip to data/inputs/sensor_community/2023-08/2023-08_hpm.zip size: 0.67MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_htu21d.zip to data/inputs/sensor_community/2023-08/2023-08_htu21d.zip size: 18.84MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_pms1003.zip to data/inputs/sensor_community/2023-08/2023-08_pms1003.zip size: 0.84MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_pms3003.zip to data/inputs/sensor_community/2023-08/2023-08_pms3003.zip size: 0.81MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_pms5003.zip to data/inputs/sensor_community/2023-08/2023-08_pms5003.zip size: 38.03MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_pms6003.zip to data/inputs/sensor_community/2023-08/2023-08_pms6003.zip size: 0.10MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_pms7003.zip to data/inputs/sensor_community/2023-08/2023-08_pms7003.zip size: 30.02MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_ppd42ns.zip to data/inputs/sensor_community/2023-08/2023-08_ppd42ns.zip size: 1.61MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-08/2023-08_sds011.zip to data/inputs/sensor_community/2023-08/2023-08_sds011.zip size: 4357.02MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_bme280.zip to data/inputs/sensor_community/2023-07/2023-07_bme280.zip size: 2270.40MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_bmp180.zip to data/inputs/sensor_community/2023-07/2023-07_bmp180.zip size: 24.52MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_bmp280.zip to data/inputs/sensor_community/2023-07/2023-07_bmp280.zip size: 51.95MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_dht22.zip to data/inputs/sensor_community/2023-07/2023-07_dht22.zip size: 1616.91MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_ds18b20.zip to data/inputs/sensor_community/2023-07/2023-07_ds18b20.zip size: 3.05MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_hpm.zip to data/inputs/sensor_community/2023-07/2023-07_hpm.zip size: 0.66MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_htu21d.zip to data/inputs/sensor_community/2023-07/2023-07_htu21d.zip size: 19.50MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_pms1003.zip to data/inputs/sensor_community/2023-07/2023-07_pms1003.zip size: 0.72MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_pms3003.zip to data/inputs/sensor_community/2023-07/2023-07_pms3003.zip size: 0.75MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_pms5003.zip to data/inputs/sensor_community/2023-07/2023-07_pms5003.zip size: 38.15MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_pms6003.zip to data/inputs/sensor_community/2023-07/2023-07_pms6003.zip size: 0.09MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_pms7003.zip to data/inputs/sensor_community/2023-07/2023-07_pms7003.zip size: 29.41MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_ppd42ns.zip to data/inputs/sensor_community/2023-07/2023-07_ppd42ns.zip size: 1.44MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-07/2023-07_sds011.zip to data/inputs/sensor_community/2023-07/2023-07_sds011.zip size: 4177.06MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_bme280.zip to data/inputs/sensor_community/2023-06/2023-06_bme280.zip size: 2330.89MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_bmp180.zip to data/inputs/sensor_community/2023-06/2023-06_bmp180.zip size: 25.81MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_bmp280.zip to data/inputs/sensor_community/2023-06/2023-06_bmp280.zip size: 53.47MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_dht22.zip to data/inputs/sensor_community/2023-06/2023-06_dht22.zip size: 1697.64MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_ds18b20.zip to data/inputs/sensor_community/2023-06/2023-06_ds18b20.zip size: 3.08MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_hpm.zip to data/inputs/sensor_community/2023-06/2023-06_hpm.zip size: 0.71MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_htu21d.zip to data/inputs/sensor_community/2023-06/2023-06_htu21d.zip size: 21.06MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_pms1003.zip to data/inputs/sensor_community/2023-06/2023-06_pms1003.zip size: 0.94MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_pms3003.zip to data/inputs/sensor_community/2023-06/2023-06_pms3003.zip size: 0.91MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_pms5003.zip to data/inputs/sensor_community/2023-06/2023-06_pms5003.zip size: 40.61MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_pms6003.zip to data/inputs/sensor_community/2023-06/2023-06_pms6003.zip size: 0.10MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_pms7003.zip to data/inputs/sensor_community/2023-06/2023-06_pms7003.zip size: 32.11MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_ppd42ns.zip to data/inputs/sensor_community/2023-06/2023-06_ppd42ns.zip size: 1.47MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-06/2023-06_sds011.zip to data/inputs/sensor_community/2023-06/2023-06_sds011.zip size: 4323.90MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_bme280.zip to data/inputs/sensor_community/2023-05/2023-05_bme280.zip size: 2412.53MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_bmp180.zip to data/inputs/sensor_community/2023-05/2023-05_bmp180.zip size: 27.37MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_bmp280.zip to data/inputs/sensor_community/2023-05/2023-05_bmp280.zip size: 56.33MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_dht22.zip to data/inputs/sensor_community/2023-05/2023-05_dht22.zip size: 1757.08MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_ds18b20.zip to data/inputs/sensor_community/2023-05/2023-05_ds18b20.zip size: 3.43MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_hpm.zip to data/inputs/sensor_community/2023-05/2023-05_hpm.zip size: 0.82MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_htu21d.zip to data/inputs/sensor_community/2023-05/2023-05_htu21d.zip size: 21.70MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_pms1003.zip to data/inputs/sensor_community/2023-05/2023-05_pms1003.zip size: 1.10MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_pms3003.zip to data/inputs/sensor_community/2023-05/2023-05_pms3003.zip size: 0.99MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_pms5003.zip to data/inputs/sensor_community/2023-05/2023-05_pms5003.zip size: 41.99MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_pms6003.zip to data/inputs/sensor_community/2023-05/2023-05_pms6003.zip size: 0.04MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_pms7003.zip to data/inputs/sensor_community/2023-05/2023-05_pms7003.zip size: 33.26MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_ppd42ns.zip to data/inputs/sensor_community/2023-05/2023-05_ppd42ns.zip size: 1.40MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-05/2023-05_sds011.zip to data/inputs/sensor_community/2023-05/2023-05_sds011.zip size: 4540.87MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_bme280.zip to data/inputs/sensor_community/2023-04/2023-04_bme280.zip size: 2275.42MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_bmp180.zip to data/inputs/sensor_community/2023-04/2023-04_bmp180.zip size: 25.63MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_bmp280.zip to data/inputs/sensor_community/2023-04/2023-04_bmp280.zip size: 52.90MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_dht22.zip to data/inputs/sensor_community/2023-04/2023-04_dht22.zip size: 1677.28MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_ds18b20.zip to data/inputs/sensor_community/2023-04/2023-04_ds18b20.zip size: 3.33MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_hpm.zip to data/inputs/sensor_community/2023-04/2023-04_hpm.zip size: 0.75MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_htu21d.zip to data/inputs/sensor_community/2023-04/2023-04_htu21d.zip size: 20.22MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_pms1003.zip to data/inputs/sensor_community/2023-04/2023-04_pms1003.zip size: 1.23MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_pms3003.zip to data/inputs/sensor_community/2023-04/2023-04_pms3003.zip size: 0.96MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_pms5003.zip to data/inputs/sensor_community/2023-04/2023-04_pms5003.zip size: 37.58MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_pms6003.zip to data/inputs/sensor_community/2023-04/2023-04_pms6003.zip size: 0.10MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_pms7003.zip to data/inputs/sensor_community/2023-04/2023-04_pms7003.zip size: 31.26MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_ppd42ns.zip to data/inputs/sensor_community/2023-04/2023-04_ppd42ns.zip size: 1.33MB


Unzipping data
Downloading https://archive.sensor.community/csv_per_month/2023-04/2023-04_sds011.zip to data/inputs/sensor_community/2023-04/2023-04_sds011.zip size: 4316.62MB


KeyboardInterrupt: 
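
The run above was stopped by hand, and a single month of `sds011` data alone is over 4 GB. As an alternative to interrupting, the file links could be filtered before anything is downloaded. This is a minimal sketch under stated assumptions: `select_hrefs`, `sensor_types`, and `months` are hypothetical names that do not exist in the notebook, and the sample hrefs are taken from the log above.

```python
import re

# Same pattern as file_regex in the notebook: "<YYYY-MM>_<sensor_type>.<filetype>"
FILE_REGEX = re.compile(r"^(\d\d\d\d-\d\d)_([^\.]+)\.(.+)$")


def select_hrefs(hrefs, sensor_types=None, months=None):
    """Keep only hrefs whose month and sensor type are wanted.

    sensor_types and months are optional collections; None means "no filter".
    Hypothetical helper, not part of the original notebook.
    """
    selected = []
    for href in hrefs:
        match = FILE_REGEX.match(href)
        if match is None:
            continue  # not a monthly data file (for example, a parent-directory link)
        month, sensor_type, _filetype = match.groups()
        if sensor_types is not None and sensor_type not in sensor_types:
            continue
        if months is not None and month not in months:
            continue
        selected.append(href)
    return selected


hrefs = ["2023-08_bme280.zip", "2023-08_sds011.zip", "2023-07_sds011.zip", "../"]
print(select_hrefs(hrefs, sensor_types={"sds011"}, months={"2023-08"}))
# → ['2023-08_sds011.zip']
```

Inside the download loop, the same check could be applied to each `link["href"]` before calling `download_file_stream`, so the notebook finishes on its own once the selected files are present.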