Standardize filenames to be similar to other datasets

All of the other archivers use a file naming pattern like:

dataset-part1-part2.ext

where:

- dataset is the name of the dataset (e.g. ferc1, eia923, mshamines)
- part1 is the value of the first partition dimension contained in the file (e.g. ca, ny, wy for states, or 2019 if it's a year)
- part2 is the value of the second partition dimension contained in the file, if applicable
- ext is the file name extension indicating file type

I've changed the naming for phmsagas to follow the `dataset-*` part of the naming convention, but have kept the rest of the original filename rather than using only the dataset and start year. Each file contains several years of data, so using just the start year would make it look like years are missing from the archive. The dataset and start_year partitions are still used in datapackage.json to refer to the individual resources for programmatic purposes.

This came up when working on #79
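
As a rough illustration of the convention described above, here is a minimal sketch. The helper names `archive_filename`, `phmsagas_partitions`, and `phmsagas_archive_name` are hypothetical and not part of pudl_archiver. The sketch also assumes the archiver's `name` attribute is "phmsagas", which the diff does not show directly.

```python
def archive_filename(dataset: str, *parts: object, ext: str = "zip") -> str:
    """Build a name of the form dataset-part1-part2.ext (hypothetical helper)."""
    return "-".join([dataset, *(str(p) for p in parts)]) + f".{ext}"


def phmsagas_partitions(stem: str) -> tuple[str, int]:
    """Derive (dataset, start_year) from a PHMSA file stem.

    Mirrors the parsing in the diff: the last two underscore-separated
    tokens are the start year and the end marker (e.g. "2017", "present").
    """
    filename = stem.replace("-", "_")
    tokens = filename.split("_")
    dataset = "_".join(tokens[0:-2]).lower()
    start_year = int(tokens[-2])
    return dataset, start_year


def phmsagas_archive_name(stem: str, archiver_name: str = "phmsagas") -> str:
    """Prefix the original stem with the archiver name (assumed "phmsagas")."""
    return f"{archiver_name}-{stem.replace('-', '_')}.zip"


stem = "annual_underground_natural_gas_storage_2017_present"
print(archive_filename("eia923", 2019))
print(phmsagas_partitions(stem))
print(phmsagas_archive_name(stem))
```

The phmsagas name keeps the full year range from the original file, while the partitions returned by `phmsagas_partitions` are the values that would identify the resource in datapackage.json.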
catalyst-cooperative · Feb 28, 2023 · 019d5d7
1 parent f75ef87
Showing 1 changed file with 4 additions and 4 deletions.
diff --git a/src/pudl_archiver/archivers/phmsagas.py b/src/pudl_archiver/archivers/phmsagas.py
@@ -66,18 +66,18 @@ async def get_zip_resource(
         For example: annual_underground_natural_gas_storage_2017_present.zip
         """
         url = f"https://www.phmsa.dot.gov/{link}"
-        file = str(match.group(1)).replace("-", "_")  # Get file name
+        filename = str(match.group(1)).replace("-", "_")  # Get file name
 
         # Set dataset partition
-        dataset = "_".join(file.lower().split("_")[0:-2])
+        dataset = "_".join(filename.lower().split("_")[0:-2])
 
         if dataset not in PHMSA_DATASETS:
             logger.warning(f"New dataset type found: {dataset}.")
 
         # Set start year
-        start_year = int(file.split("_")[-2])
+        start_year = int(filename.split("_")[-2])
 
-        download_path = self.download_directory / f"phmsagas_{file}.zip"
+        download_path = self.download_directory / f"{self.name}-{filename}.zip"
         await self.download_zipfile(url, download_path)
 
         return ResourceInfo(