bitcwb/ClimateSpark

 
 


ClimateSpark

Overview

ClimateSpark is a Spark-based distributed computing framework for managing and analyzing big climate data. It has the following capabilities:

  1. Native support for HDF4 and NetCDF4 datasets stored in HDFS
  2. Efficient spatiotemporal queries over multi-dimensional array-based climate data, with high data locality and no redundant data reads
  3. ClimateRDD: a multi-dimensional array-based data model for Spark to organize climate data
  4. Basic climate data analytics
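
To illustrate what "no redundant data reads" can mean for chunked array data, here is a minimal sketch of index-based chunk selection: given the corner and shape of each chunk, only chunks that overlap the query box are selected for reading. This is not ClimateSpark's actual code; the class `ChunkQuerySketch`, the `Chunk` type, and the methods `intersects` and `select` are hypothetical names introduced for illustration, and the half-open index ranges are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of the ClimateSpark code base.
public class ChunkQuerySketch {

    /** One chunk of a multi-dimensional array: start index and extent per dimension. */
    static final class Chunk {
        final int[] corner;
        final int[] shape;

        Chunk(int[] corner, int[] shape) {
            this.corner = corner;
            this.shape = shape;
        }
    }

    /** True if the chunk overlaps the half-open box [qStart, qEnd) in every dimension. */
    static boolean intersects(Chunk c, int[] qStart, int[] qEnd) {
        for (int d = 0; d < c.corner.length; d++) {
            int cStart = c.corner[d];
            int cEnd = cStart + c.shape[d];
            if (cEnd <= qStart[d] || cStart >= qEnd[d]) {
                return false; // disjoint along dimension d, so the chunk can be skipped
            }
        }
        return true;
    }

    /** Returns only the chunks that have to be read to answer the query. */
    static List<Chunk> select(List<Chunk> index, int[] qStart, int[] qEnd) {
        List<Chunk> hits = new ArrayList<>();
        for (Chunk c : index) {
            if (intersects(c, qStart, qEnd)) {
                hits.add(c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A (time, lat, lon) array of size 4 x 4 x 4, stored as chunks of 1 x 2 x 2.
        List<Chunk> index = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            for (int lat = 0; lat < 4; lat += 2) {
                for (int lon = 0; lon < 4; lon += 2) {
                    index.add(new Chunk(new int[] {t, lat, lon}, new int[] {1, 2, 2}));
                }
            }
        }
        // Query: time [1, 3), lat [0, 2), lon [1, 3).
        List<Chunk> hits = select(index, new int[] {1, 0, 1}, new int[] {3, 2, 3});
        System.out.println(hits.size() + " of " + index.size() + " chunks need to be read");
        // prints "4 of 16 chunks need to be read"
    }
}
```

In a distributed setting, a list like `hits` could then be used to schedule reads on the nodes that hold the corresponding blocks, which is one way to obtain data locality; how ClimateSpark does this is not described here.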

Tutorial

https://docs.google.com/document/d/1JIMLhNzXA_Ay-0P6yxzGfvUhajMLfJFyyduillzpeSI/edit

  • Extract the metadata: hadoop jar sia-core/target/sia-core-0.1.0.jar properties/sia_merra2_preprocessor.properties
  • Build index: hadoop jar sia-indexer/target/sia-indexer-0.1.0.jar properties/sia_merra2_indexer.properties
