Environmental impact

Candace Makeda Moore, MD edited this page Dec 19, 2021 · 2 revisions

This page will be updated as more information becomes available. For now, let's take a look at the big picture:

Big data has big consequences for the environment, depending on how it is stored, retrieved, and processed. Large radiology datasets can easily reach 'big data' sizes. There is, however, reason to be cautiously optimistic about the potential environmental impact. Mainstream cloud data storage facilities are getting much greener in how they source the energy to cool their servers. Newer hardware is far more energy efficient than older hardware. And every now and then someone realizes it may be better to use existing servers more efficiently than to keep buying more servers.

cleanX cleans up big datasets. However, the effect of this cleaning on the efficiency of neural nets is something we are still measuring, and it appears to be different for each dataset. In theory, in most cases, cleanX might allow slightly smaller datasets, which, if processed repeatedly over time, would consume less energy. On the other hand, running cleanX also takes energy.
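The trade-off above can be put as simple arithmetic: cleaning pays for itself once the energy saved by processing fewer images exceeds the energy spent on cleaning. The sketch below is only an illustration. The function name `break_even_passes` and every number in it are hypothetical placeholders, not measurements of cleanX, and it assumes energy use scales linearly with the number of images processed.

```python
import math


def break_even_passes(cleaning_energy_wh, energy_per_image_wh, images_removed):
    """Number of full passes over the dataset after which cleaning has paid for itself.

    All arguments are hypothetical inputs in watt-hours; none are measured values.
    Assumes the energy of one pass scales linearly with the number of images.
    """
    if images_removed <= 0 or energy_per_image_wh <= 0:
        # Nothing is saved per pass, so the cleaning cost is never recovered.
        return math.inf
    saved_per_pass_wh = energy_per_image_wh * images_removed
    return math.ceil(cleaning_energy_wh / saved_per_pass_wh)


# Illustrative numbers only: cleaning costs 500 Wh and removes 2000 images
# that would each have cost 0.0625 Wh per pass.
print(break_even_passes(500.0, 0.0625, 2000))  # prints 4
```

With real measurements in place of the placeholders, a result larger than the number of passes a dataset will ever see would suggest that cleaning costs more energy than it saves.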

One issue we are actively looking into is the total environmental impact of cleanX, from the impact of the chips in the personal computers we build it on to the impact incurred every time a large dataset is processed. At present we believe the largest part of the impact depends on where the large dataset is stored and retrieved from. Stay tuned for more quantitative results in this area.

Update December 2021: a survey of current tools and literature found that the codecarbon and energyusage Python packages are the most appropriate ones for monitoring this code's energy use. However, most energy use is probably related to the choice of data storage facility. We have also discussed with some contributors the pros and cons of using relative run-times as a surrogate for relative energy consumption.
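As a rough sketch of the two approaches mentioned in this update, the helpers below time a function call (relative run-time as a surrogate) and, optionally, wrap the same call in codecarbon's `EmissionsTracker`. The names `time_call`, `track_call`, and `clean_step` are hypothetical and are not part of cleanX. `clean_step` is only a stand-in workload. `track_call` assumes codecarbon is installed and imports it lazily so that the timing helper works without it.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def track_call(fn, *args, **kwargs):
    """Return (result, estimated emissions in kg CO2-eq) for a single call of fn.

    Assumes `pip install codecarbon`; the import is lazy so the rest of this
    sketch runs without the package.
    """
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker()
    tracker.start()
    try:
        result = fn(*args, **kwargs)
    finally:
        # stop() returns codecarbon's emissions estimate for the tracked span.
        emissions_kg = tracker.stop()
    return result, emissions_kg


def clean_step(n):
    """Hypothetical stand-in for a cleaning step; not a cleanX function."""
    return sum(i * i for i in range(n))


# Compare run-times of a small and a large workload.
_, small = time_call(clean_step, 10_000)
_, large = time_call(clean_step, 1_000_000)
print(f"relative run-time: {large / small:.1f}x")
```

Run-time ratios are only a proxy: they ignore differences in hardware power draw and do not capture the storage and retrieval costs that probably dominate.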
