## Energy - PV Systems Satellite Machine Vision

### [Global inventory of solar photovoltaic generating units](https://www.nature.com/articles/s41586-021-03957-7)

* We mapped every large solar plant on the planet using satellites and machine learning - [EconoTimes](https://www.econotimes.com/We-mapped-every-large-solar-plant-on-the-planet-using-satellites-and-machine-learning-1620330)


* Nature [dataset](https://zenodo.org/record/5005868)

#### Cross-validation (CV) polygons and tiles

* ***cv_tiles.geojson:*** 560 rectangular areas-of-interest used for sampling cross-validation data, seeded from the [WRI Global Power Plant Database (GPPDB)](https://www.wri.org/research/global-database-power-plants).

* ***cv_polygons.geojson:*** 6,281 polygons corresponding to all PV solar generating units present in cv_tiles.geojson at the end of 2018.

#### Test tiles and polygons

* ***test_tiles.geojson:*** 122 rectangular regions-of-interest used for building the test set.

* ***test_polygons.geojson:*** 7,263 polygons corresponding to all utility-scale (>10 kW) solar generating units present in test_tiles.geojson at the end of 2018.

#### Training (TRN) tiles and polygons

* ***trn_tiles.geojson:*** 18,570 rectangular areas-of-interest used for sampling training patch data.

* ***trn_polygons.geojson:*** 36,882 polygons obtained from OpenStreetMap (OSM) in 2017, used to label training patches.

#### Predicted

* ***predicted_set.geojson:*** 68,661 polygons predicted in the global deployment, capturing the status of deployed photovoltaic solar energy generating capacity at the end of 2018.
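The feature counts listed above can serve as a quick sanity check after a download. The sketch below is a minimal, standard-library-only illustration: `EXPECTED_COUNTS` simply transcribes the numbers from this list, `count_features` and `check_count` are hypothetical helper names rather than part of the published dataset tooling, and the predicted file is keyed by the name used in this notebook's download URL (`predicted_set.geojson`).

```python
import json

# Feature counts transcribed from the file descriptions above.
EXPECTED_COUNTS = {
    "cv_tiles.geojson": 560,
    "cv_polygons.geojson": 6281,
    "test_tiles.geojson": 122,
    "test_polygons.geojson": 7263,
    "trn_tiles.geojson": 18570,
    "trn_polygons.geojson": 36882,
    "predicted_set.geojson": 68661,
}


def count_features(geojson_text):
    """Return the number of features in a GeoJSON FeatureCollection string."""
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return len(collection.get("features", []))


def check_count(filename, geojson_text):
    """Compare a file's feature count with the documented count.

    Returns (matches, found, expected).
    """
    found = count_features(geojson_text)
    expected = EXPECTED_COUNTS[filename]
    return found == expected, found, expected


# Tiny synthetic collection, used only to show the call pattern.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": None},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
})
print(count_features(sample))
print(check_count("test_tiles.geojson", sample))
```

In practice the text passed to `check_count` would be the contents of a downloaded file; a mismatch would suggest a truncated download or a changed dataset version.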

In [None]:
def get_dataset ( 
    url       =  'https://zenodo.org/record/5005868/files/{dataset}?download=1' ,
    data_dir  =  '/data/energy/global/nature_dataset/'
) :

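    import  os
    import  geopandas  as  gpd

    # make sure the output directory exists before any file is written into it
    os . makedirs ( data_dir , exist_ok = True )

    data = dict ( )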

    files = [ 'cv' , 'test' , 'trn' ]

    for file in files:

        for pORt in [ 'polygons' , 'tiles' ] :

            dataset = file + '_' + pORt + '.geojson'
            print ( dataset )
            
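            try :
                data [ dataset ] = gpd . read_file ( url . format ( dataset = dataset ) )
            except Exception as err :
                # skip this file; without the continue, the save below would raise a KeyError
                print ( 'could not read' , dataset , err )
                continue

            # state the driver explicitly so the output is GeoJSON regardless of geopandas version
            data [ dataset ] . to_file ( data_dir + dataset , driver = 'GeoJSON' )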
            
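    # the global prediction set does not follow the <split>_<kind> naming pattern used above
    dataset = 'predicted_set.geojson'
    data [ dataset ] = gpd . read_file ( url . format ( dataset = dataset ) )
    data [ dataset ] . to_file ( data_dir + dataset , driver = 'GeoJSON' )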
    
    return data

In [None]:
nature_datasets = get_dataset ( )
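
Before running the full download, it can help to preview which files `get_dataset` will request. The sketch below rebuilds the same filename and URL pattern with the standard library only; `dataset_urls` is a hypothetical helper, and the URL template is copied from the default argument of `get_dataset`.

```python
URL_TEMPLATE = "https://zenodo.org/record/5005868/files/{dataset}?download=1"


def dataset_urls(url=URL_TEMPLATE):
    """Return a {filename: url} mapping for the files get_dataset requests."""
    # Six files follow the <split>_<kind>.geojson pattern ...
    names = [
        f"{prefix}_{kind}.geojson"
        for prefix in ("cv", "test", "trn")
        for kind in ("polygons", "tiles")
    ]
    # ... and the global prediction set is requested separately.
    names.append("predicted_set.geojson")
    return {name: url.format(dataset=name) for name in names}


for name, link in dataset_urls().items():
    print(name, link)
```

Keeping the mapping separate from the download step makes it easy to fetch a single file, or to skip the large prediction set while testing.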

In [None]:
import  geopandas  as  gpd
import  matplotlib.pyplot as plt

%matplotlib widget

## ~275 MB file; download and parsing take about 1 minute
predicted     =  gpd . read_file ( 'https://zenodo.org/record/5005868/files/predicted_set.geojson?download=1' )
USpredicted   =  predicted [ predicted [ 'iso-3166-1' ] == 'US' ] . reset_index ( drop = True ) . copy ( )
MApredicted   =  predicted [ predicted [ 'iso-3166-2' ] == 'US-MA' ] . reset_index ( drop = True ) . copy ( )
# MApredicted . to_file ( '/data/energy/global/nature_dataset/MApredicted.geojson' )

print ( 
    "Predicted PV installations: {predicted}, of which {USpredicted} are in the US"  . format ( 
        predicted   =  len(predicted) , 
        USpredicted =  len(USpredicted)
    )
)

## get US state boundaries from US Census 
states = gpd . read_file ( 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_5m.zip' ) . to_crs ( "EPSG:4326" ) 

fig, ax = plt . subplots ( ) 

ax1  =  USpredicted . geometry . boundary . plot ( color = 'blue' , ax = ax )
ax2  =  states . geometry . boundary . plot ( color = 'lightgray' , linewidth = 1 , ax = ax )

ax . set_axis_off ( )

#plt . title ( 'US Commercial Solar Arrays\n2018 Satellite Survey' , fontsize = 14 )
plt . show ( )
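
The country and state filters above rely on the `iso-3166-1` and `iso-3166-2` properties attached to each predicted polygon. For readers without geopandas, the same selection can be sketched on raw GeoJSON with the standard library. `filter_features` is a hypothetical helper, and the three features below are synthetic placeholders, not records from the dataset.

```python
import json


def filter_features(collection, key, value):
    """Return a new FeatureCollection keeping features whose property matches."""
    kept = [
        feature
        for feature in collection.get("features", [])
        # "properties" may be null in valid GeoJSON, so fall back to an empty dict
        if (feature.get("properties") or {}).get(key) == value
    ]
    return {"type": "FeatureCollection", "features": kept}


# Synthetic features for illustration only.
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"iso-3166-1": "US", "iso-3166-2": "US-MA"}},
        {"type": "Feature", "geometry": None,
         "properties": {"iso-3166-1": "US", "iso-3166-2": "US-CA"}},
        {"type": "Feature", "geometry": None,
         "properties": {"iso-3166-1": "DE", "iso-3166-2": "DE-BY"}},
    ],
}

us = filter_features(collection, "iso-3166-1", "US")
ma = filter_features(collection, "iso-3166-2", "US-MA")
print(len(us["features"]), len(ma["features"]))

# The filtered subset can be serialized back to GeoJSON text.
ma_text = json.dumps(ma)
```

This mirrors the boolean-mask filters used for `USpredicted` and `MApredicted`, assuming the property names match the column names used above.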