# ExtractingVegetaion Index -VI form Raster data for Italy

- The raster data model is a widely used method of storing geographic data.

- The model most commonly takes the form of a grid-like structure that holds values at regularly spaced intervals over the extent of the raster

- Cmparwd to vector models raster model represents location as `cells`, also in a `Cartesian coordinate system` whereas vector model represents feature shape accurately.

- The raster model represents rectangular areas and thus is more generalized and less accurate

- Raster GISs have superior **analytical power** to vector GISs

**1. Importing modules**

In [5]:
# import Google earth engine module
import ee

#Authenticate the Google earth engine with google account
# First firt setup only, no need to run this after first run


#ee.Authenticate()

# for normal/regular use for authorization
# this is required regularly
ee.Initialize()

In [6]:
#Pandas modules to interact data
import numpy as np
import pandas as pd

**2. Define NDVI and EVI for Sentinel-2 image**

- while computing **NDIV** is straightforward, defining expression for **EVI** allow to consider correction factors 

In [7]:
# compute NDVI from NIR and red band in sentinel -2 image
# For other satellite image, please change the band information accordingly

def getNDVI(image):
    
    # Normalized difference vegetation index (NDVI)
    ndvi = image.normalizedDifference(['B8','B4']).rename("NDVI")
    image = image.addBands(ndvi)

    return(image)

# compute EVI from NIR and red band in sentinel -2 image
# For other satellite image, please change the band information accordingly

def getEVI(image):
   # Compute the EVI using an expression.
    EVI = image.expression(
        '2.5 * ((NIR - RED) / (NIR + 6 * RED - 7.5 * BLUE + 1))', {
            'NIR': image.select('B8').divide(10000),
            'RED': image.select('B4').divide(10000),
            'BLUE': image.select('B2').divide(10000)
        }).rename("EVI")

    image = image.addBands(EVI)

    return(image)

**3 Date parameter for image**

- It is most important part for time series study.

- Followed date format is `YYYYMMdd`

In [8]:
#add date foramte
def addDate(image):
    img_date = ee.Date(image.date())
    img_date = ee.Number.parse(img_date.format('YYYYMMdd'))
    return image.addBands(ee.Image(img_date).rename('date').toInt())

**4.Filter the imagery to desired timeframe and other parameters**

- I have limited the filters for sample only.


In [9]:
Sentinel_data = ee.ImageCollection('COPERNICUS/S2') \
    .filterDate("2020-01-01","2021-08-31") \
    .map(getNDVI).map(getEVI).map(addDate)

**5. Import location of Cities of Italy.**

Location of Cities of Italy collected [here](https://simplemaps.com/data/it-cities) and cleaned for NDVI and EVI 

In [10]:
import pandas as pd
df_itally = pd.read_excel('C:\\Users\\user\\OneDrive\\Desktop\\Omdena_Milna\\Italy_Cities.xlsx')
df_itally.head()

Unnamed: 0,city,lat,lng,country,iso2,admin_name,capital,population,population_proper
0,Rome,41.8931,12.4828,Italy,IT,Lazio,primary,2872800,2872800
1,Milan,45.4669,9.19,Italy,IT,Lombardy,admin,1366180,1366180
2,Naples,40.8333,14.25,Italy,IT,Campania,admin,966144,966144
3,Turin,45.0667,7.7,Italy,IT,Piedmont,admin,870952,870952
4,Palermo,38.1157,13.3613,Italy,IT,Sicilia,admin,668405,668405


In [11]:
df_itally.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3373 entries, 0 to 3372
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   city               3373 non-null   object 
 1   lat                3373 non-null   float64
 2   lng                3373 non-null   float64
 3   country            3373 non-null   object 
 4   iso2               3373 non-null   object 
 5   admin_name         3373 non-null   object 
 6   capital            115 non-null    object 
 7   population         3373 non-null   int64  
 8   population_proper  3373 non-null   int64  
dtypes: float64(2), int64(2), object(5)
memory usage: 237.3+ KB


- Only, **City**, **lat** and **lng** are essentisl to our analysis, hence we drop others

In [12]:
italy_df =df_itally.drop(columns=['country','iso2','admin_name','capital','admin_name','population','population_proper'],axis=1)
italy_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3373 entries, 0 to 3372
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   city    3373 non-null   object 
 1   lat     3373 non-null   float64
 2   lng     3373 non-null   float64
dtypes: float64(2), object(1)
memory usage: 79.2+ KB


**6.Convert the pandas dataframe into Google Earth Engine (GEE) Feature Collection**

- This is to constrcu  a point geometry for each City based on thier  coordinates

- another important issue is `proprty of the point constructed` that can be invoked as dictionary

In [13]:
features=[]
for index, row in italy_df.iterrows():
    poi_geometry = ee.Geometry.Point([row['lat'], row['lng']])
#     print(poi_geometry)
#     construct the attributes (properties) for each point 
    poi_properties = dict(row)
#     construct feature combining geometry and properties
    poi_feature = ee.Feature(poi_geometry, poi_properties)
    features.append(poi_feature)

    # final Feature collection assembly
ee_fc = ee.FeatureCollection(features) 
ee_fc.getInfo()

{'type': 'FeatureCollection',
 'columns': {'city': 'String',
  'lat': 'Number',
  'lng': 'Number',
  'system:index': 'String'},
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [41.8931, 12.4828]},
   'id': '0',
   'properties': {'city': 'Rome', 'lat': 41.8931, 'lng': 12.4828}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [45.4669, 9.19]},
   'id': '1',
   'properties': {'city': 'Milan', 'lat': 45.4669, 'lng': 9.19}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [40.8333, 14.25]},
   'id': '2',
   'properties': {'city': 'Naples', 'lat': 40.8333, 'lng': 14.25}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [45.0667, 7.7]},
   'id': '3',
   'properties': {'city': 'Turin', 'lat': 45.0667, 'lng': 7.7}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [38.1157, 13.3613]},
   'id': '4',
   'properties': {'city': 'Palermo', 'lat': 38.1157, 'lng': 13.3613}},
  {'ty

**7. Extract the raster values for each features**

We will use sampleRegions function from ee.image and apply to image collection

In [14]:
def rasterExtraction(image):
    feature = image.sampleRegions(
        collection = ee_fc, # feature collection here
        scale = 10 # Cell size of raster
    )
    return feature

**8. Apply raster extraction functions over image collection**

- `sampleRegions` returns feature collection with image values, then we have collection of feature collection which is then flattened to obtain final feature collection. 

- Finally feature collection is converted to CSV.  


In [15]:
results = Sentinel_data.filterBounds(ee_fc).select('NDVI', 'EVI').map(addDate).map(rasterExtraction).flatten()

**9.Verify that output is as per your requirments**

In [16]:
sample_result = results.first().getInfo()
sample_result

{'type': 'Feature',
 'geometry': None,
 'id': '20210101T075331_20210101T075325_T37PBP_1461_0',
 'properties': {'EVI': 0.16712218289349878,
  'NDVI': 0.2624911963939667,
  'city': 'Pantelleria',
  'date': 20210101,
  'lat': 36.7875,
  'lng': 11.9925}}

**10. Now we have extracted the raster values, We need to convert feature collection to CSV format.**

We can acheive this in multiple ways, I will illustrate in 3 ways for extraction. 

In [17]:
# extract the properties column from feature collection
# column order may not be as our sample data order
columns = list(sample_result['properties'].keys())
print(columns)


# Order data column as per sample data
# You can modify this for better optimization
column_df = list(italy_df.columns)
column_df.extend(['NDVI', 'EVI', 'date'])
print(column_df)

['EVI', 'NDVI', 'city', 'date', 'lat', 'lng']
['city', 'lat', 'lng', 'NDVI', 'EVI', 'date']


**11.Method 1: Feature collection to pandas dataframe**

In [None]:
nested_list = results.reduceColumns(ee.Reducer.toList(len(column_df)), column_df).values().get(0)
data = nested_list.getInfo()
data

# dont forget we need to call the callback method "getInfo" to retrieve the data
df = pd.DataFrame(data, columns=column_df)
# we obtain the data frame as per our demand
df

**Data visualization of results:**

In [None]:
# import libraries
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# plot line plot for single point only.
df_filtered = df[df['id']==1]
df_filtered['date'] = pd.to_datetime(df_filtered['date'], format='%Y%m%d')
df_filtered

In [None]:
# Using plotly.express
import plotly.express as px

In [None]:
df = px.data.stocks()
fig = px.line(df_filtered, x='date', y="EVI")
fig.show()df = px.data.stocks()

In [None]:
fig = px.line(df_filtered, x='date', y="NDVI")
fig.show()
