# Scraping Water Level Data from Flood Forecast Stations

The Central Water Commission (CWC) collects and maintains water level data from its flood forecast stations and publishes it on the [Flood Forecast System (FFS) website](https://ffs.india-water.gov.in/). This notebook scrapes the data for the flood forecast stations in Assam from that website.

In [3]:
master_url='https://ffs.india-water.gov.in/iam/api/layer-station//all/0035-CDBNG,029CDBNG,009%20%20%20%20%20-srdcbe,011-UBDDIB,015-LGDHYD,009-CDBNG,028-LGDHYD,027-LGDHYD,017-LGDHYD,049-CDBNG,031-MGD1LKN,023-mgd4ptn,006-UKDPUNE,013-lkdhyd,011-SRDCBE,022-CDBNG,014-lgdhyd,006-MGD1LKN,016-LGDHYD,013-lgdhyd,027-cdbng,0044-CDBNG,011-mgd4ptn,CW1WAR000056,012-TDSURAT,002-mgd2lkn,007-MGD1LKN,004-CDBNG,037-LKDHYD,010-MGD1LKn,039-TDSURAT,020-mgd4ptn,013-SRDCBE,003-SWRDKOCHI'
print(master_url)

https://ffs.india-water.gov.in/iam/api/layer-station//all/0035-CDBNG,029CDBNG,009%20%20%20%20%20-srdcbe,011-UBDDIB,015-LGDHYD,009-CDBNG,028-LGDHYD,027-LGDHYD,017-LGDHYD,049-CDBNG,031-MGD1LKN,023-mgd4ptn,006-UKDPUNE,013-lkdhyd,011-SRDCBE,022-CDBNG,014-lgdhyd,006-MGD1LKN,016-LGDHYD,013-lgdhyd,027-cdbng,0044-CDBNG,011-mgd4ptn,CW1WAR000056,012-TDSURAT,002-mgd2lkn,007-MGD1LKN,004-CDBNG,037-LKDHYD,010-MGD1LKn,039-TDSURAT,020-mgd4ptn,013-SRDCBE,003-SWRDKOCHI


In [1]:
station_more_details_url = 'https://ffs.india-water.gov.in/iam/api/flood-forecast-static/specification/?specification=%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Level%22%7D%7D,%22or%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Inflow%22%7D%7D%7D'
print(station_more_details_url)

https://ffs.india-water.gov.in/iam/api/flood-forecast-static/specification/?specification=%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Level%22%7D%7D,%22or%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Inflow%22%7D%7D%7D
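
The `specification` query parameter in the URL above is percent-encoded JSON, which is hard to read in its raw form. The following is a minimal sketch, using only the Python standard library, that decodes it so the filter can be inspected. The variable name `spec` is introduced here for illustration; how the server interprets the `where`/`or` keys is not documented in this notebook.

```python
import json
from urllib.parse import urlsplit, parse_qs

station_more_details_url = 'https://ffs.india-water.gov.in/iam/api/flood-forecast-static/specification/?specification=%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Level%22%7D%7D,%22or%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22type%22,%22operator%22:%22eq%22,%22value%22:%22Inflow%22%7D%7D%7D'

# parse_qs percent-decodes the query string and returns a dict of lists
query = parse_qs(urlsplit(station_more_details_url).query)

# The decoded value is a JSON document describing the filter
spec = json.loads(query['specification'][0])
print(json.dumps(spec, indent=2))
```

The decoded filter contains two expressions on the `type` field, one with the value `Level` and one with the value `Inflow`.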


In [7]:

# Address details for a single station, once its station code is known.
station_address_details_url = 'https://ffs.india-water.gov.in/iam/api/layer-station/029-LBDJPG'

In [3]:
import requests
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point  # Shapely for converting latitude/longitude to geometry

In [3]:
#URL containing FFS Station locations
station_location_url = 'https://ffs.india-water.gov.in/iam/api/layer-station-geo/specification/?specification=%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22layerStationStationCode.floodForecastStaticStationCode.type%22,%22operator%22:%22eq%22,%22value%22:%22Level%22%7D%7D,%22or%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22layerStationStationCode.floodForecastStaticStationCode.type%22,%22operator%22:%22eq%22,%22value%22:%22Inflow%22%7D%7D%7D'

r = requests.get(station_location_url,verify=False)
txt = r.json()
station_locations_df = pd.DataFrame(txt)
station_locations_df = station_locations_df[['name','stationCode', 'lat', 'lon']]
station_locations_df.tail()



Unnamed: 0,name,stationCode,lat,lon
1516,LOHARA,036-TDSURAT,20.816389,76.763889
1517,Sitarganj,061-mgd2lkn,29.041667,79.766389
1518,KURANKHEDA,038-TDSURAT,20.703056,77.242778
1519,MAHUWA,034-TDSURAT,21.014444,73.140278
1520,GADAT,024-TDSURAT,20.856111,72.984722


In [4]:
station_locations_df.to_csv('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/station_coordinates.csv',
                            index=False)

In [4]:
#URL containing FFS Station details

station_details_url = 'https://ffs.india-water.gov.in/iam/api/layer-station/specification/?specification=%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22floodForecastStaticStationCode.type%22,%22operator%22:%22eq%22,%22value%22:%22Level%22%7D%7D,%22or%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22floodForecastStaticStationCode.type%22,%22operator%22:%22eq%22,%22value%22:%22Inflow%22%7D%7D%7D'

r = requests.get(station_details_url,verify=False)
txt = r.json()
station_details_df = pd.DataFrame(txt)
station_details_df.head()



Unnamed: 0,@class,stationCode,accessibility,address,airport,altitude,bank,busStand,catchmentArea,combinedWith,...,stationTypeId,streamLocalriverId,subdivisionalOfficeId,tahsilId,villageId,wellPurposeExplorationStationCode,wellPurposeId,wellSubtypeId,wellTypeId,wellUseId
0,com.eptisa.dto.SimpleLayerStationDto,038-CDJAPR,,,,,,,,,...,1,146.0,144,1081191004,,,,,,
1,com.eptisa.dto.SimpleLayerStationDto,018-UGDHYD,,,,500.0,,,16907.0,,...,1,226.0,61,1361261001,,,,,,
2,com.eptisa.dto.SimpleLayerStationDto,041-CDJAPR,,,,,,,,,...,1,1796.0,143,1081221015,,,,,,
3,com.eptisa.dto.SimpleLayerStationDto,018MAHGAND,,,,158.5,Right,,5475.0,,...,1,380.0,287,1241041004,,,,,,
4,com.eptisa.dto.SimpleLayerStationDto,017-WGDNGP,,,,171.28,Left,,46020.0,,...,1,243.0,119,1271131003,,,,,,


In [None]:
station_details_df.to_csv('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/station_details.csv',index=False)

In [5]:
r = requests.get(station_more_details_url,verify=False)
txt = r.json()
station_more_details_df = pd.DataFrame(txt)
station_more_details_df.head()



Unnamed: 0,@class,stationCode,dangerLevel,dlEndDate,dlStartDate,frl,frlEndDate,frlStartDate,highestFlowLevel,highestFlowLevelDate,...,mwlStartDate,nearestTown,savedAt,type,warningLevel,wlEndDate,wlStartDate,floodForecastStaticBangladeshReportStationCode,layerStationStationCode,meteorologicalSubdivisionId
0,com.eptisa.dto.SimpleFloodForecastStaticDto,038-CDJAPR,,,,316.0,,,,,...,,,2018-06-04,Inflow,,,,,038-CDJAPR,
1,com.eptisa.dto.SimpleFloodForecastStaticDto,018-UGDHYD,,,,523.6,,,,,...,,Sangareddy,2014-05-20,Inflow,,,,,018-UGDHYD,17.0
2,com.eptisa.dto.SimpleFloodForecastStaticDto,041-CDJAPR,,,,258.62,,,,,...,,,2018-06-04,Inflow,,,,,041-CDJAPR,
3,com.eptisa.dto.SimpleFloodForecastStaticDto,018MAHGAND,192.24,,,189.59,,,,,...,,,2014-06-04,Inflow,187.06,,,,018MAHGAND,2.0
4,com.eptisa.dto.SimpleFloodForecastStaticDto,017-WGDNGP,174.0,,,,,,176.45,1986-08-14,...,,Balharsha,2014-05-21,Level,171.5,,2013-07-17,,017-WGDNGP,33.0


In [6]:
station_more_details_df.to_csv('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/station_more_details.csv',index=False)

In [10]:
assam_polygon = gpd.read_file('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/Assam.geojson')

In [17]:
stations = pd.read_csv('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/station_coordinates.csv')

# Build a geometry column from longitude/latitude pairs
geometry = [Point(xy) for xy in zip(stations['lon'], stations['lat'])]
# Coordinate reference system: WGS84
crs = 'EPSG:4326'
# Create a GeoDataFrame
stations_gdf = gpd.GeoDataFrame(stations, crs=crs, geometry=geometry)

# Spatial join with the Assam state boundary
gpd.sjoin(stations_gdf.to_crs('EPSG:4326'),
          assam_polygon.to_crs('EPSG:4326'),
          how="inner"
         ).to_file('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/assam_stations.geojson',driver='GeoJSON')



In [None]:
assam_stations_gdf = gpd.read_file('/home/krishna/IDS-DRR-Data-Pipeline/Sources/FFS/data/assam_stations.geojson')


In [34]:
startDate = '2016-01-01'
endDate = '2022-07-30'

master_df = pd.read_csv('Waterlevel_assam_stations.csv')

for stationCode in list(assam_stations_gdf.stationCode):
    print(stationCode)
    dynamic_url = 'https://ffs.india-water.gov.in/iam/api/new-entry-data/specification/sorted?sort-criteria=%7B%22sortOrderDtos%22:%5B%7B%22sortDirection%22:%22ASC%22,%22field%22:%22id.dataTime%22%7D%5D%7D&specification=%7B%22where%22:%7B%22where%22:%7B%22where%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22id.stationCode%22,%22operator%22:%22eq%22,%22value%22:%22'+stationCode+'%22%7D%7D,%22and%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22id.datatypeCode%22,%22operator%22:%22eq%22,%22value%22:%22HHS%22%7D%7D%7D,%22and%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22dataValue%22,%22operator%22:%22null%22,%22value%22:%22false%22%7D%7D%7D,%22and%22:%7B%22expression%22:%7B%22valueIsRelationField%22:false,%22fieldName%22:%22id.dataTime%22,%22operator%22:%22btn%22,%22value%22:%22'+startDate+'T16:49:44.574,'+endDate+'T16:49:44.574%22%7D%7D%7D'
    r = requests.get(dynamic_url,verify=False)
    txt = r.json()
    df = pd.DataFrame(txt)
    df['Date'] = df.id.apply(lambda x: x['dataTime'].split('T')[0])
    df['Time'] = df.id.apply(lambda x: x['dataTime'].split('T')[1])
    df = df[['stationCode','Date','Time','dataValue','datatypeCode']]
    master_df = pd.concat([master_df,df],ignore_index=True)
    master_df.to_csv('Waterlevel_assam_stations.csv',index=False)


master_df


01-11-01-002
01-11-01-008
BV000FS
01-11-01-007
01-11-01-006
01-11-13-001
bv000f5
01-10-23-001
BKA00D7
057-UBDDIB
021-MDSIL
009-mdsil
018-MDSIL
019-MDSIL
024- MDSIL
027- MDSIL
028- MDSIL
016-MBDGHY
01-11-06-003
01-11-03-001
012-MBDGHY

Unnamed: 0,stationCode,Date,Time,dataValue,datatypeCode
0,010-MBDGHY,2016-04-07,00:00:00,17.47,HHS
1,010-MBDGHY,2016-04-07,01:00:00,17.44,HHS
2,010-MBDGHY,2016-04-07,02:00:00,17.40,HHS
3,010-MBDGHY,2016-04-07,03:00:00,17.36,HHS
4,010-MBDGHY,2016-04-07,04:00:00,17.32,HHS
...,...,...,...,...,...
2124068,012-MBDGHY,2022-07-26,17:00:00,25.04,HHS
2124069,012-MBDGHY,2022-07-26,18:00:00,25.03,HHS
2124070,012-MBDGHY,2022-07-26,19:00:00,25.03,HHS
2124071,012-MBDGHY,2022-07-26,20:00:00,25.03,HHS
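
The request URL in the loop above is assembled by concatenating strings into a long percent-encoded template, which is easy to break, particularly for station codes that contain a space (for example `024- MDSIL`). As a hedged alternative, the sketch below builds the same query from plain Python dictionaries and encodes it with the standard library. The helper names `expression` and `build_water_level_url` are hypothetical and not part of the original notebook; the fixed time-of-day `T16:49:44.574` is copied from the original URL.

```python
import json
from urllib.parse import quote, urlsplit, parse_qs

BASE = 'https://ffs.india-water.gov.in/iam/api/new-entry-data/specification/sorted'


def expression(field_name, operator, value):
    """One filter clause, in the key order used by the original URL."""
    return {'expression': {'valueIsRelationField': False,
                           'fieldName': field_name,
                           'operator': operator,
                           'value': value}}


def encode(obj):
    """Compact JSON, percent-encoded; ':' and ',' stay literal as in the original."""
    return quote(json.dumps(obj, separators=(',', ':')), safe=':,')


def build_water_level_url(station_code, start_date, end_date):
    """Hypothetical helper: hourly water level (HHS) query for one station."""
    sort_criteria = {'sortOrderDtos': [{'sortDirection': 'ASC',
                                        'field': 'id.dataTime'}]}
    # Time-of-day suffix taken verbatim from the original notebook URL
    time_range = f'{start_date}T16:49:44.574,{end_date}T16:49:44.574'
    specification = {
        'where': {
            'where': {
                'where': expression('id.stationCode', 'eq', station_code),
                'and': expression('id.datatypeCode', 'eq', 'HHS'),
            },
            'and': expression('dataValue', 'null', 'false'),
        },
        'and': expression('id.dataTime', 'btn', time_range),
    }
    return (f'{BASE}?sort-criteria={encode(sort_criteria)}'
            f'&specification={encode(specification)}')


url = build_water_level_url('024- MDSIL', '2016-01-01', '2022-07-30')
print(url)
```

Because the space in the station code is percent-encoded by `quote`, the resulting URL contains no raw whitespace, and the nested `where`/`and` structure stays readable and editable.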


## References:
1. https://pib.gov.in/newsite/PrintRelease.aspx?relid=181066
2. http://jalshakti-dowr.gov.in/schemes-projects-programmes/schemes/flood-forecasting
3. https://indiawris.gov.in/wiki/doku.php?id=cwc_national_flood_forecasting_network
4. http://iced.cag.gov.in/wp-content/uploads/2016-17/NTP%2002/VD%20Roy.pdf
5. http://cwc.gov.in/sites/default/files/final-appraisal-report-2020-publication-no..pdf

There are three divisions in Assam:
1. Upper Brahmaputra Division, Dibrugarh (UBDDIB): 19 stations as of 2020
2. Middle Brahmaputra Division, Guwahati (MBDGHY): 15 stations as of 2020
3. Lower Brahmaputra Division, Jalpaiguri (LBDJPG): 8 stations as of 2020
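
Several station codes in this notebook end with one of these division abbreviations (for example `057-UBDDIB` or `012-MBDGHY`). The sketch below counts codes per division on that basis. It assumes that the text after the last hyphen is the division abbreviation, which is inferred from the codes seen here rather than from any CWC documentation; codes that follow another scheme (such as `01-11-01-002`) are left unassigned. The function name `division_of` is hypothetical.

```python
from collections import Counter

# Division abbreviations and names as listed above
DIVISIONS = {
    'UBDDIB': 'Upper Brahmaputra Division, Dibrugarh',
    'MBDGHY': 'Middle Brahmaputra Division, Guwahati',
    'LBDJPG': 'Lower Brahmaputra Division, Jalpaiguri',
}


def division_of(station_code):
    """Return the division abbreviation, or None if the code has no known suffix.

    Assumption: the part after the last hyphen names the division.
    """
    suffix = station_code.rsplit('-', 1)[-1].strip().upper()
    return suffix if suffix in DIVISIONS else None


# A few station codes that appear elsewhere in this notebook
sample_codes = ['057-UBDDIB', '016-MBDGHY', '012-MBDGHY',
                '029-LBDJPG', '021-MDSIL', '01-11-01-002']

counts = Counter(division_of(code) for code in sample_codes)
for abbreviation, count in counts.items():
    print(DIVISIONS.get(abbreviation, 'unassigned'), count)
```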

In [12]:
master_df = pd.read_csv('Waterlevel_assam_stations.csv')
master_df

Unnamed: 0,stationCode,Date,Time,dataValue,datatypeCode
0,010-MBDGHY,2016-04-07,00:00:00,17.47,HHS
1,010-MBDGHY,2016-04-07,01:00:00,17.44,HHS
2,010-MBDGHY,2016-04-07,02:00:00,17.40,HHS
3,010-MBDGHY,2016-04-07,03:00:00,17.36,HHS
4,010-MBDGHY,2016-04-07,04:00:00,17.32,HHS
...,...,...,...,...,...
2124068,012-MBDGHY,2022-07-26,17:00:00,25.04,HHS
2124069,012-MBDGHY,2022-07-26,18:00:00,25.03,HHS
2124070,012-MBDGHY,2022-07-26,19:00:00,25.03,HHS
2124071,012-MBDGHY,2022-07-26,20:00:00,25.03,HHS
