#Weather Station Clustering using DBSCAN & scikit-learn

DBSCAN is well suited to tasks such as class identification in a spatial context. Its key property is that it can find clusters of arbitrary shape while remaining robust to noise.
In this notebook we will cluster the locations of weather stations in Canada to find groups of stations that show the same weather conditions.
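
Before touching the weather data, it may help to see how DBSCAN labels points. The following is a minimal sketch on synthetic points that are invented for illustration; the `eps` and `min_samples` values are assumptions chosen for this toy data, not settings for the weather stations. Points that belong to no dense region receive the label `-1`, which is how scikit-learn marks noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic 2-D points (made up for illustration, not weather data):
# two dense blobs plus one isolated point that should be treated as noise.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2))
outlier = np.array([[20.0, 20.0]])
points = np.vstack([blob_a, blob_b, outlier])

# eps is the neighbourhood radius; min_samples is the number of points
# (including the point itself) needed for a point to count as a core point.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)

# Cluster labels are 0, 1, ...; the label -1 is reserved for noise.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters found:", n_clusters)
print("noise points:", n_noise)
```

Unlike k-means, the number of clusters is not passed in; it falls out of the density parameters.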

First, we import the libraries, then download the data and take a first look at it:

In [2]:
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
%matplotlib inline


In [1]:
!wget -O weather-stations20140101-20141231.csv https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%204/data/weather-stations20140101-20141231.csv

--2023-08-24 11:16:45--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%204/data/weather-stations20140101-20141231.csv
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.63.118.104
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.63.118.104|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 129821 (127K) [text/csv]
Saving to: ‘weather-stations20140101-20141231.csv’


2023-08-24 11:16:45 (1.24 MB/s) - ‘weather-stations20140101-20141231.csv’ saved [129821/129821]



##Loading the dataset

We will load the .csv file into a pandas DataFrame and preview the first five rows.

In [3]:
import pandas as pd

# Load the downloaded CSV into a DataFrame
myfile = 'weather-stations20140101-20141231.csv'
pdf = pd.read_csv(myfile)
pdf.head(5)

Unnamed: 0,Stn_Name,Lat,Long,Prov,Tm,DwTm,D,Tx,DwTx,Tn,...,DwP,P%N,S_G,Pd,BS,DwBS,BS%,HDD,CDD,Stn_No
0,CHEMAINUS,48.935,-123.742,BC,8.2,0.0,,13.5,0.0,1.0,...,0.0,,0.0,12.0,,,,273.3,0.0,1011500
1,COWICHAN LAKE FORESTRY,48.824,-124.133,BC,7.0,0.0,3.0,15.0,0.0,-3.0,...,0.0,104.0,0.0,12.0,,,,307.0,0.0,1012040
2,LAKE COWICHAN,48.829,-124.052,BC,6.8,13.0,2.8,16.0,9.0,-2.5,...,9.0,,,11.0,,,,168.1,0.0,1012055
3,DISCOVERY ISLAND,48.425,-123.226,BC,,,,12.5,0.0,,...,,,,,,,,,,1012475
4,DUNCAN KELVIN CREEK,48.735,-123.728,BC,7.7,2.0,3.4,14.5,2.0,-1.0,...,2.0,,,11.0,,,,267.7,0.0,1012573


##Cleaning the data
We are going to remove rows that have no value in the **Tm** field.

In [6]:
pdf = pdf[pd.notnull(pdf["Tm"])]
pdf = pdf.reset_index(drop=True)
pdf.head(5)

Unnamed: 0,Stn_Name,Lat,Long,Prov,Tm,DwTm,D,Tx,DwTx,Tn,...,DwP,P%N,S_G,Pd,BS,DwBS,BS%,HDD,CDD,Stn_No
0,CHEMAINUS,48.935,-123.742,BC,8.2,0.0,,13.5,0.0,1.0,...,0.0,,0.0,12.0,,,,273.3,0.0,1011500
1,COWICHAN LAKE FORESTRY,48.824,-124.133,BC,7.0,0.0,3.0,15.0,0.0,-3.0,...,0.0,104.0,0.0,12.0,,,,307.0,0.0,1012040
2,LAKE COWICHAN,48.829,-124.052,BC,6.8,13.0,2.8,16.0,9.0,-2.5,...,9.0,,,11.0,,,,168.1,0.0,1012055
3,DUNCAN KELVIN CREEK,48.735,-123.728,BC,7.7,2.0,3.4,14.5,2.0,-1.0,...,2.0,,,11.0,,,,267.7,0.0,1012573
4,ESQUIMALT HARBOUR,48.432,-123.439,BC,8.8,0.0,,13.1,0.0,1.9,...,8.0,,,12.0,,,,258.6,0.0,1012710
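
The cleaning step can be tried in isolation on a tiny table. This sketch uses an invented DataFrame (the station names and values are placeholders) and shows that filtering with `pd.notnull` gives the same rows as pandas' `dropna(subset=...)`, which is an alternative spelling rather than what the notebook itself uses.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the weather table; the values are invented.
toy = pd.DataFrame({
    "Stn_Name": ["A", "B", "C", "D"],
    "Tm": [8.2, np.nan, 6.8, np.nan],
})

# Keep only rows whose Tm is present, then renumber the index from 0.
mask_version = toy[pd.notnull(toy["Tm"])].reset_index(drop=True)

# dropna with subset= expresses the same filter in one call.
dropna_version = toy.dropna(subset=["Tm"]).reset_index(drop=True)

print(mask_version)
print(mask_version.equals(dropna_version))
```

Resetting the index matters later if rows are addressed by position, since the filtered frame would otherwise keep gaps in its index.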


##Visualization

We will visualize the stations on a map using the basemap package. The matplotlib basemap toolkit is a library for plotting 2D data on maps in Python. Basemap does not do any plotting on its own, but provides the facilities to transform coordinates to map projections.

Each data point marks one station. The marker size is fixed in the code below; it is not scaled by the station's average maximum temperature.
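
To make "transform coordinates to a map projection" concrete, here is a rough sketch of a spherical Mercator projection written with NumPy. It is not Basemap's implementation: the function name `mercator_xy`, the Earth radius constant, and the `lon0_deg` parameter are assumptions for illustration, and Basemap's own output may differ in details such as the origin offset and the Earth model.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius (spherical model)

def mercator_xy(lon_deg, lat_deg, lon0_deg=0.0):
    """Project longitude/latitude in degrees to spherical Mercator metres."""
    lon = np.radians(np.asarray(lon_deg, dtype=float) - lon0_deg)
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * np.log(np.tan(np.pi / 4.0 + lat / 2.0))
    return x, y

# Two station coordinates taken from the table above
# (CHEMAINUS and COWICHAN LAKE FORESTRY).
xs, ys = mercator_xy([-123.742, -124.133], [48.935, 48.824])
print(xs, ys)
```

The projected values are in metres, which is why station positions are converted before plotting instead of passing raw degrees to the map.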

In [20]:
# Basemap is a separate package from matplotlib; if the import below fails,
# install it first (for example with `pip install basemap`).
from mpl_toolkits.basemap import Basemap
from pylab import rcParams
%matplotlib inline
rcParams['figure.figsize'] = (14,10)

# Bounding box for the map, in degrees
llon=-140
ulon=-50
llat=40
ulat=65

# Keep only stations inside the bounding box; .copy() lets us add columns safely
pdf = pdf[(pdf['Long'] > llon) & (pdf['Long'] < ulon) & (pdf['Lat'] > llat) & (pdf['Lat'] < ulat)].copy()

my_map = Basemap(projection='merc',
            resolution = 'l', area_thresh = 1000.0,
            llcrnrlon=llon, llcrnrlat=llat, #min longitude(llcrnrlon) and latitude(llcrnrlat)
            urcrnrlon=ulon, urcrnrlat=ulat) #max longitude (urcrnrlon) and latitude(urcrnrlat)

my_map.drawcoastlines()
my_map.drawcountries()
#my_map.drawmapboundary()
my_map.fillcontinents(color='white', alpha = 0.3)
my_map.shadedrelief()

# Project station longitude/latitude to map coordinates
xs, ys = my_map(np.asarray(pdf.Long), np.asarray(pdf.Lat))
pdf['xm']=xs.tolist()
pdf['ym']=ys.tolist()

# Plot each station as a red dot
for index,row in pdf.iterrows():
    # x,y = my_map(row.Long, row.Lat)
    my_map.plot(row.xm, row.ym, markerfacecolor=[1,0,0], marker='o', markersize=5, alpha=0.75)
    # plt.text(x, y, stn)
plt.show()
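
The bounding-box filter applied before plotting is easy to get wrong, so here is a small sketch of it on an invented three-row DataFrame. The coordinates are placeholders; only the box limits are the ones used in this notebook.

```python
import pandas as pd

# Toy coordinates (invented) with one point outside the bounding box.
toy = pd.DataFrame({
    "Long": [-123.7, -60.0, -150.0],
    "Lat": [48.9, 46.0, 60.0],
})

llon, ulon, llat, ulat = -140, -50, 40, 65

# Each comparison needs its own parentheses because & binds more tightly
# than < and > in Python.
in_box = (
    (toy["Long"] > llon) & (toy["Long"] < ulon)
    & (toy["Lat"] > llat) & (toy["Lat"] < ulat)
)

# .copy() makes the result independent of toy, so adding columns later
# does not trigger pandas' chained-assignment warning.
inside = toy[in_box].copy()
print(inside)
```

The third row is dropped because its longitude of -150 lies west of the box's lower limit of -140.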