<a id="top"></a>
# Internship project: Transforming raw rainfall data into a high-quality data product

### Team 3: Explore DS Academy
---
<img src="https://www.esri.com/content/dam/esrisites/en-us/about/what-is-gis/assets/image-switcher-maps/identify-problems-what-is-gis-image-switcher.jpg" align="left">

**Team Members:** Okon Prince, Elvis Esharegharan, Abiemwense Omokaro, Elmund Dotsey, Joseph Okonkwo, Alfred Mondi  
**Internship Mentors:** Mzi Xaba, Kelly Ile

* [Notebook repo](https://github.com/Muzi-EXPLORE/intern_team_3_2022_rainfall_data)
* [Trello board](https://trello.com/b/UtTFA0JS/team-3) 

In [1]:
import geopandas as gpd
import shapely.wkt
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_selection import VarianceThreshold

In [2]:
# Load the optimised and non-optimised rainfall object tables
df_o = pd.read_csv('Optimised_rainfall_objects_202209.csv')
df_no = pd.read_csv('Non-optimised_rainfall_objects_202209.csv')

In [3]:
df_o.head()

Unnamed: 0,STATION_NO,STATION_LATITUDE,STATION_LONGITUDE,OBJECT_TYPE,OBJECT_TYPE_SHORTNAME,STATION_AREA_WKT,CREATIONTIME
0,CM53010213-cbd2-4f6d-b3ca-48380c1a9182,50.890245,0.557153,Catchment,CM,"POLYGON ((0.555378 50.88834,0.555892 50.889354...",2022-03-23 18:02:52.000
1,CM145cd21b-3329-4d9f-b44c-14da1a9824a2,51.146506,-0.004037,Catchment,CM,"POLYGON ((-0.010209 51.147913,-0.003574 51.147...",2022-03-23 18:02:52.000
2,CM1f9fd786-9dcd-4a88-9fd7-14ebf61ad0b2,50.794966,-1.023181,Catchment,CM,"POLYGON ((-1.016756 50.792095,-1.017329 50.792...",2022-03-23 18:02:53.000
3,CMae692cd6-fefe-4e1b-8777-25a90d8017fa,51.112963,0.183215,Catchment,CM,"POLYGON ((0.186719 51.11561,0.18656 51.115525,...",2022-03-23 18:02:54.000
4,CM17f5cc28-e241-4bef-9afb-4b1fdbe2deba,51.33656,1.369312,Catchment,CM,"POLYGON ((1.381381 51.333004,1.38128 51.333037...",2022-03-23 18:02:54.000


In [4]:
df_no.head()

Unnamed: 0,STATION_NO,STATION_LATITUDE,STATION_LONGITUDE,OBJECT_TYPE,OBJECT_TYPE_SHORTNAME,STATION_AREA_WKT,CREATIONTIME
0,HSadee6548-d797-49a3-a2a0-b7e8348f0b27,51.130919,1.317469,Hotspot,HS,,2019-05-23 10:22:57.000
1,HSfd30138a-5a83-450a-81d3-7d1335d69c09,50.846194,-1.055814,Hotspot,HS,,2019-05-20 11:42:16.000
2,HSefb5e85c-71f8-42a4-b529-44cdd306486d,50.839552,-1.044914,Hotspot,HS,,2019-05-20 12:12:35.000
3,HS9037443f-c430-4468-86b8-8c2c0362e1c6,51.41606,0.752698,Hotspot,HS,,2019-05-23 14:28:36.000
4,HS27e9d11e-c6ce-4e8f-ba3a-aac71fd09ff3,50.824115,-0.426342,Hotspot,HS,,2018-04-10 12:31:01.000


In [5]:
df_o.shape, df_no.shape

((90, 7), (597, 7))

In [6]:
df_o.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   STATION_NO             90 non-null     object 
 1   STATION_LATITUDE       90 non-null     float64
 2   STATION_LONGITUDE      90 non-null     float64
 3   OBJECT_TYPE            90 non-null     object 
 4   OBJECT_TYPE_SHORTNAME  90 non-null     object 
 5   STATION_AREA_WKT       90 non-null     object 
 6   CREATIONTIME           90 non-null     object 
dtypes: float64(2), object(5)
memory usage: 5.0+ KB


In [7]:
df_no.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 597 entries, 0 to 596
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   STATION_NO             597 non-null    object 
 1   STATION_LATITUDE       597 non-null    float64
 2   STATION_LONGITUDE      597 non-null    float64
 3   OBJECT_TYPE            597 non-null    object 
 4   OBJECT_TYPE_SHORTNAME  597 non-null    object 
 5   STATION_AREA_WKT       56 non-null     object 
 6   CREATIONTIME           597 non-null    object 
dtypes: float64(2), object(5)
memory usage: 32.8+ KB


In [8]:
# Count the rows for each unique value in the OBJECT_TYPE column
df_no['OBJECT_TYPE'].value_counts()

Hotspot             541
Catchment            49
Zone of interest      7
Name: OBJECT_TYPE, dtype: int64

### Task 1
Use only the "hotspots" (identified by the OBJECT_TYPE column) from Non-optimised_rainfall_objects_202209.csv.

To do this, first create a new data frame containing only the rows whose OBJECT_TYPE is "Hotspot" from the non-optimised data frame.
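The filtering step described above can be sketched on a small stand-in table. The frame `toy` and its values are illustrative assumptions, not rows from the CSV; only the `OBJECT_TYPE` column name and its three categories come from the notebook.

```python
import pandas as pd

# Toy stand-in for df_no; the station IDs are made up for illustration.
toy = pd.DataFrame({
    "STATION_NO": ["HS-a", "CM-b", "HS-c", "ZI-d"],
    "OBJECT_TYPE": ["Hotspot", "Catchment", "Hotspot", "Zone of interest"],
})

# Boolean mask: True where the row is a hotspot.
is_hotspot = toy["OBJECT_TYPE"] == "Hotspot"

# .copy() gives independent frames, so later column assignments
# do not trigger SettingWithCopyWarning.
toy_hotspot = toy[is_hotspot].copy()
toy_other = toy[~is_hotspot].copy()

print(toy_hotspot.shape, toy_other.shape)
```

Keeping the mask in a named variable makes it easy to build both the hotspot subset and its complement from the same condition.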

In [24]:
# Note: this keeps the rows that are NOT hotspots (49 catchments + 7 zones of interest)
df_nohotspot = df_no[df_no['OBJECT_TYPE']!='Hotspot']
df_nohotspot.shape

(56, 7)

In [25]:
df_nohotspot.head()

Unnamed: 0,STATION_NO,STATION_LATITUDE,STATION_LONGITUDE,OBJECT_TYPE,OBJECT_TYPE_SHORTNAME,STATION_AREA_WKT,CREATIONTIME
13,CM145fbbb3-814c-46c8-b365-545185983755,51.108822,0.951913,Catchment,CM,"POLYGON ((0.82771 51.07838,0.82291 51.0794,0.8...",2017-03-14 12:19:01.000
14,CM687f7f97-3395-43d9-b0e3-7acb9b3b646c,50.849934,-0.380189,Catchment,CM,"POLYGON ((-0.22827 50.78343,-0.54124 50.78392,...",2017-11-28 15:37:16.000
15,ZIfcbee7eb-6a39-4cf5-9352-6f9c00a50552,50.795757,0.281778,Zone of interest,ZI,"POLYGON ((0.333008 50.788701,0.249299 50.73368...",2019-05-23 10:39:47.000
16,CMeb49aace-8421-4d29-b668-a4d9401d52ec,50.857412,0.273727,Catchment,CM,"POLYGON ((0.46551 50.83185,0.36475 50.82106,0....",2017-11-28 15:37:53.000
17,ZI2088b02a-034b-4b7b-9b9c-efa7470d2c7d,51.214673,0.798888,Zone of interest,ZI,"POLYGON ((0.696387 50.889944,-0.024591 51.2472...",2021-03-12 16:09:44.000


In [None]:
# Create a new dataframe containing only the hotspots
df_hotspot = df_no[df_no['OBJECT_TYPE']=='Hotspot']
df_hotspot.shape

### Task 2
For the objects from Task 1, get the geometry from the STATION_AREA_WKT column. This geometry should consist of only a single pair of coordinates.
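In the `df_no.head()` output the hotspot rows have an empty STATION_AREA_WKT, so a point geometry has to be built from somewhere. The sketch below assumes it comes from STATION_LONGITUDE and STATION_LATITUDE, which the task text does not state explicitly. The helper `point_wkt` and the toy values are hypothetical.

```python
import pandas as pd

# Toy hotspot rows; coordinates are illustrative and the WKT column is empty,
# mirroring the missing values shown for hotspots in df_no.head().
toy_hotspot = pd.DataFrame({
    "STATION_NO": ["HS-a", "HS-b"],
    "STATION_LATITUDE": [51.0, 50.5],
    "STATION_LONGITUDE": [1.25, -1.0],
    "STATION_AREA_WKT": [None, None],
})


def point_wkt(lon: float, lat: float) -> str:
    """Hypothetical helper: WKT uses x y order, i.e. longitude then latitude."""
    return f"POINT ({float(lon)} {float(lat)})"


# Assumption: each hotspot's geometry is the single coordinate pair
# given by its longitude and latitude columns.
toy_hotspot["STATION_AREA_WKT"] = [
    point_wkt(lon, lat)
    for lon, lat in zip(toy_hotspot["STATION_LONGITUDE"],
                        toy_hotspot["STATION_LATITUDE"])
]

print(toy_hotspot["STATION_AREA_WKT"].tolist())
```

Strings in this form can then be parsed with `shapely.wkt.loads`, the same call the notebook applies to the polygon rows.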

In [None]:
df_hotspot.head(1)

In [None]:
df_hotspot.info()

In [31]:
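# Parse each WKT string into a shapely geometry object
geometry = df_nohotspot['STATION_AREA_WKT'].map(shapely.wkt.loads)
# Copy of the table without the raw WKT text column
dfg = df_nohotspot.drop('STATION_AREA_WKT', axis=1)
# GeoDataFrame in WGS 84 (EPSG:4326) that keeps the WKT column next to the parsed geometry
gdf = gpd.GeoDataFrame(df_nohotspot, crs="EPSG:4326", geometry=geometry)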

In [None]:
# Count how many times each distinct geometry occurs
gdf['geometry'].value_counts()

In [29]:
geometry.head()

13    POLYGON ((0.8277099999999999 51.07838, 0.82291...
14    POLYGON ((-0.22827 50.78343, -0.54124000000000...
15    POLYGON ((0.333008 50.788701, 0.249299 50.7336...
16    POLYGON ((0.46551 50.83185, 0.36475 50.82106, ...
17    POLYGON ((0.696387 50.889944, -0.024591 51.247...
Name: STATION_AREA_WKT, dtype: object

In [32]:
dfg.head()

Unnamed: 0,STATION_NO,STATION_LATITUDE,STATION_LONGITUDE,OBJECT_TYPE,OBJECT_TYPE_SHORTNAME,CREATIONTIME,geometry
13,CM145fbbb3-814c-46c8-b365-545185983755,51.108822,0.951913,Catchment,CM,2017-03-14 12:19:01.000,"POLYGON ((0.82771 51.07838, 0.82291 51.07940, ..."
14,CM687f7f97-3395-43d9-b0e3-7acb9b3b646c,50.849934,-0.380189,Catchment,CM,2017-11-28 15:37:16.000,"POLYGON ((-0.22827 50.78343, -0.54124 50.78392..."
15,ZIfcbee7eb-6a39-4cf5-9352-6f9c00a50552,50.795757,0.281778,Zone of interest,ZI,2019-05-23 10:39:47.000,"POLYGON ((0.33301 50.78870, 0.24930 50.73368, ..."
16,CMeb49aace-8421-4d29-b668-a4d9401d52ec,50.857412,0.273727,Catchment,CM,2017-11-28 15:37:53.000,"POLYGON ((0.46551 50.83185, 0.36475 50.82106, ..."
17,ZI2088b02a-034b-4b7b-9b9c-efa7470d2c7d,51.214673,0.798888,Zone of interest,ZI,2021-03-12 16:09:44.000,"POLYGON ((0.69639 50.88994, -0.02459 51.24721,..."


In [33]:
gdf.head()

Unnamed: 0,STATION_NO,STATION_LATITUDE,STATION_LONGITUDE,OBJECT_TYPE,OBJECT_TYPE_SHORTNAME,STATION_AREA_WKT,CREATIONTIME,geometry
13,CM145fbbb3-814c-46c8-b365-545185983755,51.108822,0.951913,Catchment,CM,"POLYGON ((0.82771 51.07838,0.82291 51.0794,0.8...",2017-03-14 12:19:01.000,"POLYGON ((0.82771 51.07838, 0.82291 51.07940, ..."
14,CM687f7f97-3395-43d9-b0e3-7acb9b3b646c,50.849934,-0.380189,Catchment,CM,"POLYGON ((-0.22827 50.78343,-0.54124 50.78392,...",2017-11-28 15:37:16.000,"POLYGON ((-0.22827 50.78343, -0.54124 50.78392..."
15,ZIfcbee7eb-6a39-4cf5-9352-6f9c00a50552,50.795757,0.281778,Zone of interest,ZI,"POLYGON ((0.333008 50.788701,0.249299 50.73368...",2019-05-23 10:39:47.000,"POLYGON ((0.33301 50.78870, 0.24930 50.73368, ..."
16,CMeb49aace-8421-4d29-b668-a4d9401d52ec,50.857412,0.273727,Catchment,CM,"POLYGON ((0.46551 50.83185,0.36475 50.82106,0....",2017-11-28 15:37:53.000,"POLYGON ((0.46551 50.83185, 0.36475 50.82106, ..."
17,ZI2088b02a-034b-4b7b-9b9c-efa7470d2c7d,51.214673,0.798888,Zone of interest,ZI,"POLYGON ((0.696387 50.889944,-0.024591 51.2472...",2021-03-12 16:09:44.000,"POLYGON ((0.69639 50.88994, -0.02459 51.24721,..."
