
# **Ocean Climate & Marine Life Dataset**

A Multi-Year Global Dataset of Sea Surface Temperature, pH Levels, and Corals.

In [None]:
import numpy as np # linear algebra
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('shifting_seas.csv')
df.head(10)

Unnamed: 0,Date,Location,Latitude,Longitude,SST (°C),pH Level,Bleaching Severity,Species Observed,Marine Heatwave
0,2015-01-01,Red Sea,20.0248,38.4931,29.47,8.107,,106,False
1,2015-01-07,Great Barrier Reef,-18.2988,147.7782,29.65,8.004,High,116,False
2,2015-01-14,Caribbean Sea,14.9768,-75.0233,28.86,7.947,High,90,False
3,2015-01-20,Great Barrier Reef,-18.3152,147.6486,28.97,7.995,Medium,94,False
4,2015-01-27,Galápagos,-0.8805,-90.9769,28.6,7.977,,110,False
5,2015-02-02,Red Sea,20.0055,38.4425,29.06,8.009,Low,109,False
6,2015-02-09,South China Sea,9.9699,115.0926,28.48,7.998,,132,False
7,2015-02-16,Red Sea,19.939,38.5104,25.56,8.042,Low,152,False
8,2015-02-22,Maldives,3.2397,73.2293,28.33,8.038,Low,99,False
9,2015-03-01,Caribbean Sea,14.964,-75.023,30.09,8.035,High,77,True


In [None]:
df.describe()

Unnamed: 0,Latitude,Longitude,SST (°C),pH Level,Species Observed
count,500.0,500.0,500.0,500.0,500.0
mean,6.72844,11.930638,28.5372,8.049872,120.472
std,13.534648,108.174778,1.425076,0.055707,20.475441
min,-18.383,-155.7176,23.64,7.872,54.0
25%,-0.932925,-90.9264,27.53,8.011,107.75
50%,10.01875,38.52,28.515,8.052,120.0
75%,19.891,114.990975,29.4525,8.08525,133.25
max,20.122,147.8271,33.21,8.195,171.0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Date                500 non-null    object 
 1   Location            500 non-null    object 
 2   Latitude            500 non-null    float64
 3   Longitude           500 non-null    float64
 4   SST (°C)            500 non-null    float64
 5   pH Level            500 non-null    float64
 6   Bleaching Severity  350 non-null    object 
 7   Species Observed    500 non-null    int64  
 8   Marine Heatwave     500 non-null    bool   
dtypes: bool(1), float64(4), int64(1), object(3)
memory usage: 31.9+ KB


In [None]:
# Check for duplicate rows
df.duplicated().sum()

np.int64(0)

In [None]:
df.nunique()

Unnamed: 0,0
Date,500
Location,7
Latitude,488
Longitude,489
SST (°C),313
pH Level,202
Bleaching Severity,3
Species Observed,94
Marine Heatwave,2


In [None]:
df.columns

Index(['Date', 'Location', 'Latitude', 'Longitude', 'SST (°C)', 'pH Level',
       'Bleaching Severity', 'Species Observed', 'Marine Heatwave'],
      dtype='object')

In [None]:
df.shape

(500, 9)

In [None]:
#checking for missing values
df.isnull().sum()

Unnamed: 0,0
Date,0
Location,0
Latitude,0
Longitude,0
SST (°C),0
pH Level,0
Bleaching Severity,150
Species Observed,0
Marine Heatwave,0


### As we can see, Bleaching Severity has null values, so let's fill them

In [None]:
# Check the value counts for Bleaching Severity
df['Bleaching Severity'].value_counts()

Unnamed: 0_level_0,count
Bleaching Severity,Unnamed: 1_level_1
Low,132
Medium,130
High,88
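
By default `value_counts()` leaves missing entries out, which is why the counts above add up to 350 rather than 500. A minimal sketch of tallying the gaps alongside the categories, using a small made-up Series (the values are illustrative, not taken from `shifting_seas.csv`):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Bleaching Severity' column (illustrative values only).
severity = pd.Series(
    ['High', np.nan, 'Low', 'Medium', np.nan, 'Low'],
    name='Bleaching Severity',
)

# dropna=False keeps the missing entries as their own row in the tally.
counts_with_missing = severity.value_counts(dropna=False)
n_missing = int(severity.isna().sum())

print(counts_with_missing)
print("missing:", n_missing)
```

Seeing the missing count next to the real categories makes it easier to judge how much any fill value will reshape the column.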


In [None]:
# Assign the result back instead of calling fillna(..., inplace=True) on the
# selected column: the chained inplace form acts on a temporary copy and is
# slated to stop working in pandas 3.0.
df['Bleaching Severity'] = df['Bleaching Severity'].fillna('Moderate')


In [None]:
df.isnull().sum()

Unnamed: 0,0
Date,0
Location,0
Latitude,0
Longitude,0
SST (°C),0
pH Level,0
Bleaching Severity,0
Species Observed,0
Marine Heatwave,0


The Date column is stored as object (string), so let's convert it to datetime.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
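
As a hedged aside, `pd.to_datetime` can be given an explicit `format`, and `errors='coerce'` turns unparseable entries into `NaT` instead of raising. The sketch below uses a made-up Series with one deliberately bad value; it assumes the dates follow the `YYYY-MM-DD` layout seen in the preview above.

```python
import pandas as pd

# Hypothetical sample: two valid dates and one deliberately malformed entry.
raw_dates = pd.Series(['2015-01-01', '2015-01-07', 'not a date'])

# An explicit format documents the expected layout; errors='coerce' maps
# anything that does not match to NaT rather than raising an exception.
parsed = pd.to_datetime(raw_dates, format='%Y-%m-%d', errors='coerce')

print(parsed)
print("unparseable entries:", int(parsed.isna().sum()))
```

Counting the `NaT` values afterwards is a quick way to confirm that every row was parsed.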

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Date                500 non-null    datetime64[ns]
 1   Location            500 non-null    object        
 2   Latitude            500 non-null    float64       
 3   Longitude           500 non-null    float64       
 4   SST (°C)            500 non-null    float64       
 5   pH Level            500 non-null    float64       
 6   Bleaching Severity  500 non-null    object        
 7   Species Observed    500 non-null    int64         
 8   Marine Heatwave     500 non-null    bool          
dtypes: bool(1), datetime64[ns](1), float64(4), int64(1), object(2)
memory usage: 31.9+ KB


In [None]:
# Import the libraries for model training
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

In [None]:
# Work on a separate frame without the Date column, which the model cannot use directly
df1 = df.drop(columns=['Date'])

In [None]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Location            500 non-null    object 
 1   Latitude            500 non-null    float64
 2   Longitude           500 non-null    float64
 3   SST (°C)            500 non-null    float64
 4   pH Level            500 non-null    float64
 5   Bleaching Severity  500 non-null    object 
 6   Species Observed    500 non-null    int64  
 7   Marine Heatwave     500 non-null    bool   
dtypes: bool(1), float64(4), int64(1), object(2)
memory usage: 28.0+ KB


In [None]:
# Create dummy (one-hot) columns for Location
df1 = pd.get_dummies(df1, columns=['Location'])

In [None]:
df1.head()

Unnamed: 0,Latitude,Longitude,SST (°C),pH Level,Bleaching Severity,Species Observed,Marine Heatwave,Location_Caribbean Sea,Location_Galápagos,Location_Great Barrier Reef,Location_Hawaiian Islands,Location_Maldives,Location_Red Sea,Location_South China Sea
0,20.0248,38.4931,29.47,8.107,Moderate,106,False,False,False,False,False,False,True,False
1,-18.2988,147.7782,29.65,8.004,High,116,False,False,False,True,False,False,False,False
2,14.9768,-75.0233,28.86,7.947,High,90,False,True,False,False,False,False,False,False
3,-18.3152,147.6486,28.97,7.995,Medium,94,False,False,False,True,False,False,False,False
4,-0.8805,-90.9769,28.6,7.977,Moderate,110,False,False,True,False,False,False,False,False


In [None]:
le = LabelEncoder()
df1['Bleaching Severity'] = le.fit_transform(df1['Bleaching Severity'])
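
`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels, so the codes carry no severity ranking. A small sketch of how to inspect the mapping, using the four labels that occur in this column (`le_demo` and `mapping` are names chosen here for illustration):

```python
from sklearn.preprocessing import LabelEncoder

# The four labels present after filling the missing values.
labels = ['Moderate', 'High', 'Medium', 'Low', 'High']

le_demo = LabelEncoder()
codes = le_demo.fit_transform(labels)

# classes_ holds the labels in the order of their integer codes.
mapping = {str(label): code for code, label in enumerate(le_demo.classes_)}

print(mapping)
print(list(codes))
```

With these labels the mapping is High=0, Low=1, Medium=2, Moderate=3. A tree-based model can still split on such codes, but if the ordering matters, an explicit ordinal mapping is one alternative to consider.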

In [None]:
df1.columns

Index(['Latitude', 'Longitude', 'SST (°C)', 'pH Level', 'Bleaching Severity',
       'Species Observed', 'Marine Heatwave', 'Location_Caribbean Sea',
       'Location_Galápagos', 'Location_Great Barrier Reef',
       'Location_Hawaiian Islands', 'Location_Maldives', 'Location_Red Sea',
       'Location_South China Sea'],
      dtype='object')

In [None]:
df1['Marine Heatwave'].value_counts()

Unnamed: 0_level_0,count
Marine Heatwave,Unnamed: 1_level_1
False,427
True,73


In [None]:
# Check whether Bleaching Severity is directly related to Marine Heatwave
df1[df1['Marine Heatwave']]

Unnamed: 0,Latitude,Longitude,SST (°C),pH Level,Bleaching Severity,Species Observed,Marine Heatwave,Location_Caribbean Sea,Location_Galápagos,Location_Great Barrier Reef,Location_Hawaiian Islands,Location_Maldives,Location_Red Sea,Location_South China Sea
9,14.9640,-75.0230,30.09,8.035,0,77,True,True,False,False,False,False,False,False
13,10.0678,114.9964,30.01,8.038,0,95,True,False,False,False,False,False,False,True
16,14.9890,-74.9821,30.72,7.980,0,85,True,True,False,False,False,False,False,False
24,20.0396,38.4545,30.60,7.938,3,107,True,False,False,False,False,False,True,False
32,3.1541,73.2601,30.24,7.974,1,117,True,False,False,False,False,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
452,-0.9320,-90.9454,30.35,7.959,3,111,True,False,True,False,False,False,False,False
456,9.9137,114.9661,30.29,7.965,1,95,True,False,False,False,False,False,False,True
459,3.2289,73.2416,30.60,8.040,1,76,True,False,False,False,False,True,False,False
495,-0.9177,-90.9771,30.68,7.939,2,108,True,False,True,False,False,False,False,False


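As we can see, a marine heatwave is not tied to any single bleaching severity: the heatwave rows above cover all four severity codes. If the two were directly related, Bleaching Severity would leak the target and could not be used as a feature.
Since that is not the case here, we can keep it in the feature set.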
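
Scanning the filtered rows is an informal check. One way to put a number on it is the share of heatwave rows within each severity level; the sketch below does this on a tiny made-up frame (`toy` and its values are hypothetical, not the real data):

```python
import pandas as pd

# Hypothetical miniature of the two columns being compared.
toy = pd.DataFrame({
    'Bleaching Severity': ['High', 'High', 'Low', 'Medium',
                           'Moderate', 'Low', 'Medium', 'Moderate'],
    'Marine Heatwave': [True, False, True, False,
                        True, False, False, False],
})

# The mean of a boolean column is the fraction of True values, so this
# gives the heatwave rate for each severity level.
heatwave_rate = toy.groupby('Bleaching Severity')['Marine Heatwave'].mean()

print(heatwave_rate)
```

If one level had a rate near 1 and the others near 0, that would suggest the feature nearly determines the target and deserves a closer look before training.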

In [None]:
x = df1.drop(columns=['Marine Heatwave'])
y = df1['Marine Heatwave']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=20)
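
The target is imbalanced (427 False against 73 True), so a plain random split can leave the test set with an unrepresentative share of heatwave rows. A hedged sketch of a stratified split on synthetic data follows; `X_demo` and `y_demo` are placeholders, not the notebook's `x` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 samples, 15 of them positive.
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([True] * 15 + [False] * 85)

# stratify=y_demo keeps the class proportions the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=20, stratify=y_demo
)

print("train positive share:", y_tr.mean())
print("test positive share:", y_te.mean())
```

The same idea would apply to the real split by passing `stratify=y`, at the cost of changing which rows land in the test set.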

In [None]:
model = RandomForestClassifier(n_estimators = 100, random_state = 20)
model.fit(X_train, y_train)
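
In the heatwave rows displayed earlier, every SST value is above 30 °C, which hints (from that excerpt only) that SST may carry most of the signal. A fitted random forest exposes `feature_importances_`, which can be used to check this kind of suspicion. The sketch below runs on synthetic data where the label is defined by a temperature threshold; the column names `sst` and `noise` and the threshold are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic features: one informative column and one pure-noise column.
demo = pd.DataFrame({
    'sst': rng.uniform(24, 33, size=300),
    'noise': rng.normal(size=300),
})
# Hypothetical rule: the label depends on the temperature column alone.
target = demo['sst'] > 30

forest = RandomForestClassifier(n_estimators=50, random_state=20)
forest.fit(demo, target)

# Impurity-based importances, one value per feature, summing to 1.
importances = pd.Series(
    forest.feature_importances_, index=demo.columns
).sort_values(ascending=False)

print(importances)
```

Applied to the notebook's `model` and `X_train.columns`, the same pattern would show which features the classifier actually relies on.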

In [None]:
y_pred = model.predict(X_test)
y_pred

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False,  True,
        True, False, False, False, False, False, False, False,  True,
       False, False, False,  True, False, False,  True, False, False,
       False, False, False, False, False, False, False,  True,  True,
        True, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False,  True, False, False,  True, False, False, False, False,
       False])

**Model Accuracy**

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.99


**Classification Report**

In [None]:
model_classification_report = classification_report(y_test, y_pred)
print('Classification Report:')
print(model_classification_report)

Classification Report:
              precision    recall  f1-score   support

       False       0.99      1.00      0.99        86
        True       1.00      0.93      0.96        14

    accuracy                           0.99       100
   macro avg       0.99      0.96      0.98       100
weighted avg       0.99      0.99      0.99       100
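
With 86 of the 100 test rows in the False class, accuracy alone is easy to over-read. The counts implied by the report above (support of 86 and 14, precision 1.00 and recall 0.93 for True) correspond to 13 detected heatwaves, 1 missed, and no false alarms. The sketch below rebuilds arrays with those counts to show the confusion matrix and a majority-class baseline; the arrays are reconstructed for illustration and are not the notebook's actual `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Reconstructed from the report's counts, not the real predictions:
# 86 true negatives, 1 false negative, 13 true positives.
y_true = np.array([False] * 86 + [True] * 14)
y_hat = np.array([False] * 86 + [False] * 1 + [True] * 13)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat, labels=[False, True])

# Baseline: always predict the majority class (no heatwave).
baseline_accuracy = accuracy_score(y_true, np.zeros_like(y_true))
model_accuracy = accuracy_score(y_true, y_hat)

print(cm)
print("baseline accuracy:", baseline_accuracy)
print("model accuracy:", model_accuracy)
```

The always-False baseline already scores 0.86 here, so the recall on the True class is the more telling number for this task.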

