## Remmy Bisimbeko - B26099 - J24M19/011
My GitHub - https://github.com/RemmyBisimbeko/Data-Science

2. Wolf_Hormones.xlsx 

This dataset includes measurements of the hormones cortisol, testosterone, and progesterone in wolf hair samples collected from hunters in the tundra-taiga and northern boreal forests of Canada. Additional samples were collected from wolves killed as part of a control program in the boreal forest (Little Smoky area). Detailed descriptions of the variables can be found in the sheet "Data_Descriptors" within the Excel workbook.

Instructions:

a) Predict the level of testosterone hormone among the wolves in the sheet labelled "Test_Data1" 

b) Predict the level of progesterone hormone among the female wolves in the sheet labelled "Test_Data2". 

In [1]:
# Import the libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression

In [2]:
# Loading the dataset
data = pd.read_excel('../Data Sets/Wolf_Hormones.xlsx')

In [3]:
# Preprocess the data: encode the categorical columns as integer codes
le_sex = LabelEncoder()
le_pop = LabelEncoder()
le_col = LabelEncoder()

data['Sex'] = le_sex.fit_transform(data['Sex'])
data['Population'] = le_pop.fit_transform(data['Population'])
data['Colour'] = le_col.fit_transform(data['Colour'])
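
The encoders fitted above are reused later on the test sheets, so it helps to know which integer stands for which label and what happens when a label was not seen at fit time. The following is a minimal sketch on a made-up column; the labels "F", "M" and "U" are assumptions for illustration, not values taken from the workbook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the 'Sex' column; the real labels live in the workbook.
toy = pd.DataFrame({"Sex": ["F", "M", "F", "M"]})

le = LabelEncoder()
toy["Sex_code"] = le.fit_transform(toy["Sex"])

# classes_ is sorted, so each label's integer code is its position in that array.
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# transform() raises ValueError for a label that was absent when the encoder was fitted.
try:
    le.transform(["U"])
    unseen_accepted = True
except ValueError:
    unseen_accepted = False
print("unseen label accepted:", unseen_accepted)
```

Printing the mapping for `le_sex`, `le_pop` and `le_col` in the same way would make the integer codes in the prediction output easier to read.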

In [4]:
# Handle missing values
data = data.replace('NA', np.nan)
data.dropna(inplace=True)

X = data.drop(['Cpgmg', 'Tpgmg', 'Ppgmg'], axis=1)
y_Tpgmg = data['Tpgmg']
y_Ppgmg = data['Ppgmg']
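
Calling `dropna()` with no arguments removes every row that is missing any value, including rows that are only missing a hormone the current model does not use. A possible alternative, sketched here on invented numbers, is to count the gaps first and then drop rows per target with `subset`. The column names follow the notebook; the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; np.nan marks a missing measurement.
toy = pd.DataFrame({
    "Sex": ["F", "M", np.nan, "F"],
    "Tpgmg": [4.2, np.nan, 5.1, 6.0],
    "Ppgmg": [20.0, 18.5, 22.1, np.nan],
})

# Count missing values per column before deciding what to drop.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Dropping on every column keeps only fully complete rows.
rows_complete = toy.dropna()

# Dropping per target keeps rows that are usable for that one model.
rows_for_T = toy.dropna(subset=["Sex", "Tpgmg"])
rows_for_P = toy.dropna(subset=["Sex", "Ppgmg"])
print(len(rows_complete), len(rows_for_T), len(rows_for_P))
```

Whether the extra rows are worth keeping depends on how many values are actually missing in the workbook, which this sketch does not assume.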

In [5]:
# Scale the three encoded feature columns; "Individual" is an ID and is left out
scaler = StandardScaler()
columns_to_transform = ['Sex', 'Population', 'Colour']

X = scaler.fit_transform(X[columns_to_transform])
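
Label-encoded integers give nominal categories an arbitrary numeric order, and a linear model treats that order as meaningful. One-hot encoding is a common alternative; it is not what this notebook does. The sketch below uses `pd.get_dummies` on invented category names (the labels "A", "B", "C", "grey", "white", "black" are assumptions) and shows how new rows might be aligned to the training columns.

```python
import pandas as pd

# Hypothetical training rows with nominal categories.
train = pd.DataFrame({
    "Population": ["A", "B", "C", "A"],
    "Colour": ["grey", "white", "grey", "black"],
})

# One indicator column per category; drop_first avoids a redundant column per feature.
X_onehot = pd.get_dummies(
    train, columns=["Population", "Colour"], drop_first=True, dtype=float
)
print(list(X_onehot.columns))

# New data may not contain every category, so align it to the training columns.
new = pd.DataFrame({"Population": ["B"], "Colour": ["white"]})
new_onehot = pd.get_dummies(
    new, columns=["Population", "Colour"], dtype=float
).reindex(columns=X_onehot.columns, fill_value=0.0)
print(new_onehot)
```

With indicator columns the scaling step becomes optional for ordinary least squares, since the fitted predictions do not depend on feature scale.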

In [6]:
# Split the data into training and testing sets.
# A single call produces the same row partition for the features and both targets.
X_train, X_test, y_train_Tpgmg, y_test_Tpgmg, y_train_Ppgmg, y_test_Ppgmg = train_test_split(
    X, y_Tpgmg, y_Ppgmg, test_size=0.2, random_state=42
)

In [7]:
# Train linear regression model for Tpgmg
model_Tpgmg = LinearRegression()
model_Tpgmg.fit(X_train, y_train_Tpgmg)

In [8]:
# Train linear regression model for Ppgmg
model_Ppgmg = LinearRegression()
model_Ppgmg.fit(X_train, y_train_Ppgmg)
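
The held-out split created earlier is never scored in the notebook, so there is no estimate of how well either model predicts. A minimal sketch of such a check follows, using synthetic data in place of the wolf measurements; the names `X_demo` and `y_demo` and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three features, target driven mostly by the first one.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 5.0 + 0.5 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

# Score on rows the model did not see during fitting.
mae = mean_absolute_error(y_te, pred)
r2 = r2_score(y_te, pred)
print(f"MAE: {mae:.3f}  R^2: {r2:.3f}")
```

Applied to the notebook, the same two metric calls could take `y_test_Tpgmg` with `model_Tpgmg.predict(X_test)`, and likewise for `Ppgmg`.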

In [9]:
# Load Test_Data1 and encode it with the encoders fitted on the training data
test_data1 = pd.read_excel('../Data Sets/Wolf_Hormones.xlsx', sheet_name='Test_Data1')
test_data1['Sex'] = le_sex.transform(test_data1['Sex'])
test_data1['Population'] = le_pop.transform(test_data1['Population'])
test_data1['Colour'] = le_col.transform(test_data1['Colour'])

In [10]:
# Use the same feature columns, in the same order, as when the scaler was fitted
columns_to_transform = ['Sex', 'Population', 'Colour']

In [11]:
# Scale the test data and predict testosterone (Tpgmg)
X_test1 = scaler.transform(test_data1[columns_to_transform])

Tpgmg_predictions = model_Tpgmg.predict(X_test1)
test_data1['Tpgmg'] = Tpgmg_predictions

In [12]:
# Make predictions for Test_Data2
test_data2 = pd.read_excel('../Data Sets/Wolf_Hormones.xlsx', sheet_name='Test_Data2')
test_data2['Sex'] = le_sex.transform(test_data2['Sex'])
test_data2['Population'] = le_pop.transform(test_data2['Population'])
test_data2['Colour'] = le_col.transform(test_data2['Colour'])

In [13]:
# Use the same feature columns, in the same order, as when the scaler was fitted
columns_to_transform = ['Sex', 'Population', 'Colour']

In [14]:
# Scale the test data
X_test2 = scaler.transform(test_data2[columns_to_transform])

In [15]:
# Predict progesterone (Ppgmg) for Test_Data2
Ppgmg_predictions = model_Ppgmg.predict(X_test2)
test_data2['Ppgmg'] = Ppgmg_predictions
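
Instruction (b) asks for progesterone among female wolves, while the progesterone model above is fitted on every row regardless of sex. One option, sketched here under assumptions, is to fit that model on female rows only. The label "F", the column `Population_code` and all values are hypothetical and stand in for the real sheet contents.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training rows; "F" is an assumed label for female wolves.
toy = pd.DataFrame({
    "Sex": ["F", "M", "F", "M", "F", "F"],
    "Population_code": [0, 1, 1, 0, 2, 0],
    "Ppgmg": [30.0, 12.0, 34.0, 11.0, 40.0, 29.0],
})

# Keep only the female rows before fitting the progesterone model.
females = toy[toy["Sex"] == "F"]
model_f = LinearRegression().fit(females[["Population_code"]], females["Ppgmg"])

# Predict for new female wolves described by the same feature column.
new_females = pd.DataFrame({"Population_code": [0, 2]})
pred_f = model_f.predict(new_females)
print(pred_f)
```

Filtering leaves fewer training rows, so whether it improves the predictions would need to be checked against a held-out set.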

In [16]:
# Output the results to Excel
with pd.ExcelWriter('Wolf_Hormones_Predictions.xlsx') as writer:
    test_data1.to_excel(writer, sheet_name='Test_Data1', index=False)
    test_data2.to_excel(writer, sheet_name='Test_Data2', index=False)

In [17]:
# Read the Excel file
predictions = pd.read_excel('Wolf_Hormones_Predictions.xlsx', sheet_name=None) 

# Display the contents of each sheet
for sheet_name, df in predictions.items():
    print(f"Contents of sheet '{sheet_name}':")
    print(df)
    print("\n")

Contents of sheet 'Test_Data1':
    Individual  Sex  Population  Colour  Cpgmg     Tpgmg
0          179    1           1       2  15.86  5.131519
1          180    0           0       0  20.02  4.800921
2          181    0           1       2   9.95  5.131519
3          182    0           0       0  25.22  4.800921
4          183    1           1       0  21.13  5.219956
5          184    1           1       2  12.48  5.131519
6          185    2           2       1  14.17  5.594773
7          186    2           2       1  12.09  5.594773
8          187    2           2       1  54.47  5.594773
9          188    2           2       1  10.40  5.594773
10         189    2           2       1  50.31  5.594773
11         190    2           2       1  33.74  5.594773
12         191    2           2       1  14.76  5.594773
13         192    1           0       2  11.96  4.712484
14         193    1           0       0  14.82  4.800921
15         194    0           0       2  14.43  4.712484

#### Sources
###### analyticsindiamag.com/demonstration-of-what-if-tool-for-machine-learning-model-investigation/
###### github.com/AlhassanMohamed/ML-DL