## 🦪 Abalone Attribute Prediction

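Given *data about abalone*, let's try to predict **multiple attributes** of a given organism, treating each column in turn as the target and the remaining columns as features.

We will use Linear Regression for the numeric targets and Logistic Regression for the categorical ones (`sex`, and `rings` when it is treated as a class label).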

Data source: https://www.kaggle.com/datasets/hurshd0/abalone-uci

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import LinearRegression, LogisticRegression

In [2]:
data = pd.read_csv('abalone_original.csv')
data

,sex,length,diameter,height,whole-weight,shucked-weight,viscera-weight,shell-weight,rings
0,M,91,73,19,102.8,44.9,20.2,30.0,15
1,M,70,53,18,45.1,19.9,9.7,14.0,7
2,F,106,84,27,135.4,51.3,28.3,42.0,9
3,M,88,73,25,103.2,43.1,22.8,31.0,10
4,I,66,51,16,41.0,17.9,7.9,11.0,7
...,...,...,...,...,...,...,...,...,...
4172,F,113,90,33,177.4,74.0,47.8,49.8,11
4173,M,118,88,27,193.2,87.8,42.9,52.1,10
4174,M,120,95,41,235.2,105.1,57.5,61.6,9
4175,F,125,97,30,218.9,106.2,52.2,59.2,10


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4177 entries, 0 to 4176
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   sex             4177 non-null   object 
 1   length          4177 non-null   int64  
 2   diameter        4177 non-null   int64  
 3   height          4177 non-null   int64  
 4   whole-weight    4177 non-null   float64
 5   shucked-weight  4177 non-null   float64
 6   viscera-weight  4177 non-null   float64
 7   shell-weight    4177 non-null   float64
 8   rings           4177 non-null   int64  
dtypes: float64(4), int64(4), object(1)
memory usage: 293.8+ KB


### Preprocessing and Training

In [4]:
df = data.copy()

In [5]:
# Target column: sex

y = df['sex'].copy()
X = df.drop('sex', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LogisticRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Sex Classification Test Accuracy: {:.2f}%".format(results*100))

Sex Classification Test Accuracy: 57.10%
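
To put an accuracy figure like this in context, it can help to compare it against a majority-class baseline on the same split. The sketch below is only an illustration: the `toy` frame and its values are made up (it is not the abalone data), and `DummyClassifier` is scikit-learn's built-in baseline estimator rather than anything used in the cells above.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the abalone table (hypothetical values, not the real data)
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "sex": rng.choice(["M", "F", "I"], size=n, p=[0.4, 0.3, 0.3]),
    "length": rng.normal(100, 20, size=n),
    "rings": rng.integers(3, 20, size=n),
})

y = toy["sex"]
X = toy.drop("sex", axis=1)

# Same split settings as the notebook cells
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, shuffle=True, random_state=1
)

# Always predicts the most frequent class seen in the training set
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print("Majority-class baseline accuracy: {:.2f}%".format(baseline_acc * 100))
```

With three sex categories, a baseline somewhere above one third is plausible, so the gap between the model and the baseline is more informative than the raw accuracy.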


In [8]:
# Target column: length

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['length'].copy()
X = df.drop('length', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Length Regression R^2 Score: {:.4f}".format(results))

Length Regression R^2 Score: 0.9753
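
One detail of the encoding step is worth a quick look: `pd.get_dummies` keeps one column per category, so the three sex columns always sum to 1 in every row and are collinear with the model's intercept. `LinearRegression` still fits, and the predictions should be essentially unaffected, but the individual coefficients are then not uniquely determined. A minimal sketch of the alternative, using a made-up five-row frame and the `drop_first` option:

```python
import pandas as pd

# Tiny illustrative frame (values are made up for the example)
toy = pd.DataFrame({
    "sex": ["M", "F", "I", "M", "I"],
    "length": [91, 106, 66, 88, 70],
})

# Full encoding: one indicator column per category
full = pd.get_dummies(toy["sex"], dtype=int)

# Reduced encoding: the first category (alphabetically) becomes the reference level
reduced = pd.get_dummies(toy["sex"], drop_first=True, dtype=int)

print(list(full.columns))
print(list(reduced.columns))

# Every row of the full encoding sums to exactly 1
row_sums = full.sum(axis=1)
print(row_sums.tolist())
```

Whether to drop a level is a judgment call: it matters when the coefficients are to be interpreted, and much less when only the test score is of interest.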


In [10]:
# Target column: diameter

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['diameter'].copy()
X = df.drop('diameter', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Diameter Regression R^2 Score: {:.4f}".format(results))

Diameter Regression R^2 Score: 0.9758
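
The regression cells repeat the same encode, split, scale and fit steps, with only the target column changing. As a rough sketch of how that could be folded into one function, the helper below (`score_regression_target` is a hypothetical name, and the `toy` data is synthetic) wraps the scaler and the model in a scikit-learn pipeline so the scaler is fitted on the training split only:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_regression_target(data, target):
    """Hypothetical helper: one-hot encode sex, split, scale, fit, return test R^2."""
    dummies = pd.get_dummies(data["sex"], dtype=float)
    df = pd.concat([data.drop("sex", axis=1), dummies], axis=1)

    y = df[target]
    X = df.drop(target, axis=1)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, shuffle=True, random_state=1
    )

    # The pipeline fits the scaler on X_train and reuses it for X_test
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Synthetic stand-in for the abalone table (made-up values)
rng = np.random.default_rng(0)
n = 200
length = rng.normal(100, 20, size=n)
toy = pd.DataFrame({
    "sex": rng.choice(["M", "F", "I"], size=n),
    "length": length,
    "diameter": 0.8 * length + rng.normal(0, 2, size=n),
    "height": 0.25 * length + rng.normal(0, 3, size=n),
})

scores = {t: score_regression_target(toy, t) for t in ["length", "diameter", "height"]}
for target, r2 in scores.items():
    print("{} Regression R^2 Score: {:.4f}".format(target, r2))
```

On the real data the call would presumably look like `score_regression_target(data, 'length')`, though that has not been run here.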


In [12]:
# Target column: height

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['height'].copy()
X = df.drop('height', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Height Regression R^2 Score: {:.4f}".format(results))

Height Regression R^2 Score: 0.8147


In [14]:
# Target column: whole-weight

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['whole-weight'].copy()
X = df.drop('whole-weight', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Whole Weight Regression R^2 Score: {:.4f}".format(results))

Whole Weight Regression R^2 Score: 0.9908


In [15]:
# Target column: shucked-weight

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['shucked-weight'].copy()
X = df.drop('shucked-weight', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Shucked Weight Regression R^2 Score: {:.4f}".format(results))

Shucked Weight Regression R^2 Score: 0.9676


In [16]:
# Target column: viscera-weight

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['viscera-weight'].copy()
X = df.drop('viscera-weight', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Viscera Weight Regression R^2 Score: {:.4f}".format(results))

Viscera Weight Regression R^2 Score: 0.9462


In [17]:
# Target column: shell-weight

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['shell-weight'].copy()
X = df.drop('shell-weight', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Shell Weight Regression R^2 Score: {:.4f}".format(results))

Shell Weight Regression R^2 Score: 0.9511


In [18]:
# Target column: rings (regression)

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['rings'].copy()
X = df.drop('rings', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LinearRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Rings Regression R^2 Score: {:.4f}".format(results))

Rings Regression R^2 Score: 0.5196
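
An R^2 score on its own does not say how far off a typical prediction is. One option, sketched here on synthetic data (the feature matrix and the integer target are made up, not the abalone columns), is to report the mean absolute error next to R^2, which gives the error in the target's own units:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic integer-valued target, loosely imitating a count such as rings
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = np.rint(10 + 2 * X[:, 0] + X[:, 1] + rng.normal(0, 2, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, shuffle=True, random_state=1
)

model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

# r2_score matches what model.score() returns for a regressor
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)

print("R^2 Score: {:.4f}".format(r2))
print("Mean Absolute Error: {:.2f}".format(mae))
```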


In [19]:
# Target column: rings (classification)

df = data.copy()

# One-hot encode sex column
dummies = pd.get_dummies(df['sex'])
df = pd.concat([df, dummies], axis=1)
df = df.drop('sex', axis=1)

y = df['rings'].copy()
X = df.drop('rings', axis=1).copy()

# Train-test split 
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Scale X
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Define Model
model = LogisticRegression()

# Fit model to train set
model.fit(X_train, y_train)

# Return test results
results = model.score(X_test, y_test)
print("Rings Classification Accuracy: {:.2f}%".format(results*100))

Rings Classification Accuracy: 25.92%


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
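
The warning above suggests raising `max_iter`. A minimal sketch of that, on synthetic multi-class data (made up for illustration) and with an arbitrarily chosen limit of 5000, is shown below; it records any `ConvergenceWarning` so the effect of the setting can be checked rather than assumed. Whether a converged fit changes the accuracy on the abalone data is not something this sketch can show.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class data (made up; not the abalone table)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
raw = 5 + 2 * X[:, 0] + X[:, 1] + rng.normal(0, 1, size=500)
y = np.clip(np.rint(raw), 1, 10).astype(int)
X = StandardScaler().fit_transform(X)


def fit_and_count_warnings(max_iter):
    """Fit LogisticRegression and return (model, number of ConvergenceWarnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model = LogisticRegression(max_iter=max_iter).fit(X, y)
    n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, n_conv


# A deliberately tiny limit reproduces the warning; a generous one avoids it
_, warned_small = fit_and_count_warnings(max_iter=1)
model_large, warned_large = fit_and_count_warnings(max_iter=5000)

print("ConvergenceWarnings with max_iter=1:   ", warned_small)
print("ConvergenceWarnings with max_iter=5000:", warned_large)
print("Iterations actually used:", model_large.n_iter_.max())
```

Comparing `n_iter_` with the limit is a quick way to confirm the solver stopped because it converged, not because it ran out of iterations.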
