# Predicting Abalone Age

Abalones are rare marine snails found in cold coastal salt water and are highly valued for culinary use. The common name abalone refers to a number of large gastropod molluscs in the family *Haliotidae*. Their popularity as a delicacy has put great pressure on these species through overharvesting, making them even rarer and more expensive. Assessing the age of these organisms, whether for conservation, harvesting, or research, is a tedious task: the shell must be cut open and stained, and the individual rings counted under a microscope. For this reason, we wish to design a model that will **predict the age of abalones** from easier-to-obtain measurements, such as physical dimensions and weight, using regression.

The dataset we will use contains 4,177 observations and 9 columns: sex (M, F, or I for infant), length in mm, diameter in mm, height in mm, whole weight in grams, shucked weight in grams (meat only, without the shell), viscera weight in grams (gut weight after bleeding), shell weight in grams (after drying), and finally the number of rings; adding 1.5 to the ring count gives the age in years. In the UCI file, the continuous measurements have been scaled by dividing by 200. After designing the model, we will evaluate the accuracy of its predictions to answer the question: how well can we predict the age of an abalone from its size (length and diameter), sex, and weight (whole, shucked, viscera, and shell)?
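The link between the stored values and the quantities described above is simple arithmetic. The sketch below assumes the scaling given in the UCI dataset documentation (every continuous column divided by 200) and the rings + 1.5 rule for age. `to_original_units()` is a hypothetical helper, not part of any package, and the input row uses illustrative values rather than a record from the dataset.

```r
# Hypothetical helper: convert UCI abalone records back to the original
# units (mm and grams) and derive age in years.
# Assumption: every continuous column in the UCI file was divided by 200.
to_original_units <- function(df) {
  continuous <- c("length", "diameter", "height", "whole_weight",
                  "shucked_weight", "viscera_weight", "shell_weight")
  df[continuous] <- lapply(df[continuous], function(x) x * 200)
  # Age in years is approximately the ring count plus 1.5
  df$age <- df$rings + 1.5
  df
}

# Illustrative record (values chosen for demonstration only)
example <- data.frame(
  sex = "M", length = 0.5, diameter = 0.4, height = 0.1,
  whole_weight = 0.5, shucked_weight = 0.2, viscera_weight = 0.1,
  shell_weight = 0.15, rings = 10
)
converted <- to_original_units(example)
print(converted[c("length", "whole_weight", "age")])
```

Because the same factor of 200 applies to every continuous column, the rescaling matters for reporting measurements in their stated units rather than for the relationships between variables.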

## Setup code

```r
library(tidyverse)
library(tidymodels)
# Download the UCI abalone data (comma-separated, no header row)
download.file('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', 'data.csv')
# set.seed() needs a value within R's integer range (at most 2147483647);
# the original 12-digit seed was out of range, so no seed was ever set
set.seed(695624153)
```
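`set.seed()` coerces its argument to an integer, so any value larger in magnitude than `.Machine$integer.max` (2147483647) becomes `NA` and is rejected with "supplied seed is not a valid integer". The base-R check below reproduces that failure for the 12-digit seed and then shows that an in-range seed makes random draws reproducible; `good_seed` is an arbitrary example value, not one prescribed by the analysis.

```r
# R integers are 32-bit, so the largest valid seed magnitude is 2147483647
max_int <- .Machine$integer.max
print(max_int)  # [1] 2147483647

# A 12-digit seed is out of range: set.seed() coerces it to NA and stops
bad_seed <- 695624153456
print(abs(bad_seed) <= max_int)  # FALSE
seed_error <- tryCatch(
  suppressWarnings(set.seed(bad_seed)),
  error = function(e) conditionMessage(e)
)
print(seed_error)

# Any in-range value works and makes random draws reproducible
good_seed <- 695624153  # arbitrary example value within range
set.seed(good_seed)
first_draw <- runif(3)
set.seed(good_seed)
second_draw <- runif(3)
print(identical(first_draw, second_draw))  # TRUE
```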


## Loading and wrangling data

```r
# The raw file has no header row, so supply the column names explicitly
abalone <- read_csv('data.csv', col_names = c(
    'sex',
    'length',
    'diameter',
    'height',
    'whole_weight',
    'shucked_weight',
    'viscera_weight',
    'shell_weight',
    'rings'
))

# Convert sex to a factor, derive age in years (rings + 1.5),
# and keep only the columns used in this analysis
abalone <- abalone %>%
    mutate(sex = as_factor(sex),
           age = rings + 1.5) %>%
    select(sex, length, diameter, height, whole_weight, age)

# 75/25 train/test split, stratified on age
abalone_split <- initial_split(abalone, prop = 0.75, strata = age)
abalone_training <- training(abalone_split)
abalone_testing <- testing(abalone_split)
```

```
Parsed with column specification:
cols(
  sex = col_character(),
  length = col_double(),
  diameter = col_double(),
  height = col_double(),
  whole_weight = col_double(),
  shucked_weight = col_double(),
  viscera_weight = col_double(),
  shell_weight = col_double(),
  rings = col_double()
)
```
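Stratifying the split on `age` aims to give the training and testing sets comparable age distributions. For a numeric stratification variable, rsample's `initial_split()` first bins the variable into quantile-based groups (`breaks = 4` by default) and samples within each group. The base-R function below is a hypothetical sketch of that idea, not rsample's actual implementation, and it runs on simulated ages rather than the abalone data.

```r
# Hypothetical sketch of a stratified train/test split on a numeric outcome.
# Not rsample's implementation: it bins the outcome into quantile groups and
# samples the same proportion of rows from each group.
stratified_split <- function(y, prop = 0.75, n_bins = 4) {
  # Quantile-based bin edges; unique() guards against tied quantiles
  edges <- unique(quantile(y, probs = seq(0, 1, length.out = n_bins + 1)))
  bins <- cut(y, breaks = edges, include.lowest = TRUE)
  # Draw floor(prop * bin size) row indices from each bin
  train <- unlist(lapply(split(seq_along(y), bins), function(idx) {
    idx[sample.int(length(idx), size = floor(prop * length(idx)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(y), train))
}

# Simulated ages (ring counts plus 1.5), for illustration only
set.seed(1)
ages <- rpois(400, lambda = 10) + 1.5
parts <- stratified_split(ages)

# Compare the age distribution in each part
print(summary(ages[parts$train]))
print(summary(ages[parts$test]))
```

Because ages are discrete (ring counts plus 1.5), many values tie at the quantile edges, so the bins can differ in size; sampling the same proportion within each bin still keeps the two parts' age distributions close.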



## Summarization

## Visualization

## Methods