# Alexine Studios

![Penguin](https://github.com/chiruharshith/Alexine_Studios/blob/main/media/penguine.png?raw=true)

The dataset consists of the following 7 columns:

- **species:** penguin species (Chinstrap, Adélie, or Gentoo)
- **culmen length & depth:** two columns holding the length and depth of the culmen, the upper ridge of a bird's beak
- **flipper_length_mm:** flipper length in millimetres
- **body_mass_g:** body mass in grams
- **island:** island name (Dream, Torgersen, or Biscoe)
- **sex:** penguin sex

## Download Dataset

```python
# Shell commands (Jupyter/Colab `!` syntax): create the folder and download the CSV into it
!mkdir -p datasets
!wget -qq https://raw.githubusercontent.com/chiruharshith/Alexine_Studios/main/datasets/penguins.csv -P datasets
```

### Importing Required Packages

```python
import pandas as pd
from sklearn.model_selection import train_test_split
```

```python
# Load the data
df = pd.read_csv('datasets/penguins.csv')
df.head(3)
```

```python
# Count NaN values in each column of the dataframe
df.isna().sum()
```
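`isna()` returns a boolean DataFrame of the same shape, and `sum()` adds up the `True` values column by column, so the result is the number of missing entries per column. A minimal sketch on a hypothetical three-row frame (the values are made up, not taken from the penguin data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing entries (not the real dataset)
toy = pd.DataFrame({
    "body_mass_g": [3750.0, np.nan, 3250.0],
    "sex": ["MALE", np.nan, np.nan],
})

mask = toy.isna()                 # boolean frame: True where a value is missing
missing_per_column = mask.sum()   # True counts as 1, so this counts NaNs per column
print(missing_per_column)
```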

```python
# List the distinct values in the sex column
df['sex'].unique()
```

```python
# Drop the records where the sex column is NaN
df.dropna(subset=['sex'], inplace=True)

# List the distinct values in the sex column after dropping
print("Unique values after dropping NA values:", df.sex.unique())
```
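Passing `subset=['sex']` restricts the NaN check to that one column, so a row missing some other value is kept as long as `sex` is present. A small comparison on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 lacks only body mass, row 2 lacks only sex
toy = pd.DataFrame({
    "body_mass_g": [3750.0, np.nan, 3250.0],
    "sex": ["MALE", "FEMALE", np.nan],
})

only_sex = toy.dropna(subset=["sex"])  # drops row 2 only
any_column = toy.dropna()              # drops every row with a NaN anywhere (rows 1 and 2)

print(only_sex.index.tolist())    # [0, 1]
print(any_column.index.tolist())  # [0]
```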

```python
# Show the rows whose sex is the invalid placeholder '.'
df[df.sex == '.']
```

```python
# Preview: check that dropping row 336 leaves only valid sex values (df is not modified yet)
df.drop(336).sex.unique()
```

```python
# Drop row 336 (the '.' record) permanently
df.drop(336, inplace=True)
```
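Dropping by the label `336` works only because that is where the `'.'` entry sits in this particular CSV. A sketch of a label-independent alternative, which filters on the value instead and is shown here on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame reproducing the problem: one sex entry is the placeholder "."
toy = pd.DataFrame(
    {"sex": ["MALE", ".", "FEMALE"]},
    index=[335, 336, 337],
)

# Keep only rows whose sex is one of the expected labels, wherever they sit in the index
valid = toy[toy["sex"].isin(["MALE", "FEMALE"])]
print(valid["sex"].unique())  # ['MALE' 'FEMALE']
```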

```python
df.head(2)
```

```python
# Encode the categorical columns as integer codes
df['species'] = df['species'].map({'Adelie': 0, 'Chinstrap': 1, 'Gentoo': 2})
df['island'] = df['island'].map({'Dream': 0, 'Biscoe': 1, 'Torgersen': 2})
df['sex'] = df['sex'].map({'FEMALE': 0, 'MALE': 1})
```
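An explicit mapping dict documents the label-to-code table, and checking for NaN afterwards catches any label the mapping missed (a stray `'.'`, for instance). A minimal sketch using a hypothetical helper `encode_column` (not part of pandas):

```python
import pandas as pd

def encode_column(series: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: map labels to integer codes, failing loudly on unknown labels."""
    encoded = series.map(mapping)  # labels absent from `mapping` become NaN
    unknown = series[encoded.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped labels: {list(unknown)}")
    return encoded.astype(int)

species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Adelie"])
codes = encode_column(species, {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
print(codes.tolist())  # [0, 2, 1, 0]
```

Note that integer codes for `island` impose an arbitrary order that a linear model can treat as meaningful; one-hot encoding (for example with `pd.get_dummies`) is a common alternative.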

```python
df.head(2)
```

```python
# Store the features and the target labels in two separate variables, x and y
x = df.drop(['species'], axis=1)
y = df['species']
```

```python
x.shape, y.shape
```

```python
# Split the data into train and test sets in an 80:20 ratio,
# i.e. 80% of the rows for training and 20% for testing.
# random_state fixes the shuffle so the split (and the scores below) are reproducible.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
```
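A plain random 20% test set can over- or under-represent a species. Passing `stratify=y` to `train_test_split` keeps the class proportions of the test set close to those of the full data. A sketch on hypothetical labels with a 2:1:1 class ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 80 samples, classes 0/1/2 in a 2:1:1 ratio
y = np.array([0] * 40 + [1] * 20 + [2] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # dummy single feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 16 test samples split as 8/4/4, mirroring the 2:1:1 ratio
print(np.bincount(y_te))  # [8 4 4]
```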

```python
x_train.shape, x_test.shape, y_train.shape, y_test.shape
```

Learn more about [`train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) in the scikit-learn documentation.

### Training a Linear Classifier

```python
from sklearn.linear_model import SGDClassifier

# A linear classifier trained with stochastic gradient descent
linear_classifier = SGDClassifier(random_state=42)

# Train (fit) the model on the training data
linear_classifier.fit(x_train, y_train)

# Predict species for the test set
y_pred = linear_classifier.predict(x_test)
```
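With default arguments, `SGDClassifier` uses the hinge loss (a linear SVM) and handles more than two classes with a one-versus-all scheme, training one binary classifier per class. The fitted model therefore stores one weight vector per class in `coef_`. A sketch on hypothetical, well-separated 2-D clusters:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Hypothetical data: three 2-D clusters centred far apart
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

clf = SGDClassifier(random_state=0).fit(X, y)

print(clf.loss)         # hinge (the default)
print(clf.coef_.shape)  # (3, 2): one weight vector per class, one weight per feature
```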

```python
from sklearn.metrics import accuracy_score

# accuracy_score expects (y_true, y_pred)
print(accuracy_score(y_test, y_pred))
```
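Accuracy is the fraction of test samples whose predicted label equals the true label. `accuracy_score(y_true, y_pred)` takes the true labels first; accuracy is symmetric so swapping them gives the same number, but the order matters for metrics such as precision. A by-hand check on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted species codes for 8 test penguins
y_true = np.array([0, 0, 1, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 2, 0, 1, 0, 2])

manual = np.mean(y_true == y_pred)     # 6 matches out of 8
print(manual)                          # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```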

### Scaling the data

```python
# Standardize each feature to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data only, then apply the same transform to the test data
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

linear_classifier = SGDClassifier(random_state=42)

# Train (fit) the model on the scaled training data
linear_classifier.fit(x_train_scaled, y_train)

# Predict species for the scaled test set
y_pred_scaled = linear_classifier.predict(x_test_scaled)

print(accuracy_score(y_test, y_pred_scaled))
```
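`StandardScaler` subtracts each column's training-set mean and divides by its training-set standard deviation (population form, `ddof=0`); the test set is transformed with those same training statistics, which is why only `x_train` is passed to `fit_transform`. SGD is sensitive to feature scale, which motivates this step. The sketch below, on hypothetical random data, checks that formula and then bundles both steps with `make_pipeline` so the scaler can only ever be fit on the data passed to `fit`:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features on very different scales, e.g. millimetres vs grams
X_train = np.column_stack([rng.normal(45, 5, 100), rng.normal(4200, 800, 100)])
X_test = np.column_stack([rng.normal(45, 5, 20), rng.normal(4200, 800, 20)])
y_train = (X_train[:, 0] > 45).astype(int)

scaler = StandardScaler().fit(X_train)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)  # np.std defaults to ddof=0
print(np.allclose(scaler.transform(X_test), manual))  # True

# Pipeline: fit() learns the scaling from the training data, then trains the classifier
model = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
model.fit(X_train, y_train)
print(model.predict(X_test).shape)  # (20,)
```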