# OVERVIEW
This dataset describes a set of fruits through seven numeric attributes and a quality label. Each row holds a fruit ID, size, weight, sweetness, crunchiness, juiciness, ripeness, acidity, and quality.

A_id: Unique identifier for each fruit

Size: Size of the fruit

Weight: Weight of the fruit

Sweetness: Degree of sweetness of the fruit

Crunchiness: Texture indicating the crunchiness of the fruit

Juiciness: Level of juiciness of the fruit

Ripeness: Stage of ripeness of the fruit

Acidity: Acidity level of the fruit

Quality: Overall quality of the fruit, labeled "good" or "bad"

### Potential Use Cases
Fruit Classification: Develop a classification model to categorize fruits based on their features.

Quality Prediction: Build a model to predict the quality rating of fruits using various attributes.

# EXPLORATORY DATA ANALYSIS

In [1]:
#importing the necessary libraries
import pandas as pd
import numpy as np

In [2]:
#loading the dataset
Fruit = pd.read_csv('apple_quality.csv')

In [3]:
#inspecting the first 10 rows
Fruit.head(10)

Unnamed: 0,A_id,Size,Weight,Sweetness,Crunchiness,Juiciness,Ripeness,Acidity,Quality
0,0,-3.970049,-2.512336,5.34633,-1.012009,1.8449,0.32984,-0.49159,good
1,1,-1.195217,-2.839257,3.664059,1.588232,0.853286,0.86753,-0.722809,good
2,2,-0.292024,-1.351282,-1.738429,-0.342616,2.838636,-0.038033,2.621636,bad
3,3,-0.657196,-2.271627,1.324874,-0.097875,3.63797,-3.413761,0.790723,good
4,4,1.364217,-1.296612,-0.384658,-0.553006,3.030874,-1.303849,0.501984,good
5,5,-3.4254,-1.409082,-1.913511,-0.555775,-3.853071,1.914616,-2.981523,bad
6,6,1.331606,1.635956,0.875974,-1.677798,3.106344,-1.847417,2.414171,good
7,7,-1.995462,-0.428958,1.530644,-0.742972,0.158834,0.974438,-1.470125,good
8,8,-3.867632,-3.734514,0.986429,-1.207655,2.292873,4.080921,-4.871905,bad
9,9,-0.727983,-0.44282,-4.092223,0.597513,0.393714,1.620857,2.185608,bad


In [4]:
#checking for missing values and data types
Fruit.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   A_id         4000 non-null   int64  
 1   Size         4000 non-null   float64
 2   Weight       4000 non-null   float64
 3   Sweetness    4000 non-null   float64
 4   Crunchiness  4000 non-null   float64
 5   Juiciness    4000 non-null   float64
 6   Ripeness     4000 non-null   float64
 7   Acidity      4000 non-null   float64
 8   Quality      4000 non-null   object 
dtypes: float64(7), int64(1), object(1)
memory usage: 281.4+ KB
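The non-null counts above already show that no column has missing values. As a more direct check, missing cells can be counted per column with `isnull().sum()`. The sketch below uses a small made-up frame (`demo`, not the real `Fruit` data) with one value blanked out on purpose so that the count is non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Fruit DataFrame, with one missing value added on purpose
demo = pd.DataFrame({
    "Size": [-3.97, -1.20, np.nan],
    "Weight": [-2.51, -2.84, -1.35],
    "Quality": ["good", "good", "bad"],
})

# Count missing values in each column
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Total number of missing cells in the whole frame
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)
```

Running the same two calls on `Fruit` should report zero for every column, in line with the `info()` output.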


In [5]:
#checking the count of rows and columns
Fruit.shape

(4000, 9)

In [6]:
#inspecting the column headers
Fruit.columns

Index(['A_id', 'Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness',
       'Ripeness', 'Acidity', 'Quality'],
      dtype='object')

In [7]:
#checking for duplicates
Fruit[Fruit.duplicated()]

Unnamed: 0,A_id,Size,Weight,Sweetness,Crunchiness,Juiciness,Ripeness,Acidity,Quality
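The empty result above means no fully duplicated rows were found. When only a count is needed, `duplicated().sum()` is a compact alternative. The sketch below shows it on a made-up frame (`demo`, not the real data) that contains one repeated row, together with `drop_duplicates()` for removing it.

```python
import pandas as pd

# Hypothetical frame with one exact duplicate row (the last row repeats the second)
demo = pd.DataFrame({
    "A_id": [0, 1, 1],
    "Size": [-3.97, -1.20, -1.20],
    "Quality": ["good", "good", "good"],
})

# duplicated() marks every repeat of an earlier row as True
n_duplicates = int(demo.duplicated().sum())
print("Duplicate rows:", n_duplicates)

# Keep the first occurrence of each row and rebuild a clean index
deduplicated = demo.drop_duplicates().reset_index(drop=True)
print(deduplicated)
```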


# CREATING A CLASSIFICATION MODEL

In [17]:
# Importing the necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report


# Extracting features (X) and target variable (y)
X = Fruit[['Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Ripeness', 'Acidity']]
y = Fruit['Quality']

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)

# Fit the model on the training data
clf.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = clf.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
classification_report_result = classification_report(y_test, y_pred)

# Printing the results
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(classification_report_result)


Accuracy: 0.81
Classification Report:
              precision    recall  f1-score   support

         bad       0.81      0.81      0.81       401
        good       0.81      0.81      0.81       399

    accuracy                           0.81       800
   macro avg       0.81      0.81      0.81       800
weighted avg       0.81      0.81      0.81       800



### Accuracy:
###### Accuracy: 0.81
Accuracy is the share of predictions that are correct. Here, the model labeled 81% of the 800 fruits in the test set correctly.
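To make the definition concrete, the sketch below recomputes accuracy from a confusion matrix. The labels in `y_true_demo` and `y_pred_demo` are made-up values for illustration, not the notebook's `y_test` and `y_pred`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only (not the real test set)
y_true_demo = ["bad", "bad", "bad", "bad", "good", "good", "good", "good"]
y_pred_demo = ["bad", "bad", "bad", "good", "good", "good", "good", "bad"]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=["bad", "good"])

# Correct predictions sit on the diagonal of the matrix
manual_accuracy = cm.trace() / cm.sum()

print(cm)
print("Manual accuracy:", manual_accuracy)
print("accuracy_score: ", accuracy_score(y_true_demo, y_pred_demo))
```

Both printed values should agree, since `accuracy_score` performs the same count of correct predictions.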

### Precision:
###### Precision for "bad" class: 0.81
###### Precision for "good" class: 0.81
Precision for a class is the fraction of fruits predicted as that class that actually belong to it. Here, when the model predicts "bad" it is correct about 81% of the time, and the same holds when it predicts "good".
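Per-class precision can be obtained from scikit-learn with `precision_score` and `average=None`. The sketch below uses made-up labels (`y_true_demo`, `y_pred_demo`), chosen so the two classes get different values. It is an illustration, not a recomputation of the report above.

```python
from sklearn.metrics import precision_score

# Made-up labels for illustration only: 4 actual "bad" and 6 actual "good" fruits
y_true_demo = ["bad"] * 4 + ["good"] * 6
y_pred_demo = ["bad", "bad", "bad", "good",
               "good", "good", "good", "good", "bad", "bad"]

# average=None returns one precision value per class, in the order given by labels
precision_per_class = precision_score(
    y_true_demo, y_pred_demo, labels=["bad", "good"], average=None
)

for label, value in zip(["bad", "good"], precision_per_class):
    print(f"Precision for {label}: {value:.2f}")
```

In this toy data, 5 fruits are predicted "bad" and 3 of them really are, giving 0.60. Of the 5 predicted "good", 4 are correct, giving 0.80.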

### Recall:
###### Recall for "bad" class: 0.81
###### Recall for "good" class: 0.81
Recall, also known as sensitivity or true positive rate, is the fraction of the fruits actually belonging to a class that the model assigns to that class. Here, the model correctly identifies about 81% of the actual "bad" fruits and 81% of the actual "good" fruits.
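Recall can also be read off a confusion matrix: each diagonal entry divided by its row total. The sketch below does this by hand on made-up labels (`y_true_demo`, `y_pred_demo`, not the real test set) and compares the result with `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels for illustration only: 4 actual "bad" and 6 actual "good" fruits
y_true_demo = ["bad"] * 4 + ["good"] * 6
y_pred_demo = ["bad", "bad", "bad", "good",
               "good", "good", "good", "good", "bad", "bad"]

labels = ["bad", "good"]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Each row holds one actual class, so its sum is the number of actual instances
manual_recall = cm.diagonal() / cm.sum(axis=1)

# The library result, one value per class in the same order
sklearn_recall = recall_score(y_true_demo, y_pred_demo, labels=labels, average=None)

print("Manual recall: ", manual_recall)
print("recall_score:  ", sklearn_recall)
```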

### F1-Score:
###### F1-score for "bad" class: 0.81
###### F1-score for "good" class: 0.81
F1-score is the harmonic mean of precision and recall, so it is high only when both are high. Its maximum value is 1, indicating perfect precision and recall.
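The F1-score can be computed as the harmonic mean of precision and recall, `2 * P * R / (P + R)`. The sketch below checks that formula against `f1_score` for the "bad" class, using made-up labels (`y_true_demo`, `y_pred_demo`) rather than the notebook's test set.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration only: 4 actual "bad" and 6 actual "good" fruits
y_true_demo = ["bad"] * 4 + ["good"] * 6
y_pred_demo = ["bad", "bad", "bad", "good",
               "good", "good", "good", "good", "bad", "bad"]

# Treat "bad" as the positive class for this calculation
p = precision_score(y_true_demo, y_pred_demo, pos_label="bad")
r = recall_score(y_true_demo, y_pred_demo, pos_label="bad")

# Harmonic mean of precision and recall
manual_f1 = 2 * p * r / (p + r)

# Plain arithmetic mean, shown only for comparison
arithmetic_mean = (p + r) / 2

print(f"precision={p:.3f} recall={r:.3f}")
print(f"harmonic mean (F1)={manual_f1:.3f} arithmetic mean={arithmetic_mean:.3f}")
print(f"f1_score={f1_score(y_true_demo, y_pred_demo, pos_label='bad'):.3f}")
```

The harmonic mean is pulled toward the smaller of the two values, which is why F1 is slightly below the arithmetic mean whenever precision and recall differ.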

### Support:
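Support is the number of actual occurrences of each class in the test set: 401 instances of the "bad" class and 399 instances of the "good" class.

Macro avg: The unweighted mean of the per-class scores. The weighted avg row is the mean of the per-class scores weighted by support. Both are 0.81 here because the two classes score the same.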
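The two averaging rows of the report differ only in how per-class scores are combined: the macro average is a plain mean, and the weighted average uses each class's support as its weight. The sketch below reproduces both for precision on made-up labels (`y_true_demo`, `y_pred_demo`), with deliberately unequal classes so the two averages differ.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, precision_score

# Made-up labels for illustration only: 4 actual "bad" and 6 actual "good" fruits
y_true_demo = ["bad"] * 4 + ["good"] * 6
y_pred_demo = ["bad", "bad", "bad", "good",
               "good", "good", "good", "good", "bad", "bad"]

labels = ["bad", "good"]

# Per-class precision, recall, F1, and support (number of actual instances)
precision, recall, f1, support = precision_recall_fscore_support(
    y_true_demo, y_pred_demo, labels=labels
)

# Macro average: every class counts equally
macro_precision = precision.mean()

# Weighted average: every class counts in proportion to its support
weighted_precision = np.average(precision, weights=support)

print("support:", support)
print(f"macro precision:    {macro_precision:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
```

With nearly equal class sizes, as in the 401 versus 399 split of the test set, the two averages end up almost identical.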




###### Overall, an accuracy of 0.81 is a decent performance, but whether it is good enough depends on the requirements of your application. The test set is nearly balanced (401 "bad" and 399 "good" fruits), so accuracy is a fair summary here. With imbalanced classes, the per-class precision, recall, and F1-score give a more detailed picture of the model's performance.
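A quick way to judge whether classes are balanced is to count the labels with `value_counts()`. The sketch below runs it on a small made-up Series (`quality_demo`). Applying the same calls to `Fruit['Quality']` or `y_test` would be the assumed usage in this notebook.

```python
import pandas as pd

# Made-up labels for illustration only
quality_demo = pd.Series(
    ["good", "bad", "good", "good", "bad", "bad", "good", "bad"], name="Quality"
)

# Absolute number of fruits per class
class_counts = quality_demo.value_counts()

# Share of each class, as a fraction of all rows
class_share = quality_demo.value_counts(normalize=True)

print(class_counts)
print(class_share)
```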

# MAKING PREDICTIONS WITH THE MODEL

In [26]:
# Inputting values for a new fruit
new_fruit_features = [[-2.0, -1.0, 3.5, 0.5, 2.0, 0.0, 1.5]]  # Replace these values with your own

# Creating a DataFrame with feature names
new_fruit_features_df = pd.DataFrame(data=new_fruit_features, columns=['Size', 'Weight', 'Sweetness', 'Crunchiness', 'Juiciness', 'Ripeness', 'Acidity'])

# Making predictions
prediction = clf.predict(new_fruit_features_df)

# Printing the predicted class
print("Predicted Quality:", prediction[0])


Predicted Quality: bad
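Besides the predicted label, a fitted `DecisionTreeClassifier` exposes `predict_proba`, which returns one probability per class in the order given by `classes_`. The sketch below trains a tiny tree on made-up data (`toy_X`, `toy_y`, with only two of the seven features) purely to show the call. It is not the notebook's model, and its output says nothing about the real prediction above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up training data for illustration only
toy_X = pd.DataFrame({
    "Sweetness": [3.0, 2.5, 1.0, 2.0, -1.0, -2.0, -0.5, -1.5],
    "Acidity":   [0.5, -1.0, 1.0, 0.0, 2.0, -0.5, 1.5, 0.0],
})
toy_y = ["good", "good", "good", "good", "bad", "bad", "bad", "bad"]

toy_clf = DecisionTreeClassifier(random_state=42).fit(toy_X, toy_y)

# A new sample with the same column names as the training data
new_sample = pd.DataFrame([[3.5, 1.5]], columns=["Sweetness", "Acidity"])

# One probability per class, aligned with toy_clf.classes_
proba = toy_clf.predict_proba(new_sample)[0]
for label, p in zip(toy_clf.classes_, proba):
    print(f"P({label}) = {p:.2f}")
```

For a fully grown tree the leaves are usually pure, so the probabilities tend to be exactly 0 or 1. Limiting the tree, for example with `max_depth`, typically gives less extreme values.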
