Models built while progressing through IBM open courseware, along with other personal projects.
Course info can be found here.
- Student's Final Grade Predictor
Uses Student Performance Data Set from UCI ML Repository.
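A minimal sketch of how such a predictor could look, assuming the semicolon-separated student-mat.csv file from that data set, a plain linear regression on the numeric columns, and G3 (the final grade) as the target; the actual notebook may use a different model or feature set:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# UCI Student Performance data is semicolon-separated; G3 is the final grade.
df = pd.read_csv("student-mat.csv", sep=";")

# Keep only the numeric columns for this sketch and predict G3 from the rest.
num = df.select_dtypes("number")
X, y = num.drop(columns="G3"), num["G3"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on the held-out split:", r2_score(y_test, model.predict(X_test)))
```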
- GDP Predictor
Uses yearly GDP data. Can be obtained from various sources such as The World Bank.
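One possible approach, sketched below, is a non-linear (sigmoid) curve fit of GDP against year with scipy's curve_fit; the file name, column names and model choice are assumptions, not necessarily what the notebook does:

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def sigmoid(x, beta1, beta2):
    # Logistic growth curve, a common shape for long-run GDP trends.
    return 1.0 / (1.0 + np.exp(-beta1 * (x - beta2)))

df = pd.read_csv("gdp_yearly.csv")              # hypothetical columns: Year, Value

# Normalize both axes so the optimizer converges easily.
x = df["Year"].to_numpy() / df["Year"].max()
y = df["Value"].to_numpy() / df["Value"].max()

popt, _ = curve_fit(sigmoid, x, y)
print("Fitted parameters (beta1, beta2):", popt)
print("Normalized GDP predicted for the final year:", sigmoid(x[-1], *popt))
```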
- Iris Species Decision Tree
Uses the Iris dataset.
Model Accuracy: 0.977
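A minimal sketch of the workflow (file name, columns and split size are assumptions; the 0.977 score above comes from the original notebook):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = pd.read_csv("Iris.csv")                               # assumed CSV with a Species column
X = iris.drop(columns=["Id", "Species"], errors="ignore")    # feature columns only
y = iris["Species"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, tree.predict(X_test)))
```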
- Iris Species Identification using KNN
Uses the Iris dataset.
The best accuracy was 1.0, obtained with k = 8.
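A sketch of the k sweep (the split and the k range are assumptions; the notebook's best result is quoted above):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = pd.read_csv("Iris.csv")
X = iris.drop(columns=["Id", "Species"], errors="ignore")
y = iris["Species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Try a range of k values and keep the one with the best test accuracy.
scores = {}
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, knn.predict(X_test))

best_k = max(scores, key=scores.get)
print(f"Best k = {best_k} with accuracy {scores[best_k]:.3f}")
```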
- Iris Species Identification using Logistic Regression
Uses the Iris dataset.
Prediction using Raw Data
Logloss = 0.863
Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-setosa | 1.00 | 1.00 | 1.00 | 11 |
| Iris-versicolor | 0.00 | 0.00 | 0.00 | 13 |
| Iris-virginica | 0.32 | 1.00 | 0.48 | 6 |
| accuracy | | | 0.57 | 30 |
| macro avg | 0.44 | 0.67 | 0.49 | 30 |
| weighted avg | 0.43 | 0.57 | 0.46 | 30 |
Prediction using Normalized Data
Logloss = 0.855
Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-setosa | 1.00 | 1.00 | 1.00 | 11 |
| Iris-versicolor | 1.00 | 0.23 | 0.38 | 13 |
| Iris-virginica | 0.38 | 1.00 | 0.55 | 6 |
| accuracy | | | 0.67 | 30 |
| macro avg | 0.79 | 0.74 | 0.64 | 30 |
| weighted avg | 0.88 | 0.67 | 0.64 | 30 |
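A sketch of the raw-versus-normalized comparison, assuming StandardScaler for the normalization step (the original notebook may scale differently, which would change the scores):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, classification_report

iris = pd.read_csv("Iris.csv")
X = iris.drop(columns=["Id", "Species"], errors="ignore")
y = iris["Species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def fit_and_report(X_tr, X_te, label):
    clf = LogisticRegression(max_iter=200).fit(X_tr, y_train)
    print(label, "log loss:", log_loss(y_test, clf.predict_proba(X_te), labels=clf.classes_))
    print(classification_report(y_test, clf.predict(X_te), zero_division=0))

fit_and_report(X_train, X_test, "Raw")                        # unscaled features

scaler = StandardScaler().fit(X_train)                        # fit on the training split only
fit_and_report(scaler.transform(X_train), scaler.transform(X_test), "Normalized")
```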
- Iris Species Identification using various SVM kernels
Uses the Iris dataset.
Best accuracy was obtained with the linear kernel (accuracy score 0.85).
Linear Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-Versicolor | 0.82 | 0.90 | 0.86 | 10 |
| Iris-Virginica | 0.89 | 0.80 | 0.84 | 10 |
| accuracy | | | 0.85 | 20 |
| macro avg | 0.85 | 0.85 | 0.85 | 20 |
| weighted avg | 0.85 | 0.85 | 0.85 | 20 |
Poly Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-Versicolor | 0.75 | 0.90 | 0.82 | 10 |
| Iris-Virginica | 0.88 | 0.70 | 0.78 | 10 |
| accuracy | | | 0.80 | 20 |
| macro avg | 0.81 | 0.80 | 0.80 | 20 |
| weighted avg | 0.81 | 0.80 | 0.80 | 20 |
RBF Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-Versicolor | 0.75 | 0.90 | 0.82 | 10 |
| Iris-Virginica | 0.88 | 0.70 | 0.78 | 10 |
| accuracy | | | 0.80 | 20 |
| macro avg | 0.81 | 0.80 | 0.80 | 20 |
| weighted avg | 0.81 | 0.80 | 0.80 | 20 |
Sigmoid Classification Report

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| Iris-Versicolor | 0.50 | 1.00 | 0.67 | 10 |
| Iris-Virginica | 0.00 | 0.00 | 0.00 | 10 |
| accuracy | | | 0.50 | 20 |
| macro avg | 0.25 | 0.50 | 0.33 | 20 |
| weighted avg | 0.25 | 0.50 | 0.33 | 20 |
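A sketch of looping over the four kernels; since the reports above only contain Versicolor and Virginica, the sketch assumes Setosa was dropped beforehand (file name and split are also assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

iris = pd.read_csv("Iris.csv")
iris = iris[iris["Species"] != "Iris-setosa"]                 # keep only the two harder classes

X = iris.drop(columns=["Id", "Species"], errors="ignore")
y = iris["Species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(kernel, "accuracy:", accuracy_score(y_test, pred))
    print(classification_report(y_test, pred, zero_division=0))
```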
- Spam/Ham Classification (.py, .ipynb)
Uses SMS Spam Collection Data Set from UCI (Original source, dataset raw file)
Wordclouds
Evaluation: Classification Report, Log loss, Matthews Correlation Coefficient and Confusion Matrix

| | precision | recall | f1-score | support |
| --- | --- | --- | --- | --- |
| ham | 0.97 | 1.00 | 0.99 | 976 |
| spam | 0.99 | 0.81 | 0.89 | 139 |
| accuracy | | | 0.98 | 1115 |
| macro avg | 0.98 | 0.91 | 0.94 | 1115 |
| weighted avg | 0.98 | 0.98 | 0.97 | 1115 |
Log loss: 0.836
Matthews Correlation Coefficient: 0.885
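A sketch of one possible pipeline, assuming a CountVectorizer bag-of-words plus Multinomial Naive Bayes (the model actually used in the notebook may differ):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (classification_report, log_loss,
                             matthews_corrcoef, confusion_matrix)

# SMSSpamCollection is tab-separated: label ("ham"/"spam") then the message text.
df = pd.read_csv("SMSSpamCollection", sep="\t", header=None, names=["label", "text"])

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=1)

vec = CountVectorizer(stop_words="english")
clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)

pred = clf.predict(vec.transform(X_test))
proba = clf.predict_proba(vec.transform(X_test))
print(classification_report(y_test, pred))
print("Log loss:", log_loss(y_test, proba, labels=clf.classes_))
print("Matthews Correlation Coefficient:", matthews_corrcoef(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
```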
- Iris Species Clustering
Uses the Iris dataset.
k = 3 clusters gives the best Rand index score, 0.73.
This evaluation is possible because the original labels (the Species column) were retained as `label_true`, and `label_pred` was compared against `label_true` using the Rand index.
Optimization with the elbow method was also performed using both distortion and inertia; both confirm that k = 3 is the best number of clusters.
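A sketch of the K-Means evaluation loop, assuming sklearn's Rand-index scorers were used; it tracks inertia for the elbow step (the notebook also uses distortion):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import rand_score, adjusted_rand_score

iris = pd.read_csv("Iris.csv")
X = iris.drop(columns=["Id", "Species"], errors="ignore")
label_true = iris["Species"]

inertias = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    label_pred = km.labels_
    inertias[k] = km.inertia_                       # collected for the elbow plot
    print(k, "Rand index:", round(rand_score(label_true, label_pred), 3),
          "adjusted:", round(adjusted_rand_score(label_true, label_pred), 3))

print("Inertia by k (look for the elbow):", inertias)
```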
- Compressing Image Color
Uses K-Means clustering to reduce the image's original color palette to a predefined number of clusters.
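A sketch of the K-Means color quantization idea (file names and the cluster count are placeholders):

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

N_COLORS = 16                                   # target size of the reduced palette
img = np.asarray(Image.open("input.jpg").convert("RGB"))
h, w, _ = img.shape

# Cluster all pixels in RGB space, then replace each pixel by its cluster centre.
pixels = img.reshape(-1, 3).astype(float)
km = KMeans(n_clusters=N_COLORS, n_init=10, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_].astype(np.uint8).reshape(h, w, 3)

Image.fromarray(quantized).save("compressed.png")
```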
- Iris Species Clustering
Uses the Iris dataset.
`iris.groupby(['cluster_', 'Species'])[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]].mean()`

| cluster_ | Species | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
| --- | --- | --- | --- | --- | --- |
| 0 | Iris-setosa | 5.006000 | 3.418000 | 1.464000 | 0.244000 |
| 1 | Iris-versicolor | 6.700000 | 3.000000 | 5.000000 | 1.700000 |
| 1 | Iris-virginica | 6.893939 | 3.118182 | 5.806061 | 2.133333 |
| 2 | Iris-versicolor | 5.920408 | 2.765306 | 4.244898 | 1.318367 |
| 2 | Iris-virginica | 5.994118 | 2.694118 | 5.058824 | 1.817647 |
Evaluation using the Species column as ground truth:
Homogeneity Score: 0.744
Adjusted Mutual Info Score: 0.753
Normalized Mutual Info Score: 0.756
V-measure Score: 0.756
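The specific clustering algorithm is not stated above, so the sketch below uses agglomerative clustering with three clusters purely as a placeholder; it reproduces the groupby summary and the external scores:

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import (homogeneity_score, adjusted_mutual_info_score,
                             normalized_mutual_info_score, v_measure_score)

iris = pd.read_csv("Iris.csv")
X = iris.drop(columns=["Id", "Species"], errors="ignore")
label_true = iris["Species"]
feature_cols = X.columns.tolist()

# Placeholder algorithm: the notebook's actual clustering method may differ.
iris["cluster_"] = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Per-cluster feature means, grouped by cluster label and true species.
print(iris.groupby(["cluster_", "Species"])[feature_cols].mean())

# External evaluation against the Species column.
label_pred = iris["cluster_"]
print("Homogeneity Score:", homogeneity_score(label_true, label_pred))
print("Adjusted Mutual Info Score:", adjusted_mutual_info_score(label_true, label_pred))
print("Normalized Mutual Info Score:", normalized_mutual_info_score(label_true, label_pred))
print("V-measure Score:", v_measure_score(label_true, label_pred))
```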
- Iris Species Clustering
Uses the Iris dataset.
Evaluation using the Species column as ground truth:
Estimated number of clusters: 2
Estimated number of noise points: 3
Homogeneity: 0.576
Completeness: 0.877
V-measure: 0.696
Adjusted Rand Index: 0.554
Adjusted Mutual Info: 0.690
Silhouette Coefficient: 0.555
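The noise-point count above suggests a density-based method such as DBSCAN; a sketch under that assumption, with placeholder eps/min_samples values and standardized features:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (homogeneity_score, completeness_score, v_measure_score,
                             adjusted_rand_score, adjusted_mutual_info_score,
                             silhouette_score)

iris = pd.read_csv("Iris.csv")
X = StandardScaler().fit_transform(iris.drop(columns=["Id", "Species"], errors="ignore"))
label_true = iris["Species"]

db = DBSCAN(eps=0.8, min_samples=5).fit(X)      # eps/min_samples are placeholder values
labels = db.labels_                             # -1 marks noise points

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters:", n_clusters)
print("Estimated number of noise points:", int(np.sum(labels == -1)))
print("Homogeneity:", homogeneity_score(label_true, labels))
print("Completeness:", completeness_score(label_true, labels))
print("V-measure:", v_measure_score(label_true, labels))
print("Adjusted Rand Index:", adjusted_rand_score(label_true, labels))
print("Adjusted Mutual Info:", adjusted_mutual_info_score(label_true, labels))
print("Silhouette Coefficient:", silhouette_score(X, labels))
```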
- Recommending restaurants based on a user's past rating history
- The selected feature is the cuisine type
- Datasets as provided here. (Click here to navigate to original source)
- Future improvements:
  - using KNN to classify restaurants by cuisine type and using the result as ground truth for evaluation
  - incorporating other rating criteria to build a more solid user profile (only food_rating was considered in the existing model)
- Somehow, none of the recommended placeID values obtained from the model appear in the geospatial2.csv file provided by the source (which I had assumed contained all of the restaurant info). Still unsure if this is a bug.
- Output: a df containing the top-N recommended placeID values and their weighted recommendation scores for the specified userID, as in the example below.

    get_recommendation("U1138")

             Rcuisine  total_by_place
    placeID
    132774   7
    135099   6
    135098   4
    135103   4
    135097   4
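The exact aggregation is not shown here, so the code below is only a rough sketch of a cuisine-profile recommender over the UCI restaurant data; the file names (rating_final.csv, chefmozcuisine.csv) and this implementation of get_recommendation are assumptions:

```python
import pandas as pd

# UCI "Restaurant & consumer data": per-user food ratings and per-restaurant cuisine tags.
ratings = pd.read_csv("rating_final.csv")        # userID, placeID, rating, food_rating, ...
cuisine = pd.read_csv("chefmozcuisine.csv")      # placeID, Rcuisine

def get_recommendation(user_id, top_n=5):
    """Weight each cuisine by the user's past food_rating, then score unrated restaurants."""
    user = ratings[ratings["userID"] == user_id]

    # Build the user's cuisine profile from the cuisines of the places they have rated.
    profile = (user.merge(cuisine, on="placeID")
                   .groupby("Rcuisine")["food_rating"].sum())

    # Score every restaurant by the summed weight of its cuisines, dropping already-rated places.
    scores = (cuisine.assign(weight=cuisine["Rcuisine"].map(profile).fillna(0))
                     .groupby("placeID")["weight"].sum()
                     .drop(user["placeID"].unique(), errors="ignore")
                     .sort_values(ascending=False))
    return scores.head(top_n)

print(get_recommendation("U1138"))
```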
Full credit belongs to the original sources. Thank you, IBM, for providing free education.