IntroductionToMachineLearning

A Culmination Of An Independent Study In Machine Learning

During my senior year at Wheaton College, MA, I conducted an independent study in Machine Learning with Mark LeBlanc, Wheaton College's Computer Science Department Chair. During this course, we surveyed a wide range of Machine Learning techniques and developed a codebase for future students to experiment with and learn from.

This repo is split into 4 main sections:

Decision Tree
Support Vector Machine (SVM)
KMeans
DBSCAN -- CURRENTLY BEING POLISHED

For each of these ML approaches, I created models for two datasets:

Poe Dataset
Iris Dataset

The Poe Dataset contains 86 texts: 16 known to have been written by Edgar Allan Poe, 69 written by authors of the same time period (Melville, Badcock, etc.), and 1 text that is believed to have been written by Poe. The objective is to determine whether this "UNKNOWN" text was indeed written by Poe. The features are words that appear in all 86 texts, and the value of each feature is the relative abundance of that word in a given text (e.g. a value of .0187 for "been" means that 1.87% of all the words in the text are "been"). All values in the dataset are of the form '.XXXX'.
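To make the "relative abundance" feature concrete, here is a minimal sketch of how such values could be computed for one text. The function name `relative_abundance`, the tokenization rule, and the sample sentence are assumptions for illustration only; the notebooks in this repo may prepare the data differently.

```python
import re
from collections import Counter


def relative_abundance(text, vocabulary):
    """Return the share of all tokens in `text` taken up by each vocabulary word.

    Hypothetical helper: the tokenizer (lowercase runs of letters and
    apostrophes) is an assumption, not necessarily the one used for the dataset.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    counts = Counter(tokens)
    # Round to four decimal places to mirror the '.XXXX' form of the dataset.
    return {word: round(counts[word] / total, 4) for word in vocabulary}


# Made-up sample sentence with 12 tokens, two of which are "been".
sample = "It has been said that what has been done cannot be undone."
features = relative_abundance(sample, ["been", "has", "the"])
print(features)
```

Each text would become one row of such values, with one column per shared word.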

See related work: A Good Mystery by WheatonCS/Mark LeBlanc, https://github.com/WheatonCS/aGoodMystery

The Iris dataset contains 150 instances across 3 separate species of Iris (Iris-virginica, Iris-setosa, and Iris-versicolor). Each instance has 4 recorded attributes: the sample's sepal-length, sepal-width, petal-length, and petal-width (for background on sepals and petals, see https://davesgarden.com/guides/articles/view/3152). Each value in the dataset is a float of the form 'X.X', with one digit before and one digit after the period.
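As a rough sketch of how rows in this shape can be read into features and labels, the snippet below parses a few comma-separated lines. The column order, the comma-separated layout, the `parse_iris` helper, and the three sample rows are assumptions for illustration; check the data files in this repo for the actual layout.

```python
import csv
import io

# Assumed column order for the four numeric attributes.
FEATURE_NAMES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

# Illustrative rows in the assumed layout: four floats, then the species label.
SAMPLE_ROWS = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def parse_iris(handle):
    """Split each row into a list of four floats and a species label."""
    features, labels = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        features.append([float(value) for value in row[:4]])
        labels.append(row[4])
    return features, labels


X, y = parse_iris(io.StringIO(SAMPLE_ROWS))
for values, species in zip(X, y):
    print(dict(zip(FEATURE_NAMES, values)), species)
```

The resulting `X` and `y` lists are in the shape that model-fitting code in Python typically expects: one feature row per sample and one label per row.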

Technologies, Tools, and Languages Used:

Python
- scikit-learn
- pandas
- numpy
- matplotlib
- Regular Expressions (regex - "re" library)
Jupyter Notebook

Steps to run:

Install Jupyter Notebook: https://jupyter.org/install
Clone this repo with git clone
Open the notebooks in Jupyter, play around, learn, and have fun
