CS156: Machine Learning for Science and Profit

Select coding assignments and exercises from the CS156: Machine Learning for Science and Profit class, taken in Fall 2021. The course covered a range of core machine learning techniques (classification, expectation maximization, regression, the perceptron, neural networks, support vector machines, hidden Markov models, and nonparametric clustering models) as well as fundamental concepts such as feature selection, cross-validation, and overfitting.

Course Highlights

  1. Built and trained a lightweight-GAN to generate cityscape images, achieving an FID score of 119.49 after 20,000 training steps. This project, which served as my final project for the course, involved data extraction and curation, image preprocessing with NumPy, image data cleaning with the CLIP (Contrastive Language-Image Pre-Training) neural network, and hyperparameter optimization, among other steps. The project report can be found here, and images of the final results can be viewed below.
  2. Achieved 82% and 93% accuracy in classifying jersey and shirt images, using an SVM (with an RBF kernel) and a fine-tuned VGG16 model, respectively (see here).
  3. Built two neural network models using Keras to classify the moons and circles datasets, achieving perfect accuracy on both (see here).
  4. Massaged literature corpus data into a suitable format, then trained a Latent Dirichlet Allocation (LDA) model to find interesting topics (and words) that occurred across chapters (see here).
  5. Classified digits using the k-nearest neighbors (kNN) algorithm, achieving 97.2% accuracy (see here).
  6. Explored linear discriminant analysis (LDA), alone and combined with principal component analysis (PCA), for jersey vs. shirt classification, achieving 77% accuracy after data curation (see here).
  7. Built a Markov model for language detection, using Bayes' rule to classify strings into the most probable (dummy) language. Similarly, built a hidden Markov model to determine which of three speakers was most probably speaking at a particular point in time, given a sequence of phonemes (among other pieces of information) (see here).
  8. Implemented the expectation-maximization (EM) algorithm to estimate the biases and probabilities of two coins, given data on the number of heads obtained from a corresponding number of throws (see here).
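
For readers unfamiliar with the two-coin setup in item 8, here is a minimal sketch of EM for that kind of problem. It is not the code from the assignment. It assumes each data point is a `(heads, throws)` pair produced by one of two coins, that "probabilities" means the chance of each coin being picked, and that the starting guesses and toy data are made up for illustration. The function name `em_two_coins` is hypothetical.

```python
import math


def em_two_coins(trials, theta=(0.6, 0.4), pi=(0.5, 0.5), n_iter=500, tol=1e-10):
    """Fit a two-coin binomial mixture with EM.

    trials: list of (heads, throws) pairs; the coin behind each pair is hidden.
    theta:  starting guesses for each coin's probability of heads (assumed).
    pi:     starting guesses for the probability that each coin was picked (assumed).
    """
    eps = 1e-9  # keeps every log() argument strictly positive
    theta, pi = list(theta), list(pi)
    for _step in range(n_iter):
        # E-step: responsibility of each coin for each trial.
        # The binomial coefficient is the same for both coins, so it cancels.
        resp = []
        for heads, throws in trials:
            log_w = [
                math.log(pi[k])
                + heads * math.log(theta[k])
                + (throws - heads) * math.log(1.0 - theta[k])
                for k in range(2)
            ]
            top = max(log_w)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in log_w]
            total = sum(w)
            resp.append([v / total for v in w])

        # M-step: responsibility-weighted heads / throws, and average responsibility.
        new_theta, new_pi = [], []
        for k in range(2):
            heads_k = sum(r[k] * h for r, (h, n) in zip(resp, trials))
            throws_k = sum(r[k] * n for r, (h, n) in zip(resp, trials))
            bias = heads_k / max(throws_k, eps)
            new_theta.append(min(max(bias, eps), 1.0 - eps))
            weight = sum(r[k] for r in resp) / len(trials)
            new_pi.append(min(max(weight, eps), 1.0))

        change = max(abs(a - b) for a, b in zip(new_theta + new_pi, theta + pi))
        theta, pi = new_theta, new_pi
        if change < tol:
            break
    return tuple(theta), tuple(pi)


# Toy data (made up): three runs that look heads-heavy, three that look tails-heavy.
toy_trials = [(9, 10), (8, 10), (9, 10), (1, 10), (2, 10), (1, 10)]
biases, mix = em_two_coins(toy_trials)
print("estimated biases:", biases)
print("estimated coin probabilities:", mix)
```

On data this cleanly separated, one estimated bias should settle high and the other low. With less separated data the result depends on the starting guesses, since EM only finds a local optimum.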
Images: GAN-generated Cityscapes | Speaker Identification | Topic Modeling | PCA vs. LDA
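
The Markov-model language detector from the highlights can be illustrated with a small sketch. This is not the assignment's code: it assumes a first-order character model per language, add-one smoothing, a fixed known alphabet, and it ignores the initial-state distribution. The two "languages", their training strings, and the function names are all invented for the example.

```python
import math


def train_markov(samples, alphabet):
    """Estimate first-order transition probabilities with add-one smoothing."""
    counts = {a: {b: 1 for b in alphabet} for a in alphabet}
    for text in samples:
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


def log_likelihood(model, text):
    """Log-probability of the transitions in `text` under one language model."""
    return sum(math.log(model[prev][nxt]) for prev, nxt in zip(text, text[1:]))


def classify(models, priors, text):
    """Bayes' rule in log space: posterior is proportional to prior times likelihood."""
    scores = {
        lang: math.log(priors[lang]) + log_likelihood(model, text)
        for lang, model in models.items()
    }
    return max(scores, key=scores.get)


# Invented toy languages over the alphabet {a, b}.
alphabet = "ab"
models = {
    "X": train_markov(["aaaaab", "aaabaa"], alphabet),  # mostly repeats 'a'
    "Y": train_markov(["ababab", "bababa"], alphabet),  # alternates 'a' and 'b'
}
priors = {"X": 0.5, "Y": 0.5}

print(classify(models, priors, "aaaa"))
print(classify(models, priors, "abab"))
```

Working in log space avoids underflow on long strings, and the smoothing keeps an unseen transition from driving a language's probability to zero.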