This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

huseinzol05/Machine-Learning-Data-Science-Reuse

Machine-Learning-Data-Science-Reuse

Gathers machine learning and data science techniques for problem solving.

Warning

THIS REPOSITORY LACKS COMMENTS, DOCUMENTATION AND STORYTELLING. IT IS PURPOSELY KEPT FOR SELF-REUSE.

Most of the visualizations are self-explanatory, but they require at least a basic understanding of statistics and Python.

Some visualizations will not display because GitHub cannot render certain SVG-based libraries, so please run them on your own machine to see the results.

Why Genie? Because he can solve anything!

Table of contents

R vs Python

  1. CSV, Data Manipulation, Visualization

Preprocessing

  1. Handle missing values
  2. Rescaling (log, vector normalization, standardization, min-max scaling, boxcox)
  3. Features understanding
  4. Detecting outliers
  5. Encoding type comparison
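
For a quick reminder of what the rescaling entry covers, here is a minimal standard-library sketch of four of the listed transforms (log, vector normalization, standardization, min-max scaling). It is not taken from the repository; the function names are hypothetical, and Box-Cox is left out because it needs a fitted lambda parameter.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Apply log(1 + x) to compress a long right tail (needs x > -1)."""
    return [math.log1p(v) for v in values]


def l2_normalize(values):
    """Vector normalization: divide by the Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        return [0.0 for _ in values]  # all-zero vector stays all zero
    return [v / norm for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no spread
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / span for v in values]


if __name__ == "__main__":
    column = [2.0, 4.0, 6.0]
    print(min_max_scale(column))
    print(standardize(column))
```

In practice the same transforms are usually applied per column with a library such as scikit-learn; the plain-Python version is only meant to make the arithmetic visible.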

Natural Language Processing

  1. Bag of Words
  2. TF-IDF
  3. Hashing algorithm
  4. Models gathering (Bayes, SVM, XGB, LightGBM)
  5. sklearn pipeline
  6. N-gram
  7. Topic Modelling
  8. Naive-Bayes-SVM on hate speech
  9. Black Panther visualization using word clouds, semantic and k-means similarity network
  10. Semantic similarity on Malaysia hot topics

Suggestion Engine using Nearest-Euclidean and Gaussian Distribution

  1. Anime
  2. Game
  3. Movie
  4. Kickstarter projects

Image processing

  1. Augmentation (flip, rotate, shifting, zoom, shear, channel shift, grayscale, contrast, saturation)
  2. RGB subdivide
  3. HOG featuring
  4. Image segmentation, nucleus
  5. K Nearest Neighbors on PCA / NMF
  6. SVD study on nearest neighbors
  7. Image wrapping to full A4

Signal processing

  1. Blurring on 1D Signal (loop, and FFT)
  2. Blurring on 2D Signal (loop)
  3. Convolution of 2 signals
  4. Pass filter for frequencies
  5. Signal smoothing
  6. Signal cross-correlation
  7. Augmentation (pitching, speed, distribution noise, shifting, silent shifting)
  8. Featuring (MFCC, log-energy, feature cube, power spectrum)
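
As a rough illustration of the loop-based 1D blurring and smoothing entries, the sketch below applies a centered moving average. It is an assumption about the general technique, not the repository's own code; the name `moving_average` and the edge handling (shrinking the window at the borders) are choices made here for illustration, and the FFT variant is not shown.

```python
def moving_average(signal, window=3):
    """Blur a 1D signal with a centered moving average.

    The window must be a positive odd number. Near the edges the window
    is truncated, so each output is the mean of the samples that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)                 # clip window at the left edge
        hi = min(len(signal), i + half + 1)   # clip window at the right edge
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    spike = [0, 0, 3, 0, 0]
    print(moving_average(spike, window=3))
```

A single spike is spread over its neighbours, which is the blurring effect; a wider window smooths more aggressively at the cost of detail.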

Stacking

  1. Binary
  2. Regression
  3. Multi-class
  4. Stack multiple sklearn regressor models with XGB

Stochastic study

  1. Cryptocurrencies correlation
  2. Predict cryptocurrencies using multiple stacking
  3. Simple stock analysis
  4. ARIMA for flight prediction
  5. Tesla market study

BigQuery

  1. Integrate BigQuery with pandas in Python
  2. Medicare queries with plotly visualization

Network study

  1. Graph nodes showing whom each person spoke to most
  2. Spooky social network analysis
  3. Taxi nodes analysis
  4. Stackoverflow tags analysis
  5. Donald Trump news social network
  6. Najib Razak Twitter social network

Visualization

  1. Geographic using basemap
  2. Folium map and time analysis
  3. Israel graph visualization
  4. Israel political landscape
  5. Distribution age vs type for library
  6. Growth study for library
  7. Botnet attack analysis
  8. Plotly geo-mapping 101
  9. Plotly bombing mapping visualization
  10. Easy Plotly using Cufflinks
  11. Plotly Pokemon data
  12. Rare visualization
  13. Dynamic map visualization using plotly and folium
  14. Kaggle 2018 Report

Markov

  1. Independent variables on weather forecast
  2. Dependent variables on text dataset
  3. Shakespeare character-wise generator
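
To give a feel for the character-wise generator entry, here is a minimal sketch of a character-level Markov chain. It is written under assumptions and is not the repository's implementation: `build_chain`, `generate`, and the `order` parameter are hypothetical names, and the training text is a placeholder string.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map every run of `order` characters to the characters that follow it."""
    if order < 1:
        raise ValueError("order must be at least 1")
    chain = defaultdict(list)
    for i in range(len(text) - order):
        state = text[i:i + order]
        chain[state].append(text[i + order])
    return chain


def generate(chain, seed, length, rng=None):
    """Extend `seed` by up to `length` characters sampled from the chain.

    `seed` must be exactly as long as the order used in build_chain.
    Generation stops early if the current state was never seen in training.
    """
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = chain.get(out[-order:])
        if not followers:
            break
        # Repeated followers in the list make frequent transitions more likely.
        out += rng.choice(followers)
    return out


if __name__ == "__main__":
    corpus = "to be or not to be that is the question"  # placeholder text
    chain = build_chain(corpus, order=2)
    print(generate(chain, "to", 30, random.Random(0)))
```

A higher `order` copies longer stretches of the training text, while a lower one produces noisier output.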

English-text normalization

  1. Normalized texts (Dates, Measure, Decimals, Cardinals, Electronic - URL, Currency - Dollars, Telephone Numbers)
  2. Normalized texts (Cardinal, Digit, Ordinal, Letters, Address, Telephone, Electronic, Fractions, Money)
