
CSI-SFIT/Data-Science-Resources

⚡ Data Science Resources


A guide to getting started with Data Science and ML
(Deep Learning not included)


MATH


For data analysis, knowledge of statistics is enough, but for building ML models, calculus, linear algebra, and probability also play a huge role.

  1. Math for Data Science
  2. Statistics Revision
  3. Khan Academy Calculus
  4. Gilbert Strang's linear algebra
  5. Blog post for all Math resources required for ML

Reading theoretical books may be more involved than you need if your goal is just to build ML models for your applications. But for people who'd like to understand deep learning algorithms and the math behind them, this is a short list of resources.

  1. How do I learn mathematics for machine learning?
    This Quora answer gives a detailed 5-month roadmap (which can and should be extended according to your comfort) for learning the math behind machine learning, and the math that every engineer should know in general.
  2. Maths for Machine Learning
    This book brings the mathematical foundations of basic machine learning concepts to the fore and collects the information in a single place. This book is intended to be a guidebook to the vast mathematical literature that forms the foundations of modern machine learning.

Data Analysis


Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.
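The steps named above (inspecting, cleansing, transforming, summarizing) can be sketched with nothing but the Python standard library. This is only an illustration: the records, the `to_float` helper, and the choice of a per-city mean as the "modeling" step are all assumptions made up for this example, not part of any linked resource.

```python
from statistics import mean

# Hypothetical raw records (city, temperature reading as text); made up for illustration.
raw = [("Mumbai", "31.5"), ("Pune", ""), ("Mumbai", "30.5"), ("Pune", "27.0"), ("Pune", "n/a")]

def to_float(text):
    """Return the reading as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None

# Inspect: count how many readings are missing or malformed.
parsed = [(city, to_float(value)) for city, value in raw]
missing = sum(1 for _, value in parsed if value is None)

# Cleanse: drop the rows without a usable reading.
clean = [(city, value) for city, value in parsed if value is not None]

# Transform: group the remaining readings by city.
by_city = {}
for city, value in clean:
    by_city.setdefault(city, []).append(value)

# Summarize: a per-city mean stands in for the modeling step in this sketch.
summary = {city: mean(values) for city, values in by_city.items()}
print(missing, summary)
```

In practice the same steps are usually done with Pandas, which the resources below cover; the plain-Python version is only meant to make each step visible.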

Numpy
A very useful library for math and scientific computing.

  1. Numpy tutorial
  2. Numpy Videos
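As a rough taste of what the tutorials above cover, the sketch below shows three core NumPy ideas: element-wise arithmetic, reductions along an axis, and broadcasting. The `scores` array is hypothetical data invented for this example, and it assumes NumPy is installed.

```python
import numpy as np

# Hypothetical exam scores: rows are students, columns are subjects.
scores = np.array([[70, 80, 90],
                   [60, 75, 85]])

# Element-wise arithmetic applies to the whole array without a Python loop.
scaled = scores / 100

# Reductions along an axis: axis=0 averages over students, axis=1 over subjects.
subject_means = scores.mean(axis=0)
student_means = scores.mean(axis=1)

# Broadcasting: the 1-D row of subject means is subtracted from every row.
centered = scores - subject_means
print(subject_means, student_means)
```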

Pandas
The most widely used Python library for data analysis.

  1. Pandas Documentation
  2. YouTube Playlist
  3. DataCamp Tutorial

Data Visualization

  1. Matplotlib Playlist
  2. Matplotlib Tutorials
  3. Seaborn Tutorials

SQL

  1. MySQL Tutorial
  2. MySQL TutorialsPoint
  3. MySQL YouTube Videos
  4. PostgreSQL
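To try basic SQL before installing a database server, one option is Python's built-in `sqlite3` module. Note that this swaps in SQLite for the MySQL and PostgreSQL servers the tutorials above use; the simple statements shown here are standard SQL, but dialects differ in details. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("asha", 120.0), ("ravi", 80.0), ("asha", 30.0)],
)

# Aggregate with GROUP BY and sort by the total, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()
print(rows)
```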

Big Data Analytics


Big Data refers to data sets so large that they cannot be stored, processed, or analyzed using traditional tools. Big Data analytics is the process of extracting meaningful insights from such data, such as hidden patterns, unknown correlations, market trends, and customer preferences. It offers several advantages: for example, it can support better decision making and help detect and prevent fraudulent activity.

Tools Used in Big Data Analytics

Here are some popular tools used in Big Data analytics:

  1. Hadoop - a framework for distributed storage (HDFS) and processing of large data sets
  2. Spark - an engine for fast, in-memory processing of large amounts of data, including near-real-time streams
  3. Kafka - a distributed streaming platform that provides fault-tolerant storage of event streams
  4. Cassandra - a distributed NoSQL database designed to handle large amounts of data

Big Data Courses

  1. Big Data Coursera
  2. Big Data Essentials: HDFS, MapReduce and Spark RDD

ML Courses

Practical (More bent towards Programming)

  1. Intro to Machine-Learning Udacity
  2. Kaggle Mini-Courses
  3. Machine Learning A-Z: Hands-On Python & R In Data Science Udemy

Theoretical (More in-depth Math Concepts)

  1. Machine Learning Andrew Ng (MATLAB)
  2. Stanford CS229: Machine Learning (Autumn 2018)
  3. Machine Learning Crash Course by Google

Books

For absolute beginners

  1. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
  2. Intro to ML with Python
  3. Hands-On ML with Scikit-Learn and TensorFlow

For intermediates

  1. Approaching almost any ML problem (Abhishek Thakur)

Websites

  1. Made with ML by Goku Mohandas
  2. End to end ML by Brandon Rohrer
  3. A.I. by Google Researchers
  4. Towards Data Science by Medium

Notes

  1. Data Science Notes by Chris Albon
  2. Andrew Ng's ML Notes
  3. CS229 Stanford Notes

YouTube Channels

  1. PyData
  2. Siraj Raval
  3. Sentdex
  4. Krish Naik
  5. Corey Schafer

Best Websites to get free datasets

  1. Kaggle
  2. UCI Machine Learning Repository
  3. Stanford Data
  4. Google public datasets
  5. FiveThirtyEight

How to Contribute

  1. Clone the repo and create a new branch: $ git clone https://github.com/CSI-SFIT/Data-Science-Resources && cd Data-Science-Resources && git checkout -b name_for_new_branch
  2. Make changes and test.
  3. Submit a pull request with a comprehensive description of the changes.

Acknowledgements

CSI SFIT Tech Team 2020 - 2021 :
