Introduction to Data Science and Machine Learning tools and libraries

Machine Learning & Data Science Tools & Libraries



This repo helps you set up a development environment for data science and machine learning projects, and introduces some commonly used tools and libraries such as pandas, NumPy, Matplotlib and scikit-learn.

Environment Setup

Brief About the Workspace


Anaconda

  • A platform that provides data science packages to run or use in your programs
  • Comes with a large set of tools preinstalled
  • It's like a complete hardware store
  • A software distribution


Miniconda

  • Similar to Anaconda, but ships with fewer tools, which are still the useful ones
  • It's like a workbench
  • A software distribution


Conda

  • Like a personal assistant that sets up your projects, tools, packages and environments
  • A package manager
  • Used to set up your environment with DS and ML tools like matplotlib, pandas etc.

Jupyter Notebook

  • A workspace for accessing the tools in your environment from your data science projects.

For a lightweight setup, we will install Miniconda and then install tools and packages as they are required.

Download Miniconda

Create a folder, say sample_project, and open a terminal inside it

Set up the development environment and install the required tools using conda,

  • conda create --prefix ./env pandas numpy matplotlib scikit-learn

Activate the environment,

  • conda activate environment_directory (for example, ./env)

List the available environments using conda env list; the active one is marked with an asterisk

Use conda install [package/tool_name] if any tool was missed in the step above

Now that your environment is set up, open Jupyter Notebook as the in-browser editor for writing Python code

Terminate the Jupyter process from the terminal when done

Then deactivate the environment with conda deactivate
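
Before opening a notebook, it can help to confirm that nothing was missed in the conda create step. The following is a minimal sketch, not part of the original setup: the helper name missing_packages and the REQUIRED list are made up for illustration, and the check uses only the Python standard library, so it runs even when a package is absent.

```python
import importlib.util

# Import names for the packages installed in the conda create step.
# Note: the scikit-learn package is imported under the name "sklearn".
REQUIRED = ["pandas", "numpy", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Run it with the environment activated; any name it reports can then be installed with conda install.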

Share Conda Environment

If you want to share your conda environment with other developers, you can do it in a couple of ways,

  1. Share the whole project folder, which can be expensive because the packages and files add up to many MBs of data
  2. Share a .yml file of your conda environment

For the second option, we need a .yml file describing the conda environment, so we export the environment as a YAML file called environment.yml


To Export

  • conda env export --prefix [env_folder_path] > [file_name.yml]

To Create an Env from a .yml file

  • conda env create --file [file_name.yml] --name [environment_name]


Pandas

A data analysis tool/library


It is used to explore, analyse and manipulate data when we use Python for data analysis


  • Simple to use
  • Integrated with many other data science and machine learning Python tools
  • Helps you get your data ready for ML

Topics covered in this introduction

  • Most useful functions
  • Pandas datatypes
  • Importing & Exporting data
  • Describing data
  • Viewing & selecting data
  • Manipulating data
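
The topics above can be sketched in a few lines. This is only an illustration under stated assumptions: the cars table and its column names are invented sample data, and the CSV round trip uses an in-memory buffer instead of a real file so that the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for this illustration.
cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "BMW", "Toyota"],
    "odometer_km": [150000, 87000, 11000, 60000],
    "price": [4000, 5000, 22000, 7000],
})

# Describing data: column datatypes and summary statistics.
print(cars.dtypes)
print(cars.describe())

# Viewing & selecting data: first rows, then a boolean filter.
print(cars.head(2))
toyotas = cars[cars["make"] == "Toyota"]

# Manipulating data: add a derived column and aggregate by group.
cars["price_per_km"] = cars["price"] / cars["odometer_km"]
mean_price = cars.groupby("make")["price"].mean()
print(mean_price)

# Exporting & importing data: write CSV to a buffer and read it back.
buffer = io.StringIO()
cars.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.shape)
```

With real data, the buffer would typically be replaced by a file path passed to to_csv and read_csv.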



NumPy

Numeric Python: a library for working with multidimensional arrays of numbers.

NumPy arrays look similar to Python lists, so why use NumPy, and why is it a must-use tool in machine learning problems?


  • Behind-the-scenes optimization: the core is written in C
  • Fast computation, because array operations run in compiled code instead of Python loops (NumPy itself runs on the CPU; GPU acceleration comes from other libraries)
  • Machines work with numbers, so data such as images is represented as arrays of numbers, which is exactly what NumPy handles
  • Vectorization via broadcasting (avoiding loops)
  • Backbone of other scientific packages like pandas
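
To make "vectorization via broadcasting" concrete, here is a small sketch that computes the same result twice, once with explicit loops and once with a single broadcast expression. The array names and values are invented for illustration.

```python
import numpy as np

prices = np.array([[10.0, 20.0, 30.0],
                   [40.0, 50.0, 60.0]])   # shape (2, 3)
discounts = np.array([0.5, 0.9, 1.0])     # shape (3,)

# Loop version: visit every element by hand.
looped = np.empty_like(prices)
for i in range(prices.shape[0]):
    for j in range(prices.shape[1]):
        looped[i, j] = prices[i, j] * discounts[j]

# Vectorized version: the (3,) array is broadcast across both rows.
vectorized = prices * discounts

print(np.array_equal(looped, vectorized))
```

Both versions produce the same array; the broadcast form is shorter and lets NumPy do the looping in compiled code.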

Topics covered in this introduction

  • Most useful functions
  • NumPy datatypes & attributes
  • Creating arrays
  • Viewing arrays & matrices
  • Manipulating & Comparing arrays
  • Sorting arrays
  • Use Cases
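
A quick tour of several of these topics, as a minimal sketch with made-up values rather than a full walkthrough:

```python
import numpy as np

# Creating arrays
a = np.array([[3, 1, 2],
              [9, 7, 8]])
zeros = np.zeros((2, 3))
steps = np.arange(0, 10, 2)        # start, stop (exclusive), step

# Datatypes & attributes
print(a.dtype, a.shape, a.ndim, a.size)

# Viewing arrays & matrices
first_row = a[0]
column = a[:, 1]                   # second column of every row

# Manipulating & comparing arrays
reshaped = a.reshape(3, 2)
mask = a > 2                       # element-wise comparison gives booleans

# Sorting arrays
sorted_rows = np.sort(a, axis=1)   # sort within each row
order = np.argsort(a[0])           # indices that would sort the first row

print(sorted_rows)
print(order)
```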


Matplotlib

Visualization of data


  • Python plotting library
  • It lets you turn data into charts, graphs and figures


  • Built on NumPy arrays (& Python)
  • Integrates directly with pandas
  • Can create basic or advanced plots
  • Simple-to-use interface (once you have the basics in place)

Topics covered in this introduction

  • matplotlib Workflow
  • Importing matplotlib & the 2 ways of plotting
  • Plotting data from NumPy arrays
  • Customizing plots
  • Saving & Sharing plots
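
The two ways of plotting are commonly understood as the pyplot (stateful) interface and the object-oriented interface; the sketch below shows both under that assumption. The "Agg" backend and the temporary output path are choices made here so the snippet runs without a display; inside a Jupyter notebook the matplotlib.use line can be dropped.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Plotting data from NumPy arrays
x = np.linspace(0, 10, 100)

# Way 1: pyplot interface, which acts on the "current" figure.
plt.figure()
plt.plot(x, np.sin(x))
plt.title("sin(x) via pyplot")
plt.close()

# Way 2: object-oriented interface, with explicit Figure and Axes objects.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x ** 2, label="x squared")

# Customizing plots
ax.set(title="x squared via Axes", xlabel="x", ylabel="y")
ax.legend()

# Saving & sharing plots (hypothetical file name in the temp directory)
out_path = os.path.join(tempfile.gettempdir(), "sample_plot.png")
fig.savefig(out_path)
plt.close(fig)
print("Saved plot to", out_path)
```

The object-oriented form tends to scale better once a figure has several subplots, since each Axes is customized through its own object.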


Scikit-learn

Python ML library, aka sklearn


  • Given data, scikit-learn helps us build machine learning models that learn patterns within that data and then make predictions.
  • It also implements tools that help us evaluate whether those predictions are good or bad.


  • Built on NumPy & Matplotlib (and Python)
  • Has many built-in ML models
  • Provides methods to evaluate your ML models
  • Very well-designed APIs

Topics covered in this introduction

  • A scikit-learn workflow
  • Getting the data ready
  • Choosing the right estimator/model/algorithm for our problems
  • Fitting a model to the data (learning patterns)
  • Making predictions with a model (using patterns)
  • Evaluating model predictions
  • Improving model predictions
  • Saving & Loading models
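
The workflow above can be sketched end to end. The specific choices here are assumptions made for illustration, not the repo's own notebook: the built-in iris dataset, a RandomForestClassifier, an 80/20 split, and pickle for saving and loading.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Getting the data ready: features X, labels y, then a train/test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Choosing an estimator and fitting it to the data (learning patterns).
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Making predictions with the model (using patterns).
y_pred = model.predict(X_test)

# Evaluating model predictions.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Improving model predictions: compare a few hyperparameter values.
for n in (10, 50, 200):
    candidate = RandomForestClassifier(n_estimators=n, random_state=42)
    candidate.fit(X_train, y_train)
    print(n, candidate.score(X_test, y_test))

# Saving & loading models: serialize to bytes and restore.
saved = pickle.dumps(model)
loaded = pickle.loads(saved)
print((loaded.predict(X_test) == y_pred).all())
```

To keep a model between sessions, the same bytes can be written to a file with pickle.dump and read back with pickle.load.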


Here are some practice workbooks for different libraries,


Special Thanks To!

