Machine Learning & Data Science Tools & Libraries

Introduction

This repo helps you set up a development environment for data science and machine learning projects, and introduces some commonly used tools and libraries: Pandas, NumPy, Matplotlib, and Scikit-Learn.

Environment Setup

Brief About the Workspace

Anaconda

A platform that provides data science packages for use in your programs
Comes with a large set of tools preinstalled
Like a complete hardware store
A software distribution

Miniconda

The same idea as Anaconda, but ships with fewer tools, yet is still useful
Like a workbench
A software distribution

Conda

Like a personal assistant that sets up your projects, tools, packages, and environments
A package manager
Used to set up your environment with DS and ML tools like matplotlib, pandas, etc.

Jupyter Notebook

A workspace for accessing the tools within an environment for your data science projects.

So, for a lightweight setup, we will install Miniconda and then install tools and packages as they are required.

Download Miniconda

Create a folder, say sample_project

Set up the development environment and install the required tools using conda:

conda create --prefix ./env pandas numpy matplotlib scikit-learn

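Activate the environment, where environment_directory is the path to the environment created above (./env in this example):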

conda activate environment_directory

List the environments using conda env list (the active one is marked with an asterisk)

Use conda install [package/tool_name] if any tool was missed during installation in the step above

Now that your environment is set up, open Jupyter Notebook as the browser-based editor for writing Python code (if it is missing, install it with conda install jupyter)

Terminate the process from the terminal when done

Then deactivate the environment with conda deactivate
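
Once the environment is active, a quick check from a notebook cell or a Python prompt can confirm that the packages are importable. The following is a minimal sketch, assuming the four packages from the conda create step above; the helper name check_packages is hypothetical and not part of conda or of this repo.

```python
import importlib.util

# Import names of the packages installed in the conda create step.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["pandas", "numpy", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

Any package reported as missing can be added with conda install, as described in the steps above.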

Share Conda Environment

If you want to share your conda environment with other developers, you can do it in a couple of ways:

Share the whole project folder, which can be expensive because the packages and files add up to many MBs of data
Share a .yml file of your conda environment

For the second option, we need a .yml file describing the conda environment. To get it, we export the environment as a YAML file called environment.yml.

Commands:

To export

conda env export --prefix [env_folder_path] > [file_name.yml]

To create an environment from the .yml file

conda env create --file [file_name.yml] --name [environment_name]

Pandas

A Data Analysis tool/library

What

It is used to explore, analyse, and manipulate data when using Python for data analysis.

Why

Simple to use
Integrated with many other data science and machine learning Python tools
Helps you get your data ready for ML

Topics covered in this introduction

Most useful functions
Pandas datatypes
Importing & Exporting data
Describing data
Viewing & selecting data
Manipulating data
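
The topics above can be sketched in a few lines. This is a minimal illustration, not the content of the repo's notebooks: the table values and column names below are made up, and the CSV round trip goes through an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical data, invented for this example.
car_sales = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Toyota", "BMW"],
    "Odometer (KM)": [150043, 87899, 32549, 11179],
    "Price": [4000.0, 5000.0, 7000.0, 22000.0],
})

# Pandas datatypes: a DataFrame is a table, and each column is a Series.
print(car_sales.dtypes)

# Describing data: summary statistics for the numeric columns.
print(car_sales.describe())

# Viewing & selecting data: boolean filtering and position-based access.
toyotas = car_sales[car_sales["Make"] == "Toyota"]
first_row = car_sales.iloc[0]

# Manipulating data: derive a new column from existing ones.
car_sales["Price per KM"] = car_sales["Price"] / car_sales["Odometer (KM)"]

# Exporting & importing data: write to CSV, then read it back.
buffer = io.StringIO()
car_sales.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
```

With a real file, the same calls take a path, for example car_sales.to_csv("some-file.csv", index=False) and pd.read_csv("some-file.csv"), where the file name is a placeholder.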

NumPy

What

Numerical Python. It provides multidimensional arrays and numerical operations on them.

NumPy arrays are similar to Python lists, so why use NumPy, and why is it a must-use tool for machine learning problems?

Why

Behind-the-scenes optimization: the core is written in C
Computation is much faster than equivalent pure-Python loops (NumPy itself runs on the CPU; GPU acceleration comes from other libraries)
Machine learning works on numbers, so data such as images is represented as arrays of numbers, which NumPy stores and processes efficiently
Vectorization via broadcasting (avoiding loops)
Backbone of other scientific packages like pandas
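
To make the vectorization point concrete, here is a small sketch comparing a Python loop with the equivalent broadcast operation. The variable names and numbers are invented for illustration.

```python
import numpy as np

prices = np.array([4000.0, 5000.0, 7000.0])

# Loop version: plain Python, one element at a time.
discounted_loop = [price * 0.9 for price in prices]

# Vectorized version: the scalar 0.9 is broadcast across the whole array,
# and the multiplication runs in compiled code.
discounted = prices * 0.9

# Broadcasting also works between arrays of compatible shapes:
# a (3, 1) column combined with a (2,) row gives a (3, 2) table.
rates = np.array([0.9, 0.8])
table = prices.reshape(3, 1) * rates
```

Both versions compute the same values; the vectorized form is shorter and avoids the per-element overhead of the Python loop.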

Topics covered in this introduction

Most useful functions
NumPy datatypes & attributes
Creating arrays
Viewing arrays & matrices
Manipulating & Comparing arrays
Sorting arrays
Use Cases
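
Several of the topics above fit in one short sketch: attributes, array creation, comparison, and sorting. The arrays are arbitrary examples, not taken from the repo's notebooks.

```python
import numpy as np

# Creating arrays: from a nested list, and with helper functions.
a = np.array([[3, 1, 2], [9, 7, 8]])
zeros = np.zeros((2, 3))
evens = np.arange(0, 10, 2)
rng = np.random.default_rng(seed=0)
random_ints = rng.integers(0, 10, size=(2, 3))

# Datatypes & attributes.
print(a.dtype, a.shape, a.ndim, a.size)

# Comparing arrays: element-wise comparison returns a boolean array.
mask = a > 2

# Sorting arrays: np.sort sorts along the last axis by default,
# np.argsort returns the indices that would sort the array.
sorted_rows = np.sort(a)
order = np.argsort(a[0])
```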

Matplotlib

Visualization of Data

What

Python plotting library
It lets you turn data into charts, graphs, and figures

Why

Built on NumPy arrays (& python)
Integrates directly with Pandas
Can create basic or advanced plots
Simple-to-use interface (once you have the basics in place)

Topics covered in this introduction

matplotlib Workflow
Importing matplotlib & the 2 ways of plotting
Plotting data from NumPy arrays
Customizing plots
Saving & Sharing plots
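
The two ways of plotting are commonly understood as the pyplot (stateful) interface and the object-oriented interface; the sketch below assumes that reading. The data is generated with NumPy, and the figure is saved to an in-memory buffer so the example does not write files.

```python
import io

import matplotlib

# Non-interactive backend so the script runs without a display.
# Omit this line inside a Jupyter notebook.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Plotting data from NumPy arrays.
x = np.linspace(0, 10, 50)
y = x ** 2

# Way 1: pyplot interface, which acts on the "current" figure.
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
plt.close()

# Way 2: object-oriented interface, with explicit Figure and Axes objects.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, label="y = x squared")

# Customizing plots.
ax.set(title="Object-oriented interface", xlabel="x", ylabel="y")
ax.legend()

# Saving plots: savefig accepts a file path or a file-like object.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

The object-oriented form is usually easier to extend once a figure has several subplots, because every call names the Axes it acts on.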

Scikit-Learn

Python ML Library, aka sklearn

What

Given some data, Scikit-Learn helps us build machine learning models that learn patterns within that data and then make predictions.
It also provides tools to help us evaluate whether those predictions are good or bad.

Why

Built on NumPy & Matplotlib (and Python)
Has many in-built ML models.
Methods to evaluate your ML models
Very well-designed APIs

Topics covered in this introduction

A scikit-learn workflow
Getting the data ready
Choosing the right estimator/model/algorithm for our problem
Fitting a model to the data (learning patterns)
Making predictions with a model (using patterns)
Evaluating model predictions
Improving model predictions
Saving & Loading models
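
The workflow above can be sketched end to end as follows. This is an illustration under stated assumptions: the data is synthetic (generated with make_classification, not one of the repo's CSV files), the choice of RandomForestClassifier and its settings are arbitrary, and the model file is written to a temporary directory.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Getting the data ready (synthetic data, assumed for this example).
X, y = make_classification(n_samples=200, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Choosing an estimator.
model = RandomForestClassifier(n_estimators=50, random_state=42)

# 3. Fitting the model to the data (learning patterns).
model.fit(X_train, y_train)

# 4. Making predictions (using patterns).
y_preds = model.predict(X_test)

# 5. Evaluating predictions: score() returns mean accuracy for classifiers.
accuracy = model.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")

# 6. Improving predictions: one option is to try other hyperparameters.
# More trees may or may not score better on a given dataset.
bigger_model = RandomForestClassifier(n_estimators=200, random_state=42)
bigger_model.fit(X_train, y_train)
print(f"accuracy with 200 trees: {bigger_model.score(X_test, y_test):.2f}")

# 7. Saving & loading the model with joblib.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "random_forest_model.joblib")
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)
```

A loaded model behaves like the original, so loaded_model.predict(X_test) returns the same predictions as model.predict(X_test).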

Workbooks

Here are some practice workbooks for the different libraries:

hello-jupyternotebook.ipynb
introduction-to-matplotlib.ipynb
introduction-to-numpy.ipynb
introduction-to-pandas.ipynb
introduction-to-scikit-learn.ipynb
matplotlib-exercises.ipynb
numpy-exercises.ipynb
pandas-exercises.ipynb
scikit-learn-exercises.ipynb

SaketMunda/ml-ds-tools-library-introduction
