- Introduction
- Environment Setup
- Share Conda Environment
- Pandas
- NumPy
- Matplotlib
- Scikit-Learn
- Workbooks
- Resources
- Special Thanks To !
Introduction
This repo helps you set up a development environment for data science and machine learning projects, and introduces some of the tools and libraries involved, such as Pandas, NumPy, Matplotlib, and Scikit-Learn.
Brief About the Workspace
Anaconda
- A platform that lets you run data science packages in your programs
- It comes with lots of tools
- It's like a complete hardware store
- A software distribution tool
Miniconda
- Much the same as Anaconda, but comes with fewer tools, all of them still useful
- It's like a workbench
- A software distribution tool
Conda
- Like a personal assistant that sets up your projects, tools, packages, and environments
- A package manager
- Used to set up your environment with DS and ML tools such as matplotlib, pandas, etc.

- A workspace for accessing the tools in your environment from your data science projects
Environment Setup
For a lightweight setup, we will install Miniconda and then install tools and packages as they are required.
Download Miniconda
Create a folder, say sample_project
Set up the development environment and install the required tools using conda:
conda create --prefix ./env pandas numpy matplotlib scikit-learn
Activate the environment (./env if you used the command above):
conda activate environment_directory
List the environments, with the active one marked by an asterisk, using conda env list
Use conda install [package/tool_name] if any tool was missed during the installation step above
Now that your environment is set up, open Jupyter Notebook as the browser-based editor for writing Python code
Terminate the process from the terminal when done
Then deactivate the environment: conda deactivate
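Once the environment is active, a quick sanity check can confirm that the packages from the create command are importable. The following is a minimal sketch, not part of the original setup steps; the helper name check_environment and the module list are assumptions chosen for illustration (note that scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names of the packages installed in the conda create step.
# scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "sklearn"]


def check_environment(modules):
    """Return a dict mapping each module name to True if it can be found.

    Hypothetical helper for illustration: find_spec() locates a module
    without importing it, and returns None when it is not installed.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_environment(REQUIRED_MODULES)
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any module reported as missing can be added with conda install, as described in the steps above.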
Share Conda Environment
If you want to share your Conda environment with other developers, you can do it in a couple of ways:
- Share the whole project folder, which can be expensive because the packages and files add up to many megabytes
- Share a .yml file of your Conda environment
For the second option, we need a .yml file describing your Conda environment, so we export the environment as a YAML file called environment.yml
Commands:
To export
conda env export --prefix [env_folder_path] > [file_name.yml]
To create an environment from the .yml file
conda env create --file [file_name.yml] --name [environment_name]
Pandas
A data analysis tool/library
What
It is used to explore, analyse, and manipulate data when using Python for data analysis
Why
- simple to use
- integrated with many other data science and machine learning Python tools
- helps you get your data ready for ML
- Most useful functions
- Pandas datatypes
- Importing & Exporting data
- Describing data
- Viewing & selecting data
- Manipulating data
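The topics listed above can be sketched in a few lines. This is only an illustrative example under stated assumptions: the car data, column names, and variable names are made up, and the CSV round trip goes through an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Pandas datatypes: Series (1-dimensional) and DataFrame (2-dimensional).
colours = pd.Series(["Red", "Blue", "White"])
cars = pd.DataFrame({
    "make": ["Toyota", "Honda", "BMW"],   # made-up sample data
    "colour": colours,
    "odometer_km": [150043, 87899, 11179],
})

# Exporting & importing: write CSV text, then read it back.
csv_text = cars.to_csv(index=False)
cars_from_csv = pd.read_csv(io.StringIO(csv_text))

# Describing data: summary statistics and column types.
summary = cars_from_csv.describe()
column_types = cars_from_csv.dtypes

# Viewing & selecting data: first rows and a boolean filter.
first_rows = cars_from_csv.head(2)
low_mileage = cars_from_csv[cars_from_csv["odometer_km"] < 100_000]

# Manipulating data: transform a column, add a derived column, aggregate.
cars_from_csv["make"] = cars_from_csv["make"].str.lower()
cars_from_csv["odometer_miles"] = cars_from_csv["odometer_km"] * 0.621371
mean_km = cars_from_csv["odometer_km"].mean()

print(cars_from_csv)
```

With a real dataset, pd.read_csv would typically be given a file path instead of the in-memory buffer used here.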
NumPy
What
Numerical Python. It provides multidimensional arrays of numbers.
Its arrays are similar to Python lists, so why use NumPy, and why is it a must-use tool for machine learning problems?
Why
- behind-the-scenes optimization, written in C
- computation is much faster than equivalent pure-Python loops (NumPy itself runs on the CPU, not on GPUs)
- really useful because machines only understand binary (0 and 1), so everything, images for example, has to be converted into arrays of numbers, which is what NumPy works with
- Vectorization via broadcasting (avoiding loops)
- backbone of other scientific packages like pandas
- Most useful functions
- NumPy datatypes & attributes
- Creating arrays
- Viewing arrays & matrices
- Manipulating & Comparing arrays
- Sorting arrays
- Use Cases
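As a rough illustration of the topics above, the sketch below creates a few arrays, inspects their attributes, and shows broadcasting, comparison, and sorting. All values and variable names are arbitrary examples, and the tiny "image" is an assumed stand-in for real image data.

```python
import numpy as np

# Creating arrays
a1 = np.array([1, 2, 3])
a2 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])
ones = np.ones((2, 3))
evens = np.arange(0, 10, 2)

# Datatypes & attributes
shape, ndim, dtype, size = a2.shape, a2.ndim, a2.dtype, a2.size

# Viewing arrays & matrices
first_row = a2[0]
last_column = a2[:, -1]

# Manipulating: broadcasting stretches a1 (shape (3,)) across both rows of a2 (shape (2, 3)).
broadcast_sum = a2 + a1
scaled = a2 * 10

# Comparing: element-wise, producing a boolean array.
greater_than_three = a2 > 3

# Sorting
unsorted = np.array([7, 2, 9, 4])
sorted_values = np.sort(unsorted)
sort_order = np.argsort(unsorted)  # indices that would sort the array

# Vectorization: the same sum of squares with a Python loop and with NumPy.
loop_total = sum(x * x for x in range(1000))
vector_total = int(np.sum(np.arange(1000) ** 2))

# Use case: a 2x2 RGB "image" is just a (height, width, channels) array of numbers.
tiny_image = np.zeros((2, 2, 3), dtype=np.uint8)
tiny_image[0, 0] = [255, 0, 0]  # top-left pixel set to red

print(broadcast_sum)
print(sorted_values, sort_order)
```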
Matplotlib
Visualization of data
What
- Python plotting library
- It lets you turn data into charts, graphs, and figures
Why
- Built on NumPy arrays (and Python)
- Integrates directly with Pandas
- Can create basic or advanced plots
- Simple-to-use interface (once you have the basics down)
- matplotlib Workflow
- Importing matplotlib & the 2 ways of plotting
- Plotting data from NumPy arrays
- Customizing plots
- Saving & Sharing plots
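The workflow above might look roughly like the following sketch. It assumes a script run outside a notebook, so it selects the non-interactive Agg backend (that line can be dropped inside Jupyter); the file name sample_plot.png and the plotted functions are arbitrary choices for illustration.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

# Data from NumPy arrays
x = np.linspace(0, 10, 100)

# Way 1: the pyplot (stateful) interface
plt.figure()
plt.plot(x, np.sin(x))
plt.close()

# Way 2: the object-oriented interface, with explicit Figure and Axes objects
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, x ** 2, label="x squared")

# Customizing the plot
ax.set(title="Sample plot", xlabel="x", ylabel="y")
ax.legend()

# Saving the plot so it can be shared
output_path = Path("sample_plot.png")  # hypothetical output file name
fig.savefig(output_path)
plt.close(fig)
```

The object-oriented interface is used for the saved figure here because it keeps the figure and axes explicit, which tends to help once plots have more than one panel.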
Scikit-Learn
Python ML library, aka sklearn
What
- Given data, Scikit-Learn helps us build machine learning models that learn patterns within that data and then make predictions.
- It also provides tools to help us evaluate whether those predictions are good or bad.
Why
- Built on NumPy & Matplotlib (and Python)
- Has many built-in ML models
- Methods to evaluate your ML models
- Very well-designed APIs
- A scikit-learn workflow
- Getting the data ready
- Choosing the right estimator/model/algorithm for our problem
- Fitting a model to the data (learning patterns)
- Making predictions with a model (using patterns)
- Evaluating model predictions
- Improving model predictions
- Saving & Loading models
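The workflow steps above can be sketched end to end on a small built-in dataset. This is one possible illustration, not the repo's own notebook: the choice of the iris dataset, RandomForestClassifier, the hyperparameter values, and pickle for saving are all assumptions made for the example.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Getting the data ready: features X, labels y, then a train/test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Choosing an estimator (a random forest is just one reasonable option).
model = RandomForestClassifier(n_estimators=10, random_state=42)

# 3. Fitting the model to the training data (learning patterns).
model.fit(X_train, y_train)

# 4. Making predictions on unseen data (using patterns).
predictions = model.predict(X_test)

# 5. Evaluating the predictions.
accuracy = accuracy_score(y_test, predictions)

# 6. Trying to improve: change a hyperparameter and compare the scores.
second_model = RandomForestClassifier(n_estimators=100, random_state=42)
second_model.fit(X_train, y_train)
second_accuracy = second_model.score(X_test, y_test)

# 7. Saving & loading: serialize the model to bytes and restore it.
saved_bytes = pickle.dumps(second_model)
loaded_model = pickle.loads(saved_bytes)

print(f"first model accuracy:  {accuracy:.3f}")
print(f"second model accuracy: {second_accuracy:.3f}")
```

A changed hyperparameter is not guaranteed to give a better score, which is why step 6 compares the two results rather than assuming an improvement.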
Workbooks
Here are some practice workbooks for the different libraries:
- Python Exercise
- Pandas Workbook by Daniel Bourke
- NumPy Workbook by Daniel Bourke
- Regex Exercise
- Matplotlib Workbook by Daniel Bourke
- Scikit-Learn Workbook by Daniel Bourke
Resources
- Zero To Mastery Machine Learning & Data Science Program
- A Visual Intro to NumPy and Data Representation
- The Basics of NumPy Arrays
- Feature Scaling with scikit-learn By Ben Alex Keen
- Feature Scaling : Why it's required By Rahul Saini
- Decision Trees & Explanation of Random Forest By Will Koehrsen
- Beyond Accuracy : Precision and Recall By Will Koehrsen
- ROC and AUC, Clearly Explained By StatQuest