Illustrating Pandas and Scikit learn with different examples

sijanonly/pandas-and-scikit-handson

pandas-handson

Why Pandas?

Pandas provides flexible data structures for data manipulation. If you are interested in digging hidden information out of data, you will love how easily that can be done with pandas. There are a few Jupyter notebooks here to get you started. Feel free to go through them and try them on your own.
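As a small illustration of that flexibility, here is a minimal sketch that builds a DataFrame, aggregates it, and filters it. The data and the column names (`customer`, `plan`, `monthly_minutes`) are made up for this example and do not come from the notebooks.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "customer": ["a", "b", "c", "d"],
    "plan": ["basic", "premium", "basic", "premium"],
    "monthly_minutes": [120, 300, 80, 260],
})

# Average usage per plan: group rows by plan, then take the mean of one column.
avg_minutes = df.groupby("plan")["monthly_minutes"].mean()
print(avg_minutes)

# Boolean indexing: keep only the rows where usage exceeds 200 minutes.
heavy_users = df[df["monthly_minutes"] > 200]
print(heavy_users)
```

Grouping and boolean filtering like this are usually the first steps when exploring a new dataset.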

Why Machine Learning?

If you stop for a moment and think about how data science has developed, you can probably point out the following stages:

  1. Development of chips and storage : PERIOD I
  2. Development of infrastructures to store large data : PERIOD II
  3. Development of Data Science to manipulate large data : PERIOD III

So we have already passed through the first two stages. In the beginning, people explored datasets manually, but today it is practically impossible to extract even a single insight from such a large volume of data by hand.

What do you understand when someone says data science or data mining?

The terms "data science" and "data mining" are often used interchangeably. Data science can be seen as a set of fundamental principles that guide the extraction of information and knowledge from data. Data mining is the process of extracting knowledge from data using algorithms that follow those principles.

Before looking at how data science can be applied in the real world, I would like you to think about the following scenarios and form your own ideas about how you would solve them.

  1. Suppose you own a telecommunications company. There are several rivals in the market, and everyone is competing to retain existing customers and to win new ones. How could you reduce your churn? NOTE: A customer can leave your service at any time and switch to a competitor. This loss of customers is called churn.
  2. Suppose you want to add a new offer to your telecommunications service, and you need to estimate how much a given customer will use it.
  3. Your company decides to add new features to its product. How would you decide which feature should have the highest priority, that is, which feature to add first?
  4. You own an eCommerce site and need to display recommendations based on customer behavior. How will you achieve this?
  5. You are about to run new advertisements for your firm, and you need to decide which customers are most likely to respond to an advertisement or a special offer in your application.
  6. You are a big football fan, and you have records of past football games. You are trying to figure out which team is likely to win when two teams play each other.

Machine learning

Machine learning is commonly divided into 'Supervised' and 'Unsupervised' learning. Classification and regression are the two major kinds of supervised learning, whereas clustering and dimensionality reduction are forms of unsupervised learning.

Simply remember: classification predicts a discrete output, whereas regression predicts a continuous output.
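The difference between discrete and continuous output can be sketched with scikit-learn. This is only an illustration on made-up numbers: the single feature and both target lists are assumptions, and `LogisticRegression` and `LinearRegression` are just two convenient estimators, not the only choices.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one numeric feature per sample.
X = [[1], [2], [3], [4], [5], [6]]

# Classification: the target is a discrete label (0 or 1).
y_class = [0, 0, 0, 1, 1, 1]
classifier = LogisticRegression().fit(X, y_class)
predicted_labels = classifier.predict([[0], [7]])
print(predicted_labels)  # each prediction is one of the known labels

# Regression: the target is a continuous number.
y_value = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
regressor = LinearRegression().fit(X, y_value)
predicted_values = regressor.predict([[2.5], [7]])
print(predicted_values)  # predictions can be any real number
```

Both estimators share the same `fit` and `predict` interface; only the type of the target, and therefore of the prediction, differs.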

Some more examples

Examples of Classification
  1. Predicting whether a stock's price will rise or fall
  2. Deciding whether a news article belongs to the politics or the economics section
Examples of Clustering
  1. From a given collection of movie reviews, group the reviews into positive and negative ones
  2. Determine segments of customers within a market for a product
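
The customer segmentation example can be sketched with k-means clustering. The data below is invented, and the two features (annual spend and visits per month) as well as the choice of two clusters are assumptions made for illustration.

```python
from sklearn.cluster import KMeans

# Hypothetical customers described by [annual_spend, visits_per_month].
customers = [
    [100, 1], [120, 2], [110, 1],      # low-spend customers
    [900, 10], [950, 12], [920, 11],   # high-spend customers
]

# No labels are given: the algorithm groups similar customers on its own.
# In real data the features would usually be scaled first.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = model.fit_predict(customers)
print(segments)  # one cluster id per customer
```

The cluster ids themselves are arbitrary; what matters is which customers end up in the same group.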
