DATA-MINING-Algorithms

Algorithms Discussed:

We have discussed the following algorithms:

Apriori algorithm
Decision tree ID3 algorithm
FP Growth Algorithm
Bayesian Classification Algorithm
Web Crawling Problem
KNN Algorithm
Linear Regression with One Variable
Linear Regression with Multiple Variables
Support Vector Machine Model
BIRCH Algorithm
DBSCAN Algorithm
K-Means Algorithm
PAM Algorithm
Decision tree using C4.5 and CART algorithms
Hierarchical Clustering Algorithm
OPTICS Algorithm
Face Detection Algorithm
Perceptron Algorithm

Problem Statements

Algorithm-1:

Dataset used: weather.csv

Perform the following operations on the weather dataset using Pandas.

Reading a dataset into a dataframe.
Dropping rows with missing ("NaN") values.
Dropping columns with missing ("NaN") values.
Filling the "NaN" values with the mean or median.
Splitting the dataset row-wise and column-wise.
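
The five operations above might look like the following minimal sketch. It assumes pandas is installed and uses a small inline CSV as a hypothetical stand-in for weather.csv; the column names (day, temperature, humidity, windspeed) are illustrative assumptions and not the real file's schema.

```python
import io
import pandas as pd

# Hypothetical stand-in for weather.csv; the real columns may differ.
CSV_TEXT = """day,temperature,humidity,windspeed
1,30.0,70.0,5.0
2,,65.0,7.0
3,28.0,,6.0
4,26.0,80.0,
"""

# 1. Read the dataset into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Drop every row that contains at least one NaN.
rows_dropped = df.dropna(axis=0)

# 3. Drop every column that contains at least one NaN.
cols_dropped = df.dropna(axis=1)

# 4. Fill NaN values with the per-column mean or median.
numeric = df.select_dtypes("number")
filled_mean = df.fillna(numeric.mean())
filled_median = df.fillna(numeric.median())

# 5. Split by rows (first two vs. the rest) and by columns.
top, bottom = df.iloc[:2], df.iloc[2:]
left, right = df.iloc[:, :2], df.iloc[:, 2:]
```

With the real file, the first step would presumably be `pd.read_csv("weather.csv")`; the remaining calls are unchanged.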

Algorithm-2:

Dataset used: data folder (chess.dat, mushroom.txt, retail.dat, FILE1.txt, FILE2.txt)

Implement the Apriori algorithm for association rules. Run the algorithm with two different support and confidence levels of your choice. (The Chess, Mushroom, and Retail datasets can be used.)

Print the closed itemsets.
Print the closed frequent itemsets.

Note: Let Y ⊆ I and X ⊆ Y, where I is the set of all items.
If X is an infrequent itemset, then Y is also an infrequent itemset. Apply the Apriori algorithm on that basis.
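
The pruning rule in the note can be sketched as below. This is a minimal pure-Python illustration, not the repository's implementation: the function names `apriori` and `closed_itemsets`, the absolute `min_support` count, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): a candidate with any infrequent
        # (k-1)-subset cannot be frequent, so it is discarded uncounted.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset of equal support."""
    return {
        s: n for s, n in frequent.items()
        if not any(s < t and n == m for t, m in frequent.items())
    }

# Hypothetical toy transactions (not taken from the data folder).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk"},
]
frequent = apriori(transactions, min_support=3)
closed = closed_itemsets(frequent)
```

Here {diapers} and {beer} are frequent but not closed, because {diapers, beer} has the same support count of 3.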

Algorithm-3:

Dataset used: car.data.txt

Implement the decision tree ID3 algorithm for the Car Evaluation Database.

Attribute Information: Six input attributes: buying, maint, doors, persons, lug_boot, safety
Class Values: unacc, acc, good, vgood
Attributes:
∗ buying: vhigh, high, med, low.
∗ maint: vhigh, high, med, low.
∗ doors: 2, 3, 4, 5more.
∗ persons: 2, 4, more.
∗ lug_boot: small, med, big.
∗ safety: low, med, high.
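
ID3 chooses the attribute with the highest information gain at each node. The sketch below shows only that selection step, on four hypothetical rows laid out in the column order listed above (six attributes followed by the class); the rows are made up for illustration and are not taken from car.data.txt.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy reduction obtained by splitting rows on one attribute."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical rows: buying, maint, doors, persons, lug_boot, safety, class
rows = [
    ("high", "med", "2", "4", "small", "low", "unacc"),
    ("low", "med", "4", "4", "big", "low", "unacc"),
    ("high", "med", "2", "4", "small", "high", "acc"),
    ("low", "med", "4", "4", "big", "high", "acc"),
]

# ID3 would split on the attribute with the largest gain.
best = max(range(6), key=lambda i: information_gain(rows, i))
```

In this toy table only safety (index 5) separates the classes, so it is selected; a full ID3 implementation would then recurse on each resulting subset.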

Algorithm-4:

Dataset used: Online Retail.xlsx

Implement the FP-Growth algorithm on the given dataset.

Algorithm-5(i):

Dataset used: DATASET.xlsx

Using Bayesian classification, predict the class (Target wait) for the following sample.
X = (alt = T, Bat = T, Fri = F, Hun = T, Pat = Some, Price = $$$, Rain = T, Res = T, Type = Italian, Est > 60).

Algorithm-5(ii):

The task is a web crawling problem.

Write a program to fetch the web page http://en.wikipedia.org/wiki/India.
Count the number of hyperlinks on this page.
Assign a unique number to each link.
Select one of the links found and repeat steps 1 to 3.
Repeat the above steps at least two times and generate an adjacency matrix.
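
The crawl loop above could be sketched as follows. To keep the example runnable offline, it takes a `fetch` callable and is demonstrated on a tiny in-memory "web"; the `crawl` function, the `PAGES` dictionary, the example.test URLs, and the rule of following the first unvisited link are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def crawl(start_url, fetch, repeats=2):
    """Visit start_url plus `repeats` further pages; return ids and matrix."""
    ids = {start_url: 0}          # unique number per URL
    edges, visited = set(), set()
    url = start_url
    for _ in range(repeats + 1):
        visited.add(url)
        parser = LinkParser(url)
        parser.feed(fetch(url))
        for link in parser.links:
            ids.setdefault(link, len(ids))
            edges.add((ids[url], ids[link]))
        unvisited = [l for l in parser.links if l not in visited]
        if not unvisited:
            break
        url = unvisited[0]        # assumption: follow the first new link
    n = len(ids)
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = 1
    return ids, matrix

# Hypothetical three-page site standing in for real web pages.
PAGES = {
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/a">A</a> <a href="/c">C</a>',
    "http://example.test/c": '<a href="/a">A</a>',
}
ids, matrix = crawl("http://example.test/a", PAGES.__getitem__)
```

For the live page, `fetch` could be a thin wrapper around `urllib.request.urlopen` that returns the decoded HTML; a real article contains many links, so the resulting matrix would be large and sparse.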

Algorithm-6(i):

Dataset used: DATASET.xlsx

Using Bayesian classification, predict the class (Target wait) for the following sample.
X = (alt = T, Bat = T, Fri = F, Hun = T, Pat = Some, Price = $$$, Rain = T, Res = T, Type = Italian, Est > 60).

Algorithm-6(ii):

Dataset used: data_sheet.xlsx

Predict a class label using naïve Bayesian classification for the tuple:
X = {age = “<= 30”, income = “medium”, student = “yes”, credit rating = “fair”}
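
A naïve Bayesian prediction for a tuple like X multiplies the class prior by the per-attribute conditional probabilities. The sketch below uses a small illustrative table with the same four attributes; the contents of data_sheet.xlsx are not shown in this README, so the rows, the class labels "yes"/"no", and the absence of Laplace smoothing are assumptions.

```python
from collections import Counter, defaultdict

def train(rows):
    """Count classes and, per (attribute index, class), attribute values."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)
    for r in rows:
        for i, v in enumerate(r[:-1]):
            value_counts[(i, r[-1])][v] += 1
    return class_counts, value_counts

def predict(x, class_counts, value_counts):
    """Return the most probable class and the unnormalised scores."""
    total = sum(class_counts.values())
    scores = {}
    for c, n in class_counts.items():
        score = n / total                        # prior P(c)
        for i, v in enumerate(x):
            score *= value_counts[(i, c)][v] / n  # P(x_i | c), no smoothing
        scores[c] = score
    return max(scores, key=scores.get), scores

# Illustrative rows: age, income, student, credit rating, class
rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
class_counts, value_counts = train(rows)
label, scores = predict(("<=30", "medium", "yes", "fair"),
                        class_counts, value_counts)
```

On this table the score for "yes" is about 0.028 and for "no" about 0.007, so the sketch predicts "yes". With unseen attribute values a zero count would zero out the whole product, which is why real implementations usually add smoothing.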

Algorithm-7:

Dataset used: iris-dataset.csv, iris-test.csv

Implement the KNN algorithm for classification in Python using the following instructions:

The Iris data set is bundled for testing; however, you are free to use any data set of your choice, provided that it follows the specified format.
Data set format:
Attributes can be integer or real values.
List attributes first, and add the response as the last value in each row.
* E.g. [4.5, 7, 2.6, "Orange"], where the first 3 numbers are attribute values and "Orange" is one of the response classes.
* Another example is [1.2, 4.3, 3]; in this case, there are 2 attributes and the response class is the integer 3.
Responses can be integer, real or categorical.
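
Given rows in the format described above (attributes first, response last), a KNN classifier can be sketched in a few lines. The function name `knn_predict`, the choice of Euclidean distance, majority voting, and the handful of Iris-style rows are assumptions for illustration; the notebook may load the CSV files and break ties differently.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Each row is [attr_1, ..., attr_n, response]; compare attributes only.
    neighbors = sorted(train, key=lambda row: math.dist(row[:-1], query))[:k]
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]

# A few Iris-style rows in the required format (illustrative only).
train = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
    [5.8, 2.7, 5.1, 1.9, "virginica"],
]

first = knn_predict(train, [5.0, 3.4, 1.5, 0.2])
second = knn_predict(train, [6.5, 3.0, 4.6, 1.5])
```

The vote works for integer or categorical responses; for real-valued responses one would typically average the neighbours' values instead of voting.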

Algorithm-8(i):

Dataset used: ex1data1.txt

Implement linear regression with one variable to predict profits for a food truck.
Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities. You would like to use this data to help you select which city to expand to next. The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.
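
One common way to fit this model is batch gradient descent on the squared error. The sketch below assumes that approach and uses a tiny synthetic data set that lies exactly on the line y = 2x + 1, so the expected parameters are known; the learning rate, the iteration count, and the data are illustrative assumptions rather than values from ex1data1.txt.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit y ≈ theta0 + theta1 * x by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Synthetic stand-in for (population, profit) pairs: y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

theta0, theta1 = gradient_descent(xs, ys)
predicted = theta0 + theta1 * 6.0   # prediction for an unseen x = 6
```

With the real file, each line would be split on the comma into a population value and a profit value before calling `gradient_descent`; the learning rate may need to be lowered for larger input ranges.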

Algorithm-8(ii):

Dataset used: ex1data2.txt

Implement linear regression with multiple variables to predict the prices of houses.
Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices. The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.
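
Because house size and bedroom count live on very different scales, gradient descent usually needs the features normalised first. The following sketch assumes that approach; the function names `fit` and `predict` are hypothetical, and the five rows are synthetic (generated from price = 100 * size + 5000 * bedrooms + 20000), not taken from ex1data2.txt.

```python
def fit(rows, targets, alpha=0.1, iterations=1000):
    """Normalise features, then fit a linear model by gradient descent."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / m) ** 0.5
            for j in range(n)]
    # Prepend 1.0 to every scaled row so theta[0] acts as the intercept.
    scaled = [[1.0] + [(r[j] - means[j]) / stds[j] for j in range(n)]
              for r in rows]
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [sum(t * v for t, v in zip(theta, x)) - y
                  for x, y in zip(scaled, targets)]
        grads = [sum(e * x[j] for e, x in zip(errors, scaled)) / m
                 for j in range(n + 1)]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta, means, stds

def predict(row, theta, means, stds):
    """Apply the same normalisation to a new row, then the linear model."""
    x = [1.0] + [(v - mu) / sd for v, mu, sd in zip(row, means, stds)]
    return sum(t * v for t, v in zip(theta, x))

# Synthetic rows: (size in square feet, bedrooms) with exact linear prices.
rows = [(1000, 3), (1500, 2), (2000, 4), (2500, 2), (3000, 4)]
targets = [135000, 180000, 240000, 280000, 340000]

theta, means, stds = fit(rows, targets)
price = predict((1800, 3), theta, means, stds)
```

The means and standard deviations computed during training must be reused at prediction time, which is why `fit` returns them alongside the parameters.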

Algorithm-9(i):

Dataset used: data.xlsx

Write a program to train a linear SVM using the dataset given in the file data.xlsx and test it on unseen data. (Do not use a library implementation of SVM.)
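
One way to train a linear SVM without a library is sub-gradient descent on the regularised hinge loss. The sketch below assumes that method (it is not necessarily what the notebook does) and uses hypothetical, linearly separable 2-D points with labels +1 and -1, since the layout of data.xlsx is not described here; `lam`, `lr`, and `epochs` are illustrative hyperparameters.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + hinge loss by per-sample sub-gradient steps."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:
                # Inside the margin or misclassified: hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only the regulariser acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical separable training data.
points = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
unseen = [predict(w, b, p) for p in [(4, 4), (-4, -1)]]
```

For data that is not linearly separable, the same loop still runs, but `lam` then controls the trade-off between margin width and training errors.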

Algorithm-9(ii):

Dataset used: ionosphere.data

Train an SVM on the ionosphere dataset. Divide the dataset into training and testing sets and report the accuracy of the SVM.

Algorithm-10:

Dataset used: Dataset.txt

Apply the BIRCH algorithm to the dataset.

Algorithm-11:

Dataset used: Dataset.txt

Apply the DBSCAN algorithm to the dataset.
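
As a rough illustration of DBSCAN, the sketch below labels each point with a cluster id or -1 for noise. It is a minimal pure-Python version with a brute-force neighbourhood search; the parameter names `eps` and `min_pts`, the convention that a point counts as its own neighbour, and the toy points are assumptions, and Dataset.txt may need different values.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Two tight squares and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The brute-force `region` search is quadratic in the number of points; a spatial index would be the usual replacement for larger data sets.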

Algorithm-12(i):

Dataset used: Absenteeism_at_work.xls

Apply the K-Means algorithm to the dataset.
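
The K-Means loop alternates between assigning points to the nearest centroid and recomputing centroids. The sketch below takes the initial centroids as an argument so the result is deterministic; the function name `kmeans`, the 2-D toy points, and the chosen starting centroids are assumptions and do not reflect the columns of Absenteeism_at_work.xls.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: returns final centroids and their clusters."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break                      # converged: assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```

In practice the initial centroids are usually sampled at random (or with k-means++), and the algorithm is rerun several times because the result depends on the starting positions.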

Algorithm-12(ii):

Dataset used: Absenteeism_at_work.xls

Apply the PAM algorithm to the dataset.
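
PAM (Partitioning Around Medoids) keeps actual data points as cluster centres and improves them by swapping a medoid with a non-medoid whenever the total distance drops. The sketch below is a simplified first-improvement variant of that swap phase, not the full BUILD and SWAP procedure; the function names, the Euclidean distance, and the toy points are assumptions.

```python
import math
from itertools import product

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, medoids):
    """Greedy swap phase: accept any medoid/non-medoid swap that lowers cost."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(len(medoids)), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    return medoids, best

# Hypothetical 2-D points; the initial medoids are deliberately off-centre.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, cost = pam(points, [(1, 2), (9, 8)])
```

Because medoids are always real observations, the method is less sensitive to outliers than K-Means, at the price of a more expensive search.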

Algorithm-13:

Dataset used: car.data.txt

Implement decision trees using the C4.5 and CART algorithms for the Car Evaluation Dataset.

Attribute Information: Six input attributes: buying, maint, doors, persons, lug_boot, safety
Class Values: unacc, acc, good, vgood
Attributes:
buying: vhigh, high, med, low.
maint: vhigh, high, med, low.
doors: 2, 3, 4, 5more.
persons: 2, 4, more.
lug_boot: small, med, big.
safety: low, med, high.
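
C4.5 and CART differ mainly in their split criterion: C4.5 ranks attributes by gain ratio, while CART evaluates binary splits by Gini impurity. The sketch below computes both measures on four hypothetical rows in the column order listed above; the helper names and the rows are illustrative and not taken from car.data.txt.

```python
import math
from collections import Counter

def entropy(values):
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def gain_ratio(rows, attr_index):
    """C4.5 criterion: information gain divided by split information."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    split_info = entropy([r[attr_index] for r in rows])
    gain = entropy(labels) - remainder
    return gain / split_info if split_info else 0.0

def gini_binary(rows, attr_index, value):
    """CART criterion: weighted Gini of the split (== value) vs (!= value)."""
    left = [r[-1] for r in rows if r[attr_index] == value]
    right = [r[-1] for r in rows if r[attr_index] != value]
    return sum(len(part) / len(rows) * gini(part)
               for part in (left, right) if part)

# Hypothetical rows: buying, maint, doors, persons, lug_boot, safety, class
rows = [
    ("high", "med", "2", "4", "small", "low", "unacc"),
    ("low", "med", "4", "4", "big", "low", "unacc"),
    ("high", "med", "2", "4", "small", "high", "acc"),
    ("low", "med", "4", "4", "big", "high", "acc"),
]
labels = [r[-1] for r in rows]
```

C4.5 would pick the attribute with the highest gain ratio, and CART the (attribute, value) pair with the lowest weighted Gini; on these rows both criteria favour safety.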

Algorithm-14:

Dataset used: qla.csv, matrix.xlsx

Implement the hierarchical clustering algorithm and apply it to the qla.csv dataset. Also, show the resulting dendrograms after applying the average linkage approach.
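
Average linkage merges, at each step, the two clusters whose members have the smallest mean pairwise distance. The sketch below records the merge history that a dendrogram is drawn from; the function name, the Euclidean distance, and the four toy points are assumptions, and plotting is left out to keep the example dependency-free.

```python
import math

def average_linkage(points):
    """Agglomerative clustering; returns (id_a, id_b, distance, size) merges."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> point ids
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Mean distance over all cross-cluster point pairs.
                dist = sum(math.dist(points[i], points[j])
                           for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, dist, len(clusters[next_id])))
        next_id += 1
    return merges

# Hypothetical points on a line, so the merge order is easy to follow.
points = [(0, 0), (1, 0), (5, 0), (7, 0)]
merges = average_linkage(points)
```

Each merge row uses the same layout as SciPy's linkage matrix (two cluster ids, the merge distance, and the new cluster size), so the history could in principle be converted to an array and passed to a dendrogram plotting routine.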

Algorithm-15:

Dataset used: data1.xlsx, data2.xlsx

Implement the OPTICS algorithm and apply it to the datasets (with epsilon = 0.02 and minPts = 500). Output each point's reachability distance, its core distance, and the order of points in the reachability plot.
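
The three requested outputs (processing order, reachability distance, core distance) can be sketched as below. This is a minimal brute-force version under stated assumptions: a point counts as its own neighbour, undefined reachability is represented as infinity, undefined core distance as `None`, and the toy points use small `eps` and `min_pts` values instead of the 0.02 and 500 given for the real data.

```python
import math

def optics(points, eps, min_pts):
    """Return (order, reachability, core_distance) lists for the points."""
    n = len(points)
    reach = [math.inf] * n      # inf = reachability undefined
    core = [None] * n           # None = not a core point
    processed = [False] * n
    order = []

    def neighbors(i):
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    def core_distance(i, nbrs):
        if len(nbrs) < min_pts:
            return None
        dists = sorted(math.dist(points[i], points[j]) for j in nbrs)
        return dists[min_pts - 1]   # distance to the min_pts-th neighbour

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start}
        while seeds:
            # Always expand the seed with the smallest reachability.
            i = min(seeds, key=lambda j: (reach[j], j))
            seeds.remove(i)
            processed[i] = True
            order.append(i)
            nbrs = neighbors(i)
            core[i] = core_distance(i, nbrs)
            if core[i] is None:
                continue
            for j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core[i], math.dist(points[i], points[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    seeds.add(j)
    return order, reach, core

# Hypothetical points: three close together and one far away.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
order, reach, core = optics(points, eps=2, min_pts=2)
```

Plotting `reach[i]` for each `i` in `order` gives the reachability plot, where valleys correspond to clusters.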

Algorithm-16:

Dataset used: Dataset Manual.txt
Code: PCA Folder

Implement a face detection algorithm using Principal Component Analysis (PCA).

Algorithm-17:

Dataset used: dataset

Implement a face detection algorithm using Linear Discriminant Analysis (LDA).

Algorithm-18:

Dataset used: none (input-based algorithm).

Implement the perceptron algorithm.
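
A single-layer perceptron with the classic error-driven update rule can be sketched as follows. The example trains on the logical AND function with hard-coded samples rather than user input; the function names, the step activation with threshold 0, and the learning rate are assumptions made for illustration.

```python
def activate(w, b, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule: w += lr * (target - output) * x."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            delta = y - activate(w, b, x)
            if delta:
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:
            break                  # every sample classified correctly
    return w, b

# Logical AND: linearly separable, so the rule is guaranteed to converge.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

w, b = train_perceptron(samples, labels)
outputs = [activate(w, b, x) for x in samples]
```

The same loop fails to converge on XOR, which is not linearly separable; that limitation is what motivates multi-layer networks.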

BabesGotByte/DATA-MINING-Algorithms
