RAPIDMINER STUDIO

What is RapidMiner Studio?

RapidMiner Studio is a data analytics and data science tool that lets users work with data from the preprocessing stage through the model evaluation stage without writing code. It is intended to support functionality across the AI ecosystem.

Introduction to RapidMiner Studio:

The main goal of RapidMiner is to provide a single interface for data science work, covering datasets, machine learning algorithms, visualizations and model deployment. Its major features are:

● The platform ships with sample datasets that can be used directly to build models. RapidMiner also supports loading data from other sources such as cloud storage, relational databases, NoSQL stores, Excel and CSV files.
● Working with datasets is entirely drag and drop. Preprocessing, model building and visualization can all be done without code.
● Supervised and unsupervised machine learning algorithms do not need to be built from scratch. RapidMiner provides the algorithms required for regression, classification and clustering.
● Models can be trained on the data, and hyperparameter tuning is supported to improve model performance.
● RapidMiner supports deploying a model to different platforms through dedicated interfaces.
● Once a model is deployed, those interfaces can be used to collect and store real-time data.

Prerequisites:

Before using this tool, you should be familiar with basic calculus, algebra, Excel and the basic workflow of machine learning models.

BUSINESS ANALYTICS: DATA, MODELS AND DECISIONS

Lab Assignment 1

a) Data Quality Check:

a.1) In RapidMiner Studio, data quality is checked with the “Data Quality” operator. This operator helps detect bias in the dataset by calculating descriptive statistics and visualizing the distribution of the data. It can also detect outliers and missing values, and it reports the number of rows and columns in the dataset.

a.2) To handle data quality issues such as duplicates and missing values, the “Data Cleansing” operator is used. This operator lets the user select the columns to be cleaned and specify the cleaning rules. For example, the user can remove all duplicate values, or replace missing values in a column with the median of that column within the same “Region”.

a.3) The dataset has 18249 rows and 13 columns.
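The checks described above (shape, missing values, duplicates) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the column names and records are hypothetical toy data, not the lab dataset, and `quality_report` is a made-up helper, not a RapidMiner API.

```python
from collections import Counter


def quality_report(rows, columns):
    """Summarise shape, missing values per column, and exact duplicate rows."""
    # A value counts as missing when it is None or an empty string.
    missing = {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}
    # Count identical rows; every copy beyond the first is a duplicate.
    counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "missing": missing,
        "duplicate_rows": duplicates,
    }


# Hypothetical toy records (not the lab dataset).
columns = ["region", "year", "price"]
rows = [
    {"region": "West", "year": 2016, "price": 1.10},
    {"region": "West", "year": 2016, "price": 1.10},  # exact duplicate
    {"region": "East", "year": 2017, "price": None},  # missing price
    {"region": "East", "year": 2017, "price": 1.45},
]

report = quality_report(rows, columns)
print(report)
```

Run against the real export, the same report would be expected to show the row and column counts stated in a.3.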

b) Data Cleaning and Preprocessing:

b.1) In RapidMiner Studio, the first column in the dataset is removed with the “Select Attributes” operator. This operator lets the user select or deselect the attributes to be included in the dataset. The ‘year’ variable can be treated as nominal by using the “Numerical to Polynominal” operator, which converts numeric values to nominal values.
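For readers who want to see what these two steps do to the data, here is a minimal Python sketch. The helper names and the toy records are assumptions made for illustration; the column called `index` stands in for whatever the first column of the real dataset is.

```python
def drop_first_column(rows, columns):
    """Return the rows without the first column, plus the remaining column names."""
    kept = columns[1:]
    return [{c: r[c] for c in kept} for r in rows], kept


def to_nominal(rows, column):
    """Treat a numeric column as nominal by storing its values as category labels."""
    return [{**r, column: str(r[column])} for r in rows]


# Hypothetical toy records; "index" plays the role of the unwanted first column.
columns = ["index", "year", "price"]
rows = [
    {"index": 0, "year": 2016, "price": 1.10},
    {"index": 1, "year": 2017, "price": 1.45},
]

rows, columns = drop_first_column(rows, columns)
rows = to_nominal(rows, "year")
print(columns)
print(rows)
```

Storing the year as a label means a model treats 2016 and 2017 as separate categories rather than as points on a numeric scale.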

b.2) To check for duplicate values and remove them, the “Data Cleansing” operator is used. The user selects the columns to be cleaned and specifies the cleaning rules, for example a rule that removes all duplicate values.

b.3) The “Data Cleansing” operator is also used to check for missing values. For example, the user can specify a rule that replaces missing values in a column with the median of that column within the same “Region”. If most column values in a data record are missing, the record can be removed instead.

The correlation between the variables is found with the “Correlation Matrix” operator. This operator calculates the correlation coefficient between every pair of variables in the dataset and visualizes the result as a matrix. The resulting correlation matrix is shown in the screenshot below.
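The median-by-region rule for missing values can be illustrated with a short Python sketch. It assumes hypothetical column names (`region`, `price`) and toy values; `impute_group_median` is an illustrative helper, not the operator's actual implementation.

```python
from collections import defaultdict
from statistics import median


def impute_group_median(rows, group_col, value_col):
    """Replace missing values with the median of the same group (e.g. the same region)."""
    observed = defaultdict(list)
    for r in rows:
        if r[value_col] is not None:
            observed[r[group_col]].append(r[value_col])
    medians = {g: median(values) for g, values in observed.items()}

    filled = []
    for r in rows:
        value = r[value_col]
        # Only fill when the group has at least one observed value.
        if value is None and r[group_col] in medians:
            value = medians[r[group_col]]
        filled.append({**r, value_col: value})
    return filled


# Hypothetical toy records (not the lab dataset).
rows = [
    {"region": "West", "price": 1.0},
    {"region": "West", "price": 1.2},
    {"region": "West", "price": 1.4},
    {"region": "West", "price": None},
    {"region": "East", "price": 2.0},
    {"region": "East", "price": None},
]

filled = impute_group_median(rows, "region", "price")
print(filled)
```

Using the group median rather than the overall median keeps a region with unusually high or low values from being pulled toward the rest of the data.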

b.4) The correlation between the variables can affect model accuracy. If two variables are highly correlated, they carry largely redundant information, which can contribute to overfitting. If two variables are not correlated, they can provide complementary information, which can help improve the accuracy of the model.
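To make the idea of redundant variables concrete, the sketch below computes a Pearson correlation matrix by hand in Python. The variable names and values are hypothetical, chosen so that two of the columns are perfectly correlated; this is not the output of the “Correlation Matrix” operator.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(data):
    """Pairwise Pearson correlations for a dict of column name -> values."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


# Hypothetical columns: "bags" is an exact multiple of "volume", so it is redundant.
data = {
    "volume": [10, 20, 30, 40],
    "bags": [5, 10, 15, 20],
    "price": [1.5, 1.2, 1.4, 1.1],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A coefficient close to 1 or -1 between two predictors, as with `volume` and `bags` here, suggests that one of them could be dropped without losing much information.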

Lab Assignment 2

Part I

The Dataset

The variables included in the model as predictors were PassengerId, Sex, Pclass, Fare, Embarked, Parch, Age and SibSp. These variables were flagged in either orange or green and marked as shown below.

The RapidMiner studio

Part II

Random forest was found to be the best algorithm for the prediction because it had the highest AUC, 0.903, of the models compared.
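AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it that way and picks the model with the highest value. The labels, scores and model names are invented for illustration; they are not the lab results and do not reproduce the 0.903 figure.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and predicted scores from two made-up models.
labels = [1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [0.9, 0.8, 0.3, 0.2, 0.7, 0.4],
    "model_b": [0.6, 0.4, 0.5, 0.2, 0.7, 0.3],
}

results = {name: auc(labels, scores) for name, scores in predictions.items()}
best = max(results, key=results.get)
print(results, best)
```

The pairwise form is quadratic in the number of records, which is fine for a small example but slower than rank-based formulas on large datasets.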

Part III

Sex was the most important predictor in the model. This is indicated by the correlation weights shown below.

Part IV

The predictions from the random forest algorithm are exported as RandomForestPredictions.xlsx.
