CSCI 6364 Machine Learning Project

Team Members

Xi Cheng
Jing Si
Mengqi Xie

Introduction

This project is based on the Online News Popularity Data Set from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Online+News+Popularity/

The goal is to train learning algorithms to predict the popularity of a given news article. This is both a regression task (predicting the exact number of shares) and a classification task (predicting the level of popularity).
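Both tasks can be built from the same share counts. The following is a minimal sketch, assuming the popularity levels are obtained by thresholding the number of shares. The helper name `make_targets` and the default threshold (the median share count) are illustrative assumptions, not the project's definition of "popular".

```python
from bisect import bisect_right
from statistics import median


def make_targets(shares, thresholds=None):
    """Return (regression_target, popularity_levels) for a list of share counts.

    Hypothetical helper: the regression target is the raw share count, and the
    popularity level is the number of thresholds that the count meets or exceeds.
    With the default single threshold (the median, an assumption) this gives
    binary labels: 1 = popular, 0 = unpopular.
    """
    if thresholds is None:
        thresholds = [median(shares)]
    thresholds = sorted(thresholds)
    regression_target = [float(s) for s in shares]
    levels = [bisect_right(thresholds, s) for s in shares]
    return regression_target, levels


if __name__ == "__main__":
    _, labels = make_targets([593, 711, 1500, 1200, 505, 855])
    print(labels)  # [0, 0, 1, 1, 0, 1]  (median threshold is 783)
```

Passing several thresholds produces more than two popularity levels; where to place the cut-offs is a modeling choice rather than something fixed by the dataset.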

Dataset Features

The dataset contains 39797 instances and 61 attributes.

58 predictive attributes
2 non-predictive attributes (url and timedelta)
1 goal field (shares)

Popularity is measured by the goal field, which is the number of times the news article was shared.
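A minimal standard-library sketch of separating the 58 predictive attributes from the two non-predictive columns and the goal field. It assumes the input is the UCI CSV with a header row whose column names may carry surrounding whitespace (the names are stripped defensively). `load_news` is a hypothetical helper name.

```python
import csv
import io

# Columns described as non-predictive, plus the goal field.
NON_PREDICTIVE = {"url", "timedelta"}
TARGET = "shares"


def load_news(file_obj):
    """Read OnlineNewsPopularity-style CSV text into (feature_names, X, y)."""
    reader = csv.reader(file_obj)
    header = [name.strip() for name in next(reader)]  # tolerate " shares" etc.
    target_idx = header.index(TARGET)
    feature_idx = [i for i, name in enumerate(header)
                   if name not in NON_PREDICTIVE and name != TARGET]
    feature_names = [header[i] for i in feature_idx]
    X, y = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # float() ignores the leading/trailing spaces around each value
        X.append([float(row[i]) for i in feature_idx])
        y.append(float(row[target_idx]))
    return feature_names, X, y


if __name__ == "__main__":
    # Tiny inline sample standing in for the real CSV (values are made up).
    sample = io.StringIO(
        "url, timedelta, n_tokens_title, shares\n"
        "http://example.com/a, 731.0, 12.0, 593\n"
    )
    names, X, y = load_news(sample)
    print(names, X, y)  # ['n_tokens_title'] [[12.0]] [593.0]
```

Called on the full `OnlineNewsPopularity.csv` file (opened with `newline=""`), `feature_names` should contain the 58 predictive attributes counted above.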

Project Workflow

Data wrangling and preprocessing: Divide the dataset into training, validation, and test sets for training the model. The target is the number of shares. By evaluating the correlations of the 58 attributes, we will select the most significant ones to include in the model.
Design Machine Learning Algorithms: Candidate algorithms include Random Forest, SVM, and Naïve Bayes.
Model Generation: Test different algorithms to generate candidate models, validate them on the validation set (for example, to detect overfitting), and choose the one with the best fit.
Model Evaluation: Apply the chosen model to the test set; its predicted target values should be acceptably accurate.
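The workflow steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the 70/15/15 split, the fixed seed, ranking features by absolute Pearson correlation with the target (one possible reading of "evaluating the correlations"), and every function name here are assumptions. Fitting the actual Random Forest, SVM, or Naïve Bayes models would be done with a library of choice; `pick_best` only assumes each fitted model can be called on a feature row.

```python
import math
import random


def train_val_test_split(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle row indices 0..n-1 and cut them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_features(names, X, y, k):
    """Return the k feature names with the largest |Pearson r| against y.

    Call this on the training rows only, so the test set does not leak
    into feature selection.
    """
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        scored.append((abs(pearson(column, y)), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def rmse(y_true, y_pred):
    """Root-mean-squared error for the regression task (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of correct labels for the classification task (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_best(models, X_val, y_val, loss=rmse):
    """Return the name of the model with the lowest validation loss.

    `models` maps a name to any callable taking one feature row and returning
    a prediction (a hypothetical interface standing in for fitted models).
    For classification, pass loss=lambda t, p: -accuracy(t, p).
    """
    def val_loss(name):
        preds = [models[name](row) for row in X_val]
        return loss(y_val, preds)

    return min(models, key=val_loss)
```

In this sketch the test indices are touched only once, at the end: features are ranked and models are compared on the training and validation rows, and `rmse` or `accuracy` on the test rows is the final reported number.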

Reference

K. Fernandes, P. Vinagre and P. Cortez. A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence, September, Coimbra, Portugal.

Repository: avaqi29/news-popularity-ML

Folders and files

Latest commit

History

Repository files navigation

CSCI 6364 Machine Learning Project

Team Members

Introduction

Dataset Features

Project Workflow

Reference

About

Resources

Stars

Watchers

Forks

Languages