Initial commit
IvanRF committed Mar 19, 2014
0 parents commit f58c745
Showing 48 changed files with 12,697 additions and 0 deletions.
22 changes: 22 additions & 0 deletions .gitattributes
@@ -0,0 +1,22 @@
# Auto detect text files and perform LF normalization
* text=auto

# Custom for Visual Studio
*.cs diff=csharp
*.sln merge=union
*.csproj merge=union
*.vbproj merge=union
*.fsproj merge=union
*.dbproj merge=union

# Standard to msysgit
*.doc diff=astextplain
*.DOC diff=astextplain
*.docx diff=astextplain
*.DOCX diff=astextplain
*.dot diff=astextplain
*.DOT diff=astextplain
*.pdf diff=astextplain
*.PDF diff=astextplain
*.rtf diff=astextplain
*.RTF diff=astextplain
215 changes: 215 additions & 0 deletions .gitignore
@@ -0,0 +1,215 @@
#################
## Eclipse
#################

*.pydevproject
.project
.metadata
bin/
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.classpath
.settings/
.loadpath

# External tool builders
.externalToolBuilders/

# Locally stored "Eclipse launch configurations"
*.launch

# CDT-specific
.cproject

# PDT-specific
.buildpath


#################
## Visual Studio
#################

## Ignore Visual Studio temporary files, build results, and
## files generated by popular Visual Studio add-ons.

# User-specific files
*.suo
*.user
*.sln.docstates

# Build results

[Dd]ebug/
[Rr]elease/
x64/
build/
[Bb]in/
[Oo]bj/

# MSTest test Results
[Tt]est[Rr]esult*/
[Bb]uild[Ll]og.*

*_i.c
*_p.c
*.ilk
*.meta
*.obj
*.pch
*.pdb
*.pgc
*.pgd
*.rsp
*.sbr
*.tlb
*.tli
*.tlh
*.tmp
*.tmp_proj
*.log
*.vspscc
*.vssscc
.builds
*.pidb
*.log
*.scc

# Visual C++ cache files
ipch/
*.aps
*.ncb
*.opensdf
*.sdf
*.cachefile

# Visual Studio profiler
*.psess
*.vsp
*.vspx

# Guidance Automation Toolkit
*.gpState

# ReSharper is a .NET coding add-in
_ReSharper*/
*.[Rr]e[Ss]harper

# TeamCity is a build add-in
_TeamCity*

# DotCover is a Code Coverage Tool
*.dotCover

# NCrunch
*.ncrunch*
.*crunch*.local.xml

# Installshield output folder
[Ee]xpress/

# DocProject is a documentation generator add-in
DocProject/buildhelp/
DocProject/Help/*.HxT
DocProject/Help/*.HxC
DocProject/Help/*.hhc
DocProject/Help/*.hhk
DocProject/Help/*.hhp
DocProject/Help/Html2
DocProject/Help/html

# Click-Once directory
publish/

# Publish Web Output
*.Publish.xml
*.pubxml

# NuGet Packages Directory
## TODO: If you have NuGet Package Restore enabled, uncomment the next line
#packages/

# Windows Azure Build Output
csx
*.build.csdef

# Windows Store app package directory
AppPackages/

# Others
sql/
*.Cache
ClientBin/
[Ss]tyle[Cc]op.*
~$*
*~
*.dbmdl
*.[Pp]ublish.xml
*.pfx
*.publishsettings

# RIA/Silverlight projects
Generated_Code/

# Backup & report files from converting an old project file to a newer
# Visual Studio version. Backup files are not needed, because we have git ;-)
_UpgradeReport_Files/
Backup*/
UpgradeLog*.XML
UpgradeLog*.htm

# SQL Server files
App_Data/*.mdf
App_Data/*.ldf

#############
## Windows detritus
#############

# Windows image file caches
Thumbs.db
ehthumbs.db

# Folder config file
Desktop.ini

# Recycle Bin used on file shares
$RECYCLE.BIN/

# Mac crap
.DS_Store


#############
## Python
#############

*.py[co]

# Packages
*.egg
*.egg-info
dist/
build/
eggs/
parts/
var/
sdist/
develop-eggs/
.installed.cfg

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox

#Translations
*.mo

#Mr Developer
.mr.developer.cfg
Binary file added IB1_W1000000_TK-Complete_Boosting.dat
Binary file not shown.
Binary file added IB1_W1000000_TK-Numbers.dat
Binary file not shown.
Binary file added IB1_W1000000_TK-Numbers_Boosting.dat
Binary file not shown.
Binary file added IB1_W1000_TK-Complete.dat
Binary file not shown.
Binary file added IB1_W1000_TK-Default.dat
Binary file not shown.
8 changes: 8 additions & 0 deletions META-INF/MANIFEST.MF
@@ -0,0 +1,8 @@
Manifest-Version: 1.0
Implementation-Title: SMS Spam Filtering
Implementation-Version: 1.1.0
Implementation-BuildStamp: 2014-03-10 16:17:00
Implementation-Vendor: Ivan Ridao Freitas
Created-By: Ivan Ridao Freitas
Class-Path: lib/substance-6.1.jar lib/trident-1.3.jar lib/weka-3.7.9.jar
Main-Class: com.ivanrf.smsspam.SMSSpam
Binary file added NaiveBayes_W1000000_TK-Complete.dat
Binary file not shown.
Binary file added NaiveBayes_W1000000_TK-Complete_Boosting.dat
Binary file not shown.
Binary file added NaiveBayes_W1000000_TK-Default.dat
Binary file not shown.
Binary file added NaiveBayes_W1000000_TK-Numbers.dat
Binary file not shown.
Binary file added NaiveBayes_W1000000_TK-Numbers_Boosting.dat
Binary file not shown.
Binary file added PART_W1000_TK-Default.dat
Binary file not shown.
26 changes: 26 additions & 0 deletions README.md
@@ -0,0 +1,26 @@
![](src/com/ivanrf/images/icon.png) SMS Spam Filtering
==================

This software was made to study and test several machine learning algorithms for data mining tasks.

The dataset used is the [SMS Spam Collection Data Set](http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection).

Pre-processing, classification and evaluation of the data use algorithms provided by [WEKA](http://www.cs.waikato.ac.nz/ml/weka/).

The ```ARFFBuilder``` class converts the original SMS Spam Collection Data Set into an ARFF file, which is the format used by WEKA. Both files are provided.
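
As a rough illustration of that conversion, the sketch below turns `label<TAB>message` lines (the layout of the SMS Spam Collection file) into ARFF text. It is not the repository's ```ARFFBuilder```: the class name `ArffSketch`, the relation name `sms_spam` and the attribute names `text` and `class` are assumptions made for this example, and it uses only the Java standard library rather than WEKA.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative converter only; not the repository's ARFFBuilder. */
final class ArffSketch {

    /** Wraps a message in single quotes, escaping backslashes and single quotes. */
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds ARFF text from lines shaped like "label<TAB>message". */
    static String toArff(List<String> lines) {
        StringBuilder out = new StringBuilder();
        out.append("@relation sms_spam\n\n");           // hypothetical relation name
        out.append("@attribute text string\n");          // hypothetical attribute names
        out.append("@attribute class {ham,spam}\n\n");
        out.append("@data\n");
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue;                                // skip lines without a tab separator
            }
            String label = line.substring(0, tab).trim();
            String message = line.substring(tab + 1).trim();
            out.append(quote(message)).append(',').append(label).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "ham\tOk, see you later",
                "spam\tYou've won a prize, call now");
        System.out.print(toArff(sample));
    }
}
```

Quoting the message keeps commas and spaces inside the SMS text from being read as attribute separators.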

The ```SpamClassifier``` class classifies SMS text messages and handles the training and evaluation of a classifier.

The [PDF file](SMS-Spam-Filtering_(Spanish).pdf?raw=true) includes the results of the study and an explanation of the software (only in Spanish).

Every .dat file is a saved ```FilteredClassifier```. When you train a classifier on the SMSSpamCollection dataset, the software saves the trained model to a .dat file.
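
The .dat files added in this commit share a visible naming pattern, for example `IB1_W1000000_TK-Complete_Boosting.dat` and `PART_W1000_TK-Default.dat`. The sketch below only splits such a name into its parts. The commit does not state what the `W` and `TK` fields mean, so the code reports them under those labels without interpreting them. The class name `ModelNameSketch` and the regular expression are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for the .dat file names seen in this commit. */
final class ModelNameSketch {

    // Assumed shape: <classifier>_W<number>_TK-<word>[_Boosting].dat
    private static final Pattern NAME =
            Pattern.compile("([A-Za-z0-9]+)_W(\\d+)_TK-([A-Za-z]+)(_Boosting)?\\.dat");

    /** Returns the parts of a model file name, or a marker if the name does not fit. */
    static String describe(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return "unrecognized: " + fileName;
        }
        return "classifier=" + m.group(1)
                + ", W=" + m.group(2)
                + ", TK=" + m.group(3)
                + ", boosting=" + (m.group(4) != null);
    }

    public static void main(String[] args) {
        System.out.println(describe("IB1_W1000000_TK-Complete_Boosting.dat"));
        System.out.println(describe("PART_W1000_TK-Default.dat"));
    }
}
```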

## Download ##
You can [download](SMS-Spam-Filtering.zip?raw=true) a zip file containing only the files required to run the application.

## Screenshots ##
![Classify - SMS is Spam](screenshots/1.png)

![Classify - SMS is Ham](screenshots/2.png)

![Train and Evaluate](screenshots/3.png)
Binary file added SMO_W1000000_TK-Complete.dat
Binary file not shown.
Binary file added SMO_W1000000_TK-Complete_Boosting.dat
Binary file not shown.
Binary file added SMO_W1000000_TK-Default.dat
Binary file not shown.
Binary file added SMO_W1000000_TK-Numbers.dat
Binary file not shown.
Binary file added SMO_W1000000_TK-Numbers_Boosting.dat
Binary file not shown.
Binary file added SMS-Spam-Filtering.jar
Binary file not shown.
Binary file added SMS-Spam-Filtering.zip
Binary file not shown.
Binary file added SMS-Spam-Filtering_(Spanish).pdf
Binary file not shown.
