The open-source spam classification project purely written in Swift.
Winner of the Apple WWDC 2020 Swift student challenge and now integrated into iOS 14, iPadOS 14 and macOS 11.0 Big Sur.
Every crisis brings an opportunity. For developers, it is our skills and effort to create one, for hackers it is their malicious intent. With COVID-19, the world is seeing a revolution in their workplaces where at center of it are our mailboxes.
This playground gives you a unique freedom to explore how you can identify one of the most threatening cyber attack that no antivirus can save you from. Yes I am talking about SPAM!
Hi, I am Naman Bishnoi and in this year's WWDC I will discuss about one of the major cyber threats that take down biggest of the big organizations, SPAM. SPAM stands for SusPicious Annoying Mails (or that's what I think of atleast) and good ones are FAM (FAmiliar Mails) which we need to look for.
We will explore how you can identify a SPAM email from FAM. First few emails will be selected manually by you just for the sake of training our machine learning model (in this case it's Naive Bayes). After that, it will use that trained model to automatically classify the remaining emails.
Keeping things simple, I have designed this playground to effortlessly segregate your emails. To classify the emails, just simply:
- Swipe left for
Spam π»
- Swipe right for
Fam ππ»
- After πͺ Training, Hold down the
π§mail card
to automatically classify it.
You can always use hint to check if its spam or not by clicking on Learn More βΉοΈ
at bottom of card.
- TF-IDF weights with Naive Bayes modelling used which is considered to be as fast as SVM.
- This model also helps alleviate problems stemming from the curse of dimensionality.
- The legitimate e-mails a user receives will tend to be different. For example, in a corporate environment, the company name and the names of clients or customers will be mentioned often. The filter will assign a lower spam probability to emails containing those names.
If you see testing mails with β οΈ False β
or β οΈ False β
symbols, that means somehow you have classified wrong mails in training phase. Please, train the emails carefully.
Although, I don't expect a completely spam free mailbox due to algorithmic inefficiencies but it gets rid of most unnecessary stuff. This open-source project gives you full right to do whatever you want with your data.
This project is based mostly on my another project Spamaway. This one is written in Go.
- Apple for creating this awesome platform.
- Wikipedia for reference guide
- Sebastian Raschka for his awesome work on the topic
- Spamassasin for their open source corpora dataset.
- Securitymetrics for their awesome article on the same.
- Scamnet for spam examples
- Thumbnail is freely available at Freepik and redesigned by me.